- Jan 12, 2023
-
-
The intent was good, but in practice it results in a significant number of problems due to various compiler bugs, for negligible gains.
-
- Dec 14, 2022
-
-
James Almer authored
Should be useful in scenarios where only keyframes are wanted, e.g. to quickly generate a set of preview images of the whole stream.
-
James Almer authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
- Dec 13, 2022
-
-
bits_left could underflow after reaching EOB. Credit to OSS-Fuzz.
-
-
-
- Dec 09, 2022
-
-
-
A length of 1 is by far the most common case, and having a special case for that is not only slightly faster but also reduces code size by a decent amount due to not having to pass a length argument every time.
-
The Dav1dSequenceHeader struct is already zero-initialized, so zeroing individual values a second time is redundant.
-
According to section 6.4.1 of the AV1 specification, the value should be equal to BUFFER_POOL_MAX_SIZE (10) when not explicitly signaled.
-
James Almer authored
Fixes segfaults if you run the CLI with an invalid argument for --inloopfilters
-
- Dec 04, 2022
-
-
Luca Barbato authored
Fixes: #412
-
- Nov 21, 2022
-
-
Luca Barbato authored
It mirrors what is done with neon as well. Fixes: #413
-
Luca Barbato authored
clang-15 doesn't consider it compile-time-constant anymore.
-
- Nov 10, 2022
-
-
- Oct 30, 2022
- Oct 27, 2022
-
-
Victorien Le Couviour--Tuffet authored
-
- Oct 26, 2022
-
-
Martin Storsjö authored
This fixes building with MSVC (and older GCC versions) after 3e7886db.
-
- Oct 20, 2022
-
-
Victorien Le Couviour--Tuffet authored
The completion of the first frame to decode while an async reset request on that same frame is pending will render it stale. The processing of such a stale request is likely to result in a hang.

One reason this happens is the skip condition at the beginning of reset_task_cur().
=> Consume the async request before that check.

Another reason is several threads producing async reset requests in parallel: an async request for the first frame could cascade through the other threads (other frames) during completion of that frame, meaning not being caught by the last synchronous reset_task_cur() after signaling the main thread and before releasing the lock.
=> To solve this we need to add protections at the racy locations. That means after we increase first, before returning from reset_task_cur_async(), and after consuming the async request.
-
- Oct 10, 2022
-
-
Sebastian Dröge authored
Despite not being documented in Meson's list of canonical system names, Meson does accept 'ios', mostly as a synonym for 'darwin'. Using 'ios' instead of 'darwin' allows distinguishing between the two in the cases where that is necessary. Therefore, within dav1d, allow using the 'ios' name as an alias for 'darwin' as the system name, to allow using cross files that make this distinction. Meson itself also allows 'tvos' in addition to 'ios' in the internal `is_darwin()` function, so all 3 are handled the same here.
-
- Sep 30, 2022
-
-
-
Henrik Gramner authored
'-fvisibility=hidden' only applies to definitions, not declarations, so the compiler has to be conservative about how references to global data symbols are performed. Explicitly specifying the visibility allows for better code generation.
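A minimal sketch of the idea, assuming GCC or Clang: the visibility attribute is attached to the declaration itself, so every reference to the symbol is known to stay inside the shared object. The names `DEMO_EXTERN`, `demo_table` and `demo_lookup` are hypothetical and are not taken from dav1d.

```c
#include <stdint.h>

/* Hypothetical macro: an extern declaration with explicit hidden visibility
 * on GCC/Clang, and a plain extern declaration on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_EXTERN extern __attribute__((visibility("hidden")))
#else
#define DEMO_EXTERN extern
#endif

/* Declaration, as it would appear in a shared header. Because the
 * visibility is stated here, the compiler does not have to assume the
 * symbol could be interposed or reached through the GOT. */
DEMO_EXTERN const uint8_t demo_table[4];

/* Definition, as it would appear in exactly one source file. */
const uint8_t demo_table[4] = { 1, 2, 4, 8 };

/* A reference to the global data symbol from ordinary code. */
uint8_t demo_lookup(const unsigned idx) {
    return demo_table[idx & 3];
}
```

Whether this changes the generated code depends on the target and on the compiler flags in use.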
-
- Sep 28, 2022
-
-
Whitespace is added to the result when compiling with MSVC using /std:c11, which breaks various things. Adding strip() fixes the problem.
-
-
Use explicit parameter type detection and manually clobber the upper bits instead of relying on internal compiler behavior.
-
- Sep 26, 2022
-
-
The 32-bit width parameter was used directly as a pointer offset, but the upper half is undefined. Fix it by replacing 'cmp' with 'sub' to explicitly zero those bits.
-
- Sep 19, 2022
-
-
Martin Storsjö authored
This fixes conformance with the argon test samples, in particular with these samples:
profile0_core/streams/test10100_579_8614.obu
profile0_core/streams/test10218_6914.obu

This gives a pretty notable slowdown to these transforms - some examples:

Before:                                 Cortex A53      A72      A73  Apple M1
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:       365.7    290.2    299.8       0.3
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:    1865.2   1384.1   1457.5       2.6
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   33976.3  26817.0  24864.2      40.4
After:
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:       397.7    322.2    335.1       0.4
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:    2121.9   1336.7   1664.6       2.6
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   38569.4  27622.6  28176.0      51.0

Thus, for the transforms alone, it makes them around 10-13% slower (the Apple M1 measurements are too noisy to be conclusive here). Measured on actual full decoding, it makes decoding of 10 bpc Chimera around maybe 1% slower on an Apple M1 - close to measurement noise anyway.
-
Henrik Gramner authored
-
Henrik Gramner authored
Using smaller immediates also results in a small code size reduction in some cases, so apply those changes to the (10bpc-only) SSE code as well.
-
Henrik Gramner authored
Certain clips were incorrectly performed on negated values, which caused things to be off-by-one in both directions. Correct this by negating such values prior to clipping instead of afterwards.
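To illustrate why the order of negation and clipping matters, the sketch below clips to an asymmetric signed 16-bit range. It is illustrative only: `demo_clip`, the two wrappers and the chosen range are assumptions, not the code touched by this commit.

```c
/* Clamp v into the inclusive range [lo, hi]. */
static int demo_clip(const int v, const int lo, const int hi) {
    return v < lo ? lo : v > hi ? hi : v;
}

/* Negate first, then clip: the result always lies in [-32768, 32767]. */
int demo_negate_then_clip(const int v) {
    return demo_clip(-v, -32768, 32767);
}

/* Clip first, then negate: the effective range becomes [-32767, 32768],
 * which is off by one at both ends compared to the intended range. */
int demo_clip_then_negate(const int v) {
    return -demo_clip(v, -32768, 32767);
}
```

For in-range inputs both orderings agree; they only differ once the value saturates.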
-
- Sep 15, 2022
-
-
Martin Storsjö authored
Since meson 0.58.0 (released in May 2021), meson accepts adding '.S' assembly files as source files to the clang-cl compiler. If using an older version of meson, keep using gas-preprocessor just like for MSVC builds.
-
David Conrad authored
Section 7.9.2 returns 0 "If RefMiRows[ srcIdx ] is not equal to MiRows, RefMiCols[ srcIdx ] is not equal to MiCols". dav1d was comparing pixel width/height, not block width/height, so conform with the spec.
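A small sketch of why the two comparisons can disagree. It assumes the AV1 definition MiCols = 2 * ((FrameWidth + 7) >> 3), and likewise for MiRows; the helper names are hypothetical and this is not dav1d's actual check.

```c
#include <stdbool.h>

/* Frame dimension in 4x4 block units, rounded up to a whole 8x8 block. */
static int demo_mi_size(const int pixels) {
    return 2 * ((pixels + 7) >> 3);
}

/* Comparison on pixel dimensions (the behaviour described as incorrect). */
bool demo_same_pixel_size(const int w0, const int h0,
                          const int w1, const int h1)
{
    return w0 == w1 && h0 == h1;
}

/* Comparison on block dimensions, as in the quoted spec condition. */
bool demo_same_mi_size(const int w0, const int h0,
                       const int w1, const int h1)
{
    return demo_mi_size(w0) == demo_mi_size(w1) &&
           demo_mi_size(h0) == demo_mi_size(h1);
}
```

Two frames that are 1920 and 1914 pixels wide differ in pixel width but share the same MiCols, so the two checks give different answers for them.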
-
David Conrad authored
Individual OBMC lapped predictions have a max width of 64 pixels for the top laps and a max height of 64 pixels for the left laps.

This is 7.11.3.9. Overlapped motion compensation process:
step4 = Clip3( 2, 16, Num_4x4_Blocks_Wide[ candSz ] )

dav1d wasn't clipping this as needed, which means that with scaled MC, the interpolation of the 2nd half of a 128 block was incorrect, since mx/my for subpel filter selection need to be reset at the 64 pixel boundary.
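The clip quoted from the spec can be sketched as follows. `demo_clip3`, `demo_obmc_step4` and `demo_obmc_width_px` are hypothetical helpers written for illustration; they are not dav1d functions.

```c
/* Clip3(lo, hi, v) as used in the AV1 spec: clamp v into [lo, hi]. */
static int demo_clip3(const int lo, const int hi, const int v) {
    return v < lo ? lo : v > hi ? hi : v;
}

/* step4 for a candidate block that is num_4x4_wide 4x4 units wide. */
int demo_obmc_step4(const int num_4x4_wide) {
    return demo_clip3(2, 16, num_4x4_wide);
}

/* Width in pixels of a single lapped prediction. */
int demo_obmc_width_px(const int num_4x4_wide) {
    return 4 * demo_obmc_step4(num_4x4_wide);
}
```

A 128-pixel-wide candidate is 32 units wide, so its step is clipped to 16 units, i.e. 64 pixels. The block is therefore covered by two separate 64-pixel predictions rather than one 128-pixel one.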
-
David Conrad authored
This is the parameter combination:
num_y_points == 0 && num_cb_points == 0 && num_cr_points == 0 && chroma_scaling_from_luma == 1 && clip_to_restricted_range == 1

Film grain application has two effects: adding noise, and optionally clipping to video range.

For luma, the spec skips film grain application if there's no noise (num_y_points == 0), but for chroma, it's only skipped if there's no chroma noise *and* chroma_scaling_from_luma is false.

This means it's possible for there to be no noise (num_*_points = 0), but if clip_to_restricted_range is true then chroma pixels can be clipped to video range, if chroma_scaling_from_luma is true. Luma pixels, however, aren't clipped to video range unless there's noise to apply.

dav1d currently skips applying film grain entirely if there is no noise, regardless of the secondary clipping.
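The per-plane conditions described above can be sketched as below. The struct and function names are hypothetical; only the field names come from the commit message, and this is not dav1d's implementation.

```c
#include <stdbool.h>

/* Hypothetical subset of the film grain parameters named above. */
typedef struct {
    int num_y_points;
    int num_cb_points;
    int num_cr_points;
    int chroma_scaling_from_luma;
    int clip_to_restricted_range;
} DemoGrainParams;

/* Luma: film grain application is skipped when there is no luma noise. */
bool demo_apply_luma(const DemoGrainParams *const p) {
    return p->num_y_points > 0;
}

/* Chroma: only skipped when there is no noise for that plane *and*
 * chroma_scaling_from_luma is false. */
bool demo_apply_cb(const DemoGrainParams *const p) {
    return p->num_cb_points > 0 || p->chroma_scaling_from_luma;
}

bool demo_apply_cr(const DemoGrainParams *const p) {
    return p->num_cr_points > 0 || p->chroma_scaling_from_luma;
}
```

With all point counts at zero and chroma_scaling_from_luma set, the chroma planes still go through film grain application, and so can be clipped when clip_to_restricted_range is set, while luma is left untouched.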
-
David Conrad authored
The syntax of itu_t_t35_payload_bytes is not defined in the AV1 specification, but it does state that decoders should ignore the entire OBU if they do not understand it.
-
David Conrad authored
In section 5.11.34, txSz is always defined to be TX_4X4 if Lossless is true. The chroma deblock filter size calculation needs to use this overridden txSz when lossless is enabled.
-
David Conrad authored
The spec divides err by two, rounding toward zero, instead of using >>1, which rounds toward negative infinity.
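The difference only shows up for negative odd values, as in this minimal sketch (the function names are hypothetical). Note that right-shifting a negative signed integer is implementation-defined in C; the sketch assumes the arithmetic shift that mainstream compilers provide.

```c
/* Integer division truncates toward zero (guaranteed since C99). */
int demo_half_div(const int err) {
    return err / 2;
}

/* An arithmetic right shift rounds toward negative infinity. */
int demo_half_shift(const int err) {
    return err >> 1;
}
```

For err = -3 the division yields -1 while the shift yields -2; for non-negative or even inputs the two agree.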
-