- Feb 13, 2023
Matthias Dressel authored
Calling meson with no command has been deprecated since Meson 0.64.0.
-
Matthias Dressel authored
-
- Feb 10, 2023
Victorien Le Couviour--Tuffet authored
The code in dav1d_drain_picture could cause c->task_thread.first (the oldest submitted frame) and c->frame_thread.next (the first frame to retrieve and/or the next submit location) to get out of sync. While looping in drain, next is always incremented, but first only when the frame has data. If the frame is visible, we return. The problem arises when we encounter one or more invisible frames and the following entries haven't been fed yet: we keep looping, incrementing next but not first, since those entries have no data. We should always return once we have encountered data (a visible or invisible decoded frame). For visible frames the code already returns; for invisible ones, we can store a boolean indicating that at least one frame was drained, and whenever we reach an empty entry after that, we return (all subsequent entries are guaranteed to be empty anyway) without incrementing either next or first. As a result, the next frame is inserted at the first free spot, which is much better than the odd skips happening now. Previously, c->frame_thread.next could skip some (empty) entries; now it advances contiguously. Fixes #416.
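To make the index bookkeeping concrete, here is a heavily simplified model of the drain loop after this fix. It also respects the constraint from the revert below (first is only advanced for entries that actually hold a frame). The `Ring`/`Slot` types, `N_SLOTS` and `drain()` are hypothetical illustrations, not dav1d's actual structures or code; real frames, locking and picture output are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the frame-thread ring buffer;
 * names and layout are illustrative, not dav1d's. */
#define N_SLOTS 8

typedef struct {
    bool has_data; /* a submitted frame occupies this slot */
    bool visible;  /* the frame is meant to be output */
} Slot;

typedef struct {
    Slot slot[N_SLOTS];
    unsigned first; /* oldest submitted frame */
    unsigned next;  /* next frame to retrieve and/or next submit location */
} Ring;

/* Drains frames in order. Returns the slot index of the visible frame that
 * was output, or -1 if no visible frame was found. */
static int drain(Ring *r) {
    assert(r->first < N_SLOTS && r->next < N_SLOTS);
    bool drained = false; /* have we consumed at least one frame? */
    for (unsigned i = 0; i < N_SLOTS; i++) {
        Slot *s = &r->slot[r->next];
        if (!s->has_data) {
            /* Once a frame has been drained, all following entries are
             * empty: stop without touching next or first. */
            if (drained)
                return -1;
            /* Nothing drained yet (e.g. fewer frames than slots at EOF):
             * skip the empty entry by advancing next only. */
            r->next = (r->next + 1) % N_SLOTS;
            continue;
        }
        const unsigned idx = r->next;
        s->has_data = false;
        drained = true;
        r->first = (r->first + 1) % N_SLOTS; /* only for real frames */
        r->next = (r->next + 1) % N_SLOTS;
        if (s->visible)
            return (int)idx;
    }
    return -1;
}
```

With a single invisible frame in slot 0 and nothing else queued, `drain()` consumes it and stops at slot 1, leaving `first == next == 1`. With 7 visible frames in 8 slots and `next == 7` (the EOF case from #418), it first skips the empty slot, wrapping to [0], and then outputs frame 0.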
-
- Feb 09, 2023
Victorien Le Couviour--Tuffet authored
This reverts commit a51b6ce4. We can't increment first when there is no data, otherwise we might do so while the first frame hasn't been decoded yet, which breaks the ordering. Imagine a frame delay of 8 and a file with 7 frames: we feed 7 frames into 8 slots, so next now points to [7] (an empty entry), and we start draining because of EOF. We do need next to be incremented so that it reaches the first frame ([0]) and that frame can be output; only then should first be incremented too. Fixes #418.
-
- Feb 03, 2023
- Jan 31, 2023
Martin Storsjö authored
Add an option for selecting the core on which checkasm's single thread runs. This allows benchmarking on specific CPU cores of heterogeneous CPUs, such as ARM big.LITTLE configurations. On Linux, one can easily wrap a checkasm invocation with "taskset -c <n> [...]", so this option isn't essential there; however, it is quite useful on Windows. On Windows, it is somewhat possible to do the same by launching the tool with "start /B /affinity <hexmask> [...]", but that doesn't work well with scripting ("start" returns before the command has finished running, and it's not obvious how to invoke "start" from within WSL). Using "taskset" within WSL to launch processes on specific cores doesn't work either: regardless of the Linux-level affinity, the process ends up running on the performance cores anyway.
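As a rough sketch of what pinning the current thread to one core can look like, assuming Win32's `SetThreadAffinityMask()` on Windows and glibc's `sched_setaffinity()` on Linux. The function name `pin_to_core()` is hypothetical, and this is not checkasm's actual implementation or option handling.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for cpu_set_t and sched_setaffinity() on glibc */
#endif
#include <assert.h>
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

/* Hypothetical helper: restrict the calling thread to a single CPU core.
 * Returns 0 on success and -1 on failure or on unsupported platforms. */
static int pin_to_core(unsigned core) {
#if defined(_WIN32)
    /* The mask only covers the thread's current processor group. */
    if (core >= 8 * sizeof(DWORD_PTR))
        return -1;
    return SetThreadAffinityMask(GetCurrentThread(),
                                 (DWORD_PTR)1 << core) ? 0 : -1;
#elif defined(__linux__)
    if (core >= (unsigned)CPU_SETSIZE)
        return -1;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* pid 0 means the calling thread */
    return sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : -1;
#else
    (void)core;
    return -1;
#endif
}
```

On Linux this has the same effect as wrapping the process with `taskset -c <n>`; per the commit message, the point of doing it inside the tool is Windows, where wrapping the invocation is awkward.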
-
Martin Storsjö authored
The implementation is a hybrid of two approaches. The generic (but slower) one, used when max_base_y is large, fills two pixel columns at a time, i.e. it loops over pixels vertically first and then horizontally, which is not an optimal access pattern. When max_base_y is smaller, it processes two rows at a time, essentially doing gathers with the TBX instruction.

Relative speedup over the C code:

                             Cortex-A53 Cortex-A55 Cortex-A72 Cortex-A73 Cortex-A76   Apple M1
intra_pred_z3_w4_8bpc_neon:        3.32       2.89       2.78       3.52       2.52       9.67
intra_pred_z3_w8_8bpc_neon:        6.24       5.55       4.76       5.60       4.11       6.40
intra_pred_z3_w16_8bpc_neon:       7.64       7.07       4.37       6.23       4.18       8.60
intra_pred_z3_w32_8bpc_neon:       7.51       7.21       4.34       5.92       4.27       7.88
intra_pred_z3_w64_8bpc_neon:       6.82       6.25       4.08       5.83       3.52       7.31
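For readers unfamiliar with TBX: it is an AArch64 table lookup in which each byte lane of an index vector selects a byte from a table of one to four 128-bit registers, and lanes whose index is out of range keep their previous value (unlike TBL, which zeroes them). The scalar model below only illustrates those semantics; `tbx_model()` is a hypothetical name, not part of dav1d. Because the table spans at most 64 bytes, a TBX-based gather only works when all the needed edge pixels fit in the loaded registers, which is presumably why it is limited to the smaller max_base_y cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of "TBX Vd.16B, {Vn.16B, ...}, Vm.16B".
 * table_len is 16, 32, 48 or 64 bytes (one to four table registers). */
static void tbx_model(uint8_t dst[16], const uint8_t *table, size_t table_len,
                      const uint8_t idx[16]) {
    assert(table_len >= 16 && table_len <= 64 && table_len % 16 == 0);
    for (size_t i = 0; i < 16; i++) {
        /* In-range lanes gather table[idx]; out-of-range lanes are left
         * unchanged (TBL would write 0 instead). */
        if (idx[i] < table_len)
            dst[i] = table[idx[i]];
    }
}
```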
-
- Jan 27, 2023
Martin Storsjö authored
Relative speedup over the C code:

                             Cortex-A53 Cortex-A55 Cortex-A72 Cortex-A73 Cortex-A76   Apple M1
intra_pred_z1_w4_8bpc_neon:        4.09       3.15       3.63       4.16       3.27      13.00
intra_pred_z1_w8_8bpc_neon:        6.93       5.66       5.57       6.76       5.51       5.50
intra_pred_z1_w16_8bpc_neon:       7.81       6.85       6.24       7.78       6.59       9.00
intra_pred_z1_w32_8bpc_neon:      10.56       9.95       8.72      10.95       8.28      13.33
intra_pred_z1_w64_8bpc_neon:      11.00      11.38       9.11      11.62       8.65      14.61

(The M1 speedup numbers are somewhat noisy due to the very coarse granularity of the timer used there.)
-
Martin Storsjö authored
These functions contain a number of different codepaths; try to make sure that we hit most of them for each size combination. This gives better test coverage in a single run of checkasm and should also give a better averaged runtime in benchmarks.
-
- Jan 26, 2023
Henrik Gramner authored
-
Victorien Le Couviour--Tuffet authored
Fixes #416.
-
- Jan 12, 2023
The intent was good, but in practice it results in a significant amount of problems due to various compiler bugs for negligible gains.
-
- Dec 14, 2022
James Almer authored
Should be useful for scenarios like wanting only keyframes to quickly generate a set of preview images of the whole stream.
-
James Almer authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
- Dec 13, 2022
bits_left could underflow after reaching EOB. Credit to OSS-Fuzz.
-
- Dec 09, 2022
A length of 1 is by far the most common case, and having a special case for that is not only slightly faster but also reduces code size by a decent amount due to not having to pass a length argument every time.
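A minimal sketch of the idea, assuming a simple MSB-first header bit reader: a dedicated single-bit accessor lets the most common call sites drop the length argument, and the multi-bit variant can be built on top of it. `BitReader`, `get_bit()` and `get_bits()` are hypothetical names for this illustration, not dav1d's API; returning zeros past the end of the buffer is also just a choice made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;
    size_t size; /* buffer size in bytes */
    size_t pos;  /* read position in bits */
} BitReader;

/* Special case for the most common length: no length argument to pass. */
static unsigned get_bit(BitReader *br) {
    const size_t byte = br->pos >> 3;
    unsigned bit = 0; /* bits past the end read as 0 in this sketch */
    if (byte < br->size)
        bit = (br->data[byte] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* General case: read n bits (1..32), most significant bit first. */
static uint32_t get_bits(BitReader *br, unsigned n) {
    assert(n >= 1 && n <= 32);
    uint32_t v = 0;
    while (n--)
        v = (v << 1) | get_bit(br);
    return v;
}
```

A production reader would normally read from a cached machine word rather than bit by bit; the loop here just keeps the relationship between the two functions obvious.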
-
The Dav1dSequenceHeader struct is already zero-initialized, so zeroing individual values a second time is redundant.
-
According to section 6.4.1 of the AV1 specification, the value should be equal to BUFFER_POOL_MAX_SIZE (10) when not explicitly signaled.
-
James Almer authored
Fixes segfaults when running the CLI with an invalid argument for --inloopfilters.
-
- Dec 04, 2022
Luca Barbato authored
Fixes: #412
-
- Nov 21, 2022
Luca Barbato authored
It mirrors what is done with neon as well. Fixes: #413
-
Luca Barbato authored
clang-15 no longer considers it a compile-time constant.
-
- Nov 10, 2022
- Oct 30, 2022
- Oct 27, 2022
Victorien Le Couviour--Tuffet authored
-
- Oct 26, 2022
Martin Storsjö authored
This fixes building with MSVC (and older GCC versions) after 3e7886db.
-
- Oct 20, 2022
Victorien Le Couviour--Tuffet authored
When the first frame to be decoded completes while an async reset request on that same frame is pending, the request becomes stale. Processing such a stale request is likely to result in a hang. One reason this happens is the skip condition at the beginning of reset_task_cur(). => Consume the async request before that check. Another reason is that several threads can produce async reset requests in parallel: an async request for the first frame could cascade through the other threads (other frames) during the completion of that frame, so it is not caught by the last synchronous reset_task_cur() call made after signaling the main thread and before releasing the lock. => To solve this, we need to add protections at the racy locations: after we increase first, before returning from reset_task_cur_async(), and after consuming the async request.
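The first remedy (consume the pending request before any early return) follows a general pattern that can be sketched with C11 atomics. Everything below is hypothetical: `Sched`, `post_reset()`, `reset_cur()` and the "keep the smallest target" policy are illustrative assumptions, not dav1d's task scheduler, and the additional protections at the other racy locations are not modeled.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_REQUEST UINT_MAX

/* Hypothetical scheduler state, not dav1d's. */
typedef struct {
    atomic_uint reset_req; /* pending async reset target, or NO_REQUEST */
    unsigned cur;          /* position the workers resume from */
} Sched;

/* Any thread may post an async request; concurrent posters keep the
 * smallest target. */
static void post_reset(Sched *s, unsigned target) {
    unsigned prev = atomic_load(&s->reset_req);
    while (target < prev &&
           !atomic_compare_exchange_weak(&s->reset_req, &prev, target))
        ; /* a failed CAS reloads prev; re-check and retry */
}

/* Synchronous reset. Returns true if cur was moved back. */
static bool reset_cur(Sched *s, unsigned target) {
    /* Consume any pending async request first, unconditionally... */
    const unsigned req = atomic_exchange(&s->reset_req, NO_REQUEST);
    if (req < target)
        target = req;
    /* ...and only then apply the skip condition. Returning before the
     * exchange would leave the request behind to go stale. */
    if (target >= s->cur)
        return false;
    s->cur = target;
    return true;
}
```

If `reset_cur()` returned early before the `atomic_exchange()`, a request posted for an already completed frame would remain in `reset_req` and be acted on later, which is the kind of stale request the commit message describes.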
-
- Oct 10, 2022
Sebastian Dröge authored
Despite not being documented in Meson's list of canonical system names, Meson does accept 'ios', mostly as a synonym for 'darwin'. Using 'ios' instead of 'darwin' makes it possible to distinguish between the two in cases where that is necessary. Therefore, within dav1d, allow 'ios' as an alias for 'darwin' as the system name, so that cross files making this distinction can be used. Meson itself also accepts 'tvos' in addition to 'ios' in its internal `is_darwin()` function, so all three are handled the same way here.
-
- Sep 30, 2022
Henrik Gramner authored
'-fvisibility=hidden' only applies to definitions, not declarations, so the compiler has to be conservative about how references to global data symbols are performed. Explicitly specifying the visibility allows for better code generation.
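A minimal illustration, assuming GCC or Clang on an ELF target: the attribute is placed on the `extern` declaration (which would normally live in a header), so every translation unit that references the symbol knows it is hidden. The `HIDDEN` macro, `example_tbl` and `lookup()` are hypothetical names; the attribute is not meaningful on Windows targets, hence the empty fallback.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && !defined(_WIN32)
#define HIDDEN __attribute__((visibility("hidden")))
#else
#define HIDDEN
#endif

/* Declaration (normally in a header). Because it is marked hidden, the
 * compiler may assume the symbol is defined in the same shared object and
 * access it PC-relatively, instead of loading its address from the GOT. */
extern HIDDEN const uint8_t example_tbl[4];

/* Definition (normally in a separate .c file). */
const uint8_t example_tbl[4] = { 1, 2, 4, 8 };

static unsigned lookup(unsigned i) {
    assert(i < 4);
    return example_tbl[i];
}
```

Without the attribute on the declaration, `-fvisibility=hidden` alone only affects the definition, so position-independent code in other translation units would still have to treat the symbol as potentially preemptible and go through the GOT.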
-
- Sep 28, 2022
Whitespace is added to the result when compiling with MSVC using /std:c11, which breaks various things. Adding strip() fixes the problem.
-
Use explicit parameter type detection and manually clobber the upper bits instead of relying on internal compiler behavior.
-
- Sep 26, 2022
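The 32-bit width parameter was used directly as a pointer offset, but the upper half of the register is undefined. Fix it by replacing 'cmp' with 'sub': a 32-bit 'sub' writes its destination register, which zero-extends it and explicitly zeroes those bits, whereas 'cmp' only sets the flags.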
-