- Mar 13, 2023
-
-
Victorien Le Couviour--Tuffet authored
We must reload error just before calling dav1d_decode_frame_exit, as it may have become stale between the last load and that call. This can result in crashes since we signal a seemingly successfully decoded frame, when it's not. Reloading error within the frame done condition's body ensures a non-stale value, as we use 'f->task_thread.task_counter == 0' to ensure all other threads / tasks have already completed when entering it. In other words, only the last thread still working on this frame can execute this code, after all other threads have returned to doing something else.
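The reload pattern described above can be sketched roughly as follows. This is a minimal model, not dav1d's actual code: the `FrameState` struct and `finish_task` helper are hypothetical names, and the real condition involves more state than a single counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified per-frame state; not dav1d's real structs. */
typedef struct {
    atomic_int task_counter; /* tasks still running for this frame */
    atomic_int error;        /* nonzero once any task has failed */
} FrameState;

/* Called by a worker when it finishes one task of frame f.
 * Returns true only for the last worker; in that case *exit_error
 * holds an error value loaded after all other tasks completed. */
bool finish_task(FrameState *f, int *exit_error) {
    /* atomic_fetch_sub returns the previous value: 1 means we were last. */
    if (atomic_fetch_sub(&f->task_counter, 1) != 1)
        return false;
    /* Reload here instead of reusing a value read earlier in the task,
     * since another task may have set the error flag in the meantime. */
    *exit_error = atomic_load(&f->error);
    return true;
}
```

In this sketch the last worker would pass `*exit_error` on to whatever plays the role of dav1d_decode_frame_exit.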
-
- Mar 07, 2023
-
-
Henrik Gramner authored
-
- Mar 06, 2023
-
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
-
Victorien Le Couviour--Tuffet authored
Pack the 5 bytes of data to improve memory and perf.
-
- Mar 03, 2023
-
-
Tristan Matthews authored
This fixes a regression from 7409a189
-
- Mar 01, 2023
-
-
Matthias Dressel authored
-
-
Matthias Dressel authored
Co-authored-by: Henrik Gramner <gramner@twoorioles.com>
-
- Feb 28, 2023
-
-
-
Improves readability.
-
-
- Feb 27, 2023
-
-
It would previously print the full report() info for C functions (with broken horizontal alignment as a side effect).
-
- Feb 26, 2023
-
-
Martin Storsjö authored
98b0c96d added an include of src/ref.h in src/fg_apply_tmpl.c. That template source file is included in tests/checkasm/filmgrain.c. src/ref.h includes <stdatomic.h>. Including this file requires declaring a dependency on stdatomic_dependencies in meson, which provides the fallback implementation of stdatomic.h when building with MSVC.
-
- Feb 25, 2023
-
-
James Almer authored
Create new references instead.
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- Feb 23, 2023
-
-
Luca Barbato authored
-
- Feb 14, 2023
-
-
Jean-Baptiste Kempf authored
"From VideoLAN with love"
-
- Feb 13, 2023
-
-
Matthias Dressel authored
"only and except are not being actively developed. rules is the preferred keyword to control when to add jobs to pipelines." [0] [0] https://docs.gitlab.com/ee/ci/yaml/index.html#only--except
-
Matthias Dressel authored
Calling meson with no command is deprecated since 0.64.0
-
Matthias Dressel authored
-
- Feb 10, 2023
-
-
Victorien Le Couviour--Tuffet authored
The code in dav1d_drain_picture could desynchronize c->task_thread.first (the oldest submitted frame) and c->frame_thread.next (the first frame to retrieve and/or the next submit location). While looping through the drain, we always increment next, but we increment first only if the frame has data, and we return if the frame is visible. The problem arises when we encounter one or more invisible frames and the following entries haven't been fed yet: we keep looping, increasing next but not first, because those entries have no data. We should always return once we have encountered data (a visible or invisible decoded frame). For visible frames, the code already returns. For invisible frames, we can store a boolean indicating that we drained at least one frame; whenever we reach an empty entry after that, we return without incrementing next or first (all subsequent entries are guaranteed to be empty anyway). This has the effect of inserting the next frame at the first free spot, which is much better than the weird skips it does now. In short, c->frame_thread.next could previously skip some (empty) entries; now it is contiguous. Fixes #416.
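The drain logic described in this message can be modeled with a small ring buffer. The sketch below is an illustration under assumptions, not the actual dav1d_drain_picture: the `Slot` and `Ring` types, `N_SLOTS`, and the `drain` function are hypothetical, and locking and error handling are left out.

```c
#include <stdbool.h>

#define N_SLOTS 8 /* hypothetical number of frame slots */

typedef struct {
    bool has_data; /* slot holds a decoded frame */
    bool visible;  /* decoded frame should be output */
    int id;
} Slot;

typedef struct {
    Slot slots[N_SLOTS];
    unsigned first; /* oldest submitted frame */
    unsigned next;  /* next slot to retrieve from / submit to */
} Ring;

/* Returns the id of the first visible frame drained, or -1 if none. */
int drain(Ring *r) {
    bool drained = false;
    for (unsigned count = 0; count < N_SLOTS; count++) {
        Slot *s = &r->slots[r->next];
        if (s->has_data) {
            const int id = s->id;
            const bool visible = s->visible;
            s->has_data = false;
            /* A frame was consumed: advance both indices together. */
            r->first = (r->first + 1) % N_SLOTS;
            r->next = (r->next + 1) % N_SLOTS;
            drained = true;
            if (visible)
                return id;
        } else if (drained) {
            /* Empty slot after draining: stop, touching neither index. */
            return -1;
        } else {
            /* Nothing drained yet: keep searching for the oldest frame. */
            r->next = (r->next + 1) % N_SLOTS;
        }
    }
    return -1;
}
```

With two invisible frames in slots 0 and 1 and the rest empty, this model stops with first and next both at 2, so the next submitted frame lands in the first free slot.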
-
- Feb 09, 2023
-
-
Victorien Le Couviour--Tuffet authored
This reverts commit a51b6ce4. We can't increment first when no data is there; otherwise we might do so before the first frame has been decoded, messing up the ordering. Imagine a framedelay of 8 and a file with 7 frames: we feed 7 frames into 8 slots, so next points to [7] (an empty entry), and we start draining because of EOF. We need next to be incremented until it reaches the first frame ([0]) so that it can be output, and only then should first be incremented too. Fixes #418.
-
- Feb 03, 2023
-
-
- Jan 31, 2023
-
-
Martin Storsjö authored
Add an option for selecting the core where the single thread of checkasm runs. This allows benchmarking on specific CPU cores on heterogeneous CPUs, like ARM big.LITTLE configurations. On Linux, one can easily wrap an invocation of checkasm with "taskset -c <n> [...]", so this option isn't essential there, but it is quite useful on Windows. On Windows, it is somewhat possible to do the same by launching the tool with "start /B /affinity <hexmask> [...]", but that doesn't work well with scripting ("start" returns before the command has finished running, and it's not obvious how to invoke "start" from within WSL). Using "taskset" to launch processes on specific cores within WSL on Windows doesn't work: regardless of the Linux-level affinity, the process ends up running on the performance cores anyway.
-
Martin Storsjö authored
The implementation is a hybrid between two approaches; one generic (but non-ideal) for cases with large max_base_y, which fills two pixel columns at a time, i.e. looping over pixels first vertically, then horizontally - i.e. in a non-optimal manner. For cases with smaller max_base_y, it does two rows at a time, essentially doing gathers with the TBX instruction.
Relative speedup over the C code:
                              Cortex A53    A55    A72    A73    A76  Apple M1
intra_pred_z3_w4_8bpc_neon:         3.32   2.89   2.78   3.52   2.52      9.67
intra_pred_z3_w8_8bpc_neon:         6.24   5.55   4.76   5.60   4.11      6.40
intra_pred_z3_w16_8bpc_neon:        7.64   7.07   4.37   6.23   4.18      8.60
intra_pred_z3_w32_8bpc_neon:        7.51   7.21   4.34   5.92   4.27      7.88
intra_pred_z3_w64_8bpc_neon:        6.82   6.25   4.08   5.83   3.52      7.31
-
- Jan 27, 2023
-
-
Martin Storsjö authored
Relative speedup over the C code:
                              Cortex A53    A55    A72    A73    A76  Apple M1
intra_pred_z1_w4_8bpc_neon:         4.09   3.15   3.63   4.16   3.27     13.00
intra_pred_z1_w8_8bpc_neon:         6.93   5.66   5.57   6.76   5.51      5.50
intra_pred_z1_w16_8bpc_neon:        7.81   6.85   6.24   7.78   6.59      9.00
intra_pred_z1_w32_8bpc_neon:       10.56   9.95   8.72  10.95   8.28     13.33
intra_pred_z1_w64_8bpc_neon:       11.00  11.38   9.11  11.62   8.65     14.61
(The speedup numbers for M1 are kinda noisy due to the very coarse granularity of the timer used there.)
-
Martin Storsjö authored
These functions contain a number of different codepaths; try to make sure that we hit most codepaths for each size combination. This gives better test coverage in a single run of checkasm, and should also give a better averaged runtime in benchmarks.
-
- Jan 26, 2023
-
-
Henrik Gramner authored
-
Victorien Le Couviour--Tuffet authored
Fixes #416.
-
- Jan 12, 2023
-
-
The intent was good, but in practice it results in a significant number of problems due to various compiler bugs, for negligible gains.
-
- Dec 14, 2022
-
-
James Almer authored
Should be useful for scenarios like wanting only keyframes to quickly generate a set of preview images of the whole stream.
-
James Almer authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
- Dec 13, 2022
-
-
bits_left could underflow after reaching EOB. Credit to OSS-Fuzz.
-
-
-
- Dec 09, 2022
-
-
-
A length of 1 is by far the most common case, and having a special case for that is not only slightly faster but also reduces code size by a decent amount due to not having to pass a length argument every time.
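As one hypothetical illustration of this kind of special-casing (the message does not say which function was changed, so the bit reader below, including the `BitReader` type and the `get_bits` / `get_bit` names, is purely an assumption), a generic n-bit read can sit next to a dedicated single-bit read that takes no length argument:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; not dav1d's actual implementation. */
typedef struct {
    const uint8_t *ptr, *end;
    uint64_t state; /* unread bits, aligned to the most significant bit */
    int bits_left;  /* number of valid bits in state */
} BitReader;

/* Pull in whole bytes until at least n bits are buffered (n <= 32).
 * Reads past the end of the buffer yield zero bits. */
static void refill(BitReader *br, int n) {
    while (br->bits_left < n) {
        const uint8_t byte = br->ptr < br->end ? *br->ptr++ : 0;
        br->state |= (uint64_t)byte << (56 - br->bits_left);
        br->bits_left += 8;
    }
}

/* Generic path: every call site has to pass a length, 1 <= n <= 32. */
unsigned get_bits(BitReader *br, int n) {
    refill(br, n);
    const unsigned v = (unsigned)(br->state >> (64 - n));
    br->state <<= n;
    br->bits_left -= n;
    return v;
}

/* Special case for the common single-bit read: no length argument. */
unsigned get_bit(BitReader *br) {
    if (!br->bits_left)
        refill(br, 1);
    const unsigned v = (unsigned)(br->state >> 63);
    br->state <<= 1;
    br->bits_left--;
    return v;
}
```

Call sites that only ever need one bit would use `get_bit(&br)` and skip setting up the length argument, which is where the code-size saving mentioned above would come from in a design like this.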
-