- 20 Nov, 2018 11 commits
-
-
James Almer authored
Fixes warnings about redefinition of _WIN32_WINNT on Windows targets introduced by b716083c.
-
Ronald S. Bultje authored
Also ensure we apply film-grain to delayed pictures.
-
Janne Grunau authored
-
Janne Grunau authored
-
Janne Grunau authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
Fixes clusterfuzz-testcase-minimized-dav1d_fuzzer-5730334348410880, with credits to oss-fuzz.
-
Janne Grunau authored
Fixes an undefined left shift of a negative value in clusterfuzz-testcase-minimized-dav1d_fuzzer-5707215277654016. Credits to oss-fuzz.
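Left-shifting a negative signed integer is undefined behaviour in C, even when the mathematical result is representable. One common well-defined alternative is to multiply by a power of two instead. The helper name below is hypothetical, and this is not necessarily the exact change made in this commit.

```c
#include <assert.h>

/* Hypothetical helper: computes v * 2^shift without the undefined
 * behaviour of `v << shift` when v is negative. The caller must still
 * ensure 0 <= shift < 31 and that the product fits in an int. */
static inline int shl_signed(const int v, const int shift)
{
    return v * (1 << shift);
}
```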
-
Janne Grunau authored
This does not adjust the AVX2 asm. For performance reasons, the asm clips to the required range (16-bit signed) in many places. No mismatches were observed in 10,000 checkasm runs with coefs generated by the forward transform.
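The following scalar sketch shows what this clipping looks like. It assumes the clipping means saturating intermediate values to the int16_t range [-32768, 32767], and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturate a 32-bit intermediate to the signed
 * 16-bit range, similar in effect to the saturating 16-bit arithmetic
 * available in SIMD instruction sets. */
static inline int clip_int16(const int v)
{
    return v < INT16_MIN ? INT16_MIN : v > INT16_MAX ? INT16_MAX : v;
}

/* Clip a run of intermediate coefficients in place. */
static void clip_coefs(int32_t *const coefs, const int n)
{
    for (int i = 0; i < n; i++)
        coefs[i] = clip_int16(coefs[i]);
}
```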
-
Ronald S. Bultje authored
These edges don't encode LR coefficients anyway. Fixes clusterfuzz-testcase-minimized-dav1d_fuzzer-5731769337249792. Credits to oss-fuzz.
-
Martin Storsjö authored
This fixes compiler errors like this one:
    src/film_grain_tmpl.c(238): error C2036: 'void *': unknown size
Don't rely on sizeof(void) == 1 in pointer arithmetic. Instead, cast the row pointers to the pixel datatype immediately, use PXSTRIDE() to convert a stride in byte units to pixel units, and drop the sizeof(pixel) factor from horizontal offsets that were previously applied to a void pointer.
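Here is a minimal sketch of the typed-pointer pattern, assuming a 16-bit pixel type. PXSTRIDE() here is a local stand-in written for the example (it divides a byte stride by sizeof(pixel)), and pixel_at() is a hypothetical helper, not a function from the codebase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: a high-bitdepth build with 16-bit pixels. */
typedef uint16_t pixel;

/* Illustrative stand-in: convert a stride in bytes to a stride in pixels.
 * The project's own macro may be defined differently. */
#define PXSTRIDE(x) ((x) / (ptrdiff_t) sizeof(pixel))

/* Return a pointer to pixel (x, y). Casting to pixel * first keeps all
 * arithmetic in pixel units: no sizeof(pixel) factor on the horizontal
 * offset, and no arithmetic on void * (which MSVC rejects with C2036). */
static pixel *pixel_at(void *const buf, const ptrdiff_t stride_bytes,
                       const int x, const int y)
{
    pixel *const row = (pixel *) buf + y * PXSTRIDE(stride_bytes);
    return row + x;
}
```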
-
- 19 Nov, 2018 7 commits
-
-
Janne Grunau authored
Fixes a deadlock on teardown with clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5636065151418368. Credits to oss-fuzz.
-
Marvin Scholz authored
Meson versions before 0.49 have a bug that makes it necessary to add the nasm-generated objects to checkasm, even though they should already be covered by extract_all_objects() for libdav1d. Meson versions >= 0.48.999 (that is, Meson 0.49 and development versions of it from git) fix this issue, so adding them is no longer needed. Adding them regardless would actually cause an error, because the symbols would then be present twice.
-
James Almer authored
Fixes warnings about redefinition of _WIN32_WINNT on Windows targets
-
Ronald S. Bultje authored
-
Niklas Haas authored
This uses a slightly adapted version of my GPU-based algorithm. The major difference from the algorithm suggested by the spec (and implemented in libaom) is that instead of using a line buffer to hold the previous row's film grain blocks, we compute each row/block fully independently. This opens the door to exploiting parallelism in the future, since there is no left->right or top->down dependency except for the PRNG state, which we could pre-compute for a massively parallel / GPU implementation. That being said, it's probably somewhat slower than using a line buffer in the serial / single-CPU case, although most likely not by much, since the areas with the most redundant work get progressively smaller, down to a single 2x2 square in the worst case.
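For reference, the AV1 specification defines the film grain PRNG (get_random_number()) as a 16-bit linear-feedback shift register. Because its state is the only remaining serial dependency, a parallel implementation could record the register state at block boundaries up front. The sketch below assumes, purely for illustration, a fixed number of draws per block. precompute_states() is hypothetical, and the per-stripe reseeding that the specification performs is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit LFSR following get_random_number() in the AV1 spec:
 * advance the register once, then return its top `bits` bits. */
static int get_random_number(uint16_t *const state, const int bits)
{
    const unsigned r = *state;
    const unsigned bit = ((r >> 0) ^ (r >> 1) ^ (r >> 3) ^ (r >> 12)) & 1;
    *state = (uint16_t) ((r >> 1) | (bit << 15));
    return (*state >> (16 - bits)) & ((1 << bits) - 1);
}

/* Hypothetical helper: record the register state before each block,
 * assuming every block consumes exactly `draws_per_block` numbers, so
 * block i can start from states[i] without replaying earlier blocks. */
static void precompute_states(uint16_t seed, uint16_t *const states,
                              const int n_blocks, const int draws_per_block)
{
    for (int i = 0; i < n_blocks; i++) {
        states[i] = seed;
        for (int j = 0; j < draws_per_block; j++)
            get_random_number(&seed, 11); /* the bit count doesn't affect the state */
    }
}
```

Each worker would then draw its numbers starting from its own states[i].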
-
Niklas Haas authored
This becomes part of the picture properties, since users may want to apply film grain themselves (e.g. for a GPU implementation).
-
Niklas Haas authored
The spec subtracts the signed offset from all of these when using them, as it does for e.g. ar_coeffs_y_plus_128, although for some reason the naming scheme is inconsistent here. Either way, it makes more sense to treat them as signed integers than as unsigned integers. Since the field names are the same as in the spec, we avoid confusion by declaring them as int8_t (or int16_t for the 9-bit field), which makes it clear to the user that these are already signed integers.
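The chroma fields illustrate the conversion. The AV1 specification reads cb_mult and cb_luma_mult as 8-bit values and cb_offset as a 9-bit value, and it uses them as cb_mult - 128, cb_luma_mult - 128 and cb_offset - 256. The struct and function below are hypothetical, not the library's actual API. They keep the spec's field names but store the already-unbiased signed values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parsed representation: same names as the spec, but the
 * bias has already been removed, so the values are stored signed. */
typedef struct {
    int8_t  cb_mult;      /* raw 8-bit value - 128, range [-128, 127] */
    int8_t  cb_luma_mult; /* raw 8-bit value - 128, range [-128, 127] */
    int16_t cb_offset;    /* raw 9-bit value - 256, range [-256, 255] */
} SignedGrainParams;

/* Convert raw unsigned syntax element values, as read from the
 * bitstream, into signed values by subtracting the spec's bias once. */
static SignedGrainParams to_signed(const unsigned cb_mult,
                                   const unsigned cb_luma_mult,
                                   const unsigned cb_offset)
{
    SignedGrainParams p;
    p.cb_mult      = (int8_t) ((int) cb_mult - 128);
    p.cb_luma_mult = (int8_t) ((int) cb_luma_mult - 128);
    p.cb_offset    = (int16_t) ((int) cb_offset - 256);
    return p;
}
```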
-
- 18 Nov, 2018 8 commits
-
-
Nathan Egge authored
-
Nathan Egge authored
-
Ronald S. Bultje authored
Fixes #183.
-
Ronald S. Bultje authored
Also fix a calculation for u_idx. Fixes 5646860283281408 of #183.
-
Janne Grunau authored
-
Martin Storsjö authored
These functions have been tuned against Cortex A53 and Snapdragon 835. The bilin functions have mainly been written with code size in mind, as they aren't used much in practice. Relative speedups for the actual filtering functions (those that don't just do a plain copy) are around 4-15x, some over 20x. This is in comparison with GCC 5.4 with autovectorization disabled; the actual real-world speedup against autovectorized C code is around 4-10x.
Relative speedups measured with checkasm:
                                    Cortex A53  Snapdragon 835
mc_8tap_regular_w2_0_8bpc_neon:           6.96            5.28
mc_8tap_regular_w2_h_8bpc_neon:           5.16            4.35
mc_8tap_regular_w2_hv_8bpc_neon:          5.37            4.98
mc_8tap_regular_w2_v_8bpc_neon:           6.35            4.85
mc_8tap_regular_w4_0_8bpc_neon:           6.78            5.73
mc_8tap_regular_w4_h_8bpc_neon:           8.40            6.60
mc_8tap_regular_w4_hv_8bpc_neon:          7.23            7.10
mc_8tap_regular_w4_v_8bpc_neon:           9.06            7.76
mc_8tap_regular_w8_0_8bpc_neon:           6.96            5.55
mc_8tap_regular_w8_h_8bpc_neon:          10.36            6.88
mc_8tap_regular_w8_hv_8bpc_neon:          9.49            6.86
mc_8tap_regular_w8_v_8bpc_neon:          12.06            9.61
mc_8tap_regular_w16_0_8bpc_neon:          6.68            4.51
mc_8tap_regular_w16_h_8bpc_neon:         12.30            7.77
mc_8tap_regular_w16_hv_8bpc_neon:         9.50            6.68
mc_8tap_regular_w16_v_8bpc_neon:         12.93            9.68
mc_8tap_regular_w32_0_8bpc_neon:          3.91            2.93
mc_8tap_regular_w32_h_8bpc_neon:         13.06            7.89
mc_8tap_regular_w32_hv_8bpc_neon:         9.37            6.70
mc_8tap_regular_w32_v_8bpc_neon:         12.88            9.49
mc_8tap_regular_w64_0_8bpc_neon:          2.89            1.68
mc_8tap_regular_w64_h_8bpc_neon:         13.48            8.00
mc_8tap_regular_w64_hv_8bpc_neon:         9.23            6.53
mc_8tap_regular_w64_v_8bpc_neon:         13.11            9.68
mc_8tap_regular_w128_0_8bpc_neon:         1.89            1.24
mc_8tap_regular_w128_h_8bpc_neon:        13.58            7.98
mc_8tap_regular_w128_hv_8bpc_neon:        8.86            6.53
mc_8tap_regular_w128_v_8bpc_neon:        12.46            9.63
mc_bilinear_w2_0_8bpc_neon:               7.02            5.40
mc_bilinear_w2_h_8bpc_neon:               3.65            3.14
mc_bilinear_w2_hv_8bpc_neon:              4.36            4.84
mc_bilinear_w2_v_8bpc_neon:               5.22            4.28
mc_bilinear_w4_0_8bpc_neon:               6.87            5.99
mc_bilinear_w4_h_8bpc_neon:               6.50            8.61
mc_bilinear_w4_hv_8bpc_neon:              7.70            7.99
mc_bilinear_w4_v_8bpc_neon:               7.04            9.10
mc_bilinear_w8_0_8bpc_neon:               7.03            5.70
mc_bilinear_w8_h_8bpc_neon:              11.30           15.14
mc_bilinear_w8_hv_8bpc_neon:             15.74           13.50
mc_bilinear_w8_v_8bpc_neon:              13.40           17.54
mc_bilinear_w16_0_8bpc_neon:              6.75            4.48
mc_bilinear_w16_h_8bpc_neon:             17.02           13.95
mc_bilinear_w16_hv_8bpc_neon:            17.37           13.78
mc_bilinear_w16_v_8bpc_neon:             23.69           22.98
mc_bilinear_w32_0_8bpc_neon:              3.88            3.18
mc_bilinear_w32_h_8bpc_neon:             18.80           14.97
mc_bilinear_w32_hv_8bpc_neon:            17.74           14.02
mc_bilinear_w32_v_8bpc_neon:             24.46           23.04
mc_bilinear_w64_0_8bpc_neon:              2.87            1.66
mc_bilinear_w64_h_8bpc_neon:             19.54           16.02
mc_bilinear_w64_hv_8bpc_neon:            17.80           14.32
mc_bilinear_w64_v_8bpc_neon:             24.79           23.63
mc_bilinear_w128_0_8bpc_neon:             2.13            1.23
mc_bilinear_w128_h_8bpc_neon:            19.89           16.24
mc_bilinear_w128_hv_8bpc_neon:           17.55           14.15
mc_bilinear_w128_v_8bpc_neon:            24.45           23.54
mct_8tap_regular_w4_0_8bpc_neon:          5.56            5.51
mct_8tap_regular_w4_h_8bpc_neon:          7.48            5.80
mct_8tap_regular_w4_hv_8bpc_neon:         7.27            7.09
mct_8tap_regular_w4_v_8bpc_neon:          7.80            6.84
mct_8tap_regular_w8_0_8bpc_neon:          9.54            9.25
mct_8tap_regular_w8_h_8bpc_neon:          9.08            6.55
mct_8tap_regular_w8_hv_8bpc_neon:         9.16            6.30
mct_8tap_regular_w8_v_8bpc_neon:         10.79            8.66
mct_8tap_regular_w16_0_8bpc_neon:        15.35           10.50
mct_8tap_regular_w16_h_8bpc_neon:        10.18            6.76
mct_8tap_regular_w16_hv_8bpc_neon:        9.17            6.11
mct_8tap_regular_w16_v_8bpc_neon:        11.52            8.72
mct_8tap_regular_w32_0_8bpc_neon:        15.82           10.09
mct_8tap_regular_w32_h_8bpc_neon:        10.75            6.85
mct_8tap_regular_w32_hv_8bpc_neon:        9.00            6.22
mct_8tap_regular_w32_v_8bpc_neon:        11.58            8.67
mct_8tap_regular_w64_0_8bpc_neon:        15.28            9.68
mct_8tap_regular_w64_h_8bpc_neon:        10.93            6.96
mct_8tap_regular_w64_hv_8bpc_neon:        8.81            6.53
mct_8tap_regular_w64_v_8bpc_neon:        11.42            8.73
mct_8tap_regular_w128_0_8bpc_neon:       14.41            7.67
mct_8tap_regular_w128_h_8bpc_neon:       10.92            6.96
mct_8tap_regular_w128_hv_8bpc_neon:       8.56            6.51
mct_8tap_regular_w128_v_8bpc_neon:       11.16            8.70
mct_bilinear_w4_0_8bpc_neon:              5.66            5.77
mct_bilinear_w4_h_8bpc_neon:              5.16            6.40
mct_bilinear_w4_hv_8bpc_neon:             6.86            6.82
mct_bilinear_w4_v_8bpc_neon:              4.75            6.09
mct_bilinear_w8_0_8bpc_neon:              9.78           10.00
mct_bilinear_w8_h_8bpc_neon:              8.98           11.37
mct_bilinear_w8_hv_8bpc_neon:            14.42           10.83
mct_bilinear_w8_v_8bpc_neon:              9.12           11.62
mct_bilinear_w16_0_8bpc_neon:            15.59           10.76
mct_bilinear_w16_h_8bpc_neon:            11.98            8.77
mct_bilinear_w16_hv_8bpc_neon:           15.83           10.73
mct_bilinear_w16_v_8bpc_neon:            14.70           14.60
mct_bilinear_w32_0_8bpc_neon:            15.89           10.32
mct_bilinear_w32_h_8bpc_neon:            13.47            9.07
mct_bilinear_w32_hv_8bpc_neon:           16.01           10.95
mct_bilinear_w32_v_8bpc_neon:            14.85           14.16
mct_bilinear_w64_0_8bpc_neon:            15.36           10.51
mct_bilinear_w64_h_8bpc_neon:            14.00            9.61
mct_bilinear_w64_hv_8bpc_neon:           15.82           11.27
mct_bilinear_w64_v_8bpc_neon:            14.61           14.76
mct_bilinear_w128_0_8bpc_neon:           14.41            7.92
mct_bilinear_w128_h_8bpc_neon:           13.31            9.58
mct_bilinear_w128_hv_8bpc_neon:          14.07           11.18
mct_bilinear_w128_v_8bpc_neon:           11.57           14.42
-
James Almer authored
-
Janne Grunau authored
Fixes #149.
-
- 17 Nov, 2018 2 commits
-
-
Ronald S. Bultje authored
This reverts commit 597a6eb9. It leads to assertion failures in oss-fuzz.
-
Janne Grunau authored
Fixes unaligned writes while splatting coefs for skip blocks with clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5684725352497152 and clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5728508249112576.
-
- 16 Nov, 2018 6 commits
-
-
Janne Grunau authored
-
Janne Grunau authored
-
Janne Grunau authored
Catches warnings in assert statements.
-
Ronald S. Bultje authored
ruy and rux are in unit_size dimensions, whereas lr_mask is in sb128 dimensions and unit_idx is in sb64 dimensions, so one can't be derived from the other. Instead, remove ruy/rux and derive unit_idx and sb_idx directly from the block positions aligned to the unit_size.
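The following hypothetical sketch derives both indices directly from an aligned block position. It assumes positions are measured in 4x4-pixel units (so a 64x64 superblock spans 16 units and a 128x128 one spans 32) and that the restoration unit size is given as a log2 in those units. The names are illustrative, not dav1d's actual code.

```c
#include <assert.h>

typedef struct {
    int sb_idx;   /* index of the 128x128 superblock (lr_mask entry) */
    int unit_idx; /* which 64x64 quadrant (0..3) inside that superblock */
} LrIndex;

/* Hypothetical: bx/by are block positions in 4x4 units, unit_size_log2
 * is the restoration unit size in the same units (4 = 64px, 5 = 128px,
 * 6 = 256px), and sb128_w is the frame width in 128x128 superblocks. */
static LrIndex lr_index(int bx, int by, const int unit_size_log2,
                        const int sb128_w)
{
    /* Align the block position down to the start of its restoration unit. */
    const int mask = ~((1 << unit_size_log2) - 1);
    bx &= mask;
    by &= mask;

    LrIndex r;
    r.sb_idx = (by >> 5) * sb128_w + (bx >> 5);        /* 32 units per sb128 */
    r.unit_idx = ((by & 16) >> 3) + ((bx & 16) >> 4);  /* sb64 quadrant */
    return r;
}
```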
-
Ronald S. Bultje authored
This is consistent with what libaom does. Should fix #175.
-
Ronald S. Bultje authored
Fixes #172.
-
- 15 Nov, 2018 6 commits
-
-
Ronald S. Bultje authored
-
Janne Grunau authored
With the decoupled decoding data there might be remaining input data during draining which can cause bitstream parsing errors.
-
Janne Grunau authored
The race is exposed by not draining the decoder correctly after 02606969 (decoupled decoding api). Fixes a memleak with clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5728508249112576. Credits to oss-fuzz.
-
Ronald S. Bultje authored
But don't abort decoding; instead, simply force translational motion.
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-