- 22 Nov, 2018 2 commits
-
-
Janne Grunau authored
Fixes #183. Fixes use of uninitialized data in apply_to_row_uv with odd width in clusterfuzz-testcase-minimized-dav1d_fuzzer-5684823666982912. Credits to oss-fuzz.
-
Ronald S. Bultje authored
Fixed 00000802.ivf.
-
- 21 Nov, 2018 3 commits
-
-
Janne Grunau authored
-
This avoids a misoptimization in clang (https://bugs.llvm.org/show_bug.cgi?id=39550). The root cause has been around for a number of years, but a change in LLVM 6.0 allowed for better optimizations, which exposed the bug. The bug is on track to be fixed in LLVM for the 8.0 release, and hopefully backported to 7.0.1 as well. It is, however, present in 6.0, 6.0.1 and 7.0, and in downstream toolchains such as Xcode 10.0/10.1.
-
Fixes 00000527.ivf in #186.
-
- 20 Nov, 2018 8 commits
-
-
Ronald S. Bultje authored
Also ensure we apply film-grain to delayed pictures.
-
Janne Grunau authored
-
-
Fixes clusterfuzz-testcase-minimized-dav1d_fuzzer-5730334348410880, with credits to oss-fuzz.
-
Janne Grunau authored
Fixes an undefined left shift of a negative value in clusterfuzz-testcase-minimized-dav1d_fuzzer-5707215277654016. Credits to oss-fuzz.
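Left-shifting a negative signed value is undefined behavior in C. The following is a minimal sketch of one common way to avoid it, not necessarily the change made in this commit; the helper name `shl_signed` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift a possibly negative value left without
 * invoking undefined behavior. The shift is performed on the unsigned
 * representation, where it is always well defined. Converting the
 * result back to int32_t is implementation-defined, and wraps as
 * expected on two's-complement targets. */
static inline int32_t shl_signed(int32_t v, unsigned n)
{
    assert(n < 32); /* shifting by >= the type width is also undefined */
    return (int32_t)((uint32_t)v << n);
}
```

Multiplying by `1 << n` is another option when the result is known not to overflow.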
-
Janne Grunau authored
This does not adjust the AVX2 asm. The asm clips in many places to the required range (16-bit signed) for performance reasons. No mismatch was observed with coefs generated by the forward transform in checkasm in 10,000 runs.
-
These edges don't encode LR coefficients anyway. Fixes clusterfuzz-testcase-minimized-dav1d_fuzzer-5731769337249792. Credits to oss-fuzz.
-
This fixes compiler errors like these:

    src/film_grain_tmpl.c(238): error C2036: 'void *': unknown size

Don't rely on sizeof(void) == 1 in pointer arithmetic, but instead cast the row pointers to the pixel datatype immediately, use PXSTRIDE() for converting a stride in byte units to pixel units, and skip sizeof(pixel) for horizontal offsets that previously were applied on a void pointer.
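The pattern described above can be sketched as follows. This is an illustration under assumptions: the `pixel` typedef, the local `PXSTRIDE()` definition and the helper `pixel_at` are stand-ins written for this example and are not copied from the source tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: a high-bitdepth build with 16-bit pixels. */
typedef uint16_t pixel;

/* Illustrative definition: convert a stride in bytes to a stride in pixels. */
#define PXSTRIDE(x) ((x) / (ptrdiff_t)sizeof(pixel))

/* Hypothetical helper: address pixel (x, y) in a plane given as a void
 * pointer plus a byte stride. The cast to pixel* happens before any
 * arithmetic, so nothing depends on sizeof(void), and the horizontal
 * offset x is already in pixel units. */
static pixel *pixel_at(void *base, ptrdiff_t stride_bytes, int x, int y)
{
    assert(stride_bytes % (ptrdiff_t)sizeof(pixel) == 0);
    pixel *row = (pixel *)base + PXSTRIDE(stride_bytes) * y;
    return row + x;
}
```

Arithmetic on `void *` is a GNU extension, which is why MSVC rejects it while GCC and clang accept it by default.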
-
- 19 Nov, 2018 6 commits
-
-
Janne Grunau authored
Fixes a deadlock on teardown with clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5636065151418368. Credits to oss-fuzz.
-
James Almer authored
Fixes warnings about redefinition of _WIN32_WINNT on Windows targets.
-
-
This uses a slightly adapted version of my GPU-based algorithm. The major difference to the algorithm suggested by the spec (and implemented in libaom) is that instead of using a line buffer to hold the previous row's film grain blocks, we compute each row/block fully independently. This opens the door to exploiting parallelism in the future, since we don't have any left->right or top->down dependency except for the PRNG state (which we could pre-compute for a massively parallel / GPU implementation). That being said, it's probably somewhat slower than using a line buffer for the serial / single CPU case, although most likely not by much, since the areas with the most redundant work get progressively smaller, down to a single 2x2 square in the worst case.
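The idea of pre-computing the PRNG state so that blocks can be processed independently can be sketched as below. This is a simplified illustration, not the code from this commit: the 16-bit LFSR taps, the one-step-per-block assumption and the names `lfsr_next` and `precompute_block_states` are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 16-bit LFSR step; the tap positions are an assumption
 * chosen for this sketch. */
static uint16_t lfsr_next(uint16_t s)
{
    unsigned bit = ((s >> 0) ^ (s >> 1) ^ (s >> 3) ^ (s >> 12)) & 1;
    return (uint16_t)((s >> 1) | (bit << 15));
}

/* Hypothetical helper: record the PRNG state each block in a row starts
 * from, assuming every block advances the generator by exactly one step.
 * This serial pass is cheap; afterwards each block can be generated from
 * states[i] alone, in any order or in parallel. */
static void precompute_block_states(uint16_t seed, uint16_t *states, int n_blocks)
{
    assert(n_blocks >= 0);
    uint16_t s = seed;
    for (int i = 0; i < n_blocks; i++) {
        states[i] = s;
        s = lfsr_next(s);
    }
}
```

A worker that handles block `i` would then start from `states[i]` instead of waiting for the blocks to its left.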
-
This becomes part of the picture properties, since users may want to apply film grain themselves (e.g. for a GPU implementation).
-
The spec subtracts the signed offset from all of these when using them, like it does for e.g. ar_coeffs_y_plus_128, although for some reason the naming scheme is inconsistent here. Either way, it makes more sense to treat them as signed integers than unsigned integers. To avoid confusion since the name of the field is the same as the one in the spec, we mark the type as int8_t (resp. int16_t for the 9-bit field) to make it clear to the user that these are already signed integers.
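As a small illustration of the conversion described above, the sketch below turns an offset-coded field into the signed value it represents. The helper names are hypothetical, and the offset of 256 for the 9-bit field is an assumption (half of its range, by analogy with the 128 offset of the 8-bit fields).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: an 8-bit field coded as "value plus 128"
 * maps to the signed range -128..127. */
static inline int8_t from_plus_128(uint8_t coded)
{
    return (int8_t)((int)coded - 128);
}

/* Hypothetical helper: a 9-bit field, assumed here to be coded as
 * "value plus 256", maps to -256..255 and therefore needs int16_t. */
static inline int16_t from_plus_256(uint16_t coded)
{
    assert(coded < 512); /* only 9 bits are valid */
    return (int16_t)((int)coded - 256);
}
```

Storing the already-converted value in a signed type means callers never have to remember whether the offset was applied.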
-
- 18 Nov, 2018 8 commits
-
-
Nathan Egge authored
-
Nathan Egge authored
-
Fixes #183.
-
Also fix a calculation for u_idx. Fixes 5646860283281408 of #183.
-
Janne Grunau authored
-
These functions have been tuned against Cortex A53 and Snapdragon 835. The bilin functions have mainly been written with code size in mind, as they aren't used much in practice.

Relative speedups for the actual filtering functions (that don't just do a plain copy) are around 4-15x, some over 20x. This is in comparison with GCC 5.4 with autovectorization disabled; the actual real-world speedup against autovectorized C code is around 4-10x.

Relative speedups measured with checkasm:

                                 Cortex A53   Snapdragon 835
mc_8tap_regular_w2_0_8bpc_neon:        6.96             5.28
mc_8tap_regular_w2_h_8bpc_neon:        5.16             4.35
mc_8tap_regular_w2_hv_8bpc_neon:       5.37             4.98
mc_8tap_regular_w2_v_8bpc_neon:        6.35             4.85
mc_8tap_regular_w4_0_8bpc_neon:        6.78             5.73
mc_8tap_regular_w4_h_8bpc_neon:        8.40             6.60
mc_8tap_regular_w4_hv_8bpc_neon:       7.23             7.10
mc_8tap_regular_w4_v_8bpc_neon:        9.06             7.76
mc_8tap_regular_w8_0_8bpc_neon:        6.96             5.55
mc_8tap_regular_w8_h_8bpc_neon:       10.36             6.88
mc_8tap_regular_w8_hv_8bpc_neon:       9.49             6.86
mc_8tap_regular_w8_v_8bpc_neon:       12.06             9.61
mc_8tap_regular_w16_0_8bpc_neon:       6.68             4.51
mc_8tap_regular_w16_h_8bpc_neon:      12.30             7.77
mc_8tap_regular_w16_hv_8bpc_neon:      9.50             6.68
mc_8tap_regular_w16_v_8bpc_neon:      12.93             9.68
mc_8tap_regular_w32_0_8bpc_neon:       3.91             2.93
mc_8tap_regular_w32_h_8bpc_neon:      13.06             7.89
mc_8tap_regular_w32_hv_8bpc_neon:      9.37             6.70
mc_8tap_regular_w32_v_8bpc_neon:      12.88             9.49
mc_8tap_regular_w64_0_8bpc_neon:       2.89             1.68
mc_8tap_regular_w64_h_8bpc_neon:      13.48             8.00
mc_8tap_regular_w64_hv_8bpc_neon:      9.23             6.53
mc_8tap_regular_w64_v_8bpc_neon:      13.11             9.68
mc_8tap_regular_w128_0_8bpc_neon:      1.89             1.24
mc_8tap_regular_w128_h_8bpc_neon:     13.58             7.98
mc_8tap_regular_w128_hv_8bpc_neon:     8.86             6.53
mc_8tap_regular_w128_v_8bpc_neon:     12.46             9.63
mc_bilinear_w2_0_8bpc_neon:            7.02             5.40
mc_bilinear_w2_h_8bpc_neon:            3.65             3.14
mc_bilinear_w2_hv_8bpc_neon:           4.36             4.84
mc_bilinear_w2_v_8bpc_neon:            5.22             4.28
mc_bilinear_w4_0_8bpc_neon:            6.87             5.99
mc_bilinear_w4_h_8bpc_neon:            6.50             8.61
mc_bilinear_w4_hv_8bpc_neon:           7.70             7.99
mc_bilinear_w4_v_8bpc_neon:            7.04             9.10
mc_bilinear_w8_0_8bpc_neon:            7.03             5.70
mc_bilinear_w8_h_8bpc_neon:           11.30            15.14
mc_bilinear_w8_hv_8bpc_neon:          15.74            13.50
mc_bilinear_w8_v_8bpc_neon:           13.40            17.54
mc_bilinear_w16_0_8bpc_neon:           6.75             4.48
mc_bilinear_w16_h_8bpc_neon:          17.02            13.95
mc_bilinear_w16_hv_8bpc_neon:         17.37            13.78
mc_bilinear_w16_v_8bpc_neon:          23.69            22.98
mc_bilinear_w32_0_8bpc_neon:           3.88             3.18
mc_bilinear_w32_h_8bpc_neon:          18.80            14.97
mc_bilinear_w32_hv_8bpc_neon:         17.74            14.02
mc_bilinear_w32_v_8bpc_neon:          24.46            23.04
mc_bilinear_w64_0_8bpc_neon:           2.87             1.66
mc_bilinear_w64_h_8bpc_neon:          19.54            16.02
mc_bilinear_w64_hv_8bpc_neon:         17.80            14.32
mc_bilinear_w64_v_8bpc_neon:          24.79            23.63
mc_bilinear_w128_0_8bpc_neon:          2.13             1.23
mc_bilinear_w128_h_8bpc_neon:         19.89            16.24
mc_bilinear_w128_hv_8bpc_neon:        17.55            14.15
mc_bilinear_w128_v_8bpc_neon:         24.45            23.54
mct_8tap_regular_w4_0_8bpc_neon:       5.56             5.51
mct_8tap_regular_w4_h_8bpc_neon:       7.48             5.80
mct_8tap_regular_w4_hv_8bpc_neon:      7.27             7.09
mct_8tap_regular_w4_v_8bpc_neon:       7.80             6.84
mct_8tap_regular_w8_0_8bpc_neon:       9.54             9.25
mct_8tap_regular_w8_h_8bpc_neon:       9.08             6.55
mct_8tap_regular_w8_hv_8bpc_neon:      9.16             6.30
mct_8tap_regular_w8_v_8bpc_neon:      10.79             8.66
mct_8tap_regular_w16_0_8bpc_neon:     15.35            10.50
mct_8tap_regular_w16_h_8bpc_neon:     10.18             6.76
mct_8tap_regular_w16_hv_8bpc_neon:     9.17             6.11
mct_8tap_regular_w16_v_8bpc_neon:     11.52             8.72
mct_8tap_regular_w32_0_8bpc_neon:     15.82            10.09
mct_8tap_regular_w32_h_8bpc_neon:     10.75             6.85
mct_8tap_regular_w32_hv_8bpc_neon:     9.00             6.22
mct_8tap_regular_w32_v_8bpc_neon:     11.58             8.67
mct_8tap_regular_w64_0_8bpc_neon:     15.28             9.68
mct_8tap_regular_w64_h_8bpc_neon:     10.93             6.96
mct_8tap_regular_w64_hv_8bpc_neon:     8.81             6.53
mct_8tap_regular_w64_v_8bpc_neon:     11.42             8.73
mct_8tap_regular_w128_0_8bpc_neon:    14.41             7.67
mct_8tap_regular_w128_h_8bpc_neon:    10.92             6.96
mct_8tap_regular_w128_hv_8bpc_neon:    8.56             6.51
mct_8tap_regular_w128_v_8bpc_neon:    11.16             8.70
mct_bilinear_w4_0_8bpc_neon:           5.66             5.77
mct_bilinear_w4_h_8bpc_neon:           5.16             6.40
mct_bilinear_w4_hv_8bpc_neon:          6.86             6.82
mct_bilinear_w4_v_8bpc_neon:           4.75             6.09
mct_bilinear_w8_0_8bpc_neon:           9.78            10.00
mct_bilinear_w8_h_8bpc_neon:           8.98            11.37
mct_bilinear_w8_hv_8bpc_neon:         14.42            10.83
mct_bilinear_w8_v_8bpc_neon:           9.12            11.62
mct_bilinear_w16_0_8bpc_neon:         15.59            10.76
mct_bilinear_w16_h_8bpc_neon:         11.98             8.77
mct_bilinear_w16_hv_8bpc_neon:        15.83            10.73
mct_bilinear_w16_v_8bpc_neon:         14.70            14.60
mct_bilinear_w32_0_8bpc_neon:         15.89            10.32
mct_bilinear_w32_h_8bpc_neon:         13.47             9.07
mct_bilinear_w32_hv_8bpc_neon:        16.01            10.95
mct_bilinear_w32_v_8bpc_neon:         14.85            14.16
mct_bilinear_w64_0_8bpc_neon:         15.36            10.51
mct_bilinear_w64_h_8bpc_neon:         14.00             9.61
mct_bilinear_w64_hv_8bpc_neon:        15.82            11.27
mct_bilinear_w64_v_8bpc_neon:         14.61            14.76
mct_bilinear_w128_0_8bpc_neon:        14.41             7.92
mct_bilinear_w128_h_8bpc_neon:        13.31             9.58
mct_bilinear_w128_hv_8bpc_neon:       14.07            11.18
mct_bilinear_w128_v_8bpc_neon:        11.57            14.42
-
James Almer authored
-
Janne Grunau authored
Fixes #149.
-
- 17 Nov, 2018 2 commits
-
-
Ronald S. Bultje authored
This reverts commit 597a6eb9. It leads to assertion failures in oss-fuzz.
-
Janne Grunau authored
Fixes unaligned writes while splatting coefs for skip blocks with clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5684725352497152 and clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5728508249112576.
-
- 16 Nov, 2018 3 commits
-
-
ruy and rux are in unit_size dimensions, whereas lr_mask is in sb128 dimensions and unit_idx is in sb64 dimensions, so one can't be derived from the other. Instead, remove ruy/rux and derive unit_idx and sb_idx directly from the block positions aligned to the unit_size.
-
Ronald S. Bultje authored
This is consistent with what libaom does. Should fix #175.
-
Ronald S. Bultje authored
Fixes #172.
-
- 15 Nov, 2018 8 commits
-
-
-
Janne Grunau authored
The race is exposed by not draining the decoder correctly after 02606969 (decoupled decoding api). Fixes a memleak with clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5728508249112576. Credits to oss-fuzz.
-
But don't abort decoding; instead, simply force translational motion.
-
-
Ronald S. Bultje authored
-
James Almer authored
-
See section 7.11.2.4 in the AV1 spec. Because frame contexts are not passed into the ipred_z*_c functions, the flag is set as a bit inside the 'angle' function argument.
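Carrying a flag inside an existing integer argument can be sketched as below. The bit position, the 9-bit angle mask and all names are assumptions made for this illustration; the layout actually used by the ipred_z*_c functions is not stated in the message above.

```c
#include <assert.h>

/* Hypothetical layout: the angle occupies the low 9 bits (0..511),
 * which leaves bit 9 free to carry a boolean flag. */
#define ANGLE_MASK     0x1FF
#define ANGLE_FLAG_BIT (1 << 9)

/* Caller side: fold the flag into the angle argument. */
static inline int pack_angle(int angle, int flag)
{
    assert(angle >= 0 && angle <= ANGLE_MASK);
    return angle | (flag ? ANGLE_FLAG_BIT : 0);
}

/* Callee side: split the argument back into its two parts. */
static inline int unpack_angle(int packed) { return packed & ANGLE_MASK; }
static inline int unpack_flag(int packed)  { return !!(packed & ANGLE_FLAG_BIT); }
```

This keeps the function signature unchanged, at the cost of requiring every callee to mask the angle before using it.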
-
A new coded video sequence (see page 193, section 7.5 of the spec) begins when we see a sequence header that isn't bit-identical to the previous ones. This is the point at which we can throw away previous frames etc.
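The bit-identical check described above can be sketched with a plain byte comparison. This is a minimal illustration, assuming the raw sequence header payload is kept around as a byte buffer; the function name `starts_new_sequence` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the current sequence header starts
 * a new coded video sequence, i.e. there is no previous header or the
 * two payloads are not byte-for-byte identical. */
static int starts_new_sequence(const unsigned char *prev, size_t prev_len,
                               const unsigned char *cur, size_t cur_len)
{
    assert(cur != NULL);
    if (prev == NULL)
        return 1;              /* first header seen */
    if (prev_len != cur_len)
        return 1;              /* different size, cannot be identical */
    return memcmp(prev, cur, cur_len) != 0;
}
```

When this returns 1, the decoder can drop its reference frames and other per-sequence state before continuing.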
-