- 18 Nov, 2018 5 commits
-
-
Ronald S. Bultje authored
Also fix a calculation for u_idx. Fixes 5646860283281408 of #183.
-
Janne Grunau authored
-
Martin Storsjö authored
These functions have been tuned against Cortex A53 and Snapdragon 835. The bilin functions have mainly been written with code size in mind, as they aren't used much in practice. Relative speedups for the actual filtering fuctions (that don't just do a plain copy) are around 4-15x, some over 20x. This is in comparison with GCC 5.4 with autovectorization disabled; the actual real-world speedup against autovectorized C code is around 4-10x. Relative speedups measured with checkasm: Cortex A53 Snapdragon 835 mc_8tap_regular_w2_0_8bpc_neon: 6.96 5.28 mc_8tap_regular_w2_h_8bpc_neon: 5.16 4.35 mc_8tap_regular_w2_hv_8bpc_neon: 5.37 4.98 mc_8tap_regular_w2_v_8bpc_neon: 6.35 4.85 mc_8tap_regular_w4_0_8bpc_neon: 6.78 5.73 mc_8tap_regular_w4_h_8bpc_neon: 8.40 6.60 mc_8tap_regular_w4_hv_8bpc_neon: 7.23 7.10 mc_8tap_regular_w4_v_8bpc_neon: 9.06 7.76 mc_8tap_regular_w8_0_8bpc_neon: 6.96 5.55 mc_8tap_regular_w8_h_8bpc_neon: 10.36 6.88 mc_8tap_regular_w8_hv_8bpc_neon: 9.49 6.86 mc_8tap_regular_w8_v_8bpc_neon: 12.06 9.61 mc_8tap_regular_w16_0_8bpc_neon: 6.68 4.51 mc_8tap_regular_w16_h_8bpc_neon: 12.30 7.77 mc_8tap_regular_w16_hv_8bpc_neon: 9.50 6.68 mc_8tap_regular_w16_v_8bpc_neon: 12.93 9.68 mc_8tap_regular_w32_0_8bpc_neon: 3.91 2.93 mc_8tap_regular_w32_h_8bpc_neon: 13.06 7.89 mc_8tap_regular_w32_hv_8bpc_neon: 9.37 6.70 mc_8tap_regular_w32_v_8bpc_neon: 12.88 9.49 mc_8tap_regular_w64_0_8bpc_neon: 2.89 1.68 mc_8tap_regular_w64_h_8bpc_neon: 13.48 8.00 mc_8tap_regular_w64_hv_8bpc_neon: 9.23 6.53 mc_8tap_regular_w64_v_8bpc_neon: 13.11 9.68 mc_8tap_regular_w128_0_8bpc_neon: 1.89 1.24 mc_8tap_regular_w128_h_8bpc_neon: 13.58 7.98 mc_8tap_regular_w128_hv_8bpc_neon: 8.86 6.53 mc_8tap_regular_w128_v_8bpc_neon: 12.46 9.63 mc_bilinear_w2_0_8bpc_neon: 7.02 5.40 mc_bilinear_w2_h_8bpc_neon: 3.65 3.14 mc_bilinear_w2_hv_8bpc_neon: 4.36 4.84 mc_bilinear_w2_v_8bpc_neon: 5.22 4.28 mc_bilinear_w4_0_8bpc_neon: 6.87 5.99 mc_bilinear_w4_h_8bpc_neon: 6.50 8.61 mc_bilinear_w4_hv_8bpc_neon: 7.70 7.99 mc_bilinear_w4_v_8bpc_neon: 7.04 9.10 mc_bilinear_w8_0_8bpc_neon: 7.03 5.70 mc_bilinear_w8_h_8bpc_neon: 11.30 15.14 mc_bilinear_w8_hv_8bpc_neon: 15.74 13.50 mc_bilinear_w8_v_8bpc_neon: 13.40 17.54 mc_bilinear_w16_0_8bpc_neon: 6.75 4.48 mc_bilinear_w16_h_8bpc_neon: 17.02 13.95 mc_bilinear_w16_hv_8bpc_neon: 17.37 13.78 mc_bilinear_w16_v_8bpc_neon: 23.69 22.98 mc_bilinear_w32_0_8bpc_neon: 3.88 3.18 mc_bilinear_w32_h_8bpc_neon: 18.80 14.97 mc_bilinear_w32_hv_8bpc_neon: 17.74 14.02 mc_bilinear_w32_v_8bpc_neon: 24.46 23.04 mc_bilinear_w64_0_8bpc_neon: 2.87 1.66 mc_bilinear_w64_h_8bpc_neon: 19.54 16.02 mc_bilinear_w64_hv_8bpc_neon: 17.80 14.32 mc_bilinear_w64_v_8bpc_neon: 24.79 23.63 mc_bilinear_w128_0_8bpc_neon: 2.13 1.23 mc_bilinear_w128_h_8bpc_neon: 19.89 16.24 mc_bilinear_w128_hv_8bpc_neon: 17.55 14.15 mc_bilinear_w128_v_8bpc_neon: 24.45 23.54 mct_8tap_regular_w4_0_8bpc_neon: 5.56 5.51 mct_8tap_regular_w4_h_8bpc_neon: 7.48 5.80 mct_8tap_regular_w4_hv_8bpc_neon: 7.27 7.09 mct_8tap_regular_w4_v_8bpc_neon: 7.80 6.84 mct_8tap_regular_w8_0_8bpc_neon: 9.54 9.25 mct_8tap_regular_w8_h_8bpc_neon: 9.08 6.55 mct_8tap_regular_w8_hv_8bpc_neon: 9.16 6.30 mct_8tap_regular_w8_v_8bpc_neon: 10.79 8.66 mct_8tap_regular_w16_0_8bpc_neon: 15.35 10.50 mct_8tap_regular_w16_h_8bpc_neon: 10.18 6.76 mct_8tap_regular_w16_hv_8bpc_neon: 9.17 6.11 mct_8tap_regular_w16_v_8bpc_neon: 11.52 8.72 mct_8tap_regular_w32_0_8bpc_neon: 15.82 10.09 mct_8tap_regular_w32_h_8bpc_neon: 10.75 6.85 mct_8tap_regular_w32_hv_8bpc_neon: 9.00 6.22 mct_8tap_regular_w32_v_8bpc_neon: 11.58 8.67 mct_8tap_regular_w64_0_8bpc_neon: 15.28 9.68 mct_8tap_regular_w64_h_8bpc_neon: 10.93 6.96 mct_8tap_regular_w64_hv_8bpc_neon: 8.81 6.53 mct_8tap_regular_w64_v_8bpc_neon: 11.42 8.73 mct_8tap_regular_w128_0_8bpc_neon: 14.41 7.67 mct_8tap_regular_w128_h_8bpc_neon: 10.92 6.96 mct_8tap_regular_w128_hv_8bpc_neon: 8.56 6.51 mct_8tap_regular_w128_v_8bpc_neon: 11.16 8.70 mct_bilinear_w4_0_8bpc_neon: 5.66 5.77 mct_bilinear_w4_h_8bpc_neon: 5.16 6.40 mct_bilinear_w4_hv_8bpc_neon: 6.86 6.82 mct_bilinear_w4_v_8bpc_neon: 4.75 6.09 mct_bilinear_w8_0_8bpc_neon: 9.78 10.00 mct_bilinear_w8_h_8bpc_neon: 8.98 11.37 mct_bilinear_w8_hv_8bpc_neon: 14.42 10.83 mct_bilinear_w8_v_8bpc_neon: 9.12 11.62 mct_bilinear_w16_0_8bpc_neon: 15.59 10.76 mct_bilinear_w16_h_8bpc_neon: 11.98 8.77 mct_bilinear_w16_hv_8bpc_neon: 15.83 10.73 mct_bilinear_w16_v_8bpc_neon: 14.70 14.60 mct_bilinear_w32_0_8bpc_neon: 15.89 10.32 mct_bilinear_w32_h_8bpc_neon: 13.47 9.07 mct_bilinear_w32_hv_8bpc_neon: 16.01 10.95 mct_bilinear_w32_v_8bpc_neon: 14.85 14.16 mct_bilinear_w64_0_8bpc_neon: 15.36 10.51 mct_bilinear_w64_h_8bpc_neon: 14.00 9.61 mct_bilinear_w64_hv_8bpc_neon: 15.82 11.27 mct_bilinear_w64_v_8bpc_neon: 14.61 14.76 mct_bilinear_w128_0_8bpc_neon: 14.41 7.92 mct_bilinear_w128_h_8bpc_neon: 13.31 9.58 mct_bilinear_w128_hv_8bpc_neon: 14.07 11.18 mct_bilinear_w128_v_8bpc_neon: 11.57 14.42
-
James Almer authored
-
Janne Grunau authored
Fixes #149.
-
- 17 Nov, 2018 2 commits
-
-
Ronald S. Bultje authored
This reverts commit 597a6eb9. It leads to assertion failures in oss-fuzz.
-
Janne Grunau authored
Fixes unaligned writes while splatting coefs for skip blocks with clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5684725352497152 and clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5728508249112576.
-
- 16 Nov, 2018 6 commits
-
-
Janne Grunau authored
-
Janne Grunau authored
-
Janne Grunau authored
Catches warnings in assert statements.
-
Ronald S. Bultje authored
ruy and rux are in unit_size dimensions, whereas lr_mask are in sb128 dimensions, and unit_idx is in sb64 dimensions, so one can't be derived from the other. Instead, remove ruy/rux and derive unit_idx and sb_idx directly from the block positions aligned to the unit_size.
-
Ronald S. Bultje authored
This is consistent with what libaom does. Should fix #175.
-
Ronald S. Bultje authored
Fixes #172.
-
- 15 Nov, 2018 10 commits
-
-
Ronald S. Bultje authored
-
Janne Grunau authored
With the decoupled decoding data there might be remaining input data during draining which can cause bitstream parsing errors.
-
Janne Grunau authored
The race is exposed by not draining the decoder correctly after 02606969 (decoupled decoding api). Fixes a memleak with clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5728508249112576. Credits to oss-fuzz.
-
Ronald S. Bultje authored
But don't abort decoding; instead, simply force translational motion.
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
James Almer authored
-
boyuanxiao-argondesign authored
See section 7.11.2.4 in AV1 spec. Because frame contexts are not passed into the ipred_z*_c functions the flag is set as a bit inside the 'angle' function argument.
-
boyuanxiao-argondesign authored
A new coded video sequence (see page 193; section 7.5 of the spec) begins when we see a sequence header that isn't bit identical to previous ones. This is the point at which we can throw away previous frames etc.
-
Janne Grunau authored
The number of read bits can be equal to the size of the packet. Fixes a triggered assert in clusterfuzz-testcase-minimized-dav1d_fuzzer-5746175664193536. Credits to oss-fuzz.
-
- 14 Nov, 2018 14 commits
-
-
Janne Grunau authored
-
Konstantin Pavlov authored
This version now includes clang.
-
Rupert Swarbrick authored
This code originally looked like "assert (init_bit_pos % 8 == 0)" and I changed it to use "& 7" to match the prevaling style. Unfortunately, "&" binds more weakly than "==". Oops!
-
Janne Grunau authored
-
Rupert Swarbrick authored
This fixes failures when an OBU has more than a byte's worth of trailing zeros. As part of this work, it also rejigs the dav1d_flush_get_bits function slightly. This worked before, but it wasn't very obvious why (it worked because bits_left was never more than 7). This patch renames it to dav1d_bytealign_get_bits, which makes it clearer what it does and adds a comment explaining why it works properly. The new dav1d_bytealign_get_bits is also now void (rather than returning the next byte to read). The patch defines dav1d_get_bits_pos, which returns the current bit position. This feels a little easier to reason about. We also add a new check to make sure that we haven't fallen off the end of the OBU. This can happen when a byte buffer contains more than one OBU: the GetBits might not have got to EOF, but we might now be half-way through the next OBU.
-
Rupert Swarbrick authored
See section 5.9.12 of the AV1 spec. The flag controlling U and V share a quantization level wasn't being read.
-
boyuanxiao-argondesign authored
The previous code raised an error if !segmentation.update_map but the reference frame didn't yield any segmentation data. (The first "goto error" that the patch removes happens if the reference frame was the right size but had no segmentation data; the second happens if the reference frame was the wrong size). This doesn't match the logic in the description of load_previous_segment_ids in section 6.8.2 of the spec. This patch allows such streams, allocating and zeroing cur_segmap in this case. It is still an error for a stream to signal a temporal update but not to have valid segmentation data from the ref frame - that's the error case that the patch puts back in.
-
boyuanxiao-argondesign authored
The first memset is dead code: if primary_ref_frame is PRIMARY_REF_NONE then segmentation.update_data is always true. The patch removes this memset and explains why the copy in the other branch is correct. The second memset should always fire: if segmentation is not enabled for this frame, the seg_data structure should be set to zero rather than copied from a reference frame (see section 5.9.14 of the AV1 spec).
-
Ronald S. Bultje authored
Fixes #166.
-
Janne Grunau authored
Fixes a heap buffer overflow in emu_edge_c with clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5089954858795008 if the reference frame is smaller than the current frame. Credits to oss-fuzz.
-
Janne Grunau authored
Fixes undefined shifts in put_bilin_scaled_c with clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5732654503165952. Credits to oss-fuzz.
-
Ronald S. Bultje authored
-
Janne Grunau authored
Fixes a heap buffer overflow with high bit depth scaled reference frames in clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5652139771166720. Credits to oss-fuzz.
-
James Almer authored
-
- 13 Nov, 2018 3 commits
-
-
Ronald S. Bultje authored
Fixes #121.
-
David Michael Barr authored
-
Ronald S. Bultje authored
Fixes decoding of keyframe in #121.
-