1. 20 Nov, 2018 2 commits
    • Don't set LR coefficient defaults at image edges · c627f16f
      Ronald S. Bultje authored
      These edges don't encode LR coefficients anyway. Fixes
      clusterfuzz-testcase-minimized-dav1d_fuzzer-5731769337249792.
      Credits to oss-fuzz.
    • film_grain: Fix compilation with MSVC · 86fd0b6d
      Martin Storsjö authored
      This fixes compiler errors like these:
      src/film_grain_tmpl.c(238): error C2036: 'void *': unknown size
      
      Don't rely on sizeof(void) == 1 in pointer arithmetic, but instead
      cast the row pointers to the pixel datatype immediately, use PXSTRIDE()
      for converting a stride in byte units to pixel units, and skip
      sizeof(pixel) for horizontal offsets that previously were applied on
      a void pointer.
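      The pointer handling described above can be sketched as follows. This is
      a minimal illustration, not the code from the commit: the 8-bit pixel
      typedef, the simplified PXSTRIDE() definition and the helper pixel_at()
      are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch: 8-bit pixels and a simplified
 * PXSTRIDE() that converts a stride in bytes to a stride in pixels. */
typedef uint8_t pixel;
#define PXSTRIDE(x) ((ptrdiff_t)(x) / (ptrdiff_t)sizeof(pixel))

/* Return a pointer to pixel (x, y); `stride` is given in bytes. */
static pixel *pixel_at(void *const base, const ptrdiff_t stride,
                       const int x, const int y)
{
    assert(base != NULL);
    /* Non-portable alternative: base + y * stride + x * sizeof(pixel)
     * performs arithmetic on void *, which is a GNU extension and is
     * rejected by MSVC with error C2036. */
    pixel *const row = (pixel *)base + PXSTRIDE(stride) * y;
    /* x is already in pixel units, so no sizeof(pixel) factor is needed. */
    return row + x;
}
```

      Casting once at the start keeps every later offset in pixel units, so
      the same expression works for any pixel size.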
  2. 19 Nov, 2018 7 commits
    • frame mt: mark frame as failed in dav1d_close() · acee4345
      Janne Grunau authored
      Fixes a deadlock on teardown with
      clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5636065151418368. Credits
      to oss-fuzz.
    • build: Only add libdav1d_nasm_objs if needed · b3c522d5
      Marvin Scholz authored
      Current versions of meson have a bug that causes the need to add
      the nasm generated objects to checkasm, even though this should
      already be covered by the extract_all_objects() for libdav1d.
      Meson versions >= 0.48.999 (that is, Meson 0.49 and its development
      versions on git) fixed this issue, so adding them is no longer
      needed.
      Adding it regardless would actually cause an error because of
      symbols being present twice.
    • film_grain: include config.h before other headers · 9f77d9c3
      James Almer authored
      Fixes warnings about redefinition of _WIN32_WINNT on Windows targets
    • Ronald S. Bultje's avatar
      a194d478
    • film_grain: implement film grain synthesis · cfa986fe
      Niklas Haas authored
      This is using a slightly adapted version of my GPU-based algorithm. The
      major difference to the algorithm suggested by the spec (and implemented
      in libaom) is that instead of using a line buffer to hold the previous
      row's film grain blocks, we compute each row/block fully independently.
      
      This opens up the door to exploit parallelism in the future, since we
      don't have any left->right or top->down dependency except for the PRNG
      state. (Which we could pre-compute for a massively parallel / GPU
      implementation)
      
      That being said, it's probably somewhat slower than using a line buffer
      for the serial / single CPU case, although most likely not by much
      (since the areas with the most redundant work get progressively smaller,
      down to a single 2x2 square for the worst case).
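      The remark about pre-computing the PRNG state can be illustrated with a
      small sketch. It assumes a 16-bit LFSR and a fixed number of draws per
      block. Both assumptions, the tap positions, and the names lfsr_step(),
      lfsr_advance() and precompute_block_states() are hypothetical and not
      taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* One step of a 16-bit LFSR. The taps are assumed for illustration only. */
static uint16_t lfsr_step(const uint16_t r)
{
    const unsigned bit = (r ^ (r >> 1) ^ (r >> 3) ^ (r >> 12)) & 1u;
    return (uint16_t)((r >> 1) | (bit << 15));
}

/* Advance the generator by `draws` steps and return the new state. */
static uint16_t lfsr_advance(uint16_t r, int draws)
{
    assert(draws >= 0);
    while (draws-- > 0)
        r = lfsr_step(r);
    return r;
}

/* Serial pass: record the state each block starts from, assuming every
 * block consumes exactly `draws_per_block` values. Afterwards the blocks
 * no longer depend on each other and can be processed in any order. */
static void precompute_block_states(uint16_t seed, const int n_blocks,
                                    const int draws_per_block,
                                    uint16_t *const states)
{
    for (int i = 0; i < n_blocks; i++) {
        states[i] = seed;
        seed = lfsr_advance(seed, draws_per_block);
    }
}
```

      Only the cheap state walk stays serial. The grain computation for block
      i can then start from states[i] on any thread or GPU work item.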
    • picture: make the film grain metadata public · 20e9f4df
      Niklas Haas authored
      This becomes part of the picture properties, since users may want to
      apply film grain themselves (e.g. for a GPU implementation).
    • obu: parse uv_mult etc. as signed integers · df5230ef
      Niklas Haas authored
      The spec subtracts the signed offset from all of these when using them,
      like it does for e.g. ar_coeffs_y_plus_128, although for some reason
      the naming scheme is inconsistent here. Either way, it makes more sense
      to treat them as signed integers than unsigned integers.
      
      To avoid confusion since the name of the field is the same as the one in
      the spec, we mark the type as int8_t (resp. int16_t for the 9-bit field)
      to make it clear to the user that these are already signed integers.
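      The conversion described above can be sketched as follows. The helper
      names are hypothetical, and the offsets (128 for the 8-bit fields, 256
      for the 9-bit field) are assumed from the ar_coeffs_y_plus_128 analogy
      rather than quoted from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: `raw` is the unsigned value as read from the
 * bitstream, and the return value is the signed quantity stored in the
 * public struct. */

/* 8-bit field: raw range 0..255 maps to -128..127. */
static int8_t signed_from_8bit(const unsigned raw)
{
    assert(raw < 256);
    return (int8_t)((int)raw - 128);
}

/* 9-bit field: raw range 0..511 maps to -256..255. */
static int16_t signed_from_9bit(const unsigned raw)
{
    assert(raw < 512);
    return (int16_t)((int)raw - 256);
}
```

      Storing the result in int8_t or int16_t signals to API users that the
      offset has already been removed.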
  3. 18 Nov, 2018 8 commits
    • Call msac_decode_bool() for 2 element CDFs. · 5698bc91
      Nathan Egge authored
    • Nathan Egge's avatar
      9f812914
    • Clip resize height to image size · ecf72597
      Ronald S. Bultje authored
      Fixes #183.
    • Don't initialize the LR values if LR is disabled for a plane · 92020899
      Ronald S. Bultje authored
      Also fix a calculation for u_idx. Fixes 5646860283281408 of #183.
    • Janne Grunau's avatar
    • arm64: mc: Implement 8tap and bilin functions · 4aa0363a
      Martin Storsjö authored
      These functions have been tuned against Cortex A53 and Snapdragon
      835. The bilin functions have mainly been written with code size
      in mind, as they aren't used much in practice.
      
      Relative speedups for the actual filtering functions (that don't
      just do a plain copy) are around 4-15x, some over 20x. This is
      in comparison with GCC 5.4 with autovectorization disabled; the
      actual real-world speedup against autovectorized C code is around
      4-10x.
      
      Relative speedups measured with checkasm:
                                      Cortex A53   Snapdragon 835
      mc_8tap_regular_w2_0_8bpc_neon:       6.96   5.28
      mc_8tap_regular_w2_h_8bpc_neon:       5.16   4.35
      mc_8tap_regular_w2_hv_8bpc_neon:      5.37   4.98
      mc_8tap_regular_w2_v_8bpc_neon:       6.35   4.85
      mc_8tap_regular_w4_0_8bpc_neon:       6.78   5.73
      mc_8tap_regular_w4_h_8bpc_neon:       8.40   6.60
      mc_8tap_regular_w4_hv_8bpc_neon:      7.23   7.10
      mc_8tap_regular_w4_v_8bpc_neon:       9.06   7.76
      mc_8tap_regular_w8_0_8bpc_neon:       6.96   5.55
      mc_8tap_regular_w8_h_8bpc_neon:      10.36   6.88
      mc_8tap_regular_w8_hv_8bpc_neon:      9.49   6.86
      mc_8tap_regular_w8_v_8bpc_neon:      12.06   9.61
      mc_8tap_regular_w16_0_8bpc_neon:      6.68   4.51
      mc_8tap_regular_w16_h_8bpc_neon:     12.30   7.77
      mc_8tap_regular_w16_hv_8bpc_neon:     9.50   6.68
      mc_8tap_regular_w16_v_8bpc_neon:     12.93   9.68
      mc_8tap_regular_w32_0_8bpc_neon:      3.91   2.93
      mc_8tap_regular_w32_h_8bpc_neon:     13.06   7.89
      mc_8tap_regular_w32_hv_8bpc_neon:     9.37   6.70
      mc_8tap_regular_w32_v_8bpc_neon:     12.88   9.49
      mc_8tap_regular_w64_0_8bpc_neon:      2.89   1.68
      mc_8tap_regular_w64_h_8bpc_neon:     13.48   8.00
      mc_8tap_regular_w64_hv_8bpc_neon:     9.23   6.53
      mc_8tap_regular_w64_v_8bpc_neon:     13.11   9.68
      mc_8tap_regular_w128_0_8bpc_neon:     1.89   1.24
      mc_8tap_regular_w128_h_8bpc_neon:    13.58   7.98
      mc_8tap_regular_w128_hv_8bpc_neon:    8.86   6.53
      mc_8tap_regular_w128_v_8bpc_neon:    12.46   9.63
      mc_bilinear_w2_0_8bpc_neon:           7.02   5.40
      mc_bilinear_w2_h_8bpc_neon:           3.65   3.14
      mc_bilinear_w2_hv_8bpc_neon:          4.36   4.84
      mc_bilinear_w2_v_8bpc_neon:           5.22   4.28
      mc_bilinear_w4_0_8bpc_neon:           6.87   5.99
      mc_bilinear_w4_h_8bpc_neon:           6.50   8.61
      mc_bilinear_w4_hv_8bpc_neon:          7.70   7.99
      mc_bilinear_w4_v_8bpc_neon:           7.04   9.10
      mc_bilinear_w8_0_8bpc_neon:           7.03   5.70
      mc_bilinear_w8_h_8bpc_neon:          11.30  15.14
      mc_bilinear_w8_hv_8bpc_neon:         15.74  13.50
      mc_bilinear_w8_v_8bpc_neon:          13.40  17.54
      mc_bilinear_w16_0_8bpc_neon:          6.75   4.48
      mc_bilinear_w16_h_8bpc_neon:         17.02  13.95
      mc_bilinear_w16_hv_8bpc_neon:        17.37  13.78
      mc_bilinear_w16_v_8bpc_neon:         23.69  22.98
      mc_bilinear_w32_0_8bpc_neon:          3.88   3.18
      mc_bilinear_w32_h_8bpc_neon:         18.80  14.97
      mc_bilinear_w32_hv_8bpc_neon:        17.74  14.02
      mc_bilinear_w32_v_8bpc_neon:         24.46  23.04
      mc_bilinear_w64_0_8bpc_neon:          2.87   1.66
      mc_bilinear_w64_h_8bpc_neon:         19.54  16.02
      mc_bilinear_w64_hv_8bpc_neon:        17.80  14.32
      mc_bilinear_w64_v_8bpc_neon:         24.79  23.63
      mc_bilinear_w128_0_8bpc_neon:         2.13   1.23
      mc_bilinear_w128_h_8bpc_neon:        19.89  16.24
      mc_bilinear_w128_hv_8bpc_neon:       17.55  14.15
      mc_bilinear_w128_v_8bpc_neon:        24.45  23.54
      mct_8tap_regular_w4_0_8bpc_neon:      5.56   5.51
      mct_8tap_regular_w4_h_8bpc_neon:      7.48   5.80
      mct_8tap_regular_w4_hv_8bpc_neon:     7.27   7.09
      mct_8tap_regular_w4_v_8bpc_neon:      7.80   6.84
      mct_8tap_regular_w8_0_8bpc_neon:      9.54   9.25
      mct_8tap_regular_w8_h_8bpc_neon:      9.08   6.55
      mct_8tap_regular_w8_hv_8bpc_neon:     9.16   6.30
      mct_8tap_regular_w8_v_8bpc_neon:     10.79   8.66
      mct_8tap_regular_w16_0_8bpc_neon:    15.35  10.50
      mct_8tap_regular_w16_h_8bpc_neon:    10.18   6.76
      mct_8tap_regular_w16_hv_8bpc_neon:    9.17   6.11
      mct_8tap_regular_w16_v_8bpc_neon:    11.52   8.72
      mct_8tap_regular_w32_0_8bpc_neon:    15.82  10.09
      mct_8tap_regular_w32_h_8bpc_neon:    10.75   6.85
      mct_8tap_regular_w32_hv_8bpc_neon:    9.00   6.22
      mct_8tap_regular_w32_v_8bpc_neon:    11.58   8.67
      mct_8tap_regular_w64_0_8bpc_neon:    15.28   9.68
      mct_8tap_regular_w64_h_8bpc_neon:    10.93   6.96
      mct_8tap_regular_w64_hv_8bpc_neon:    8.81   6.53
      mct_8tap_regular_w64_v_8bpc_neon:    11.42   8.73
      mct_8tap_regular_w128_0_8bpc_neon:   14.41   7.67
      mct_8tap_regular_w128_h_8bpc_neon:   10.92   6.96
      mct_8tap_regular_w128_hv_8bpc_neon:   8.56   6.51
      mct_8tap_regular_w128_v_8bpc_neon:   11.16   8.70
      mct_bilinear_w4_0_8bpc_neon:          5.66   5.77
      mct_bilinear_w4_h_8bpc_neon:          5.16   6.40
      mct_bilinear_w4_hv_8bpc_neon:         6.86   6.82
      mct_bilinear_w4_v_8bpc_neon:          4.75   6.09
      mct_bilinear_w8_0_8bpc_neon:          9.78  10.00
      mct_bilinear_w8_h_8bpc_neon:          8.98  11.37
      mct_bilinear_w8_hv_8bpc_neon:        14.42  10.83
      mct_bilinear_w8_v_8bpc_neon:          9.12  11.62
      mct_bilinear_w16_0_8bpc_neon:        15.59  10.76
      mct_bilinear_w16_h_8bpc_neon:        11.98   8.77
      mct_bilinear_w16_hv_8bpc_neon:       15.83  10.73
      mct_bilinear_w16_v_8bpc_neon:        14.70  14.60
      mct_bilinear_w32_0_8bpc_neon:        15.89  10.32
      mct_bilinear_w32_h_8bpc_neon:        13.47   9.07
      mct_bilinear_w32_hv_8bpc_neon:       16.01  10.95
      mct_bilinear_w32_v_8bpc_neon:        14.85  14.16
      mct_bilinear_w64_0_8bpc_neon:        15.36  10.51
      mct_bilinear_w64_h_8bpc_neon:        14.00   9.61
      mct_bilinear_w64_hv_8bpc_neon:       15.82  11.27
      mct_bilinear_w64_v_8bpc_neon:        14.61  14.76
      mct_bilinear_w128_0_8bpc_neon:       14.41   7.92
      mct_bilinear_w128_h_8bpc_neon:       13.31   9.58
      mct_bilinear_w128_hv_8bpc_neon:      14.07  11.18
      mct_bilinear_w128_v_8bpc_neon:       11.57  14.42
    • obu: support frame_refs_short_signaling · 842b2074
      James Almer authored
    • arm: define PIC based on __PIC__ or __pic__ if not defined · 58bcccc9
      Janne Grunau authored
      Fixes #149.
  4. 17 Nov, 2018 2 commits
  5. 16 Nov, 2018 6 commits
  6. 15 Nov, 2018 10 commits
  7. 14 Nov, 2018 5 commits
    • meson: fix disabling asm for arm/arm64 · a6b94ca9
      Janne Grunau authored
    • CI: bump the dav1d-debian-unstable image version. · 949853f2
      Konstantin Pavlov authored
      This version now includes clang.
    • Fix operator order in obu.c · ca33a9b7
      Rupert Swarbrick authored
      This code originally looked like "assert (init_bit_pos % 8 == 0)" and
      I changed it to use "& 7" to match the prevailing style. Unfortunately,
      "&" binds more weakly than "==". Oops!
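      The precedence problem can be reproduced in isolation. In this sketch
      the function names are hypothetical. The buggy variant spells out with
      explicit parentheses how the compiler parses "pos & 7 == 0".

```c
#include <assert.h>

/* How "pos & 7 == 0" is actually parsed: "==" binds tighter than "&",
 * so the expression becomes pos & (7 == 0), i.e. pos & 0, which is
 * always 0. An assert() on it would therefore always fire. */
static int is_byte_aligned_buggy(const unsigned pos)
{
    return pos & (7 == 0);
}

/* Intended check: mask first, then compare. */
static int is_byte_aligned(const unsigned pos)
{
    return (pos & 7) == 0;
}
```

      Compilers such as GCC and Clang can warn about the unparenthesized form
      with -Wparentheses.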
    • Janne Grunau's avatar
    • Correctly flush at the end of OBUs · c59f1940
      Rupert Swarbrick authored
      This fixes failures when an OBU has more than a byte's worth of
      trailing zeros.
      
      As part of this work, it also rejigs the dav1d_flush_get_bits function
      slightly. This worked before, but it wasn't very obvious why (it
      worked because bits_left was never more than 7). This patch renames it
      to dav1d_bytealign_get_bits, which makes it clearer what it does and
      adds a comment explaining why it works properly.
      
      The new dav1d_bytealign_get_bits is also now void (rather than
      returning the next byte to read). The patch defines
      dav1d_get_bits_pos, which returns the current bit position. This feels
      a little easier to reason about.
      
      We also add a new check to make sure that we haven't fallen off the
      end of the OBU. This can happen when a byte buffer contains more than
      one OBU: the GetBits might not have got to EOF, but we might now be
      half-way through the next OBU.
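      The three pieces described above (byte alignment, a position query and
      the end-of-OBU check) can be sketched with a deliberately simple bit
      reader. This is an illustration under stated assumptions, not dav1d's
      GetBits: the BitReader struct and the functions get_bit(), bytealign(),
      get_bits_pos() and check_obu_end() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader, much simpler than a real decoder's. */
typedef struct {
    const uint8_t *data;
    size_t size;    /* buffer size in bytes (may hold several OBUs) */
    size_t bit_pos; /* next bit to read, counted from the start of data */
} BitReader;

/* Read one bit, MSB first. Reads past the end of the buffer yield zeros. */
static unsigned get_bit(BitReader *const br)
{
    unsigned bit = 0;
    if (br->bit_pos < br->size * 8)
        bit = (br->data[br->bit_pos >> 3] >> (7 - (br->bit_pos & 7))) & 1u;
    br->bit_pos++;
    return bit;
}

/* Current position in bits. */
static size_t get_bits_pos(const BitReader *const br)
{
    return br->bit_pos;
}

/* Skip to the next byte boundary; a no-op when already aligned. */
static void bytealign(BitReader *const br)
{
    br->bit_pos = (br->bit_pos + 7) & ~(size_t)7;
}

/* Return 0 if parsing stayed inside the OBU, -1 if it ran into whatever
 * follows it. Buffer EOF alone cannot detect this when the buffer holds
 * more than one OBU. */
static int check_obu_end(const BitReader *const br,
                         const size_t obu_start_byte, const size_t obu_len)
{
    assert(obu_start_byte + obu_len <= br->size);
    return get_bits_pos(br) <= (obu_start_byte + obu_len) * 8 ? 0 : -1;
}
```

      Tracking a single bit position makes the end-of-OBU comparison a plain
      integer check, independent of how many bits the reader has buffered.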