1. 19 Nov, 2018 3 commits
    • film_grain: implement film grain synthesis · cfa986fe
      Niklas Haas authored
      This is using a slightly adapted version of my GPU-based algorithm. The
      major difference to the algorithm suggested by the spec (and implemented
      in libaom) is that instead of using a line buffer to hold the previous
      row's film grain blocks, we compute each row/block fully independently.
      
      This opens up the door to exploit parallelism in the future, since we
      don't have any left->right or top->down dependency except for the PRNG
      state (which we could pre-compute for a massively parallel / GPU
      implementation).
      
      That being said, it's probably somewhat slower than using a line buffer
      for the serial / single CPU case, although most likely not by much
      (since the areas with the most redundant work get progressively smaller,
      down to a single 2x2 square for the worst case).
    • picture: make the film grain metadata public · 20e9f4df
      Niklas Haas authored
      This becomes part of the picture properties, since users may want to
      apply film grain themselves (e.g. for a GPU implementation).
    • obu: parse uv_mult etc. as signed integers · df5230ef
      Niklas Haas authored
      The spec subtracts the signed offset from all of these when using them,
      like it does for e.g. ar_coeffs_y_plus_128, although for some reason
      the naming scheme is inconsistent here. Either way, it makes more sense
      to treat them as signed integers than unsigned integers.
      
      To avoid confusion since the name of the field is the same as the one in
      the spec, we mark the type as int8_t (resp. int16_t for the 9-bit field)
      to make it clear to the user that these are already signed integers.
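The signed conversion described in this commit can be sketched as follows. This is only an illustration, not the code from obu.c: the struct and function names are hypothetical, and the bias values (128 for the 8-bit fields, 256 for the 9-bit field) are an assumption based on the "plus_128" convention the message mentions.

```c
#include <stdint.h>

/* Hypothetical container for the chroma scaling fields of one plane. */
typedef struct {
    int8_t  mult;       /* 8-bit field, stored after subtracting 128 */
    int8_t  luma_mult;  /* 8-bit field, stored after subtracting 128 */
    int16_t offset;     /* 9-bit field, stored after subtracting 256 */
} ChromaScaling;

/* raw_* are the unsigned values exactly as read from the bitstream.
 * The bias is removed once at parse time, so users of the struct see
 * values that are already signed. */
static ChromaScaling chroma_scaling_from_raw(unsigned raw_mult,
                                             unsigned raw_luma_mult,
                                             unsigned raw_offset)
{
    ChromaScaling c;
    c.mult      = (int8_t)((int)(raw_mult & 0xFF) - 128);
    c.luma_mult = (int8_t)((int)(raw_luma_mult & 0xFF) - 128);
    c.offset    = (int16_t)((int)(raw_offset & 0x1FF) - 256);
    return c;
}
```

With this layout, a raw value of 128 (or 256 for the 9-bit field) maps to 0, and the full raw range fits the chosen signed type without overflow.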
  2. 18 Nov, 2018 8 commits
    • Call msac_decode_bool() for 2 element CDFs. · 5698bc91
      Nathan Egge authored
    • Nathan Egge's avatar
      9f812914
    • Clip resize height to image size · ecf72597
      Ronald S. Bultje authored
      Fixes #183.
    • Don't initialize the LR values if LR is disabled for a plane · 92020899
      Ronald S. Bultje authored
      Also fix a calculation for u_idx. Fixes 5646860283281408 of #183.
    • Janne Grunau's avatar
    • arm64: mc: Implement 8tap and bilin functions · 4aa0363a
      Martin Storsjö authored
      These functions have been tuned against Cortex A53 and Snapdragon
      835. The bilin functions have mainly been written with code size
      in mind, as they aren't used much in practice.
      
      Relative speedups for the actual filtering functions (that don't
      just do a plain copy) are around 4-15x, some over 20x. This is
      in comparison with GCC 5.4 with autovectorization disabled; the
      actual real-world speedup against autovectorized C code is around
      4-10x.
      
      Relative speedups measured with checkasm:
                                      Cortex A53   Snapdragon 835
      mc_8tap_regular_w2_0_8bpc_neon:       6.96   5.28
      mc_8tap_regular_w2_h_8bpc_neon:       5.16   4.35
      mc_8tap_regular_w2_hv_8bpc_neon:      5.37   4.98
      mc_8tap_regular_w2_v_8bpc_neon:       6.35   4.85
      mc_8tap_regular_w4_0_8bpc_neon:       6.78   5.73
      mc_8tap_regular_w4_h_8bpc_neon:       8.40   6.60
      mc_8tap_regular_w4_hv_8bpc_neon:      7.23   7.10
      mc_8tap_regular_w4_v_8bpc_neon:       9.06   7.76
      mc_8tap_regular_w8_0_8bpc_neon:       6.96   5.55
      mc_8tap_regular_w8_h_8bpc_neon:      10.36   6.88
      mc_8tap_regular_w8_hv_8bpc_neon:      9.49   6.86
      mc_8tap_regular_w8_v_8bpc_neon:      12.06   9.61
      mc_8tap_regular_w16_0_8bpc_neon:      6.68   4.51
      mc_8tap_regular_w16_h_8bpc_neon:     12.30   7.77
      mc_8tap_regular_w16_hv_8bpc_neon:     9.50   6.68
      mc_8tap_regular_w16_v_8bpc_neon:     12.93   9.68
      mc_8tap_regular_w32_0_8bpc_neon:      3.91   2.93
      mc_8tap_regular_w32_h_8bpc_neon:     13.06   7.89
      mc_8tap_regular_w32_hv_8bpc_neon:     9.37   6.70
      mc_8tap_regular_w32_v_8bpc_neon:     12.88   9.49
      mc_8tap_regular_w64_0_8bpc_neon:      2.89   1.68
      mc_8tap_regular_w64_h_8bpc_neon:     13.48   8.00
      mc_8tap_regular_w64_hv_8bpc_neon:     9.23   6.53
      mc_8tap_regular_w64_v_8bpc_neon:     13.11   9.68
      mc_8tap_regular_w128_0_8bpc_neon:     1.89   1.24
      mc_8tap_regular_w128_h_8bpc_neon:    13.58   7.98
      mc_8tap_regular_w128_hv_8bpc_neon:    8.86   6.53
      mc_8tap_regular_w128_v_8bpc_neon:    12.46   9.63
      mc_bilinear_w2_0_8bpc_neon:           7.02   5.40
      mc_bilinear_w2_h_8bpc_neon:           3.65   3.14
      mc_bilinear_w2_hv_8bpc_neon:          4.36   4.84
      mc_bilinear_w2_v_8bpc_neon:           5.22   4.28
      mc_bilinear_w4_0_8bpc_neon:           6.87   5.99
      mc_bilinear_w4_h_8bpc_neon:           6.50   8.61
      mc_bilinear_w4_hv_8bpc_neon:          7.70   7.99
      mc_bilinear_w4_v_8bpc_neon:           7.04   9.10
      mc_bilinear_w8_0_8bpc_neon:           7.03   5.70
      mc_bilinear_w8_h_8bpc_neon:          11.30  15.14
      mc_bilinear_w8_hv_8bpc_neon:         15.74  13.50
      mc_bilinear_w8_v_8bpc_neon:          13.40  17.54
      mc_bilinear_w16_0_8bpc_neon:          6.75   4.48
      mc_bilinear_w16_h_8bpc_neon:         17.02  13.95
      mc_bilinear_w16_hv_8bpc_neon:        17.37  13.78
      mc_bilinear_w16_v_8bpc_neon:         23.69  22.98
      mc_bilinear_w32_0_8bpc_neon:          3.88   3.18
      mc_bilinear_w32_h_8bpc_neon:         18.80  14.97
      mc_bilinear_w32_hv_8bpc_neon:        17.74  14.02
      mc_bilinear_w32_v_8bpc_neon:         24.46  23.04
      mc_bilinear_w64_0_8bpc_neon:          2.87   1.66
      mc_bilinear_w64_h_8bpc_neon:         19.54  16.02
      mc_bilinear_w64_hv_8bpc_neon:        17.80  14.32
      mc_bilinear_w64_v_8bpc_neon:         24.79  23.63
      mc_bilinear_w128_0_8bpc_neon:         2.13   1.23
      mc_bilinear_w128_h_8bpc_neon:        19.89  16.24
      mc_bilinear_w128_hv_8bpc_neon:       17.55  14.15
      mc_bilinear_w128_v_8bpc_neon:        24.45  23.54
      mct_8tap_regular_w4_0_8bpc_neon:      5.56   5.51
      mct_8tap_regular_w4_h_8bpc_neon:      7.48   5.80
      mct_8tap_regular_w4_hv_8bpc_neon:     7.27   7.09
      mct_8tap_regular_w4_v_8bpc_neon:      7.80   6.84
      mct_8tap_regular_w8_0_8bpc_neon:      9.54   9.25
      mct_8tap_regular_w8_h_8bpc_neon:      9.08   6.55
      mct_8tap_regular_w8_hv_8bpc_neon:     9.16   6.30
      mct_8tap_regular_w8_v_8bpc_neon:     10.79   8.66
      mct_8tap_regular_w16_0_8bpc_neon:    15.35  10.50
      mct_8tap_regular_w16_h_8bpc_neon:    10.18   6.76
      mct_8tap_regular_w16_hv_8bpc_neon:    9.17   6.11
      mct_8tap_regular_w16_v_8bpc_neon:    11.52   8.72
      mct_8tap_regular_w32_0_8bpc_neon:    15.82  10.09
      mct_8tap_regular_w32_h_8bpc_neon:    10.75   6.85
      mct_8tap_regular_w32_hv_8bpc_neon:    9.00   6.22
      mct_8tap_regular_w32_v_8bpc_neon:    11.58   8.67
      mct_8tap_regular_w64_0_8bpc_neon:    15.28   9.68
      mct_8tap_regular_w64_h_8bpc_neon:    10.93   6.96
      mct_8tap_regular_w64_hv_8bpc_neon:    8.81   6.53
      mct_8tap_regular_w64_v_8bpc_neon:    11.42   8.73
      mct_8tap_regular_w128_0_8bpc_neon:   14.41   7.67
      mct_8tap_regular_w128_h_8bpc_neon:   10.92   6.96
      mct_8tap_regular_w128_hv_8bpc_neon:   8.56   6.51
      mct_8tap_regular_w128_v_8bpc_neon:   11.16   8.70
      mct_bilinear_w4_0_8bpc_neon:          5.66   5.77
      mct_bilinear_w4_h_8bpc_neon:          5.16   6.40
      mct_bilinear_w4_hv_8bpc_neon:         6.86   6.82
      mct_bilinear_w4_v_8bpc_neon:          4.75   6.09
      mct_bilinear_w8_0_8bpc_neon:          9.78  10.00
      mct_bilinear_w8_h_8bpc_neon:          8.98  11.37
      mct_bilinear_w8_hv_8bpc_neon:        14.42  10.83
      mct_bilinear_w8_v_8bpc_neon:          9.12  11.62
      mct_bilinear_w16_0_8bpc_neon:        15.59  10.76
      mct_bilinear_w16_h_8bpc_neon:        11.98   8.77
      mct_bilinear_w16_hv_8bpc_neon:       15.83  10.73
      mct_bilinear_w16_v_8bpc_neon:        14.70  14.60
      mct_bilinear_w32_0_8bpc_neon:        15.89  10.32
      mct_bilinear_w32_h_8bpc_neon:        13.47   9.07
      mct_bilinear_w32_hv_8bpc_neon:       16.01  10.95
      mct_bilinear_w32_v_8bpc_neon:        14.85  14.16
      mct_bilinear_w64_0_8bpc_neon:        15.36  10.51
      mct_bilinear_w64_h_8bpc_neon:        14.00   9.61
      mct_bilinear_w64_hv_8bpc_neon:       15.82  11.27
      mct_bilinear_w64_v_8bpc_neon:        14.61  14.76
      mct_bilinear_w128_0_8bpc_neon:       14.41   7.92
      mct_bilinear_w128_h_8bpc_neon:       13.31   9.58
      mct_bilinear_w128_hv_8bpc_neon:      14.07  11.18
      mct_bilinear_w128_v_8bpc_neon:       11.57  14.42
    • obu: support frame_refs_short_signaling · 842b2074
      James Almer authored
    • Janne Grunau's avatar
      58bcccc9
  3. 17 Nov, 2018 2 commits
  4. 16 Nov, 2018 6 commits
  5. 15 Nov, 2018 10 commits
  6. 14 Nov, 2018 11 commits
    • meson: fix disabling asm for arm/arm64 · a6b94ca9
      Janne Grunau authored
    • CI: bump the dav1d-debian-unstable image version. · 949853f2
      Konstantin Pavlov authored
      This version now includes clang.
    • Fix operator order in obu.c · ca33a9b7
      Rupert Swarbrick authored
      This code originally looked like "assert (init_bit_pos % 8 == 0)" and
      I changed it to use "& 7" to match the prevailing style. Unfortunately,
      "&" binds more weakly than "==". Oops!
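The precedence pitfall in this commit can be reproduced in isolation. The sketch below uses hypothetical function names rather than the actual code in obu.c. The second function spells out, with explicit parentheses, how a C compiler groups the unparenthesized expression `bit_pos & 7 == 0`.

```c
#include <stdbool.h>

/* Intended check: is the bit position a multiple of 8? */
static bool is_byte_aligned(unsigned bit_pos)
{
    return (bit_pos & 7) == 0;
}

/* How "bit_pos & 7 == 0" is actually grouped: "==" binds tighter than
 * "&", so the comparison 7 == 0 is evaluated first and yields 0, and
 * the whole expression becomes bit_pos & 0, which is always false. */
static bool is_byte_aligned_as_parsed(unsigned bit_pos)
{
    return bit_pos & (7 == 0);
}
```

Wrapping the mask in parentheses, as in the first function, restores the meaning of the original `% 8 == 0` form.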
    • Janne Grunau's avatar
    • Correctly flush at the end of OBUs · c59f1940
      Rupert Swarbrick authored
      This fixes failures when an OBU has more than a byte's worth of
      trailing zeros.
      
      As part of this work, it also rejigs the dav1d_flush_get_bits function
      slightly. This worked before, but it wasn't very obvious why (it
      worked because bits_left was never more than 7). This patch renames it
      to dav1d_bytealign_get_bits, which makes it clearer what it does and
      adds a comment explaining why it works properly.
      
      The new dav1d_bytealign_get_bits is also now void (rather than
      returning the next byte to read). The patch defines
      dav1d_get_bits_pos, which returns the current bit position. This feels
      a little easier to reason about.
      
      We also add a new check to make sure that we haven't fallen off the
      end of the OBU. This can happen when a byte buffer contains more than
      one OBU: the GetBits might not have got to EOF, but we might now be
      half-way through the next OBU.
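The byte-alignment and end-of-OBU check described above can be illustrated with a toy bit reader. This is a simplified sketch under stated assumptions, not dav1d's GetBits: the struct layout and every `br_*` name are hypothetical, and the reader tracks a plain bit counter instead of a cached bit window.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *data;
    size_t size;     /* buffer size in bytes (may hold several OBUs) */
    size_t bit_pos;  /* number of bits consumed so far */
} BitReader;

/* Read one bit; reads past the end of the buffer return 0. */
static unsigned br_get_bit(BitReader *br)
{
    if (br->bit_pos >= br->size * 8)
        return 0;
    unsigned byte = br->data[br->bit_pos >> 3];
    unsigned bit  = (byte >> (7 - (br->bit_pos & 7))) & 1;
    br->bit_pos++;
    return bit;
}

/* Skip to the next byte boundary; a no-op if already aligned. */
static void br_bytealign(BitReader *br)
{
    br->bit_pos = (br->bit_pos + 7) & ~(size_t)7;
}

/* Current position in bits from the start of the buffer. */
static size_t br_pos(const BitReader *br)
{
    return br->bit_pos;
}

/* Check against the OBU size rather than the buffer size: the buffer
 * may continue with the next OBU, so "not at EOF" is not sufficient. */
static int br_within_obu(const BitReader *br, size_t obu_size_bytes)
{
    return br->bit_pos <= obu_size_bytes * 8;
}
```

Keeping the position query separate from the alignment step mirrors the split the commit describes, where the align function returns nothing and the position is obtained through its own accessor.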
    • Fix how we read the UV quantization level · 2532642b
      Rupert Swarbrick authored
      See section 5.9.12 of the AV1 spec. The flag controlling whether U and
      V share a quantization level wasn't being read.
    • Segmentation map reference logic · 066b02c2
      boyuanxiao-argondesign authored
      The previous code raised an error if !segmentation.update_map but the
      reference frame didn't yield any segmentation data. (The first "goto
      error" that the patch removes happens if the reference frame was the
      right size but had no segmentation data; the second happens if the
      reference frame was the wrong size).
      
      This doesn't match the logic in the description of
      load_previous_segment_ids in section 6.8.2 of the spec.
      
      This patch allows such streams, allocating and zeroing cur_segmap in
      this case. It is still an error for a stream to signal a temporal
      update but not to have valid segmentation data from the ref frame -
      that's the error case that the patch puts back in.
    • Fix parsing segmentation data in parse_frame_hdr · 2f7eb1e9
      boyuanxiao-argondesign authored
      The first memset is dead code: if primary_ref_frame is
      PRIMARY_REF_NONE then segmentation.update_data is always true. The
      patch removes this memset and explains why the copy in the other
      branch is correct.
      
      The second memset should always fire: if segmentation is not enabled
      for this frame, the seg_data structure should be set to zero rather
      than copied from a reference frame (see section 5.9.14 of the AV1
      spec).
    • Fix segmentation map size check · 0bf59f09
      Ronald S. Bultje authored
      Fixes #166.
    • mc: use width/height of reference frame in warp_affine · cf9ec49a
      Janne Grunau authored
      Fixes a heap buffer overflow in emu_edge_c with
      clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5089954858795008 if the
      reference frame is smaller than the current frame. Credits to oss-fuzz.
    • mc: ensure order of evaluation of macro arguments in FILTER_BILIN · faa09008
      Janne Grunau authored
      Fixes undefined shifts in put_bilin_scaled_c with
      clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5732654503165952. Credits
      to oss-fuzz.
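One way a macro argument can end up producing an unintended shift is sketched below. The macros are hypothetical stand-ins and are not the real FILTER_BILIN; the sketch assumes the weight is passed as an expression containing a shift, which may or may not match the actual call sites.

```c
/* Hypothetical blend macros, NOT dav1d's FILTER_BILIN.
 * In the unsafe variant the weight argument w is pasted in without
 * parentheses, so an argument such as "frac >> 6" regroups as
 * frac >> (6 * (b - a)): the shift count now depends on the pixel
 * difference and is undefined when it is negative or too large. */
#define BLEND_UNSAFE(a, b, w) (16 * (a) + (w * ((b) - (a))))
#define BLEND_SAFE(a, b, w)   (16 * (a) + ((w) * ((b) - (a))))

static int blend_unsafe(int a, int b, int frac)
{
    return BLEND_UNSAFE(a, b, frac >> 6);
}

static int blend_safe(int a, int b, int frac)
{
    return BLEND_SAFE(a, b, frac >> 6);
}
```

For a = 10, b = 12, frac = 512, the safe form computes 160 + 8 * 2 = 176, while the unsafe form computes 160 + (512 >> 12) = 160. Parenthesizing every macro parameter keeps each argument evaluated as a unit.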