  1. 02 Jan, 2020 1 commit
  2. 03 Oct, 2019 1 commit
  3. 19 Sep, 2019 1 commit
    • x86: add deblocking loopfilters SSSE3 asm (64-bit) · 1e4e6c7a
      Ronald S. Bultje authored
      ---------------------
      x86_64:
      ------------------------------------------
      lpf_h_sb_uv_w4_8bpc_c: 430.6
      lpf_h_sb_uv_w4_8bpc_ssse3: 322.0
      lpf_h_sb_uv_w4_8bpc_avx2: 200.4
      ---------------------
      lpf_h_sb_uv_w6_8bpc_c: 981.9
      lpf_h_sb_uv_w6_8bpc_ssse3: 421.5
      lpf_h_sb_uv_w6_8bpc_avx2: 270.0
      ---------------------
      lpf_h_sb_y_w4_8bpc_c: 3001.7
      lpf_h_sb_y_w4_8bpc_ssse3: 466.3
      lpf_h_sb_y_w4_8bpc_avx2: 383.1
      ---------------------
      lpf_h_sb_y_w8_8bpc_c: 4457.7
      lpf_h_sb_y_w8_8bpc_ssse3: 818.9
      lpf_h_sb_y_w8_8bpc_avx2: 537.0
      ---------------------
      lpf_h_sb_y_w16_8bpc_c: 1967.9
      lpf_h_sb_y_w16_8bpc_ssse3: 1836.7
      lpf_h_sb_y_w16_8bpc_avx2: 1078.2
      ---------------------
      lpf_v_sb_uv_w4_8bpc_c: 369.4
      lpf_v_sb_uv_w4_8bpc_ssse3: 110.9
      lpf_v_sb_uv_w4_8bpc_avx2: 58.1
      ---------------------
      lpf_v_sb_uv_w6_8bpc_c: 769.6
      lpf_v_sb_uv_w6_8bpc_ssse3: 222.2
      lpf_v_sb_uv_w6_8bpc_avx2: 117.8
      ---------------------
      lpf_v_sb_y_w4_8bpc_c: 772.4
      lpf_v_sb_y_w4_8bpc_ssse3: 179.8
      lpf_v_sb_y_w4_8bpc_avx2: 173.6
      ---------------------
      lpf_v_sb_y_w8_8bpc_c: 1660.2
      lpf_v_sb_y_w8_8bpc_ssse3: 468.3
      lpf_v_sb_y_w8_8bpc_avx2: 345.8
      ---------------------
      lpf_v_sb_y_w16_8bpc_c: 1889.6
      lpf_v_sb_y_w16_8bpc_ssse3: 1142.0
      lpf_v_sb_y_w16_8bpc_avx2: 568.1
      ------------------------------------------
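The relative gain behind these figures can be worked out with a small helper. This is a minimal sketch, assuming the numbers are timings where lower is faster; the function name `speedup` is hypothetical and not part of the commit.

```c
#include <assert.h>

/* Hypothetical helper: how many times faster the SIMD version is than
 * the C reference, assuming both arguments are timings (lower is faster). */
static double speedup(double c_time, double simd_time)
{
    assert(c_time > 0.0 && simd_time > 0.0);
    return c_time / simd_time;
}
```

With the figures above, `speedup(3001.7, 466.3)` for `lpf_h_sb_y_w4` comes out at roughly 6.4, while `speedup(1967.9, 1836.7)` for `lpf_h_sb_y_w16` is only about 1.07.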
  4. 10 Sep, 2019 1 commit
  5. 03 Sep, 2019 1 commit
    • TileContext: reorder scratch buffer to avoid conflicts · 863c3731
      Janne Grunau authored
      The chroma part of pal_idx can conflict with edge_{8,16}bpc during
      intra reconstruction. Fixes out-of-range pixel values caused by
      invalid palette indices in
      clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5076736684851200.
      Fixes #294. Reported as integer overflows in boxsum5sqr by the
      undefined behavior sanitizer. Credits to oss-fuzz.
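The kind of conflict described here can be illustrated with two union layouts. This is only a sketch under stated assumptions: the type names, member names other than `pal_idx` and `edge`, and all sizes are hypothetical and do not reflect the actual TileContext definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { PAL_IDX_LUMA = 64, PAL_IDX_CHROMA = 64, EDGE_BYTES = 96 };

/* Layout A: the edge buffer aliases the chroma half of pal_idx. */
typedef union {
    uint8_t pal_idx[PAL_IDX_LUMA + PAL_IDX_CHROMA];
    struct {
        uint8_t pad[PAL_IDX_LUMA];
        uint8_t edge[EDGE_BYTES];
    } intra;
} ScratchConflicting;

/* Layout B: the edge buffer starts after both pal_idx halves. */
typedef union {
    uint8_t pal_idx[PAL_IDX_LUMA + PAL_IDX_CHROMA];
    struct {
        uint8_t pad[PAL_IDX_LUMA + PAL_IDX_CHROMA];
        uint8_t edge[EDGE_BYTES];
    } intra;
} ScratchReordered;

/* True if the byte ranges [a_off, a_off+a_len) and [b_off, b_off+b_len)
 * share at least one byte. */
static int ranges_overlap(size_t a_off, size_t a_len,
                          size_t b_off, size_t b_len)
{
    assert(a_len > 0 && b_len > 0);
    return a_off < b_off + b_len && b_off < a_off + a_len;
}

/* Does the chroma half of pal_idx share bytes with edge at edge_off? */
static int chroma_overlaps_edge(size_t edge_off)
{
    return ranges_overlap(PAL_IDX_LUMA, PAL_IDX_CHROMA,
                          edge_off, EDGE_BYTES);
}
```

In layout A, writing `edge` while the chroma palette indices are still live would overwrite them; in layout B the two ranges are disjoint, at the cost of a larger scratch area.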
  6. 28 Aug, 2019 1 commit
  7. 23 Aug, 2019 1 commit
  8. 13 Aug, 2019 1 commit
    • Add msac optimizations · e29fd5c0
      Henrik Gramner authored
       * Eliminate the trailing zero after the CDF probabilities. We can
         reuse the count value as a terminator instead. This reduces the
         size of the CDF context by around 8%.
      
       * Align the CDF arrays.
      
       * Various other minor optimizations.
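The terminator idea from the first bullet can be sketched as follows. This is a simplified illustration, not the actual msac decode loop: the array layout, the function name `find_symbol`, and the constants `PROB_SHIFT` and `MAX_COUNT` are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

enum {
    PROB_SHIFT = 6,  /* assumed: low bits dropped before comparing */
    MAX_COUNT  = 32  /* assumed upper bound for the adaptation counter */
};

/* Hypothetical layout: cdf[0 .. n_probs-1] holds descending probability
 * values, and cdf[n_probs] holds the adaptation counter instead of a
 * dedicated trailing zero. */
static unsigned find_symbol(const uint16_t *cdf, unsigned n_probs,
                            unsigned target)
{
    /* The counter is small enough that it shifts down to zero. */
    assert(cdf[n_probs] <= MAX_COUNT);

    unsigned sym = 0;
    /* No explicit bound check: the shifted counter is 0, and 0 > target
     * is never true for an unsigned target, so the scan stops there. */
    while ((cdf[sym] >> PROB_SHIFT) > target)
        sym++;
    return sym;
}
```

Because the counter never reaches `1 << PROB_SHIFT`, it behaves like a zero after the shift, so one array slot per CDF can be dropped without changing where the scan ends.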
  9. 07 Aug, 2019 1 commit
  10. 02 Jul, 2019 2 commits
  11. 14 May, 2019 1 commit
  12. 14 Feb, 2019 1 commit
    • Add SSSE3 implementation for pal_pred · d5cc8503
      Xuefeng Jiang authored
      pal_pred_w4_8bpc_c: 141.0
      pal_pred_w4_8bpc_ssse3: 23.4
      pal_pred_w8_8bpc_c: 374.5
      pal_pred_w8_8bpc_ssse3: 29.0
      pal_pred_w16_8bpc_c: 946.3
      pal_pred_w16_8bpc_ssse3: 45.6
      pal_pred_w32_8bpc_c: 1946.1
      pal_pred_w32_8bpc_ssse3: 92.3
      pal_pred_w64_8bpc_c: 4925.9
      pal_pred_w64_8bpc_ssse3: 180.1
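For context, palette prediction is essentially a per-pixel table lookup. The scalar sketch below shows that operation under stated assumptions: the function name `pal_pred_sketch`, its signature, and the one-index-byte-per-pixel layout are hypothetical and may differ from the C reference benchmarked above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar palette prediction: each output pixel is the
 * palette entry selected by its index (one index byte per pixel is an
 * assumption of this sketch). */
static void pal_pred_sketch(uint8_t *dst, ptrdiff_t stride,
                            const uint8_t *pal, const uint8_t *idx,
                            int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = pal[idx[x]];
        idx += w;       /* indices are stored row by row, w per row */
        dst += stride;  /* advance to the next output row */
    }
}
```

An AV1 palette holds at most 8 entries, so the whole table fits in one SIMD register. A byte shuffle such as SSSE3 `pshufb` can then perform 16 such lookups per instruction, which is one plausible reason the gap to the C version grows with block width.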
  13. 13 Feb, 2019 2 commits
  14. 09 Feb, 2019 1 commit
  15. 28 Jan, 2019 1 commit
  16. 15 Jan, 2019 1 commit
  17. 05 Dec, 2018 2 commits
  18. 04 Dec, 2018 1 commit
  19. 28 Nov, 2018 2 commits
  20. 25 Nov, 2018 7 commits
  21. 23 Nov, 2018 2 commits
  22. 20 Nov, 2018 1 commit
  23. 19 Nov, 2018 2 commits
    • film_grain: implement film grain synthesis · cfa986fe
      Niklas Haas authored
      This is using a slightly adapted version of my GPU-based algorithm. The
      major difference to the algorithm suggested by the spec (and implemented
      in libaom) is that instead of using a line buffer to hold the previous
      row's film grain blocks, we compute each row/block fully independently.
      
      This opens the door to exploiting parallelism in the future, since we
      don't have any left->right or top->down dependency except for the PRNG
      state (which we could pre-compute for a massively parallel / GPU
      implementation).
      
      That being said, it's probably somewhat slower than using a line buffer
      for the serial / single CPU case, although most likely not by much
      (since the areas with the most redundant work get progressively smaller,
      down to a single 2x2 square for the worst case).
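The pre-computation mentioned above can be sketched as a cheap serial pass that records the PRNG state each block starts from, after which blocks no longer depend on each other. This is a minimal sketch, not the commit's code: the function names are hypothetical, the LFSR taps are given as an assumption, and the sketch assumes the generator advances exactly once per block.

```c
#include <assert.h>
#include <stdint.h>

/* One step of a 16-bit LFSR (taps at bits 0, 1, 3 and 12, assumed here). */
static uint16_t lfsr_next(uint16_t state)
{
    const unsigned r = state;
    const unsigned bit = (r ^ (r >> 1) ^ (r >> 3) ^ (r >> 12)) & 1;
    return (uint16_t)((r >> 1) | (bit << 15));
}

/* Serial pre-pass: store the state each block starts from. Assumes, for
 * illustration only, one PRNG step per block. */
static void precompute_block_states(uint16_t seed, int n_blocks,
                                    uint16_t *states)
{
    assert(n_blocks > 0);
    for (int i = 0; i < n_blocks; i++) {
        states[i] = seed;
        seed = lfsr_next(seed);
    }
}

/* Hypothetical per-block work: it reads only the block's own recorded
 * state, so blocks can be processed in any order or in parallel. */
static unsigned block_offset(uint16_t state)
{
    return (unsigned)(lfsr_next(state) >> 8);  /* top 8 bits */
}
```

Once `states` is filled, calling `block_offset(states[i])` for any `i` gives the same result regardless of the order in which blocks are visited, which is the property a parallel or GPU implementation would rely on.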
    • picture: make the film grain metadata public · 20e9f4df
      Niklas Haas authored
      This becomes part of the picture properties, since users may want to
      apply film grain themselves (e.g. for a GPU implementation).
  24. 16 Nov, 2018 1 commit
  25. 14 Nov, 2018 1 commit
  26. 13 Nov, 2018 1 commit
  27. 06 Nov, 2018 1 commit
  28. 04 Nov, 2018 1 commit