1. 25 Feb, 2019 1 commit
  2. 13 Feb, 2019 1 commit
  3. 10 Feb, 2019 1 commit
  4. 08 Feb, 2019 1 commit
    • looprestoration: add SSSE3 implementation · ab3da909
      Victorien Le Couviour--Tuffet authored
      Adapted from the AVX2 code.
      
      ---------------------
      x86_64:
      ------------------------------------------
      selfguided_3x3_8bpc_c: 308692.1
      selfguided_3x3_8bpc_ssse3: 112436.5
      selfguided_3x3_8bpc_avx2: 61749.8
      ------------------------------------------
      selfguided_5x5_8bpc_c: 312132.8
      selfguided_5x5_8bpc_ssse3: 79513.3
      selfguided_5x5_8bpc_avx2: 45947.3
      ------------------------------------------
      selfguided_mix_8bpc_c: 588951.9
      selfguided_mix_8bpc_ssse3: 196751.5
      selfguided_mix_8bpc_avx2: 109091.6
      ------------------------------------------
      wiener_chroma_8bpc_c: 258874.8
      wiener_chroma_8bpc_ssse3: 28172.4
      wiener_chroma_8bpc_avx2: 16910.5
      ------------------------------------------
      wiener_luma_8bpc_c: 264432.3
      wiener_luma_8bpc_ssse3: 27958.3
      wiener_luma_8bpc_avx2: 17303.8
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      selfguided_3x3_8bpc_c: 350430.5
      selfguided_3x3_8bpc_ssse3: 128850.8
      ------------------------------------------
      selfguided_5x5_8bpc_c: 313963.6
      selfguided_5x5_8bpc_ssse3: 81988.8
      ------------------------------------------
      selfguided_mix_8bpc_c: 630584.2
      selfguided_mix_8bpc_ssse3: 211802.0
      ------------------------------------------
      wiener_chroma_8bpc_c: 288928.5
      wiener_chroma_8bpc_ssse3: 30336.7
      ------------------------------------------
      wiener_luma_8bpc_c: 284500.6
      wiener_luma_8bpc_ssse3: 29521.9
      ------------------------------------------
  5. 03 Feb, 2019 1 commit
  6. 28 Jan, 2019 1 commit
  7. 06 Dec, 2018 1 commit
    • Add SSSE3 implementation for dav1d_ipred_h · 6f2f0188
      Xuefeng Jiang authored
      Cycle times:
      intra_pred_h_w4_8bpc_c: 146.6
      intra_pred_h_w4_8bpc_ssse3: 30.6
      intra_pred_h_w8_8bpc_c: 236.3
      intra_pred_h_w8_8bpc_ssse3: 42.2
      intra_pred_h_w16_8bpc_c: 446.6
      intra_pred_h_w16_8bpc_ssse3: 55.8
      intra_pred_h_w32_8bpc_c: 688.2
      intra_pred_h_w32_8bpc_ssse3: 85.9
      intra_pred_h_w64_8bpc_c: 634.2
      intra_pred_h_w64_8bpc_ssse3: 169.2
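Horizontal intra prediction fills every row of the block with the reconstructed pixel immediately to its left, which is why it maps so well onto SIMD stores. The scalar sketch below shows that behaviour; the function name, its signature and the `left` layout (`left[y]` is the neighbour of row `y`) are hypothetical and differ from dav1d's own C reference.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar horizontal intra predictor for 8-bit pixels.
 * dst:    top-left pixel of the w x h block to predict
 * stride: distance in bytes between consecutive rows of dst
 * left:   left[y] is the reconstructed pixel just left of row y */
void ipred_h_8bpc_sketch(uint8_t *dst, ptrdiff_t stride,
                         const uint8_t *left, int w, int h)
{
    for (int y = 0; y < h; y++) {
        /* Every pixel in row y copies the same left neighbour. */
        memset(dst, left[y], (size_t)w);
        dst += stride;
    }
}
```

A SIMD version can replace the `memset` by broadcasting the left byte across a vector register and storing that register once or several times per row, depending on the block width.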
  8. 05 Dec, 2018 1 commit
  9. 04 Dec, 2018 1 commit
  10. 28 Nov, 2018 1 commit
  11. 25 Nov, 2018 1 commit
    • arm64: looprestoration: NEON optimized wiener filter · 513dfa99
      Martin Storsjö authored
      The speedup relative to C code is around 4.2x on a Cortex A53
      and 5.1x on a Snapdragon 835 (relative to GCC's autovectorized code),
      6-7x relative to GCC's output without autovectorization, and ~8x
      relative to clang's output (which doesn't seem to try to vectorize
      this function).
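The Wiener filter in AV1 loop restoration is a separable, symmetric 7-tap filter applied along rows and then along columns, so each output pixel is a short multiply-accumulate over its neighbours, which is the shape of work NEON handles well. The sketch below shows a simplified horizontal pass only. Assumptions: the three outer taps are given and the centre tap is derived so all seven sum to 128 (7-bit fixed point), row ends are extended by repeating the edge pixel, and the result is rounded straight back to 8 bits. AV1 and dav1d keep extra intermediate precision between the two passes and may treat edges differently, so this is not bit-exact; the function names are hypothetical.

```c
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical horizontal pass of a symmetric 7-tap filter.
 * coef[0] is the outermost tap (distance 3), coef[2] the innermost
 * (distance 1). The centre tap makes all seven taps sum to 128, so
 * flat areas stay flat. Out-of-range pixels repeat the row's edge. */
void wiener_h_sketch(uint8_t *dst, const uint8_t *src, int w,
                     const int16_t coef[3])
{
    const int centre = 128 - 2 * (coef[0] + coef[1] + coef[2]);
    for (int x = 0; x < w; x++) {
        int sum = centre * src[x];
        for (int k = 1; k <= 3; k++) {
            int l = x - k < 0 ? 0 : x - k;      /* clamp at left edge  */
            int r = x + k >= w ? w - 1 : x + k; /* clamp at right edge */
            /* Symmetric filter: one tap weights both neighbours at distance k. */
            sum += coef[3 - k] * (src[l] + src[r]);
        }
        dst[x] = clip_u8((sum + 64) >> 7); /* round and drop the 7 fraction bits */
    }
}
```

The vertical pass has the same structure with rows instead of pixels as neighbours; keeping both passes separable costs 14 multiplies per pixel instead of 49 for a full 7x7 kernel.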
  12. 19 Nov, 2018 1 commit
    • film_grain: implement film grain synthesis · cfa986fe
      Niklas Haas authored
      This uses a slightly adapted version of my GPU-based algorithm. The
      major difference from the algorithm suggested by the spec (and implemented
      in libaom) is that instead of using a line buffer to hold the previous
      row's film grain blocks, we compute each row/block fully independently.
      
      This opens the door to exploiting parallelism in the future, since we
      don't have any left->right or top->down dependency except for the PRNG
      state (which we could pre-compute for a massively parallel / GPU
      implementation).
      
      That being said, it's probably somewhat slower than using a line buffer
      for the serial / single CPU case, although most likely not by much
      (since the areas with the most redundant work get progressively smaller,
      down to a single 2x2 square for the worst case).
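The PRNG state is the one serial dependency left. AV1 film grain synthesis draws from a 16-bit linear-feedback shift register (the specification's `get_random_number()`), and each step depends only on the current register value, so the state at any later point can be reached by stepping the register forward; that is what makes pre-computing per-block states possible. A minimal sketch with hypothetical function names; it does not model how the spec seeds the register or how many draws each block consumes.

```c
#include <stdint.h>

/* 16-bit LFSR in the style of AV1's film grain get_random_number():
 * feedback from bits 0, 1, 3 and 12 is shifted in at the top.
 * Returns the top `bits` bits (1..16) of the updated register. */
int grain_rng_next(uint16_t *state, int bits)
{
    unsigned r = *state;
    unsigned bit = ((r >> 0) ^ (r >> 1) ^ (r >> 3) ^ (r >> 12)) & 1;
    r = (r >> 1) | (bit << 15);
    *state = (uint16_t)r;
    return (int)((r >> (16 - bits)) & ((1u << bits) - 1));
}

/* Advance the register by n draws and return the new state. The update
 * does not depend on `bits`, so a parallel implementation could use this
 * (or a precomputed table of states) to start any block directly. */
uint16_t grain_rng_skip(uint16_t state, unsigned n)
{
    while (n--)
        grain_rng_next(&state, 1);
    return state;
}
```

Stepping one draw at a time is the simplest correct way to skip ahead; a GPU implementation would more likely precompute the start state of each block once and hand it to independent workers.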
  13. 12 Nov, 2018 1 commit
  14. 28 Oct, 2018 1 commit
  15. 25 Oct, 2018 3 commits
    • rename arch specific bitdepth template files · e214351b
      Janne Grunau authored
      Missed in 46e2a2d0. Arm asm will be hard to template, so move those
      files to the plain source list.
      
      Fix #96.
    • Build: Add suffix to templated BITDEPTH files · 46e2a2d0
      Marvin Scholz authored
      Fix #96
    • arm/mc: Add 8 bit neon asm for avg, w_avg and mask · 515e2667
      Martin Storsjö authored
      checkasm --bench numbers from a Snapdragon 835:
      nop: 23.0
      avg_w4_8bpc_c: 385.0
      avg_w4_8bpc_neon: 34.0
      avg_w8_8bpc_c: 590.5
      avg_w8_8bpc_neon: 65.5
      avg_w16_8bpc_c: 1304.4
      avg_w16_8bpc_neon: 161.3
      avg_w32_8bpc_c: 4098.4
      avg_w32_8bpc_neon: 589.2
      avg_w64_8bpc_c: 8405.0
      avg_w64_8bpc_neon: 1367.1
      avg_w128_8bpc_c: 19667.9
      avg_w128_8bpc_neon: 3409.0
      w_avg_w4_8bpc_c: 453.8
      w_avg_w4_8bpc_neon: 50.0
      w_avg_w8_8bpc_c: 749.0
      w_avg_w8_8bpc_neon: 105.7
      w_avg_w16_8bpc_c: 1851.2
      w_avg_w16_8bpc_neon: 283.7
      w_avg_w32_8bpc_c: 5991.5
      w_avg_w32_8bpc_neon: 1080.9
      w_avg_w64_8bpc_c: 12763.5
      w_avg_w64_8bpc_neon: 2544.4
      w_avg_w128_8bpc_c: 30311.3
      w_avg_w128_8bpc_neon: 6350.5
      mask_w4_8bpc_c: 492.9
      mask_w4_8bpc_neon: 57.7
      mask_w8_8bpc_c: 1108.5
      mask_w8_8bpc_neon: 123.0
      mask_w16_8bpc_c: 2880.3
      mask_w16_8bpc_neon: 349.2
      mask_w32_8bpc_c: 8996.4
      mask_w32_8bpc_neon: 1368.1
      mask_w64_8bpc_c: 19570.3
      mask_w64_8bpc_neon: 3263.5
      mask_w128_8bpc_c: 46757.4
      mask_w128_8bpc_neon: 8743.1
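avg, w_avg and mask blend two intermediate predictions `tmp1` and `tmp2` of a compound block: avg takes their mean, w_avg applies one weight to the whole block, and mask applies a weight per pixel. The scalar sketch below illustrates that arithmetic under stated assumptions: intermediates carry 4 extra bits of precision (the `INTERMEDIATE_BITS` constant here), w_avg weights are out of 16 and mask values out of 64 as in AV1 compound prediction, no bias offsets are applied, and each call handles one row of `n` pixels instead of a strided block. Function names are hypothetical and not dav1d's.

```c
#include <stdint.h>

#define INTERMEDIATE_BITS 4 /* assumed extra precision of tmp1/tmp2 */

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Plain average of two intermediate predictions. */
void avg_sketch(uint8_t *dst, const int16_t *tmp1, const int16_t *tmp2, int n)
{
    const int sh = INTERMEDIATE_BITS + 1;
    const int rnd = 1 << (sh - 1);
    for (int i = 0; i < n; i++)
        dst[i] = clip_u8((tmp1[i] + tmp2[i] + rnd) >> sh);
}

/* Weighted average: weight (0..16) applies to tmp1, 16 - weight to tmp2. */
void w_avg_sketch(uint8_t *dst, const int16_t *tmp1, const int16_t *tmp2,
                  int n, int weight)
{
    const int sh = INTERMEDIATE_BITS + 4;
    const int rnd = 1 << (sh - 1);
    for (int i = 0; i < n; i++)
        dst[i] = clip_u8((tmp1[i] * weight + tmp2[i] * (16 - weight) + rnd) >> sh);
}

/* Per-pixel blend: mask[i] (0..64) applies to tmp1, 64 - mask[i] to tmp2. */
void mask_sketch(uint8_t *dst, const int16_t *tmp1, const int16_t *tmp2,
                 int n, const uint8_t *mask)
{
    const int sh = INTERMEDIATE_BITS + 6;
    const int rnd = 1 << (sh - 1);
    for (int i = 0; i < n; i++)
        dst[i] = clip_u8((tmp1[i] * mask[i] + tmp2[i] * (64 - mask[i]) + rnd) >> sh);
}
```

All three kernels are a widening multiply-add followed by a rounding shift and a saturating narrow, which fits NEON well; that pattern is consistent with the speedups above growing smaller as the block width (and thus memory traffic) grows.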
  16. 20 Oct, 2018 1 commit
    • arm64/mc: add 8-bit neon asm for avg, w_avg and mask · 80e47425
      Janne Grunau authored
      checkasm --bench on a Qualcomm Kryo (Snapdragon 820):
      nop: 33.0
      avg_w4_8bpc_c: 450.5
      avg_w4_8bpc_neon: 20.1
      avg_w8_8bpc_c: 438.6
      avg_w8_8bpc_neon: 45.2
      avg_w16_8bpc_c: 1003.7
      avg_w16_8bpc_neon: 112.8
      avg_w32_8bpc_c: 3249.6
      avg_w32_8bpc_neon: 429.9
      avg_w64_8bpc_c: 7213.3
      avg_w64_8bpc_neon: 1299.4
      avg_w128_8bpc_c: 16791.3
      avg_w128_8bpc_neon: 2978.4
      w_avg_w4_8bpc_c: 605.7
      w_avg_w4_8bpc_neon: 30.9
      w_avg_w8_8bpc_c: 545.8
      w_avg_w8_8bpc_neon: 72.9
      w_avg_w16_8bpc_c: 1430.1
      w_avg_w16_8bpc_neon: 193.5
      w_avg_w32_8bpc_c: 4876.3
      w_avg_w32_8bpc_neon: 715.3
      w_avg_w64_8bpc_c: 11338.0
      w_avg_w64_8bpc_neon: 2147.0
      w_avg_w128_8bpc_c: 26822.0
      w_avg_w128_8bpc_neon: 4596.3
      mask_w4_8bpc_c: 604.6
      mask_w4_8bpc_neon: 37.2
      mask_w8_8bpc_c: 654.8
      mask_w8_8bpc_neon: 96.0
      mask_w16_8bpc_c: 1663.0
      mask_w16_8bpc_neon: 272.4
      mask_w32_8bpc_c: 5707.6
      mask_w32_8bpc_neon: 1028.9
      mask_w64_8bpc_c: 12735.3
      mask_w64_8bpc_neon: 2533.2
      mask_w128_8bpc_c: 31027.6
      mask_w128_8bpc_neon: 6247.2
  17. 17 Oct, 2018 1 commit
  18. 08 Oct, 2018 2 commits
  19. 03 Oct, 2018 2 commits
  20. 02 Oct, 2018 2 commits
    • Hugo Beauzée-Luyssen
      7efdb714
    • Build: Fix static library building · 9684908d
      Marvin Scholz authored
      Due to bugs in meson, the approach with the intermediate static library
      for tests does not work very well (see #44). Therefore this commit
      removes that helper library and instead uses extract_all_objects for
      the tests.
      
      Because the static helper library is removed, we can no longer force
      static linking for the dav1d tool on Windows, which means that when
      building a shared library, dav1d.exe will no longer be runnable from
      the build directory.
      
      Fix #44
  21. 29 Sep, 2018 4 commits