1. 28 Mar, 2019 1 commit
    • x86: cdef_filter: use a better constant for SSE4 · 22c3594d
      Victorien Le Couviour--Tuffet authored
      Port of dc2ae517 (for AVX2), from Kyle Siefring.
      
      ---------------------
      x86_64:
      ------------------------------------------
      cdef_filter_4x4_8bpc_ssse3: 141.7
      cdef_filter_4x4_8bpc_sse4: 128.3
      ------------------------------------------
      cdef_filter_4x8_8bpc_ssse3: 253.4
      cdef_filter_4x8_8bpc_sse4: 228.5
      ------------------------------------------
      cdef_filter_8x8_8bpc_ssse3: 429.6
      cdef_filter_8x8_8bpc_sse4: 379.9
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      cdef_filter_4x4_8bpc_ssse3: 184.3
      cdef_filter_4x4_8bpc_sse4: 168.9
      ------------------------------------------
      cdef_filter_4x8_8bpc_ssse3: 335.3
      cdef_filter_4x8_8bpc_sse4: 305.1
      ------------------------------------------
      cdef_filter_8x8_8bpc_ssse3: 579.1
      cdef_filter_8x8_8bpc_sse4: 517.0
      ------------------------------------------
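The x86_64 numbers in this commit can be turned into relative speedups with a few lines of arithmetic. The sketch below assumes the figures are checkasm timings where lower is faster; the helper names `speedup` and `summarize` are made up for illustration and are not part of dav1d or checkasm.

```python
# Benchmark numbers copied from the x86_64 table of commit 22c3594d.
# Assumption: these are timings, so a lower value means faster code.
X86_64 = {
    "cdef_filter_4x4_8bpc": {"ssse3": 141.7, "sse4": 128.3},
    "cdef_filter_4x8_8bpc": {"ssse3": 253.4, "sse4": 228.5},
    "cdef_filter_8x8_8bpc": {"ssse3": 429.6, "sse4": 379.9},
}


def speedup(baseline, optimized):
    """Return how many times faster `optimized` is than `baseline`."""
    return baseline / optimized


def summarize(table, baseline="ssse3", optimized="sse4"):
    """Map each function name to its speedup, rounded to two decimals."""
    return {
        name: round(speedup(times[baseline], times[optimized]), 2)
        for name, times in table.items()
    }


if __name__ == "__main__":
    for name, ratio in summarize(X86_64).items():
        print(f"{name}: {ratio:.2f}x")
```

On these inputs the SSE4 variants come out roughly 1.10x to 1.13x faster than SSSE3, with the largest gain on the 8x8 block size.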
  2. 26 Mar, 2019 1 commit
  3. 24 Mar, 2019 1 commit
    • Only define DAV1D_API to dllexport when building dav1d itself · 3f2bb0d9
      Martin Storsjö authored
      As meson still doesn't allow specifying different cflags between
      static and dynamic libraries, this still includes the dllexport
      in the static library when built with default_library=both, but
      it at least is avoided in static-only builds, and avoids
      defining these symbols as dllexport in the callers' translation
      units.
  4. 12 Mar, 2019 1 commit
  5. 09 Mar, 2019 1 commit
  6. 01 Mar, 2019 2 commits
  7. 26 Feb, 2019 2 commits
    • x86: add SSSE3 cdef filters implementation · 791ec219
      Victorien Le Couviour--Tuffet authored
      AVX2 adaptation
      
      ---------------------
      x86_64:
      ------------------------------------------
      cdef_filter_4x4_8bpc_c: 1370.2
      cdef_filter_4x4_8bpc_ssse3: 142.3
      cdef_filter_4x4_8bpc_avx2: 106.7
      ------------------------------------------
      cdef_filter_4x8_8bpc_c: 2749.3
      cdef_filter_4x8_8bpc_ssse3: 257.2
      cdef_filter_4x8_8bpc_avx2: 178.8
      ------------------------------------------
      cdef_filter_8x8_8bpc_c: 5609.5
      cdef_filter_8x8_8bpc_ssse3: 438.1
      cdef_filter_8x8_8bpc_avx2: 250.6
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      cdef_filter_4x4_8bpc_c: 1548.7
      cdef_filter_4x4_8bpc_ssse3: 179.8
      ------------------------------------------
      cdef_filter_4x8_8bpc_c: 3128.2
      cdef_filter_4x8_8bpc_ssse3: 328.1
      ------------------------------------------
      cdef_filter_8x8_8bpc_c: 6454.5
      cdef_filter_8x8_8bpc_ssse3: 584.4
      ------------------------------------------
    • fix dav1d spelling · ada9231c
      Janne Grunau authored
  8. 25 Feb, 2019 1 commit
  9. 13 Feb, 2019 1 commit
  10. 10 Feb, 2019 1 commit
  11. 08 Feb, 2019 1 commit
    • looprestoration: add SSSE3 implementation · ab3da909
      Victorien Le Couviour--Tuffet authored
      AVX2 code adaptation
      
      ---------------------
      x86_64:
      ------------------------------------------
      selfguided_3x3_8bpc_c: 308692.1
      selfguided_3x3_8bpc_ssse3: 112436.5
      selfguided_3x3_8bpc_avx2: 61749.8
      ------------------------------------------
      selfguided_5x5_8bpc_c: 312132.8
      selfguided_5x5_8bpc_ssse3: 79513.3
      selfguided_5x5_8bpc_avx2: 45947.3
      ------------------------------------------
      selfguided_mix_8bpc_c: 588951.9
      selfguided_mix_8bpc_ssse3: 196751.5
      selfguided_mix_8bpc_avx2: 109091.6
      ------------------------------------------
      wiener_chroma_8bpc_c: 258874.8
      wiener_chroma_8bpc_ssse3: 28172.4
      wiener_chroma_8bpc_avx2: 16910.5
      ------------------------------------------
      wiener_luma_8bpc_c: 264432.3
      wiener_luma_8bpc_ssse3: 27958.3
      wiener_luma_8bpc_avx2: 17303.8
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      selfguided_3x3_8bpc_c: 350430.5
      selfguided_3x3_8bpc_ssse3: 128850.8
      ------------------------------------------
      selfguided_5x5_8bpc_c: 313963.6
      selfguided_5x5_8bpc_ssse3: 81988.8
      ------------------------------------------
      selfguided_mix_8bpc_c: 630584.2
      selfguided_mix_8bpc_ssse3: 211802.0
      ------------------------------------------
      wiener_chroma_8bpc_c: 288928.5
      wiener_chroma_8bpc_ssse3: 30336.7
      ------------------------------------------
      wiener_luma_8bpc_c: 284500.6
      wiener_luma_8bpc_ssse3: 29521.9
      ------------------------------------------
  12. 03 Feb, 2019 1 commit
  13. 28 Jan, 2019 1 commit
  14. 06 Dec, 2018 1 commit
    • Add SSSE3 implementation for dav1d_ipred_h · 6f2f0188
      Xuefeng Jiang authored
      Cycle times:
      intra_pred_h_w4_8bpc_c: 146.6
      intra_pred_h_w4_8bpc_ssse3: 30.6
      intra_pred_h_w8_8bpc_c: 236.3
      intra_pred_h_w8_8bpc_ssse3: 42.2
      intra_pred_h_w16_8bpc_c: 446.6
      intra_pred_h_w16_8bpc_ssse3: 55.8
      intra_pred_h_w32_8bpc_c: 688.2
      intra_pred_h_w32_8bpc_ssse3: 85.9
      intra_pred_h_w64_8bpc_c: 634.2
      intra_pred_h_w64_8bpc_ssse3: 169.2
  15. 05 Dec, 2018 1 commit
  16. 04 Dec, 2018 1 commit
  17. 28 Nov, 2018 1 commit
  18. 25 Nov, 2018 1 commit
    • arm64: looprestoration: NEON optimized wiener filter · 513dfa99
      Martin Storsjö authored
      The speedup relative to C code is around 4.2x on a Cortex-A53
      and 5.1x on a Snapdragon 835 (compared to GCC's autovectorized code),
      6-7x compared to GCC's output without autovectorization, and ~8x
      compared to clang's output (which doesn't seem to try to vectorize
      this function).
  19. 19 Nov, 2018 1 commit
    • film_grain: implement film grain synthesis · cfa986fe
      Niklas Haas authored
      This is using a slightly adapted version of my GPU-based algorithm. The
      major difference to the algorithm suggested by the spec (and implemented
      in libaom) is that instead of using a line buffer to hold the previous
      row's film grain blocks, we compute each row/block fully independently.
      
      This opens up the door to exploit parallelism in the future, since we
      don't have any left->right or top->down dependency except for the PRNG
      state. (Which we could pre-compute for a massively parallel / GPU
      implementation)
      
      That being said, it's probably somewhat slower than using a line buffer
      for the serial / single CPU case, although most likely not by much
      (since the areas with the most redundant work get progressively smaller,
      down to a single 2x2 square for the worst case).
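The idea of pre-computing the PRNG state so that blocks can be produced independently can be illustrated with a toy model. This is only a sketch under stated assumptions, not dav1d's film grain code: the 16-bit LFSR, the one-draw-per-block rule, the flat `template` list, and all function names are hypothetical stand-ins chosen to keep the example small.

```python
def lfsr_next(state):
    """Advance a toy 16-bit LFSR by one step (illustrative PRNG only)."""
    bit = (state ^ (state >> 1) ^ (state >> 3) ^ (state >> 12)) & 1
    return ((state >> 1) | (bit << 15)) & 0xFFFF


def precompute_offsets(seed, num_blocks):
    """Cheap serial pass: draw one random offset per block.

    Assumption: each block consumes exactly one PRNG draw, so this loop
    is the only place where blocks depend on each other. The seed must
    be nonzero, otherwise this toy LFSR stays at zero.
    """
    offsets = []
    state = seed
    for _ in range(num_blocks):
        state = lfsr_next(state)
        offsets.append(state >> 8)  # keep the top 8 bits as the offset
    return offsets


def render_block(template, offset, size):
    """Cut one block out of a shared grain template; no neighbor data used."""
    start = offset % (len(template) - size + 1)
    return template[start:start + size]


def render_all(template, offsets, size, order):
    """Render every block in the given order; any order gives the same result."""
    out = [None] * len(offsets)
    for i in order:
        out[i] = render_block(template, offsets[i], size)
    return out


if __name__ == "__main__":
    template = list(range(64))  # stand-in for a precomputed grain pattern
    offsets = precompute_offsets(0x1234, 8)
    forward = render_all(template, offsets, 4, range(8))
    backward = render_all(template, offsets, 4, reversed(range(8)))
    print(forward == backward)
```

Because `render_block` reads only its own offset and the shared template, the per-block work could in principle be handed to separate threads or GPU work items once the offsets are known.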
  20. 12 Nov, 2018 1 commit
  21. 28 Oct, 2018 1 commit
  22. 25 Oct, 2018 3 commits
    • rename arch specific bitdepth template files · e214351b
      Janne Grunau authored
      Missed in 46e2a2d0. Arm asm will be hard to template so move them to
      the plain source list.
      
      Fix #96.
    • Build: Add suffix to templated BITDEPTH files · 46e2a2d0
      Marvin Scholz authored
      Fix #96
    • arm/mc: Add 8 bit neon asm for avg, w_avg and mask · 515e2667
      Martin Storsjö authored
      checkasm --bench numbers from a Snapdragon 835:
      nop: 23.0
      avg_w4_8bpc_c: 385.0
      avg_w4_8bpc_neon: 34.0
      avg_w8_8bpc_c: 590.5
      avg_w8_8bpc_neon: 65.5
      avg_w16_8bpc_c: 1304.4
      avg_w16_8bpc_neon: 161.3
      avg_w32_8bpc_c: 4098.4
      avg_w32_8bpc_neon: 589.2
      avg_w64_8bpc_c: 8405.0
      avg_w64_8bpc_neon: 1367.1
      avg_w128_8bpc_c: 19667.9
      avg_w128_8bpc_neon: 3409.0
      w_avg_w4_8bpc_c: 453.8
      w_avg_w4_8bpc_neon: 50.0
      w_avg_w8_8bpc_c: 749.0
      w_avg_w8_8bpc_neon: 105.7
      w_avg_w16_8bpc_c: 1851.2
      w_avg_w16_8bpc_neon: 283.7
      w_avg_w32_8bpc_c: 5991.5
      w_avg_w32_8bpc_neon: 1080.9
      w_avg_w64_8bpc_c: 12763.5
      w_avg_w64_8bpc_neon: 2544.4
      w_avg_w128_8bpc_c: 30311.3
      w_avg_w128_8bpc_neon: 6350.5
      mask_w4_8bpc_c: 492.9
      mask_w4_8bpc_neon: 57.7
      mask_w8_8bpc_c: 1108.5
      mask_w8_8bpc_neon: 123.0
      mask_w16_8bpc_c: 2880.3
      mask_w16_8bpc_neon: 349.2
      mask_w32_8bpc_c: 8996.4
      mask_w32_8bpc_neon: 1368.1
      mask_w64_8bpc_c: 19570.3
      mask_w64_8bpc_neon: 3263.5
      mask_w128_8bpc_c: 46757.4
      mask_w128_8bpc_neon: 8743.1
  23. 20 Oct, 2018 1 commit
    • arm64/mc: add 8-bit neon asm for avg, w_avg and mask · 80e47425
      Janne Grunau authored
      checkasm --bench on a Qualcomm Kryo (Snapdragon 820):
      nop: 33.0
      avg_w4_8bpc_c: 450.5
      avg_w4_8bpc_neon: 20.1
      avg_w8_8bpc_c: 438.6
      avg_w8_8bpc_neon: 45.2
      avg_w16_8bpc_c: 1003.7
      avg_w16_8bpc_neon: 112.8
      avg_w32_8bpc_c: 3249.6
      avg_w32_8bpc_neon: 429.9
      avg_w64_8bpc_c: 7213.3
      avg_w64_8bpc_neon: 1299.4
      avg_w128_8bpc_c: 16791.3
      avg_w128_8bpc_neon: 2978.4
      w_avg_w4_8bpc_c: 605.7
      w_avg_w4_8bpc_neon: 30.9
      w_avg_w8_8bpc_c: 545.8
      w_avg_w8_8bpc_neon: 72.9
      w_avg_w16_8bpc_c: 1430.1
      w_avg_w16_8bpc_neon: 193.5
      w_avg_w32_8bpc_c: 4876.3
      w_avg_w32_8bpc_neon: 715.3
      w_avg_w64_8bpc_c: 11338.0
      w_avg_w64_8bpc_neon: 2147.0
      w_avg_w128_8bpc_c: 26822.0
      w_avg_w128_8bpc_neon: 4596.3
      mask_w4_8bpc_c: 604.6
      mask_w4_8bpc_neon: 37.2
      mask_w8_8bpc_c: 654.8
      mask_w8_8bpc_neon: 96.0
      mask_w16_8bpc_c: 1663.0
      mask_w16_8bpc_neon: 272.4
      mask_w32_8bpc_c: 5707.6
      mask_w32_8bpc_neon: 1028.9
      mask_w64_8bpc_c: 12735.3
      mask_w64_8bpc_neon: 2533.2
      mask_w128_8bpc_c: 31027.6
      mask_w128_8bpc_neon: 6247.2
  24. 17 Oct, 2018 1 commit
  25. 08 Oct, 2018 2 commits
  26. 03 Oct, 2018 2 commits
  27. 02 Oct, 2018 2 commits
    • Hugo Beauzée-Luyssen's avatar
      7efdb714
    • Build: Fix static library building · 9684908d
      Marvin Scholz authored
      Due to bugs in meson the approach with the intermediate static library
      for tests does not work very well, see #44. Therefore this commit
      removes that helper library and instead uses extract_all_objects for
      the tests.
      
      With the static helper library removed, we can no longer force
      static linking for the dav1d tool on Windows, so when building a
      shared library, dav1d.exe will again not be runnable from the
      build directory.
      
      Fix #44
  28. 29 Sep, 2018 4 commits