1. 21 Jan, 2020 2 commits
  2. 20 Jan, 2020 3 commits
  3. 15 Jan, 2020 1 commit
  4. 14 Jan, 2020 2 commits
  5. 10 Jan, 2020 3 commits
    • SSSE3 implementations of film grain · 8ff89463
      Ronald S. Bultje authored
      gen_grain_y_ar0_8bpc_c: 84853.3
      gen_grain_y_ar0_8bpc_ssse3: 23528.0
      gen_grain_y_ar1_8bpc_c: 140775.5
      gen_grain_y_ar1_8bpc_ssse3: 70410.2
      gen_grain_y_ar2_8bpc_c: 251311.3
      gen_grain_y_ar2_8bpc_ssse3: 95222.2
      gen_grain_y_ar3_8bpc_c: 394763.0
      gen_grain_y_ar3_8bpc_ssse3: 103541.9
      
      gen_grain_uv_ar0_8bpc_420_c: 29773.7
      gen_grain_uv_ar0_8bpc_420_ssse3: 7068.9
      gen_grain_uv_ar1_8bpc_420_c: 46113.2
      gen_grain_uv_ar1_8bpc_420_ssse3: 22148.1
      gen_grain_uv_ar2_8bpc_420_c: 70061.4
      gen_grain_uv_ar2_8bpc_420_ssse3: 25479.0
      gen_grain_uv_ar3_8bpc_420_c: 113826.0
      gen_grain_uv_ar3_8bpc_420_ssse3: 30004.9
      
      fguv_32x32xn_8bpc_420_csfl0_c: 8148.9
      fguv_32x32xn_8bpc_420_csfl0_ssse3: 1371.3
      fguv_32x32xn_8bpc_420_csfl1_c: 6391.9
      fguv_32x32xn_8bpc_420_csfl1_ssse3: 1034.8
      
      fgy_32x32xn_8bpc_c: 14201.3
      fgy_32x32xn_8bpc_ssse3: 3443.0
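The relative gain behind the film grain figures above can be worked out as a plain ratio. The sketch below copies a subset of the reported numbers and assumes that lower values are better (the commit message does not state the unit); the `bench` struct and helper names are illustrative and not part of dav1d.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark pair as reported in the commit message above. */
struct bench {
    const char *name;
    double c;      /* reference C implementation */
    double ssse3;  /* SSSE3 implementation */
};

/* Subset of the reported figures, copied verbatim. */
static const struct bench film_grain[] = {
    { "gen_grain_y_ar0_8bpc",          84853.3,  23528.0 },
    { "gen_grain_y_ar1_8bpc",         140775.5,  70410.2 },
    { "gen_grain_y_ar2_8bpc",         251311.3,  95222.2 },
    { "gen_grain_y_ar3_8bpc",         394763.0, 103541.9 },
    { "fguv_32x32xn_8bpc_420_csfl0",    8148.9,   1371.3 },
    { "fguv_32x32xn_8bpc_420_csfl1",    6391.9,   1034.8 },
    { "fgy_32x32xn_8bpc",              14201.3,   3443.0 },
};
#define N_BENCH (sizeof(film_grain) / sizeof(film_grain[0]))

/* Speedup factor, assuming lower figures mean faster code. */
static double speedup(const struct bench *b) {
    assert(b->ssse3 > 0.0);
    return b->c / b->ssse3;
}

/* Index of the entry with the largest speedup factor. */
static size_t best_speedup_index(void) {
    size_t best = 0;
    for (size_t i = 1; i < N_BENCH; i++)
        if (speedup(&film_grain[i]) > speedup(&film_grain[best]))
            best = i;
    return best;
}
```

Under that assumption, `speedup(&film_grain[0])` comes out at roughly 3.6, and the chroma apply function with csfl1 shows the largest ratio in this subset.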
    • Reduce scope of NO_SANITIZE usage · e79e5ceb
      Dale Curtis authored
      dav1d_open() is part of the public API and should be sanitized; limit
      the sanitizer disable to just the problematic dlsym() method.
    • Add a workaround for -fsanitize=cfi + dlsym() issue · c192e0db
      Henrik Gramner authored
      CFI will SIGILL when calling a function pointer obtained through
      dlsym(), regardless of whether or not the signature is correct.
      
      See https://bugs.llvm.org/show_bug.cgi?id=44500
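The two sanitizer commits above amount to exempting only the function that makes the indirect call from CFI, while the public entry point stays instrumented. The following is a minimal sketch of that pattern, not the dav1d code: the `NO_SANITIZE_CFI_ICALL` macro, the `HAVE_DLSYM` guard, the function names, and the choice of `strlen` as the looked-up symbol are all assumptions made for illustration. When `HAVE_DLSYM` is not defined, a direct function address stands in for the lookup so the sketch builds anywhere.

```c
#ifdef HAVE_DLSYM          /* hypothetical build-system macro */
#define _GNU_SOURCE        /* glibc needs this for RTLD_DEFAULT */
#include <dlfcn.h>
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Clang accepts no_sanitize("cfi-icall"); other compilers get a no-op. */
#if defined(__clang__)
#define NO_SANITIZE_CFI_ICALL __attribute__((no_sanitize("cfi-icall")))
#else
#define NO_SANITIZE_CFI_ICALL
#endif

typedef size_t (*strlen_fn)(const char *);

/* Resolve the symbol at run time when dlsym is available. */
static strlen_fn lookup_strlen(void) {
#ifdef HAVE_DLSYM
    return (strlen_fn)dlsym(RTLD_DEFAULT, "strlen");
#else
    return strlen;  /* stand-in so the sketch builds without dlsym */
#endif
}

/* Only this small helper is exempt from the indirect-call check. */
NO_SANITIZE_CFI_ICALL
static size_t call_unchecked(strlen_fn fn, const char *s) {
    assert(s != NULL);
    return fn ? fn(s) : 0;
}

/* Public-style entry point: remains fully sanitized. */
size_t measured_length(const char *s) {
    return call_unchecked(lookup_strlen(), s);
}
```

Keeping the attribute on the smallest possible helper means that any other indirect call reachable from the public function is still checked.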
  6. 09 Jan, 2020 3 commits
    • x86: add prep_bilin AVX512 asm · 5462c2a8
      Victorien Le Couviour--Tuffet authored
      ------------------------------------------
      mct_bilinear_w4_0_8bpc_avx2:      3.8
      mct_bilinear_w4_0_8bpc_avx512icl: 3.7
      ---------------------
      mct_bilinear_w8_0_8bpc_avx2:      5.0
      mct_bilinear_w8_0_8bpc_avx512icl: 4.8
      ---------------------
      mct_bilinear_w16_0_8bpc_avx2:      8.5
      mct_bilinear_w16_0_8bpc_avx512icl: 7.1
      ---------------------
      mct_bilinear_w32_0_8bpc_avx2:      29.5
      mct_bilinear_w32_0_8bpc_avx512icl: 17.1
      ---------------------
      mct_bilinear_w64_0_8bpc_avx2:      68.1
      mct_bilinear_w64_0_8bpc_avx512icl: 34.7
      ---------------------
      mct_bilinear_w128_0_8bpc_avx2:      180.5
      mct_bilinear_w128_0_8bpc_avx512icl: 138.0
      ------------------------------------------
      mct_bilinear_w4_h_8bpc_avx2:      4.0
      mct_bilinear_w4_h_8bpc_avx512icl: 3.9
      ---------------------
      mct_bilinear_w8_h_8bpc_avx2:      5.3
      mct_bilinear_w8_h_8bpc_avx512icl: 5.0
      ---------------------
      mct_bilinear_w16_h_8bpc_avx2:      11.7
      mct_bilinear_w16_h_8bpc_avx512icl:  7.5
      ---------------------
      mct_bilinear_w32_h_8bpc_avx2:      41.8
      mct_bilinear_w32_h_8bpc_avx512icl: 20.3
      ---------------------
      mct_bilinear_w64_h_8bpc_avx2:      94.9
      mct_bilinear_w64_h_8bpc_avx512icl: 35.0
      ---------------------
      mct_bilinear_w128_h_8bpc_avx2:      240.1
      mct_bilinear_w128_h_8bpc_avx512icl: 143.8
      ------------------------------------------
      mct_bilinear_w4_v_8bpc_avx2:      4.1
      mct_bilinear_w4_v_8bpc_avx512icl: 4.0
      ---------------------
      mct_bilinear_w8_v_8bpc_avx2:      6.0
      mct_bilinear_w8_v_8bpc_avx512icl: 5.4
      ---------------------
      mct_bilinear_w16_v_8bpc_avx2:      10.3
      mct_bilinear_w16_v_8bpc_avx512icl:  8.9
      ---------------------
      mct_bilinear_w32_v_8bpc_avx2:      29.5
      mct_bilinear_w32_v_8bpc_avx512icl: 25.9
      ---------------------
      mct_bilinear_w64_v_8bpc_avx2:      64.3
      mct_bilinear_w64_v_8bpc_avx512icl: 41.3
      ---------------------
      mct_bilinear_w128_v_8bpc_avx2:      198.2
      mct_bilinear_w128_v_8bpc_avx512icl: 139.6
      ------------------------------------------
      mct_bilinear_w4_hv_8bpc_avx2:      5.6
      mct_bilinear_w4_hv_8bpc_avx512icl: 5.2
      ---------------------
      mct_bilinear_w8_hv_8bpc_avx2:      8.3
      mct_bilinear_w8_hv_8bpc_avx512icl: 7.0
      ---------------------
      mct_bilinear_w16_hv_8bpc_avx2:      19.4
      mct_bilinear_w16_hv_8bpc_avx512icl: 12.1
      ---------------------
      mct_bilinear_w32_hv_8bpc_avx2:      69.1
      mct_bilinear_w32_hv_8bpc_avx512icl: 32.5
      ---------------------
      mct_bilinear_w64_hv_8bpc_avx2:      164.4
      mct_bilinear_w64_hv_8bpc_avx512icl:  71.1
      ---------------------
      mct_bilinear_w128_hv_8bpc_avx2:      405.2
      mct_bilinear_w128_hv_8bpc_avx512icl: 193.1
      ------------------------------------------
    • checkasm: x86: ensure all SIMD lanes are turned on at all times · 430967a6
      Victorien Le Couviour--Tuffet authored
      YMM and ZMM registers on x86 are turned off to save power when they haven't
      been used for some period of time. When they are used again, there is a
      "warmup" period during which performance is reduced and inconsistent,
      which is problematic when trying to benchmark individual functions.
      
      Periodically issue "dummy" instructions that use those registers to
      prevent them from being powered down. The end result is more consistent
      benchmark results.
      
      Credits to Henrik Gramner's commit
      1878c7f2af0a9c73e291488209109782c428cfcf from x264.
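The periodic warmup described above can be sketched as a benchmark loop that touches a wide register every few iterations. This is not the checkasm implementation: `simd_warmup`, `run_with_warmup`, the interval parameter, and the choice of dummy instructions are assumptions for illustration. The inline assembly is only compiled when the compiler targets AVX; otherwise the warmup is a no-op and only the scheduling logic remains.

```c
#include <assert.h>

#if defined(__GNUC__) && defined(__AVX__)
/* Execute dummy 256-bit instructions so the wide lanes stay powered.
 * Which instructions are sufficient is an assumption of this sketch. */
static void simd_warmup(void) {
    __asm__ volatile("vxorps %%ymm0, %%ymm0, %%ymm0\n\t"
                     "vmulps %%ymm0, %%ymm0, %%ymm0"
                     ::: "xmm0");
}
#else
/* No AVX at compile time: nothing to keep warm. */
static void simd_warmup(void) { }
#endif

typedef void (*bench_fn)(void *ctx);

/* Run fn `iterations` times and issue a warmup before every
 * `interval`-th run. Returns the number of warmups issued. */
static unsigned run_with_warmup(bench_fn fn, void *ctx,
                                unsigned iterations, unsigned interval) {
    assert(interval > 0);
    unsigned warmups = 0;
    for (unsigned i = 0; i < iterations; i++) {
        if (i % interval == 0) {
            simd_warmup();
            warmups++;
        }
        fn(ctx);
    }
    return warmups;
}

/* Trivial stand-in for a function under test: counts its calls. */
static void count_call(void *ctx) {
    ++*(unsigned *)ctx;
}
```

Calling `run_with_warmup(count_call, &calls, 100, 32)` runs the function 100 times and issues the warmup at iterations 0, 32, 64 and 96. A real harness would add its timing calls around `fn(ctx)` and choose the interval to be shorter than the hardware's power-down delay.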
  7. 08 Jan, 2020 5 commits
  8. 07 Jan, 2020 1 commit
  9. 05 Jan, 2020 1 commit
    • arm64: msac: Avoid 32 bit intermediates in symbol_adapt · 8d574f70
      Martin Storsjö authored
      This gives small gains on A72 and A73, and on A53 for symbol_adapt16.
      
      Before:                      Cortex A53    A72    A73
      msac_decode_symbol_adapt4_neon:    63.2   52.8   53.3
      msac_decode_symbol_adapt8_neon:    68.5   57.9   55.7
      msac_decode_symbol_adapt16_neon:   92.8   59.7   62.8
      After:
      msac_decode_symbol_adapt4_neon:    63.3   48.3   50.0
      msac_decode_symbol_adapt8_neon:    68.7   55.5   54.0
      msac_decode_symbol_adapt16_neon:   88.6   58.8   60.0
  10. 02 Jan, 2020 4 commits
  11. 01 Jan, 2020 3 commits
  12. 31 Dec, 2019 2 commits
  13. 29 Dec, 2019 2 commits
  14. 28 Dec, 2019 3 commits
  15. 24 Dec, 2019 1 commit
  16. 18 Dec, 2019 1 commit
    • Don't assume dlsym exists on linux · 14d586ac
      Martin Storsjö authored
      After checking if -ldl exists, use it for checking for the dlsym
      function.
      
      This fixes building in environments where the dlsym function is
      unavailable. (My test case is NDK builds with -static, where dlsym is
      available only when linking dynamically, not statically.)
  17. 17 Dec, 2019 1 commit
  18. 14 Dec, 2019 2 commits