1. 29 Jun, 2019 3 commits
  2. 27 Jun, 2019 1 commit
  3. 24 Jun, 2019 2 commits
  4. 10 Jun, 2019 3 commits
  5. 07 Jun, 2019 1 commit
  6. 01 Jun, 2019 1 commit
  7. 21 May, 2019 1 commit
  8. 19 May, 2019 1 commit
  9. 15 May, 2019 1 commit
      arm64: msac: Add handwritten versions of msac_decode_bool functions · 2e8a3a21
      Martin Storsjö authored
      GCC                     Cortex A53   A72   A73
      msac_decode_bool_c:           29.9  17.9  23.2
      msac_decode_bool_neon:        27.4  15.3  20.4
      msac_decode_bool_adapt_c:     49.2  26.8  31.0
      msac_decode_bool_adapt_neon:  38.2  22.2  25.4
      msac_decode_bool_equi_c:      26.6  16.8  19.4
      msac_decode_bool_equi_neon:   23.9  13.7  15.7
      
      Clang                   Cortex A53   A72   A73
      msac_decode_bool_c:           28.0  16.4  23.1
      msac_decode_bool_neon:        26.9  14.6  21.0
      msac_decode_bool_adapt_c:     46.8  25.1  31.4
      msac_decode_bool_adapt_neon:  36.2  19.0  26.2
      msac_decode_bool_equi_c:      23.7  13.4  18.8
      msac_decode_bool_equi_neon:   23.7  11.3  14.2
      
      This is as fast as, or faster than, what either GCC or Clang
      produces.
  10. 14 May, 2019 2 commits
  11. 08 May, 2019 1 commit
  12. 04 May, 2019 1 commit
      arm64: msac: Implement NEON msac_decode_symbol_adapt · 1d5c1a49
      Martin Storsjö authored
                                   Cortex A53    A72    A73
      msac_decode_symbol_adapt4_c:      107.6   57.1   67.8
      msac_decode_symbol_adapt4_neon:    70.4   56.4   55.1
      msac_decode_symbol_adapt8_c:      157.1   74.5   90.3
      msac_decode_symbol_adapt8_neon:    75.6   57.2   56.9
      msac_decode_symbol_adapt16_c:     257.4  106.6  135.9
      msac_decode_symbol_adapt16_neon:  101.8   62.0   65.2
  13. 15 Apr, 2019 1 commit
  14. 04 Apr, 2019 1 commit
  15. 26 Feb, 2019 1 commit
  16. 13 Feb, 2019 1 commit
  17. 12 Feb, 2019 1 commit
  18. 08 Feb, 2019 1 commit
  19. 07 Feb, 2019 2 commits
  20. 06 Feb, 2019 1 commit
  21. 05 Feb, 2019 1 commit
  22. 18 Dec, 2018 1 commit
  23. 08 Dec, 2018 1 commit
  24. 07 Dec, 2018 1 commit
  25. 05 Dec, 2018 3 commits
  26. 03 Dec, 2018 1 commit
      Make per-width versions of cfl_ac · 70fb01d8
      Ronald S. Bultje authored
      Also use aligned reads and writes in sub_loop, and integrate sum_loop into
      the main loop.
      
      before:
      cfl_ac_420_w4_8bpc_c: 367.4
      cfl_ac_420_w4_8bpc_avx2: 72.8
      cfl_ac_420_w8_8bpc_c: 621.6
      cfl_ac_420_w8_8bpc_avx2: 85.1
      cfl_ac_420_w16_8bpc_c: 983.4
      cfl_ac_420_w16_8bpc_avx2: 141.0
      
      after:
      cfl_ac_420_w4_8bpc_c: 376.2
      cfl_ac_420_w4_8bpc_avx2: 28.5
      cfl_ac_420_w8_8bpc_c: 607.2
      cfl_ac_420_w8_8bpc_avx2: 29.9
      cfl_ac_420_w16_8bpc_c: 962.1
      cfl_ac_420_w16_8bpc_avx2: 48.8
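
To illustrate what "integrate sum_loop into the main loop" could mean, here is a minimal scalar sketch in C of a 4:2:0 cfl_ac-style routine that accumulates the running sum while it writes the subsampled values, instead of in a separate pass. This is not the project's code: the name `cfl_ac_420_sketch`, its signature, and the lack of padding handling are assumptions made for illustration, and the real AVX2 version is specialised per width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified 4:2:0 cfl_ac: no padding, 8-bit luma only.
 * width/height describe the chroma-sized output block, and
 * width * height must equal 1 << log2sz. */
static void cfl_ac_420_sketch(int16_t *ac, const uint8_t *luma,
                              ptrdiff_t stride, int width, int height,
                              int log2sz)
{
    assert(width * height == (1 << log2sz));

    int sum = (1 << log2sz) >> 1; /* rounding bias for the average */
    int16_t *dst = ac;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Sum the 2x2 luma block covering this chroma position. */
            const int s = luma[2 * x] + luma[2 * x + 1] +
                          luma[2 * x + stride] + luma[2 * x + 1 + stride];
            dst[x] = (int16_t)(s << 1);
            /* Accumulate here rather than in a separate summing pass. */
            sum += dst[x];
        }
        luma += 2 * stride;
        dst += width;
    }

    /* Subtract the rounded average so the output has (near) zero mean. */
    const int avg = sum >> log2sz;
    for (int i = 0; i < width * height; i++)
        ac[i] = (int16_t)(ac[i] - avg);
}
```

A per-width variant would fix `width` at compile time so the inner loop can be fully unrolled or vectorised, which is presumably where much of the gain in the "after" numbers comes from.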
  27. 28 Nov, 2018 1 commit
  28. 27 Nov, 2018 1 commit
      Reset the random seed when testing each CPU type. · 560dc684
      Nathan Egge authored
      Any benchmark that uses random data as input gives bunk results,
      because it currently gets different random data on each run.
      Resetting the seed makes any non-determinism in the tests repeatable
      across each call to check_cpu_flags() and checkasm_check_func().
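
The idea can be sketched in C as follows. This is only an illustration under stated assumptions: the xorshift32 generator, `rng_seed`, `fill_input`, `run_all_cpus`, and `INPUT_LEN` are hypothetical stand-ins, not the test harness's actual functions. Re-seeding before each simulated CPU type makes every variant see identical input, so timings and results become comparable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INPUT_LEN 16

/* Hypothetical stand-in for the harness PRNG (xorshift32). */
static uint32_t rng_state = 1;

static void rng_seed(uint32_t seed)
{
    rng_state = seed ? seed : 1; /* xorshift must not start at zero */
}

static uint32_t rng_next(void)
{
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return rng_state = x;
}

/* Fill a buffer with pseudo-random test input. */
static void fill_input(uint8_t *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)(rng_next() & 0xff);
}

/* Generate the input each simulated CPU type would be tested with.
 * With reseed != 0 the seed is reset before every CPU type. */
static void run_all_cpus(uint8_t out[][INPUT_LEN], int n_cpus,
                         uint32_t seed, int reseed)
{
    rng_seed(seed);
    for (int cpu = 0; cpu < n_cpus; cpu++) {
        if (reseed)
            rng_seed(seed);
        fill_input(out[cpu], INPUT_LEN);
    }
}

/* Returns 1 if two inputs are byte-for-byte identical. */
static int same_input(const uint8_t *a, const uint8_t *b)
{
    return memcmp(a, b, INPUT_LEN) == 0;
}
```

Calling `run_all_cpus(buf, 3, seed, 1)` yields three identical input buffers. With `reseed` set to 0, each CPU type continues the previous one's random stream and so sees different data.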
  29. 23 Nov, 2018 1 commit
  30. 16 Nov, 2018 1 commit
  31. 12 Nov, 2018 1 commit
      Add a max_width/height argument to angular_ipred_fn · 2f251bd1
      Ronald S. Bultje authored
      This is used in z2 to limit the number of pixels over which the
      filter is applied, as per "numPx" in 7.11.2.4 point 4 in the AV1
      specification. This only applies to z2, because in z1/3, the edge
      filter is (incomprehensibly) lengthened by the opposite side's edge
      length, which undoes the limit on the filter length (like a bug
      undoing another bug).
      
      I admit the code is getting rather complex, so we may want to
      redesign this to make writing SIMD easier.