  1. 06 Dec, 2018 2 commits
    • Add SSSE3 implementation for the 4x4 blocks in itx · 87a377e9
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_4x4_adst_adst_0_8bpc_c: 445.9
      inv_txfm_add_4x4_adst_adst_0_8bpc_ssse3: 23.7
      inv_txfm_add_4x4_adst_adst_1_8bpc_c: 443.7
      inv_txfm_add_4x4_adst_adst_1_8bpc_ssse3: 52.6
      inv_txfm_add_4x4_adst_dct_0_8bpc_c: 474.5
      inv_txfm_add_4x4_adst_dct_0_8bpc_ssse3: 23.9
      inv_txfm_add_4x4_adst_dct_1_8bpc_c: 482.0
      inv_txfm_add_4x4_adst_dct_1_8bpc_ssse3: 51.1
      inv_txfm_add_4x4_adst_flipadst_0_8bpc_c: 587.2
      inv_txfm_add_4x4_adst_flipadst_0_8bpc_ssse3: 24.0
      inv_txfm_add_4x4_adst_flipadst_1_8bpc_c: 457.2
      inv_txfm_add_4x4_adst_flipadst_1_8bpc_ssse3: 52.8
      inv_txfm_add_4x4_adst_identity_0_8bpc_c: 412.4
      inv_txfm_add_4x4_adst_identity_0_8bpc_ssse3: 43.3
      inv_txfm_add_4x4_adst_identity_1_8bpc_c: 412.0
      inv_txfm_add_4x4_adst_identity_1_8bpc_ssse3: 43.3
      inv_txfm_add_4x4_dct_adst_0_8bpc_c: 467.4
      inv_txfm_add_4x4_dct_adst_0_8bpc_ssse3: 23.2
      inv_txfm_add_4x4_dct_adst_1_8bpc_c: 588.3
      inv_txfm_add_4x4_dct_adst_1_8bpc_ssse3: 48.6
      inv_txfm_add_4x4_dct_dct_0_8bpc_c: 611.5
      inv_txfm_add_4x4_dct_dct_0_8bpc_ssse3: 23.1
      inv_txfm_add_4x4_dct_dct_1_8bpc_c: 576.2
      inv_txfm_add_4x4_dct_dct_1_8bpc_ssse3: 47.6
      inv_txfm_add_4x4_dct_flipadst_0_8bpc_c: 479.5
      inv_txfm_add_4x4_dct_flipadst_0_8bpc_ssse3: 23.4
      inv_txfm_add_4x4_dct_flipadst_1_8bpc_c: 549.3
      inv_txfm_add_4x4_dct_flipadst_1_8bpc_ssse3: 48.3
      inv_txfm_add_4x4_dct_identity_0_8bpc_c: 576.9
      inv_txfm_add_4x4_dct_identity_0_8bpc_ssse3: 25.4
      inv_txfm_add_4x4_dct_identity_1_8bpc_c: 610.7
      inv_txfm_add_4x4_dct_identity_1_8bpc_ssse3: 25.1
      inv_txfm_add_4x4_flipadst_adst_0_8bpc_c: 532.8
      inv_txfm_add_4x4_flipadst_adst_0_8bpc_ssse3: 23.8
      inv_txfm_add_4x4_flipadst_adst_1_8bpc_c: 666.7
      inv_txfm_add_4x4_flipadst_adst_1_8bpc_ssse3: 61.0
      inv_txfm_add_4x4_flipadst_dct_0_8bpc_c: 539.6
      inv_txfm_add_4x4_flipadst_dct_0_8bpc_ssse3: 23.8
      inv_txfm_add_4x4_flipadst_dct_1_8bpc_c: 484.6
      inv_txfm_add_4x4_flipadst_dct_1_8bpc_ssse3: 51.1
      inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_c: 503.1
      inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_ssse3: 23.9
      inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_c: 463.0
      inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_ssse3: 54.0
      inv_txfm_add_4x4_flipadst_identity_0_8bpc_c: 719.9
      inv_txfm_add_4x4_flipadst_identity_0_8bpc_ssse3: 43.0
      inv_txfm_add_4x4_flipadst_identity_1_8bpc_c: 456.8
      inv_txfm_add_4x4_flipadst_identity_1_8bpc_ssse3: 44.1
      inv_txfm_add_4x4_identity_adst_0_8bpc_c: 422.8
      inv_txfm_add_4x4_identity_adst_0_8bpc_ssse3: 42.4
      inv_txfm_add_4x4_identity_adst_1_8bpc_c: 417.1
      inv_txfm_add_4x4_identity_adst_1_8bpc_ssse3: 42.3
      inv_txfm_add_4x4_identity_dct_0_8bpc_c: 435.4
      inv_txfm_add_4x4_identity_dct_0_8bpc_ssse3: 25.7
      inv_txfm_add_4x4_identity_dct_1_8bpc_c: 434.1
      inv_txfm_add_4x4_identity_dct_1_8bpc_ssse3: 25.3
      inv_txfm_add_4x4_identity_flipadst_0_8bpc_c: 528.1
      inv_txfm_add_4x4_identity_flipadst_0_8bpc_ssse3: 40.9
      inv_txfm_add_4x4_identity_flipadst_1_8bpc_c: 720.0
      inv_txfm_add_4x4_identity_flipadst_1_8bpc_ssse3: 41.8
      inv_txfm_add_4x4_identity_identity_0_8bpc_c: 383.2
      inv_txfm_add_4x4_identity_identity_0_8bpc_ssse3: 28.3
      inv_txfm_add_4x4_identity_identity_1_8bpc_c: 378.9
      inv_txfm_add_4x4_identity_identity_1_8bpc_ssse3: 28.2
      inv_txfm_add_4x4_wht_wht_0_8bpc_c: 271.5
      inv_txfm_add_4x4_wht_wht_0_8bpc_ssse3: 34.0
      inv_txfm_add_4x4_wht_wht_1_8bpc_c: 266.0
      inv_txfm_add_4x4_wht_wht_1_8bpc_ssse3: 33.9
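The cycle counts above are easier to compare as ratios. The following is a minimal sketch, not part of the commit, that parses lines of the form `name_impl: cycles` and reports the speedup of each optimized function over its `_c` baseline. The helper names `parse_cycles` and `speedups` are hypothetical, and the sketch assumes the implementation suffix is the last underscore-separated token of each name.

```python
def parse_cycles(text):
    """Parse 'function_name: cycles' lines into a dict of floats."""
    results = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if not sep:
            continue  # no colon on this line
        try:
            results[name.strip()] = float(value)
        except ValueError:
            continue  # e.g. the "Cycle times:" header has no number
    return results


def speedups(results, baseline="c"):
    """Map each non-baseline function to baseline_cycles / its_cycles."""
    out = {}
    for name, cycles in results.items():
        base, _, impl = name.rpartition("_")
        if impl == baseline:
            continue
        ref = results.get(f"{base}_{baseline}")
        if ref is not None:
            out[name] = ref / cycles
    return out


sample = """Cycle times:
inv_txfm_add_4x4_dct_dct_0_8bpc_c: 611.5
inv_txfm_add_4x4_dct_dct_0_8bpc_ssse3: 23.1
inv_txfm_add_4x4_wht_wht_0_8bpc_c: 271.5
inv_txfm_add_4x4_wht_wht_0_8bpc_ssse3: 34.0
"""

for name, ratio in speedups(parse_cycles(sample)).items():
    print(f"{name}: {ratio:.1f}x")
```

On the two pairs in `sample`, the ratios are 611.5 / 23.1 and 271.5 / 34.0, so the dct_dct case gains far more from SSSE3 than the wht_wht case.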
    • Add SSSE3 implementation for dav1d_ipred_h · 6f2f0188
      Xuefeng Jiang authored
      Cycle times:
      intra_pred_h_w4_8bpc_c: 146.6
      intra_pred_h_w4_8bpc_ssse3: 30.6
      intra_pred_h_w8_8bpc_c: 236.3
      intra_pred_h_w8_8bpc_ssse3: 42.2
      intra_pred_h_w16_8bpc_c: 446.6
      intra_pred_h_w16_8bpc_ssse3: 55.8
      intra_pred_h_w32_8bpc_c: 688.2
      intra_pred_h_w32_8bpc_ssse3: 85.9
      intra_pred_h_w64_8bpc_c: 634.2
      intra_pred_h_w64_8bpc_ssse3: 169.2
  2. 05 Dec, 2018 1 commit
  3. 04 Dec, 2018 2 commits
  4. 03 Dec, 2018 2 commits
    • Make per-width versions of cfl_ac · 70fb01d8
      Ronald S. Bultje authored
      Also use aligned reads and writes in sub_loop, and integrate sum_loop into
      the main loop.
      
      before:
      cfl_ac_420_w4_8bpc_c: 367.4
      cfl_ac_420_w4_8bpc_avx2: 72.8
      cfl_ac_420_w8_8bpc_c: 621.6
      cfl_ac_420_w8_8bpc_avx2: 85.1
      cfl_ac_420_w16_8bpc_c: 983.4
      cfl_ac_420_w16_8bpc_avx2: 141.0
      
      after:
      cfl_ac_420_w4_8bpc_c: 376.2
      cfl_ac_420_w4_8bpc_avx2: 28.5
      cfl_ac_420_w8_8bpc_c: 607.2
      cfl_ac_420_w8_8bpc_avx2: 29.9
      cfl_ac_420_w16_8bpc_c: 962.1
      cfl_ac_420_w16_8bpc_avx2: 48.8
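To illustrate what folding the sum loop into the main loop means, here is a minimal scalar sketch of a 4:2:0 chroma-from-luma AC computation. It is not the dav1d code: the function name `cfl_ac_420`, the `<< 1` scaling of each 2x2 luma sum, and the rounded-average subtraction are assumptions modeled on the general CfL scheme, and edge padding is ignored.

```python
def cfl_ac_420(luma, width, height):
    """Return a height x width grid of zero-mean AC values.

    luma is a list of 2*height rows, each holding 2*width samples.
    """
    ac = []
    total = 0
    # Main loop: downsample each 2x2 luma block and accumulate the running
    # sum in the same pass, so no separate summing pass is needed.
    for y in range(height):
        top, bottom = luma[2 * y], luma[2 * y + 1]
        row = []
        for x in range(width):
            s = top[2 * x] + top[2 * x + 1] + bottom[2 * x] + bottom[2 * x + 1]
            v = s << 1  # assumed scaling: 8x the 2x2 average
            row.append(v)
            total += v
        ac.append(row)
    # Subtraction loop: remove the rounded average from every value.
    n = width * height
    avg = (total + n // 2) // n
    return [[v - avg for v in row] for row in ac]


flat = [[10] * 4 for _ in range(4)]
print(cfl_ac_420(flat, 2, 2))  # a flat block has no AC energy
```

A per-width version, as in the commit title, would fix `width` at 4, 8, or 16 so the inner loop can be fully unrolled or vectorized.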
    • David Michael Barr
      e2aa2d14
  5. 28 Nov, 2018 1 commit
  6. 10 Nov, 2018 1 commit
    • Split MC blend · 58fc5165
      Henrik Gramner authored
      The mstride == 0, mstride == 1, and mstride == w cases are very different
      from each other, and splitting them into separate functions makes it easier
      to optimize them.
      
      Also add some further optimizations to the AVX2 asm that became possible
      after this change.
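The split can be pictured with a small scalar sketch. This is not the dav1d implementation: the function names `blend`, `blend_v`, and `blend_h` are used here for illustration, and the sketch assumes 6-bit weights (0 to 64) with rounding. It also assumes that mstride == w means one weight per pixel, mstride == 0 means a single mask row reused for every row, and mstride == 1 means one weight per row.

```python
def blend_px(a, b, m):
    """Weighted average of a and b with a weight m in 0..64 (assumed)."""
    return (a * (64 - m) + b * m + 32) >> 6


def blend(dst, tmp, mask):
    """mstride == w (assumed): a separate weight for every pixel."""
    return [[blend_px(a, b, m) for a, b, m in zip(drow, trow, mrow)]
            for drow, trow, mrow in zip(dst, tmp, mask)]


def blend_v(dst, tmp, mask_row):
    """mstride == 0 (assumed): the same row of weights for every row."""
    return [[blend_px(a, b, m) for a, b, m in zip(drow, trow, mask_row)]
            for drow, trow in zip(dst, tmp)]


def blend_h(dst, tmp, mask_col):
    """mstride == 1 (assumed): one weight per row, shared by all columns."""
    return [[blend_px(a, b, m) for a, b in zip(drow, trow)]
            for drow, trow, m in zip(dst, tmp, mask_col)]


dst = [[0, 0], [0, 0]]
tmp = [[64, 64], [64, 64]]
print(blend_v(dst, tmp, [0, 64]))
print(blend_h(dst, tmp, [0, 64]))
```

With separate entry points, each variant can load its weights in the way that suits it (a full load, a load hoisted out of the row loop, or a per-row broadcast) instead of branching on the stride inside one generic loop.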
  7. 08 Nov, 2018 2 commits
  8. 07 Nov, 2018 1 commit
    • AVX2 for emu_edge · 1e852dc1
      Ronald S. Bultje authored
      emu_edge_w4_8bpc_c: 226.7
      emu_edge_w4_8bpc_avx2: 72.6
      emu_edge_w8_8bpc_c: 317.7
      emu_edge_w8_8bpc_avx2: 87.9
      emu_edge_w16_8bpc_c: 499.2
      emu_edge_w16_8bpc_avx2: 92.1
      emu_edge_w32_8bpc_c: 617.4
      emu_edge_w32_8bpc_avx2: 165.0
      emu_edge_w64_8bpc_c: 1579.0
      emu_edge_w64_8bpc_avx2: 412.3
      emu_edge_w128_8bpc_c: 3266.9
      emu_edge_w128_8bpc_avx2: 1548.0
  9. 05 Nov, 2018 3 commits
  10. 29 Oct, 2018 2 commits
  11. 28 Oct, 2018 1 commit
  12. 25 Oct, 2018 1 commit
  13. 20 Oct, 2018 3 commits
  14. 19 Oct, 2018 3 commits
  15. 17 Oct, 2018 1 commit
  16. 14 Oct, 2018 1 commit
  17. 09 Oct, 2018 1 commit
  18. 08 Oct, 2018 4 commits
  19. 04 Oct, 2018 1 commit
  20. 03 Oct, 2018 2 commits
  21. 29 Sep, 2018 1 commit
  22. 28 Sep, 2018 3 commits
  23. 27 Sep, 2018 1 commit