1. 10 Apr, 2019 1 commit
    • Add SSSE3 implementation for ipred_paeth · 9ba20025
      Xuefeng Jiang authored
      intra_pred_paeth_w4_8bpc_c: 561.6
      intra_pred_paeth_w4_8bpc_ssse3: 49.2
      intra_pred_paeth_w8_8bpc_c: 1475.8
      intra_pred_paeth_w8_8bpc_ssse3: 103.0
      intra_pred_paeth_w16_8bpc_c: 4697.8
      intra_pred_paeth_w16_8bpc_ssse3: 279.0
      intra_pred_paeth_w32_8bpc_c: 13245.1
      intra_pred_paeth_w32_8bpc_ssse3: 614.7
      intra_pred_paeth_w64_8bpc_c: 32638.9
      intra_pred_paeth_w64_8bpc_ssse3: 1477.6
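
      For orientation, the per-pixel operation being vectorized can be
      sketched in scalar C. This is a minimal sketch of the Paeth
      predictor for 8-bit pixels, not the dav1d source: the name
      `paeth_px` is hypothetical, and the tie-breaking order (left, then
      top, then top-left) is an assumption.

      ```c
      #include <stdint.h>
      #include <stdlib.h>

      /* Hypothetical helper: predict one pixel from its left, top and
       * top-left neighbours using the Paeth rule. */
      static uint8_t paeth_px(uint8_t left, uint8_t top, uint8_t topleft)
      {
          const int base   = left + top - topleft;  /* linear estimate */
          const int ldiff  = abs(left - base);
          const int tdiff  = abs(top - base);
          const int tldiff = abs(topleft - base);

          /* Pick the neighbour closest to the estimate; ties are assumed
           * to resolve in the order left, top, top-left. */
          if (ldiff <= tdiff && ldiff <= tldiff)
              return left;
          return tdiff <= tldiff ? top : topleft;
      }
      ```

      A SIMD version would evaluate the three differences for many
      pixels at once and select with compares and blends instead of
      branches.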
  2. 08 Apr, 2019 1 commit
  3. 07 Apr, 2019 1 commit
    • arm: Fix typos in comments · 556780b7
      Martin Storsjö authored
      The width register has been set to clz(w)-24, not the other way
      around. And the 32 bit prep function has got the h parameter in
      r4, not in r5.
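
      To make the `clz(w)-24` expression concrete, here is a small
      sketch of what it evaluates to for power-of-two widths. The helper
      names are hypothetical, a 32-bit count-leading-zeros is assumed,
      and the idea that the result serves as a compact per-width index
      is an assumption, not something stated in the commit.

      ```c
      #include <stdint.h>

      /* Portable count-leading-zeros for a 32-bit value (32 for x == 0). */
      static int clz32(uint32_t x)
      {
          int n = 0;
          for (uint32_t bit = 0x80000000u; bit != 0 && !(x & bit); bit >>= 1)
              n++;
          return n;
      }

      /* Hypothetical helper: map a power-of-two width to clz(w) - 24.
       * Widths 128, 64, 32, 16, 8, 4, 2 give 0, 1, 2, 3, 4, 5, 6. */
      static int width_index(uint32_t w)
      {
          return clz32(w) - 24;
      }
      ```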
  4. 04 Apr, 2019 2 commits
    • arm: Consistently use 8/24 columns indentation for assembly · 5d888dde
      Martin Storsjö authored
      For cases with indented, nested .if/.macro in asm.S, indent those
      by 4 chars.
      
      Some initial assembly files were indented to 4/16 columns, while all
      the actual implementation files, starting with src/arm/64/mc.S, have
      used 8/24 for indentation.
    • Add SSSE3 implementation for ipred_cfl_ac_444 · 0d936a1a
      Xuefeng Jiang authored
      cfl_ac_444_w4_8bpc_c: 978.2
      cfl_ac_444_w4_8bpc_ssse3: 110.4
      cfl_ac_444_w8_8bpc_c: 2312.3
      cfl_ac_444_w8_8bpc_ssse3: 197.5
      cfl_ac_444_w16_8bpc_c: 4081.1
      cfl_ac_444_w16_8bpc_ssse3: 274.1
      cfl_ac_444_w32_8bpc_c: 9544.3
      cfl_ac_444_w32_8bpc_ssse3: 617.1
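
      As a rough scalar picture of what such a function computes, the
      sketch below scales each luma pixel by 8 and subtracts the rounded
      block average, leaving a zero-mean "AC" signal. It is a simplified
      sketch under stated assumptions: the name `cfl_ac_444_sketch` is
      hypothetical, the luma block is assumed contiguous (stride equal
      to width), and edge padding is ignored.

      ```c
      #include <stdint.h>

      /* Hypothetical scalar sketch of a 4:4:4 CfL AC computation. */
      static void cfl_ac_444_sketch(int16_t *ac, const uint8_t *luma,
                                    int w, int h)
      {
          const int n = w * h;
          int sum = 0;

          /* No subsampling in 4:4:4: one luma pixel per output, scaled by 8. */
          for (int i = 0; i < n; i++) {
              ac[i] = (int16_t)(luma[i] << 3);
              sum += ac[i];
          }

          /* Subtract the rounded mean (sum is never negative here). */
          const int avg = (sum + n / 2) / n;
          for (int i = 0; i < n; i++)
              ac[i] = (int16_t)(ac[i] - avg);
      }
      ```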
  5. 28 Mar, 2019 5 commits
    • CI: Check for newline at end of file · abb972a5
      Henrik Gramner authored
    • x86: cdef_dir: optimize best cost finding for SSE · 91568b2a
      Victorien Le Couviour--Tuffet authored
      Port of 65ee1233 (for AVX2, from Kyle Siefring) to SSE4.1,
      and optimize SSSE3.
      
      ---------------------
      x86_64:
      ------------------------------------------
      before: cdef_dir_8bpc_ssse3: 110.3
       after: cdef_dir_8bpc_ssse3: 105.9
         new: cdef_dir_8bpc_sse4:   96.4
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      before: cdef_dir_8bpc_ssse3: 120.6
       after: cdef_dir_8bpc_ssse3: 110.7
         new: cdef_dir_8bpc_sse4:  106.5
      ------------------------------------------
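
      The before/after figures above can be turned into a relative gain
      with a one-line helper. This is only a sketch for reading the
      numbers: `pct_faster` is a hypothetical name, and the figures are
      assumed to be cycle counts where lower is better.

      ```c
      /* Relative reduction in cycle count, in percent. */
      static double pct_faster(double before, double after)
      {
          return 100.0 * (before - after) / before;
      }
      ```

      For example, `pct_faster(110.3, 105.9)` is roughly 4 for the
      x86_64 SSSE3 path, and `pct_faster(120.6, 110.7)` is roughly 8 for
      x86_32.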
    • x86: cdef_filter: use 8-bit arithmetic for SSE · 75e88fab
      Victorien Le Couviour--Tuffet authored
      Port of c204da0f for AVX-2 from Kyle Siefring.
      
      ---------------------
      x86_64:
      ------------------------------------------
      before: cdef_filter_4x4_8bpc_ssse3: 141.7
       after: cdef_filter_4x4_8bpc_ssse3: 131.6
      before: cdef_filter_4x4_8bpc_sse4: 128.3
       after: cdef_filter_4x4_8bpc_sse4: 119.0
      ------------------------------------------
      before: cdef_filter_4x8_8bpc_ssse3: 253.4
       after: cdef_filter_4x8_8bpc_ssse3: 236.1
      before: cdef_filter_4x8_8bpc_sse4: 228.5
       after: cdef_filter_4x8_8bpc_sse4: 213.2
      ------------------------------------------
      before: cdef_filter_8x8_8bpc_ssse3: 429.6
       after: cdef_filter_8x8_8bpc_ssse3: 386.9
      before: cdef_filter_8x8_8bpc_sse4: 379.9
       after: cdef_filter_8x8_8bpc_sse4: 335.9
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      before: cdef_filter_4x4_8bpc_ssse3: 184.3
       after: cdef_filter_4x4_8bpc_ssse3: 163.3
      before: cdef_filter_4x4_8bpc_sse4: 168.9
       after: cdef_filter_4x4_8bpc_sse4: 146.1
      ------------------------------------------
      before: cdef_filter_4x8_8bpc_ssse3: 335.3
       after: cdef_filter_4x8_8bpc_ssse3: 280.7
      before: cdef_filter_4x8_8bpc_sse4: 305.1
       after: cdef_filter_4x8_8bpc_sse4: 257.9
      ------------------------------------------
      before: cdef_filter_8x8_8bpc_ssse3: 579.1
       after: cdef_filter_8x8_8bpc_ssse3: 500.5
      before: cdef_filter_8x8_8bpc_sse4: 517.0
       after: cdef_filter_8x8_8bpc_sse4: 455.8
      ------------------------------------------
    • x86: cdef_filter: use a better constant for SSE4 · 22c3594d
      Victorien Le Couviour--Tuffet authored
      Port of dc2ae517 for AVX-2 from Kyle Siefring.
      
      ---------------------
      x86_64:
      ------------------------------------------
      cdef_filter_4x4_8bpc_ssse3: 141.7
      cdef_filter_4x4_8bpc_sse4: 128.3
      ------------------------------------------
      cdef_filter_4x8_8bpc_ssse3: 253.4
      cdef_filter_4x8_8bpc_sse4: 228.5
      ------------------------------------------
      cdef_filter_8x8_8bpc_ssse3: 429.6
      cdef_filter_8x8_8bpc_sse4: 379.9
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      cdef_filter_4x4_8bpc_ssse3: 184.3
      cdef_filter_4x4_8bpc_sse4: 168.9
      ------------------------------------------
      cdef_filter_4x8_8bpc_ssse3: 335.3
      cdef_filter_4x8_8bpc_sse4: 305.1
      ------------------------------------------
      cdef_filter_8x8_8bpc_ssse3: 579.1
      cdef_filter_8x8_8bpc_sse4: 517.0
      ------------------------------------------
  6. 27 Mar, 2019 1 commit
    • Add SSSE3 implementation for the 16x32, 32x16 and 32x32 blocks in itx · bd12b1ec
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_16x32_dct_dct_0_8bpc_c: 2464.6
      inv_txfm_add_16x32_dct_dct_0_8bpc_ssse3: 121.6
      inv_txfm_add_16x32_dct_dct_1_8bpc_c: 24751.6
      inv_txfm_add_16x32_dct_dct_1_8bpc_ssse3: 1101.9
      inv_txfm_add_16x32_dct_dct_2_8bpc_c: 24377.0
      inv_txfm_add_16x32_dct_dct_2_8bpc_ssse3: 1117.2
      inv_txfm_add_16x32_dct_dct_3_8bpc_c: 24155.6
      inv_txfm_add_16x32_dct_dct_3_8bpc_ssse3: 2349.3
      inv_txfm_add_16x32_dct_dct_4_8bpc_c: 24175.6
      inv_txfm_add_16x32_dct_dct_4_8bpc_ssse3: 1642.0
      inv_txfm_add_16x32_identity_identity_0_8bpc_c: 10304.7
      inv_txfm_add_16x32_identity_identity_0_8bpc_ssse3: 137.7
      inv_txfm_add_16x32_identity_identity_1_8bpc_c: 10341.6
      inv_txfm_add_16x32_identity_identity_1_8bpc_ssse3: 137.9
      inv_txfm_add_16x32_identity_identity_2_8bpc_c: 10299.9
      inv_txfm_add_16x32_identity_identity_2_8bpc_ssse3: 253.9
      inv_txfm_add_16x32_identity_identity_3_8bpc_c: 10331.4
      inv_txfm_add_16x32_identity_identity_3_8bpc_ssse3: 369.7
      inv_txfm_add_16x32_identity_identity_4_8bpc_c: 10360.4
      inv_txfm_add_16x32_identity_identity_4_8bpc_ssse3: 484.0
      inv_txfm_add_32x16_dct_dct_0_8bpc_c: 2288.4
      inv_txfm_add_32x16_dct_dct_0_8bpc_ssse3: 142.3
      inv_txfm_add_32x16_dct_dct_1_8bpc_c: 23819.9
      inv_txfm_add_32x16_dct_dct_1_8bpc_ssse3: 1740.1
      inv_txfm_add_32x16_dct_dct_2_8bpc_c: 23755.8
      inv_txfm_add_32x16_dct_dct_2_8bpc_ssse3: 1641.4
      inv_txfm_add_32x16_dct_dct_3_8bpc_c: 23839.9
      inv_txfm_add_32x16_dct_dct_3_8bpc_ssse3: 1559.0
      inv_txfm_add_32x16_dct_dct_4_8bpc_c: 23757.7
      inv_txfm_add_32x16_dct_dct_4_8bpc_ssse3: 1579.0
      inv_txfm_add_32x16_identity_identity_0_8bpc_c: 10381.7
      inv_txfm_add_32x16_identity_identity_0_8bpc_ssse3: 126.3
      inv_txfm_add_32x16_identity_identity_1_8bpc_c: 10402.5
      inv_txfm_add_32x16_identity_identity_1_8bpc_ssse3: 126.5
      inv_txfm_add_32x16_identity_identity_2_8bpc_c: 10429.2
      inv_txfm_add_32x16_identity_identity_2_8bpc_ssse3: 244.9
      inv_txfm_add_32x16_identity_identity_3_8bpc_c: 10382.0
      inv_txfm_add_32x16_identity_identity_3_8bpc_ssse3: 491.0
      inv_txfm_add_32x16_identity_identity_4_8bpc_c: 10381.0
      inv_txfm_add_32x16_identity_identity_4_8bpc_ssse3: 468.0
      inv_txfm_add_32x32_dct_dct_0_8bpc_c: 4168.2
      inv_txfm_add_32x32_dct_dct_0_8bpc_ssse3: 204.0
      inv_txfm_add_32x32_dct_dct_1_8bpc_c: 46306.2
      inv_txfm_add_32x32_dct_dct_1_8bpc_ssse3: 2216.0
      inv_txfm_add_32x32_dct_dct_2_8bpc_c: 46300.2
      inv_txfm_add_32x32_dct_dct_2_8bpc_ssse3: 2194.2
      inv_txfm_add_32x32_dct_dct_3_8bpc_c: 46350.1
      inv_txfm_add_32x32_dct_dct_3_8bpc_ssse3: 3484.4
      inv_txfm_add_32x32_dct_dct_4_8bpc_c: 46318.1
      inv_txfm_add_32x32_dct_dct_4_8bpc_ssse3: 3440.9
      inv_txfm_add_32x32_identity_identity_0_8bpc_c: 14663.1
      inv_txfm_add_32x32_identity_identity_0_8bpc_ssse3: 179.0
      inv_txfm_add_32x32_identity_identity_1_8bpc_c: 14737.0
      inv_txfm_add_32x32_identity_identity_1_8bpc_ssse3: 179.2
      inv_txfm_add_32x32_identity_identity_2_8bpc_c: 14640.4
      inv_txfm_add_32x32_identity_identity_2_8bpc_ssse3: 179.1
      inv_txfm_add_32x32_identity_identity_3_8bpc_c: 14638.5
      inv_txfm_add_32x32_identity_identity_3_8bpc_ssse3: 663.8
      inv_txfm_add_32x32_identity_identity_4_8bpc_c: 14635.6
      inv_txfm_add_32x32_identity_identity_4_8bpc_ssse3: 663.9
  7. 26 Mar, 2019 1 commit
  8. 24 Mar, 2019 2 commits
  9. 20 Mar, 2019 1 commit
  10. 19 Mar, 2019 1 commit
    • Add SSSE3 implementation for the 8x32 and 32x8 blocks in itx · 585ac462
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_8x32_dct_dct_0_8bpc_c: 1164.7
      inv_txfm_add_8x32_dct_dct_0_8bpc_ssse3: 79.5
      inv_txfm_add_8x32_dct_dct_1_8bpc_c: 11291.6
      inv_txfm_add_8x32_dct_dct_1_8bpc_ssse3: 508.5
      inv_txfm_add_8x32_dct_dct_2_8bpc_c: 10720.4
      inv_txfm_add_8x32_dct_dct_2_8bpc_ssse3: 507.9
      inv_txfm_add_8x32_dct_dct_3_8bpc_c: 12351.5
      inv_txfm_add_8x32_dct_dct_3_8bpc_ssse3: 687.2
      inv_txfm_add_8x32_dct_dct_4_8bpc_c: 10402.3
      inv_txfm_add_8x32_dct_dct_4_8bpc_ssse3: 687.9
      inv_txfm_add_8x32_identity_identity_0_8bpc_c: 3485.0
      inv_txfm_add_8x32_identity_identity_0_8bpc_ssse3: 97.7
      inv_txfm_add_8x32_identity_identity_1_8bpc_c: 3495.7
      inv_txfm_add_8x32_identity_identity_1_8bpc_ssse3: 97.7
      inv_txfm_add_8x32_identity_identity_2_8bpc_c: 3503.7
      inv_txfm_add_8x32_identity_identity_2_8bpc_ssse3: 97.8
      inv_txfm_add_8x32_identity_identity_3_8bpc_c: 3489.5
      inv_txfm_add_8x32_identity_identity_3_8bpc_ssse3: 184.4
      inv_txfm_add_8x32_identity_identity_4_8bpc_c: 3498.1
      inv_txfm_add_8x32_identity_identity_4_8bpc_ssse3: 182.8
      inv_txfm_add_32x8_dct_dct_0_8bpc_c: 1220.4
      inv_txfm_add_32x8_dct_dct_0_8bpc_ssse3: 65.6
      inv_txfm_add_32x8_dct_dct_1_8bpc_c: 11120.7
      inv_txfm_add_32x8_dct_dct_1_8bpc_ssse3: 623.8
      inv_txfm_add_32x8_dct_dct_2_8bpc_c: 12236.3
      inv_txfm_add_32x8_dct_dct_2_8bpc_ssse3: 624.7
      inv_txfm_add_32x8_dct_dct_3_8bpc_c: 10866.3
      inv_txfm_add_32x8_dct_dct_3_8bpc_ssse3: 694.1
      inv_txfm_add_32x8_dct_dct_4_8bpc_c: 10322.8
      inv_txfm_add_32x8_dct_dct_4_8bpc_ssse3: 692.5
      inv_txfm_add_32x8_identity_identity_0_8bpc_c: 3368.1
      inv_txfm_add_32x8_identity_identity_0_8bpc_ssse3: 98.6
      inv_txfm_add_32x8_identity_identity_1_8bpc_c: 3381.1
      inv_txfm_add_32x8_identity_identity_1_8bpc_ssse3: 98.3
      inv_txfm_add_32x8_identity_identity_2_8bpc_c: 3376.6
      inv_txfm_add_32x8_identity_identity_2_8bpc_ssse3: 98.3
      inv_txfm_add_32x8_identity_identity_3_8bpc_c: 3364.3
      inv_txfm_add_32x8_identity_identity_3_8bpc_ssse3: 182.2
      inv_txfm_add_32x8_identity_identity_4_8bpc_c: 3390.0
      inv_txfm_add_32x8_identity_identity_4_8bpc_ssse3: 182.2
  11. 18 Mar, 2019 1 commit
    • Add SSSE3 implementation for ipred_cfl_ac_420 and ipred_cfl_ac_422 · 5d944dc6
      Xuefeng Jiang authored
      cfl_ac_420_w4_8bpc_c: 1621.0
      cfl_ac_420_w4_8bpc_ssse3: 92.5
      cfl_ac_420_w8_8bpc_c: 3344.1
      cfl_ac_420_w8_8bpc_ssse3: 115.4
      cfl_ac_420_w16_8bpc_c: 6024.9
      cfl_ac_420_w16_8bpc_ssse3: 187.8
      cfl_ac_422_w4_8bpc_c: 1762.5
      cfl_ac_422_w4_8bpc_ssse3: 81.4
      cfl_ac_422_w8_8bpc_c: 4941.2
      cfl_ac_422_w8_8bpc_ssse3: 166.5
      cfl_ac_422_w16_8bpc_c: 8261.8
      cfl_ac_422_w16_8bpc_ssse3: 272.3
  12. 16 Mar, 2019 2 commits
  13. 14 Mar, 2019 2 commits
  14. 13 Mar, 2019 1 commit
  15. 12 Mar, 2019 1 commit
  16. 11 Mar, 2019 5 commits
  17. 09 Mar, 2019 2 commits
  18. 08 Mar, 2019 2 commits
    • let dav1d_version() return the project version · 754487c0
      Janne Grunau authored
      Increments the soname revision number for this behavior change.
      Removes the DAV1D_VERSION and DAV1D_VERSION_INT defines and
      dav1d_version_vcs() and dav1d_version_int().
      Also cleans up the version usage in dav1d CLI.
      Refs #241, #255.
    • x86: add SSSE3 cdef dir implementation · d67e3476
      Victorien Le Couviour--Tuffet authored
      ---------------------
      x86_64:
      ------------------------------------------
      cdef_dir_8bpc_c: 1023.1
      cdef_dir_8bpc_ssse3: 110.3
      cdef_dir_8bpc_avx2: 71.1
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      cdef_dir_8bpc_c: 1074.8
      cdef_dir_8bpc_ssse3: 120.6
      ------------------------------------------
      
      Thanks to Ronald for the AVX2 XMM version which was a very good starting
      point.
  19. 06 Mar, 2019 3 commits
  20. 05 Mar, 2019 4 commits
    • arm64: cdef: Clarify a slightly confusing comment · 86ce4a3c
      Martin Storsjö authored
      This might have said pri_taps[k]/sec_taps[k] at some earlier time.
    • arm64: cdef: Use a smarter padding constant · 8f8dc928
      Martin Storsjö authored
      Pad with a value which works both as a large unsigned value and a
      negative signed value. This allows doing the max operation using
      signed max, avoiding the conditional altogether.
      
      Based on the same idea for x86 by Kyle Siefring.
      
      Before:                  Cortex A53     A72     A73
      cdef_filter_4x4_8bpc_neon:    645.5   401.9   422.5
      cdef_filter_4x8_8bpc_neon:   1193.7   756.6   782.4
      cdef_filter_8x8_8bpc_neon:   2162.4  1361.9  1375.6
      After:
      cdef_filter_4x4_8bpc_neon:    596.3   377.8   384.8
      cdef_filter_4x8_8bpc_neon:   1097.4   705.5   707.1
      cdef_filter_8x8_8bpc_neon:   1967.4  1232.3  1239.9
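
      The padding trick described above can be illustrated in scalar C
      on 16-bit lanes. This is a sketch under assumptions, not the
      commit's code: the constant `0x8000` is chosen purely for
      illustration (the actual value is not shown in this log), and the
      helper names are hypothetical. The same bit pattern is the largest
      candidate for an unsigned min and the smallest candidate for a
      signed max, so padded taps drop out of both without a branch.

      ```c
      #include <stdint.h>

      /* Illustrative padding value: 32768 unsigned, -32768 as signed 16-bit. */
      #define PAD_SKETCH ((uint16_t)0x8000)

      /* Unsigned min: the padding value is larger than any real pixel. */
      static uint16_t min_u16(uint16_t a, uint16_t b)
      {
          return a < b ? a : b;
      }

      /* Signed max on the same bit patterns. Flipping the top bit makes an
       * unsigned comparison order values as two's-complement signed ones. */
      static uint16_t max_s16(uint16_t a, uint16_t b)
      {
          return (a ^ 0x8000) > (b ^ 0x8000) ? a : b;
      }
      ```

      With real pixel values below `0x8000`, `max_s16(px, PAD_SKETCH)`
      and `min_u16(px, PAD_SKETCH)` both return `px`, which is what lets
      the conditional go away.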
    • arm64: cdef: Do saturating subtractions to avoid max operations with 0 · 4f5261a0
      Martin Storsjö authored
      Before:                  Cortex A53     A72     A73
      cdef_filter_4x4_8bpc_neon:    677.4   433.9   452.9
      cdef_filter_4x8_8bpc_neon:   1255.0   815.2   841.8
      cdef_filter_8x8_8bpc_neon:   2278.5  1440.0  1505.0
      After:
      cdef_filter_4x4_8bpc_neon:    645.5   401.9   422.5
      cdef_filter_4x8_8bpc_neon:   1193.7   756.6   782.4
      cdef_filter_8x8_8bpc_neon:   2162.4  1361.9  1375.6
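
      The idea in the commit title can be shown with two equivalent
      scalar formulations. This is a minimal sketch with hypothetical
      helper names; that the assembly uses an unsigned saturating
      subtract instruction (such as NEON's UQSUB) for this step is an
      assumption based on the title.

      ```c
      #include <stdint.h>

      /* Subtract, then clamp at zero with an explicit max. */
      static int sub_then_max0(int a, int b)
      {
          const int d = a - b;
          return d > 0 ? d : 0;
      }

      /* Unsigned saturating subtraction: the result already bottoms out at
       * zero, so no separate max-with-0 step is needed. */
      static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
      {
          return a > b ? (uint8_t)(a - b) : 0;
      }
      ```

      For any 8-bit inputs the two agree, for example both give 0 for
      (5, 9) and 4 for (9, 5).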
    • Utilize a better CDEF constant for avx2 · dc2ae517
      Kyle Siefring authored
      Before:
      ```
      cdef_filter_8x8_8bpc_avx2: 275.5
      cdef_filter_4x8_8bpc_avx2: 193.3
      cdef_filter_4x4_8bpc_avx2: 113.5
      ```
      After:
      ```
      cdef_filter_8x8_8bpc_avx2: 252.3
      cdef_filter_4x8_8bpc_avx2: 182.1
      cdef_filter_4x4_8bpc_avx2: 105.7
      ```
  21. 04 Mar, 2019 1 commit