1. 28 Mar, 2019 1 commit
  2. 27 Mar, 2019 1 commit
    • Add SSSE3 implementation for the 16x32, 32x16 and 32x32 blocks in itx · bd12b1ec
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_16x32_dct_dct_0_8bpc_c: 2464.6
      inv_txfm_add_16x32_dct_dct_0_8bpc_ssse3: 121.6
      inv_txfm_add_16x32_dct_dct_1_8bpc_c: 24751.6
      inv_txfm_add_16x32_dct_dct_1_8bpc_ssse3: 1101.9
      inv_txfm_add_16x32_dct_dct_2_8bpc_c: 24377.0
      inv_txfm_add_16x32_dct_dct_2_8bpc_ssse3: 1117.2
      inv_txfm_add_16x32_dct_dct_3_8bpc_c: 24155.6
      inv_txfm_add_16x32_dct_dct_3_8bpc_ssse3: 2349.3
      inv_txfm_add_16x32_dct_dct_4_8bpc_c: 24175.6
      inv_txfm_add_16x32_dct_dct_4_8bpc_ssse3: 1642.0
      inv_txfm_add_16x32_identity_identity_0_8bpc_c: 10304.7
      inv_txfm_add_16x32_identity_identity_0_8bpc_ssse3: 137.7
      inv_txfm_add_16x32_identity_identity_1_8bpc_c: 10341.6
      inv_txfm_add_16x32_identity_identity_1_8bpc_ssse3: 137.9
      inv_txfm_add_16x32_identity_identity_2_8bpc_c: 10299.9
      inv_txfm_add_16x32_identity_identity_2_8bpc_ssse3: 253.9
      inv_txfm_add_16x32_identity_identity_3_8bpc_c: 10331.4
      inv_txfm_add_16x32_identity_identity_3_8bpc_ssse3: 369.7
      inv_txfm_add_16x32_identity_identity_4_8bpc_c: 10360.4
      inv_txfm_add_16x32_identity_identity_4_8bpc_ssse3: 484.0
      inv_txfm_add_32x16_dct_dct_0_8bpc_c: 2288.4
      inv_txfm_add_32x16_dct_dct_0_8bpc_ssse3: 142.3
      inv_txfm_add_32x16_dct_dct_1_8bpc_c: 23819.9
      inv_txfm_add_32x16_dct_dct_1_8bpc_ssse3: 1740.1
      inv_txfm_add_32x16_dct_dct_2_8bpc_c: 23755.8
      inv_txfm_add_32x16_dct_dct_2_8bpc_ssse3: 1641.4
      inv_txfm_add_32x16_dct_dct_3_8bpc_c: 23839.9
      inv_txfm_add_32x16_dct_dct_3_8bpc_ssse3: 1559.0
      inv_txfm_add_32x16_dct_dct_4_8bpc_c: 23757.7
      inv_txfm_add_32x16_dct_dct_4_8bpc_ssse3: 1579.0
      inv_txfm_add_32x16_identity_identity_0_8bpc_c: 10381.7
      inv_txfm_add_32x16_identity_identity_0_8bpc_ssse3: 126.3
      inv_txfm_add_32x16_identity_identity_1_8bpc_c: 10402.5
      inv_txfm_add_32x16_identity_identity_1_8bpc_ssse3: 126.5
      inv_txfm_add_32x16_identity_identity_2_8bpc_c: 10429.2
      inv_txfm_add_32x16_identity_identity_2_8bpc_ssse3: 244.9
      inv_txfm_add_32x16_identity_identity_3_8bpc_c: 10382.0
      inv_txfm_add_32x16_identity_identity_3_8bpc_ssse3: 491.0
      inv_txfm_add_32x16_identity_identity_4_8bpc_c: 10381.0
      inv_txfm_add_32x16_identity_identity_4_8bpc_ssse3: 468.0
      inv_txfm_add_32x32_dct_dct_0_8bpc_c: 4168.2
      inv_txfm_add_32x32_dct_dct_0_8bpc_ssse3: 204.0
      inv_txfm_add_32x32_dct_dct_1_8bpc_c: 46306.2
      inv_txfm_add_32x32_dct_dct_1_8bpc_ssse3: 2216.0
      inv_txfm_add_32x32_dct_dct_2_8bpc_c: 46300.2
      inv_txfm_add_32x32_dct_dct_2_8bpc_ssse3: 2194.2
      inv_txfm_add_32x32_dct_dct_3_8bpc_c: 46350.1
      inv_txfm_add_32x32_dct_dct_3_8bpc_ssse3: 3484.4
      inv_txfm_add_32x32_dct_dct_4_8bpc_c: 46318.1
      inv_txfm_add_32x32_dct_dct_4_8bpc_ssse3: 3440.9
      inv_txfm_add_32x32_identity_identity_0_8bpc_c: 14663.1
      inv_txfm_add_32x32_identity_identity_0_8bpc_ssse3: 179.0
      inv_txfm_add_32x32_identity_identity_1_8bpc_c: 14737.0
      inv_txfm_add_32x32_identity_identity_1_8bpc_ssse3: 179.2
      inv_txfm_add_32x32_identity_identity_2_8bpc_c: 14640.4
      inv_txfm_add_32x32_identity_identity_2_8bpc_ssse3: 179.1
      inv_txfm_add_32x32_identity_identity_3_8bpc_c: 14638.5
      inv_txfm_add_32x32_identity_identity_3_8bpc_ssse3: 663.8
      inv_txfm_add_32x32_identity_identity_4_8bpc_c: 14635.6
      inv_txfm_add_32x32_identity_identity_4_8bpc_ssse3: 663.9
      bd12b1ec
  3. 26 Mar, 2019 1 commit
  4. 24 Mar, 2019 2 commits
  5. 20 Mar, 2019 1 commit
  6. 19 Mar, 2019 1 commit
    • Add SSSE3 implementation for the 8x32 and 32x8 blocks in itx · 585ac462
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_8x32_dct_dct_0_8bpc_c: 1164.7
      inv_txfm_add_8x32_dct_dct_0_8bpc_ssse3: 79.5
      inv_txfm_add_8x32_dct_dct_1_8bpc_c: 11291.6
      inv_txfm_add_8x32_dct_dct_1_8bpc_ssse3: 508.5
      inv_txfm_add_8x32_dct_dct_2_8bpc_c: 10720.4
      inv_txfm_add_8x32_dct_dct_2_8bpc_ssse3: 507.9
      inv_txfm_add_8x32_dct_dct_3_8bpc_c: 12351.5
      inv_txfm_add_8x32_dct_dct_3_8bpc_ssse3: 687.2
      inv_txfm_add_8x32_dct_dct_4_8bpc_c: 10402.3
      inv_txfm_add_8x32_dct_dct_4_8bpc_ssse3: 687.9
      inv_txfm_add_8x32_identity_identity_0_8bpc_c: 3485.0
      inv_txfm_add_8x32_identity_identity_0_8bpc_ssse3: 97.7
      inv_txfm_add_8x32_identity_identity_1_8bpc_c: 3495.7
      inv_txfm_add_8x32_identity_identity_1_8bpc_ssse3: 97.7
      inv_txfm_add_8x32_identity_identity_2_8bpc_c: 3503.7
      inv_txfm_add_8x32_identity_identity_2_8bpc_ssse3: 97.8
      inv_txfm_add_8x32_identity_identity_3_8bpc_c: 3489.5
      inv_txfm_add_8x32_identity_identity_3_8bpc_ssse3: 184.4
      inv_txfm_add_8x32_identity_identity_4_8bpc_c: 3498.1
      inv_txfm_add_8x32_identity_identity_4_8bpc_ssse3: 182.8
      inv_txfm_add_32x8_dct_dct_0_8bpc_c: 1220.4
      inv_txfm_add_32x8_dct_dct_0_8bpc_ssse3: 65.6
      inv_txfm_add_32x8_dct_dct_1_8bpc_c: 11120.7
      inv_txfm_add_32x8_dct_dct_1_8bpc_ssse3: 623.8
      inv_txfm_add_32x8_dct_dct_2_8bpc_c: 12236.3
      inv_txfm_add_32x8_dct_dct_2_8bpc_ssse3: 624.7
      inv_txfm_add_32x8_dct_dct_3_8bpc_c: 10866.3
      inv_txfm_add_32x8_dct_dct_3_8bpc_ssse3: 694.1
      inv_txfm_add_32x8_dct_dct_4_8bpc_c: 10322.8
      inv_txfm_add_32x8_dct_dct_4_8bpc_ssse3: 692.5
      inv_txfm_add_32x8_identity_identity_0_8bpc_c: 3368.1
      inv_txfm_add_32x8_identity_identity_0_8bpc_ssse3: 98.6
      inv_txfm_add_32x8_identity_identity_1_8bpc_c: 3381.1
      inv_txfm_add_32x8_identity_identity_1_8bpc_ssse3: 98.3
      inv_txfm_add_32x8_identity_identity_2_8bpc_c: 3376.6
      inv_txfm_add_32x8_identity_identity_2_8bpc_ssse3: 98.3
      inv_txfm_add_32x8_identity_identity_3_8bpc_c: 3364.3
      inv_txfm_add_32x8_identity_identity_3_8bpc_ssse3: 182.2
      inv_txfm_add_32x8_identity_identity_4_8bpc_c: 3390.0
      inv_txfm_add_32x8_identity_identity_4_8bpc_ssse3: 182.2
      585ac462
  7. 18 Mar, 2019 1 commit
    • Add SSSE3 implementation for ipred_cfl_ac_420 and ipred_cfl_ac_422 · 5d944dc6
      Xuefeng Jiang authored
      cfl_ac_420_w4_8bpc_c: 1621.0
      cfl_ac_420_w4_8bpc_ssse3: 92.5
      cfl_ac_420_w8_8bpc_c: 3344.1
      cfl_ac_420_w8_8bpc_ssse3: 115.4
      cfl_ac_420_w16_8bpc_c: 6024.9
      cfl_ac_420_w16_8bpc_ssse3: 187.8
      cfl_ac_422_w4_8bpc_c: 1762.5
      cfl_ac_422_w4_8bpc_ssse3: 81.4
      cfl_ac_422_w8_8bpc_c: 4941.2
      cfl_ac_422_w8_8bpc_ssse3: 166.5
      cfl_ac_422_w16_8bpc_c: 8261.8
      cfl_ac_422_w16_8bpc_ssse3: 272.3
      5d944dc6
  8. 16 Mar, 2019 2 commits
  9. 12 Mar, 2019 1 commit
  10. 11 Mar, 2019 4 commits
  11. 09 Mar, 2019 1 commit
  12. 08 Mar, 2019 2 commits
    • let dav1d_version() return the project version · 754487c0
      Janne Grunau authored
      Increments the soname revision number for this behavior change.
      Removes the DAV1D_VERSION and DAV1D_VERSION_INT defines and
      dav1d_version_vcs() and dav1d_version_int().
      Also cleans up the version usage in dav1d CLI.
      Refs #241, #255.
      754487c0
    • x86: add SSSE3 cdef dir implementation · d67e3476
      Victorien Le Couviour--Tuffet authored
      ---------------------
      x86_64:
      ------------------------------------------
      cdef_dir_8bpc_c: 1023.1
      cdef_dir_8bpc_ssse3: 110.3
      cdef_dir_8bpc_avx2: 71.1
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      cdef_dir_8bpc_c: 1074.8
      cdef_dir_8bpc_ssse3: 120.6
      ------------------------------------------
      
      Thanks to Ronald for the AVX2 XMM version which was a very good starting
      point.
      d67e3476
  13. 06 Mar, 2019 3 commits
  14. 05 Mar, 2019 4 commits
    • arm64: cdef: Clarify a slightly confusing comment · 86ce4a3c
      Martin Storsjö authored
      This might have said pri_taps[k]/sec_taps[k] at some earlier time.
      86ce4a3c
    • arm64: cdef: Use a smarter padding constant · 8f8dc928
      Martin Storsjö authored
      Pad with a value which works both as a large unsigned value and a
      negative signed value. This allows doing the max operation using
      signed max, avoiding the conditional altogether.
      
      Based on the same idea for x86 by Kyle Siefring.
      
      Before:                  Cortex A53     A72     A73
      cdef_filter_4x4_8bpc_neon:    645.5   401.9   422.5
      cdef_filter_4x8_8bpc_neon:   1193.7   756.6   782.4
      cdef_filter_8x8_8bpc_neon:   2162.4  1361.9  1375.6
      After:
      cdef_filter_4x4_8bpc_neon:    596.3   377.8   384.8
      cdef_filter_4x8_8bpc_neon:   1097.4   705.5   707.1
      cdef_filter_8x8_8bpc_neon:   1967.4  1232.3  1239.9
      8f8dc928
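The padding idea in the commit above can be illustrated with a scalar sketch. This is not the NEON code from the commit: the value `0x8000`, the helper names, and the row layout are assumptions chosen for illustration. The point is only that one 16-bit pattern can lose an unsigned minimum and a signed maximum at the same time, so neither reduction needs a branch to skip padded entries.

```c
#include <stdint.h>

/* Assumed padding value for this sketch: 0x8000.
 * Read as unsigned 16-bit it is 32768, above any 8 bpc pixel.
 * Read as signed 16-bit it is -32768, below any 8 bpc pixel. */
#define PAD_VALUE ((uint16_t)0x8000)

/* Reinterpret a 16-bit pattern as a two's complement signed value
 * without relying on implementation-defined conversions. */
static int as_signed16(uint16_t v)
{
    return v >= 0x8000 ? (int)v - 0x10000 : (int)v;
}

/* Minimum with unsigned compares: PAD_VALUE is never selected as long
 * as the row holds at least one real pixel. */
static int row_min(const uint16_t *px, int n)
{
    uint16_t m = px[0];
    for (int i = 1; i < n; i++)
        if (px[i] < m)
            m = px[i];
    return m;
}

/* Maximum with signed compares: PAD_VALUE reads as -32768, so it is
 * never selected either, and no "skip padding" condition is needed. */
static int row_max(const uint16_t *px, int n)
{
    int m = as_signed16(px[0]);
    for (int i = 1; i < n; i++) {
        const int v = as_signed16(px[i]);
        if (v > m)
            m = v;
    }
    return m;
}
```

Both helpers assume the row contains at least one non-padding pixel; a row made only of `PAD_VALUE` would return the padding value itself.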
    • arm64: cdef: Do saturating subtractions to avoid max operations with 0 · 4f5261a0
      Martin Storsjö authored
      Before:                  Cortex A53     A72     A73
      cdef_filter_4x4_8bpc_neon:    677.4   433.9   452.9
      cdef_filter_4x8_8bpc_neon:   1255.0   815.2   841.8
      cdef_filter_8x8_8bpc_neon:   2278.5  1440.0  1505.0
      After:
      cdef_filter_4x4_8bpc_neon:    645.5   401.9   422.5
      cdef_filter_4x8_8bpc_neon:   1193.7   756.6   782.4
      cdef_filter_8x8_8bpc_neon:   2162.4  1361.9  1375.6
      4f5261a0
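The commit above carries no description beyond its title, so the following is only a scalar sketch of the identity the title appears to rely on: an unsigned saturating subtraction already yields `max(0, a - b)`, which makes a separate max-with-zero step redundant. The helper names are hypothetical and the sketch models one 8-bit lane, not the actual NEON code.

```c
#include <stdint.h>

/* Two-step form: subtract in a wider signed type, then clamp at zero. */
static uint8_t sub_then_max0(uint8_t a, uint8_t b)
{
    const int d = (int)a - (int)b;
    return (uint8_t)(d > 0 ? d : 0);
}

/* One-step form: an unsigned subtraction that saturates at zero.
 * This models, per lane, what a saturating-subtract instruction does. */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}
```

For every pair of 8-bit inputs the two helpers return the same value, which is why the clamp can be folded into the subtraction.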
    • Utilize a better CDEF constant for avx2 · dc2ae517
      Kyle Siefring authored
      Before:
      ```
      cdef_filter_8x8_8bpc_avx2: 275.5
      cdef_filter_4x8_8bpc_avx2: 193.3
      cdef_filter_4x4_8bpc_avx2: 113.5
      ```
      After:
      ```
      cdef_filter_8x8_8bpc_avx2: 252.3
      cdef_filter_4x8_8bpc_avx2: 182.1
      cdef_filter_4x4_8bpc_avx2: 105.7
      ```
      dc2ae517
  15. 04 Mar, 2019 2 commits
    • Remove unused data from x86/cdef.asm · 8e379f1d
      Kyle Siefring authored
      8e379f1d
    • x86: add SSSE3 mc prep_8tap implementation · 0afec6b1
      François Cartegnie authored
      ---------------------
      x86_64:
      ------------------------------------------
      mct_8tap_regular_w4_0_8bpc_c: 115.6
      mct_8tap_regular_w4_0_8bpc_ssse3: 13.1
      mct_8tap_regular_w4_0_8bpc_avx2: 13.3
      ------------------------------------------
      mct_8tap_regular_w4_h_8bpc_c: 363.0
      mct_8tap_regular_w4_h_8bpc_ssse3: 19.1
      mct_8tap_regular_w4_h_8bpc_avx2: 16.5
      ------------------------------------------
      mct_8tap_regular_w4_hv_8bpc_c: 832.2
      mct_8tap_regular_w4_hv_8bpc_ssse3: 113.4
      mct_8tap_regular_w4_hv_8bpc_avx2: 53.1
      ------------------------------------------
      mct_8tap_regular_w4_v_8bpc_c: 488.5
      mct_8tap_regular_w4_v_8bpc_ssse3: 38.9
      mct_8tap_regular_w4_v_8bpc_avx2: 26.0
      ------------------------------------------
      mct_8tap_regular_w8_0_8bpc_c: 259.3
      mct_8tap_regular_w8_0_8bpc_ssse3: 20.4
      mct_8tap_regular_w8_0_8bpc_avx2: 18.0
      ------------------------------------------
      mct_8tap_regular_w8_h_8bpc_c: 1124.3
      mct_8tap_regular_w8_h_8bpc_ssse3: 67.7
      mct_8tap_regular_w8_h_8bpc_avx2: 43.3
      ------------------------------------------
      mct_8tap_regular_w8_hv_8bpc_c: 2155.0
      mct_8tap_regular_w8_hv_8bpc_ssse3: 340.8
      mct_8tap_regular_w8_hv_8bpc_avx2: 151.3
      ------------------------------------------
      mct_8tap_regular_w8_v_8bpc_c: 1195.4
      mct_8tap_regular_w8_v_8bpc_ssse3: 72.4
      mct_8tap_regular_w8_v_8bpc_avx2: 39.8
      ------------------------------------------
      mct_8tap_regular_w16_0_8bpc_c: 158.3
      mct_8tap_regular_w16_0_8bpc_ssse3: 52.9
      mct_8tap_regular_w16_0_8bpc_avx2: 30.2
      ------------------------------------------
      mct_8tap_regular_w16_h_8bpc_c: 4267.4
      mct_8tap_regular_w16_h_8bpc_ssse3: 211.9
      mct_8tap_regular_w16_h_8bpc_avx2: 121.4
      ------------------------------------------
      mct_8tap_regular_w16_hv_8bpc_c: 5430.9
      mct_8tap_regular_w16_hv_8bpc_ssse3: 986.8
      mct_8tap_regular_w16_hv_8bpc_avx2: 428.4
      ------------------------------------------
      mct_8tap_regular_w16_v_8bpc_c: 4604.2
      mct_8tap_regular_w16_v_8bpc_ssse3: 199.1
      mct_8tap_regular_w16_v_8bpc_avx2: 100.7
      ------------------------------------------
      mct_8tap_regular_w32_0_8bpc_c: 372.9
      mct_8tap_regular_w32_0_8bpc_ssse3: 231.9
      mct_8tap_regular_w32_0_8bpc_avx2: 99.7
      ------------------------------------------
      mct_8tap_regular_w32_h_8bpc_c: 15975.0
      mct_8tap_regular_w32_h_8bpc_ssse3: 802.9
      mct_8tap_regular_w32_h_8bpc_avx2: 468.5
      ------------------------------------------
      mct_8tap_regular_w32_hv_8bpc_c: 18555.5
      mct_8tap_regular_w32_hv_8bpc_ssse3: 3673.5
      mct_8tap_regular_w32_hv_8bpc_avx2: 1587.6
      ------------------------------------------
      mct_8tap_regular_w32_v_8bpc_c: 16632.4
      mct_8tap_regular_w32_v_8bpc_ssse3: 743.5
      mct_8tap_regular_w32_v_8bpc_avx2: 337.8
      ------------------------------------------
      mct_8tap_regular_w64_0_8bpc_c: 675.9
      mct_8tap_regular_w64_0_8bpc_ssse3: 513.6
      mct_8tap_regular_w64_0_8bpc_avx2: 285.4
      ------------------------------------------
      mct_8tap_regular_w64_h_8bpc_c: 37161.3
      mct_8tap_regular_w64_h_8bpc_ssse3: 1929.7
      mct_8tap_regular_w64_h_8bpc_avx2: 1138.1
      ------------------------------------------
      mct_8tap_regular_w64_hv_8bpc_c: 42434.0
      mct_8tap_regular_w64_hv_8bpc_ssse3: 8822.1
      mct_8tap_regular_w64_hv_8bpc_avx2: 3853.5
      ------------------------------------------
      mct_8tap_regular_w64_v_8bpc_c: 37969.1
      mct_8tap_regular_w64_v_8bpc_ssse3: 1805.6
      mct_8tap_regular_w64_v_8bpc_avx2: 826.1
      ------------------------------------------
      mct_8tap_regular_w128_0_8bpc_c: 1532.7
      mct_8tap_regular_w128_0_8bpc_ssse3: 1397.7
      mct_8tap_regular_w128_0_8bpc_avx2: 813.8
      ------------------------------------------
      mct_8tap_regular_w128_h_8bpc_c: 91204.3
      mct_8tap_regular_w128_h_8bpc_ssse3: 4783.0
      mct_8tap_regular_w128_h_8bpc_avx2: 2767.2
      ------------------------------------------
      mct_8tap_regular_w128_hv_8bpc_c: 102396.0
      mct_8tap_regular_w128_hv_8bpc_ssse3: 22202.3
      mct_8tap_regular_w128_hv_8bpc_avx2: 9637.2
      ------------------------------------------
      mct_8tap_regular_w128_v_8bpc_c: 92294.3
      mct_8tap_regular_w128_v_8bpc_ssse3: 4952.8
      mct_8tap_regular_w128_v_8bpc_avx2: 2370.1
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      mct_8tap_regular_w4_0_8bpc_c: 131.3
      mct_8tap_regular_w4_0_8bpc_ssse3: 18.7
      ------------------------------------------
      mct_8tap_regular_w4_h_8bpc_c: 422.0
      mct_8tap_regular_w4_h_8bpc_ssse3: 27.3
      ------------------------------------------
      mct_8tap_regular_w4_hv_8bpc_c: 1012.6
      mct_8tap_regular_w4_hv_8bpc_ssse3: 123.6
      ------------------------------------------
      mct_8tap_regular_w4_v_8bpc_c: 589.6
      mct_8tap_regular_w4_v_8bpc_ssse3: 48.9
      ------------------------------------------
      mct_8tap_regular_w8_0_8bpc_c: 278.5
      mct_8tap_regular_w8_0_8bpc_ssse3: 26.3
      ------------------------------------------
      mct_8tap_regular_w8_h_8bpc_c: 1129.3
      mct_8tap_regular_w8_h_8bpc_ssse3: 80.6
      ------------------------------------------
      mct_8tap_regular_w8_hv_8bpc_c: 2556.4
      mct_8tap_regular_w8_hv_8bpc_ssse3: 354.6
      ------------------------------------------
      mct_8tap_regular_w8_v_8bpc_c: 1460.2
      mct_8tap_regular_w8_v_8bpc_ssse3: 103.8
      ------------------------------------------
      mct_8tap_regular_w16_0_8bpc_c: 218.9
      mct_8tap_regular_w16_0_8bpc_ssse3: 58.4
      ------------------------------------------
      mct_8tap_regular_w16_h_8bpc_c: 4471.8
      mct_8tap_regular_w16_h_8bpc_ssse3: 237.2
      ------------------------------------------
      mct_8tap_regular_w16_hv_8bpc_c: 5570.5
      mct_8tap_regular_w16_hv_8bpc_ssse3: 1044.1
      ------------------------------------------
      mct_8tap_regular_w16_v_8bpc_c: 4885.5
      mct_8tap_regular_w16_v_8bpc_ssse3: 268.3
      ------------------------------------------
      mct_8tap_regular_w32_0_8bpc_c: 495.6
      mct_8tap_regular_w32_0_8bpc_ssse3: 236.6
      ------------------------------------------
      mct_8tap_regular_w32_h_8bpc_c: 15903.5
      mct_8tap_regular_w32_h_8bpc_ssse3: 872.5
      ------------------------------------------
      mct_8tap_regular_w32_hv_8bpc_c: 19402.2
      mct_8tap_regular_w32_hv_8bpc_ssse3: 3832.8
      ------------------------------------------
      mct_8tap_regular_w32_v_8bpc_c: 17119.5
      mct_8tap_regular_w32_v_8bpc_ssse3: 935.2
      ------------------------------------------
      mct_8tap_regular_w64_0_8bpc_c: 877.0
      mct_8tap_regular_w64_0_8bpc_ssse3: 515.7
      ------------------------------------------
      mct_8tap_regular_w64_h_8bpc_c: 36832.1
      mct_8tap_regular_w64_h_8bpc_ssse3: 2094.1
      ------------------------------------------
      mct_8tap_regular_w64_hv_8bpc_c: 43965.3
      mct_8tap_regular_w64_hv_8bpc_ssse3: 9423.0
      ------------------------------------------
      mct_8tap_regular_w64_v_8bpc_c: 37041.2
      mct_8tap_regular_w64_v_8bpc_ssse3: 2348.9
      ------------------------------------------
      mct_8tap_regular_w128_0_8bpc_c: 1929.9
      mct_8tap_regular_w128_0_8bpc_ssse3: 1392.3
      ------------------------------------------
      mct_8tap_regular_w128_h_8bpc_c: 86022.5
      mct_8tap_regular_w128_h_8bpc_ssse3: 5110.8
      ------------------------------------------
      mct_8tap_regular_w128_hv_8bpc_c: 105793.5
      mct_8tap_regular_w128_hv_8bpc_ssse3: 23278.8
      ------------------------------------------
      mct_8tap_regular_w128_v_8bpc_c: 88223.5
      mct_8tap_regular_w128_v_8bpc_ssse3: 7442.7
      ------------------------------------------
      0afec6b1
  16. 03 Mar, 2019 1 commit
  17. 02 Mar, 2019 1 commit
  18. 01 Mar, 2019 7 commits
    • x86: Check for BMI1 and BMI2 flags in addition to AVX2 · 493155af
      Henrik Gramner authored
      All known AVX2-capable CPUs have BMI1 and BMI2, but apparently some
      x86 emulators can be configured to emulate esoteric combinations of
      instruction sets that don't correspond to any existing hardware.
      493155af
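A minimal sketch of the check described in the commit above, assuming the caller has already read EBX from CPUID leaf 7 (sub-leaf 0). The helper name is hypothetical and this is not dav1d's detection code; a complete AVX2 check also has to confirm operating-system support for the YMM register state, which is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in CPUID leaf 7 (sub-leaf 0), register EBX. */
#define LEAF7_EBX_BMI1 (1u << 3)
#define LEAF7_EBX_AVX2 (1u << 5)
#define LEAF7_EBX_BMI2 (1u << 8)

/* Hypothetical helper: treat AVX2 as usable only when BMI1 and BMI2 are
 * advertised as well, so code gated on "AVX2" may also use BMI
 * instructions. The caller supplies the raw EBX value. */
static bool avx2_usable(uint32_t leaf7_ebx)
{
    const uint32_t required =
        LEAF7_EBX_AVX2 | LEAF7_EBX_BMI1 | LEAF7_EBX_BMI2;
    return (leaf7_ebx & required) == required;
}
```

Requiring all three bits at once means an emulator that advertises AVX2 without BMI1 or BMI2 falls back to a lower instruction-set level instead of hitting an unsupported instruction.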
    • picture: fix default_picture_allocator() return value on failure · d7c3420b
      James Almer authored
      The Doxygen documentation for Dav1dPicAllocator.alloc_picture_callback()
      states that the return value on failure must be a negative errno value.
      Propagate it as well in picture_alloc_with_edges().
      d7c3420b
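The convention described in the commit above can be sketched as follows. All types and function names here are hypothetical stand-ins, not dav1d's `Dav1dPicture` or its allocator; the sketch only shows a callback that returns 0 on success and a negative errno value on failure, and a wrapper that passes that value through unchanged.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical picture type for this sketch only. */
typedef struct {
    void  *data;
    size_t size;
} SketchPicture;

/* Hypothetical allocator callback: 0 on success, negative errno on failure. */
static int sketch_alloc_picture(SketchPicture *pic, size_t size)
{
    if (!pic || size == 0)
        return -EINVAL;
    void *buf = malloc(size);
    if (!buf)
        return -ENOMEM;
    pic->data = buf;
    pic->size = size;
    return 0;
}

/* Hypothetical wrapper: propagate the callback's error code as-is
 * rather than replacing it with a generic failure value. */
static int sketch_alloc_with_edges(SketchPicture *pic, size_t size, size_t edge)
{
    const int res = sketch_alloc_picture(pic, size + edge);
    if (res < 0)
        return res;
    return 0;
}
```

Returning the original code lets the caller tell an invalid request apart from an out-of-memory condition.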
    • James Almer's avatar
    • Update copyright years · 5c9bd45e
      Jean-Baptiste Kempf authored
      5c9bd45e
    • Add SSSE3 implementation for the 16x16 blocks in itx · 1b30cf2a
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_16x16_adst_adst_0_8bpc_c: 19643.8
      inv_txfm_add_16x16_adst_adst_0_8bpc_ssse3: 870.0
      inv_txfm_add_16x16_adst_adst_1_8bpc_c: 19611.7
      inv_txfm_add_16x16_adst_adst_1_8bpc_ssse3: 870.3
      inv_txfm_add_16x16_adst_adst_2_8bpc_c: 19554.2
      inv_txfm_add_16x16_adst_adst_2_8bpc_ssse3: 869.9
      inv_txfm_add_16x16_adst_dct_0_8bpc_c: 19499.2
      inv_txfm_add_16x16_adst_dct_0_8bpc_ssse3: 761.1
      inv_txfm_add_16x16_adst_dct_1_8bpc_c: 19819.1
      inv_txfm_add_16x16_adst_dct_1_8bpc_ssse3: 760.9
      inv_txfm_add_16x16_adst_dct_2_8bpc_c: 19684.5
      inv_txfm_add_16x16_adst_dct_2_8bpc_ssse3: 761.4
      inv_txfm_add_16x16_adst_flipadst_0_8bpc_c: 19309.3
      inv_txfm_add_16x16_adst_flipadst_0_8bpc_ssse3: 877.2
      inv_txfm_add_16x16_adst_flipadst_1_8bpc_c: 19374.3
      inv_txfm_add_16x16_adst_flipadst_1_8bpc_ssse3: 876.8
      inv_txfm_add_16x16_adst_flipadst_2_8bpc_c: 19548.6
      inv_txfm_add_16x16_adst_flipadst_2_8bpc_ssse3: 879.4
      inv_txfm_add_16x16_dct_adst_0_8bpc_c: 19715.3
      inv_txfm_add_16x16_dct_adst_0_8bpc_ssse3: 757.6
      inv_txfm_add_16x16_dct_adst_1_8bpc_c: 19586.6
      inv_txfm_add_16x16_dct_adst_1_8bpc_ssse3: 756.8
      inv_txfm_add_16x16_dct_adst_2_8bpc_c: 19447.3
      inv_txfm_add_16x16_dct_adst_2_8bpc_ssse3: 757.2
      inv_txfm_add_16x16_dct_dct_0_8bpc_c: 19188.0
      inv_txfm_add_16x16_dct_dct_0_8bpc_ssse3: 64.3
      inv_txfm_add_16x16_dct_dct_1_8bpc_c: 19230.1
      inv_txfm_add_16x16_dct_dct_1_8bpc_ssse3: 649.1
      inv_txfm_add_16x16_dct_dct_2_8bpc_c: 19276.7
      inv_txfm_add_16x16_dct_dct_2_8bpc_ssse3: 649.5
      inv_txfm_add_16x16_dct_flipadst_0_8bpc_c: 19967.8
      inv_txfm_add_16x16_dct_flipadst_0_8bpc_ssse3: 761.1
      inv_txfm_add_16x16_dct_flipadst_1_8bpc_c: 19665.7
      inv_txfm_add_16x16_dct_flipadst_1_8bpc_ssse3: 761.0
      inv_txfm_add_16x16_dct_flipadst_2_8bpc_c: 19766.2
      inv_txfm_add_16x16_dct_flipadst_2_8bpc_ssse3: 760.6
      inv_txfm_add_16x16_dct_identity_0_8bpc_c: 13874.5
      inv_txfm_add_16x16_dct_identity_0_8bpc_ssse3: 97.3
      inv_txfm_add_16x16_dct_identity_1_8bpc_c: 13931.8
      inv_txfm_add_16x16_dct_identity_1_8bpc_ssse3: 76.3
      inv_txfm_add_16x16_dct_identity_2_8bpc_c: 13801.5
      inv_txfm_add_16x16_dct_identity_2_8bpc_ssse3: 454.6
      inv_txfm_add_16x16_flipadst_adst_0_8bpc_c: 18900.6
      inv_txfm_add_16x16_flipadst_adst_0_8bpc_ssse3: 884.6
      inv_txfm_add_16x16_flipadst_adst_1_8bpc_c: 19180.2
      inv_txfm_add_16x16_flipadst_adst_1_8bpc_ssse3: 886.7
      inv_txfm_add_16x16_flipadst_adst_2_8bpc_c: 19320.8
      inv_txfm_add_16x16_flipadst_adst_2_8bpc_ssse3: 884.6
      inv_txfm_add_16x16_flipadst_dct_0_8bpc_c: 19399.7
      inv_txfm_add_16x16_flipadst_dct_0_8bpc_ssse3: 775.0
      inv_txfm_add_16x16_flipadst_dct_1_8bpc_c: 19345.0
      inv_txfm_add_16x16_flipadst_dct_1_8bpc_ssse3: 774.6
      inv_txfm_add_16x16_flipadst_dct_2_8bpc_c: 19426.2
      inv_txfm_add_16x16_flipadst_dct_2_8bpc_ssse3: 775.6
      inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_c: 19457.6
      inv_txfm_add_16x16_flipadst_flipadst_0_8bpc_ssse3: 887.8
      inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_c: 19413.8
      inv_txfm_add_16x16_flipadst_flipadst_1_8bpc_ssse3: 885.3
      inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_c: 19425.6
      inv_txfm_add_16x16_flipadst_flipadst_2_8bpc_ssse3: 886.3
      inv_txfm_add_16x16_identity_dct_0_8bpc_c: 14150.7
      inv_txfm_add_16x16_identity_dct_0_8bpc_ssse3: 104.3
      inv_txfm_add_16x16_identity_dct_1_8bpc_c: 14041.5
      inv_txfm_add_16x16_identity_dct_1_8bpc_ssse3: 104.2
      inv_txfm_add_16x16_identity_dct_2_8bpc_c: 13917.7
      inv_txfm_add_16x16_identity_dct_2_8bpc_ssse3: 459.7
      inv_txfm_add_16x16_identity_identity_0_8bpc_c: 8761.7
      inv_txfm_add_16x16_identity_identity_0_8bpc_ssse3: 263.3
      inv_txfm_add_16x16_identity_identity_1_8bpc_c: 8669.5
      inv_txfm_add_16x16_identity_identity_1_8bpc_ssse3: 263.4
      inv_txfm_add_16x16_identity_identity_2_8bpc_c: 8282.1
      inv_txfm_add_16x16_identity_identity_2_8bpc_ssse3: 263.3
      1b30cf2a
    • Ronald S. Bultje's avatar
      255581d5
    • Update the copyright year to 2019 · f1cdb441
      Matthias Dressel authored
      f1cdb441
  19. 26 Feb, 2019 4 commits
    • obu: ignore operating_parameter_info in new sequence check · 2abc436e
      Janne Grunau authored
      The operating_parameter_info is allowed to change in a single sequence.
      Reorder Dav1dSequenceHeader so the check for new sequence can still be
      done with memcmp and offsetof.
      2abc436e
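The reordering described in the commit above can be sketched with a reduced, hypothetical header. The struct and its field names are illustrative and do not match `Dav1dSequenceHeader`; the sketch only shows why moving the fields that may change to the end lets one `memcmp` bounded by `offsetof` cover exactly the fields that define a new sequence.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much reduced header; field names are illustrative only. */
typedef struct {
    int32_t profile;
    int32_t max_width;
    int32_t max_height;
    /* Fields from here on may change within one sequence, so they are
     * placed last and excluded from the comparison. */
    int32_t operating_parameter_info[4];
} SketchSeqHdr;

/* Returns nonzero when any field before operating_parameter_info differs. */
static int is_new_sequence(const SketchSeqHdr *a, const SketchSeqHdr *b)
{
    return memcmp(a, b, offsetof(SketchSeqHdr, operating_parameter_info)) != 0;
}
```

Because `memcmp` also compares padding bytes, a real struct with mixed field sizes would need to be zero-initialized before it is filled in; the sketch sidesteps this by using only `int32_t` fields.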
    • x86: add SSSE3 cdef filters implementation · 791ec219
      Victorien Le Couviour--Tuffet authored
      AVX2 adaption
      
      ---------------------
      x86_64:
      ------------------------------------------
      cdef_filter_4x4_8bpc_c: 1370.2
      cdef_filter_4x4_8bpc_ssse3: 142.3
      cdef_filter_4x4_8bpc_avx2: 106.7
      ------------------------------------------
      cdef_filter_4x8_8bpc_c: 2749.3
      cdef_filter_4x8_8bpc_ssse3: 257.2
      cdef_filter_4x8_8bpc_avx2: 178.8
      ------------------------------------------
      cdef_filter_8x8_8bpc_c: 5609.5
      cdef_filter_8x8_8bpc_ssse3: 438.1
      cdef_filter_8x8_8bpc_avx2: 250.6
      ------------------------------------------
      
      ---------------------
      x86_32:
      ------------------------------------------
      cdef_filter_4x4_8bpc_c: 1548.7
      cdef_filter_4x4_8bpc_ssse3: 179.8
      ------------------------------------------
      cdef_filter_4x8_8bpc_c: 3128.2
      cdef_filter_4x8_8bpc_ssse3: 328.1
      ------------------------------------------
      cdef_filter_8x8_8bpc_c: 6454.5
      cdef_filter_8x8_8bpc_ssse3: 584.4
      ------------------------------------------
      791ec219
    • x86: optimize AVX2 cdef filters · 80650d4c
      Victorien Le Couviour--Tuffet authored
      before: cdef_filter_4x4_8bpc_avx2: 110.4
       after: cdef_filter_4x4_8bpc_avx2: 106.0
      
      before: cdef_filter_4x8_8bpc_avx2: 188.3
       after: cdef_filter_4x8_8bpc_avx2: 182.2
      
      before: cdef_filter_8x8_8bpc_avx2: 276.7
       after: cdef_filter_8x8_8bpc_avx2: 252.5
      
      Credit to Gramner.
      80650d4c
    • Victorien Le Couviour--Tuffet's avatar