  1. 10 Jan, 2019 1 commit
  2. 07 Jan, 2019 3 commits
  3. 05 Jan, 2019 1 commit
  4. 28 Dec, 2018 1 commit
    • Add SSSE3 implementations for dav1d_ipred_top, dav1d_ipred_left and dav1d_ipred_128 · 9ea56386
      Xuefeng Jiang authored
      Cycle times:
      intra_pred_dc_128_w4_8bpc_c: 905.2
      intra_pred_dc_128_w4_8bpc_ssse3: 61.6
      intra_pred_dc_128_w8_8bpc_c: 1393.1
      intra_pred_dc_128_w8_8bpc_ssse3: 82.3
      intra_pred_dc_128_w16_8bpc_c: 2227.4
      intra_pred_dc_128_w16_8bpc_ssse3: 119.6
      intra_pred_dc_128_w32_8bpc_c: 2696.0
      intra_pred_dc_128_w32_8bpc_ssse3: 195.5
      intra_pred_dc_128_w64_8bpc_c: 4298.6
      intra_pred_dc_128_w64_8bpc_ssse3: 465.1
      intra_pred_dc_left_w4_8bpc_c: 974.2
      intra_pred_dc_left_w4_8bpc_ssse3: 80.2
      intra_pred_dc_left_w8_8bpc_c: 1478.4
      intra_pred_dc_left_w8_8bpc_ssse3: 103.7
      intra_pred_dc_left_w16_8bpc_c: 2313.0
      intra_pred_dc_left_w16_8bpc_ssse3: 159.1
      intra_pred_dc_left_w32_8bpc_c: 2835.1
      intra_pred_dc_left_w32_8bpc_ssse3: 305.3
      intra_pred_dc_left_w64_8bpc_c: 4462.2
      intra_pred_dc_left_w64_8bpc_ssse3: 525.5
      intra_pred_dc_top_w4_8bpc_c: 949.5
      intra_pred_dc_top_w4_8bpc_ssse3: 95.5
      intra_pred_dc_top_w8_8bpc_c: 1462.2
      intra_pred_dc_top_w8_8bpc_ssse3: 103.1
      intra_pred_dc_top_w16_8bpc_c: 2312.5
      intra_pred_dc_top_w16_8bpc_ssse3: 146.4
      intra_pred_dc_top_w32_8bpc_c: 2895.9
      intra_pred_dc_top_w32_8bpc_ssse3: 250.4
      intra_pred_dc_top_w64_8bpc_c: 4617.9
      intra_pred_dc_top_w64_8bpc_ssse3: 493.3
  5. 27 Dec, 2018 1 commit
    • Add SSSE3 implementation for the 8x8 blocks in itx · 5fa6c44a
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_8x8_adst_adst_0_8bpc_c: 2165.6
      inv_txfm_add_8x8_adst_adst_0_8bpc_ssse3: 194.5
      inv_txfm_add_8x8_adst_adst_1_8bpc_c: 2158.3
      inv_txfm_add_8x8_adst_adst_1_8bpc_ssse3: 194.7
      inv_txfm_add_8x8_adst_dct_0_8bpc_c: 2241.0
      inv_txfm_add_8x8_adst_dct_0_8bpc_ssse3: 165.1
      inv_txfm_add_8x8_adst_dct_1_8bpc_c: 2242.6
      inv_txfm_add_8x8_adst_dct_1_8bpc_ssse3: 164.2
      inv_txfm_add_8x8_adst_flipadst_0_8bpc_c: 2178.2
      inv_txfm_add_8x8_adst_flipadst_0_8bpc_ssse3: 194.4
      inv_txfm_add_8x8_adst_flipadst_1_8bpc_c: 2183.0
      inv_txfm_add_8x8_adst_flipadst_1_8bpc_ssse3: 194.2
      inv_txfm_add_8x8_adst_identity_0_8bpc_c: 1592.1
      inv_txfm_add_8x8_adst_identity_0_8bpc_ssse3: 125.2
      inv_txfm_add_8x8_adst_identity_1_8bpc_c: 1597.7
      inv_txfm_add_8x8_adst_identity_1_8bpc_ssse3: 126.3
      inv_txfm_add_8x8_dct_adst_0_8bpc_c: 2214.1
      inv_txfm_add_8x8_dct_adst_0_8bpc_ssse3: 162.0
      inv_txfm_add_8x8_dct_adst_1_8bpc_c: 2221.5
      inv_txfm_add_8x8_dct_adst_1_8bpc_ssse3: 161.9
      inv_txfm_add_8x8_dct_dct_0_8bpc_c: 2247.8
      inv_txfm_add_8x8_dct_dct_0_8bpc_ssse3: 34.0
      inv_txfm_add_8x8_dct_dct_1_8bpc_c: 2243.1
      inv_txfm_add_8x8_dct_dct_1_8bpc_ssse3: 133.7
      inv_txfm_add_8x8_dct_flipadst_0_8bpc_c: 2255.1
      inv_txfm_add_8x8_dct_flipadst_0_8bpc_ssse3: 161.2
      inv_txfm_add_8x8_dct_flipadst_1_8bpc_c: 2244.6
      inv_txfm_add_8x8_dct_flipadst_1_8bpc_ssse3: 161.8
      inv_txfm_add_8x8_dct_identity_0_8bpc_c: 1632.3
      inv_txfm_add_8x8_dct_identity_0_8bpc_ssse3: 41.3
      inv_txfm_add_8x8_dct_identity_1_8bpc_c: 1629.6
      inv_txfm_add_8x8_dct_identity_1_8bpc_ssse3: 97.7
      inv_txfm_add_8x8_flipadst_adst_0_8bpc_c: 2185.6
      inv_txfm_add_8x8_flipadst_adst_0_8bpc_ssse3: 191.0
      inv_txfm_add_8x8_flipadst_adst_1_8bpc_c: 2165.7
      inv_txfm_add_8x8_flipadst_adst_1_8bpc_ssse3: 191.6
      inv_txfm_add_8x8_flipadst_dct_0_8bpc_c: 2246.4
      inv_txfm_add_8x8_flipadst_dct_0_8bpc_ssse3: 162.8
      inv_txfm_add_8x8_flipadst_dct_1_8bpc_c: 2252.1
      inv_txfm_add_8x8_flipadst_dct_1_8bpc_ssse3: 163.9
      inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_c: 2180.9
      inv_txfm_add_8x8_flipadst_flipadst_0_8bpc_ssse3: 196.3
      inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_c: 2192.2
      inv_txfm_add_8x8_flipadst_flipadst_1_8bpc_ssse3: 194.5
      inv_txfm_add_8x8_flipadst_identity_0_8bpc_c: 1600.9
      inv_txfm_add_8x8_flipadst_identity_0_8bpc_ssse3: 126.6
      inv_txfm_add_8x8_flipadst_identity_1_8bpc_c: 1600.5
      inv_txfm_add_8x8_flipadst_identity_1_8bpc_ssse3: 126.4
      inv_txfm_add_8x8_identity_adst_0_8bpc_c: 1558.0
      inv_txfm_add_8x8_identity_adst_0_8bpc_ssse3: 120.7
      inv_txfm_add_8x8_identity_adst_1_8bpc_c: 1556.7
      inv_txfm_add_8x8_identity_adst_1_8bpc_ssse3: 121.0
      inv_txfm_add_8x8_identity_dct_0_8bpc_c: 1600.8
      inv_txfm_add_8x8_identity_dct_0_8bpc_ssse3: 37.9
      inv_txfm_add_8x8_identity_dct_1_8bpc_c: 1599.5
      inv_txfm_add_8x8_identity_dct_1_8bpc_ssse3: 90.3
      inv_txfm_add_8x8_identity_flipadst_0_8bpc_c: 1584.9
      inv_txfm_add_8x8_identity_flipadst_0_8bpc_ssse3: 120.2
      inv_txfm_add_8x8_identity_flipadst_1_8bpc_c: 1584.3
      inv_txfm_add_8x8_identity_flipadst_1_8bpc_ssse3: 120.5
      inv_txfm_add_8x8_identity_identity_0_8bpc_c: 975.9
      inv_txfm_add_8x8_identity_identity_0_8bpc_ssse3: 54.7
      inv_txfm_add_8x8_identity_identity_1_8bpc_c: 975.7
      inv_txfm_add_8x8_identity_identity_1_8bpc_ssse3: 54.7
  6. 26 Dec, 2018 1 commit
    • Add SSSE3 implementation for dav1d_ipred_v and dav1d_ipred_dc · 71e13008
      Xuefeng Jiang authored
      Cycle times:
      intra_pred_dc_w4_8bpc_c: 1051.4
      intra_pred_dc_w4_8bpc_ssse3: 58.8
      intra_pred_dc_w8_8bpc_c: 1587.6
      intra_pred_dc_w8_8bpc_ssse3: 75.3
      intra_pred_dc_w16_8bpc_c: 2526.2
      intra_pred_dc_w16_8bpc_ssse3: 103.5
      intra_pred_dc_w32_8bpc_c: 2646.6
      intra_pred_dc_w32_8bpc_ssse3: 179.5
      intra_pred_dc_w64_8bpc_c: 4084.6
      intra_pred_dc_w64_8bpc_ssse3: 356.1
      intra_pred_v_w4_8bpc_c: 468.5
      intra_pred_v_w4_8bpc_ssse3: 46.8
      intra_pred_v_w8_8bpc_c: 839.1
      intra_pred_v_w8_8bpc_ssse3: 56.7
      intra_pred_v_w16_8bpc_c: 1750.5
      intra_pred_v_w16_8bpc_ssse3: 73.0
      intra_pred_v_w32_8bpc_c: 1552.5
      intra_pred_v_w32_8bpc_ssse3: 135.4
      intra_pred_v_w64_8bpc_c: 2463.6
      intra_pred_v_w64_8bpc_ssse3: 305.6
  7. 22 Dec, 2018 1 commit
  8. 21 Dec, 2018 2 commits
    • Add SSSE3 implementation for the 4x8 and 8x4 blocks in itx · 1703f21f
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_4x8_adst_adst_0_8bpc_c: 1167.6
      inv_txfm_add_4x8_adst_adst_0_8bpc_ssse3: 114.6
      inv_txfm_add_4x8_adst_adst_1_8bpc_c: 1167.2
      inv_txfm_add_4x8_adst_adst_1_8bpc_ssse3: 114.1
      inv_txfm_add_4x8_adst_dct_0_8bpc_c: 1174.7
      inv_txfm_add_4x8_adst_dct_0_8bpc_ssse3: 34.8
      inv_txfm_add_4x8_adst_dct_1_8bpc_c: 1158.0
      inv_txfm_add_4x8_adst_dct_1_8bpc_ssse3: 101.0
      inv_txfm_add_4x8_adst_flipadst_0_8bpc_c: 1150.9
      inv_txfm_add_4x8_adst_flipadst_0_8bpc_ssse3: 115.8
      inv_txfm_add_4x8_adst_flipadst_1_8bpc_c: 1157.6
      inv_txfm_add_4x8_adst_flipadst_1_8bpc_ssse3: 115.8
      inv_txfm_add_4x8_adst_identity_0_8bpc_c: 848.4
      inv_txfm_add_4x8_adst_identity_0_8bpc_ssse3: 59.1
      inv_txfm_add_4x8_adst_identity_1_8bpc_c: 850.1
      inv_txfm_add_4x8_adst_identity_1_8bpc_ssse3: 59.1
      inv_txfm_add_4x8_dct_adst_0_8bpc_c: 1205.6
      inv_txfm_add_4x8_dct_adst_0_8bpc_ssse3: 107.0
      inv_txfm_add_4x8_dct_adst_1_8bpc_c: 1183.7
      inv_txfm_add_4x8_dct_adst_1_8bpc_ssse3: 107.0
      inv_txfm_add_4x8_dct_dct_0_8bpc_c: 1227.0
      inv_txfm_add_4x8_dct_dct_0_8bpc_ssse3: 34.6
      inv_txfm_add_4x8_dct_dct_1_8bpc_c: 1229.7
      inv_txfm_add_4x8_dct_dct_1_8bpc_ssse3: 96.1
      inv_txfm_add_4x8_dct_flipadst_0_8bpc_c: 1188.2
      inv_txfm_add_4x8_dct_flipadst_0_8bpc_ssse3: 109.3
      inv_txfm_add_4x8_dct_flipadst_1_8bpc_c: 1192.7
      inv_txfm_add_4x8_dct_flipadst_1_8bpc_ssse3: 109.9
      inv_txfm_add_4x8_dct_identity_0_8bpc_c: 878.4
      inv_txfm_add_4x8_dct_identity_0_8bpc_ssse3: 31.9
      inv_txfm_add_4x8_dct_identity_1_8bpc_c: 879.0
      inv_txfm_add_4x8_dct_identity_1_8bpc_ssse3: 54.8
      inv_txfm_add_4x8_flipadst_adst_0_8bpc_c: 1181.8
      inv_txfm_add_4x8_flipadst_adst_0_8bpc_ssse3: 114.7
      inv_txfm_add_4x8_flipadst_adst_1_8bpc_c: 1203.0
      inv_txfm_add_4x8_flipadst_adst_1_8bpc_ssse3: 114.5
      inv_txfm_add_4x8_flipadst_dct_0_8bpc_c: 1203.6
      inv_txfm_add_4x8_flipadst_dct_0_8bpc_ssse3: 34.1
      inv_txfm_add_4x8_flipadst_dct_1_8bpc_c: 1204.4
      inv_txfm_add_4x8_flipadst_dct_1_8bpc_ssse3: 100.2
      inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_c: 1180.6
      inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_ssse3: 117.1
      inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_c: 1178.7
      inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_ssse3: 116.8
      inv_txfm_add_4x8_flipadst_identity_0_8bpc_c: 871.3
      inv_txfm_add_4x8_flipadst_identity_0_8bpc_ssse3: 69.0
      inv_txfm_add_4x8_flipadst_identity_1_8bpc_c: 872.3
      inv_txfm_add_4x8_flipadst_identity_1_8bpc_ssse3: 70.0
      inv_txfm_add_4x8_identity_adst_0_8bpc_c: 1125.2
      inv_txfm_add_4x8_identity_adst_0_8bpc_ssse3: 98.7
      inv_txfm_add_4x8_identity_adst_1_8bpc_c: 1092.6
      inv_txfm_add_4x8_identity_adst_1_8bpc_ssse3: 99.6
      inv_txfm_add_4x8_identity_dct_0_8bpc_c: 1139.4
      inv_txfm_add_4x8_identity_dct_0_8bpc_ssse3: 38.8
      inv_txfm_add_4x8_identity_dct_1_8bpc_c: 1111.0
      inv_txfm_add_4x8_identity_dct_1_8bpc_ssse3: 84.1
      inv_txfm_add_4x8_identity_flipadst_0_8bpc_c: 1112.4
      inv_txfm_add_4x8_identity_flipadst_0_8bpc_ssse3: 100.7
      inv_txfm_add_4x8_identity_flipadst_1_8bpc_c: 1098.7
      inv_txfm_add_4x8_identity_flipadst_1_8bpc_ssse3: 100.8
      inv_txfm_add_4x8_identity_identity_0_8bpc_c: 791.6
      inv_txfm_add_4x8_identity_identity_0_8bpc_ssse3: 43.9
      inv_txfm_add_4x8_identity_identity_1_8bpc_c: 797.0
      inv_txfm_add_4x8_identity_identity_1_8bpc_ssse3: 43.8
      inv_txfm_add_8x4_adst_adst_0_8bpc_c: 1102.8
      inv_txfm_add_8x4_adst_adst_0_8bpc_ssse3: 108.7
      inv_txfm_add_8x4_adst_adst_1_8bpc_c: 1101.8
      inv_txfm_add_8x4_adst_adst_1_8bpc_ssse3: 108.9
      inv_txfm_add_8x4_adst_dct_0_8bpc_c: 1146.9
      inv_txfm_add_8x4_adst_dct_0_8bpc_ssse3: 98.7
      inv_txfm_add_8x4_adst_dct_1_8bpc_c: 1157.9
      inv_txfm_add_8x4_adst_dct_1_8bpc_ssse3: 98.9
      inv_txfm_add_8x4_adst_flipadst_0_8bpc_c: 1144.6
      inv_txfm_add_8x4_adst_flipadst_0_8bpc_ssse3: 111.4
      inv_txfm_add_8x4_adst_flipadst_1_8bpc_c: 1128.2
      inv_txfm_add_8x4_adst_flipadst_1_8bpc_ssse3: 112.4
      inv_txfm_add_8x4_adst_identity_0_8bpc_c: 1051.1
      inv_txfm_add_8x4_adst_identity_0_8bpc_ssse3: 87.1
      inv_txfm_add_8x4_adst_identity_1_8bpc_c: 1059.2
      inv_txfm_add_8x4_adst_identity_1_8bpc_ssse3: 87.7
      inv_txfm_add_8x4_dct_adst_0_8bpc_c: 1130.2
      inv_txfm_add_8x4_dct_adst_0_8bpc_ssse3: 29.0
      inv_txfm_add_8x4_dct_adst_1_8bpc_c: 1130.1
      inv_txfm_add_8x4_dct_adst_1_8bpc_ssse3: 89.2
      inv_txfm_add_8x4_dct_dct_0_8bpc_c: 1186.0
      inv_txfm_add_8x4_dct_dct_0_8bpc_ssse3: 26.3
      inv_txfm_add_8x4_dct_dct_1_8bpc_c: 1172.2
      inv_txfm_add_8x4_dct_dct_1_8bpc_ssse3: 78.8
      inv_txfm_add_8x4_dct_flipadst_0_8bpc_c: 1154.7
      inv_txfm_add_8x4_dct_flipadst_0_8bpc_ssse3: 29.1
      inv_txfm_add_8x4_dct_flipadst_1_8bpc_c: 1150.2
      inv_txfm_add_8x4_dct_flipadst_1_8bpc_ssse3: 92.2
      inv_txfm_add_8x4_dct_identity_0_8bpc_c: 1078.7
      inv_txfm_add_8x4_dct_identity_0_8bpc_ssse3: 29.2
      inv_txfm_add_8x4_dct_identity_1_8bpc_c: 1090.1
      inv_txfm_add_8x4_dct_identity_1_8bpc_ssse3: 72.2
      inv_txfm_add_8x4_flipadst_adst_0_8bpc_c: 1111.6
      inv_txfm_add_8x4_flipadst_adst_0_8bpc_ssse3: 108.6
      inv_txfm_add_8x4_flipadst_adst_1_8bpc_c: 1112.1
      inv_txfm_add_8x4_flipadst_adst_1_8bpc_ssse3: 107.6
      inv_txfm_add_8x4_flipadst_dct_0_8bpc_c: 1163.0
      inv_txfm_add_8x4_flipadst_dct_0_8bpc_ssse3: 98.3
      inv_txfm_add_8x4_flipadst_dct_1_8bpc_c: 1160.0
      inv_txfm_add_8x4_flipadst_dct_1_8bpc_ssse3: 99.6
      inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_c: 1137.9
      inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_ssse3: 112.0
      inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_c: 1140.0
      inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_ssse3: 112.0
      inv_txfm_add_8x4_flipadst_identity_0_8bpc_c: 1057.2
      inv_txfm_add_8x4_flipadst_identity_0_8bpc_ssse3: 88.1
      inv_txfm_add_8x4_flipadst_identity_1_8bpc_c: 1058.3
      inv_txfm_add_8x4_flipadst_identity_1_8bpc_ssse3: 87.1
      inv_txfm_add_8x4_identity_adst_0_8bpc_c: 794.0
      inv_txfm_add_8x4_identity_adst_0_8bpc_ssse3: 60.6
      inv_txfm_add_8x4_identity_adst_1_8bpc_c: 793.4
      inv_txfm_add_8x4_identity_adst_1_8bpc_ssse3: 60.6
      inv_txfm_add_8x4_identity_dct_0_8bpc_c: 838.4
      inv_txfm_add_8x4_identity_dct_0_8bpc_ssse3: 27.4
      inv_txfm_add_8x4_identity_dct_1_8bpc_c: 838.5
      inv_txfm_add_8x4_identity_dct_1_8bpc_ssse3: 52.0
      inv_txfm_add_8x4_identity_flipadst_0_8bpc_c: 825.3
      inv_txfm_add_8x4_identity_flipadst_0_8bpc_ssse3: 66.7
      inv_txfm_add_8x4_identity_flipadst_1_8bpc_c: 831.7
      inv_txfm_add_8x4_identity_flipadst_1_8bpc_ssse3: 66.7
      inv_txfm_add_8x4_identity_identity_0_8bpc_c: 768.6
      inv_txfm_add_8x4_identity_identity_0_8bpc_ssse3: 40.0
      inv_txfm_add_8x4_identity_identity_1_8bpc_c: 743.3
      inv_txfm_add_8x4_identity_identity_1_8bpc_ssse3: 39.9
  9. 20 Dec, 2018 7 commits
  10. 19 Dec, 2018 1 commit
  11. 18 Dec, 2018 10 commits
  12. 17 Dec, 2018 4 commits
  13. 16 Dec, 2018 1 commit
  14. 15 Dec, 2018 5 commits
  15. 14 Dec, 2018 1 commit
    • Rewrite inverse transforms to prevent integer overflows · 6a10a981
      Ronald S. Bultje authored
      The basic idea is that with intermediates of 19+sign bits and
      multipliers of 12+sign bits, the products are 19+12=31+sign
      bits, and adding two of these together can overflow a 32-bit
      int, which is UB in C. Streams that trigger this are not valid
      AV1 streams, but they are codable, and so although we don't
      particularly care about the pixel-level output for such streams,
      we do want to prevent triggering UB, since that could be
      considered a security vulnerability.
      
      To resolve this, we reduce all multipliers to 11 bits by
      subtracting 4096 from the large ones:

      (a * constant_1 + b * constant_2 + 2048) >> 12, where
      constant_1 < 2048 but constant_2 >= 2048, is identical to:
      ((a * constant_1 + b * (constant_2 - 4096) + 2048) >> 12) + b,
      and the magnitude of constant_2 - 4096 is at most 2048. In other
      places, where both constants are a multiple of 2, we can halve
      both and round/shift by 11 instead of 12.
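
The two rewrites described above can be checked numerically. The following is a minimal sketch, not code from the commit: the function names are hypothetical, and the constants 799/4017 and 2276/3406 are illustrative values chosen here. The reduced multiplier is written as constant_2 - 4096 so that adding b back after the shift compensates exactly. The sketch assumes that >> on a negative signed value is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <stdint.h>

/* Reference: 64-bit arithmetic, so the sum of the two products cannot overflow. */
static int32_t rot_ref(int32_t a, int32_t b, int32_t c1, int32_t c2)
{
    return (int32_t)(((int64_t)a * c1 + (int64_t)b * c2 + 2048) >> 12);
}

/* Reduced form for c2 >= 2048: multiply by (c2 - 4096), whose magnitude is
 * at most 2048, then add b back after the shift (b * 4096 >> 12 == b). */
static int32_t rot_reduced(int32_t a, int32_t b, int32_t c1, int32_t c2)
{
    return ((a * c1 + b * (c2 - 4096) + 2048) >> 12) + b;
}

/* Halved form for even c1 and c2: halve both and round/shift by 11. */
static int32_t rot_halved(int32_t a, int32_t b, int32_t c1, int32_t c2)
{
    return (a * (c1 / 2) + b * (c2 / 2) + 1024) >> 11;
}

/* Walk a coarse grid of 19+sign bit inputs and count disagreements with the
 * 64-bit reference. The strides are arbitrary primes picked for this sketch. */
int count_mismatches(void)
{
    const int32_t lim = 1 << 19;
    int bad = 0;
    for (int32_t a = -lim; a < lim; a += 4099) {
        for (int32_t b = -lim; b < lim; b += 4093) {
            bad += rot_ref(a, b, 799, 4017) != rot_reduced(a, b, 799, 4017);
            bad += rot_ref(a, b, 2276, 3406) != rot_halved(a, b, 2276, 3406);
        }
    }
    return bad;
}
```

With these inputs every 32-bit intermediate stays below 2^31 in magnitude, so neither rewritten form relies on wrapping. count_mismatches() is expected to return 0 if both rewrites are exact.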
      
      Do this in dct4,8,16,32,64 as well as adst8,16. Also slightly
      simplify the final phase of idct64_1d by moving the add/sub to
      before the multiply.
      
      The adst4 is rewritten to be shaped like a matrix-multiply, and
      then use the same idea on all 4 multipliers in the matrix, since
      the sum of all 4 multipliers is still under 4096 in all cases.
      
      Fixes clusterfuzz-testcase-minimized-dav1d_fuzzer-5709759466962944,
      credits to oss-fuzz. Also fixes #223.