1. 22 Dec, 2018 1 commit
  2. 21 Dec, 2018 2 commits
    • Add SSSE3 implementation for the 4x8 and 8x4 blocks in itx · 1703f21f
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_4x8_adst_adst_0_8bpc_c: 1167.6
      inv_txfm_add_4x8_adst_adst_0_8bpc_ssse3: 114.6
      inv_txfm_add_4x8_adst_adst_1_8bpc_c: 1167.2
      inv_txfm_add_4x8_adst_adst_1_8bpc_ssse3: 114.1
      inv_txfm_add_4x8_adst_dct_0_8bpc_c: 1174.7
      inv_txfm_add_4x8_adst_dct_0_8bpc_ssse3: 34.8
      inv_txfm_add_4x8_adst_dct_1_8bpc_c: 1158.0
      inv_txfm_add_4x8_adst_dct_1_8bpc_ssse3: 101.0
      inv_txfm_add_4x8_adst_flipadst_0_8bpc_c: 1150.9
      inv_txfm_add_4x8_adst_flipadst_0_8bpc_ssse3: 115.8
      inv_txfm_add_4x8_adst_flipadst_1_8bpc_c: 1157.6
      inv_txfm_add_4x8_adst_flipadst_1_8bpc_ssse3: 115.8
      inv_txfm_add_4x8_adst_identity_0_8bpc_c: 848.4
      inv_txfm_add_4x8_adst_identity_0_8bpc_ssse3: 59.1
      inv_txfm_add_4x8_adst_identity_1_8bpc_c: 850.1
      inv_txfm_add_4x8_adst_identity_1_8bpc_ssse3: 59.1
      inv_txfm_add_4x8_dct_adst_0_8bpc_c: 1205.6
      inv_txfm_add_4x8_dct_adst_0_8bpc_ssse3: 107.0
      inv_txfm_add_4x8_dct_adst_1_8bpc_c: 1183.7
      inv_txfm_add_4x8_dct_adst_1_8bpc_ssse3: 107.0
      inv_txfm_add_4x8_dct_dct_0_8bpc_c: 1227.0
      inv_txfm_add_4x8_dct_dct_0_8bpc_ssse3: 34.6
      inv_txfm_add_4x8_dct_dct_1_8bpc_c: 1229.7
      inv_txfm_add_4x8_dct_dct_1_8bpc_ssse3: 96.1
      inv_txfm_add_4x8_dct_flipadst_0_8bpc_c: 1188.2
      inv_txfm_add_4x8_dct_flipadst_0_8bpc_ssse3: 109.3
      inv_txfm_add_4x8_dct_flipadst_1_8bpc_c: 1192.7
      inv_txfm_add_4x8_dct_flipadst_1_8bpc_ssse3: 109.9
      inv_txfm_add_4x8_dct_identity_0_8bpc_c: 878.4
      inv_txfm_add_4x8_dct_identity_0_8bpc_ssse3: 31.9
      inv_txfm_add_4x8_dct_identity_1_8bpc_c: 879.0
      inv_txfm_add_4x8_dct_identity_1_8bpc_ssse3: 54.8
      inv_txfm_add_4x8_flipadst_adst_0_8bpc_c: 1181.8
      inv_txfm_add_4x8_flipadst_adst_0_8bpc_ssse3: 114.7
      inv_txfm_add_4x8_flipadst_adst_1_8bpc_c: 1203.0
      inv_txfm_add_4x8_flipadst_adst_1_8bpc_ssse3: 114.5
      inv_txfm_add_4x8_flipadst_dct_0_8bpc_c: 1203.6
      inv_txfm_add_4x8_flipadst_dct_0_8bpc_ssse3: 34.1
      inv_txfm_add_4x8_flipadst_dct_1_8bpc_c: 1204.4
      inv_txfm_add_4x8_flipadst_dct_1_8bpc_ssse3: 100.2
      inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_c: 1180.6
      inv_txfm_add_4x8_flipadst_flipadst_0_8bpc_ssse3: 117.1
      inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_c: 1178.7
      inv_txfm_add_4x8_flipadst_flipadst_1_8bpc_ssse3: 116.8
      inv_txfm_add_4x8_flipadst_identity_0_8bpc_c: 871.3
      inv_txfm_add_4x8_flipadst_identity_0_8bpc_ssse3: 69.0
      inv_txfm_add_4x8_flipadst_identity_1_8bpc_c: 872.3
      inv_txfm_add_4x8_flipadst_identity_1_8bpc_ssse3: 70.0
      inv_txfm_add_4x8_identity_adst_0_8bpc_c: 1125.2
      inv_txfm_add_4x8_identity_adst_0_8bpc_ssse3: 98.7
      inv_txfm_add_4x8_identity_adst_1_8bpc_c: 1092.6
      inv_txfm_add_4x8_identity_adst_1_8bpc_ssse3: 99.6
      inv_txfm_add_4x8_identity_dct_0_8bpc_c: 1139.4
      inv_txfm_add_4x8_identity_dct_0_8bpc_ssse3: 38.8
      inv_txfm_add_4x8_identity_dct_1_8bpc_c: 1111.0
      inv_txfm_add_4x8_identity_dct_1_8bpc_ssse3: 84.1
      inv_txfm_add_4x8_identity_flipadst_0_8bpc_c: 1112.4
      inv_txfm_add_4x8_identity_flipadst_0_8bpc_ssse3: 100.7
      inv_txfm_add_4x8_identity_flipadst_1_8bpc_c: 1098.7
      inv_txfm_add_4x8_identity_flipadst_1_8bpc_ssse3: 100.8
      inv_txfm_add_4x8_identity_identity_0_8bpc_c: 791.6
      inv_txfm_add_4x8_identity_identity_0_8bpc_ssse3: 43.9
      inv_txfm_add_4x8_identity_identity_1_8bpc_c: 797.0
      inv_txfm_add_4x8_identity_identity_1_8bpc_ssse3: 43.8
      inv_txfm_add_8x4_adst_adst_0_8bpc_c: 1102.8
      inv_txfm_add_8x4_adst_adst_0_8bpc_ssse3: 108.7
      inv_txfm_add_8x4_adst_adst_1_8bpc_c: 1101.8
      inv_txfm_add_8x4_adst_adst_1_8bpc_ssse3: 108.9
      inv_txfm_add_8x4_adst_dct_0_8bpc_c: 1146.9
      inv_txfm_add_8x4_adst_dct_0_8bpc_ssse3: 98.7
      inv_txfm_add_8x4_adst_dct_1_8bpc_c: 1157.9
      inv_txfm_add_8x4_adst_dct_1_8bpc_ssse3: 98.9
      inv_txfm_add_8x4_adst_flipadst_0_8bpc_c: 1144.6
      inv_txfm_add_8x4_adst_flipadst_0_8bpc_ssse3: 111.4
      inv_txfm_add_8x4_adst_flipadst_1_8bpc_c: 1128.2
      inv_txfm_add_8x4_adst_flipadst_1_8bpc_ssse3: 112.4
      inv_txfm_add_8x4_adst_identity_0_8bpc_c: 1051.1
      inv_txfm_add_8x4_adst_identity_0_8bpc_ssse3: 87.1
      inv_txfm_add_8x4_adst_identity_1_8bpc_c: 1059.2
      inv_txfm_add_8x4_adst_identity_1_8bpc_ssse3: 87.7
      inv_txfm_add_8x4_dct_adst_0_8bpc_c: 1130.2
      inv_txfm_add_8x4_dct_adst_0_8bpc_ssse3: 29.0
      inv_txfm_add_8x4_dct_adst_1_8bpc_c: 1130.1
      inv_txfm_add_8x4_dct_adst_1_8bpc_ssse3: 89.2
      inv_txfm_add_8x4_dct_dct_0_8bpc_c: 1186.0
      inv_txfm_add_8x4_dct_dct_0_8bpc_ssse3: 26.3
      inv_txfm_add_8x4_dct_dct_1_8bpc_c: 1172.2
      inv_txfm_add_8x4_dct_dct_1_8bpc_ssse3: 78.8
      inv_txfm_add_8x4_dct_flipadst_0_8bpc_c: 1154.7
      inv_txfm_add_8x4_dct_flipadst_0_8bpc_ssse3: 29.1
      inv_txfm_add_8x4_dct_flipadst_1_8bpc_c: 1150.2
      inv_txfm_add_8x4_dct_flipadst_1_8bpc_ssse3: 92.2
      inv_txfm_add_8x4_dct_identity_0_8bpc_c: 1078.7
      inv_txfm_add_8x4_dct_identity_0_8bpc_ssse3: 29.2
      inv_txfm_add_8x4_dct_identity_1_8bpc_c: 1090.1
      inv_txfm_add_8x4_dct_identity_1_8bpc_ssse3: 72.2
      inv_txfm_add_8x4_flipadst_adst_0_8bpc_c: 1111.6
      inv_txfm_add_8x4_flipadst_adst_0_8bpc_ssse3: 108.6
      inv_txfm_add_8x4_flipadst_adst_1_8bpc_c: 1112.1
      inv_txfm_add_8x4_flipadst_adst_1_8bpc_ssse3: 107.6
      inv_txfm_add_8x4_flipadst_dct_0_8bpc_c: 1163.0
      inv_txfm_add_8x4_flipadst_dct_0_8bpc_ssse3: 98.3
      inv_txfm_add_8x4_flipadst_dct_1_8bpc_c: 1160.0
      inv_txfm_add_8x4_flipadst_dct_1_8bpc_ssse3: 99.6
      inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_c: 1137.9
      inv_txfm_add_8x4_flipadst_flipadst_0_8bpc_ssse3: 112.0
      inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_c: 1140.0
      inv_txfm_add_8x4_flipadst_flipadst_1_8bpc_ssse3: 112.0
      inv_txfm_add_8x4_flipadst_identity_0_8bpc_c: 1057.2
      inv_txfm_add_8x4_flipadst_identity_0_8bpc_ssse3: 88.1
      inv_txfm_add_8x4_flipadst_identity_1_8bpc_c: 1058.3
      inv_txfm_add_8x4_flipadst_identity_1_8bpc_ssse3: 87.1
      inv_txfm_add_8x4_identity_adst_0_8bpc_c: 794.0
      inv_txfm_add_8x4_identity_adst_0_8bpc_ssse3: 60.6
      inv_txfm_add_8x4_identity_adst_1_8bpc_c: 793.4
      inv_txfm_add_8x4_identity_adst_1_8bpc_ssse3: 60.6
      inv_txfm_add_8x4_identity_dct_0_8bpc_c: 838.4
      inv_txfm_add_8x4_identity_dct_0_8bpc_ssse3: 27.4
      inv_txfm_add_8x4_identity_dct_1_8bpc_c: 838.5
      inv_txfm_add_8x4_identity_dct_1_8bpc_ssse3: 52.0
      inv_txfm_add_8x4_identity_flipadst_0_8bpc_c: 825.3
      inv_txfm_add_8x4_identity_flipadst_0_8bpc_ssse3: 66.7
      inv_txfm_add_8x4_identity_flipadst_1_8bpc_c: 831.7
      inv_txfm_add_8x4_identity_flipadst_1_8bpc_ssse3: 66.7
      inv_txfm_add_8x4_identity_identity_0_8bpc_c: 768.6
      inv_txfm_add_8x4_identity_identity_0_8bpc_ssse3: 40.0
      inv_txfm_add_8x4_identity_identity_1_8bpc_c: 743.3
      inv_txfm_add_8x4_identity_identity_1_8bpc_ssse3: 39.9
  3. 20 Dec, 2018 6 commits
  4. 19 Dec, 2018 1 commit
  5. 18 Dec, 2018 7 commits
  6. 17 Dec, 2018 4 commits
  7. 15 Dec, 2018 1 commit
    • intrabc: use visible width/height in mv correction · 7677c120
      Janne Grunau authored
      Prevents adjusting intra block copy motion vectors to values that
      point outside the current tile. This happens with partially visible
      blocks in a tile that is one superblock wide or high. Fixes a use of
      an uninitialized value in inv_txfm_add_c() with
      clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5746740678885376.
      Credits to oss-fuzz.
  8. 14 Dec, 2018 1 commit
    • Rewrite inverse transforms to prevent integer overflows · 6a10a981
      Ronald S. Bultje authored
      The basic idea is that with intermediates of 19+sign bits and
      multipliers of 12+sign bits, the products are 19+12=31+sign bits,
      and adding two of these together can overflow, which is UB in C.
      Streams that trigger this are not valid AV1, but they are codable,
      so although we don't particularly care about the pixel-level
      output for such streams, we do want to prevent triggering UB,
      since that could be considered a security vulnerability.
      
      To resolve this, we clip all multipliers to 11 bits by inverting
      them:
      
      (a * constant_1 + b * constant_2 + 2048) >> 12, where
      constant_1 < 2048 but constant_2 >= 2048, is identical to:
      ((a * constant_1 - b * (4096 - constant_2) + 2048) >> 12) + b,
      since b * constant_2 = 4096 * b - b * (4096 - constant_2), and
      4096 - constant_2 < 2048. In other places, where both constants
      are a multiple of 2, we can halve both and round/shift by 11
      instead of 12.
      
      Do this in dct4,8,16,32,64 as well as adst8,16. Also slightly
      simplify the final phase of idct64_1d by moving the add/sub to
      before the multiply.
      
      The adst4 is rewritten in the shape of a matrix multiply, and the
      same idea is then applied to all 4 multipliers in the matrix,
      since the sum of all 4 multipliers is still under 4096 in all
      cases.
      
      Fixes clusterfuzz-testcase-minimized-dav1d_fuzzer-5709759466962944,
      credits to oss-fuzz. Also fixes #223.
  9. 13 Dec, 2018 1 commit
  10. 12 Dec, 2018 3 commits
  11. 10 Dec, 2018 5 commits
  12. 09 Dec, 2018 2 commits
  13. 08 Dec, 2018 6 commits