1. 15 Dec, 2018 3 commits
  2. 14 Dec, 2018 1 commit
      Rewrite inverse transforms to prevent integer overflows · 6a10a981
      Ronald S. Bultje authored
      The basic idea is that with intermediates of 19+sign bits and
      multipliers of 12+sign bits, the products are 19+12=31+sign
      bits, and adding two of these together can overflow a 32-bit
      int, which is UB in C. Streams that trigger this are not valid
      AV1 streams, but they are codable, and so although we don't
      particularly care about the pixel-level output for such streams,
      we do want to prevent triggering UB, since that could be
      considered a security vulnerability.
      
      To resolve this, we limit all multipliers to 11 bits by
      inverting them:
      
      (a * constant_1 + b * constant_2 + 2048) >> 12, where
      constant_1 < 2048 but constant_2 >= 2048, is identical to:
      ((a * constant_1 + b * (constant_2 - 4096) + 2048) >> 12) + b,
      and the magnitude of constant_2 - 4096 is at most 2048. In other
      places, where both constants are multiples of 2, we can halve
      both and round/shift by 11 instead of 12.
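
The two rewrites described above can be sketched in C as follows. This is a minimal illustration, not the code from the commit: the function names are hypothetical, and the constants 1567, 3784 and 2896 are chosen only as examples of values below 2048, at or above 2048, and even. It assumes inputs of at most 19+sign bits and that `>>` on a negative int is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: evaluate (a*c1 + b*c2 + 2048) >> 12 in 64 bits, so the
 * sum cannot overflow for 19+sign-bit inputs. */
static int32_t mul2_ref(int32_t a, int32_t b, int c1, int c2)
{
    return (int32_t)(((int64_t)a * c1 + (int64_t)b * c2 + 2048) >> 12);
}

/* Folded form for c1 < 2048 <= c2 < 4096: multiply b by (c2 - 4096),
 * whose magnitude is at most 2048, then add b back after the shift.
 * Because b * c2 == b * (c2 - 4096) + 4096 * b, the result is the same,
 * but the 32-bit sum now stays below 2^31 in magnitude. */
static int32_t mul2_folded(int32_t a, int32_t b, int c1, int c2)
{
    return ((a * c1 + b * (c2 - 4096) + 2048) >> 12) + b;
}

/* Halved form for even c1 and c2: halve both constants and
 * round/shift by 11 instead of 12. */
static int32_t mul2_halved(int32_t a, int32_t b, int c1, int c2)
{
    return (a * (c1 / 2) + b * (c2 / 2) + 1024) >> 11;
}

/* Compare both rewritten forms against the 64-bit reference, using
 * the example constants only. */
static void check_forms(int32_t a, int32_t b)
{
    assert(mul2_folded(a, b, 1567, 3784) == mul2_ref(a, b, 1567, 3784));
    assert(mul2_halved(a, b, 2896, 2896) == mul2_ref(a, b, 2896, 2896));
}
```

Calling `check_forms` with values near the 19-bit limits, such as `check_forms(524287, -524287)`, exercises the range where the unmodified 32-bit sum could overflow.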
      
      Do this in dct4,8,16,32,64 as well as adst8,16. Also slightly
      simplify the final phase of idct64_1d by moving the add/sub to
      before the multiply.
      
      The adst4 is rewritten to be shaped like a matrix multiply, and
      then uses the same idea on all 4 multipliers in the matrix, since
      the sum of all 4 multipliers is still under 4096 in all cases.
      
      Fixes clusterfuzz-testcase-minimized-dav1d_fuzzer-5709759466962944,
      credits to oss-fuzz. Also fixes #223.
  3. 13 Dec, 2018 2 commits
  4. 12 Dec, 2018 4 commits
  5. 11 Dec, 2018 3 commits
  6. 10 Dec, 2018 7 commits
  7. 09 Dec, 2018 2 commits
  8. 08 Dec, 2018 9 commits
  9. 07 Dec, 2018 5 commits
  10. 06 Dec, 2018 4 commits
      Fix mc.avg/w_avg/mask for x86-32 · d3a1ebad
      Ronald S. Bultje authored
      Special w=4/8 cases · da5a5df8
      Ronald S. Bultje authored
      Add SSSE3 implementation for the 4x4 blocks in itx · 87a377e9
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_4x4_adst_adst_0_8bpc_c: 445.9
      inv_txfm_add_4x4_adst_adst_0_8bpc_ssse3: 23.7
      inv_txfm_add_4x4_adst_adst_1_8bpc_c: 443.7
      inv_txfm_add_4x4_adst_adst_1_8bpc_ssse3: 52.6
      inv_txfm_add_4x4_adst_dct_0_8bpc_c: 474.5
      inv_txfm_add_4x4_adst_dct_0_8bpc_ssse3: 23.9
      inv_txfm_add_4x4_adst_dct_1_8bpc_c: 482.0
      inv_txfm_add_4x4_adst_dct_1_8bpc_ssse3: 51.1
      inv_txfm_add_4x4_adst_flipadst_0_8bpc_c: 587.2
      inv_txfm_add_4x4_adst_flipadst_0_8bpc_ssse3: 24.0
      inv_txfm_add_4x4_adst_flipadst_1_8bpc_c: 457.2
      inv_txfm_add_4x4_adst_flipadst_1_8bpc_ssse3: 52.8
      inv_txfm_add_4x4_adst_identity_0_8bpc_c: 412.4
      inv_txfm_add_4x4_adst_identity_0_8bpc_ssse3: 43.3
      inv_txfm_add_4x4_adst_identity_1_8bpc_c: 412.0
      inv_txfm_add_4x4_adst_identity_1_8bpc_ssse3: 43.3
      inv_txfm_add_4x4_dct_adst_0_8bpc_c: 467.4
      inv_txfm_add_4x4_dct_adst_0_8bpc_ssse3: 23.2
      inv_txfm_add_4x4_dct_adst_1_8bpc_c: 588.3
      inv_txfm_add_4x4_dct_adst_1_8bpc_ssse3: 48.6
      inv_txfm_add_4x4_dct_dct_0_8bpc_c: 611.5
      inv_txfm_add_4x4_dct_dct_0_8bpc_ssse3: 23.1
      inv_txfm_add_4x4_dct_dct_1_8bpc_c: 576.2
      inv_txfm_add_4x4_dct_dct_1_8bpc_ssse3: 47.6
      inv_txfm_add_4x4_dct_flipadst_0_8bpc_c: 479.5
      inv_txfm_add_4x4_dct_flipadst_0_8bpc_ssse3: 23.4
      inv_txfm_add_4x4_dct_flipadst_1_8bpc_c: 549.3
      inv_txfm_add_4x4_dct_flipadst_1_8bpc_ssse3: 48.3
      inv_txfm_add_4x4_dct_identity_0_8bpc_c: 576.9
      inv_txfm_add_4x4_dct_identity_0_8bpc_ssse3: 25.4
      inv_txfm_add_4x4_dct_identity_1_8bpc_c: 610.7
      inv_txfm_add_4x4_dct_identity_1_8bpc_ssse3: 25.1
      inv_txfm_add_4x4_flipadst_adst_0_8bpc_c: 532.8
      inv_txfm_add_4x4_flipadst_adst_0_8bpc_ssse3: 23.8
      inv_txfm_add_4x4_flipadst_adst_1_8bpc_c: 666.7
      inv_txfm_add_4x4_flipadst_adst_1_8bpc_ssse3: 61.0
      inv_txfm_add_4x4_flipadst_dct_0_8bpc_c: 539.6
      inv_txfm_add_4x4_flipadst_dct_0_8bpc_ssse3: 23.8
      inv_txfm_add_4x4_flipadst_dct_1_8bpc_c: 484.6
      inv_txfm_add_4x4_flipadst_dct_1_8bpc_ssse3: 51.1
      inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_c: 503.1
      inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_ssse3: 23.9
      inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_c: 463.0
      inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_ssse3: 54.0
      inv_txfm_add_4x4_flipadst_identity_0_8bpc_c: 719.9
      inv_txfm_add_4x4_flipadst_identity_0_8bpc_ssse3: 43.0
      inv_txfm_add_4x4_flipadst_identity_1_8bpc_c: 456.8
      inv_txfm_add_4x4_flipadst_identity_1_8bpc_ssse3: 44.1
      inv_txfm_add_4x4_identity_adst_0_8bpc_c: 422.8
      inv_txfm_add_4x4_identity_adst_0_8bpc_ssse3: 42.4
      inv_txfm_add_4x4_identity_adst_1_8bpc_c: 417.1
      inv_txfm_add_4x4_identity_adst_1_8bpc_ssse3: 42.3
      inv_txfm_add_4x4_identity_dct_0_8bpc_c: 435.4
      inv_txfm_add_4x4_identity_dct_0_8bpc_ssse3: 25.7
      inv_txfm_add_4x4_identity_dct_1_8bpc_c: 434.1
      inv_txfm_add_4x4_identity_dct_1_8bpc_ssse3: 25.3
      inv_txfm_add_4x4_identity_flipadst_0_8bpc_c: 528.1
      inv_txfm_add_4x4_identity_flipadst_0_8bpc_ssse3: 40.9
      inv_txfm_add_4x4_identity_flipadst_1_8bpc_c: 720.0
      inv_txfm_add_4x4_identity_flipadst_1_8bpc_ssse3: 41.8
      inv_txfm_add_4x4_identity_identity_0_8bpc_c: 383.2
      inv_txfm_add_4x4_identity_identity_0_8bpc_ssse3: 28.3
      inv_txfm_add_4x4_identity_identity_1_8bpc_c: 378.9
      inv_txfm_add_4x4_identity_identity_1_8bpc_ssse3: 28.2
      inv_txfm_add_4x4_wht_wht_0_8bpc_c: 271.5
      inv_txfm_add_4x4_wht_wht_0_8bpc_ssse3: 34.0
      inv_txfm_add_4x4_wht_wht_1_8bpc_c: 266.0
      inv_txfm_add_4x4_wht_wht_1_8bpc_ssse3: 33.9
      Add SSSE3 implementation for dav1d_ipred_h · 6f2f0188
      Xuefeng Jiang authored
      Cycle times:
      intra_pred_h_w4_8bpc_c: 146.6
      intra_pred_h_w4_8bpc_ssse3: 30.6
      intra_pred_h_w8_8bpc_c: 236.3
      intra_pred_h_w8_8bpc_ssse3: 42.2
      intra_pred_h_w16_8bpc_c: 446.6
      intra_pred_h_w16_8bpc_ssse3: 55.8
      intra_pred_h_w32_8bpc_c: 688.2
      intra_pred_h_w32_8bpc_ssse3: 85.9
      intra_pred_h_w64_8bpc_c: 634.2
      intra_pred_h_w64_8bpc_ssse3: 169.2