1. 10 Dec, 2018 7 commits
  2. 09 Dec, 2018 2 commits
  3. 08 Dec, 2018 9 commits
  4. 07 Dec, 2018 5 commits
  5. 06 Dec, 2018 4 commits
      Fix mc.avg/w_avg/mask for x86-32 · d3a1ebad
      Ronald S. Bultje authored
      Special w=4/8 cases · da5a5df8
      Ronald S. Bultje authored
      Add SSSE3 implementation for the 4x4 blocks in itx · 87a377e9
      Liwei Wang authored
      Cycle times:
      inv_txfm_add_4x4_adst_adst_0_8bpc_c: 445.9
      inv_txfm_add_4x4_adst_adst_0_8bpc_ssse3: 23.7
      inv_txfm_add_4x4_adst_adst_1_8bpc_c: 443.7
      inv_txfm_add_4x4_adst_adst_1_8bpc_ssse3: 52.6
      inv_txfm_add_4x4_adst_dct_0_8bpc_c: 474.5
      inv_txfm_add_4x4_adst_dct_0_8bpc_ssse3: 23.9
      inv_txfm_add_4x4_adst_dct_1_8bpc_c: 482.0
      inv_txfm_add_4x4_adst_dct_1_8bpc_ssse3: 51.1
      inv_txfm_add_4x4_adst_flipadst_0_8bpc_c: 587.2
      inv_txfm_add_4x4_adst_flipadst_0_8bpc_ssse3: 24.0
      inv_txfm_add_4x4_adst_flipadst_1_8bpc_c: 457.2
      inv_txfm_add_4x4_adst_flipadst_1_8bpc_ssse3: 52.8
      inv_txfm_add_4x4_adst_identity_0_8bpc_c: 412.4
      inv_txfm_add_4x4_adst_identity_0_8bpc_ssse3: 43.3
      inv_txfm_add_4x4_adst_identity_1_8bpc_c: 412.0
      inv_txfm_add_4x4_adst_identity_1_8bpc_ssse3: 43.3
      inv_txfm_add_4x4_dct_adst_0_8bpc_c: 467.4
      inv_txfm_add_4x4_dct_adst_0_8bpc_ssse3: 23.2
      inv_txfm_add_4x4_dct_adst_1_8bpc_c: 588.3
      inv_txfm_add_4x4_dct_adst_1_8bpc_ssse3: 48.6
      inv_txfm_add_4x4_dct_dct_0_8bpc_c: 611.5
      inv_txfm_add_4x4_dct_dct_0_8bpc_ssse3: 23.1
      inv_txfm_add_4x4_dct_dct_1_8bpc_c: 576.2
      inv_txfm_add_4x4_dct_dct_1_8bpc_ssse3: 47.6
      inv_txfm_add_4x4_dct_flipadst_0_8bpc_c: 479.5
      inv_txfm_add_4x4_dct_flipadst_0_8bpc_ssse3: 23.4
      inv_txfm_add_4x4_dct_flipadst_1_8bpc_c: 549.3
      inv_txfm_add_4x4_dct_flipadst_1_8bpc_ssse3: 48.3
      inv_txfm_add_4x4_dct_identity_0_8bpc_c: 576.9
      inv_txfm_add_4x4_dct_identity_0_8bpc_ssse3: 25.4
      inv_txfm_add_4x4_dct_identity_1_8bpc_c: 610.7
      inv_txfm_add_4x4_dct_identity_1_8bpc_ssse3: 25.1
      inv_txfm_add_4x4_flipadst_adst_0_8bpc_c: 532.8
      inv_txfm_add_4x4_flipadst_adst_0_8bpc_ssse3: 23.8
      inv_txfm_add_4x4_flipadst_adst_1_8bpc_c: 666.7
      inv_txfm_add_4x4_flipadst_adst_1_8bpc_ssse3: 61.0
      inv_txfm_add_4x4_flipadst_dct_0_8bpc_c: 539.6
      inv_txfm_add_4x4_flipadst_dct_0_8bpc_ssse3: 23.8
      inv_txfm_add_4x4_flipadst_dct_1_8bpc_c: 484.6
      inv_txfm_add_4x4_flipadst_dct_1_8bpc_ssse3: 51.1
      inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_c: 503.1
      inv_txfm_add_4x4_flipadst_flipadst_0_8bpc_ssse3: 23.9
      inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_c: 463.0
      inv_txfm_add_4x4_flipadst_flipadst_1_8bpc_ssse3: 54.0
      inv_txfm_add_4x4_flipadst_identity_0_8bpc_c: 719.9
      inv_txfm_add_4x4_flipadst_identity_0_8bpc_ssse3: 43.0
      inv_txfm_add_4x4_flipadst_identity_1_8bpc_c: 456.8
      inv_txfm_add_4x4_flipadst_identity_1_8bpc_ssse3: 44.1
      inv_txfm_add_4x4_identity_adst_0_8bpc_c: 422.8
      inv_txfm_add_4x4_identity_adst_0_8bpc_ssse3: 42.4
      inv_txfm_add_4x4_identity_adst_1_8bpc_c: 417.1
      inv_txfm_add_4x4_identity_adst_1_8bpc_ssse3: 42.3
      inv_txfm_add_4x4_identity_dct_0_8bpc_c: 435.4
      inv_txfm_add_4x4_identity_dct_0_8bpc_ssse3: 25.7
      inv_txfm_add_4x4_identity_dct_1_8bpc_c: 434.1
      inv_txfm_add_4x4_identity_dct_1_8bpc_ssse3: 25.3
      inv_txfm_add_4x4_identity_flipadst_0_8bpc_c: 528.1
      inv_txfm_add_4x4_identity_flipadst_0_8bpc_ssse3: 40.9
      inv_txfm_add_4x4_identity_flipadst_1_8bpc_c: 720.0
      inv_txfm_add_4x4_identity_flipadst_1_8bpc_ssse3: 41.8
      inv_txfm_add_4x4_identity_identity_0_8bpc_c: 383.2
      inv_txfm_add_4x4_identity_identity_0_8bpc_ssse3: 28.3
      inv_txfm_add_4x4_identity_identity_1_8bpc_c: 378.9
      inv_txfm_add_4x4_identity_identity_1_8bpc_ssse3: 28.2
      inv_txfm_add_4x4_wht_wht_0_8bpc_c: 271.5
      inv_txfm_add_4x4_wht_wht_0_8bpc_ssse3: 34.0
      inv_txfm_add_4x4_wht_wht_1_8bpc_c: 266.0
      inv_txfm_add_4x4_wht_wht_1_8bpc_ssse3: 33.9
      Add SSSE3 implementation for dav1d_ipred_h · 6f2f0188
      Xuefeng Jiang authored
      Cycle times:
      intra_pred_h_w4_8bpc_c: 146.6
      intra_pred_h_w4_8bpc_ssse3: 30.6
      intra_pred_h_w8_8bpc_c: 236.3
      intra_pred_h_w8_8bpc_ssse3: 42.2
      intra_pred_h_w16_8bpc_c: 446.6
      intra_pred_h_w16_8bpc_ssse3: 55.8
      intra_pred_h_w32_8bpc_c: 688.2
      intra_pred_h_w32_8bpc_ssse3: 85.9
      intra_pred_h_w64_8bpc_c: 634.2
      intra_pred_h_w64_8bpc_ssse3: 169.2
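For context, horizontal intra prediction fills every row of the block with the pixel immediately to its left. The scalar sketch below is only an illustration of that operation, not the dav1d C reference or the SSSE3 code from this commit; the function name `ipred_h_sketch` and the `left[]` layout (one left neighbour per row, top to bottom) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar sketch: fill each row of a w x h 8-bit block with
 * the pixel to its left. left[y] is assumed to hold the left neighbour
 * of row y; dav1d's real edge-buffer layout may differ. */
void ipred_h_sketch(uint8_t *dst, ptrdiff_t stride,
                    const uint8_t *left, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        /* Broadcast one byte across the whole row. */
        memset(dst, left[y], (size_t)w);
        dst += stride;
    }
}
```

A SIMD version would typically replace the `memset` with a register-wide byte broadcast followed by full-width stores, which fits the pattern in the figures above, where cycle counts grow far more slowly with width than in the C version.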
  6. 05 Dec, 2018 6 commits
  7. 04 Dec, 2018 4 commits
  8. 03 Dec, 2018 3 commits
      film_grain: limit overlapped pixels to block boundaries · 1e9c428a
      Janne Grunau authored
      Fixes #210.
      film_grain: copy unmodified planes before applying noise · 62dd32c4
      Janne Grunau authored
      The luma output plane is used during chroma film grain. Fixes a use of an
      uninitialized value in iclip/apply_to_row_uv with
      clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5636143299690496. Credits
      to oss-fuzz.
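The idea behind the fix can be sketched as follows: if a plane receives no noise, its pixels still have to reach the output picture before any later stage reads them, otherwise the chroma pass reads memory that was never written. The helper below is a minimal illustration under that reading of the commit message, not the code from the commit; the name `copy_plane_sketch` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a w x h 8-bit plane row by row, so that the
 * source and destination pictures may use different strides. */
void copy_plane_sketch(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        /* Copy only the visible width, not the stride padding. */
        memcpy(dst, src, (size_t)w);
        dst += dst_stride;
        src += src_stride;
    }
}
```

Calling such a helper for every unmodified plane before the noise stage would guarantee that the output luma is initialized when chroma film grain reads it.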
      Make per-width versions of cfl_ac · 70fb01d8
      Ronald S. Bultje authored
      Also use aligned reads and writes in sub_loop, and integrate sum_loop into
      the main loop.
      
      before:
      cfl_ac_420_w4_8bpc_c: 367.4
      cfl_ac_420_w4_8bpc_avx2: 72.8
      cfl_ac_420_w8_8bpc_c: 621.6
      cfl_ac_420_w8_8bpc_avx2: 85.1
      cfl_ac_420_w16_8bpc_c: 983.4
      cfl_ac_420_w16_8bpc_avx2: 141.0
      
      after:
      cfl_ac_420_w4_8bpc_c: 376.2
      cfl_ac_420_w4_8bpc_avx2: 28.5
      cfl_ac_420_w8_8bpc_c: 607.2
      cfl_ac_420_w8_8bpc_avx2: 29.9
      cfl_ac_420_w16_8bpc_c: 962.1
      cfl_ac_420_w16_8bpc_avx2: 48.8
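To illustrate what integrating the sum into the main loop means, here is a simplified scalar sketch of a 4:2:0 chroma-from-luma AC computation. It is an assumption-laden illustration, not dav1d's C reference or the AVX2 code: the name `cfl_ac_420_sketch` is hypothetical, `w` and `h` are the subsampled output dimensions, edge padding is ignored, and the average uses plain rounded division where a real implementation would likely use a shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: subsample 4:2:0 luma into ac[] (w x h entries,
 * packed), then subtract the rounded average so the result is
 * (approximately) zero-mean. */
void cfl_ac_420_sketch(int16_t *ac, const uint8_t *luma,
                       ptrdiff_t stride, int w, int h)
{
    assert(w > 0 && h > 0);
    int sum = 0;
    int16_t *row = ac;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* Sum of a 2x2 luma block, doubled (8x the block mean). */
            const int v = (luma[2 * x] + luma[2 * x + 1] +
                           luma[2 * x + stride] +
                           luma[2 * x + 1 + stride]) << 1;
            row[x] = (int16_t)v;
            sum += v; /* accumulated here instead of in a separate pass */
        }
        row += w;
        luma += 2 * stride;
    }
    const int n = w * h;
    const int avg = (sum + n / 2) / n; /* rounded mean of the ac values */
    for (int i = 0; i < n; i++)
        ac[i] = (int16_t)(ac[i] - avg);
}
```

Accumulating `sum` while the subsampled values are being produced avoids one extra read of the whole `ac` buffer. Per-width versions would additionally let each width use a fixed loop shape, which may account for part of the reduction shown in the "after" figures.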