
Add SSSE3 implementation for the {16, 32, 64}x64 and 64x{16, 32} blocks in itx (inverse transform)

Liwei Wang requested to merge liwei/dav1d:x86_itx_64_ssse3 into master

Cycle times:

inv_txfm_add_16x64_dct_dct_0_8bpc_c: 3973.5
inv_txfm_add_16x64_dct_dct_0_8bpc_ssse3: 185.7
inv_txfm_add_16x64_dct_dct_1_8bpc_c: 37869.1
inv_txfm_add_16x64_dct_dct_1_8bpc_ssse3: 2103.1
inv_txfm_add_16x64_dct_dct_2_8bpc_c: 37822.9
inv_txfm_add_16x64_dct_dct_2_8bpc_ssse3: 2099.1
inv_txfm_add_16x64_dct_dct_3_8bpc_c: 37871.7
inv_txfm_add_16x64_dct_dct_3_8bpc_ssse3: 2663.5
inv_txfm_add_16x64_dct_dct_4_8bpc_c: 38002.9
inv_txfm_add_16x64_dct_dct_4_8bpc_ssse3: 2589.7
inv_txfm_add_32x64_dct_dct_0_8bpc_c: 8319.2
inv_txfm_add_32x64_dct_dct_0_8bpc_ssse3: 376.9
inv_txfm_add_32x64_dct_dct_1_8bpc_c: 85956.8
inv_txfm_add_32x64_dct_dct_1_8bpc_ssse3: 4298.1
inv_txfm_add_32x64_dct_dct_2_8bpc_c: 89906.2
inv_txfm_add_32x64_dct_dct_2_8bpc_ssse3: 4291.3
inv_txfm_add_32x64_dct_dct_3_8bpc_c: 83710.9
inv_txfm_add_32x64_dct_dct_3_8bpc_ssse3: 5589.5
inv_txfm_add_32x64_dct_dct_4_8bpc_c: 87733.5
inv_txfm_add_32x64_dct_dct_4_8bpc_ssse3: 5658.4
inv_txfm_add_64x16_dct_dct_0_8bpc_c: 3895.9
inv_txfm_add_64x16_dct_dct_0_8bpc_ssse3: 179.5
inv_txfm_add_64x16_dct_dct_1_8bpc_c: 51375.2
inv_txfm_add_64x16_dct_dct_1_8bpc_ssse3: 3859.2
inv_txfm_add_64x16_dct_dct_2_8bpc_c: 52562.9
inv_txfm_add_64x16_dct_dct_2_8bpc_ssse3: 4044.1
inv_txfm_add_64x16_dct_dct_3_8bpc_c: 51347.0
inv_txfm_add_64x16_dct_dct_3_8bpc_ssse3: 5259.5
inv_txfm_add_64x16_dct_dct_4_8bpc_c: 49642.2
inv_txfm_add_64x16_dct_dct_4_8bpc_ssse3: 4008.4
inv_txfm_add_64x32_dct_dct_0_8bpc_c: 7196.4
inv_txfm_add_64x32_dct_dct_0_8bpc_ssse3: 355.8
inv_txfm_add_64x32_dct_dct_1_8bpc_c: 106588.4
inv_txfm_add_64x32_dct_dct_1_8bpc_ssse3: 4965.3
inv_txfm_add_64x32_dct_dct_2_8bpc_c: 106230.7
inv_txfm_add_64x32_dct_dct_2_8bpc_ssse3: 4772.0
inv_txfm_add_64x32_dct_dct_3_8bpc_c: 107427.0
inv_txfm_add_64x32_dct_dct_3_8bpc_ssse3: 7146.9
inv_txfm_add_64x32_dct_dct_4_8bpc_c: 111785.7
inv_txfm_add_64x32_dct_dct_4_8bpc_ssse3: 7156.2
inv_txfm_add_64x64_dct_dct_0_8bpc_c: 14512.4
inv_txfm_add_64x64_dct_dct_0_8bpc_ssse3: 674.2
inv_txfm_add_64x64_dct_dct_1_8bpc_c: 173246.3
inv_txfm_add_64x64_dct_dct_1_8bpc_ssse3: 8790.8
inv_txfm_add_64x64_dct_dct_2_8bpc_c: 174264.6
inv_txfm_add_64x64_dct_dct_2_8bpc_ssse3: 8767.6
inv_txfm_add_64x64_dct_dct_3_8bpc_c: 170047.3
inv_txfm_add_64x64_dct_dct_3_8bpc_ssse3: 10784.9
inv_txfm_add_64x64_dct_dct_4_8bpc_c: 170182.2
inv_txfm_add_64x64_dct_dct_4_8bpc_ssse3: 10795.6
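The raw cycle counts are easier to compare as C/SSSE3 speedup ratios. Below is a minimal Python sketch that pairs each `_c` line with its `_ssse3` counterpart and divides the cycle counts. It assumes every line has the form `<function>_<isa>: <cycles>` with `<isa>` being `c` or `ssse3`, as in the list above. The helper name `speedups` is hypothetical and is not part of dav1d or its benchmarking tools.

```python
import re

# A few lines copied verbatim from the cycle-time list above.
BENCH = """\
inv_txfm_add_16x64_dct_dct_0_8bpc_c: 3973.5
inv_txfm_add_16x64_dct_dct_0_8bpc_ssse3: 185.7
inv_txfm_add_64x64_dct_dct_4_8bpc_c: 170182.2
inv_txfm_add_64x64_dct_dct_4_8bpc_ssse3: 10795.6
"""

# Assumed line format: "<function>_<isa>: <cycles>", isa in {c, ssse3}.
LINE_RE = re.compile(r"^(\w+)_(c|ssse3):\s*([0-9.]+)$")


def speedups(text):
    """Return {function: c_cycles / ssse3_cycles} for every complete pair."""
    timings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip headers, blank lines, anything unrecognized
        name, isa, cycles = m.group(1), m.group(2), float(m.group(3))
        timings.setdefault(name, {})[isa] = cycles
    # Only report functions that have both a C and an SSSE3 measurement.
    return {
        name: t["c"] / t["ssse3"]
        for name, t in timings.items()
        if "c" in t and "ssse3" in t
    }


if __name__ == "__main__":
    for name, ratio in sorted(speedups(BENCH).items()):
        print(f"{name}: {ratio:.1f}x")
    # inv_txfm_add_16x64_dct_dct_0_8bpc: 21.4x
    # inv_txfm_add_64x64_dct_dct_4_8bpc: 15.8x
```

Pasting the full list into `BENCH` yields one ratio per transform size and variant. Lines without a matching partner are silently dropped rather than reported as errors.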
Edited by Henrik Gramner
