
x86: Add 10bpc 8x32/32x8 itx AVX-512 (Ice Lake) asm

Henrik Gramner requested to merge gramner/dav1d:itx16_8x32_avx512icl into master

32x8 benefits more than 8x32, which is expected and in line with previous functions, as width-8 IDCTs are limited to YMM registers in the 2nd pass.

inv_txfm_add_8x32_dct_dct_0_10bpc_avx2:                36.5
inv_txfm_add_8x32_dct_dct_0_10bpc_avx512icl:           33.0
inv_txfm_add_8x32_dct_dct_1_10bpc_avx2:               163.2
inv_txfm_add_8x32_dct_dct_1_10bpc_avx512icl:          155.8
inv_txfm_add_8x32_dct_dct_2_10bpc_avx2:               199.5
inv_txfm_add_8x32_dct_dct_2_10bpc_avx512icl:          185.2
inv_txfm_add_8x32_dct_dct_4_10bpc_avx2:               285.5
inv_txfm_add_8x32_dct_dct_4_10bpc_avx512icl:          227.2

inv_txfm_add_32x8_dct_dct_0_10bpc_avx2:                21.9
inv_txfm_add_32x8_dct_dct_0_10bpc_avx512icl:           18.7
inv_txfm_add_32x8_dct_dct_1_10bpc_avx2:               348.5
inv_txfm_add_32x8_dct_dct_1_10bpc_avx512icl:          219.3
inv_txfm_add_32x8_dct_dct_2_10bpc_avx2:               348.5
inv_txfm_add_32x8_dct_dct_2_10bpc_avx512icl:          259.6
inv_txfm_add_32x8_dct_dct_4_10bpc_avx2:               348.5
inv_txfm_add_32x8_dct_dct_4_10bpc_avx512icl:          281.9

inv_txfm_add_8x32_identity_identity_2_10bpc_avx2:      28.4
inv_txfm_add_8x32_identity_identity_2_10bpc_avx512icl: 23.3
inv_txfm_add_8x32_identity_identity_4_10bpc_avx2:      53.2
inv_txfm_add_8x32_identity_identity_4_10bpc_avx512icl: 47.9

inv_txfm_add_32x8_identity_identity_2_10bpc_avx2:      30.4
inv_txfm_add_32x8_identity_identity_2_10bpc_avx512icl: 25.2
inv_txfm_add_32x8_identity_identity_4_10bpc_avx2:      51.9
inv_txfm_add_32x8_identity_identity_4_10bpc_avx512icl: 40.6
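
The relative gains can be read off the listing more easily as ratios. The following is a minimal sketch, not part of the merge request: it copies the timings above into a table and divides the AVX2 figure by the AVX-512 figure. The names `TIMINGS`, `speedup` and `speedups` are hypothetical, and it assumes that lower numbers are better, as is usual for benchmark timings of this kind.

```python
# Timings copied from the listing above as (avx2, avx512icl) pairs.
# Keys drop the common "inv_txfm_add_" prefix and "_10bpc" suffix.
TIMINGS = {
    "8x32_dct_dct_0": (36.5, 33.0),
    "8x32_dct_dct_1": (163.2, 155.8),
    "8x32_dct_dct_2": (199.5, 185.2),
    "8x32_dct_dct_4": (285.5, 227.2),
    "32x8_dct_dct_0": (21.9, 18.7),
    "32x8_dct_dct_1": (348.5, 219.3),
    "32x8_dct_dct_2": (348.5, 259.6),
    "32x8_dct_dct_4": (348.5, 281.9),
    "8x32_identity_identity_2": (28.4, 23.3),
    "8x32_identity_identity_4": (53.2, 47.9),
    "32x8_identity_identity_2": (30.4, 25.2),
    "32x8_identity_identity_4": (51.9, 40.6),
}


def speedup(avx2, avx512icl):
    """Return avx2 / avx512icl; a value above 1.0 means AVX-512 is faster
    (assumption: lower timing is better)."""
    return avx2 / avx512icl


def speedups(timings):
    """Map each function name to its AVX2-to-AVX-512 timing ratio."""
    return {name: speedup(*pair) for name, pair in timings.items()}


if __name__ == "__main__":
    for name, ratio in speedups(TIMINGS).items():
        print(f"{name:28s} {ratio:.2f}x")
```

For example, the ratio for 32x8 dct_dct_1 comes out at roughly 1.59, against roughly 1.05 for 8x32 dct_dct_1.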
