
x86: add AVX512-IceLake implementation of HBD 64x16 DCT^2

Ronald S. Bultje requested to merge rbultje/dav1d:itx-avx512icl-hbd-64x16 into master
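checkasm benchmark results (lower is better; the factor in parentheses is the speedup over the C version):
inv_txfm_add_64x16_dct_dct_0_10bpc_c:            892.0 ( 1.00x)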
inv_txfm_add_64x16_dct_dct_0_10bpc_sse4:         131.5 ( 6.78x)
inv_txfm_add_64x16_dct_dct_0_10bpc_avx2:          63.4 (14.07x)
inv_txfm_add_64x16_dct_dct_0_10bpc_avx512icl:     56.8 (15.71x)
inv_txfm_add_64x16_dct_dct_1_10bpc_c:          29253.7 ( 1.00x)
inv_txfm_add_64x16_dct_dct_1_10bpc_sse4:        1639.7 (17.84x)
inv_txfm_add_64x16_dct_dct_1_10bpc_avx2:        1106.8 (26.43x)
inv_txfm_add_64x16_dct_dct_1_10bpc_avx512icl:    532.9 (54.89x)
inv_txfm_add_64x16_dct_dct_2_10bpc_c:          29249.8 ( 1.00x)
inv_txfm_add_64x16_dct_dct_2_10bpc_sse4:        3065.6 ( 9.54x)
inv_txfm_add_64x16_dct_dct_2_10bpc_avx2:        1791.0 (16.33x)
inv_txfm_add_64x16_dct_dct_2_10bpc_avx512icl:   1108.0 (26.40x)
inv_txfm_add_64x16_dct_dct_3_10bpc_c:          29269.1 ( 1.00x)
inv_txfm_add_64x16_dct_dct_3_10bpc_sse4:        3738.2 ( 7.83x)
inv_txfm_add_64x16_dct_dct_3_10bpc_avx2:        1790.9 (16.34x)
inv_txfm_add_64x16_dct_dct_3_10bpc_avx512icl:   1203.8 (24.31x)
inv_txfm_add_64x16_dct_dct_4_10bpc_c:          29337.7 ( 1.00x)
inv_txfm_add_64x16_dct_dct_4_10bpc_sse4:        3749.7 ( 7.82x)
inv_txfm_add_64x16_dct_dct_4_10bpc_avx2:        1791.0 (16.38x)
inv_txfm_add_64x16_dct_dct_4_10bpc_avx512icl:   1203.8 (24.37x)
