x86: Add 8-bit ipred z1 AVX-512 (Ice Lake) asm

Henrik Gramner requested to merge gramner/dav1d:ipred_z1_8bpc_avx512icl into master

With AVX-512 we can now keep all the data in registers and use vpermb/vpermi2b to load the input pixels instead of having to do memory loads from a stack buffer.
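To illustrate the idea, here is a minimal scalar C sketch of the byte-permute semantics. It is not the assembly from this merge request. The function names (`vpermb_model`, `vpermi2b_model`, `permute_matches_memory_loads`) and the 128-byte edge layout are assumptions made for the example. It models `vpermb` as a 64-entry byte table lookup and `vpermi2b` as a 128-entry lookup spanning two registers, then checks that gathering a row of pixels at an arbitrary offset gives the same bytes as a load from a buffer in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of VPERMB on a 64-byte register:
 * each output byte is picked from src using the low 6 bits of its index. */
static void vpermb_model(uint8_t out[64], const uint8_t idx[64],
                         const uint8_t src[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = src[idx[i] & 63];
}

/* Scalar model of VPERMI2B on 64-byte registers:
 * bit 6 of each index selects the table (0 = a, 1 = b),
 * bits 5:0 select the byte within that table. */
static void vpermi2b_model(uint8_t out[64], const uint8_t idx[64],
                           const uint8_t a[64], const uint8_t b[64])
{
    for (int i = 0; i < 64; i++) {
        const uint8_t *tbl = (idx[i] & 64) ? b : a;
        out[i] = tbl[idx[i] & 63];
    }
}

/* Hypothetical check: a 128-byte edge held in two "registers" (lo, hi)
 * can serve any 64-pixel row starting at offset base = 0..64 without
 * touching memory. Returns 1 if every row matches a plain memory load. */
int permute_matches_memory_loads(void)
{
    uint8_t edge[128], lo[64], hi[64];

    for (int i = 0; i < 128; i++)
        edge[i] = (uint8_t)(i * 7 + 3); /* arbitrary test pattern */
    memcpy(lo, edge, 64);
    memcpy(hi, edge + 64, 64);

    for (int base = 0; base <= 64; base++) {
        uint8_t idx[64], from_regs[64], one_reg[64];

        assert(base + 63 < 128); /* indices must fit in 7 bits */
        for (int x = 0; x < 64; x++)
            idx[x] = (uint8_t)(base + x);

        /* Two-register lookup replaces the load from edge + base. */
        vpermi2b_model(from_regs, idx, lo, hi);
        if (memcmp(from_regs, edge + base, 64))
            return 0;

        /* With base == 0 every index stays below 64, so a
         * single-register VPERMB is sufficient. */
        if (base == 0) {
            vpermb_model(one_reg, idx, lo);
            if (memcmp(one_reg, edge, 64))
                return 0;
        }
    }
    return 1;
}
```

Calling `permute_matches_memory_loads()` from a test program returns 1 when the register-based lookups agree with the memory loads for every offset. In the real instructions the index vector would be computed per row from the prediction angle; the sketch uses a plain contiguous offset to keep the comparison easy to follow.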

intra_pred_z1_w4_8bpc_c:             204.5 ( 1.00x)
intra_pred_z1_w4_8bpc_ssse3:          37.6 ( 5.43x)
intra_pred_z1_w4_8bpc_avx2:           37.8 ( 5.41x)
intra_pred_z1_w4_8bpc_avx512icl:      23.5 ( 8.70x)

intra_pred_z1_w8_8bpc_c:             342.5 ( 1.00x)
intra_pred_z1_w8_8bpc_ssse3:          58.1 ( 5.90x)
intra_pred_z1_w8_8bpc_avx2:           40.9 ( 8.38x)
intra_pred_z1_w8_8bpc_avx512icl:      27.5 (12.44x)

intra_pred_z1_w16_8bpc_c:           1162.9 ( 1.00x)
intra_pred_z1_w16_8bpc_ssse3:        124.8 ( 9.31x)
intra_pred_z1_w16_8bpc_avx2:          79.5 (14.63x)
intra_pred_z1_w16_8bpc_avx512icl:     45.1 (25.76x)

intra_pred_z1_w32_8bpc_c:           1927.0 ( 1.00x)
intra_pred_z1_w32_8bpc_ssse3:        254.9 ( 7.56x)
intra_pred_z1_w32_8bpc_avx2:         160.5 (12.01x)
intra_pred_z1_w32_8bpc_avx512icl:    110.3 (17.47x)

intra_pred_z1_w64_8bpc_c:           2962.4 ( 1.00x)
intra_pred_z1_w64_8bpc_ssse3:        504.7 ( 5.87x)
intra_pred_z1_w64_8bpc_avx2:         284.4 (10.42x)
intra_pred_z1_w64_8bpc_avx512icl:    245.8 (12.05x)
