x86: Add high bit-depth ipred z1 AVX-512 (Ice Lake) asm

Henrik Gramner requested to merge gramner/dav1d:ipred_z1_hbd_avx512icl into master

Uses permutes (like the 8bpc AVX-512 implementation) for w4 and w8, and memory loads from a temporary stack buffer (like the AVX2 implementation) for the larger widths, as this combination ended up being the fastest.
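
For readers unfamiliar with z1 prediction, the following is a simplified scalar sketch of the kind of per-row interpolation along the top edge that such a function computes. It is not the dav1d code: the function name `z1_sketch_16bpc`, its parameters, and the pixel-unit stride are hypothetical, and edge filtering and upsampling are left out. The point of interest is the `top[base + x]` access, whose start offset changes per row. In a SIMD version that shifted window presumably has to be obtained either with an in-register permute or with an unaligned load from a padded copy of the edge, which is one way to read the two strategies described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scalar model of a z1 (top-edge directional)
 * predictor for 16-bit pixels. Assumptions: stride is in pixels, dx is the
 * per-row step in 1/64-pixel units, and top[] is valid up to max_base_x. */
static void z1_sketch_16bpc(uint16_t *dst, ptrdiff_t stride,
                            const uint16_t *top, int width, int height,
                            int dx, int max_base_x)
{
    for (int y = 0; y < height; y++) {
        const int xpos = (y + 1) * dx;  /* position along the top edge */
        const int frac = xpos & 0x3e;   /* even fractional weight, 0..62 */
        int base = xpos >> 6;           /* integer start offset for this row */

        for (int x = 0; x < width; x++, base++) {
            if (base < max_base_x) {
                /* Linear interpolation between two neighbouring edge pixels. */
                const int v = top[base] * (64 - frac) + top[base + 1] * frac;
                dst[x] = (uint16_t)((v + 32) >> 6);
            } else {
                /* Past the end of the edge: replicate the last pixel. */
                dst[x] = top[max_base_x];
            }
        }
        dst += stride;
    }
}
```

Each output row reads a contiguous window of the edge starting at `base`, so a vectorized version needs a cheap way to fetch that window at an arbitrary offset. Which method is cheaper depends on the block width, consistent with the per-width choice described above.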

intra_pred_z1_w4_16bpc_c:             232.4 ( 1.00x)
intra_pred_z1_w4_16bpc_ssse3:          42.0 ( 5.53x)
intra_pred_z1_w4_16bpc_avx2:           35.6 ( 6.53x)
intra_pred_z1_w4_16bpc_avx512icl:      23.2 (10.03x)

intra_pred_z1_w8_16bpc_c:             425.0 ( 1.00x)
intra_pred_z1_w8_16bpc_ssse3:          55.9 ( 7.60x)
intra_pred_z1_w8_16bpc_avx2:           39.3 (10.82x)
intra_pred_z1_w8_16bpc_avx512icl:      26.6 (15.99x)

intra_pred_z1_w16_16bpc_c:           1143.3 ( 1.00x)
intra_pred_z1_w16_16bpc_ssse3:        125.8 ( 9.09x)
intra_pred_z1_w16_16bpc_avx2:          75.7 (15.11x)
intra_pred_z1_w16_16bpc_avx512icl:     64.3 (17.79x)

intra_pred_z1_w32_16bpc_c:           1651.6 ( 1.00x)
intra_pred_z1_w32_16bpc_ssse3:        229.1 ( 7.21x)
intra_pred_z1_w32_16bpc_avx2:         144.6 (11.42x)
intra_pred_z1_w32_16bpc_avx512icl:     91.9 (17.98x)

intra_pred_z1_w64_16bpc_c:           2957.5 ( 1.00x)
intra_pred_z1_w64_16bpc_ssse3:        557.6 ( 5.30x)
intra_pred_z1_w64_16bpc_avx2:         297.4 ( 9.94x)
intra_pred_z1_w64_16bpc_avx512icl:    192.4 (15.37x)
