x86: Add 8-bit ipred z1 AVX-512 (Ice Lake) asm
With AVX-512 we can now keep all the data in registers and use vpermb/vpermi2b
to load the input pixels instead of having to do memory loads from a stack buffer.
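
For readers unfamiliar with these instructions, here is a rough scalar C model of what a byte permute does when it replaces a buffer load. It is only a sketch of the instruction semantics for 64-byte (zmm) registers, not code from this merge request. The function names (permb_model, permi2b_model, gather_edge_row) and the clamping example are hypothetical and chosen for illustration. In real code the equivalent intrinsics would be _mm512_permutexvar_epi8 and _mm512_permutex2var_epi8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of vpermb on a 64-byte register:
 * every output byte picks one source byte using the low 6 bits of its index. */
static void permb_model(uint8_t dst[64], const uint8_t idx[64],
                        const uint8_t src[64])
{
    uint8_t tmp[64]; /* temporary so dst may alias src or idx */
    for (int i = 0; i < 64; i++)
        tmp[i] = src[idx[i] & 63];
    memcpy(dst, tmp, 64);
}

/* Scalar model of vpermi2b on 64-byte registers:
 * bit 6 of each index selects between the two sources (0 = a, 1 = b),
 * bits 0..5 select the byte, so 128 bytes are addressable at once. */
static void permi2b_model(uint8_t dst[64], const uint8_t idx[64],
                          const uint8_t a[64], const uint8_t b[64])
{
    uint8_t tmp[64];
    for (int i = 0; i < 64; i++) {
        const uint8_t sel = idx[i];
        tmp[i] = (sel & 64) ? b[sel & 63] : a[sel & 63];
    }
    memcpy(dst, tmp, 64);
}

/* Hypothetical example: fetch consecutive edge pixels starting at 'base',
 * clamping every position past 'max_base' to the last valid pixel.
 * With the edge held in a register, this is one permute instead of an
 * unaligned load from a padded copy of the edge on the stack. */
static void gather_edge_row(uint8_t out[64], const uint8_t edge[64],
                            int base, int max_base)
{
    uint8_t idx[64];
    assert(base >= 0 && max_base >= 0 && max_base < 64);
    for (int i = 0; i < 64; i++) {
        const int pos = base + i;
        idx[i] = (uint8_t)(pos < max_base ? pos : max_base);
    }
    permb_model(out, idx, edge);
}
```

The point of the model is that the index vector plays the role of the load address: changing the indices selects different pixels without touching memory.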
intra_pred_z1_w4_8bpc_c: 204.5 ( 1.00x)
intra_pred_z1_w4_8bpc_ssse3: 37.6 ( 5.43x)
intra_pred_z1_w4_8bpc_avx2: 37.8 ( 5.41x)
intra_pred_z1_w4_8bpc_avx512icl: 23.5 ( 8.70x)
intra_pred_z1_w8_8bpc_c: 342.5 ( 1.00x)
intra_pred_z1_w8_8bpc_ssse3: 58.1 ( 5.90x)
intra_pred_z1_w8_8bpc_avx2: 40.9 ( 8.38x)
intra_pred_z1_w8_8bpc_avx512icl: 27.5 (12.44x)
intra_pred_z1_w16_8bpc_c: 1162.9 ( 1.00x)
intra_pred_z1_w16_8bpc_ssse3: 124.8 ( 9.31x)
intra_pred_z1_w16_8bpc_avx2: 79.5 (14.63x)
intra_pred_z1_w16_8bpc_avx512icl: 45.1 (25.76x)
intra_pred_z1_w32_8bpc_c: 1927.0 ( 1.00x)
intra_pred_z1_w32_8bpc_ssse3: 254.9 ( 7.56x)
intra_pred_z1_w32_8bpc_avx2: 160.5 (12.01x)
intra_pred_z1_w32_8bpc_avx512icl: 110.3 (17.47x)
intra_pred_z1_w64_8bpc_c: 2962.4 ( 1.00x)
intra_pred_z1_w64_8bpc_ssse3: 504.7 ( 5.87x)
intra_pred_z1_w64_8bpc_avx2: 284.4 (10.42x)
intra_pred_z1_w64_8bpc_avx512icl: 245.8 (12.05x)
Activity
changed milestone to %1.3.0
added performance x86 labels
requested review from @psilokos
changed milestone to %1.4.0
mentioned in issue #316