x86: Add high bit-depth film grain AVX-512 (Ice Lake) asm (!1396) · VideoLAN / dav1d


Merged Henrik Gramner requested to merge gramner/dav1d:avx512_filmgrain_16bpc into master 2 years ago

All threads resolved!

fgy_32x32xn_16bpc_avx2:                 1217.0
fgy_32x32xn_16bpc_avx512icl:            1175.0

fguv_32x32xn_16bpc_420_csfl0_avx2:       714.9
fguv_32x32xn_16bpc_420_csfl0_avx512icl:  498.1
fguv_32x32xn_16bpc_420_csfl1_avx2:       420.9
fguv_32x32xn_16bpc_420_csfl1_avx512icl:  413.0

fguv_32x32xn_16bpc_422_csfl0_avx2:      1400.8
fguv_32x32xn_16bpc_422_csfl0_avx512icl:  967.5
fguv_32x32xn_16bpc_422_csfl1_avx2:       821.8
fguv_32x32xn_16bpc_422_csfl1_avx512icl:  818.0

fguv_32x32xn_16bpc_444_csfl0_avx2:      2691.0
fguv_32x32xn_16bpc_444_csfl0_avx512icl: 1530.8
fguv_32x32xn_16bpc_444_csfl1_avx2:      1378.8
fguv_32x32xn_16bpc_444_csfl1_avx512icl: 1193.8

Helps the most in the csfl0 chroma functions, since those are the most computationally expensive relative to the number of memory loads.
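As a rough illustration, the speedups implied by the timings above can be computed with a short script. The numbers are copied from the benchmark listing (lower is better); the shortened dictionary keys and the `speedups` name are labels chosen for this sketch, not checkasm identifiers.

```python
# (AVX2, AVX-512 ICL) timings copied from the benchmark listing above.
# Keys are shortened labels invented for this sketch.
timings = {
    "fgy":            (1217.0, 1175.0),
    "fguv_420_csfl0": (714.9, 498.1),
    "fguv_420_csfl1": (420.9, 413.0),
    "fguv_422_csfl0": (1400.8, 967.5),
    "fguv_422_csfl1": (821.8, 818.0),
    "fguv_444_csfl0": (2691.0, 1530.8),
    "fguv_444_csfl1": (1378.8, 1193.8),
}

# Speedup is the AVX2 timing divided by the AVX-512 timing.
speedups = {name: avx2 / avx512 for name, (avx2, avx512) in timings.items()}

# Print the functions from largest to smallest speedup.
for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {ratio:.2f}x")
```

With these inputs the three csfl0 functions come out at roughly 1.4x to 1.8x, while fgy and the csfl1 functions all stay below 1.2x.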

The other ones are bottlenecked by the number of memory loads that can be executed per cycle, at least on RKL (Rocket Lake), which has two load ports. Some newer µarchs have three load ports, so it might help more on such systems. In any case, the number of retired instructions is much lower than with AVX2, which should free up more resources for the other SMT thread in multi-threaded scenarios.

Activity

Henrik Gramner added performance x86 labels 2 years ago

Henrik Gramner requested review from @psilokos 2 years ago

Ronald S. Bultje mentioned in issue #316 2 years ago

Henrik Gramner added 8 commits 2 years ago

18d80e98...28a9c46e - 6 commits from branch videolan:master

ca6fe6a6 - x86: Add high bit-depth film grain AVX-512 (Ice Lake) asm

0b0daff8 - x86: Reduce code size in 8-bit film grain AVX-512 asm

Ronald S. Bultje @rbultje · 2 years ago

Developer

Resolved 2 years ago by Henrik Gramner

I really like the code sharing between the different overlap variants, very nice.

Ronald S. Bultje approved this merge request 2 years ago

Henrik Gramner resolved all threads 2 years ago

Henrik Gramner added 2 commits 2 years ago

949b8902 - x86: Add high bit-depth film grain AVX-512 (Ice Lake) asm

b1a5189c - x86: Reduce code size in 8-bit film grain AVX-512 asm

Henrik Gramner merged 2 years ago

Henrik Gramner changed milestone to %1.0.0 2 years ago

