
x86: Improve film grain AVX2 asm

Henrik Gramner requested to merge gramner/dav1d:x86_fg_avx2 into master

On average a couple of percent faster: roughly 3-8% per function at 8 bpc. In the end it didn't really help that much on high bit-depth though, where the gain is only about 0-3%.

Checkasm numbers on Skylake:

                                    old      new
fguv_32x32xn_8bpc_420_csfl0_avx2:  610.9    591.0
fguv_32x32xn_8bpc_420_csfl1_avx2:  530.4    505.2
fguv_32x32xn_8bpc_422_csfl0_avx2:  613.5    591.5
fguv_32x32xn_8bpc_422_csfl1_avx2:  524.7    502.2
fguv_32x32xn_8bpc_444_csfl0_avx2:  452.8    436.7
fguv_32x32xn_8bpc_444_csfl1_avx2:  412.1    379.5
fgy_32x32xn_8bpc_avx2:            1501.7   1424.8

fguv_32x32xn_16bpc_420_csfl0_avx2: 626.7    620.8
fguv_32x32xn_16bpc_420_csfl1_avx2: 547.1    540.8
fguv_32x32xn_16bpc_422_csfl0_avx2: 642.7    630.6
fguv_32x32xn_16bpc_422_csfl1_avx2: 554.1    545.8
fguv_32x32xn_16bpc_444_csfl0_avx2: 509.2    508.3
fguv_32x32xn_16bpc_444_csfl1_avx2: 441.8    437.0
fgy_32x32xn_16bpc_avx2:           1593.0   1552.4
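
For readers who want the relative gains rather than raw timings, here is a minimal sketch that derives them from the table above. It is not part of the merge request: the helper names (`reduction_pct`, `mean_reduction`) are made up for illustration, and it assumes lower checkasm numbers are better.

```python
# Checkasm timings copied from the table above: (function, old, new).
RESULTS = [
    ("fguv_32x32xn_8bpc_420_csfl0_avx2", 610.9, 591.0),
    ("fguv_32x32xn_8bpc_420_csfl1_avx2", 530.4, 505.2),
    ("fguv_32x32xn_8bpc_422_csfl0_avx2", 613.5, 591.5),
    ("fguv_32x32xn_8bpc_422_csfl1_avx2", 524.7, 502.2),
    ("fguv_32x32xn_8bpc_444_csfl0_avx2", 452.8, 436.7),
    ("fguv_32x32xn_8bpc_444_csfl1_avx2", 412.1, 379.5),
    ("fgy_32x32xn_8bpc_avx2", 1501.7, 1424.8),
    ("fguv_32x32xn_16bpc_420_csfl0_avx2", 626.7, 620.8),
    ("fguv_32x32xn_16bpc_420_csfl1_avx2", 547.1, 540.8),
    ("fguv_32x32xn_16bpc_422_csfl0_avx2", 642.7, 630.6),
    ("fguv_32x32xn_16bpc_422_csfl1_avx2", 554.1, 545.8),
    ("fguv_32x32xn_16bpc_444_csfl0_avx2", 509.2, 508.3),
    ("fguv_32x32xn_16bpc_444_csfl1_avx2", 441.8, 437.0),
    ("fgy_32x32xn_16bpc_avx2", 1593.0, 1552.4),
]


def reduction_pct(old, new):
    """Percentage reduction in measured time going from old to new."""
    return (old - new) / old * 100.0


def mean_reduction(rows):
    """Unweighted mean of the per-function reductions (hypothetical helper)."""
    return sum(reduction_pct(old, new) for _, old, new in rows) / len(rows)


# Split the rows by bit depth using the function name.
ROWS_8BPC = [r for r in RESULTS if "_8bpc_" in r[0]]
ROWS_16BPC = [r for r in RESULTS if "_16bpc_" in r[0]]

if __name__ == "__main__":
    for name, old, new in RESULTS:
        print(f"{name:<34} {reduction_pct(old, new):5.2f}%")
    print(f"mean  8bpc: {mean_reduction(ROWS_8BPC):.2f}%")
    print(f"mean 16bpc: {mean_reduction(ROWS_16BPC):.2f}%")
```

The mean here is unweighted across functions, so it treats `fgy` and each `fguv` variant equally regardless of how often they run in a real decode.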
