arm64: filmgrain: Add a NEON implementation of fgy_32x32xn for 16 bpc

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-fgy16 into master

Relative speedup over C code:

                         Cortex A53    A72    A73   Apple M1
fgy_32x32xn_16bpc_neon:        3.87   2.28   2.78       3.45