arm64: looprestoration: Add a NEON implementation of SGR

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-sgr into master

Relative speedup vs (autovectorized) C code:

                            Cortex A53    A72    A73
selfguided_3x3_8bpc_neon:         2.91   2.12   2.68
selfguided_5x5_8bpc_neon:         3.18   2.65   3.39
selfguided_mix_8bpc_neon:         3.04   2.29   2.98

The relative speedup vs non-vectorized C code is around 2.6-4.6x.
