Commit 204bf211 authored by Martin Storsjö, committed by Jean-Baptiste Kempf

arm64: looprestoration: Add a NEON implementation of SGR

Relative speedup vs (autovectorized) C code:
                          Cortex A53    A72    A73
selfguided_3x3_8bpc_neon:       2.91   2.12   2.68
selfguided_5x5_8bpc_neon:       3.18   2.65   3.39
selfguided_mix_8bpc_neon:       3.04   2.29   2.98

The relative speedup vs non-vectorized C code is around 2.6-4.6x.
parent 003fa104