Commit 96b24495 authored by Martin Storsjö

arm: looprestoration: NEON optimized wiener filter

The relative speedup compared to C code is around 4-8x:

                    Cortex A7     A8     A9    A53    A72    A73
wiener_luma_8bpc_neon:   4.00   7.54   4.74   6.84   4.91   8.01
parent 95cd440a
Pipeline #4291 passed with stages in 6 minutes and 29 seconds