arm64: looprestoration: NEON optimized wiener filter

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-wiener into master

Relative to the C code, the speedup is about 4.2x on a Cortex A53 and 5.1x on a Snapdragon 835 when compared against GCC's autovectorized output, 6-7x compared to GCC's output without autovectorization, and ~8x compared to clang's output (which doesn't seem to try to vectorize this function).
