Commit 8291a66e authored by Martin Storsjö

looprestoration: Use only 6 row buffer for wiener, like NEON/x86

This uses a separate function for combined horizontal and vertical
filtering, without needing to write the intermediate results
back to memory in between.

This mostly serves as an example for how to adjust the logic for
that case; unless we actually merge the horizontal and vertical
filtering within the _hv function, we still need space for a
7th row on the stack within that function (so we use just as much
stack as before), and we also need one extra memcpy to write that
row into the right destination.

In a build where the compiler is allowed to vectorize and inline
the wiener functions into each other, this change actually reduces
the final binary size by 4 KB, if the C version of the wiener filter
is retained.

This change makes the vectorized C code as fast as it was before
with Clang 18; on Xcode Clang 16, it's 2x slower than it was before.

Unfortunately, with GCC, this change makes the code a bit slower
again.
parent a149f5c3
1 merge request: !1773 looprestoration: Rewrite the C version of the wiener filter
Pipeline #546452 passed with stages
in 49 minutes and 9 seconds