Martin Storsjö authored
This uses a separate function for combined horizontal and vertical
filtering, without needing to write the intermediate results
back to memory in between.

This mostly serves as an example for how to adjust the logic for
that case; unless we actually merge the horizontal and vertical
filtering within the _hv function, we still need space for a
7th row on the stack within that function (which means we use just
as much stack as before), but we also need one extra memcpy to
write it into the right destination.
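
As a rough illustration of the arrangement described above, here is a minimal C sketch. It is not the code from this commit: the names (sketch_filter_h, sketch_filter_v, sketch_filter_hv, MAX_W), the six-pointer ring of intermediate rows, and the lack of rounding and shifting are all assumptions made to keep the example short. It only shows the shape of the idea: the new row is filtered horizontally into a stack buffer (the 7th row), the vertical filter reads the six stored rows plus that buffer, and one memcpy then moves the buffer into the slot of the oldest row.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_W 64  /* hypothetical maximum row width for this sketch */

/* Horizontal 7-tap pass over one row. src must have 3 valid pixels of
 * padding on each side of the w output positions. No rounding or
 * shifting is applied; this is a simplification. */
static void sketch_filter_h(int16_t *dst, const uint8_t *src,
                            const int16_t fh[7], int w)
{
    for (int x = 0; x < w; x++) {
        int sum = 0;
        for (int k = 0; k < 7; k++)
            sum += src[x + k - 3] * fh[k];
        dst[x] = (int16_t)sum;
    }
}

/* Vertical 7-tap pass over seven intermediate rows, clamped to 8 bits. */
static void sketch_filter_v(uint8_t *dst, const int16_t *const rows[7],
                            const int16_t fv[7], int w)
{
    for (int x = 0; x < w; x++) {
        int sum = 0;
        for (int k = 0; k < 7; k++)
            sum += rows[k][x] * fv[k];
        dst[x] = (uint8_t)(sum < 0 ? 0 : sum > 255 ? 255 : sum);
    }
}

/* Combined pass: filter the new input row horizontally into a stack
 * buffer, run the vertical filter over the six stored rows plus that
 * buffer, then copy the buffer into the oldest slot and rotate. */
static void sketch_filter_hv(uint8_t *dst, int16_t *ring[6],
                             const uint8_t *src, const int16_t fh[7],
                             const int16_t fv[7], int w)
{
    int16_t tmp[MAX_W];  /* the 7th row, kept on the stack */
    const int16_t *rows[7];

    assert(w > 0 && w <= MAX_W);

    sketch_filter_h(tmp, src, fh, w);

    for (int i = 0; i < 6; i++)
        rows[i] = ring[i];
    rows[6] = tmp;
    sketch_filter_v(dst, rows, fv, w);

    /* The extra memcpy: the oldest row is no longer needed, so its
     * storage receives the new row and becomes the newest ring entry. */
    int16_t *oldest = ring[0];
    memcpy(oldest, tmp, (size_t)w * sizeof(*tmp));
    for (int i = 0; i < 5; i++)
        ring[i] = ring[i + 1];
    ring[5] = oldest;
}
```

In this sketch the stack cost of tmp is the same as keeping a seventh stored row, and the memcpy is the price of not having the horizontal pass write straight into the ring; merging the two passes would be needed to avoid both.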

In a build where the compiler is allowed to vectorize and inline
the wiener functions into each other, this change actually reduces
the final binary size by 4 KB, if the C version of the wiener filter
is retained.

This change makes the vectorized C code as fast as it was before
with Clang 18; on Xcode Clang 16, it's 2x slower than it was before.

Unfortunately, with GCC, this change makes the code a bit slower
again.
8291a66e