looprestoration: Use only a 6-row buffer for wiener, like NEON/x86
This uses a separate function for combined horizontal and vertical filtering, so the intermediate results don't need to be written back to memory in between.

This mostly serves as an example of how to adjust the logic for that case. Unless we actually merge the horizontal and vertical filtering within the _hv function, we still need space for a 7th row on the stack within that function (so we use just as much stack as before), and we also need one extra memcpy to write that row into the right destination.

In a build where the compiler is allowed to vectorize and inline the wiener functions into each other, this change actually reduces the final binary size by 4 KB, if the C version of the wiener filter is retained.

This change makes the vectorized C code as fast as it was before with Clang 18; with Xcode Clang 16, it's 2x slower than before. Unfortunately, with GCC, this change makes the code a bit slower again.
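To make the row-buffer logic concrete, here is a minimal sketch in C. It assumes a simplified 8-bit filter with one fixed symmetric 7-tap kernel (coefficients summing to 128), replicated edges and int32_t intermediates; all names (wiener_h, wiener_v, wiener_hv, wiener_6row, wiener_full, kFilter, MAX_W, MAX_H) are hypothetical and this is not dav1d's actual implementation. wiener_hv horizontally filters the newest source row into a 7th row on the stack, runs the vertical pass over the six buffered rows plus that one, then memcpys it over the oldest buffered row and rotates the row pointers. wiener_full is a two-pass reference that stores every horizontally filtered row.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified sketch, not dav1d's code: 8-bit pixels, replicated
 * edges, one fixed symmetric 7-tap kernel (sum 128) for both passes. */
#define MAX_W 64
#define MAX_H 64
#define TAPS 7

static const int kFilter[TAPS] = { 1, -5, 20, 96, 20, -5, 1 };

static int clampi(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }

/* Horizontal 7-tap pass over one source row. */
static void wiener_h(int32_t *dst, const uint8_t *src, int w) {
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += kFilter[k] * src[clampi(x + k - 3, 0, w - 1)];
        dst[x] = sum;
    }
}

/* Vertical 7-tap pass over filtered rows y-3..y+3; rounds by 2^14 (128 * 128). */
static void wiener_v(uint8_t *dst, const int32_t *const *rows, int w) {
    for (int x = 0; x < w; x++) {
        int32_t sum = 1 << 13;
        for (int k = 0; k < TAPS; k++)
            sum += kFilter[k] * rows[k][x];
        dst[x] = (uint8_t)clampi(sum < 0 ? 0 : sum >> 14, 0, 255);
    }
}

/* Reference: keep every horizontally filtered row, then filter vertically. */
static void wiener_full(uint8_t *dst, const uint8_t *src, int w, int h) {
    static int32_t hrows[MAX_H][MAX_W];
    assert(w <= MAX_W && h <= MAX_H);
    for (int y = 0; y < h; y++)
        wiener_h(hrows[y], src + y * w, w);
    for (int y = 0; y < h; y++) {
        const int32_t *rows[TAPS];
        for (int k = 0; k < TAPS; k++)
            rows[k] = hrows[clampi(y + k - 3, 0, h - 1)];
        wiener_v(dst + y * w, rows, w);
    }
}

/* Combined h+v step: ring[0..5] hold filtered rows y-3..y+2 (oldest first);
 * row y+3 goes into a 7th row on the stack, then is copied over the oldest. */
static void wiener_hv(uint8_t *dst, int32_t *ring[6], const uint8_t *src_row, int w) {
    int32_t tmp[MAX_W];
    wiener_h(tmp, src_row, w);
    const int32_t *rows[TAPS] = { ring[0], ring[1], ring[2], ring[3], ring[4], ring[5], tmp };
    wiener_v(dst, rows, w);
    memcpy(ring[0], tmp, (size_t)w * sizeof(*tmp)); /* the extra memcpy */
    int32_t *oldest = ring[0];
    for (int i = 0; i < 5; i++)
        ring[i] = ring[i + 1];
    ring[5] = oldest; /* newest row now sits at the end */
}

/* Only six filtered rows are kept, independent of the image height. */
static void wiener_6row(uint8_t *dst, const uint8_t *src, int w, int h) {
    int32_t storage[6][MAX_W];
    int32_t *ring[6];
    assert(w <= MAX_W);
    for (int i = 0; i < 6; i++) { /* prime with rows -3..2, edges replicated */
        ring[i] = storage[i];
        wiener_h(ring[i], src + clampi(i - 3, 0, h - 1) * w, w);
    }
    for (int y = 0; y < h; y++)
        wiener_hv(dst + y * w, ring, src + clampi(y + 3, 0, h - 1) * w, w);
}
```

Because both paths perform the same arithmetic in the same order, wiener_6row should give output bit-identical to wiener_full; only the intermediate storage shrinks from h rows to six, plus the temporary row on the stack.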
Parent: a149f5c3
- mentioned in merge request !1776 (merged)
- mentioned in commit wtc/dav1d@a0f1761b
- mentioned in merge request !1780 (merged)
- mentioned in commit 2ba57aa5