
looprestoration: Rewrite the C version of the wiener filter

Martin Storsjö requested to merge mstorsjo/dav1d:wiener-c-rewrite into master

See the individual commit messages.

Unfortunately, this change isn't clear-cut with respect to performance; the previous version was quite straightforward for the compiler to vectorize, while this one is apparently harder.

Initially, I meant to make the individual row functions noinline (just like for SGR), but it turns out that Clang vectorizes them much more poorly in that case. As these C functions are entirely omitted on architectures where we have wiener asm (all x86, arm and aarch64), I guess the added code size from having the functions inlined might not matter, so it's better to make the code more performant (for the architectures that might need it).

In commit 1, I first do a very straightforward conversion to per-row functions, just like SGR but much simpler. In commit 2, I extend the horizontal filter implementation to hopefully make it easier for compilers to vectorize. In commit 3, I add a separate _hv function for combining the horizontal and vertical filters. I don't actually write code for such a merged filter, though, but I thought it would be useful to keep the C code for it, as it does affect the outer structure a little bit (and affects how the pointer window shifting works), so it also serves as an example of how to write SIMD implementations of it.
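For readers unfamiliar with the per-row structure, here is a minimal sketch of the general idea, not the code from this MR: a horizontal row function fills 16-bit intermediate rows, a vertical row function consumes a window of row pointers, and the window is shifted by rotating the pointers instead of copying data. All names (`filter_h_row`, `filter_v_row`, `filter_plane`, `TAPS`, `MAX_W`) are hypothetical, as are the simplified edge handling (clamping) and rounding; the real filter differs in bit depth handling and intermediate precision.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 7   /* hypothetical fixed tap count */
#define MAX_W 64 /* hypothetical maximum row width for this sketch */

/* Clamp an index into [0, n-1], replicating edge pixels/rows. */
static inline int clamp_idx(int i, int n) {
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* Horizontal pass for one row: 8-bit source -> 16-bit intermediate row. */
static inline void filter_h_row(int16_t *dst, const uint8_t *src, int w,
                                const int16_t fh[TAPS])
{
    for (int x = 0; x < w; x++) {
        int sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += fh[k] * src[clamp_idx(x + k - TAPS / 2, w)];
        dst[x] = (int16_t) sum;
    }
}

/* Vertical pass for one output row, reading TAPS intermediate rows. */
static inline void filter_v_row(uint8_t *dst, int16_t *const rows[TAPS],
                                int w, const int16_t fv[TAPS], int shift)
{
    const int rnd = shift > 0 ? 1 << (shift - 1) : 0;
    for (int x = 0; x < w; x++) {
        int sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += fv[k] * rows[k][x];
        sum = (sum + rnd) >> shift;
        dst[x] = (uint8_t) (sum < 0 ? 0 : sum > 255 ? 255 : sum);
    }
}

/* Outer loop: keep a window of TAPS filtered rows and rotate the pointers. */
static void filter_plane(uint8_t *dst, const uint8_t *src, int stride,
                         int w, int h, const int16_t fh[TAPS],
                         const int16_t fv[TAPS], int shift)
{
    int16_t buf[TAPS][MAX_W];
    int16_t *rows[TAPS];
    assert(w >= 1 && w <= MAX_W && h >= 1);

    /* Prime the window: rows[k] holds input row (k - TAPS/2), clamped. */
    for (int k = 0; k < TAPS; k++) {
        rows[k] = buf[k];
        filter_h_row(rows[k], src + clamp_idx(k - TAPS / 2, h) * stride,
                     w, fh);
    }

    for (int y = 0; y < h; y++) {
        filter_v_row(dst + y * stride, rows, w, fv, shift);
        if (y + 1 == h)
            break;

        /* Shift the window down by one row: recycle the oldest buffer
         * for the new bottom row instead of moving any pixel data. */
        int16_t *const recycled = rows[0];
        for (int k = 0; k < TAPS - 1; k++)
            rows[k] = rows[k + 1];
        rows[TAPS - 1] = recycled;
        filter_h_row(recycled,
                     src + clamp_idx(y + 1 + TAPS / 2, h) * stride, w, fh);
    }
}
```

In this sketch the row functions are `static inline`, so the compiler is free to inline them into the outer loop. A merged horizontal+vertical function would fuse the `filter_h_row` call for the new bottom row with the `filter_v_row` call for the next output row, which is why it changes when the pointer rotation happens.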

The _hv function, with inlining, gives a lot of performance back with Clang when vectorization is enabled.
