Commit 9da303e9 authored by Martin Storsjö

looprestoration: Rewrite the C version of the wiener filter

This reduces the stack usage of these functions (the C version)
significantly.

These C versions aren't used on architectures that already have
wiener filters implemented in assembly, but they matter both when
running with assembly disabled (e.g. for sanitizer builds) and
as an example of how to do a cache efficient SIMD implementation.

This roughly matches how these functions are implemented in the
aarch64 assembly (although that assembly function uses a mainloop
function written in assembly, and custom calling conventions
between the functions).

With this in place, dav1d can run with around 76 KB of stack
with assembly disabled.
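
The message above does not spell out how the restructuring saves stack.
One common way to do this for a separable 7-tap filter is to keep only
a rolling window of horizontally filtered rows instead of a full
intermediate plane. The following is a minimal sketch of that idea, not
the code from this commit: all names (filter_rolling, hfilter_row,
vfilter_row, MAX_W) are hypothetical, edges are simply replicated, and
the rounding and clipping that the real filter needs are left out.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 7    /* 7-tap separable filter */
#define MAX_W 64  /* hypothetical maximum row width for this sketch */

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Horizontal pass for one source row; left/right edges are replicated. */
static void hfilter_row(int32_t *out, const uint8_t *row, int w,
                        const int coef[TAPS])
{
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += coef[k] * row[clampi(x + k - 3, 0, w - 1)];
        out[x] = sum;
    }
}

/* Vertical pass over seven already horizontally filtered rows. */
static void vfilter_row(int32_t *out, int32_t *const *rows, int w,
                        const int coef[TAPS])
{
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += coef[k] * rows[k][x];
        out[x] = sum;
    }
}

/* Rolling version: only TAPS intermediate rows live on the stack,
 * regardless of the height of the block being filtered. */
void filter_rolling(int32_t *dst, const uint8_t *src, int w, int h,
                    const int hcoef[TAPS], const int vcoef[TAPS])
{
    assert(w > 0 && w <= MAX_W && h > 0);
    int32_t ring[TAPS][MAX_W];
    int32_t *rows[TAPS];

    /* Prime the window for output row 0: source rows -3..3, clamped. */
    for (int k = 0; k < TAPS; k++) {
        hfilter_row(ring[k], src + clampi(k - 3, 0, h - 1) * w, w, hcoef);
        rows[k] = ring[k];
    }
    for (int y = 0; y < h; y++) {
        vfilter_row(dst + y * w, rows, w, vcoef);
        /* Rotate: drop the oldest row, reuse its storage for row y + 4. */
        int32_t *recycled = rows[0];
        for (int k = 0; k < TAPS - 1; k++)
            rows[k] = rows[k + 1];
        hfilter_row(recycled, src + clampi(y + 4, 0, h - 1) * w, w, hcoef);
        rows[TAPS - 1] = recycled;
    }
}
```

In this sketch the intermediate storage is TAPS * MAX_W values no matter
how tall the block is, and each horizontally filtered row is consumed by
the vertical pass shortly after it is produced.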

This increases the binary size by around 14 KB (in the case of
aarch64 with Xcode Clang 16), unless built with (the default)
-Dtrim_dsp=true. (By default, the C version of the wiener filter
gets skipped entirely.)

On 32 bit arm, the assembly implementation of the wiener filter
still uses large buffers on the stack, but because other functions
use less stack on that architecture, dav1d can still run with 72 KB
of stack there.

Unfortunately, this change also makes the functions slower; how much
depends on how well the compiler was able to optimize the previous
version. With GCC (which didn't vectorize the functions very well
before), they become 1.6x-2.0x slower, while with Clang (which
vectorized them very well before) they become 2.5x-5x slower.

Most of this performance can be gained back with later changes on
top, though.
parent d242c47b