Commit 2e73051c authored by Martin Storsjö, committed by Jean-Baptiste Kempf

arm64: looprestoration: Rewrite the wiener functions

Make them operate in a more cache-friendly manner, interleaving
horizontal and vertical filtering (reducing the amount of stack
used from 51 KB to 4 KB), similar to what was done for x86 in
78d27b7d.
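
To illustrate the interleaving idea, here is a simplified scalar C sketch. It is not the NEON code from this commit. The block size, the filter taps and all function names are hypothetical, and rounding, clipping and the real edge handling are left out (edges are simply replicated). The two-pass version keeps a full H x W intermediate buffer on the stack, while the interleaved version keeps only a ring of TAPS horizontally filtered rows and produces the same output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TAPS = 7, W = 16, H = 12 }; /* hypothetical block size */

/* Made-up symmetric taps, for illustration only. */
static const int16_t FH[TAPS] = { 3, -7, 15, 106, 15, -7, 3 };
static const int16_t FV[TAPS] = { 2, -6, 14, 108, 14, -6, 2 };

static int clampi(int v, int lo, int hi) {
    return v < lo ? lo : v > hi ? hi : v;
}

/* Horizontally filter one source row, replicating the edge pixels. */
static void filter_h_row(int32_t *dst, const uint8_t *src) {
    for (int x = 0; x < W; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += FH[k] * src[clampi(x + k - TAPS / 2, 0, W - 1)];
        dst[x] = sum;
    }
}

/* Vertically combine TAPS already horizontally filtered rows. */
static void filter_v_row(int32_t *dst, const int32_t *rows[TAPS]) {
    for (int x = 0; x < W; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += FV[k] * rows[k][x];
        dst[x] = sum;
    }
}

/* Two-pass reference: the intermediate buffer holds all H rows. */
static void wiener_two_pass(int32_t *out, const uint8_t *in) {
    int32_t mid[H][W];
    for (int y = 0; y < H; y++)
        filter_h_row(mid[y], in + y * W);
    for (int y = 0; y < H; y++) {
        const int32_t *rows[TAPS];
        for (int k = 0; k < TAPS; k++)
            rows[k] = mid[clampi(y + k - TAPS / 2, 0, H - 1)];
        filter_v_row(out + y * W, rows);
    }
}

/* Interleaved: only TAPS intermediate rows are live at any time. */
static void wiener_interleaved(int32_t *out, const uint8_t *in) {
    int32_t ring[TAPS][W];
    int next = 0; /* next source row to filter horizontally */
    for (int y = 0; y < H; y++) {
        const int last = clampi(y + TAPS / 2, 0, H - 1);
        for (; next <= last; next++)
            filter_h_row(ring[next % TAPS], in + next * W);
        const int32_t *rows[TAPS];
        for (int k = 0; k < TAPS; k++) {
            const int r = clampi(y + k - TAPS / 2, 0, H - 1);
            assert(last - r < TAPS); /* slot not yet overwritten */
            rows[k] = ring[r % TAPS];
        }
        filter_v_row(out + y * W, rows);
    }
}

/* Returns 1 if both variants agree on a deterministic test image. */
static int outputs_match(void) {
    uint8_t in[H * W];
    int32_t a[H * W], b[H * W];
    for (int i = 0; i < H * W; i++)
        in[i] = (uint8_t)(i * 37 + 11);
    wiener_two_pass(a, in);
    wiener_interleaved(b, in);
    return memcmp(a, b, sizeof a) == 0;
}
```

In this sketch the intermediate storage shrinks from H rows to TAPS rows, and each horizontally filtered row is consumed by the vertical pass shortly after it is written, while it is still likely to be in cache.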

This also adds separate 5tap versions of the filters and unrolls
the vertical filter a bit more (which maybe could have been done
without doing the rewrite).

This does, however, increase the compiled code size by around
3.5 KB.

Before:                Cortex A53       A72       A73
wiener_5tap_8bpc_neon:   136855.6   91446.2   87363.6
wiener_7tap_8bpc_neon:   136861.6   91454.9   87374.5
wiener_5tap_10bpc_neon:  167685.3  114720.3  116522.1
wiener_5tap_12bpc_neon:  167677.5  114724.7  116511.9
wiener_7tap_10bpc_neon:  167681.6  114738.5  116567.0
wiener_7tap_12bpc_neon:  167673.8  114720.8  116515.4
After:
wiener_5tap_8bpc_neon:    87102.1   60460.6   66803.8
wiener_7tap_8bpc_neon:   110831.7   78489.0   82015.9
wiener_5tap_10bpc_neon:  109999.2   90259.0   89238.0
wiener_5tap_12bpc_neon:  109978.3   90255.7   89220.7
wiener_7tap_10bpc_neon:  137877.6  107578.5  103435.6
wiener_7tap_12bpc_neon:  137868.8  107568.9  103390.4
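
As a rough way to read the table, the before/after ratio per function can be computed as below. This is a small sketch that assumes lower numbers are better and uses only the Cortex A53 column copied from above. The struct and function names are hypothetical.

```c
#include <stdio.h>

struct bench {
    const char *name;
    double before, after;
};

/* Cortex A53 column, copied from the table above. */
static const struct bench a53[] = {
    { "wiener_5tap_8bpc_neon",  136855.6,  87102.1 },
    { "wiener_7tap_8bpc_neon",  136861.6, 110831.7 },
    { "wiener_5tap_10bpc_neon", 167685.3, 109999.2 },
    { "wiener_5tap_12bpc_neon", 167677.5, 109978.3 },
    { "wiener_7tap_10bpc_neon", 167681.6, 137877.6 },
    { "wiener_7tap_12bpc_neon", 167673.8, 137868.8 },
};

/* Ratio above 1.0 means the rewritten function is faster. */
static double speedup(const struct bench *b) {
    return b->before / b->after;
}

static void print_speedups(void) {
    for (size_t i = 0; i < sizeof a53 / sizeof a53[0]; i++)
        printf("%-24s %.2fx\n", a53[i].name, speedup(&a53[i]));
}
```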
parent 4e869495
Pipeline #65993 passed with stages
in 5 minutes and 14 seconds