arm64: looprestoration: Rewrite the wiener functions

Make them operate in a more cache friendly manner, interleaving
horizontal and vertical filtering (reducing the amount of stack
used from 51 KB to 4 KB), similar to what was done for x86 in
78d27b7d.

This also adds separate 5tap versions of the filters and unrolls
the vertical filter a bit more (which maybe could have been done
without doing the rewrite).

This does, however, increase the compiled code size by around
3.5 KB.

Before:                 Cortex A53        A72        A73
wiener_5tap_8bpc_neon:    136855.6    91446.2    87363.6
wiener_7tap_8bpc_neon:    136861.6    91454.9    87374.5
wiener_5tap_10bpc_neon:   167685.3   114720.3   116522.1
wiener_5tap_12bpc_neon:   167677.5   114724.7   116511.9
wiener_7tap_10bpc_neon:   167681.6   114738.5   116567.0
wiener_7tap_12bpc_neon:   167673.8   114720.8   116515.4
After:
wiener_5tap_8bpc_neon:     87102.1    60460.6    66803.8
wiener_7tap_8bpc_neon:    110831.7    78489.0    82015.9
wiener_5tap_10bpc_neon:   109999.2    90259.0    89238.0
wiener_5tap_12bpc_neon:   109978.3    90255.7    89220.7
wiener_7tap_10bpc_neon:   137877.6   107578.5   103435.6
wiener_7tap_12bpc_neon:   137868.8   107568.9   103390.4
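
For readers unfamiliar with the interleaving idea, here is a rough scalar C sketch of why it shrinks the intermediate buffer. It is not the NEON code from this commit: all names (filter_h_row, filter_v_row, filter_two_pass, filter_interleaved), the tiny block size, the filter coefficients and the edge handling (plain replication, no rounding or clipping) are assumptions made for illustration. The two-pass variant keeps one intermediate row per image row, while the interleaved variant keeps only TAPS rows in a ring buffer and filters each output row as soon as its inputs exist.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { W = 16, H = 12, TAPS = 7, R = TAPS / 2 };

/* Clamp an index to [0, n-1]; stands in for real edge handling. */
static int clampi(int v, int n) { return v < 0 ? 0 : v >= n ? n - 1 : v; }

/* Horizontal pass over one row: dst[x] = sum_k f[k] * src[x + k - R]. */
static void filter_h_row(int32_t *dst, const uint8_t *src, const int *f) {
    for (int x = 0; x < W; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += f[k] * src[clampi(x + k - R, W)];
        dst[x] = sum;
    }
}

/* Vertical pass for one output row, given TAPS intermediate rows. */
static void filter_v_row(int32_t *dst, const int32_t *const *rows, const int *f) {
    for (int x = 0; x < W; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += f[k] * rows[k][x];
        dst[x] = sum;
    }
}

/* Two-pass variant: the intermediate buffer holds all H rows. */
static void filter_two_pass(int32_t *dst, const uint8_t *src,
                            const int *fh, const int *fv) {
    int32_t mid[H * W];
    for (int y = 0; y < H; y++)
        filter_h_row(&mid[y * W], &src[y * W], fh);
    for (int y = 0; y < H; y++) {
        const int32_t *rows[TAPS];
        for (int k = 0; k < TAPS; k++)
            rows[k] = &mid[clampi(y + k - R, H) * W];
        filter_v_row(&dst[y * W], rows, fv);
    }
}

/* Interleaved variant: only TAPS intermediate rows live in a ring buffer. */
static void filter_interleaved(int32_t *dst, const uint8_t *src,
                               const int *fh, const int *fv) {
    int32_t ring[TAPS * W];
    int filled = 0; /* number of source rows horizontally filtered so far */
    for (int y = 0; y < H; y++) {
        /* Produce intermediate rows up to the last one this output row needs. */
        const int last = clampi(y + R, H);
        while (filled <= last) {
            filter_h_row(&ring[(filled % TAPS) * W], &src[filled * W], fh);
            filled++;
        }
        const int32_t *rows[TAPS];
        for (int k = 0; k < TAPS; k++) {
            const int r = clampi(y + k - R, H);
            assert(r < filled && filled - r <= TAPS); /* row still in the ring */
            rows[k] = &ring[(r % TAPS) * W];
        }
        filter_v_row(&dst[y * W], rows, fv);
    }
}

/* Returns 1 if both variants agree on a synthetic test pattern. */
static int interleaved_matches_two_pass(void) {
    static const int fh[TAPS] = { 1, -3, 6, 8, 6, -3, 1 };
    static const int fv[TAPS] = { 2, -4, 5, 10, 5, -4, 2 };
    uint8_t src[H * W];
    int32_t a[H * W], b[H * W];
    for (int i = 0; i < H * W; i++)
        src[i] = (uint8_t)(i * 37 + 11);
    filter_two_pass(a, src, fh, fv);
    filter_interleaved(b, src, fh, fv);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

In this sketch the intermediate storage drops from H rows to TAPS rows, and each intermediate row is consumed shortly after it is produced, which is the general effect the stack and cache figures above describe.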
mentioned in merge request !1776 (merged)
mentioned in commit 2ba57aa5