    arm64: looprestoration: Rewrite the wiener functions · 2e73051c
    Martin Storsjö authored and Jean-Baptiste Kempf committed
    Make them operate in a more cache-friendly manner by interleaving
    horizontal and vertical filtering (reducing stack usage from
    51 KB to 4 KB), similar to what was done for x86 in
    78d27b7d.
    
    This also adds separate 5-tap versions of the filters and unrolls
    the vertical filter a bit more (which could perhaps have been done
    independently of the rewrite).
    
    This does, however, increase the compiled code size by around
    3.5 KB.
    
    Before:                Cortex A53       A72       A73
    wiener_5tap_8bpc_neon:   136855.6   91446.2   87363.6
    wiener_7tap_8bpc_neon:   136861.6   91454.9   87374.5
    wiener_5tap_10bpc_neon:  167685.3  114720.3  116522.1
    wiener_5tap_12bpc_neon:  167677.5  114724.7  116511.9
    wiener_7tap_10bpc_neon:  167681.6  114738.5  116567.0
    wiener_7tap_12bpc_neon:  167673.8  114720.8  116515.4
    After:
    wiener_5tap_8bpc_neon:    87102.1   60460.6   66803.8
    wiener_7tap_8bpc_neon:   110831.7   78489.0   82015.9
    wiener_5tap_10bpc_neon:  109999.2   90259.0   89238.0
    wiener_5tap_12bpc_neon:  109978.3   90255.7   89220.7
    wiener_7tap_10bpc_neon:  137877.6  107578.5  103435.6
    wiener_7tap_12bpc_neon:  137868.8  107568.9  103390.4