arm32: looprestoration: Apply simplifications to align with C code
This applies the same simplifications that were done for the C code
and the x86 assembly in 4613d3a5, and the arm64 assembly in ce80e6da,
to the arm32 implementation.

This gives a minor speedup of around a couple percent.

Before:              Cortex A7          A8         A53         A72         A73
sgr_3x3_8bpc_neon:    926600.0    753468.3    553704.1    399379.1    369674.4
sgr_5x5_8bpc_neon:    621722.9    540412.7    357275.9    274474.3    254996.0
sgr_mix_8bpc_neon:   1529715.1   1171282.5    894982.9    659996.6    610407.2
After:
sgr_3x3_8bpc_neon:    899020.3    697278.6    541569.9    382824.3    353891.8
sgr_5x5_8bpc_neon:    602183.2    498322.9    348974.5    264833.9    243837.7
sgr_mix_8bpc_neon:   1497870.8   1182121.3    880470.9    635939.3    590909.3
parent
c43debf1