arm64: looprestoration: Apply simplifications to align with C code

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-sgr-opt into master

This applies the same simplifications that were done for the C code and the x86 assembly in 4613d3a5, to the arm64 implementation.

This gives a minor speedup, around a couple of percent in most cases.

Before:            Cortex A53        A55        A72        A73       A76  Apple M3
sgr_3x3_8bpc_neon:   368583.2   363654.2   279958.1   272065.1  169353.3  354.6
sgr_5x5_8bpc_neon:   258570.7   255018.5   200410.6   199478.3  117968.3  260.9
sgr_mix_8bpc_neon:   603698.1   577383.3   482468.3   436540.4  256632.9  541.8
After:
sgr_3x3_8bpc_neon:   367873.2   357884.1   275462.4   268363.9  165909.8  346.0
sgr_5x5_8bpc_neon:   254988.4   248184.2   190875.1   196939.1  120517.2  252.1
sgr_mix_8bpc_neon:   589204.7   563565.8   414025.6   427702.2  251651.2  533.4
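
For reference, the relative change per function and core can be derived from the figures above. The following is a minimal sketch, not part of this merge request: the helper names are hypothetical, and it assumes the numbers are timings, so lower means faster.

```python
# Hypothetical helper, not part of dav1d: compares the benchmark figures
# quoted in this merge request. Assumes lower numbers mean faster.
CORES = ["Cortex A53", "A55", "A72", "A73", "A76", "Apple M3"]

BEFORE = {
    "sgr_3x3_8bpc_neon": [368583.2, 363654.2, 279958.1, 272065.1, 169353.3, 354.6],
    "sgr_5x5_8bpc_neon": [258570.7, 255018.5, 200410.6, 199478.3, 117968.3, 260.9],
    "sgr_mix_8bpc_neon": [603698.1, 577383.3, 482468.3, 436540.4, 256632.9, 541.8],
}

AFTER = {
    "sgr_3x3_8bpc_neon": [367873.2, 357884.1, 275462.4, 268363.9, 165909.8, 346.0],
    "sgr_5x5_8bpc_neon": [254988.4, 248184.2, 190875.1, 196939.1, 120517.2, 252.1],
    "sgr_mix_8bpc_neon": [589204.7, 563565.8, 414025.6, 427702.2, 251651.2, 533.4],
}


def speedup_percent(before, after):
    """Percent reduction in the measured figure; negative means slower."""
    return (before - after) / before * 100.0


def speedup_table():
    """Map each function name to its per-core percent change."""
    return {
        fn: [speedup_percent(b, a) for b, a in zip(BEFORE[fn], AFTER[fn])]
        for fn in BEFORE
    }


if __name__ == "__main__":
    for fn, row in speedup_table().items():
        cells = ", ".join(f"{core}: {pct:+.1f}%" for core, pct in zip(CORES, row))
        print(f"{fn} {cells}")
```

Under that assumption, most entries improve by a few percent. The outliers are sgr_mix on the Cortex A72, which improves by roughly 14%, and sgr_5x5 on the A76, which comes out about 2% slower.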
