Draft: aarch64: Test implementing sgr_x_by_x[] with fdiv (VideoLAN / dav1d !1756)

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-sgr-div into master Nov 13, 2024

This is a test implementation, done in sgr_box5_vert_neon. It may be possible to tweak it a little further: it currently uses 32-bit vector elements throughout. We could narrow the values down first, as was done before, but the float steps still need 32-bit quantities. Overall, this doesn't seem to be beneficial: it is slower than the current implementation on every core tested.

                    Cortex A53       A55       A72       A73       A76  Apple M3
Before:
sgr_5x5_8bpc_neon:    258319.2  254398.7  195143.7  199321.0  117959.0     250.5
After:
sgr_5x5_8bpc_neon:    286970.0  275679.4  214980.5  224968.7  129278.1     266.8

Edited Nov 13, 2024 by Martin Storsjö
