arm32: looprestoration: NEON implementation of SGR for 10 bpc

Martin Storsjö requested to merge mstorsjo/dav1d:arm32-sgr-10bpc into master

Besides the usual minor fixups, this also contains a fairly notable speedup (2-8% overall) for the existing arm64 looprestoration 10 bpc code.

Checkasm numbers:

                            Cortex A7         A8       A53       A72       A73 
selfguided_3x3_10bpc_neon:   919127.6   717942.8  565717.8  404748.0  372179.8
selfguided_5x5_10bpc_neon:   640310.8   511873.4  370653.3  273593.7  256403.2
selfguided_mix_10bpc_neon:  1533887.0  1252389.5  922111.1  659033.4  613410.6

Corresponding numbers for arm64, for comparison:

                                                Cortex A53       A72       A73
selfguided_3x3_10bpc_neon:                        500706.0  367199.2  345261.2
selfguided_5x5_10bpc_neon:                        361403.3  270550.0  249955.3
selfguided_mix_10bpc_neon:                        846172.4  623590.3  578404.8
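To make the comparison easier to read, here is a small sketch that divides the arm32 figures by the arm64 ones for the three cores that appear in both tables. It assumes the checkasm figures are timings where lower is faster; the dictionary layout and the helper name `arm32_over_arm64` are only illustrative and not part of dav1d or checkasm.

```python
# Checkasm figures copied from the two tables above (assumed: lower is faster).
# Only the cores present in both tables are included.
CORES = ("A53", "A72", "A73")

ARM32 = {
    "selfguided_3x3_10bpc_neon": (565717.8, 404748.0, 372179.8),
    "selfguided_5x5_10bpc_neon": (370653.3, 273593.7, 256403.2),
    "selfguided_mix_10bpc_neon": (922111.1, 659033.4, 613410.6),
}

ARM64 = {
    "selfguided_3x3_10bpc_neon": (500706.0, 367199.2, 345261.2),
    "selfguided_5x5_10bpc_neon": (361403.3, 270550.0, 249955.3),
    "selfguided_mix_10bpc_neon": (846172.4, 623590.3, 578404.8),
}


def arm32_over_arm64(name):
    """Return {core: arm32 figure / arm64 figure} for one function (hypothetical helper)."""
    return {
        core: a32 / a64
        for core, a32, a64 in zip(CORES, ARM32[name], ARM64[name])
    }


if __name__ == "__main__":
    for name in ARM32:
        ratios = arm32_over_arm64(name)
        cells = "  ".join(f"{core}: {ratio:.3f}" for core, ratio in ratios.items())
        print(f"{name}  {cells}")
```

A ratio above 1.0 means the arm32 build took longer than the arm64 build on that core.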