arm: Add NEON implementations of splat_mv

Martin Storsjö requested to merge mstorsjo/dav1d:arm-refmvs into master
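
For reference, the operation being accelerated is simple. It fills a `bw4`-wide, `bh4`-tall area of per-4×4-block motion vector records with one value. Below is a minimal scalar sketch modelled loosely on dav1d's C version. `block_mv` and `splat_mv_ref` are hypothetical names, and the 12-byte record layout (two MVs, two reference indices, block size, flags) is an assumption meant to mirror dav1d's `refmvs_block`. The `wN` suffix in the tables below presumably corresponds to `bw4`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 12-byte per-4x4-block record (two MVs,
 * two reference indices, block size, flags). Field names are illustrative. */
typedef struct {
    int16_t mv[2][2];  /* two motion vectors, (y, x) each: 8 bytes */
    int8_t ref[2];     /* reference frame index per MV: 2 bytes */
    uint8_t bs, mf;    /* block size and flags: 2 bytes */
} block_mv;

_Static_assert(sizeof(block_mv) == 12, "record assumed to be 12 bytes");

/* Scalar splat: write *rmv into columns [bx4, bx4 + bw4) of each of the
 * bh4 rows; rows[y] points at the start of row y. */
static void splat_mv_ref(block_mv **rows, const block_mv *rmv,
                         int bx4, int bw4, int bh4)
{
    assert(bw4 > 0 && bh4 > 0);  /* the sketch expects a non-empty area */
    for (int y = 0; y < bh4; y++) {
        block_mv *r = rows[y] + bx4;
        for (int x = 0; x < bw4; x++)
            r[x] = *rmv;  /* whole-struct copy, 12 bytes */
    }
}
```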

Relative speedup over C code, for arm64:

               Cortex A53    A72    A73   Apple M1
splat_mv_w1_neon:    1.01   0.91   1.17   -
splat_mv_w2_neon:    1.65   2.01   1.45   -
splat_mv_w4_neon:    2.55   2.10   1.82   -
splat_mv_w8_neon:    3.43   2.09   2.57  12.00
splat_mv_w16_neon:   3.92   1.73   2.44   3.38
splat_mv_w32_neon:   4.01   1.60   2.28   2.89

(The timer available on Apple M1 lacks the resolution to measure the smallest widths of this function, w1 to w4, so those entries are left blank.)
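
In both tables w1 shows the smallest gain on every core measured. One way to picture why, offered as an illustration rather than a description of the assembly in this MR: assuming each record is 12 bytes, four records occupy 48 bytes, exactly three 16-byte NEON registers, so a replicated pattern can be built once and stored in whole chunks. The portable C sketch below uses `memcpy` as a stand-in for vector stores; `splat_row_wide` and `REC_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REC_SIZE 12  /* assumed size of one per-block record, in bytes */

/* Fill `count` consecutive records at dst with the record at rec.
 * Four records span 48 bytes (3 x 16), so the replicated pattern is
 * built once and then copied a whole 48-byte group at a time. */
static void splat_row_wide(uint8_t *dst, const uint8_t *rec, int count)
{
    assert(count >= 0);
    uint8_t pattern[4 * REC_SIZE];  /* 48 bytes = three 16-byte vectors */
    for (int k = 0; k < 4; k++)
        memcpy(pattern + k * REC_SIZE, rec, REC_SIZE);

    int i = 0;
    for (; i + 4 <= count; i += 4)  /* four records per 48-byte copy */
        memcpy(dst + i * REC_SIZE, pattern, sizeof pattern);
    for (; i < count; i++)          /* tail: fewer than four records left */
        memcpy(dst + i * REC_SIZE, rec, REC_SIZE);
}
```

At w1 and w2 a row never fills one 48-byte group, so only the tail path runs. That is consistent with the smaller gains at those widths, though the real cost breakdown on each core may differ.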

Relative speedup over C code, for arm32:

                Cortex A7     A8     A9    A53    A72    A73
splat_mv_w1_neon:    0.69   1.05   0.88   0.62   1.06   1.05
splat_mv_w2_neon:    0.93   2.02   1.95   0.92   2.63   1.41
splat_mv_w4_neon:    1.23   1.96   1.43   1.44   2.07   1.83
splat_mv_w8_neon:    1.70   2.46   1.10   2.76   2.11   2.54
splat_mv_w16_neon:   1.93   2.43   1.11   3.19   1.80   2.64
splat_mv_w32_neon:   1.65   2.26   1.18   3.53   1.77   2.66
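
Reading "relative speedup" as C time divided by NEON time, entries below 1.0 are cases where the NEON code is slower than C. A small C helper (the names are illustrative, not from dav1d) that scans the arm32 numbers above and reports those cases:

```c
#include <assert.h>
#include <stdio.h>

/* arm32 speedups copied from the table above (C time / NEON time). */
static const char *const widths[6] = { "w1", "w2", "w4", "w8", "w16", "w32" };
static const char *const cores[6]  = { "A7", "A8", "A9", "A53", "A72", "A73" };
static const double speedup[6][6] = {
    { 0.69, 1.05, 0.88, 0.62, 1.06, 1.05 },
    { 0.93, 2.02, 1.95, 0.92, 2.63, 1.41 },
    { 1.23, 1.96, 1.43, 1.44, 2.07, 1.83 },
    { 1.70, 2.46, 1.10, 2.76, 2.11, 2.54 },
    { 1.93, 2.43, 1.11, 3.19, 1.80, 2.64 },
    { 1.65, 2.26, 1.18, 3.53, 1.77, 2.66 },
};

/* Print every (width, core) pair where NEON is slower than C and
 * return how many there are. */
static int report_slowdowns(void)
{
    int n = 0;
    for (int w = 0; w < 6; w++)
        for (int c = 0; c < 6; c++) {
            assert(speedup[w][c] > 0.0);  /* ratios must be positive */
            if (speedup[w][c] < 1.0) {
                printf("splat_mv_%s_neon on %s: %.2f\n",
                       widths[w], cores[c], speedup[w][c]);
                n++;
            }
        }
    return n;
}
```

For this table it finds five such entries, all at w1 and w2, four of them on the A7 and A53.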

@janne Is there anything else you'd like to try, tuning-wise, for the smaller sizes (where the current implementation ends up slightly slower than the C code)?