arm32: filmgrain: Add NEON implementation of fgy and fguv for 16 bpc

Martin Storsjö requested to merge mstorsjo/dav1d:arm32-filmgrain-16bpc into master

Relative speedup over C code:

                                Cortex A7     A8     A9    A53    A72    A73
fguv_32x32xn_16bpc_420_csfl0_neon:   3.47   1.72   2.99   4.18   2.68   6.19
fguv_32x32xn_16bpc_420_csfl1_neon:   3.24   1.36   2.58   3.78   2.73   5.27
fguv_32x32xn_16bpc_422_csfl0_neon:   3.57   2.07   3.05   4.32   2.74   6.20
fguv_32x32xn_16bpc_422_csfl1_neon:   3.33   1.44   2.62   3.89   2.71   5.28
fguv_32x32xn_16bpc_444_csfl0_neon:   3.48   1.69   3.06   4.48   2.97   6.69
fguv_32x32xn_16bpc_444_csfl1_neon:   3.06   1.16   2.36   3.85   2.75   5.19
fgy_32x32xn_16bpc_neon:              2.89   1.05   2.29   3.49   2.49   3.15

Absolute numbers:

                                  Cortex A7       A8       A9      A53      A72      A73
fguv_32x32xn_16bpc_420_csfl0_neon:   6237.3  12701.0   6687.1   4525.8   3220.8   3195.4
fguv_32x32xn_16bpc_420_csfl1_neon:   5143.2  11684.8   5926.4   3857.2   2604.7   2556.5
fguv_32x32xn_16bpc_422_csfl0_neon:   6347.3  11005.2   6797.5   4582.4   3300.4   3250.5
fguv_32x32xn_16bpc_422_csfl1_neon:   5275.2  11594.8   5992.6   3931.1   2668.7   2607.3
fguv_32x32xn_16bpc_444_csfl0_neon:   5181.6  11310.0   5575.4   3629.7   2383.8   2530.0
fguv_32x32xn_16bpc_444_csfl1_neon:   4081.9  10958.8   4868.5   2962.9   1870.3   2034.2
fgy_32x32xn_16bpc_neon:             15439.1  43129.0  19406.6  11542.3   7463.9   7827.8

Corresponding numbers for arm64:

                                                            Cortex A53      A72      A73
fguv_32x32xn_16bpc_420_csfl0_neon:                              4019.2   3247.4   3259.6
fguv_32x32xn_16bpc_420_csfl1_neon:                              3460.1   2628.7   2640.8
fguv_32x32xn_16bpc_422_csfl0_neon:                              4034.4   3329.9   3287.5
fguv_32x32xn_16bpc_422_csfl1_neon:                              3468.3   2749.3   2686.6
fguv_32x32xn_16bpc_444_csfl0_neon:                              3117.7   2447.4   2539.8
fguv_32x32xn_16bpc_444_csfl1_neon:                              2641.2   1977.2   2132.8
fgy_32x32xn_16bpc_neon:                                         9873.5   7605.7   7656.2
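
To make the arm32 vs arm64 comparison easier to read, the two sets of absolute numbers can be divided per core. The following is only a sketch: the values are copied from the tables above for the three cores present in both (A53, A72, A73), the names `ARM32`, `ARM64` and `arm32_over_arm64` are made up for this illustration, and it assumes that lower absolute numbers mean faster code, so a ratio above 1.0 means the arm32 version is slower than the arm64 one on that core.

```python
# Absolute numbers copied from the tables above, in the order A53, A72, A73.
# Assumption: lower is better, so ratio > 1.0 means arm32 is slower than arm64.
CORES = ("A53", "A72", "A73")

ARM32 = {
    "fguv_32x32xn_16bpc_420_csfl0_neon": (4525.8, 3220.8, 3195.4),
    "fguv_32x32xn_16bpc_420_csfl1_neon": (3857.2, 2604.7, 2556.5),
    "fguv_32x32xn_16bpc_422_csfl0_neon": (4582.4, 3300.4, 3250.5),
    "fguv_32x32xn_16bpc_422_csfl1_neon": (3931.1, 2668.7, 2607.3),
    "fguv_32x32xn_16bpc_444_csfl0_neon": (3629.7, 2383.8, 2530.0),
    "fguv_32x32xn_16bpc_444_csfl1_neon": (2962.9, 1870.3, 2034.2),
    "fgy_32x32xn_16bpc_neon": (11542.3, 7463.9, 7827.8),
}

ARM64 = {
    "fguv_32x32xn_16bpc_420_csfl0_neon": (4019.2, 3247.4, 3259.6),
    "fguv_32x32xn_16bpc_420_csfl1_neon": (3460.1, 2628.7, 2640.8),
    "fguv_32x32xn_16bpc_422_csfl0_neon": (4034.4, 3329.9, 3287.5),
    "fguv_32x32xn_16bpc_422_csfl1_neon": (3468.3, 2749.3, 2686.6),
    "fguv_32x32xn_16bpc_444_csfl0_neon": (3117.7, 2447.4, 2539.8),
    "fguv_32x32xn_16bpc_444_csfl1_neon": (2641.2, 1977.2, 2132.8),
    "fgy_32x32xn_16bpc_neon": (9873.5, 7605.7, 7656.2),
}


def arm32_over_arm64(name):
    """Return the arm32/arm64 ratio per core, rounded to two decimals."""
    return tuple(
        round(a32 / a64, 2) for a32, a64 in zip(ARM32[name], ARM64[name])
    )


if __name__ == "__main__":
    # Print one row per function, with one ratio column per core.
    print(" " * 34 + "".join(f"{core:>7}" for core in CORES))
    for name in ARM32:
        ratios = arm32_over_arm64(name)
        print(f"{name + ':':<34}" + "".join(f"{r:7.2f}" for r in ratios))
```

For `fgy_32x32xn_16bpc_neon` this gives 1.17, 0.98 and 1.02 for the A53, A72 and A73 respectively.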
