arm32: filmgrain: Add NEON implementations of fgy and fguv for 8 bpc

Martin Storsjö requested to merge mstorsjo/dav1d:arm32-filmgrain-8bpc into master

This goes on top of !1217 (merged).

Relative speedup over C code:

                               Cortex A7     A8     A9    A53    A72    A73
fguv_32x32xn_8bpc_420_csfl0_neon:   4.20   2.19   3.48   4.93   3.60   5.93
fguv_32x32xn_8bpc_420_csfl1_neon:   3.92   1.52   2.84   4.34   3.82   5.93
fguv_32x32xn_8bpc_422_csfl0_neon:   4.27   2.13   3.58   5.02   4.04   5.95
fguv_32x32xn_8bpc_422_csfl1_neon:   3.99   1.56   2.91   4.43   3.89   6.00
fguv_32x32xn_8bpc_444_csfl0_neon:   4.48   2.08   3.89   5.66   4.07   6.51
fguv_32x32xn_8bpc_444_csfl1_neon:   4.45   1.41   2.99   5.28   3.63   6.09
fgy_32x32xn_8bpc_neon:              3.61   1.10   2.62   4.35   3.06   3.74

Absolute numbers:

                                 Cortex A7       A8       A9      A53      A72      A73
fguv_32x32xn_8bpc_420_csfl0_neon:   5318.8  11167.7   6024.6   3909.9   2945.2   2993.5
fguv_32x32xn_8bpc_420_csfl1_neon:   4351.0  10929.7   5269.5   3316.8   2166.5   2256.9
fguv_32x32xn_8bpc_422_csfl0_neon:   5387.9  11746.7   6080.0   3945.8   2988.1   3046.3
fguv_32x32xn_8bpc_422_csfl1_neon:   4396.0  11083.2   5300.8   3354.9   2216.4   2291.4
fguv_32x32xn_8bpc_444_csfl0_neon:   4347.9  10595.0   5134.4   3079.1   2277.7   2392.9
fguv_32x32xn_8bpc_444_csfl1_neon:   3295.0  10518.2   4442.6   2476.3   1716.3   1829.2
fgy_32x32xn_8bpc_neon:             12376.2  41046.9  17259.7   9153.1   6610.4   7005.3

Corresponding numbers for arm64:

                               Cortex A53      A72      A73
fguv_32x32xn_8bpc_420_csfl0_neon:   3822.9   2920.0   2935.7
fguv_32x32xn_8bpc_420_csfl1_neon:   3209.7   2231.7   2335.4
fguv_32x32xn_8bpc_422_csfl0_neon:   3807.9   2886.5   2966.7
fguv_32x32xn_8bpc_422_csfl1_neon:   3197.1   2187.9   2355.9
fguv_32x32xn_8bpc_444_csfl0_neon:   2757.8   2227.4   2334.4
fguv_32x32xn_8bpc_444_csfl1_neon:   2244.6   1719.1   1786.7
fgy_32x32xn_8bpc_neon:              8192.2   6563.3   6969.1
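
For readers who want to compare the arm32 and arm64 figures directly, here is a minimal sketch that divides each arm32 absolute number by the arm64 number for the same function and core. It assumes the absolute numbers are runtimes (lower is better) measured in the same unit on both architectures, which the description does not state explicitly. The names `ARM32`, `ARM64` and `arm32_over_arm64` are made up for this illustration and are not part of dav1d.

```python
# Absolute numbers copied from the tables in this description,
# restricted to the cores measured on both arm32 and arm64.
# Assumption: values are runtimes in the same unit, lower is better.
CORES = ("A53", "A72", "A73")

ARM32 = {
    "fguv_32x32xn_8bpc_420_csfl0_neon": (3909.9, 2945.2, 2993.5),
    "fguv_32x32xn_8bpc_420_csfl1_neon": (3316.8, 2166.5, 2256.9),
    "fguv_32x32xn_8bpc_422_csfl0_neon": (3945.8, 2988.1, 3046.3),
    "fguv_32x32xn_8bpc_422_csfl1_neon": (3354.9, 2216.4, 2291.4),
    "fguv_32x32xn_8bpc_444_csfl0_neon": (3079.1, 2277.7, 2392.9),
    "fguv_32x32xn_8bpc_444_csfl1_neon": (2476.3, 1716.3, 1829.2),
    "fgy_32x32xn_8bpc_neon": (9153.1, 6610.4, 7005.3),
}

ARM64 = {
    "fguv_32x32xn_8bpc_420_csfl0_neon": (3822.9, 2920.0, 2935.7),
    "fguv_32x32xn_8bpc_420_csfl1_neon": (3209.7, 2231.7, 2335.4),
    "fguv_32x32xn_8bpc_422_csfl0_neon": (3807.9, 2886.5, 2966.7),
    "fguv_32x32xn_8bpc_422_csfl1_neon": (3197.1, 2187.9, 2355.9),
    "fguv_32x32xn_8bpc_444_csfl0_neon": (2757.8, 2227.4, 2334.4),
    "fguv_32x32xn_8bpc_444_csfl1_neon": (2244.6, 1719.1, 1786.7),
    "fgy_32x32xn_8bpc_neon": (8192.2, 6563.3, 6969.1),
}


def arm32_over_arm64(name):
    """Return the per-core ratio arm32 / arm64 for one function.

    A ratio above 1.0 means the arm64 figure is lower for that core.
    """
    return tuple(a32 / a64 for a32, a64 in zip(ARM32[name], ARM64[name]))


if __name__ == "__main__":
    print(f"{'function':<34}" + "".join(f"{core:>8}" for core in CORES))
    for name in ARM32:
        ratios = arm32_over_arm64(name)
        print(f"{name:<34}" + "".join(f"{r:>8.3f}" for r in ratios))
```

Running the script prints one row per function with the arm32/arm64 ratio for each of the three shared cores.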
