arm32: filmgrain: Add NEON implementation of gen_grain for 16 bpc

Martin Storsjö requested to merge mstorsjo/dav1d:arm32-filmgrain-16bpc into master

Relative speedup over C code:

                              Cortex A7     A8     A9    A53    A72    A73
gen_grain_uv_ar0_16bpc_420_neon:   5.05   6.71   5.42   4.95   6.45   9.59
gen_grain_uv_ar0_16bpc_422_neon:   5.54   7.18   6.29   5.45   6.55   8.80
gen_grain_uv_ar0_16bpc_444_neon:   6.64   8.07   6.70   6.89   7.16   9.98
gen_grain_uv_ar1_16bpc_420_neon:   3.22   2.16   2.58   3.51   3.16   4.68
gen_grain_uv_ar1_16bpc_422_neon:   3.24   2.26   2.73   3.83   3.36   4.65
gen_grain_uv_ar1_16bpc_444_neon:   3.48   2.41   2.85   4.32   3.69   4.90
gen_grain_uv_ar2_16bpc_420_neon:   3.29   2.90   2.92   4.14   3.48   4.59
gen_grain_uv_ar2_16bpc_422_neon:   3.35   3.01   3.13   4.50   3.61   4.50
gen_grain_uv_ar2_16bpc_444_neon:   3.66   3.55   3.32   5.15   3.87   4.93
gen_grain_uv_ar3_16bpc_420_neon:   3.39   3.79   3.60   4.67   4.04   4.70
gen_grain_uv_ar3_16bpc_422_neon:   3.39   4.04   3.96   4.93   4.16   4.65
gen_grain_uv_ar3_16bpc_444_neon:   3.79   4.47   4.36   5.54   4.59   5.07
gen_grain_y_ar0_16bpc_neon:        5.05   5.26   6.97   5.47   5.95   8.59
gen_grain_y_ar1_16bpc_neon:        2.35   1.72   2.07   3.53   3.16   3.47
gen_grain_y_ar2_16bpc_neon:        3.02   2.70   2.88   4.19   3.57   4.03
gen_grain_y_ar3_16bpc_neon:        3.49   3.18   3.69   5.01   3.99   4.50
