arm32: itx: Add a NEON implementation of itx for 10 bpc

Martin Storsjö requested to merge mstorsjo/dav1d:arm32-itx16 into master

Relative speedup vs C for a few functions:

                                      Cortex A7     A8     A9    A53    A72    A73
inv_txfm_add_4x4_dct_dct_0_10bpc_neon:     2.79   5.08   2.99   2.83   3.49   4.44
inv_txfm_add_4x4_dct_dct_1_10bpc_neon:     5.74   9.43   5.72   7.19   6.73   6.92
inv_txfm_add_8x8_dct_dct_0_10bpc_neon:     3.13   3.68   2.79   3.25   3.21   3.33
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:     7.09  10.41   7.00  10.55   8.06   9.02
inv_txfm_add_16x16_dct_dct_0_10bpc_neon:   5.01   6.76   4.56   5.58   5.52   2.97
inv_txfm_add_16x16_dct_dct_1_10bpc_neon:   8.62  12.48  13.71  11.75  15.94  16.86
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   6.05   8.81   6.13   8.18   7.90  12.27
inv_txfm_add_32x32_dct_dct_0_10bpc_neon:   2.90   3.90   2.16   2.63   3.56   2.74
inv_txfm_add_32x32_dct_dct_1_10bpc_neon:  13.57  17.00  13.30  13.76  14.54  17.08
inv_txfm_add_32x32_dct_dct_2_10bpc_neon:   8.29  10.54   8.05  10.68  12.75  14.36
inv_txfm_add_32x32_dct_dct_3_10bpc_neon:   6.78   8.40   7.60  10.12   8.97  12.96
inv_txfm_add_32x32_dct_dct_4_10bpc_neon:   6.48   6.74   6.00   7.38   7.67   9.70
inv_txfm_add_64x64_dct_dct_0_10bpc_neon:   3.02   4.59   2.21   2.65   3.36   2.47
inv_txfm_add_64x64_dct_dct_1_10bpc_neon:   9.86  11.30   9.14  13.80  12.46  14.83
inv_txfm_add_64x64_dct_dct_2_10bpc_neon:   8.65   9.76   7.60  12.05  10.55  12.62
inv_txfm_add_64x64_dct_dct_3_10bpc_neon:   7.78   8.65   6.98  10.63   9.15  11.73
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   6.61   7.01   5.52   8.41   8.33   9.69
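
For readers unfamiliar with the table's convention, here is a minimal sketch of how a relative speedup figure of this kind is typically derived. It assumes each value is the C reference time divided by the NEON time for the same function; the merge request does not state the measurement method, and the function name `relative_speedup` and the timings below are hypothetical, not taken from the benchmark run.

```python
def relative_speedup(c_time: float, neon_time: float) -> float:
    """Return how many times faster the NEON version is than the C reference.

    Assumption: speedup = C time / NEON time, both in the same units.
    """
    if neon_time <= 0:
        raise ValueError("neon_time must be positive")
    return c_time / neon_time


# Hypothetical timings in arbitrary units, chosen only for illustration.
speedup = relative_speedup(c_time=558.0, neon_time=200.0)
print(f"{speedup:.2f}")
```

Under this reading, a value above 1.00 means the NEON version is faster than C, and larger values mean a larger gain.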
