Commit b4b225d8 authored by Martin Storsjö's avatar Martin Storsjö
Browse files

arm32: itx: Add a NEON implementation of itx for 10 bpc

Relative speedup vs C for a few functions:

                                      Cortex A7     A8     A9    A53    A72    A73
inv_txfm_add_4x4_dct_dct_0_10bpc_neon:     2.79   5.08   2.99   2.83   3.49   4.44
inv_txfm_add_4x4_dct_dct_1_10bpc_neon:     5.74   9.43   5.72   7.19   6.73   6.92
inv_txfm_add_8x8_dct_dct_0_10bpc_neon:     3.13   3.68   2.79   3.25   3.21   3.33
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:     7.09  10.41   7.00  10.55   8.06   9.02
inv_txfm_add_16x16_dct_dct_0_10bpc_neon:   5.01   6.76   4.56   5.58   5.52   2.97
inv_txfm_add_16x16_dct_dct_1_10bpc_neon:   8.62  12.48  13.71  11.75  15.94  16.86
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:   6.05   8.81   6.13   8.18   7.90  12.27
inv_txfm_add_32x32_dct_dct_0_10bpc_neon:   2.90   3.90   2.16   2.63   3.56   2.74
inv_txfm_add_32x32_dct_dct_1_10bpc_neon:  13.57  17.00  13.30  13.76  14.54  17.08
inv_txfm_add_32x32_dct_dct_2_10bpc_neon:   8.29  10.54   8.05  10.68  12.75  14.36
inv_txf...
parent 7f5b334b
Pipeline #71229 passed with stages
in 5 minutes and 7 seconds