
arm32: mc: NEON implementation of put/prep 8tap/bilin for 16 bpc

Martin Storsjö requested to merge mstorsjo/dav1d:arm32-8tap-16bpc into master

Examples of checkasm benchmarks:

                                  Cortex A7      A8      A9     A53     A72     A73
mc_8tap_regular_w8_0_16bpc_neon:      158.7   106.2   167.0   127.9    55.0    77.2
mc_8tap_regular_w8_h_16bpc_neon:     1000.8   557.5   749.2   609.2   401.4   485.4
mc_8tap_regular_w8_hv_16bpc_neon:    2278.9  1255.4  1352.5  1277.2   867.8   915.9
mc_8tap_regular_w8_v_16bpc_neon:     1060.0   393.6   485.5   448.3   298.0   298.2
mc_bilinear_w8_0_16bpc_neon:          159.7    96.6   161.1   123.7    55.4    74.7
mc_bilinear_w8_h_16bpc_neon:          342.3   250.8   352.9   239.0   158.4   165.1
mc_bilinear_w8_hv_16bpc_neon:         587.7   373.8   469.0   339.8   244.4   247.5
mc_bilinear_w8_v_16bpc_neon:          285.8   189.3   284.9   180.4   103.4   100.9
mct_8tap_regular_w8_0_16bpc_neon:     233.0   136.6   229.3   169.3    86.2    98.3
mct_8tap_regular_w8_h_16bpc_neon:    1106.8   588.3   817.9   654.1   406.4   489.8
mct_8tap_regular_w8_hv_16bpc_neon:   2473.3  1326.3  1428.2  1373.7   903.3   951.1
mct_8tap_regular_w8_v_16bpc_neon:    1266.0   474.1   581.3   505.9   382.0   373.4
mct_bilinear_w8_0_16bpc_neon:         232.9   126.2   225.0   166.3    86.2    91.7
mct_bilinear_w8_h_16bpc_neon:         380.6   270.6   386.0   259.7   154.1   151.9
mct_bilinear_w8_hv_16bpc_neon:        631.4   409.2   509.4   372.1   243.1   244.1
mct_bilinear_w8_v_16bpc_neon:         349.5   233.5   347.9   212.4   138.7   138.4

For comparison, the corresponding numbers for the existing arm64 implementation:

                                                         Cortex A53     A72     A73
mc_8tap_regular_w8_0_16bpc_neon:                               94.1    48.9    62.3
mc_8tap_regular_w8_h_16bpc_neon:                              570.4   388.1   467.3
mc_8tap_regular_w8_hv_16bpc_neon:                            1035.8   775.0   891.2
mc_8tap_regular_w8_v_16bpc_neon:                              399.8   284.5   278.2
mc_bilinear_w8_0_16bpc_neon:                                   90.0    44.3    57.4
mc_bilinear_w8_h_16bpc_neon:                                  191.7   158.7   156.4
mc_bilinear_w8_hv_16bpc_neon:                                 295.6   235.0   244.9
mc_bilinear_w8_v_16bpc_neon:                                  147.2    99.0    88.8
mct_8tap_regular_w8_0_16bpc_neon:                             139.4    78.4    84.9
mct_8tap_regular_w8_h_16bpc_neon:                             612.3   395.9   478.6
mct_8tap_regular_w8_hv_16bpc_neon:                           1113.0   804.3   963.5
mct_8tap_regular_w8_v_16bpc_neon:                             462.1   370.8   353.3
mct_bilinear_w8_0_16bpc_neon:                                 135.6    77.0    80.5
mct_bilinear_w8_h_16bpc_neon:                                 210.8   159.2   141.7
mct_bilinear_w8_hv_16bpc_neon:                                325.7   238.4   227.3
mct_bilinear_w8_v_16bpc_neon:                                 180.7   136.7   129.5
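
The two tables can be compared directly on the cores that appear in both. As a minimal sketch, the snippet below divides the arm32 figures by the arm64 figures for a few functions on the Cortex A53. The numbers are copied from the tables above; the dictionary and function names are illustrative only and are not part of dav1d or checkasm. It assumes that a lower checkasm figure means faster code.

```python
# Cortex A53 figures copied from the two tables above (put, 8tap regular, w8).
# The names below are hypothetical helpers for illustration, not dav1d APIs.
ARM32_A53 = {
    "mc_8tap_regular_w8_0_16bpc_neon": 127.9,
    "mc_8tap_regular_w8_h_16bpc_neon": 609.2,
    "mc_8tap_regular_w8_hv_16bpc_neon": 1277.2,
    "mc_8tap_regular_w8_v_16bpc_neon": 448.3,
}
ARM64_A53 = {
    "mc_8tap_regular_w8_0_16bpc_neon": 94.1,
    "mc_8tap_regular_w8_h_16bpc_neon": 570.4,
    "mc_8tap_regular_w8_hv_16bpc_neon": 1035.8,
    "mc_8tap_regular_w8_v_16bpc_neon": 399.8,
}


def arm32_to_arm64_ratio(arm32, arm64):
    """Return arm32/arm64 per function; a value above 1.0 means the
    arm32 measurement is higher than the arm64 one on the same core."""
    return {name: arm32[name] / arm64[name] for name in arm32}


ratios = arm32_to_arm64_ratio(ARM32_A53, ARM64_A53)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

The same helper can be applied to the A72 and A73 columns by swapping in the corresponding values.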
