• Martin Storsjö's avatar
    arm64: mc: Implement 8tap and bilin functions · 4aa0363a
    Martin Storsjö authored
    These functions have been tuned against Cortex A53 and Snapdragon
    835. The bilin functions have mainly been written with code size
    in mind, as they aren't used much in practice.
    
    Relative speedups for the actual filtering fuctions (that don't
    just do a plain copy) are around 4-15x, some over 20x. This is
    in comparison with GCC 5.4 with autovectorization disabled; the
    actual real-world speedup against autovectorized C code is around
    4-10x.
    
    Relative speedups measured with checkasm:
                                    Cortex A53   Snapdragon 835
    mc_8tap_regular_w2_0_8bpc_neon:       6.96   5.28
    mc_8tap_regular_w2_h_8bpc_neon:       5.16   4.35
    mc_8tap_regular_w2_hv_8bpc_neon:      5.37   4.98
    mc_8tap_regular_w2_v_8bpc_neon:       6.35   4.85
    mc_8tap_regular_w4_0_8bpc_neon:       6.78   5.73
    mc_8tap_regular_w4_h_8bpc_neon:       8.40   6.60
    mc_8tap_regular_w4_hv_8bpc_neon:      7.23   7.10
    mc_8tap_regular_w4_v_8bpc_neon:       9.06   7.76
    mc_8tap_regular_w8_0_8bpc_neon:       6.96   5.55
    mc_8tap_regular_w8_h_8bpc_neon:      10.36   6.88
    mc_8tap_regular_w8_hv_8bpc_neon:      9.49   6.86
    mc_8tap_regular_w8_v_8bpc_neon:      12.06   9.61
    mc_8tap_regular_w16_0_8bpc_neon:      6.68   4.51
    mc_8tap_regular_w16_h_8bpc_neon:     12.30   7.77
    mc_8tap_regular_w16_hv_8bpc_neon:     9.50   6.68
    mc_8tap_regular_w16_v_8bpc_neon:     12.93   9.68
    mc_8tap_regular_w32_0_8bpc_neon:      3.91   2.93
    mc_8tap_regular_w32_h_8bpc_neon:     13.06   7.89
    mc_8tap_regular_w32_hv_8bpc_neon:     9.37   6.70
    mc_8tap_regular_w32_v_8bpc_neon:     12.88   9.49
    mc_8tap_regular_w64_0_8bpc_neon:      2.89   1.68
    mc_8tap_regular_w64_h_8bpc_neon:     13.48   8.00
    mc_8tap_regular_w64_hv_8bpc_neon:     9.23   6.53
    mc_8tap_regular_w64_v_8bpc_neon:     13.11   9.68
    mc_8tap_regular_w128_0_8bpc_neon:     1.89   1.24
    mc_8tap_regular_w128_h_8bpc_neon:    13.58   7.98
    mc_8tap_regular_w128_hv_8bpc_neon:    8.86   6.53
    mc_8tap_regular_w128_v_8bpc_neon:    12.46   9.63
    mc_bilinear_w2_0_8bpc_neon:           7.02   5.40
    mc_bilinear_w2_h_8bpc_neon:           3.65   3.14
    mc_bilinear_w2_hv_8bpc_neon:          4.36   4.84
    mc_bilinear_w2_v_8bpc_neon:           5.22   4.28
    mc_bilinear_w4_0_8bpc_neon:           6.87   5.99
    mc_bilinear_w4_h_8bpc_neon:           6.50   8.61
    mc_bilinear_w4_hv_8bpc_neon:          7.70   7.99
    mc_bilinear_w4_v_8bpc_neon:           7.04   9.10
    mc_bilinear_w8_0_8bpc_neon:           7.03   5.70
    mc_bilinear_w8_h_8bpc_neon:          11.30  15.14
    mc_bilinear_w8_hv_8bpc_neon:         15.74  13.50
    mc_bilinear_w8_v_8bpc_neon:          13.40  17.54
    mc_bilinear_w16_0_8bpc_neon:          6.75   4.48
    mc_bilinear_w16_h_8bpc_neon:         17.02  13.95
    mc_bilinear_w16_hv_8bpc_neon:        17.37  13.78
    mc_bilinear_w16_v_8bpc_neon:         23.69  22.98
    mc_bilinear_w32_0_8bpc_neon:          3.88   3.18
    mc_bilinear_w32_h_8bpc_neon:         18.80  14.97
    mc_bilinear_w32_hv_8bpc_neon:        17.74  14.02
    mc_bilinear_w32_v_8bpc_neon:         24.46  23.04
    mc_bilinear_w64_0_8bpc_neon:          2.87   1.66
    mc_bilinear_w64_h_8bpc_neon:         19.54  16.02
    mc_bilinear_w64_hv_8bpc_neon:        17.80  14.32
    mc_bilinear_w64_v_8bpc_neon:         24.79  23.63
    mc_bilinear_w128_0_8bpc_neon:         2.13   1.23
    mc_bilinear_w128_h_8bpc_neon:        19.89  16.24
    mc_bilinear_w128_hv_8bpc_neon:       17.55  14.15
    mc_bilinear_w128_v_8bpc_neon:        24.45  23.54
    mct_8tap_regular_w4_0_8bpc_neon:      5.56   5.51
    mct_8tap_regular_w4_h_8bpc_neon:      7.48   5.80
    mct_8tap_regular_w4_hv_8bpc_neon:     7.27   7.09
    mct_8tap_regular_w4_v_8bpc_neon:      7.80   6.84
    mct_8tap_regular_w8_0_8bpc_neon:      9.54   9.25
    mct_8tap_regular_w8_h_8bpc_neon:      9.08   6.55
    mct_8tap_regular_w8_hv_8bpc_neon:     9.16   6.30
    mct_8tap_regular_w8_v_8bpc_neon:     10.79   8.66
    mct_8tap_regular_w16_0_8bpc_neon:    15.35  10.50
    mct_8tap_regular_w16_h_8bpc_neon:    10.18   6.76
    mct_8tap_regular_w16_hv_8bpc_neon:    9.17   6.11
    mct_8tap_regular_w16_v_8bpc_neon:    11.52   8.72
    mct_8tap_regular_w32_0_8bpc_neon:    15.82  10.09
    mct_8tap_regular_w32_h_8bpc_neon:    10.75   6.85
    mct_8tap_regular_w32_hv_8bpc_neon:    9.00   6.22
    mct_8tap_regular_w32_v_8bpc_neon:    11.58   8.67
    mct_8tap_regular_w64_0_8bpc_neon:    15.28   9.68
    mct_8tap_regular_w64_h_8bpc_neon:    10.93   6.96
    mct_8tap_regular_w64_hv_8bpc_neon:    8.81   6.53
    mct_8tap_regular_w64_v_8bpc_neon:    11.42   8.73
    mct_8tap_regular_w128_0_8bpc_neon:   14.41   7.67
    mct_8tap_regular_w128_h_8bpc_neon:   10.92   6.96
    mct_8tap_regular_w128_hv_8bpc_neon:   8.56   6.51
    mct_8tap_regular_w128_v_8bpc_neon:   11.16   8.70
    mct_bilinear_w4_0_8bpc_neon:          5.66   5.77
    mct_bilinear_w4_h_8bpc_neon:          5.16   6.40
    mct_bilinear_w4_hv_8bpc_neon:         6.86   6.82
    mct_bilinear_w4_v_8bpc_neon:          4.75   6.09
    mct_bilinear_w8_0_8bpc_neon:          9.78  10.00
    mct_bilinear_w8_h_8bpc_neon:          8.98  11.37
    mct_bilinear_w8_hv_8bpc_neon:        14.42  10.83
    mct_bilinear_w8_v_8bpc_neon:          9.12  11.62
    mct_bilinear_w16_0_8bpc_neon:        15.59  10.76
    mct_bilinear_w16_h_8bpc_neon:        11.98   8.77
    mct_bilinear_w16_hv_8bpc_neon:       15.83  10.73
    mct_bilinear_w16_v_8bpc_neon:        14.70  14.60
    mct_bilinear_w32_0_8bpc_neon:        15.89  10.32
    mct_bilinear_w32_h_8bpc_neon:        13.47   9.07
    mct_bilinear_w32_hv_8bpc_neon:       16.01  10.95
    mct_bilinear_w32_v_8bpc_neon:        14.85  14.16
    mct_bilinear_w64_0_8bpc_neon:        15.36  10.51
    mct_bilinear_w64_h_8bpc_neon:        14.00   9.61
    mct_bilinear_w64_hv_8bpc_neon:       15.82  11.27
    mct_bilinear_w64_v_8bpc_neon:        14.61  14.76
    mct_bilinear_w128_0_8bpc_neon:       14.41   7.92
    mct_bilinear_w128_h_8bpc_neon:       13.31   9.58
    mct_bilinear_w128_hv_8bpc_neon:      14.07  11.18
    mct_bilinear_w128_v_8bpc_neon:       11.57  14.42
    4aa0363a
Name
Last commit
Last update
doc Loading commit data...
include Loading commit data...
src Loading commit data...
tests Loading commit data...
tools Loading commit data...
.gitignore Loading commit data...
.gitlab-ci.yml Loading commit data...
CONTRIBUTING.md Loading commit data...
COPYING Loading commit data...
NEWS Loading commit data...
README.md Loading commit data...
THANKS.md Loading commit data...
meson.build Loading commit data...
meson_options.txt Loading commit data...