    arm32: mc: Add NEON implementation of emu_edge for 16 bpc · 38df0efa
    Martin Storsjö authored
    Checkasm benchmarks:    Cortex  A7       A8      A53      A72     A73
    emu_edge_w4_16bpc_neon:      375.0    312.6    268.3    159.3   170.0
    emu_edge_w8_16bpc_neon:      619.3    425.5    435.5    249.5   291.1
    emu_edge_w16_16bpc_neon:     719.1    568.3    506.9    324.2   314.4
    emu_edge_w32_16bpc_neon:    2112.2   1677.7   1396.2   1050.5  1009.6
    emu_edge_w64_16bpc_neon:    5046.8   4322.5   3693.7   3953.8  2682.8
    emu_edge_w128_16bpc_neon:  16311.1  14341.3  12877.8  26183.5  8924.9
    
    Corresponding numbers for arm64, for comparison:
                                             Cortex A53      A72      A73
    emu_edge_w4_16bpc_neon:                       302.5    174.9    159.2
    emu_edge_w8_16bpc_neon:                       344.6    292.3    273.2
    emu_edge_w16_16bpc_neon:                      601.0    461.2    316.8
    emu_edge_w32_16bpc_neon:                      974.2   1274.7    960.5
    emu_edge_w64_16bpc_neon:                     2853.1   3527.6   2633.5
    emu_edge_w128_16bpc_neon:                   14633.5  26776.6   7236.0