    arm64: mc: Simplify avg/w_avg/mask by always using the w16 macro · 0bad117e
    Martin Storsjö authored
    This shortens the source by 40 lines, and gives a significant
    speedup on A53, a small speedup on A72 and a very minor slowdown
    for avg/w_avg on A73.
    
    Before:           Cortex A53     A72     A73
    avg_w4_8bpc_neon:       67.4    26.1    25.4
    avg_w8_8bpc_neon:      158.7    56.3    59.1
    avg_w16_8bpc_neon:     382.9   154.1   160.7
    w_avg_w4_8bpc_neon:     99.9    43.6    39.4
    w_avg_w8_8bpc_neon:    253.2    98.3    99.0
    w_avg_w16_8bpc_neon:   543.1   285.0   301.8
    mask_w4_8bpc_neon:     110.6    51.4    45.1
    mask_w8_8bpc_neon:     295.0   129.9   114.0
    mask_w16_8bpc_neon:    654.6   365.8   369.7
    After:
    avg_w4_8bpc_neon:       60.8    26.3    29.0
    avg_w8_8bpc_neon:      142.8    52.9    64.1
    avg_w16_8bpc_neon:     378.2   153.4   160.8
    w_avg_w4_8bpc_neon:     78.7    41.0    40.9
    w_avg_w8_8bpc_neon:    190.6    90.1   105.1
    w_avg_w16_8bpc_neon:   531.1   279.3   301.4
    mask_w4_8bpc_neon:      86.6    47.2    44.9
    mask_w8_8bpc_neon:     222.0   114.3   114.9
    mask_w16_8bpc_neon:    639.5   356.0   369.8
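
The speedup and slowdown described above can be put in numbers by computing the relative change per function and core. The following is a minimal sketch, not part of the commit: the figures are copied from the two tables, the helper names `pct_change` and `changes` are made up for illustration, and it assumes that a lower figure means faster code (the message does not state the unit).

```python
# Benchmark figures copied from the commit message, one tuple per function.
# Column order matches the table header: Cortex A53, A72, A73.
CORES = ("A53", "A72", "A73")

BEFORE = {
    "avg_w4":    (67.4, 26.1, 25.4),
    "avg_w8":    (158.7, 56.3, 59.1),
    "avg_w16":   (382.9, 154.1, 160.7),
    "w_avg_w4":  (99.9, 43.6, 39.4),
    "w_avg_w8":  (253.2, 98.3, 99.0),
    "w_avg_w16": (543.1, 285.0, 301.8),
    "mask_w4":   (110.6, 51.4, 45.1),
    "mask_w8":   (295.0, 129.9, 114.0),
    "mask_w16":  (654.6, 365.8, 369.7),
}

AFTER = {
    "avg_w4":    (60.8, 26.3, 29.0),
    "avg_w8":    (142.8, 52.9, 64.1),
    "avg_w16":   (378.2, 153.4, 160.8),
    "w_avg_w4":  (78.7, 41.0, 40.9),
    "w_avg_w8":  (190.6, 90.1, 105.1),
    "w_avg_w16": (531.1, 279.3, 301.4),
    "mask_w4":   (86.6, 47.2, 44.9),
    "mask_w8":   (222.0, 114.3, 114.9),
    "mask_w16":  (639.5, 356.0, 369.8),
}


def pct_change(before, after):
    """Relative change in percent. Negative means the figure went down,
    which is read as "faster" under the lower-is-better assumption."""
    return (after - before) / before * 100.0


def changes():
    """Map each function name to its per-core percentage change."""
    return {
        name: tuple(pct_change(b, a) for b, a in zip(BEFORE[name], AFTER[name]))
        for name in BEFORE
    }


if __name__ == "__main__":
    print(f"{'function':<10}" + "".join(f"{core:>9}" for core in CORES))
    for name, row in changes().items():
        print(f"{name:<10}" + "".join(f"{c:>+8.1f}%" for c in row))
```

Running the script prints one row per function with the signed percentage change for each core, which makes it easy to see where the w16 macro helps and where it costs time.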
mc.S 113 KB