    arm64: mc: Use two regs for alternating output rows for w4/8 in avg/w_avg/mask · b1167ce1
    Martin Storsjö authored
    It was already done this way for w32/64. Not doing it for w16 as it
    didn't help there (and instead gave a small slowdown due to the two
    setup instructions).
    This gives a small speedup on in-order cores like A53.
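The title's technique can be illustrated with a scalar C model, shown here as a hypothetical sketch rather than the NEON code in mc.S. Instead of one output pointer that steps by one stride per row, two pointers (even rows and odd rows) each step by twice the stride. Computing the second pointer and doubling the stride are presumably the "two setup instructions" mentioned above. The idea is that consecutive row stores no longer wait on the same pointer update, which is plausibly why an in-order core like the A53 benefits. The names avg_two_ptr and clip_pixel are made up for illustration. The sketch assumes 8bpc output, packed intermediate buffers w elements wide, an even h, and avg rounding of (a + b + 16) >> 5 clipped to 0..255.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp to the 8-bit pixel range. */
static uint8_t clip_pixel(int v) {
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical scalar model of 8bpc "avg" using two output pointers.
 * Assumes tmp1/tmp2 are packed (row stride == w) and h is even. */
void avg_two_ptr(uint8_t *dst, ptrdiff_t stride,
                 const int16_t *tmp1, const int16_t *tmp2, int w, int h) {
    uint8_t *dst0 = dst;                  /* writes rows 0, 2, 4, ... */
    uint8_t *dst1 = dst + stride;         /* setup step 1: rows 1, 3, 5, ... */
    const ptrdiff_t stride2 = stride * 2; /* setup step 2: doubled stride */

    for (int y = 0; y < h; y += 2) {
        for (int x = 0; x < w; x++)       /* even row */
            dst0[x] = clip_pixel((tmp1[x] + tmp2[x] + 16) >> 5);
        for (int x = 0; x < w; x++)       /* odd row */
            dst1[x] = clip_pixel((tmp1[w + x] + tmp2[w + x] + 16) >> 5);
        /* The two pointers advance independently, so neither row's
           store depends on the other row's address update. */
        dst0 += stride2;
        dst1 += stride2;
        tmp1 += 2 * w;
        tmp2 += 2 * w;
    }
}
```

The output matches a single-pointer loop exactly. Only the address bookkeeping differs, which is why the change is a pure scheduling optimization.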
    Before:         Cortex A53     A72     A73
    avg_w4_8bpc_neon:     60.9    25.6    29.0
    avg_w8_8bpc_neon:    143.6    52.8    64.0
    After:
    avg_w4_8bpc_neon:     56.7    26.7    28.5
    avg_w8_8bpc_neon:    137.2    54.5    64.4
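To quantify "small speedup", the following sketch computes the relative change for each benchmark entry from the figures above (lower is better). The helper names pct_change and print_changes are hypothetical.

```c
#include <stdio.h>

/* Benchmark figures copied from the table above (lower is better). */
static const char *const cores[3] = {"Cortex A53", "A72", "A73"};
static const char *const funcs[2] = {"avg_w4_8bpc_neon", "avg_w8_8bpc_neon"};
static const double before[2][3] = {{60.9, 25.6, 29.0}, {143.6, 52.8, 64.0}};
static const double after[2][3]  = {{56.7, 26.7, 28.5}, {137.2, 54.5, 64.4}};

/* Relative change in percent; a negative value means the new code is faster. */
static double pct_change(double old_val, double new_val) {
    return (new_val - old_val) / old_val * 100.0;
}

/* Print one line per function/core pair with the signed percentage change. */
static void print_changes(void) {
    for (int f = 0; f < 2; f++)
        for (int c = 0; c < 3; c++)
            printf("%-18s %-10s %+5.1f%%\n", funcs[f], cores[c],
                   pct_change(before[f][c], after[f][c]));
}
```

The A53 improves by about 6.9% (w4) and 4.5% (w8). The A72 is roughly 4.3% and 3.2% slower, and the A73 changes by less than 2% in either direction. These figures are consistent with the gain being specific to in-order cores.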