    arm: mc: Fix 8tap_v w8 with OBMC 3/4 heights · bf920fba
    Martin Storsjö authored
    Also make sure that the w4 case can exit after processing 12 pixels,
    where it is convenient.
    
    This gives a small slowdown for in-order cores like A7, A8, A53, but
    actually seems to give a small speedup for out-of-order cores like
    A9, A72 and A73.
    
    AArch64:
    Before:                      Cortex A53     A72     A73
    mc_8tap_regular_w8_v_8bpc_neon:   223.8   247.3   228.5
    After:
    mc_8tap_regular_w8_v_8bpc_neon:   232.5   243.9   223.4
    
    AArch32:
    Before:                       Cortex A7      A8      A9     A53     A72     A73
    mc_8tap_regular_w8_v_8bpc_neon:   550.2   470.7   520.5   257.0   256.4   248.2
    After:
    mc_8tap_regular_w8_v_8bpc_neon:   554.3   474.2   511.6   267.5   252.6   246.8
mc.S 90 KB