    arm64: mc: Fix out of bounds reads/writes in 8tap/bilin w2/w4 for vertical OBMC · ac65139e
    Martin Storsjö authored and Ronald S. Bultje committed
    For 8tap, unroll the vertical filters slightly less (by 4 instead of
    8 elements) and add a special case trailer that handles only 2 elements
    (for 2x6 and 4x6). By unrolling less, performance on in-order cores is
    somewhat impacted.
    Before:                      Cortex A53     A72     A73
    mc_8tap_regular_w2_v_8bpc_neon:   146.5   141.3   145.6
    mc_8tap_regular_w4_v_8bpc_neon:   175.2   180.3   162.4
    After:
    mc_8tap_regular_w2_v_8bpc_neon:   175.7   142.7   150.5
    mc_8tap_regular_w4_v_8bpc_neon:   183.3   176.0   154.6
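
As a rough illustration of the unrolling change described above, here is a minimal scalar C sketch, not the NEON assembly from this commit. It assumes the block height is even, treats the "2 elements" of the trailer as 2 rows, and uses a hypothetical 2-tap averaging `filter_row` as a stand-in for the real 8tap kernel; the names `filter_row` and `filter_v_unroll4` are invented for this example. The main loop handles 4 rows per iteration and a trailer handles the remaining 2, so a height of 6 never touches rows past the end of the block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one output row of a vertical filter: a rounded
 * 2-tap average of two vertically adjacent source rows. This is NOT the
 * 8tap kernel; it only keeps the sketch short. */
static void filter_row(uint8_t *dst, const uint8_t *src,
                       ptrdiff_t src_stride, int w)
{
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((src[x] + src[x + src_stride] + 1) >> 1);
}

/* Vertical filter over h rows (assumed even, h >= 2). The main loop is
 * unrolled by 4 rows; a trailer handles a leftover pair of rows, e.g. the
 * last 2 rows of a block that is 6 rows high. */
static void filter_v_unroll4(uint8_t *dst, ptrdiff_t dst_stride,
                             const uint8_t *src, ptrdiff_t src_stride,
                             int w, int h)
{
    int y = 0;

    /* Main loop: only entered while 4 full rows remain. */
    for (; y + 4 <= h; y += 4) {
        for (int i = 0; i < 4; i++)
            filter_row(dst + (y + i) * dst_stride,
                       src + (y + i) * src_stride, src_stride, w);
    }

    /* Trailer: exactly 2 rows left (h = 2, 6, 10, ...). */
    if (h - y >= 2) {
        for (int i = 0; i < 2; i++)
            filter_row(dst + (y + i) * dst_stride,
                       src + (y + i) * src_stride, src_stride, w);
    }
}
```

With a main loop that always processed 8 rows, a block 6 rows high would need 2 extra rows of valid memory on both the source and destination sides; the 4 + 2 split avoids that at the cost of more loop overhead per row.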