Martin Storsjö authored
Also make sure that the w4 case can exit after processing 12 pixels, where it is convenient.

This gives a small slowdown on in-order cores like A7, A8 and A53, but actually seems to give a small speedup on out-of-order cores like A9, A72 and A73.

AArch64:
Before:                  Cortex     A53     A72     A73
mc_8tap_regular_w8_v_8bpc_neon:   223.8   247.3   228.5
After:
mc_8tap_regular_w8_v_8bpc_neon:   232.5   243.9   223.4

AArch32:
Before:                  Cortex      A7      A8      A9     A53     A72     A73
mc_8tap_regular_w8_v_8bpc_neon:   550.2   470.7   520.5   257.0   256.4   248.2
After:
mc_8tap_regular_w8_v_8bpc_neon:   554.3   474.2   511.6   267.5   252.6   246.8
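To put "small slowdown" and "small speedup" in numbers, the relative change per core can be computed directly from the benchmark figures above. This is a minimal sketch that assumes the figures are benchmark timings where lower is better, which matches the slowdown reported for the A53. The names `AARCH64`, `AARCH32` and `percent_change` are illustrative only.

```python
# Benchmark figures for mc_8tap_regular_w8_v_8bpc_neon, copied from the
# commit message as (before, after) pairs. Assumption: lower is better.
AARCH64 = {
    "A53": (223.8, 232.5),
    "A72": (247.3, 243.9),
    "A73": (228.5, 223.4),
}
AARCH32 = {
    "A7": (550.2, 554.3),
    "A8": (470.7, 474.2),
    "A9": (520.5, 511.6),
    "A53": (257.0, 267.5),
    "A72": (256.4, 252.6),
    "A73": (248.2, 246.8),
}


def percent_change(before, after):
    """Relative change in percent; positive means the new code is slower."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for arch, table in (("AArch64", AARCH64), ("AArch32", AARCH32)):
        for core, (before, after) in table.items():
            print(f"{arch} {core}: {percent_change(before, after):+.1f}%")
```

Under that assumption, the in-order cores (A7, A8, A53) come out positive (slower) and the out-of-order cores (A9, A72, A73) come out negative (faster). Every change stays within roughly 4% in either direction.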
bf920fba