
arm32: mc: Optimize warp by doing horz filtering in 8 bit

Martin Storsjö requested to merge mstorsjo/dav1d:arm32-warp-opt into master

Additionally, reschedule the load instructions to reduce stalls on in-order cores.

This applies the changes from a3b8157e to the arm32 version.
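
For readers unfamiliar with the idea of filtering in 8 bit, here is a minimal Python sketch of one arithmetic identity that can make it possible. It is an illustration under assumptions, not the code from this patch or from a3b8157e: the tap values in `TAPS` and the function names are hypothetical. The sketch shifts unsigned pixels into the signed 8-bit range so that both multiplicands fit in 8 bits, then adds a constant bias to recover the same sum as the wide computation.

```python
# Hypothetical 8-tap filter; these values are made up for illustration
# and are not taken from the dav1d warp filter table.
TAPS = [0, -4, 20, 100, 16, -4, 0, 0]


def filter_wide(pixels, taps):
    """Reference: multiply unsigned pixels (0..255) by the taps directly."""
    return sum(p * t for p, t in zip(pixels, taps))


def filter_signed8(pixels, taps):
    """Same sum, with both multiplicands kept in the signed 8-bit range."""
    # Subtracting 128 maps 0..255 onto -128..127 (on a byte this is the
    # same as flipping the top bit).
    signed = [p - 128 for p in pixels]
    assert all(-128 <= s <= 127 for s in signed)
    assert all(-128 <= t <= 127 for t in taps)
    acc = sum(s * t for s, t in zip(signed, taps))
    # Undo the shift: sum(t * (p - 128)) + 128 * sum(t) == sum(t * p).
    return acc + 128 * sum(taps)


if __name__ == "__main__":
    row = [0, 255, 17, 128, 64, 200, 3, 90]
    print(filter_wide(row, TAPS), filter_signed8(row, TAPS))
```

Both functions return the same value for any pixel row, because the bias depends only on the taps and not on the pixels.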

Before:             Cortex A7      A8      A9     A53     A72     A73
warp_8x8_8bpc_neon:    3659.3  1746.0  1931.9  2128.8  1173.7  1188.9
warp_8x8t_8bpc_neon:   3650.8  1724.6  1919.8  2105.0  1147.7  1206.9
warp_8x8_16bpc_neon:   4039.4  2111.9  2337.1  2462.5  1334.6  1396.5
warp_8x8t_16bpc_neon:  3973.9  2137.1  2299.6  2413.2  1282.8  1369.6
After:
warp_8x8_8bpc_neon:    2920.8  1269.8  1410.3  1767.3   860.2  1004.8
warp_8x8t_8bpc_neon:   2904.9  1283.9  1397.5  1743.7   863.6  1024.7
warp_8x8_16bpc_neon:   3895.5  2060.7  2339.8  2376.6  1331.1  1394.0
warp_8x8t_16bpc_neon:  3822.7  2026.7  2298.7  2325.4  1278.1  1360.8
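
The relative gain per core can be read off the table with a short script. This is a sketch that assumes the figures are timings where lower is better; the dictionary layout and the `speedups` helper are illustrative, and the numbers are copied from the table above.

```python
CORES = ["A7", "A8", "A9", "A53", "A72", "A73"]

BEFORE = {
    "warp_8x8_8bpc_neon":   [3659.3, 1746.0, 1931.9, 2128.8, 1173.7, 1188.9],
    "warp_8x8t_8bpc_neon":  [3650.8, 1724.6, 1919.8, 2105.0, 1147.7, 1206.9],
    "warp_8x8_16bpc_neon":  [4039.4, 2111.9, 2337.1, 2462.5, 1334.6, 1396.5],
    "warp_8x8t_16bpc_neon": [3973.9, 2137.1, 2299.6, 2413.2, 1282.8, 1369.6],
}

AFTER = {
    "warp_8x8_8bpc_neon":   [2920.8, 1269.8, 1410.3, 1767.3, 860.2, 1004.8],
    "warp_8x8t_8bpc_neon":  [2904.9, 1283.9, 1397.5, 1743.7, 863.6, 1024.7],
    "warp_8x8_16bpc_neon":  [3895.5, 2060.7, 2339.8, 2376.6, 1331.1, 1394.0],
    "warp_8x8t_16bpc_neon": [3822.7, 2026.7, 2298.7, 2325.4, 1278.1, 1360.8],
}


def speedups(before, after):
    """Return before/after ratios per function; values above 1.0 mean faster."""
    return {
        name: [b / a for b, a in zip(before[name], after[name])]
        for name in before
    }


if __name__ == "__main__":
    print(f"{'':22}" + "".join(f"{c:>7}" for c in CORES))
    for name, ratios in speedups(BEFORE, AFTER).items():
        print(f"{name + ':':22}" + "".join(f"{r:7.2f}" for r in ratios))
```

Under that assumption, the 8bpc functions come out roughly 1.18x to 1.38x faster depending on the core, while the 16bpc functions change by about 5% or less.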

CC @KyleSiefring

Edited by Martin Storsjö
