AArch64: Add Neon implementation of load_tmvs

This patch adds a vectorised variant of the mv_projection calculation
and a faster initialisation of motion vectors for load_tmvs_neon.

Checkasm uplifts after this patch on some Neoverse and Cortex CPU
cores compared to the C reference compiled with GCC-13 and Clang-19:

                     GCC     Clang
  AWS Graviton 4:   1.62x    1.59x
  Cortex-X4:        1.45x    1.46x
  Cortex-X3:        1.68x    1.69x
  Cortex-X1:        1.55x    1.52x
  Cortex-A720:      1.54x    1.57x
  Cortex-A715:      1.47x    1.55x
  Cortex-A78:       1.21x    1.18x
  Cortex-A76:       1.38x    1.37x
  Cortex-A72:       1.08x    1.11x
  Cortex-A520:      0.97x    1.18x
  Cortex-A510:      0.99x    1.14x
  Cortex-A55:       1.16x    1.23x

This patch increases the .text by ~660 bytes, but the new code is
about 0.5 KiB smaller than the reference implementation.
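
For readers unfamiliar with the projection step, the following is a minimal scalar sketch of what a motion vector projection of this kind computes per component. It is not the Neon code from this patch. It assumes a 14-bit fixed-point reciprocal of the denominator, rounding half away from zero, and a clamp to +/-0x3fff; the names mv_sketch, clip_int, project_component and project_mv are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for a motion vector (illustrative only). */
typedef struct { int16_t y, x; } mv_sketch;

static int clip_int(int64_t v, int lo, int hi) {
    return v < lo ? lo : v > hi ? hi : (int)v;
}

/* Scale one component by num/den without a division per vector.
 * Assumes 0 < den < 32 and -32 < num < 32. */
static int project_component(int c, int num, int den) {
    /* floor(2^14 / den): a table lookup in a real implementation. */
    const int recip = 16384 / den;
    const int64_t v = (int64_t)c * num * recip;
    /* Round half away from zero, then drop the 14 fractional bits. */
    const int64_t mag = v < 0 ? -v : v;
    const int64_t r = (mag + 8192) >> 14;
    return clip_int(v < 0 ? -r : r, -0x3fff, 0x3fff);
}

/* Project both components of a vector by the same num/den ratio. */
static mv_sketch project_mv(mv_sketch mv, int num, int den) {
    mv_sketch out;
    out.y = (int16_t)project_component(mv.y, num, den);
    out.x = (int16_t)project_component(mv.x, num, den);
    return out;
}
```

Because both components go through the same multiply, round and clamp sequence with no data-dependent branches, this kind of calculation maps naturally onto SIMD lanes, which is the general idea behind vectorising it.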
parent
b129d9f2