Arpad Panyik
authored
This patch adds a vectorised variant of the mv_projection calculation and a faster initialisation of motion vectors for load_tmvs_neon.

Checkasm uplifts after this patch on some Neoverse and Cortex CPU cores, compared to the C reference compiled with GCC-13 and Clang-19:

CPU core | GCC | Clang |
---|---|---|
AWS Graviton 4 | 1.62x | 1.59x |
Cortex-X4 | 1.45x | 1.46x |
Cortex-X3 | 1.68x | 1.69x |
Cortex-X1 | 1.55x | 1.52x |
Cortex-A720 | 1.54x | 1.57x |
Cortex-A715 | 1.47x | 1.55x |
Cortex-A78 | 1.21x | 1.18x |
Cortex-A76 | 1.38x | 1.37x |
Cortex-A72 | 1.08x | 1.11x |
Cortex-A520 | 0.97x | 1.18x |
Cortex-A510 | 0.99x | 1.14x |
Cortex-A55 | 1.16x | 1.23x |

This patch increases the .text size by ~660 bytes, but the code is still about 0.5 KiB smaller than the reference implementation.

Name | Last commit | Last update |
---|---|---|
32 | ||
64 | ||
arm-arch.h | ||
asm-offsets.h | ||
asm.S | ||
cdef.h | ||
cpu.c | ||
cpu.h | ||
filmgrain.h | ||
ipred.h | ||
itx.h | ||
loopfilter.h | ||
looprestoration.h | ||
mc.h | ||
msac.h | ||
refmvs.h |