AArch64: Add Neon implementation of load_tmvs (edb16889) · Commits · VideoLAN / dav1d

Commit edb16889 authored 3 months ago by Arpad Panyik

AArch64: Add Neon implementation of load_tmvs

This patch adds a vectorised variant of the mv_projection calculation
and a faster initialisation of motion vectors for load_tmvs_neon.
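
For readers unfamiliar with the projection step, the scalar arithmetic that a vectorised mv_projection has to reproduce can be sketched roughly as below. This is an illustrative sketch, not the code from this patch: the `mv_t` type and the `project_component` and `project_mv` names are hypothetical, the reciprocal is computed as `16384 / den` instead of being read from a lookup table, and the rounding is written on the magnitude to avoid right-shifting negative values. A Neon version would presumably apply the same multiply, round, shift and clip steps to several lanes at once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder's motion vector type. */
typedef struct { int16_t y, x; } mv_t;

static int clip_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Scale one motion vector component by num/den using a 14-bit
 * fixed-point reciprocal (assumption: 16384 / den, truncated). */
static int project_component(int c, int num, int den) {
    assert(den > 0 && den < 32);
    assert(num > -32 && num < 32);
    const int64_t frac = (int64_t)num * (16384 / den);
    const int64_t v = (int64_t)c * frac;
    /* Round to nearest with ties away from zero, working on the
     * magnitude so no negative value is ever right-shifted. */
    const int64_t mag = ((v < 0 ? -v : v) + 8192) >> 14;
    const int r = (int)(v < 0 ? -mag : mag);
    /* Clip to the 14-bit signed range, 0x3fff == (1 << 14) - 1. */
    return clip_int(r, -0x3fff, 0x3fff);
}

/* Project both components of a motion vector. */
static mv_t project_mv(mv_t mv, int num, int den) {
    mv_t out;
    out.y = (int16_t)project_component(mv.y, num, den);
    out.x = (int16_t)project_component(mv.x, num, den);
    return out;
}
```

With these definitions, projecting the vector (y = -100, x = 100) with num = 2 and den = 4 gives (y = -50, x = 50), and out-of-range results saturate at ±0x3fff.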

Checkasm uplifts after this patch on some Neoverse and Cortex CPU cores
compared to the C reference compiled with GCC-13 and Clang-19:

                     GCC    Clang
 AWS Graviton 4:   1.62x    1.59x
 Cortex-X4:        1.45x    1.46x
 Cortex-X3:        1.68x    1.69x
 Cortex-X1:        1.55x    1.52x
 Cortex-A720:      1.54x    1.57x
 Cortex-A715:      1.47x    1.55x
 Cortex-A78:       1.21x    1.18x
 Cortex-A76:       1.38x    1.37x
 Cortex-A72:       1.08x    1.11x
 Cortex-A520:      0.97x    1.18x
 Cortex-A510:      0.99x    1.14x
 Cortex-A55:       1.16x    1.23x

This patch increases the .text size by ~660 bytes, but the Neon
implementation is about 0.5 KiB smaller than the reference implementation.

parent b129d9f2


1 merge request: !1774 "AArch64: Add Neon implementation of load_tmvs"

Pipeline #553414 passed in 51 minutes and 55 seconds


Showing with 285 additions and 0 deletions
