AArch64: Optimize Armv8.0 Neon path of HBD HV 6-tap filters (!1717) · Merge requests · VideoLAN / dav1d

The horizontal parts of the Armv8.0 Neon 6-tap HV subpel filters can be further improved with some pointer arithmetic and by saving some EXT instructions in their data rearrangement code.
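
To illustrate the general idea only (this is not the code from the patch), here is a scalar C model of two ways to obtain the shifted input windows a horizontal filter needs. One variant loads two adjacent vectors and extracts each window with an EXT-style operation. The other offsets the source pointer and loads the window directly. The names `vec8x16`, `load8`, `ext`, `window_ext`, `window_ptr` and `windows_match` are hypothetical, and the patch may balance loads against EXT instructions differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8 /* eight 16-bit (HBD) pixels per 128-bit vector */

/* Hypothetical scalar stand-in for a 128-bit Neon register. */
typedef struct { uint16_t lane[LANES]; } vec8x16;

/* Load eight consecutive 16-bit pixels starting at p. */
static vec8x16 load8(const uint16_t *p) {
    vec8x16 v;
    memcpy(v.lane, p, sizeof v.lane);
    return v;
}

/* Scalar model of EXT: lanes n..n+7 of the concatenation lo:hi. */
static vec8x16 ext(vec8x16 lo, vec8x16 hi, int n) {
    assert(n >= 0 && n < LANES);
    vec8x16 r;
    for (int i = 0; i < LANES; i++) {
        int k = i + n;
        r.lane[i] = k < LANES ? lo.lane[k] : hi.lane[k - LANES];
    }
    return r;
}

/* Variant A: two loads, then one EXT per shifted window. */
static vec8x16 window_ext(const uint16_t *src, int n) {
    return ext(load8(src), load8(src + LANES), n);
}

/* Variant B: pointer arithmetic, the window is loaded directly. */
static vec8x16 window_ptr(const uint16_t *src, int n) {
    return load8(src + n);
}

/* Returns 1 if both variants agree for every window offset 0..7.
 * src must point to at least 2 * LANES readable pixels. */
static int windows_match(const uint16_t *src) {
    for (int n = 0; n < LANES; n++) {
        vec8x16 a = window_ext(src, n);
        vec8x16 b = window_ptr(src, n);
        if (memcmp(a.lane, b.lane, sizeof a.lane) != 0)
            return 0;
    }
    return 1;
}
```

Both variants yield identical windows, so which one is cheaper depends on how a given core handles loads compared with EXT permutes. That may be one reason the measured gains differ between cores.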

Relative runtime of micro benchmarks after this patch on some Cortex CPU cores (values below 1.000x mean less runtime than before the patch):

HBD mct hv        X1     A78     A76     A72     A55
 regular  w8:  0.952x  0.989x  0.924x  0.973x  0.976x
 regular w16:  0.961x  0.993x  0.928x  0.952x  0.971x
 regular w32:  0.964x  0.996x  0.930x  0.973x  0.972x
 regular w64:  0.963x  0.997x  0.930x  0.969x  0.974x
