AArch64: Optimize Armv8.0 Neon path of HBD horizontal filters (109b2427) · Commits · VideoLAN / dav1d

Commit 109b2427, authored 6 months ago by Arpad Panyik; committed by Martin Storsjö 6 months ago

AArch64: Optimize Armv8.0 Neon path of HBD horizontal filters

The reduction parts of the horizontal HBD MC filters use SRSHL+SQXTUN+
SRSHL instruction sequences. In the horizontal case this can be
rewritten using a single SQSHRUN instruction with an additional
rounding value (34 for 10-bit and 40 for 12-bit).
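The arithmetic behind the two rounding values can be checked with a small scalar model. This is only a sketch, not the assembly from the patch: the function names are hypothetical, and it assumes (as in dav1d's C reference code, to the best of my knowledge) that HBD intermediates carry 14 - bitdepth extra bits and that the total horizontal shift is 6. Under those assumptions, two rounding shifts by s1 and s2 (with s1 + s2 = 6) collapse into one truncating shift by 6 once 32 + (1 << (s1 - 1)) is added first.

```python
def intermediate_bits(bitdepth):
    # Assumption: 14-bit intermediates, so 4 extra bits for 10-bit, 2 for 12-bit.
    return 14 - bitdepth


def reduce_two_stage(acc, bitdepth):
    """Scalar model of the old SRSHL + SQXTUN + SRSHL reduction."""
    s2 = intermediate_bits(bitdepth)
    s1 = 6 - s2
    t = (acc + (1 << (s1 - 1))) >> s1    # rounding right shift by s1
    t = min(max(t, 0), 0xFFFF)           # saturate to unsigned 16 bits
    t = (t + (1 << (s2 - 1))) >> s2      # rounding right shift by s2
    return min(t, (1 << bitdepth) - 1)   # clamp to the pixel range


def combined_rounding(bitdepth):
    """Single rounding constant that replaces both rounding steps."""
    s1 = 6 - intermediate_bits(bitdepth)
    return 32 + (1 << (s1 - 1))


def reduce_single(acc, bitdepth):
    """Scalar model of the new reduction: add rounding, one saturating shift by 6."""
    t = (acc + combined_rounding(bitdepth)) >> 6  # truncating shift, rounding pre-added
    t = min(max(t, 0), 0xFFFF)                    # saturate to unsigned 16 bits
    return min(t, (1 << bitdepth) - 1)            # clamp to the pixel range


if __name__ == "__main__":
    for bd in (10, 12):
        print(bd, combined_rounding(bd))
```

For 10-bit the model gives s1 = 2, so the constant is 32 + 2 = 34; for 12-bit s1 = 4, so it is 32 + 8 = 40, matching the values quoted above. The saturation and clamp steps do not break the equivalence because both paths are monotonic and end in the same pixel-range clamp.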

Relative runtime of micro benchmarks after this patch on some Cortex
CPU cores (values below 1.000x mean the patched code is faster):

regular:     X1      A78      A76      A55
 mc  w2:  0.847x   0.864x   0.822x   0.859x
 mc  w4:  0.889x   0.994x   0.868x   0.917x
 mc  w8:  0.857x   0.911x   0.915x   0.978x
 mc w16:  0.890x   0.982x   0.868x   0.974x
 mc w32:  0.904x   0.991x   0.873x   0.967x
 mc w64:  0.919x   1.003x   0.860x   0.970x

parent d2687884

1 merge request: !1715 (AArch64: Optimize Armv8.0 Neon path of HBD horizontal filters)

Showing with 73 additions and 43 deletions
