Commit 109b2427 authored by Arpad Panyik, committed by Martin Storsjö

AArch64: Optimize Armv8.0 Neon path of HBD horizontal filters

The reduction parts of the horizontal HBD MC filters use SRSHL+SQXTUN+
SRSHL instruction sequences. In the horizontal case this can be
rewritten using a single SQSHRUN instruction with an additional
rounding value (34 for 10-bit and 40 for 12-bit).

Relative runtime of micro benchmarks after this patch on some Cortex
CPU cores (lower is better):

regular:     X1      A78      A76      A55
 mc  w2:  0.847x   0.864x   0.822x   0.859x
 mc  w4:  0.889x   0.994x   0.868x   0.917x
 mc  w8:  0.857x   0.911x   0.915x   0.978x
 mc w16:  0.890x   0.982x   0.868x   0.974x
 mc w32:  0.904x   0.991x   0.873x   0.967x
 mc w64:  0.919x   1.003x   0.860x   0.970x
parent d2687884
1 merge request: !1715 AArch64: Optimize Armv8.0 Neon path of HBD horizontal filters