AArch64: Optimize Armv8.0 Neon path of HBD HV 6-tap filters
The horizontal parts of the Armv8.0 Neon 6-tap HV subpel filters can be further improved with some pointer arithmetic and by saving some EXT instructions in their data rearrangement code.
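The description does not show the changed assembly, so the following is only a rough scalar illustration in C of the general idea, not the patch itself. It assumes 16-bit (HBD) pixels and 8-lane vectors. `ext_u16` models the Neon EXT instruction, and `filter6_ext` and `filter6_ptr` are hypothetical names for two ways of building the six shifted input windows of a 6-tap horizontal filter: extracting them with EXT from two loaded registers, or addressing them directly through an offset pointer.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8

/* Scalar model of the Neon EXT instruction on 16-bit lanes: view lo:hi as
 * one 16-lane vector and return the LANES lanes starting at lane n. */
static void ext_u16(uint16_t out[LANES], const uint16_t lo[LANES],
                    const uint16_t hi[LANES], int n)
{
    for (int i = 0; i < LANES; i++)
        out[i] = (i + n < LANES) ? lo[i + n] : hi[i + n - LANES];
}

/* Variant A (hypothetical): load two registers once, then build each
 * shifted window with EXT before the multiply-accumulate. */
static void filter6_ext(int32_t dst[LANES], const uint16_t *src,
                        const int16_t taps[6])
{
    uint16_t lo[LANES], hi[LANES], win[LANES];
    memcpy(lo, src, sizeof lo);
    memcpy(hi, src + LANES, sizeof hi);
    memset(dst, 0, LANES * sizeof dst[0]);
    for (int t = 0; t < 6; t++) {
        ext_u16(win, lo, hi, t);
        for (int i = 0; i < LANES; i++)
            dst[i] += taps[t] * win[i];
    }
}

/* Variant B (hypothetical): reach each shifted window through pointer
 * arithmetic instead, so no EXT-style rearrangement is needed. */
static void filter6_ptr(int32_t dst[LANES], const uint16_t *src,
                        const int16_t taps[6])
{
    uint16_t win[LANES];
    memset(dst, 0, LANES * sizeof dst[0]);
    for (int t = 0; t < 6; t++) {
        memcpy(win, src + t, sizeof win);
        for (int i = 0; i < LANES; i++)
            dst[i] += taps[t] * win[i];
    }
}
```

Both variants compute the same sums; they differ only in how the shifted data is obtained. Whether fewer EXT instructions pay off on real hardware depends on the core, which is what the benchmark figures in this merge request measure.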
Relative runtime of micro benchmarks after this patch on some Cortex CPU cores:
HBD mct hv        X1      A78     A76     A72     A55
regular  w8:    0.952x  0.989x  0.924x  0.973x  0.976x
regular w16:    0.961x  0.993x  0.928x  0.952x  0.971x
regular w32:    0.964x  0.996x  0.930x  0.973x  0.972x
regular w64:    0.963x  0.997x  0.930x  0.969x  0.974x
Activity
requested review from @mstorsjo
mentioned in merge request !1715 (merged)
added 9 commits
- ad197d87...93339ce8 - 8 commits from branch videolan:master
- 2d808de1 - AArch64: Optimize Armv8.0 Neon path of HBD HV 6-tap filters
enabled an automatic merge when the pipeline for 2d808de1 succeeds
added ARM performance labels
changed milestone to %1.5.0