AArch64: Optimize Armv8.0 Neon path of SBD H/HV 6-tap filters
The 6-tap horizontal (H) subpel filters, and the horizontal passes of the 6-tap HV subpel filters, can be further optimized with some pointer arithmetic and by saving some EXT instructions in their data rearrangement code.
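One plausible reading of the change is sketched below as a scalar C model. This is an assumption, not dav1d's code. As the "6-tap" name suggests, the outermost taps f[0] and f[7] of these filters are zero. A 6-tap path can therefore start loading at src - 2 instead of src - 3 (the pointer arithmetic) and build its shifted pixel windows from two loaded vectors with EXT. The window at offset 0 is the first load itself, so only five EXTs are needed instead of the seven an 8-tap window set uses. The names `LANES`, `ext_u8`, `filter_h_8tap`, and `filter_h_6tap` are hypothetical. The accumulators are left unrounded and unnarrowed for clarity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector width: one 8-lane register of 8-bit pixels. */
#define LANES 8

/* Scalar model of Neon EXT: the LANES bytes starting at byte n of the
 * concatenation lo:hi. */
void ext_u8(uint8_t out[LANES], const uint8_t lo[LANES],
            const uint8_t hi[LANES], int n)
{
    uint8_t cat[2 * LANES];
    assert(n >= 0 && n < LANES);
    memcpy(cat, lo, LANES);
    memcpy(cat + LANES, hi, LANES);
    memcpy(out, cat + n, LANES);
}

/* Reference 8-tap horizontal pass for LANES outputs (unrounded):
 * dst[x] = sum over k of f[k] * src[x + k - 3]. */
void filter_h_8tap(int32_t dst[LANES], const uint8_t *src, const int8_t f[8])
{
    for (int x = 0; x < LANES; x++) {
        int32_t sum = 0;
        for (int k = 0; k < 8; k++)
            sum += f[k] * src[x + k - 3];
        dst[x] = sum;
    }
}

/* 6-tap variant, assuming f[0] == f[7] == 0. Starting at src - 2 instead
 * of src - 3 drops the zero leading tap; the six windows come from two
 * loads, and window 0 is the first load itself, so 5 EXTs are needed
 * (an 8-tap window set would need 7). Returns the EXT count. */
int filter_h_6tap(int32_t dst[LANES], const uint8_t *src, const int8_t f[8])
{
    const uint8_t *p = src - 2;      /* pointer arithmetic: skip tap 0 */
    uint8_t lo[LANES], hi[LANES], win[LANES];
    int32_t acc[LANES] = {0};
    int ext_count = 0;

    assert(f[0] == 0 && f[7] == 0);
    memcpy(lo, p, LANES);            /* "vector load" of p[0..7]  */
    memcpy(hi, p + LANES, LANES);    /* "vector load" of p[8..15] */
    for (int k = 0; k < 6; k++) {
        if (k == 0) {
            memcpy(win, lo, LANES);  /* offset 0: no EXT needed */
        } else {
            ext_u8(win, lo, hi, k);  /* window p[k .. k+LANES-1] */
            ext_count++;
        }
        for (int x = 0; x < LANES; x++)
            acc[x] += f[k + 1] * win[x];  /* tap k+1 <-> src[x + k - 2] */
    }
    memcpy(dst, acc, sizeof acc);
    return ext_count;
}
```

Given a source pointer with at least 3 bytes of left margin and enough bytes on the right, both functions produce identical outputs for any filter whose outer taps are zero. The 6-tap version reaches that result with two fewer window shuffles.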
Relative runtime of the micro-benchmarks after this patch on several Cortex CPU cores (lower is better):
SBD mct h      X1      A78     A76     A72     A55
regular w8:    0.878x  0.894x  0.990x  0.923x  0.944x
regular w16:   0.962x  0.931x  0.943x  0.949x  0.949x
regular w32:   0.937x  0.937x  0.972x  0.938x  0.947x
regular w64:   0.920x  0.965x  0.992x  0.936x  0.944x

SBD mct hv     X1      A78     A76     A72     A55
regular w8:    0.931x  0.970x  0.951x  0.950x  0.971x
regular w16:   0.940x  0.971x  0.941x  0.952x  0.967x
regular w32:   0.943x  0.972x  0.946x  0.961x  0.974x
regular w64:   0.943x  0.973x  0.952x  0.944x  0.975x
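As a worked reading of these tables, assume "relative runtime" r is the patched runtime divided by the unpatched runtime. Then r corresponds to a speedup of 1/r and a time saving of (1 - r) * 100 percent. The minimal C sketch below encodes that arithmetic. The function names are illustrative only.

```c
#include <assert.h>

/* Assumes r = new_time / old_time for the same benchmark and core. */
double speedup_from_relative(double r)
{
    assert(r > 0.0);
    return 1.0 / r;             /* how many times faster the patched code is */
}

/* Percentage of the original runtime that the patch removes. */
double percent_time_saved(double r)
{
    return (1.0 - r) * 100.0;
}
```

For example, the X1 `regular w8` entry of the `mct h` table (0.878x) works out to roughly a 1.139x speedup, or about 12.2% less time.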
Activity
requested review from @mstorsjo
added 10 commits
- d9f1732e...2d808de1 - 9 commits from branch videolan:master
- a992a9be - AArch64: Optimize Armv8.0 Neon path of SBD H/HV 6-tap filters
enabled an automatic merge when the pipeline for a992a9be succeeds
added ARM performance labels
changed milestone to %1.5.0