AArch64: Optimize Armv8.0 Neon path of SBD H/HV 6-tap filters
- Sep 06, 2024
The 6-tap horizontal filters and the horizontal parts of the 6-tap HV subpel filters can be further improved with some pointer arithmetic and by saving some instructions (EXTs) in their data rearrangement code.

Relative runtime of micro benchmarks after this patch on Cortex CPU cores:

SBD mct h          X1     A78     A76     A72     A55
regular w8:    0.878x  0.894x  0.990x  0.923x  0.944x
regular w16:   0.962x  0.931x  0.943x  0.949x  0.949x
regular w32:   0.937x  0.937x  0.972x  0.938x  0.947x
regular w64:   0.920x  0.965x  0.992x  0.936x  0.944x

SBD mct hv         X1     A78     A76     A72     A55
regular w8:    0.931x  0.970x  0.951x  0.950x  0.971x
regular w16:   0.940x  0.971x  0.941x  0.952x  0.967x
regular w32:   0.943x  0.972x  0.946x  0.961x  0.974x
regular w64:   0.943x  0.973x  0.952x  0.944x  0.975x
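As a rough illustration of why pointer arithmetic can stand in for an EXT, the sketch below models the instruction in portable scalar C rather than Neon assembly. It is not the code from this patch: `ext_model`, `window_with_ext`, `window_with_offset_load` and `windows_match` are hypothetical names, and the 16-byte vector width is an assumption chosen to mirror a Q register. The point is only that a byte window starting `n` bytes into a row can be produced either by two loads and one EXT, or by a single load from an adjusted pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 }; /* assumed vector width (one 128-bit register) */

/* Scalar model of EXT: take 16 consecutive bytes, starting at byte n,
 * from the 32-byte concatenation lo:hi. */
static void ext_model(uint8_t out[VEC_BYTES], const uint8_t lo[VEC_BYTES],
                      const uint8_t hi[VEC_BYTES], unsigned n)
{
    assert(n < VEC_BYTES);
    for (unsigned i = 0; i < VEC_BYTES; i++) {
        unsigned k = i + n;
        out[i] = k < VEC_BYTES ? lo[k] : hi[k - VEC_BYTES];
    }
}

/* Window built the "rearrangement" way: two loads followed by one EXT. */
static void window_with_ext(uint8_t out[VEC_BYTES], const uint8_t *src,
                            unsigned n)
{
    uint8_t lo[VEC_BYTES], hi[VEC_BYTES];
    memcpy(lo, src, VEC_BYTES);
    memcpy(hi, src + VEC_BYTES, VEC_BYTES);
    ext_model(out, lo, hi, n);
}

/* Same window built with pointer arithmetic: one load, no EXT. */
static void window_with_offset_load(uint8_t out[VEC_BYTES],
                                    const uint8_t *src, unsigned n)
{
    memcpy(out, src + n, VEC_BYTES);
}

/* Returns 1 if both approaches agree for every offset 0..15. */
static int windows_match(void)
{
    uint8_t row[2 * VEC_BYTES];
    for (unsigned i = 0; i < sizeof(row); i++)
        row[i] = (uint8_t)(i * 7 + 3); /* arbitrary test pattern */

    for (unsigned n = 0; n < VEC_BYTES; n++) {
        uint8_t a[VEC_BYTES], b[VEC_BYTES];
        window_with_ext(a, row, n);
        window_with_offset_load(b, row, n);
        if (memcmp(a, b, VEC_BYTES) != 0)
            return 0;
    }
    return 1;
}
```

Whether trading an EXT for an extra or shifted load pays off depends on the core, which is consistent with the per-CPU spread in the measurements above.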
a992a9be