
AArch64: Optimize Armv8.0 Neon path of SBD H/HV 6-tap filters

Merged Arpad Panyik requested to merge arpadpanyik-arm/dav1d:mc_sbd_h_hv_6tap_neon into master
  1. Sep 06, 2024
    • AArch64: Optimize Armv8.0 Neon path of SBD H/HV 6-tap filters · a992a9be
      Arpad Panyik authored and Martin Storsjö committed
      The 6-tap horizontal filters and the horizontal parts of the 6-tap HV
      subpel filters can be improved further with some pointer arithmetic
      and by saving some instructions (EXTs) in their data rearrangement code.
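
      As a rough scalar illustration of the kind of pointer arithmetic a 6-tap path relies on (this is not the Neon code from this patch): assuming, as the 6-tap naming suggests, that a filter is stored as 8 coefficients whose two outer taps are zero, the source pointer can start one pixel later and only six products are needed per output. The function names, the coefficient layout and the simplified rounding below are hypothetical and chosen only for this sketch.

```c
#include <stdint.h>

/* Round, scale by 1/64 and clamp to the 8-bit pixel range.
 * Rounding is simplified for illustration (assumption). */
static uint8_t round_clip_u8(int sum) {
    int v = sum + 32;
    if (v < 0) return 0;
    v >>= 6;
    return v > 255 ? 255 : (uint8_t)v;
}

/* Generic 8-tap horizontal filter: taps cover src[x-3] .. src[x+4]. */
static void filter_h_8tap(uint8_t *dst, const uint8_t *src, int w,
                          const int8_t f[8]) {
    for (int x = 0; x < w; x++) {
        const uint8_t *p = src + x - 3;   /* window starts 3 pixels left */
        int sum = 0;
        for (int k = 0; k < 8; k++)
            sum += f[k] * p[k];
        dst[x] = round_clip_u8(sum);
    }
}

/* 6-tap variant: valid only when f[0] == 0 and f[7] == 0.
 * The window starts at src[x-2], so two loads and two multiplies
 * per output are skipped. */
static void filter_h_6tap(uint8_t *dst, const uint8_t *src, int w,
                          const int8_t f[8]) {
    for (int x = 0; x < w; x++) {
        const uint8_t *p = src + x - 2;   /* window starts 2 pixels left */
        int sum = 0;
        for (int k = 0; k < 6; k++)
            sum += f[k + 1] * p[k];       /* skip the zero outer taps */
        dst[x] = round_clip_u8(sum);
    }
}
```

      With zero outer taps both functions produce identical output. In a vector implementation the shifted input windows are typically built with EXT instructions, so a narrower window also means fewer rearrangement steps.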
      
      Relative runtime of micro benchmarks after this patch on Cortex CPU
      cores (values below 1.000x mean less runtime than before the patch):
      
      SBD mct h         X1     A78     A76     A72     A55
       regular  w8:  0.878x  0.894x  0.990x  0.923x  0.944x
       regular w16:  0.962x  0.931x  0.943x  0.949x  0.949x
       regular w32:  0.937x  0.937x  0.972x  0.938x  0.947x
       regular w64:  0.920x  0.965x  0.992x  0.936x  0.944x
      
      SBD mct hv        X1     A78     A76     A72     A55
       regular  w8:  0.931x  0.970x  0.951x  0.950x  0.971x
       regular w16:  0.940x  0.971x  0.941x  0.952x  0.967x
       regular w32:  0.943x  0.972x  0.946x  0.961x  0.974x
       regular w64:  0.943x  0.973x  0.952x  0.944x  0.975x