
AArch64: Optimize Armv8.0 Neon path of HBD HV 6-tap filters

Merged Arpad Panyik requested to merge arpadpanyik-arm/dav1d:mc_hbd_hv_6tap_neon into master
  1. Sep 06, 2024
    • AArch64: Optimize Armv8.0 Neon path of HBD HV 6-tap filters · 2d808de1
      Arpad Panyik authored and Martin Storsjö committed
      The horizontal parts of the 6-tap HV subpel filters can be further
      improved by using pointer arithmetic and by saving some instructions
      (EXTs) in their data rearrangement code.
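
As a rough illustration of the trade-off described above, here is a scalar C model, not the assembly from this patch. A 6-tap horizontal filter needs six shifted views of the source row. They can be built from two loads plus one EXT-like extraction per view, or loaded directly from offset pointers. All names (`ext_u16`, `hfilter6_ext`, `hfilter6_ptr`, `filters_agree`) and the coefficients are hypothetical, and the sketch does not claim to show which views the patch loads and which it extracts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8 /* one 128-bit vector holds eight 16-bit (HBD) pixels */
#define TAPS  6

/* Scalar stand-in for the AArch64 EXT instruction on 16-bit lanes:
 * returns lanes [shift, shift + LANES) of the concatenation lo:hi. */
static void ext_u16(uint16_t out[LANES], const uint16_t lo[LANES],
                    const uint16_t hi[LANES], int shift)
{
    assert(shift >= 0 && shift <= LANES);
    for (int i = 0; i < LANES; i++) {
        int j = i + shift;
        out[i] = j < LANES ? lo[j] : hi[j - LANES];
    }
}

/* Variant A: two loads, then one EXT per shifted view. */
static void hfilter6_ext(int32_t dst[LANES], const uint16_t *src,
                         const int16_t coef[TAPS])
{
    uint16_t lo[LANES], hi[LANES], view[LANES];
    memcpy(lo, src, sizeof lo);
    memcpy(hi, src + LANES, sizeof hi);
    memset(dst, 0, LANES * sizeof *dst);
    for (int k = 0; k < TAPS; k++) {
        ext_u16(view, lo, hi, k);
        for (int i = 0; i < LANES; i++)
            dst[i] += coef[k] * view[i];
    }
}

/* Variant B: pointer arithmetic, each view is loaded from src + k,
 * so no EXT is needed to rearrange the data. */
static void hfilter6_ptr(int32_t dst[LANES], const uint16_t *src,
                         const int16_t coef[TAPS])
{
    uint16_t view[LANES];
    memset(dst, 0, LANES * sizeof *dst);
    for (int k = 0; k < TAPS; k++) {
        memcpy(view, src + k, sizeof view);
        for (int i = 0; i < LANES; i++)
            dst[i] += coef[k] * view[i];
    }
}

/* Returns 1 when both variants produce identical output. */
static int filters_agree(void)
{
    /* Illustrative coefficients only, not taken from dav1d's tables. */
    static const int16_t coef[TAPS] = { 2, -10, 40, 40, -10, 2 };
    uint16_t src[2 * LANES];
    int32_t a[LANES], b[LANES];

    for (int i = 0; i < 2 * LANES; i++)
        src[i] = (uint16_t)((i * 37 + 11) & 0x3ff); /* 10-bit pixels */

    hfilter6_ext(a, src, coef);
    hfilter6_ptr(b, src, coef);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both variants compute the same sums. In real Neon code the choice is between extra load instructions and extra EXT instructions, and the better mix depends on the core, which is consistent with the per-core differences in the table below.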
      
      Relative runtime of microbenchmarks after this patch on Cortex CPU
      cores (values below 1.0x are faster than before the patch):
      
      HBD mct hv        X1     A78     A76     A72     A55
       regular  w8:  0.952x  0.989x  0.924x  0.973x  0.976x
       regular w16:  0.961x  0.993x  0.928x  0.952x  0.971x
       regular w32:  0.964x  0.996x  0.930x  0.973x  0.972x
       regular w64:  0.963x  0.997x  0.930x  0.969x  0.974x