arm64: warped motion: Various optimizations

Kyle Siefring requested to merge KyleSiefring/dav1d:warp_improve_1 into master
  • Reorder the filter loads to benefit in-order cores.
  • Use full 128-bit vectors to transpose 8x8 bytes. zip1 is called in the first stage, which will hurt performance on some older big cores.
  • Rework the horizontal stage for 8-bit mode:
    • Use smull instead of mul
    • Replace existing narrow and long instructions
    • Replace the mov after the call with a right shift

Cortex-A55

Before:
    warp_8x8_8bpc_neon:   1683.2
    warp_8x8_16bpc_neon:  1870.7
    warp_8x8t_8bpc_neon:  1673.2
    warp_8x8t_16bpc_neon: 1848.0

After:
    warp_8x8_8bpc_neon:   1267.2
    warp_8x8_16bpc_neon:  1769.8
    warp_8x8t_8bpc_neon:  1245.4
    warp_8x8t_16bpc_neon: 1747.3