AArch64: Specialise Neon convolutions for 6-tap filters

Merged Arpad Panyik requested to merge arpadpanyik-arm/dav1d:mc_sbd_6tap into master
  1. Feb 22, 2024
    • AArch64: Enable benchmarks for 8-tap sharp filters · f1d42ae8
      Arpad Panyik authored
      The 6-tap sub-pel filter specialisation uses different code paths for
      sharp (8-tap) and regular/smooth (6-tap) filtering kernels.
      
      This patch enables benchmarking for the different code paths.
    • AArch64: Specialise Neon convolutions for 6-tap filters · e51f4377
      Arpad Panyik authored
      The 8-tap sub-pel filters used for motion vector interpolation are
      regular, smooth and sharp. The regular and smooth filter kernels are
      zero-padded, so they are effectively 6-tap filters (some of them are
      5-tap or even 4-tap).
      
      This patch specialises the put_8tap_neon and prep_8tap_neon functions
      for 6-tap filters, avoiding a lot of redundant work multiplying by
      and adding zero. Whenever the sharp filter is used, the 8-tap path
      is always selected.
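
The idea above can be sketched in scalar C. This is only an illustration, not the Neon assembly from the patch: the function names (filter_8tap, filter_6tap, can_use_6tap, filter_dispatch) are hypothetical. When the outer two coefficients of a kernel are zero, a 6-tap dot product over the inner taps gives the same result as the full 8-tap one, so the two outer multiply-adds can be skipped.

```c
#include <stdint.h>

/* Full 8-tap dot product over src[0..7]. */
static int filter_8tap(const uint8_t *src, const int8_t *f)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += f[i] * src[i];
    return sum;
}

/* 6-tap variant: skips taps 0 and 7.
 * Only valid when f[0] == 0 and f[7] == 0 (zero-padded kernel). */
static int filter_6tap(const uint8_t *src, const int8_t *f)
{
    int sum = 0;
    for (int i = 1; i < 7; i++)
        sum += f[i] * src[i];
    return sum;
}

/* Returns non-zero if the kernel is zero-padded at both ends. */
static int can_use_6tap(const int8_t *f)
{
    return f[0] == 0 && f[7] == 0;
}

/* Hypothetical dispatcher: take the cheaper path when it is exact,
 * otherwise fall back to the full 8-tap path. */
static int filter_dispatch(const uint8_t *src, const int8_t *f)
{
    return can_use_6tap(f) ? filter_6tap(src, f) : filter_8tap(src, f);
}
```

The sketch checks the kernel at run time for clarity. The commit message instead describes choosing the path by filter type, with sharp always taking the 8-tap path.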
      
      Benchmarking this on a broad range of recent CPUs shows a 7-15% FPS
      uplift.
      
      Get raw sample video:
      https://ultravideo.fi/video/Bosphorus_1920x1080_120fps_420_8bit_YUV_RAW.7z
      
      Encode using:
      aomenc --good --cpu-used=5 -w 1920 -h 1080 --bit-depth=8 --ivf -o Bosphorus_1080p_8bit.ivf Bosphorus_1920x1080_120fps_420_8bit_YUV.y4m