AArch64: Specialise Neon convolutions for 6-tap filters
- Feb 22, 2024
Arpad Panyik authored
The 6-tap sub-pel filter specialisation uses different code paths for sharp (8-tap) and regular/smooth (6-tap) filtering kernels. This patch enables benchmarking for the different code paths.
f1d42ae8
Arpad Panyik authored
The 8-tap sub-pel filters used for motion vector interpolation are regular, smooth and sharp. The regular and smooth filter kernels are zero-padded, so they are effectively 6-tap filters (some of them are 5-tap or even 4-tap).

This patch specialises the put_8tap_neon and prep_8tap_neon functions for 6-tap filters, avoiding a lot of redundant work multiplying by and adding zero. Wherever sharp filtering is used, the 8-tap path is always selected.

Benchmarking this on a broad range of recent CPUs shows a 7-15% FPS uplift.

Get raw sample video:
  https://ultravideo.fi/video/Bosphorus_1920x1080_120fps_420_8bit_YUV_RAW.7z

Encode using:
  aomenc --good --cpu-used=5 -w 1920 -h 1080 --bit-depth=8 --ivf -o Bosphorus_1080p_8bit.ivf Bosphorus_1920x1080_120fps_420_8bit_YUV.y4m
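
The zero-padding argument in this message can be illustrated with a small scalar sketch in Python. It is not the Neon code from the patch: the kernel values and the names filter_8tap, filter_6tap, is_6tap and convolve_row are hypothetical, chosen only to show why skipping the two outer taps leaves the result unchanged when those taps are zero, and why a kernel with non-zero outer taps must stay on the 8-tap path.

```python
# Illustrative kernels only: the values are assumptions for this sketch,
# not copied from the codec's filter tables.
ZERO_PADDED_8TAP = [0, 2, -14, 76, 76, -14, 2, 0]  # outer taps are zero
FULL_8TAP = [-4, 12, -24, 80, 80, -24, 12, -4]     # all taps non-zero


def filter_8tap(src, pos, kernel):
    """Reference path: multiply-accumulate over all 8 taps around pos."""
    return sum(kernel[k] * src[pos - 3 + k] for k in range(8))


def filter_6tap(src, pos, kernel):
    """Specialised path: skip the two outer taps, assumed to be zero."""
    return sum(kernel[k] * src[pos - 3 + k] for k in range(1, 7))


def is_6tap(kernel):
    """A kernel qualifies for the 6-tap path when both outer taps are zero."""
    return kernel[0] == 0 and kernel[7] == 0


def convolve_row(src, kernel):
    """Filter every position that has 3 samples before it and 4 after it."""
    tap_fn = filter_6tap if is_6tap(kernel) else filter_8tap
    return [tap_fn(src, pos, kernel) for pos in range(3, len(src) - 4)]
```

In this sketch the 6-tap path performs six multiply-accumulates per output sample instead of eight. The selection is made once per kernel rather than once per sample, which mirrors the idea of choosing a code path by filter type.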
e51f4377