AArch64: Specialise Neon convolutions for 6-tap filters
Three 8-tap sub-pel filters are used for motion vector interpolation: regular, smooth and sharp. The regular and smooth filter kernels are zero-padded, so they are effectively 6-tap filters (some of them are 5-tap or even 4-tap).
This patch specialises the put_8tap_neon and prep_8tap_neon functions for 6-tap filters, avoiding a lot of redundant work multiplying by zero and adding the result. Wherever the sharp filter is used, the 8-tap path is always selected.
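As a rough scalar illustration of why the specialisation is safe (not the Neon code in this patch): when the two outermost taps of an 8-tap kernel are zero, a 6-tap loop over the inner taps gives the same sum. The kernel values and the names filter_8tap, filter_6tap and is_6tap below are hypothetical and chosen only for the example; the sketch also assumes the zero padding sits at both ends of the kernel.

```c
#include <stdint.h>

/* Illustrative kernel only, not taken from the real filter tables.
 * The outermost taps are zero and the taps sum to 64. */
static const int8_t kernel[8] = { 0, 2, -10, 40, 40, -10, 2, 0 };

/* Generic path: all eight taps, centred so that src[0] is tap 3. */
static int filter_8tap(const uint8_t *src, const int8_t *k)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += k[i] * src[i - 3];
    return sum;
}

/* Specialised path: skips taps 0 and 7, which are assumed to be zero. */
static int filter_6tap(const uint8_t *src, const int8_t *k)
{
    int sum = 0;
    for (int i = 1; i < 7; i++)
        sum += k[i] * src[i - 3];
    return sum;
}

/* The 6-tap path is only valid when both outer taps are zero. */
static int is_6tap(const int8_t *k)
{
    return k[0] == 0 && k[7] == 0;
}
```

The sketch checks the coefficients directly for clarity; the patch itself chooses the path from the filter type, so any combination involving the sharp filter stays on the 8-tap path.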
Benchmarking this on a broad range of recent CPUs (A55, A510, A76, A78, A715, X1, X3, ...) shows a 7-15% FPS uplift. Measurements were taken on sample video files from https://ultravideo.fi/dataset.html (e.g. Bosphorus), encoded with simple aomenc (v3.7.1+) settings such as --good/--rt and --cpu-used={0..10}.