x86: Add 6-tap variants of 8bpc mc AVX2 functions
Overall decoding performance with AVX2 improves by up to 10%, depending on
the input.

checkasm timings on Zen 4 (lower is better):
           8-tap   6-tap
w2_v        18.2    16.0
w2_hv       32.3    29.7
w4_v        17.5    14.9
w4_hv       36.9    33.6
w8_h        21.5    17.1
w8_v        19.1    16.9
w8_hv       65.6    51.5
w16_h       48.1    37.4
w16_v       37.2    31.1
w16_hv     170.8   134.1
w32_h      130.9    96.8
w32_v      107.6    89.9
w32_hv     509.0   400.1
w64_h      462.5   343.3
w64_v      368.5   305.8
w64_hv    1738.7  1375.8
w128_h    1314.2   977.5
w128_v    1068.6   903.2
w128_hv   4874.8  3866.1
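To read the table in relative terms, here is a small helper script. It is not part of this patch or of checkasm; it only restates the timings above and computes, per function, the 8-tap/6-tap ratio and the percentage of time saved. The names TIMINGS, speedup and reduction_pct are illustrative.

```python
# checkasm timings on Zen 4 as (8-tap, 6-tap) pairs, restated from the
# table in this message. Lower is better.
TIMINGS = {
    "w2_v": (18.2, 16.0),       "w2_hv": (32.3, 29.7),
    "w4_v": (17.5, 14.9),       "w4_hv": (36.9, 33.6),
    "w8_h": (21.5, 17.1),       "w8_v": (19.1, 16.9),
    "w8_hv": (65.6, 51.5),      "w16_h": (48.1, 37.4),
    "w16_v": (37.2, 31.1),      "w16_hv": (170.8, 134.1),
    "w32_h": (130.9, 96.8),     "w32_v": (107.6, 89.9),
    "w32_hv": (509.0, 400.1),   "w64_h": (462.5, 343.3),
    "w64_v": (368.5, 305.8),    "w64_hv": (1738.7, 1375.8),
    "w128_h": (1314.2, 977.5),  "w128_v": (1068.6, 903.2),
    "w128_hv": (4874.8, 3866.1),
}


def speedup(t8: float, t6: float) -> float:
    """Ratio of 8-tap to 6-tap time; a value above 1 means 6-tap is faster."""
    return t8 / t6


def reduction_pct(t8: float, t6: float) -> float:
    """Share of the 8-tap time saved by the 6-tap variant, in percent."""
    return 100.0 * (t8 - t6) / t8


if __name__ == "__main__":
    for name, (t8, t6) in TIMINGS.items():
        print(f"{name:<8} {speedup(t8, t6):5.2f}x  {reduction_pct(t8, t6):5.1f}% saved")
```

Every row comes out faster with 6 taps. The horizontal-only (_h) cases gain the most, at roughly 20 to 26%.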