
x86: Add 6-tap variants of 8bpc mc AVX-512 (Ice Lake) functions

Henrik Gramner requested to merge gramner/dav1d:x86_6tap_mc_8bpc_avx512 into master

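Because the horizontal filter uses the VNNI vpdpbusd instruction, which sums four pixel × coefficient products into each 32-bit lane (i.e. 4 taps per instruction), there's nothing to gain from going down to 6-tap: both 6 and 8 taps round up to the same two groups of four.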

For the vertical filter, however, 6-tap is still beneficial.

For the 2D (hv) case, the benefit of 8-tap h + 6-tap v over dual 8-tap is obviously less significant than in the AVX2 case, where 6-tap is beneficial in both directions.

Note that this limitation (no gain from a 6-tap horizontal filter) only applies to 8bpc.

Zen 4             8-tap   6-tap
mc_8tap_w2_v:      17.8    15.3
mc_8tap_w2_hv:     26.0    23.0
mc_8tap_w4_v:      16.7    14.2
mc_8tap_w4_hv:     27.3    24.2
mc_8tap_w8_v:      18.4    16.1
mc_8tap_w8_hv:     48.8    43.6
mc_8tap_w16_v:     43.2    39.5
mc_8tap_w16_hv:    91.6    81.8
mc_8tap_w32_v:    113.0   104.1
mc_8tap_w32_hv:   273.0   247.5
mc_8tap_w64_v:    281.2   217.4
mc_8tap_w64_hv:   931.2   858.8
mc_8tap_w128_v:   796.3   616.1
mc_8tap_w128_hv: 2593.8  2380.5
