
x86: Add 6-tap variants of 8bpc mc SSSE3 functions

Henrik Gramner requested to merge gramner/dav1d:x86_6tap_mc_8bpc_sse into master

This also removes the SSE2 mc code, as navigating the maze of nested macros and %if statements was a bit too much for an ISA extension that's essentially irrelevant.

Makes overall SSSE3 decoding performance of a Bosphorus sample increase by around 30-35%.
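For context, here is a minimal scalar sketch of why a 6-tap path can stand in for an 8-tap one. It assumes the filters routed to the 6-tap variants have zero outer coefficients, so skipping those two taps does not change the result. The coefficient values, `example_filter`, and the function names are hypothetical and chosen for illustration only; they are not taken from dav1d's tables or from the SSSE3 code in this merge request.

```c
#include <stdint.h>

/* Hypothetical 8-entry filter whose first and last taps are zero.
 * The values sum to 64 and are illustrative, not dav1d's coefficients. */
static const int8_t example_filter[8] = { 0, 2, -8, 56, 18, -6, 2, 0 };

/* Round to 6 fractional bits and clip to the 8-bit pixel range. */
static uint8_t round_clip(int sum)
{
    int v = sum + 32;
    if (v < 0)
        return 0;
    v >>= 6;
    return v > 255 ? 255 : (uint8_t)v;
}

/* 8-tap reference: the taps cover src[-3] .. src[4]. */
static uint8_t filter_8tap(const uint8_t *src, const int8_t *f)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += f[i] * src[i - 3];
    return round_clip(sum);
}

/* 6-tap variant: skips f[0] and f[7], which are assumed to be zero,
 * so only src[-2] .. src[3] are read. */
static uint8_t filter_6tap(const uint8_t *src, const int8_t *f)
{
    int sum = 0;
    for (int i = 1; i < 7; i++)
        sum += f[i] * src[i - 3];
    return round_clip(sum);
}
```

With zero outer taps both functions return the same pixel, but the 6-tap version does two fewer multiply-adds and reads two fewer input samples per output. The hv cases apply the filter in both directions, which is consistent with them showing the largest gains in the numbers below.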

Checkasm numbers on Zen 4:

            8-tap    6-tap
w2_v:        18.5     16.4
w2_hv:       31.0     27.7

w4_v:        17.5     15.3
w4_hv:       68.3     36.3

w8_h:        31.4     23.3
w8_v:        24.0     20.0
w8_hv:      144.3     67.6

w16_h:       83.1     59.8
w16_v:       60.9     49.7
w16_hv:     381.8    173.6

w32_h:      257.0    184.9
w32_v:      179.7    147.4
w32_hv:    1129.7    515.1

w64_h:      910.7    654.9
w64_v:      620.1    510.2
w64_hv:    3835.2   1765.6

w128_h:    2582.5   1853.9
w128_v:    1763.6   1464.8
w128_hv:  10651.3   4938.9
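
To read the table as relative gains, the two columns can be turned into a speedup factor and a percentage reduction. This is a small sketch that assumes lower checkasm numbers mean faster code; the helper names are hypothetical.

```c
#include <stdio.h>

/* Speedup factor of the 6-tap path relative to the 8-tap path,
 * assuming lower checkasm numbers are faster. */
static double speedup(double t8, double t6)
{
    return t8 / t6;
}

/* Percentage by which the 6-tap number is lower than the 8-tap one. */
static double reduction_pct(double t8, double t6)
{
    return 100.0 * (t8 - t6) / t8;
}

/* Print one table row together with its derived figures. */
static void print_row(const char *name, double t8, double t6)
{
    printf("%-8s %8.1f %8.1f  %.2fx  %.1f%%\n",
           name, t8, t6, speedup(t8, t6), reduction_pct(t8, t6));
}
```

For instance, w4_hv goes from 68.3 to 36.3, roughly a 1.88x speedup, while w8_h goes from 31.4 to 23.3, about a 26% reduction.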
Edited by Henrik Gramner
