x86: Add high bitdepth cdef_filter SSSE3 asm

Henrik Gramner requested to merge gramner/dav1d:cdef16_ssse3 into master
cdef_filter_4x4_16bpc_c: 949.6
cdef_filter_4x4_16bpc_ssse3: 95.5
cdef_filter_4x4_16bpc_avx2: 110.6

cdef_filter_4x8_16bpc_c: 1799.6
cdef_filter_4x8_16bpc_ssse3: 155.7

cdef_filter_8x8_16bpc_c: 1471.2
cdef_filter_8x8_16bpc_ssse3: 259.4
cdef_filter_8x8_16bpc_avx2: 242.5

Includes optimized code paths for pri-only and sec-only filter strengths, which the hbd AVX2 code currently lacks. That is why the SSSE3 code performs so well compared to AVX2. Those optimizations will be added to AVX2 in a future MR.
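
For readers unfamiliar with the idea, the following is a minimal C sketch of what dispatching to pri-only and sec-only paths can look like. It is not the code from this MR (the real implementation is SSSE3 assembly). The enum, its values, and `select_cdef_path` are hypothetical names introduced only for illustration, and the sketch assumes that a strength of zero means the corresponding filter taps contribute nothing and can be skipped.

```c
/* Hypothetical illustration only; these names are not part of dav1d. */
enum cdef_path {
    CDEF_PATH_NONE,     /* both strengths zero: nothing to filter */
    CDEF_PATH_PRI_ONLY, /* only primary taps are evaluated */
    CDEF_PATH_SEC_ONLY, /* only secondary taps are evaluated */
    CDEF_PATH_PRI_SEC   /* general path: primary and secondary taps */
};

/* Pick the cheapest path that still produces the full result, assuming a
 * zero strength makes the corresponding taps a no-op. */
static enum cdef_path select_cdef_path(int pri_strength, int sec_strength)
{
    if (pri_strength && sec_strength)
        return CDEF_PATH_PRI_SEC;
    if (pri_strength)
        return CDEF_PATH_PRI_ONLY;
    if (sec_strength)
        return CDEF_PATH_SEC_ONLY;
    return CDEF_PATH_NONE;
}
```

A specialized path avoids loading and constraining the taps that would contribute zero, which is where the savings over a single general-purpose path would come from. The `CDEF_PATH_NONE` case is included only for completeness; a caller may never invoke the filter with both strengths set to zero.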
