
Optimize the cdef_filter C implementation

Henrik Gramner requested to merge gramner/dav1d:cdef_c_optimizations into master

Performance numbers, measured on Skylake-X:

Function                  Before     After
cdef_filter_4x4_8bpc_c    1217.0     885.2
cdef_filter_4x8_8bpc_c    2355.1    1710.1
cdef_filter_8x8_8bpc_c    2669.5    1439.7
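The relative gains can be computed directly from the benchmark figures above. This is a small illustrative C sketch. The struct, the array and the speedup()/print_speedups() helpers are hypothetical names for this example only and are not part of dav1d. The numbers are copied as reported, without assuming any particular unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Benchmark figures copied from the table above (lower is faster). */
struct bench {
    const char *name;
    double before;
    double after;
};

const struct bench results[] = {
    { "cdef_filter_4x4_8bpc_c", 1217.0,  885.2 },
    { "cdef_filter_4x8_8bpc_c", 2355.1, 1710.1 },
    { "cdef_filter_8x8_8bpc_c", 2669.5, 1439.7 },
};

/* Speedup factor: how many times faster the new code is than the old code. */
double speedup(const struct bench *b) {
    assert(b->after > 0.0); /* guard against division by zero */
    return b->before / b->after;
}

/* Print one line per benchmarked function. */
void print_speedups(void) {
    for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
        printf("%s: %.2fx faster\n", results[i].name, speedup(&results[i]));
}
```

Calling print_speedups() from a main function prints 1.37x, 1.38x and 1.85x for the 4x4, 4x8 and 8x8 variants respectively, so the 8x8 case benefits the most.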

For 10-bit content (which currently uses the C DSP code), overall decoding performance improves by around 20%.

The asm can also be optimized using the same approach, although the benefit will likely be a bit smaller.
