arm64: msac: Implement NEON msac_decode_symbol_adapt

Martin Storsjö requested to merge mstorsjo/dav1d:msac-neon into master

                             Cortex A53    A72    A73
msac_decode_symbol_adapt4_c:      107.5   57.1   67.8
msac_decode_symbol_adapt4_neon:    70.1   53.4   55.0
msac_decode_symbol_adapt8_c:      157.3   74.5   90.2
msac_decode_symbol_adapt8_neon:    75.3   57.9   56.2
msac_decode_symbol_adapt16_c:     257.4  106.3  136.0
msac_decode_symbol_adapt16_neon:  101.2   61.2   65.8

Total decoding speedup of Chimera is around 0.8%.

@janne Do you have an opinion on the use of macros here? I'm avoiding duplicating the main ~60 line block of SIMD code by templating it out into three versions. Templating between widths 4 and 8 is trivial (just switching between the .4h and .8h register specifiers), but templating between one and two registers (for width 8 vs 16) takes a lot of small macros, one per instruction type. The macro definitions end up using more lines of code than duplicating the block once more would...
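
For readers unfamiliar with the trade-off, here is a minimal C sketch of the same templating idea. It is not the code in this merge request: `vec8`, `load`, `count_ge` and the `COUNT_GE_*` macros are hypothetical names, and the operation (counting lanes that are >= a threshold) is only a stand-in for the real SIMD block. The lane count plays the role of the .4h/.8h specifier, and the `_1`/`_2` wrappers play the role of the per-instruction macros that pick between one and two registers.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit NEON register of 16-bit lanes. */
typedef struct { uint16_t lane[8]; } vec8;

/* Load n lanes (4 or 8); unused lanes stay zero. */
static inline vec8 load(const uint16_t *p, int n) {
    vec8 r = {{0}};
    for (int i = 0; i < n; i++) r.lane[i] = p[i];
    return r;
}

/* Count how many of the first n lanes are >= v. */
static inline unsigned count_ge(vec8 a, uint16_t v, int n) {
    unsigned c = 0;
    for (int i = 0; i < n; i++) c += a.lane[i] >= v;
    return c;
}

/* One wrapper per operation, in a one-register and a two-register flavour.
 * With many operations, these wrappers are where the line count grows. */
#define COUNT_GE_1(src, v, lanes) \
    count_ge(load((src), (lanes)), (v), (lanes))
#define COUNT_GE_2(src, v, lanes) \
    (COUNT_GE_1((src), (v), (lanes)) + COUNT_GE_1((src) + (lanes), (v), (lanes)))

/* The template: regs selects the wrapper flavour by token pasting. */
#define DEFINE_COUNT_GE(name, regs, lanes) \
    static unsigned name(const uint16_t *src, uint16_t v) { \
        return COUNT_GE_##regs(src, v, lanes); \
    }

DEFINE_COUNT_GE(count_ge4,  1, 4)  /* width 4:  one register, 4 lanes  */
DEFINE_COUNT_GE(count_ge8,  1, 8)  /* width 8:  one register, 8 lanes  */
DEFINE_COUNT_GE(count_ge16, 2, 8)  /* width 16: two registers, 8 lanes */
```

Switching between 4 and 8 lanes costs one parameter, while supporting the two-register case needs an extra wrapper pair for every distinct operation. That is the overhead being weighed against simply duplicating the block for width 16.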
