ARM64: Port msac improvements to more functions
This required changing the tests to zero-pad the CDFs.
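
For illustration only, a minimal sketch of what zero-padding a CDF buffer in a test could look like. The helper name zero_pad_cdf, the constant CDF_LANES, and the 16-entry buffer width are assumptions made for this example and are not taken from the actual test code. The motivation suggested in the comments (a full-width vector load seeing leftover data) is a plausible reading, not something this commit states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed buffer width in uint16_t entries; hypothetical, chosen to match
 * the widest variant (adapt16). */
enum { CDF_LANES = 16 };

/* Hypothetical test helper (not the project's real code): copy n_used CDF
 * entries into a CDF_LANES-wide buffer and zero every entry after them, so
 * that code reading the whole buffer never sees leftover data. */
static void zero_pad_cdf(uint16_t dst[CDF_LANES],
                         const uint16_t *src, size_t n_used)
{
    assert(n_used <= CDF_LANES);
    memcpy(dst, src, n_used * sizeof(*dst));
    memset(dst + n_used, 0, (CDF_LANES - n_used) * sizeof(*dst));
}
```

A test would call this once per generated CDF before handing the buffer to both the C and the NEON implementation, so that both see identical, fully defined input.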
Results - Neoverse N1
Old:
msac_decode_symbol_adapt4_c: 41.4 ( 1.00x)
msac_decode_symbol_adapt4_neon: 31.0 ( 1.34x)
msac_decode_symbol_adapt8_c: 54.5 ( 1.00x)
msac_decode_symbol_adapt8_neon: 32.2 ( 1.69x)
msac_decode_symbol_adapt16_c: 85.6 ( 1.00x)
msac_decode_symbol_adapt16_neon: 37.5 ( 2.28x)
New:
msac_decode_symbol_adapt4_c: 41.5 ( 1.00x)
msac_decode_symbol_adapt4_neon: 27.7 ( 1.50x)
msac_decode_symbol_adapt8_c: 55.7 ( 1.00x)
msac_decode_symbol_adapt8_neon: 30.1 ( 1.85x)
msac_decode_symbol_adapt16_c: 82.4 ( 1.00x)
msac_decode_symbol_adapt16_neon: 35.2 ( 2.34x)
Edited by Kyle Siefring