Martin Storsjö authored
As some functions are made for both 8bpc and 16bpc from a shared template, those functions are moved to a separate assembly file which is included. That assembly file (cdef_tmpl.S) isn't intended to be assembled on its own (just like utils.S), but if it is assembled, it should produce an empty object file.

Checkasm benchmarks:
                               Cortex A53      A72      A73
cdef_dir_16bpc_neon:                422.7    305.5    314.0
cdef_filter_4x4_16bpc_neon:         452.9    282.7    296.6
cdef_filter_4x8_16bpc_neon:         800.9    515.3    534.1
cdef_filter_8x8_16bpc_neon:        1417.1    922.7    942.6

Corresponding numbers for 8bpc for comparison:
cdef_dir_8bpc_neon:                 394.7    268.8    281.8
cdef_filter_4x4_8bpc_neon:          461.5    300.9    307.7
cdef_filter_4x8_8bpc_neon:          831.6    546.1    555.6
cdef_filter_8x8_8bpc_neon:         1454.6    934.0    960.0
e6cebeb7