Martin Storsjö authored
Use a shared template file for assembly functions that can be
templated into 8 and 16 bpc forms, just like in the arm64 version.

Checkasm benchmarks:
                             Cortex A7       A8      A53      A72      A73
cdef_dir_16bpc_neon:             975.9    853.2    555.2    378.7    386.9
cdef_filter_4x4_16bpc_neon:      746.9    521.7    481.2    333.0    340.8
cdef_filter_4x8_16bpc_neon:     1300.0    885.5    816.3    582.7    599.5
cdef_filter_8x8_16bpc_neon:     2282.5   1415.0   1417.6   1059.0   1076.3

Corresponding numbers for arm64, for comparison:
                            Cortex A53      A72      A73
cdef_dir_16bpc_neon:             418.0    306.7    310.7
cdef_filter_4x4_16bpc_neon:      453.4    282.9    297.4
cdef_filter_4x8_16bpc_neon:      807.5    514.2    533.8
cdef_filter_8x8_16bpc_neon:     1425.2    924.4    942.0
018e64e7