    arm64: cdef: Add NEON implementations of CDEF for 16 bpc · e6cebeb7
    Martin Storsjö authored
    Some functions are generated for both 8bpc and 16bpc from a shared
    template, so those functions are moved to a separate assembly file
    (cdef_tmpl.S), which is included rather than assembled directly.
    Like utils.S, that file isn't intended to be assembled on its own,
    but if it is assembled, it should produce an empty object file.
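
As a rough illustration of the shared-template idea, here is a minimal C analogy. It is not code from this commit: the macro `DEFINE_SUM_ROW` and the functions `sum_row_8bpc` and `sum_row_16bpc` are hypothetical names invented for the sketch. The template is a macro that emits nothing by itself, which loosely mirrors a template file that yields an empty object when assembled alone; code is only generated when the template is instantiated for a given bit depth.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical template: defining this macro emits no code on its own.
 * Each expansion generates one function specialized for a pixel type. */
#define DEFINE_SUM_ROW(bits, pixel)                              \
    uint32_t sum_row_##bits##bpc(const pixel *src, size_t w)     \
    {                                                            \
        uint32_t sum = 0;                                        \
        for (size_t x = 0; x < w; x++)                           \
            sum += src[x];                                       \
        return sum;                                              \
    }

/* Instantiate the same template once per bit depth. */
DEFINE_SUM_ROW(8, uint8_t)   /* defines sum_row_8bpc()  */
DEFINE_SUM_ROW(16, uint16_t) /* defines sum_row_16bpc() */
```

A caller would pick the variant that matches the pixel storage type, for example `sum_row_16bpc(row, width)` for 16-bit samples.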
    
    Checkasm benchmarks:
                             Cortex A53     A72     A73
    cdef_dir_16bpc_neon:          422.7   305.5   314.0
    cdef_filter_4x4_16bpc_neon:   452.9   282.7   296.6
    cdef_filter_4x8_16bpc_neon:   800.9   515.3   534.1
    cdef_filter_8x8_16bpc_neon:  1417.1   922.7   942.6
    
    Corresponding numbers for 8bpc for comparison:
    
    cdef_dir_8bpc_neon:          394.7   268.8   281.8
    cdef_filter_4x4_8bpc_neon:   461.5   300.9   307.7
    cdef_filter_4x8_8bpc_neon:   831.6   546.1   555.6
    cdef_filter_8x8_8bpc_neon:  1454.6   934.0   960.0