    arm32: cdef: Add NEON implementations of CDEF for 16 bpc · 018e64e7
    Martin Storsjö authored
    Use a shared template file for assembly functions that can be
    templated into 8 and 16 bpc forms, just like in the arm64 version.
    
    Checkasm benchmarks:
                              Cortex A7      A8     A53     A72     A73
    cdef_dir_16bpc_neon:          975.9   853.2   555.2   378.7   386.9
    cdef_filter_4x4_16bpc_neon:   746.9   521.7   481.2   333.0   340.8
    cdef_filter_4x8_16bpc_neon:  1300.0   885.5   816.3   582.7   599.5
    cdef_filter_8x8_16bpc_neon:  2282.5  1415.0  1417.6  1059.0  1076.3
    
    Corresponding numbers for arm64, for comparison:
                                             Cortex A53     A72     A73
    cdef_dir_16bpc_neon:                          418.0   306.7   310.7
    cdef_filter_4x4_16bpc_neon:                   453.4   282.9   297.4
    cdef_filter_4x8_16bpc_neon:                   807.5   514.2   533.8
    cdef_filter_8x8_16bpc_neon:                  1425.2   924.4   942.0