
arm32: cdef: Add NEON implementations of CDEF for 16 bpc

Martin Storsjö requested to merge mstorsjo/dav1d:arm32-cdef16 into master

Use a shared template file for assembly functions that can be templated into 8 and 16 bpc forms, just like in the arm64 version.

Checkasm benchmarks:
                          Cortex A7      A8     A53     A72     A73
cdef_dir_16bpc_neon:          975.9   853.2   555.2   378.7   386.9
cdef_filter_4x4_16bpc_neon:   746.9   521.7   481.2   333.0   340.8
cdef_filter_4x8_16bpc_neon:  1300.0   885.5   816.3   582.7   599.5
cdef_filter_8x8_16bpc_neon:  2282.5  1415.0  1417.6  1059.0  1076.3

Corresponding numbers for arm64, for comparison:

                                         Cortex A53     A72     A73
cdef_dir_16bpc_neon:                          418.0   306.7   310.7
cdef_filter_4x4_16bpc_neon:                   453.4   282.9   297.4
cdef_filter_4x8_16bpc_neon:                   807.5   514.2   533.8
cdef_filter_8x8_16bpc_neon:                  1425.2   924.4   942.0
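
To make the arm32 vs arm64 comparison easier to read, here is a minimal sketch that divides each arm32 number by the matching arm64 number for the three cores present in both tables. The figures are copied from the tables above; the dictionary layout and the helper name `ratios` are only illustrative and are not part of dav1d or checkasm. A ratio above 1.0 means the arm32 version needs more checkasm timer units than the arm64 one.

```python
# Checkasm numbers copied from the two tables above (lower is faster).
# Only the cores measured on both architectures are included.
arm32 = {
    "cdef_dir_16bpc_neon":        {"A53": 555.2,  "A72": 378.7,  "A73": 386.9},
    "cdef_filter_4x4_16bpc_neon": {"A53": 481.2,  "A72": 333.0,  "A73": 340.8},
    "cdef_filter_4x8_16bpc_neon": {"A53": 816.3,  "A72": 582.7,  "A73": 599.5},
    "cdef_filter_8x8_16bpc_neon": {"A53": 1417.6, "A72": 1059.0, "A73": 1076.3},
}
arm64 = {
    "cdef_dir_16bpc_neon":        {"A53": 418.0,  "A72": 306.7,  "A73": 310.7},
    "cdef_filter_4x4_16bpc_neon": {"A53": 453.4,  "A72": 282.9,  "A73": 297.4},
    "cdef_filter_4x8_16bpc_neon": {"A53": 807.5,  "A72": 514.2,  "A73": 533.8},
    "cdef_filter_8x8_16bpc_neon": {"A53": 1425.2, "A72": 924.4,  "A73": 942.0},
}


def ratios(a32, a64):
    """Return arm32/arm64 ratios per function and core (hypothetical helper)."""
    return {
        fn: {core: a32[fn][core] / a64[fn][core] for core in a64[fn]}
        for fn in a64
    }


ratio = ratios(arm32, arm64)

# Print one row per function, one column per core.
for fn, per_core in ratio.items():
    cells = "  ".join(f"{core}: {value:.2f}" for core, value in per_core.items())
    print(f"{fn:<28} {cells}")
```

On these numbers the arm32 filter functions land close to arm64 on the Cortex A53, while the gap is wider on the A72 and A73 and for `cdef_dir`.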
