arm: mc: NEON implementation of emu_edge for 8bpc on arm32 and 16bpc on arm64

Martin Storsjö requested to merge mstorsjo/dav1d:arm-emuedge into master

Relative speedups over C code:

ARM64:

                      Cortex A53    A72    A73
emu_edge_w4_16bpc_neon:     2.49   1.53   1.91
emu_edge_w8_16bpc_neon:     2.27   1.55   1.90
emu_edge_w16_16bpc_neon:    2.46   1.46   2.09
emu_edge_w32_16bpc_neon:    2.20   1.39   1.73
emu_edge_w64_16bpc_neon:    1.65   1.00   1.46
emu_edge_w128_16bpc_neon:   1.55   1.44   1.54

ARM32:

                      Cortex A7     A8     A9    A53    A72    A73
emu_edge_w4_8bpc_neon:     4.23   3.39   2.55   3.58   3.11   3.57
emu_edge_w8_8bpc_neon:     4.02   3.61   2.47   3.74   3.50   3.77
emu_edge_w16_8bpc_neon:    4.56   3.63   2.93   3.97   3.44   4.11
emu_edge_w32_8bpc_neon:    3.82   3.05   2.04   3.79   2.34   3.10
emu_edge_w64_8bpc_neon:    3.27   2.97   1.84   3.70   2.39   1.97
emu_edge_w128_8bpc_neon:   2.58   2.64   1.54   3.04   1.28   1.87
