arm64: ipred: NEON implementation of dc/h/v prediction functions

Martin Storsjö requested to merge mstorsjo/dav1d:arm-ipred into master

Relative speedups over the C code:

                              Cortex A53    A72    A73
intra_pred_dc_128_w4_8bpc_neon:     2.08   1.47   2.17
intra_pred_dc_128_w8_8bpc_neon:     3.33   2.49   4.03
intra_pred_dc_128_w16_8bpc_neon:    3.93   3.86   3.75
intra_pred_dc_128_w32_8bpc_neon:    3.14   3.79   2.90
intra_pred_dc_128_w64_8bpc_neon:    3.68   1.97   2.42
intra_pred_dc_left_w4_8bpc_neon:    2.41   1.70   2.23
intra_pred_dc_left_w8_8bpc_neon:    3.53   2.41   3.32
intra_pred_dc_left_w16_8bpc_neon:   3.87   3.54   3.34
intra_pred_dc_left_w32_8bpc_neon:   4.10   3.60   2.76
intra_pred_dc_left_w64_8bpc_neon:   3.72   2.00   2.39
intra_pred_dc_top_w4_8bpc_neon:     2.27   1.66   2.07
intra_pred_dc_top_w8_8bpc_neon:     3.83   2.69   3.43
intra_pred_dc_top_w16_8bpc_neon:    3.66   3.60   3.20
intra_pred_dc_top_w32_8bpc_neon:    3.92   3.54   2.66
intra_pred_dc_top_w64_8bpc_neon:    3.60   1.98   2.30
intra_pred_dc_w4_8bpc_neon:         2.29   1.42   2.16
intra_pred_dc_w8_8bpc_neon:         3.56   2.83   3.05
intra_pred_dc_w16_8bpc_neon:        3.46   3.37   3.15
intra_pred_dc_w32_8bpc_neon:        3.79   3.41   2.74
intra_pred_dc_w64_8bpc_neon:        3.52   2.01   2.41
intra_pred_h_w4_8bpc_neon:         10.34   5.74   5.94
intra_pred_h_w8_8bpc_neon:         12.13   6.33   6.43
intra_pred_h_w16_8bpc_neon:        10.66   7.31   5.85
intra_pred_h_w32_8bpc_neon:         6.28   4.18   2.88
intra_pred_h_w64_8bpc_neon:         3.96   1.85   1.75
intra_pred_v_w4_8bpc_neon:         11.44   6.12   7.57
intra_pred_v_w8_8bpc_neon:         14.76   7.58   7.95
intra_pred_v_w16_8bpc_neon:        11.34   6.28   5.88
intra_pred_v_w32_8bpc_neon:         6.56   3.33   3.34
intra_pred_v_w64_8bpc_neon:         4.57   1.24   1.97
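
For context, the predictor families benchmarked above can be sketched in scalar C roughly as follows. This is an illustrative sketch only: the `sketch_pred_*` names, the signatures, and the separate `top`/`left` edge arrays are assumptions made for this example, not dav1d's actual C reference code or its NEON implementation. The rounding in the DC case is likewise the usual rounded average and is assumed rather than taken from this merge request.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar sketches for 8 bpc; not dav1d's real API. */

/* Vertical: every row of the block is a copy of the row above it. */
static void sketch_pred_v(uint8_t *dst, ptrdiff_t stride,
                          const uint8_t *top, int w, int h)
{
    for (int y = 0; y < h; y++)
        memcpy(dst + y * stride, top, (size_t)w);
}

/* Horizontal: every row is filled with the pixel to its left. */
static void sketch_pred_h(uint8_t *dst, ptrdiff_t stride,
                          const uint8_t *left, int w, int h)
{
    for (int y = 0; y < h; y++)
        memset(dst + y * stride, left[y], (size_t)w);
}

/* DC: the whole block is filled with the rounded average of the
 * top and left edge pixels (assumed rounding: add half the divisor). */
static void sketch_pred_dc(uint8_t *dst, ptrdiff_t stride,
                           const uint8_t *top, const uint8_t *left,
                           int w, int h)
{
    unsigned sum = (unsigned)(w + h) >> 1;
    for (int x = 0; x < w; x++)
        sum += top[x];
    for (int y = 0; y < h; y++)
        sum += left[y];
    const uint8_t dc = (uint8_t)(sum / (unsigned)(w + h));
    for (int y = 0; y < h; y++)
        memset(dst + y * stride, dc, (size_t)w);
}
```

The `dc_top`, `dc_left` and `dc_128` rows in the table correspond to variants that average only one edge or use the fixed mid-grey value 128 for 8 bpc. Since the h and v cases are plain row fills and copies, it is plausible that they benefit most from wide vector stores, which would be consistent with the larger speedups listed for them.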
Edited by Martin Storsjö