arm64: ipred: 16 bpc NEON implementation of the Z2 function

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-z2-16bpc into master

Relative speedup over unvectorized C code:

                          Cortex A53    A55    A72    A73    A76   Apple M1
intra_pred_z2_w4_16bpc_neon:    2.98   2.98   2.38   2.77   3.19   7.75
intra_pred_z2_w8_16bpc_neon:    3.91   4.22   2.64   3.29   3.73   4.78
intra_pred_z2_w16_16bpc_neon:   4.43   5.12   2.89   3.90   3.50   4.26
intra_pred_z2_w32_16bpc_neon:   5.08   6.36   3.44   4.40   4.05   4.96
intra_pred_z2_w64_16bpc_neon:   4.68   5.97   3.29   4.40   3.68   5.23
