arm64: mc: NEON implementation of blend for 16bpc

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-blend-16bpc into master

The branch also includes a number of cleanups to the 8bpc code (primarily the arm64 version of it), which were noticed while working on the 16bpc version.

Checkasm numbers:     Cortex A53     A72     A73
blend_h_w2_16bpc_neon:     109.3    83.1    56.7
blend_h_w4_16bpc_neon:     114.1    61.4    62.3
blend_h_w8_16bpc_neon:     133.3    80.8    81.1
blend_h_w16_16bpc_neon:    215.6   132.7   149.5
blend_h_w32_16bpc_neon:    390.4   254.2   235.8
blend_h_w64_16bpc_neon:    719.1   456.3   453.8
blend_h_w128_16bpc_neon:  1646.1  1112.3  1065.9
blend_v_w2_16bpc_neon:     185.9   175.9   180.0
blend_v_w4_16bpc_neon:     338.0   183.4   232.1
blend_v_w8_16bpc_neon:     426.5   213.8   250.6
blend_v_w16_16bpc_neon:    678.2   357.8   382.6
blend_v_w32_16bpc_neon:   1098.3   686.2   695.6
blend_w4_16bpc_neon:        75.7    31.5    32.0
blend_w8_16bpc_neon:       134.0    75.0    75.8
blend_w16_16bpc_neon:      467.9   267.3   310.0
blend_w32_16bpc_neon:     1201.9   658.7   779.7

Corresponding numbers for 8bpc for comparison:

blend_h_w2_8bpc_neon:      104.1    55.9    60.8
blend_h_w4_8bpc_neon:      108.9    58.7    48.2
blend_h_w8_8bpc_neon:       99.3    64.4    67.4
blend_h_w16_8bpc_neon:     145.2    93.4    85.1
blend_h_w32_8bpc_neon:     262.2   157.5   148.6
blend_h_w64_8bpc_neon:     466.7   278.9   256.6
blend_h_w128_8bpc_neon:   1054.2   624.7   571.0
blend_v_w2_8bpc_neon:      170.5   106.6   113.4
blend_v_w4_8bpc_neon:      333.0   189.9   225.9
blend_v_w8_8bpc_neon:      314.9   199.0   203.5
blend_v_w16_8bpc_neon:     476.9   300.8   241.1
blend_v_w32_8bpc_neon:     766.9   430.4   415.1
blend_w4_8bpc_neon:         66.7    35.4    26.0
blend_w8_8bpc_neon:        110.7    47.9    48.1
blend_w16_8bpc_neon:       299.4   161.8   162.3
blend_w32_8bpc_neon:       725.8   417.0   432.8
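
For context on what the benchmarked functions compute, here is a minimal scalar sketch of a plain blend for 16-bit pixels. It is not taken from this branch. It assumes the usual 6-bit mask weighting (weights in 0..64, rounded with +32 and shifted right by 6), and the names `blend_px` and `blend_16bpc_ref`, the pixel-unit stride, and the tightly packed `tmp` and `mask` layout are assumptions for illustration, not the actual dav1d signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend one pixel. The weight m is assumed to lie in [0, 64]:
 * m = 0 keeps a, m = 64 selects b. */
static inline uint16_t blend_px(uint16_t a, uint16_t b, uint8_t m)
{
    assert(m <= 64);
    /* a * (64 - m) + b * m + 32 is at most 65535 * 64 + 32,
     * so 32-bit arithmetic cannot overflow. */
    return (uint16_t)(((uint32_t)a * (64u - m) + (uint32_t)b * m + 32u) >> 6);
}

/* Hypothetical scalar reference for a w x h block.
 * dst_stride is in pixels (not bytes); tmp and mask are assumed to be
 * packed with a row pitch of w elements. */
static void blend_16bpc_ref(uint16_t *dst, ptrdiff_t dst_stride,
                            const uint16_t *tmp, int w, int h,
                            const uint8_t *mask)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = blend_px(dst[x], tmp[x], mask[x]);
        dst  += dst_stride;
        tmp  += w;
        mask += w;
    }
}
```

Under these assumptions, the 16bpc variant has to widen each product to 32 bits, whereas 8-bit pixels fit in 16-bit intermediates, so a vector register holds half as many pixels per operation. That is one plausible reason the 16bpc figures are generally higher than the 8bpc ones.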