arm64: loopfilter: NEON implementation of loopfilter for 16 bpc

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-lpf-16bpc into master

Checkasm runtimes:      Cortex A53     A72     A73
lpf_h_sb_uv_w4_16bpc_neon:   919.0   795.0   714.9
lpf_h_sb_uv_w6_16bpc_neon:  1267.7  1116.2  1081.9
lpf_h_sb_y_w4_16bpc_neon:   1500.2  1543.9  1778.5
lpf_h_sb_y_w8_16bpc_neon:   2216.1  2183.0  2568.1
lpf_h_sb_y_w16_16bpc_neon:  2641.8  2630.4  2639.4
lpf_v_sb_uv_w4_16bpc_neon:   836.5   572.7   667.3
lpf_v_sb_uv_w6_16bpc_neon:  1130.8   709.1   955.5
lpf_v_sb_y_w4_16bpc_neon:   1271.6  1434.4  1272.1
lpf_v_sb_y_w8_16bpc_neon:   1818.0  1759.1  1664.6
lpf_v_sb_y_w16_16bpc_neon:  1998.6  2115.8  1586.6

Corresponding numbers for 8 bpc for comparison:
lpf_h_sb_uv_w4_8bpc_neon:    799.4   632.8   695.4
lpf_h_sb_uv_w6_8bpc_neon:   1067.3   613.6   767.5
lpf_h_sb_y_w4_8bpc_neon:    1490.5  1179.1  1018.9
lpf_h_sb_y_w8_8bpc_neon:    1892.9  1382.0  1172.0
lpf_h_sb_y_w16_8bpc_neon:   2117.4  1625.4  1739.0
lpf_v_sb_uv_w4_8bpc_neon:    447.1   447.7   446.0
lpf_v_sb_uv_w6_8bpc_neon:    522.1   529.0   513.1
lpf_v_sb_y_w4_8bpc_neon:    1043.7   785.0   775.9
lpf_v_sb_y_w8_8bpc_neon:    1500.4  1115.9   881.2
lpf_v_sb_y_w16_8bpc_neon:   1493.5  1371.4  1248.5
