Martin Storsjö authored
This matches what is implemented for arm64 so far.

Align the dav1d_sm_weights table to allow aligned loads from it.

Relative speedups over C code (vs potentially autovectorized code,
built with Clang):

                             Cortex     A7     A8     A9    A53    A72    A73
intra_pred_paeth_w4_8bpc_neon:        4.81   7.61   5.82   5.50   5.61   6.94
intra_pred_paeth_w8_8bpc_neon:        7.83  11.95   9.51  11.05   8.90  10.51
intra_pred_paeth_w16_8bpc_neon:       4.86   4.49   3.90   4.60   3.76   3.54
intra_pred_paeth_w32_8bpc_neon:       4.55   4.03   3.52   4.27   3.30   3.21
intra_pred_paeth_w64_8bpc_neon:       4.38   3.72   3.32   3.95   3.08   3.00
intra_pred_smooth_h_w4_8bpc_neon:     5.74  10.80   5.32   6.79   4.77   6.48
intra_pred_smooth_h_w8_8bpc_neon:    10.59  17.95   9.39  16.03   6.94   8.98
intra_pred_smooth_h_w16_8bpc_neon:    2.81   3.19   2.12   3.70   2.90   3.59
intra_pred_smooth_h_w32_8bpc_neon:    2.63   2.41   1.86   3.44   2.24   2.66
intra_pred_smooth_h_w64_8bpc_neon:    2.42   2.52   1.79   3.24   1.81   2.11
intra_pred_smooth_v_w4_8bpc_neon:     4.15   7.99   3.46   4.63   3.83   4.39
intra_pred_smooth_v_w8_8bpc_neon:     7.31  12.42   7.04  10.00   4.26   6.20
intra_pred_smooth_v_w16_8bpc_neon:    3.70   3.44   2.53   3.33   2.76   3.21
intra_pred_smooth_v_w32_8bpc_neon:    3.91   3.74   2.70   3.51   2.50   2.96
intra_pred_smooth_v_w64_8bpc_neon:    4.03   3.94   2.80   3.64   2.36   2.80
intra_pred_smooth_w4_8bpc_neon:       4.09   7.74   4.54   4.79   3.26   5.10
intra_pred_smooth_w8_8bpc_neon:       5.63   8.93   6.62   8.28   3.73   6.04
intra_pred_smooth_w16_8bpc_neon:      3.97   3.40   3.32   3.74   3.01   3.77
intra_pred_smooth_w32_8bpc_neon:      3.75   3.14   3.07   3.28   2.65   3.17
intra_pred_smooth_w64_8bpc_neon:      3.60   3.04   2.93   2.97   2.35   2.85
intra_pred_filter_w4_8bpc_neon:       5.54   6.43   4.90   7.26   3.44   4.61
intra_pred_filter_w8_8bpc_neon:       7.05   7.15   5.50  10.05   4.29   6.02
intra_pred_filter_w16_8bpc_neon:      7.36   6.46   5.27  11.51   4.75   6.70
intra_pred_filter_w32_8bpc_neon:      7.56   6.32   5.01  12.34   4.47   6.97
pal_pred_w4_8bpc_neon:                5.47   7.76   4.40   5.20   8.32   7.03
pal_pred_w8_8bpc_neon:               11.11  14.12   8.44  13.95  11.88  12.43
pal_pred_w16_8bpc_neon:              14.38  20.95   9.84  17.43  14.77  13.56
pal_pred_w32_8bpc_neon:              12.91  19.85  10.87  19.03  14.63  14.62
pal_pred_w64_8bpc_neon:              14.01  19.23  10.82  19.82  16.23  16.32
cfl_ac_420_w4_8bpc_neon:              8.11  13.41   7.92   9.26  10.55   9.36
cfl_ac_420_w8_8bpc_neon:              7.77  15.71   7.69   8.94   9.76   8.56
cfl_ac_420_w16_8bpc_neon:             7.72  13.71   8.30   9.05   9.81   9.02
cfl_ac_422_w4_8bpc_neon:              8.85  15.80   8.26  10.97  13.04  10.00
cfl_ac_422_w8_8bpc_neon:              8.77  16.96   7.57  10.46  12.16   9.92
cfl_ac_422_w16_8bpc_neon:             8.28  14.91   7.16   9.69  10.57   9.18
cfl_ac_444_w4_8bpc_neon:              7.47  14.13   7.50   9.76  11.11   9.39
cfl_ac_444_w8_8bpc_neon:              6.81  15.46   5.27   9.11  12.09   9.76
cfl_ac_444_w16_8bpc_neon:             6.11  13.68   4.62   8.17  10.78   8.92
cfl_ac_444_w32_8bpc_neon:             5.71  12.11   4.28   7.53   9.53   8.52
cfl_pred_cfl_128_w4_8bpc_neon:        7.46  12.63   8.48   8.03   7.64   9.29
cfl_pred_cfl_128_w8_8bpc_neon:        5.05   5.16   3.79   4.64   5.07   4.42
cfl_pred_cfl_128_w16_8bpc_neon:       4.44   5.17   3.65   4.20   4.41   4.74
cfl_pred_cfl_128_w32_8bpc_neon:       4.51   5.25   3.67   4.29   4.39   4.73
cfl_pred_cfl_left_w4_8bpc_neon:       6.60  11.74   7.75   6.91   7.44   9.14
cfl_pred_cfl_left_w8_8bpc_neon:       4.92   5.15   3.80   4.41   5.44   4.81
cfl_pred_cfl_left_w16_8bpc_neon:      4.40   5.26   3.66   4.10   4.63   4.94
cfl_pred_cfl_left_w32_8bpc_neon:      4.50   5.31   3.68   4.25   4.43   4.82
cfl_pred_cfl_top_w4_8bpc_neon:        7.00  11.88   7.88   7.50   7.43   9.68
cfl_pred_cfl_top_w8_8bpc_neon:        4.96   5.07   3.78   4.51   5.31   4.75
cfl_pred_cfl_top_w16_8bpc_neon:       4.42   5.31   3.69   4.16   4.60   4.93
cfl_pred_cfl_top_w32_8bpc_neon:       4.52   5.36   3.71   4.29   4.47   4.83
cfl_pred_cfl_w4_8bpc_neon:            5.92  10.54   7.25   6.21   6.79   8.33
cfl_pred_cfl_w8_8bpc_neon:            4.67   5.16   3.77   4.14   5.20   4.71
cfl_pred_cfl_w16_8bpc_neon:           4.29   5.29   3.70   3.97   4.53   4.86
cfl_pred_cfl_w32_8bpc_neon:           4.47   5.34   3.72   4.20   4.42   4.83
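
As an illustration of the table alignment mentioned in the message, here is a minimal C11 sketch of declaring a byte table with a fixed alignment and checking it. This is not the actual dav1d change: the name example_weights, its contents, the helper is_aligned, and the 16-byte alignment are all assumptions made for the example. The general idea is that NEON loads can carry an alignment hint (such as :128), which is only valid when the address really is aligned to that boundary.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for a weights table. The values are placeholders,
 * not the contents of dav1d_sm_weights, and the 16-byte alignment is an
 * assumption chosen to match a 128-bit vector load. */
alignas(16) static const uint8_t example_weights[16] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};

/* Returns 1 if ptr is a multiple of `alignment` bytes.
 * `alignment` must be a power of two. */
static int is_aligned(const void *ptr, uintptr_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

With the table declared this way, is_aligned(example_weights, 16) holds regardless of where the linker places the object, whereas a plain uint8_t array only guarantees 1-byte alignment.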
8dd9c651