Commit dc98fff8 authored by Martin Storsjö

arm32: mc: NEON implementation of warp8x8 for 16 bpc

Checkasm benchmarks:
                    Cortex A7      A8     A53     A72     A73
warp_8x8_16bpc_neon:   4062.6  2109.4  2462.0  1338.9  1391.1
warp_8x8t_16bpc_neon:  3996.3  2102.4  2412.0  1273.8  1368.9

Corresponding numbers for arm64, for comparison:
                                   Cortex A53     A72     A73
warp_8x8_16bpc_neon:                   2037.0  1148.8  1222.0
warp_8x8t_16bpc_neon:                  2008.0  1120.4  1200.9
parent 018e64e7
Pipeline #40178 passed with stages in 4 minutes and 55 seconds