arm64: mc: NEON implementation of w_mask_444/422/420 functions
                                 A73       A53
w_mask_420_w4_8bpc_c:            818    1082.9
w_mask_420_w4_8bpc_neon:          79     126.6
w_mask_420_w8_8bpc_c:           2486    3399.8
w_mask_420_w8_8bpc_neon:       200.2     343.7
w_mask_420_w16_8bpc_c:        8022.3   10989.6
w_mask_420_w16_8bpc_neon:      528.1       889
w_mask_420_w32_8bpc_c:       31851.8   42808.6
w_mask_420_w32_8bpc_neon:     2062.5    3380.8
w_mask_420_w64_8bpc_c:       79268.5  102683.9
w_mask_420_w64_8bpc_neon:     5252.9    8575.4
w_mask_420_w128_8bpc_c:     193704.1  255586.5
w_mask_420_w128_8bpc_neon:   14602.3   22167.7
w_mask_422_w4_8bpc_c:          777.3    1038.5
w_mask_422_w4_8bpc_neon:        72.1     112.9
w_mask_422_w8_8bpc_c:         2405.7      3168
w_mask_422_w8_8bpc_neon:       191.9     314.1
w_mask_422_w16_8bpc_c:        7783.7   10543.9
w_mask_422_w16_8bpc_neon:      559.8     835.5
w_mask_422_w32_8bpc_c:       30895.7   41141.2
w_mask_422_w32_8bpc_neon:     2089.7    3187.2
w_mask_422_w64_8bpc_c:       75500.2   98766.3
w_mask_422_w64_8bpc_neon:       5379    8208.2
w_mask_422_w128_8bpc_c:     186967.1  245809.1
w_mask_422_w128_8bpc_neon:   15159.9   21474.5
w_mask_444_w4_8bpc_c:          850.1    1136.6
w_mask_444_w4_8bpc_neon:        66.5     104.7
w_mask_444_w8_8bpc_c:         2373.5    3262.9
w_mask_444_w8_8bpc_neon:       180.5     290.2
w_mask_444_w16_8bpc_c:        7291.6   10590.7
w_mask_444_w16_8bpc_neon:      550.9     809.7
w_mask_444_w32_8bpc_c:        8048.3   10140.8
w_mask_444_w32_8bpc_neon:     2136.2      3095
w_mask_444_w64_8bpc_c:       18055.3     23060
w_mask_444_w64_8bpc_neon:     5522.5    8124.8
w_mask_444_w128_8bpc_c:      42754.3     56072
w_mask_444_w128_8bpc_neon:   15569.5   21531.5
Edited by B Krishnan Iyer