arm32: mc: Optimize warp by doing horz filtering in 8 bit

Additionally, reschedule the load instructions to reduce stalls
on in-order cores.

This applies the changes from a3b8157e
to the arm32 version.
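
The message does not spell out how the horizontal filter is kept in 8 bit.
As a rough sketch only, and as an assumption rather than something taken
from the patch: unsigned pixels can be biased by -128 so that both operands
of each multiply are signed 8-bit values, and the bias is undone afterwards
using the sum of the coefficients. The names filter_wide, filter_narrow and
filters_agree are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Reference: treat pixels as wider values and multiply-accumulate. */
static int filter_wide(const uint8_t *px, const int8_t *coef) {
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (int)px[i] * coef[i];
    return sum;
}

/* Assumed 8-bit variant: bias each pixel by -128 so it fits in a signed
 * 8-bit value, multiply signed 8 bit by signed 8 bit (each product fits
 * in 16 bit), then add back 128 * sum(coef) to undo the bias. */
static int filter_narrow(const uint8_t *px, const int8_t *coef) {
    int sum = 0, coef_sum = 0;
    for (int i = 0; i < 8; i++) {
        int8_t s = (int8_t)((int)px[i] - 128); /* range -128..127 */
        int16_t prod = (int16_t)(s * coef[i]); /* range -16256..16384 */
        sum += prod;
        coef_sum += coef[i];
    }
    return sum + 128 * coef_sum;
}

/* Returns 1 when both variants give the same result for one 8-tap row. */
static int filters_agree(const uint8_t *px, const int8_t *coef) {
    return filter_wide(px, coef) == filter_narrow(px, coef);
}
```

The two variants agree for any input because
sum(c[i] * (p[i] - 128)) + 128 * sum(c[i]) equals sum(c[i] * p[i]).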

Before:             Cortex A7      A8      A9     A53     A72     A73
warp_8x8_8bpc_neon:    3659.3  1746.0  1931.9  2128.8  1173.7  1188.9
warp_8x8t_8bpc_neon:   3650.8  1724.6  1919.8  2105.0  1147.7  1206.9
warp_8x8_16bpc_neon:   4039.4  2111.9  2337.1  2462.5  1334.6  1396.5
warp_8x8t_16bpc_neon:  3973.9  2137.1  2299.6  2413.2  1282.8  1369.6
After:
warp_8x8_8bpc_neon:    2920.8  1269.8  1410.3  1767.3   860.2  1004.8
warp_8x8t_8bpc_neon:   2904.9  1283.9  1397.5  1743.7   863.6  1024.7
warp_8x8_16bpc_neon:   3895.5  2060.7  2339.8  2376.6  1331.1  1394.0
warp_8x8t_16bpc_neon:  3822.7  2026.7  2298.7  2325.4  1278.1  1360.8