Ipred z1 AVX2

Henrik Gramner requested to merge gramner/dav1d:ipred_z1 into master

Average benchmark results on Skylake-X from a large number of runs with random input:

intra_pred_z1_w4_8bpc_c: 328.2
intra_pred_z1_w4_8bpc_avx2: 37.1
intra_pred_z1_w8_8bpc_c: 631.5
intra_pred_z1_w8_8bpc_avx2: 55.4
intra_pred_z1_w16_8bpc_c: 1238.2
intra_pred_z1_w16_8bpc_avx2: 98.2
intra_pred_z1_w32_8bpc_c: 3088.2
intra_pred_z1_w32_8bpc_avx2: 180.2
intra_pred_z1_w64_8bpc_c: 7456.0
intra_pred_z1_w64_8bpc_avx2: 368.6
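
For a quick read of the figures above, the following sketch derives the C-to-AVX2 ratio per block width. The `RESULTS` dictionary and the `speedups` helper are hypothetical names used only for illustration; they are not part of dav1d or its test tooling. The numbers are copied verbatim from the listing.

```python
# Timings copied from the listing above (lower is faster).
RESULTS = {
    "intra_pred_z1_w4_8bpc_c": 328.2,
    "intra_pred_z1_w4_8bpc_avx2": 37.1,
    "intra_pred_z1_w8_8bpc_c": 631.5,
    "intra_pred_z1_w8_8bpc_avx2": 55.4,
    "intra_pred_z1_w16_8bpc_c": 1238.2,
    "intra_pred_z1_w16_8bpc_avx2": 98.2,
    "intra_pred_z1_w32_8bpc_c": 3088.2,
    "intra_pred_z1_w32_8bpc_avx2": 180.2,
    "intra_pred_z1_w64_8bpc_c": 7456.0,
    "intra_pred_z1_w64_8bpc_avx2": 368.6,
}


def speedups(results):
    """Return {width label: C time / AVX2 time} for each benchmarked width."""
    out = {}
    for name, c_time in results.items():
        if not name.endswith("_c"):
            continue  # only start from the C reference entries
        avx2_name = name[: -len("_c")] + "_avx2"
        width = name.split("_")[3]  # e.g. "w4" in "intra_pred_z1_w4_8bpc_c"
        out[width] = c_time / results[avx2_name]
    return out


if __name__ == "__main__":
    for width, ratio in speedups(RESULTS).items():
        print(f"{width}: {ratio:.1f}x")
```

On these averages the ratio grows with block width, from roughly 8.8x at w4 to roughly 20.2x at w64. Since the inputs are averages over random data, the ratios are indicative rather than exact.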

The code is branchy, so the benchmark numbers vary wildly depending on the input.

