Cdef filter simd

Ronald S. Bultje requested to merge rbultje/dav1d:cdef-filter-simd into master

cdef_filter_4x4_8bpc_c: 2273.6
cdef_filter_4x4_8bpc_avx2: 113.6
cdef_filter_8x8_8bpc_c: 7913.0
cdef_filter_8x8_8bpc_avx2: 309.9

Decoding time for the first 1000 frames of chimera 1080p drops to 15.51s, from 23.1s before any cdef_filter SIMD, or 17.86s with only the 8x8 cdef_filter SIMD.
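For a quick sense of scale, the figures in this description can be turned into relative speedups. The sketch below is only arithmetic on the numbers quoted here. It assumes the per-function figures share one unit where lower is better, and the helper names are hypothetical, not part of dav1d or its test tooling.

```python
# Figures copied verbatim from this description; names are illustrative only.
CHECKASM = {
    "cdef_filter_4x4_8bpc": {"c": 2273.6, "avx2": 113.6},
    "cdef_filter_8x8_8bpc": {"c": 7913.0, "avx2": 309.9},
}

# Decoding time in seconds for the first 1000 frames of chimera 1080p.
DECODE_BEFORE = 23.1     # no cdef_filter SIMD
DECODE_8X8_ONLY = 17.86  # only the 8x8 cdef_filter SIMD
DECODE_AFTER = 15.51     # with this change


def speedup(baseline: float, optimized: float) -> float:
    """Return how many times faster the optimized version is."""
    return baseline / optimized


def reduction_pct(before: float, after: float) -> float:
    """Return the relative time saved, as a percentage of `before`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for name, t in CHECKASM.items():
        print(f"{name}: {speedup(t['c'], t['avx2']):.1f}x faster than C")
    print(f"vs. no SIMD:  {reduction_pct(DECODE_BEFORE, DECODE_AFTER):.1f}% less decoding time")
    print(f"vs. 8x8 only: {reduction_pct(DECODE_8X8_ONLY, DECODE_AFTER):.1f}% less decoding time")
```

With these inputs the AVX2 versions come out at roughly 20x (4x4) and 25.5x (8x8) faster than the C versions, and total decoding time falls by about 33% relative to no cdef_filter SIMD.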

Also add unit tests and rewrite the C code to remove the last remnants of libaom code in cdef.c.

