The functions are only ever called with pointers to fenc and fdec, and the strides are always constant, so there is no point in passing the strides as parameters.

Cover both the U and V planes in a single function call. This is more efficient with SIMD, especially with the wider vectors provided by AVX2 and AVX-512, even after accounting for the lost possibility of early termination.

Drop the MMX and XOP implementations and update the rest of the x86 assembly to match the new behavior. Also enable high bit-depth in the AVX2 version.

Comment out the ARM, AArch64, and MIPS MSA assembly for now.
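
For illustration, a scalar version of the reworked interface might look roughly like the sketch below. It is not the code from this change: the function name, the stride values (16 and 32), the 8-bit pixel type, the 8x8 block size, and the layout with the V plane half a stride to the right of the U plane are all assumptions made for the example.

```c
#include <stdint.h>

/* Assumed, illustrative constants: fixed row strides of the two buffers. */
#define SKETCH_FENC_STRIDE 16
#define SKETCH_FDEC_STRIDE 32

typedef uint8_t pixel; /* assumption: 8-bit depth */

/* Hypothetical scalar sketch: no stride parameters, and one call covers
 * both chroma planes of an 8x8 block. The per-plane sums of squared
 * differences are written to ssd[0] (U) and ssd[1] (V); the return value
 * is the combined variance of the residual of both planes. */
static int var2_8x8_sketch(const pixel *fenc, const pixel *fdec, int ssd[2])
{
    int sum_u = 0, sum_v = 0, sqr_u = 0, sqr_v = 0;

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            /* Assumed layout: V sits half a stride after U in each row. */
            int diff_u = fenc[x] - fdec[x];
            int diff_v = fenc[x + SKETCH_FENC_STRIDE / 2]
                       - fdec[x + SKETCH_FDEC_STRIDE / 2];
            sum_u += diff_u;
            sum_v += diff_v;
            sqr_u += diff_u * diff_u;
            sqr_v += diff_v * diff_v;
        }
        fenc += SKETCH_FENC_STRIDE;
        fdec += SKETCH_FDEC_STRIDE;
    }

    ssd[0] = sqr_u;
    ssd[1] = sqr_v;

    /* 64 pixels per plane, so dividing sum^2 by 64 is a shift by 6. */
    return sqr_u - (int)(((int64_t)sum_u * sum_u) >> 6)
         + sqr_v - (int)(((int64_t)sum_v * sum_v) >> 6);
}
```

Because both planes are walked in the same loop, a SIMD version can load a full row containing U and V at once, which is where the wider vectors pay off. The cost is that the V plane is always processed, even when the U result alone would have been enough to stop early.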