arm64: filmgrain: Add NEON implementation of the fguv function
Relative speedup over C code:
                                   Cortex A53    A72    A73  Apple M1
fguv_32x32xn_8bpc_420_csfl0_neon:        4.59   2.90   3.82      7.05
fguv_32x32xn_8bpc_420_csfl1_neon:        3.82   2.94   2.93      2.57
fguv_32x32xn_8bpc_422_csfl0_neon:        4.59   3.11   4.03      3.49
fguv_32x32xn_8bpc_422_csfl1_neon:        3.83   2.99   3.01      2.43
fguv_32x32xn_8bpc_444_csfl0_neon:        6.69   4.17   5.69      4.88
fguv_32x32xn_8bpc_444_csfl1_neon:        5.36   3.62   4.14      3.17
Also do some minor fixes to the checkasm test, to make one single run of it more comprehensive, and to make the benchmark include all cases. This also includes a refactoring of the existing fgy function, to reduce the code footprint by sharing the prologue for all overlap combinations (something that the new fguv function also does).
@janne - The code does use asm-offsets.h and static asserts. There's some existing code (for msac) that also has such offsets, but they're only defines within the .S file without corresponding static asserts (assuming that if they differ, it'd be caught by a checkasm test). I had to change the code you had made for using static_assert, because the second parameter to it (the "message") must be a string literal (modern clang enforces this); it can't be just a plain unquoted string. That requirement makes it much harder to write a drop-in replacement for static_assert, so I just settled on a CHECK_OFFSET macro which expands to either static_assert or a replacement. I guess that code should go to some common header instead of just being in this particular init file (to let it be shared with e.g. msac).
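For illustration, a CHECK_OFFSET macro along these lines could look roughly like the sketch below. This is not the code from this patch: the struct, its fields, the EX_OFF_* constants and the three-argument macro signature are all hypothetical, and the fallback (a negative array size in a typedef) is just one common way to emulate a static assert on compilers without C11 support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct standing in for the data the assembly reads. */
typedef struct ExampleData {
    int32_t seed;
    int32_t num_points;
    int16_t scaling_shift;
} ExampleData;

/* Hypothetical offsets, as they might appear in a header shared with
 * the .S file (an asm-offsets.h style header). */
#define EX_OFF_SEED          0
#define EX_OFF_NUM_POINTS    4
#define EX_OFF_SCALING_SHIFT 8

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 and later: the message must be a string literal, so the offset
 * name is stringified with # rather than passed through unquoted. */
#define CHECK_OFFSET(type, field, name) \
    _Static_assert(offsetof(type, field) == (name), \
                   "offset mismatch: " #name)
#else
/* Assumed fallback: a negative array size fails compilation when the
 * offsets differ. */
#define CHECK_OFFSET(type, field, name) \
    typedef char check_offset_##name[offsetof(type, field) == (name) ? 1 : -1]
#endif

/* Each line fails to compile if the C layout and the constant disagree. */
CHECK_OFFSET(ExampleData, seed,          EX_OFF_SEED);
CHECK_OFFSET(ExampleData, num_points,    EX_OFF_NUM_POINTS);
CHECK_OFFSET(ExampleData, scaling_shift, EX_OFF_SCALING_SHIFT);
```

Keeping the macro and its fallback in one common header would let other users of hardcoded offsets (e.g. msac) share the same check.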