arm64: filmgrain: Add a NEON implementation of fgy_32x32xn for 16 bpc
Relative speedup over C code:
                         Cortex A53   A72   A73   Apple M1
fgy_32x32xn_16bpc_neon:        3.87  2.28  2.78       3.45