arm64: filmgrain16: Guard against out of range pixels in the gather function

Martin Storsjö requested to merge mstorsjo/dav1d:arm64-fgy16-gather-fix into master

In 16 bpc, pixels are stored as 16-bit integers, but valid pixel values use at most 12 bits, and the scaling buffer contains only 4096 elements.

The src pixels are normally supposed to be valid, but when processing blocks of 32 pixels at a time, the function can operate on uninitialized pixels past the right edge. Such a pixel can hold any 16-bit value, which would index past the end of the scaling buffer.
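As a rough illustration of the kind of guard described above, here is a minimal scalar C sketch. It is not the NEON code from this merge request: the names `clamp_pixel`, `gather_scaling` and `SCALING_SIZE` are hypothetical, and clamping to `bitdepth_max` is an assumption about how the out-of-range value is handled (masking would bound the index just as well).

```c
#include <stddef.h>
#include <stdint.h>

/* 4096 entries, i.e. one per valid 12-bit pixel value (from the description). */
#define SCALING_SIZE 4096

/* Hypothetical helper: limit a 16-bit sample to the valid pixel range. */
static inline uint16_t clamp_pixel(uint16_t px, uint16_t bitdepth_max) {
    return px > bitdepth_max ? bitdepth_max : px;
}

/* Hypothetical scalar gather: look up one scaling value per source pixel.
 * Pixels past the right edge may be uninitialized, so each index is
 * clamped before it is used to read from the scaling buffer. */
static void gather_scaling(uint8_t *dst, const uint16_t *src, size_t n,
                           const uint8_t scaling[SCALING_SIZE],
                           uint16_t bitdepth_max) {
    for (size_t i = 0; i < n; i++)
        dst[i] = scaling[clamp_pixel(src[i], bitdepth_max)];
}
```

With `bitdepth_max` at most 4095, every lookup stays inside the buffer regardless of what the source holds. The values produced for out-of-range pixels are arbitrary, which is presumably acceptable since those output positions lie outside the visible picture.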

Before:               Cortex A53      A72      A73  Apple M1
fgy_32x32xn_16bpc_neon:  10317.5   8269.1   8624.2  27.0
After:
fgy_32x32xn_16bpc_neon:  11277.8   8434.8   8731.7  27.2

Total relative speedup of the NEON code over the C code now:

fgy_32x32xn_16bpc_neon:     3.41     2.12     2.60  2.82