AArch64: Fix potential out of bounds access in DotProd H/HV filters
The DotProd/I8MM horizontal and HV/2D subpel filters use -4 offset for sampling instead of -3 to be better aligned in some cases.
This resulted in an out of bounds access, which led to crashes.
This patch fixes it.
Steps to reproduce the crash on macOS:
Use the 4K @ 60 FPS version of this video stream: https://m.youtube.com/watch?v=rPWZepxc0hw, and set the MallocGuardEdges=1
environmental variable for tools/dav1d
to have guard pages around allocations.
Use tag 1.4.2 or any commit after the inclusion of !1632 (merged), then run:
tools/dav1d --muxer null -i <sample.ivf>
It shall crash with EXC_BAD_ACCESS
after the decode of frame 757:
* thread #6, name = 'dav1d-worker', stop reason = EXC_BAD_ACCESS (code=2, address=0x104403fff)
* frame #0: 0x0000000100097028 dav1d`put_8tap_neon_i8mm + 2976
frame #1: 0x000000010002799c dav1d`dav1d_recon_b_inter_8bpc + 1304
frame #2: 0x000000010005d510 dav1d`decode_b + 16796
frame #3: 0x0000000100054ac8 dav1d`decode_sb + 444
frame #4: 0x0000000100054d68 dav1d`decode_sb + 1116
frame #5: 0x0000000100054668 dav1d`dav1d_decode_tile_sbrow + 1484
frame #6: 0x0000000100075178 dav1d`dav1d_worker_task + 3372
frame #7: 0x0000000189d62f94 libsystem_pthread.dylib`_pthread_start + 136
Merge request reports
Activity
That would indeed be good.
However I'm afraid that even with a sample, catching the issue might require running in something like valgrind... Sanitizers like asan, relying on instrumentation, won't catch any issues in assembly. And an out of bounds access can also run without symptoms, depending on the allocator.
Before I got the URL of the mentioned video stream I used an ASAN build with disabled assembly (
-Denable_asm=false
). I modified theprep_8tap_c
function of the mc_tmpl.c file to force some out of bounds accesses in the HV and H cases similarly to the original DotProd assembly version. ASAN was able to catch heap-buffer-overflow on some of the dav1d-test-data files.Maybe a macOS CI runner with
MallocGuardEdges=1
might catch similar issues in the future, I don't know similar tuneable option in glibc. I tried multiple things on GNU/Linux e.g.: to set theMALLOC_MMAP_THRESHOLD_
to the minimum to force usage ofmmap(2)
for smaller allocations.This OoB was my mistake, I assumed the value of
src
was aligned in theprep_*
/put_*
functions - which was not true, and I made some load alignment optimisations based on the false assumptions. I will avoid similar issues in the future!
changed milestone to %1.4.3