Segmentation faults in sgr_mix_c if the calling thread's stack size is not large enough
If we call the dav1d library (to decode AVIF images) from a thread with a reduced stack size (say, 64 to 256 KiB), we sometimes get a segmentation fault with the following stack trace:
sgr_mix_c
lr_stripe
lr_sbrow
dav1d_lr_sbrow_16bpc
dav1d_filter_sbrow_lr_16bpc
dav1d_decode_frame_main
dav1d_decode_frame
dav1d_submit_frame
dav1d_parse_obus
gen_picture
dav1d_send_data
Note: We have also seen sgr_5x5_c instead of sgr_mix_c at the top of the stack. sgr_3x3_c was also seen in 2022 but not this year. In a crash we saw sgr_5x5_c call selfguided_filter, which has two large local variables (sumsq and sum).
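Until the stack usage is reduced, one possible workaround on our side is to run the decode on a thread whose stack size is set explicitly. The following is only a minimal sketch, assuming POSIX threads. worker_with_large_locals, SCRATCH_BYTES, and run_with_stack_size are hypothetical names for illustration; they are not dav1d functions, and the buffer size is not dav1d's actual stack requirement.

```c
#include <pthread.h>
#include <stddef.h>

/* Illustrative size only; NOT the real size of dav1d's local buffers. */
#define SCRATCH_BYTES (128 * 1024)

/* Hypothetical stand-in for a function with large local variables.
 * arg points to an unsigned long that receives a checksum. */
void *worker_with_large_locals(void *arg)
{
    unsigned long *checksum = arg;
    volatile unsigned char scratch[SCRATCH_BYTES]; /* lives on this thread's stack */
    unsigned long sum = 0;

    for (size_t i = 0; i < SCRATCH_BYTES; i++)
        scratch[i] = (unsigned char)i;
    for (size_t i = 0; i < SCRATCH_BYTES; i++)
        sum += scratch[i];

    *checksum = sum;
    return NULL;
}

/* Runs fn(arg) on a new thread with the requested stack size and waits
 * for it to finish. Returns 0 on success or a pthread error code. */
int run_with_stack_size(void *(*fn)(void *), void *arg, size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err = pthread_attr_init(&attr);

    if (err)
        return err;
    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (!err)
        err = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (err)
        return err;
    return pthread_join(tid, NULL);
}
```

Calling run_with_stack_size(worker_with_large_locals, &sum, 1024 * 1024) gives the worker a 1 MiB stack, which comfortably holds the 128 KiB local buffer. With a stack smaller than the locals, the same worker would be expected to crash in the way described above.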
We reported this issue by email before. In his reply, Ronald Bultje suspected these are stack buffer overflows.
I found that this issue is already part of the following item under "Cleanups" in the task list:
- The looprestoration, mc, dav1d_apply_grain, and dav1d_init_wedge_masks functions uses excessively large stack buffers. Rewrite them in a way that reduces the stack usage, for example by using ring buffers or windowed approaches (which we already use for MC/LR SIMD). This would allow us to reduce the thread stack size requirements.
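To illustrate what a ring buffer approach could look like in general, here is a small sketch of a 3x3 box sum that keeps only three rows of intermediate sums on the stack instead of a whole-block buffer. This is not dav1d code and not how the looprestoration functions are written; box3x3_ring, hsum3_row, MAX_W, and RING_ROWS are hypothetical names chosen for this example, and edges are handled by simple replication.

```c
#include <stdint.h>

#define MAX_W 64     /* illustrative width limit, not a dav1d constant */
#define RING_ROWS 3  /* a 3x3 window only ever needs three rows at once */

/* Horizontal 3-tap sum of one row, replicating the edge pixels. */
static void hsum3_row(int32_t *dst, const uint8_t *src, int w)
{
    for (int x = 0; x < w; x++) {
        const int l = x > 0 ? x - 1 : 0;
        const int r = x < w - 1 ? x + 1 : w - 1;
        dst[x] = src[l] + src[x] + src[r];
    }
}

/* 3x3 box sum of a w*h image (row-major, stride w) into dst.
 * Stack usage is RING_ROWS * MAX_W values, independent of h.
 * Returns 0 on success, -1 if the dimensions are unsupported. */
int box3x3_ring(int32_t *dst, const uint8_t *src, int w, int h)
{
    int32_t ring[RING_ROWS][MAX_W];

    if (w < 1 || h < 1 || w > MAX_W)
        return -1;

    /* Row r lives in slot (r + 1) % RING_ROWS, so row -1 maps to slot 0. */
    hsum3_row(ring[0], src, w); /* row -1, replicated from row 0 */
    hsum3_row(ring[1], src, w); /* row 0 */

    for (int y = 0; y < h; y++) {
        /* Overwrite the slot of row y - 2 with row y + 1 (clamped). */
        const int next = y + 1 < h ? y + 1 : h - 1;
        hsum3_row(ring[(y + 2) % RING_ROWS], src + next * w, w);

        /* The three slots now hold rows y - 1, y, and y + 1. */
        for (int x = 0; x < w; x++)
            dst[y * w + x] = ring[0][x] + ring[1][x] + ring[2][x];
    }
    return 0;
}
```

A version that first stored the horizontal sums for every row would need h * w intermediate values; the ring keeps that at three rows regardless of the block height, which is the kind of saving the task list item is asking for.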
Perhaps the new information in this issue is that the looprestoration functions (sgr_mix_c, sgr_5x5_c, etc.) should get a higher priority. Thank you!