Changes

Henrik Gramner · 1a0a5eed
--- a/task-list.md
+++ b/task-list.md
@@ -22,4 +22,7 @@ Algorithmic optimizations:
 Cleanups:
 - lfmask and l/a ctx zero can be done in tile instead of frame context for better distribution.
 - the output queue handling is duplicated in `decode.c`, `lib.c` and `obu.c`, so merge this in one common place.
-- The `looprestoration`, `mc`, `dav1d_apply_grain`, and `dav1d_init_wedge_masks` functions uses excessively large stack buffers. Rewrite them in a way that reduces the stack usage, for example by using ring buffers or windowed approaches (which we already use for MC/LR SIMD). This would allow us to reduce the thread stack size requirements.
\ No newline at end of file
+- The `looprestoration`, `mc`, `dav1d_apply_grain`, and `dav1d_init_wedge_masks` functions uses excessively large stack buffers. Rewrite them in a way that reduces the stack usage, for example by using ring buffers or windowed approaches (which we already use for MC/LR SIMD). This would allow us to reduce the thread stack size requirements.
+
+Memory usage reductions:
+ - Pack the four (y/uv \* h/v) 6-bit lf mask values into a 24-bit value, which should save 1 KiB / sb128. Requires changes to the mask loading asm code.
\ No newline at end of file
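
The packing idea in the new "Memory usage reductions" item can be sketched as below. This is only an illustration under stated assumptions, not dav1d code: the field order (`LF_Y_H`, `LF_Y_V`, `LF_UV_H`, `LF_UV_V`), the helper names (`lf_pack`, `lf_unpack`, `lf_store24`, `lf_load24`) and the little-endian 3-byte storage are all hypothetical. The real layout would have to match whatever the mask loading asm expects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field order (not taken from dav1d): one 6-bit value per
 * (plane, direction) combination. */
enum { LF_Y_H, LF_Y_V, LF_UV_H, LF_UV_V, LF_N_FIELDS };

#define LF_FIELD_BITS 6
#define LF_FIELD_MASK ((1u << LF_FIELD_BITS) - 1)

/* Pack four 6-bit values into the low 24 bits of a 32-bit word. */
static inline uint32_t lf_pack(const uint8_t v[LF_N_FIELDS]) {
    uint32_t packed = 0;
    for (int i = 0; i < LF_N_FIELDS; i++) {
        assert(v[i] <= LF_FIELD_MASK); /* each input must fit in 6 bits */
        packed |= (uint32_t)v[i] << (i * LF_FIELD_BITS);
    }
    return packed;
}

/* Extract one 6-bit field from a packed word. */
static inline uint8_t lf_unpack(uint32_t packed, int field) {
    return (uint8_t)((packed >> (field * LF_FIELD_BITS)) & LF_FIELD_MASK);
}

/* Store the 24-bit value in exactly 3 bytes (little-endian byte order
 * is an assumption made for this sketch). */
static inline void lf_store24(uint8_t dst[3], uint32_t packed) {
    dst[0] = (uint8_t)(packed & 0xff);
    dst[1] = (uint8_t)((packed >> 8) & 0xff);
    dst[2] = (uint8_t)((packed >> 16) & 0xff);
}

/* Reload a 24-bit value written by lf_store24(). */
static inline uint32_t lf_load24(const uint8_t src[3]) {
    return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16);
}
```

Storing the four values in 3 bytes rather than one byte each is where a saving would come from in this sketch. The cost is an extra shift and mask per field on load, which is presumably why the task notes that the asm needs changes.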