... | ... | @@ -22,4 +22,7 @@ Algorithmic optimizations: |
|
|
Cleanups:
|
|
|
- Zeroing of lfmask and l/a ctx can be done in tile instead of frame context for better distribution.
|
|
|
- The output queue handling is duplicated in `decode.c`, `lib.c` and `obu.c`; merge it into one common place.
|
|
|
- The `looprestoration`, `mc`, `dav1d_apply_grain`, and `dav1d_init_wedge_masks` functions use excessively large stack buffers. Rewrite them in a way that reduces the stack usage, for example by using ring buffers or windowed approaches (which we already use for MC/LR SIMD). This would allow us to reduce the thread stack size requirements. |
|
|
\ No newline at end of file |
|
|
- The `looprestoration`, `mc`, `dav1d_apply_grain`, and `dav1d_init_wedge_masks` functions use excessively large stack buffers. Rewrite them in a way that reduces the stack usage, for example by using ring buffers or windowed approaches (which we already use for MC/LR SIMD). This would allow us to reduce the thread stack size requirements.
|
|
|
|
|
|
Memory usage reductions:
|
|
|
- Pack the four (y/uv \* h/v) 6-bit lf mask values into a 24-bit value, which should save 1 KiB / sb128. Requires changes to the mask loading asm code. |
|
|
\ No newline at end of file |
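The packing idea in the last item could look roughly like the sketch below. It only illustrates the bit layout: the names `lf_pack`, `lf_unpack` and the `LF_*` indices are hypothetical and not part of the codebase, the order of the four fields is an assumption, and the in-memory storage of the 24-bit value (and the matching asm loads) is left open.

```c
#include <stdint.h>

/* Hypothetical field indices; the actual plane/direction order is an assumption. */
enum { LF_Y_H = 0, LF_Y_V = 1, LF_UV_H = 2, LF_UV_V = 3 };

/* Pack four 6-bit values into the low 24 bits of a 32-bit word.
 * Field i occupies bits [6*i, 6*i + 5]. */
static uint32_t lf_pack(const uint8_t v[4]) {
    uint32_t packed = 0;
    for (int i = 0; i < 4; i++)
        packed |= (uint32_t)(v[i] & 0x3f) << (6 * i);
    return packed;
}

/* Extract the 6-bit field at index idx (0..3) from a packed word. */
static uint8_t lf_unpack(uint32_t packed, int idx) {
    return (uint8_t)((packed >> (6 * idx)) & 0x3f);
}
```

Each load then becomes a shift and a mask instead of a plain byte read, which is why the mask loading code would need to change.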