Changes

Ronald S. Bultje · d50a1c1a
--- a/task-list.md
+++ b/task-list.md
@@ -10,11 +10,10 @@ SIMD:
 - move dequant from `decode_coeffs()` to itx;
 - order_palette() to dsp for simd;
 - change coef contexting (hi/lo_ctx) to be diagonal-oriented for dsp/simd;
- `_save_tmvs()` and `_load_tmvs()` in `refmvs.c` can (maybe?) be SIMD'ed, along with all `_splat_*()` code in `refmvs.h`;
- a specifically optimized version for `mc.put/prep_scaled()` for super_res, since then `my` is always 0, so there is only horizontal scaling, not vertical.
+- `_save_tmvs()` and `_load_tmvs()` in `refmvs.c` can (maybe?) be SIMD'ed, along with all `_splat_*()` code in `refmvs.h`.

 Multi-threading:
- postfilter and film-grain threading;
+- postfilter (!1086) and film-grain threading;
 - in first-pass of frame threading with tile threading enabled, it may make sense (assuming no temporal interference from ref_mvs or seg_id) to first parse the tile marked as the one used to update the output CDF, since that would unblock the subsequent thread's pass 1. This is only true if use_ref_mvs=0 and segmentation.temporal_update=0;
 - threading can become a generic worker queue (one tile_sbrow symbol parsing/recon, one sbrow postfilter(s)) and then use a generic single threadpool instead of separate tile/frame[/postfilter?] ones (see also #206);
 - by not adding invisible frames to `out_delayed[]` and/or growing it so it can be bigger than the number of frame threads (and thus making the indexing between `out_delayed[]` and the actual frame thread doing the decoding independent), we could grow concurrency and scalability on typical sequences with frame-multithreading enabled.
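
Note on the generic worker queue item above: a minimal sketch, in C, of what a single queue holding both kinds of work might look like. Everything here (`task_type`, `task_queue`, `queue_push`, `queue_pop`, `worker_drain`, the fixed capacity) is a hypothetical name chosen for illustration, not an existing dav1d API, and the sketch is single-threaded on purpose: a real pool would have each worker thread pop under a mutex and condition variable, and would track dependencies between tasks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task kinds, mirroring the two units of work named in the list:
 * one tile_sbrow of symbol parsing/recon, one sbrow of postfilter(s). */
enum task_type { TASK_TILE_SBROW = 0, TASK_POSTFILTER_SBROW = 1, TASK_TYPE_COUNT };

struct task {
    enum task_type type;
    int frame_idx; /* which frame the task belongs to (illustrative) */
    int sbrow;     /* superblock row index (illustrative) */
};

#define QUEUE_CAP 64 /* arbitrary capacity for the sketch */

/* Fixed-size FIFO ring buffer shared by all task kinds. */
struct task_queue {
    struct task items[QUEUE_CAP];
    size_t head;  /* index of the oldest queued task */
    size_t count; /* number of queued tasks */
};

static void queue_init(struct task_queue *q) {
    q->head = 0;
    q->count = 0;
}

/* Returns 0 on success, -1 if the queue is full. */
static int queue_push(struct task_queue *q, struct task t) {
    if (q->count == QUEUE_CAP) return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = t;
    q->count++;
    return 0;
}

/* Returns 0 and fills *out on success, -1 if the queue is empty. */
static int queue_pop(struct task_queue *q, struct task *out) {
    if (q->count == 0) return -1;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}

/* Drains the queue in FIFO order and counts tasks per type. The counting
 * stands in for the actual parsing/recon or postfilter work. */
static void worker_drain(struct task_queue *q, int handled[TASK_TYPE_COUNT]) {
    struct task t;
    while (queue_pop(q, &t) == 0) {
        assert(t.type >= 0 && t.type < TASK_TYPE_COUNT);
        handled[t.type]++;
    }
}
```

The point of the sketch is only that tile and postfilter work share one queue and one drain loop, so a single pool of workers could serve both instead of separate tile/frame/postfilter pools.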