... | ... | @@ -10,11 +10,10 @@ SIMD: |
|
|
- move dequant from `decode_coeffs()` to itx;
|
|
|
- order_palette() to dsp for simd;
|
|
|
- change coef contexting (hi/lo_ctx) to be diagonal-oriented for dsp/simd;
|
|
|
-- `_save_tmvs()` and `_load_tmvs()` in `refmvs.c` can (maybe?) be SIMD'ed, along with all `_splat_*()` code in `refmvs.h`;
|
|
|
-- a specifically optimized version for `mc.put/prep_scaled()` for super_res, since then `my` is always 0, so there is only horizontal scaling, not vertical.
|
|
|
+- `_save_tmvs()` and `_load_tmvs()` in `refmvs.c` can (maybe?) be SIMD'ed, along with all `_splat_*()` code in `refmvs.h`.
|
|
|
|
|
|
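The super_res item in the SIMD list rests on the observation that, with `my` always 0, a scaled motion-compensation call only needs a horizontal pass. The sketch below illustrates that shape only. `put_scaled_h_only` is a hypothetical name, the 10-bit fixed-point position and the two-tap bilinear filter are assumptions standing in for the real subpel filters, and the caller is assumed to provide one readable pixel past the last sampled position in each row.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical horizontal-only scaled copy (not dav1d's API).
 * mx: start position, dx: step per output pixel, both in 1/1024 pel
 * (an assumed fixed-point format). A bilinear filter stands in for
 * the real subpel filter. There is no vertical filtering at all:
 * each output row is produced from exactly one source row. */
static void put_scaled_h_only(uint8_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *src, ptrdiff_t src_stride,
                              int w, int h, int mx, int dx)
{
    for (int y = 0; y < h; y++) {
        int pos = mx;
        for (int x = 0; x < w; x++) {
            const int ix   = pos >> 10;    /* integer sample index  */
            const int frac = pos & 0x3ff;  /* fractional part, /1024 */
            const int a = src[ix], b = src[ix + 1];
            /* weighted average with round-to-nearest */
            dst[x] = (uint8_t)((a * (1024 - frac) + b * frac + 512) >> 10);
            pos += dx;
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

Dropping the vertical pass removes the intermediate buffer and the second filter loop that a general scaled path would need, which is where a dedicated version could save work.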
Multi-threading:
|
|
|
-- postfilter and film-grain threading;
|
|
|
+- postfilter (!1086) and film-grain threading;
|
|
|
- in first-pass of frame threading with tile threading enabled, it may make sense (assuming no temporal interference from ref_mvs or seg_id) to first parse the tile marked as the one used to update the output CDF, since that would unblock the subsequent thread's pass 1. This is only true if use_ref_mvs=0 and segmentation.temporal_update=0;
|
|
|
- threading can become a generic worker queue (one tile_sbrow symbol parsing/recon, one sbrow postfilter(s)) and then use a generic single threadpool instead of separate tile/frame[/postfilter?] ones (see also #206);
|
|
|
- by not adding invisible frames to `out_delayed[]` and/or growing it so it can be bigger than the number of frame threads (and thus making the indexing between `out_delayed[]` and the actual frame thread doing the decoding independent), we could grow concurrency and scalability on typical sequences with frame-multithreading enabled.
|
... | ... | |
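The first-pass item in the Multi-threading list suggests parsing the CDF-update tile before the others when nothing temporal blocks it. A minimal sketch of that ordering decision follows. `tile_parse_order` and its parameters are hypothetical names, not taken from the dav1d source; `update_tile` is assumed to be the index of the tile whose CDFs are saved for the next frame.

```c
/* Hypothetical helper (not dav1d's API): fill order[] with the
 * sequence in which tiles should be parsed in pass 1.
 * The CDF-update tile goes first only when use_ref_mvs == 0 and
 * temporal_update == 0, as the TODO item requires; otherwise the
 * natural tile order is kept. */
static void tile_parse_order(int *order, int n_tiles, int update_tile,
                             int use_ref_mvs, int temporal_update)
{
    const int prioritize = !use_ref_mvs && !temporal_update &&
                           update_tile >= 0 && update_tile < n_tiles;
    int n = 0;

    if (prioritize)
        order[n++] = update_tile;   /* unblocks the next frame's pass 1 */

    for (int i = 0; i < n_tiles; i++)
        if (!prioritize || i != update_tile)
            order[n++] = i;         /* remaining tiles, original order */
}
```

With four tiles and `update_tile == 2`, the order becomes 2, 0, 1, 3 when both flags are zero, and stays 0, 1, 2, 3 otherwise. In a generic worker queue, the same effect could be had by giving that tile's tasks a higher priority instead of reordering an array.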