@@ -17,7 +17,8 @@ SIMD:
 Multi-threading:
 - postfilter and film-grain threading;
 - in first-pass of frame threading with tile threading enabled, it may make sense (assuming no temporal interference from ref_mvs or seg_id) to first parse the tile marked as the one used to update the output CDF, since that would unblock the subsequent thread's pass 1. This is only true if use_ref_mvs=0 and segmentation.temporal_update=0;
-- threading can become a generic worker queue (one tile_sbrow symbol parsing/recon, one sbrow postfilter(s)) and then use a generic single threadpool instead of separate tile/frame[/postfilter?] ones (see also #206).
+- threading can become a generic worker queue (one tile_sbrow symbol parsing/recon, one sbrow postfilter(s)) and then use a generic single threadpool instead of separate tile/frame[/postfilter?] ones (see also #206);
+- by not adding invisible frames to `out_delayed[]` and/or growing it so it can be bigger than the number of frame threads (and thus making the indexing between `out_delayed[]` and the actual frame thread doing the decoding independent), we could grow concurrency and scalability on typical sequences with frame-multithreading enabled.
 
 Removing redundancies:
 - it may make sense to copy one row (8px+2x2px edges) of pre-cdef data in `uint16_t` at a time so we don't need to extend buffers or add edge data inside the SIMD. This may make the code both simpler *and* faster. Same is true for looprestoration also;
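
The last item in the hunk above (copying one row of pre-cdef data into `uint16_t`, 8 pixels plus 2 edge pixels on each side) could be sketched in C roughly as follows. This is only an illustration under assumptions: the function name `copy_row_u16`, the `have_left`/`have_right` flags and the `EDGE_FILL` marker value are hypothetical and not taken from the project's code, and 8-bit source pixels are assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define ROW_W     8          /* pixels per row of the block */
#define EDGE_W    2          /* edge pixels on each side of the row */
#define EDGE_FILL UINT16_MAX /* hypothetical marker for unavailable pixels */

/* Copy one row of 8-bit pre-cdef pixels into a uint16_t row holding
 * EDGE_W + ROW_W + EDGE_W entries.
 * src points at the first of the ROW_W centre pixels. The pixels at
 * src[-EDGE_W..-1] are read only if have_left is set, and the pixels at
 * src[ROW_W..ROW_W+EDGE_W-1] only if have_right is set; otherwise the
 * corresponding entries are filled with EDGE_FILL. */
static void copy_row_u16(uint16_t dst[EDGE_W + ROW_W + EDGE_W],
                         const uint8_t *src,
                         int have_left, int have_right)
{
    for (int x = 0; x < EDGE_W; x++)
        dst[x] = have_left ? src[x - EDGE_W] : EDGE_FILL;
    for (int x = 0; x < ROW_W; x++)
        dst[EDGE_W + x] = src[x];
    for (int x = 0; x < EDGE_W; x++)
        dst[EDGE_W + ROW_W + x] = have_right ? src[ROW_W + x] : EDGE_FILL;
}
```

With the edges already present in the 16-bit row, a filter kernel written against this layout would not need its own edge handling, which is the simplification the TODO item is after. Whether it is also faster would have to be measured.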