... | ... | @@ -22,12 +22,13 @@ Missing software features: |
|
|
|
|
|
Performance optimizations:
|
|
|
- it may make sense to copy one row (8px+2x2px edges) of pre-cdef data in `uint16_t` at a time so we don't need to extend buffers or add edge data inside the SIMD. This may make the code both simpler *and* faster;
|
|
|
- simd for any function already in a ${anything}DSPContext, for any platform;
|
|
|
- simd for any function already in a ${anything}DSPContext, for any platform (see #78 for AVX2);
|
|
|
- move emu_edge to dsp for simd;
|
|
|
- move dequant from `decode_coeffs()` to itx;
|
|
|
- order_palette() to dsp for simd;
|
|
|
- change coef contexting (hi/lo_ctx) to be diagonal-oriented for dsp/simd;
|
|
|
- change multi-symbol coding `read_symbol()` symbol discovery loop and adaptivity to be simd'ed;
|
|
|
- change multi-symbol coding `read_symbol()` symbol discovery loop and adaptivity to be simd'ed [Rostislav expressed interest in this];
|
|
|
- project_motion_field in `ref_mvs.c` can be SIMD'ed;
|
|
|
- postfilter threading;
|
|
|
- threading can become a generic worker queue (one tile_sbrow symbol parsing/recon, one sbrow postfilter(s)) and then use a generic single threadpool instead of separate tile/frame[/postfilter?] ones.
|
|
|
|
... | ... | |
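
For reference, the row-copy idea in the first performance item could look roughly like the sketch below. It is not code from this tree: the name `copy_row_padded`, the `have_left`/`have_right` flags, and the `UNAVAILABLE` sentinel are all hypothetical, and it assumes 8-bit source pixels. Each 8-pixel row is widened into a 12-entry `uint16_t` buffer, with 2 pixels on each side either copied from the neighbours or filled with the sentinel, so the filter itself never has to handle edges.

```c
#include <stddef.h>
#include <stdint.h>

#define ROW_W 8                 /* pixels per block row */
#define EDGE  2                 /* padding pixels on each side */
#define UNAVAILABLE UINT16_MAX  /* hypothetical marker for missing edge pixels */

/* Widen one row of 8-bit pixels into a padded 16-bit buffer.
 * src points at the first of the 8 block pixels. src[-2..-1] is read only
 * when have_left is nonzero, and src[8..9] only when have_right is nonzero. */
static void copy_row_padded(uint16_t dst[ROW_W + 2 * EDGE],
                            const uint8_t *src,
                            int have_left, int have_right)
{
    /* left edge: real neighbours if present, sentinel otherwise */
    for (int x = 0; x < EDGE; x++)
        dst[x] = have_left ? src[x - EDGE] : UNAVAILABLE;

    /* the 8 block pixels, widened from 8 to 16 bits */
    for (int x = 0; x < ROW_W; x++)
        dst[EDGE + x] = src[x];

    /* right edge: same rule as the left edge */
    for (int x = 0; x < EDGE; x++)
        dst[EDGE + ROW_W + x] = have_right ? src[ROW_W + x] : UNAVAILABLE;
}
```

With a layout like this, a SIMD kernel could load 12 contiguous `uint16_t` values per row without branching on frame or tile boundaries. Whether that is actually faster than extending the buffers would need to be measured.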