@@ -9,7 +9,7 @@ Missing bitstream features:
- 12-bits/component decoding (without massively increasing the binary size).
Missing support for weird header bit features:
- disable_cdf_update;
- disable_cdf_update (#145/!300);
- OBU without length field;
- super_res;
- tile_ext;
@@ -30,7 +30,7 @@ Performance optimizations:
- change the multi-symbol coding `read_symbol()` symbol discovery loop and adaptivity to be SIMD'ed [Rostislav expressed interest in this];
- project_motion_field in `ref_mvs.c` can be SIMD'ed;
- `cfl_ac` should take size (`w`/`h`) as function arguments rather than as function LUT indices, so that only subsampling (`420`, `422`, `444`) is a LUT entry;
- `memset()` for context setting in coefficient (`decode_coeffs()` in `recon.c`) and block (`decode_b()` in `decode.c`) can be optimized similar to ffvp9 to act in blocks using `switch`/`case` pairs for constant-size writes instead of `memset()`. For examples, see `SPLAT_CTX()` in `vp9block.c` in FFmpeg;
- `memset()` for context setting in coefficient (`decode_coeffs()` in `recon.c`) and block (`decode_b()` in `decode.c`) can be optimized similar to ffvp9 to act in blocks using `switch`/`case` pairs for constant-size writes instead of `memset()`. For examples, see `SPLAT_CTX()` in `vp9block.c` in FFmpeg (!301);
- `backup_lpf()` in `lr_apply_tmpl.c` backs up 4 lines per 64 pixels per plane, and copies bottom to top per superblock (each 128 or 64 pixels). Most of this is unnecessary. Using a flippable index means we don't need the second copy, and using 64-pixel instead of sb (64 or 128) pixel cdef runs (and then running LR, and then optionally the second cdef and second LR) means we only need to copy the pre-cdef top pixels, not the bottom ones, saving 50% copies. CDEF backup already does all of this. Bonus points for merging the CDEF backup and LR backup together so LR backs up nothing at all;
- postfilter threading;
- threading can become a generic worker queue (one tile_sbrow symbol parsing/recon, one sbrow postfilter(s)) and then use a generic single threadpool instead of separate tile/frame[/postfilter?] ones.
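
The `memset()` item above describes replacing variable-length fills with `switch`/`case` pairs so that each case performs a constant-size write. A minimal sketch of that idea follows. The helper name `splat_ctx()`, its signature, and the set of handled sizes are assumptions made for illustration; they are not taken from dav1d or from FFmpeg's `SPLAT_CTX()`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (name and signature are illustrative only):
 * fill `len` bytes at `dst` with `val`. Common power-of-two lengths
 * get a dedicated case so the compiler sees a constant-size copy,
 * which it can typically lower to one or two plain stores. */
static inline void splat_ctx(uint8_t *dst, uint8_t val, int len)
{
    switch (len) {
    case 1:
        dst[0] = val;
        break;
    case 2: {
        /* Replicate the byte across a 16-bit word. */
        const uint16_t v = (uint16_t)(val * 0x0101U);
        memcpy(dst, &v, 2);
        break;
    }
    case 4: {
        const uint32_t v = val * 0x01010101U;
        memcpy(dst, &v, 4);
        break;
    }
    case 8: {
        const uint64_t v = val * 0x0101010101010101ULL;
        memcpy(dst, &v, 8);
        break;
    }
    case 16: {
        /* Two 8-byte writes cover the 16-byte case. */
        const uint64_t v = val * 0x0101010101010101ULL;
        memcpy(dst, &v, 8);
        memcpy(dst + 8, &v, 8);
        break;
    }
    default:
        /* Uncommon lengths fall back to the generic routine. */
        memset(dst, val, (size_t)len);
        break;
    }
}
```

A call such as `splat_ctx(ctx + off, 1, 8)` would then stand in for `memset(ctx + off, 1, 8)`. Using `memcpy()` with a constant size keeps the stores free of alignment and aliasing problems; whether this actually beats a plain `memset()` depends on the compiler and target, so it would need measuring.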
@@ -39,7 +39,6 @@ Cleanups:
- LR/MC intermediate 2d buffers in C dsp can be reduced by doing windowed like in SIMD;
- cdef: noskip_mask resolution can be 8x8;
- ref_mvs: non-cur frame MVs can be at 8x8 resolution, only direct neighbours need to be 4x4;
- tests for bitstream compliance (!175);
- lfmask and l/a ctx zero can be done in tile instead of frame context for better distribution.
- show_existing_frame will be placed in the frame output queue as something keeping a frame thread busy, meaning for such cases, the frame thread will momentarily stall. This is partially required to prevent overflows of the output queue, or growing it to possibly infinite size on garbage input. But for the regular use case, we can make the output buffer queue twice as big, so that each invisible frame can have one matching show_existing_frame, allowing all frame-threads to be active for the worst-"real"-case while still never overflowing on pathological conditions;
- the output queue handling is duplicated in `decode.c`, `lib.c` and `obu.c`, so merge it into one common place.