@@ -9,7 +9,7 @@ SIMD:
 - move dequant from `decode_coeffs()` to itx;
 - order_palette() to dsp for simd;
 - change coef contexting (hi/lo_ctx) to be diagonal-oriented for dsp/simd;
-- project_motion_field in `ref_mvs.c` can maybe be SIMD'ed;
+- `_save_tmvs()` and `_load_tmvs()` in `refmvs.c` can (maybe?) be SIMD'ed, along with all `_splat_*()` code in `refmvs.h`;
 - a specifically optimized version for `mc.put/prep_scaled()` for super_res, since then `my` is always 0, so there is only horizontal scaling, not vertical.
 
 Multi-threading:
@@ -34,10 +34,6 @@ Cleanups:
 - palette buffers are always 16-bit, even if content is 8-bit (remaining item in #257);
 - LR/MC intermediate 2d buffers in C dsp can be reduced by doing windowed like in SIMD;
 - cdef: noskip_mask resolution can be 8x8;
-- ref_mvs: non-cur frame MVs can be at 8x8 resolution, only direct neighbours need to be 4x4;
 - lfmask and l/a ctx zero can be done in tile instead of frame context for better distribution.
 - show_existing_frame will be placed in the frame output queue as something keeping a frame thread busy, meaning for such cases, the frame thread will momentarily stall. This is partially required to prevent overflows of the output queue, or growing it to possibly infinite size on garbage input. But for the regular use case, we can make the output buffer queue twice as big, so that each invisible frame can have one matching show_existing_frame, allowing all frame-threads to be active for the worst-"real"-case while still never overflowing on pathological conditions;
-- the output queue handling is duplicated in `decode.c`, `lib.c` and `obu.c`, so merge this in one common place.
-
-Reimplement:
-- `ref_mvs.c`(#217/!945).
\ No newline at end of file
+- the output queue handling is duplicated in `decode.c`, `lib.c` and `obu.c`, so merge this in one common place.
\ No newline at end of file