dav1d uses three threading models: frame, tile and post-filter multi-threading. This page explains each of them. In practice, the best performance (both in absolute speed-up and in speed-up relative to resource usage) is achieved by combining the threading models; they are designed to be used together.
Frame-MT means that multiple (temporally adjacent) frames are decoded in parallel. These pictures may have dependencies on each other for entropy state or inter prediction. We resolve these dependencies using thread conditions or specific design choices.
In order to resolve inter prediction dependencies, we use thread conditions. The idea is that if the current thread's coded picture uses another (reference) picture for inter prediction, the current thread simply waits for the reference's thread to have completed reconstruction (prediction, transformed residual, all inline post-filters) of the referenced pixels before continuing with its own prediction. In practice, this means that multiple frame threads will typically lag a couple of superblock rows behind each other, depending on vertical motion.
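The waiting scheme described above can be sketched with a condition variable per picture. This is a minimal illustration in Python, not dav1d's actual C implementation: `PictureProgress`, `decode_frame`, `run_demo` and the `max_mv_rows` parameter (how many rows below the current one a motion vector may reach) are all hypothetical names chosen for this example.

```python
import threading


class PictureProgress:
    """Tracks how many superblock rows of one picture are fully reconstructed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._rows_done = 0

    def signal_row_done(self, row):
        # Called by the thread decoding this picture after finishing `row`.
        with self._cond:
            self._rows_done = max(self._rows_done, row + 1)
            self._cond.notify_all()

    def wait_for_row(self, row):
        # Called by a thread that wants to read pixels up to and including `row`.
        with self._cond:
            self._cond.wait_for(lambda: self._rows_done > row)


def decode_frame(name, n_rows, own, ref, max_mv_rows, log):
    for row in range(n_rows):
        if ref is not None:
            # Assumption: motion vectors reach at most `max_mv_rows` rows down.
            needed = min(row + max_mv_rows, n_rows - 1)
            ref.wait_for_row(needed)
        log.append((name, row))  # stand-in for the real reconstruction work
        own.signal_row_done(row)


def run_demo(n_rows=6, max_mv_rows=1):
    log = []
    ref_progress, cur_progress = PictureProgress(), PictureProgress()
    ref_thread = threading.Thread(
        target=decode_frame, args=("ref", n_rows, ref_progress, None, 0, log))
    cur_thread = threading.Thread(
        target=decode_frame,
        args=("cur", n_rows, cur_progress, ref_progress, max_mv_rows, log))
    cur_thread.start()  # started first on purpose: it must still wait for "ref"
    ref_thread.start()
    ref_thread.join()
    cur_thread.join()
    return log
```

With `max_mv_rows=1`, the dependent thread never reconstructs row `r` before the reference thread has finished row `r + 1`, which is the "lag a couple of superblock rows behind" behaviour in miniature.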
In order to resolve entropy state dependencies, we use a 2-pass decoding model when frame-MT is enabled and disable-cdf-update=0. In the first pass, we decode all the symbols and cache them in memory. This unblocks the entropy decoding of any next frame thread whose input CDF entropy state is our frame thread's output CDF entropy state. Then, in a second pass, we use these decoded entropy symbols to do the actual reconstruction (prediction, transformed residual, all inline post-filters).
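The two passes can be sketched as follows. This is a toy model under loud assumptions: the "bitstream" is already a list of symbols, the "CDF" is a plain dictionary of counts, and reconstruction is a sum. The function names are hypothetical, and inter prediction dependencies between frames are ignored in pass 2 to keep the example short. What it does show is that pass 1 of a frame needs only the output CDF of the previous frame's pass 1, not its reconstruction.

```python
from concurrent.futures import ThreadPoolExecutor


def pass1_decode_symbols(bitstream, cdf_in):
    """Pass 1: decode all symbols of a frame and adapt a copy of the input CDF."""
    cdf = dict(cdf_in)
    symbols = []
    for sym in bitstream:  # stand-in for real entropy decoding
        symbols.append(sym)
        cdf[sym] = cdf.get(sym, 0) + 1  # toy CDF adaptation
    return symbols, cdf


def pass2_reconstruct(symbols):
    """Pass 2: reconstruct from cached symbols (stand-in: just sum them)."""
    return sum(symbols)


def decode_sequence(frames):
    cached = []
    cdf = {}
    # Pass 1 is a serial chain: each frame's input CDF is the previous
    # frame's output CDF.
    for bitstream in frames:
        symbols, cdf = pass1_decode_symbols(bitstream, cdf)
        cached.append(symbols)
    # Pass 2 no longer touches the entropy state, so in this simplified
    # model the frames can be reconstructed in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        recon = list(pool.map(pass2_reconstruct, cached))
    return recon, cdf
```

In this sketch all first passes run before any second pass starts; the point is only that the cheap, serial part (symbol decoding) is separated from the expensive part (reconstruction), so the serial chain stays short.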
Advantages of frame-MT:
- does not need bitstream support - i.e. it works on any AV1 file;
- scales well to very large thread counts.
Disadvantages of frame-MT:
- introduces latency;
- needs a lot of memory;
- needs more threads to get the same relative speed-up compared to tile threading;
- the speed-up is theoretically limited because of somewhat-linear dependencies of the CDF updates, unless disable-cdf-update=1.
Tile-MT means that the tiles of a single frame are decoded in parallel. Basically, split an image into tiles (for example, two halves) and decode each one independently. Inline post-filters (deblock, CDEF, loop restoration) run in the main frame thread since they cross tile boundaries, unless post-filter threading is enabled as well.
Advantages of tile-MT:
- better speed-up per added thread compared to frame-MT;
- does not introduce latency or require significant extra memory.
Disadvantages of tile-MT:
- needs bitstream support, which costs compression efficiency;
- speed-up limited to number of tiles coded in bitstream;
- speed-up only covers the symbol coding / prediction / residual+transform part of the reconstruction, not the post-filters.
Post-filter MT is the latest threading model added to dav1d. It is designed as a thread/task pool, which will later be extended to unify all threading models under a global --thread parameter.
The frame threads split the post-filters into superblock row/filter tasks. For example, a frame with 9 superblock rows and Deblock/CDEF/LR enabled is split into 3 post-filters times 9 sbrows, so 27 tasks.
Each of these tasks depends on the previous post-filter task of the current sbrow and on the same post-filter task of the previous sbrow, which allows for better concurrency. These tasks are connected to their direct forward and backward dependencies, forming a graph. The frame threads schedule the first task of a superblock row once the tile columns for that row are reconstructed, and the rest simply follow (they are scheduled as soon as all their backward dependencies are met).
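The dependency graph described above can be sketched as below. This is a single-threaded simulation with hypothetical names (`build_tasks`, `run_order`), not dav1d's task pool: a task is a `(filter_index, sbrow)` pair, and the frame thread's "row is reconstructed" event is modelled as one extra dependency on the first filter task of each row.

```python
from collections import deque

FILTERS = ("deblock", "cdef", "lr")


def build_tasks(n_sbrows, filters=FILTERS):
    """Map each (filter_index, sbrow) task to its set of backward dependencies."""
    deps = {}
    for row in range(n_sbrows):
        for f in range(len(filters)):
            backward = set()
            if f > 0:
                backward.add((f - 1, row))  # previous post-filter, same sbrow
            if row > 0:
                backward.add((f, row - 1))  # same post-filter, previous sbrow
            deps[(f, row)] = backward
    return deps


def run_order(n_sbrows, filters=FILTERS):
    deps = build_tasks(n_sbrows, filters)
    forward = {task: [] for task in deps}
    pending = {}
    for task, backward in deps.items():
        pending[task] = len(backward)
        for dep in backward:
            forward[dep].append(task)
    # Extra dependency: the first filter of a row also waits for reconstruction.
    for row in range(n_sbrows):
        pending[(0, row)] += 1

    ready = deque()
    order = []

    def release(task):
        pending[task] -= 1
        if pending[task] == 0:
            ready.append(task)

    for row in range(n_sbrows):
        release((0, row))  # frame thread: this sbrow is now reconstructed
        while ready:       # a real pool would hand these to worker threads
            task = ready.popleft()
            order.append(task)
            for nxt in forward[task]:
                release(nxt)
    return order
```

For 9 sbrows this yields the 27 tasks from the example. In a real pool the tasks form a wavefront: once deblock of row `r` is done, CDEF of row `r` and deblock of row `r + 1` have no dependency on each other and can run concurrently.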
NB: as this threading model is an intermediate step towards unification, the thread pool is global, not per frame as for the tile threads. In other words, when you specify --tilethreads=4, you really get num_framethreads * num_tilethreads threads in total, whereas the post-filter threads are shared across all frame threads (they are frame-thread agnostic).