Implement smart threading defaults based on content and system

changed the description

W.r.t. content-adaptive threading, the idea I had was to get rid of the concept of tile/frame threads and introduce a unified thread pool and work queue to which "frame" and "tile" jobs get appended. Then you could just specify --threads N to set the size of the worker pool, and have it scale automatically to tiles and/or frames depending on the content.
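A minimal sketch of that idea, in Python for readability: one pool of N workers (N would come from --threads) pulling from a single FIFO queue that holds both "frame" and "tile" jobs. `UnifiedPool`, `Job` and the string job kinds are hypothetical names, not dav1d's API; a real decoder would have frame jobs enqueue their own tile jobs and would track dependencies between them.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    kind: str                # "frame" or "tile"; informational only in this sketch
    run: Callable[[], None]  # the work to perform


class UnifiedPool:
    """N worker threads sharing one FIFO queue of frame and tile jobs."""

    def __init__(self, n_threads: int):
        self._queue = queue.Queue()
        self._workers = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(n_threads)
        ]
        for w in self._workers:
            w.start()

    def _worker(self) -> None:
        while True:
            job = self._queue.get()
            try:
                if job is None:  # shutdown sentinel
                    return
                job.run()
            finally:
                self._queue.task_done()

    def submit(self, job: Job) -> None:
        self._queue.put(job)

    def wait(self) -> None:
        """Block until every submitted job has finished."""
        self._queue.join()

    def shutdown(self) -> None:
        for _ in self._workers:
            self._queue.put(None)
        for w in self._workers:
            w.join()


# Example: --threads 4, with frame and tile jobs on the same queue.
done = []
done_lock = threading.Lock()


def record(label: str) -> None:
    with done_lock:
        done.append(label)


pool = UnifiedPool(n_threads=4)
for f in range(2):
    pool.submit(Job("frame", lambda f=f: record(f"frame{f}")))
for t in range(4):
    pool.submit(Job("tile", lambda t=t: record(f"tile{t}")))
pool.wait()
pool.shutdown()
```

Because every job goes through the same queue, the thread count stays at N no matter how the content splits into frames and tiles.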

It would also allow us to scale to more cores for highly parallel jobs by splitting them into smaller jobs where we suspect it could help (e.g. if a lot of other jobs are blocked waiting for a given task to complete).
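One way such a splitting heuristic could look, as a hypothetical sketch: if at least `threshold` other jobs are blocked on a job and there are idle workers, split its row range into near-equal pieces, one per idle worker. `RowJob`, `maybe_split` and the default threshold of 2 are assumptions, and the sketch assumes the rows can be processed independently, which holds only for some kinds of work.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowJob:
    name: str
    row_start: int  # first row covered (inclusive)
    row_end: int    # last row covered (exclusive)


def maybe_split(job: RowJob, blocked_waiters: int, idle_workers: int,
                threshold: int = 2) -> List[RowJob]:
    """Split `job` into near-equal row ranges when at least `threshold` other
    jobs are blocked on it and at least two workers are idle; otherwise return
    it unchanged. Assumes the rows can be processed independently."""
    rows = job.row_end - job.row_start
    if blocked_waiters < threshold or idle_workers < 2 or rows < 2:
        return [job]
    parts = min(idle_workers, rows)
    base, extra = divmod(rows, parts)
    pieces = []
    start = job.row_start
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)  # spread the remainder
        pieces.append(RowJob(f"{job.name}.{i}", start, end))
        start = end
    return pieces


# 3 jobs waiting and 4 idle workers: 10 rows become ranges of 3, 3, 2, 2.
split = maybe_split(RowJob("job", 0, 10), blocked_waiters=3, idle_workers=4)
print([(p.row_start, p.row_end) for p in split])
# [(0, 3), (3, 6), (6, 8), (8, 10)]
```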

We could also do things like film grain on separate threads, letting film grain application use leftover cores for "free" so that it disappears from the main processing loop.
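A sketch of running film grain as low-priority work in the same pool, assuming a priority queue where decode jobs always sort ahead of film grain jobs, so grain only runs on a worker that finds no decode work pending. `PriorityPool`, `DECODE` and `FILM_GRAIN` are hypothetical names. With one worker and everything queued before it starts, the run order is deterministic:

```python
import itertools
import queue
import threading
from typing import Callable

DECODE = 0      # hypothetical priority levels: lower value is served first
FILM_GRAIN = 1


class PriorityPool:
    """Worker pool where film grain jobs only run when no decode job is queued."""

    def __init__(self, n_threads: int):
        self._q = queue.PriorityQueue()
        self._seq = itertools.count()  # tie-breaker: FIFO within one priority
        self._n = n_threads
        self._workers = []

    def submit(self, priority: int, fn: Callable[[], None]) -> None:
        self._q.put((priority, next(self._seq), fn))

    def start(self) -> None:
        for _ in range(self._n):
            t = threading.Thread(target=self._worker, daemon=True)
            t.start()
            self._workers.append(t)

    def _worker(self) -> None:
        while True:
            _, _, fn = self._q.get()
            try:
                if fn is None:  # shutdown sentinel
                    return
                fn()
            finally:
                self._q.task_done()

    def close(self) -> None:
        self._q.join()  # wait for all queued work
        for _ in self._workers:
            # Sentinels sort after any real job.
            self._q.put((float("inf"), next(self._seq), None))
        for t in self._workers:
            t.join()


order = []
pool = PriorityPool(n_threads=1)
pool.submit(FILM_GRAIN, lambda: order.append("grain0"))
pool.submit(DECODE, lambda: order.append("decode0"))
pool.submit(DECODE, lambda: order.append("decode1"))
pool.submit(FILM_GRAIN, lambda: order.append("grain1"))
pool.start()
pool.close()
print(order)  # ['decode0', 'decode1', 'grain0', 'grain1']
```

A film grain job that has already started is not preempted, so the benefit only appears when there are spare cores; with a single worker, grain still delays decode jobs that arrive while it is running.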

added feature label

mentioned in issue #15 (closed)

A unified thread pool sounds great; the current system performs nicely but produces a lot of total threads, each of which consumes resources such as stack space.

For instance, on an 8-core machine with HD material, specifying 8 frame threads and 4 tile threads produces 40 threads in total (each of the 8 frame threads plus its own 4 tile threads). In Emscripten this defaults to eating 80 megs of stack space in the WebAssembly memory, plus whatever the system allocates for the Web Workers' native stacks (probably another 80 megs).

A unified 8-thread pool would save 64 megs of WebAssembly memory and another 64 megs of native memory in the 8-core case (32 fewer threads at roughly 2 megs of stack each). That may not be much per decoder compared to the total RAM of modern devices, but if a web page has multiple videos that get paused and are potentially left alive, it adds up quickly.
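The numbers above can be checked with a few lines of arithmetic. The 2 megs per thread is derived from the quoted 80 megs for 40 threads, not from Emscripten's documentation, and the "one frame thread plus its tile threads" model is an assumption that reproduces the quoted total of 40. The same saving applies once to the WebAssembly memory and once to the native stacks.

```python
def total_threads(frame_threads: int, tile_threads: int) -> int:
    # Assumed model: every frame thread has its own set of tile threads.
    return frame_threads * (1 + tile_threads)


STACK_MB_PER_THREAD = 80 / 40  # derived from "80 megs for 40 threads" => 2.0

per_frame_model = total_threads(frame_threads=8, tile_threads=4)
unified_pool = 8
saved_mb = (per_frame_model - unified_pool) * STACK_MB_PER_THREAD

print(per_frame_model, saved_mb)  # 40 64.0
```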

(Sharing a thread pool between decoder instances is also an interesting possibility for that case: usually you don't have multiple videos playing at once, but you might have several on a page that need to be paused and played.)
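A sketch of sharing one pool across decoder instances, assuming a process-wide, reference-counted executor that the first decoder creates and the last one shuts down. `SharedPool` and `Decoder` are hypothetical; `concurrent.futures.ThreadPoolExecutor` stands in for the decoder's own pool, and once the pool exists, later callers' `n_threads` is ignored.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


class SharedPool:
    """Process-wide executor shared by all decoder instances (refcounted)."""

    _lock = threading.Lock()
    _executor: Optional[ThreadPoolExecutor] = None
    _refs = 0

    @classmethod
    def acquire(cls, n_threads: int) -> ThreadPoolExecutor:
        with cls._lock:
            if cls._executor is None:
                # Only the first caller's n_threads takes effect.
                cls._executor = ThreadPoolExecutor(max_workers=n_threads)
            cls._refs += 1
            return cls._executor

    @classmethod
    def release(cls) -> None:
        with cls._lock:
            cls._refs -= 1
            if cls._refs == 0 and cls._executor is not None:
                cls._executor.shutdown(wait=True)
                cls._executor = None


class Decoder:
    """Hypothetical decoder that borrows the shared pool instead of owning threads."""

    def __init__(self, n_threads: int = 8):
        self.pool = SharedPool.acquire(n_threads)

    def decode(self, frame_id: int):
        # Stand-in for real decoding work; returns a Future.
        return self.pool.submit(lambda: f"decoded {frame_id}")

    def close(self) -> None:
        SharedPool.release()


a = Decoder()
b = Decoder()
same_pool = a.pool is b.pool
ra = a.decode(1).result()
a.close()                  # b still holds a reference, so the pool stays alive
rb = b.decode(2).result()
b.close()                  # last reference: worker threads are shut down
```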

A thread pool would be great, and possibly even better if a client could provide the pool or take responsibility for task sharding. Chromium already has an extensive task pooling system:

https://cs.chromium.org/chromium/src/base/task/post_task.h

It would at least be memory-efficient if we could dispatch tasks to our existing pool. I'd have to do some performance testing to see whether it could be as performant as raw threads or an internal pool, though.
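A sketch of letting the client own the pool: the decoder only needs a "post a task" hook and falls back to an internal pool when none is given. `TaskRunner`, `InternalRunner`, `InlineHostRunner` and `Decoder` are hypothetical names; a host such as Chromium would implement `post_task` by forwarding to its own task system, and the string results stand in for real tile decoding.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Protocol


class TaskRunner(Protocol):
    """Hook the decoder would call; a host implements it with its own scheduler."""

    def post_task(self, task: Callable[[], None]) -> None: ...


class InternalRunner:
    """Fallback used when the client supplies no runner."""

    def __init__(self, n_threads: int):
        self._ex = ThreadPoolExecutor(max_workers=n_threads)

    def post_task(self, task: Callable[[], None]) -> None:
        self._ex.submit(task)

    def shutdown(self) -> None:
        self._ex.shutdown(wait=True)


class InlineHostRunner:
    """Stand-in for a host scheduler: counts tasks and runs them immediately."""

    def __init__(self):
        self.posted = 0

    def post_task(self, task: Callable[[], None]) -> None:
        self.posted += 1
        task()


class Decoder:
    def __init__(self, runner: Optional[TaskRunner] = None, n_threads: int = 4):
        self._internal = InternalRunner(n_threads) if runner is None else None
        self._runner = runner if runner is not None else self._internal

    def decode_tiles(self, n_tiles: int) -> List[str]:
        results = [""] * n_tiles
        remaining = n_tiles
        cv = threading.Condition()

        def make_task(i: int) -> Callable[[], None]:
            def task() -> None:
                nonlocal remaining
                value = f"tile {i}"  # stand-in for decoding tile i
                with cv:
                    results[i] = value
                    remaining -= 1
                    cv.notify_all()
            return task

        for i in range(n_tiles):
            self._runner.post_task(make_task(i))
        with cv:
            cv.wait_for(lambda: remaining == 0)  # block until every tile is done
        return results

    def close(self) -> None:
        if self._internal is not None:
            self._internal.shutdown()


host = InlineHostRunner()
d1 = Decoder(runner=host)   # client-provided scheduling, no decoder threads
r1 = d1.decode_tiles(3)
d2 = Decoder()              # falls back to the internal pool
r2 = d2.decode_tiles(3)
d2.close()
```

When a runner is supplied, the decoder never creates threads of its own, which is where the memory savings would come from; whether the host scheduler's dispatch overhead is acceptable is exactly the performance question above.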

marked #264 (closed) as a duplicate of this issue

marked this issue as related to #264 (closed)

mentioned in merge request !822 (merged)

closed with merge request !822 (merged)

changed milestone to %1.0.0