
[dav1d git vs dav1d 0.9.2] Severe thread scaling regression in 1440p 120FPS 10b AV1 clip

What steps will reproduce the problem?

  1. Build dav1d 1.0.0 or later (which includes the new threading algorithms).
  2. Decode the linked video file (a 1440p 120FPS 10b AV1 video packaged as raw OBU): https://drive.google.com/file/d/1QDO5djeEQrUr1qY2gExe17aUV-UIxFnc/view?usp=sharing
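
For anyone who wants to repeat the comparison, here is a minimal sketch of a timing harness for the steps above. It is not the script used for the attached logs. The names `time_command`, `sweep`, and `dav1d_cmd` are hypothetical, and the command-line options in `dav1d_cmd` are assumptions about dav1d >=1.0.0 that should be checked against `dav1d --help` for the build under test; dav1d 0.9.2 likely needs different thread options.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Discard decoder output so terminal I/O does not skew the timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def sweep(make_cmd, thread_counts, runs=3):
    """Time make_cmd(n) for each thread count n and return {n: seconds}."""
    return {n: time_command(make_cmd(n), runs) for n in thread_counts}


def dav1d_cmd(binary, clip, threads):
    """Build a decode-only command line (hypothetical helper).

    Assumption: -i, -o, --muxer null and --threads are accepted by the
    dav1d >=1.0.0 CLI. Verify with `dav1d --help` before relying on this.
    """
    return [binary, "-i", clip, "--muxer", "null", "-o", "/dev/null",
            "--threads", str(threads)]
```

A possible use, with placeholder paths: `sweep(lambda n: dav1d_cmd("./dav1d", "clip.obu", n), [1, 2, 4, 8, 16])`. Taking the best of several runs reduces noise from other processes; dividing the clip's frame count by each time gives FPS.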

What is the expected output?

The video decodes as fast as or faster with dav1d >=1.0.0 than with dav1d 0.9.2.

What do you see instead?

  • Single-threaded: dav1d >=1.0.0 is faster than dav1d 0.9.2.

  • Multi-threaded: Above 2 threads, dav1d 0.9.2 scales better and decodes faster than dav1d >=1.0.0, resulting in large performance differences.
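
To make "scales better" concrete, the two versions can be compared by speedup and parallel efficiency relative to the lowest thread count measured. The sketch below is only an illustration: `scaling_table` is a hypothetical helper, and the FPS values in `example` are placeholders, not measurements from this report.

```python
def scaling_table(fps_by_threads):
    """Return {threads: (speedup, efficiency)} relative to the lowest thread count.

    speedup    = fps(n) / fps(base)
    efficiency = speedup / (n / base), where 1.0 means perfect linear scaling
    """
    base_threads = min(fps_by_threads)
    base_fps = fps_by_threads[base_threads]
    table = {}
    for n, fps in sorted(fps_by_threads.items()):
        speedup = fps / base_fps
        efficiency = speedup / (n / base_threads)
        table[n] = (speedup, efficiency)
    return table


# Placeholder numbers for illustration only, NOT results from this issue.
example = {1: 100.0, 2: 190.0, 4: 300.0, 8: 360.0}
for n, (speedup, efficiency) in scaling_table(example).items():
    print(f"{n:2d} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Running this once per dav1d version on the measured FPS values gives two tables; a regression in thread scaling shows up as efficiency dropping off at a lower thread count in the newer version.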

What version / commit were you testing with? (git describe can produce this info if building from source). On what operating system?

dav1d 1.0.0-77-g345127a7

OpenMandriva 4.50 Rome, kernel 5.19.8

Hardware: Zen 2 Ryzen 7 3700X locked at 4GHz for consistent performance testing

Please provide any additional information below.

I haven't tested any other clips, but will do so if required using publicly available clips. My hypothesis is that very high framerate video is a case the new threading code did not account for, which makes thread scaling stall.

I've attached a text file below with all of the performance logs I collected. This might also explain the 10b performance regressions I saw on mobile for higher framerate videos. To confirm my hypothesis, I'd still need to test whether this behavior appears only with higher framerate videos or also with higher resolution videos.

If required, I can collect profiling data to see what is preventing thread scaling.

https://pastebin.com/6MrhP1cm

dav1d_10b_1440p_120FPS_performance_tests.txt

Edited by Zak