[dav1d git vs dav1d 0.9.2] Severe thread scaling regression in 1440p 120FPS 10b AV1 clip
What steps will reproduce the problem?
- Build dav1d 1.0.0 or later (which includes the new threading algorithms).
- Decode the linked video file (a 1440p 120FPS 10b AV1 video packaged as a raw OBU stream): https://drive.google.com/file/d/1QDO5djeEQrUr1qY2gExe17aUV-UIxFnc/view?usp=sharing
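For anyone who wants to reproduce the timings, a thread-count sweep can be scripted roughly like this. It is a minimal sketch, not the script used for this report: the binary path and clip name are placeholders, and the helper names (`build_command`, `time_decode`, `sweep`) are made up for illustration. It assumes the dav1d CLI accepts `-i`, `-o`, `--muxer null`, `-q` and `--threads`, which should hold for 1.0.0 and later. As far as I know, the 0.9.x CLI uses separate `--framethreads`/`--tilethreads` options instead, so the argument list needs adjusting for that build.

```python
import subprocess
import time


def build_command(binary, clip, threads):
    """Argument list for one decode run; decoded frames are discarded.

    Assumes a dav1d >= 1.0.0 style CLI (single --threads option).
    """
    return [binary, "-i", clip, "--muxer", "null", "-o", "/dev/null",
            "--threads", str(threads), "-q"]


def time_decode(binary, clip, threads, runs=3):
    """Best wall-clock time in seconds over `runs` full decodes."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(build_command(binary, clip, threads), check=True)
        best = min(best, time.perf_counter() - start)
    return best


def sweep(binary, clip, thread_counts):
    """Map each thread count to its best decode time in seconds."""
    return {n: time_decode(binary, clip, n) for n in thread_counts}
```

Calling `sweep("./dav1d", "clip.obu", [1, 2, 4, 8, 16])` with the real binary and clip paths returns a dict of timings keyed by thread count. Taking the best of several runs reduces noise from other processes on the machine.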
What is the expected output?
The video decodes as fast or faster with dav1d >=1.0.0 vs dav1d 0.9.2.
What do you see instead?
- Single-threaded: dav1d >=1.0.0 is faster than dav1d 0.9.2.
- Multi-threaded: above 2 threads, dav1d 0.9.2 scales better and decodes faster than dav1d >=1.0.0, resulting in large performance differences.
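To put a number on "scales better", decode times per thread count can be converted into speedup and parallel efficiency relative to the single-threaded run. The helper below is a sketch under that assumption. The function name and the input layout (a dict of seconds keyed by thread count) are hypothetical, and no measured values from this report are included.

```python
def scaling_table(times):
    """Return (threads, speedup, efficiency) rows.

    `times` maps thread count -> decode time in seconds and must
    contain the 1-thread entry, which serves as the baseline.
    """
    base = times[1]
    rows = []
    for n in sorted(times):
        speedup = base / times[n]   # how much faster than 1 thread
        efficiency = speedup / n    # 1.0 would be perfect scaling
        rows.append((n, speedup, efficiency))
    return rows
```

Running this once per dav1d build and comparing the efficiency columns shows at which thread count the two versions start to diverge.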
What version / commit were you testing with? (git describe can produce this info if building from source). On what operating system?
dav1d 1.0.0-77-g345127a7
Openmandriva 4.50 Rome Kernel 5.19.8
Hardware: Zen 2 Ryzen 7 3700X locked at 4GHz for consistent performance testing
Please provide any additional information below.
I haven't tested any other clips, but will do so with publicly available clips if required. My hypothesis is that very high framerate video is a case the new threading code did not account for, which causes thread scaling to stall.
I've attached a text file below with all of the performance logs I collected. This might also explain the 10b performance regressions I saw on mobile for higher framerate videos. To confirm my hypothesis, I'd need to test whether this behavior appears only with higher framerate videos or also with higher resolution videos.
If required, I can collect profiling data to see what exactly is preventing thread scaling.