[Discussions] vout thread design
Hi,
For some time, I have been working on changes to fix (or accommodate) concrete severe issues we have on master
(#25603 (closed), #25479 (closed), #25694 (closed)…) related to the vout thread and the display lock. We had quite a lot of discussion (privately and in !159 (closed), !324, !353 (closed)).
But there are other long-running issues (mentioned a lot during our discussions) that are not addressed yet. The most important are, in my opinion:
- "reactive resize": on resize, start rendering the current picture at the new size as soon as possible (in particular without waiting unnecessarily the PTS of the next frame);
- "frame perfect": always display the video at the expected size "in sync" with the UI (window and widgets); basically, on resize, this implies that the rendering at the correct size must be complete before returning from
vout_ChangeDisplaySize()
(or more generally before the window content is re-drawn).
In practice, "frame perfect" depends on "reactive resize": it requires to block the UI thread until a new rendering at the correct size is performed, which is not acceptable if the new rendering is delayed (with a passive wait) until the next frame.
These features could for example solve #25716 (closed) properly.
IIUC, they are currently not planned for VLC 4 (but delayed to a future VLC 5). They will probably require a lot more discussions and vout meetings.
I would like to discuss some of the general principles of the vout thread design and summarize some thoughts.
OpenGL context
Firstly, I noticed recently that vlc_gl_MakeCurrent()
(eglMakeCurrent()
) can be excessively slow when called from several threads.
For example, here is a dirty hack to re-render (from the UI thread) on resize, with some timing logs:
diff --git a/modules/video_output/opengl/display.c b/modules/video_output/opengl/display.c
index a546ade6a5..39ecdd1264 100644
--- a/modules/video_output/opengl/display.c
+++ b/modules/video_output/opengl/display.c
@@ -223,8 +223,10 @@ static void PictureDisplay (vout_display_t *vd, picture_t *pic)
vout_display_sys_t *sys = vd->sys;
VLC_UNUSED(pic);
+ vlc_tick_t t = vlc_tick_now();
if (vlc_gl_MakeCurrent (sys->gl) == VLC_SUCCESS)
{
+ fprintf(stderr, "==== [%ld] MakeCurrent: %ld\n", vlc_thread_id(), US_FROM_VLC_TICK(vlc_tick_now() - t));
if (sys->place_changed)
{
float window_ar = (float)sys->place.width / sys->place.height;
@@ -257,6 +259,7 @@ static int Control (vout_display_t *vd, int query)
vout_display_PlacePicture(&sys->place, vd->source, &cfg);
sys->place_changed = true;
vlc_gl_Resize (sys->gl, cfg.display.width, cfg.display.height);
+ PictureDisplay(vd, NULL);
return VLC_SUCCESS;
}
For calls to PictureDisplay()
(the display()
callback) from the vout thread, vlc_gl_MakeCurrent()
takes few microseconds or tens of microseconds:
==== [159638] MakeCurrent: 8
==== [159638] MakeCurrent: 10
==== [159638] MakeCurrent: 3
==== [159638] MakeCurrent: 15
==== [159638] MakeCurrent: 15
==== [159638] MakeCurrent: 17
==== [159638] MakeCurrent: 37
==== [159638] MakeCurrent: 41
==== [159638] MakeCurrent: 54
==== [159638] MakeCurrent: 15
But vlc_gl_MakeCurrent()
from the UI thread takes a huge amount of time (unless we disable all prepare()
/display()
calls from the vout thread):
==== [159631] MakeCurrent: 4529
==== [159631] MakeCurrent: 11486
==== [159631] MakeCurrent: 46573
==== [159631] MakeCurrent: 15048
==== [159631] MakeCurrent: 37911
==== [159631] MakeCurrent: 33848
==== [159631] MakeCurrent: 8111
These timings are way longer than the time taken by prepare()
and display()
on the vout thread.
Unless there is a separate problem that can be fixed, it suggests to me that all rendering should always be performed by the same thread (the vout thread).
If this is the case, achieving "frame perfect" implies that, on resize, the UI thread should:
- request the vout thread to render at a new size
- wait until the rendering is complete
In other words, it should not do the rendering itself.
EDIT: In fact, these delays are probably caused by vsync.
Asynchronous requests
This section is just a repost of this comment.
I am reluctant to remove the control queue, because it allows to post asynchronous events without waiting. On the other hand, the "control hold" mechanism forces the caller to wait for the next vout loop.
Some existing calls might not need to wait, either on the decoder thread (
vout_NextPicture()
, I'm not sure if it's necessary to wait) or more importantly on the UI thread.In particular, the following commits made the requests "synchronous" to avoid a ABBA race condition:
VOUT_CONTROL_DISPLAY_FILLED
(200a686b)VOUT_CONTROL_ASPECT_RATIO
(52dfbbea)VOUT_CONTROL_CROP_RATIO
(b820a3cc)VOUT_CONTROL_CROP_WINDOW
(f2d8f922)VOUT_CONTROL_CROP_BORDER
(2213524e)VOUT_CONTROL_ZOOM
(b7fce97f)VOUT_CONTROL_VIEWPOINT
(184939bd)You explain the race here:
T1 sets, say, A/R, which updates the window size. Then T2 does the same. Then T2 sends the A/R control to vout thread, then T1 sends the same.
-> window uses T2 A/R, display uses T1 A/R.
But since the control commands are executed in order, another solution to avoid the race could be to push the control command before releasing the
window_lock
rather than after.That would avoid to block the UI thread waiting for the
display_lock
, the caller don't need to wait for a result anyway.
I have a strong preference not to block the caller (the UI thread) unnecessarily for handling a request/command, so I'd prefer to post these commands on a queue, processed by the vout thread.
Wait for rendering
From what I understand (I did not follow the previous vout discussions, so my understanding might be lacking), we would like "asynchronous display" when possible. Concretely, when supported, the module display
callback is set to NULL
, and the core only calls prepare()
with the picture and the expected time, without waiting.
Here are two complications I have in mind for asynchronous display (just to note them somewhere):
- "frame perfect" requires the UI thread to wait on resize, so we need a mechanism to block until rendering is complete even for asynchronous displays;
- to support "reactive resize", if we don't block until the display date, we must take care not to redraw the current picture at the new size before the date of the picture submitted to
prepare()
(at the old size) (and we probably are not able to "cancel" a submitted picture)
In practice, my feeling is that waiting until the display date from the vout thread for all vout displays is simpler, and I'm not sure about the benefits of asynchronous display, especially if this is only for some vout displays (again, my understanding might be lacking).
Future
I would like to share some vague design I have in mind as a base to implement "reactive resize" and "frame perfect" properly.
I think that the vout thread should run some kind of event/command loop, calling in a loop:
- process all commands/events until the deadline of the next rendering stage;
- perform the next rendering stage (static filters, or SPU, or
prepare()
+display()
, …)
This is more similar to the previous design (that we had in VLC3) than the direction taken for VLC4.
It is also different from the previous VLC3 design, in that each rendering step would be performed in successive "vout loops" (instead of a single call to DisplayPicture()
). As a consequence, the commands could be (optionally?) processed during any "idle" state from the vout thread (between "rendering steps"), not only at the beginning of a new frame/redisplay delay. This should make it easier to abort some preparation of the next frame to re-render immediately the current frame (on resize for example).
The other threads could only:
- submit requests/commands using this event/command queue (instead of acquiring
display_lock
/control_Hold()
to perform actions themselves) - wait until relevant synchronizations points
- the end of event/command processing
- the end of the rendering after the event/command is processed (for "frame perfect")
That way, the rendering state could only be modified by a single thread (rather than sharing this management between several threads, for which benefits?), which should avoid some complexity. My feeling is that this would simplify waiting interruptions and cancelation of prepared pictures (what !324 does), and implementing "reactive resize" and "frame perfect".
That would also avoid clients (in particular from the UI thread) to wait for requests when it's not necessary. And all rendering calls would be performed from the vout thread, even with "frame perfect".
Thank you for sharing your objections/feedbacks.