Skip to content

vout: synchronize GLX and EGL/X11 window sizes with gl surface sizes

Jeffrey Knockel requested to merge jeffk/vlc:glx-x11-sync into master

Fixes #26734 (closed) for GLX and EGL.

When the output X11 window is resized, the window's size can become out of sync with the GL drawable's size. When the GL surface is smaller than the X11 window, since OpenGL's origin is at the bottom left, the contents do not span the entire X11 window, and a black border can appear above the output contents (see issue #26734 (closed)).

To fix this, GLX and EGL now embed their own X11 windows, whose sizes they keep in sync with the GL surface's size. Per !1534 (comment 311060), we create windows in the style of commit eb713edf with same depth and visual as the root window. Unfortunately, unlike in !1534 (merged), I don't think we can prevent creating windows this time, as each video output needs control over when their windows are resized.

In the case of GLX, why both glXWaitX() and XSync() after XResizeWindow()? Isn't one of those redundant? On Nvidia, glXWaitX() seems enough to wait until the window is resized, but Mesa seems to require an XSync() [1,2]. I don't know which implementation is right, as glXWaitX() is documented as waiting until "rendering" [3] operations are finished, and someone might argue that XResizeWindow() isn't a rendering operation. Firefox decided to probe [4] whether the OpenGL driver is Mesa to decide whether to glXWaitX() versus XSync(), but instead in these changes we call glXWaitX() and check if Xlib knows if the X server has already processed the resize. If not, we XSync().

In the case of EGL, we have an additional wrinkle where we must also caloo eglQuerySurface(EGL_HEIGHT) after XResizeWindow(). According to the EGL spec, the EGL surface is resized as follows [5]:

"If the native window corresponding to surface has been resized prior to the swap, surface must be resized to match. surface will normally be resized by the EGL implementation at the time the native window is resized. If the implementation cannot do this transparently to the client, then eglSwapBuffers must detect the change and resize surface prior to copying its pixels to the native window."

Unfortunately we don't have direct control over the surface size except by resizing the window and trying to induce the OpenGL driver to notice. Mesa's intent appears to be to check for resize on glClear() [6], which wouldn't be so bad on its own, but here is my (probably faulty) understanding of how Mesa implements this behavior:

  • If you try to draw on a buffer that is invalid, Mesa will validate it. Other actions such as eglMakeCurrent() will also validate a buffer if it is invalid.
  • Typically a surface buffer is invalidated on eglSwapBuffer().
  • During validation of an invalid buffer, this is when Mesa checks to see if its corresponding window has been resized. Once a buffer has been validated, it is no longer invalid until it is invalidated again (e.g., by an eglSwapBuffer()).

In a program where a thread calls eglMakeCurrent() once and then holds it for its lifespan, then there won't be repeated calls to eglMakeCurrent() altering buffer validity. In VLC, we call MakeCurrent() and ReleaseCurrent() as needed, however. And we need to call MakeCurrent() in Resize() before XResizeWindow() because eglWaitClient(), which requires a context, is needed before XResizeWindow() for synchronization. Thus, with the MakeCurrent() call before XResizeWindow(), we seem forced to have Mesa validate the surface buffers before XResizeWindow(), and so glClear() will never trigger buffer validation after a resize, and we will be drawing to a surface the wrong size.

A workaround I've discovered appears to be to call eglQuerySurface(EGL_HEIGHT) (or eglQuerySurface(EGL_WIDTH)) after XResizeWindow(). This triggers the Mesa driver to make an X geometry request to the X server, wait for its response, and then update the surface size with its results as a side effect. Nvidia doesn't need the eglQuerySurface(EGL_HEIGHT) hack.

Finally, is Mesa buggy? I don't know. I had very little understanding of Mesa's internal mechanics going into this, and I likely still have a far from perfect understanding of it. While this behavior violates my expectations, it's not clear to me that Mesa's behavior is in violation of the EGL spec. However, I do wonder if the Mesa folks intended for releasing and making-current a context to have such a visible side effect (or for calling eglQuerySurface(EGL_HEIGHT) to have visible side effects either for that matter).

If we don't want to include the eglQuerySurface(EGL_HEIGHT) hack, and if we think that this is Mesa's bug, I am amiable to removing that line from this merge request and reporting this issue to Mesa's issue tracker.

P.S. In the process of debugging this Mesa issue I developed a small test case, which I've attached in case anyone is interested. On Mesa, gcc -DENABLE_BUG=1 eglbug.c -o eglbug -lX11 -lEGL -lGL && ./eglbug should have a black bar on top of the third frame, whereas gcc -DENABLE_BUG=0 eglbug.c -o eglbug -lX11 -lEGL -lGL && ./eglbug should not. (In neither case would I expect Nvidia to have a black bar on the third frame.)

Edited by Jeffrey Knockel

Merge request reports