Render YUV 4:2:2 pictures at full resolution in OpenGL

Context

Uploading YUV 4:2:0 pictures to OpenGL textures and sampling them is straightforward: each plane goes to its own texture, so there are 2 or 3 textures:

  • I420: 1 Y plane at full resolution, 1 U plane and 1 V plane at half resolution (horizontally and vertically)
  • NV12: 1 Y plane at full resolution, 1 packed UV plane at half resolution (horizontally and vertically)
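To illustrate the two layouts above, here is a minimal C sketch computing the dimensions of each texture for a W×H picture. The `plane_dims` struct and the helper names are hypothetical (not VLC API), and the GL formats in the comments (GL_RED, GL_RG) are assumed typical choices, not necessarily what VLC uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one texture: size in texels, bytes per texel. */
struct plane_dims {
    size_t width;
    size_t height;
    size_t bytes_per_texel;
};

/* I420: 3 planes, chroma subsampled by 2 in both directions. */
static void i420_planes(size_t w, size_t h, struct plane_dims out[3])
{
    assert(w % 2 == 0 && h % 2 == 0);
    out[0] = (struct plane_dims){ w, h, 1 };         /* Y, e.g. GL_RED */
    out[1] = (struct plane_dims){ w / 2, h / 2, 1 }; /* U, e.g. GL_RED */
    out[2] = (struct plane_dims){ w / 2, h / 2, 1 }; /* V, e.g. GL_RED */
}

/* NV12: 2 planes, U and V interleaved in one half-resolution plane. */
static void nv12_planes(size_t w, size_t h, struct plane_dims out[2])
{
    assert(w % 2 == 0 && h % 2 == 0);
    out[0] = (struct plane_dims){ w, h, 1 };         /* Y,  e.g. GL_RED */
    out[1] = (struct plane_dims){ w / 2, h / 2, 2 }; /* UV, e.g. GL_RG  */
}
```

In both cases every texel of a given texture holds samples of a single kind, so GL_LINEAR filtering interpolates each component correctly.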

YUY2 pictures (YUV 4:2:2) are more problematic. As noted in #26682 (closed) (!1590 (merged)), VLC currently renders such pictures in OpenGL at half the horizontal resolution.

Semantically, the picture contains 3 planes (Y, U and V), with U and V at half the horizontal resolution:

# This represents the YUV plane components semantically, this is not how they are stored
  Y plane      U plane   V plane
Y0 Y1 Y2 Y3     U0 U2     V0 V2
Y4 Y5 Y6 Y7     U4 U6     V4 V6

These 3 components are stored in a single packed plane:

# YUY2 (packed)
Y0 U0 Y1 V0 Y2 U2 Y3 V2
Y4 U4 Y5 V4 Y6 U6 Y7 V6
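To make the packed layout concrete, this C sketch splits one YUY2 row back into the semantic Y, U and V rows shown earlier. `yuy2_unpack_row` is a hypothetical helper for illustration; for UYVY, the byte order within each 4-byte group would be U Y V Y instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split one YUY2 row (Y0 U0 Y1 V0 Y2 U2 Y3 V2 ...) into a Y row of
 * `width` samples and U, V rows of width / 2 samples each.
 * Each 4-byte group (macropixel) covers 2 horizontal pixels. */
static void yuy2_unpack_row(const uint8_t *src, size_t width,
                            uint8_t *y, uint8_t *u, uint8_t *v)
{
    assert(width % 2 == 0);
    for (size_t i = 0; i < width / 2; i++) {
        const uint8_t *m = src + 4 * i;
        y[2 * i]     = m[0]; /* luma of the even pixel */
        u[i]         = m[1]; /* chroma shared by both pixels */
        y[2 * i + 1] = m[2]; /* luma of the odd pixel */
        v[i]         = m[3];
    }
}
```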

Therefore, this single plane is uploaded to a single OpenGL texture.

Problems

This makes it difficult to extract the Y, U and V values for a given location.

Currently, the texture is uploaded as GL_RGBA (so [Y1 U Y2 V] are mapped to [r g b a]), and the Y2 value is ignored (so half the horizontal resolution is lost): https://code.videolan.org/rom1v/vlc/-/blob/2e44d338763f72a30a7f5631f86d73c2fc58397e/modules/video_output/opengl/interop.c#L271-288
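A small CPU emulation may help see what is lost. This C sketch only models the channel selection described above (pixel x reads texel x / 2 and uses r, never b) and assumes nearest sampling, ignoring the GL_LINEAR filtering actually applied. `current_luma` and `actual_luma` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One texel of a YUY2 row uploaded as GL_RGBA: [r g b a] = [Y1 U Y2 V]. */
struct rgba8 { uint8_t r, g, b, a; };

/* Luma of pixel x as the current sampler sees it (filtering ignored):
 * texel x / 2, channel r only; b (the second luma sample) is never read. */
static uint8_t current_luma(const struct rgba8 *texels, size_t x)
{
    return texels[x / 2].r;
}

/* Luma of pixel x as actually stored in the picture. */
static uint8_t actual_luma(const struct rgba8 *texels, size_t x)
{
    return (x % 2 == 0) ? texels[x / 2].r : texels[x / 2].b;
}
```

Every odd pixel gets the luma of its even neighbor, which is exactly the halved horizontal resolution.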

As a consequence, a YUY2 picture is rendered with less luma detail than a YUV 4:2:0 picture, even though YUV 4:2:2 carries more chroma information. We should fix it.

If we upload the texture as GL_RGBA, then every texel contains two pixels ([Y1 U V] and [Y2 U V]). If we upload the texture as GL_RG, then every texel represents 1 pixel, but the chroma information is split over two texels ([Y1 U] and [Y2 V]).
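The two mappings can be written down explicitly. This C sketch returns, for pixel x of a YUY2 row, the texel index and channel (0..3 for r, g, b, a) holding each component, under each upload format. The `texel_loc` struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Where a component of pixel x lives: texel index and channel (0=r .. 3=a). */
struct texel_loc { size_t texel; int channel; };

/* GL_RGBA: texel t = [Y(2t) U(t) Y(2t+1) V(t)], i.e. 2 pixels per texel. */
static struct texel_loc rgba_luma(size_t x) { return (struct texel_loc){ x / 2, (x % 2) ? 2 : 0 }; }
static struct texel_loc rgba_u(size_t x)    { return (struct texel_loc){ x / 2, 1 }; }
static struct texel_loc rgba_v(size_t x)    { return (struct texel_loc){ x / 2, 3 }; }

/* GL_RG: texel x = [Y(x) U] if x is even, [Y(x) V] if x is odd,
 * so the chroma of pixel x is split across texels x & ~1 and x | 1. */
static struct texel_loc rg_luma(size_t x) { return (struct texel_loc){ x, 0 }; }
static struct texel_loc rg_u(size_t x)    { return (struct texel_loc){ x & ~(size_t)1, 1 }; }
static struct texel_loc rg_v(size_t x)    { return (struct texel_loc){ x | 1, 1 }; }
```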

In both cases, native OpenGL interpolation would not work:

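  • in the first case, Y1 and Y2 are stored in different channels (r and b), so they would be interpolated as two separate signals instead of together as a single luma row
  • in the second case, U and V share the same channel (g), so they would be interpolated together instead of separately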
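Both failure modes can be checked numerically. This C sketch emulates GL_LINEAR filtering with GL_CLAMP_TO_EDGE along one channel of a texture row (texel i centered at normalized coordinate (i + 0.5) / n). It is a simplified 1D model written for illustration, not VLC code; `linear_sample` and `floor_long` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* floor() for the small values used here, without linking libm. */
static long floor_long(float p)
{
    long i = (long)p; /* truncates toward zero */
    return ((float)i > p) ? i - 1 : i;
}

/* Emulated GL_LINEAR + GL_CLAMP_TO_EDGE along one channel of n texels,
 * at normalized coordinate s. */
static float linear_sample(const float *channel, size_t n, float s)
{
    assert(n > 0);
    float p = s * (float)n - 0.5f;
    long i0 = floor_long(p);
    float f = p - (float)i0;
    long last = (long)n - 1;
    long a = i0 < 0 ? 0 : (i0 > last ? last : i0);
    long b = i0 + 1 < 0 ? 0 : (i0 + 1 > last ? last : i0 + 1);
    return (1.0f - f) * channel[a] + f * channel[b];
}
```

With a luma row 10 20 30 40 uploaded as GL_RGBA, the r channel holds {10, 30}: sampling it at the center of pixel 1 (s = 0.375) yields 15 instead of 20. With GL_RG, the g channel holds {U0, V0, U2, V2}: with U0 = 100 and V0 = 200, sampling halfway between the first two texels (s = 0.25) yields 150, which is neither a valid U nor a valid V.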

Solution?

I think we should add a specific case for packed YUV 4:2:2 (YUY2 and UYVY) (yet another vlc_gl_sampler_ops instance):

  • the texture GL_TEXTURE_*_FILTER would be set to GL_NEAREST (instead of GL_LINEAR) to disable native interpolation
  • the texture would be GL_RGBA (so each texel contains [Y1 U Y2 V])
  • the generated vlc_texture(vec2 tex_coords) GLSL function would access 4 pixels (i.e. 2 texels) and perform the linear interpolation "manually"
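A minimal sketch of that manual interpolation, emulated in C on the CPU rather than written in GLSL, horizontal direction only (vertical interpolation between two rows would follow the same pattern and is omitted). It assumes each chroma sample sits at the center of its texel, which may not match the actual chroma siting of the source. `fetch` stands in for a nearest-filtered texel fetch; all names are hypothetical.

```c
#include <assert.h>

/* One GL_RGBA texel of a YUY2 row: [r g b a] = [Y(2t) U(t) Y(2t+1) V(t)]. */
struct texel { float r, g, b, a; };
struct yuv { float y, u, v; };

/* floor() for the small values used here, without linking libm. */
static long floor_long(float p)
{
    long i = (long)p; /* truncates toward zero */
    return ((float)i > p) ? i - 1 : i;
}

/* Nearest fetch of texel t, clamped to the edge (like GL_NEAREST). */
static struct texel fetch(const struct texel *row, long n, long t)
{
    if (t < 0) t = 0;
    if (t > n - 1) t = n - 1;
    return row[t];
}

/* Luma of pixel i, clamped to [0, 2n): even pixels in r, odd pixels in b. */
static float luma_at(const struct texel *row, long n, long i)
{
    if (i < 0) i = 0;
    if (i > 2 * n - 1) i = 2 * n - 1;
    struct texel t = fetch(row, n, i / 2);
    return (i % 2 == 0) ? t.r : t.b;
}

/* Sample a row of n texels (2n pixels) at pixel coordinate x,
 * pixel i being centered at i + 0.5. */
static struct yuv sample_yuy2(const struct texel *row, long n, float x)
{
    assert(n > 0);

    /* Luma: interpolate between the 2 nearest luma samples. */
    float p = x - 0.5f;
    long i0 = floor_long(p);
    float fy = p - (float)i0;
    float y = (1.0f - fy) * luma_at(row, n, i0) + fy * luma_at(row, n, i0 + 1);

    /* Chroma (assumed centered in each texel): interpolate between texels. */
    float q = x / 2.0f - 0.5f;
    long t0 = floor_long(q);
    float fc = q - (float)t0;
    struct texel c0 = fetch(row, n, t0);
    struct texel c1 = fetch(row, n, t0 + 1);

    struct yuv out = {
        y,
        (1.0f - fc) * c0.g + fc * c1.g,
        (1.0f - fc) * c0.a + fc * c1.a,
    };
    return out;
}
```

Away from the edges, the two luma samples and the two chroma texels always fall within the same pair of texels t0 and t0 + 1, which matches the "4 pixels (i.e. 2 texels)" estimate above.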

What do you think?

Alternatively, we could upload the same picture twice (😱), once in GL_RG to access the Y components, once in GL_RGBA to access the UV components, and keep the native OpenGL interpolation.
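To check that the double upload would give correct results with native filtering, this C sketch views the same YUY2 bytes with 2 bytes per texel (the GL_RG view) and with 4 bytes per texel (the GL_RGBA view), and emulates GL_LINEAR with GL_CLAMP_TO_EDGE on each. It does not call OpenGL; `view_channel` and `view_linear` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* floor() for the small values used here, without linking libm. */
static long floor_long(float p)
{
    long i = (long)p; /* truncates toward zero */
    return ((float)i > p) ? i - 1 : i;
}

/* Channel c of texel t when the row is viewed with k bytes per texel
 * (k = 2 for the GL_RG view, k = 4 for the GL_RGBA view). */
static uint8_t view_channel(const uint8_t *row, size_t k, size_t t, size_t c)
{
    assert(c < k);
    return row[k * t + c];
}

/* Emulated GL_LINEAR + GL_CLAMP_TO_EDGE along channel c of a view of
 * n texels, at normalized coordinate s (texel t centered at (t + 0.5) / n). */
static float view_linear(const uint8_t *row, size_t k, size_t n, size_t c, float s)
{
    assert(n > 0);
    float p = s * (float)n - 0.5f;
    long t0 = floor_long(p);
    float f = p - (float)t0;
    long last = (long)n - 1;
    long a = t0 < 0 ? 0 : (t0 > last ? last : t0);
    long b = t0 + 1 < 0 ? 0 : (t0 + 1 > last ? last : t0 + 1);
    return (1.0f - f) * view_channel(row, k, (size_t)a, c)
         + f * view_channel(row, k, (size_t)b, c);
}
```

Both views cover the same picture, so the same normalized coordinate addresses the same location: for a 4-pixel row, s = 0.5 reads luma from channel 0 of the GL_RG view (4 texels) and U, V from channels 1 and 3 of the GL_RGBA view (2 texels). The price is twice the upload bandwidth and texture memory.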

Edited by Romain Vimont