
qt: implement denoting viewport in dual kawase blur

This makes it possible to clip the result in the layer stage itself, rather than through post-processing (be it a stencil mask, or even the fragment shader, if you consider that post-processing).

As a corollary, depending on how large the excess area is, this can save a lot of video memory. The relative saving is larger in two-pass mode, because in four-pass mode we are not back-propagating the viewport to the intermediate layers (at least for now). As a reminder, two-pass mode has only one layer, the final layer, while four-pass mode has two intermediate layers and one final layer. However, with the recent static optimization hint work, which can be applied to the intermediate layers in four-pass mode, back-propagating the viewport should not be necessary (it could still reduce peak video memory consumption, but that is less important anyway).
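A rough way to see why the relative saving differs between the two modes is to count texture bytes per layer. The sketch below is not VLC code and rests on hypothetical assumptions: every layer is an RGBA8 texture (4 bytes per pixel), the final layer has the full resolution of the area it covers, and the two intermediate layers of four-pass mode are downscaled to 1/2 and 1/4 of the whole source. The real DualKawaseBlur layer sizes may differ. What it does reflect from the description is that only the final layer follows the viewport, since the viewport is not back-propagated to the intermediate layers.

```python
"""Rough layer memory estimate for a dual kawase blur (illustrative sketch).

Hypothetical assumptions, not taken from the actual implementation:
  * every layer is an RGBA8 texture, i.e. 4 bytes per pixel;
  * the final layer has full resolution over the area it covers;
  * in four-pass mode the two intermediate layers are downscaled copies of
    the *whole* source, because the viewport is not back-propagated to them.
"""

BYTES_PER_PIXEL = 4  # assumed RGBA8
INTERMEDIATE_SCALES = (0.5, 0.25)  # hypothetical downscale factors


def layer_bytes(width: int, height: int, scale: float = 1.0) -> int:
    """Bytes used by one layer texture of the given size and scale."""
    w = max(1, round(width * scale))
    h = max(1, round(height * scale))
    return w * h * BYTES_PER_PIXEL


def blur_bytes(source: tuple, final_area: tuple, four_pass: bool) -> int:
    """Total layer memory: the final layer plus intermediates (four-pass only)."""
    total = layer_bytes(*final_area)
    if four_pass:
        # Intermediate layers always cover the full source in this model.
        total += sum(layer_bytes(*source, s) for s in INTERMEDIATE_SCALES)
    return total


def relative_saving(source: tuple, viewport: tuple, four_pass: bool) -> float:
    """Fraction of layer memory saved by sizing the final layer to the viewport."""
    full = blur_bytes(source, source, four_pass)  # e.g. clip: true, final layer covers everything
    clipped = blur_bytes(source, viewport, four_pass)  # final layer covers only the viewport
    return (full - clipped) / full


if __name__ == "__main__":
    source, viewport = (1000, 1000), (1000, 400)
    print(f"two-pass:  {relative_saving(source, viewport, False):.1%}")
    print(f"four-pass: {relative_saving(source, viewport, True):.1%}")
```

With these assumed numbers, a viewport covering 40% of a 1000x1000 source saves 60.0% of the layer memory in two-pass mode but only about 45.7% in four-pass mode, because the full-size intermediate layers are left untouched.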

One might ask whether this would be a problem for static sources, where we could apply the blur once, show the relevant part by clipping, and save some computation. First, that is only relevant when the effect scale does not change (which is not a use case for us at the moment). Second, it is arguably better to save video memory than the GPU cycles spent on the resulting re-blurring. Third, if wanted, that approach can still be used: set clip: true instead of the new viewport property. It should be noted that since we support live blurring (such as item layers), continuous re-blurring must be tolerable anyway; modern GPUs are good at that, but they still lack adequate video memory.

There are two use cases for this at the moment:

  1. Artists page, where we use Qt's clip: true to clip the excess part. However, this means the final layer still contains the excess part, consuming video memory unnecessarily. If Qt uses a stencil mask for clipping, early stencil testing at least keeps the fragment shader from running for the obscured fragments. The viewport approach achieves the same by getting rid of the excess fragments entirely, but with less video memory consumption. In addition, not having a clip node should save the batch renderer some work.

  2. Player page, where we do not even bother with clip: true, because the window itself naturally clips the content. The relative memory saving is smaller there because we use the four-pass configuration. With the static optimization hint, the relative saving should be the same for the two-pass and four-pass modes.
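In both cases, the useful viewport is the part of the blur item that survives clipping: the intersection of the item's rectangle with the clipping rectangle (the ancestor with clip: true on the Artists page, the window on the Player page). The sketch below computes that intersection, assuming both rectangles are already expressed in the blur item's coordinate space; in QML such a mapping could be obtained with Item.mapFromItem(). The Rect type and the visible_viewport() function are hypothetical helpers for illustration, not part of DualKawaseBlur.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned rectangle; (x, y) is the top-left corner."""
    x: float
    y: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height


def visible_viewport(item: Rect, clip: Rect) -> Optional[Rect]:
    """Return the part of `item` left visible by `clip` (same coordinate space).

    Returns None when nothing is visible, i.e. no layer would be needed at all.
    """
    left = max(item.x, clip.x)
    top = max(item.y, clip.y)
    right = min(item.right, clip.right)
    bottom = min(item.bottom, clip.bottom)
    if right <= left or bottom <= top:
        return None
    return Rect(left, top, right - left, bottom - top)


if __name__ == "__main__":
    # A blur item wider and taller than the window: only the window area is kept.
    print(visible_viewport(Rect(-100, 0, 2000, 1000), Rect(0, 0, 1280, 720)))
```

Everything outside the returned rectangle is what clip: true would discard only after it had already been rendered into the final layer; sizing the layer to this rectangle avoids allocating it in the first place.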

Some additional points:

  • With nvidia-smi, I observed a video memory reduction of up to ~20 MiB on the Artists page (with a large window width).
  • The layers (and their textures) can be debugged with GammaRay, which has been really helpful to me. When designing DualKawaseBlur, I made sure it could be debugged nicely with GammaRay.
  • Maybe with !7667 (merged), we can close #26908.
Edited by Fatih Uzunoğlu