Commits · f793fc0480f · VideoLAN / libplacebo

May 28, 2020

shaders/colorspace: implement ITU-R BT.2390 · f793fc04

Niklas Haas authored 4 years ago

We make this the default tone mapping function because it's the de-facto
standard in the industry. Unfortunately, it's quite a bit heavier than
the other algorithms due to the extra PQ round trip needed during tone
mapping.

It's entirely possible that we could make the decision of whether to do
things in PQ space or in linear light completely independent of
the tone mapping function itself, since arguably PQ's "perceptual
uniformity" quality makes it a suitable space to do tone mapping in
regardless of what function we use.

That being said, I don't currently want to consider the headache of
testing this all, so let's just implement it for BT.2390 and call it a
day.

f793fc04

May 27, 2020

vulkan: only accept 10-bit for non-linear colorspaces · 55fde14a
Niklas Haas authored 4 years ago
```
The code as written allows 10-bit+linear, which is a bad combination. So
explicitly ban it.
```
55fde14a

May 26, 2020

colorspace: fix default chroma location · 0c19ed29

Niklas Haas authored 4 years ago

Simple oversight. This should be PL_CHROMA_LEFT, not PL_CHROMA_TOP_LEFT.
Our own documentation gets it right, I just had the wrong name in my
head when writing the code.

0c19ed29

shaders/custom: implement OFFSET ALIGN · 430aa2f2
Niklas Haas authored 4 years ago
```
Tested on KrigBilateral. Hopefully not too terribly broken.

Fixes #88
```
430aa2f2

shaders/custom: fix rects for cropped inputs · aa73a1a3

Niklas Haas authored 4 years ago

Rather than using the `params->rect`, we should generally be ignoring it
in favor of the raw texture dimensions, and only update the rect
accordingly.

aa73a1a3

colorspace: treat PL_CHROMA_UNKNOWN as _TOP_LEFT · 04d258da

Niklas Haas authored 4 years ago

A lot of subsampled content out there is untagged, but should be treated
as _TOP_LEFT content (the de-facto standard chroma subsampling mode).
However, we effectively treat _UNKNOWN as PL_CHROMA_CENTER. To fix this,
make pl_chroma_location_offset explicitly default the chroma location.

Since a lot of users currently just call that function on the chroma
planes always (regardless of subsampling), introduce a new helper
function `pl_image_set_chroma_location` to only set the plane shift for
actually subsampled planes.

Annoying API break, but it is what it is.
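
A rough sketch of the behaviour described above, using stand-in types
rather than the real libplacebo API. The enum, the struct, the
`subsampled` flag and the half-sample offsets are assumptions made for
illustration. The default used here is "left", following the correction
in 0c19ed29.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the library's chroma location enum (hypothetical names) */
enum chroma_loc {
    CHROMA_UNKNOWN = 0,
    CHROMA_LEFT,
    CHROMA_CENTER,
    CHROMA_TOP_LEFT,
    CHROMA_TOP_CENTER,
    CHROMA_BOTTOM_LEFT,
    CHROMA_BOTTOM_CENTER,
};

/* Compute the chroma sample offset, defaulting untagged content */
static void chroma_offset(enum chroma_loc loc, float *x, float *y)
{
    assert(x && y);
    *x = *y = 0.0f;
    if (loc == CHROMA_UNKNOWN)
        loc = CHROMA_LEFT; /* explicit default instead of "center" */

    switch (loc) {
    case CHROMA_LEFT:
    case CHROMA_TOP_LEFT:
    case CHROMA_BOTTOM_LEFT:
        *x = -0.5f;
        break;
    default:
        break;
    }

    switch (loc) {
    case CHROMA_TOP_LEFT:
    case CHROMA_TOP_CENTER:
        *y = -0.5f;
        break;
    case CHROMA_BOTTOM_LEFT:
    case CHROMA_BOTTOM_CENTER:
        *y = 0.5f;
        break;
    default:
        break;
    }
}

/* Simplified plane: only the fields needed for this sketch */
struct plane {
    bool subsampled;
    float shift_x, shift_y;
};

/* Only touch the shift of planes that are actually subsampled */
static void image_set_chroma_location(struct plane *planes, int num,
                                      enum chroma_loc loc)
{
    for (int i = 0; i < num; i++) {
        if (planes[i].subsampled)
            chroma_offset(loc, &planes[i].shift_x, &planes[i].shift_y);
    }
}
```

Callers that previously applied the offset to every chroma plane would
otherwise start shifting non-subsampled (4:4:4) planes once the default
is no longer "center", which is why the per-plane check lives in the
helper.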

04d258da

May 25, 2020

tests/context: test more utility functions · 75e9fdb7
Niklas Haas authored 4 years ago
```
Yay coverage
```
75e9fdb7

ci: remove tests and generated code from coverage · d44b0eee

Niklas Haas authored 4 years ago

No point in testing test code that's guaranteed to run, nor generated
code that's full of boilerplate switch/case statements.

d44b0eee

dummy: make limits.max_group_threads sane · 4cc3bff9

Niklas Haas authored 4 years ago

This was UINT32_MAX, which explodes some of the shader logic. Pick
something more reasonable.

4cc3bff9

demos/video-filtering: add timer support · 4efb3f36
Niklas Haas authored 4 years ago
```
They can be interesting to look at.
```
4efb3f36

vulkan: don't try retrieving pending timers · 70f18499

Niklas Haas authored 4 years ago

I think that in theory, vkGetQueryPoolResults shouldn't have been
allowed to report VK_SUCCESS in this case, but at least AMDVLK still
does so. Explicitly check to see if the timer is pending before
attempting to read from it.

70f18499

demos/video-filtering: update comments · d872c73c

Niklas Haas authored 4 years ago

These performance figures are quite a bit out of date. Many things have
changed in the meantime. Nuke the old results and re-measure them on
latest git master versions.

d872c73c

demos/video-filtering: update for API changes · e1e868ca
Niklas Haas authored 4 years ago
```
This was never updated for the pl_dispatch changes.
```
e1e868ca

shaders: make sh_try_compute validate group sizes · dc0812e9
Niklas Haas authored 4 years ago

dc0812e9

dispatch: silently upgrade fragment->compute · 3335f047

Niklas Haas authored 4 years ago

Whenever PL_GPU_CAP_PARALLEL_COMPUTE is set. In this case, it's assumed
that dispatching multiple compute shaders can take advantage of
parallel execution of the compute queues.

3335f047

renderer: fix plane pass dimensions · 9faf55d2

Niklas Haas authored 4 years ago

Turns out the correct handling of img.w/h isn't as simple as setting it
to either "the plane size" or "the src_rect size". We have to set it to the
plane size for the plane shaders, but the src_rect size for the plane
scalers. How annoying. Maybe this field needs to be reworked/figured out
somehow.

9faf55d2

renderer: don't crash when img_tex fails · 61d0d5c3
Niklas Haas authored 4 years ago
```
Ensure `img` is always in a valid state, even on failure.
```
61d0d5c3

renderer: actually apply AV1 grain · c2400492

Niklas Haas authored 4 years ago

The refactors in 525279eb broke this, by beginning a shader but never
actually dispatching it.

As an aside, also handle failures slightly more robustly.

c2400492

renderer: fix FBO indirection logic · 3bc3d608

Niklas Haas authored 4 years ago

6725b941 was too aggressive, and also not quite the correct thing to be
reverting. Re-add the assert and relax the FBO indirection check to test
whether the shader is actually resizable or not.

3bc3d608

Revert "renderer: assert if SAMPLER_NOOP implies scaling" · 6725b941
Niklas Haas authored 4 years ago
```
This reverts commit 58e588cd.

Required by the previous revert.
```
6725b941

Revert "shaders: make sh_subpass slightly stricter" · 7ec6f58b

Niklas Haas authored 4 years ago

This reverts commit 00127130.

This commit is a regression in performance since it breaks merging of
chroma scalers.

7ec6f58b

renderer: reorder code (style) · 357f9323

Niklas Haas authored 4 years ago

Generate the shader header etc. after figuring out all the
plane-specific stuff. Makes a bit more sense in this order.

357f9323

renderer: fix dimensions of intermediate plane tex · 25c33a50
Niklas Haas authored 4 years ago
```
This is after scaling.
```
25c33a50

README: document optional dependencies a bit better · 739d650c

Niklas Haas authored 4 years ago

Now that `python3-mako` represents the first nontrivial dependency of a
main feature (e.g. `vulkan`), document this list properly.

Formatting could be bikeshed, but whatever. I just wanted to have a
list out there.

739d650c

vulkan: support timers on async transfer queues · cd05aa7e

Niklas Haas authored 4 years ago

This requires using host query resets, which requires a new extension.
The extension in question is also promoted to vulkan API version 1.2,
but for some reason, loading the function pointer under the old name
fails, even though the extension text seems to suggest that it should be
available under the new name as well. (But I think this might be a
loader bug). Work around it by just annoyingly introducing the concept
of function aliases.

Side note, the validation layers think this is an error because they're
too old to know about host query resets. I think this commit gives the
term "bleeding edge" a new meaning.
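
The "function aliases" idea can be sketched roughly as follows: try a
list of names in order and keep the first one the loader resolves. The
loader here is a mock, not the Vulkan loader, and which of the two names
it resolves is an arbitrary choice made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*fn_ptr)(void);
typedef fn_ptr (*loader_fn)(const char *name);

/* Try each name in a NULL-terminated list, return the first hit */
static fn_ptr load_first(loader_fn load, const char *const *names)
{
    assert(load && names);
    for (; *names; names++) {
        fn_ptr fn = load(*names);
        if (fn)
            return fn;
    }
    return NULL;
}

/* Mock entry point and mock loader, for illustration only */
static void dummy_reset_query_pool(void) {}

static fn_ptr mock_loader(const char *name)
{
    if (strcmp(name, "vkResetQueryPool") == 0)
        return dummy_reset_query_pool;
    return NULL; /* every other name is unknown to this mock */
}
```

With a table of aliases per function, the rest of the code can keep
calling a single pointer regardless of which name the driver or loader
actually exposes.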

cd05aa7e

vulkan: add pNext chain boilerplate for physical device features · 6076fcaf

Niklas Haas authored 4 years ago

We now "officially" support enabling arbitrary extra device features,
including both features requested by the user and features needed by us.

It's about time for me to write the shitty boilerplate to link and
memdup chains and stuff. Using generated code to get the sizeof()
unknown structs, because how the heck else would you copy over a pNext
chain while only modifying the values you care about?

Still using shitty hacks for the features whitelisting. I hope they
never change the structure of these arrays as just being a list of
VkBool32 values. (But in theory we could just generate code for this, in
the worst case....)
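
For illustration, copying a pNext-style chain with a per-type size table
might look roughly like this. The structs and `struct_size()` below are
mock stand-ins (no Vulkan headers are used); in the real code the size
lookup is what the generated boilerplate would provide.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mock structure types; every struct starts with sType + pNext */
enum stype { STYPE_FEAT_A = 1, STYPE_FEAT_B = 2 };

struct base   { enum stype sType; void *pNext; };
struct feat_a { enum stype sType; void *pNext; int flag; };
struct feat_b { enum stype sType; void *pNext; int flag1, flag2; };

/* Hand-written here; stands in for a generated sizeof() table */
static size_t struct_size(enum stype type)
{
    switch (type) {
    case STYPE_FEAT_A: return sizeof(struct feat_a);
    case STYPE_FEAT_B: return sizeof(struct feat_b);
    }
    return 0; /* unknown struct: cannot be copied */
}

static void chain_free(void *chain)
{
    struct base *cur = chain;
    while (cur) {
        struct base *next = cur->pNext;
        free(cur);
        cur = next;
    }
}

/* Deep-copy a chain; returns NULL on unknown types or allocation failure */
static void *chain_dup(const void *in)
{
    struct base *head = NULL, *last = NULL;
    for (const struct base *cur = in; cur; cur = cur->pNext) {
        size_t size = struct_size(cur->sType);
        struct base *copy = size ? malloc(size) : NULL;
        if (!copy) {
            chain_free(head);
            return NULL;
        }

        memcpy(copy, cur, size);
        copy->pNext = NULL;
        if (last) {
            last->pNext = copy;
        } else {
            head = copy;
        }
        last = copy;
    }
    return head;
}
```

Once the chain is owned by the copy, individual fields can be modified
without touching the caller's structs.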

6076fcaf

vulkan: generate boilerplate instead of hand-writing it · dca1913c

Niklas Haas authored 4 years ago

Turns out I need to generate even more boilerplate than just this, so
just write a shitty python script (inspired by RADV/mesa) to do the job.

Makes python part of the build dependency of libplacebo, but meson
already depends on python so nothing has changed.

The CI URL for aarch64 needs to be updated to pull in python3-mako.

dca1913c

May 24, 2020

renderer: refactor pl_render_target.dst_rect · 65e5e17e

Niklas Haas authored 4 years ago

This is now pl_rect2df instead of pl_rect2d, to make it easier to use
the pl_rect2df_aspect* series of functions, especially without requiring
the hacky rounding integer versions of them. Delete those and add some
needed helper functions as well.

Rewrite the fix_rects code to crop `src_rect` for any fractional
offset in the `dst_rect`, and also for regions of the `dst_rect` that
lie outside the target fbo.

Also fix a bug in the img->w/h calculation for cropped planes.
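
A simplified sketch of the out-of-bounds part of this: clip `dst_rect`
to the FBO and crop `src_rect` by the proportional amount. The struct
and function names are illustrative only, rects are assumed to be
non-flipped, and the fractional-offset handling is left out.

```c
#include <assert.h>

struct rect2df { float x0, y0, x1, y1; };

/* Clip one axis of dst to [0, limit], cropping src proportionally */
static void clip_axis(float *d0, float *d1, float *s0, float *s1,
                      float limit)
{
    assert(*d1 > *d0);
    float scale = (*s1 - *s0) / (*d1 - *d0); /* src units per dst unit */

    if (*d0 < 0.0f) {
        *s0 -= *d0 * scale; /* d0 is negative, so this moves s0 inward */
        *d0 = 0.0f;
    }

    if (*d1 > limit) {
        *s1 -= (*d1 - limit) * scale;
        *d1 = limit;
    }
}

static void clip_rects(struct rect2df *src, struct rect2df *dst,
                       float fbo_w, float fbo_h)
{
    clip_axis(&dst->x0, &dst->x1, &src->x0, &src->x1, fbo_w);
    clip_axis(&dst->y0, &dst->y1, &src->y0, &src->y1, fbo_h);
}
```

For example, a 400-wide `dst_rect` hanging 100 pixels off each side of a
200-wide FBO, fed from an 800-wide `src_rect`, ends up sampling only the
middle 400 source pixels.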

65e5e17e

shaders/sampling: allow taking the sampler2D as an argument · 87ccba7d

Niklas Haas authored 4 years ago

Requested by VLC, which wants to abstract the texture binding and
coordinates (vertex attribute) away from the actual shader doing the
scaling.

This requires adding a new type of shader signature,
PL_SHADER_SIG_SAMPLER2D, and also extending pl_sample_src to allow
specifying samplers in this way.

The main glaring note here is that I realized the compute shader does
some hacks w.r.t. the texture coordinate which do not actually work
in a general sense, since it relies on the mapping logic being performed
by the pl_dispatch. That being said, it's not entirely clear how vertex
attributes should work at all for compute shaders like this.

It's entirely possible we may need to work around this either by having
the thread 0 in the work group broadcast its position to the rest of
the work group (instead of abusing tex_map), or alternatively, we could
maybe move some of the pl_dispatch simulation code from the dispatch
mechanism to the actual shader binding mechanism, so that generated
compute shaders won't have vertex attributes to begin with.

87ccba7d

common: add aspect ratio helper code · a41981fa

Niklas Haas authored 4 years ago

This is nontrivial and needed often enough that providing
helpers makes a lot of sense. Add some extra helpers that come up when
rendering to sub-rects of targets.

The only annoying thing here is the mismatch between pl_rect2df and
pl_rect2d. Maybe I can come up with a better API here?

Also update the sdl2 demo to actually preserve the aspect ratio, as well
as add some test cases to the new helper functions.
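
As a rough idea of what such a helper does, the sketch below shrinks a
float rect around its center until it has the requested aspect ratio.
The names are placeholders and not the functions added by this commit.
Rects are assumed to have positive width and height.

```c
#include <assert.h>

struct rect2df { float x0, y0, x1, y1; };

/* Width divided by height */
static float rect_aspect(const struct rect2df *rc)
{
    return (rc->x1 - rc->x0) / (rc->y1 - rc->y0);
}

/* Shrink rc around its center so that width / height == aspect */
static void rect_aspect_set(struct rect2df *rc, float aspect)
{
    float w = rc->x1 - rc->x0, h = rc->y1 - rc->y0;
    assert(w > 0.0f && h > 0.0f && aspect > 0.0f);

    float cx = 0.5f * (rc->x0 + rc->x1);
    float cy = 0.5f * (rc->y0 + rc->y1);

    if (w / h > aspect) {
        w = h * aspect; /* too wide: pillarbox */
    } else {
        h = w / aspect; /* too tall: letterbox */
    }

    rc->x0 = cx - 0.5f * w;
    rc->x1 = cx + 0.5f * w;
    rc->y0 = cy - 0.5f * h;
    rc->y1 = cy + 0.5f * h;
}
```

Working in floats avoids having to decide how to round the resulting
edges at this stage, which is the awkward part of an integer version.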

a41981fa

shader/sampling: use larger group size for polar sampling · f74ad3c8

Niklas Haas authored 4 years ago

I re-benchmarked this and determined that larger group sizes are
actually faster these days, so just use however many as possible.

The horizontal width of 32 still seems to be pretty decent.

f74ad3c8

tests/bench: delete some less interesting benchmarks · 2f5b2f56
Niklas Haas authored 4 years ago
```
These consume time without really telling us anything useful.
```
2f5b2f56

May 22, 2020

renderer: aggressively re-use FBOs · 39df2eb6

Niklas Haas authored 4 years ago

As an addendum to f3a07a, this quenches all concerns by making sure we
re-use same-sized FBOs wherever possible. The new code should be
strictly better than even the old code, in terms of minimizing FBO
resizes.

It is not yet, however, optimal in the sense of minimizing FBO residency
for FBOs that could be aliased. Doing that would require refcounting
FBOs or something. (Which I guess isn't too difficult to accomplish, so
maybe I should give it a try?)

That being said, aliasing FBOs might break cross-frame optimizations,
which would only end up necessitating us introducing other tricks like
rotating between different pl_renderer instances, thus defeating the
gains. Would have to be tested anyway to see if aliasing FBOs actually
gains more performance than it loses. (And the main benefit would be
gaining VRAM, anyway)

Reduces some code ugliness as a side benefit.

39df2eb6

shaders: make sh_subpass slightly stricter · 00127130

Niklas Haas authored 4 years ago

The current logic allows resizeable parents to become non-resizeable,
which is a big no-no since resizeable parents are almost definitely
intended for a framebuffer size that has nothing at all to do with the
subpass.

To fix this, only allow merging resizeable shaders with subpasses that
are also resizeable.

00127130

renderer: don't hard-error when sh_subpass fails · 38b875d9
Niklas Haas authored 4 years ago
```
As an aside, we also make sh_subpass not explicitly spam/fail the parent
shader in this case.
```
38b875d9

renderer: completely refactor struct img · 525279eb

Niklas Haas authored 4 years ago

Rather than this merely representing an "in-flight" image, with the
img->sh only living for as long as this exists between different pass
types, `img` is now conceptually persistent and in one of two
modes: `sh`, or `tex`. This allows us to, in principle, avoid doing
redundant FBO roundtrips for cases where the previous pass thinks the
next pass needs a tex but the next pass actually needs a sh, such as is
currently the case for the AV1 grain shader.

Since `pass_hook` in particular can randomly mutate `img` to either of
the two forms, callers must now be somewhat vigilant to make sure they
always use `img_tex()` and `img_sh()` to access the "current"
shader/texture, rather than relying on local variables staying
persistent.

The use of locally initialized pl_shader is now exclusive to passes that
keep their own pl_dispatch_begin calls (for various reasons).

525279eb

renderer: use a dynamic list of FBOs instead of hard-coding them · fae66ede

Niklas Haas authored 4 years ago

The current approach was to pair each FBO with its use, "semantically".
The intent was to minimize the number of "reallocations" that would be
required if the number of passes changed dynamically (e.g. as the
renderer options changed). However, this is not only an unrealistic
design goal to optimize for (users can use separate pl_renderers for
wildly different purposes, and for a single conceptual video stream it
doesn't really matter), but also, it gets in the way of a planned
refactor I have concerning `struct img`.

Change this to make all FBOs dynamically allocated. The current
implementation simply uses a counter, but a more advanced implementation
could use a pool of textures and find ones that have matching sizes
before recreating ones that don't. The API shouldn't change as a result
of this.
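
The "pool of textures" variant could look something like the sketch
below: prefer a free FBO whose size already matches, and only recreate
one if none does. This is a toy model with made-up types (no GPU objects
involved), not the renderer's actual bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FBOS 8

struct fbo {
    int w, h;
    bool allocated; /* has backing storage of size w x h */
    bool in_use;    /* already handed out during this frame */
};

struct fbo_pool {
    struct fbo fbos[MAX_FBOS];
    int num_resizes; /* counts (re)creations, for illustration */
};

/* Mark every FBO as available again, e.g. at the start of a frame */
static void pool_reset(struct fbo_pool *pool)
{
    for (int i = 0; i < MAX_FBOS; i++)
        pool->fbos[i].in_use = false;
}

/* Return a free FBO of the requested size, or NULL if the pool is full */
static struct fbo *pool_get(struct fbo_pool *pool, int w, int h)
{
    assert(pool && w > 0 && h > 0);
    struct fbo *fallback = NULL;

    for (int i = 0; i < MAX_FBOS; i++) {
        struct fbo *fbo = &pool->fbos[i];
        if (fbo->in_use)
            continue;
        if (fbo->allocated && fbo->w == w && fbo->h == h) {
            fbo->in_use = true; /* exact match: no recreation needed */
            return fbo;
        }
        if (!fallback)
            fallback = fbo; /* remember the first free slot */
    }

    if (!fallback)
        return NULL;

    /* No match found: (re)create the fallback slot at the new size */
    fallback->w = w;
    fallback->h = h;
    fallback->allocated = true;
    fallback->in_use = true;
    pool->num_resizes++;
    return fallback;
}
```

Compared to a plain counter, the size lookup keeps working when the
order or number of passes changes between frames, since a pass no longer
depends on receiving the same slot index as before.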

fae66ede

May 21, 2020

ci: update container version · 0386edd3

Niklas Haas authored 4 years ago

In doing so I finally hit the time bomb caused by assuming blacklisting
compute is only needed on that specific driver version. Turns out, the
same issue is present even on newer driver versions. Since I have no
idea how else to work around and/or debug it, just permanently disable
compute on the CI.

Unfortunately, for some reason, the `shaderc` version in this version of
the CI image hits random internal exceptions when trying to compile
pretty much anything. But using glslang directly works. Except for msan,
because we don't have msan-instrumented libc++.

Some other changes needed for whatever reason.

0386edd3

spirv: handle shaderc internal errors properly · 3b1dd90b

Niklas Haas authored 4 years ago

These don't generate any errors, but the compilation status still isn't
"success". Treat these as errors as well, in terms of logging.

3b1dd90b

vulkan: always load VK_KHR_surface · e630b7ae

Niklas Haas authored 4 years ago

There's really no reason not to. Also clarify that these functions are
not, in fact, "mandatory" instance-level function pointers.

e630b7ae