- May 28, 2020
-
-
Niklas Haas authored
We make this the default tone mapping function because it's the de-facto standard in the industry. Unfortunately, it's quite a bit heavier than the other algorithms due to the extra PQ round trip needed during tone mapping. It's entirely possible that we could make the choice between working in PQ space and working in linear light completely independent of the tone mapping function itself, since arguably PQ's "perceptual uniformity" makes it a suitable space to do tone mapping in regardless of what function we use. That being said, I don't currently want to consider the headache of testing this all, so let's just implement it for BT.2390 and call it a day.
-
- May 27, 2020
-
-
Niklas Haas authored
The code as written allows 10-bit+linear, which is a bad combination. So explicitly ban it.
-
- May 26, 2020
-
-
Niklas Haas authored
Simple oversight. This should be PL_CHROMA_LEFT, not PL_CHROMA_TOP_LEFT. Our own documentation gets it right, I just had the wrong name in my head when writing the code.
-
Niklas Haas authored
Tested on KrigBilateral. Hopefully not too terribly broken. Fixes #88
-
Niklas Haas authored
Rather than using the `params->rect`, we should generally be ignoring it in favor of the raw texture dimensions, and only update the rect accordingly.
-
Niklas Haas authored
A lot of subsampled content out there is untagged, but should be treated as _TOP_LEFT content (the de-facto standard chroma subsampling mode). However, we effectively treat _UNKNOWN as PL_CHROMA_CENTER. To fix this, make pl_chroma_location_offset explicitly default the chroma location. Since a lot of users currently just call that function on the chroma planes always (regardless of subsampling), introduce a new helper function `pl_image_set_chroma_location` to only set the plane shift for actually subsampled planes. Annoying API break, but it is what it is.
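A rough sketch of the idea, using hypothetical stand-in types rather than the real libplacebo structs: unknown locations resolve to a default, and the resulting shift is only applied to planes that are actually subsampled. The default is written as left-sited here, following the May 26 correction listed above; the offsets are assumed to be in luma pixels for 2x subsampling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not libplacebo's real types. */
enum chroma_loc { CHROMA_UNKNOWN, CHROMA_LEFT, CHROMA_CENTER, CHROMA_TOP_LEFT };

struct plane {
    bool subsampled;        /* true for planes smaller than the luma plane */
    float shift_x, shift_y; /* sample offset relative to centered siting */
};

/* Assumed default for untagged content (see the note above). */
#define CHROMA_DEFAULT CHROMA_LEFT

static void chroma_offset(enum chroma_loc loc, float *x, float *y)
{
    if (loc == CHROMA_UNKNOWN)
        loc = CHROMA_DEFAULT;

    *x = 0.0f;
    *y = 0.0f;
    switch (loc) {
    case CHROMA_LEFT:     *x = -0.5f;              break;
    case CHROMA_TOP_LEFT: *x = -0.5f; *y = -0.5f;  break;
    case CHROMA_CENTER:
    case CHROMA_UNKNOWN:                           break;
    }
}

/* Only touches planes that are actually subsampled. */
static void set_chroma_location(struct plane *planes, int num,
                                enum chroma_loc loc)
{
    assert(planes || num == 0);
    for (int i = 0; i < num; i++) {
        if (!planes[i].subsampled)
            continue;
        chroma_offset(loc, &planes[i].shift_x, &planes[i].shift_y);
    }
}
```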
-
- May 25, 2020
-
-
Niklas Haas authored
Yay coverage
-
Niklas Haas authored
No point in testing test code that's guaranteed to run, nor generated code that's full of boilerplate switch/case statements.
-
Niklas Haas authored
This was UINT32_MAX, which explodes some of the shader logic. Pick something more reasonable.
-
Niklas Haas authored
They can be interesting to look at.
-
Niklas Haas authored
I think that in theory, vkGetQueryPoolResults shouldn't have been allowed to report VK_SUCCESS in this case, but at least AMDVLK still does so. Explicitly check to see if the time is pending before attempting to read from it.
-
Niklas Haas authored
These performance figures are quite a bit out of date. Many things have changed in the meantime. Nuke the old results and re-measure them on latest git master versions.
-
Niklas Haas authored
This was never updated for the pl_dispatch changes.
-
Niklas Haas authored
-
Niklas Haas authored
Whenever PL_GPU_CAP_PARALLEL_COMPUTE is set. In this case, it's assumed that dispatching multiple compute shaders can take advantage of parallel execution of the compute queues.
-
Niklas Haas authored
Turns out the correct handling of img.w/h isn't as simple as setting it to either "the plane size" or "the src_rect size". We have to set it to the plane size for the plane shaders, but the src_rect size for the plane scalers. How annoying. Maybe this field needs to be reworked/figured out somehow.
-
Niklas Haas authored
Ensure `img` is always in a valid state, even on failure.
-
Niklas Haas authored
The refactors in 525279eb broke this, by beginning a shader but never actually dispatching it. As an aside, also handle failures slightly more robustly.
-
Niklas Haas authored
6725b941 was too aggressive, and also not quite the correct thing to be reverting. Re-add the assert and relax the FBO indirection check to test whether the shader is actually resizable or not.
-
Niklas Haas authored
This reverts commit 58e588cd. Required by the previous revert.
-
Niklas Haas authored
This reverts commit 00127130. This commit is a regression in performance since it breaks merging of chroma scalers.
-
Niklas Haas authored
Generate the shader header etc. after figuring out all the plane-specific stuff. Makes a bit more sense in this order.
-
Niklas Haas authored
This is after scaling.
-
Niklas Haas authored
Now that `python3-mako` represents the first nontrivial dependency of a main feature (e.g. `vulkan`), document this list properly. Formatting could be bikeshed, but whatever. I just wanted to have a list out there.
-
Niklas Haas authored
This requires using host query resets, which requires a new extension. The extension in question is also promoted to Vulkan API version 1.2, but for some reason, loading the function pointer under the old name fails, even though the extension text seems to suggest that it should be available under the new name as well. (But I think this might be a loader bug.) Work around it by just annoyingly introducing the concept of function aliases. Side note, the validation layers think this is an error because they're too old to know about host query resets. I think this commit gives the term "bleeding edge" a new meaning.
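The alias concept could look roughly like this. It is a sketch only: `fn_entry`, `load_fn` and the stand-in loader are hypothetical, and the loader is rigged to fail on the `EXT` name purely to demonstrate the fallback. Only the two function names themselves come from Vulkan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*fn_ptr)(void);
typedef fn_ptr (*loader_fn)(const char *name);

/* Hypothetical table entry: a primary name plus an optional alias. */
struct fn_entry {
    const char *name;
    const char *alias; /* tried when `name` fails to load; may be NULL */
};

static fn_ptr load_fn(loader_fn load, const struct fn_entry *e)
{
    assert(load && e && e->name);
    fn_ptr fn = load(e->name);
    if (!fn && e->alias)
        fn = load(e->alias);
    return fn;
}

/* Stand-in loader for demonstration: only resolves the core name. */
static void dummy_reset(void) {}

static fn_ptr fake_loader(const char *name)
{
    return strcmp(name, "vkResetQueryPool") == 0 ? dummy_reset : NULL;
}
```

With an entry such as `{ "vkResetQueryPoolEXT", "vkResetQueryPool" }`, the lookup falls through to the alias when the first name cannot be resolved.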
-
Niklas Haas authored
We now "officially" support enabling arbitrary extra device features, including both features requested by the user and features needed by us. It's about time for me to write the shitty boilerplate to link and memdup chains and stuff. Using generated code to get the sizeof() unknown structs, because how the heck else would you copy over a pNext chain while only modifying the values you care about? Still using shitty hacks for the features whitelisting. I hope they never change the structure of these arrays as just being a list of VkBool32 values. (But in theory we could just generate code for this, in the worst case....)
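To illustrate why a size lookup is needed, here is a self-contained sketch of duplicating a pNext-style chain. Everything in it is a stand-in: `base_hdr` only mimics the layout idea of Vulkan's base structures, and `struct_size` plays the role of the generated sizeof() table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header shared by all chained structs. */
struct base_hdr {
    int type;
    struct base_hdr *next;
};

struct feat_a { struct base_hdr base; unsigned flags[2]; };
struct feat_b { struct base_hdr base; unsigned flags[3]; };

enum { TYPE_A = 1, TYPE_B = 2 };

/* Stand-in for the generated size table; 0 means "unknown struct". */
static size_t struct_size(int type)
{
    switch (type) {
    case TYPE_A: return sizeof(struct feat_a);
    case TYPE_B: return sizeof(struct feat_b);
    default:     return 0;
    }
}

static void chain_free(struct base_hdr *head)
{
    while (head) {
        struct base_hdr *next = head->next;
        free(head);
        head = next;
    }
}

/* Deep-copies a chain. Returns NULL for an empty chain, an unknown
 * struct type, or an allocation failure. */
static struct base_hdr *chain_dup(const struct base_hdr *src)
{
    struct base_hdr *head = NULL, **tail = &head;
    for (; src; src = src->next) {
        size_t size = struct_size(src->type);
        assert(size == 0 || size >= sizeof(struct base_hdr));
        struct base_hdr *copy = size ? malloc(size) : NULL;
        if (!copy) {
            chain_free(head);
            return NULL;
        }
        memcpy(copy, src, size); /* copies the payload as well */
        copy->next = NULL;
        *tail = copy;            /* relink inside the new chain */
        tail = &copy->next;
    }
    return head;
}
```

The copy can then be modified freely (for example, masking individual feature flags) without touching the caller's structs.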
-
Niklas Haas authored
Turns out I need to generate even more boilerplate than just this, so just write a shitty Python script (inspired by RADV/mesa) to do the job. Makes Python a build dependency of libplacebo, but meson already depends on Python so nothing has changed. The CI URL for aarch64 needs to be updated to pull in python3-mako.
-
- May 24, 2020
-
-
Niklas Haas authored
This is now pl_rect2df instead of pl_rect2d, to make it easier to use the pl_rect2df_aspect* series of functions, especially without requiring the hacky rounding integer versions of them. Delete those and add some needed helper functions as well. Rewrite the fix_rects code to crop `src_rect` for any fractional offset in the `dst_rect`, and also for regions of the `dst_rect` that lie outside the target fbo. Also fix a bug in the img->w/h calculation for cropped planes.
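The clipping part of this can be sketched per axis as below. This is an illustration under simplifying assumptions, not the actual fix_rects code: the helper name is made up, and the destination interval is assumed to be non-flipped (`dst1 > dst0`).

```c
#include <assert.h>

/* Hypothetical helper: clips [dst0, dst1] to [0, size] and crops
 * [src0, src1] by the proportional amount, so the visible part of the
 * image keeps the same scale. */
static void clip_axis(float *src0, float *src1,
                      float *dst0, float *dst1, float size)
{
    assert(*dst1 > *dst0 && size > 0);
    float scale = (*src1 - *src0) / (*dst1 - *dst0); /* src units per dst pixel */

    if (*dst0 < 0) {
        *src0 -= *dst0 * scale; /* *dst0 is negative, so this moves src0 inward */
        *dst0 = 0;
    }
    if (*dst1 > size) {
        *src1 -= (*dst1 - size) * scale;
        *dst1 = size;
    }
}
```

For example, a source of 0..100 stretched over a destination of -50..150 on a 100 pixel wide target ends up sampling 25..75 into 0..100.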
-
Niklas Haas authored
Requested by VLC, which wants to abstract the texture binding and coordinates (vertex attribute) away from the actual shader doing the scaling. This requires adding a new type of shader signature, PL_SHADER_SIG_SAMPLER2D, and also extending pl_sample_src to allow specifying samplers in this way. The main glaring note here is that I realized the compute shader does some hacks w.r.t. the texture coordinate that do not actually work in a general sense, since they rely on the mapping logic being performed by the pl_dispatch. That being said, it's not entirely clear how vertex attributes should work at all for compute shaders like this. It's entirely possible we may need to work around this either by having the thread 0 in the work group broadcast its position to the rest of the work group (instead of abusing tex_map), or alternatively, we could maybe move some of the pl_dispatch simulation code from the dispatch mechanism to the actual shader binding mechanism, so that generated compute shaders won't have vertex attributes to begin with.
-
Niklas Haas authored
This is sufficiently nontrivial and often-needed enough that providing helpers makes a lot of sense. Add some extra helpers that come up when rendering to sub-rects of targets. The only annoying thing here is the mismatch between pl_rect2df and pl_rect2d. Maybe I can come up with a better API here? Also update the sdl2 demo to actually preserve the aspect ratio, as well as add some test cases for the new helper functions.
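As an illustration of the kind of helper meant here, the sketch below shrinks a rectangle around its center until it matches a given aspect ratio. The struct and function names are hypothetical and are not the pl_rect2df API; the rectangle is assumed to have positive width and height.

```c
#include <assert.h>

/* Hypothetical float rectangle, for illustration only. */
struct rect2df { float x0, y0, x1, y1; };

/* Shrinks `rc` around its center so that width / height == aspect. */
static void rect_aspect_fit(struct rect2df *rc, float aspect)
{
    float w = rc->x1 - rc->x0, h = rc->y1 - rc->y0;
    assert(aspect > 0 && w > 0 && h > 0);

    if (w / h > aspect) {
        /* too wide: reduce the width, keep the height */
        float new_w = h * aspect, cx = (rc->x0 + rc->x1) / 2;
        rc->x0 = cx - new_w / 2;
        rc->x1 = cx + new_w / 2;
    } else {
        /* too tall (or already matching): reduce the height */
        float new_h = w / aspect, cy = (rc->y0 + rc->y1) / 2;
        rc->y0 = cy - new_h / 2;
        rc->y1 = cy + new_h / 2;
    }
}
```

Fitting a 1920x1080 target to a 4:3 source this way yields a pillarboxed region of roughly 1440x1080, centered horizontally.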
-
Niklas Haas authored
I re-benchmarked this and determined that larger group sizes are actually faster these days, so just use however many as possible. The horizontal width of 32 still seems to be pretty decent.
-
Niklas Haas authored
These consume time without really telling us anything useful.
-
- May 22, 2020
-
-
Niklas Haas authored
As an addendum to f3a07a, this quenches all concerns by making sure we re-use same-sized FBOs wherever possible. The new code should be strictly better than even the old code, in terms of minimizing FBO resizes. It is not yet, however, optimal in the sense of minimizing FBO residency for FBOs that could be aliased. Doing that would require refcounting FBOs or something. (Which I guess isn't too difficult to accomplish, so maybe I should give it a try?) That being said, aliasing FBOs might break cross-frame optimizations, which would only end up necessitating us introducing other tricks like rotating between different pl_renderer instances, thus defeating the gains. Would have to be tested anyway to see if aliasing FBOs actually gains more performance than it loses. (And the main benefit would be gaining VRAM, anyway.) Reduces some code ugliness as a side benefit.
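A toy model of the reuse strategy, with made-up types and a fixed-size pool: a request first looks for a free FBO of the exact size, then for a never-used slot, and only as a last resort resizes an existing FBO. The `resizes` counter stands in for the cost of (re)creating a texture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FBOS 8

/* Hypothetical bookkeeping only; no real textures are created. */
struct fbo { int w, h; bool in_use; };

struct fbo_pool {
    struct fbo fbos[MAX_FBOS];
    int resizes; /* number of texture (re)creations so far */
};

/* Called at the start of each frame: all FBOs become available again. */
static void pool_reset(struct fbo_pool *p)
{
    for (int i = 0; i < MAX_FBOS; i++)
        p->fbos[i].in_use = false;
}

static struct fbo *pool_get(struct fbo_pool *p, int w, int h)
{
    assert(p && w > 0 && h > 0);
    struct fbo *fallback = NULL;

    for (int i = 0; i < MAX_FBOS; i++) {
        struct fbo *f = &p->fbos[i];
        if (f->in_use)
            continue;
        if (f->w == w && f->h == h) {
            f->in_use = true; /* exact match: no resize needed */
            return f;
        }
        /* prefer an empty slot over resizing an existing FBO */
        if (!fallback || (fallback->w != 0 && f->w == 0))
            fallback = f;
    }

    if (!fallback)
        return NULL; /* pool exhausted */

    fallback->w = w;
    fallback->h = h;
    fallback->in_use = true;
    p->resizes++;
    return fallback;
}
```

If a second frame requests the same sizes in a different order, the exact-match pass finds the existing FBOs and no further resizes occur.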
-
Niklas Haas authored
The current logic allows resizeable parents to become non-resizeable, which is a big no-no since resizeable parents are almost definitely intended for a framebuffer size that has nothing at all to do with the subpass. To fix this, only allow merging resizeable shaders with subpasses that are also resizeable.
-
Niklas Haas authored
As an aside, we also make sh_subpass not explicitly spam/fail the parent shader in this case.
-
Niklas Haas authored
Rather than this merely representing an "in-flight" image, with the img->sh only living for as long as this exists between different pass types, `img` is now conceptually persistent and in one of two modes: `sh` or `tex`. This allows us to, in principle, avoid doing redundant FBO roundtrips for cases where the previous pass thinks the next pass needs a tex but the next pass actually needs a sh, such as is currently the case for the AV1 grain shader. Since `pass_hook` in particular can randomly mutate `img` to either of the two forms, callers must now be somewhat vigilant to make sure they always use `img_tex()` and `img_sh()` to access the "current" shader/texture, rather than relying on local variables staying persistent. The use of locally initialized pl_shader is now exclusive to passes that keep their own pl_dispatch_begin calls (for various reasons).
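The two-mode idea can be modeled in a few lines. This is a deliberately reduced sketch: the real accessors return a shader or texture, whereas this version only tracks the mode and counts how many FBO roundtrips the conversions would cost. The `img_mode` enum and the counter are assumptions made for the example.

```c
#include <assert.h>

enum img_mode { IMG_SH, IMG_TEX };

/* Reduced model of `struct img`: only the current mode and a counter. */
struct img {
    enum img_mode mode;
    int fbo_roundtrips;
};

/* Ensure the image is available as a texture. */
static void img_tex(struct img *img)
{
    assert(img);
    if (img->mode == IMG_SH) {
        img->fbo_roundtrips++; /* the pending shader is rendered into an FBO */
        img->mode = IMG_TEX;
    }
}

/* Ensure the image is available as a shader. */
static void img_sh(struct img *img)
{
    assert(img);
    if (img->mode == IMG_TEX)
        img->mode = IMG_SH; /* a new shader that samples the texture */
}
```

A pass that asks for `img_sh()` while the image is already in `sh` mode costs nothing, which is the redundant roundtrip the commit message describes avoiding.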
-
Niklas Haas authored
The current approach was to pair each FBO with its use, "semantically". The intent was to minimize the number of "reallocations" that would be required if the number of passes changed dynamically (e.g. as the renderer options changed). However, this is not only an unrealistic design goal to optimize for (users can use separate pl_renderers for wildly different purposes, and for a single conceptual video stream it doesn't really matter), but also, it gets in the way of a planned refactor I have concerning `struct img`. Change this to make all FBOs dynamically allocated. The current implementation simply uses a counter, but a more advanced implementation could use a pool of textures and find ones that have matching sizes before recreating ones that don't. The API shouldn't change as a result of this.
-
- May 21, 2020
-
-
Niklas Haas authored
In doing so I finally hit the time bomb caused by assuming blacklisting compute is only needed on that specific driver version. Turns out, the same issue is present even on newer driver versions. Since I have no idea how else to work around and/or debug it, just permanently disable compute on the CI. Unfortunately, for some reason, the `shaderc` version in this version of the CI image hits random internal exceptions when trying to compile pretty much anything. But using glslang directly works. Except for msan, because we don't have msan-instrumented libc++. Some other changes needed for whatever reason.
-
Niklas Haas authored
These don't generate any errors, but the compilation status still isn't "success". Treat these as errors as well, in terms of logging.
-
Niklas Haas authored
There's really no reason not to. Also clarify that these functions are not, in fact, "mandatory" instance-level function pointers.
-