- May 25, 2020
-
-
Niklas Haas authored
This reverts commit 00127130. That commit caused a performance regression, since it breaks merging of chroma scalers.
-
Niklas Haas authored
Generate the shader header etc. after figuring out all the plane-specific stuff. Makes a bit more sense in this order.
-
Niklas Haas authored
This is after scaling.
-
Niklas Haas authored
Now that `python3-mako` represents the first nontrivial dependency of a main feature (e.g. `vulkan`), document this list properly. Formatting could be bikeshed, but whatever. I just wanted to have a list out there.
-
Niklas Haas authored
This requires using host query resets, which require a new extension. The extension in question was also promoted to Vulkan API version 1.2, but for some reason, loading the function pointer under the old name fails, even though the extension text seems to suggest that it should be available under the new name as well. (I think this might be a loader bug.) Work around it by, annoyingly, introducing the concept of function aliases. Side note: the validation layers think this is an error because they're too old to know about host query resets. I think this commit gives the term "bleeding edge" a new meaning.
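A minimal sketch of the function alias idea, assuming a hypothetical loader callback (`load_fn`) and helper (`load_with_aliases`) that are not part of libplacebo. The two names `vkResetQueryPool` and `vkResetQueryPoolEXT` are the real core and extension names of the host query reset entry point; which of them the fake loader exposes here is chosen arbitrarily for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);

/* Hypothetical loader callback: returns NULL if `name` is unknown. */
typedef generic_fn (*load_fn)(const char *name);

/* Try each candidate name in order and return the first one that loads. */
static generic_fn load_with_aliases(load_fn load, const char *const *names,
                                    size_t num_names)
{
    assert(load && names);
    for (size_t i = 0; i < num_names; i++) {
        generic_fn fn = load(names[i]);
        if (fn)
            return fn;
    }
    return NULL;
}

/* Stand-in for a loader that only knows one of the two names. */
static void dummy_reset(void) {}

static generic_fn fake_loader(const char *name)
{
    return strcmp(name, "vkResetQueryPoolEXT") == 0 ? dummy_reset : NULL;
}
```

Listing every known name for an entry point keeps the calling code unaware of which spelling a given loader happens to accept.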
-
Niklas Haas authored
We now "officially" support enabling arbitrary extra device features, including both features requested by the user and features needed by us. It's about time for me to write the shitty boilerplate to link and memdup chains and stuff. Using generated code to get the sizeof() of unknown structs, because how the heck else would you copy over a pNext chain while only modifying the values you care about? Still using shitty hacks for the features whitelisting. I hope they never change the structure of these arrays as just being a list of VkBool32 values. (But in theory we could just generate code for this, in the worst case....)
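To illustrate why a size lookup is needed to duplicate a chain, here is a rough sketch using hypothetical stand-in structs (`struct base`, `struct features_a`, `struct features_b`) rather than real Vulkan types. In the commit the size table is generated; here `struct_size` is written by hand, and all names are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct tags; every struct begins with the same header. */
enum stype { STYPE_FEATURES_A = 1, STYPE_FEATURES_B = 2 };

struct base { enum stype type; struct base *next; };
struct features_a { struct base base; unsigned flag_x, flag_y; };
struct features_b { struct base base; unsigned flag_z; };

/* Hand-written stand-in for a generated sizeof() table; 0 means unknown. */
static size_t struct_size(enum stype type)
{
    switch (type) {
    case STYPE_FEATURES_A: return sizeof(struct features_a);
    case STYPE_FEATURES_B: return sizeof(struct features_b);
    }
    return 0;
}

static void chain_free(struct base *chain)
{
    while (chain) {
        struct base *next = chain->next;
        free(chain);
        chain = next;
    }
}

/* Deep-copy a chain. Returns NULL for an empty chain, an unknown struct
 * type, or an allocation failure. */
static struct base *chain_dup(const struct base *in)
{
    struct base *head = NULL, **tail = &head;
    for (; in; in = in->next) {
        size_t size = struct_size(in->type);
        struct base *copy = size ? malloc(size) : NULL;
        if (!copy) {
            chain_free(head);
            return NULL;
        }
        assert(size >= sizeof(struct base));
        memcpy(copy, in, size); /* copies header and payload alike */
        copy->next = NULL;      /* relink inside the new chain */
        *tail = copy;
        tail = &copy->next;
    }
    return head;
}
```

Once the chain is owned by the copy, individual fields can be modified without touching the caller's structs.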
-
Niklas Haas authored
Turns out I need to generate even more boilerplate than just this, so just write a shitty python script (inspired by RADV/mesa) to do the job. Makes python part of the build dependency of libplacebo, but meson already depends on python so nothing has changed. The CI URL for aarch64 needs to be updated to pull in python3-mako.
-
- May 24, 2020
-
-
Niklas Haas authored
This is now pl_rect2df instead of pl_rect2d, to make it easier to use the pl_rect2df_aspect* series of functions, especially without requiring the hacky rounding integer versions of them. Delete those and add some needed helper functions as well. Rewrite the fix_rects code to crop `src_rect` for any fractional offset in the `dst_rect`, and also for regions of the `dst_rect` that lie outside the target fbo. Also fix a bug in the img->w/h calculation for cropped planes.
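The cropping of `src_rect` for out-of-bounds regions of `dst_rect` can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct rect2df` and `clip_rects` are hypothetical names, the rects are assumed to be non-flipped and non-empty, and the handling of fractional offsets mentioned above is left out.

```c
#include <assert.h>

/* Hypothetical float rect, loosely modelled on pl_rect2df. */
struct rect2df { float x0, y0, x1, y1; };

/* Clip `dst` to the target area [0,w] x [0,h] and crop `src` by the
 * proportional amount, so the visible part of the image stays put. */
static void clip_rects(struct rect2df *src, struct rect2df *dst,
                       float w, float h)
{
    assert(dst->x1 > dst->x0 && dst->y1 > dst->y0);

    /* Source units per destination unit along each axis. */
    float sx = (src->x1 - src->x0) / (dst->x1 - dst->x0);
    float sy = (src->y1 - src->y0) / (dst->y1 - dst->y0);

    if (dst->x0 < 0) { src->x0 -= dst->x0 * sx;       dst->x0 = 0; }
    if (dst->y0 < 0) { src->y0 -= dst->y0 * sy;       dst->y0 = 0; }
    if (dst->x1 > w) { src->x1 -= (dst->x1 - w) * sx; dst->x1 = w; }
    if (dst->y1 > h) { src->y1 -= (dst->y1 - h) * sy; dst->y1 = h; }
}
```

For example, a 100x100 source stretched onto a destination spanning x = -50..150 of a 100-pixel-wide target ends up sampling only x = 25..75 of the source.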
-
Niklas Haas authored
Requested by VLC, which wants to abstract the texture binding and coordinates (vertex attribute) away from the actual shader doing the scaling. This requires adding a new type of shader signature, PL_SHADER_SIG_SAMPLER2D, and also extending pl_sample_src to allow specifying samplers in this way. The main glaring note here is that I realized the compute shader does some hacks w.r.t. the texture coordinate that do not actually work in a general sense, since they rely on the mapping logic being performed by the pl_dispatch. That being said, it's not entirely clear how vertex attributes should work at all for compute shaders like this. It's entirely possible we may need to work around this either by having thread 0 in the work group broadcast its position to the rest of the work group (instead of abusing tex_map), or alternatively, we could maybe move some of the pl_dispatch simulation code from the dispatch mechanism to the actual shader binding mechanism, so that generated compute shaders won't have vertex attributes to begin with.
-
Niklas Haas authored
This is sufficiently nontrivial and often-needed enough that providing helpers makes a lot of sense. Add some extra helpers that come up when rendering to sub-rects of targets. The only annoying thing here is the mismatch between pl_rect2df and pl_rect2d. Maybe I can come up with a better API here? Also update the sdl2 demo to actually preserve the aspect ratio, as well as add some test cases to the new helper functions.
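As an illustration of the kind of helper involved, the sketch below shrinks a rect around its centre to match a given aspect ratio. The names `struct rect2df` and `rect_fit_aspect` are hypothetical and do not reflect the actual libplacebo API or its exact semantics.

```c
#include <assert.h>

/* Hypothetical float rect, loosely modelled on pl_rect2df. */
struct rect2df { float x0, y0, x1, y1; };

/* Shrink `rc` around its centre so that width / height == aspect. */
static void rect_fit_aspect(struct rect2df *rc, float aspect)
{
    float w = rc->x1 - rc->x0, h = rc->y1 - rc->y0;
    assert(w > 0 && h > 0 && aspect > 0);

    if (w > h * aspect) {
        /* Too wide: keep the height, reduce the width. */
        float cx = (rc->x0 + rc->x1) / 2, new_w = h * aspect;
        rc->x0 = cx - new_w / 2;
        rc->x1 = cx + new_w / 2;
    } else {
        /* Too tall (or already matching): keep the width. */
        float cy = (rc->y0 + rc->y1) / 2, new_h = w / aspect;
        rc->y0 = cy - new_h / 2;
        rc->y1 = cy + new_h / 2;
    }
}
```

Working in floats avoids having to decide how to round at every step, which is the mismatch with integer rects noted above.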
-
Niklas Haas authored
I re-benchmarked this and determined that larger group sizes are actually faster these days, so just use as many as possible. The horizontal width of 32 still seems to be pretty decent.
-
Niklas Haas authored
These consume time without really telling us anything useful.
-
- May 22, 2020
-
-
Niklas Haas authored
As an addendum to f3a07a, this quenches all concerns by making sure we re-use same-sized FBOs wherever possible. The new code should be strictly better than even the old code, in terms of minimizing FBO resizes. It is not yet, however, optimal in the sense of minimizing FBO residency for FBOs that could be aliased. Doing that would require refcounting FBOs or something. (Which I guess isn't too difficult to accomplish, so maybe I should give it a try?) That being said, aliasing FBOs might break cross-frame optimizations, which would only end up necessitating us introducing other tricks like rotating between different pl_renderer instances, thus defeating the gains. Would have to be tested anyway to see if aliasing FBOs actually gains more performance than it loses. (And the main benefit would be gaining VRAM, anyway) Reduces some code ugliness as a side benefit.
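The idea of preferring same-sized FBOs can be sketched as a small pool lookup. This is an assumption-laden illustration, not the renderer's actual code: `struct fbo`, `struct fbo_pool`, `pool_get` and `pool_reset` are hypothetical names, and a "resize" is only counted rather than performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FBOS 8

struct fbo { int w, h; bool in_use; };
struct fbo_pool { struct fbo fbos[MAX_FBOS]; int num_resizes; };

/* Hand out a free FBO of the requested size, resizing one only if no
 * free FBO already matches. Returns NULL if the pool is exhausted. */
static struct fbo *pool_get(struct fbo_pool *p, int w, int h)
{
    assert(p && w > 0 && h > 0);
    struct fbo *fallback = NULL;
    for (size_t i = 0; i < MAX_FBOS; i++) {
        struct fbo *f = &p->fbos[i];
        if (f->in_use)
            continue;
        if (f->w == w && f->h == h) {
            f->in_use = true; /* exact match: no resize needed */
            return f;
        }
        if (!fallback)
            fallback = f;
    }
    if (!fallback)
        return NULL;
    fallback->w = w;
    fallback->h = h;
    fallback->in_use = true;
    p->num_resizes++;
    return fallback;
}

/* Called at the start of each frame: all FBOs become available again. */
static void pool_reset(struct fbo_pool *p)
{
    for (size_t i = 0; i < MAX_FBOS; i++)
        p->fbos[i].in_use = false;
}
```

With this lookup, a frame that requests the same set of sizes as the previous frame in a different order triggers no resizes at all.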
-
Niklas Haas authored
The current logic allows resizeable parents to become non-resizeable, which is a big no-no since resizeable parents are almost definitely intended for a framebuffer size that has nothing at all to do with the subpass. To fix this, only allow merging resizeable shaders with subpasses that are also resizeable.
-
Niklas Haas authored
As an aside, we also make sh_subpass not explicitly spam/fail the parent shader in this case.
-
Niklas Haas authored
Rather than this merely representing an "in-flight" image, with the img->sh only living for as long as this exists between different pass types, `img` is now conceptually persistent and in one of two modes: `sh` or `tex`. This allows us to, in principle, avoid doing redundant FBO roundtrips for cases where the previous pass thinks the next pass needs a tex but the next pass actually needs a sh, such as is currently the case for the AV1 grain shader. Since `pass_hook` in particular can randomly mutate `img` to either of the two forms, callers must now be somewhat vigilant to make sure they always use `img_tex()` and `img_sh()` to access the "current" shader/texture, rather than relying on local variables staying persistent. The use of locally initialized pl_shader is now exclusive to passes that keep their own pl_dispatch_begin calls (for various reasons).
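A toy model of the two-mode `img` may help show where the roundtrips are saved. The accessor names `img_tex()` and `img_sh()` come from the commit message, but everything else here (the struct layout, the `void` signatures, the roundtrip counter) is a simplifying assumption and not the real implementation.

```c
#include <assert.h>

enum img_mode { IMG_SH, IMG_TEX };

struct img {
    enum img_mode mode;
    int fbo_roundtrips; /* counts simulated render-to-texture passes */
};

/* Access the image as a texture. A pending shader has to be rendered
 * into an FBO first, which is the expensive step. */
static void img_tex(struct img *img)
{
    assert(img);
    if (img->mode == IMG_SH) {
        img->fbo_roundtrips++;
        img->mode = IMG_TEX;
    }
}

/* Access the image as a shader. If it is currently a texture, a new
 * shader simply samples from it, so no FBO roundtrip is needed. */
static void img_sh(struct img *img)
{
    assert(img);
    img->mode = IMG_SH;
}
```

A pass that only needs a shader calls `img_sh()` and never forces the previous pass's output through an FBO, whereas repeated `img_tex()` calls pay for the roundtrip only once.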
-
Niklas Haas authored
The current approach was to pair each FBO with its use, "semantically". The intent was to minimize the number of "reallocations" that would be required if the number of passes changed dynamically (e.g. as the renderer options changed). However, this is not only an unrealistic design goal to optimize for (users can use separate pl_renderers for wildly different purposes, and for a single conceptual video stream it doesn't really matter), but also, it gets in the way of a planned refactor I have concerning `struct img`. Change this to make all FBOs dynamically allocated. The current implementation simply uses a counter, but a more advanced implementation could use a pool of textures and find ones that have matching sizes before recreating ones that don't. The API shouldn't change as a result of this.
-
- May 21, 2020
-
-
Niklas Haas authored
In doing so I finally hit the time bomb caused by assuming blacklisting compute is only needed on that specific driver version. Turns out, the same issue is present even on newer driver versions. Since I have no idea how else to work around and/or debug it, just permanently disable compute on the CI. Unfortunately, for some reason, the `shaderc` version in this version of the CI image hits random internal exceptions when trying to compile pretty much anything. But using glslang directly works. Except for msan, because we don't have msan-instrumented libc++. Some other changes needed for whatever reason.
-
Niklas Haas authored
These don't generate any errors, but the compilation status still isn't "success". Treat these as errors as well, in terms of logging.
-
Niklas Haas authored
There's really no reason not to. Also clarify that these functions are not, in fact, "mandatory" instance-level function pointers.
-
Niklas Haas authored
This extension was treated as global in the past, but now that we have a proper extension framework we should just handle it like any other extension. Solves a very concrete issue where dependencies on e.g. VK_EXT_hdr_metadata were not satisfied if the user did not happen to enable this extension. Also check if the extension is loaded when we attempt actually creating a swapchain.
-
Niklas Haas authored
Forgot to un-mark timers as recording. Probably mostly benign, but could hypothetically cause issues.
-
Niklas Haas authored
With ca1ebc, the mismatched shaders of the variety that 092229 was intending to solve should no longer be a possibility. If this is still the case, it's probably a bug. Assert instead.
-
Niklas Haas authored
This function already exists.
-
Niklas Haas authored
The current logic could end up rounding the plane up in both directions, inadvertently introducing an upscale by 1 pixel to the refplane, which not only forced an FBO indirection but also lowered the quality due to the effective resample. Furthermore, the code for adjusting the rect by the `rc` was wrong, since it failed to scale the "affine" part of the transformation down to the coordinate space of the plane. Simplify and fix this by only rounding the offset (making sure to always round towards 0 to have the best chance of "doing the right thing"), and also correctly scaling this offset when calculating the offsets for the individual planes.
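The fix can be illustrated with a small sketch: round the reference offset towards zero, then scale it into each plane's coordinate space. The function name `plane_offset` and its parameters are hypothetical, and the float-to-int cast is used here only because C defines it to truncate towards zero for in-range values.

```c
#include <assert.h>

/* Offset of a plane, given the (possibly fractional) offset in the
 * reference plane and the sizes of both planes along the same axis. */
static float plane_offset(float ref_offset, int ref_size, int plane_size)
{
    assert(ref_size > 0 && plane_size > 0);

    /* Round towards zero; assumes the offset fits in an int. */
    float rounded = (float) (int) ref_offset;

    /* Scale into the plane's own coordinate space, e.g. by 1/2 for
     * 4:2:0 chroma relative to luma. */
    return rounded * (float) plane_size / (float) ref_size;
}
```

For a reference offset of 3.7 on a 1920-wide luma plane, the luma offset becomes 3 and the offset of a 960-wide chroma plane becomes 1.5.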
-
Niklas Haas authored
This can happen if e.g. the shader's output image is rounded up relative to the rect actually being sampled, for example when users are sampling fractional parts of the image.
-
Niklas Haas authored
Across the board, "output sizes" and so forth are assumed to be integers, but a lot of the code generating them is based on floating point math. To make it clear what happens, make things consistent by using roundf() instead of implicit conversion (which may end up truncating etc.).
-
Niklas Haas authored
The order these are written in currently means the second check never has a chance to get printed, because SAMPLER_NOOP implies SAMPLER_DIRECT.
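The ordering problem is the usual one of testing a general condition before a more specific one. In the sketch below, `struct sampler` and `describe` are hypothetical stand-ins and do not mirror the real sampler types; the only property carried over from the commit is that "noop" implies "direct".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sampler description; noop implies direct. */
struct sampler { bool direct; bool noop; };

static const char *describe(struct sampler s)
{
    assert(!s.noop || s.direct);

    /* The more specific case must be tested first. If the `direct`
     * check came first, the `noop` branch would be unreachable. */
    if (s.noop)
        return "noop";
    if (s.direct)
        return "direct";
    return "complex";
}
```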
-
- May 20, 2020
-
-
Niklas Haas authored
In theory there are various ways we could reconcile this difference, but for now, the easiest thing to do is to simply disable it. I'll try working on a way to bring back support for this, but I wanted to get this fix out of the way first.
-
Niklas Haas authored
09834c52 introduced a regression here that caused an unnecessary FBO indirection to occur for the (relatively common) case of user shaders consulting the dimensions of OUTPUT in their RPN expressions. OUTPUT apparently behaves weirdly. mpv should probably document this better. Even better would be to have had different names for the OUTPUT hook stage and the OUTPUT magic constant for RPN exprs. But it's probably too late for that now. Unless we want to rename the OUTPUT stage, which we totally could do.
-
Niklas Haas authored
Also add a TODO for something I realized while staring at this code again.
-
Niklas Haas authored
This shader is very definitively non-resizable. Flag it as such.
-
- May 19, 2020
-
-
Niklas Haas authored
This is a bit annoying, and arguably such shaders are broken to begin with because there's no way you can possibly rely on this logic without spamming warnings. But it represents a difference in functionality compared to mpv, so we implement this just to be on the safe side.
-
Niklas Haas authored
This is `int` and not `size_t`, which may cause issues on platforms where the two are not the same size.
-
Niklas Haas authored
To make sure these messages see the light of day before the application gets a chance to crash.
-
Niklas Haas authored
Would have made debugging https://github.com/haasn/libplacebo/issues/74 easier.
-
Niklas Haas authored
These were never updated for 7ac80fb8.
-
Niklas Haas authored
PL_VERSION can be a bit unhelpful for git commits.
-
Niklas Haas authored
A simple oversight. Fixes e.g. Anime4K.
-
Niklas Haas authored
Rather than tying struct img to the specifics of pass->cur_img, I wanted to loosen this up a bit and re-use it for the plane state. To this end, I parametrized one key function, `pass_hook`, by the actual `img` to use, and got rid of some redundant fields. This commit doesn't (or shouldn't) change the behavior yet, it's mostly to disentangle the behavior-changing bits of this refactor from the merely refactoring ones.
-