This release focuses on a major overhaul of gamut mapping, including an
all-new perceptual colorspace (IPTPQc4) and related set of 3DLUT
generation primitives. The default gamut mapping has been replaced
by a perceptual soft-knee operation which attempts to map primaries
onto primaries while preserving a balance of both intensity and
chromaticity. The old bespoke tone mapping modes have been replaced by a
single tunable LMS/IPTPQc4 hybrid tone mapping scheme. Lastly, a
new HDR contrast recovery option helps preserve detail in very bright
tone-mapped regions.

In addition, this release brings major performance improvements to
the shader generation system, considerably reducing CPU overhead in
release builds.

Finally, full support has been added for libavutil Vulkan hwcontexts,
allowing libplacebo to work directly with Vulkan Video decoding without
any memory copy overhead.

Additions:
- add `pl_shader_info`
- add `pl_ipt_lms2rgb/rgb2lms` and `pl_ipt_lms2ipt/ipt2lms`
- add `<libplacebo/gamut_mapping.h>`
- add `pl_color_map_params.visualize_hue/theta` to visualize gamut 3DLUT
- add `pl_vulkan_required_features`
- add `pl_get_mapped_avframe` to retrieve AVFrame from mapped pl_frame
- add `linear` frame mixer to pl_frame_mixers
- add `pl_render_params.corner_rounding`
- add `pl_matrix2x2_scale/invert` and `pl_transform2x2_scale/invert`
- add `pl_shader_distort` and `pl_render_params.distort_params` to
  enable arbitrary affine distortions (e.g. arbitrary rotation, shear,
  flip/scale) as a final output post-processing step
- add `pl_shader_color_map_ex` and `pl_color_map_args` as a more
  flexible API variant of `pl_shader_color_map`
- add `pl_color_map_params.contrast_recovery/smoothness` settings, plus
  `pl_color_map_args.feature_map` and `pl_shader_extract_features`
- add `pl_transform2x2_bounds` to compute bounding box of transform

Changes:
- drop public APIs deprecated for libplacebo v4
- require vulkan 1.3 headers at compile time
- require vulkan 1.2 as a minimum runtime version
- require a minimum version of GLSL 130 (GL/ES 3.0+)
- require a minimum version of meson 0.63 to build
- `pl_dispatch_info.shader` has been changed from `pl_shader_res` to
  `pl_shader_info`, which may require updating some code (the fields
  should be mostly backwards compatible)
- complete refactor of `pl_color_map_params` gamut mapping settings
- reduce default tone-mapping LUT size from 1024 to 256
- deprecate `pl_vulkan.queues`
- add `pl_vulkan_params.extra_queues` to enable more queue types
- users are no longer required to use extension-specific vulkan feature
  structs, and may freely mix and match them with vulkan 1.x
  meta-structs - additionally, both types of structs are reported in
- `pl_vulkan_wrap_params.aspect` can now be left as 0 when not mapping
  individual planes of planar images
- also enable subgroup clustered and quad operations in shaders
- change type of `pl_vulkan.(un)lock_queue` to uint32_t, to match Vulkan
  API and libavutil vulkan hwcontext
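
For applications that want to check the new Vulkan 1.2 runtime minimum up front, a comparison on the packed API version is enough. The sketch below mirrors the bit layout of Vulkan's packed version numbers without including the Vulkan headers; the macro and function names are hypothetical, and it assumes the variant bits are zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as Vulkan's packed API version: major in bits 22-28,
 * minor in bits 12-21, patch in bits 0-11 (variant bits left at zero). */
#define SKETCH_MAKE_VERSION(major, minor, patch) \
    (((uint32_t)(major) << 22) | ((uint32_t)(minor) << 12) | (uint32_t)(patch))

/* Hypothetical helper: does a reported API version meet the 1.2 minimum? */
bool meets_runtime_minimum(uint32_t api_version)
{
    assert((api_version >> 29) == 0);   /* assumption: variant 0 only */
    return api_version >= SKETCH_MAKE_VERSION(1, 2, 0);
}
```

In a real program the value to test would come from the device or instance properties reported by the Vulkan loader.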

Fixes and performance enhancements:
- drop various backwards compatibility paths for GLSL 110
- fix clipping artefacts when using textureless linear tone-mapping
- fix source black point determination when using ICC profiles
- vastly reduce CPU overhead of shader (re-)generation on release
  builds, at the cost of less friendly variable names
- significantly reduce CPU overhead of pl_shader-related internal
  temporary allocations and various other related operations
- fix HDR passthrough minimum luma value being ignored
- minor fixes and performance improvements on internal allocator
- fix various possible sources of minor memory leaks
- various fixes and improvements to the meson build scripts
- work around nvidia-specific swapchain resizing bug, at the cost of
  performance when resizing the window
- fix frame queue FPS estimation for non-monotonic PTS
- fix negative inputs edge case in pl_hdr_rescale
- clamp detected HDR peak to legal PQ range
- avoid unnecessary tone-mapping LUT regeneration on static content
- minor optimizations to the fast bicubic shader
- minor theoretical overhead reduction by forcing LOD level
- fix pl_map_avframe_ex data buffer handling
- various improvements to the `plplay` demo program
- correctly handle v4 perceptual ICC profiles
- fix memory leak of vulkan queue locks
- fix theoretical UB in vulkan memory barrier ordering semantics
- relax unnecessarily strict / redundant host memory barriers
- improve debug logging to understand where VRAM is being allocated
- reduce fragmentation of large VRAM allocations (>= 64 MB)
- fix edge case where DRM format modifier was incorrectly checked on
- correctly find vulkan SDK headers via VULKAN_SDK env var
- correctly init d3d11 monitored fences
- various fixes to d3d11 error reporting and logging
- fix extension loading on pl_vulkan_import
- various fixes to d3d11 register allocation
- fix edge case when encountering incomplete interlaced streams
- fix pl_queue_update with explicitly provided "extreme" vsync ratios
- properly hold last interpolated frame for an extra 1/fps duration
- fix off-by-one in interpolation EOF semantics
- fix discontinuity in black point handling of st2094-40
- fix possible double-acquire in pl_renderer frame acquisition logic, as
  well as a possible use-after-unacquire
- correctly handle mistagged YCbCr AVFrames (e.g. tagged as RGB)
- fix incorrect clamping in pl_render_image(..., NULL) path
- fix crash when effectively rendering to blank/empty target crop
- fix alpha blending output
- significantly speed up mathematical operations on mingw builds
- fix ICC profile gamma estimation for very strange synthetic test
- fix excessive saturation when using black point compensation or
  inverse tone mapping