v0.4.0 · Tags · Leo Izen / libplacebo

v0.4.0
6c336a1d · meson: update version · Feb 12, 2018

This release brings with it a major performance change to the renderer's
scaling pipeline, support for vulkan format emulation, improvements to
the HDR peak detection algorithm as well as greatly improved support for
compute shader-based rasterization.

In addition to the above features, this release renames the `RA`
abstraction to `pl_gpu`, and normalizes the `ra_` prefix to `pl_` in
order to match the rest of the API as well as prevent potential future
symbol conflicts with mpv's internal copy of RA.

Additions:
- Add `pl_shader_encode_color`, which is an inversion of
  `pl_shader_decode_color` (NOTE: Some features, such as inverse
  constant luminance or XYZ encoding, are not currently supported)
- Add API for overlay images. This can be used to directly overlay
  arbitrary planes (e.g. subtitles, or menu items) on top of the
  rendered image or the target FBO. Each plane is allowed to have its
  own color space/repr and format. See `struct pl_overlay` for more
  details
- Add various predefined `pl_color_repr` values
- Add scene change detection functionality to the HDR peak detection
  algorithm in `pl_shader_color_map`. The new algorithm is designed to
  allow using larger peak detection buffers without the associated risk
  of eye-adaptation-like effects
- Add support for texel buffers (uniform and storage) in the pl_buf API.
  These can be used to store raw texels in buffer format, much like 1D
  textures, but with the usual buffer-related functionality
- Add support for vulkan format emulation, which allows users to
  directly upload and download e.g. rgb8 or rgb32f textures, even though
  these are not typically supported by GPUs. The format emulation uses
  compute shaders and texel storage buffers to fix the data
  representation on-GPU, and is significantly faster than doing the same
  with on-CPU software conversion routines
- Add a `pl_dispatch_compute` function, which behaves like
  `pl_dispatch_finish` but without the FBO output, blending and
  rasterization simulation logic. This is useful when dispatching
  pure-compute shaders. (Note: Not currently very useful for API users
  since there are no pure-compute shaders as part of the public API)
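
To make the "fix the data representation" step of format emulation concrete, here is a minimal CPU-side sketch of the kind of repacking involved. It assumes the GPU-native target is a 4-component format such as rgba8. The helper name `expand_rgb8_to_rgba8` is hypothetical and not part of libplacebo. The library performs the equivalent work on the GPU with compute shaders and texel storage buffers, so this only illustrates the data layout change, not the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libplacebo function): expands tightly packed
 * 3-byte RGB texels into 4-byte RGBA texels with a fully opaque alpha.
 * `dst` must have room for 4 * texels bytes and must not overlap `src`. */
void expand_rgb8_to_rgba8(const uint8_t *src, uint8_t *dst, size_t texels)
{
    assert(src && dst);
    for (size_t i = 0; i < texels; i++) {
        dst[4 * i + 0] = src[3 * i + 0]; /* R */
        dst[4 * i + 1] = src[3 * i + 1]; /* G */
        dst[4 * i + 2] = src[3 * i + 2]; /* B */
        dst[4 * i + 3] = 0xFF;           /* A: assumed opaque */
    }
}
```

Each output texel depends on exactly one input texel, which is why this kind of loop maps naturally onto one compute shader invocation per texel.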

Changes:
- Rename the `ra_` series of abstractions to `pl_` for consistency. The
  old `struct ra` was renamed to `struct pl_gpu`
- The dispatch logic for compute shaders will now correctly simulate the
  dispatch rect, by manually translating coordinates in the shader
- Disallow using compute shader scaling for flipped source rects. API
  users who relied on this behavior are encouraged to instead flip the
  coordinates of the resulting `pl_dispatch_finish` call
- Have the renderer infer `src_rect` and `dst_rect` when not given
- Have the renderer encode colors to the `target.repr` before outputting
  to screen. This includes, for example, conversion to YUV and dithering
  to the target representation's bit depth (where known)
- Add a `pl_blend_params` struct to `pl_dispatch_finish`, which can be
  used to enable blending, including when using compute shaders
- Subsampled planes must now be an integer multiple (or inverse thereof)
  of the reference size. If this is not the case, the renderer will
  assume the planes are cropped or expanded. This is needed to correctly
  render e.g. 4:2:0 subsampled JPEGs with odd sizes
- Require that all uses of `pl_image` with the same `signature` point to
  the same data, to allow safe cache purging
- Allow dispatching debanding and bicubic shaders at any size
- `pl_plane_find_fmt` may now be called with `out_map=NULL`
- Change the `pl_color_map_default_params` in response to the new scene
  detection feature. The default buffer size has been increased from 21
  to 64, which leads to a much more stable image that flickers less
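
The subsampling rule above is easiest to see with numbers. The following sketch assumes the subsampling ratio is found by rounding to the nearest integer, which is an assumption made for illustration; the renderer's exact rule is not stated here. Both function names are hypothetical, and only the case where the plane is no larger than the reference is handled (not the "inverse" case).

```c
#include <assert.h>

/* Hypothetical helper: nearest integer subsampling ratio between the
 * reference size and a (smaller or equal) plane size along one axis. */
int nearest_subsampling_ratio(int ref_size, int plane_size)
{
    assert(ref_size > 0 && plane_size > 0 && plane_size <= ref_size);
    int ratio = (ref_size + plane_size / 2) / plane_size; /* round to nearest */
    return ratio < 1 ? 1 : ratio;
}

/* Hypothetical helper: how many reference-resolution pixels the plane
 * covers beyond the reference size. Zero means an exact integer multiple,
 * a positive value means the plane extends past the reference (treated as
 * cropped), a negative value means it falls short (treated as expanded). */
int implied_padding(int ref_size, int plane_size)
{
    int ratio = nearest_subsampling_ratio(ref_size, plane_size);
    return ratio * plane_size - ref_size;
}
```

For a 4:2:0 JPEG that is 101 pixels wide, the chroma plane is 51 pixels wide. The nearest ratio is 2, so the chroma plane covers 102 reference pixels, one more than the image actually has.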

Fixes and performance improvements:
- Fix typo that led to incorrect anamorphic rendering
- Fix various rendering issues when using a flipped src_rect
- Fix a double-free that could happen as a result of rendering failing
  in an unexpected way
- Fix handling of alpha values in `pl_shader_decode_color`
- Correctly set the queue family index for vulkan buffer barriers
- Work around a locale dependence in libshaderc by calling uselocale/XXX
  before calling into shaderc functions
- Remove POSIX dependence in the test suite
- Remove non-portable ffsll call
- Fix demos/sdl2.c building on Windows
- Fix an issue where trying to polar sample multiple planes could result
  in a variable conflict
- Fix inverted plane shift offset calculation
- Respect the debanding settings in the renderer
- Sanitize `pl_pass_run_params.scissors` and silently drop no-ops
- Avoid dependence on exact float equality in the renderer's
  upscaling/downscaling checks
- Fix an off-by-one in the HDR peak computation code which led to
  subtly too-dark output
- Correctly check for and enable required vulkan device features
- Switch to a custom implementation of locale-invariant printf, to avoid
  depending on broken platform-dependent versions
- Overhaul the renderer's scaling pipeline. The new code is
  significantly faster when rendering cropped images, and also fixes an
  issue where src_rect was not applied when FBOs are unavailable
- Minor performance tweaks here and there
- Cap the sizes of various buffer pools and command queues in order to
  avoid OOM scenarios in infinite loops that don't contain enough
  forward dependencies to force command flushing
- Correctly check for and catch buffer access overflow
- Don't segfault when `pl_buf_create` fails
- Don't segfault when running a compute shader without viewport/scissors
- Fix a 32-bit integer overflow that could occur when using HDR peak
  detection on 4K content with large values of `peak_detect_frames`
- Fix an off-by-one in the generation of literal LUTs such as the type
  used by the dithering code when textures are unavailable, which led to
  these shaders miscompiling
- Various fixes for building on 32-bit systems
- Fix an uninitialized value in the renderer
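
As an illustration of the class of bug behind the 32-bit overflow fix listed above, the sketch below multiplies a 4K frame size by a frame count in 32-bit and in 64-bit arithmetic. The exact expression that overflowed in the library is not stated here, so the quantity being computed and both function names are assumptions chosen purely for demonstration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: total sample count computed entirely in 32 bits.
 * Unsigned arithmetic wraps modulo 2^32 instead of failing. */
uint32_t sample_count_32(uint32_t w, uint32_t h, uint32_t frames)
{
    return w * h * frames;
}

/* Hypothetical: the same product, widened to 64 bits before multiplying
 * so that intermediate results cannot wrap for realistic inputs. */
uint64_t sample_count_64(uint32_t w, uint32_t h, uint32_t frames)
{
    return (uint64_t) w * h * frames;
}
```

A 3840x2160 frame has 8,294,400 pixels, so the 32-bit product stops being exact once the remaining factors exceed roughly 517. Widening one operand before the first multiplication is the usual remedy.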