[Workshop 2021-12-06/07] Technical Discussions summary

Here are the notes resuming the VLC Workshop 2021-12-06/07 discussions.

Warning: the document is quite long: it has been a very dense workshop, with a lot of subject covered.

Vout

VOUT, input events & display lock

The problem

Cf. @rom1v MR proposition (!324) for the prologue.

Also cf. #26335 (closed) and #26184 (vsync iOS).

Summary:

Input (mouse/keyboard) events are blocked by the display lock
Display lock has been created to:
- force the same display state between prepare and display functions
- avoid concurrent access to vout display

Though, there is no fundamental reason to manage the mouse/keyboard events within the display lock.

@Courmisch has started to extract the mouse & keyboard events in an independent event loop and will propose an MR soon.

BUT, the resize event is still a problem.

Note: one of the consequences is that an mouse/keyboard event will be harder to get access to the exact state of the vout, like the current picture displayed (and its related information like its timestamp), but this is considered as very minor (negligible) counterpart.

The current state of the vout is also triggered some side effects on particular platforms. For example, on iOS, it generates a lot of display late errors (also because the display is linked to the vsync and the vsync is not managed properly - cf #26184).

Relaxing the display lock during a resize

One of the propositions related to @rom1v work is to postpone the prepare phase as late as possible to reduce the display lock preemption.

But the main reason of the prepare/display split is to limit the display duration variability and be able to have a "correct" picture display timing on screen.

Moreover, if we want (as a long term point of view) to manage vsync scenarios, we will probably want to start the prepare phase as soon as possible (just after the previous vsync tick) to have the maximum amount of time to process all we need before the next tick.

On the other hand, it seems that most vout would be able to perform the display phase even if the prepare phase is not evaluated with the "exact" dimensions: they are able to "scale" the content to the display size. This would result in a lower (upscaled) quality scene displayed, but this seems a far better compromise than having to wait for a new "perfect" rendered scene with the exact new dimensions.

As a general point of view for the "frame perfect" point of view, the general consensus seems to be:

Picture display timing accuracy is king: VLC should be as accurate as possible during "standard" playback conditions (aka non resize scenarios), so it should always make everything possible to have the display phase as predictable as possible (or as fast as possible). One of the consequences is that the prepare should remain "as soon as possible" (even in the resize scenarios).
The Resize scenarios should favor interactivity over quality - this implies the UI thread should be locked as little as possible during a resize - even if the resulting rendered scene has lower quality. In other terms, render quality during a resize is considered as "best effort".

-> The consequences on the general consensus are:

the prepare should be performed as soon as possible
the display lock should be relaxed between the prepare and display phases
on all the main vouts, in case of a resize event between prepare and display, the wait is interrupted and the display phase is performed immediately (at the "old" dimensions). This is not a major problem to keep that "old" prepare+display picture on screen with a new display size because if the display phase is able to stretch it on screen (cf section below).

Vout Prepare cancellation

The other subject related to the vout rework is the ability to "cancel" the prepare phase.

One of the mitigation regarding the previous point could be to be able to stop and/or restart the prepare phase as soon as the new (resize) information are received.

A proposition for a more flexible vout workflow could be to have:

one prepare phase
0-N displays (you can have more than one display phase if you need to redraw the scene - ie window damage, etc.)
one unprepare phase

Unfortunately, most of the current vout code requires a display phase after a prepare phase, so such a workflow requires a lot of code rework (and tests!).

However, as a mitigation for VLC 4.0, it is proposed to:

add an optional cancel() callback in all vout. The goal is to implement it on all major vouts and use the display() callback for the vouts that cannot implement it.
For these vouts, it is similar to replace the real cancellation workflow with a display triggered as soon as possible - so instead of a prepare-cancel-prepare-display phase, the vout will perform a prepare-cancel-display ASAP-prepare-display workflow.

-> The proposition is globally accepted.

Note: #26183 proposes a (isofunctional) vout refactor - where the vout is considered as a state machine, and only transitions between states are not mandatory sequential, although both this refactor and the workflow rework can be done separately.

SPU performance with large displays

The problem

On large displays and/or big resolution formats, when subtitles are displayed, the display can suffer noticable lags.

One of the main cause is the management of the SPU scaling before the actual display:

the SPU is rendered with a certain ratio/size in software (freetype)
then a swscale filter is added to perform the ratio/scaling
then the SPU is uploaded to the GPU, then rendered/blended.

Discussion

@alexandre-janniaux has some prototype code to add a "can scale spu" capability to the vout, and avoid relying on swscale.

But, after some discussions, it seems that every vout that is able to manage the SPU (blend them) seems to be able to scale them on their side.

It is then proposed to consider this as a requirement: a vout that is able to "manage" a SPU is also able to scale it.

Side subject: black bars

During the SPU discussion, the black bars subject has been introduced: the issue being the SPU could be positioned outside of the "real" video position (on top of the black bars for example).

Everyone agrees that this should be possible, and as a consequence, state that the black bars should always be in the responsibility of the vout display.

-> The proposition is accepted.

Note: as a consequence, the vout modules that do not manage the SPU (blending+scaling), will not necessarily manage the black bars.

OpenGL filters and format change

The problem

With the current behavior of the decoder-filters-vout pipeline, the vout input format is very dependant on the decoder output format.

However, some filters requires some format change (especially on the dimensions part), for example:

the crop/add filter
HEVC deinterlace filter - decoders send fields separately, so the output of the decoder declares half the height, and the deinterlace filter will send the correct height (x2)
upscalers ?

For these scenarios, the core is adding a converter between the filter and the vout to match the previous format.

@rom1v proposes to add a local optimization between the filters and the vout to be able to bypass the converter injection.

The proposition revolves around the capability of the vout to accept an update_format command positively if the filter asks for it.

The general consensus on this question is:

changing the dimensions part of the format seems reasonable
changing the chroma part seems more risky, and maybe not worth the effort. Some additional investigation on the subject may be needed.

-> The "dynamic" update_format for the vout input is globally accepted for the dimensions part of the format. The chroma part update needs confirmation.

Decoders, Sout, filters

General data transmission - Ancillaries

Cf. MR !744 (merged)

There is no fundamental issue on the current MR. The only remaining question is the naming convention.

General consensus is:

a block is a buffer of bytes
a frame is a buffer of bytes with attached metadata.

Applied to the VLC playback workflow, this is roughly equivalent of:

before the demux module, you have blocks (buffers of bytes without metadata)
after the demux module, you have frames (buffers of bytes with metadata - especially timestamps)

-> The proposition is globally accepted. @tguillem will modify the MR accordingly.

Note: this implies renaming data into block.

Decoder drain

The problem

When a format change is detected by the packetizer, the packetizer will ask the decoder to perform a drain. As a consequence, the packetizer will synchronously wait for both the decoder and its entry FIFO to be completely empty before resuming the workflow.

Mitigation

@inthewings propose to add an asynchrounous mecanism to propage the update format information and do not wait for that long.

One reasonable solution could be to attach the update format event to the first related frame as a metadata or a flag (add an in-band control). When the decoder detects the flag, it triggers its own drain, then is restarted.

-> The solution has been globally accepted, @inthewings can propose a related MR.

Note: an even more flexible solution would be to replace the frame FIFO with a message FIFO with frames attached to it: this way, the control signal could be sent even if there is no data available in the pipeline. However, this is a big change that would need a significant amount of time to develop and even more to test, so this is definitively not something to plan for VLC 4.0.

On-the-fly decoder reconfiguration

@inthewings also proposed to add a new API to the decoder modules to ask if they are able to accept a new format without triggering a new module instance creation.

This could be a plus for the mediacodec module, because the way it manages the direct rendering output (surface dependent), not been forced to reload you avoid vout reconfiguration. For other decoder modules, this is less problematic.

The main issue with this proposition is: there are some scenarios where switching to another decoder module could be suitable. Example:

you start an adaptive streaming playback, with a very good bandwidth. The 4K profile is selected, your HW decoder does not support it, so VLC fallback to a SW decoder.
after some time, the adaptive access switch to the HD profile. If your HW decoder supports it, it could be a good idea to switch to the HW decoder module.

Moreover, if you want to determine if switching module could be suitable, you cannot rely on the module's priority.

An additional proposition would be to force the module to answer positively to the callback only if they can guarantee they will not switch to SW decoding after the format change.

However, this would introduce HW decoding specific codebase, which can be hard to test - especially in CI, so it could be easilly silently broken.

-> The proposition did not reach a consensus.

Stream Output

PCR in sout

There is no real debate around PCR utility in the current implementation of the stream output.

Discussions summarized the current work of @alaric, who is trying to find a proper way to fix the chromecast support (with subtitles). The best solution found so far is to add to the stream output support for HLS output with WebVTT subtitles:

this avoids blending the subtitles in the video
this seems to be the only access/mux the chromecastS support for audio+video+subtitles

However, this implies each track muxed in the stream output (video, audio, webvtt) have extremely different transcode/transmux speed, and the data fragmentation is also very different. For example, the subtitle track can send very sparse data, and to create a valid HLS stream, the sout needs to know how to differentiate between:

there will be no data for next period (empty segment)
OR the transcode stream for this track is late compared to the other streams.

PCR is then used in the sout to represent the progress of the data processing in the transcode.

This does not resolve all the existing sout issues, though (ie sout-keep).

External Filters API

This is a sensitive/complicated subject, as an external API are likely prone to race condition

@alexandre-janniaux has made some prototypes in the past, adding filters definition as a variable in the input, but this is absolutely not a long-term and recommended design¹.

The subject is even more complicated if you consider you can add/remove filters on-the-fly (from the user UI for example).

-> The consensus is to postpone the subject after VLC4.0.

Notes:

this is also linked to the stream output subject, as since the sout workshop (and #5037), the goal is to separate the filters definition from the transcode pipeline and remove the sout exception.

Clock

Clock Context

The problem

The goal is to keep the data related to the PTS as long as possible inside the playback workflow. To do that, the clock internal state needs to be kept through the whole pipeline, and not scratched synchronously.

As an example scenario: during a mediaplayer playback:

a change in the clock (discontinuity) is performed
currently, the clock internal state is reset, so every picture that has been decoded and not displayed has now an invalid PTS (because the PTS cannot be converted to the system time anymore). So they are discarded.
however, if we could keep the current state for these pictures (and not for the future ones), we could be able to display them.

Discussion

@tguillem has some prototype branch that adds a notion of "clock context" that is kept until the last frame of the corresponding PTS is consumed by the audio/video outputs.

The idea seems okay, but there are some constraints regarding who and how the contexts are managed:

New clock context MUST by triggered by the demux module.
Deducing the context relative to a frame PTS can be difficult. You need either:
- take as a principle that clock contexts can be represented as PTS ranges, and clock context ranges cannot overlap at the same time
- or add a clock context identifier attached to every frame that have a valid PTS
- or use a new es_out (aka a new fifo), or manage multiple fifos, where each fifo is related to one clock context.

Clock Rework - Master clock input

The problem

Reminder:

main clock: the main interpolation of the PTS <-> system time match - any slave clock will ask the translation from PTS to system time to it.
master clock: the clock that updates the main clock. Default master clock is Master audio by default in VLC4.
monotonic clock: the "CLOCK_MONOTONIC" system clock.

The new clock design efforts were heavily focused on the Master Clock Audio scenario.

But, to avoid some regressions between VLC3 and VLC4, Videolabs integrated the old Master Clock Input scenario in VLC4, by simply adding the VLC3 clock as a possible master of the main clock in VLC4.

During several tests in the Master Clock Input scenario, it has been observed some unexpected behavior from the main clock.

It appears that:

the old (VLC3) input clock has a very empirical model, especially the drift mitigation part - it shifts every 200ms - either positively or negatively
as a consequence, the Master Clock Input sometimes sends non-monotonic (system time, PTS) tuples to the main clock
moreover, if there is a significant shift in the input values, the main clock will try to compensate both on its coeff value and on its offset value - which can lead to a lot of local flapping.

Discussion

@typx is warning about the expectations from the main clock regarding the input values:

the main clock has been designed to match the EXACT last input tuple (system time, PTS)
the main clock requires the input values to be monotonic, and probably to respect some stability.

So the input clock needs to be filtered before being consumed by the main clock.

One of the question is to determine which component is the best suited to filter, in term of drift, and in term of jitter.

The global consensus is to implement the input filters in the Master Clock, because it is very dependent of the type of source the input clock relies on. For example, you expect a lot of jitter with a network input clock, but absolutely not in a Master Audio scenario.

libvlc

Separating Dialogues vs Notifications

Proposition: spliting dialogue and notification API to have a clear separation and be able to manage error handling outside of UI lifetime (ie during UI loading).

-> Proposition is accepted. @chub will propose a MR on that subject.

OSD Activation/Deactivation API

The problem

The proposition is to add some flexibility in the current API to activate or deactivate part of the OSDs.

For example: being able to deactivate the Volume OSD display, but not the title one.

However, this would add a lot of complexity in the API, for a very small benefit (with very rare usecases).

Discussion

After some discussion, the proposition is to:

keep the global on/off API
deactivate the OSD by default in libvlc (as well as the keyboard/mouse event)

-> Proposition is globally accepted

Note: the general consensus is that it could be a good idea, after VLC4.0, to refactor the existing OSD API to be able to specify another (external) rendering provider - not only the default OSD rendering module.

Apple/iOS - Main loop and deadlock

The problem

For more details about the subject: cf !457.

The problem is related to the resources used by vlc and that requires some action (allocation/release) inside the main thread. For example, on iOS, the windowing API requires events to be consumed by the main loop (and not another custom thread/event loop).

This could lead to some libvlc deadlock.

Discussion

The current discussion conclusion about this problem are:

having a different paradigm for windowing resource management only for darwin systems will make it very difficult to maintain.
vlc core should not "manage" the Darwin main loop, or the main loop in general (this should be the application responsibility).
However, libvlc API should also be considered as "safe" in term of deadlocks: if libvlc is in a deadlock state, it should be because the application did it so (ie blocking a vmem callback and asking for a release), and not because vlc is waiting for the next main loop iteration.
adding an asynchronous release function could help find a more generic solution, but it may not resolve all the deadlocks related to the fundamental problem.

Conclusion:

Either implement a "main loop management" strategy in VLCKit
Or find a way to generalize the behavior to every platform (in case multiple platform have the same issue)

Note: A long term solution could be to design a Resource Manager in vlc, but this is absolutely not something that could be implemented for VLC4.0.

Core - Other subjects

Block split/merge

@Courmisch is suggesting that having a way to split a block in multiple sub-blocks in read-only mode could be very convenient, but not urgent right now.

First blocks trashed with vorbis format

cf. Issue #20927 (closed).

The problem: VLC is dropping the first decoded vorbis frames because the first PTS have negative values

One of the proposition is to be able to manage negative PTS values - considering they are perfectly valid ones. Consequences are:

The new "Invalid" value for PTS should be INT64_MIN
The 0 PTS value should be considered as valid
As code needs to be changed on that subject, it could be a good idea to also define proper overflow boundaries for timestamp computing

However, @fcartegnie warns that it can break a lot of the current code, and some formats can conflict with this convention:

some decoders do not accept negative values
some formats may have timestamps in uint64. As a general consensus, this subject needs more investigation before considering the solution as safe.

Workaround: for 4.0, it has been decided to shift vorbis and ogg output PTS to be > 0.

-> The short term solution is globally accepted. The long term solution needs some investigation.

Audio output/device split

@tguillem has started the split the audio output code into audio device and audio stream (like the wasapi/mmdevice split).

He proposes to:

keep on the work internally for iOS and android platforms in VLC 4.0
generalize it only in VLC 5.0

-> Everyone agrees.

Gapless

Cf. Thomas's MR

As VLC 4.0 does not have a real "multi-input" support, this is not a "real" gapless feature, as this cannot guarantee instant track switching 100% of the time.

No one is fundamentaly against the MR, this is not the 100% complete feature (no cross-fade, etc.), but this seems to work quite well.

However, @typx suggests to rename the feature to avoid unnecessary complains from expert users.

-> The compromise is globally accepted.

Gitlab/Merge Requests process

The situation

The MR process is a huge success, with an average hundred MRs per month processed. The problem is, the number of reviewers may be too low to maintain such a tempo.

The current process is globally okay, but it needs some tweaking.

Process changes

Inactive MRs

The Stale status has been created since the launch of the MR service, but it was never automated.

This seems a good time to implement it.

-> The bot should implement the Stale status, tagging MRs that remain inactive for a significant period. The default inactivity period is set to two months.

Draft bug

The current implementation of the bot is not able to detect the transition from a draft/wip MR to a non-draft status.

As a consequence, the draft period is taken into account in the inactivity (lack of review) count - and it should not, of course.

-> As the bot is stateless by design, it is proposed to add a Draft status label that will flag that should be used to track the change of status of the MR.

Upvotes and Downvotes

Using 👍 and 👎 to vote pro and against a MR is really not ideal:

votes are kept as is even if the MR has been completely rewritten
down votes are confusing, as they are not vetos, and can be overriden with unresolved threads.

The proposition is to give up the 👍 / 👎 vote to:

The 👍 is replaced with gitlab Approval system, which is more integrated/flexible in gitlab (both in UI and in the workflow). The approval validity remains the same as the 👍 votes:
- You cannot approve your own MR (the bot will not take it into account)
- Only users with a developer+ role will be taken into account
The 👎 is replaced with the existing unresolved thread rule.
- Warning: this action can be very volatile, as the MR owner can resolve any thread included in his/her own MR. This is also a common habit among developers that are used to github platform: closing a discussion is a synonym of "I have made the changes you requested. Feel free to reopen the discussion if you disagree."
- This implies either to be swiftly reactive to MR notifications and/or to be more cautious on Approval delivery.

TC invocation

It seems there is no real way to invoke the TC:

there is no formal procedure to ask for a review from the TC
there is no real way to track MRs that are being discussed currently by the TC

-> The proposition is to add a custom Ask TC status label that can be manually set by a developer on a MR to trigger the TC review process.

Hardened VLC - Sandbox

These are the preliminary discussions about the previous (long) informal communications about designing a hardened version of VLC.

The general principle around that are:

the goal is for VLC to be able to be less vulnerable to toxic source, becoming the entry point of a confidentiality/integrity attack on a user environment (or worse - whole machine etc.).
to achieve this, a per-process separation is needed. It is notably useful for the functions that requires nothing but "small" input and output data - like demuxers.
a broker/worker strategy seems a good design for the global architecture.

This requires a lot of work, though, on multiple subjects:

a mitigation system
an interception to regulate the access to the system
an IPC protocol, as well as a RPC (remote procedure call) or a serialization/deserialization system to share information between systems.
libvlc integration

-> The general principles seems okay.

However, on the mitigation subject, @Courmisch warns about the use of seccomp (and any seccomp-*):

it is usually slow
syscalls are really hard to filter, as syscalls parameters cannot be dereferenced.
syscall ids are inconsistent between architectures, and change with kernel and libc versions.
seccomp seems to be deprecated by the Chrome developers as a sandbox API. It seems to be considered as a kernel vulnerability mitigation now.

@Courmisch suggests to:

use namespaces for modules that need a "real" sandboxing (ie simple and identified access to the other processes)
harden modules that needs finer security rules (for example, rewrite the file access in Rust).

Medialibrary

Medialibrary Preparse

The general consensus is the current behavior of the medialibrary preparse cannot be released as is in VLC4.

The strict minimal viable target for VLC4 is to mitigate the effect of a preparse crash.

The only solution that seems reasonable to achieve this is to have the preparse function in a separate process.

The overall features of this new preparse:

create a new application that is using libvlc core
spawned as a new process, communicating from stdin/stdout with the medialibrary
it should include two modes: one in daemon mode - to keep existing connections for example - and one in command line, for test purpose for example
communication (serialization) protocol is to be developed

On the communication keep subject, for now, only the stream_extractor has a notion of that kind.

It is proposed to add an optional openat callback in the access modules to keep connection resources between access calls.

This will not be used in VLC4 for now, only in the new preparse application would use it.

-> Both the new preparse design and keepalive optional propositions are globally accepted.

To be exact, this prototype was about changing the filters for each media, by copying the video-filter variable from the input media to the vout object in the input resource code, as soon as input_resource_SetInput is called, but this is a "whitelist" of every accepted variable so this is not really pretty. Historically, there was vlc_player_vout_SetFilter: 0e457ba7 but the player (and thus media player indirectly) have access to the vout objects and can already set video-filter themselves, so the issue is more about the design for the libvlc API (and configuration/interaction use case) than the update of the filters themselves, and this is where races and ordering can happen and making this subject complex. ↩

Edited Dec 27, 2021 by Simon Latapie

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information