clock: setup the clock-start infrastructure, but don't switch to it yet

This MR supersedes !5825 and extracts patches from !5176 to implement the clock start.

I have planned a lot of changes for the clock, which I'll describe afterwards, but most of the current questions are related to the clock(s) startup. To be completely understandable and exhaustive, I'll try to rebuild everything from the ground up in this document, after many discussions with Thomas and Simon to develop these ideas.

This MR brings the first part of the clock startup mechanism, without switching to it, by allowing the two mechanisms to coexist and adding new tests that use the new mechanism.

Problem introduction

The "clock" component is handling the "intra-media synchronization problem", which is also usually referred to as "lipsync" and is about synchronizing the different outputs that are coming from the same timeline. There's currently no real integration of "extra-media synchronization", though an external clock can be defined as "master" and synchronized across multiple group of clocks, and this will not be part of the current document.

Initially in 3.0, there was a single clock, the input clock, which converted timestamps from the input media timeline into system timestamps as soon as they were received. This brought two different problems:

  • We were losing the original timestamp information in the pipeline, which makes it hard to reason about, hard to reproduce the same output, and not resilient to timeline operations like rate changes.
  • The input data always had priority over components like the audio output, and it was suspected that a lot of audio issues were coming from this phenomenon.

Nowadays, in VLC 4.0, we still have the input clock as a component, but it is only providing data, and the intra-media synchronization problem is handled by a distributed system of clocks instead.

Clocks in VLC 4.0

There are two kinds of clocks, or more accurately clock trackers: the reference clock (called master clock in the code) and the derived clocks (called slave clocks). The reference clock defines the extrapolation parameters, and thus always has a clock drift of 0. The derived clocks compute their drift against the reference so that the underlying subsystem can correct its course.

There is another component also called a clock, the main clock, but the name is a bit deceptive since it is not a clock (it is not used to compute drift). I'll rename it to vlc_clock_bus later, since its role is really to connect different clocks together.

The clock API does not change depending on whether the clock is the reference or a derived one, and by design the components themselves are not aware of which case they are in. This avoids a combinatorial explosion across the different implementers and ensures the same code works everywhere, since conceptually both clock types return the same kind of information.

The clock API consists of the following operations (sketched below):

  • Clock reset: remove the current interpolation information from the clock.
  • Clock update: update the current reference point, and update the drift of the current clock against the last reference point.
  • Clock convertToSystem: transform a point from the media timeline associated with the clock bus into a system time, thus mapping it to the playback timeline.
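To make these operations concrete, here is a minimal, self-contained sketch of what a clock tracker conceptually does. This is not the real vlc_clock API: signatures, field names and the rate/smoothing handling in the tree differ, so treat every name below as illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t tick_t; /* stand-in for vlc_tick_t */

struct clock_point { tick_t system; tick_t stream; };

struct demo_clock {
    bool has_reference;
    struct clock_point reference; /* last reference point */
    double coeff;                 /* stream-to-system scaling (drift) */
};

/* Reset: forget the current interpolation information. */
static void demo_clock_Reset(struct demo_clock *clock)
{
    clock->has_reference = false;
    clock->coeff = 1.0;
}

/* Update: register a new (system, stream) point and refresh the drift
 * estimate against the previous reference point. */
static void demo_clock_Update(struct demo_clock *clock,
                              tick_t system, tick_t stream)
{
    if (clock->has_reference && stream != clock->reference.stream)
        clock->coeff = (double)(system - clock->reference.system)
                     / (double)(stream - clock->reference.stream);
    clock->reference = (struct clock_point){ system, stream };
    clock->has_reference = true;
}

/* ConvertToSystem: map a stream timestamp onto the playback (system)
 * timeline using the current reference point. */
static tick_t demo_clock_ConvertToSystem(const struct demo_clock *clock,
                                         tick_t stream)
{
    if (!clock->has_reference)
        return stream; /* arbitrary fallback; the real code handles this case */
    return clock->reference.system
         + (tick_t)((double)(stream - clock->reference.stream)
                    * clock->coeff);
}
```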

The clock bus currently has an additional API used to start up the clock, vlc_clock_main_SetFirstPCR(). It marks a beginning timestamp used by another mechanism called the monotonic clock startup, which I will discuss a bit later.

Outputs currently use the clocks in two different ways (both sketched below):

  • The video output always converts timestamps from the picture data into the system playback timeline, and checks whether the picture is late or not in order to drop it. It effectively never uses the drift computation.
  • The audio output never converts timestamps from the source to the playback timeline after start-up. Instead, it converts once to set up the beginning of the stream, then appends consecutive data to the playback while computing the current progress of the playback. It then updates the computed drift value, and uses that drift value to trigger resampling when needed.
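As an illustration of those two patterns, reusing the hypothetical demo_clock helpers from the sketch above (so still not the real output code):

```c
/* Video path: convert the picture PTS into the playback timeline and
 * drop the picture if that deadline is already in the past. */
static bool demo_video_should_drop(const struct demo_clock *clock,
                                   tick_t picture_pts, tick_t system_now)
{
    tick_t deadline = demo_clock_ConvertToSystem(clock, picture_pts);
    return deadline < system_now;
}

/* Audio path: no conversion after start-up; instead report the
 * (system, stream) progress so the clock refreshes the drift that the
 * output later uses to decide whether to resample. */
static void demo_audio_report_progress(struct demo_clock *clock,
                                       tick_t played_pts, tick_t system_now)
{
    demo_clock_Update(clock, system_now, played_pts);
}
```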

Recent patches also added the concept of a clock context to mix different timelines within one playback. It allows changing the timeline at the input level, before providing new data, which asynchronously changes the timeline at the output level. The mechanism is triggered by vlc_clock_main_SetFirstPCR().

The clock-start branch

In the previous points, I intentionally renamed vlc_clock_main to better match what it does, and the rename also gives some insight into what "starting the clock" meant at that point. When we talk about starting the clock, we imagine something along the lines of "starting the playback timeline so that it starts counting". But we are actually setting up some generalized epoch information inside the clock bus, coming from the input, which is not necessarily used by the clocks themselves.

Indeed, in theory, if the input clock is set as the reference, the points will already be defined (though this is not completely exact currently, it is easier to understand this way) and there won't be any monotonic clock startup for the output clocks. Conversions would directly use the reference point.

In short, there is no real clock startup; rather, the clock startup is a side effect of the outputs themselves starting to play back and update points. This means that there is no synchronization of the start other than the barrier set up at the vlc_input_decoder level, which is lifted by the es_out at the end of buffering when every decoder is ready to emit at least one frame (or two for the video decoders).

The preceding points brought two ideas:

  • If the input provides some information to the clock bus for the startup, and the outputs also need to set up some information for the monotonic clock startup, we can match the previous APIs and move this towards the API used through each vlc_clock. Instead of running vlc_clock_main_SetFirstPCR(bus, ts_system, ts_stream), we would instead run vlc_clock_Start(clocks.input, ts_system, ts_stream).
  • Since the monotonic clock startup was done from the timestamp of the first frame played by the output, and from the conversion of the first_pcr into system time, vlc_clock_Start() can also match this requirement on the output side, making the behavior symmetrical between reference and derived clocks (see the sketch below).
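A minimal sketch of what this symmetry could look like, again built on the hypothetical demo_clock helpers from above rather than the real vlc_clock_Start() signature:

```c
/* Starting a clock fixes its origin point; later demo_clock_Update()
 * calls only refine the drift from there. */
static void demo_clock_Start(struct demo_clock *clock,
                             tick_t system, tick_t stream)
{
    demo_clock_Reset(clock);
    demo_clock_Update(clock, system, stream);
}

/* Reference side: what the bus-level SetFirstPCR call used to express. */
static void demo_input_start(struct demo_clock *input_clock,
                             tick_t system_now, tick_t first_pcr)
{
    demo_clock_Start(input_clock, system_now, first_pcr);
}

/* Derived side: each output starts its own clock with the system date at
 * which it will actually render its first frame, which is exactly the
 * point the monotonic start-up used to derive implicitly. */
static void demo_output_start(struct demo_clock *output_clock,
                              tick_t first_play_date, tick_t first_frame_pts)
{
    demo_clock_Start(output_clock, first_play_date, first_frame_pts);
}
```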

This provides benefits: we can now have a non-mutating conversion function, so expectations on the clock are much easier to derive from traces. It also provides an event for the moment the interpolation parameters are defined, and really gives a meaning and a physical representation to the "clock start" we were expecting previously.

An important note: it means that each clock needs to be started somehow, so there is no longer one "clock start" but instead "clock starts" (multiple starts). This can be confusing when viewing clocks as "counting time", but it makes a lot of sense when viewing them as drift calculators.

In that way, the previous vlc_clock_Update was there to define the drift as each output progresses through its playback, and now vlc_clock_Start is also there to define the phase shift at origin for each output. This formulation maps the problem scope more clearly to instances of the same problem in the industry, effectively making their solutions importable as-is.

Monotonic clock startup

Now that clock startup has some definition, we reach the final points about defining how startup should happen. The objectives stay the same as described above: we want to be able to choose the best date for the outputs so as to reduce the initial drift, given that we cannot synchronize the outputs themselves more than at the decoder level.

The problem can be split into two separate discussions:

  • When the input is the reference clock, the points of its clock are directly available and the current clock-start branch would start playback as soon as buffering ends and the decoders reach the barrier. By construction, the outputs are then necessarily late, but it's not clear how late they are.
  • When the audio is the reference clock, the monotonic startup will be used as long as the audio has not updated the first reference clock point, which might potentially happen only after it has played the first packet completely. Of course, it is not possible to wait for that moment before starting up, since the audio would already have played by then (up to a 2 s delay with a HomePod), so this looks like a chicken-and-egg problem.

The problem must also be considered within the framework of clock contexts, since they are part of master, and the solution should remain meaningful with respect to them. In that regard, thinking about the problem as fixing the phase shift at origin still works, and we can imagine that:

  • Outputs are expected to play the whole timeline from a context, starting the timeline with vlc_clock_Start to know how many "silence-like" frames they must insert (see the sketch after this list).
  • When the whole timeline has been played and the output detects a context change, it can reset the current timeline with vlc_clock_Start to switch to the newer timeline, and use vlc_clock_Start again to know how many "silence-like" stitching frames it will need.
  • At the output level, there would be no difference when switching contexts, as silence would also provide clock points to update the drift, and the output would decide whether more silence must be inserted because of that drift.
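A rough sketch of what the first bullet could mean for an audio output, with hypothetical names and my own reading of the idea (not code from the branch): the leading silence is derived from the gap between "now plus output latency" and the system date at which the first real frame must be heard.

```c
#include <stdint.h>

typedef int64_t tick_t; /* microseconds, as in the earlier sketches */

/* Leading silence (in samples) to insert before the first frame of a
 * context so that the first real sample lands on its deadline. */
static uint64_t demo_leading_silence_samples(tick_t first_frame_deadline,
                                             tick_t system_now,
                                             tick_t output_latency,
                                             unsigned sample_rate)
{
    tick_t gap = first_frame_deadline - (system_now + output_latency);
    if (gap <= 0)
        return 0; /* already late: no silence, drop or resample instead */
    return (uint64_t)gap * sample_rate / 1000000;
}
```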

With that mindset, there would be no difference between handling clock startup with or without clock contexts, except for the constraint that the output would need to start its own clock. There is also no difference in terms of problem specification at this level, since every output needs to agree on some value defining when the stitching should be done, so it doesn't need a separate API. Without proper synchronization, however, there could be a difference in what each output considers the "beginning" of the stream.

However, waiting for the outputs to be ready to play the data (including their own latency) on the one hand, and stitching timelines from input information while accounting for how much data was sent on the former timeline on the other hand, are two completely different problems.

To solve the problem on the stitching side, it is interesting to note that the input has all the information needed to provide the correct offset for the next context, by computing the MAX(pts) of all packets sent on this timeline. If elements are dropped, it won't necessarily stitch without glitches, but dropping is already a source of glitches in itself.
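A minimal sketch of that MAX(pts) tracking on the input side; the structure and function names are hypothetical, only the idea (track the highest timestamp sent, use it as the offset for the next context) comes from the paragraph above.

```c
#include <stdint.h>

typedef int64_t tick_t;

struct timeline_tracker {
    tick_t max_pts; /* highest timestamp sent on the current timeline */
};

/* Called for every packet sent on the current timeline. */
static void tracker_send(struct timeline_tracker *t, tick_t pts)
{
    if (pts > t->max_pts)
        t->max_pts = pts;
}

/* Offset to apply to the next context so that it starts right after the
 * data already sent on the former timeline. */
static tick_t tracker_next_context_offset(const struct timeline_tracker *t)
{
    return t->max_pts;
}
```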

To solve the problem on the output startup side, multiple possible strategies can be used:

  • The do-nothing strategy: we assume that everything will start up correctly, and compute the drift afterwards. If things go wrong and the drift is far too large, we trigger the drop mechanisms to catch up.
  • The prepare-output strategy: we render "silence-like" frames with the same characteristics as the source until we have precise latency information to set up a common start date.
  • The anticipate strategy: we reserve some hardcoded (or user-configurable) amount of time before the playback starts (sketched below). This is what the initial VLC 4.0 clock rewrite was attempting.
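For illustration, a hypothetical sketch of the anticipate strategy: the first play date is simply pushed a fixed, possibly user-configurable delay into the future so that every output has time to start. The constant, default value and helper name are mine, not from the tree.

```c
#include <stdint.h>

typedef int64_t tick_t; /* microseconds */

#define DEMO_START_DELAY ((tick_t)150000) /* assumed default: 150 ms */

/* Pick the system date at which the first frame should be rendered. */
static tick_t demo_anticipated_start(tick_t system_now, tick_t user_delay)
{
    tick_t delay = user_delay > 0 ? user_delay : DEMO_START_DELAY;
    return system_now + delay;
}
```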

Clock context handling in the outputs

Currently, the clock context handling is set up with detection relying on a heuristic in vlc_clock_ConvertToSystem, but it is unclear which context should be started or used in cases like the audio output, where some data from the previous context is still being played.

A strategy needs to be defined here, and I think the current state lacks a mechanism to properly stitch in the outputs: some way to add stitching information before playback and to allow the outputs to manipulate multiple contexts at the same time. If the outputs have not used this mechanism before some data is played, then some data probably needs to be dropped.

More details on this MR and next steps

As mentioned above, this MR brings the first part of the clock startup mechanism, without switching to it, by allowing the two mechanisms to coexist and adding new tests that use the new mechanism.

I'll push the other patches transforming the usage in es_out in MR !5176 after this merge request. This is currently separated because the next patches break some use cases in weird ways during small seeks due to pacing changes, and because there is a lot of work to be done on clock contexts to guarantee a better behaviour before continuing.

Next steps include:

  • Switching the outputs to the clock_start mechanism.
  • Switching the es_out to the clock_start mechanism.
  • Definitively moving the monotonic startup into vlc_clock_Start instead of vlc_clock_ConvertToSystem.
  • Simplifying vlc_clock_ConvertToSystem by removing system_ts.
  • Improving video output and audio output for context detection.
  • Improving the tests to include this mechanism instead of the old SetFirstPCR mechanism.
  • Removing the previous mechanism.

Merging this MR still benefits the work on clock context and future work on this part because:

  • It doesn't change the current behaviour, but still adds new tests for the new behaviour.
  • It's 30 commits that we won't need to worry about when working on clock contexts.
  • There are still some changes improving the overall quality of the code for the clock.