Martin Storsjö authored
Switch to the same cache-friendly algorithm as was done for arm64
in 2e73051c and for the reference C code in 8291a66e.

Contrary to the arm64 implementation, this uses a main loop in C
(very similar to the one in the main C implementation in 8291a66e)
rather than assembly; this gives a bit more overhead on the call to
each function, but it shouldn't affect the big picture much.
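
As a rough illustration of what a C main loop around per-row functions can look like, here is a minimal sketch. It assumes that "cache-friendly" means keeping only a small window of horizontally filtered rows instead of a full intermediate image; the commit message itself does not spell this out. The names separable_7tap, filter_h_row and filter_v_row and the MAX_W limit are hypothetical, the two row functions stand in for what would be NEON assembly, and the rounding, clipping and bit-depth handling of the real Wiener filter are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAPS 7
#define MAX_W 64 /* hypothetical width limit, keeps the sketch heap-free */

/* Clamp an index into [0, n - 1], i.e. replicate the edge samples. */
static int clamp_idx(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

/* Horizontal pass over a single row (stand-in for a per-row asm function). */
static void filter_h_row(int32_t *dst, const uint8_t *src, int w,
                         const int16_t taps[TAPS]) {
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += taps[k] * src[clamp_idx(x + k - 3, w)];
        dst[x] = sum;
    }
}

/* Vertical pass: one output row from the 7 rows currently in the window. */
static void filter_v_row(int32_t *dst, int32_t *const rows[TAPS], int w,
                         const int16_t taps[TAPS]) {
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += taps[k] * rows[k][x];
        dst[x] = sum;
    }
}

/* Main loop in C: only 7 intermediate rows are live at any time. */
void separable_7tap(int32_t *dst, const uint8_t *src, int w, int h,
                    const int16_t taps[TAPS]) {
    assert(w > 0 && w <= MAX_W && h > 0);
    int32_t buf[TAPS][MAX_W];
    int32_t *rows[TAPS];
    for (int k = 0; k < TAPS; k++)
        rows[k] = buf[k];

    /* Prime the window with source rows -3..2 (clamped at the top edge). */
    for (int k = 0; k < TAPS - 1; k++)
        filter_h_row(rows[k], src + (size_t)clamp_idx(k - 3, h) * w, w, taps);

    for (int y = 0; y < h; y++) {
        /* Filter the newest row (y + 3) into the last slot of the window. */
        filter_h_row(rows[TAPS - 1],
                     src + (size_t)clamp_idx(y + 3, h) * w, w, taps);
        filter_v_row(dst + (size_t)y * w, rows, w, taps);

        /* Rotate the row pointers: the oldest buffer is reused next time. */
        int32_t *oldest = rows[0];
        for (int k = 0; k < TAPS - 1; k++)
            rows[k] = rows[k + 1];
        rows[TAPS - 1] = oldest;
    }
}
```

Each source row is filtered horizontally exactly once, and the intermediate data stays within 7 row buffers regardless of the image height. The cost is two function calls per output row, which is the kind of per-call overhead mentioned above.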

Performance-wise, this doesn't make much of a difference; it makes
things a little bit faster on some cores, and a little bit slower
on others:

Before:                 Cortex A7        A8       A53       A72       A73
wiener_7tap_8bpc_neon:   269384.4  147730.7  140028.5   92662.5   92929.0
wiener_7tap_10bpc_neon:  352690.2  159970.2  169427.8  116614.9  119371.1
After:
wiener_7tap_8bpc_neon:   238328.0  157274.1  134588.6   92200.3   97619.6
wiener_7tap_10bpc_neon:  336369.3  162182.0  161954.4  125521.2  130634.0
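
To make "a little bit faster on some cores, and a little bit slower on others" concrete, the relative change per core can be computed from the table. The sketch below assumes the figures are timings where lower is better (the message does not state the unit); percent_change and print_8bpc_changes are hypothetical helper names, and only the 8bpc rows are included.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_CORES 5

/* Values copied from the 8bpc rows of the table above. */
static const char *const cores[NUM_CORES] = { "A7", "A8", "A53", "A72", "A73" };
static const double before_8bpc[NUM_CORES] =
    { 269384.4, 147730.7, 140028.5, 92662.5, 92929.0 };
static const double after_8bpc[NUM_CORES] =
    { 238328.0, 157274.1, 134588.6, 92200.3, 97619.6 };

/* Relative change in percent; negative means the "after" figure is lower. */
double percent_change(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Print the relative change for each core, with an explicit sign. */
void print_8bpc_changes(void) {
    for (int i = 0; i < NUM_CORES; i++)
        printf("Cortex-%s: %+.1f%%\n", cores[i],
               percent_change(before_8bpc[i], after_8bpc[i]));
}
```

Under that assumption, the 8bpc figure drops by roughly 11.5% on the Cortex-A7 and rises by roughly 6.5% on the Cortex-A8, which matches the mixed picture described above.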

This is mostly in line with the results on arm64 in 2e73051c. On
arm64, there was a somewhat larger speedup for the 7tap case, mostly
attributed to unrolling the vertical filter (and the new filter_hv
function) to operate on 16 pixels at a time. On arm32, there aren't
enough registers to do that, so we can't get such gains from unrolling.
(Reducing the unrolling on the arm64 version to match the case
on arm32 also shows similar performance numbers as on arm32 here.)

In the arm64 version, we also added separate 5tap versions of all
functions; not doing that for arm32 at this point.

This increases the binary size by 2 KB.

This doesn't have any immediate effect on how much stack space
dav1d requires in total, since the largest stack users on arm
currently are the 8tap_scaled functions.
2ba57aa5

dav1d

dav1d is an AV1 cross-platform decoder, open-source, and focused on speed and correctness.

It is now battle-tested and production-ready and can be used everywhere.

The canonical repository URL for this repo is https://code.videolan.org/videolan/dav1d

This project was partially funded by the Alliance for Open Media/AOM.

Goal and Features

The goal of this project is to provide a decoder for most platforms and to achieve the highest speed possible, to overcome the temporary lack of AV1 hardware decoders.

It supports all features from AV1, including all subsampling and bit-depth parameters.

In the future, this project will host simple tools or simple wrappers (for example, an MFT transform).

License

dav1d is released under a very liberal license, in contrast to the other VideoLAN projects, so that it can be embedded anywhere, including in non-open-source software or even drivers, to allow the creation of hybrid decoders.

The reasoning behind this decision is the same as for libvorbis, see RMS on vorbis.

Roadmap

The plan is the following:

Reached

  1. Complete C implementation of the decoder,
  2. Provide a usable API,
  3. Port to most platforms,
  4. Make it fast on desktop, by writing asm for AVX2 chips,
  5. Make it fast on mobile, by writing asm for ARMv8 chips,
  6. Make it fast on older desktop, by writing asm for SSSE3+ chips,
  7. Make high bit-depth fast on mobile, by writing asm for ARMv8 chips,
  8. Make it fast on older mobile, by writing asm for ARMv7 chips,
  9. Make high bit-depth fast on older mobile, by writing asm for ARMv7 chips,
  10. Make high bit-depth fast on desktop, by writing asm for AVX2 chips,
  11. Make high bit-depth fast on older desktop, by writing asm for SSSE3+ chips,
  12. Improve threading.

On-going

  1. Improve C code base with various tweaks,
  2. Accelerate for less common architectures, like PPC, SSE2, RISC-V or AVX-512.

After

  1. Use more GPU decoding, when possible.

Contribute

Currently, we are looking for help from:

  • C developers,
  • asm developers,
  • platform-specific developers,
  • GPGPU developers,
  • testers.

Our contribution guidelines are quite strict. We want to build a coherent codebase to simplify maintenance and achieve the highest possible speed.

Notably, the codebase is in pure C and asm.

We are on IRC, on the #dav1d channel on Libera.chat. If you do not have an IRC client at hand, use the IRC Web Interface.

See the contributions document.

CLA

There is no CLA.

People will keep their copyright and their authorship rights, while adhering to the BSD 2-clause license.

VideoLAN will only have the collective work rights.

CoC

The VideoLAN Code of Conduct applies to this project.

Compile

  1. Install Meson (0.49 or higher), Ninja, and, for x86* targets, nasm (2.14 or higher)
  2. Run mkdir build && cd build to create a build directory and enter it
  3. Run meson setup .. to configure meson; add --default-library=static if static linking is desired
  4. Run ninja to compile

Cross-Compilation for 32- or 64-bit Windows, 32-bit Linux

If you're on a Linux build machine and want to compile an .exe for a Windows target/host machine, run

meson setup build --cross-file=package/crossfiles/x86_64-w64-mingw32.meson

or, for 32-bit:

meson setup build --cross-file=package/crossfiles/i686-w64-mingw32.meson

mingw-w64 is a prerequisite and should be installed on your Linux machine via your preferred method or package manager. Note that the binary name formats may differ between distributions. Verify the names, and use alias if certain binaries cannot be found.

For 32-bit Linux, run

meson setup build --cross-file=package/crossfiles/i686-linux32.meson

Build documentation

  1. Install doxygen and graphviz
  2. Run meson setup build -Denable_docs=true to create the build directory
  3. Run ninja -C build doc/html to build the docs

The result can be found in build/doc/html/. An online version built from master can be found here.

Run tests

  1. In the root directory, run git clone https://code.videolan.org/videolan/dav1d-test-data.git tests/dav1d-test-data to fetch the test data repository
  2. During meson configuration, specify -Dtestdata_tests=true
  3. Run meson test -v after compiling

Support

This project is partially funded by the Alliance for Open Media/AOM and is supported by TwoOrioles and VideoLabs.

These companies can provide support and integration help, should you need it.

FAQ

Why do you not improve libaom rather than starting a new project?

  • We believe that libaom is a very good library. It was however developed for research purposes during AV1 design. We think that an implementation written from scratch can achieve faster decoding, in the same way that ffvp9 was faster than libvpx.

Is dav1d a recursive acronym?

  • Yes.

Can I help?

  • Yes. See the contributions document.

I am not a developer. Can I help?

  • Yes. We need testers, bug reporters and documentation writers.

What about the AV1 patent license?

  • This project is an implementation of a decoder. It gives you no special rights on the AV1 patents.

Please read the AV1 patent license that applies to the AV1 specification and codec.

Will you care about <my_arch>? <my_os>?

  • We do, but we have neither the time nor the knowledge. Therefore, patches and contributions are welcome.