Commits · 0d6a31f275dc · VideoLAN / dav1d

Jan 12, 2023

x86: Remove stack alignment compiler flags · 0d6a31f2

Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago

The intent was good, but in practice it results in a significant
amount of problems due to various compiler bugs for negligible gains.

0d6a31f2

Dec 14, 2022
- dav1d: add an option to skip decoding some frame types · ed63a745
  James Almer authored 2 years ago
```
Should be useful for scenarios like wanting only keyframes to quickly generate
a set of preview images of the whole stream.
```
  ed63a745
- picture: support creating and freeing refs without tile data · 6f80bad2
  James Almer authored 2 years ago
  
  6f80bad2
- x86: Add 10bpc 8x32/32x8 itx AVX-512 (Ice Lake) asm · 50babcfb
  Henrik Gramner authored 2 years ago
  
  50babcfb
- x86: Add minor DC-only IDCT optimizations · 3136ae6a
  Henrik Gramner authored 2 years ago
  
  3136ae6a
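As a rough illustration of the frame-type skip option, the decision can be modeled as a per-frame predicate evaluated before any decoding work is done. This is a hypothetical sketch: the `frame_filter` levels, the `frame_type` enum and `should_decode()` are illustrative names, not dav1d's public API, and treating "reference frame" as `refresh_frame_flags != 0` is an assumption.

```c
#include <stdbool.h>

/* Hypothetical frame types, mirroring AV1's frame_type values. */
enum frame_type {
    FRAME_TYPE_KEY,
    FRAME_TYPE_INTER,
    FRAME_TYPE_INTRA_ONLY,
    FRAME_TYPE_SWITCH,
};

/* Hypothetical filter levels, from least to most restrictive. */
enum frame_filter {
    FRAME_FILTER_ALL,       /* decode everything */
    FRAME_FILTER_REFERENCE, /* only frames other frames may reference */
    FRAME_FILTER_INTRA,     /* only key and intra-only frames */
    FRAME_FILTER_KEY,       /* only key frames */
};

/* Return true if a frame should be decoded under the given filter.
 * Assumption: refresh_frame_flags != 0 marks a potential reference. */
static bool should_decode(enum frame_filter filter, enum frame_type type,
                          unsigned refresh_frame_flags)
{
    switch (filter) {
    case FRAME_FILTER_ALL:
        return true;
    case FRAME_FILTER_REFERENCE:
        return refresh_frame_flags != 0;
    case FRAME_FILTER_INTRA:
        return type == FRAME_TYPE_KEY || type == FRAME_TYPE_INTRA_ONLY;
    case FRAME_FILTER_KEY:
        return type == FRAME_TYPE_KEY;
    }
    return true;
}
```

For the preview use case in the commit message, a caller would select `FRAME_FILTER_KEY`, so inter, intra-only and switch frames are rejected up front.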
Dec 13, 2022
- getbits: Fix assertion failure · 20c03152
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
```
bits_left could underflow after reaching EOB.

Credit to OSS-Fuzz.
```
  20c03152
- checkasm: Fix integer overflow in refmvs test · 95d19071
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
  
  95d19071
- dav1dplay: Update to new libplacebo API · 53efaa9b
  Henrik Gramner authored 3 years ago and Ronald S. Bultje committed 2 years ago
  
  53efaa9b
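The underflow fixed in the getbits commit above is a general hazard for bit readers that keep a count of buffered bits. One way to rule it out, sketched here with a hypothetical `BitReader` (not dav1d's own reader), is to keep refilling with zero bits past the end of the buffer and record the overrun in an `eof` flag, so the counter can never go negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; not dav1d's actual implementation. */
typedef struct {
    const uint8_t *ptr, *end;
    uint64_t state; /* buffered bits, left-aligned */
    int bits_left;  /* number of valid bits in 'state' */
    int eof;        /* set once data past the end of the buffer was needed */
} BitReader;

static void br_init(BitReader *br, const uint8_t *data, size_t sz)
{
    br->ptr = data;
    br->end = data + sz;
    br->state = 0;
    br->bits_left = 0;
    br->eof = 0;
}

/* Top up 'state' until at least n bits are buffered. Past the end of the
 * data, zero bytes are supplied and 'eof' is set, instead of letting
 * bits_left go negative. */
static void br_refill(BitReader *br, int n)
{
    while (br->bits_left < n) {
        uint64_t byte = 0;
        if (br->ptr < br->end)
            byte = *br->ptr++;
        else
            br->eof = 1;
        br->state |= byte << (56 - br->bits_left);
        br->bits_left += 8;
    }
}

static unsigned br_get_bits(BitReader *br, int n)
{
    assert(n >= 1 && n <= 32);
    br_refill(br, n);
    const unsigned v = (unsigned)(br->state >> (64 - n));
    br->state <<= n;
    br->bits_left -= n;
    assert(br->bits_left >= 0); /* holds even after reaching EOB */
    return v;
}
```

Callers can then check `eof` once after parsing a header rather than after every read.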
Dec 09, 2022
- Add minor getbits improvements · f2a8fc13
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
  
  f2a8fc13
- Add a separate getbits function for getting a single bit · 366964fb
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
```
A length of 1 is by far the most common case, and having a special case
for that is not only slightly faster but also reduces code size by a
decent amount due to not having to pass a length argument every time.
```
  366964fb
- Remove redundant zeroing in sequence header parsing · 1a772e46
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
```
The Dav1dSequenceHeader struct is already zero-initialized,
so zeroing individual values a second time is redundant.
```
  1a772e46
- Set the correct default value of initial_display_delay · d81a9c75
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
```
According to section 6.4.1 of the AV1 specification, the value should
be equal to BUFFER_POOL_MAX_SIZE (10) when not explicitly signaled.
```
  d81a9c75
- tools: remove the null last entry in inloop_filters_tbl · 9cf6c84c
  James Almer authored 2 years ago
```
Fixes segfaults if you run the CLI with an invalid argument for --inloopfilters
```
  9cf6c84c
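To illustrate why a dedicated single-bit reader helps, here is a hypothetical sketch (names and layout are illustrative, not dav1d's): the generic `get_bits()` takes a length argument, while `get_bit()` needs none, so each of the many 1-bit flag reads in header parsing becomes a smaller call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer (not dav1d's). */
typedef struct {
    const uint8_t *buf;
    size_t size; /* buffer size in bytes */
    size_t pos;  /* current position in bits */
} BitReader;

/* Single-bit read: no length argument, no loop.
 * Reads past the end of the buffer return 0. */
static unsigned get_bit(BitReader *br)
{
    const size_t byte = br->pos >> 3;
    const unsigned bit =
        byte < br->size ? (br->buf[byte] >> (7 - (br->pos & 7))) & 1 : 0;
    br->pos++;
    return bit;
}

/* Generic n-bit read (n <= 32): every call site must pass a length. */
static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    while (n-- > 0)
        v = (v << 1) | get_bit(br);
    return v;
}
```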
Dec 04, 2022
- Do not assume the picture allocation starts at the left edge · c56e352b
  Luca Barbato authored 2 years ago
```
Fixes: #412
```
  c56e352b
Nov 21, 2022
- ppc: Allocate the correct temp buffer size · 1f76c4cd
  Luca Barbato authored 2 years ago
```
It mirrors what is done with neon as well.

Fixes: #413
```
  1f76c4cd
- ppc: Do not use static const with vec_splats · 4e2a3f6d
  Luca Barbato authored 2 years ago
```
clang-15 doesn't consider it a compile-time constant anymore.
```
  4e2a3f6d
Nov 10, 2022
- Add info to dav1d_send_data docs · 4b9f5b70
  Charlie Hayden authored 2 years ago and Ronald S. Bultje committed 2 years ago
  
  4b9f5b70
Oct 30, 2022
- build: drop -D_DARWIN_C_SOURCE on macOS/iOS after 6b611d36 · 21abfb98
  Jan Beich authored 2 years ago
```
Already implied when -D_POSIX_C_SOURCE is not passed.
```
  21abfb98
- build: drop -D_POSIX_C_SOURCE on non-Linux after 6b611d36 · 7409a189
  Jan Beich authored 2 years ago
```
Non-GNU systems enable extensions (XSI, BSD, GNU) by default.
```
  7409a189
Oct 27, 2022
- threading: Add a pending list for async task insertion · 8f16314d
  Victorien Le Couviour--Tuffet authored 2 years ago
  
  8f16314d
Oct 26, 2022
- Implement atomic_compare_exchange_strong in the atomic compat headers · 8a4932ff
  Martin Storsjö authored 2 years ago
```
This fixes building with MSVC (and older GCC versions) after
3e7886db.
```
  8a4932ff
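For toolchains without a usable `<stdatomic.h>`, a compat header has to supply `atomic_compare_exchange_strong` itself. The sketch below shows the C11 semantics on top of the GCC/Clang `__sync_val_compare_and_swap` builtin; the function name `compat_compare_exchange_strong` is hypothetical, and this is not necessarily how dav1d's compat headers implement it.

```c
#include <stdbool.h>

/* Hypothetical compat shim with C11 atomic_compare_exchange_strong
 * semantics: if *obj == *expected, store desired and return true;
 * otherwise copy the current value of *obj into *expected and return
 * false. Built on the GCC/Clang __sync builtin (full barrier). */
static inline bool compat_compare_exchange_strong(volatile int *obj,
                                                  int *expected, int desired)
{
    const int cmp = *expected;
    const int old = __sync_val_compare_and_swap(obj, cmp, desired);
    if (old == cmp)
        return true;
    *expected = old;
    return false;
}
```

On MSVC, the same shape can be built on `InterlockedCompareExchange`, which likewise returns the previous value of the destination.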
Oct 20, 2022

threading: Fix a race around frame completion (frame-mt) · 3e7886db

Victorien Le Couviour--Tuffet authored 2 years ago

The completion of the first frame to decode while an async reset
request on that same frame is pending will render it stale. The
processing of such a stale request is likely to result in a hang.

One reason this happens is the skip condition at the beginning of
reset_task_cur().
=> Consume the async request before that check.

Another reason is several threads producing async reset requests in
parallel: an async request for the first frame could cascade through the
other threads (other frames) during completion of that frame, meaning it
would not be caught by the last synchronous reset_task_cur() call made
after signaling the main thread and before releasing the lock.
=> To solve this, we need to add protections at the racy locations. That
means after we increase `first`, before returning from
reset_task_cur_async(), and after consuming the async request.

3e7886db

Oct 10, 2022

Handle host_machine.system() 'ios' and 'tvos' the same way as 'darwin' · 5b07b425

Sebastian Dröge authored 2 years ago

Despite not being documented in Meson's list of canonical system names,
Meson does accept 'ios', mostly as a synonym for 'darwin'.

Using 'ios' instead of 'darwin' allows distinguishing between the two
where that is necessary. Therefore, within dav1d, allow 'ios' as an
alias for 'darwin' as the system name, so that cross files which make
this distinction can be used.

Meson itself also accepts 'tvos' in addition to 'ios' in its internal
`is_darwin()` function, so all three are handled the same way here.

5b07b425

Sep 30, 2022
- x86: Add 10-bit 8x8/8x16/16x8/16x16 itx AVX-512 (Ice Lake) asm · cac76e4b
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
  
  cac76e4b
- Specify hidden visibility for global data symbol declarations · e4c4af02
  Henrik Gramner authored 2 years ago
```
'-fvisibility=hidden' only applies to definitions, not declarations,
so the compiler has to be conservative about how references to global
data symbols are performed.

Explicitly specifying the visibility allows for better code generation.
```
  e4c4af02
Sep 28, 2022
- build: strip() the result of cc.get_define() · 58c856b7
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
```
Whitespace is added to the result when compiling with MSVC using /std:c11,
which breaks various things. Adding strip() fixes the problem.
```
  58c856b7
- checkasm: Move printf format string to .rodata on x86 · 0b0b5fbf
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
  
  0b0b5fbf
- checkasm: Improve 32-bit parameter clobbering on x86-64 · 6fefa6a5
  Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago
```
Use explicit parameter type detection and manually clobber the
upper bits instead of relying on internal compiler behavior.
```
  6fefa6a5
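A minimal sketch of the visibility change, assuming a GCC- or Clang-style compiler targeting ELF or Mach-O: the attribute is attached to the `extern` declaration that other translation units see, not only to the definition. The `HIDDEN_DATA` macro and `example_table` are hypothetical names.

```c
/* Hypothetical macro. With -fPIC and default visibility, the compiler
 * typically has to load the address of extern data through the GOT,
 * since the symbol could come from another shared object. A hidden
 * declaration lets it use a direct PC-relative access instead. */
#if defined(__GNUC__) && !defined(_WIN32)
#define HIDDEN_DATA __attribute__((visibility("hidden")))
#else
#define HIDDEN_DATA
#endif

/* Declaration, as it would appear in a shared header. */
extern HIDDEN_DATA const unsigned char example_table[4];

/* Definition, normally in a separate .c file. */
const unsigned char example_table[4] = { 1, 2, 4, 8 };

/* A user of the table; with the hidden declaration in scope, this
 * access does not need to go through the GOT. */
static unsigned lookup(unsigned i)
{
    return example_table[i & 3];
}
```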
Sep 26, 2022

x86: Fix incorrect 32-bit parameter usage in high bit-depth AVX-512 mc · 8349845c

Henrik Gramner authored 2 years ago and Henrik Gramner committed 2 years ago

The 32-bit width parameter was used directly as a pointer offset, but
the upper half is undefined. Fix it by replacing 'cmp' with 'sub' to
explicitly zero those bits.

8349845c
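The bug above stems from the x86-64 calling conventions leaving the upper 32 bits of a register that carries a 32-bit argument undefined. The sketch below models this in C, with a 64-bit value standing in for the register; the function names are hypothetical. It relies on the fact that a 32-bit instruction that writes a register, such as `sub`, zero-extends its result into the full 64-bit register, while `cmp` writes no register and leaves the garbage in place.

```c
#include <stdint.h>

/* Buggy: uses the whole 64-bit register as a pointer offset, including
 * the undefined upper half. */
static uint64_t offset_from_reg_buggy(uint64_t reg)
{
    return reg;
}

/* Fixed: equivalent to having executed a 32-bit op that writes the
 * register ('sub r32, ...'), which clears bits 63:32. */
static uint64_t offset_from_reg_fixed(uint64_t reg)
{
    return (uint32_t)reg;
}
```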

Sep 19, 2022

arm: itx: Add clipping to row_clip_min/max in the 10 bpc codepaths · 345127a7

Martin Storsjö authored 2 years ago

This fixes conformance with the argon test samples, in particular
with these samples:
    profile0_core/streams/test10100_579_8614.obu
    profile0_core/streams/test10218_6914.obu

This gives a pretty notable slowdown to these transforms - some
examples:

Before:                                 Cortex A53       A72       A73    Apple M1
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:       365.7     290.2     299.8    0.3
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:    1865.2    1384.1    1457.5    2.6
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   33976.3   26817.0   24864.2   40.4
After:
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:       397.7     322.2     335.1    0.4
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:    2121.9    1336.7    1664.6    2.6
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:   38569.4   27622.6   28176.0   51.0

Thus, for the transforms alone, it makes them around 10-13% slower
(the Apple M1 measurements are too noisy to be conclusive here).

Measured on actual full decoding, it makes decoding of 10 bpc
Chimera roughly 1% slower on an Apple M1, which is close to measurement
noise anyway.

345127a7
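A sketch of the clamp these codepaths now perform, written in C rather than NEON. The range is an assumption based on our reading of the AV1 specification's intermediate clamping in the inverse transform: a signed (bitdepth + 8)-bit integer, which for 10 bpc is [-131072, 131071] and for 8 bpc coincides with the int16 range. `row_clip()` is a hypothetical name.

```c
#include <stdint.h>

/* Clamp an intermediate row-transform value to
 * [-(1 << (bitdepth + 7)), (1 << (bitdepth + 7)) - 1]. */
static int32_t row_clip(int32_t v, int bitdepth)
{
    const int32_t lo = -(INT32_C(1) << (bitdepth + 7));
    const int32_t hi = (INT32_C(1) << (bitdepth + 7)) - 1;
    return v < lo ? lo : v > hi ? hi : v;
}
```

Applying a clamp like this to every intermediate value adds per-coefficient min/max operations, which is consistent with the slowdown measured above.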

x86: Fix overflows in 12bpc AVX2 IDCT/IADST · 9c74a9b0
Henrik Gramner authored 2 years ago

9c74a9b0

x86: Fix overflows in 12bpc AVX2 DC-only IDCT · 49b1c3c5

Henrik Gramner authored 2 years ago

Using smaller immediates also results in a small code size reduction in
some cases, so apply those changes to the (10bpc-only) SSE code as well.

49b1c3c5

x86: Fix clipping in high bit-depth AVX2 4x16 IDCT · 0c8a3461

Henrik Gramner authored 2 years ago

Certain clips were incorrectly performed on negated values, which
caused things to be off-by-one in both directions. Correct this by
negating such values prior to clipping instead of afterwards.

0c8a3461
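The off-by-one in both directions follows from two's complement ranges being asymmetric: negating a value that was already clipped does not give the same result as clipping the negated value. A small C model, using an illustrative int16-style range rather than the actual high bit-depth bounds, with hypothetical function names:

```c
/* Illustrative asymmetric range. */
#define CLIP_MIN (-32768)
#define CLIP_MAX 32767

static int clip(int v)
{
    return v < CLIP_MIN ? CLIP_MIN : v > CLIP_MAX ? CLIP_MAX : v;
}

/* Wrong: clip first, then negate. Large positive inputs end one step
 * short of CLIP_MIN, and large negative inputs overshoot CLIP_MAX. */
static int clip_then_negate(int v)
{
    return -clip(v);
}

/* Right: negate first, then clip, so the result is always in range. */
static int negate_then_clip(int v)
{
    return clip(-v);
}
```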

Sep 15, 2022

Don't use gas-preprocessor with clang-cl for arm targets · cc9651f5

Martin Storsjö authored 3 years ago

Since meson 0.58.0 (released in May 2021), meson accepts adding '.S'
assembly files as source files to the clang-cl compiler.

If using an older version of meson, keep using gas-preprocessor
just like for MSVC builds.

cc9651f5

Fix checking the reference dimensions for the projection process · d4a2b75d

David Conrad authored 2 years ago

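The projection process in section 7.9.2 of the AV1 specification returns 0
"If RefMiRows[ srcIdx ] is not equal to MiRows, RefMiCols[ srcIdx ] is not
equal to MiCols"

dav1d was comparing pixel width/height rather than block (mode-info)
width/height; compare the block dimensions instead to conform with the
spec.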

d4a2b75d
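Why pixel and block dimensions can disagree: two frame widths that differ in pixels can still produce the same number of mode-info columns. The sketch uses the spec's MiCols/MiRows derivation, `2 * ((FrameWidth + 7) >> 3)`; `ref_dims_match()` is a hypothetical helper, not dav1d's function.

```c
/* Mode-info (4x4 block) dimensions as derived in the AV1 spec. */
static int mi_cols(int frame_width)
{
    return 2 * ((frame_width + 7) >> 3);
}

static int mi_rows(int frame_height)
{
    return 2 * ((frame_height + 7) >> 3);
}

/* Can a reference frame be used for projection? Compare MI dimensions,
 * not pixel dimensions. */
static int ref_dims_match(int ref_w, int ref_h, int cur_w, int cur_h)
{
    return mi_cols(ref_w) == mi_cols(cur_w) &&
           mi_rows(ref_h) == mi_rows(cur_h);
}
```

For example, widths of 100 and 102 pixels both give MiCols = 26, so a pixel-based comparison would wrongly disable projection between such frames.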

Fix calculation of OBMC lap dimensions · eb25f00c

David Conrad authored 2 years ago

Individual OBMC lapped predictions have a max width of 64 pixels
for the top laps and a max height of 64 pixels for the left laps.

This is specified in section 7.11.3.9 (overlapped motion compensation
process):
step4 = Clip3( 2, 16, Num_4x4_Blocks_Wide[ candSz ] )

dav1d wasn't clipping this as needed, which means that with scaled MC,
the interpolation of the second half of a 128-pixel block was incorrect,
since mx/my for subpel filter selection need to be reset at the 64-pixel
boundary.

eb25f00c
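The quoted `step4` formula translates into a lap width in pixels as follows. `obmc_top_lap_width()` is a hypothetical helper; the left laps are analogous, using heights instead of widths.

```c
/* Clip3(x, y, z) as defined in the AV1 spec. */
static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Width in pixels of one top-lap prediction. step4 counts 4x4 units
 * and is clipped to [2, 16], i.e. 8 to 64 pixels. */
static int obmc_top_lap_width(int cand_w4 /* Num_4x4_Blocks_Wide[candSz] */)
{
    const int step4 = clip3(2, 16, cand_w4);
    return step4 * 4;
}
```

A 128-pixel-wide candidate (32 4x4 units) therefore yields 64-pixel laps, which is why the subpel position state has to restart at the 64-pixel boundary.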

Support film grain application whose only effect is clipping to video range · 10f5ce54

David Conrad authored 2 years ago

This is the parameter combination:
num_y_points == 0 && num_cb_points == 0 && num_cr_points == 0 &&
chroma_scaling_from_luma == 1 && clip_to_restricted_range == 1

Film grain application has two effects: adding noise, and optionally
clipping to video range.

For luma, the spec skips film grain application if there's no noise
(num_y_points == 0), but for chroma, it's only skipped if there's no
chroma noise *and* chroma_scaling_from_luma is false.

This means that even with no noise at all (every num_*_points == 0),
chroma pixels are still clipped to video range if both
chroma_scaling_from_luma and clip_to_restricted_range are true. Luma
pixels, however, aren't clipped to video range unless there's noise to
apply. dav1d currently skips applying film grain entirely if there is no
noise, regardless of the secondary clipping.

10f5ce54
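The per-plane conditions described above can be summarized as a small decision function. This is a sketch with a hypothetical `FilmGrainParams` struct holding only the relevant syntax elements; it is conservative, processing a plane whenever the spec would run the grain loop for it, even if the only visible effect is clipping.

```c
#include <stdbool.h>

/* Hypothetical subset of film grain parameters; field names follow the
 * AV1 syntax elements. */
typedef struct {
    int num_y_points, num_cb_points, num_cr_points;
    bool chroma_scaling_from_luma;
    bool clip_to_restricted_range;
} FilmGrainParams;

/* Luma is only processed (noise plus optional clipping) if it has noise. */
static bool fg_process_luma(const FilmGrainParams *p)
{
    return p->num_y_points > 0;
}

/* A chroma plane is processed if it has its own scaling points or derives
 * its scaling from luma, even if that produces no actual noise. */
static bool fg_process_cb(const FilmGrainParams *p)
{
    return p->num_cb_points > 0 || p->chroma_scaling_from_luma;
}

static bool fg_process_cr(const FilmGrainParams *p)
{
    return p->num_cr_points > 0 || p->chroma_scaling_from_luma;
}

/* Film grain can only be skipped entirely if no plane is processed. */
static bool fg_can_skip(const FilmGrainParams *p)
{
    return !fg_process_luma(p) && !fg_process_cb(p) && !fg_process_cr(p);
}
```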

Ignore T.35 metadata if the OBU contains no payload · 673ee248

David Conrad authored 2 years ago

The syntax of itu_t_t35_payload_bytes is not defined in the AV1
specification, but it does state that decoders should ignore the
entire OBU if they do not understand it.

673ee248
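A sketch of the size check, assuming a buffer that starts at `itu_t_t35_country_code` with trailing bits already removed; `t35_payload_size()` is a hypothetical helper. A country code of 0xFF means an `itu_t_t35_country_code_extension_byte` follows, so the payload starts one byte later.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the number of itu_t_t35_payload_bytes, or -1 if the OBU should
 * be ignored (truncated header or empty payload). */
static ptrdiff_t t35_payload_size(const uint8_t *buf, size_t sz)
{
    if (sz < 1)
        return -1;
    /* 0xFF: an extension byte follows the country code */
    const size_t hdr = buf[0] == 0xFF ? 2 : 1;
    if (sz <= hdr)
        return -1; /* no payload bytes: ignore the whole OBU */
    return (ptrdiff_t)(sz - hdr);
}
```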

Fix chroma deblock filter size calculation for lossless · 2152826b

David Conrad authored 2 years ago

In section 5.11.34, txSz is always set to TX_4X4 if Lossless is true.

The chroma deblock filter size calculation needs to use this overridden
txSz when lossless is enabled.

2152826b

Fix rounding in the calculation of initialSubpelX · e202fa08
David Conrad authored 2 years ago
```
The spec divides err by two, rounding towards zero, instead of using
>> 1, which rounds towards negative infinity.
```
e202fa08
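The difference only matters for negative `err`. In C, `/` truncates toward zero, while an arithmetic `>> 1` floors (right-shifting a negative signed value is implementation-defined in C, though common compilers shift arithmetically). A portable sketch of both behaviors, with hypothetical function names:

```c
/* C's '/' truncates toward zero (C99 and later): matches the spec. */
static int div2_trunc(int x)
{
    return x / 2;
}

/* Floor division by 2: what an arithmetic '>> 1' produces. */
static int div2_floor(int x)
{
    return x / 2 - (x % 2 < 0);
}
```

For err = -3, the spec's division gives -1, while `>> 1` gives -2.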