- Jun 22, 2023
Martin Storsjö authored
Make the SGR functions operate in a more cache-friendly manner by interleaving the various passes and merging some of the functions that operate on data in similar patterns. This reduces stack usage from 207 KB to 14 KB for sgr_3x3, from 207 KB to 16 KB for sgr_5x5, and from 255 KB to 33 KB for sgr_mix. It does, however, increase the size of the binary by about 12 KB. (The executable code generated from assembly actually shrinks a little, but the higher-level logic in C is quite nontrivial.) This is somewhat similar to what was done for x86 in fe2bb774.

Benchmarks from checkasm:

Before:             Cortex A53        A55        A72        A73        A76   Apple M1
sgr_3x3_8bpc_neon:    493005.0   483133.2   365056.3   345197.9   202819.1      537.3
sgr_5x5_8bpc_neon:    353152.6   349614.3   268962.2   248431.8   142302.4      385.9
sgr_mix_8bpc_neon:    829903.9   815910.9   622858.5   577238.0   333362.9      881.7
sgr_3x3_10bpc_neon:   504778.6   499851.6   379203.1   346695.2   199738.7      537.0
sgr_5x5_10bpc_neon:   363111.9   362489.7   267903.1   247506.5   138417.2      351.3
sgr_mix_10bpc_neon:   853053.7   846768.8   628349.6   584553.8   328399.5      843.6

After:
sgr_3x3_8bpc_neon:    387949.9   384216.4   294423.7   301968.2   184643.1      492.4
sgr_5x5_8bpc_neon:    259854.7   257233.2   193983.7   198388.4   128497.0      341.2
sgr_mix_8bpc_neon:    606401.5   595661.3   457209.7   462721.8   281906.7      738.6
sgr_3x3_10bpc_neon:   392472.7   394100.5   296048.1   304339.4   184271.4      471.3
sgr_5x5_10bpc_neon:   257248.3   257651.1   197552.5   199655.1   130739.7      322.9
sgr_mix_10bpc_neon:   605263.3   611197.4   441789.3   461339.2   286320.1      721.4

Speedup vs before:      27-41%     25-40%     23-42%     13-26%      5-18%      8-19%
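The stack savings in the commit message come from not materializing full intermediate planes between passes. The sketch below is a generic illustration of that idea. It is not the commit's NEON or C code. It assumes a plain 3x3 box sum over an 8-bit image with clamped edges, and it uses hypothetical names (`box3_naive`, `box3_ring`, `MAX_W`, `MAX_H`). The two-pass version stores every horizontal sum before the vertical pass. The interleaved version keeps only a ring of three rows, so its scratch size depends on the width alone.

```c
#include <stdint.h>

#define MAX_W 64   /* hypothetical limits for this sketch; callers must respect them */
#define MAX_H 64

/* Horizontal 3-tap sum of one row, clamping at the left/right edges. */
static void hsum3_row(const uint8_t *src, int w, int32_t *dst) {
    for (int x = 0; x < w; x++) {
        const int l = x > 0 ? x - 1 : 0;
        const int r = x < w - 1 ? x + 1 : w - 1;
        dst[x] = src[l] + src[x] + src[r];
    }
}

/* Two separate passes: the whole intermediate plane (MAX_H rows) is kept. */
static void box3_naive(const uint8_t *src, int stride, int w, int h, int32_t *dst) {
    int32_t tmp[MAX_H][MAX_W];
    for (int y = 0; y < h; y++)
        hsum3_row(src + y * stride, w, tmp[y]);
    for (int y = 0; y < h; y++) {
        const int a = y > 0 ? y - 1 : 0;
        const int b = y < h - 1 ? y + 1 : h - 1;
        for (int x = 0; x < w; x++)
            dst[y * w + x] = tmp[a][x] + tmp[y][x] + tmp[b][x];
    }
}

/* Interleaved passes: only 3 rows of horizontal sums are live at once. */
static void box3_ring(const uint8_t *src, int stride, int w, int h, int32_t *dst) {
    int32_t ring[3][MAX_W];                 /* row r lives in ring[r % 3] */
    hsum3_row(src, w, ring[0]);
    for (int y = 0; y < h; y++) {
        /* Produce the row below; its slot previously held row y - 2,
         * which no later output row needs. */
        if (y + 1 < h)
            hsum3_row(src + (y + 1) * stride, w, ring[(y + 1) % 3]);
        const int32_t *above = ring[(y > 0 ? y - 1 : 0) % 3];
        const int32_t *cur   = ring[y % 3];
        const int32_t *below = ring[(y < h - 1 ? y + 1 : h - 1) % 3];
        for (int x = 0; x < w; x++)
            dst[y * w + x] = above[x] + cur[x] + below[x];
    }
}
```

Both functions produce identical output. The interleaved one trades the `MAX_H`-row buffer for a three-row ring that stays hot in cache.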
-
Martin Storsjö authored
This issue isn't caught by checkasm, since these functions are internal to the SGR implementation, and checkasm only affects the parameters on the external DSP function interface. This could potentially trigger errors with future compilers.
-
- Jun 12, 2023
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- Jun 09, 2023
James Almer authored
Missed in 31de9d50.
-
- Jun 07, 2023
Always-enabled basic sanity checks in API functions are reasonable, but in internal functions assert() is more appropriate for checking "should never happen" conditions.
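A minimal sketch of that split, using hypothetical functions `api_sum` and `internal_sum` (not dav1d code). The public entry point always validates its arguments and returns an error, because callers can pass anything. The internal helper only asserts its preconditions. A violation there would be a library bug rather than bad input, and the check compiles away when `NDEBUG` is defined.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static int internal_sum(const int *vals, size_t n);

/* Hypothetical public API: always-on checks, returns 0 or a negative errno. */
int api_sum(const int *vals, size_t n, int *out) {
    if (!vals || !out || n == 0)
        return -EINVAL;
    *out = internal_sum(vals, n);
    return 0;
}

/* Hypothetical internal helper: arguments were already validated by the
 * caller, so only assert the "should never happen" conditions. */
static int internal_sum(const int *vals, size_t n) {
    assert(vals && n > 0);
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += vals[i];
    return s;
}
```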
-
Martin Storsjö authored
After 8f320d59, MSVC started producing this warning:
    [63/123] Compiling C object src/libdav1d.a.p/obu.c.obj
    ../src/obu.c(708): warning C4244: '=': conversion from 'uint16_t' to 'uint8_t', possible loss of data
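The warning comes from implicitly narrowing a `uint16_t` into a `uint8_t`. One common way to address it is an explicit cast, which states that the narrowing is intended. The sketch below assumes a hypothetical `struct hdr` and `set_level()`; the commit's actual fix is not shown here. The cast does not check the range. The value must already be known to fit, otherwise it is silently truncated.

```c
#include <stdint.h>

/* Hypothetical destination with an 8-bit field. */
struct hdr { uint8_t level; };

static void set_level(struct hdr *h, uint16_t v) {
    /* h->level = v;  implicit uint16_t -> uint8_t narrowing, what C4244 reports */
    h->level = (uint8_t) v;   /* explicit: keeps only the low 8 bits */
}
```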
-
- Jun 06, 2023
James Almer authored
There's no reason to be so strict by ensuring the tool only works with a library built from the exact same git snapshot, when the only thing that matters is API availability and ABI compatibility.
Signed-off-by: James Almer <jamrial@gmail.com>
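A sketch of a check based on API and ABI compatibility instead of an exact snapshot match. The packed layout (`major << 16 | minor << 8 | patch`) and the names `PACK_VERSION` and `api_compatible` are assumptions for illustration. The rule follows semantic versioning: the major version (ABI) must match, and the library's minor version must be at least the one the tool was built against, so every API the tool may call exists. The patch level is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packed API version. */
#define PACK_VERSION(maj, min, pat) \
    (((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(pat))
#define VERSION_MAJOR(v) ((v) >> 16)
#define VERSION_MINOR(v) (((v) >> 8) & 0xFF)

/* `built`: version of the headers the tool was compiled against.
 * `runtime`: version reported by the library actually loaded. */
static bool api_compatible(uint32_t built, uint32_t runtime) {
    return VERSION_MAJOR(built) == VERSION_MAJOR(runtime) &&
           VERSION_MINOR(runtime) >= VERSION_MINOR(built);
}
```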
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- Jun 02, 2023
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- Jun 01, 2023
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
All of them are 32-bit values that must be greater than 0.
Signed-off-by: James Almer <jamrial@gmail.com>
-
It's already checked at the end of parse_seq_hdr() now.
-
Pack eob and txtp into a single 16-bit value instead of storing them separately. This reduces memory usage by 4 kB per sb128.
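A sketch of packing both values into one `uint16_t`. The bit layout here is hypothetical and not necessarily the one dav1d uses: the low 5 bits hold the transform type, and the upper 11 bits hold the end-of-block position. A 32x32 coefficient block has at most 1024 coefficients, so `eob` stays within 0..1023.

```c
#include <stdint.h>

#define TXTP_BITS 5
#define TXTP_MASK ((1u << TXTP_BITS) - 1)

/* Hypothetical packing: eob in bits 5..15, txtp in bits 0..4. */
static inline uint16_t pack_eob_txtp(unsigned eob, unsigned txtp) {
    return (uint16_t)((eob << TXTP_BITS) | (txtp & TXTP_MASK));
}

static inline unsigned unpack_txtp(uint16_t v) { return v & TXTP_MASK; }
static inline unsigned unpack_eob(uint16_t v)  { return v >> TXTP_BITS; }
```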
-
Merge sgr_idx into the restoration type value. This reduces memory usage by 12 bytes per sb128.
-
Move the txtp_map array into the scratch buffer union. This reduces the Dav1dTaskContext size by 1 kB.
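A sketch of why moving an array into a union shrinks the containing struct. Arrays whose lifetimes never overlap can share storage, so the struct pays for the largest member instead of the sum. `txtp_map` is the name from the commit. The other member, the struct names and the array sizes are hypothetical and chosen for illustration.

```c
#include <stdint.h>

/* Separate members: storage for both arrays is always reserved. */
struct ctx_separate {
    int16_t other_scratch[2048];   /* hypothetical: used only in one stage */
    uint8_t txtp_map[32 * 32];     /* used only in a different stage */
};

/* Shared scratch: only the larger member determines the size. */
struct ctx_union {
    union {
        int16_t other_scratch[2048];
        uint8_t txtp_map[32 * 32];
    } scratch;
};
```

This is only safe if no code path needs both arrays at the same time.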
-
Jean-Baptiste Kempf authored
-
James Almer authored
Signed-off-by: James Almer <jamrial@gmail.com>
-
- May 31, 2023
James Almer authored
Simplifies checks for the caller.
Signed-off-by: James Almer <jamrial@gmail.com>
-
James Almer authored
Don't just check that we don't overrun at a byte-aligned offset. Also make sure that the parsing was correct and that no valid bits are left in the OBU.
Signed-off-by: James Almer <jamrial@gmail.com>
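In AV1, an OBU that ends with trailing bits has a single 1 bit after the last syntax element, followed only by zero bits up to the end of the payload. A minimal sketch of the stricter check, using the hypothetical helpers `get_bit` and `trailing_bits_ok`. Here `pos` is the number of bits the parser consumed. The check fails both on overrun and when set bits remain after the trailing one bit, which indicates the parser went wrong.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit at position `pos`, MSB-first within each byte. */
static int get_bit(const uint8_t *buf, size_t pos) {
    return (buf[pos >> 3] >> (7 - (pos & 7))) & 1;
}

/* True if bits [pos, len * 8) are exactly "1 followed by zeros". */
static bool trailing_bits_ok(const uint8_t *buf, size_t len, size_t pos) {
    const size_t total = len * 8;
    if (pos >= total)            /* overrun, or no room for the trailing one bit */
        return false;
    if (!get_bit(buf, pos))      /* trailing one bit missing */
        return false;
    for (size_t i = pos + 1; i < total; i++)
        if (get_bit(buf, i))     /* leftover set bits: parsing went wrong */
            return false;
    return true;
}
```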
-
This also simplifies overrun checking a fair amount.
-
The default __printf__ format attribute doesn't match what printf functions actually support. Using __gnu_printf__ fixes it.
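A sketch of a portable format-attribute macro. It assumes the relevant case is MinGW with GCC, where GCC's `printf` archetype checks against the Microsoft C runtime's format rules, while `gnu_printf` checks the C99/GNU rules (e.g. `%zu`). The macro `ATTR_PRINTF` and the function `log_fmt` are hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__GNUC__) && defined(__MINGW32__) && !defined(__clang__)
#define ATTR_PRINTF(fmt, args) __attribute__((__format__(__gnu_printf__, fmt, args)))
#elif defined(__GNUC__)
#define ATTR_PRINTF(fmt, args) __attribute__((__format__(__printf__, fmt, args)))
#else
#define ATTR_PRINTF(fmt, args)
#endif

/* Argument 3 is the format string; variadic arguments start at 4. */
static int log_fmt(char *buf, size_t size, const char *fmt, ...) ATTR_PRINTF(3, 4);

static int log_fmt(char *buf, size_t size, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    const int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as `log_fmt(buf, sizeof(buf), "%d", "text")` can be diagnosed at compile time.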
-
We know that the payload is aligned on a byte boundary and fully contained within the OBU, so using a bitstream reader function to copy the data one byte at a time is a bit redundant.
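A sketch of the direct copy, with a hypothetical `BitReader` that tracks its position in bits. Once the position is known to be byte-aligned and the payload lies inside the OBU, a single `memcpy` replaces reading the payload one byte at a time through the bitstream reader.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal bit reader state. */
typedef struct {
    const uint8_t *buf;
    size_t len;   /* bytes in the OBU */
    size_t pos;   /* current position, in bits */
} BitReader;

/* Copies n payload bytes; returns 0 on success, -1 if not byte-aligned
 * or if the payload would extend past the end of the OBU. */
static int copy_payload(BitReader *br, uint8_t *dst, size_t n) {
    if (br->pos & 7)
        return -1;
    const size_t off = br->pos >> 3;
    if (off > br->len || n > br->len - off)
        return -1;
    memcpy(dst, br->buf + off, n);
    br->pos += n * 8;
    return 0;
}
```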
-
We require the size to be representable as a signed value. This limit already exists in dav1d_data_create().
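A sketch of such a limit, assuming the usual case where the signed counterpart of `size_t` has the same width. A size is representable as a signed value exactly when it is at most `SIZE_MAX / 2`. The function name `size_is_valid` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: reject sizes that would become negative if
 * converted to a same-width signed type. */
static int size_is_valid(size_t sz) {
    return sz <= SIZE_MAX / 2;
}
```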
-
Creating an entire decoder instance just for some bitstream parsing is completely unnecessary. We can instead parse the sequence header directly into the user-provided buffer while ignoring/skipping other OBU types, with zero memory allocations required.
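A sketch of the allocation-free scan. It walks the OBUs in a buffer and locates the first sequence header payload, using the AV1 OBU header layout: a forbidden bit, a 4-bit `obu_type`, an extension flag, a `has_size_field` flag and a reserved bit, followed by an optional extension byte and a LEB128 size. The function names are hypothetical, and this only finds the payload. Parsing it into the caller's struct is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define OBU_SEQUENCE_HEADER 1   /* obu_type value from the AV1 specification */

/* Unsigned LEB128 (at most 8 bytes in AV1). Returns bytes used, 0 on error. */
static size_t read_leb128(const uint8_t *p, size_t len, uint64_t *val) {
    uint64_t v = 0;
    for (size_t i = 0; i < len && i < 8; i++) {
        v |= (uint64_t)(p[i] & 0x7f) << (7 * i);
        if (!(p[i] & 0x80)) {
            *val = v;
            return i + 1;
        }
    }
    return 0;
}

/* Returns 0 and sets *payload / *payload_len for the first sequence header;
 * returns -1 if none is found or the data is malformed. No allocations. */
static int find_seq_hdr(const uint8_t *buf, size_t len,
                        const uint8_t **payload, size_t *payload_len) {
    size_t pos = 0;
    while (pos < len) {
        const uint8_t hdr = buf[pos++];
        if (hdr & 0x80)                     /* obu_forbidden_bit must be 0 */
            return -1;
        const int type     = (hdr >> 3) & 0xf;
        const int has_ext  = (hdr >> 2) & 1;
        const int has_size = (hdr >> 1) & 1;
        if (has_ext) {
            if (pos >= len) return -1;
            pos++;                          /* skip the extension header byte */
        }
        size_t sz;
        if (has_size) {
            uint64_t v;
            const size_t n = read_leb128(buf + pos, len - pos, &v);
            if (!n) return -1;
            pos += n;
            if (v > len - pos) return -1;   /* OBU would overrun the buffer */
            sz = (size_t)v;
        } else {
            sz = len - pos;                 /* no size field: OBU runs to the end */
        }
        if (type == OBU_SEQUENCE_HEADER) {
            *payload = buf + pos;
            *payload_len = sz;
            return 0;
        }
        pos += sz;                          /* skip every other OBU type */
    }
    return -1;
}
```

For example, a temporal delimiter OBU (`0x12 0x00`) followed by a sequence header OBU with a 3-byte payload yields that 3-byte payload.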
-
- May 29, 2023
It's not required by the API and would only risk masking potential bugs.
-
- May 26, 2023
It's only used in debug mode, so inlining prevents dead code from being generated in release mode.
-
In many cases it can be combined with the allocation of the data being referenced instead of allocating it separately.
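A sketch of combining the two allocations. It uses a hypothetical reference-counted buffer `Ref` whose header and payload come from one `malloc()` via a flexible array member, so creation is one allocation and release is one `free()`. For brevity the counter is a plain `int`. A multi-threaded version would need an atomic counter.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Ref {
    int refcnt;       /* plain int for the sketch; use an atomic when shared across threads */
    size_t size;
    uint8_t data[];   /* payload stored directly after the header */
} Ref;

static Ref *ref_create(size_t size) {
    if (size > SIZE_MAX - sizeof(Ref))
        return NULL;
    Ref *r = malloc(sizeof(Ref) + size);
    if (!r)
        return NULL;
    r->refcnt = 1;
    r->size = size;
    return r;
}

static void ref_inc(Ref *r) { r->refcnt++; }

/* Drops one reference and clears the caller's pointer. */
static void ref_dec(Ref **pr) {
    Ref *r = *pr;
    *pr = NULL;
    if (r && --r->refcnt == 0)
        free(r);      /* frees header and payload together */
}
```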
-
It's not used by anything, and the data it references is stack-allocated.
-
- May 25, 2023
James Almer authored
-