Commits on Source (81)
-
James Almer authored
The spec states that a decoder should instead ignore them. Otherwise, streams compliant with a hypothetical future revision of the spec may be rejected when backwards compatibility is expected.
-
James Almer authored
-
046188e4
-
Prevents overflows in malloc size calculations.
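As an illustration of the kind of check this implies, here is a minimal sketch, not the code from this commit. The helper name alloc_array_checked is hypothetical; it rejects any request whose byte count would wrap around size_t before calling malloc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not dav1d's actual code): allocate n elements of
 * elem_size bytes each, refusing requests whose total size would overflow. */
static void *alloc_array_checked(size_t n, size_t elem_size) {
    /* n * elem_size overflows exactly when n > SIZE_MAX / elem_size. */
    if (elem_size != 0 && n > SIZE_MAX / elem_size)
        return NULL;
    return malloc(n * elem_size);
}
```

Doing the comparison with a division keeps the check itself free of overflow, which a naive "multiply, then compare" would not be.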
30d5f486 -
Henrik Gramner authored b20a2d63
-
Closes #274
e25ed555 -
e16e2726
-
84f938ec
-
Martin Storsjö authored
GCC                  Cortex     A53     A72     A73
msac_decode_bool_c:            29.9    17.9    23.2
msac_decode_bool_neon:         27.4    15.3    20.4
msac_decode_bool_adapt_c:      49.2    26.8    31.0
msac_decode_bool_adapt_neon:   38.2    22.2    25.4
msac_decode_bool_equi_c:       26.6    16.8    19.4
msac_decode_bool_equi_neon:    23.9    13.7    15.7

Clang                Cortex     A53     A72     A73
msac_decode_bool_c:            28.0    16.4    23.1
msac_decode_bool_neon:         26.9    14.6    21.0
msac_decode_bool_adapt_c:      46.8    25.1    31.4
msac_decode_bool_adapt_neon:   36.2    19.0    26.2
msac_decode_bool_equi_c:       23.7    13.4    18.8
msac_decode_bool_equi_neon:    23.7    11.3    14.2

This is as fast as, or faster than, what either GCC or Clang produces.
2e8a3a21 -
Henrik Gramner authored
Improves performance on 32-bit platforms over using uint64_t.
60519f04 -
fc3777b4
-
Luc Trudeau authored af0375ca
-
Luc Trudeau authored d04d0a6c
-
Luc Trudeau authored 3d6479ce
-
The last 1/4 of the mask is always zero, so we can skip some calculations that don't change the output.
f64fdae5 -
Martin Storsjö authored
Also make sure that the w4 case can exit after processing 12 pixels, where it is convenient. This gives a small slowdown for in-order cores like A7, A8, A53, but actually seems to give a small speedup for out-of-order cores like A9, A72 and A73.

AArch64:
Before:                   Cortex     A53     A72     A73
mc_8tap_regular_w8_v_8bpc_neon:    223.8   247.3   228.5
After:
mc_8tap_regular_w8_v_8bpc_neon:    232.5   243.9   223.4

AArch32:
Before:                   Cortex      A7      A8      A9     A53     A72     A73
mc_8tap_regular_w8_v_8bpc_neon:    550.2   470.7   520.5   257.0   256.4   248.2
After:
mc_8tap_regular_w8_v_8bpc_neon:    554.3   474.2   511.6   267.5   252.6   246.8
bf920fba -
7d5f0d0c
-
Martin Storsjö authored
The armv7 runner doesn't seem to cope well with the testdata though.
a690e548 -
bfd4ee57
-
664c6a5f
-
Henrik Gramner authored 75558f8b
-
Henrik Gramner authored 3e0ec4cd
-
James Almer authored
This way, adding new fields in the future will not require breaking the ABI.
-
2cce131e
-
Martin Storsjö authored 63eef332
-
Janne Grunau authored
Needed for oss-fuzz after switching to '-fsanitize=fuzzer' for the libfuzzer based build. Adding '-fsanitize=fuzzer' for all oss-fuzz based builds breaks afl.
785f00fe -
5bc43169
-
0040d92b
-
Marvin Scholz authored
nasm -v can actually fail, for example on macOS, where nasm could be a stub executable that forwards commands to the real nasm but fails if the real nasm is not installed. This would lead to a confusing error message due to the out-of-bounds array access. To avoid that, explicitly check the exit code.
098a565c -
Tristan Matthews authored 75c3f4a4
-
Jean-Baptiste Kempf authored 3e3855bf
-
Konstantin Pavlov authored 6c90f005
-
This fixes building with Raspbian compilers, which default to armv6. The isb instruction is unavailable on armv6, and the cycle counter register is accessed differently there as well. This fixes issue #282.
13067916 -
James Almer authored
-
James Almer authored
-
Martin Storsjö authored
On older versions of glibc, clock_gettime isn't available in the main libc, but is part of a separate librt. Only look for librt if clock_gettime isn't available otherwise.
39dba4cd -
Limited to PowerPC64 LE for now.
197032bd -
2073ea11
-
f6024104
-
efd852af
-
                          A73       A53
blend_h_w2_8bpc_c:        149.3     246.8
blend_h_w2_8bpc_neon:     74.6      137
blend_h_w4_8bpc_c:        251.6     409.8
blend_h_w4_8bpc_neon:     66        146.6
blend_h_w8_8bpc_c:        446.6     844.1
blend_h_w8_8bpc_neon:     68.6      131.2
blend_h_w16_8bpc_c:       830       1513
blend_h_w16_8bpc_neon:    85.9      192
blend_h_w32_8bpc_c:       1605.2    2847.8
blend_h_w32_8bpc_neon:    149.8     357.6
blend_h_w64_8bpc_c:       3304.8    5515.5
blend_h_w64_8bpc_neon:    262.8     629.5
blend_h_w128_8bpc_c:      7895.1    13260.6
blend_h_w128_8bpc_neon:   577       1402
blend_v_w2_8bpc_c:        241.2     410.8
blend_v_w2_8bpc_neon:     122.1     196.8
blend_v_w4_8bpc_c:        874.4     1418.2
blend_v_w4_8bpc_neon:     248.5     375.9
blend_v_w8_8bpc_c:        1550.5    2514.7
blend_v_w8_8bpc_neon:     210.8     376
blend_v_w16_8bpc_c:       2925.3    5086
blend_v_w16_8bpc_neon:    253.4     608.3
blend_v_w32_8bpc_c:       5686.7    9470.5
blend_v_w32_8bpc_neon:    348.2     994.8
blend_w4_8bpc_c:          201.5     309.3
blend_w4_8bpc_neon:       38.6      99.2
blend_w8_8bpc_c:          531.3     944.8
blend_w8_8bpc_neon:       55.1      125.8
blend_w16_8bpc_c:         1992.8    3349.8
blend_w16_8bpc_neon:      150.1     344
blend_w32_8bpc_c:         4982      8165.9
blend_w32_8bpc_neon:      360.4     910.9
a1e3f358 -
Janne Grunau authored
clock_gettime() is only available since macOS 10.12 (Sierra).
79e4a5f7 -
Martin Storsjö authored 4a2ea99d
-
Martin Storsjö authored
This keeps the put/prep functions close to the 8tap/bilin functions that use them.
46980237 -
Martin Storsjö authored c950e710
-
Martin Storsjö authored c1b3e1a9
-
04dc8a4d
-
e0346114
-
The speedup for most non-dc-only dct functions is around 9-12x over the C code generated by GCC 7.3.

Relative speedups vs C for a few functions:

                                  Cortex A53     A72     A73
inv_txfm_add_4x4_dct_dct_0_8bpc_neon:    3.90    4.16    5.65
inv_txfm_add_4x4_dct_dct_1_8bpc_neon:    7.20    8.05    11.19
inv_txfm_add_8x8_dct_dct_0_8bpc_neon:    5.09    6.73    6.45
inv_txfm_add_8x8_dct_dct_1_8bpc_neon:    12.18   10.80   13.05
inv_txfm_add_16x16_dct_dct_0_8bpc_neon:  7.31    9.35    11.17
inv_txfm_add_16x16_dct_dct_1_8bpc_neon:  14.36   13.06   15.93
inv_txfm_add_16x16_dct_dct_2_8bpc_neon:  11.00   10.09   12.05
inv_txfm_add_32x32_dct_dct_0_8bpc_neon:  4.41    5.40    5.77
inv_txfm_add_32x32_dct_dct_1_8bpc_neon:  13.84   13.81   18.04
inv_txfm_add_32x32_dct_dct_2_8bpc_neon:  11.75   11.87   15.22
inv_txfm_add_32x32_dct_dct_3_8bpc_neon:  10.20   10.40   13.13
inv_txfm_add_32x32_dct_dct_4_8bpc_neon:  9.01    9.21    11.56
inv_txfm_add_64x64_dct_dct_0_8bpc_neon:  3.84    4.82    5.28
inv_txfm_add_64x64_dct_dct_1_8bpc_neon:  14.40   12.69   16.71
inv_txfm_add_64x64_dct_dct_4_8bpc_neon:  10.91   9.63    12.67

Some of the special-cased identity_identity transforms for 32x32 give insane speedups over the generic C code:

inv_txfm_add_32x32_identity_identity_0_8bpc_neon:  225.26   238.11   247.07
inv_txfm_add_32x32_identity_identity_1_8bpc_neon:  225.33   238.53   247.69
inv_txfm_add_32x32_identity_identity_2_8bpc_neon:  59.60    61.94    64.63
inv_txfm_add_32x32_identity_identity_3_8bpc_neon:  26.98    27.99    29.21
inv_txfm_add_32x32_identity_identity_4_8bpc_neon:  15.08    15.93    16.56
ef1ea008 -
18df7139
-
fcb6a6da
-
Martin Storsjö authored 764e8ea1
-
Martin Storsjö authored 578489df
-
Martin Storsjö authored 7107c2f1
-
Martin Storsjö authored
For the cdef_filter tests, one could also extend the buffer to contain 16*11 pixels, to simplify printing it as one rectangular section. Extend the common hex_dump function to allow dumping to an arbitrary FILE* pointer, to reuse it for printing the source pixel buffer in case of errors.
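A rough sketch of what a FILE*-taking dump helper could look like follows. The signature and output format here are assumptions for illustration and may not match the actual checkasm hex_dump.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: print a w x h block of 8-bit pixels as hex, one row
 * per line, to any stream (stdout, stderr, a log file, ...). */
static void hex_dump(FILE *out, const uint8_t *buf, ptrdiff_t stride,
                     int w, int h) {
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            fprintf(out, " %02x", buf[x]);
        fputc('\n', out);
        buf += stride; /* advance to the start of the next row */
    }
}
```

Taking the stream as a parameter lets the same routine print the source pixel buffer to stderr on a test failure without duplicating the loop.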
13a7d786 -
Martin Storsjö authored c9f19b1f
-
Victorien Le Couviour--Tuffet authored
The 'build_' prefix is reserved by meson. Using it will become an error in the future, as indicated by a warning when configuring the build dir. Closes #285.
beda6e0d -
Also eliminate some pointer chasing by allocating tile context buffers as part of the struct instead of having the struct contain pointers to separately allocated buffers.
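The general idea can be sketched as below. The struct names, field names, and CTX_LEN are hypothetical and do not reflect dav1d's real tile context layout.

```c
#include <stdint.h>
#include <stdlib.h>

#define CTX_LEN 32 /* illustrative buffer length, an assumption */

/* Before: the struct only holds pointers, so each buffer is a separate
 * allocation and every access needs an extra pointer load. */
typedef struct {
    uint8_t *above;
    uint8_t *left;
} TileCtxIndirect;

/* After: the buffers are members of the struct itself. */
typedef struct {
    uint8_t above[CTX_LEN];
    uint8_t left[CTX_LEN];
} TileCtxInline;

/* One zero-initialised allocation covers every tile's buffers. */
static TileCtxInline *tile_ctx_alloc(int n_tiles) {
    if (n_tiles <= 0)
        return NULL;
    return calloc((size_t)n_tiles, sizeof(TileCtxInline));
}
```

With the inline layout there is a single allocation to check and free, and the buffers sit next to the rest of the context in memory.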
0276455d -
Avoid allocating significantly more memory than what is actually used.
65ba279b -
                          A73                 A53
                          Earlier   Now       Earlier   Now
blend_v_w2_8bpc_neon:     122.1     121.3     195.5     195.5
blend_v_w4_8bpc_neon:     248.2     247.5     375.6     358.5
blend_v_w8_8bpc_neon:     210.3     205.2     375.6     358.5
blend_v_w16_8bpc_neon:    252.7     237.1     579.2     590.5
blend_v_w32_8bpc_neon:    347       345.8     997.4     994.1
632b4876 -
In the (very unlikely) scenario of a pthread mutex/cond init failure in the tile state reallocation code, some newly allocated mutexes/conds could leak.
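One common way to avoid this kind of leak is to unwind whatever was already initialised when a later init call fails. The sketch below uses a hypothetical TileSync struct and function name; it is not the code from this commit.

```c
#include <pthread.h>

/* Hypothetical per-tile synchronisation state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
} TileSync;

/* Initialise n entries. On any failure, destroy everything that was
 * already initialised so nothing leaks. Returns 0 on success, -1 on error. */
static int tile_sync_init(TileSync *ts, int n) {
    int i;
    for (i = 0; i < n; i++) {
        if (pthread_mutex_init(&ts[i].lock, NULL))
            break;
        if (pthread_cond_init(&ts[i].cond, NULL)) {
            /* The mutex of this entry succeeded, so undo it here. */
            pthread_mutex_destroy(&ts[i].lock);
            break;
        }
    }
    if (i == n)
        return 0;
    /* Roll back the fully initialised entries 0..i-1. */
    while (i-- > 0) {
        pthread_cond_destroy(&ts[i].cond);
        pthread_mutex_destroy(&ts[i].lock);
    }
    return -1;
}
```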
0435ec9c -
dav1d_submit_frame() could erroneously return 0 when tile data memory allocation failed. Fixes an assertion failure in dav1d_parse_obus().
c1a28d0e -
Calling dav1d_get_picture() again after it has already returned with an error due to a memory allocation failure could result in crashes. Although doing so is not proper API usage and the outcome is going to be unpredictable, we should at least try to avoid crashing.
e2e56ab9 -
James Almer authored
-
Henrik Gramner authored ee31bb85
-
James Almer authored
Limit frame size in pixels to about 16MP, while allowing the fuzzer to test frame widths and heights above 4096.
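A limit of this shape can be expressed on the pixel count rather than on each dimension separately. The sketch below is an assumption about the approach, and the exact threshold (4096 * 4096, roughly 16.8 million pixels) is chosen only for illustration.

```c
#include <stdint.h>

/* Assumed cap of about 16MP; the real limit may differ. */
#define MAX_FRAME_PIXELS ((uint64_t)4096 * 4096)

/* Returns 1 if the frame is acceptable, 0 otherwise. Width and height
 * may each exceed 4096 as long as their product stays within the cap. */
static int frame_size_ok(uint32_t w, uint32_t h) {
    if (w == 0 || h == 0)
        return 0;
    /* Widen before multiplying so the product cannot wrap. */
    return (uint64_t)w * h <= MAX_FRAME_PIXELS;
}
```

For example, an 8192x2048 frame passes this check even though its width is above 4096, while 4097x4096 does not.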
-
B Krishnan Iyer authored
                            A73        A53
w_mask_420_w4_8bpc_c:       797.5      1072.7
w_mask_420_w4_8bpc_neon:    85.6       152.7
w_mask_420_w8_8bpc_c:       2344.3     3118.7
w_mask_420_w8_8bpc_neon:    221.9      372.4
w_mask_420_w16_8bpc_c:      7429.9     9702.1
w_mask_420_w16_8bpc_neon:   620.4      1024.1
w_mask_420_w32_8bpc_c:      27498.2    37205.7
w_mask_420_w32_8bpc_neon:   2394.1     3838
w_mask_420_w64_8bpc_c:      66495.8    88721.3
w_mask_420_w64_8bpc_neon:   6081.4     9630
w_mask_420_w128_8bpc_c:     163369.3   219494
w_mask_420_w128_8bpc_neon:  16015.7    24969.3
w_mask_422_w4_8bpc_c:       858.3      1100.2
w_mask_422_w4_8bpc_neon:    81.5       143.1
w_mask_422_w8_8bpc_c:       2447.5     3284.6
w_mask_422_w8_8bpc_neon:    217.5      342.4
w_mask_422_w16_8bpc_c:      7673.4     10135.9
w_mask_422_w16_8bpc_neon:   632.5      1062.6
w_mask_422_w32_8bpc_c:      28344.9    39090
w_mask_422_w32_8bpc_neon:   2393.4     3963.8
w_mask_422_w64_8bpc_c:      68159.6    93447
w_mask_422_w64_8bpc_neon:   6015.7     9928.1
w_mask_422_w128_8bpc_c:     169501.2   231702.7
w_mask_422_w128_8bpc_neon:  15847.5    25803.4
w_mask_444_w4_8bpc_c:       674.6      862.3
w_mask_444_w4_8bpc_neon:    80.2       135.4
w_mask_444_w8_8bpc_c:       2031.4     2693
w_mask_444_w8_8bpc_neon:    209.3      318.7
w_mask_444_w16_8bpc_c:      6576       8217.4
w_mask_444_w16_8bpc_neon:   627.3      986.2
w_mask_444_w32_8bpc_c:      26051.7    31593.9
w_mask_444_w32_8bpc_neon:   2374       3671.6
w_mask_444_w64_8bpc_c:      63600      75849.9
w_mask_444_w64_8bpc_neon:   5957       9335.5
w_mask_444_w128_8bpc_c:     156964.7   187932.4
w_mask_444_w128_8bpc_neon:  15759.4    24549.5
b271590a -
This is using the Linux-only prctl(PR_SET_NAME, …) call, because glibc’s pthread_setname_np() is doing exactly the same call so there is no reason to use it instead, as it isn’t any more portable. I don’t have any other OS to test this on, but if you want to add one just add an #elif defined(__YOUR_OS__) before the #else in thread.h.
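For reference, a minimal sketch of naming the calling thread this way is shown below. The wrapper name set_thread_name is hypothetical and the layout of the real thread.h macro is not reproduced here; only prctl() and PR_SET_NAME are the actual Linux interface.

```c
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical wrapper: name the calling thread where supported.
 * Returns 0 on success (or when unsupported), -1 on failure. */
static int set_thread_name(const char *name) {
#ifdef __linux__
    /* The kernel keeps at most 15 bytes plus the terminating NUL;
     * longer names are silently truncated. */
    return prctl(PR_SET_NAME, (unsigned long)name);
#else
    (void)name; /* no-op on platforms without an implementation */
    return 0;
#endif
}
```

Another platform would slot in as an additional preprocessor branch ahead of the fallback.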
15a93861 -
Continuing trying to decode after a memory allocation failure could cause null pointer dereferences in certain scenarios.
c138435f -
5ab6d231
-
arm: mc: neon: Use vld with ! post-increment instead of a register in blend/blend_h/blend_v function

                          A73                 A53
                          Current   Earlier   Current   Earlier
blend_h_w2_8bpc_neon:     74.1      74.6      137.5     137
blend_h_w4_8bpc_neon:     65.8      66        147.1     146.6
blend_h_w8_8bpc_neon:     68.7      68.6      131.7     131.2
blend_h_w16_8bpc_neon:    85.6      85.9      190.4     192
blend_h_w32_8bpc_neon:    149.8     149.8     358.3     357.6
blend_h_w64_8bpc_neon:    264.1     262.8     630.3     629.5
blend_h_w128_8bpc_neon:   575.4     577       1404.2    1402
blend_v_w2_8bpc_neon:     120.1     121.3     196.4     195.5
blend_v_w4_8bpc_neon:     247.2     247.5     358.4     358.5
blend_v_w8_8bpc_neon:     204.2     205.2     358.4     358.5
blend_v_w16_8bpc_neon:    238.5     237.1     591.8     590.5
blend_v_w32_8bpc_neon:    347.2     345.8     997.2     994.1
blend_w4_8bpc_neon:       38.3      38.6      98.7      99.2
blend_w8_8bpc_neon:       54.8      55.1      125.3     125.8
blend_w16_8bpc_neon:      150.8     150.1     334.5     344
blend_w32_8bpc_neon:      361.6     360.4     910.7     910.9
b704a993 -
                          A73                 A53
                          Current   Earlier   Current   Earlier
blend_h_w2_8bpc_neon:     74.1      74.1      137.5     137.5
blend_h_w4_8bpc_neon:     65.8      65.8      147.1     147.1
blend_h_w8_8bpc_neon:     68.9      68.7      131.7     131.7
blend_h_w16_8bpc_neon:    86        85.6      190.3     190.4
blend_h_w32_8bpc_neon:    149.2     149.8     358       358.3
blend_h_w64_8bpc_neon:    263.1     264.1     629.8     630.3
blend_h_w128_8bpc_neon:   571       575.4     1404.5    1404.2
blend_v_w2_8bpc_neon:     118.7     120.1     195.3     196.4
blend_v_w4_8bpc_neon:     245.8     247.2     357.3     358.4
blend_v_w8_8bpc_neon:     202       204.2     357.2     358.4
blend_v_w16_8bpc_neon:    234.8     238.5     591.3     591.8
blend_v_w32_8bpc_neon:    344.4     347.2     994.7     997.2
blend_w4_8bpc_neon:       37.5      38.3      96.7      98.7
blend_w8_8bpc_neon:       53        54.8      123.3     125.3
blend_w16_8bpc_neon:      151       150.8     332.4     334.5
blend_w32_8bpc_neon:      370.9     361.6     908.4     910.7
d4df8619 -
                          A73                 A53
                          Current   Earlier   Current   Earlier
blend_h_w2_8bpc_neon:     71.1      74.1      132.7     137.5
blend_h_w4_8bpc_neon:     60.2      65.8      137.5     147.1
blend_h_w8_8bpc_neon:     62.2      68.9      123.1     131.7
blend_h_w16_8bpc_neon:    82.1      86        180.7     190.3
blend_h_w32_8bpc_neon:    149.9     149.2     358.3     358
blend_h_w64_8bpc_neon:    265.3     263.1     630.2     629.8
blend_h_w128_8bpc_neon:   579.5     571       1404.4    1404.5
blend_v_w2_8bpc_neon:     118.7     118.7     193.2     195.3
blend_v_w4_8bpc_neon:     248.6     245.8     373.4     357.3
blend_v_w8_8bpc_neon:     202.7     202       356.4     357.2
blend_v_w16_8bpc_neon:    238.8     234.8     590.4     591.3
blend_v_w32_8bpc_neon:    346.7     344.4     993.7     994.7
blend_w4_8bpc_neon:       33.5      37.5      90.7      96.7
blend_w8_8bpc_neon:       49.7      53        123.3     123.3
blend_w16_8bpc_neon:      151.8     151       348.8     332.4
blend_w32_8bpc_neon:      372.9     370.9     908.3     908.4
407c27db -
Luc Trudeau authored
sdl2.pc adds -I${includedir}/SDL2 to the command line, so SDL2/ is clearly expected. Fixes #289
55e1edc7 -
Jean-Baptiste Kempf authored d04eab15
-
d12418b3
-
60869f8a
-
Luca Barbato authored
clang-8:
cdef_filter_4x4_8bpc_c:    436.6
cdef_filter_4x4_8bpc_vsx:  101.1
cdef_filter_4x8_8bpc_c:    827.7
cdef_filter_4x8_8bpc_vsx:  183.5
cdef_filter_8x8_8bpc_c:    1510.2
cdef_filter_8x8_8bpc_vsx:  289.1

gcc-9:
cdef_filter_4x4_8bpc_c:    403.2
cdef_filter_4x4_8bpc_vsx:  105.6
cdef_filter_4x8_8bpc_c:    825.5
cdef_filter_4x8_8bpc_vsx:  192.2
cdef_filter_8x8_8bpc_c:    1586.3
cdef_filter_8x8_8bpc_vsx:  295.0
a0eb045c -
4806492a
-
Marvin Scholz authored afee1ac7
-
Jean-Baptiste Kempf authored
dav1d_logo.png
0 → 100644
19 KiB
doc/dav1d_logo.svg
0 → 100644
src/arm/64/itx.S
0 → 100644
src/arm/itx_init_tmpl.c
0 → 100644