- Mar 07, 2020
-
-
-
The description was being added only to the last field of each line by Doxygen.
-
Matthias Dressel authored
The argument for --input was aligned with the argument for --output. None of the other arguments were aligned. For consistency either align all or none. This commit removes the alignment.
-
Matthias Dressel authored
avx512 was merged with avx512icl. See 7b208fa8
-
- Mar 06, 2020
-
-
Konstantin Pavlov authored
This currently does not check the vulkan/placebo codepath since needed packages are not yet in Debian unstable.
-
Konstantin Pavlov authored
Now when -Denable_examples=true is requested, meson will fail as expected if there is no SDL available.
-
Konstantin Pavlov authored
This requires a docker image with doxygen & dot installed, so bump it as well. Fixes #334.
-
Konstantin Pavlov authored
This makes it much easier to introduce new jobs without copying walls of text over and over. No functional changes.
Changes are:
  - move docker images to common templates to make them easier to bump
  - replace "debian" tag with "docker" to choose runners
  - align meson parameters
  - use variables sections where applicable
  - move test data cache to before_script
-
Konstantin Pavlov authored
-
Jan Beich authored
../examples/dav1dplay.c:1030:5: warning: implicit declaration of function 'init_demuxers' is invalid in C99 [-Wimplicit-function-declaration]
    init_demuxers();
    ^
/usr/bin/ld.bfd: examples/c590b3c@@dav1dplay@exe/dav1dplay.c.o: in function `decoder_thread_main':
dav1dplay.c:(.text+0x1243): undefined reference to `init_demuxers'
cc: error: linker command failed with exit code 1 (use -v to see invocation)
-
- Mar 05, 2020
-
-
Jean-Baptiste Kempf authored
-
Checkasm numbers:           Cortex A53       A72       A73
w_mask_420_w4_16bpc_neon:        173.6     123.5     120.3
w_mask_420_w8_16bpc_neon:        484.2     344.1     329.5
w_mask_420_w16_16bpc_neon:      1411.2    1027.4    1035.1
w_mask_420_w32_16bpc_neon:      5561.5    4093.2    3980.1
w_mask_420_w64_16bpc_neon:     13809.6    9856.5    9581.0
w_mask_420_w128_16bpc_neon:    35614.7   25553.8   24284.4
w_mask_422_w4_16bpc_neon:        159.4     112.2     114.2
w_mask_422_w8_16bpc_neon:        453.4     326.1     326.7
w_mask_422_w16_16bpc_neon:      1394.6    1062.3    1050.2
w_mask_422_w32_16bpc_neon:      5485.8    4219.6    4027.3
w_mask_422_w64_16bpc_neon:     13701.2   10079.6    9692.6
w_mask_422_w128_16bpc_neon:    35455.3   25892.5   24625.9
w_mask_444_w4_16bpc_neon:        153.0     112.3     112.7
w_mask_444_w8_16bpc_neon:        437.2     331.8     325.8
w_mask_444_w16_16bpc_neon:      1395.1    1069.1    1041.7
w_mask_444_w32_16bpc_neon:      5370.1    4213.5    4138.1
w_mask_444_w64_16bpc_neon:     13482.6   10190.5   10004.6
w_mask_444_w128_16bpc_neon:    35583.7   26911.2   25638.8

Corresponding numbers for 8 bpc for comparison:
w_mask_420_w4_8bpc_neon:         126.6      79.1      87.7
w_mask_420_w8_8bpc_neon:         343.9     195.0     211.5
w_mask_420_w16_8bpc_neon:        886.3     540.3     577.7
w_mask_420_w32_8bpc_neon:       3558.6    2152.4    2216.7
w_mask_420_w64_8bpc_neon:       8894.9    5161.2    5297.0
w_mask_420_w128_8bpc_neon:     22520.1   13514.5   13887.2
w_mask_422_w4_8bpc_neon:         112.9      68.2      77.0
w_mask_422_w8_8bpc_neon:         314.4     175.5     208.7
w_mask_422_w16_8bpc_neon:        835.5     565.0     608.3
w_mask_422_w32_8bpc_neon:       3381.3    2231.8    2287.6
w_mask_422_w64_8bpc_neon:       8499.4    5343.6    5460.8
w_mask_422_w128_8bpc_neon:     21823.3   14206.5   14249.1
w_mask_444_w4_8bpc_neon:         104.6      65.8      72.7
w_mask_444_w8_8bpc_neon:         290.4     173.7     196.6
w_mask_444_w16_8bpc_neon:        831.4     586.7     591.7
w_mask_444_w32_8bpc_neon:       3320.8    2300.6    2251.0
w_mask_444_w64_8bpc_neon:       8300.0    5480.5    5346.8
w_mask_444_w128_8bpc_neon:     21633.8   15981.3   14384.8
-
Janne Grunau authored
Switches build-debian (for avx2 checkasm coverage) and test-win64 and test-debian-unaligned-stack (for testing asm '%if's). Refs #330, #333
-
- Mar 04, 2020
-
-
-
Martin Storsjö authored
Checkasm numbers:           Cortex A53       A72       A73
blend_h_w2_16bpc_neon:           109.3      83.1      56.7
blend_h_w4_16bpc_neon:           114.1      61.4      62.3
blend_h_w8_16bpc_neon:           133.3      80.8      81.1
blend_h_w16_16bpc_neon:          215.6     132.7     149.5
blend_h_w32_16bpc_neon:          390.4     254.2     235.8
blend_h_w64_16bpc_neon:          719.1     456.3     453.8
blend_h_w128_16bpc_neon:        1646.1    1112.3    1065.9
blend_v_w2_16bpc_neon:           185.9     175.9     180.0
blend_v_w4_16bpc_neon:           338.0     183.4     232.1
blend_v_w8_16bpc_neon:           426.5     213.8     250.6
blend_v_w16_16bpc_neon:          678.2     357.8     382.6
blend_v_w32_16bpc_neon:         1098.3     686.2     695.6
blend_w4_16bpc_neon:              75.7      31.5      32.0
blend_w8_16bpc_neon:             134.0      75.0      75.8
blend_w16_16bpc_neon:            467.9     267.3     310.0
blend_w32_16bpc_neon:           1201.9     658.7     779.7

Corresponding numbers for 8bpc for comparison:
blend_h_w2_8bpc_neon:            104.1      55.9      60.8
blend_h_w4_8bpc_neon:            108.9      58.7      48.2
blend_h_w8_8bpc_neon:             99.3      64.4      67.4
blend_h_w16_8bpc_neon:           145.2      93.4      85.1
blend_h_w32_8bpc_neon:           262.2     157.5     148.6
blend_h_w64_8bpc_neon:           466.7     278.9     256.6
blend_h_w128_8bpc_neon:         1054.2     624.7     571.0
blend_v_w2_8bpc_neon:            170.5     106.6     113.4
blend_v_w4_8bpc_neon:            333.0     189.9     225.9
blend_v_w8_8bpc_neon:            314.9     199.0     203.5
blend_v_w16_8bpc_neon:           476.9     300.8     241.1
blend_v_w32_8bpc_neon:           766.9     430.4     415.1
blend_w4_8bpc_neon:               66.7      35.4      26.0
blend_w8_8bpc_neon:              110.7      47.9      48.1
blend_w16_8bpc_neon:             299.4     161.8     162.3
blend_w32_8bpc_neon:             725.8     417.0     432.8
-
Martin Storsjö authored
Use a post-increment with a register on the last increment, avoiding a separate increment.
Avoid processing the last 8 pixels in the w32 case when we only output 24 pixels.

Before:
ARM32                       Cortex A7       A8       A9      A53      A72      A73
blend_v_w4_8bpc_neon:           450.4    574.7    538.7    374.6    199.3    260.5
blend_v_w8_8bpc_neon:           559.6    351.3    552.5    357.6    214.8    204.3
blend_v_w16_8bpc_neon:          926.3    511.6    787.9    593.0    271.0    246.8
blend_v_w32_8bpc_neon:         1482.5    917.0   1149.5    991.9    354.0    368.9
ARM64                      Cortex A53      A72      A73
blend_v_w4_8bpc_neon:           351.1    200.0    224.1
blend_v_w8_8bpc_neon:           333.0    212.4    203.8
blend_v_w16_8bpc_neon:          495.2    302.0    247.0
blend_v_w32_8bpc_neon:          840.0    557.8    514.0

After:
ARM32                       Cortex A7       A8       A9      A53      A72      A73
blend_v_w4_8bpc_neon:           435.5    575.8    537.6    356.2    198.3    259.5
blend_v_w8_8bpc_neon:           545.2    347.9    553.5    339.1    207.8    204.2
blend_v_w16_8bpc_neon:          913.7    511.0    788.1    573.7    275.4    243.3
blend_v_w32_8bpc_neon:         1445.3    951.2   1079.1    920.4    352.2    361.6
ARM64                      Cortex A53      A72      A73
blend_v_w4_8bpc_neon:           333.0    191.3    225.9
blend_v_w8_8bpc_neon:           314.9    199.3    203.5
blend_v_w16_8bpc_neon:          476.9    301.3    241.1
blend_v_w32_8bpc_neon:          766.9    432.8    416.9
-
Martin Storsjö authored
-
Martin Storsjö authored
-
Martin Storsjö authored
For loads where we load/store a full or half register (instead of a lanewise load/store), the lane specification in itself doesn't matter, only its size. This doesn't change the generated code, but makes it more readable.
-
- Mar 03, 2020
-
-
Jean-Baptiste Kempf authored
-
Janne Grunau authored
-
- Mar 02, 2020
-
-
Checkasm runtimes:          Cortex A53       A72       A73
lpf_h_sb_uv_w4_16bpc_neon:       919.0     795.0     714.9
lpf_h_sb_uv_w6_16bpc_neon:      1267.7    1116.2    1081.9
lpf_h_sb_y_w4_16bpc_neon:       1500.2    1543.9    1778.5
lpf_h_sb_y_w8_16bpc_neon:       2216.1    2183.0    2568.1
lpf_h_sb_y_w16_16bpc_neon:      2641.8    2630.4    2639.4
lpf_v_sb_uv_w4_16bpc_neon:       836.5     572.7     667.3
lpf_v_sb_uv_w6_16bpc_neon:      1130.8     709.1     955.5
lpf_v_sb_y_w4_16bpc_neon:       1271.6    1434.4    1272.1
lpf_v_sb_y_w8_16bpc_neon:       1818.0    1759.1    1664.6
lpf_v_sb_y_w16_16bpc_neon:      1998.6    2115.8    1586.6

Corresponding numbers for 8 bpc for comparison:
lpf_h_sb_uv_w4_8bpc_neon:        799.4     632.8     695.4
lpf_h_sb_uv_w6_8bpc_neon:       1067.3     613.6     767.5
lpf_h_sb_y_w4_8bpc_neon:        1490.5    1179.1    1018.9
lpf_h_sb_y_w8_8bpc_neon:        1892.9    1382.0    1172.0
lpf_h_sb_y_w16_8bpc_neon:       2117.4    1625.4    1739.0
lpf_v_sb_uv_w4_8bpc_neon:        447.1     447.7     446.0
lpf_v_sb_uv_w6_8bpc_neon:        522.1     529.0     513.1
lpf_v_sb_y_w4_8bpc_neon:        1043.7     785.0     775.9
lpf_v_sb_y_w8_8bpc_neon:        1500.4    1115.9     881.2
lpf_v_sb_y_w16_8bpc_neon:       1493.5    1371.4    1248.5
-
-
-
- Feb 25, 2020
-
-
Requires meson 0.51 for oss-fuzz and 0.49 for the fuzzing binaries in general due to the use of the 'kwargs' keyword argument.
-
-
- Feb 24, 2020
-
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Victorien Le Couviour--Tuffet authored
Add 2 separate code paths for pri/sec strengths equal to 0. Having both strengths not equal to 0 is uncommon, so branching to skip unnecessary computations is beneficial.
------------------------------------------
before: cdef_filter_4x4_8bpc_avx2: 93.8
 after: cdef_filter_4x4_8bpc_avx2: 71.7
---------------------
before: cdef_filter_4x8_8bpc_avx2: 161.5
 after: cdef_filter_4x8_8bpc_avx2: 116.3
---------------------
before: cdef_filter_8x8_8bpc_avx2: 221.8
 after: cdef_filter_8x8_8bpc_avx2: 156.4
------------------------------------------
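The branching idea can be sketched in scalar C. This is a simplified, hypothetical one-tap model and not the AVX2 code from the commit: `filter_generic`, `filter_branching`, `round16`, the tap weights (4 and 2), and the fixed shift of 2 are all illustrative assumptions. It is only meant to show why skipping a zero-strength term is safe: the constraint function returns 0 whenever its strength argument is 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Clamp a pixel difference by a strength-dependent limit. With threshold == 0
 * the limit is never positive, so the result is always 0. */
static int constrain(int diff, int threshold, int shift) {
    assert(shift >= 0 && shift < 31);
    const int adiff = abs(diff);
    int limit = threshold - (adiff >> shift);
    if (limit < 0) limit = 0;
    const int mag = adiff < limit ? adiff : limit;
    return diff < 0 ? -mag : mag;
}

/* Round sum / 16 to nearest without right-shifting a negative value. */
static int round16(int sum) {
    return sum >= 0 ? (sum + 8) >> 4 : -((-sum + 8) >> 4);
}

/* Reference path: always evaluates both the primary and secondary term. */
static int filter_generic(int px, int p, int s, int pri, int sec) {
    const int sum = 4 * constrain(p - px, pri, 2) +
                    2 * constrain(s - px, sec, 2);
    return px + round16(sum);
}

/* Branching path: skips whichever term has zero strength. */
static int filter_branching(int px, int p, int s, int pri, int sec) {
    if (!sec) return px + round16(4 * constrain(p - px, pri, 2));
    if (!pri) return px + round16(2 * constrain(s - px, sec, 2));
    return filter_generic(px, p, s, pri, sec);
}
```

Both paths return the same value for every input; the branching variant just avoids the arithmetic for a term that is known to contribute nothing.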
-
Victorien Le Couviour--Tuffet authored
fully edged blocks perf
------------------------------------------
before: cdef_filter_4x4_8bpc_avx2: 91.0
 after: cdef_filter_4x4_8bpc_avx2: 75.7
---------------------
before: cdef_filter_4x8_8bpc_avx2: 154.6
 after: cdef_filter_4x8_8bpc_avx2: 131.8
---------------------
before: cdef_filter_8x8_8bpc_avx2: 214.1
 after: cdef_filter_8x8_8bpc_avx2: 195.9
------------------------------------------
-
Change the input buffer randomization algorithm to more readily trigger issues with both under- and overflows in cdef_filter.
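The commit message does not describe the new randomization algorithm. One common way to make such a test hit under- and overflows more often is to bias the input toward the extremes of the pixel range; the sketch below assumes that approach. `fill_biased`, `next_rand`, and the 1-in-4 probabilities are hypothetical and are not taken from checkasm.

```c
#include <assert.h>
#include <stdint.h>

/* Small xorshift PRNG so the sketch is deterministic for a given seed. */
static uint32_t next_rand(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fill buf with pixel values in [0, bitdepth_max]. Roughly a quarter of the
 * samples are forced to 0 and a quarter to bitdepth_max, so sums of
 * neighbouring pixels reach the edges of the valid range more often than
 * with uniformly random input. */
static void fill_biased(uint16_t *buf, int n, int bitdepth_max, uint32_t seed) {
    assert(buf && n >= 0 && seed != 0);
    assert(bitdepth_max > 0 && bitdepth_max <= 0xFFFF);
    uint32_t state = seed;
    for (int i = 0; i < n; i++) {
        const uint32_t r = next_rand(&state);
        switch (r & 3) {
        case 0:  buf[i] = 0; break;
        case 1:  buf[i] = (uint16_t)bitdepth_max; break;
        default: buf[i] = (uint16_t)((r >> 2) % ((uint32_t)bitdepth_max + 1)); break;
        }
    }
}
```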
-
- Feb 21, 2020
-
-
Luc Trudeau authored
-
Luc Trudeau authored
Muxer and demuxer arrays are now statically initialized.
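As a minimal sketch of what static initialization means here: the table is filled in by the compiler, so no runtime init call has to run before the first lookup. The `Demuxer` struct layout, the two entries, and `find_demuxer` are illustrative assumptions and do not reproduce the actual dav1d definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced demuxer descriptor. */
typedef struct Demuxer {
    const char *name;
    const char *extension;
} Demuxer;

static const Demuxer ivf_demuxer    = { "ivf",    "ivf" };
static const Demuxer annexb_demuxer = { "annexb", "obu" };

/* The table is a compile-time constant; nothing needs to register the
 * entries at startup. */
static const Demuxer *const demuxers[] = {
    &ivf_demuxer,
    &annexb_demuxer,
};

/* Look up a demuxer by file extension; returns NULL when none matches. */
static const Demuxer *find_demuxer(const char *extension) {
    assert(extension);
    for (size_t i = 0; i < sizeof(demuxers) / sizeof(demuxers[0]); i++)
        if (!strcmp(demuxers[i]->extension, extension))
            return demuxers[i];
    return NULL;
}
```

With this layout a caller can use `find_demuxer("ivf")` directly, without a prior setup step.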
-
- Feb 20, 2020
-
-
Luc Trudeau authored
-
- Feb 18, 2020
-
-
Janne Grunau authored
-
- Feb 17, 2020
-
-
Martin Storsjö authored
This increases the code size by around 3 KB on arm64.

Before:
ARM32:                      Cortex A7       A8       A9      A53      A72      A73
cdef_filter_4x4_8bpc_neon:      807.1    517.0    617.7    506.6    429.9    357.8
cdef_filter_4x8_8bpc_neon:     1407.9    899.3   1054.6    862.3    726.5    628.1
cdef_filter_8x8_8bpc_neon:     2394.9   1456.8   1676.8   1461.2   1084.4   1101.2
ARM64:                     Cortex A53      A72      A73
cdef_filter_4x4_8bpc_neon:      460.7    301.8    308.0
cdef_filter_4x8_8bpc_neon:      831.6    547.0    555.2
cdef_filter_8x8_8bpc_neon:     1454.6    935.6    960.4

After:
ARM32:                      Cortex A7       A8       A9      A53      A72      A73
cdef_filter_4x4_8bpc_neon:      669.3    541.3    524.4    424.9    322.7    298.1
cdef_filter_4x8_8bpc_neon:     1159.1    922.9    881.1    709.2    538.3    514.1
cdef_filter_8x8_8bpc_neon:     1888.8   1285.4   1358.5   1152.9    839.3    871.2
ARM64:                     Cortex A53      A72      A73
cdef_filter_4x4_8bpc_neon:      383.6    262.1    259.9
cdef_filter_4x8_8bpc_neon:      684.9    472.2    464.7
cdef_filter_8x8_8bpc_neon:     1160.0    756.8    788.0

(The checkasm benchmark averages three different cases; the fully edged case is one of those three, while it's the most common case in actual video. The difference is much bigger if only benchmarking that particular case.)

This actually apparently makes the code a little bit slower for the w=4 cases on Cortex A8, while it's a significant speedup on all other cores.
-
Martin Storsjö authored
The signedness of elements doesn't matter for vsub; match the vsub.i16 next to it.
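The reason signedness is irrelevant for a wrapping subtract can be checked in plain C. The sketch below models one 16-bit lane; `sub_unsigned`, `sub_signed`, and `check_sub_equivalence` are hypothetical helper names, not part of dav1d. Both interpretations produce the same 16 result bits, which is why the instruction suffix can be chosen for readability.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract treating both lanes as unsigned, keeping the low 16 bits. */
static uint16_t sub_unsigned(uint16_t a, uint16_t b) {
    return (uint16_t)(a - b);
}

/* Subtract treating the same bit patterns as signed two's complement values,
 * then keep the low 16 bits of the result. */
static uint16_t sub_signed(uint16_t a, uint16_t b) {
    const int32_t sa = a >= 0x8000 ? (int32_t)a - 0x10000 : (int32_t)a;
    const int32_t sb = b >= 0x8000 ? (int32_t)b - 0x10000 : (int32_t)b;
    return (uint16_t)(sa - sb);
}

/* Spot-check a grid of inputs covering both halves of the 16-bit range. */
static void check_sub_equivalence(void) {
    for (uint32_t a = 0; a <= 0xFFFF; a += 257)
        for (uint32_t b = 0; b <= 0xFFFF; b += 263)
            assert(sub_unsigned((uint16_t)a, (uint16_t)b) ==
                   sub_signed((uint16_t)a, (uint16_t)b));
}
```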
-
- Feb 16, 2020
-
-
Henrik Gramner authored
Console output is incredibly slow on Windows, which is aggravated by the lack of line buffering. As a result, a significant percentage of overall runtime is actually spent displaying the decoding progress. Doing the line buffering manually alleviates most of the issue.
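Manual line buffering can be sketched as follows: format the whole progress line into a local buffer first, then hand it to the console in a single write instead of several small ones. The function names `format_progress` and `print_progress`, the message text, and the 80-byte buffer are assumptions for illustration and are not the code from the commit.

```c
#include <assert.h>
#include <stdio.h>

/* Build the complete progress line in buf. Returns the snprintf result,
 * i.e. the number of characters the full line needs (excluding the
 * terminating NUL). */
static int format_progress(char *buf, size_t size, unsigned done, unsigned total) {
    assert(buf && size > 0);
    if (total)
        return snprintf(buf, size, "\rDecoded %u/%u frames (%.1f%%)",
                        done, total, 100.0 * done / total);
    return snprintf(buf, size, "\rDecoded %u frames", done);
}

/* Emit the line with one call, so an unbuffered stderr sees a single
 * write per progress update. */
static void print_progress(unsigned done, unsigned total) {
    char buf[80];
    format_progress(buf, sizeof(buf), done, total);
    fputs(buf, stderr);
}
```

Calling `print_progress(5, 10)` writes one carriage-return-prefixed line to stderr; the formatting cost stays in memory and only the final string reaches the console.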
-