- 16 Dec, 2014 24 commits
-
-
Anton Mitrofanov authored
Update to the latest version of gas-preprocessor.pl from http://git.libav.org/?p=gas-preprocessor.git Contributions by Janne Grunau, Martin Storsjo, Mans Rullgard, David Conrad, Martin Aumuller and others
-
Benchmarks on a Nexus 9 (NVIDIA Denver):
101.3 cycles in x264_cabac_encode_decision_c, 67105369 runs, 3495 skips
97.3 cycles in x264_cabac_encode_decision_asm, 67105493 runs, 3371 skips
132.8 cycles in x264_cabac_encode_terminal_c, 1046950 runs, 1626 skips
116.1 cycles in x264_cabac_encode_terminal_asm, 1048424 runs, 152 skips
92.4 cycles in x264_cabac_encode_bypass_c, 16776192 runs, 1024 skips
89.6 cycles in x264_cabac_encode_bypass_asm, 16776453 runs, 763 skips
Cycle counts are not as stable as one would like. The dynamic code optimisation seems to produce different results for small changes in a binary. Repeated runs with the same binary produce stable results though (ignoring the first run).
-
Needs kernel support since user space access to the cycle counter is not allowed on all available AArch64 systems (Android 5 and iOS).
-
3-4 times faster.
-
2-3 times faster than C.
-
x264_mbtree_propagate_cost_neon is ~7 times faster. x264_mbtree_propagate_list_neon is 33% faster.
-
3.5 times faster.
-
All functions ~33% faster.
-
deblock_luma_intra[0]_neon is 2 times faster, deblock_luma_intra[1]_neon is ~4 times faster.
-
deblock_h_chroma_422 2.5 times faster
-
deblock_chroma_420_mbaff_neon 2 times faster
-
deblock_h_chroma_420_intra, deblock_h_chroma_422_intra and x264_deblock_h_chroma_intra_mbaff_neon are ~3 times faster. deblock_chroma_intra[1] is ~4 times faster than C.
-
-
integral_init4h_neon and integral_init8h_neon are 3-4 times faster than C. integral_init8v_neon is 6 times faster and integral_init4v_neon is 10 times faster.
-
Between 10% and 40% faster than C.
-
decimate_score15 and 16 are 60% faster, decimate_score64 is 4 times faster than C.
-
4 times faster than C.
-
7 times faster than C.
-
pixel_sad_4x16_neon: 33% faster than C
pixel_satd_4x16_neon: 5 times faster
pixel_ssd_4x16_neon: 4 times faster
-
13 times faster than C.
-
35 times faster than C.
-
zigzag_scan_4x4_field_neon, zigzag_sub_4x4_field_neon, zigzag_sub_4x4ac_field_neon, zigzag_sub_4x4_frame_neon, zigzag_sub_4x4ac_frame_neon: more than 2 times faster
zigzag_scan_8x8_frame_neon, zigzag_scan_8x8_field_neon, zigzag_sub_8x8_field_neon, zigzag_sub_8x8_frame_neon: 4-5 times faster
zigzag_interleave_8x8_cavlc_neon: 6 times faster
-
~20% faster than calling pixel_sa8d_16x16 and pixel_satd_16x16 separately.
-
25% faster than the previous version.
-
- 12 Dec, 2014 4 commits
-
-
Henrik Gramner authored
All CPUs with AVX2 support FMA3 (but not the other way around).
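The one-way implication can be pictured with a flag layout in which the AVX2 value carries the FMA3 bit. This is only a sketch under stated assumptions: the names `CPU_AVX`, `CPU_FMA3`, `CPU_AVX2` and `cpu_has` are hypothetical and are not taken from the x264 sources, and the bit positions are arbitrary.

```c
#include <stdint.h>

/* Hypothetical flag layout (not the x264 definitions): each value
 * includes the bits of the feature sets it is assumed to imply. */
enum {
    CPU_AVX  = 1 << 0,
    CPU_FMA3 = (1 << 1) | CPU_AVX,   /* assumption: FMA3 implies AVX  */
    CPU_AVX2 = (1 << 2) | CPU_FMA3   /* AVX2 implies FMA3             */
};

/* Returns 1 if every bit of `wanted` is present in `detected`. */
static int cpu_has(uint32_t detected, uint32_t wanted)
{
    return (detected & wanted) == wanted;
}
```

With this layout, `cpu_has(CPU_AVX2, CPU_FMA3)` is true, while `cpu_has(CPU_FMA3, CPU_AVX2)` is false, which mirrors "but not the other way around".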
-
Anton Mitrofanov authored
-
Henrik Gramner authored
-
Anton Mitrofanov authored
Didn't affect output due to the incorrect values either not being used in the code path or producing equal results compared to the correct values. Also deduplicate hpel_ref arrays.
-
- 01 Dec, 2014 1 commit
-
-
Anton Mitrofanov authored
-
- 29 Nov, 2014 1 commit
-
-
Henrik Gramner authored
It would previously report FAILED if any of the earlier plane_copy tests failed.
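The kind of bug described here is typically a result flag that is not reset between test groups. The following is a minimal sketch of that pattern and one possible fix; the function names and the array-based interface are hypothetical and do not reproduce the checkasm code.

```c
/* passes[i] is 1 if test group i passed, 0 otherwise.
 * verdict[i] receives what would be reported for group i. */

/* Sticky pattern: `ok` is shared, so one early failure makes every
 * later group report FAILED as well. */
static void report_sticky(const int *passes, int n, int *verdict)
{
    int ok = 1;
    for (int i = 0; i < n; i++) {
        ok &= passes[i];
        verdict[i] = ok;
    }
}

/* Reset pattern: `ok` starts fresh for each group, so a report
 * reflects only that group's own result. */
static void report_reset(const int *passes, int n, int *verdict)
{
    for (int i = 0; i < n; i++) {
        int ok = 1;
        ok &= passes[i];
        verdict[i] = ok;
    }
}
```

For the input `{0, 1}` (first group fails, second passes), the sticky variant reports both groups as failed, while the reset variant reports only the first.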
-
- 17 Oct, 2014 3 commits
-
-
Anton Mitrofanov authored
-
Anton Mitrofanov authored
-
Henrik Gramner authored
40->27 cycles on Haswell.
-
- 09 Oct, 2014 1 commit
-
-
Henrik Gramner authored
Improves the accuracy of benchmarks, especially in short functions.

To quote the Intel 64 and IA-32 Architectures Software Developer's Manual: "The RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed. If software requires RDTSC to be executed only after all previous instructions have completed locally, it can either use RDTSCP (if the processor supports that instruction) or execute the sequence LFENCE;RDTSC."

RDTSCP would accomplish the same task, but it's only available since Nehalem.

This change makes SSE2 a requirement to run checkasm.
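The LFENCE;RDTSC sequence from the quoted manual can be sketched in C with compiler intrinsics. This is an illustration, not the checkasm implementation: `read_time_serialized` and `time_call` are hypothetical names, the x86 path assumes GCC or Clang with SSE2 enabled, and the `clock()` fallback exists only so the sketch builds on other targets.

```c
#include <stdint.h>
#include <time.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__SSE2__)
#include <x86intrin.h>   /* GCC/Clang: _mm_lfence() and __rdtsc() */
#define HAVE_LFENCE_RDTSC 1
#endif

/* Hypothetical helper: read a timestamp only after earlier
 * instructions have completed locally. */
static inline uint64_t read_time_serialized(void)
{
#ifdef HAVE_LFENCE_RDTSC
    _mm_lfence();        /* LFENCE is an SSE2 instruction */
    return __rdtsc();    /* RDTSC: read the time-stamp counter */
#else
    return (uint64_t)clock();   /* coarse fallback for non-x86 builds */
#endif
}

/* Hypothetical helper: timestamp delta around one call of fn. */
static uint64_t time_call(void (*fn)(void))
{
    uint64_t start = read_time_serialized();
    fn();
    uint64_t stop = read_time_serialized();
    return stop - start;
}
```

Without the fence, part of the code before the measured call could still be in flight when the counter is read, which matters most when the measured function takes only a few dozen cycles.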
-
- 29 Sep, 2014 1 commit
-
-
Vittorio Giovara authored
-
- 16 Sep, 2014 5 commits
-
-
Anton Mitrofanov authored
-
Anton Mitrofanov authored
-
Anton Mitrofanov authored
-
Anton Mitrofanov authored
-
Anton Mitrofanov authored
-