- 16 Jan, 2016 12 commits
-
-
Henrik Gramner authored
Avoids some code duplication. Also drop the -mno-cygwin check since that option was removed back in 2008.
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
--disable-win32thread can be passed as an argument to configure to compile with pthreads, which was the old default behavior.
-
Henrik Gramner authored
A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning between int and float domains, so try to avoid that if possible.
-
Henrik Gramner authored
The function existed but was never enabled.
-
Geza Lore authored
Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata. For example, this fixes `gdb`, but not `perf`. Currently only implemented for ELF.
-
Henrik Gramner authored
The REP_RET workaround is only needed on old AMD CPUs, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such CPUs.

This patch doesn't modify any emitted instructions, and doesn't actually affect x264 at all. It's only for other projects that use x86inc.asm without an appropriate `strip` command in their build system.

Note that EQU just creates a local label when using nasm instead of yasm. This is probably a bug, but at least it doesn't break anything.
-
Henrik Gramner authored
cpuflags is never undefined any more; it's set to 0 instead. Also fix an incorrect comment.
-
Henrik Gramner authored
-
Henrik Gramner authored
When allocating stack space with a larger alignment than the known stack alignment, a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments.
-
Henrik Gramner authored
* Correctly handle FMA instructions with memory operands.
* Print a warning if FMA instructions are used without the correct cpuflag.
* Simplify the instantiation code.
* Clarify documentation.

Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4 instructions to FMA3 instructions we can utilize the fact that multiply is a commutative operation and reorder operands if necessary to ensure that a memory operand is used only as the last operand.
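The operand reordering described above can be illustrated with a minimal C sketch. It is not code from x86inc.asm: the type `fma3_choice`, the helpers `is_mem` and `fma4_to_fma3`, and the convention that memory operands are written as `[...]` are all assumptions made for illustration, and the actual macro may cover fewer or different cases. The sketch only shows how commutativity of the multiply lets a memory operand always land in the last slot.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical illustration, not code from x86inc.asm.
 * FMA4 computes dst = a * b + c with a separate destination.
 * FMA3 overwrites its first operand, and only op3 may be memory:
 *   132: op1 = op1 * op3 + op2
 *   213: op1 = op2 * op1 + op3
 *   231: op1 = op2 * op3 + op1 */
typedef struct {
    int form;          /* 132, 213 or 231; 0 if no FMA3 form fits */
    const char *op1;   /* destination, which is also a source */
    const char *op2;   /* must be a register */
    const char *op3;   /* the only slot that may be a memory operand */
} fma3_choice;

/* Assumption for this sketch: memory operands are spelled "[...]". */
static int is_mem(const char *s) { return s[0] == '['; }

static fma3_choice fma4_to_fma3(const char *dst, const char *a,
                                const char *b, const char *c)
{
    fma3_choice r = { 0, NULL, NULL, NULL };
    assert(!is_mem(dst));                    /* destination is a register */
    if (is_mem(a) + is_mem(b) + is_mem(c) > 1)
        return r;                            /* at most one memory source */

    if (!strcmp(dst, a)) {
        if (is_mem(b)) r = (fma3_choice){ 132, a, c, b }; /* a = a*b + c */
        else           r = (fma3_choice){ 213, a, b, c }; /* a = b*a + c */
    } else if (!strcmp(dst, b)) {
        if (is_mem(a)) r = (fma3_choice){ 132, b, c, a }; /* b = b*a + c */
        else           r = (fma3_choice){ 213, b, a, c }; /* b = a*b + c */
    } else if (!strcmp(dst, c)) {
        /* Swap the multiplicands if needed so memory ends up in op3. */
        if (is_mem(a)) r = (fma3_choice){ 231, c, b, a }; /* c = b*a + c */
        else           r = (fma3_choice){ 231, c, a, b }; /* c = a*b + c */
    }
    return r;   /* form stays 0 when dst matches none of the sources */
}
```

For instance, under these assumptions `fma4_to_fma3("m0", "m0", "[r0]", "m1")` selects the 132 form with `[r0]` as the last operand, while a destination that matches none of the sources yields form 0 because FMA3 has no separate destination.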
-
- 03 Jan, 2016 3 commits
-
-
Henrik Gramner authored
-
Henrik Gramner authored
Makes it possible to use them in arithmetic expressions.
-
Henrik Gramner authored
Furthermore, the x264_analyse_prepare_costs() and x264_analyse_init_costs() functions were only used in x264_encoder_open(), so move that entire section of code to analyse.c as well to simplify things.
-
- 20 Dec, 2015 2 commits
-
-
Janne Grunau authored
The asm is only for 8-bit and function prototypes reflect that. Avoids numerous warnings with --bit-depth=9/10.
-
Janne Grunau authored
Android 6 does not link shared libraries with text relocations. Make the movrel macro position independent and add movrelx for indirect loads of external symbols. Move the function pointer table for the aligned memcpy variants to the data.rel.ro section on Linux/Android.
-
- 17 Oct, 2015 2 commits
-
-
Martin Storsjö authored
-
Janne Grunau authored
r9 is a volatile register in the iOS ABI and will therefore not be preserved by compiled functions like the luma motion compensation. Add the symbol prefix to the puts() call and use blx since a switch between arm and thumb mode might be required.
-
- 11 Oct, 2015 21 commits
-
-
Anton Mitrofanov authored
Patch from FreeBSD ports.
-
Anton Mitrofanov authored
Some compilers, depending on the target OS, use 4-byte stack alignment by default. Explicitly check for known good compilers and specific options for stack alignment.
-
Anton Mitrofanov authored
-
Anton Mitrofanov authored
-
Anton Mitrofanov authored
High bit depth VBV should now act more like the 8-bit depth one.
-
Anton Mitrofanov authored
It was previously used but never updated from its initialization value.
-
Anton Mitrofanov authored
Keep predictor offsets more stable. This should fix VBV misprediction in frames with a large difference in complexity between the top and bottom parts.
-
Martin Storsjö authored
The cost function could be simplified to avoid having to clobber q4/q5, but this requires reordering instructions, which increases the total runtime.

checkasm timing             Cortex-A7       A8       A9
mbtree_propagate_cost_c         63702   155835    62829
mbtree_propagate_cost_neon      17199    10454    11106
mbtree_propagate_list_c        104203   108949    84532
mbtree_propagate_list_neon      82035    78348    60410
-
Martin Storsjö authored
This avoids having to duplicate the same code for all architectures that implement only the internal part of this function in assembler.
-
Martin Storsjö authored
checkasm timing             Cortex-A7       A8       A9
deblock_luma_intra[0]_c          5988     4653     4316
deblock_luma_intra[0]_neon       3103     2170     2128
deblock_luma_intra[1]_c          7119     5905     5347
deblock_luma_intra[1]_neon       2068     1381     1412

This includes extra optimizations by Janne Grunau.

Timings from a separate build, on Exynos 5422:

                            Cortex-A7      A15
deblock_luma_intra[0]_c          6627     3300
deblock_luma_intra[0]_neon       3059     1128
deblock_luma_intra[1]_c          7314     4128
deblock_luma_intra[1]_neon       2038      720
-
Martin Storsjö authored
checkasm timing               Cortex-A7       A8       A9
intra_predict_8x16c_dct_c           862      540      590
intra_predict_8x16c_dct_neon        608      511      657
intra_predict_8x16c_h_c             972      707      719
intra_predict_8x16c_h_neon          722      656      672
intra_predict_8x16c_p_c           10183     9819     8655
intra_predict_8x16c_p_neon         2622     1972     1983
-
Martin Storsjö authored
checkasm timing  Cortex-A7       A8       A9
plane_copy_c         13124    10925     9106
plane_copy_neon       7349     5103     8945
-
Martin Storsjö authored
Cast the function pointer to a different type signature, to be able to use uint64_t as return type (instead of intptr_t) for those calls that require it. Use two separate functions, depending on whether neon is available.
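The casting idea can be sketched in plain C as follows. This is only an illustration under assumptions: `generic_fn`, `pack_two_sums` and `call_u64` are hypothetical names that do not appear in the commit, and the sketch does not reproduce the separate neon and non-neon call helpers mentioned above. The point it shows is that a return value wider than `intptr_t` on a 32-bit target survives only if the call goes through a pointer whose signature returns `uint64_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic storage type for any function under test. */
typedef void (*generic_fn)(void);

/* Stand-in for a routine that returns a full 64-bit result, for example
 * two 32-bit values packed together. Not a real x264 function. */
static uint64_t pack_two_sums(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Cast the stored pointer to a signature that returns uint64_t before
 * calling it, so the upper 32 bits are kept even where intptr_t is
 * only 32 bits wide. The cast restores the function's real type. */
static uint64_t call_u64(generic_fn fn, uint32_t a, uint32_t b)
{
    uint64_t (*typed)(uint32_t, uint32_t) =
        (uint64_t (*)(uint32_t, uint32_t))fn;
    return typed(a, b);
}
```

A caller would store the routine as `(generic_fn)pack_two_sums` and invoke it with `call_u64(fn, 1, 2)`; because the pointer is cast back to the function's actual type before the call, this stays within defined behavior in standard C.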
-
Martin Storsjö authored
To test all codepaths in the aarch64 neon implementation, one at the very least needs to test with width 8, 16, 24 and 32.
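Why several widths are needed can be shown with a small, made-up example. The commit does not describe the actual aarch64 code paths, so `sum_row_chunked` below is a hypothetical stand-in that assumes a 16-wide main loop plus an 8-wide tail, and `sum_row_ref` and `count_width_mismatches` are likewise illustrative names. With that assumed structure, width 8 exercises only the tail, 16 a single main iteration, 24 the main loop followed by the tail, and 32 a repeated main loop.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward reference: sum the first `width` bytes. */
static uint32_t sum_row_ref(const uint8_t *src, int width)
{
    uint32_t s = 0;
    for (int i = 0; i < width; i++)
        s += src[i];
    return s;
}

/* Hypothetical stand-in for an optimized routine: a 16-wide main loop
 * followed by an optional 8-wide tail. Assumes width is a multiple of 8. */
static uint32_t sum_row_chunked(const uint8_t *src, int width)
{
    uint32_t s = 0;
    int i = 0;
    for (; i + 16 <= width; i += 16)       /* main loop path */
        for (int j = 0; j < 16; j++)
            s += src[i + j];
    if (i + 8 <= width)                    /* tail path */
        for (int j = 0; j < 8; j++)
            s += src[i + j];
    return s;
}

/* Compare both versions for each width and count disagreements. */
static int count_width_mismatches(void)
{
    static const int widths[] = { 8, 16, 24, 32 };
    uint8_t buf[32];
    int bad = 0;
    for (int i = 0; i < 32; i++)
        buf[i] = (uint8_t)(i * 7 + 3);     /* arbitrary test pattern */
    for (int k = 0; k < 4; k++)
        bad += sum_row_ref(buf, widths[k]) != sum_row_chunked(buf, widths[k]);
    return bad;
}
```

Testing only width 16 in this sketch would never reach the tail path, which is the kind of coverage gap the commit message is addressing.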
-
Jerome Duval authored
Add Haiku as a supported platform in configure. Haiku has no nice() function, so use the platform-specific substitute instead.
-
Martin Storsjö authored
Disable this on iOS, since it has got a slightly different ABI for vararg parameters.
-
Martin Storsjö authored
checkasm timing        Cortex-A7       A8       A9
decimate_score15_c           764      736      535
decimate_score15_neon        487      494      453
decimate_score16_c           782      727      553
decimate_score16_neon        487      494      521
decimate_score64_c          2361     2597     2011
decimate_score64_neon       1017      802      785
-
Martin Storsjö authored
checkasm timing                      Cortex-A7       A8       A9
deblock_chroma_420_intra_mbaff_c          1469     1276     1181
deblock_chroma_420_intra_mbaff_neon        981      717      644
deblock_chroma_intra[1]_c                 2954     2402     2321
deblock_chroma_intra[1]_neon               947      581      575
deblock_h_chroma_420_intra_c              2859     2509     2264
deblock_h_chroma_420_intra_neon           1480     1119     1028
deblock_h_chroma_422_intra_c              6211     5030     4792
deblock_h_chroma_422_intra_neon           2894     1990     2077
-
Martin Storsjö authored
This requires spilling some registers to the stack, contrary to the aarch64 version.

checkasm timing                Cortex-A7       A8       A9
sa8d_satd_16x16_neon               12936     6365     7492
sa8d_satd_16x16_separate_neon      14841     6605     8324
-
Martin Storsjö authored
checkasm timing                Cortex-A7       A8       A9
deblock_chroma_420_mbaff_c          1944     1706     1526
deblock_chroma_420_mbaff_neon       1210      873      865
-
Martin Storsjö authored
checkasm timing            Cortex-A7       A8       A9
deblock_h_chroma_422_c          6953     6269     5145
deblock_h_chroma_422_neon       3905     2569     2551
-