- 25 Jul, 2015 12 commits
-
-
Rishikesh More authored
Signed-off-by:
Rishikesh More <rishikesh.more@imgtec.com>
-
Rishikesh More authored
Add macros for load/store, slide, shift, transpose and basic arithmetic operations required by subsequent patches. Signed-off-by:
Rishikesh More <rishikesh.more@imgtec.com>
-
Kaustubh Raste authored
MSA is the MIPS SIMD Architecture. Add X264_CPU_MSA define. Update configure to detect MIPS platform and set flags. CPU-specific gcc options are expected through --extra-cflags. Sample command line for mips32r5:
    ./configure --host=mipsel-linux-gnu --cross-prefix=<TOOLCHAIN>/mips-mti-linux-gnu- --extra-cflags="-EL -mips32r5 -msched-weight -mload-store-pairs"
Signed-off-by:
Kaustubh Raste <kaustubh.raste@imgtec.com>
-
Anton Mitrofanov authored
This should improve MBTree and VBV when a lot of forced frame types are used.
-
Henrik Gramner authored
For NV21 input.
-
Xiaolei Yu authored
Eliminates an extra copy when encoding Android camera preview images. Checkasm test by Janne Grunau. ARM assembly with improvements from Janne Grunau.
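As a rough illustration of what an NV21 input path has to do: NV21 stores interleaved chroma as V,U byte pairs, while NV12 stores U,V. A minimal C sketch of a copy that swaps each pair, so the conversion happens during the copy instead of in a separate pass, might look like this. The function name `copy_swap_chroma` and its signature are hypothetical and are not taken from x264.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not x264's actual function.
 * Copies `height` rows of `pairs` interleaved chroma pairs,
 * swapping the two bytes of every pair (V,U -> U,V). */
void copy_swap_chroma(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *src, ptrdiff_t src_stride,
                      int pairs, int height)
{
    assert(dst && src && pairs >= 0 && height >= 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < pairs; x++) {
            dst[2 * x]     = src[2 * x + 1]; /* second byte of the pair goes first */
            dst[2 * x + 1] = src[2 * x];     /* first byte of the pair goes second */
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

A SIMD version would do the same swap on many pairs per instruction; the scalar loop only shows the data movement.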
-
Henrik Gramner authored
-
Henrik Gramner authored
Enables the use of nasm as an alternative to yasm. Note that nasm cannot assemble x264 with PIC enabled since it currently doesn't support [symbol-$$] addressing which is used extensively by x264's PIC code. This includes all 64-bit Windows and 64-bit OS X builds, even non-shared. For the above reason nasm is currently intentionally not auto-detected, instead the assembler must be explicitly specified using "AS=nasm ./configure". Also drop -O2 from ASFLAGS since it's simply ignored anyway.
-
Timothy Gu authored
struc and endstruc attempt to revert to the previous section state set by the SECTION macro. Use the primitive [SECTION] directive instead of the SECTION macro for the .note.GNU-stack section to prevent it from being emitted again during endstruc.
-
Henrik Gramner authored
The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
-
Henrik Gramner authored
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
-
Henrik Gramner authored
Just for consistency, doesn't affect behavior.
-
- 24 Jul, 2015 2 commits
-
-
Henrik Gramner authored
* Fix unsigned <= 0 check.
* Add additional size sanity check on 32-bit systems.
* Don't read uninitialized data if fread() fails.
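The three fixes above can be sketched in a generic "read a whole file" helper. This is only an illustration under stated assumptions: the name `read_whole_file` is hypothetical, and the code is not the function the commit actually changed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not x264's actual code.
 * Returns a NUL-terminated heap buffer with the file contents,
 * or NULL on any error (including an empty file). */
char *read_whole_file(const char *path)
{
    assert(path != NULL);
    FILE *fh = fopen(path, "rb");
    if (!fh)
        return NULL;

    char *buf = NULL;
    if (fseek(fh, 0, SEEK_END) == 0) {
        /* Keep the size signed: ftell() returns -1 on error, and a
         * "<= 0" test on an unsigned value would never catch that. */
        long size = ftell(fh);
        if (size > 0 && fseek(fh, 0, SEEK_SET) == 0 &&
            /* size + 1 must fit in size_t; this matters when the
             * offset type is wider than size_t. */
            (uintmax_t)size < SIZE_MAX) {
            buf = malloc((size_t)size + 1);
            if (buf) {
                if (fread(buf, 1, (size_t)size, fh) == (size_t)size) {
                    buf[size] = '\0';
                } else {
                    /* Short read: never hand back uninitialized data. */
                    free(buf);
                    buf = NULL;
                }
            }
        }
    }
    fclose(fh);
    return buf;
}
```

The caller owns the returned buffer and releases it with `free()`.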
-
Henrik Gramner authored
-
- 16 Jul, 2015 2 commits
-
-
Henrik Gramner authored
-
Henrik Gramner authored
Could only occur in 4:2:2 with height == 1. Also enable asm for inputs with different U/V strides as long as the strides have identical signs.
-
- 23 Feb, 2015 3 commits
-
-
Christophe Gisquet authored
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2 instructions did not issue warnings when used in SSE functions. Handle it by also checking the register type when such instructions are used.
-
Christophe Gisquet authored
-
Anton Mitrofanov authored
-
- 20 Dec, 2014 2 commits
-
-
Anton Mitrofanov authored
-
Henrik Gramner authored
Also remove the MMX2 implementation and fix src overread for height == 1.
-
- 16 Dec, 2014 19 commits
-
-
Janne Grunau authored
benchmarks on a Nexus 9 (nvidia denver):
101.3 cycles in x264_cabac_encode_decision_c, 67105369 runs, 3495 skips
97.3 cycles in x264_cabac_encode_decision_asm, 67105493 runs, 3371 skips
132.8 cycles in x264_cabac_encode_terminal_c, 1046950 runs, 1626 skips
116.1 cycles in x264_cabac_encode_terminal_asm, 1048424 runs, 152 skips
92.4 cycles in x264_cabac_encode_bypass_c, 16776192 runs, 1024 skips
89.6 cycles in x264_cabac_encode_bypass_asm, 16776453 runs, 763 skips
Cycle counts are not as stable as one would like. The dynamic code optimisation seems to produce different results for small changes in a binary. Repeated runs with the same binary produce stable results though (ignoring the first run).
-
Janne Grunau authored
3-4 times faster.
-
Janne Grunau authored
2-3 times faster than C.
-
Janne Grunau authored
x264_mbtree_propagate_cost_neon is ~7 times faster. x264_mbtree_propagate_list_neon is 33% faster.
-
Janne Grunau authored
3.5 times faster.
-
Janne Grunau authored
All functions ~33% faster.
-
Janne Grunau authored
deblock_luma_intra[0]_neon is 2 times faster, deblock_luma_intra[1]_neon is ~4 times faster.
-
Janne Grunau authored
deblock_h_chroma_422 is 2.5 times faster.
-
Janne Grunau authored
deblock_chroma_420_mbaff_neon is 2 times faster.
-
Janne Grunau authored
deblock_h_chroma_420_intra, deblock_h_chroma_422_intra and x264_deblock_h_chroma_intra_mbaff_neon are ~3 times faster. deblock_chroma_intra[1] is ~4 times faster than C.
-
Janne Grunau authored
-
Janne Grunau authored
integral_init4h_neon and integral_init8h_neon are 3-4 times faster than C. integral_init8v_neon is 6 times faster and integral_init4v_neon is 10 times faster.
-
Janne Grunau authored
Between 10% and 40% faster than C.
-
Janne Grunau authored
decimate_score15 and 16 are 60% faster, decimate_score64 is 4 times faster than C.
-
Janne Grunau authored
4 times faster than C.
-
Janne Grunau authored
7 times faster than C.
-
Janne Grunau authored
pixel_sad_4x16_neon: 33% faster than C
pixel_satd_4x16_neon: 5 times faster
pixel_ssd_4x16_neon: 4 times faster
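For context on what is being measured: SAD is the sum of absolute differences between two pixel blocks, and SSD is the sum of squared differences. A plain C sketch for a 4x16 block of 8-bit pixels could look like the following. The names `sad_4x16_ref` and `ssd_4x16_ref` are hypothetical and this is not the C code the NEON versions were benchmarked against; SATD (which applies a Hadamard transform first) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference: sum of absolute differences over a
 * block 4 pixels wide and 16 pixels high. */
int sad_4x16_ref(const uint8_t *a, ptrdiff_t a_stride,
                 const uint8_t *b, ptrdiff_t b_stride)
{
    assert(a && b);
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 4; x++)
            sum += abs(a[x] - b[x]);
        a += a_stride;
        b += b_stride;
    }
    return sum;
}

/* Hypothetical reference: sum of squared differences over the same block. */
int ssd_4x16_ref(const uint8_t *a, ptrdiff_t a_stride,
                 const uint8_t *b, ptrdiff_t b_stride)
{
    assert(a && b);
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 4; x++) {
            int d = a[x] - b[x];
            sum += d * d;
        }
        a += a_stride;
        b += b_stride;
    }
    return sum;
}
```

Both loops touch only 64 pixels, which is why per-call overhead and memory access patterns matter so much for blocks this small.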
-
Janne Grunau authored
13 times faster than C.
-
Janne Grunau authored
35 times faster than C.
-