- 17 Jan, 2018 1 commit
-
-
Henrik Gramner authored
Fixes segfaults on Windows where the stack is only 16-byte aligned.
-
- 24 Dec, 2017 3 commits
-
-
Only 17 elements are actually used. It was originally padded to 64 bytes to avoid cache line splits in the x86 assembly, but those haven't really been an issue on x86 CPUs made in the past decade or so. Benchmarking shows no performance impact from dropping the padding, so we might as well remove it and save some cache.
-
Anton Mitrofanov authored
Addresses some thread safety concerns and makes the code cleaner. Downside: slightly higher memory usage when calling multiple encoders from the same application.
-
Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI option to set the bit depth at runtime. Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an incorrect value, it's preferable to induce a linking failure. If applications rely on this symbol this will make it more obvious where the problem is. Add Makefile rules that compile modules with different bit depths. Assembly on x86 is prefixed with the 'private_prefix' define, while all other archs modify their function prefix internally. Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64 assembly, PowerPC assembly, and MIPS assembly. The depth and cache CLI filters heavily depend on bit depth size, so they need to be duplicated for each value. This means having to rename these filters, and adjust the callers to use the right version. Unfortunately the threaded input CLI module inherits a common.h dependency (input/frame -> common/threadpool -> common/frame -> common/common) which is extremely complicated to address in a sensible way. Instead duplicate the module and select the appropriate one at run time. Each bitdepth needs different checkasm compilation rules, so split the main checkasm target into two executables.
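The templating idea in this message can be illustrated in a single file. The following is only a rough sketch, not x264's actual build machinery: it stamps out one copy of a routine per bit depth with a macro and picks the right copy at run time. The names `DEFINE_CLIP`, `clip_pixel_8`, `clip_pixel_10` and `clip_for_depth` are hypothetical and exist only for this illustration; the real change compiles whole modules once per bit depth with different symbol prefixes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical template: generates one clip function per bit depth,
 * each with its own pixel type and its own symbol suffix. */
#define DEFINE_CLIP(depth, pixel_t)                           \
static pixel_t clip_pixel_##depth(int v)                      \
{                                                             \
    const int max = (1 << (depth)) - 1;                       \
    return (pixel_t)(v < 0 ? 0 : v > max ? max : v);          \
}

DEFINE_CLIP(8, uint8_t)    /* 8-bit build of the routine */
DEFINE_CLIP(10, uint16_t)  /* 10-bit build of the routine */

/* Run-time selection, standing in for a caller that reads the
 * configured bit depth and dispatches to the matching build. */
static int clip_for_depth(int bit_depth, int v)
{
    assert(bit_depth == 8 || bit_depth == 10);
    return bit_depth == 8 ? clip_pixel_8(v) : clip_pixel_10(v);
}
```

Calling `clip_for_depth(8, 300)` clamps to the 8-bit maximum, while `clip_for_depth(10, 300)` leaves the value unchanged because it fits in 10 bits.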
-
- 24 Jun, 2017 1 commit
-
-
Henrik Gramner authored
-
- 14 Jun, 2017 2 commits
-
-
Makes it possible to use slicing with resolutions larger than 2^24 pixels.
-
Anton Mitrofanov authored
Use the correct ctxIdxInc calculation for coded_block_flag.
-
- 21 May, 2017 6 commits
-
-
Henrik Gramner authored
-
Henrik Gramner authored
Also drop the MMX version and make some slight improvements to the SSE2, SSSE3, AVX, and AVX2 versions.
-
Henrik Gramner authored
Reorder some elements in the x264_t.mb.pic struct to reduce the amount of padding required. Also drop the MMX implementation in favor of SSE.
-
Henrik Gramner authored
Reorder some elements in the x264_mb_analysis_list_t struct to reduce the amount of padding required. Also drop the MMX implementation in favor of SSE.
-
Henrik Gramner authored
The vperm* instructions ignore unused bits, so we can pack the permutation indices together to save cache and just use a shift to get the right values.
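The packing trick can be sketched in scalar C. This is an illustrative model only, not the assembly from the commit: `permute8` stands in for a vpermd-style permute that reads just the low 3 bits of each index, and `pack_indices` and `permute_packed` are hypothetical helpers. The bit layout (first table in bits 0-2, second in bits 3-5) is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for a vpermd-style permute: only the low 3 bits
 * of each index are read, higher bits are ignored. */
static void permute8(uint32_t dst[8], const uint32_t src[8],
                     const uint32_t idx[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = src[idx[i] & 7];
}

/* Pack two 8-entry permutations into one table:
 * a in bits 0-2, b in bits 3-5 (layout assumed for this sketch). */
static void pack_indices(uint32_t packed[8], const uint8_t a[8],
                         const uint8_t b[8])
{
    for (int i = 0; i < 8; i++) {
        assert(a[i] < 8 && b[i] < 8);
        packed[i] = (uint32_t)a[i] | ((uint32_t)b[i] << 3);
    }
}

/* Select permutation n (0 or 1) with a shift; the permute itself
 * discards whatever bits remain above the low 3. */
static void permute_packed(uint32_t dst[8], const uint32_t src[8],
                           const uint32_t packed[8], int n)
{
    uint32_t idx[8];
    for (int i = 0; i < 8; i++)
        idx[i] = packed[i] >> (3 * n);
    permute8(dst, src, idx);
}
```

One table now serves both permutations, so only one set of indices has to stay in cache.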
-
Henrik Gramner authored
-
- 19 May, 2017 1 commit
-
-
Henrik Gramner authored
Drop ALIGNED_N and ALIGNED_ARRAY_N in favor of using explicit alignment. This will allow us to increase the native alignment without unnecessarily increasing the alignment of everything that's currently 32-byte aligned.
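As a rough illustration of the difference, the sketch below uses C11 `alignas` rather than x264's own alignment macros. `NATIVE_ALIGN` and `demo_buffers_t` are hypothetical names. The point is that a buffer declared with an explicit 32-byte alignment keeps that alignment even if the build-wide native value is later raised.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define NATIVE_ALIGN 64  /* hypothetical build-wide "native" alignment */

typedef struct {
    /* Explicit alignment: stays at 32 bytes whatever NATIVE_ALIGN is. */
    alignas(32) uint8_t fixed32[48];
    /* Tracks the native alignment and grows with it. */
    alignas(NATIVE_ALIGN) uint8_t native_buf[64];
} demo_buffers_t;

/* Compile-time checks of the resulting layout. */
static_assert(offsetof(demo_buffers_t, fixed32) % 32 == 0,
              "fixed32 must be 32-byte aligned");
static_assert(offsetof(demo_buffers_t, native_buf) % NATIVE_ALIGN == 0,
              "native_buf must follow the native alignment");

/* Run-time helper: true if p is a multiple of n bytes. */
static int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```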
-
- 21 Jan, 2017 3 commits
-
-
Vittorio Giovara authored
-
Henrik Gramner authored
20% faster than SSSE3.
-
Henrik Gramner authored
-
- 01 Dec, 2016 1 commit
-
-
Anton Mitrofanov authored
Also make x264_weighted_reference_duplicate() static.
-
- 20 Apr, 2016 1 commit
-
-
Anton Mitrofanov authored
-
- 16 Jan, 2016 1 commit
-
-
Henrik Gramner authored
-
- 18 Aug, 2015 1 commit
-
-
Those are false positives, but it doesn't hurt to get rid of them.
-
- 23 Feb, 2015 1 commit
-
-
Anton Mitrofanov authored
-
- 20 Jul, 2014 1 commit
-
-
Anton Mitrofanov authored
-
- 24 Feb, 2014 1 commit
-
-
Anton Mitrofanov authored
Actually allocate less (instead of just initialize less) and fix comments.
-
- 21 Jan, 2014 2 commits
-
-
Kieran Kunhya authored
-
Fiona Glaser authored
We don't need to wastefully allocate quant tables above QP_MAX_SPEC; they're never used.
-
- 08 Jan, 2014 1 commit
-
-
Henrik Gramner authored
Also update AUTHORS file and my e-mail address in the headers of various files.
-
- 30 Oct, 2013 2 commits
-
-
Anton Mitrofanov authored
It probably wasn't used or maintained for the last few years.
-
Anton Mitrofanov authored
Do the reconfig when the next frame's encode begins. Fixes some rare crashes with frame-threading and encoder_reconfig.
-
- 23 Aug, 2013 2 commits
-
-
Kieran Kunhya authored
This format has been reverse engineered and x264's output has almost exactly the same bitstream as Panasonic cameras and encoders produce. It therefore does not comply with SMPTE RP2027 since Panasonic themselves do not comply with their own specification. It has been tested in Avid, Premiere, Edius and Quantel. Parts of this patch were written by Fiona Glaser and some reverse engineering was done by Joseph Artsimovich.
-
Henrik Gramner authored
Combine frame and mb data mallocs into a single large malloc. Additionally, on Linux systems with hugepage support, ask for hugepages on large mallocs. This gives a small performance improvement (~0.2-0.9%) on systems without hugepage support, as well as a small memory footprint reduction. On recent Linux kernels with hugepage support enabled (set to madvise or always), it improves performance up to 4% at the cost of about 7-12% more memory usage on typical settings. It may help even more on Haswell and other recent CPUs with improved 2MB page support in hardware.
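A minimal sketch of asking for hugepages on large allocations, assuming a POSIX system with `posix_memalign` and, on Linux, `madvise` with `MADV_HUGEPAGE`. The function name `alloc_maybe_huge`, the 2 MB page size and the threshold are assumptions chosen for illustration and are not taken from the x264 source.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* exposes madvise() and MADV_HUGEPAGE on glibc */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef __linux__
#include <sys/mman.h>
#endif

#define HUGE_PAGE_SIZE      (2u << 20)                /* assumed 2 MB pages */
#define HUGE_PAGE_THRESHOLD (HUGE_PAGE_SIZE * 7 / 8)  /* hypothetical cutoff */
#define SMALL_ALIGN         64                        /* hypothetical default */

static_assert((HUGE_PAGE_SIZE & (HUGE_PAGE_SIZE - 1)) == 0,
              "rounding below relies on a power-of-two page size");

/* Allocate size bytes; large requests are aligned and padded to whole
 * huge pages and the kernel is advised to back them with hugepages.
 * Release the result with free(). */
static void *alloc_maybe_huge(size_t size)
{
    void *p = NULL;
    int huge = size >= HUGE_PAGE_THRESHOLD;
    size_t align = huge ? HUGE_PAGE_SIZE : SMALL_ALIGN;

    if (huge)  /* pad so the advised range covers only our own memory */
        size = (size + HUGE_PAGE_SIZE - 1) & ~(size_t)(HUGE_PAGE_SIZE - 1);
    if (posix_memalign(&p, align, size))
        return NULL;
#if defined(__linux__) && defined(MADV_HUGEPAGE)
    if (huge)
        madvise(p, size, MADV_HUGEPAGE);  /* advisory; failure is harmless */
#endif
    return p;
}
```

The padding to whole huge pages is one plausible source of the extra memory usage the message mentions.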
-
- 03 Jul, 2013 1 commit
-
-
Henrik Gramner authored
-
- 20 May, 2013 1 commit
-
-
Anton Mitrofanov authored
Autoload the OpenCL library so that it's not required to run an OpenCL-enabled build of x264. Update X264_BUILD, which should have been changed with the first patch.
-
- 23 Apr, 2013 6 commits
-
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Fiona Glaser authored
AVX2 functions: mc_chroma, intra_sad_x3_16x16, last64, ads, hpel, dct4, idct4, sub16x16_dct8, quant_4x4x4, quant_4x4, quant_4x4_dc, quant_8x8, SAD_X3/X4, SATD, var, var2, SSD, zigzag interleave, weightp, weightb, intra_sad_8x8_x9, decimate, integral, hadamard_ac, sa8d_satd, sa8d, lowres_init, denoise
-
Fiona Glaser authored
RDO: ~20% faster than C. Bitstream: ~50% faster than C. 1-2% faster overall, highest on preset superfast/fast/medium.
-
Steve Borho authored
OpenCL support is compiled in by default, but must be enabled at runtime by an --opencl command line flag. Compiling OpenCL support requires perl. To avoid the perl requirement use: configure --disable-opencl. When enabled, the lookahead thread is mostly off-loaded to an OpenCL-capable GPU device. Lowres intra cost prediction, lowres motion search (including subpel) and bidir cost predictions are all done on the GPU. MB-tree and final slice decisions are still done by the CPU. Presets which do not use a threaded lookahead will not use OpenCL at all (superfast, ultrafast). Because of data dependencies, the GPU must use an iterative motion search which performs more total work than the CPU would do, so this is not work efficient or power efficient. But if there are GPU cycles to spare, it can often speed up the encode. Output quality when OpenCL lookahead is enabled is often very slightly worse than with the CPU lookahead (because of the same data dependencies). x264 must co...
-
Fiona Glaser authored
-
- 25 Feb, 2013 1 commit
-
-
Fiona Glaser authored
Branchlessly handle elimination of candidates in MMX roundclip asm. Add a new asm function, similar to roundclip, except without the round part. Optimize and organize the C code, and make both subme>=3 and subme<3 consistent. Add lots of explanatory comments and try to make things a little more understandable. ~5-10% faster with subme>=3, ~15-20% faster with subme<3.
-