- 21 Jul, 2018 1 commit
-
-
Anton Mitrofanov authored
The path cost for high resolutions can exceed COST_MAX.
-
- 17 Jan, 2018 1 commit
-
-
Henrik Gramner authored
-
- 24 Dec, 2017 2 commits
-
-
Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI option to set the bit depth at runtime.
Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an incorrect value, it's preferable to induce a linking failure. If applications rely on this symbol, this will make it more obvious where the problem is.
Add Makefile rules that compile modules with different bit depths. Assembly on x86 is prefixed with the 'private_prefix' define, while all other archs modify their function prefix internally.
Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64 assembly, PowerPC assembly, and MIPS assembly.
The depth and cache CLI filters heavily depend on bit depth size, so they need to be duplicated for each value. This means having to rename these filters, and adjust the callers to use the right version.
Unfortunately the threaded input CLI module inherits a common.h dependency (input/frame -> common/threadpool -> common/frame -> common/common) which is extremely complicated to address in a sensible way. Instead duplicate the module and select the appropriate one at run time.
Each bitdepth needs different checkasm compilation rules, so split the main checkasm target into two executables.
-
-
- 21 May, 2017 3 commits
-
-
Henrik Gramner authored
Covers all variants: 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x8, and 16x16.
-
Henrik Gramner authored
Also make the order of fenc/fdec arguments a bit more consistent.
-
Henrik Gramner authored
Improves cost calculations, especially when a short MV range is used.
-
- 21 Jan, 2017 1 commit
-
-
Henrik Gramner authored
-
- 12 Apr, 2016 2 commits
-
-
Anton Mitrofanov authored
The b_intra_penalty parameter is no longer used anywhere after the improvements to the --b-adapt 1 algorithm.
-
Anton Mitrofanov authored
Roughly the same speed as before but with significantly better results, comparable to --b-adapt 2.
-
- 11 Apr, 2016 1 commit
-
-
Anton Mitrofanov authored
The cost analysis functions expect the B-ref in a sequence with an even number of B-frames to be placed towards the beginning, while the actual placement was towards the end. Change the placement to be consistent with what the analysis expects, e.g. PbbBbP -> PbBbbP.
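The placement change can be illustrated with a small sketch. The helper names below are hypothetical and this is not x264's actual code; it only assumes that the B-ref is the middle B-frame of the run, with ties for even counts resolved towards the beginning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 0-based index, among the B-frames only, of the frame
 * promoted to B-ref in a run of n_bframes consecutive B-frames.
 * (n - 1) / 2 picks the earlier of the two middle frames when n is even;
 * n / 2 would pick the later one (the old PbbBbP behaviour). */
int bref_index(int n_bframes)
{
    assert(n_bframes >= 1);
    return (n_bframes - 1) / 2;
}

/* Writes a pattern such as "PbBbbP" into out ('B' marks the B-ref).
 * out must have room for n_bframes + 3 bytes including the terminator. */
void gop_pattern(int n_bframes, char *out, size_t out_size)
{
    assert(out_size >= (size_t)n_bframes + 3);
    int bref = bref_index(n_bframes);
    size_t pos = 0;
    out[pos++] = 'P';
    for (int i = 0; i < n_bframes; i++)
        out[pos++] = (i == bref) ? 'B' : 'b';
    out[pos++] = 'P';
    out[pos] = '\0';
}
```

For odd run lengths the two conventions agree; only even run lengths change.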
-
- 16 Jan, 2016 1 commit
-
-
Henrik Gramner authored
-
- 11 Oct, 2015 1 commit
-
-
Anton Mitrofanov authored
High bit depth VBV should now behave more like 8-bit VBV.
-
- 25 Jul, 2015 1 commit
-
-
Anton Mitrofanov authored
This should improve MBTree and VBV when a lot of forced frame types are used.
-
- 16 Jul, 2015 1 commit
-
-
Anton Mitrofanov authored
-
- 23 Feb, 2015 1 commit
-
-
Anton Mitrofanov authored
-
- 03 Sep, 2014 1 commit
-
-
Anton Mitrofanov authored
-
- 13 Mar, 2014 1 commit
-
-
Fiona Glaser authored
Move the second core part of macroblock tree into an assembly function; SIMD-optimize roughly half of it (for x86). Roughly 25-65% faster mbtree, depending on content. Slightly change how mbtree handles the tradeoff between range and precision for propagation. Overall a slight (but mostly negligible) effect on SSIM and ~2% faster.
-
- 12 Mar, 2014 1 commit
-
-
Henrik Gramner authored
Reduce the number of registers used from 7 to 6. Reduce the number of vector registers used by the AVX2 implementation from 8 to 7. Multiply fps_factor by 1/256 once per frame instead of once per macroblock row. Use mova instead of movu for dst since it's guaranteed to be aligned. Some cosmetics.
-
- 08 Jan, 2014 1 commit
-
-
Henrik Gramner authored
Also update AUTHORS file and my e-mail address in the headers of various files.
-
- 23 Apr, 2013 2 commits
-
-
Steve Borho authored
OpenCL support is compiled in by default, but must be enabled at runtime with the --opencl command line flag. Compiling OpenCL support requires perl. To avoid the perl requirement use: configure --disable-opencl.
When enabled, the lookahead thread is mostly off-loaded to an OpenCL capable GPU device. Lowres intra cost prediction, lowres motion search (including subpel) and bidir cost predictions are all done on the GPU. MB-tree and final slice decisions are still done by the CPU. Presets which do not use a threaded lookahead will not use OpenCL at all (superfast, ultrafast).
Because of data dependencies, the GPU must use an iterative motion search which performs more total work than the CPU would do, so this is not work efficient or power efficient. But if there are spare GPU cycles, it can often speed up the encode. Output quality with the OpenCL lookahead enabled is often very slightly worse than with the CPU lookahead (because of the same data dependencies).
x264 must compile its OpenCL kernels for your device before running them, and in order to avoid doing this every run it caches the compiled kernel binary in a file named x264_lookahead.clbin (--opencl-clbin FNAME to override). The cache file will be ignored if the device, driver, or OpenCL source are changed.
x264 will use the first GPU device which supports the cl_image features required by its kernels. Most modern discrete GPUs and all AMD integrated GPUs will work. Intel integrated GPUs (up to IvyBridge) do not support those features. Use --opencl-device N to specify a number of capable GPUs to skip during device detection.
Switchable graphics environments (e.g. AMD Enduro) are currently not supported, as some have bugs in their OpenCL drivers that cause output to be silently incorrect.
Developed by MulticoreWare with support from AMD and Telestream.
-
Fiona Glaser authored
Rescale the scale factor if the offset clips. This makes weightp more effective in fades to/from white (and any other situation that requires big offsets). Search more than 1 scale factor and more than 1 offset, depending on --subme. Try to find the optimal chroma denominator instead of hardcoding it. Overall improvement: a few percent in fade-heavy clips, such as a sample from Avatar: TLA.
-
- 26 Feb, 2013 1 commit
-
-
Fiona Glaser authored
There are quite a few others, but for most of them a fix doesn't help or there's no easy way to avoid them.
-
- 25 Feb, 2013 2 commits
-
-
Fiona Glaser authored
Branchlessly handle elimination of candidates in MMX roundclip asm. Add a new asm function, similar to roundclip, except without the round part. Optimize and organize the C code, and make both subme>=3 and subme<3 consistent. Add lots of explanatory comments and try to make things a little more understandable. ~5-10% faster with subme>=3, ~15-20% faster with subme<3.
-
Anton Mitrofanov authored
Code assumed keyframe analysis would only pull one frame off the list; this isn't true with open-gop.
-
- 09 Jan, 2013 1 commit
-
-
Loren Merritt authored
-
- 08 Jan, 2013 1 commit
-
-
Anton Mitrofanov authored
This is obviously bad user input, but x264 shouldn't crash if it happens.
-
- 07 Nov, 2012 1 commit
-
-
Fiona Glaser authored
-
- 18 May, 2012 1 commit
-
-
Fiona Glaser authored
Split each lookahead frame analysis call into multiple threads. Has a small impact on quality, but does not seem to be consistently any worse. This helps alleviate bottlenecks with many cores and frame threads. In many cases, this massively increases performance on many-core systems. For example, over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system. Realtime 1080p30 at --preset slow should now be feasible on real systems. For sliced-threads, this patch should be faster regardless of settings (~10%). By default, lookahead threads are 1/6 of regular threads. This isn't exact, but it seems to work well for all presets on real systems. With sliced-threads, it's the same as the number of encoding threads.
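The default described above can be sketched as follows. The function name is hypothetical and the lower bound of one thread is an assumption; x264's real logic may apply further limits that the message does not mention.

```c
#include <assert.h>

/* Hypothetical helper, not x264's actual code: default number of lookahead
 * threads for a given number of encoding threads. */
int default_lookahead_threads(int encoding_threads, int sliced_threads)
{
    assert(encoding_threads >= 1);
    if (sliced_threads)
        return encoding_threads;      /* same as the number of encoding threads */
    int n = encoding_threads / 6;     /* 1/6 of the regular threads */
    return n < 1 ? 1 : n;             /* assumption: never fewer than one */
}
```

Under these assumptions, 12 frame threads would get 2 lookahead threads, while 8 sliced threads would get 8.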
-
- 23 Apr, 2012 1 commit
-
-
Henrik Gramner authored
New assembly function with SSE2, SSSE3 and XOP implementations for calculating absolute sum of differences.
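As a plain C reference for what such a function might compute, here is a minimal sketch. It reads "absolute sum of differences" literally as the absolute value of the summed signed differences (unlike SAD, which sums absolute values). The name `asd8`, the fixed width of 8 pixels, the 8-bit pixel type and the parameter list are all assumptions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch only: absolute value of the sum of (pix1 - pix2) over an
 * 8-pixel-wide block of the given height. Strides are in pixels. */
int asd8(const uint8_t *pix1, intptr_t stride1,
         const uint8_t *pix2, intptr_t stride2, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++, pix1 += stride1, pix2 += stride2)
        for (int x = 0; x < 8; x++)
            sum += pix1[x] - pix2[x];   /* signed: differences may cancel */
    return abs(sum);
}
```

Because the differences are signed, two blocks with the same total brightness yield zero even when individual pixels differ.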
-
- 27 Mar, 2012 1 commit
-
-
Kieran Kunhya authored
-
- 07 Mar, 2012 1 commit
-
-
Anton Mitrofanov authored
Helps avoid VBV predictors going nuts with very low-cost MBs. One particular case this fixes is zero-cost MBs: adaptive quantization decreases the QP a lot, but, before this patch, no cost penalty was factored in for this, because anything times zero is zero.
-
- 06 Mar, 2012 1 commit
-
-
Henrik Gramner authored
Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero. This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI. As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations. Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary. Also add a checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
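The effect of treating a 32-bit value as if its upper half were zero can be shown in portable C. This is only an illustration with hypothetical helper names; the actual fix was made in assembly and in function prototypes. A negative value such as a stride of -16 is the case where the two interpretations diverge.

```c
#include <stdint.h>

/* What using intptr_t in the prototype (or movsxd in asm) gives on a
 * 64-bit target: the value is sign-extended to 64 bits. */
int64_t widen_sign_extended(int32_t v)
{
    return (int64_t)v;
}

/* What the asm effectively assumed: the upper 32 bits are zero.
 * For negative inputs this produces a large positive number. */
int64_t widen_zero_extended(int32_t v)
{
    return (int64_t)(uint32_t)v;
}
```

With v = -16 the first helper returns -16 while the second returns 4294967280, which as a pointer offset would land far outside the buffer. In practice the upper bits are unspecified rather than reliably zero, so the real failure can be any garbage value.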
-
- 04 Feb, 2012 1 commit
-
-
Hii authored
-
- 11 Nov, 2011 1 commit
-
-
Anton Mitrofanov authored
-
- 22 Oct, 2011 1 commit
-
-
Henrik Gramner authored
Gives a slight speed increase and significant binary size reduction when only one chroma format is needed.
-
- 15 Oct, 2011 1 commit
-
-
Fiona Glaser authored
The only real bug here is in slicetype.c, which may or may not affect real encodes.
-
- 21 Sep, 2011 1 commit
-
-
Henrik Gramner authored
-
- 24 Aug, 2011 2 commits
-
-
Loren Merritt authored
-
Loren Merritt authored
-