- Dec 24, 2017
-
-
Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI option to set the bit depth at runtime. Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an incorrect value, it's preferable to induce a linking failure. If applications rely on this symbol, this will make it more obvious where the problem is. Add Makefile rules that compile modules with different bit depths. Assembly on x86 is prefixed with the 'private_prefix' define, while all other archs modify their function prefix internally. Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64 assembly, PowerPC assembly, and MIPS assembly. The depth and cache CLI filters depend heavily on bit depth, so they need to be duplicated for each value. This means renaming these filters and adjusting the callers to use the right version. Unfortunately the threaded input CLI module inherits a common.h dependency (input/frame -> common/threadpool -> common/frame -> common/common) which is extremely complicated to address in a sensible way. Instead, duplicate the module and select the appropriate one at run time. Each bit depth needs different checkasm compilation rules, so split the main checkasm target into two executables.
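The run-time selection described above can be illustrated with a minimal sketch. It is not taken from the x264 sources: demo_param_t, sum_pixels_8, sum_pixels_10 and select_sum_pixels are hypothetical names, and the only element borrowed from the commit message is the idea of an 'i_bitdepth' field that picks between per-depth copies of the same routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a parameter struct carrying the bit depth. */
typedef struct { int i_bitdepth; } demo_param_t;

/* Hypothetical per-depth entry points. In a templated build, each would be
 * the same source compiled once per bit depth under a different prefix. */
static int sum_pixels_8(const void *src, size_t n)
{
    const uint8_t *p = src;   /* 8-bit samples are stored in one byte */
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

static int sum_pixels_10(const void *src, size_t n)
{
    const uint16_t *p = src;  /* 10-bit samples are stored in two bytes */
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

typedef int (*sum_pixels_fn)(const void *, size_t);

/* Pick the implementation at run time; NULL means the depth was not built. */
static sum_pixels_fn select_sum_pixels(const demo_param_t *param)
{
    switch (param->i_bitdepth) {
    case 8:  return sum_pixels_8;
    case 10: return sum_pixels_10;
    default: return NULL;
    }
}

/* Convenience wrapper that requires a supported depth. */
static int sum_pixels(const demo_param_t *param, const void *src, size_t n)
{
    sum_pixels_fn fn = select_sum_pixels(param);
    assert(fn != NULL);
    return fn(src, n);
}
```

Callers only see the dispatcher, so code that does not care about sample size never has to include a depth-specific header.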
-
qp is modified to require a valid value before use, while qp_max is set to the maximum allowable value (and clipped later on). This is needed so that param functions do not depend on bit depth.
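A rough sketch of the "set to maximum, clip later" idea follows. The struct and function names are hypothetical, and the limit of 51 + 6 * (bit depth - 8) is an assumption based on the H.264 QP range widening by 6 per extra bit of depth; the encoder's actual internal limit may differ.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical parameter struct, not the real x264_param_t. */
typedef struct {
    int i_bitdepth;
    int i_qp_max;
} demo_param_t;

/* Defaults are depth-independent: qp_max starts at the largest int. */
static void demo_param_default(demo_param_t *p)
{
    p->i_bitdepth = 8;
    p->i_qp_max = INT_MAX;
}

/* Assumed upper bound: the QP range grows by 6 per extra bit of depth. */
static int demo_qp_max_for_depth(int bitdepth)
{
    return 51 + 6 * (bitdepth - 8);
}

/* Clip once the bit depth is finally known (e.g. when opening the encoder). */
static void demo_param_validate(demo_param_t *p)
{
    assert(p->i_bitdepth >= 8);
    int limit = demo_qp_max_for_depth(p->i_bitdepth);
    if (p->i_qp_max > limit)
        p->i_qp_max = limit;
}
```

With this split, the default-setting function never needs to know the depth; only the validation step does.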
-
-
-
-
Anton Mitrofanov authored
-
Also drop the x264 prefix from all static cabac arrays.
-
-
Use dword instead of qword entries. This cuts the size of the tables in half, which allows each table to fit inside a single cache line. When PIC is disabled, dwords are enough to store absolute addresses. When PIC is enabled, we can store dword offsets relative to the start of the table and simply add the address of the table to the offset in order to calculate the full address. This approach also has the advantage of eliminating a whole bunch of run-time .data relocations.
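The offset-table technique can be sketched in C, although the commit itself concerns assembly jump tables. Everything named demo_* here is hypothetical: a table of 32-bit offsets relative to a base address replaces a table of full pointers, and the address is rebuilt with a single add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three NUL-terminated entries packed into one read-only blob. */
static const char demo_blob[] = "zero\0one\0two";

/* 32-bit offsets from the start of the blob: 4 bytes per entry instead of
 * 8 for a pointer on 64-bit targets, and no load-time relocations because
 * the values do not depend on where the blob ends up in memory. */
static const int32_t demo_offsets[3] = { 0, 5, 9 };

/* Rebuild the full address: base plus relative offset. */
static const char *demo_lookup(size_t i)
{
    assert(i < sizeof(demo_offsets) / sizeof(demo_offsets[0]));
    return demo_blob + demo_offsets[i];
}
```

A table of `const char *` would need one relocation per entry in position-independent code; the offset table needs none.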
-
On ELF platforms such symbols need to be flagged as functions with the correct visibility to please certain linkers in some scenarios.
-
The standard section for read-only data on Windows is .rdata. Nasm will flag non-standard sections as executable by default, which isn't ideal.
-
-
There are 32 pseudo-instructions for each floating-point comparison instruction, but only 8 of them are actually valid in legacy-encoded mode. The remaining 24 require the use of VEX-encoded (v-prefixed) instructions and can therefore be disregarded for this purpose.
-
-
Anton Mitrofanov authored
Fix MSVS fprofiled build for win64
-
Anton Mitrofanov authored
-
- Aug 11, 2017
-
-
Henrik Gramner authored
Some cpuflags would previously be displayed incorrectly when running older operating systems without AVX support on modern CPUs.
-
- Jun 26, 2017
-
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
- Jun 24, 2017
-
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
-
Henrik Gramner authored
Uses gathers and scatters in combination with conflict detection to vectorize the scalar part. Also improve the checkasm test to try different mb_y values and check for out-of-bounds writes.
-
Upstreaming this from FFmpeg. Unused in x264.
-
- Jun 14, 2017
-
-
The existing functions could easily be used by just calling them twice - this would give the following cycle numbers from checkasm:

var2_8x8_c:      4110
var2_8x8_neon:   1505
var2_8x16_c:     8019
var2_8x16_neon:  2545

However, by merging both passes into the same function, we get the following speedup:

var2_8x8_neon:   1205
var2_8x16_neon:  2327
-
The existing functions could easily be used by just calling them twice - this would give the following cycle numbers from checkasm:

                 Cortex A7     A8     A9    A53
var2_8x8_c:           7302   5342   5050   4400
var2_8x8_neon:        2645   1612   1932   1715
var2_8x16_c:         14300  10528  10020   8637
var2_8x16_neon:       5127   2695   3217   2651

However, by merging both passes into the same function, we get the following speedup:

var2_8x8_neon:        2312   1190   1389   1300
var2_8x16_neon:       4862   2130   2293   2422
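To make "merging both passes" concrete, here is a plain C sketch of the idea; it is not the NEON code and not the project's C reference. The function names, the separate-plane argument layout and the fixed width of 8 are assumptions for illustration. The merged version walks both chroma planes in a single loop and should return the same result as two separate calls.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_W 8  /* block width assumed for this sketch */

/* One plane: sum of squared differences minus the squared mean difference. */
static int demo_var2_plane(const uint8_t *a, const uint8_t *b,
                           int stride, int h, int *ssd)
{
    assert(h > 0);
    int sum = 0, sqr = 0;
    for (int y = 0; y < h; y++, a += stride, b += stride)
        for (int x = 0; x < DEMO_W; x++) {
            int d = a[x] - b[x];
            sum += d;
            sqr += d * d;
        }
    *ssd = sqr;
    return sqr - sum * sum / (DEMO_W * h);
}

/* Both planes in one pass: a single loop nest and a single call. */
static int demo_var2_merged(const uint8_t *a_u, const uint8_t *b_u,
                            const uint8_t *a_v, const uint8_t *b_v,
                            int stride, int h, int ssd[2])
{
    assert(h > 0);
    int sum_u = 0, sqr_u = 0, sum_v = 0, sqr_v = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < DEMO_W; x++) {
            int du = a_u[x] - b_u[x];
            int dv = a_v[x] - b_v[x];
            sum_u += du; sqr_u += du * du;
            sum_v += dv; sqr_v += dv * dv;
        }
        a_u += stride; b_u += stride;
        a_v += stride; b_v += stride;
    }
    ssd[0] = sqr_u;
    ssd[1] = sqr_v;
    return (sqr_u - sum_u * sum_u / (DEMO_W * h))
         + (sqr_v - sum_v * sum_v / (DEMO_W * h));
}
```

In scalar C the gain is mostly the saved call and loop overhead; in SIMD code the merged form also lets setup and final reduction work be shared across planes.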
-
These levels were added in the 2016-10 revision of the H.264 specification and improve support for content with high resolutions and/or high frame rates. Level 6.2 supports 8K resolution at 120 fps. Also shrink the x264_levels array by using smaller data types.
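The 8K at 120 fps claim can be checked with a little arithmetic. The sketch below uses a hypothetical demo_level_t struct rather than the real x264_levels layout, and the limits are the MaxMBPS and MaxFS values for levels 6, 6.1 and 6.2 as recalled from Table A-1 of the specification, so they should be verified against the spec before being relied on. A 7680x4320 frame is 480 x 270 = 129600 macroblocks, which at 120 fps is 15552000 macroblocks per second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced level descriptor (not the real x264_level_t). */
typedef struct {
    int     level_idc;  /* 60 = level 6, 61 = level 6.1, 62 = level 6.2 */
    int64_t max_mbps;   /* macroblocks per second (assumed from Table A-1) */
    int64_t max_fs;     /* macroblocks per frame  (assumed from Table A-1) */
} demo_level_t;

static const demo_level_t demo_levels[3] = {
    { 60,  4177920, 139264 },
    { 61,  8355840, 139264 },
    { 62, 16711680, 139264 },
};

/* Frame size in 16x16 macroblocks, rounding partial blocks up. */
static int64_t demo_frame_mbs(int width, int height)
{
    assert(width > 0 && height > 0);
    return (int64_t)((width + 15) / 16) * ((height + 15) / 16);
}

/* Check frame size and macroblock rate only; other level limits
 * (bitrate, DPB size, MV range) are ignored in this sketch. */
static int demo_fits_level(const demo_level_t *l, int width, int height, int fps)
{
    int64_t mbs = demo_frame_mbs(width, height);
    return mbs <= l->max_fs && mbs * fps <= l->max_mbps;
}
```

Under these assumed limits, 7680x4320 fits level 6 at 30 fps, level 6.1 at 60 fps and level 6.2 at 120 fps.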
-
Makes it possible to use slicing with resolutions larger than 2^24 pixels.
-
Use a dynamic size depending on the MV range. Reduces memory consumption by up to a few megabytes. Drop a related old miscompilation check since it may otherwise cause an out-of-bounds memory access. Also remove an unused extern variable declaration.
-
Anton Mitrofanov authored
Use the correct ctxIdxInc calculation for coded_block_flag.
-
Anton Mitrofanov authored
Change V and H intra prediction in lossless (TransformBypassModeFlag == 1) macroblocks to correctly adhere to the specification. Affects lossless encoding with 8x8dct or a mix of lossless and normal macroblocks. 8x8dct has already been disabled in lossless mode for some time due to being out-of-spec, but this will allow us to re-enable it.
-
Anton Mitrofanov authored
Could occur on the 1st pass in combination with --fake-interlaced and some input heights because the allocated buffer was too small.
-
- May 23, 2017
-
-
Henrik Gramner authored
Functions that use self-relative expressions in the form of [foo-$$] appear to cause issues on 64-bit Mach-O systems when assembled with nasm. Temporarily disable those functions on macho64 until we've figured out the root cause.
-
Only check if option -Werror=unknown-warning-option is supported before adding it.
-
Prior to this change, the loop never ran at all. The condition has been the same since it was introduced in 5b0cb86f. This issue was pointed out by a clang warning.
-
-
- May 21, 2017
-
-
Henrik Gramner authored
-