- 06 Mar, 2012 8 commits
-
-
Fiona Glaser authored
-
Ronald S. Bultje authored
Not necessary for x264, as -m amd64 already does the right thing, but used by external users of x86inc.
-
Henrik Gramner authored
Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero. This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI. As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations. Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary. Also add checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
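The failure mode can be modelled in plain C. This is a minimal sketch, not code from x264: `reg_with_garbage` and `sign_extend_low32` are hypothetical helpers, and the garbage pattern is an arbitrary assumption. The first helper stands in for a 64-bit register whose upper half the caller never cleared, the second for what movsxd does before the value is used as a 64-bit offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit register holding a 32-bit "int" argument.
 * The ABI only defines the low 32 bits; the upper half may hold anything. */
static uint64_t reg_with_garbage(int32_t value, uint32_t garbage)
{
    return ((uint64_t)garbage << 32) | (uint32_t)value;
}

/* What movsxd does: keep the low 32 bits and sign-extend them to 64 bits.
 * Written with explicit arithmetic so it does not rely on
 * implementation-defined narrowing conversions. */
static int64_t sign_extend_low32(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;
    if (low & 0x80000000u)
        return (int64_t)low - 4294967296LL; /* subtract 2^32 for negatives */
    return (int64_t)low;
}

/* Small self-check: a negative stride survives even with a dirty upper half. */
static void sign_extension_self_check(void)
{
    assert(sign_extend_low32(reg_with_garbage(-4, 0xDEADBEEFu)) == -4);
}
```

Reading the full register without the extension step gives a wrong offset in two ways: with a dirty upper half the value is arbitrary, and even with a zeroed upper half a negative int such as -4 appears as a large positive number.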
-
Fiona Glaser authored
x264_cavlc_init needs to be stack-aligned now.
-
Anton Mitrofanov authored
-
Steven Walters authored
-
Oka Motofumi authored
-
Anton Mitrofanov authored
BGR/BGRA input was correct.
-
- 15 Feb, 2012 1 commit
-
-
Fiona Glaser authored
Broke if the first macroblock in the slice exceeded the set slice-max-size.
-
- 05 Feb, 2012 1 commit
-
-
Henrik Gramner authored
Broke register preservation in x264_cpu_cpuid and x264_cpu_xgetbv. Did not cause any problems.
-
- 04 Feb, 2012 18 commits
-
-
Fiona Glaser authored
TBM and BMI1 are supported by Trinity/Piledriver. The others (and BMI1) will probably appear in Intel's upcoming Haswell. Also update x86inc with AVX2 stuff.
-
Loren Merritt authored
-
Fiona Glaser authored
-
Fiona Glaser authored
Also remove unused AVX cruft.
-
Fiona Glaser authored
Covers both 8-bit and 16-bit, ~5-10% faster on Bulldozer.
-
Fiona Glaser authored
Field: 35 (mmx) -> 16 (xop) cycles
Frame: 32 (ssse3) -> 20 (xop) cycles
-
Fiona Glaser authored
Faster on Sandy Bridge. Also add details on unsuccessful optimizations in these functions.
-
Fiona Glaser authored
Might be useful in a few cases.
-
Ronald S. Bultje authored
This allows combining multiple conditionals in a single statement.
-
Anton Mitrofanov authored
Such sources are more common, so better to be correct for the common case. This also produces less error for the case of full range than the previous algorithm produced for the case of TV range.
-
Hii authored
-
Henrik Gramner authored
Displays version info in Windows Explorer.
-
Sergey Radionov authored
Isn't used by x264 currently, so didn't cause a problem. Fix backported from libav.
-
Mans Rullgard authored
Some linkers apparently fail to correctly align ARM functions when mixing with Thumb code.
-
Anton Mitrofanov authored
-
Fiona Glaser authored
Fixes an issue with referencing across I-frames that's prohibited in Blu-ray for some godforsaken reason.
-
Oka Motofumi authored
-
Anton Mitrofanov authored
-
- 18 Jan, 2012 1 commit
-
-
Loren Merritt authored
Trellis didn't return a boolean value as it was supposed to. Regression in r2143-5.
-
- 15 Jan, 2012 11 commits
-
-
Loren Merritt authored
Another 20% faster. 18k->12k codesize. This patch series may have a large impact on encoding speed. For example, 24% faster at --preset slower --crf 23 with 720p parkjoy. Overall speed increase is proportional to the cost of trellis (which is proportional to bitrate, and much more with --trellis 2).
-
Loren Merritt authored
-
Loren Merritt authored
Hoist the branch on coef value out of the loop over node contexts.
Special cases for each possible coef value (0,1,n).
Special case for dc-only blocks.
Template the main loop for two common subsets of nodes, to avoid a bunch of branches about which nodes are live.
Use the nonupdating version of cabac_size_decision in more cases, and omit those bins from the node struct.
CABAC offsets are now compile-time constants.
Change TRELLIS_SCORE_MAX from a specific constant to anything negative, which is cheaper to test.
Remove dct_weight2_zigzag[], since trellis has to look up zigzag[] anyway.
60% faster on x86_64. 25k->18k codesize.
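The "anything negative" sentinel can be illustrated with a small sketch. This is not the x264 trellis code: `node_t`, `SCORE_INVALID` and `best_live_node` are hypothetical names, and the assumption is only that valid scores are non-negative, so a dead node can be detected with a sign test instead of a comparison against one particular constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel: any negative score marks a node as dead. */
#define SCORE_INVALID (-1LL)

/* Hypothetical, stripped-down trellis node: only the score matters here. */
typedef struct {
    int64_t score;
} node_t;

/* Return the index of the live node with the lowest score, or -1 if all
 * nodes are dead. Liveness is a sign test, so any negative value works. */
static int best_live_node(const node_t *nodes, int count)
{
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (nodes[i].score < 0)
            continue; /* dead node, skip it */
        if (best < 0 || nodes[i].score < nodes[best].score)
            best = i;
    }
    return best;
}

/* Small self-check: the two negative entries are ignored. */
static void sentinel_self_check(void)
{
    node_t nodes[4] = { { SCORE_INVALID }, { 40 }, { -7 }, { 25 } };
    assert(best_live_node(nodes, 4) == 3);
}
```

A sign test typically needs no 64-bit immediate to compare against, which is one plausible reason it is described as cheaper.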
-
Loren Merritt authored
Due to different tie-break order.
-
Henrik Gramner authored
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
-
Ilia Valiakhmetov authored
From Google Code-In.
-
Edward Wang authored
From Google Code-In.
-
Fiona Glaser authored
Use integer MAC for one of the SUMSUB passes. About a dozen cycles faster for 16x16.
-
Cristian Militaru authored
From Google Code-In.
-
Fiona Glaser authored
Helps the most with trellis and RD, but also helps with bitstream writing. Seems at worst neutral even in the extreme case of a CPU with small L2 cache (e.g. ARM Cortex A8).
-
Matt Habel authored
Also add an ACCUM macro to handle accumulator-induced add-or-swap more concisely.
-