- 11 Feb, 2009 2 commits
-
-
Anton Mitrofanov authored
A "make distclean" is probably required after updating to this revision.
-
Fiona Glaser authored
Suppress a GCC warning, fix a non-problematic array overflow, one REP->REP_RET.
-
- 10 Feb, 2009 1 commit
-
-
Manuel Rommel authored
Original thread: date: Mon, Feb 9, 2009 at 9:37 PM subject: [x264-devel] commit: Spare a vec_perm and a vec_mergeh though using a LUT of permutation vectors. (Guillaume Poirier)
-
- 09 Feb, 2009 7 commits
-
-
Guillaume Poirier authored
-
Guillaume Poirier authored
This will allow simplifying vector loads on architectures that can only load 16-byte-aligned data (such as AltiVec).
-
Fiona Glaser authored
Forgetting a %define resulted in SIGILL on 32-bit systems without SSE (e.g. Athlon XP).
-
Fiona Glaser authored
Up to +0.04 dB with CAVLC, generally a lot less with CABAC.
-
Fiona Glaser authored
Up to ~17% faster CABAC RDO, ~36% faster intra-only CABAC RDO. Up to 7% faster overall in extreme cases.
-
Fiona Glaser authored
-
Fiona Glaser authored
SSSE3 version of predict_8x8_hu.
SSE2 version of predict_8x8c_p.
SSSE3 versions of both planar prediction functions.
Optimizations to predict_16x16_p_sse2.
Some unnecessary REP_RETs -> RETs.
SSE2 version of predict_8x8_vr by Holger.
SSE2 version of predict_8x8_hd.
Don't compile MMX versions of some of the pred functions on x86_64.
Remove now-useless x86_64 C versions of 4x4 pred functions.
Rewrite some of the x86_64-only C functions in asm.
-
- 08 Feb, 2009 1 commit
-
-
Manuel Rommel authored
Also put width == 2 variant in its own scalar function because it's faster than a vectorized one.
-
- 04 Feb, 2009 2 commits
-
-
Holger Lubitz authored
Assembly versions of most remaining 4x4 and 8x8 intra pred functions. Assembly version of predict_8x8_filter. A few other optimizations. Primarily Core 2-optimized.
-
Guillaume Poirier authored
-
- 03 Feb, 2009 2 commits
-
-
Fiona Glaser authored
Integrate array_non_zero with the CAVLC 8x8dct interleave function. Roughly 1.5-2x faster than the original separate array_non_zero method.
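A minimal C sketch of the idea, not the committed code: while the 8x8 coefficients are interleaved into four 16-coefficient CAVLC blocks, each block's coefficients are ORed together, so the non-zero flag comes out of the same loop instead of a separate array_non_zero pass. The function name and the flat `nnz[4]` output layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: interleave an 8x8 block into four 4x4 CAVLC blocks
 * and record, per 4x4 block, whether any coefficient is non-zero. */
void interleave_8x8_cavlc_nz_sketch(int16_t dst[64], const int16_t src[64],
                                    uint8_t nnz[4])
{
    for (int i = 0; i < 4; i++) {
        int nz = 0;
        for (int j = 0; j < 16; j++) {
            int16_t c = src[i + j * 4];   /* every 4th coefficient goes to block i */
            dst[i * 16 + j] = c;
            nz |= c;                      /* accumulate non-zero bits while copying */
        }
        nnz[i] = (uint8_t)(nz != 0);
    }
}

/* Small usage check: only source index 1 is set, so only block 1 is flagged. */
void demo_interleave_nz(void)
{
    int16_t src[64] = {0}, dst[64];
    uint8_t nnz[4];
    src[1] = 7;
    interleave_8x8_cavlc_nz_sketch(dst, src, nnz);
    assert(dst[16] == 7);
    assert(nnz[0] == 0 && nnz[1] == 1 && nnz[2] == 0 && nnz[3] == 0);
}
```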
-
Fiona Glaser authored
~0.02-0.05 dB PSNR gain at high quants in intra-only encoding, pretty small otherwise. Allows a small optimization in i8x8 encoding.
-
- 01 Feb, 2009 1 commit
-
-
Guillaume Poirier authored
the variance computation epilogue since there won't be any overflow triggering an overflow. Suggested by Loren Merritt
-
- 30 Jan, 2009 1 commit
-
-
Fiona Glaser authored
Modify quantization to also calculate array_non_zero. PPC assembly changes by gpoirier. New quant asm includes some small tweaks to quant and SSE4 versions using ptest for the array_non_zero. Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc. Also add new i16x16 DC-only iDCT with asm. Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well. Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around. Performance increase in macroblock_encode: ~18% with dct-decimate, ~30% without at CRF 25. Overall performance increase 0-6% depending on encoding settings.
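The core of the change can be sketched in plain C as follows. This is an illustration under assumptions, not the committed code: the function name, the `mf`/`bias` table layout and the 16-bit shift are hypothetical, and realistic coefficient ranges are assumed so the intermediate product fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: quantize a 4x4 block in place and return 1 if any
 * quantized coefficient is non-zero, so no separate scan is needed later. */
int quant_4x4_nz_sketch(int16_t dct[16], const uint16_t mf[16],
                        const uint16_t bias[16])
{
    int nz = 0;
    for (int i = 0; i < 16; i++) {
        int c = dct[i];
        uint32_t mag = (uint32_t)(c < 0 ? -c : c);        /* magnitude */
        int q = (int)(((mag + bias[i]) * mf[i]) >> 16);   /* scale and round */
        dct[i] = (int16_t)(c < 0 ? -q : q);               /* restore sign */
        nz |= dct[i];                                     /* track non-zero */
    }
    return nz != 0;
}

/* Small usage check with a multiplier of 8192, i.e. division by 8. */
void demo_quant_nz(void)
{
    int16_t dct[16] = {100, -100, 3};
    uint16_t mf[16], bias[16];
    for (int i = 0; i < 16; i++) { mf[i] = 8192; bias[i] = 2; }
    assert(quant_4x4_nz_sketch(dct, mf, bias) == 1);
    assert(dct[0] == 12 && dct[1] == -12 && dct[2] == 0);
}
```

A caller can then skip dequant, zigzag and decimate entirely for any block where the return value is 0.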
-
- 29 Jan, 2009 2 commits
-
-
Guillaume Poirier authored
This isn't ideal since the 'time base' register is running at a fraction of the processor cycle speed, so the measurement isn't as precise as x86's rdtsc. It's better than nothing though...
-
Brad Smith authored
-
- 28 Jan, 2009 3 commits
-
-
Loren Merritt authored
Remove auto-reconfigure on svn update, which has done nothing since we stopped using svn. Fix $AS on sparc (was disabled by mmx check). Fix --extra-asflags (was ignored). Mark bash scripts as bash, not sh. Patch partly by Greg Robinson and Jugdish.
-
Loren Merritt authored
60KB smaller binary.
-
Fiona Glaser authored
pred_b_from_p can become absurdly large in static scenes, leading to rare collapses of quality with VBV+B-frames+threads. This isn't a final fix, but should resolve the problem in most cases in the meantime.
-
- 27 Jan, 2009 1 commit
-
-
Fiona Glaser authored
~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only. Small optimization in cache_save (skip_bp). Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future). Add in automatic substitutions for some asm instructions that have an equivalent smaller representation.
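Why a DC-only special case is cheap can be seen in a short C sketch. It is an illustration, not the committed function: the name and signature are hypothetical. When only the DC coefficient is non-zero, the H.264 4x4 inverse transform yields the same value for every pixel, so the whole transform reduces to one rounded shift plus a clipped add.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_uint8(int x)
{
    return (uint8_t)(x < 0 ? 0 : x > 255 ? 255 : x);
}

/* Hypothetical sketch: add a DC-only residual to a 4x4 pixel block.
 * Note: >> on a negative int is implementation-defined in C; an arithmetic
 * shift is assumed here, as on all common compilers. */
void add4x4_idct_dc_sketch(uint8_t *dst, int stride, int dc_coef)
{
    int dc = (dc_coef + 32) >> 6;   /* same value for all 16 pixels */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clip_uint8(dst[y * stride + x] + dc);
}

/* Small usage check: a DC of 640 adds 10 to each pixel, clipped at 255. */
void demo_idct_dc(void)
{
    uint8_t block[16];
    for (int i = 0; i < 16; i++)
        block[i] = 100;
    block[5] = 250;
    add4x4_idct_dc_sketch(block, 4, 640);
    assert(block[0] == 110);
    assert(block[5] == 255);
}
```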
-
- 26 Jan, 2009 1 commit
-
-
Guillaume Poirier authored
-
- 23 Jan, 2009 2 commits
-
-
Guillaume Poirier authored
-
Guillaume Poirier authored
-
- 20 Jan, 2009 2 commits
-
-
Guillaume Poirier authored
Suggested by Loren.
-
Fiona Glaser authored
The benefit in the most extreme contrived situation was at most 0.001 dB PSNR, at the cost of slower decoding. As this option was basically useless, it was a waste of code and prevented some other useful optimizations. Remove some unused mc code related to sub-8x8 partitions. Small deblocking speedup when p4x4 is used. Also remove unused x264_nal_decode prototype from x264.h.
-
- 19 Jan, 2009 1 commit
-
-
Brad Smith authored
-
- 18 Jan, 2009 2 commits
-
-
Guillaume Poirier authored
-
Fiona Glaser authored
And, if it wasn't, run direct auto as if it was the first pass, rather than simply forcing temporal direct mode on all frames. Also a small tweak to coeff_level_run asm.
-
- 17 Jan, 2009 1 commit
-
-
Brad Smith authored
OS such as Linux but instead looks for HAVE_ALTIVEC_H being set. Fixes all *BSD/PowerPC builds.
-
- 14 Jan, 2009 6 commits
-
-
Guillaume Poirier authored
It changed in commit 045ae4045a1827555b3eaab4fbf3c9809e98c58f (factorization of mallocs). (NB: AltiVec implementation wasn't allocating and writing to any scratch memory.)
-
Guillaume Poirier authored
-
Guillaume Poirier authored
-
Fiona Glaser authored
New MV costs should improve quality slightly by improving the smoothness of the field of MV costs (and they're closer to CABAC's actual costs). Despite being optimized for CABAC, they still help under CAVLC, albeit less. MV cost change by Loren Merritt.
-
Fiona Glaser authored
This allows an input qpfile to be used to force I-frames, for example. The same can be done through the library interface. Document the format of the qpfile in --longhelp and the forcing of frametypes in x264.h. Note that forcing B-frames and B-refs may not always have the intended result. Patch partially by Steven Walters <kemuri9@gmail.com>.
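For illustration, a small C sketch of parsing one qpfile line. The commit message does not spell out the format, so the layout used here is an assumption: one entry per line as "framenumber frametype QP", with frame types I, i, P, B, b and a QP of -1 meaning "let the encoder choose". The struct and function names are hypothetical and are not part of x264.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one qpfile entry (assumed format, see above). */
typedef struct {
    int  frame;   /* frame number, >= 0 */
    char type;    /* one of I, i, P, B, b */
    int  qp;      /* 0..51, or -1 to leave the choice to the encoder */
} qp_entry;

/* Parse one line; returns 1 on success, 0 if the line is malformed. */
int parse_qpfile_line(const char *line, qp_entry *out)
{
    int frame, qp;
    char type;
    if (sscanf(line, "%d %c %d", &frame, &type, &qp) != 3)
        return 0;
    if (frame < 0 || qp < -1 || qp > 51)
        return 0;
    switch (type) {
    case 'I': case 'i': case 'P': case 'B': case 'b':
        break;
    default:
        return 0;
    }
    out->frame = frame;
    out->type  = type;
    out->qp    = qp;
    return 1;
}

/* Small usage check: force frame 250 to an I-frame, QP left to the encoder. */
void demo_qpfile(void)
{
    qp_entry e;
    assert(parse_qpfile_line("250 I -1", &e) == 1);
    assert(e.frame == 250 && e.type == 'I' && e.qp == -1);
}
```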
-
Fiona Glaser authored
Only one IDIV is left in macroblock level code (transform_rd).
-
- 08 Jan, 2009 1 commit
-
-
Fiona Glaser authored
With some combinations of video width and other settings, the scratch buffer was slightly too small. This caused heap corruption on some systems. Also prevent merange from being raised during encoding with esa/tesa through encoder_reconfig, as this no longer works.
-
- 06 Jan, 2009 1 commit
-
-
Fiona Glaser authored
They hurt compression anyways, and direct auto was bugged with lossless.
-