- 09 Feb, 2009 1 commit
Fiona Glaser authored
Up to +0.04dB with CAVLC, generally a lot less with CABAC.
-
- 03 Feb, 2009 1 commit
Fiona Glaser authored
~0.02-0.05dB PSNR gain at high quants in intra-only encoding; pretty small otherwise. Allows a small optimization in i8x8 encoding.
-
- 30 Jan, 2009 1 commit
Fiona Glaser authored
Modify quantization to also calculate array_non_zero. PPC assembly changes by gpoirior. New quant asm includes some small tweaks to quant and SSE4 versions using ptest for the array_non_zero. Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc. Also add new i16x16 DC-only iDCT with asm. Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well. Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around. Performance increase in macroblock_encode: ~18% with dct-decimate, 30% without at CRF 25. Overall performance increase 0-6% depending on encoding settings.
-
- 27 Jan, 2009 1 commit
Fiona Glaser authored
~15% faster chroma encode by reorganizing CBP calculation and adding a special-case idct_dc function, since most coded chroma blocks are DC-only. Small optimization in cache_save (skip_bp). Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future). Add automatic substitutions for some asm instructions that have an equivalent smaller representation.
-
- 11 Dec, 2008 2 commits
Fiona Glaser authored
-
Fiona Glaser authored
Use a VLC table for common levelcodes instead of constructing them on the spot. Branchless version of i_trailing calculation (2x faster on Nehalem). Completely remove array_non_zero_count and instead use the count calculated in level/run coding. Note: this slightly changes output with subme > 7 due to different nonzero counts being stored during qpel RD.
-
- 28 Nov, 2008 1 commit
Fiona Glaser authored
Early-terminate in residual writing using stored nnz counts. To allow the above, store nnz counts for luma and chroma DC. Add assembly functions to find the last nonzero coefficient in a block. Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC. Note this changes output slightly with CABAC RDO because it requires always storing correct nnz values during RDO, which wasn't done before in cases where it wasn't useful. CAVLC output should be equivalent.
-
- 27 Nov, 2008 1 commit
Fiona Glaser authored
About 3.5x faster DC dequant on Conroe
-
- 10 Nov, 2008 2 commits
Fiona Glaser authored
9-12% faster chroma encode. Move all functions for handling chroma DC that don't have assembly versions to macroblock.c and inline them, along with a few other tweaks.
-
Fiona Glaser authored
Disable hadamard_ac sse2/ssse3 under stack_mod4. Fix one MSVC compilation warning. Fix compilation in debug mode in certain cases on x64. Remove eval.c from MSVC project. Fix crash when VBV is used in CQP mode. Patches by MasterNobody.
-
- 22 Oct, 2008 2 commits
Fiona Glaser authored
Improves quality when using p8x4/p4x8/p4x4 subpartitions. Benefit is proportional to how many sub-8x8 partitions are used; helps most at high bitrates and low resolutions.
-
Fiona Glaser authored
3-7x faster decimation, 1-3% faster overall
-
- 18 Oct, 2008 2 commits
Fiona Glaser authored
Slightly improves compression.
-
Fiona Glaser authored
Small speed loss in trellis 1, slightly larger in trellis 2, but significant quality improvement.
-
- 28 Sep, 2008 1 commit
Fiona Glaser authored
This improves lossless compression by about 4-25% depending on source. The benefit is generally higher for intra-only compression. Also add support for 8x8dct and i8x8 blocks in lossless mode; this improves compression very slightly. In some rare cases 8x8dct can hurt compression in lossless mode, but it's usually helpful, albeit marginally. Note that 8x8dct is only available with CABAC as it is never useful with CAVLC. High 4:4:4 Predictive replaced the previous profile in a 2007 revision to the H.264 standard. The only known compliant decoder for this profile is the latest version of CoreAVC. As I write this, JM does not actually correctly decode this profile. Hopefully this lack of support will soon change with this commit, as x264 will be (to my knowledge) the first compliant encoder.
-
- 15 Sep, 2008 1 commit
Fiona Glaser authored
The latter, psy-trellis, is disabled by default and is reserved as experimental; your mileage may vary. Default subme is raised to 6 so that psy RD is on by default.
-
- 27 Aug, 2008 1 commit
Fiona Glaser authored
Also clean up macroblock.c with some refactoring. Note that this change significantly reduces subme7+trellis2 performance, but improves quality. Issue originally reported by Alex_W.
-
- 21 Aug, 2008 2 commits
Loic Le Loarer authored
-
Loren Merritt authored
-
- 09 Aug, 2008 1 commit
Fiona Glaser authored
-
- 30 Jul, 2008 1 commit
Fiona Glaser authored
Set the chroma DC coefficients to zero for residual coding in qpel-rd. Fix C99ism.
-
- 16 Jul, 2008 1 commit
Anton Mitrofanov authored
-
- 06 Jul, 2008 1 commit
Fiona Glaser authored
Update AUTHORS file with Gabriel and me. Update XCHG macro to work correctly in if statements. Add new lookup tables for block_idx and fdec/fenc addresses. Slightly faster array_non_zero_count_mmx (patch by holger). Eliminate branch in analyse_intra. Unroll loops in chroma encode and clean it up. Convert some for loops to do/while loops for speed improvement. Do explicit write-combining on --me tesa mvsad_t struct. Shrink --me esa zero[] array. Speed up bime by reducing size of visited[][][] array.
-
- 04 Jul, 2008 1 commit
Fiona Glaser authored
Update "Authors" lists based on actual authorship; highest is most important. Update copyright notices and remove old CVS tags from file headers. Add file headers to GTK and other sections missing them. Update FSF address. Other header-related cosmetics.
-
- 03 Jul, 2008 1 commit
Fiona Glaser authored
-
- 02 Jul, 2008 1 commit
Fiona Glaser authored
If an i4x4 dct block has no coefficients, don't bother with dequant/zigzag/idct. Not useful for larger sizes because the odds of an empty block are much lower. Cosmetics in i16x16 to be more consistent with other similar functions. Add an SSD threshold for chroma in probe_skip to improve speed and minimize time spent on chroma skip analysis. Rename lambda arrays to lambda_tab for consistency.
-
- 24 Jun, 2008 1 commit
Fiona Glaser authored
Converting NNZ to raster order simplifies a lot of the load/store code and allows more use of write-combining. More use of write-combining throughout load/save code in common/macroblock.c. GCC has aliasing issues in the case of stores to 8-bit heap-allocated arrays; dereferencing the pointer once avoids this problem and significantly increases performance. More manual loop unrolling and such. Move all packXtoY functions to macroblock.h so any function can use them. Add pack8to32. Minor optimizations to encoder/macroblock.c.
-
- 03 Jun, 2008 1 commit
Fiona Glaser authored
-
- 17 May, 2008 3 commits
Fiona Glaser authored
-
Fiona Glaser authored
and a few other minor optimizations
-
Fiona Glaser authored
-
- 14 May, 2008 1 commit
Fiona Glaser authored
-
- 27 Apr, 2008 2 commits
Fiona Glaser authored
move some nnz counts from macroblock_encode to cavlc if cabac doesn't need them
-
Fiona Glaser authored
-
- 30 Mar, 2008 1 commit
Loren Merritt authored
-
- 22 Mar, 2008 2 commits
Loren Merritt authored
-
Loren Merritt authored
-
- 20 Mar, 2008 1 commit
Loren Merritt authored
-
- 19 Mar, 2008 1 commit
Fiona Glaser authored
large speedup with trellis=2, small speedup with trellis=0 and/or subme>=6
-
- 03 Mar, 2008 1 commit
Loren Merritt authored
patch by Alexander Strange.
-