- 30 Dec, 2008 1 commit
-
-
Loren Merritt authored
Globally define t#[qdwb], so that only t# needs to be locally defined when reorganizing registers
-
- 28 Dec, 2008 1 commit
-
-
Fiona Glaser authored
Since RDO doesn't care about what order bit costs are calculated, merge sigmap and level coding into the same loop in RDO. This is bit-exact for 4x4dct but slightly incorrect for 8x8dct due to the sigmap containing duplicated contexts. However, the PSNR penalty of this is extremely small (~0.001 dB). Speed benefit is about 15% in 4x4dct and 30% in 8x8dct residual bit cost calculation at QP20. Overall encoding speed benefit is up to 5%, depending on encoding settings. Also remove an old unnecessary CABAC table that hasn't been used for years.
-
- 26 Dec, 2008 1 commit
-
-
Fiona Glaser authored
Slightly reorganize VLC tables for ~2% faster block_residual_write_cavlc. Also a small optimization in p8x8 CAVLC.
-
- 25 Dec, 2008 1 commit
-
-
Loren Merritt authored
Also suppress the last mingw warning message
-
- 24 Dec, 2008 1 commit
-
-
Fiona Glaser authored
Remove SAD argument from var, not needed anymore.
Speed up var asm a bit by eliminating psadbw and instead HADDWing at end.
Eliminate all remaining warnings on gcc 3.4 on cygwin.
Port another minor optimization from lavc (pskip).
-
- 23 Dec, 2008 1 commit
-
-
Fiona Glaser authored
Merge the two list tables to allow cleaner MC/CABAC/CAVLC code.
Remove lots of unnecessary {s.
Port some very minor opts from lavc.
-
- 22 Dec, 2008 1 commit
-
-
Loren Merritt authored
reduce memory if using ESA and not p4x4
-
- 16 Dec, 2008 1 commit
-
-
Fiona Glaser authored
Patch partially by Loren Merritt
-
- 15 Dec, 2008 2 commits
-
-
Fiona Glaser authored
Explicit loop unrolling
-
Fiona Glaser authored
Add some early terminations and minor optimizations. This change may also fix the extremely rare direct+threading MV bug.
-
- 14 Dec, 2008 1 commit
-
-
David Wolstencroft authored
The previous Altivec implementation of mc_chroma assumed that i_src_stride was always mod 16.
-
- 13 Dec, 2008 1 commit
-
-
Guillaume Poirier authored
So far, only the Apple GCC version was supported.
-
- 12 Dec, 2008 1 commit
-
-
Fiona Glaser authored
Slightly better quality, especially in non-RD mode, with CAVLC.
-
- 11 Dec, 2008 4 commits
-
-
Loren Merritt authored
Significant speed boost, especially on CPUs with atrociously slow floating point units (e.g. Pentium 4 saves 800 clocks per MB with this change). Add x264_clz function as part of the LUT system: this may be useful later. Note this changes output somewhat as the numbers from the lookup table are not exact.
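A count-leading-zeros helper of the kind mentioned above can be sketched in portable C as follows. This is only an illustration: the names `clz32_sketch` and `ilog2_sketch` are hypothetical and are not the x264 functions, and the real code may well use a compiler builtin or assembly instead. The second function shows one common use in a lookup-table setup, taking the integer part of log2 without touching the FPU.

```c
#include <stdint.h>

/* Hypothetical portable count-leading-zeros (not x264's x264_clz).
 * The input must be nonzero. Uses a binary search over the bit width. */
static int clz32_sketch(uint32_t x)
{
    int n = 0;
    if (!(x & 0xFFFF0000u)) { n += 16; x <<= 16; }
    if (!(x & 0xFF000000u)) { n += 8;  x <<= 8;  }
    if (!(x & 0xF0000000u)) { n += 4;  x <<= 4;  }
    if (!(x & 0xC0000000u)) { n += 2;  x <<= 2;  }
    if (!(x & 0x80000000u)) { n += 1; }
    return n;
}

/* floor(log2(x)) for x > 0, computed with integer operations only.
 * A table could then supply the fractional part (assumption, not shown). */
static int ilog2_sketch(uint32_t x)
{
    return 31 - clz32_sketch(x);
}
```

As the commit notes, table-based results are approximations, so output may differ slightly from a floating-point computation.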
-
Fiona Glaser authored
-
Fiona Glaser authored
-
Fiona Glaser authored
Use a VLC table for common levelcodes instead of constructing them on-the-spot.
Branchless version of i_trailing calculation (2x faster on Nehalem).
Completely remove array_non_zero_count and instead use the count calculated in level/run coding.
Note: this slightly changes output with subme > 7 due to different nonzero counts being stored during qpel RD.
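For illustration, a trailing-ones count without explicit branches could look roughly like the sketch below. It is not the code from this commit: the function name and the calling convention (three levels in coding order, last nonzero coefficient first, zero-padded when fewer than three exist) are assumptions made for the example. CAVLC caps the trailing-ones count at 3, which is why only three levels are inspected.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: count trailing +/-1 levels (at most 3) with no
 * if/else in the source. level[] holds the first three levels in coding
 * order and is assumed to be zero-padded by the caller. */
static int trailing_ones_sketch(const int16_t level[3])
{
    int t0 = abs(level[0]) == 1;        /* 1 if the first level is +/-1 */
    int t1 = t0 & (abs(level[1]) == 1); /* only counts if the run is unbroken */
    int t2 = t1 & (abs(level[2]) == 1);
    return t0 + t1 + t2;
}
```

Whether the compiled result is truly branch-free depends on the compiler and target; the sketch only removes branches at the source level.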
-
- 05 Dec, 2008 1 commit
-
-
Guillaume Poirier authored
-
- 30 Nov, 2008 1 commit
-
-
Fiona Glaser authored
Correct level detection to take this into account.
-
- 29 Nov, 2008 3 commits
-
-
Anton Mitrofanov authored
-
Fiona Glaser authored
-
Loren Merritt authored
-
- 28 Nov, 2008 2 commits
-
-
Fiona Glaser authored
-
Fiona Glaser authored
Early-terminate in residual writing using stored nnz counts.
To allow the above, store nnz counts for luma and chroma DC.
Add assembly functions to find the last nonzero coefficient in a block.
Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC.
Note this changes output slightly with CABAC RDO because it requires always storing correct nnz values during RDO, which wasn't done before in cases it wasn't useful. CAVLC output should be equivalent.
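A plain C reference for a "last nonzero coefficient" search might look like the sketch below. The name `coeff_last_sketch`, the signature, and the -1 return for an all-zero block are assumptions for this example, not the interface added by the commit; the assembly versions presumably compute the same index much faster.

```c
#include <stdint.h>

/* Hypothetical C reference: index of the last nonzero coefficient in
 * coef[0..count-1], or -1 if every coefficient is zero. */
static int coeff_last_sketch(const int16_t *coef, int count)
{
    int i = count - 1;
    while (i >= 0 && coef[i] == 0)
        i--;
    return i;
}
```

With a stored nonzero count available, a caller can skip this search (and the rest of residual writing) entirely when the count is zero, which is the early termination described above.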
-
- 27 Nov, 2008 2 commits
-
-
Fiona Glaser authored
About 3.5x faster DC dequant on Conroe
-
Loren Merritt authored
(unlikely to have occurred in any real video)
-
- 26 Nov, 2008 1 commit
-
-
Fiona Glaser authored
Nasm won't correctly parse the SSE4 code introduced a few revisions ago, so we're removing support. Users should upgrade to yasm 0.6.1 or later.
-
- 25 Nov, 2008 7 commits
-
-
Anton Mitrofanov authored
-
Anton Mitrofanov authored
Remove Release64 which never worked anyways.
-
Fiona Glaser authored
Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
Use pinsrd (SSE4) for faster width4 SSD.
Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe).
Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
These optimizations help on Nehalem, Phenom, and Penryn CPUs.
-
Guillaume Poirier authored
-
David Wolstencroft authored
useless loads/stores and calculations of permutation vectors. Affected functions are all of mc_luma, mc_chroma, 'get_ref', SATD, SA8D and deblock. Overall gains vary from ~5% to 15% depending on settings, running on a 1.42 GHz G4.
-
Loren Merritt authored
refactor sa8d. slightly faster. more checkasm for hadamard.
-
Fiona Glaser authored
Misalign mask needed to be set separately for each encoding thread.
-
- 23 Nov, 2008 1 commit
-
-
Fiona Glaser authored
Faster hpel_filter by using unaligned loads instead of emulated PALIGNR.
Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
Merge cpu-32.asm and cpu-64.asm.
Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
-
- 21 Nov, 2008 1 commit
-
-
Fiona Glaser authored
A little bit faster on both 32-bit and 64-bit
-
- 13 Nov, 2008 1 commit
-
-
Fiona Glaser authored
Helps a bit on Phenom as well. ~25% faster width8 multiSAD on Nehalem.
-
- 11 Nov, 2008 1 commit
-
-
Fiona Glaser authored
Only for experimental purposes and ultra-fast encoding. Probably not a good idea for firstpass.
-
- 10 Nov, 2008 2 commits
-
-
Fiona Glaser authored
-
Fiona Glaser authored
Remove idct/dct2x2 from checkasm as they are no longer in dctf
-