1. 09 Aug, 2009 1 commit
  2. 08 Aug, 2009 1 commit
  3. 17 Jul, 2009 1 commit
  4. 10 Jul, 2009 1 commit
  5. 03 Jul, 2009 1 commit
    • Early termination for chroma encoding · 205a032c
      Fiona Glaser authored
      Faster chroma encoding by terminating early if heuristics indicate that the block will be DC-only.
      This works because the vast majority of inter chroma blocks have no coefficients at all, and those that do are almost always DC-only.
      Add two new helper DSP functions for this: dct_dc_8x8 and var2_8x8.  mmx/sse2/ssse3 versions of each.
      Early termination is disabled at very low QPs, where it provides no benefit.
      Performance increase is ~1-2% without trellis, up to 5-6% with trellis=2.
      Increase is greater with lower bitrates.
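The early-termination idea in commit 205a032c can be illustrated with a minimal sketch. It assumes the heuristic compares the AC energy of the 8x8 chroma residual against a threshold and is skipped below some minimum QP. The function names, the `min_qp` cutoff, and the `ac_threshold` parameter are hypothetical and are not taken from the x264 source; the real `dct_dc_8x8` and `var2_8x8` helpers may have different signatures and semantics.

```c
#include <stdint.h>

/* Hypothetical helper: AC energy of the 8x8 residual (src - pred),
 * computed as the sum of squared differences minus the DC contribution. */
static int residual_ac_energy_8x8(const uint8_t *src, int src_stride,
                                  const uint8_t *pred, int pred_stride)
{
    int sum = 0, ssd = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int d = src[y * src_stride + x] - pred[y * pred_stride + x];
            sum += d;
            ssd += d * d;
        }
    }
    /* 64 samples, so dividing sum^2 by 64 is a right shift by 6. */
    return ssd - ((sum * sum) >> 6);
}

/* Hypothetical decision function: returns 1 when the block can be treated
 * as DC-only. min_qp and ac_threshold are assumed tuning parameters. */
static int chroma_probably_dc_only(const uint8_t *src, int src_stride,
                                   const uint8_t *pred, int pred_stride,
                                   int qp, int min_qp, int ac_threshold)
{
    if (qp < min_qp)
        return 0; /* heuristic disabled at very low QPs */
    return residual_ac_energy_8x8(src, src_stride, pred, pred_stride)
           < ac_threshold;
}
```

When the function returns 1, an encoder following this approach would compute only the DC terms and skip the full transform and quantization of the AC coefficients.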
  6. 26 Jun, 2009 1 commit
    • Fix bug in checkasm · 8a96d510
      David Conrad authored
      The frame_init_lowres_core test didn't verify the C plane.
      All x86 and PPC assembly was nevertheless correct, despite the flaw in the unit test.
  7. 19 Jun, 2009 1 commit
  8. 31 Mar, 2009 1 commit
  9. 30 Mar, 2009 2 commits
  10. 17 Mar, 2009 1 commit
    • SSE2 zigzag_interleave · d25d50c9
      Fiona Glaser authored
      Rename the PHADD CPU flag to FastShuffle (more accurate naming).
      This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.
  11. 07 Mar, 2009 1 commit
    • Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT · 54e38917
      Holger Lubitz authored
      Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
      16x16 SATD: +18% speed on K8(64bit), +22% on K10(32bit), +42% on Penryn(64bit), +44% on Nehalem(64bit), +50% on P4(32bit), +98% on Conroe(64bit)
      Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.
      Overall performance boost is up to ~15% on 64-bit Conroe.
  12. 11 Feb, 2009 1 commit
  13. 04 Feb, 2009 1 commit
  14. 03 Feb, 2009 1 commit
    • Faster 8x8dct+CAVLC interleave · ded3e28c
      Fiona Glaser authored
      Integrate array_non_zero with the CAVLC 8x8dct interleave function.
      Roughly 1.5-2x faster than the original separate array_non_zero method.
  15. 30 Jan, 2009 1 commit
    • Massive overhaul of nnz/cbp calculation · e394bd60
      Fiona Glaser authored
      Modify quantization to also calculate array_non_zero.
      PPC assembly changes by gpoirior.
      New quant asm includes some small tweaks to quant and SSE4 versions using ptest for the array_non_zero.
      Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc.
      Also add new i16x16 DC-only iDCT with asm.
      Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well.
      Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around.
      Performance increase in macroblock_encode: ~18% with dct-decimate, 30% without at CRF 25.
      Overall performance increase 0-6% depending on encoding settings.
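The idea of having quantization report whether any coefficient survived, as described in commit e394bd60, might look roughly like the sketch below. It assumes a dead-zone quantizer of the form `(|coef| + bias) * mf >> 16`. The function name `quant_4x4_nz` and its exact signature are hypothetical and not copied from x264.

```c
#include <stdint.h>

/* Hypothetical sketch: quantize a 4x4 block in place and return 1 if any
 * quantized coefficient is non-zero, 0 otherwise. mf[] holds per-position
 * multipliers and bias[] per-position rounding offsets (both assumed).
 * Inputs are assumed to be in the range produced by a real transform, so
 * the quantized values fit in int16_t. */
static int quant_4x4_nz(int16_t coef[16], const uint16_t mf[16],
                        const uint16_t bias[16])
{
    int nz = 0;
    for (int i = 0; i < 16; i++) {
        int c = coef[i];
        uint64_t mag = (uint64_t)(c < 0 ? -c : c);
        int q = (int)(((mag + bias[i]) * mf[i]) >> 16);
        coef[i] = (int16_t)(c < 0 ? -q : q);
        nz |= coef[i]; /* accumulate the non-zero flag during quantization */
    }
    return nz != 0;
}
```

Because the flag is produced in the same pass as quantization, a caller can skip dequantization, zigzag, and decimation for blocks that quantize to all zeros, without a separate scan of the array.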
  16. 29 Jan, 2009 1 commit
  17. 27 Jan, 2009 1 commit
    • Much faster chroma encoding and other opts · 83d805fe
      Fiona Glaser authored
      ~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only.
      Small optimization in cache_save (skip_bp)
      Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future)
      Add in automatic substitutions for some asm instructions that have an equivalent smaller representation.
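A special-case DC-only inverse transform, as mentioned in commit 83d805fe, can be sketched as follows. In the H.264 4x4 inverse transform, when only the (dequantized) DC coefficient is non-zero, every residual sample equals `(dc + 32) >> 6`, so the full butterfly can be replaced by a single add per pixel. The function name `add4x4_idct_dc` and its signature are hypothetical here and are not taken from the x264 source.

```c
#include <stdint.h>

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical sketch: add a DC-only 4x4 residual to the prediction in dst.
 * dc is the dequantized DC coefficient. The right shift of a negative value
 * is assumed to be arithmetic, as it is on mainstream compilers. */
static void add4x4_idct_dc(uint8_t *dst, int stride, int dc)
{
    int add = (dc + 32) >> 6; /* same rounding as the full inverse transform */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clip_uint8(dst[y * stride + x] + add);
}
```

Since the added value is constant across the block, this form also maps naturally onto SIMD saturating adds, which is presumably why a dedicated asm version pays off.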
  18. 31 Dec, 2008 2 commits
  19. 30 Dec, 2008 1 commit
  20. 24 Dec, 2008 1 commit
    • Optimize variance asm + minor changes · 9fe6e5e6
      Fiona Glaser authored
      Remove SAD argument from var, not needed anymore.
      Speed up var asm a bit by eliminating psadbw and instead HADDWing at end.
      Eliminate all remaining warnings on gcc 3.4 on cygwin
      Port another minor optimization from lavc (pskip)
  21. 22 Dec, 2008 1 commit
  22. 28 Nov, 2008 1 commit
    • Significantly faster CABAC and CAVLC residual coding and bit cost calculation · c1d73389
      Fiona Glaser authored
      Early-terminate in residual writing using stored nnz counts
      To allow the above, store nnz counts for luma and chroma DC
      Add assembly functions to find the last nonzero coefficient in a block
      Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC
      Note this changes output slightly with CABAC RDO because it requires always storing correct nnz values during RDO, which wasn't done before in cases it wasn't useful.
      CAVLC output should be equivalent.
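The two building blocks named in commit c1d73389, finding the last non-zero coefficient and terminating early based on a stored non-zero count, can be illustrated with a plain C sketch. The names `coeff_last` and `count_nonzero_upto` are hypothetical stand-ins; the commit's actual functions are assembly and may differ in interface.

```c
#include <stdint.h>

/* Hypothetical sketch: index of the last non-zero coefficient in scan
 * order, or -1 if the block is entirely zero. */
static int coeff_last(const int16_t *coef, int count)
{
    int i = count - 1;
    while (i >= 0 && coef[i] == 0)
        i--;
    return i;
}

/* Hypothetical sketch: count non-zero coefficients, visiting only the
 * positions up to and including the last non-zero one. */
static int count_nonzero_upto(const int16_t *coef, int last)
{
    int nnz = 0;
    for (int i = 0; i <= last; i++)
        nnz += coef[i] != 0;
    return nnz;
}
```

A residual writer that already has the non-zero count stored can return immediately when the count is zero, and otherwise start coding from the position returned by `coeff_last` instead of scanning the trailing zeros.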
  23. 27 Nov, 2008 2 commits
  24. 25 Nov, 2008 2 commits
    • Faster width4 SSD+SATD, SSE4 optimizations · 69e69197
      Fiona Glaser authored
      Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
      Use pinsrd (SSE4) for faster width4 SSD
      Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe)
      Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
      These optimizations help on Nehalem, Phenom, and Penryn CPUs.
    • refactor satd. 20KB smaller binary. · e56a842d
      Loren Merritt authored
      refactor sa8d. slightly faster.
      more checkasm for hadamard.
  25. 23 Nov, 2008 1 commit
    • Phenom CPU optimizations · 80ea99c0
      Fiona Glaser authored
      Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
      Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
      Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
      Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
      Merge cpu-32.asm and cpu-64.asm
      Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
  26. 21 Nov, 2008 1 commit
  27. 10 Nov, 2008 1 commit
  28. 23 Oct, 2008 1 commit
  29. 22 Oct, 2008 1 commit
  30. 03 Oct, 2008 1 commit
    • rm gtk, avc2avi. · e21bc344
      Loren Merritt authored
      I don't remember why I allowed a gui into the repository in the first place. There's nothing that makes this one special relative to all the other x264 guis.
      avc2avi doesn't compile since we removed the bitstream reader. And avc doesn't belong in avi.
  31. 28 Sep, 2008 1 commit
  32. 20 Sep, 2008 1 commit
  33. 19 Sep, 2008 1 commit
  34. 15 Sep, 2008 1 commit
  35. 05 Sep, 2008 2 commits