1. 26 Dec, 2008 1 commit
    • Fiona Glaser's avatar
      VLC table optimizations · 131d066e
      Fiona Glaser authored
      Slightly reorganize VLC tables for ~2% faster block_residual_write_cavlc.
      Also a small optimization in p8x8 CAVLC.
  2. 25 Dec, 2008 1 commit
  3. 24 Dec, 2008 1 commit
    • Fiona Glaser's avatar
      Optimize variance asm + minor changes · 9fe6e5e6
      Fiona Glaser authored
      Remove SAD argument from var, not needed anymore.
      Speed up var asm a bit by eliminating psadbw and instead HADDWing at end.
      Eliminate all remaining warnings on gcc 3.4 on cygwin
      Port another minor optimization from lavc (pskip)
  4. 23 Dec, 2008 1 commit
  5. 22 Dec, 2008 1 commit
  6. 16 Dec, 2008 1 commit
  7. 15 Dec, 2008 2 commits
  8. 14 Dec, 2008 1 commit
  9. 13 Dec, 2008 1 commit
  10. 11 Dec, 2008 2 commits
    • Loren Merritt's avatar
      use lookup tables instead of actual exp/pow for AQ · 6abf5d67
      Loren Merritt authored
      Significant speed boost, especially on CPUs with atrociously slow floating point units (e.g. Pentium 4 saves 800 clocks per MB with this change).
      Add x264_clz function as part of the LUT system: this may be useful later.
      Note this changes output somewhat as the numbers from the lookup table are not exact.
    • Fiona Glaser's avatar
      Much faster CAVLC residual coding · 99448f6c
      Fiona Glaser authored
      Use a VLC table for common levelcodes instead of constructing them on-the-spot
      Branchless version of i_trailing calculation (2x faster on Nehalem)
      Completely remove array_non_zero_count and instead use the count calculated in level/run coding.  Note: this slightly changes output with subme > 7 due to different nonzero counts being stored during qpel RD.
  11. 05 Dec, 2008 1 commit
  12. 29 Nov, 2008 1 commit
  13. 28 Nov, 2008 1 commit
    • Fiona Glaser's avatar
      Significantly faster CABAC and CAVLC residual coding and bit cost calculation · c1d73389
      Fiona Glaser authored
      Early-terminate in residual writing using stored nnz counts
      To allow the above, store nnz counts for luma and chroma DC
      Add assembly functions to find the last nonzero coefficient in a block
      Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC
      Note this changes output slightly with CABAC RDO because it requires always storing correct nnz values during RDO, which wasn't done before in cases it wasn't useful.
      CAVLC output should be equivalent.
  14. 27 Nov, 2008 2 commits
  15. 26 Nov, 2008 1 commit
    • Fiona Glaser's avatar
      Remove nasm support · c5c0a7fd
      Fiona Glaser authored
      Nasm won't correctly parse the SSE4 code introduced a few revisions ago, so we're removing support.
      Users should upgrade to yasm 0.6.1 or later.
  16. 25 Nov, 2008 4 commits
  17. 23 Nov, 2008 1 commit
    • Fiona Glaser's avatar
      Phenom CPU optimizations · 80ea99c0
      Fiona Glaser authored
      Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
      Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
      Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
      Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
      Merge cpu-32.asm and cpu-64.asm
      Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
  18. 21 Nov, 2008 1 commit
  19. 13 Nov, 2008 1 commit
  20. 11 Nov, 2008 1 commit
  21. 10 Nov, 2008 3 commits
    • Fiona Glaser's avatar
      Fix minor memory leak in r1022 · ebe1103b
      Fiona Glaser authored
    • Fiona Glaser's avatar
      Faster chroma encoding · be121180
      Fiona Glaser authored
      9-12% faster chroma encode.
      Move all functions for handling chroma DC that don't have assembly versions to macroblock.c and inline them, along with a few other tweaks.
    • Fiona Glaser's avatar
      Various cosmetics and minor fixes · ae51235d
      Fiona Glaser authored
      Disable hadamard_ac sse2/ssse3 under stack_mod4
      Fix one MSVC compilation warning
      Fix compilation in debug mode in certain cases on x64
      Remove eval.c from MSVC project
      Fix crash when VBV is used in CQP mode
      Patches by MasterNobody
  22. 09 Nov, 2008 1 commit
    • Fiona Glaser's avatar
      Faster b-adapt + adaptive quantization · 0c841de6
      Fiona Glaser authored
      Factor out pow to be only called once per macroblock.  Speeds up b-adapt, especially b-adapt 2, considerably.
      Speed boost is as high as 24% with b-adapt 2 + b-frames 16.
  23. 05 Nov, 2008 1 commit
    • Fiona Glaser's avatar
      Initial Nehalem CPU optimizations · 1bf7228f
      Fiona Glaser authored
      movaps/movups are no longer equivalent to their integer equivalents on the Nehalem, so that substitution is removed.
      Nehalem has a much lower cacheline split penalty than previous Intel CPUs, so cacheline workarounds are no longer necessary.
      Thanks to Intel for providing Avail Media with the pre-release Nehalem CPU needed to prepare these (and other not-yet-committed) optimizations.
      Overall speed improvement with Nehalem vs Penryn at the same clock speed is around 40%.
  24. 30 Oct, 2008 1 commit
  25. 27 Oct, 2008 1 commit
    • Fiona Glaser's avatar
      Optimize CABAC bit cost calculation · e09f55cc
      Fiona Glaser authored
      Speed up cabac mvd and add new precalculated transition/entropy table.
      Add "noup" function for cabac operations to not update the state table when it isn't necessary.
      1-3% faster macroblock_size_cabac.
  26. 23 Oct, 2008 1 commit
  27. 22 Oct, 2008 2 commits
  28. 02 Oct, 2008 2 commits
  29. 29 Sep, 2008 1 commit
  30. 28 Sep, 2008 1 commit
    • Fiona Glaser's avatar
      Replace High 4:4:4 profile lossless with High 4:4:4 Predictive. · a9e86d24
      Fiona Glaser authored
      This improves lossless compression by about 4-25% depending on source.
      The benefit is generally higher for intra-only compression.
      Also add support for 8x8dct and i8x8 blocks in lossless mode; this improves compression very slightly.
      In some rare cases 8x8dct can hurt compression in lossless mode, but its usually helpful, albeit marginally.
      Note that 8x8dct is only available with CABAC as it is never useful with CAVLC.
      High 4:4:4 Predictive replaced the previous profile in a 2007 revision to the H.264 standard.
      The only known compliant decoder for this profile is the latest version of CoreAVC.
      As I write this, JM does not actually correctly decode this profile.
      Hopefully this lack of support will soon change with this commit, as x264 will be (to my knowledge) the first compliant encoder.