1. 09 Feb, 2009 4 commits
    • Add decimation in i16x16 blocks · 6dc8b9ad
      Fiona Glaser authored
      Up to +0.04 dB with CAVLC, generally a lot less with CABAC.
    • Much faster CABAC residual context selection · c656d68f
      Fiona Glaser authored
      Up to ~17% faster CABAC RDO, ~36% faster intra-only CABAC RDO.
      Up to 7% faster overall in extreme cases.
    • Faster coeff_last64 on 32-bit · 5a7a1d14
      Fiona Glaser authored
    • More intra pred asm optimizations · 0743869d
      Fiona Glaser authored
      SSSE3 version of predict_8x8_hu
      SSE2 version of predict_8x8c_p
      SSSE3 versions of both planar prediction functions
      Optimizations to predict_16x16_p_sse2
      Replace some unnecessary REP_RETs with RETs.
      SSE2 version of predict_8x8_vr by Holger.
      SSE2 version of predict_8x8_hd.
      Don't compile MMX versions of some of the pred functions on x86_64.
      Remove now-useless x86_64 C versions of 4x4 pred functions.
      Rewrite some of the x86_64-only C functions in asm.
  2. 08 Feb, 2009 1 commit
  3. 04 Feb, 2009 2 commits
  4. 03 Feb, 2009 2 commits
  5. 01 Feb, 2009 1 commit
  6. 30 Jan, 2009 1 commit
    • Massive overhaul of nnz/cbp calculation · e394bd60
      Fiona Glaser authored
      Modify quantization to also calculate array_non_zero.
      PPC assembly changes by gpoirior.
      New quant asm includes some small tweaks to quant, plus SSE4 versions that use ptest for the array_non_zero check.
      Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc.
      Also add new i16x16 DC-only iDCT with asm.
      Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well.
      Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around.
      Performance increase in macroblock_encode at CRF 25: ~18% with dct-decimate, ~30% without.
      Overall performance increase 0-6% depending on encoding settings.
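The idea of having quantization also report whether any coefficient survived can be sketched in C as follows. This is a minimal illustration, not x264's actual code: the function name `quant_4x4_nz`, the 16.16 fixed-point multiplier table `mf`, and the rounding table `bias` are assumptions made for the example. The point is that the nonzero flag is accumulated inside the loop that already touches every coefficient, so the caller needs no separate array_non_zero pass.

```c
#include <stdint.h>

/* Hypothetical sketch: quantize a 4x4 block in place and return 1 if any
 * quantized coefficient is nonzero, 0 otherwise. mf[] holds assumed
 * fixed-point multipliers (scaled by 2^16), bias[] the rounding offsets. */
static int quant_4x4_nz(int16_t dct[16], const uint16_t mf[16],
                        const uint16_t bias[16])
{
    int nz = 0;
    for (int i = 0; i < 16; i++) {
        int c = dct[i];
        /* Quantize the magnitude, then restore the sign. A 64-bit product
         * avoids overflow for large multipliers. */
        if (c > 0)
            c = (int)(((uint64_t)(bias[i] + c) * mf[i]) >> 16);
        else
            c = -(int)(((uint64_t)(bias[i] - c) * mf[i]) >> 16);
        dct[i] = (int16_t)c;
        nz |= c; /* any set bit means at least one coefficient is nonzero */
    }
    return !!nz;
}
```

A caller could OR the returned flags of the blocks in a macroblock into its nnz/cbp state and skip dequant, zigzag, and decimation for blocks that returned 0.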
  7. 29 Jan, 2009 2 commits
  8. 28 Jan, 2009 3 commits
  9. 27 Jan, 2009 1 commit
    • Much faster chroma encoding and other opts · 83d805fe
      Fiona Glaser authored
      ~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only.
      Small optimization in cache_save (skip_bp).
      Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future)
      Add in automatic substitutions for some asm instructions that have an equivalent smaller representation.
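Why a special-case DC-only inverse transform is cheap can be shown with a small C sketch. When only the DC coefficient of a 4x4 block is nonzero, the H.264 inverse transform produces the same value at every position, so the whole transform reduces to one rounded shift followed by an add-and-clip per pixel. The names `clip_u8` and `add4x4_idct_dc` and the `stride` parameter are assumptions for this illustration, not a copy of the function added in the commit.

```c
#include <stdint.h>

/* Clamp an integer to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical sketch: add the residual of a DC-only 4x4 block to the
 * prediction in dst. stride is the distance in bytes between rows.
 * Note: for negative dc this relies on arithmetic right shift, which is
 * implementation-defined in C but is what common compilers provide. */
static void add4x4_idct_dc(uint8_t *dst, int stride, int dc)
{
    int d = (dc + 32) >> 6; /* same rounding as the full inverse transform */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clip_u8(dst[y * stride + x] + d);
}
```

Compared with a full 4x4 inverse transform, this skips both butterfly passes, which matters when, as the commit notes, most coded chroma blocks are DC-only.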
  10. 26 Jan, 2009 1 commit
  11. 23 Jan, 2009 2 commits
  12. 20 Jan, 2009 2 commits
  13. 19 Jan, 2009 1 commit
  14. 18 Jan, 2009 2 commits
  15. 17 Jan, 2009 1 commit
  16. 14 Jan, 2009 6 commits
  17. 08 Jan, 2009 1 commit
    • Fix regression in r1066 · d7d1d37f
      Fiona Glaser authored
      With some combinations of video width and other settings, the scratch buffer was slightly too small.
      This caused heap corruption on some systems.
      Also prevent merange from being raised during encoding with esa/tesa through encoder_reconfig, as this no longer works.
  18. 06 Jan, 2009 1 commit
  19. 05 Jan, 2009 2 commits
  20. 02 Jan, 2009 1 commit
  21. 31 Dec, 2008 3 commits