1. 31 Dec, 2008 1 commit
  2. 30 Dec, 2008 2 commits
  3. 28 Dec, 2008 1 commit
    • Much faster CABAC RDO · 406a40dc
      Fiona Glaser authored
      Since RDO doesn't care about what order bit costs are calculated, merge sigmap and level coding into the same loop in RDO.
      This is bit-exact for 4x4dct but slightly incorrect for 8x8dct due to the sigmap containing duplicated contexts.
      However, the resulting PSNR penalty is extremely small (~0.001 dB).
      Speed benefit is about 15% in 4x4dct and 30% in 8x8dct residual bit cost calculation at QP20.
      Overall encoding speed benefit is up to 5%, depending on encoding settings.
      Also remove an old unnecessary CABAC table that hasn't been used for years.
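      The loop merge described above can be illustrated with a toy cost model. This is only a sketch: `sig_cost`, `level_cost`, `cost_two_pass`, and `cost_merged` are hypothetical names, and the fixed per-symbol costs are an assumption made for illustration. Real CABAC costs depend on adaptive context state, which this model ignores, so it shows only the order-independent case and not the 8x8dct discrepancy.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-cost model; real CABAC bit costs depend on
 * adaptive context state, which is deliberately left out here. */
static int sig_cost(int pos, int is_sig)
{
    return is_sig ? 3 + (pos & 1) : 1;
}

static int level_cost(int level)
{
    int a = abs(level);
    return 2 + (a > 1 ? 2 * a : 0);
}

/* Two passes: significance map first, then coefficient levels. */
static int cost_two_pass(const int16_t *coef, int last)
{
    int bits = 0;
    for (int i = 0; i <= last; i++)
        bits += sig_cost(i, coef[i] != 0);
    for (int i = last; i >= 0; i--)
        if (coef[i])
            bits += level_cost(coef[i]);
    return bits;
}

/* One pass: both costs are accumulated in the same reverse scan. */
static int cost_merged(const int16_t *coef, int last)
{
    int bits = 0;
    for (int i = last; i >= 0; i--) {
        bits += sig_cost(i, coef[i] != 0);
        if (coef[i])
            bits += level_cost(coef[i]);
    }
    return bits;
}
```

      With costs that do not depend on the order of evaluation, both functions return the same total, which is the property the merge relies on.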
  4. 26 Dec, 2008 1 commit
    • VLC table optimizations · 131d066e
      Fiona Glaser authored
      Slightly reorganize VLC tables for ~2% faster block_residual_write_cavlc.
      Also a small optimization in p8x8 CAVLC.
  5. 25 Dec, 2008 1 commit
  6. 24 Dec, 2008 1 commit
    • Optimize variance asm + minor changes · 9fe6e5e6
      Fiona Glaser authored
      Remove SAD argument from var, not needed anymore.
      Speed up var asm a bit by eliminating psadbw and instead HADDWing at end.
      Eliminate all remaining warnings with gcc 3.4 on Cygwin.
      Port another minor optimization from lavc (pskip).
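      For reference, the quantity a block variance function computes can be sketched in scalar C without any SAD term. This is a sketch under assumptions, not x264's code: the name `block_var_16x16` is hypothetical, and it assumes the metric is the sum of squared deviations from the block mean over a 16x16 block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for a 16x16 block; the commit itself
 * concerns the assembly version, which is not reproduced here. */
static uint32_t block_var_16x16(const uint8_t *pix, ptrdiff_t stride)
{
    uint32_t sum = 0; /* sum of pixel values, at most 255 * 256 */
    uint32_t sqr = 0; /* sum of squared pixel values */

    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            sum += pix[x];
            sqr += pix[x] * pix[x];
        }
        pix += stride;
    }
    /* 256 pixels: sum of squared deviations = sqr - sum^2 / 256.
     * sum * sum is at most 65280^2, which still fits in 32 bits. */
    return sqr - (sum * sum >> 8);
}
```

      Only the two running sums are needed, so a vectorized version can accumulate them in parallel lanes and reduce them horizontally once at the end.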
  7. 23 Dec, 2008 1 commit
  8. 22 Dec, 2008 1 commit
  9. 16 Dec, 2008 1 commit
  10. 15 Dec, 2008 2 commits
  11. 14 Dec, 2008 1 commit
  12. 13 Dec, 2008 1 commit
  13. 12 Dec, 2008 1 commit
  14. 11 Dec, 2008 4 commits
  15. 05 Dec, 2008 1 commit
  16. 30 Nov, 2008 1 commit
  17. 29 Nov, 2008 3 commits
  18. 28 Nov, 2008 2 commits
    • 10L in r1041 · df72b08c
      Fiona Glaser authored
    • Significantly faster CABAC and CAVLC residual coding and bit cost calculation · c1d73389
      Fiona Glaser authored
      Early-terminate in residual writing using stored nnz counts.
      To allow the above, store nnz counts for luma and chroma DC.
      Add assembly functions to find the last nonzero coefficient in a block.
      Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC.
      Note: this changes output slightly with CABAC RDO, because correct nnz values must now always be stored during RDO; previously they were not stored in cases where they weren't useful.
      CAVLC output should be equivalent.
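      The two ideas above (finding the last nonzero coefficient, and stopping early once the stored nonzero count is reached) can be sketched in plain C. The names `coeff_last` and `collect_levels` are hypothetical and chosen for illustration; the commit implements the search in assembly, which is not shown here.

```c
#include <stdint.h>

/* Hypothetical scalar helper: index of the last nonzero coefficient
 * in scan order, or -1 if the block is all zero. */
static int coeff_last(const int16_t *coef, int count)
{
    int i = count - 1;
    while (i >= 0 && coef[i] == 0)
        i--;
    return i;
}

/* Hypothetical residual walk: gather nonzero levels in reverse scan
 * order, using a stored nonzero count (nnz) to skip empty blocks and
 * to stop as soon as every nonzero coefficient has been seen. */
static int collect_levels(const int16_t *coef, int count, int nnz,
                          int16_t *levels)
{
    int found = 0;
    if (nnz == 0)
        return 0; /* stored count says the block is empty */
    for (int i = coeff_last(coef, count); i >= 0 && found < nnz; i--)
        if (coef[i] != 0)
            levels[found++] = coef[i];
    return found;
}
```

      The early exit is only valid if the stored count is always correct, which matches the commit's note that nnz values must be stored during RDO as well.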
  19. 27 Nov, 2008 2 commits
  20. 26 Nov, 2008 1 commit
    • Remove nasm support · c5c0a7fd
      Fiona Glaser authored
      Nasm won't correctly parse the SSE4 code introduced a few revisions ago, so we're removing support.
      Users should upgrade to yasm 0.6.1 or later.
  21. 25 Nov, 2008 7 commits
  22. 23 Nov, 2008 1 commit
    • Phenom CPU optimizations · 80ea99c0
      Fiona Glaser authored
      Faster hpel_filter by using unaligned loads instead of emulated PALIGNR.
      Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
      Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
      Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
      Merge cpu-32.asm and cpu-64.asm.
      Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
  23. 21 Nov, 2008 1 commit
  24. 13 Nov, 2008 1 commit
  25. 11 Nov, 2008 1 commit