  1. 04 Feb, 2012 3 commits
  2. 15 Jan, 2012 1 commit
  3. 22 Oct, 2011 1 commit
  4. 21 Sep, 2011 2 commits
  5. 10 Aug, 2011 2 commits
  6. 09 Aug, 2011 3 commits
      asm cosmetics part 2 · 1921c682
      Loren Merritt authored
      These changes were split out of the cpuflags commit because they change the output executable.
      asm cosmetics: INIT_MMX/XMM/YMM now support a cpuflags argument · f85be1cd
      Loren Merritt authored
      Reduces the number of macro args that need to be passed around.
      Allows multiple implementations of a given macro (e.g. PALIGNR) to check
      cpuflags at the location where the macro is defined, instead of having
      to select implementations by %define at toplevel.
      Remove INIT_AVX, as it's replaced by "INIT_XMM avx".
      
      This commit does not change the stripped executable.
      Cosmetics: s/mmxext/mmx2/ · 189c30d3
      Loren Merritt authored
  7. 18 Feb, 2011 1 commit
  8. 25 Jan, 2011 2 commits
  9. 10 Jan, 2011 1 commit
  10. 14 Dec, 2010 1 commit
  11. 19 Nov, 2010 3 commits
  12. 10 Oct, 2010 1 commit
  13. 18 Sep, 2010 1 commit
      Update source file headers · 213a99d0
      Fiona Glaser authored
      Update dates, improve file descriptions, make things more consistent.
      Also add information about commercial licensing.
  14. 15 Jul, 2010 1 commit
      Convert x264 to use NV12 pixel format internally · 387828ed
      Loren Merritt authored
      ~1% faster overall on Conroe, mostly due to improved cache locality.
      Also allows improved SIMD on some chroma functions (e.g. deblock).
      This change also extends the API to allow direct NV12 input, which should be a bit faster than YV12.
      This isn't currently used in the x264cli, as swscale does not have fast NV12 conversion routines, but it might be useful for other applications.
      
      Note this patch disables the chroma SIMD code for PPC and ARM until new versions are written.
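The layout difference behind the NV12 change can be sketched as follows. This is a minimal illustration, not x264's actual code: the function names `interleave_chroma` and `planar_to_nv12` are hypothetical, and the sketch assumes 8-bit planar input whose U and V planes have the same size. NV12 keeps the luma plane as it is and follows it with a single chroma plane in which U and V samples alternate.

```python
def interleave_chroma(u_plane: bytes, v_plane: bytes) -> bytes:
    """Merge planar U and V samples into one NV12-style UVUV... plane.

    Hypothetical helper for illustration; both planes must be the same size.
    """
    if len(u_plane) != len(v_plane):
        raise ValueError("U and V planes must be the same size")
    out = bytearray(2 * len(u_plane))
    out[0::2] = u_plane  # even byte positions carry U
    out[1::2] = v_plane  # odd byte positions carry V
    return bytes(out)


def planar_to_nv12(y_plane: bytes, u_plane: bytes, v_plane: bytes) -> bytes:
    """Build an NV12 frame: the luma plane, then the interleaved chroma plane."""
    return bytes(y_plane) + interleave_chroma(u_plane, v_plane)
```

YV12 stores the V plane before the U plane, so a caller starting from YV12 data would pass the second chroma plane as `u_plane` and the first as `v_plane`.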
  15. 06 May, 2010 1 commit
      Deduplicate asm constants, automate name prefixing · 311c4bb1
      Fiona Glaser authored
      Auto-prefix global constants with x264_ in cextern.
      Eliminate x264_ prefix from asm files; automate it in cglobal.
      Deduplicate asm constants wherever possible to save data cache (move them to a new const-a.asm).
      Remove x264_emms() entirely on non-x86 (don't even call an empty function).
      Add cextern_naked for a non-prefixed cextern (used in checkasm).
  16. 27 Mar, 2010 1 commit
  17. 15 Feb, 2010 1 commit
  18. 09 Nov, 2009 2 commits
      cosmetics · df732ec7
      Loren Merritt authored
      Weighted P-frame prediction · ccac8546
      Dylan Yudaken authored
      Merge Dylan's Google Summer of Code 2009 tree.
      Detect fades and use weighted prediction to improve compression and quality.
      "Blind" mode provides a small overall quality increase by using a -1 offset without doing any analysis, as described in JVT-AB033.
      "Smart", the default mode, also performs fade detection and decides weights accordingly.
      MB-tree takes into account the effects of "smart" analysis in lookahead, even further improving quality in fades.
      If psy is on, mbtree is on, interlaced is off, and weightp is off, fade detection will still be performed.
      However, it will be used to adjust quality instead of creating actual weights.
      This will improve quality in fades when encoding in Baseline profile.
      
      Doesn't add support for interlaced encoding with weightp yet.
      Only adds support for luma weights, not chroma weights.
      Internal code for chroma weights is in, but there's no analysis yet.
      Baseline profile requires that weightp be off.
      All weightp modes may cause minor breakage in non-compliant decoders that take shortcuts in deblocking reference frame checks.
      "Smart" may cause serious breakage in non-compliant decoders that take shortcuts in handling of duplicate reference frames.
      
      Thanks to Google for sponsoring our most successful Summer of Code yet!
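As a rough illustration of what a luma weight and offset do to a reference sample, the sketch below applies the single-reference explicit weighted prediction formula from the H.264 standard to one 8-bit sample. It is not taken from x264: the function name `weighted_pred` is hypothetical, and the specific weight and denominator shown for the "blind" case are assumptions chosen so that only the -1 offset has an effect.

```python
def weighted_pred(sample: int, weight: int, log2_denom: int, offset: int) -> int:
    """Apply an explicit weight and offset to one 8-bit reference sample.

    Hypothetical helper for illustration. Scales by weight / 2**log2_denom
    with rounding, adds the offset, then clips to the 8-bit range.
    """
    if log2_denom >= 1:
        rounding = 1 << (log2_denom - 1)
        value = ((sample * weight + rounding) >> log2_denom) + offset
    else:
        value = sample * weight + offset
    return max(0, min(255, value))


# "Blind"-style case (assumed parameters): unity scale, offset of -1.
blind = weighted_pred(100, weight=1, log2_denom=0, offset=-1)

# Fade-style case (assumed parameters): the reference is scaled to half brightness.
faded = weighted_pred(100, weight=32, log2_denom=6, offset=0)
```

With these assumed parameters the blind case lowers the sample by one and the fade case halves it; a real encoder would derive the weight and offset from its fade analysis.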
  19. 29 Oct, 2009 1 commit
  20. 26 Jul, 2009 1 commit
  21. 17 Jul, 2009 1 commit
  22. 11 Feb, 2009 1 commit
  23. 30 Dec, 2008 1 commit
  24. 29 Nov, 2008 1 commit
  25. 25 Nov, 2008 1 commit
      Faster width4 SSD+SATD, SSE4 optimizations · 69e69197
      Fiona Glaser authored
      Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
      Use pinsrd (SSE4) for faster width4 SSD.
      Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe).
      Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
      These optimizations help on Nehalem, Phenom, and Penryn CPUs.
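The 4x8 trick can be sketched in scalar form. The code below is an illustration only, not the SIMD routine from this commit: all function names are hypothetical, the input is a block of pixel differences rather than two pixel buffers, the result is left unnormalized, and the sketch assumes that SATD for both block shapes is the sum of two 4x4 Hadamard transforms. Under that assumption, moving the lower 4x4 half of a 4x8 block next to the upper half yields an 8x4 block with the same SATD.

```python
def hadamard4(v):
    """4-point Hadamard transform (butterfly form) of four values."""
    a, b, c, d = v
    s0, s1, d0, d1 = a + b, c + d, a - b, c - d
    return [s0 + s1, s0 - s1, d0 + d1, d0 - d1]


def satd_4x4(diff):
    """Unnormalized SATD of a 4x4 block of differences (4 rows of 4 values)."""
    rows = [hadamard4(r) for r in diff]        # transform each row
    cols = [hadamard4(c) for c in zip(*rows)]  # then each column
    return sum(abs(x) for col in cols for x in col)


def satd_8x4(diff):
    """SATD of an 8-wide, 4-high block: two side-by-side 4x4 blocks."""
    left = [row[:4] for row in diff]
    right = [row[4:] for row in diff]
    return satd_4x4(left) + satd_4x4(right)


def satd_4x8(diff):
    """SATD of a 4-wide, 8-high block, computed through satd_8x4.

    The lower 4x4 half is placed to the right of the upper half, so the
    8x4 routine sees the same two 4x4 blocks.
    """
    side_by_side = [list(diff[r]) + list(diff[r + 4]) for r in range(4)]
    return satd_8x4(side_by_side)
```

The rearrangement changes only where the two 4x4 blocks sit, not their contents, which is why one routine can serve both shapes.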
  26. 23 Nov, 2008 1 commit
      Phenom CPU optimizations · 80ea99c0
      Fiona Glaser authored
      Faster hpel_filter by using unaligned loads instead of emulated PALIGNR.
      Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values are not worth it).
      Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
      Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
      Merge cpu-32.asm and cpu-64.asm.
      Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
  27. 28 Sep, 2008 1 commit
  28. 20 Sep, 2008 1 commit
  29. 19 Sep, 2008 1 commit
  30. 05 Sep, 2008 1 commit