  1. 26 May, 2010 1 commit
    • Detect Atom CPU, enable appropriate asm functions · 57729402
      Fiona Glaser authored
      I'm not going to actually optimize for this pile of garbage unless someone pays me.
      But it can't hurt to at least enable the correct functions based on benchmarks.
      
      Also save some cache on Intel CPUs that don't need the decimate LUT due to having fast bsr/bsf.
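As a rough illustration of what "detect Atom CPU" can involve, the sketch below decodes the family and model fields from the EAX value returned by CPUID leaf 1. The function names are hypothetical, and the assumption that the first-generation Atom reports family 6, model 28 (0x1C) is stated here for illustration; the commit message itself does not give the check.

```c
#include <stdint.h>

/* Decode the display family from the EAX value of CPUID leaf 1.
 * The extended family field only applies when the base family is 0xF. */
static int cpuid_family(uint32_t eax)
{
    int family = (eax >> 8) & 0xf;
    if (family == 0xf)
        family += (eax >> 20) & 0xff;
    return family;
}

/* Decode the display model. The extended model field supplies the
 * high four bits when the base family is 0x6 or 0xF. */
static int cpuid_model(uint32_t eax)
{
    int base_family = (eax >> 8) & 0xf;
    int model = (eax >> 4) & 0xf;
    if (base_family == 0x6 || base_family == 0xf)
        model += ((eax >> 16) & 0xf) << 4;
    return model;
}

/* Assumption: first-generation Atom is family 6, model 28 (0x1C).
 * Hypothetical helper, not taken from the commit. */
static int is_atom(uint32_t eax)
{
    return cpuid_family(eax) == 6 && cpuid_model(eax) == 28;
}
```

A caller would pass the raw EAX value obtained from the CPUID instruction and use the result to set whatever CPU flag gates the Atom-specific function choices.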
  2. 06 May, 2010 2 commits
    • More cosmetics · 54e784fd
      Anton Mitrofanov authored
    • Deduplicate asm constants, automate name prefixing · 311c4bb1
      Fiona Glaser authored
      Auto-prefix global constants with x264_ in cextern.
      Eliminate x264_ prefix from asm files; automate it in cglobal.
      Deduplicate asm constants wherever possible to save data cache (move them to a new const-a.asm).
      Remove x264_emms() entirely on non-x86 (don't even call an empty function).
      Add cextern_naked for a non-prefixed cextern (used in checkasm).
  3. 05 Apr, 2010 1 commit
    • Massive cosmetic and syntax cleanup · 58d2349d
      Fiona Glaser authored
      Convert all applicable loops to use C99 loop index syntax.
      Clean up most inconsistent syntax in ratecontrol.c, visualize, ppc, etc.
      Replace log(x)/log(2) constructs with log2, and similar with log10.
      Fix all -Wshadow violations.
      Fix visualize support.
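For readers unfamiliar with the two syntax points in this cleanup, here is a minimal sketch of a C99 loop index declaration that also avoids the kind of name reuse `-Wshadow` warns about. The function `sum_matrix` is a hypothetical example, not code from the commit.

```c
/* Before (C89 style): the indices are declared at the top of the
 * function, which makes it easy to redeclare one of them in an inner
 * block and trigger a -Wshadow warning:
 *
 *     int i, total = 0;
 *     for( i = 0; i < rows; i++ )
 *     {
 *         int i;   // shadows the outer i
 *         ...
 *     }
 *
 * After (C99 style): each index is declared in its own loop header,
 * so its scope is limited to that loop and distinct names are used. */
static int sum_matrix(const int *m, int rows, int cols)
{
    int total = 0;
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < cols; x++)
            total += m[y * cols + x];
    return total;
}
```

The same commit also swaps `log(x)/log(2)` for `log2(x)`, which is a direct substitution of one `<math.h>` call for two.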
  4. 27 Mar, 2010 1 commit
  5. 30 Jan, 2010 1 commit
  6. 21 Jan, 2010 1 commit
  7. 20 Aug, 2009 1 commit
  8. 17 Mar, 2009 1 commit
    • SSE2 zigzag_interleave · d25d50c9
      Fiona Glaser authored
      Replace PHADD with FastShuffle (more accurate naming).
      This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.
  9. 19 Jan, 2009 1 commit
  10. 31 Dec, 2008 1 commit
  11. 29 Nov, 2008 1 commit
  12. 25 Nov, 2008 1 commit
    • Faster width4 SSD+SATD, SSE4 optimizations · 69e69197
      Fiona Glaser authored
      Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
      Use pinsrd (SSE4) for faster width4 SSD
      Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe)
      Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
      These optimizations help on Nehalem, Phenom, and Penryn CPUs.
  13. 23 Nov, 2008 1 commit
    • Phenom CPU optimizations · 80ea99c0
      Fiona Glaser authored
      Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
      Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
      Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
      Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
      Merge cpu-32.asm and cpu-64.asm
      Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
  14. 05 Nov, 2008 1 commit
    • Initial Nehalem CPU optimizations · 1bf7228f
      Fiona Glaser authored
      movaps/movups are no longer equivalent to their integer equivalents on the Nehalem, so that substitution is removed.
      Nehalem has a much lower cacheline split penalty than previous Intel CPUs, so cacheline workarounds are no longer necessary.
      Thanks to Intel for providing Avail Media with the pre-release Nehalem CPU needed to prepare these (and other not-yet-committed) optimizations.
      Overall speed improvement with Nehalem vs Penryn at the same clock speed is around 40%.
  15. 04 Jul, 2008 1 commit
    • Update file headers throughout x264 · bdbd4fe7
      Fiona Glaser authored
      Update "Authors" lists based on actual authorship; highest is most important
      Update copyright notices and remove old CVS tags from file headers
      Add file headers to GTK and other sections missing them
      Update FSF address
      Other header-related cosmetics
  16. 29 Jun, 2008 1 commit
  17. 08 Jun, 2008 2 commits
    • many changes to which asm functions are enabled on which cpus. · c0c0e1f4
      Loren Merritt authored
      with Phenom, 3dnow is no longer equivalent to "sse2 is slow", so make a new flag for that.
      some sse2 functions are useful only on Core2 and Phenom, so make a "sse2 is fast" flag for that.
      some ssse3 instructions didn't become useful until Penryn, so yet another flag.
      disable sse2 completely on Pentium M and Core1, because it's uniformly slower than mmx.
      enable some sse2 functions on Athlon64 that always were faster and we just didn't notice.
      remove mc_luma_sse3, because the only cpu that has lddqu (namely Pentium 4D) doesn't have "sse2 is fast".
      don't print mmx1, sse1, nor 3dnow in the detected cpuflags, since we don't really have any such functions. likewise don't print sse3 unless it's used (Pentium 4D).
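The flag scheme described in this commit can be pictured as a function-pointer table filled in according to capability bits plus "is slow" / "is fast" hints. The sketch below is an assumption-laden illustration: the flag names, the `sad_*` stand-in functions, and `select_sad` are all hypothetical, and the asm versions are replaced by plain C so the example is self-contained.

```c
#include <stdint.h>

/* Hypothetical flag bits, named for illustration only. */
enum {
    CPU_MMXEXT       = 1 << 0,
    CPU_SSE2         = 1 << 1,
    CPU_SSE2_IS_SLOW = 1 << 2,  /* SSE2 present but slower than MMX */
    CPU_SSE2_IS_FAST = 1 << 3   /* SSE2 fast enough for extra functions */
};

typedef int (*sad_fn)(const uint8_t *a, const uint8_t *b, int n);

/* Reference C implementation: sum of absolute differences. */
static int sad_ref(const uint8_t *a, const uint8_t *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    return sum;
}

/* Stand-ins for the asm versions; they simply call the reference. */
static int sad_mmxext(const uint8_t *a, const uint8_t *b, int n) { return sad_ref(a, b, n); }
static int sad_sse2(const uint8_t *a, const uint8_t *b, int n)   { return sad_ref(a, b, n); }

/* Pick an implementation from the detected flags. SSE2 is used only
 * when the CPU has it and it is not marked as slow. */
static sad_fn select_sad(unsigned cpu)
{
    sad_fn f = sad_ref;
    if (cpu & CPU_MMXEXT)
        f = sad_mmxext;
    if ((cpu & CPU_SSE2) && !(cpu & CPU_SSE2_IS_SLOW))
        f = sad_sse2;
    return f;
}
```

Keeping the "slow" and "fast" hints separate from the capability bits lets a CPU such as the Pentium M report SSE2 support while still being steered to the MMX path.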
    • enable ssse3 phadd satd on Penryn. · f9ad5ee2
      Loren Merritt authored
  18. 27 Apr, 2008 2 commits
  19. 21 Apr, 2008 1 commit
  20. 16 Mar, 2008 1 commit
  21. 14 Jan, 2008 1 commit
  22. 03 Jan, 2008 1 commit
  23. 20 Nov, 2007 1 commit
  24. 17 Jul, 2007 1 commit
  25. 06 Jul, 2007 1 commit
  26. 05 Jun, 2007 1 commit
  27. 14 Mar, 2007 3 commits
  28. 16 Dec, 2006 1 commit
  29. 12 Dec, 2006 1 commit
  30. 02 Aug, 2006 1 commit
  31. 01 Aug, 2006 2 commits
  32. 17 Jan, 2006 1 commit
  33. 15 Jan, 2006 1 commit
  34. 14 Jan, 2006 1 commit