1. 17 Jul, 2019 2 commits
  2. 06 Mar, 2019 1 commit
  3. 17 Jan, 2018 1 commit
  4. 21 May, 2017 1 commit
    • Henrik Gramner's avatar
      x86: AVX-512 support · 472ce364
      Henrik Gramner authored
      AVX-512 consists of a plethora of different extensions, but in order to keep
      things a bit more manageable we group together the following extensions
      under a single baseline cpu flag which should cover SKL-X and future CPUs:
       * AVX-512 Foundation (F)
       * AVX-512 Conflict Detection Instructions (CD)
       * AVX-512 Byte and Word Instructions (BW)
       * AVX-512 Doubleword and Quadword Instructions (DQ)
       * AVX-512 Vector Length Extensions (VL)
      
      On x86-64 AVX-512 provides 16 additional vector registers, prefer using
      those over existing ones since it allows us to avoid using `vzeroupper`
      unless more than 16 vector registers are required. They also happen to
      be volatile on Windows which means that we don't need to save and restore
      existing xmm register contents unless more than 22 vector registers are
      required.
      
      Also take the opportunity to drop X264_CPU_CMOV and X264_CPU_SLOW_CTZ while
      we're breaking API by messing with the cpu flags since they weren't really
      used for anything.
      
      Big thanks to Intel for their support.
      472ce364
  5. 21 Jan, 2017 1 commit
  6. 16 Jan, 2016 1 commit
  7. 23 Feb, 2015 1 commit
  8. 20 Dec, 2014 1 commit
  9. 12 Mar, 2014 2 commits
  10. 08 Jan, 2014 1 commit
  11. 05 Jul, 2013 1 commit
    • Henrik Gramner's avatar
      x86: Remove X264_CPU_SSE_MISALIGN functions · ff41804e
      Henrik Gramner authored
      Prevents a crash if the misaligned exception mask bit is cleared for some reason.
      
      Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is miniscule.
      They also require modifying the MXCSR control register and by removing those functions
      we can get rid of that complexity altogether.
      
      VEX-encoded instructions also supports unaligned memory operands. I tried adding AVX
      implementations of all removed functions but there were no performance improvements on
      Ivy Bridge. pixel_sad_x3 and pixel_sad_x4 had significant code size reductions though
      so I kept them and added some minor cosmetics fixes and tweaks.
      ff41804e
  12. 20 May, 2013 1 commit
  13. 09 Jan, 2013 1 commit
  14. 12 Nov, 2012 1 commit
  15. 04 Feb, 2012 1 commit
  16. 15 Jun, 2011 1 commit
  17. 12 Apr, 2011 1 commit
  18. 24 Mar, 2011 1 commit
  19. 25 Jan, 2011 1 commit
  20. 18 Sep, 2010 1 commit
    • Fiona Glaser's avatar
      Update source file headers · 213a99d0
      Fiona Glaser authored
      Update dates, improve file descriptions, make things more consistent.
      Also add information about commercial licensing.
      213a99d0
  21. 09 Jun, 2010 1 commit
  22. 06 May, 2010 1 commit
    • Fiona Glaser's avatar
      Deduplicate asm constants, automate name prefixing · 311c4bb1
      Fiona Glaser authored
      Auto-prefix global constants with x264_ in cextern.
      Eliminate x264_ prefix from asm files; automate it in cglobal.
      Deduplicate asm constants wherever possible to save data cache (move them to a new const-a.asm).
      Remove x264_emms() entirely on non-x86 (don't even call an empty function).
      Add cextern_naked for a non-prefixed cextern (used in checkasm).
      311c4bb1
  23. 02 Sep, 2009 1 commit
    • Steven Walters's avatar
      Threaded lookahead · 6940dcae
      Steven Walters authored
      Move lookahead into a separate thread, set to higher priority than the other threads, for optimal performance.
      Reduces the amount that lookahead bottlenecks encoding, greatly increasing performance with lookahead-intensive settings (e.g. b-adapt 2) on many-core CPUs.
      Buffer size can be controlled with --sync-lookahead, which defaults to auto (threads+bframes buffer size).
      Note that this buffer is separate from the rc-lookahead value.
      Note also that this does not split lookahead itself into multiple threads yet; this may be added in the future.
      Additionally, split frames into "fdec" and "fenc" frame types and keep the two separate.
      This split greatly reduces memory usage, which helps compensate for the larger lookahead size.
      Extremely special thanks to Michael Kazmier and Alex Giladi of Avail Media, the original authors of this patch.
      6940dcae
  24. 31 Dec, 2008 1 commit
  25. 25 Nov, 2008 1 commit
    • Fiona Glaser's avatar
      Faster width4 SSD+SATD, SSE4 optimizations · 69e69197
      Fiona Glaser authored
      Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
      Use pinsrd (SSE4) for faster width4 SSD
      Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe)
      Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
      These optimizations help on Nehalem, Phenom, and Penryn CPUs.
      69e69197
  26. 04 Jul, 2008 1 commit
    • Fiona Glaser's avatar
      Update file headers throughout x264 · bdbd4fe7
      Fiona Glaser authored
      Update "Authors" lists based on actual authorship; highest is most important
      Update copyright notices and remove old CVS tags from file headers
      Add file headers to GTK and other sections missing them
      Update FSF address
      Other header-related cosmetics
      bdbd4fe7
  27. 29 Jun, 2008 1 commit
  28. 08 Jun, 2008 1 commit
    • Loren Merritt's avatar
      many changes to which asm functions are enabled on which cpus. · c0c0e1f4
      Loren Merritt authored
      with Phenom, 3dnow is no longer equivalent to "sse2 is slow", so make a new flag for that.
      some sse2 functions are useful only on Core2 and Phenom, so make a "sse2 is fast" flag for that.
      some ssse3 instructions didn't become useful until Penryn, so yet another flag.
      disable sse2 completely on Pentium M and Core1, because it's uniformly slower than mmx.
      enable some sse2 functions on Athlon64 that always were faster and we just didn't notice.
      remove mc_luma_sse3, because the only cpu that has lddqu (namely Pentium 4D) doesn't have "sse2 is fast".
      don't print mmx1, sse1, nor 3dnow in the detected cpuflags, since we don't really have any such functions. likewise don't print sse3 unless it's used (Pentium 4D).
      c0c0e1f4
  29. 27 Apr, 2008 1 commit
  30. 24 Apr, 2008 1 commit
  31. 21 Apr, 2008 1 commit
  32. 16 Jun, 2007 1 commit
  33. 06 Apr, 2007 1 commit
  34. 01 Aug, 2006 1 commit
  35. 14 Dec, 2004 1 commit
  36. 03 Jun, 2004 1 commit