1. 23 Apr, 2013 3 commits
    • x86-64: cabac_block_residual assembly · a3f5c732
      Fiona Glaser authored
      RDO: ~20% faster than C
      Bitstream: ~50% faster than C
      1-2% faster overall, highest on preset superfast/fast/medium.
    • OpenCL lookahead · f49a1b2e
      Steve Borho authored
      OpenCL support is compiled in by default, but must be enabled at runtime by an
      --opencl command line flag. Compiling OpenCL support requires perl. To avoid
      the perl requirement use: configure --disable-opencl.
      
      When enabled, the lookahead thread is mostly off-loaded to an OpenCL capable GPU
      device.  Lowres intra cost prediction, lowres motion search (including subpel)
      and bidir cost predictions are all done on the GPU.  MB-tree and final slice
      decisions are still done by the CPU.  Presets which do not use a threaded
      lookahead will not use OpenCL at all (superfast, ultrafast).
      
      Because of data dependencies, the GPU must use an iterative motion search which
      performs more total work than the CPU would do, so this is not work efficient
      or power efficient. But if there are GPU cycles to spare, it can often
      speed up the encode. Output quality with OpenCL lookahead enabled is often
      very slightly worse than with the CPU lookahead (because of the same data
      dependencies).
      
      x264 must compile its OpenCL kernels for your device before running them, and in
      order to avoid doing this every run it caches the compiled kernel binary in a
      file named x264_lookahead.clbin (--opencl-clbin FNAME to override).  The cache
      file will be ignored if the device, driver, or OpenCL source are changed.
      
      x264 will use the first GPU device which supports the cl_image features
      required by its kernels. Most modern discrete GPUs and all AMD
      integrated GPUs will work.  Intel integrated GPUs (up to IvyBridge) do not
      support those features. Use --opencl-device N to specify a number of
      capable GPUs to skip during device detection.
      
      Switchable graphics environments (e.g. AMD Enduro) are currently not supported,
      as some have bugs in their OpenCL drivers that cause output to be silently
      incorrect.
      
      Developed by MulticoreWare with support from AMD and Telestream.
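
The device selection rule in this message (take the first capable GPU, with --opencl-device N skipping N capable devices) can be sketched in C as follows. This is only an illustration under stated assumptions: the `gpu_device` struct and `pick_device` function are hypothetical names, and the real detection code queries the OpenCL runtime rather than a static array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record: only the one property the commit message
 * mentions (support for the required cl_image features) is modelled. */
typedef struct {
    const char *name;
    int has_image_support;
} gpu_device;

/* Return the index of the device to use, or -1 if none qualifies.
 * `skip` plays the role of N in --opencl-device N: the number of
 * capable devices to pass over before choosing one. */
static int pick_device(const gpu_device *devs, size_t count, int skip)
{
    assert(skip >= 0);
    for (size_t i = 0; i < count; i++) {
        if (!devs[i].has_image_support)
            continue;               /* incapable devices are never counted */
        if (skip-- == 0)
            return (int)i;
    }
    return -1;
}
```

With one incapable device followed by two capable ones, a skip of 0 selects the first capable device and a skip of 1 selects the second.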
    • Fiona Glaser's avatar
      3cdaca1a
  2. 25 Feb, 2013 1 commit
    • x86: optimize and clean up predictor checking · 6371c3a5
      Fiona Glaser authored
      Branchlessly handle elimination of candidates in MMX roundclip asm.
      Add a new asm function, similar to roundclip, except without the round part.
      Optimize and organize the C code, and make both subme>=3 and subme<3 consistent.
      Add lots of explanatory comments and try to make things a little more understandable.
      ~5-10% faster with subme>=3, ~15-20% faster with subme<3.
  3. 09 Jan, 2013 1 commit
  4. 18 May, 2012 1 commit
    • Threaded lookahead · df700eae
      Fiona Glaser authored
      Split each lookahead frame analysis call into multiple threads.  Has a small
      impact on quality, but does not seem to be consistently any worse.
      
      This helps alleviate bottlenecks with many cores and frame threads. In many
      cases, this massively increases performance on many-core systems.  For example,
      over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system.
      Realtime 1080p30 at --preset slow should now be feasible on real systems.
      
      For sliced-threads, this patch should be faster regardless of settings (~10%).
      
      By default, lookahead threads are 1/6 of regular threads.  This isn't exact,
      but it seems to work well for all presets on real systems.  With sliced-threads,
      it's the same as the number of encoding threads.
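
The default described above might look roughly like this in C. It is a minimal sketch, not the x264 source: the function name is hypothetical, and both the integer division and the floor of one thread are assumptions.

```c
#include <assert.h>

/* Sketch of the default lookahead thread count described in the message.
 * Rounding down and clamping to at least one thread are assumptions. */
static int default_lookahead_threads(int encode_threads, int sliced_threads)
{
    assert(encode_threads >= 1);
    if (sliced_threads)
        return encode_threads;      /* same as the number of encoding threads */
    int n = encode_threads / 6;     /* 1/6 of regular threads */
    return n < 1 ? 1 : n;
}
```

Under these assumptions, 12 encoding threads would give 2 lookahead threads, while 8 threads in sliced-threads mode would give 8.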
  5. 24 Apr, 2012 1 commit
    • Add mb_info API for signalling constant macroblocks · 8e57a9a0
      Fiona Glaser authored
      Some use-cases of x264 involve encoding video with large constant areas of the frame.
      Sometimes, the caller knows which areas these are, and can tell x264.
      This API lets the caller do this and adds internal tracking of modifications to macroblocks to avoid problems.
      This is really only suitable without B-frames.
      An example use-case would be using x264 for VNC.
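
To illustrate the caller's side of such an API, the sketch below builds a per-macroblock flag map for a VNC-like case where only one rectangle of the frame changed. Everything here is an assumption for illustration: `MBINFO_CONSTANT`, `mark_constant_mbs`, and the one-byte-per-macroblock raster layout are hypothetical and are not claimed to match the definitions in x264.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MB_SIZE 16              /* H.264 macroblocks are 16x16 pixels */
#define MBINFO_CONSTANT 0x01    /* illustrative flag value (assumption) */

/* Number of macroblocks needed to cover `pixels`, rounding up. */
static int mb_count(int pixels) { return (pixels + MB_SIZE - 1) / MB_SIZE; }

/* Fill `info` (one byte per macroblock, raster order) so that every
 * macroblock is flagged constant except those overlapping the changed
 * pixel rectangle [x0,x1) x [y0,y1). */
static void mark_constant_mbs(uint8_t *info, int width, int height,
                              int x0, int y0, int x1, int y1)
{
    assert(x0 >= 0 && y0 >= 0 && x0 < x1 && y0 < y1);
    assert(x1 <= width && y1 <= height);
    int mbw = mb_count(width), mbh = mb_count(height);
    memset(info, MBINFO_CONSTANT, (size_t)mbw * (size_t)mbh);
    for (int my = y0 / MB_SIZE; my <= (y1 - 1) / MB_SIZE; my++)
        for (int mx = x0 / MB_SIZE; mx <= (x1 - 1) / MB_SIZE; mx++)
            info[my * mbw + mx] = 0;    /* changed: must be re-encoded */
}
```

A macroblock that the rectangle only partially covers is treated as changed, which is the conservative choice: wrongly flagging a modified block as constant would corrupt the output, while the reverse only costs time.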
  6. 27 Mar, 2012 1 commit
  7. 07 Mar, 2012 2 commits
  8. 04 Feb, 2012 1 commit
  9. 15 Jan, 2012 1 commit
  10. 01 Dec, 2011 1 commit
  11. 22 Oct, 2011 1 commit
  12. 21 Sep, 2011 1 commit
  13. 10 Jul, 2011 3 commits
  14. 15 Jun, 2011 1 commit
  15. 12 May, 2011 15 commits
  16. 26 Apr, 2011 1 commit
  17. 14 Apr, 2011 1 commit
  18. 13 Apr, 2011 2 commits
  19. 12 Apr, 2011 1 commit
    • Minor fixes · 2246e451
      Fiona Glaser authored
      Fix a comment typo.
      Align an array properly.
      Make x264_scan8 unsigned: saves a bunch of movsxd instructions on x86_64.
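
The effect of the x264_scan8 change can be illustrated with a small C sketch. The table contents, element types, and function names below are assumptions chosen for illustration, not copies of the x264 tables. The idea is that a signed index must be sign-extended to 64 bits before it is used in an address on x86_64, whereas an unsigned index only needs zero extension, which is usually free. Whether a given compiler actually emits movsxd depends on the compiler and its flags.

```c
#include <assert.h>
#include <stdint.h>

/* Two lookup tables with identical (illustrative) contents; only the
 * element type differs. */
static const int     table_signed[4]   = { 12, 13, 20, 21 };
static const uint8_t table_unsigned[4] = { 12, 13, 20, 21 };

/* The loaded index is a signed 32-bit value, so it has to be widened
 * with sign extension before being used in a 64-bit address. */
static int load_via_signed(const int *cache, int i)
{
    assert(i >= 0 && i < 4);
    return cache[table_signed[i]];
}

/* The loaded index is unsigned, so zero extension is enough. */
static int load_via_unsigned(const int *cache, int i)
{
    assert(i >= 0 && i < 4);
    return cache[table_unsigned[i]];
}
```

Both functions return the same values; any difference is only in the generated code.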
  20. 24 Mar, 2011 1 commit