1. 30 Oct, 2013 1 commit
    • Add --filler option · c084f6c0
      Fiona Glaser authored
      Allows generation of hard-CBR streams without using NAL HRD.
      Useful if you want to be able to reconfigure the bitrate (which you can't do
      with NAL HRD on).
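A minimal sketch of the idea behind filler-based hard CBR, assuming a simple leaky-bucket buffer model. The function `filler_bits` and its parameters are hypothetical and are not x264's API; they only illustrate that a frame small enough to overflow the buffer has the excess padded with filler.

```python
def filler_bits(buffer_fill, buffer_size, bits_per_frame, frame_bits):
    """Return (filler, new_fill) for one frame under a leaky-bucket model.

    Hypothetical helper, not x264 code: the buffer gains bits_per_frame
    (bitrate / fps) each frame and loses the bits the frame used.
    """
    fill = buffer_fill + bits_per_frame - frame_bits
    # Any overflow must be burned as filler to keep the stream constant-rate.
    filler = max(0, fill - buffer_size)
    return filler, fill - filler


# 1000 kbit/s at 25 fps gives 40000 bits per frame.
filler, fill = filler_bits(buffer_fill=90_000, buffer_size=100_000,
                           bits_per_frame=40_000, frame_bits=10_000)
```

In this example the small frame would push the buffer past its size, so the overflow is emitted as filler and the buffer stays full.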
  2. 23 Aug, 2013 4 commits
    • Windows Unicode support · fa3cac51
      Henrik Gramner authored
      Windows, unlike most other operating systems, uses UTF-16 for Unicode strings while x264 is designed for UTF-8.
      
      This patch does the following in order to handle things like Unicode filenames:
      * Keep strings internally as UTF-8.
      * Retrieve the CLI command line as UTF-16 and convert it to UTF-8.
      * Always use Unicode versions of Windows API functions and convert strings to UTF-16 when calling them.
      * Attempt to use legacy 8.3 short filenames for external libraries without Unicode support.
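The conversion strategy listed above (UTF-8 internally, UTF-16 at the Windows API boundary) can be sketched as follows. This is an illustration in Python, not the patch's C code; the helper names `utf16_to_utf8` and `utf8_to_utf16` are hypothetical.

```python
def utf16_to_utf8(raw: bytes) -> bytes:
    """Convert UTF-16LE bytes (as Windows APIs return) to UTF-8."""
    return raw.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16(raw: bytes) -> bytes:
    """Convert internal UTF-8 bytes to UTF-16LE before a Windows API call."""
    return raw.decode("utf-8").encode("utf-16-le")


name = "vidéo_日本語.mkv"
wide = name.encode("utf-16-le")   # what the OS would hand over
internal = utf16_to_utf8(wide)    # what the program keeps internally
```

Converting only at the boundary means the rest of the program never needs to know which encoding the operating system prefers.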
    • AVC-Intra support · 9b94896b
      Kieran Kunhya authored
      This format has been reverse engineered and x264's output has almost exactly
      the same bitstream as Panasonic cameras and encoders produce. It therefore does
      not comply with SMPTE RP2027 since Panasonic themselves do not comply with
      their own specification. It has been tested in Avid, Premiere, Edius and
      Quantel.
      
      Parts of this patch were written by Fiona Glaser and some reverse
      engineering was done by Joseph Artsimovich.
    • Transparent hugepage support · fa1e2b74
      Henrik Gramner authored
      Combine frame and mb data mallocs into a single large malloc.
      Additionally, on Linux systems with hugepage support, ask for hugepages on
      large mallocs.
      
      This gives a small performance improvement (~0.2-0.9%) on systems without
      hugepage support, as well as a small memory footprint reduction.
      
      On recent Linux kernels with hugepage support enabled (set to madvise or
      always), it improves performance up to 4% at the cost of about 7-12% more
      memory usage on typical settings.
      
      It may help even more on Haswell and other recent CPUs with improved 2MB page
      support in hardware.
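A rough sketch of "ask for hugepages on large mallocs", assuming Python's `mmap` module as a stand-in for the C allocator. The names `alloc_large` and `round_up` and the 2 MB constant are assumptions for illustration; the patch's actual allocation code is not shown in the source.

```python
import mmap

HUGEPAGE_SIZE = 2 * 1024 * 1024  # 2 MB, the common x86 huge page size


def round_up(size, align=HUGEPAGE_SIZE):
    """Round size up to a multiple of align."""
    return (size + align - 1) // align * align


def alloc_large(size):
    """Allocate an anonymous mapping and hint that huge pages are welcome."""
    buf = mmap.mmap(-1, round_up(size))
    # MADV_HUGEPAGE only exists on Linux builds of Python 3.8+.
    advice = getattr(mmap, "MADV_HUGEPAGE", None)
    if advice is not None:
        try:
            buf.madvise(advice)
        except OSError:
            pass  # kernel without transparent hugepage support
    return buf


buf = alloc_large(5 * 1024 * 1024)
```

The hint is advisory: on systems without hugepage support the allocation still succeeds with ordinary pages.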
    • a6c396f0
      Anton Mitrofanov authored
  3. 03 Jul, 2013 1 commit
  4. 20 May, 2013 1 commit
  5. 23 Apr, 2013 4 commits
    • x86: more AVX2 framework, AVX2 functions, plus some existing asm tweaks · 0ea5be85
      Fiona Glaser authored
      AVX2 functions:
      mc_chroma
      intra_sad_x3_16x16
      last64
      ads
      hpel
      dct4
      idct4
      sub16x16_dct8
      quant_4x4x4
      quant_4x4
      quant_4x4_dc
      quant_8x8
      SAD_X3/X4
      SATD
      var
      var2
      SSD
      zigzag interleave
      weightp
      weightb
      intra_sad_8x8_x9
      decimate
      integral
      hadamard_ac
      sa8d_satd
      sa8d
      lowres_init
      denoise
    • OpenCL lookahead · f49a1b2e
      Steve Borho authored
      OpenCL support is compiled in by default, but must be enabled at runtime by an
      --opencl command line flag. Compiling OpenCL support requires perl. To avoid
      the perl requirement use: configure --disable-opencl.
      
      When enabled, the lookahead thread is mostly off-loaded to an OpenCL capable GPU
      device.  Lowres intra cost prediction, lowres motion search (including subpel)
      and bidir cost predictions are all done on the GPU.  MB-tree and final slice
      decisions are still done by the CPU.  Presets which do not use a threaded
      lookahead will not use OpenCL at all (superfast, ultrafast).
      
      Because of data dependencies, the GPU must use an iterative motion search which
      performs more total work than the CPU would do, so this is not work efficient
      or power efficient. But if there are GPU cycles to spare, it can often
      speed up the encode. Output quality with OpenCL lookahead enabled is often
      very slightly worse than with the CPU lookahead (because of the same data
      dependencies).
      
      x264 must compile its OpenCL kernels for your device before running them, and in
      order to avoid doing this every run it caches the compiled kernel binary in a
      file named x264_lookahead.clbin (--opencl-clbin FNAME to override).  The cache
      file will be ignored if the device, driver, or OpenCL source are changed.
      
      x264 will use the first GPU device which supports the cl_image features
      required by its kernels. Most modern discrete GPUs and all AMD
      integrated GPUs will work.  Intel integrated GPUs (up to IvyBridge) do not
      support those necessary features. Use --opencl-device N to specify a number of
      capable GPUs to skip during device detection.
      
      Switchable graphics environments (e.g. AMD Enduro) are currently not supported,
      as some have bugs in their OpenCL drivers that cause output to be silently
      incorrect.
      
      Developed by MulticoreWare with support from AMD and Telestream.
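The cache invalidation rule described above (ignore the cached binary if the device, driver, or OpenCL source changed) could be sketched like this. The fingerprint scheme, the SHA-256 choice, and the names `cache_key` and `cache_is_valid` are assumptions; the source does not describe the real layout of x264_lookahead.clbin.

```python
import hashlib


def cache_key(device_name, driver_version, kernel_source):
    """Fingerprint everything the compiled binary depends on (hypothetical)."""
    h = hashlib.sha256()
    for part in (device_name, driver_version, kernel_source):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so fields cannot run together
    return h.hexdigest()


def cache_is_valid(stored_key, device_name, driver_version, kernel_source):
    """Reuse the cached binary only if nothing it depends on has changed."""
    return stored_key == cache_key(device_name, driver_version, kernel_source)


key = cache_key("Example GPU", "1.0", "__kernel void f() {}")
```

A driver update changes the fingerprint, so the stale binary is skipped and the kernels are recompiled.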
    • Add slices-max feature · 732e4f7e
      Fiona Glaser authored
      The H.264 spec technically has limits on the number of slices per frame. x264
      normally ignores these limits, since most use cases that require large numbers of
      slices prefer that it do so. However, certain decoders may break with extremely large
      numbers of slices, as can occur with some slice-max-size/mbs settings.
      
      When set, x264 will refuse to create any slices beyond the maximum number,
      even if slice-max-size/mbs requires otherwise.
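A minimal sketch of a slice cap interacting with a per-slice macroblock limit, assuming a simple greedy split. The function `split_slices` is hypothetical and only mirrors the behaviour described above: once the cap is reached, the last slice absorbs whatever is left.

```python
def split_slices(total_mbs, slice_max_mbs, slices_max=0):
    """Return a list of slice lengths in macroblocks.

    slices_max == 0 means "no cap" in this sketch (an assumption).
    """
    slices = []
    remaining = total_mbs
    while remaining > 0:
        if slices_max and len(slices) == slices_max - 1:
            # Last permitted slice: take everything left, even if that
            # exceeds slice_max_mbs.
            slices.append(remaining)
            break
        n = min(slice_max_mbs, remaining)
        slices.append(n)
        remaining -= n
    return slices
```

For instance, `split_slices(100, 30)` yields four slices, while `split_slices(100, 30, 2)` stops at two and lets the second one exceed the 30-macroblock limit.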
    • Add slice-min-mbs feature · fdfffa30
      Fiona Glaser authored
      Works in conjunction with slice-max-mbs and/or slice-max-size to avoid overly
      small slices.
      Useful with certain decoders that barf on extremely small slices.
      
      If slice-min-mbs would be violated as a result of slice-max-size, x264 will
      exceed slice-max-size and print a warning.
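The precedence described above (the minimum macroblock count wins over the byte limit, with a warning) can be sketched as a greedy grouping. The function `split_by_size` and the per-macroblock byte sizes are hypothetical; the real encoder decides slice boundaries while encoding rather than from a list of sizes.

```python
def split_by_size(mb_sizes, slice_max_size, slice_min_mbs):
    """Greedily group per-macroblock byte sizes into slices.

    Returns (slices, warnings); each slice is a list of macroblock sizes.
    """
    slices, warnings, current = [], [], []
    for size in mb_sizes:
        too_big = bool(current) and sum(current) + size > slice_max_size
        if too_big and len(current) >= slice_min_mbs:
            # The slice is long enough, so close it at the size limit.
            slices.append(current)
            current = []
        elif too_big:
            # Honouring the minimum wins; the size limit is exceeded.
            msg = f"slice {len(slices)} exceeds slice-max-size"
            if msg not in warnings:
                warnings.append(msg)
        current.append(size)
    if current:
        slices.append(current)
    return slices, warnings


slices, warnings = split_by_size([60, 60, 10, 10], 100, 2)
```

Here the first slice is forced to hold two macroblocks and therefore exceeds the 100-byte limit, which produces one warning.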
  6. 26 Feb, 2013 1 commit
    • x86: detect Bobcat, improve Atom optimizations, reorganize flags · 5d60b9c9
      Fiona Glaser authored
      The Bobcat has a 64-bit SIMD unit reminiscent of the Athlon 64; detect this
      and apply the appropriate flags.
      
      It also has an extremely slow palignr instruction; create a flag for this to
      avoid massive penalties on palignr-heavy functions.
      
      Improve Atom function selection and document exactly what the SLOW_ATOM flag
      covers.
      
      Add Atom-optimized SATD/SA8D/hadamard_ac functions: simply combine the ssse3
      optimizations with the sse2 algorithm to avoid pmaddubsw, which is slow on
      Atom along with other SIMD multiplies.
      
      Drop TBM detection; it'll probably never be useful for x264.
      
      Invert FastShuffle to SlowShuffle; it only ever applied to one CPU (Conroe).
      
      Detect CMOV, to fail more gracefully when run on a chip with MMX2 but no CMOV.
  7. 25 Feb, 2013 1 commit
  8. 09 Jan, 2013 1 commit
  9. 03 Jul, 2012 1 commit
    • Fix crash with --fps 0 · 5e3aaf1a
      Anton Mitrofanov authored
      Fix some integer overflows and check input parameters better.
      Also fix incorrect type specifiers for demuxer info printing.
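As an illustration of checking input parameters before they reach later arithmetic, here is a small sketch of frame-rate validation. The helper `parse_fps` is hypothetical and does not reproduce x264's actual option parsing.

```python
from fractions import Fraction


def parse_fps(text):
    """Parse "30", "29.97" or "30000/1001" into a positive Fraction.

    Hypothetical helper; raises ValueError instead of letting a zero or
    negative rate reach later arithmetic.
    """
    try:
        fps = Fraction(text)
    except ZeroDivisionError:
        raise ValueError(f"invalid fps: {text!r}") from None
    if fps <= 0:
        raise ValueError(f"fps must be positive: {text!r}")
    return fps
```

Rejecting the value at parse time gives a clear error message instead of a division by zero deeper in the encoder.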
  10. 18 May, 2012 1 commit
    • Threaded lookahead · df700eae
      Fiona Glaser authored
      Split each lookahead frame analysis call into multiple threads.  Has a small
      impact on quality, but does not seem to be consistently any worse.
      
      This helps alleviate bottlenecks with many cores and frame threads. In many
      cases, this massively increases performance on many-core systems.  For example,
      over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system.
      Realtime 1080p30 at --preset slow should now be feasible on real systems.
      
      For sliced-threads, this patch should be faster regardless of settings (~10%).
      
      By default, lookahead threads are 1/6 of regular threads.  This ratio isn't rigorous,
      but it seems to work well for all presets on real systems.  With sliced-threads,
      it's the same as the number of encoding threads.
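The default described above can be written as a one-line rule. This is a sketch only: the function name is hypothetical, and the floor of one thread is an assumption so that small thread counts still get a lookahead thread.

```python
def lookahead_threads(threads, sliced_threads=False):
    """Pick a lookahead thread count from the encoding thread count."""
    if sliced_threads:
        # With sliced-threads, use as many as there are encoding threads.
        return threads
    # The floor of 1 is an assumption so small thread counts still work.
    return max(1, threads // 6)
```

With 12 encoding threads this gives 2 lookahead threads; with sliced-threads and 8 encoding threads it gives 8.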
  11. 04 Feb, 2012 1 commit
  12. 22 Oct, 2011 1 commit
  13. 21 Sep, 2011 2 commits
    • 4:2:2 encoding support · 5b0cb86f
      Henrik Gramner authored
    • SSSE3/SSE4 9-way fully merged i4x4 analysis (sad/satd_x9) · 3d82e875
      Loren Merritt authored
      i4x4 analysis cycles (per partition):
      penryn    sandybridge
      184-> 75  157-> 54     preset=superfast (sad)
      281->165  225->124     preset=faster    (satd with early termination)
      332->165  263->124     preset=medium
      379->165  297->124     preset=slower    (satd without early termination)
      
      This is the first code in x264 that intentionally produces different behavior
      on different cpus: satd_x9 is implemented only on ssse3+ and checks all intra
      directions, whereas the old code (on fast presets) may early terminate after
      checking only some of them. There is no systematic difference on slow presets,
      though they still occasionally disagree about tiebreaks.
      
      For ease of debugging, add an option "--cpu-independent" to disable satd_x9
      and any analogous future code.
  14. 05 Aug, 2011 1 commit
  15. 22 Jul, 2011 2 commits
  16. 10 Jul, 2011 2 commits
  17. 13 Jun, 2011 1 commit
  18. 13 Apr, 2011 1 commit
    • Consolidate Blu-ray hacks into --bluray-compat · e54ea0c8
      Fiona Glaser authored
      This option is now required for Blu-ray compatibility.
      --open-gop bluray is now gone (using bluray-compat and open-gop implies a Blu-ray compatible open-gop).
      This option doesn't automatically enforce every aspect of Blu-ray compatibility (e.g. resolution, framerate, level, etc).
  19. 12 Apr, 2011 1 commit
  20. 24 Mar, 2011 1 commit
  21. 25 Jan, 2011 1 commit
  22. 10 Jan, 2011 2 commits
  23. 14 Dec, 2010 1 commit
  24. 07 Dec, 2010 1 commit
  25. 25 Nov, 2010 2 commits
    • Make --weightp 1 a better speed tradeoff · 7e3019a3
      Alex Wright authored
      Since fade analysis is now so fast, weightp 1 now does fade analysis but no reference duplication.
      This is the opposite of what it used to do (reference duplication but no fade analysis).
      This also gives weightp's better fade quality to faster presets (up to superfast).
    • Change qpmin default to 0 · ca8f00c7
      Fiona Glaser authored
      There's probably no real reason to keep it at 10 anymore, and lowering it allows AQ to pick lower quantizers in really flat areas.
      Might help on gradients at high quality levels.
      The previous value of 10 was arbitrary anyways.
  26. 19 Nov, 2010 1 commit
  27. 17 Nov, 2010 1 commit
  28. 10 Nov, 2010 2 commits
    • Improve quantizer handling · 2f2ab0fa
      Fiona Glaser authored
      The default value for i_qpplus1 in x264_picture_t is now X264_QP_AUTO.  This is currently 0, but may change in the future.
      qpfiles no longer use -1 to indicate "auto"; QP is just omitted.  The old method should still work though.
      
      CRF values now make sense in high bit depth mode.
      --qp should be used for lossless mode, not --crf.
      --crf 0 will still work as expected in 8-bit mode, but won't be lossless with higher bit depths.
      Add bit depth to statsfiles.
      
      These changes are required to make the QP interface sensible in combination with high bit depth.
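A small sketch of the two ideas in this change: the "QP plus one" picture field with an auto sentinel, and the wider QP range at higher bit depths. The Python constant mirrors the value stated in the commit message, and the helper names are hypothetical. The 6-steps-per-bit offset follows the H.264 QpBdOffset definition, which is one way to see why a CRF value that is lossless in 8-bit mode may not be at higher bit depths.

```python
X264_QP_AUTO = 0  # value stated in the commit message; may change


def picture_qp(i_qpplus1):
    """Decode the "QP plus one" field: None means let the encoder decide."""
    return None if i_qpplus1 == X264_QP_AUTO else i_qpplus1 - 1


def qp_bit_depth_offset(bit_depth):
    """Extra QP range available above 8-bit: 6 steps per extra bit."""
    return 6 * (bit_depth - 8)
```

Comparing against the named constant rather than a literal keeps callers correct if the sentinel value changes later, as the commit message warns it may.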
    • Add numeric names for the presets (0==ultrafast ... 9==placebo) · 3d96daca
      Loren Merritt authored
      This mapping will of course change if new presets are added in between, but will always be ordered from fastest to slowest.
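A sketch of resolving either a preset name or its numeric alias. The commit message only states the endpoints (0 and 9); the intermediate names below are taken from x264's preset list as of this change and, as noted above, the mapping may shift if presets are added. The function `resolve_preset` is hypothetical.

```python
PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
           "medium", "slow", "slower", "veryslow", "placebo"]


def resolve_preset(value):
    """Accept either a preset name or its numeric index as a string."""
    if value.isdecimal():
        index = int(value)
        if index >= len(PRESETS):
            raise ValueError(f"unknown preset: {value!r}")
        return PRESETS[index]
    if value not in PRESETS:
        raise ValueError(f"unknown preset: {value!r}")
    return value
```

Because the list is ordered from fastest to slowest, a larger number always means a slower preset.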