1. 11 Apr, 2010 1 commit
  2. 27 Mar, 2010 5 commits
    • New "superfast" preset, much faster intra analysis · 0b720fee
      Fiona Glaser authored
      Especially at the fastest settings, intra analysis was taking up the majority of MB analysis time.
      This patch takes a ton more shortcuts at the fastest encoding settings, decreasing compression 0.5-5% but improving speed greatly.
      Also rearrange the fastest presets a bit: now we have ultrafast, superfast, veryfast, faster.
      superfast is the old veryfast (but much faster due to this patch).
      veryfast is between the old veryfast and faster.
      faster is the same as before except with MB-tree on.
      
      Encoding with subme >= 5 should be unaffected by this patch.
    • Add tune for still image compression · aad44376
      Fiona Glaser authored
      There has been some demand for this from companies looking to use x264 for still image compression (it can outperform JPEG or JPEG-2000 by a factor of 2 or more).
      Still image compression is a bit different; because temporal stability isn't an issue, we can get away with far more powerful psy settings.
    • "CRF-max" support with VBV · 7ff23daa
      Fiona Glaser authored
      This is a rather curious feature that may have more use than is initially obvious.
      In CRF mode with VBV enabled, CRF-max allows the user to specify a quality level which the encoder will never go below, even due to the effects of VBV.
      This is not the same as qpmax, which is not aware of issues like scene complexity.
      Setting this WILL cause VBV underflows in any situation where the encoder would have needed to exceed the relevant CRF to avoid underflow.
      
      Why might one want to do this even if it would cause VBV underflows?
      In the case of streaming, particularly ultra-low-latency streaming, it may be preferable to drop frames rather than display frames that are of too low a quality.
      Thus, in extremely complex scenes, rather than display completely awful video, the streaming server could simply drop to a lower framerate.
      Scenecuts, which normally look terrible under situations like single-frame VBV, could be handled by just displaying them a bit later and dropping frames to compensate.
      In other words, it's better to see the scenecut 150ms delayed than for it to look like a blocky mess for 150ms.
      
      On the caller-side, this would be handled by detecting the output size of x264's frames and dropping future frames to compensate if necessary.
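
      A minimal sketch of such caller-side bookkeeping, assuming a fixed per-frame byte budget (channel rate divided by framerate). The struct and function names are hypothetical and not part of libx264; a real application would feed in the sizes returned by the encoder.

```c
/* Hypothetical caller-side state; illustrative only, not a libx264 type. */
typedef struct {
    long budget_per_frame; /* bytes the channel carries per frame interval */
    long backlog;          /* bytes still queued from oversized frames */
} drop_state_t;

/* Call once per captured frame, before encoding.
 * Returns 1 if the frame should be dropped so the backlog can drain. */
static int should_drop(drop_state_t *s)
{
    if (s->backlog >= s->budget_per_frame) {
        s->backlog -= s->budget_per_frame; /* channel drains during the skipped interval */
        return 1;
    }
    return 0;
}

/* Call after encoding a frame, with its output size in bytes. */
static void account_frame(drop_state_t *s, long frame_bytes)
{
    s->backlog += frame_bytes - s->budget_per_frame;
    if (s->backlog < 0)
        s->backlog = 0; /* unused capacity is not banked in this sketch */
}
```

      With a budget of 1000 bytes per frame, a single 3500-byte frame would cause the next two input frames to be dropped before encoding resumes.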
      
      This can also be used in normal encoding simply to ensure that VBV does not hurt quality too much (at the cost of potentially causing underflows).
      This can help quite a lot when using single-frame VBV and sliced threads, where VBV can often be somewhat unstable.
    • Blu-ray support: NAL-HRD, VFR ratecontrol, filler, pulldown · bb9b16b4
      Kieran Kunhya authored
      x264 can now generate Blu-ray-compliant streams for authoring Blu-ray Discs!
      Compliance tested using Sony BD-ROM Verifier 1.21.
      Thanks to The Criterion Collection for sponsoring compliance testing!
      
      An example command, using constant quality mode, for 1080p24 content:
      x264 --crf 16 --preset veryslow --tune film --weightp 0 --bframes 3 --nal-hrd vbr --vbv-maxrate 40000 --vbv-bufsize 30000 --level 4.1 --keyint 24 --b-pyramid strict --slices 4 --aud --colorprim "bt709" --transfer "bt709" --colormatrix "bt709" --sar 1:1 <input> -o <output>
      
      This command is much more complicated than usual due to the very complicated restrictions the Blu-ray spec has.
      Most options after "tune" are required by the spec.
      --weightp 0 is not, but there are known bugged Blu-ray player chipsets (Mediatek, notably) that will decode video with --weightp 1 or 2 incorrectly.
      Furthermore, note the Blu-ray spec has very strict limitations on allowed resolution/fps combinations.
      Examples include 1080p @ 24000/1001fps (NTSC FILM) and 720p @ 60000/1001fps.
      
      Detailed features introduced in this patch:
      
      Full NAL-HRD compliance, with both VBR (no filler) and CBR (filler) modes.
      Can be enabled with --nal-hrd vbr/cbr.
      libx264 now returns HRD timing information to the caller in the form of an x264_hrd_t.
      x264cli doesn't currently use it, but this information is critical for compliant TS muxing.
      
      Full VFR ratecontrol support: VBV, 1-pass ABR, and 2-pass modes.
      This means that, even without knowing the average framerate, x264 can achieve a correct bitrate in target bitrate modes.
      Note that this changes the statsfile format; first pass encodes made before this patch will have to be re-run.
      
      Pulldown support: libx264 allows the calling application to specify a pulldown mode for each frame.
      This is similar to the way that RFFs (Repeat Field Flags) work in MPEG-2.
      Note that libx264 does not modify timestamps: it assumes the calling application has set timestamps correctly for pulldown!
      x264cli contains an example implementation of caller-side pulldown code.
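
      As an illustration of what "setting timestamps correctly for pulldown" can mean on the caller side, the sketch below computes presentation timestamps for classic 3:2 pulldown, measured in field durations. The table and function name are assumptions made for this example; this is not the x264cli implementation.

```c
#include <stdint.h>

/* Fields shown per coded frame in classic 3:2 pulldown (24 frames -> 60 fields).
 * Illustrative table; not libx264's internal representation. */
static const int fields_32[4] = { 3, 2, 3, 2 };

/* PTS of coded frame n, in units of one field duration. */
static int64_t pulldown32_pts(int n)
{
    int64_t pts = (int64_t)(n / 4) * 10; /* each 4-frame cycle spans 10 fields */
    for (int i = 0; i < n % 4; i++)
        pts += fields_32[i];
    return pts;
}
```

      Frames 0 through 4 receive timestamps 0, 3, 5, 8 and 10, so 24 coded frames span exactly 60 field durations.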
      
      Pic_struct support: necessary for pulldown and allows interlaced signalling.
      Also signal TFF vs BFF with delta_poc_bottom: should significantly improve interlaced compression.
      --tff and --bff should be preferred to the old --interlaced in order to tell x264 what field order to use.
      
      Huge thanks to Alex Giladi and Lamont Alston for their work on code that eventually became part of this patch.
    • Mixed-refs support for B-frames · 1f9393eb
      Alex Wright authored
      Small speed cost, usually a few percent at most. Generally has lowest cost in cases when it isn't very useful. Up to ~2% better compression overall on highly complex sources.
      
      Also fix a few minor bugs in B-frame analysis and various bits of cleanup.
  3. 23 Feb, 2010 4 commits
  4. 14 Feb, 2010 1 commit
    • Add ability to adjust ratecontrol parameters on the fly · 34c42187
      Fiona Glaser authored
      encoder_reconfig and x264_picture_t->param can now be used to change ratecontrol parameters.
      This is extraordinarily useful in certain streaming situations where the encoder needs to adapt the bitrate to network circumstances.
      
      What can be changed:
      1) CRF can be adjusted if in CRF mode.
      2) VBV maxrate and bufsize can be adjusted if in VBV mode.
      3) Bitrate can be adjusted if in CBR mode.
      However, x264 cannot switch between modes and cannot change bitrate in ABR mode.
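
      The rules above can be modelled as a small predicate. This is only a sketch of the stated restrictions: the enums and the function are hypothetical and do not exist in libx264.

```c
/* Hypothetical model of the rules above; these are not libx264 types. */
typedef enum { RC_CRF, RC_ABR, RC_CBR } rc_mode_t;
typedef enum { CHG_CRF, CHG_VBV, CHG_BITRATE, CHG_MODE } rc_change_t;

/* Returns 1 if the requested on-the-fly change is permitted. */
static int reconfig_allowed(rc_mode_t mode, int vbv_enabled, rc_change_t change)
{
    switch (change) {
    case CHG_CRF:     return mode == RC_CRF;  /* rule 1 */
    case CHG_VBV:     return vbv_enabled;     /* rule 2 */
    case CHG_BITRATE: return mode == RC_CBR;  /* rule 3; ABR is excluded */
    default:          return 0;               /* switching modes is never allowed */
    }
}
```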
      
      Also fix a bug where x264_picture_t->param reconfig method would not always be frame-exact.
      
      Commit sponsored by SayMama video calling.
  5. 30 Jan, 2010 1 commit
    • Improve DTS generation, move DTS compression into libx264 · afc36d0b
      Yusuke Nakamura authored
      This change fixes some cases in which PTS could be less than DTS.
      
      Additionally, a new parameter, b_dts_compress, enables DTS compression.
      DTS compression eliminates negative DTS (i.e. initial delay) due to B-frames.
      The algorithm changes timebase in order to avoid duplicating DTS.
      Currently, in x264cli, only the FLV muxer uses it.  The MP4 muxer doesn't need it, as it uses an EditBox instead.
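
      To illustrate the idea of trading a finer timebase for non-negative, non-duplicated DTS, here is a simplified sketch. It assumes constant-framerate input with PTS 0, 1, 2, ... and a timebase denominator multiplied by (delay + 1); it is not taken from libx264 and the real algorithm may differ in detail.

```c
#include <stdint.h>

/* Simplified illustration, not libx264's actual code.
 * n:     frame index in decode order (0-based)
 * delay: B-frame reordering delay in frames
 * Output is in a timebase whose denominator is multiplied by (delay + 1),
 * so an original PTS tick t becomes t * (delay + 1). */
static int64_t compressed_dts(int n, int delay)
{
    int mult = delay + 1;
    if (n <= delay)
        return n;                       /* squeeze the first DTS values into the finer ticks */
    return (int64_t)(n - delay) * mult; /* afterwards: frame-spaced, shifted by the delay */
}
```

      With a delay of 1 the decode-order DTS sequence is 0, 1, 2, 4, 6, ..., which never goes negative and never repeats a value.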
  6. 14 Jan, 2010 3 commits
    • Fix free callback, add x264_encoder_parameters function · 398d0eb3
      Fiona Glaser authored
      x264 would try to use the passed param struct after freeing if the param_free callback was set.
      Probably didn't cause any issues, as probably no programs used the callback in this location yet.
      
      A new x264_encoder_parameters function is now available in the API.
      This function lets the calling application grab the current state of the encoder's parameters.
      Use this in x264cli to ensure that the param struct used for set_param is updated with whatever changes x264_encoder_open has made to it.
      
      Patch partially by Anton Mitrofanov <BugMaster@narod.ru>.
    • Periodic intra refresh · cde39046
      Fiona Glaser authored
      Uses SEI recovery points, a moving vertical "bar" of intra blocks, and motion vector restrictions to eliminate keyframes.
      Attempt to hide the visual appearance of the intra bar when --no-psy isn't set.
      Enabled with --intra-refresh.
      The refresh interval is controlled using keyint, but won't exceed the number of macroblock columns in the frame.
      Greatly benefits low-latency streaming by making it possible to achieve constant framesize without intra-only encoding.
      Combined with slice-max-size for one slice per packet, tests suggest effective resilience against packet loss as high as 25%.
      x264 is now the best free software low-latency video encoder in the world.
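
      A simplified model of the moving intra bar, assuming the bar sweeps the frame evenly once per refresh interval. The function names are hypothetical and the column scheduling inside x264 may differ.

```c
/* Refresh interval: keyint, capped at the number of macroblock columns. */
static int refresh_interval(int keyint, int mb_cols)
{
    return keyint < mb_cols ? keyint : mb_cols;
}

/* Half-open range [*start, *end) of macroblock columns forced to intra
 * in frame number `frame`. Simplified model, not libx264's scheduling. */
static void intra_bar(int frame, int keyint, int mb_cols, int *start, int *end)
{
    int interval = refresh_interval(keyint, mb_cols);
    int pos = frame % interval;
    *start = pos * mb_cols / interval;
    *end   = (pos + 1) * mb_cols / interval;
}
```

      For a 1280-pixel-wide frame (80 macroblock columns) and keyint 20, each frame refreshes a 4-column bar and the whole picture is refreshed every 20 frames.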
      
      Accordingly, change the API to add b_keyframe to the parameters present in output pictures.
      Calling applications should check this to see if a frame is seekable, not the frame type.
      
      Also make x264's motion estimation strictly abide by horizontal MV range limits in order for PIR to work.
      Also fix a major bug in sliced-threads VBV handling.
      Also change "auto" threads for sliced threads to "cores" instead of "1.5*cores" after performance testing.
      Also simplify ratecontrol's checking of first pass options.
      Also some minor tweaks to row-based VBV that should improve VBV accuracy on small frames.
    • LAVF/FFMS input support, native VFR timestamp handling · 30d76a5e
      Kieran Kunhya authored
      libx264 now takes three new API parameters.
      b_vfr_input tells x264 whether or not the input is VFR, and is 1 by default.
      i_timebase_num and i_timebase_den pass the timebase to x264.
      
      x264_picture_t now returns the DTS of each frame: the calling app need not calculate it anymore.
      
      Add libavformat and FFMS2 input support: requires libav* and ffms2 libraries respectively.
      FFMS2 is _STRONGLY_ preferred over libavformat: we encourage all distributions to compile with FFMS2 support if at all possible.
      FFMS2 can be found at http://code.google.com/p/ffmpegsource/.
      --index, a new x264cli option, allows the user to store (or load) an FFMS2 index file for future use, to avoid re-indexing in the future.
      
      Overhaul the muxers to pass through timestamps instead of assuming CFR.
      Also overhaul muxers to correctly use b_annexb and b_repeat_headers to simplify the code.
      Remove VFW input support, since it's now pretty much redundant with native AVS support and LAVF support.
      Finally, overhaul a large part of the x264cli internals.
      
      --force-cfr, a new x264cli option, allows the user to force the old method of timestamp handling.  May be useful in case of a source with broken timestamps.
      Avisynth, YUV, and Y4M input are all still CFR.  LAVF or FFMS2 must be used for VFR support.
      
      Do note that this patch does *not* add VFR ratecontrol yet.
      Support for telecined input is also somewhat dubious at the moment.
      
      Large parts of this patch by Mike Gurlitz <mike.gurlitz@gmail.com>, Steven Walters <kemuri9@gmail.com>, and Yusuke Nakamura <muken.the.vfrmaniac@gmail.com>.
  7. 15 Dec, 2009 1 commit
  8. 09 Dec, 2009 1 commit
    • Bring back slice-based threading support · 6f221210
      Fiona Glaser authored
      Enabled with --sliced-threads
      Unlike normal threading, adds no encoding latency.
      Less efficient than normal threading, both performance and compression-wise.
      Useful for low-latency encoding environments where performance is still important, such as HD videoconferencing.
      Add --tune zerolatency, which eliminates all x264 encoder-side latency (no delayed frames at all).
      Some tweaks to VBV ratecontrol and lookahead (in addition to those required by sliced threading).
      Commit sponsored by a media streaming company that wishes to remain anonymous.
  9. 30 Nov, 2009 1 commit
    • Enhanced Avisynth input support · 025f01db
      Steven Walters authored
      Requires avisynth_c.h from the Avisynth API headers.
      Reports errors properly from Avisynth script input.
      Automatically construct input scripts for almost any input file.
      Tries ffmpegsource2, DSS2, directshowsource, and many other sourcing methods, based on the input file extension.
      Automatically converts to YV12.
  10. 09 Nov, 2009 1 commit
    • Weighted P-frame prediction · ccac8546
      Dylan Yudaken authored
      Merge Dylan's Google Summer of Code 2009 tree.
      Detect fades and use weighted prediction to improve compression and quality.
      "Blind" mode provides a small overall quality increase by using a -1 offset without doing any analysis, as described in JVT-AB033.
      "Smart", the default mode, also performs fade detection and decides weights accordingly.
      MB-tree takes into account the effects of "smart" analysis in lookahead, even further improving quality in fades.
      If psy is on, mbtree is on, interlaced is off, and weightp is off, fade detection will still be performed.
      However, it will be used to adjust quality rather than to create actual weights.
      This will improve quality in fades when encoding in Baseline profile.
      
      Doesn't add support for interlaced encoding with weightp yet.
      Only adds support for luma weights, not chroma weights.
      Internal code for chroma weights is in, but there's no analysis yet.
      Baseline profile requires that weightp be off.
      All weightp modes may cause minor breakage in non-compliant decoders that take shortcuts in deblocking reference frame checks.
      "Smart" may cause serious breakage in non-compliant decoders that take shortcuts in handling of duplicate reference frames.
      
      Thanks to Google for sponsoring our most successful Summer of Code yet!
  11. 19 Oct, 2009 1 commit
    • Make B-pyramid spec-compliant · cf5ba813
      Lamont Alston authored
      The rules of the specification with regard to picture buffering for pyramid coding are widely ignored.
      x264's b-pyramid implementation, despite being practically identical to that proposed by the original paper, was technically not compliant.
      Now it is.
      Two modes are now available:
      1) strict b-pyramid, while worse for compression, follows the rule mandated by Blu-ray (no P-frames can reference B-frames)
      2) normal b-pyramid, which is like the old mode except fully compliant.
      This patch also adds MMCO support (necessary for compliant pyramid in some cases).
      MB-tree still doesn't support b-pyramid (but will soon).
  12. 07 Oct, 2009 1 commit
    • Constrained intra prediction support · 7639d496
      Fiona Glaser authored
      Enable with --constrained-intra.  Significantly reduces compression, but required for the base layer of SVC encodes and maybe some other use-cases.
      
      Commit sponsored by a media streaming company that wishes to remain anonymous.
  13. 21 Sep, 2009 1 commit
    • Major API change: encapsulate NALs within libx264 · 7a0fbed7
      Fiona Glaser authored
      libx264 now returns NAL units instead of raw data.  x264_nal_encode is no longer a public function.
      See x264.h for full documentation of changes.
      New parameter: b_annexb, on by default.  If disabled, startcodes are replaced by sizes as in mp4.
      x264's VBV now works on a NAL level, taking into account escape codes.
      VBV will also take into account the bit cost of SPS/PPS, but only if b_repeat_headers is set.
      Add an overhead tracking system to VBV to better predict the constant overhead of frames (headers, NALU overhead, etc).
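
      The difference between the two framings selected by b_annexb can be sketched as follows, assuming 4-byte start codes and 4-byte big-endian length fields. The helper is hypothetical and not part of the libx264 API, which returns already-framed NAL units.

```c
#include <stdint.h>
#include <string.h>

/* Write one NAL payload into dst with either an Annex B start code or a
 * 4-byte big-endian length prefix. Returns the number of bytes written.
 * Illustrative helper; dst must have room for len + 4 bytes. */
static size_t write_nal(uint8_t *dst, const uint8_t *payload, uint32_t len, int annexb)
{
    if (annexb) {
        static const uint8_t startcode[4] = { 0, 0, 0, 1 };
        memcpy(dst, startcode, 4);
    } else {
        dst[0] = (uint8_t)(len >> 24);
        dst[1] = (uint8_t)(len >> 16);
        dst[2] = (uint8_t)(len >> 8);
        dst[3] = (uint8_t)len;
    }
    memcpy(dst + 4, payload, len);
    return (size_t)len + 4;
}
```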
  14. 02 Sep, 2009 2 commits
    • Threaded lookahead · 6940dcae
      Steven Walters authored
      Move lookahead into a separate thread, set to higher priority than the other threads, for optimal performance.
      Reduces the amount that lookahead bottlenecks encoding, greatly increasing performance with lookahead-intensive settings (e.g. b-adapt 2) on many-core CPUs.
      Buffer size can be controlled with --sync-lookahead, which defaults to auto (threads+bframes buffer size).
      Note that this buffer is separate from the rc-lookahead value.
      Note also that this does not split lookahead itself into multiple threads yet; this may be added in the future.
      Additionally, split frames into "fdec" and "fenc" frame types and keep the two separate.
      This split greatly reduces memory usage, which helps compensate for the larger lookahead size.
      Extremely special thanks to Michael Kazmier and Alex Giladi of Avail Media, the original authors of this patch.
    • Force a link error in case of incompatible API · 7df6f5d6
      Fiona Glaser authored
      This is because the number of bug reports due to miscompiled ffmpeg builds is reaching critical mass.
      The name of x264_encoder_open is now #defined based on the current X264_BUILD.
      Note that this changes the calling convention required for dlopen, but not for ordinary calls to x264_encoder_open.
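
      The macro technique can be demonstrated in isolation. The sketch below uses stand-in names (DEMO_BUILD, demo_open) rather than the real x264.h identifiers, and the build number is arbitrary.

```c
#define DEMO_BUILD 79 /* stand-in for X264_BUILD; the value is arbitrary here */

/* Two-level paste so DEMO_BUILD is expanded before ## is applied. */
#define DEMO_GLUE1(x, y) x##y
#define DEMO_GLUE2(x, y) DEMO_GLUE1(x, y)
#define demo_open DEMO_GLUE2(demo_open_, DEMO_BUILD)

/* Helpers that turn the expanded symbol name into a string for inspection. */
#define DEMO_STR1(x) #x
#define DEMO_STR2(x) DEMO_STR1(x)

/* After preprocessing, this function is really named demo_open_79. */
static int demo_open(void)
{
    return DEMO_BUILD;
}

/* Returns the symbol name that callers of demo_open() actually link against. */
static const char *demo_open_symbol(void)
{
    return DEMO_STR2(demo_open);
}
```

      Code compiled against a header with a different build number would reference a differently named symbol and fail at link time. A dlopen-based caller has to look up the versioned name explicitly, which is the calling-convention change noted above.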
  15. 31 Aug, 2009 1 commit
    • Multi-slice encoding support · 4ccbb199
      Fiona Glaser authored
      Slicing support is available through three methods (which can be mixed):
      --slices sets a number of slices per frame and ensures rectangular slices (required for Blu-ray).  Overridden by either of the following options:
      --slice-max-mbs sets a maximum number of macroblocks per slice.
      --slice-max-size sets a maximum slice size, in bytes (includes NAL overhead).
      Implement macroblock re-encoding support to allow highly accurate slice size limitation.  Might be useful for other things in the future, too.
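
      The arithmetic behind the first two options can be sketched as below. The function names are hypothetical, and the even split of rows is an assumption; x264's exact rounding of slice boundaries may differ.

```c
/* Number of slices needed when each slice holds at most max_mbs macroblocks. */
static int slices_for_max_mbs(int total_mbs, int max_mbs)
{
    return (total_mbs + max_mbs - 1) / max_mbs; /* ceiling division */
}

/* First macroblock row of slice i when mb_rows rows are split into
 * n whole-row (rectangular) slices of near-equal height. */
static int slice_first_row(int i, int n, int mb_rows)
{
    return i * mb_rows / n;
}
```

      A 1920x1080 frame has 120x68 = 8160 macroblocks, so a limit of 1000 macroblocks per slice needs 9 slices, while 4 rectangular slices start at rows 0, 17, 34 and 51.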
  16. 20 Aug, 2009 1 commit
  17. 19 Aug, 2009 1 commit
    • Add support for frame-accurate parameter changes · c83699f1
      Fiona Glaser authored
      Parameter structs can now be passed with individual frames.
      The previous method would only change the parameter of what was currently being encoded, which due to delay might be very far from an intended exact frame.
      Also add support for changing aspect ratio.  Only works in a stream with repeating headers and requires the caller to force an IDR to ensure instant effect.
  18. 13 Aug, 2009 1 commit
  19. 08 Aug, 2009 1 commit
  20. 07 Aug, 2009 1 commit
    • Macroblock-tree ratecontrol · 835ccc3c
      Fiona Glaser authored
      On by default; can be turned off with --no-mbtree.
      Uses a large lookahead to track temporal propagation of data and weight quality accordingly.
      Requires a very large separate statsfile (2 bytes per macroblock) in multi-pass mode.
      Doesn't work with b-pyramid yet.
      Note that MB-tree inherently measures quality differently from the standard qcomp method, so bitrates produced by CRF may change somewhat.
      This makes the "medium" preset a bit slower.  Accordingly, make "fast" slower as well, and introduce a new preset "faster" between "fast" and "veryfast".
      All presets "fast" and above will have MB-tree on.
      Add a new option, --rc-lookahead, to control the distance MB tree looks ahead to perform propagation analysis.
      Default is 40; larger values will be slower and require more memory but give more accurate results.
      This value will be used in the future to control ratecontrol lookahead (VBV).
      Add a new option, --no-psy, to disable all psy optimizations that don't improve PSNR or SSIM.
      This disables psy-RD/trellis, but also other more subtle internal psy optimizations that can't be controlled directly via external parameters.
      Quality improvement from MB-tree is about 2-70% depending on content.
      Strength of MB-tree adjustments can be tweaked using qcompress; higher values mean lower MB-tree strength.
      Note that MB-tree may perform slightly suboptimally on fades; this will be fixed by weighted prediction, which is coming soon.
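
      To get a feel for the "2 bytes per macroblock" statsfile cost mentioned above, the following sketch gives an upper-bound estimate. It assumes every frame is stored and ignores any per-frame header; the function name is hypothetical.

```c
#include <stdint.h>

/* Rough upper bound on MB-tree statsfile payload:
 * 2 bytes per 16x16 macroblock per frame, headers ignored. */
static int64_t mbtree_stats_bytes(int width, int height, int64_t frames)
{
    int64_t mb_w = (width + 15) / 16;  /* macroblock columns, rounded up */
    int64_t mb_h = (height + 15) / 16; /* macroblock rows, rounded up */
    return 2 * mb_w * mb_h * frames;
}
```

      Under these assumptions a 1920x1080 frame costs 16320 bytes, or roughly 16 MB per 1000 frames.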
  21. 20 Jul, 2009 1 commit
    • New AQ algorithm option · 2e1db1f6
      Anton Mitrofanov authored
      "Auto-variance" uses log(var)^2 instead of log(var) and attempts to adapt strength per-frame.
      Generates significantly better SSIM; on by default with --tune ssim.
      Whether it generates visually better quality is still up for debate.
      Available as --aq-mode 2.
  22. 07 Jul, 2009 1 commit
    • Totally new preset system for x264.c (not libx264), new defaults · 71b9d885
      Fiona Glaser authored
      Other new features include "tune" and "profile" settings; see --help for more details.
      Unlike most other settings, "preset" and "tune" act before all other options.
      However, "profile" acts afterwards, overriding all other options.
      Our defaults have also changed: new defaults are --subme 7 --bframes 3 --8x8dct --no-psnr --no-ssim --threads auto --ref 3 --mixed-refs --trellis 1 --weightb --crf 23 --progress.
      Users will hopefully find these changes to greatly improve usability.
  23. 17 Mar, 2009 1 commit
    • SSE2 zigzag_interleave · d25d50c9
      Fiona Glaser authored
      Replace PHADD with FastShuffle (more accurate naming).
      This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.
  24. 04 Mar, 2009 1 commit
    • Remove non-pre scenecut · 42f27d04
      Fiona Glaser authored
      Add support for no-b-adapt + pre-scenecut (patch by BugMaster)
      Pre-scenecut was generally better than regular scenecut in terms of accuracy and regular scenecut didn't work in threaded mode anyways.
      Add no-scenecut option (scenecut=0 is now no scenecut; previously it was -1)
      Fix an incorrect bias towards P-frames near scenecuts with B-adapt 2.
      Simplify pre-scenecut code.
  25. 20 Jan, 2009 1 commit
    • Eliminate support for direct_8x8_inference=0 · 1f0e78d8
      Fiona Glaser authored
      The benefit in the most extreme contrived situation was at most 0.001 dB PSNR, at the cost of slower decoding.
      As this option was basically useless, it was a waste of code and prevented some other useful optimizations.
      Remove some unused mc code related to sub-8x8 partitions.
      Small deblocking speedup when p4x4 is used.
      Also remove unused x264_nal_decode prototype from x264.h.
  26. 14 Jan, 2009 1 commit
    • Support forced frametypes with scenecut/b-adapt · 6b4b85f1
      Fiona Glaser authored
      This allows an input qpfile to be used to force I-frames, for example.
      The same can be done through the library interface.
      Document the format of the qpfile in --longhelp and the forcing of frametypes in x264.h
      Note that forcing B-frames and B-refs may not always have the intended result.
      Patch partially by Steven Walters <kemuri9@gmail.com>.
  27. 31 Dec, 2008 1 commit
  28. 23 Nov, 2008 1 commit
    • Phenom CPU optimizations · 80ea99c0
      Fiona Glaser authored
      Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
      Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
      Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
      Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
      Merge cpu-32.asm and cpu-64.asm
      Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
  29. 05 Nov, 2008 1 commit
    • Initial Nehalem CPU optimizations · 1bf7228f
      Fiona Glaser authored
      movaps/movups are no longer equivalent to their integer equivalents on the Nehalem, so that substitution is removed.
      Nehalem has a much lower cacheline split penalty than previous Intel CPUs, so cacheline workarounds are no longer necessary.
      Thanks to Intel for providing Avail Media with the pre-release Nehalem CPU needed to prepare these (and other not-yet-committed) optimizations.
      Overall speed improvement with Nehalem vs Penryn at the same clock speed is around 40%.
  30. 02 Oct, 2008 1 commit
    • Rework subme system, add RD refinement in B-frames · 60455fff
      Fiona Glaser authored
      The new system is as follows: subme6 is RD in I/P frames, subme7 is RD in all frames, subme8 is RD refinement in I/P frames, and subme9 is RD refinement in all frames.
      subme6 == old subme6, subme7 == old subme6+brdo, subme8 == old subme7+brdo, subme9 == no equivalent
      --b-rdo has, accordingly, been removed.  --bime has also been removed, and instead enabled automatically at subme >= 5.
      RD refinement in B-frames (subme9) includes both qpel-RD and an RD version of bime.
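
      The level mapping described above can be written as two predicates. This is a sketch that assumes the levels are cumulative (each level includes everything below it); the function names are hypothetical and not taken from the x264 source.

```c
/* Does this subme level use RD mode decision for the given frame type?
 * subme 6: I/P frames only; subme 7 and up: all frames. */
static int uses_rd(int subme, int is_b_frame)
{
    return is_b_frame ? subme >= 7 : subme >= 6;
}

/* Does this subme level use RD refinement for the given frame type?
 * subme 8: I/P frames only; subme 9: all frames. */
static int uses_rd_refinement(int subme, int is_b_frame)
{
    return is_b_frame ? subme >= 9 : subme >= 8;
}
```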