1. 31 Mar, 2010 1 commit
  2. 27 Mar, 2010 36 commits
    • Henrik Gramner's avatar
      Update copyright year in SEI header · d427ae20
      Henrik Gramner authored
    • Fiona Glaser's avatar
      New "superfast" preset, much faster intra analysis · 0b720fee
      Fiona Glaser authored
      Especially at the fastest settings, intra analysis was taking up the majority of MB analysis time.
      This patch takes a ton more shortcuts at the fastest encoding settings, decreasing compression 0.5-5% but improving speed greatly.
      Also rearrange the fastest presets a bit: now we have ultrafast, superfast, veryfast, faster.
      superfast is the old veryfast (but much faster due to this patch).
      veryfast is between the old veryfast and faster.
      faster is the same as before except with MB-tree on.
      Encoding with subme >= 5 should be unaffected by this patch.
    • Henrik Gramner's avatar
      Cosmetics in mvd handling · 54e09223
      Henrik Gramner authored
      Use a 2D array instead of doing manual pointer arithmetic.
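      As a minimal illustration of the idea (the names and the entry count below are hypothetical, not x264's actual mvd layout), both accessors write the same memory, but the 2D form leaves the index math to the compiler:

```c
#include <stdint.h>

/* Hypothetical entry count, for illustration only. */
enum { MVD_ENTRIES = 40 };

/* Before: flat buffer with hand-written pointer arithmetic. */
static void set_mvd_flat(int16_t *mvd, int idx, int mx, int my)
{
    int16_t *p = mvd + idx * 2;   /* two components per entry */
    p[0] = (int16_t)mx;
    p[1] = (int16_t)my;
}

/* After: 2D array, the [entry][component] indexing is explicit. */
static void set_mvd_2d(int16_t mvd[][2], int idx, int mx, int my)
{
    mvd[idx][0] = (int16_t)mx;
    mvd[idx][1] = (int16_t)my;
}
```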
    • Fiona Glaser's avatar
      Add tune for still image compression · aad44376
      Fiona Glaser authored
      There has been some demand for this from companies looking to use x264 for still image compression (it can outperform JPEG or JPEG-2000 by a factor of 2 or more).
      Still image compression is a bit different; because temporal stability isn't an issue, we can get away with far more powerful psy settings.
    • Henrik Gramner's avatar
      Pad non-mod16 resolutions using the correct field · 774dbb47
      Henrik Gramner authored
      Improves compression of interlaced videos with non-mod16 heights.
    • Fiona Glaser's avatar
      Document slow/fast firstpass in --fullhelp · e4404fa3
      Fiona Glaser authored
    • Holger Lubitz's avatar
      Fix some misattributions in profiling · 084adc2e
      Holger Lubitz authored
      Cycles spent in load_hadamard and the avg2 w16 ssse3 cacheline split code were misattributed.
    • Fiona Glaser's avatar
      Much faster non-RD intra analysis · e77bbb6a
      Fiona Glaser authored
      Since every pred mode costs at least 1 bit, move that part into the initial SATD cost.
      This lets i4x4/i8x8 analysis terminate earlier.
      If the cost of the predicted mode is less than the cost of signalling any other mode, early-terminate the analysis.
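      The early termination described above might look roughly like the sketch below. It is not the patch's code: pick_intra_mode and its parameters are hypothetical, and the bit counts assume CAVLC signalling of intra 4x4 modes (1 bit for the predicted mode, 4 bits for any other).

```c
/* Bits to signal an intra 4x4 mode, assuming CAVLC: one flag bit if the
 * mode equals the predicted one, otherwise the flag plus a 3-bit remainder. */
enum { BITS_PRED_MODE = 1, BITS_OTHER_MODE = 4 };

/* Hypothetical helper: satd[m] is the distortion of mode m, lambda scales
 * bits into the same units. Returns the chosen mode and reports how many
 * modes were actually costed. */
static int pick_intra_mode(const int *satd, int n_modes, int pred_mode,
                           int lambda, int *modes_checked)
{
    int best_mode = pred_mode;
    int best_cost = satd[pred_mode] + lambda * BITS_PRED_MODE;
    *modes_checked = 1;

    /* No other mode can cost less than its signalling bits alone,
     * so the predicted mode wins without looking at the rest. */
    if (best_cost < lambda * BITS_OTHER_MODE)
        return best_mode;

    for (int m = 0; m < n_modes; m++) {
        if (m == pred_mode)
            continue;
        int cost = satd[m] + lambda * BITS_OTHER_MODE;
        (*modes_checked)++;
        if (cost < best_cost) {
            best_cost = cost;
            best_mode = m;
        }
    }
    return best_mode;
}
```

      On flat or well-predicted blocks the first test usually succeeds, which is where the speedup at the fastest settings would come from.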
    • Fiona Glaser's avatar
      Fix stack alignment in sliced threads · d8d83a96
      Fiona Glaser authored
      Could cause crashes when called from non-GCC-compiled applications.
    • Henrik Gramner's avatar
      Cosmetics: use sizeof() where appropriate · 18eed0b9
      Henrik Gramner authored
    • Fiona Glaser's avatar
      Split up analyse_init · 137e233f
      Fiona Glaser authored
      Save some time by avoiding some unnecessary inits and moving other parts to per-thread init.
    • Henrik Gramner's avatar
      Reduce stack usage of b-adapt 2's trellis · 7a282a58
      Henrik Gramner authored
      Also remove some redundant code.
    • Fiona Glaser's avatar
      Various motion estimation optimizations · 37b4707b
      Fiona Glaser authored
      Faster method of checking MV range.
      Predict MVs and cache MVs/MVDs for bidir qpel-RD.
      A whole bunch of other minor optimizations.
      Slightly better performance and compression.
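      The commit message does not say which range check is used. One common trick, shown here purely as an assumption and not as the patch's method, folds the two comparisons of a range test into a single unsigned comparison:

```c
/* Returns 1 if min <= v <= max, using one comparison instead of two.
 * Requires min <= max. Unsigned arithmetic wraps, so any v below min
 * becomes a huge value that fails the test. */
static int in_range(int v, int min, int max)
{
    return (unsigned)v - (unsigned)min <= (unsigned)max - (unsigned)min;
}

/* Hypothetical wrapper: checks both MV components against limits
 * given as {min, max} pairs. */
static int mv_in_range(int mx, int my, const int lim_x[2], const int lim_y[2])
{
    return in_range(mx, lim_x[0], lim_x[1]) & in_range(my, lim_y[0], lim_y[1]);
}
```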
    • Fiona Glaser's avatar
      Overhaul macroblock_cache_rect · 4c03ec69
      Fiona Glaser authored
      Unify the rectangle functions into a single one similar to ffmpeg's fill_rectangle.
      Remove all cases of variable-size cache_rect calls; create a function-pointer-based system for handling such cases.
      Should greatly decrease code size required for such calls.
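      A rough sketch of the approach, assuming byte-sized cache entries and four hypothetical rectangle shapes (x264's real function, table, and types differ): one generic fill routine is specialized for constant sizes, and callers whose size is only known at run time go through a function-pointer table instead of inlining a variable-size loop.

```c
#include <stdint.h>

/* Generic fill. When w and h are constants at the call site, an inlining
 * compiler can unroll both loops. */
static inline void fill_rect(uint8_t *dst, int w, int h, int stride, uint8_t val)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * stride + x] = val;
}

/* One small out-of-line specialization per fixed size. */
#define FILL_FN(w, h) \
    static void fill_##w##x##h(uint8_t *dst, int stride, uint8_t val) \
    { fill_rect(dst, w, h, stride, val); }
FILL_FN(4, 4)
FILL_FN(4, 2)
FILL_FN(2, 4)
FILL_FN(2, 2)

typedef void (*fill_fn)(uint8_t *dst, int stride, uint8_t val);

/* Hypothetical shape indices for run-time dispatch. */
enum { RECT_4x4, RECT_4x2, RECT_2x4, RECT_2x2, RECT_COUNT };

static const fill_fn fill_table[RECT_COUNT] = {
    [RECT_4x4] = fill_4x4,
    [RECT_4x2] = fill_4x2,
    [RECT_2x4] = fill_2x4,
    [RECT_2x2] = fill_2x2,
};
```

      A variable-size call site then costs one indirect call, e.g. fill_table[shape](dst, stride, val), rather than a full copy of the generic loop.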
    • Fiona Glaser's avatar
      Make a bunch of small functions ALWAYS_INLINE · 8b4cca0e
      Fiona Glaser authored
      Probably no real effect for now, but needed for the next patch.
    • Loic Minier's avatar
      Two compatibility fixes · 219505af
      Loic Minier authored
      Add IA64 support in configure.
    • Henrik Gramner's avatar
      Faster x264_macroblock_encode_pskip · 6f3a6d52
      Henrik Gramner authored
      GCC is apparently unable to optimize out the calculation of a variable when it isn't used.
    • Fiona Glaser's avatar
      Much more accurate B-skip detection at 2 < subme < 7 · 47092e82
      Fiona Glaser authored
      Use the same method that x264 uses for P-skip detection.
      This significantly improves quality (1-6%), but at a significant speed cost as well (5-20%).
      It also may have a very positive visual effect in cases where the inaccurate skip detection resulted in slightly-off vectors in B-frames.
      This could cause slight blurring or non-smooth motion in low-complexity frames at high quantizers.
      Not all instances of this problem are solved: the only universal solution is non-locally-optimal mode decision, which x264 does not currently have.
      subme >= 7 or <= 2 are unaffected.
    • Alexander Strange's avatar
      Reformat profile restrictions in --fullhelp. · 639b18a6
      Alexander Strange authored
      Put "no interlaced", "no lossless" on their own line to avoid them
      running into the default options list.
    • James Darnley's avatar
      Fix typo in configure · a9adb0d4
      James Darnley authored
    • Yusuke Nakamura's avatar
      Fix slightly wrong mp4 duration. · 6ac9e171
      Yusuke Nakamura authored
    • Yusuke Nakamura's avatar
      Fix link errors with newest gpac cvs · ddfe4124
      Yusuke Nakamura authored
      gpac decided to randomly break API and require us to use their own custom malloc and free.
    • Kieran Kunhya's avatar
      Save a few bits in slice headers · 2a2db86d
      Kieran Kunhya authored
      Don't override the maximum ref index in the slice header if it's the same as the default.
      Also update the naming of the relevant variables in the PPS.
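      To show where the saving comes from, here is a small sketch that counts the slice-header bits spent on the reference count of a P-slice. The function names are hypothetical; the layout assumed is the H.264 one, a 1-bit override flag followed, only when the flag is set, by the active reference count minus one as an Exp-Golomb ue(v) code.

```c
/* Length in bits of an unsigned Exp-Golomb code ue(v). */
static int ue_bits(unsigned v)
{
    int len = 0;
    for (v += 1; v > 1; v >>= 1)
        len++;
    return 2 * len + 1;
}

/* Bits spent signalling the L0 reference count in a P-slice header.
 * The override is written only when the count differs from the PPS default. */
static int ref_idx_header_bits(int num_ref_active, int pps_default_active)
{
    int bits = 1;                                    /* override flag */
    if (num_ref_active != pps_default_active)
        bits += ue_bits((unsigned)num_ref_active - 1);
    return bits;
}
```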
    • Fiona Glaser's avatar
      Shrink some arrays in x264_t · 415aac4f
      Fiona Glaser authored
      Also remove an unnecessary assignment from cache_load.
    • Anton Mitrofanov's avatar
      Fix two nondeterminisms · 89183a0e
      Anton Mitrofanov authored
      Move noise reduction data into thread-specific data.
      Use correct reference list for L1 temporal predictors.
    • Fiona Glaser's avatar
      "CRF-max" support with VBV · 7ff23daa
      Fiona Glaser authored
      This is a rather curious feature that may have more use than is initially obvious.
      In CRF mode with VBV enabled, CRF-max allows the user to specify a quality level which the encoder will never go below, even due to the effects of VBV.
      This is not the same as qpmax, which is not aware of issues like scene complexity.
      Setting this WILL cause VBV underflows in any situation where the encoder would have needed to exceed the relevant CRF to avoid underflow.
      Why might one want to do this even if it would cause VBV underflows?
      In the case of streaming, particularly ultra-low-latency streaming, it may be preferable to drop frames than to display frames that are of too low a quality.
      Thus, in extremely complex scenes, rather than display completely awful video, the streaming server could simply drop to a lower framerate.
      Scenecuts, which normally look terrible under situations like single-frame VBV, could be handled by just displaying them a bit later and dropping frames to compensate.
      In other words, it's better to see the scenecut 150ms delayed than for it to look like a blocky mess for 150ms.
      On the caller-side, this would be handled by detecting the output size of x264's frames and dropping future frames to compensate if necessary.
      This can also be used in normal encoding simply to ensure that VBV does not hurt quality too much (at the cost of potentially causing underflows).
      This can help quite a lot when using single-frame VBV and sliced threads, where VBV can often be somewhat unstable.
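      The caller-side handling mentioned above could be sketched as follows. This is an assumed model, not part of x264: frame_pacer and its fields are hypothetical, and the channel is treated as draining a fixed number of bits per frame interval.

```c
/* Hypothetical caller-side state for deciding when to drop frames. */
typedef struct {
    long backlog_bits;      /* encoded bits not yet sent */
    long drain_per_frame;   /* bits the channel carries per frame interval */
    long max_backlog_bits;  /* tolerated delay, expressed in bits */
} frame_pacer;

/* Ask before encoding: returns 1 if the next frame should be dropped. */
static int pacer_should_drop(const frame_pacer *p)
{
    return p->backlog_bits > p->max_backlog_bits;
}

/* Call once per frame interval with the size of the frame the encoder
 * produced (0 if the frame was dropped). */
static void pacer_advance(frame_pacer *p, long encoded_bits)
{
    p->backlog_bits += encoded_bits;
    p->backlog_bits -= p->drain_per_frame;
    if (p->backlog_bits < 0)
        p->backlog_bits = 0;
}
```

      With this model an oversized scenecut frame is sent late and the following frames are skipped until the backlog drains, which matches the "delayed rather than blocky" trade-off described in the commit message.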
    • Kieran Kunhya's avatar
      Blu-ray support: NAL-HRD, VFR ratecontrol, filler, pulldown · bb9b16b4
      Kieran Kunhya authored
      x264 can now generate Blu-ray-compliant streams for authoring Blu-ray Discs!
      Compliance tested using Sony BD-ROM Verifier 1.21.
      Thanks to The Criterion Collection for sponsoring compliance testing!
      An example command, using constant quality mode, for 1080p24 content:
      x264 --crf 16 --preset veryslow --tune film --weightp 0 --bframes 3 --nal-hrd vbr --vbv-maxrate 40000 --vbv-bufsize 30000 --level 4.1 --keyint 24 --b-pyramid strict --slices 4 --aud --colorprim "bt709" --transfer "bt709" --colormatrix "bt709" --sar 1:1 <input> -o <output>
      This command is much more complicated than usual due to the very complicated restrictions the Blu-ray spec has.
      Most options after "tune" are required by the spec.
      --weightp 0 is not, but there are known bugged Blu-ray player chipsets (Mediatek, notably) that will decode video with --weightp 1 or 2 incorrectly.
      Furthermore, note the Blu-ray spec has very strict limitations on allowed resolution/fps combinations.
      Examples include 1080p @ 24000/1001fps (NTSC FILM) and 720p @ 60000/1001fps.
      Detailed features introduced in this patch:
      Full NAL-HRD compliance, with both VBR (no filler) and CBR (filler) modes.
      Can be enabled with --nal-hrd vbr/cbr.
      libx264 now returns HRD timing information to the caller in the form of an x264_hrd_t.
      x264cli doesn't currently use it, but this information is critical for compliant TS muxing.
      Full VFR ratecontrol support: VBV, 1-pass ABR, and 2-pass modes.
      This means that, even without knowing the average framerate, x264 can achieve a correct bitrate in target bitrate modes.
      Note that this changes the statsfile format; first pass encodes made before this patch will have to be re-run.
      Pulldown support: libx264 allows the calling application to specify a pulldown mode for each frame.
      This is similar to the way that RFFs (Repeat Field Flags) work in MPEG-2.
      Note that libx264 does not modify timestamps: it assumes the calling application has set timestamps correctly for pulldown!
      x264cli contains an example implementation of caller-side pulldown code.
      Pic_struct support: necessary for pulldown and allows interlaced signalling.
      Also signal TFF vs BFF with delta_poc_bottom: should significantly improve interlaced compression.
      --tff and --bff should be preferred to the old --interlaced in order to tell x264 what field order to use.
      Huge thanks to Alex Giladi and Lamont Alston for their work on code that eventually became part of this patch.
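      As a sketch of what caller-side pulldown involves, the code below assigns a per-frame picture structure for top-field-first 3:2 pulldown and derives timestamps in field units. The enum and function names are hypothetical stand-ins rather than the libx264 API, and the pattern order is one conventional choice, not necessarily the one x264cli uses.

```c
/* Hypothetical picture structures: which fields a frame is shown as. */
typedef enum {
    PS_TOP_BOTTOM,
    PS_BOTTOM_TOP,
    PS_TOP_BOTTOM_TOP,      /* first field repeated: 3 fields */
    PS_BOTTOM_TOP_BOTTOM    /* first field repeated: 3 fields */
} pic_struct;

/* 3:2 pulldown: 4 film frames become 10 fields, keeping parity alternating. */
static const pic_struct pulldown_32[4] = {
    PS_TOP_BOTTOM_TOP, PS_BOTTOM_TOP, PS_BOTTOM_TOP_BOTTOM, PS_TOP_BOTTOM
};

static int field_count(pic_struct ps)
{
    return (ps == PS_TOP_BOTTOM_TOP || ps == PS_BOTTOM_TOP_BOTTOM) ? 3 : 2;
}

/* Timestamp of a frame in field units. The caller, not the encoder,
 * is responsible for producing these. */
static long pts_in_fields(int frame)
{
    long pts = 0;
    for (int i = 0; i < frame; i++)
        pts += field_count(pulldown_32[i % 4]);
    return pts;
}
```

      Frame durations alternate between 3 and 2 fields, so 24000/1001 fps film ends up at 60000/1001 fields per second.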
    • Yusuke Nakamura's avatar
      Timecode input/output · 4d3c4787
      Yusuke Nakamura authored
      --tcfile-in allows a user to specify a timecode v1 or v2 file to override input timestamps.
      Useful for dealing with VFR input, especially when FFMS/LAVF support isn't available.
      --tcfile-out writes a timecode v2 file containing the timecodes of the output file.
      New --timebase option allows a user to change the stream timebase.
      Intended primarily for forcing timebase with timecode files if necessary.
      When using --seek, note that x264 will seek in the timecode file as well.
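      For reference, a timecode v2 file is a text file with a "# timecode format v2" header line followed by one timestamp in milliseconds per frame. A minimal reader might look like the sketch below; parse_tc_v2 is a hypothetical helper, not x264's parser, and it does no validation beyond the header.

```c
#include <stdlib.h>
#include <string.h>

/* Parses timecode v2 text into out_ms[]. Returns the number of timestamps
 * read (at most max), or -1 if the header line is missing. */
static int parse_tc_v2(const char *text, double *out_ms, int max)
{
    static const char header[] = "# timecode format v2";
    if (strncmp(text, header, sizeof(header) - 1) != 0)
        return -1;

    int n = 0;
    const char *p = strchr(text, '\n');      /* end of the header line */
    while (p && *++p && n < max) {
        const char *next = p;
        if (*p != '#') {                     /* skip comment lines */
            char *end;
            double ms = strtod(p, &end);
            if (end != p) {                  /* a number was parsed */
                out_ms[n++] = ms;
                next = end;
            }
        }
        p = strchr(next, '\n');              /* advance to the next line */
    }
    return n;
}
```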
    • Alex Wright's avatar
      Mixed-refs support for B-frames · 1f9393eb
      Alex Wright authored
      Small speed cost, usually a few percent at most. Generally has lowest cost in cases when it isn't very useful. Up to ~2% better compression overall on highly complex sources.
      Also fix a few minor bugs in B-frame analysis and various bits of cleanup.
    • Henrik Gramner's avatar
      Faster rounding of chroma DC coefficients · a934f0fa
      Henrik Gramner authored
    • Holger Lubitz's avatar
      Faster cabac_encode_decision_asm · 9d71ff19
      Holger Lubitz authored
      Minimizes instruction count, which also means smaller code.
      Various other slight changes to allow more instruction level parallelism.
    • Holger Lubitz's avatar
      Faster hpel_filter · 125b8f6c
      Holger Lubitz authored
      On ssse3, use pmaddubsw for h filter too (similar to v filter).
      Change 32-bit v and c filters to write the result non-temporal.
      Add commented-out defines to disable non-temporal operation.
      Hardly any black magic here, but still a measurable win especially for ssse3.
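      To see why pmaddubsw fits the horizontal filter, here is a scalar model. It assumes the standard H.264 6-tap half-pel filter (1, -5, 20, 20, -5, 1) split into three adjacent tap pairs; the function names are hypothetical and the actual assembly arranges its data differently.

```c
#include <stdint.h>

/* Scalar model of one pmaddubsw lane: unsigned bytes times signed bytes,
 * adjacent products summed into a saturated signed 16-bit value. */
static int16_t maddubs_lane(uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
    int v = a0 * b0 + a1 * b1;
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return (int16_t)v;
}

/* Half-pel sample between src[0] and src[1]; reads src[-2] .. src[3]. */
static uint8_t hpel_h(const uint8_t *src)
{
    int sum = maddubs_lane(src[-2], src[-1],  1, -5)
            + maddubs_lane(src[ 0], src[ 1], 20, 20)
            + maddubs_lane(src[ 2], src[ 3], -5,  1);
    int v = sum + 16;                /* round before dividing by 32 */
    if (v < 0)
        return 0;
    v >>= 5;
    return (uint8_t)(v > 255 ? 255 : v);
}
```

      Each pair stays well inside the 16-bit range (the largest is 20*255 + 20*255 = 10200), so the saturation in the instruction never triggers for this filter.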
  3. 24 Mar, 2010 3 commits