1. 10 Jan, 2011 1 commit
  2. 14 Dec, 2010 2 commits
  3. 07 Dec, 2010 1 commit
  4. 25 Nov, 2010 1 commit
  5. 20 Nov, 2010 1 commit
  6. 19 Nov, 2010 2 commits
  7. 31 Oct, 2010 2 commits
  8. 10 Oct, 2010 1 commit
  9. 18 Sep, 2010 1 commit
    • Update source file headers · 213a99d0
      Fiona Glaser authored
      Update dates, improve file descriptions, make things more consistent.
      Also add information about commercial licensing.
      213a99d0
  10. 15 Jul, 2010 1 commit
    • Convert x264 to use NV12 pixel format internally · 387828ed
      Loren Merritt authored
      ~1% faster overall on Conroe, mostly due to improved cache locality.
      Also allows improved SIMD on some chroma functions (e.g. deblock).
      This change also extends the API to allow direct NV12 input, which should be a bit faster than YV12.
      This isn't currently used in the x264cli, as swscale does not have fast NV12 conversion routines, but it might be useful for other applications.
      
      Note this patch disables the chroma SIMD code for PPC and ARM until new versions are written.
      387828ed
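The layout difference behind this change can be sketched as follows. A planar format keeps U and V in separate planes, while NV12 keeps one chroma plane with U and V samples interleaved. The helper name `interleave_chroma` is hypothetical and this is not the code from the commit; it only illustrates the byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Interleave separate U and V planes into one NV12-style UV plane.
 * Hypothetical helper for illustration only; not x264's actual code. */
static void interleave_chroma(uint8_t *dst_uv, const uint8_t *src_u,
                              const uint8_t *src_v, size_t chroma_samples)
{
    for (size_t i = 0; i < chroma_samples; i++) {
        dst_uv[2 * i]     = src_u[i]; /* NV12 stores U first... */
        dst_uv[2 * i + 1] = src_v[i]; /* ...then V, for each chroma sample */
    }
}
```

With both chroma components adjacent in memory, a function that touches U and V at the same position reads one contiguous run instead of two separate planes, which is one plausible reading of the cache-locality remark above.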
  11. 04 Jul, 2010 1 commit
    • Support for 9 and 10-bit encoding · c91f43a4
      Oskar Arvidsson authored
      Output bit depth is specified at compile time via --bit-depth.
      There is currently almost no assembly code available for high-bit-depth modes, so encoding will be very slow.
      Input is still 8-bit only; this will change in the future.
      
      Note that very few H.264 decoders support >8 bit depth currently.
      Also note that the quantizer scale differs for higher bit depth.  For example, for 10-bit, the quantizer (and crf) ranges from 0 to 63 instead of 0 to 51.
      c91f43a4
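The quantizer ranges quoted above (0 to 51 for 8-bit, 0 to 63 for 10-bit) are consistent with the range growing by 6 for each extra bit of depth. A minimal sketch under that assumption; the function name `qp_max_for_bit_depth` is hypothetical and not taken from the commit:

```c
/* Largest quantizer for a given output bit depth, assuming the range grows
 * by 6 per extra bit (consistent with 51 for 8-bit and 63 for 10-bit). */
static int qp_max_for_bit_depth(int bit_depth)
{
    return 51 + 6 * (bit_depth - 8);
}
```

Under the same assumption, 9-bit encoding would range from 0 to 57.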
  12. 15 Jun, 2010 1 commit
    • Faster mbtree_propagate asm · 15501e34
      Holger Lubitz authored
      Replace fp division by multiply with the reciprocal.
      Only ~12% faster on Penryn, but over 80% faster on AMD K8.
      Also make checkasm slightly more tolerant to rounding error.
      15501e34
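The general idea of trading a division for a multiply can be sketched in C. This is a simplified illustration with a single shared divisor and a hypothetical function name; the actual change is in assembly and is not reproduced here.

```c
#include <stddef.h>

/* Divide every element by the same divisor using one division and
 * n multiplies. Hypothetical helper; a simplification for illustration. */
static void scale_by_reciprocal(float *dst, const float *src, size_t n,
                                float divisor)
{
    const float recip = 1.0f / divisor; /* the only division */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * recip; /* can differ from src[i] / divisor by rounding */
}
```

Because the product is rounded differently from a true quotient, a test that compares against a division-based reference needs a small tolerance, which matches the checkasm note above.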
  13. 09 Jun, 2010 3 commits
  14. 02 Jun, 2010 3 commits
  15. 26 May, 2010 2 commits
    • Overhaul deblocking again · 4947b0fb
      Fiona Glaser authored
      Move deblock strength calculation to immediately after encoding to take advantage of the data that's already in cache.
      Keep the deblocking itself as per-row.
      4947b0fb
    • Detect Atom CPU, enable appropriate asm functions · 57729402
      Fiona Glaser authored
      I'm not going to actually optimize for this pile of garbage unless someone pays me.
      But it can't hurt to at least enable the correct functions based on benchmarks.
      
      Also save some cache on Intel CPUs that don't need the decimate LUT due to having fast bsr/bsf.
      57729402
  16. 21 May, 2010 1 commit
    • Rewrite deblock strength calculation, add asm · 2ea35adf
      Fiona Glaser authored
      Rewrite is significantly slower, but is necessary to make asm possible.
      Similar concept to ffmpeg's deblock strength asm.
      Roughly one order of magnitude faster than C.
      Overall, with the asm, saves ~100-300 clocks in deblocking per MB.
      2ea35adf
  17. 17 May, 2010 1 commit
    • Overhaul CABAC: faster, less cache usage · 3267f35a
      Fiona Glaser authored
      Horribly munge up the CABAC tables to allow deduplication of some data.
      Saves 256 bytes of L1d cache in non-RD, 512 bytes in RD.
      Add asm versions of bypass and terminal; save L1i cache by re-using putbyte code.
      Further optimize encode_decision.
      All 3 primary CABAC functions fit in under 256 bytes of code total on x86_64.
      3267f35a
  18. 06 May, 2010 2 commits
    • More cosmetics · 54e784fd
      Anton Mitrofanov authored
      54e784fd
    • Deduplicate asm constants, automate name prefixing · 311c4bb1
      Fiona Glaser authored
      Auto-prefix global constants with x264_ in cextern.
      Eliminate x264_ prefix from asm files; automate it in cglobal.
      Deduplicate asm constants wherever possible to save data cache (move them to a new const-a.asm).
      Remove x264_emms() entirely on non-x86 (don't even call an empty function).
      Add cextern_naked for a non-prefixed cextern (used in checkasm).
      311c4bb1
  19. 23 Apr, 2010 1 commit
  20. 05 Apr, 2010 1 commit
    • Massive cosmetic and syntax cleanup · 58d2349d
      Fiona Glaser authored
      Convert all applicable loops to use C99 loop index syntax.
      Clean up most inconsistent syntax in ratecontrol.c, visualize, ppc, etc.
      Replace log(x)/log(2) constructs with log2, and similar with log10.
      Fix all -Wshadow violations.
      Fix visualize support.
      58d2349d
  21. 15 Feb, 2010 2 commits
  22. 17 Nov, 2009 1 commit
    • Faster weightp analysis · 63f71477
      Fiona Glaser authored
      Modify pixel_var slightly to return the necessary information and use it for weight analysis instead of sad/ssd.
      Various minor cosmetics.
      63f71477
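The commit does not say what "the necessary information" is. One plausible reading, assumed here, is that a variance routine already computes the sum and the sum of squares of a block, and that returning both lets a caller derive further statistics without a second pass. The type and function names below are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t sum; /* sum of pixel values */
    uint64_t sqr; /* sum of squared pixel values */
} var_stats_t;

/* One pass over a w x h block, returning both accumulators. */
static var_stats_t block_stats(const uint8_t *pix, int stride, int w, int h)
{
    var_stats_t s = { 0, 0 };
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            uint32_t p = pix[y * stride + x];
            s.sum += p;
            s.sqr += p * p;
        }
    return s;
}

/* n times the variance, kept in integers: sqr - sum^2 / n. */
static uint64_t block_var_times_n(var_stats_t s, int n)
{
    return s.sqr - (uint64_t)s.sum * s.sum / n;
}
```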
  23. 09 Nov, 2009 2 commits
    • cosmetics · df732ec7
      Loren Merritt authored
      df732ec7
    • Weighted P-frame prediction · ccac8546
      Dylan Yudaken authored
      Merge Dylan's Google Summer of Code 2009 tree.
      Detect fades and use weighted prediction to improve compression and quality.
      "Blind" mode provides a small overall quality increase by using a -1 offset without doing any analysis, as described in JVT-AB033.
      "Smart", the default mode, also performs fade detection and decides weights accordingly.
      MB-tree takes into account the effects of "smart" analysis in lookahead, even further improving quality in fades.
      If psy is on, mbtree is on, interlaced is off, and weightp is off, fade detection will still be performed.
      However, it will be used to adjust quality rather than to create actual weights.
      This will improve quality in fades when encoding in Baseline profile.
      
      Doesn't add support for interlaced encoding with weightp yet.
      Only adds support for luma weights, not chroma weights.
      Internal code for chroma weights is in, but there's no analysis yet.
      Baseline profile requires that weightp be off.
      All weightp modes may cause minor breakage in non-compliant decoders that take shortcuts in deblocking reference frame checks.
      "Smart" may cause serious breakage in non-compliant decoders that take shortcuts in handling of duplicate reference frames.
      
      Thanks to Google for sponsoring our most successful Summer of Code yet!
      ccac8546
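For readers unfamiliar with explicit weighted prediction, the per-pixel operation can be sketched as a scale, a rounded shift, an offset, and a clip. This follows the general form used for H.264 explicit weights on 8-bit samples; the function names and the denominator used in the example are assumptions for illustration and are not taken from this commit.

```c
#include <stdint.h>

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Apply an explicit luma weight to one 8-bit reference pixel:
 * multiply by scale, divide by 2^log2_denom with rounding,
 * then add the offset and clip to [0, 255]. */
static uint8_t weight_pixel(uint8_t ref, int scale, int log2_denom, int offset)
{
    int v = ref * scale;
    if (log2_denom >= 1)
        v = (v + (1 << (log2_denom - 1))) >> log2_denom;
    return clip_uint8(v + offset);
}
```

With an identity scale (`scale == 1 << log2_denom`) and an offset of -1, as in the "blind" mode described above, each reference pixel is simply lowered by one: `weight_pixel(100, 32, 5, -1)` gives 99. A fade to half brightness would instead use a scale of half the denominator, for example `weight_pixel(200, 16, 5, 0)`, which gives 100.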
  24. 12 Oct, 2009 1 commit
    • change all dct arrays to 1d. · 1fbba0ca
      Loren Merritt authored
      the C standard doesn't allow you to iterate 1-dimensionally over 2d arrays, and nothing other than the dsp functions themselves cares about the 2dness of dct.
      this fixes a miscompilation in x264_mb_optimize_chroma_dc.
      1fbba0ca
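A minimal sketch of the flat layout this describes, assuming a 4x4 block stored row by row in a 16-element array. The function names are hypothetical; only the indexing pattern is the point.

```c
#include <stdint.h>

/* Coefficient (x, y) of a 4x4 block stored as a flat 16-element array. */
static int16_t dct4x4_get(const int16_t dct[16], int x, int y)
{
    return dct[4 * y + x];
}

/* Linear iteration over all 16 coefficients is well-defined on a flat
 * array, unlike walking past the end of a row of an int16_t[4][4]. */
static int count_nonzero(const int16_t dct[16])
{
    int n = 0;
    for (int i = 0; i < 16; i++)
        n += dct[i] != 0;
    return n;
}
```

Code that needs the two-dimensional view computes `4 * y + x` explicitly, while code that only scans coefficients uses a single index.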
  25. 28 Aug, 2009 1 commit
  26. 23 Aug, 2009 1 commit
    • GSOC merge part 2: ARM stack alignment · ca7da1ae
      David Conrad authored
      Neither GCC nor ARMCC supports 16-byte stack alignment, even though NEON loads require it.
      These macros only work for arrays, but fortunately that covers almost all instances of stack alignment in x264.
      ca7da1ae
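One common way to get an aligned array when the compiler cannot align the stack is to over-allocate and round the address up. The macro below is a hypothetical sketch of that idiom, not the macro added by this commit, and it shows why such an approach applies naturally to arrays: it needs a buffer with slack to point into.

```c
#include <stdint.h>

/* Declare a raw stack buffer with 15 bytes of slack, plus a pointer `name`
 * to the first 16-byte-aligned address inside it. Hypothetical macro for
 * illustration. Accessing the byte buffer through `type *` is the usual
 * idiom in low-level code, although strict ISO C aliasing rules do not
 * formally bless it. */
#define ALIGNED_STACK_ARRAY_16(type, name, count)                      \
    uint8_t name##_raw[sizeof(type) * (count) + 15];                   \
    type *name = (type *)(((uintptr_t)name##_raw + 15) & ~(uintptr_t)15)
```

A caller would write `ALIGNED_STACK_ARRAY_16(int16_t, coeffs, 16);` and then use `coeffs[0]` through `coeffs[15]` as an ordinary array.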
  27. 20 Aug, 2009 2 commits
  28. 09 Aug, 2009 1 commit