  1. 09 Jan, 2013 1 commit
  2. 17 Jul, 2012 1 commit
  3. 04 Feb, 2012 1 commit
  4. 15 Oct, 2011 1 commit
  5. 21 Sep, 2011 1 commit
  6. 09 Aug, 2011 1 commit
    • Remove some unused, broken, and/or useless functions · 52f287e8
      Loren Merritt authored
      Unused frame_sort.
      Unused x86_64 dequant_4x4dc_mmx2, predict_8x8_vr_mmx2.
      Unused and broken high_depth integral_init*h_sse4, optimize_chroma_*, dequant_flat_*, sub8x8_dct_dc_*, zigzag_sub_*.
      Useless high_depth dequant_sse4, dequant_dc_sse4.
  7. 22 Jul, 2011 1 commit
  8. 10 Jul, 2011 1 commit
  9. 12 May, 2011 2 commits
  10. 25 Jan, 2011 1 commit
  11. 25 Nov, 2010 1 commit
  12. 19 Nov, 2010 1 commit
  13. 18 Nov, 2010 1 commit
  14. 18 Sep, 2010 1 commit
    • Update source file headers · 213a99d0
      Fiona Glaser authored
      Update dates, improve file descriptions, make things more consistent.
      Also add information about commercial licensing.
  15. 15 Jul, 2010 1 commit
    • Convert x264 to use NV12 pixel format internally · 387828ed
      Loren Merritt authored
      ~1% faster overall on Conroe, mostly due to improved cache locality.
      Also allows improved SIMD on some chroma functions (e.g. deblock).
      This change also extends the API to allow direct NV12 input, which should be a bit faster than YV12.
      This isn't currently used in the x264cli, as swscale does not have fast NV12 conversion routines, but it might be useful for other applications.
      
      Note this patch disables the chroma SIMD code for PPC and ARM until new versions are written.
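In NV12 the luma plane is unchanged, while U and V share one half-resolution plane with their samples interleaved (U0 V0 U1 V1 ...), so a chroma routine reads one buffer instead of two. Below is a minimal C sketch of packing the separate U and V planes of a YV12/I420 frame into that layout; the helper name and its parameters are hypothetical, not part of x264's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave the planar U and V chroma planes of a
 * YV12/I420 frame (each w x h samples) into one NV12 chroma plane whose
 * rows hold 2*w bytes laid out as U0 V0 U1 V1 ... */
static void planar_to_nv12_chroma(uint8_t *dst, ptrdiff_t dst_stride,
                                  const uint8_t *u, ptrdiff_t u_stride,
                                  const uint8_t *v, ptrdiff_t v_stride,
                                  int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            dst[2 * x]     = u[x]; /* even bytes: U (Cb) */
            dst[2 * x + 1] = v[x]; /* odd bytes: V (Cr) */
        }
        dst += dst_stride;
        u += u_stride;
        v += v_stride;
    }
}
```

Because both chroma samples for a position end up in adjacent bytes, one SIMD register can hold data from both planes at once, which is the kind of layout that makes shared chroma SIMD (e.g. for deblocking) practical.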
  16. 04 Jul, 2010 1 commit
    • Support for 9 and 10-bit encoding · c91f43a4
      Oskar Arvidsson authored
      Output bit depth is specified at compile time via --bit-depth.
      There is currently almost no assembly code available for high-bit-depth modes, so encoding will be very slow.
      Input is still 8-bit only; this will change in the future.
      
      Note that very few H.264 decoders support >8 bit depth currently.
      Also note that the quantizer scale differs for higher bit depth.  For example, for 10-bit, the quantizer (and crf) ranges from 0 to 63 instead of 0 to 51.
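The 0 to 63 range follows from H.264's bit-depth offset for the quantizer, QpBdOffset = 6 * (bit_depth - 8): six QP steps double the quantizer step size, and each extra bit of depth doubles the sample range. A small C sketch of that arithmetic, assuming the encoder exposes QP starting at 0 as described above; the function name is hypothetical.

```c
/* Highest quantizer for a given bit depth when QP is exposed from 0:
 * H.264 adds QpBdOffset = 6 * (bit_depth - 8) to the 8-bit maximum of 51,
 * i.e. six extra QP steps (one doubling of step size) per extra bit. */
static int max_qp_for_bit_depth(int bit_depth)
{
    const int qp_bd_offset = 6 * (bit_depth - 8);
    return 51 + qp_bd_offset;
}
```

This gives 0 to 51 for 8-bit, 0 to 57 for 9-bit, and 0 to 63 for 10-bit.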
  17. 09 Jun, 2010 1 commit
  18. 02 Jun, 2010 2 commits
  19. 26 May, 2010 1 commit
    • Overhaul deblocking again · 4947b0fb
      Fiona Glaser authored
      Move deblock strength calculation to immediately after encoding to take advantage of the data that's already in cache.
      Keep the deblocking itself as per-row.
  20. 21 May, 2010 1 commit
    • Rewrite deblock strength calculation, add asm · 2ea35adf
      Fiona Glaser authored
      The rewritten C is significantly slower, but the rewrite is necessary to make asm possible.
      Similar concept to ffmpeg's deblock strength asm.
      Roughly one order of magnitude faster than C.
      Overall, with the asm, saves ~100-300 clocks in deblocking per MB.
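The "deblock strength" here is H.264's boundary strength (bS), computed for each edge between adjacent 4x4 blocks before filtering. A scalar C sketch of the standard rules for progressive frames with a single reference list (P slices); the struct and function are hypothetical, comparing reference indices instead of reference pictures is a simplification, and this is not x264's implementation.

```c
#include <stdlib.h>

/* Hypothetical per-4x4-block data needed for boundary strength. */
typedef struct {
    int intra;   /* block belongs to an intra macroblock */
    int nonzero; /* block has nonzero residual coefficients */
    int ref;     /* reference index (single list, simplification) */
    int mv[2];   /* motion vector in quarter-pel units (x, y) */
} blk_info;

/* Boundary strength for the edge between blocks p and q; mb_edge is
 * nonzero when the edge lies on a macroblock boundary. */
static int deblock_bs(const blk_info *p, const blk_info *q, int mb_edge)
{
    if (p->intra || q->intra)
        return mb_edge ? 4 : 3;
    if (p->nonzero || q->nonzero)
        return 2;
    if (p->ref != q->ref ||
        abs(p->mv[0] - q->mv[0]) >= 4 ||  /* >= 1 full luma sample */
        abs(p->mv[1] - q->mv[1]) >= 4)
        return 1;
    return 0;
}
```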
  21. 23 Apr, 2010 1 commit
    • Move deblocking/hpel into sliced threads · 60b15814
      Fiona Glaser authored
      Instead of doing both as a separate pass, do them during the main encode.
      This requires disabling deblocking between slices (disable_deblock_idc == 2).
      Overall performance gain is about 11% on --preset superfast with sliced threads.
      This doesn't reduce the amount of computation done; it only parallelizes it better.
  22. 10 Apr, 2010 1 commit
    • Cleanup and simplification of macroblock_load · 95df880c
      Fiona Glaser authored
      No functional change for now, but this will be useful for many future changes.
      Splitting out neighbour calculation will make MBAFF implementation easier.
      Calculation of neighbour_frame value (actual neighbouring MBs, ignoring slices) will be useful for some future patches.
  23. 05 Apr, 2010 1 commit
    • Massive cosmetic and syntax cleanup · 58d2349d
      Fiona Glaser authored
      Convert all applicable loops to use C99 loop index syntax.
      Clean up most inconsistent syntax in ratecontrol.c, visualize, ppc, etc.
      Replace log(x)/log(2) constructs with log2, and similar with log10.
      Fix all -Wshadow violations.
      Fix visualize support.
  24. 27 Mar, 2010 2 commits
    • Overhaul macroblock_cache_rect · 4c03ec69
      Fiona Glaser authored
      Unify the rectangle functions into a single one similar to ffmpeg's fill_rectangle.
      Remove all cases of variable-size cache_rect calls; create a function-pointer-based system for handling such cases.
      Should greatly decrease code size required for such calls.
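A fill_rectangle-style routine writes one 1-, 2-, or 4-byte value into a w x h region of a fixed-stride cache array. Below is a minimal C sketch of such a unified routine plus a function-pointer table of fixed-size wrappers, so a call site whose size varies at runtime picks a specialized routine instead of passing a variable width. The names, the stride of 8, and the table contents are assumptions for illustration, not x264's code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical cache layout: 8 entries per row. */
#define CACHE_STRIDE 8

/* Fill a w x h rectangle of `size`-byte elements at dst with val.
 * size is 1, 2, or 4; any other value is treated as 4. */
static inline void cache_rect(void *dst, int w, int h, int size, uint32_t val)
{
    for (int y = 0; y < h; y++) {
        uint8_t *row = (uint8_t *)dst + y * CACHE_STRIDE * size;
        for (int x = 0; x < w; x++) {
            if (size == 1) {
                uint8_t v = (uint8_t)val;
                memcpy(row + x, &v, 1);
            } else if (size == 2) {
                uint16_t v = (uint16_t)val;
                memcpy(row + 2 * x, &v, 2);
            } else {
                memcpy(row + 4 * x, &val, 4);
            }
        }
    }
}

/* Fixed-size wrappers: constant arguments let an inlined cache_rect
 * collapse to straight-line stores. */
typedef void (*rect_fn)(void *dst, uint32_t val);
static void rect_1x1(void *d, uint32_t v) { cache_rect(d, 1, 1, 4, v); }
static void rect_2x2(void *d, uint32_t v) { cache_rect(d, 2, 2, 4, v); }
static void rect_4x4(void *d, uint32_t v) { cache_rect(d, 4, 4, 4, v); }

/* Indexed by a partition-size code known only at runtime. */
static const rect_fn rect_table[3] = { rect_1x1, rect_2x2, rect_4x4 };
```

The design trades one indirect call for removing the per-element size and width checks from every call site, which is where the code-size saving would come from.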
    • Make a bunch of small functions ALWAYS_INLINE · 8b4cca0e
      Fiona Glaser authored
      Probably no real effect for now, but needed for the next patch.
  25. 23 Feb, 2010 1 commit
    • Much faster and more efficient MVD handling · 5c767904
      Fiona Glaser authored
      Store MV deltas as clipped absolute values.
      This means CABAC no longer has to calculate absolute values in MV context selection.
      This also lets us cut the memory spent on MVDs by a factor of 2, speeding up cache_mvd and reducing memory usage by 32*threads*(num macroblocks) bytes.
      On a Core i7 encoding 1080p, this is about 3 megabytes saved.
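In H.264 CABAC, the context for the first bin of an MVD component depends only on the sum of the absolute MVDs of the left and top neighbours: a sum below 3 selects ctxIdxInc 0, a sum above 32 selects 2, otherwise 1. Since the sign never matters and every value past the top threshold behaves the same, an absolute value clipped to a small cap fits in one byte. A C sketch under those assumptions; the cap of 33 and all names are illustrative rather than x264's actual values, and the last function only evaluates the 32*threads*(num macroblocks) formula above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed cap: any single |mvd| >= 33 already pushes the neighbour sum
 * above 32, so clipping there never changes the selected context. */
#define MVD_CAP 33

/* Store one MVD component as a clipped absolute value in one byte. */
static uint8_t pack_mvd(int mvd)
{
    int a = abs(mvd);
    return (uint8_t)(a < MVD_CAP ? a : MVD_CAP);
}

/* ctxIdxInc for the first MVD bin from the stored values of the left (a)
 * and top (b) neighbours; no abs() is needed at coding time. */
static int mvd_ctx_inc(uint8_t a, uint8_t b)
{
    int sum = a + b;
    return sum < 3 ? 0 : sum > 32 ? 2 : 1;
}

/* Memory saved per the formula 32 * threads * (num macroblocks). */
static long mvd_bytes_saved(int width, int height, int threads)
{
    long mbs = (long)((width + 15) / 16) * ((height + 15) / 16);
    return 32L * threads * mbs;
}
```

For 1920x1080 (120x68 = 8160 macroblocks) and an assumed 12 threads, mvd_bytes_saved returns 3,133,440 bytes, consistent with the figure of about 3 megabytes.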
  26. 21 Jan, 2010 1 commit
    • Various performance optimizations · f5af5f14
      Fiona Glaser authored
      Simplify and compact storage of direct motion vectors, faster --direct auto.
      Shrink various arrays to save a bit of cache.
      Simplify and reorganize B macroblock type writing in CABAC.
      Add some missing ALIGNED macros.
  27. 11 Dec, 2009 1 commit
  28. 09 Dec, 2009 1 commit
  29. 12 Nov, 2009 1 commit
    • Fix all aliasing violations · 03cb8c09
      Fiona Glaser authored
      New type-punning macros perform write/read-combining without aliasing violations per the second-to-last part of 6.5.7 in the C99 specification.
      GCC 4.4, however, doesn't seem to have read this part of the spec and still warns about the violations.
      Regardless, it seems to fix all known aliasing miscompilations, so perhaps the GCC warning generator is just broken.
      As such, add -Wno-strict-aliasing to CFLAGS.
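The clause of C99 6.5.7 in question allows access through "an aggregate or union type that includes one of the aforementioned types among its members", which is what a union-based punning macro leans on. A hedged C sketch in that spirit; the union and macro names are hypothetical, GCC-specific attributes such as may_alias are omitted, and memcpy is shown as the strictly portable alternative.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical punning union: its members include both the wide type and
 * the narrow types it is combined from. */
typedef union {
    uint32_t i;
    uint16_t b[2];
    uint8_t  c[4];
} union32;

/* Read or write 4 bytes at p as one 32-bit access through the union. */
#define M32(p) (((union32 *)(p))->i)

/* Portable alternative: compilers typically lower a fixed 4-byte memcpy
 * to a single load/store when alignment allows. */
static void copy4(void *dst, const void *src)
{
    memcpy(dst, src, 4);
}
```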
  30. 10 Jul, 2009 1 commit
  31. 27 Jan, 2009 1 commit
    • Much faster chroma encoding and other opts · 83d805fe
      Fiona Glaser authored
      ~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only.
      Small optimization in cache_save (skip_bp)
      Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future)
      Add in automatic substitutions for some asm instructions that have an equivalent smaller representation.
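When a 4x4 block has only a DC coefficient, H.264's inverse transform produces the same value at all 16 positions, so reconstruction reduces to adding (dc + 32) >> 6 to every pixel with clipping. A C sketch comparing the full 4x4 inverse transform with that shortcut, assuming 8-bit pixels and arithmetic right shifts of negative values (as mainstream compilers implement); the function names are hypothetical.

```c
#include <stdint.h>

static uint8_t clip_pixel(int x) { return x < 0 ? 0 : x > 255 ? 255 : (uint8_t)x; }

/* Full H.264 4x4 inverse transform plus add; d is in raster order. */
static void add4x4_idct(uint8_t *p, int stride, const int16_t d[16])
{
    int tmp[16];
    for (int i = 0; i < 4; i++) {           /* horizontal pass (rows) */
        const int16_t *r = d + 4 * i;
        int e0 = r[0] + r[2], e1 = r[0] - r[2];
        int e2 = (r[1] >> 1) - r[3], e3 = r[1] + (r[3] >> 1);
        tmp[4 * i + 0] = e0 + e3;
        tmp[4 * i + 1] = e1 + e2;
        tmp[4 * i + 2] = e1 - e2;
        tmp[4 * i + 3] = e0 - e3;
    }
    for (int j = 0; j < 4; j++) {           /* vertical pass (columns) */
        int f0 = tmp[j], f1 = tmp[4 + j], f2 = tmp[8 + j], f3 = tmp[12 + j];
        int g0 = f0 + f2, g1 = f0 - f2;
        int g2 = (f1 >> 1) - f3, g3 = f1 + (f3 >> 1);
        int res[4] = { g0 + g3, g1 + g2, g1 - g2, g0 - g3 };
        for (int i = 0; i < 4; i++)
            p[i * stride + j] = clip_pixel(p[i * stride + j] + ((res[i] + 32) >> 6));
    }
}

/* DC-only shortcut: every residual sample equals (dc + 32) >> 6. */
static void add4x4_idct_dc(uint8_t *p, int stride, int dc)
{
    int r = (dc + 32) >> 6;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            p[i * stride + j] = clip_pixel(p[i * stride + j] + r);
}
```

The shortcut replaces two butterfly passes with a single add per pixel, which pays off when most coded chroma blocks are DC-only.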
  32. 20 Jan, 2009 1 commit
    • Eliminate support for direct_8x8_inference=0 · 1f0e78d8
      Fiona Glaser authored
      The benefit in the most extreme contrived situation was at most 0.001 dB PSNR, at the cost of slower decoding.
      As this option was basically useless, it was a waste of code and prevented some other useful optimizations.
      Remove some unused mc code related to sub-8x8 partitions.
      Small deblocking speedup when p4x4 is used.
      Also remove unused x264_nal_decode prototype from x264.h.
  33. 23 Dec, 2008 1 commit
  34. 15 Dec, 2008 1 commit
  35. 11 Dec, 2008 1 commit
    • Much faster CAVLC residual coding · 99448f6c
      Fiona Glaser authored
      Use a VLC table for common levelcodes instead of constructing them on the spot
      Branchless version of i_trailing calculation (2x faster on Nehalem)
      Completely remove array_non_zero_count and instead use the count calculated in level/run coding.  Note: this slightly changes output with subme > 7 due to different nonzero counts being stored during qpel RD.
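CAVLC's TrailingOnes value counts up to three consecutive +/-1 levels at the high-frequency end of a block's nonzero coefficients. If those levels are stored highest-frequency first, the count can be formed from comparisons combined with bitwise AND, with no data-dependent branches. This is a hypothetical illustration of the idea, not x264's code; the storage order and the padding requirement are assumptions.

```c
#include <stdlib.h>

/* Count CAVLC trailing ones: the number (max 3) of consecutive +/-1 values
 * at the start of `level`, which holds `total` nonzero levels ordered from
 * highest to lowest frequency. `level` must have at least 3 initialized
 * entries (zero padding is fine). Each comparison yields 0 or 1 and `&`
 * evaluates both sides, so no short-circuit branches are required. */
static int trailing_ones(const int level[], int total)
{
    int t1 = (total > 0) & (abs(level[0]) == 1);
    int t2 = t1 & (total > 1) & (abs(level[1]) == 1);
    int t3 = t2 & (total > 2) & (abs(level[2]) == 1);
    return t1 + t2 + t3;
}
```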
  36. 05 Sep, 2008 1 commit
  37. 16 Aug, 2008 1 commit