1. 02 Jun, 2010 2 commits
  2. 26 May, 2010 1 commit
    • Overhaul deblocking again · 4947b0fb
      Fiona Glaser authored
      Move deblock strength calculation to immediately after encoding to take advantage of the data that's already in cache.
      Keep the deblocking itself as per-row.
  3. 21 May, 2010 1 commit
    • Rewrite deblock strength calculation, add asm · 2ea35adf
      Fiona Glaser authored
      The rewritten C code is significantly slower than the old C code, but the rewrite is necessary to make asm possible.
      Similar concept to ffmpeg's deblock strength asm.
      The asm is roughly one order of magnitude faster than the C version.
      Overall, with the asm, saves ~100-300 clocks in deblocking per MB.
  4. 23 Apr, 2010 1 commit
    • Move deblocking/hpel into sliced threads · 60b15814
      Fiona Glaser authored
      Instead of doing both as a separate pass, do them during the main encode.
      This requires disabling deblocking between slices (disable_deblock_idc == 2).
      Overall performance gain is about 11% on --preset superfast with sliced threads.
      Doesn't reduce the amount of computation done; it only parallelizes it better.
  5. 10 Apr, 2010 1 commit
    • Cleanup and simplification of macroblock_load · 95df880c
      Fiona Glaser authored
      Doesn't do anything now, but will be useful for many future changes.
      Splitting out neighbour calculation will make MBAFF implementation easier.
      Calculation of neighbour_frame value (actual neighbouring MBs, ignoring slices) will be useful for some future patches.
  6. 05 Apr, 2010 1 commit
    • Massive cosmetic and syntax cleanup · 58d2349d
      Fiona Glaser authored
      Convert all applicable loops to use C99 loop index syntax.
      Clean up most inconsistent syntax in ratecontrol.c, visualize, ppc, etc.
      Replace log(x)/log(2) constructs with log2, and similar with log10.
      Fix all -Wshadow violations.
      Fix visualize support.
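As a rough illustration of the "C99 loop index syntax" conversion mentioned in this commit, the sketch below contrasts the two styles. The function names are hypothetical and not taken from x264.

```c
#include <stddef.h>

/* C89 style: the index lives at function scope, where an inner
 * declaration with the same name could shadow it (-Wshadow). */
static int sum_c89( const int *v, size_t n )
{
    size_t i;
    int sum = 0;
    for( i = 0; i < n; i++ )
        sum += v[i];
    return sum;
}

/* C99 style: the index is declared in the loop header and is
 * only visible inside the loop. */
static int sum_c99( const int *v, size_t n )
{
    int sum = 0;
    for( size_t i = 0; i < n; i++ )
        sum += v[i];
    return sum;
}
```

Both functions return the same result; only the scope of the index differs.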
  7. 27 Mar, 2010 2 commits
    • Overhaul macroblock_cache_rect · 4c03ec69
      Fiona Glaser authored
      Unify the rectangle functions into a single one similar to ffmpeg's fill_rectangle.
      Remove all cases of variable-size cache_rect calls; create a function-pointer-based system for handling such cases.
      Should greatly decrease code size required for such calls.
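A minimal sketch of the idea described in this commit: one generic rectangle fill, plus a function-pointer table for call sites where the size is only known at run time. The names, the cache stride of 8 and the table layout are assumptions for illustration, not x264's actual code.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_STRIDE 8 /* assumed row stride of the cache, in bytes */

/* Generic fill: when width and height are compile-time constants,
 * an inlining compiler can specialise this into a few stores. */
static inline void fill_rect_u8( uint8_t *dst, int width, int height, uint8_t val )
{
    for( int y = 0; y < height; y++ )
        memset( dst + y * CACHE_STRIDE, val, (size_t)width );
}

/* Fixed-size wrappers, one per supported block size. */
static void fill_1x1( uint8_t *dst, uint8_t val ) { fill_rect_u8( dst, 1, 1, val ); }
static void fill_2x2( uint8_t *dst, uint8_t val ) { fill_rect_u8( dst, 2, 2, val ); }
static void fill_4x4( uint8_t *dst, uint8_t val ) { fill_rect_u8( dst, 4, 4, val ); }

/* Variable-size calls index this table instead of passing
 * run-time width/height into the generic function. */
typedef void (*fill_fn)( uint8_t *dst, uint8_t val );
static const fill_fn fill_by_log2_size[3] = { fill_1x1, fill_2x2, fill_4x4 };
```

For example, `fill_by_log2_size[1]( cache + CACHE_STRIDE + 1, 7 )` fills a 2x2 square starting at row 1, column 1.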
    • Make a bunch of small functions ALWAYS_INLINE · 8b4cca0e
      Fiona Glaser authored
      Probably no real effect for now, but needed for the next patch.
  8. 23 Feb, 2010 1 commit
    • Much faster and more efficient MVD handling · 5c767904
      Fiona Glaser authored
      Store MV deltas as clipped absolute values.
      This means CABAC no longer has to calculate absolute values in MV context selection.
      This also lets us cut the memory spent on MVDs by a factor of 2, speeding up cache_mvd and reducing memory usage by 32*threads*(num macroblocks) bytes.
      On a Core i7 encoding 1080p, this is about 3 megabytes saved.
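The storage scheme and the memory figure above can be sketched as follows. This is an illustration under stated assumptions, not x264's code: the clip limit of 33 is inferred from the H.264 CABAC rule that selects the MVD context from the sum of the two neighbouring absolute values (below 3, 3 to 32, above 32), and the thread count of 12 used in the usage note is a guess at what "a Core i7" meant here. All function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define MVD_CLIP 33 /* assumption: every sum above 32 selects the same context */

/* Store an MV delta as a clipped absolute value in one byte
 * instead of a signed 16-bit value. */
static uint8_t mvd_store( int mvd )
{
    int a = abs( mvd );
    return (uint8_t)( a < MVD_CLIP ? a : MVD_CLIP );
}

/* Context increment from the stored left and top neighbour values:
 * 0 for sums below 3, 1 for 3..32, 2 for sums above 32.
 * No abs() is needed here because the stored values are already absolute. */
static int mvd_ctx_inc( uint8_t left, uint8_t top )
{
    int sum = left + top;
    return ( sum > 2 ) + ( sum > 32 );
}

/* Savings formula from the commit message:
 * 32 bytes per macroblock per thread. */
static long mvd_bytes_saved( int threads, int mb_width, int mb_height )
{
    return 32L * threads * mb_width * mb_height;
}
```

A 1080p frame is 120 x 68 macroblocks, so with an assumed 12 threads `mvd_bytes_saved( 12, 120, 68 )` gives 3,133,440 bytes, which matches the "about 3 megabytes" quoted above.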
  9. 21 Jan, 2010 1 commit
    • Various performance optimizations · f5af5f14
      Fiona Glaser authored
      Simplify and compact storage of direct motion vectors, faster --direct auto.
      Shrink various arrays to save a bit of cache.
      Simplify and reorganize B macroblock type writing in CABAC.
      Add some missing ALIGNED macros.
  10. 11 Dec, 2009 1 commit
  11. 09 Dec, 2009 1 commit
  12. 12 Nov, 2009 1 commit
    • Fix all aliasing violations · 03cb8c09
      Fiona Glaser authored
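      New type-punning macros perform write/read-combining without aliasing violations, per the second-to-last item of section 6.5, paragraph 7 of the C99 specification (access through an aggregate or union type that includes a compatible type among its members).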
      GCC 4.4, however, doesn't seem to have read this part of the spec and still warns about the violations.
      Regardless, it seems to fix all known aliasing miscompilations, so perhaps the GCC warning generator is just broken.
      As such, add -Wno-strict-aliasing to CFLAGS.
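A minimal sketch of union-based type punning for write/read-combining, in the spirit of the macros described in this commit. The union and function names are hypothetical; x264's real macros are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical union: the same four bytes viewed either as
 * one 32-bit word or as four 8-bit values. */
typedef union
{
    uint32_t i;
    uint8_t  b[4];
} u32_bytes_t;

/* Write-combining: set four byte-sized entries with one 32-bit store. */
static void set4( u32_bytes_t *dst, uint8_t v )
{
    dst->i = v * 0x01010101u; /* replicate v into all four bytes */
}

/* Read-combining: test four byte-sized entries with one 32-bit load. */
static int all_zero4( const u32_bytes_t *src )
{
    return src->i == 0;
}
```

Because every access goes through the union type, the compiler may not assume that the 32-bit and 8-bit views refer to different objects.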
  13. 10 Jul, 2009 1 commit
  14. 27 Jan, 2009 1 commit
    • Much faster chroma encoding and other opts · 83d805fe
      Fiona Glaser authored
      ~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only.
      Small optimization in cache_save (skip_bp)
      Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future)
      Add in automatic substitutions for some asm instructions that have an equivalent smaller representation.
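To illustrate why a DC-only special case is cheap: when only the DC coefficient of a 4x4 block is nonzero, the H.264 inverse transform produces the same value, (dc + 32) >> 6, at all 16 positions, so the full transform can be replaced by one rounding step and 16 clipped additions. The sketch below assumes `dc` is the already-dequantized DC coefficient; the function names are hypothetical.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_uint8( int x )
{
    return (uint8_t)( x < 0 ? 0 : x > 255 ? 255 : x );
}

/* Add the residual of a DC-only 4x4 block to the prediction in dst.
 * Note: for negative dc this relies on arithmetic right shift,
 * which is implementation-defined in C but universal in practice. */
static void add_dc_4x4( uint8_t *dst, int stride, int dc )
{
    dc = ( dc + 32 ) >> 6; /* rounding step of the inverse transform */
    for( int y = 0; y < 4; y++ )
        for( int x = 0; x < 4; x++ )
            dst[y * stride + x] = clip_uint8( dst[y * stride + x] + dc );
}
```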
  15. 20 Jan, 2009 1 commit
    • Eliminate support for direct_8x8_inference=0 · 1f0e78d8
      Fiona Glaser authored
      The benefit in the most extreme contrived situation was at most 0.001 dB PSNR, at the cost of slower decoding.
      As this option was basically useless, it was a waste of code and prevented some other useful optimizations.
      Remove some unused mc code related to sub-8x8 partitions.
      Small deblocking speedup when p4x4 is used.
      Also remove unused x264_nal_decode prototype from x264.h.
  16. 23 Dec, 2008 1 commit
  17. 15 Dec, 2008 1 commit
  18. 11 Dec, 2008 1 commit
    • Much faster CAVLC residual coding · 99448f6c
      Fiona Glaser authored
      Use a VLC table for common levelcodes instead of constructing them on-the-spot
      Branchless version of i_trailing calculation (2x faster on Nehalem)
      Completely remove array_non_zero_count and instead use the count calculated in level/run coding.  Note: this slightly changes output with subme > 7 due to different nonzero counts being stored during qpel RD.
  19. 05 Sep, 2008 1 commit
  20. 16 Aug, 2008 1 commit
  21. 16 Jul, 2008 1 commit
  22. 11 Jul, 2008 1 commit
  23. 10 Jul, 2008 1 commit
    • Fix and enable I_PCM macroblock support · 6b4ad5f5
      Fiona Glaser authored
      In RD mode, always consider PCM as a macroblock mode possibility
      Fix bitstream writing for PCM blocks in CAVLC and CABAC, and a few other minor changes to make PCM work.
      PCM macroblocks improve compression at very low QPs (1-5) and in lossless mode.
  24. 06 Jul, 2008 1 commit
    • Various optimizations and cosmetics · c9c7edf3
      Fiona Glaser authored
      Update AUTHORS file with Gabriel and me
      Update XCHG macro to work correctly in if statements
      Add new lookup tables for block_idx and fdec/fenc addresses
      Slightly faster array_non_zero_count_mmx (patch by holger)
      Eliminate branch in analyse_intra
      Unroll loops in and clean up chroma encode
      Convert some for loops to do/while loops for speed improvement
      Do explicit write-combining on --me tesa mvsad_t struct
      Shrink --me esa zero[] array
      Speed up bime by reducing size of visited[][][] array
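The XCHG item in this commit refers to a common C pitfall: a multi-statement macro wrapped only in braces breaks when followed by `else`. A usual fix, sketched here as an assumption about what the change looked like, is the `do { ... } while( 0 )` wrapper. `order_pair` is a hypothetical helper for demonstration.

```c
/* Swap two lvalues of the given type. The do/while(0) wrapper makes
 * the macro expand to exactly one statement, so it is safe inside
 * an unbraced if/else. */
#define XCHG( type, a, b ) do { type t_ = (a); (a) = (b); (b) = t_; } while( 0 )

/* Returns 1 if the pair had to be swapped, 0 if it was already ordered.
 * With a plain { ... } macro body, the ';' after XCHG(...) would end
 * the if statement and the 'else' would fail to compile. */
static int order_pair( int *lo, int *hi )
{
    if( *lo > *hi )
        XCHG( int, *lo, *hi );
    else
        return 0;
    return 1;
}
```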
  25. 04 Jul, 2008 1 commit
    • Update file headers throughout x264 · bdbd4fe7
      Fiona Glaser authored
      Update "Authors" lists based on actual authorship, ordered by importance (most important first)
      Update copyright notices and remove old CVS tags from file headers
      Add file headers to GTK and other sections missing them
      Update FSF address
      Other header-related cosmetics
  26. 24 Jun, 2008 1 commit
    • Convert NNZ to raster order and other optimizations · ec3d0955
      Fiona Glaser authored
      Converting NNZ to raster order simplifies a lot of the load/store code and allows more use of write-combining.
      More use of write-combining throughout load/save code in common/macroblock.c
      GCC has aliasing issues in the case of stores to 8-bit heap-allocated arrays; dereferencing the pointer once avoids this problem and significantly increases performance.
      More manual loop unrolling and such.
      Move all packXtoY functions to macroblock.h so any function can use them.
      Add pack8to32.
      Minor optimizations to encoder/macroblock.c
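A rough sketch of why raster order helps write-combining: with the 16 per-block NNZ counts stored row by row, each row of four counts is contiguous and can be moved as one 32-bit unit. The cache stride of 8 and the function `load_nnz_raster` are assumptions for illustration; `pack8to32` is written here in a portable shift form and may differ from x264's own definition.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_STRIDE 8 /* assumed row stride of the NNZ cache, in bytes */

/* Pack four 8-bit values into one 32-bit word, first argument in the
 * lowest byte (which matches memory order on little-endian machines). */
static uint32_t pack8to32( uint8_t a, uint8_t b, uint8_t c, uint8_t d )
{
    return (uint32_t)a | (uint32_t)b << 8 | (uint32_t)c << 16 | (uint32_t)d << 24;
}

/* Copy a macroblock's 4x4 grid of NNZ counts (raster order) into the
 * cache. Each 4-byte memcpy is typically compiled to a single 32-bit
 * load and store, and it avoids any aliasing issue. */
static void load_nnz_raster( uint8_t *cache, const uint8_t nnz[16] )
{
    for( int y = 0; y < 4; y++ )
        memcpy( cache + y * CACHE_STRIDE, nnz + 4 * y, 4 );
}
```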
  27. 15 Jun, 2008 1 commit
  28. 17 May, 2008 1 commit
  29. 24 Apr, 2008 1 commit
  30. 20 Mar, 2008 2 commits
  31. 27 Jan, 2008 1 commit
  32. 20 Dec, 2007 1 commit
  33. 02 Dec, 2007 1 commit
  34. 29 Oct, 2006 1 commit
  35. 10 Oct, 2006 1 commit
  36. 01 Oct, 2006 1 commit
  37. 16 Aug, 2006 1 commit