  1. 23 Aug, 2009 1 commit
    • GSOC merge part 2: ARM stack alignment · ca7da1ae
      David Conrad authored
      Neither GCC nor ARMCC supports 16-byte stack alignment, even though NEON loads require it.
      These macros only work for arrays, but fortunately that covers almost all instances of stack alignment in x264.
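The array-only restriction mentioned above follows naturally from the usual workaround: over-allocate the array and round the pointer up by hand. A minimal sketch of that idea, assuming a hypothetical macro name `ALIGNED_STACK_ARRAY` (this is not x264's actual macro) and an element size that is a power of two no larger than the alignment:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro for illustration, not x264's definition.
 * Declares `name` as a pointer to at least `count` elements of `type`,
 * aligned to `align` bytes, inside an over-allocated stack array.
 * Assumes sizeof(type) is a power of two and no larger than `align`. */
#define ALIGNED_STACK_ARRAY(type, name, count, align) \
    type name##_raw[(count) + (align) / sizeof(type)]; \
    type *name = (type *)(((uintptr_t)name##_raw + (align) - 1) & \
                          ~(uintptr_t)((align) - 1))

/* Returns 1 if the array produced by the macro is 16-byte aligned and usable. */
static int aligned_array_demo(void)
{
    ALIGNED_STACK_ARRAY(int16_t, dct, 16, 16);
    for (int i = 0; i < 16; i++)
        dct[i] = (int16_t)i;
    return ((uintptr_t)dct % 16 == 0) && dct[15] == 15;
}
```

Because the macro hands back a pointer into a padded buffer, it cannot align a scalar or a struct declared in the ordinary way, which matches the limitation the commit message describes.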
  2. 26 Jun, 2009 2 commits
  3. 22 Jun, 2009 1 commit
    • Various CABAC optimizations and cleanups · 90bec46b
      Fiona Glaser authored
      Faster CABAC CBF context calculation for inter blocks.
      Add x264_constant_p(), will probably be useful in the future as well.
      Simpler subpartition functions.
      Clean up and optimize mvd_cpn a bit more.
      Various other minor optimizations.
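A helper like x264_constant_p() is typically a thin wrapper around GCC's `__builtin_constant_p`, which lets a macro pick a cheaper path when an argument is known at compile time. The sketch below assumes that design; the names `constant_p`, `MUL_BY`, and `mul_generic` are hypothetical and not taken from x264:

```c
/* Assumed wrapper: true only when the compiler can prove x is a constant. */
#if defined(__GNUC__)
#define constant_p(x) __builtin_constant_p(x)
#else
#define constant_p(x) 0   /* conservative fallback: always take the generic path */
#endif

/* Generic path, used whenever the multiplier is not a known constant. */
static int mul_generic(int v, int k)
{
    return v * k;
}

/* Hypothetical macro: use an addition when k is the literal 2.
 * Note that v is evaluated twice on the fast path. */
#define MUL_BY(v, k) \
    ((constant_p(k) && (k) == 2) ? ((v) + (v)) : mul_generic((v), (k)))

static int double_it(int v)    { return MUL_BY(v, 2); }
static int times(int v, int k) { return MUL_BY(v, k); }
```

Both paths compute the same value, so the result does not depend on whether the compiler recognizes the constant; only the generated code differs.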
  4. 19 Jun, 2009 1 commit
  5. 24 May, 2009 2 commits
  6. 10 May, 2009 1 commit
    • More CABAC and CAVLC optimizations · 094a4edf
      Fiona Glaser authored
      Simplified function calling for block_residual_write_(cabac|cavlc) and improved sigmap coding.
      Tried making 0/1-bit specific versions of CABAC asm, but the benefit was minimal under GCC 4.3.
      It helped a decent bit under GCC 3.4, but you shouldn't be using such old versions anyway.
  7. 09 Apr, 2009 1 commit
    • Various CABAC optimizations · 2bcc39fd
      Fiona Glaser authored
      Move calculation of b_intra out of the core residual loop and hardcode it where applicable.
      Inlining cabac_mb_mvd was unnecessary and wasted tremendous amounts of code size.  Inlining only cache_mvd is faster and significantly smaller.
  8. 05 Apr, 2009 1 commit
    • Faster CABAC RDO · be3c3d21
      Fiona Glaser authored
      Since the bypass case is quite unlikely, especially when doing merged sigmap/level coding,
      it's faster to use a branch than a cmov.
  9. 16 Feb, 2009 1 commit
  10. 11 Feb, 2009 1 commit
  11. 09 Feb, 2009 1 commit
  12. 03 Feb, 2009 1 commit
  13. 30 Jan, 2009 1 commit
    • Massive overhaul of nnz/cbp calculation · e394bd60
      Fiona Glaser authored
      Modify quantization to also calculate array_non_zero.
      PPC assembly changes by gpoirier.
      New quant asm includes some small tweaks to quant, plus SSE4 versions that use ptest for array_non_zero.
      Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc.
      Also add new i16x16 DC-only iDCT with asm.
      Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well.
      Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around.
      Performance increase in macroblock_encode: ~18% with dct-decimate and 30% without, at CRF 25.
      Overall performance increase 0-6% depending on encoding settings.
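The core idea of this commit, having quantization report whether any coefficient survived, can be sketched in plain C. This is a simplified illustration, not x264's code: the function name `quant_4x4_nz`, the 16.16 fixed-point scaling, and the demo helper are assumptions made for the example.

```c
#include <stdint.h>

/* Quantize a 4x4 block in place and return 1 if any output is nonzero.
 * Assumed arithmetic: q = sign(c) * (((|c| + bias) * mf) >> 16). */
static int quant_4x4_nz(int16_t dct[16], const uint16_t mf[16],
                        const uint16_t bias[16])
{
    int nz = 0;
    for (int i = 0; i < 16; i++) {
        int c = dct[i];
        int q = c > 0 ?  (int)(((uint32_t)(bias[i] + c) * mf[i]) >> 16)
                      : -(int)(((uint32_t)(bias[i] - c) * mf[i]) >> 16);
        dct[i] = (int16_t)q;
        nz |= q;              /* accumulate "anything nonzero" for free */
    }
    return !!nz;
}

/* Demo helper: a block with a single coefficient, scale factor 1/4, no bias. */
static int quant_demo(int16_t coef)
{
    int16_t dct[16] = {0};
    uint16_t mf[16], bias[16];
    for (int i = 0; i < 16; i++) {
        mf[i] = 1 << 14;      /* 0.25 in 16.16 fixed point */
        bias[i] = 0;
    }
    dct[0] = coef;
    return quant_4x4_nz(dct, mf, bias);
}
```

With the flag available as a return value, a caller can skip dequant, zigzag, and decimation for blocks that quantize to all zeros, which is the saving the commit message refers to.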
  14. 28 Dec, 2008 1 commit
    • Much faster CABAC RDO · 406a40dc
      Fiona Glaser authored
      Since RDO doesn't care about what order bit costs are calculated, merge sigmap and level coding into the same loop in RDO.
      This is bit-exact for 4x4dct but slightly incorrect for 8x8dct due to the sigmap containing duplicated contexts.
      However, the PSNR penalty of this is extremely small (~0.001 dB).
      Speed benefit is about 15% in 4x4dct and 30% in 8x8dct residual bit cost calculation at QP20.
      Overall encoding speed benefit is up to 5%, depending on encoding settings.
      Also remove an old unnecessary CABAC table that hasn't been used for years.
  15. 23 Dec, 2008 1 commit
  16. 11 Dec, 2008 1 commit
    • Much faster CAVLC residual coding · 99448f6c
      Fiona Glaser authored
      Use a VLC table for common levelcodes instead of constructing them on the spot.
      Branchless version of i_trailing calculation (2x faster on Nehalem).
      Completely remove array_non_zero_count and instead use the count calculated in level/run coding.  Note: this slightly changes output with subme > 7 due to different nonzero counts being stored during qpel RD.
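One way to compute the CAVLC trailing-ones count without data-dependent branches is to build a 3-bit mask marking which of the first three levels have magnitude greater than 1, then look up the position of the lowest set bit. The sketch below illustrates that technique under stated assumptions: the function names are hypothetical, and the caller is assumed to pad unused entries with a value of magnitude 2 or more so they stop the count.

```c
#include <stdint.h>

/* Count leading levels (up to 3) whose magnitude is at most 1.
 * Assumes the caller pads missing entries with a sentinel such as 2. */
static int trailing_ones(const int16_t level[3])
{
    /* Index of the lowest set bit of a 3-bit mask, or 3 if the mask is 0. */
    static const uint8_t first_set[8] = { 3, 0, 1, 0, 2, 0, 1, 0 };
    unsigned mask = 0;
    for (int i = 0; i < 3; i++) {   /* fixed trip count, trivially unrolled */
        int l = level[i];
        /* (l+1)|(1-l) is negative exactly when |l| > 1; take its sign bit. */
        mask |= ((uint32_t)((l + 1) | (1 - l)) >> 31) << i;
    }
    return first_set[mask];
}

/* Convenience wrapper for experimenting with three explicit levels. */
static int trailing_ones3(int a, int b, int c)
{
    const int16_t level[3] = { (int16_t)a, (int16_t)b, (int16_t)c };
    return trailing_ones(level);
}
```

For example, levels (1, -1, 3) give a mask of 4 and a count of 2, while (2, 1, 1) give a mask of 1 and a count of 0.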
  17. 29 Nov, 2008 1 commit
  18. 28 Nov, 2008 2 commits
    • 10L in r1041 · df72b08c
      Fiona Glaser authored
    • Significantly faster CABAC and CAVLC residual coding and bit cost calculation · c1d73389
      Fiona Glaser authored
      Early-terminate in residual writing using stored nnz counts.
      To allow this, store nnz counts for luma and chroma DC.
      Add assembly functions to find the last nonzero coefficient in a block.
      Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC.
      Note that this changes output slightly with CABAC RDO because it requires always storing correct nnz values during RDO, which wasn't done before in cases where it wasn't useful.
      CAVLC output should be equivalent.
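The "last nonzero coefficient" search that the assembly functions accelerate has a straightforward scalar form. The following is a plain-C sketch of that operation, with hypothetical function names, showing how the result lets a residual writer stop early:

```c
#include <stdint.h>

/* Return the index of the last nonzero coefficient, or -1 if all are zero. */
static int coeff_last(const int16_t *coef, int count)
{
    int last = count - 1;
    while (last >= 0 && coef[last] == 0)
        last--;
    return last;
}

/* Number of coefficients a writer has to visit if it stops at the last
 * nonzero one instead of scanning the whole block. */
static int coeffs_to_visit(const int16_t *coef, int count)
{
    return coeff_last(coef, count) + 1;
}

/* Demo helper: a 16-coefficient block with one nonzero entry at `pos`
 * (or no nonzero entries when pos is negative). */
static int coeff_last_demo(int pos)
{
    int16_t coef[16] = {0};
    if (pos >= 0 && pos < 16)
        coef[pos] = 1;
    return coeff_last(coef, 16);
}
```

When the stored nnz count for a block is zero, the writer can skip this search entirely, which is the early termination the first line of the message describes.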
  19. 07 Nov, 2008 1 commit
  20. 27 Oct, 2008 1 commit
    • Optimize CABAC bit cost calculation · e09f55cc
      Fiona Glaser authored
      Speed up cabac mvd and add new precalculated transition/entropy table.
      Add "noup" function for cabac operations to not update the state table when it isn't necessary.
      1-3% faster macroblock_size_cabac.
      Cosmetics
  21. 22 Oct, 2008 2 commits
  22. 02 Oct, 2008 1 commit
    • Rework subme system, add RD refinement in B-frames · 60455fff
      Fiona Glaser authored
      The new system is as follows: subme6 is RD in I/P frames, subme7 is RD in all frames, subme8 is RD refinement in I/P frames, and subme9 is RD refinement in all frames.
      subme6 == old subme6, subme7 == old subme6+brdo, subme8 == old subme7+brdo, subme9 == no equivalent
      --b-rdo has, accordingly, been removed.  --bime has also been removed, and instead enabled automatically at subme >= 5.
      RD refinement in B-frames (subme9) includes both qpel-RD and an RD version of bime.
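The subme levels described above can be summarized as a small lookup by frame type. This is only an illustration of the mapping stated in the commit message; the type and function names are hypothetical, and treating bime as a flag that is set for every frame type (though it only matters in B-frames) is an assumption of the sketch.

```c
typedef enum { FRAME_I, FRAME_P, FRAME_B } frame_type;

typedef struct {
    int rd;         /* RD mode decision */
    int rd_refine;  /* RD refinement (qpel-RD, and RD bime in B-frames) */
    int bime;       /* bidirectional ME refinement, relevant to B-frames */
} subme_features;

/* Features implied by a subme level for a given frame type:
 * subme6 = RD in I/P, subme7 = RD in all frames,
 * subme8 = RD refinement in I/P, subme9 = RD refinement in all frames,
 * bime enabled automatically at subme >= 5. */
static subme_features subme_for_frame(int subme, frame_type type)
{
    int is_b = (type == FRAME_B);
    subme_features f;
    f.rd        = is_b ? subme >= 7 : subme >= 6;
    f.rd_refine = is_b ? subme >= 9 : subme >= 8;
    f.bime      = subme >= 5;
    return f;
}
```

Under this mapping, subme 8 gives RD refinement in I/P frames and plain RD in B-frames, matching the "old subme7+brdo" equivalence listed in the message.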
  23. 30 Aug, 2008 1 commit
  24. 21 Aug, 2008 1 commit
  25. 30 Jul, 2008 2 commits
    • Fix regression in r922 · ff7639b0
      Fiona Glaser authored
      Set the chroma DC coefficients to zero for residual coding in qpel-rd.
      Fix a C99ism.
    • Improve intra RD refine, speed up residual_write_cabac · 63b84fa4
      Fiona Glaser authored
      A do/while loop can be used for residual_write, but i8x8 had to be fixed so that it wouldn't call residual_write with zero coeffs.
      Proper nnz handling added to CABAC intra RD refine.
      Chroma cbp added to 8x8 chroma RD.
      cbp was tested, but wasn't useful.
  26. 10 Jul, 2008 1 commit
    • Fix and enable I_PCM macroblock support · 6b4ad5f5
      Fiona Glaser authored
      In RD mode, always consider PCM as a macroblock mode possibility
      Fix bitstream writing for PCM blocks in CAVLC and CABAC, and a few other minor changes to make PCM work.
      PCM macroblocks improve compression at very low QPs (1-5) and in lossless mode.
  27. 04 Jul, 2008 1 commit
    • Update file headers throughout x264 · bdbd4fe7
      Fiona Glaser authored
      Update "Authors" lists based on actual authorship; highest is most important
      Update copyright notices and remove old CVS tags from file headers
      Add file headers to GTK and other sections missing them
      Update FSF address
      Other header-related cosmetics
  28. 11 Jun, 2008 1 commit
  29. 20 May, 2008 2 commits
  30. 17 May, 2008 2 commits
  31. 27 Apr, 2008 1 commit
  32. 25 Mar, 2008 2 commits