1. 04 Feb, 2012 7 commits
  2. 18 Jan, 2012 1 commit
  3. 15 Jan, 2012 12 commits
  4. 08 Dec, 2011 1 commit
  5. 01 Dec, 2011 1 commit
    • x86inc: AVX symmetry optimization · f3a7517c
      Loren Merritt authored
      3-arg AVX ops with a memory arg can only have it in src2,
      whereas SSE emulation of 3-arg prefers to have it in src1 (i.e. the move).
      So, if the op is symmetric and the wrong one is memory, swap them.
      This eliminates redundant moves in some cases where 3-operand syntax with a memory argument is assembled without AVX.
      Also fix movss and movsd in some cases, and flag shufps correctly as float.
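The swap rule described in this commit message can be modelled roughly as follows. This is a minimal sketch, not x86inc's macro code: the function `lower_3arg`, the string operand notation, the use of `movdqa` for the copy, and the handling of a destination that aliases src2 are all assumptions made for illustration.

```python
def is_mem(operand):
    """Treat bracketed operands such as "[r0]" as memory (assumed notation)."""
    return operand.startswith("[")


def lower_3arg(op, dst, src1, src2, *, avx, symmetric):
    """Lower a 3-operand op to a list of instruction strings (hypothetical model)."""
    if avx:
        # AVX: a memory operand is only allowed in src2.
        if symmetric and is_mem(src1) and not is_mem(src2):
            src1, src2 = src2, src1
        if is_mem(src1):
            raise ValueError("AVX form cannot take memory in src1")
        return [f"v{op} {dst}, {src1}, {src2}"]

    # SSE emulation: "mov dst, src1" followed by "op dst, src2".
    # Prefer memory in src1 so that the required move doubles as the load.
    # Swapping when dst aliases src2 is an extra assumption of this sketch.
    if symmetric and dst != src1 and (
        dst == src2 or (is_mem(src2) and not is_mem(src1))
    ):
        src1, src2 = src2, src1
    if dst == src2 and dst != src1:
        raise ValueError("cannot emulate: the move would clobber src2")

    insns = []
    if dst != src1:
        insns.append(f"movdqa {dst}, {src1}")
    insns.append(f"{op} {dst}, {src2}")
    return insns


# Symmetric op, memory in the "wrong" slot for each mode:
avx_form = lower_3arg("paddw", "m0", "[r0]", "m1", avx=True, symmetric=True)
sse_form = lower_3arg("paddw", "m0", "m1", "[r0]", avx=False, symmetric=True)
# Non-symmetric op: operands must stay where they are.
sub_form = lower_3arg("psubw", "m0", "m1", "[r0]", avx=False, symmetric=False)
```

In the SSE case the swapped form loads straight into the destination, so no separate register-to-register copy is needed before the operation.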
  6. 22 Oct, 2011 7 commits
  7. 21 Sep, 2011 6 commits
    • Optimize x86 asm for Intel macro-op fusion · 2701440c
      Fiona Glaser authored
      That is, place all loop counter tests right before their conditional jumps.
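As a rough illustration of the rule above, the sketch below scans a list of instruction strings and reports conditional jumps that are not directly preceded by a flag-setting instruction. The helper names and the `FLAG_SETTERS` set are assumptions for this example; which instructions actually fuse with a jump depends on the microarchitecture.

```python
# Assumed set of flag-setting instructions that may fuse with a following
# conditional jump; the real set depends on the CPU generation.
FLAG_SETTERS = {"cmp", "test", "add", "sub", "inc", "dec", "and"}


def mnemonic(insn):
    """Return the first token of an instruction string."""
    return insn.split()[0]


def is_conditional_jump(insn):
    """Simplified check: any j* mnemonic other than the unconditional jmp."""
    m = mnemonic(insn)
    return m.startswith("j") and m != "jmp"


def unfused_jumps(insns):
    """Indices of conditional jumps whose predecessor does not set flags."""
    return [
        i
        for i, insn in enumerate(insns)
        if is_conditional_jump(insn)
        and (i == 0 or mnemonic(insns[i - 1]) not in FLAG_SETTERS)
    ]


# Loop tail with the counter test separated from its jump ...
before = ["dec r2d", "movdqa [r0], m0", "lea r0, [r0+16]", "jg .loop"]
# ... and the same tail with the test placed right before the jump.
# movdqa and lea leave the flags untouched, so the behavior is unchanged.
after = ["movdqa [r0], m0", "lea r0, [r0+16]", "dec r2d", "jg .loop"]
```

Here `unfused_jumps(before)` flags the jump at index 3, while `unfused_jumps(after)` reports nothing.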
    • Some initial 4:2:2 x86 asm · 389b401a
      Fiona Glaser authored
    • 4:2:2 encoding support · 5b0cb86f
      Henrik Gramner authored
    • SSSE3/SSE4 9-way fully merged i4x4 analysis (sad/satd_x9) · 3d82e875
      Loren Merritt authored
      i4x4 analysis cycles (per partition):
      penryn    sandybridge
      184-> 75  157-> 54     preset=superfast (sad)
      281->165  225->124     preset=faster    (satd with early termination)
      332->165  263->124     preset=medium
      379->165  297->124     preset=slower    (satd without early termination)
      
      This is the first code in x264 that intentionally produces different behavior
      on different cpus: satd_x9 is implemented only on ssse3+ and checks all intra
      directions, whereas the old code (on fast presets) may early terminate after
      checking only some of them. There is no systematic difference on slow presets,
      though they still occasionally disagree about tiebreaks.
      
      For ease of debugging, add an option "--cpu-independent" to disable satd_x9
      and any analogous future code.
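The CPU-dependent behavior described above can be sketched with a toy model. Everything here is hypothetical: the threshold-based early exit is a stand-in for whatever heuristic the older code uses, and `choose_mode`, `has_ssse3`, and `cpu_independent` are illustrative names rather than x264's internal API.

```python
def best_mode_exhaustive(costs):
    """Check every intra direction and return the index of the cheapest one."""
    return min(range(len(costs)), key=costs.__getitem__)


def best_mode_early_exit(costs, threshold):
    """Stop as soon as the best cost so far is at or below the threshold.

    The threshold rule is an assumed stand-in for the real early termination.
    """
    best = 0
    for mode, cost in enumerate(costs):
        if cost < costs[best]:
            best = mode
        if costs[best] <= threshold:
            break
    return best


def choose_mode(costs, *, has_ssse3, cpu_independent, threshold):
    """Use the exhaustive path only when the CPU supports it and it is allowed."""
    if has_ssse3 and not cpu_independent:
        return best_mode_exhaustive(costs)
    return best_mode_early_exit(costs, threshold)


# Nine made-up costs, one per intra 4x4 direction.
costs = [40, 30, 50, 10, 45, 60, 55, 35, 70]

fast_cpu = choose_mode(costs, has_ssse3=True, cpu_independent=False, threshold=35)
slow_cpu = choose_mode(costs, has_ssse3=False, cpu_independent=False, threshold=35)
forced = choose_mode(costs, has_ssse3=True, cpu_independent=True, threshold=35)
```

With these costs the exhaustive path picks mode 3, while the early-exit path stops at mode 1. Forcing the CPU-independent path makes both machines agree on mode 1.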
    • Faster intra_mbcmp_x3 for versions without dedicated asm · e184ff26
      Loren Merritt authored
      Select asm subroutines more intelligently in the wrapper functions.
    • Optimize x86 intra_predict_4x4 and 8x8 · d94edd73
      Loren Merritt authored
      High bit depth Penryn, Sandybridge cycles:
      4x4_ddl: 11->10,  9-> 8
      4x4_ddr: 15->13, 12->11
      4x4_hd:        , 15->12
      4x4_hu:        , 14->13
      4x4_vr:  15->14, 14->12
      8x8_ddl: 32->19, 19->14
      8x8_ddr: 42->19, 21->14
      8x8_hd:        , 15->13
      8x8_hu:  21->17, 16->12
      8x8_vr:  33->19,
      
      8-bit Penryn, Sandybridge cycles:
      4x4_ddr: 24->15,
      4x4_hd:  24->16,
      4x4_hu:  23->15,
      4x4_vr:  23->16,
      4x4_vl:  10-> 9,
      8x8_ddl: 23->15,
      8x8_hd:        , 17->14
      8x8_hu:        , 15->14
      8x8_vr:  20->16, 17->13
  8. 24 Aug, 2011 4 commits
  9. 10 Aug, 2011 1 commit