  1. 26 Feb, 2013 4 commits
    • x86: faster AVX satd/sa8d/sa8d_satd/hadamard_ac · 68a6268b
      Fiona Glaser authored
      Use Conroe-style movddup in AVX transforms; both Sandy Bridge and Bulldozer
      do movddup in the load unit, so it's totally free this way.
      
      On Sandy Bridge:
      ~6% faster sa8d_satd
      ~5% faster hadamard_ac
      ~9% faster 32-bit satd
      ~2% faster sa8d
    • x86: detect Bobcat, improve Atom optimizations, reorganize flags · 5d60b9c9
      Fiona Glaser authored
      The Bobcat has a 64-bit SIMD unit reminiscent of the Athlon 64; detect this
      and apply the appropriate flags.
      
      It also has an extremely slow palignr instruction; create a flag for this to
      avoid massive penalties on palignr-heavy functions.
      
      Improve Atom function selection and document exactly what the SLOW_ATOM flag
      covers.
      
      Add Atom-optimized SATD/SA8D/hadamard_ac functions: simply combine the ssse3
      optimizations with the sse2 algorithm to avoid pmaddubsw, which is slow on
      Atom along with other SIMD multiplies.
      
      Drop TBM detection; it'll probably never be useful for x264.
      
      Invert FastShuffle to SlowShuffle; it only ever applied to one CPU (Conroe).
      
      Detect CMOV, to fail more gracefully when run on a chip with MMX2 but no CMOV.
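The CMOV check described in this commit can be illustrated with a minimal sketch. It relies on two documented x86 facts: CPUID leaf 1 reports CMOV in EDX bit 15, and SSE (EDX bit 25) implies the MMX2 integer extensions. The `HYP_*` names and both helper functions are hypothetical and are not x264's actual flags or code; the fallback policy shown is only one plausible reading of "fail more gracefully".

```c
#include <stdint.h>

/* Feature bits in EDX after executing CPUID with EAX = 1. */
#define CPUID1_EDX_CMOV (1u << 15)
#define CPUID1_EDX_MMX  (1u << 23)
#define CPUID1_EDX_SSE  (1u << 25)

/* Hypothetical flag names for illustration; not x264's real constants. */
enum {
    HYP_CPU_CMOV = 1 << 0,
    HYP_CPU_MMX  = 1 << 1,
    HYP_CPU_MMX2 = 1 << 2
};

/* Translate the raw EDX value into the hypothetical flag set. */
static uint32_t hyp_flags_from_cpuid1_edx(uint32_t edx)
{
    uint32_t flags = 0;
    if (edx & CPUID1_EDX_CMOV)
        flags |= HYP_CPU_CMOV;
    if (edx & CPUID1_EDX_MMX)
        flags |= HYP_CPU_MMX;
    if (edx & CPUID1_EDX_SSE)
        flags |= HYP_CPU_MMX2; /* SSE includes the MMX2 integer instructions */
    return flags;
}

/* Return 1 if assembly paths may be used, 0 if the caller should fall
 * back to plain C because the chip has MMX2 but lacks CMOV. */
static int hyp_asm_usable(uint32_t flags)
{
    if ((flags & HYP_CPU_MMX2) && !(flags & HYP_CPU_CMOV))
        return 0;
    return 1;
}
```

Keeping the bit decoding separate from the policy decision means the policy can be exercised with synthetic register values, without running CPUID on the test machine.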
    • x86: combined SA8D/SATD dsp function · 75d92705
      Oskar Arvidsson authored
      The speedup is most apparent for 8-bit (~30%), but 10-bit improves
      too (~12%).
      64-bit only for now.
    • x86: port SSE2+ SATD functions to high bit depth · 790c648d
      Oskar Arvidsson authored
      Makes SATD 20-50% faster across all partition sizes but 4x4.
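For readers unfamiliar with the metric these commits optimize, here is a scalar sketch of a 4x4 SATD (sum of absolute transformed differences) on high-bit-depth samples. It is a plain C reference under stated assumptions, not the SSE2 code from the commit: the `pixel` typedef (16-bit storage for 10-bit samples), the function name, and the final halving are illustrative choices.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 10-bit samples stored in 16-bit words. */
typedef uint16_t pixel;

/* 4x4 SATD: Hadamard-transform the difference block, then sum magnitudes. */
static int satd_4x4(const pixel *a, int stride_a, const pixel *b, int stride_b)
{
    int d[4][4], t[4][4];

    /* Difference block between the two 4x4 inputs. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

    /* Horizontal 4-point Hadamard butterflies on each row. */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* Vertical butterflies on each column, accumulating absolute values. */
    int sum = 0;
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }
    return sum >> 1; /* halving is a common normalization, assumed here */
}
```

With 10-bit input the largest transformed coefficient is 16 × 1023, so `int` intermediates are ample; a SIMD version has to budget for that wider range, since it no longer fits the 16-bit lanes that suffice for 8-bit input.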
  2. 25 Feb, 2013 18 commits
  3. 09 Jan, 2013 4 commits
  4. 08 Jan, 2013 5 commits
  5. 12 Dec, 2012 1 commit
  6. 06 Dec, 2012 3 commits
  7. 19 Nov, 2012 1 commit
  8. 12 Nov, 2012 1 commit
  9. 08 Nov, 2012 1 commit
  10. 07 Nov, 2012 2 commits
    • Attempt to optimize PPS pic_init_qp in 2-pass mode · 1580a74e
      Fiona Glaser authored
      Small compression improvement; up to ~0.5% in extreme cases.
      Helps more with small slice sizes (tiny resolutions or slice-max-size).
      Note that this changes the 2-pass stats file format.
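Why the PPS base QP affects compression can be shown with a small sketch. In H.264 each slice header carries `slice_qp_delta`, coded as a signed Exp-Golomb value relative to `26 + pic_init_qp_minus26`, so a base QP close to the typical slice QP makes every slice header shorter. The code below is not x264's actual heuristic: the function names and the brute-force search are assumptions, and it assumes the 8-bit QP range 0..51 with per-slice QPs already known (for example from a first pass).

```c
#include <limits.h>

/* Bit length of a signed Exp-Golomb code se(v). */
static int se_bits(int v)
{
    /* se(v) maps v > 0 to codeNum 2v - 1 and v <= 0 to codeNum -2v. */
    unsigned code_num = v > 0 ? 2u * (unsigned)v - 1 : 2u * (unsigned)(-v);
    int len = 1;
    /* ue(v) length is 2 * floor(log2(codeNum + 1)) + 1. */
    for (unsigned x = code_num + 1; x > 1; x >>= 1)
        len += 2;
    return len;
}

/* Hypothetical helper: pick the base QP (26 + pic_init_qp_minus26) that
 * minimizes the total slice_qp_delta bits over the given slice QPs. */
static int best_pic_init_qp(const int *slice_qp, int n_slices)
{
    int best_qp = 26, best_bits = INT_MAX;
    for (int qp = 0; qp <= 51; qp++) {
        int bits = 0;
        for (int i = 0; i < n_slices; i++)
            bits += se_bits(slice_qp[i] - qp);
        if (bits < best_bits) {
            best_bits = bits;
            best_qp = qp;
        }
    }
    return best_qp;
}
```

The saving is a few bits per slice header, which is consistent with the commit's note that the gain is larger when slices are small and therefore numerous.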
    • Improve slice header QP selection · b304a7ca
      Fiona Glaser authored
      Use the first macroblock of each slice instead of the last of the previous.
      Lets us pick a reasonable initial QP for the first slice too.
      Slightly improved compression.