1. 23 Apr, 2013 7 commits
  2. 13 Apr, 2013 1 commit
  3. 01 Mar, 2013 1 commit
  4. 26 Feb, 2013 10 commits
    • ARM: update NEON mc_chroma to work with NV12 and re-enable it · 3a8baa0e
      Stefan Groenroos authored
      Up to 10-15% faster overall.
    • Fiona Glaser's avatar
    • quant_4x4x4: quant one 8x8 block at a time · 993c81e9
      Fiona Glaser authored
      This reduces overhead and lets us use less branchy code for zigzag, dequant,
      decimate, and so on.
      Reorganize and optimize a lot of macroblock_encode using this new function.
      ~1-2% faster overall.
      Includes NEON and x86 versions of the new function.
      Using larger merged functions like this will also make wider SIMD, like
      AVX2, more effective.
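The idea of quantizing one 8x8 block as four 4x4 blocks in a single call can be sketched in plain C as below. This is an illustrative sketch, not the committed code: the `mf` and `bias` tables, the `(abs + bias) * mf >> 16` rounding, and the per-block bitmask return value are assumptions chosen to show how one merged call can tell the caller which sub-blocks still hold nonzero coefficients.

```c
#include <stdint.h>
#include <stdlib.h>

/* Quantize one 4x4 block in place. Returns 1 if any coefficient is nonzero
 * after quantization. mf[] and bias[] are hypothetical per-position
 * multiplier and rounding tables; inputs are assumed small enough that the
 * quantized value fits in int16_t. */
int quant_4x4(int16_t dct[16], const uint16_t mf[16], const uint16_t bias[16])
{
    int nz = 0;
    for (int i = 0; i < 16; i++) {
        uint64_t a = (uint64_t)abs(dct[i]) + bias[i];
        int q = (int)((a * mf[i]) >> 16);
        dct[i] = (int16_t)(dct[i] < 0 ? -q : q);
        nz |= q;
    }
    return nz != 0;
}

/* Quantize the four 4x4 blocks of one 8x8 block in a single call.
 * Bit b of the result is set if block b has a nonzero coefficient, so the
 * caller can index later stages by the mask instead of testing each block
 * separately. */
int quant_4x4x4(int16_t dct[4][16], const uint16_t mf[16], const uint16_t bias[16])
{
    int mask = 0;
    for (int b = 0; b < 4; b++)
        mask |= quant_4x4(dct[b], mf, bias) << b;
    return mask;
}
```

A caller could, for example, skip the zigzag and dequant steps for every block whose bit is clear in the returned mask.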
    • Add AvxSynth support to the AviSynth input module. · 5ee1d03a
      Stephen Hutchinson authored
      Uses dlopen to load AvxSynth on Linux and OS X.
      Allows the use of --demuxer avs for AvxSynth, though the only source filter it
      can currently use is FFMS2.
      Add a local copy of avxsynth_c.h and its dependent headers in extras/ so that
      users don't need to actually have AvxSynth development headers installed to
      enable support for it (mirroring the AviSynth behavior).
      Based on a patch by 0x09 (tab@lavabit.com)
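Loading a library at run time with dlopen, as described above, generally follows the pattern sketched here. The helper name `load_symbol` is hypothetical, and the library and symbol names a caller would pass (for instance something like `libavxsynth.so`) are assumptions rather than values taken from the commit. On older glibc versions the program may also need to be linked with `-ldl`.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: open lib_name and look up sym_name.
 * Returns the symbol address, or NULL if either step fails.
 * On success *handle_out holds the library handle, which the caller
 * should pass to dlclose() when done; on failure it is set to NULL. */
void *load_symbol(const char *lib_name, const char *sym_name, void **handle_out)
{
    *handle_out = dlopen(lib_name, RTLD_NOW);
    if (!*handle_out)
        return NULL;            /* library not installed: disable the feature */

    void *sym = dlsym(*handle_out, sym_name);
    if (!sym) {
        dlclose(*handle_out);   /* library present but missing the entry point */
        *handle_out = NULL;
    }
    return sym;
}
```

Because the library is resolved only at run time, the build needs just the header declarations, which is consistent with shipping a local copy of the headers in extras/.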
    • Eliminate some branchiness in ME/analysis · 7b1301e9
      Fiona Glaser authored
      Faster, fewer branch mispredictions.
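As a generic illustration of trading a branch for arithmetic (not the specific change made in this commit), a "keep the best candidate" update can be written with a mask instead of an `if`. The function and parameter names are hypothetical, and the sketch assumes two's-complement integers.

```c
/* Branchy form, for comparison:
 *     if (cost < *best_cost) { *best_cost = cost; *best_idx = idx; }
 *
 * Branchless form: build an all-ones or all-zeros mask from the comparison
 * and use it to select between the new and the old values. */
void update_best(int cost, int idx, int *best_cost, int *best_idx)
{
    int mask = -(cost < *best_cost);   /* -1 (all ones) if better, else 0 */
    *best_cost = (cost & mask) | (*best_cost & ~mask);
    *best_idx  = (idx  & mask) | (*best_idx  & ~mask);
}
```

When the comparison outcome is hard to predict, as it often is in a motion search, a select like this avoids the misprediction penalty at the cost of a few extra ALU operations.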
    • Fix some store forwarding stalls · 7de9a9aa
      Fiona Glaser authored
      There are quite a few others, but for most of them either fixing the stall
      doesn't help or there's no easy way to avoid it.
    • x86: faster AVX satd/sa8d/sa8d_satd/hadamard_ac · 68a6268b
      Fiona Glaser authored
      Use Conroe-style movddup in AVX transforms; both Sandy Bridge and Bulldozer
      do movddup in the load unit, so it's totally free this way.
      On Sandy Bridge:
      ~6% faster sa8d_satd
      ~5% faster hadamard_ac
      ~9% faster 32-bit satd
      ~2% faster sa8d
    • x86: detect Bobcat, improve Atom optimizations, reorganize flags · 5d60b9c9
      Fiona Glaser authored
      The Bobcat has a 64-bit SIMD unit reminiscent of the Athlon 64; detect this
      and apply the appropriate flags.
      It also has an extremely slow palignr instruction; create a flag for this to
      avoid massive penalties on palignr-heavy functions.
      Improve Atom function selection and document exactly what the SLOW_ATOM flag
      covers.
      Add Atom-optimized SATD/SA8D/hadamard_ac functions: simply combine the ssse3
      optimizations with the sse2 algorithm to avoid pmaddubsw, which is slow on
      Atom along with other SIMD multiplies.
      Drop TBM detection; it'll probably never be useful for x264.
      Invert FastShuffle to SlowShuffle; it only ever applied to one CPU (Conroe).
      Detect CMOV, to fail more gracefully when run on a chip with MMX2 but no CMOV.
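The CMOV check mentioned last can be sketched as a pure function over the EDX register returned by CPUID leaf 1, where bit 15 reports CMOV, bit 23 MMX, and bit 25 SSE. The flag names and both function names are hypothetical and do not reproduce x264's actual flag set; the point is only that a chip can report the MMX2 instructions without CMOV, in which case the caller should fall back to C code instead of crashing.

```c
#include <stdint.h>

/* Hypothetical capability flags for this sketch. */
enum {
    CPU_CMOV = 1 << 0,
    CPU_MMX  = 1 << 1,
    CPU_MMX2 = 1 << 2,
    CPU_SSE  = 1 << 3
};

/* Translate the EDX value from CPUID leaf 1 into capability flags. */
uint32_t flags_from_cpuid1_edx(uint32_t edx)
{
    uint32_t flags = 0;
    if (edx & (1u << 23))
        flags |= CPU_MMX;
    if (edx & (1u << 25))
        flags |= CPU_MMX2 | CPU_SSE;   /* SSE includes the MMX2 integer ops */
    if (edx & (1u << 15))
        flags |= CPU_CMOV;
    return flags;
}

/* Returns 1 if assembly that assumes CMOV may be used, 0 if the caller
 * should fall back to the C implementations. */
int can_use_mmx2_asm(uint32_t flags)
{
    return (flags & CPU_MMX2) && (flags & CPU_CMOV);
}
```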
    • x86: combined SA8D/SATD dsp function · 75d92705
      Oskar Arvidsson authored
      Speedup is most apparent for 8-bit (~30%), but gives some improvements
      for 10-bit too (~12%).
      64-bit only for now.
    • x86: port SSE2+ SATD functions to high bit depth · 790c648d
      Oskar Arvidsson authored
      Makes SATD 20-50% faster across all partition sizes but 4x4.
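For readers unfamiliar with the metric these commits optimize, SATD (sum of absolute transformed differences) applies a Hadamard transform to the difference between two blocks and sums the absolute values of the result. Below is a minimal scalar sketch for a 4x4 block of 8-bit pixels. The function name, the signature, and the final division by 2 are assumptions made for illustration; a high-bit-depth build would use a wider pixel type such as `uint16_t`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar 4x4 SATD sketch: difference, 2-D Hadamard transform, sum of
 * absolute values. The halving at the end is an assumed normalization. */
int satd_4x4(const uint8_t *a, int stride_a, const uint8_t *b, int stride_b)
{
    int d[4][4], t[4][4];
    int sum = 0;

    /* Pixel differences between the two blocks. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

    /* 4-point Hadamard transform on each row. */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* 4-point Hadamard transform on each column, accumulating magnitudes. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }
    return sum / 2;
}
```

The butterflies in both passes are independent across rows and columns, which is what makes the function a natural fit for SIMD implementations like the ones listed above.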
  5. 25 Feb, 2013 18 commits
  6. 09 Jan, 2013 3 commits