    x86: Rewrite wiener SSE2/SSSE3/AVX2 asm · 78d27b7d
    Henrik Gramner authored
    The previous implementation did two separate passes in the horizontal
    and vertical directions, with the intermediate values being stored
    in a buffer on the stack. This caused bad cache thrashing.
    
    By interleaving the horizontal and vertical passes, and using a ring
    buffer that stores only a few rows at a time, performance improves
    significantly.
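    The difference between the two schemes can be sketched in scalar C. This is a minimal illustration, not the actual asm or dav1d's C code: the function names, the padded-source convention (h + 6 rows of w + 6 pixels), the 8-bit clip and the single rounding shift are all assumptions, and AV1's real Wiener rounding differs. The two-pass version filters every intermediate row into a heap buffer before the vertical pass. The ring version keeps only 7 intermediate rows live: source row r always lives in slot r % 7, so each new horizontal row overwrites the one the vertical pass no longer needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TAPS 7
#define HALF (TAPS / 2)   /* padding rows/columns on each side */
#define MAX_W 256         /* hypothetical upper bound on block width */

/* Horizontal pass over one padded source row (src = leftmost padding pixel). */
static void filter_h(int32_t *dst, const uint8_t *src, int w, const int16_t f[TAPS])
{
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += f[k] * src[x + k];
        dst[x] = sum;
    }
}

/* Vertical pass: combine TAPS intermediate rows, round by `shift`, clip to 8 bits. */
static void filter_v(uint8_t *dst, const int32_t *rows[TAPS], int w,
                     const int16_t f[TAPS], int shift)
{
    for (int x = 0; x < w; x++) {
        int64_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += (int64_t)f[k] * rows[k][x];
        if (sum < 0)
            sum = 0;
        sum = (sum + ((int64_t)1 << (shift - 1))) >> shift;
        dst[x] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

/* Old scheme: all h + 6 intermediate rows are stored before the vertical pass. */
void wiener_two_pass(uint8_t *dst, int dst_stride, const uint8_t *src, int src_stride,
                     int w, int h, const int16_t fh[TAPS], const int16_t fv[TAPS], int shift)
{
    int32_t *tmp = malloc(sizeof(int32_t) * (size_t)w * (size_t)(h + 2 * HALF));
    if (!tmp)
        return;
    for (int y = 0; y < h + 2 * HALF; y++)
        filter_h(tmp + (size_t)y * w, src + y * src_stride, w, fh);
    for (int y = 0; y < h; y++) {
        const int32_t *rows[TAPS];
        for (int k = 0; k < TAPS; k++)
            rows[k] = tmp + (size_t)(y + k) * w;
        filter_v(dst + y * dst_stride, rows, w, fv, shift);
    }
    free(tmp);
}

/* New scheme: one horizontal row per output row, kept in a 7-row ring buffer. */
void wiener_ring(uint8_t *dst, int dst_stride, const uint8_t *src, int src_stride,
                 int w, int h, const int16_t fh[TAPS], const int16_t fv[TAPS], int shift)
{
    int32_t ring[TAPS][MAX_W];
    assert(w <= MAX_W);
    /* Prime the ring with the first TAPS - 1 intermediate rows. */
    for (int y = 0; y < TAPS - 1; y++)
        filter_h(ring[y], src + y * src_stride, w, fh);
    for (int y = 0; y < h; y++) {
        /* Source row y + 6 replaces row y - 1, which is no longer needed. */
        int newest = y + TAPS - 1;
        filter_h(ring[newest % TAPS], src + newest * src_stride, w, fh);
        const int32_t *rows[TAPS];
        for (int k = 0; k < TAPS; k++)
            rows[k] = ring[(y + k) % TAPS];
        filter_v(dst + y * dst_stride, rows, w, fv, shift);
    }
}
```

    Both functions produce identical output; the ring version simply never touches more than 7 intermediate rows, so its working set stays small and cache-resident regardless of block height.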
    
    Also split the function into 7-tap and 5-tap versions. The latter is
    faster and fairly common (always for chroma, sometimes for luma).
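    AV1 Wiener filters are symmetric, and for chroma the outermost coefficient is always zero, so a 7-tap filter degenerates to 5 taps. A minimal scalar sketch of why a dedicated path pays off follows; the function names and the dispatch condition are hypothetical illustrations, not the commit's code. The 5-tap version does two fewer multiply-adds per pixel in each direction, and a vertical ring buffer would need only 5 rows instead of 7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 7-tap horizontal filter; src points 3 pixels left of the first output. */
static void filter_h7(int32_t *dst, const uint8_t *src, int w, const int16_t f[7])
{
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < 7; k++)
            sum += f[k] * src[x + k];
        dst[x] = sum;
    }
}

/* 5-tap variant: same src convention, but only taps f[1]..f[5] are applied. */
static void filter_h5(int32_t *dst, const uint8_t *src, int w, const int16_t f[7])
{
    assert(f[0] == 0 && f[6] == 0); /* only valid when the outer taps vanish */
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 1; k < 6; k++)
            sum += f[k] * src[x + k];
        dst[x] = sum;
    }
}

/* Hypothetical dispatch: the 5-tap path is exact only if the outermost
 * coefficient is zero in both directions (f[0] == f[6] by symmetry). */
static bool can_use_5tap(const int16_t fh[7], const int16_t fv[7])
{
    return fh[0] == 0 && fv[0] == 0;
}
```

    With example coefficients such as {0, -5, 20, 98, 20, -5, 0} (made up for illustration), filter_h5 matches filter_h7 exactly while doing less work per pixel.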