    arm64: looprestoration: NEON optimized wiener filter · 513dfa99
    Martin Storsjö authored
    The speedup relative to the C code is around 4.2x on a Cortex A53
    and 5.1x on a Snapdragon 835 when compared to GCC's autovectorized
    code, 6-7x when compared to GCC's output without autovectorization,
    and ~8x when compared to clang's output (clang doesn't seem to try to
    vectorize this function).