    arm64: looprestoration: NEON optimized wiener filter · 513dfa99
    Martin Storsjö authored
    The speedup over GCC's autovectorized C code is around 4.2x on a
    Cortex A53 and 5.1x on a Snapdragon 835. Compared to GCC's output
    without autovectorization the speedup is 6-7x, and compared to clang's
    output (which doesn't seem to try to vectorize this function) it is
    ~8x.
Name
32
64
asm.S
cpu.c
cpu.h
looprestoration_init_tmpl.c
mc_init_tmpl.c