    arm64: looprestoration: Add a NEON implementation of SGR · 313717da
    Martin Storsjö authored
    Relative speedup vs (autovectorized) C code:
                          Cortex A53    A72    A73
    selfguided_3x3_8bpc_neon:   2.91   2.12   2.68
    selfguided_5x5_8bpc_neon:   3.18   2.65   3.39
    selfguided_mix_8bpc_neon:   3.04   2.29   2.98
    
    The relative speedup vs non-vectorized C code is around 2.6-4.6x.