    arm32: looprestoration: NEON implementation of wiener filter for 16 bpc · 2c09aaa4
    Martin Storsjö authored
    Checkasm benchmarks:       Cortex A7         A8        A53       A72       A73
    wiener_chroma_10bpc_neon:   385312.5   165772.7   184308.2  122311.2  126050.2
    wiener_chroma_12bpc_neon:   385296.7   165538.0   184438.2  122290.5  126205.3
    wiener_luma_10bpc_neon:     385318.5   165985.3   184147.4  122311.1  126168.4
    wiener_luma_12bpc_neon:     385316.3   165819.1   184484.7  122304.4  125982.4
    
    The corresponding numbers for arm64 for comparison:
                                                    Cortex A53       A72       A73
    wiener_chroma_10bpc_neon:                         176319.7  125992.1  128162.4
    wiener_chroma_12bpc_neon:                         176386.2  125986.4  128343.8
    wiener_luma_10bpc_neon:                           176174.0  126001.7  128227.8
    wiener_luma_12bpc_neon:                           176176.5  125992.1  128204.8
    
    The arm32 version actually seems to run marginally faster than the arm64
    one on A72 and A73. I believe this is because the arm64 code is tuned
    for A53 (which makes it a bit slower on other cores), but the arm32 code
    can't be tuned exactly the same way due to fewer registers being available.
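
The comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the commit: the figures are copied from the two tables for the cores that appear in both, the names `ARM32`, `ARM64` and `arm32_vs_arm64` are hypothetical, and it assumes that a lower checkasm figure means faster code, as the commit message implies.

```python
# Checkasm figures copied from the tables above, limited to the cores
# that were measured for both arm32 and arm64.
# Assumption: lower figures mean faster code.
ARM32 = {
    "wiener_chroma_10bpc_neon": {"A53": 184308.2, "A72": 122311.2, "A73": 126050.2},
    "wiener_chroma_12bpc_neon": {"A53": 184438.2, "A72": 122290.5, "A73": 126205.3},
    "wiener_luma_10bpc_neon": {"A53": 184147.4, "A72": 122311.1, "A73": 126168.4},
    "wiener_luma_12bpc_neon": {"A53": 184484.7, "A72": 122304.4, "A73": 125982.4},
}
ARM64 = {
    "wiener_chroma_10bpc_neon": {"A53": 176319.7, "A72": 125992.1, "A73": 128162.4},
    "wiener_chroma_12bpc_neon": {"A53": 176386.2, "A72": 125986.4, "A73": 128343.8},
    "wiener_luma_10bpc_neon": {"A53": 176174.0, "A72": 126001.7, "A73": 128227.8},
    "wiener_luma_12bpc_neon": {"A53": 176176.5, "A72": 125992.1, "A73": 128204.8},
}


def arm32_vs_arm64(arm32, arm64):
    """Return {function: {core: percent}}; a negative percent means arm32 is faster."""
    return {
        func: {
            core: 100.0 * (arm32[func][core] - arm64[func][core]) / arm64[func][core]
            for core in arm64[func]
        }
        for func in arm64
    }


if __name__ == "__main__":
    for func, cores in arm32_vs_arm64(ARM32, ARM64).items():
        print(func, ", ".join(f"{core}: {pct:+.1f}%" for core, pct in cores.items()))
```

On these figures the arm32 code comes out roughly 3% faster on A72 and just under 2% faster on A73, while the arm64 code is about 4.5% faster on A53.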