Martin Storsjö authored
Checkasm benchmarks:

                           Cortex A7         A8        A53        A72        A73
wiener_chroma_10bpc_neon:   385312.5   165772.7   184308.2   122311.2   126050.2
wiener_chroma_12bpc_neon:   385296.7   165538.0   184438.2   122290.5   126205.3
wiener_luma_10bpc_neon:     385318.5   165985.3   184147.4   122311.1   126168.4
wiener_luma_12bpc_neon:     385316.3   165819.1   184484.7   122304.4   125982.4

The corresponding numbers for arm64 for comparison:

                          Cortex A53        A72        A73
wiener_chroma_10bpc_neon:   176319.7   125992.1   128162.4
wiener_chroma_12bpc_neon:   176386.2   125986.4   128343.8
wiener_luma_10bpc_neon:     176174.0   126001.7   128227.8
wiener_luma_12bpc_neon:     176176.5   125992.1   128204.8

The arm32 version actually seems to run marginally faster than the
arm64 one on A72 and A73. I believe this is because the arm64 code is
tuned for A53 (which makes it a bit slower on other cores), but the
arm32 code can't be tuned exactly the same way due to fewer registers
being available.
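
For reference, the arm32/arm64 comparison can be checked with a few lines of Python. This is only a sketch: the figures are the wiener_chroma_10bpc_neon values quoted in this message, it assumes that lower checkasm figures mean faster code, and the dictionary and function names are made up for illustration.

```python
# Checkasm figures for wiener_chroma_10bpc_neon, copied from this message.
# Assumption: lower is faster.
arm32 = {"A53": 184308.2, "A72": 122311.2, "A73": 126050.2}
arm64 = {"A53": 176319.7, "A72": 125992.1, "A73": 128162.4}


def arm32_vs_arm64(core):
    """Return the arm32 figure relative to arm64, in percent.

    Negative means arm32 is faster on that core (hypothetical helper).
    """
    return (arm32[core] / arm64[core] - 1.0) * 100.0


for core in ("A53", "A72", "A73"):
    print(f"Cortex {core}: {arm32_vs_arm64(core):+.1f}%")
```

With these inputs the arm32 code comes out a few percent slower on A53 and a few percent faster on A72 and A73, which matches the observation above.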
2c09aaa4