Improve mc-a.S Performance by Using SVE/SVE2
Improve the performance of the NEON functions in aarch64/mc-a.S by using the SVE/SVE2 instruction set. The affected functions and their benchmark results are listed below (checkasm timings; lower is better).

Command executed on both testbeds: ./checkasm8 --bench=avg

Testbed: Alibaba g8y instance based on the Yitian 710 CPU

Function    C      NEON   SVE
avg_4x2     274    215    171
avg_4x4     461    343    225
avg_4x8     806    619    334
avg_4x16    1523   1168   558

Testbed: AWS Graviton3

Function    C      NEON   SVE
avg_4x2     267    213    167
avg_4x4     467    350    221
avg_4x8     784    624    302
avg_4x16    1445   1182   485
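The timings above can be turned into relative speedups with a small helper script. This is a minimal sketch, assuming that the checkasm numbers are directly comparable within one testbed, so that speedup is simply baseline time divided by candidate time. The names `RESULTS`, `speedup` and `speedup_table` are hypothetical and not part of x264 or checkasm.

```python
# Hypothetical helper: checkasm8 --bench=avg timings copied from the
# two testbeds above, as (C, NEON, SVE) per function. Lower is better.
RESULTS = {
    "Yitian 710 (Alibaba g8y)": {
        "avg_4x2": (274, 215, 171),
        "avg_4x4": (461, 343, 225),
        "avg_4x8": (806, 619, 334),
        "avg_4x16": (1523, 1168, 558),
    },
    "AWS Graviton3": {
        "avg_4x2": (267, 213, 167),
        "avg_4x4": (467, 350, 221),
        "avg_4x8": (784, 624, 302),
        "avg_4x16": (1445, 1182, 485),
    },
}


def speedup(baseline: float, candidate: float) -> float:
    """Return how many times faster `candidate` is than `baseline`."""
    return baseline / candidate


def speedup_table(results):
    """Map (testbed, function) -> (SVE vs C speedup, SVE vs NEON speedup)."""
    table = {}
    for testbed, funcs in results.items():
        for name, (c, neon, sve) in funcs.items():
            table[(testbed, name)] = (speedup(c, sve), speedup(neon, sve))
    return table


if __name__ == "__main__":
    for (testbed, name), (vs_c, vs_neon) in speedup_table(RESULTS).items():
        print(f"{testbed:26} {name:9} SVE vs C: {vs_c:.2f}x  SVE vs NEON: {vs_neon:.2f}x")
```

For example, avg_4x16 on Graviton3 comes out at roughly 2.44x over NEON (1182 / 485). On both testbeds the SVE gain over NEON grows with block height, from about 1.26x to 1.28x for avg_4x2 up to about 2.09x to 2.44x for avg_4x16.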