Improve pixel-a.S Performance by Using SVE/SVE2
Improve the performance of the NEON functions in aarch64/pixel-a.S
by using the SVE/SVE2 instruction set. The affected functions are
listed below together with their benchmark results.

Command executed: ./checkasm8 --bench=ssd
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
ssd_4x4_c: 235
ssd_4x4_neon: 226
ssd_4x4_sve: 151
ssd_4x8_c: 409
ssd_4x8_neon: 363
ssd_4x8_sve: 201
ssd_4x16_c: 781
ssd_4x16_neon: 653
ssd_4x16_sve: 313
ssd_8x4_c: 402
ssd_8x4_neon: 192
ssd_8x4_sve: 192
ssd_8x8_c: 728
ssd_8x8_neon: 275
ssd_8x8_sve: 275

Command executed: ./checkasm10 --bench=ssd
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
ssd_4x4_c: 256
ssd_4x4_neon: 226
ssd_4x4_sve: 153
ssd_4x8_c: 460
ssd_4x8_neon: 369
ssd_4x8_sve: 215
ssd_4x16_c: 852
ssd_4x16_neon: 651
ssd_4x16_sve: 340

Command executed: ./checkasm8 --bench=ssd
Testbed: AWS Graviton3
Results:
ssd_4x4_c: 295
ssd_4x4_neon: 288
ssd_4x4_sve: 228
ssd_4x8_c: 454
ssd_4x8_neon: 431
ssd_4x8_sve: 294
ssd_4x16_c: 779
ssd_4x16_neon: 631
ssd_4x16_sve: 438
ssd_8x4_c: 463
ssd_8x4_neon: 247
ssd_8x4_sve: 246
ssd_8x8_c: 781
ssd_8x8_neon: 413
ssd_8x8_sve: 353

Command executed: ./checkasm10 --bench=ssd
Testbed: AWS Graviton3
Results:
ssd_4x4_c: 322
ssd_4x4_neon: 335
ssd_4x4_sve: 240
ssd_4x8_c: 522
ssd_4x8_neon: 448
ssd_4x8_sve: 294
ssd_4x16_c: 832
ssd_4x16_neon: 603
ssd_4x16_sve: 440

Command executed: ./checkasm8 --bench=sa8d
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
sa8d_8x8_c: 2103
sa8d_8x8_neon: 619
sa8d_8x8_sve: 617

Command executed: ./checkasm8 --bench=sa8d
Testbed: AWS Graviton3
Results:
sa8d_8x8_c: 2021
sa8d_8x8_neon: 597
sa8d_8x8_sve: 580

Command executed: ./checkasm8 --bench=var
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
var_8x8_c: 595
var_8x8_neon: 262
var_8x8_sve: 262
var_8x16_c: 1193
var_8x16_neon: 435
var_8x16_sve: 419

Command executed: ./checkasm8 --bench=var
Testbed: AWS Graviton3
Results:
var_8x8_c: 616
var_8x8_neon: 229
var_8x8_sve: 222
var_8x16_c: 1207
var_8x16_neon: 399
var_8x16_sve: 389

Command executed: ./checkasm8 --bench=hadamard_ac
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
hadamard_ac_8x8_c: 2330
hadamard_ac_8x8_neon: 635
hadamard_ac_8x8_sve: 635
hadamard_ac_8x16_c: 4500
hadamard_ac_8x16_neon: 1152
hadamard_ac_8x16_sve: 1151
hadamard_ac_16x8_c: 4499
hadamard_ac_16x8_neon: 1151
hadamard_ac_16x8_sve: 1150
hadamard_ac_16x16_c: 8812
hadamard_ac_16x16_neon: 2187
hadamard_ac_16x16_sve: 2186

Command executed: ./checkasm8 --bench=hadamard_ac
Testbed: AWS Graviton3
Results:
hadamard_ac_8x8_c: 2266
hadamard_ac_8x8_neon: 517
hadamard_ac_8x8_sve: 513
hadamard_ac_8x16_c: 4444
hadamard_ac_8x16_neon: 867
hadamard_ac_8x16_sve: 849
hadamard_ac_16x8_c: 4443
hadamard_ac_16x8_neon: 880
hadamard_ac_16x8_sve: 868
hadamard_ac_16x16_c: 8595
hadamard_ac_16x16_neon: 1656
hadamard_ac_16x16_sve: 1622
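
For context on what the ssd entries measure, here is a minimal scalar sketch of a sum-of-squared-differences metric over a block of 8-bit pixels. The function name ssd_block_ref and its signature are hypothetical and are not taken from x264 or from pixel-a.S. It only illustrates the arithmetic that the C, NEON, and SVE variants are expected to agree on. It is not the code added by this change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not part of x264): sum of squared
 * differences over a width x height block of 8-bit pixels.
 * stride1 and stride2 are the row pitches, in bytes, of the two buffers. */
static uint32_t ssd_block_ref(const uint8_t *pix1, ptrdiff_t stride1,
                              const uint8_t *pix2, ptrdiff_t stride2,
                              int width, int height)
{
    assert(width > 0 && height > 0);

    uint32_t sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Operands promote to int, so the difference can be negative. */
            int d = pix1[x] - pix2[x];
            sum += (uint32_t)(d * d);
        }
        /* Advance both pointers to the start of the next row. */
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

A 4x4 call would look like ssd_block_ref(a, 4, b, 4, 4, 4) for two tightly packed 16-byte buffers. A vector implementation processes several pixels of a row per instruction but should return the same sum.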