    Improve pixel-a.S Performance by Using SVE/SVE2 · c1c9931d
    David Chen authored
    Improve the performance of the NEON functions in aarch64/pixel-a.S
    by using the SVE/SVE2 instruction set. The affected functions are
    listed below together with their checkasm benchmark results (lower
    is better).
    
    Command executed: ./checkasm8 --bench=ssd
    Testbed: Alibaba g8y instance based on Yitian 710 CPU
    Results:
    ssd_4x4_c: 235
    ssd_4x4_neon: 226
    ssd_4x4_sve: 151
    ssd_4x8_c: 409
    ssd_4x8_neon: 363
    ssd_4x8_sve: 201
    ssd_4x16_c: 781
    ssd_4x16_neon: 653
    ssd_4x16_sve: 313
    ssd_8x4_c: 402
    ssd_8x4_neon: 192
    ssd_8x4_sve: 192
    ssd_8x8_c: 728
    ssd_8x8_neon: 275
    ssd_8x8_sve: 275
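For reference, an `ssd_WxH` kernel computes the sum of squared
differences between two W x H pixel blocks. The following is a minimal
portable C sketch of that computation for 8-bit pixels, assuming row
strides are given in pixels. The name `ssd_wxh` and its signature are
hypothetical; this is neither x264's C reference function nor the
NEON/SVE assembly in pixel-a.S.

```c
#include <stdint.h>

/* Hypothetical reference: sum of squared differences over a w x h block
 * of 8-bit pixels. stride1/stride2 are the distances, in pixels, between
 * the starts of consecutive rows in each block. */
static int ssd_wxh(const uint8_t *pix1, intptr_t stride1,
                   const uint8_t *pix2, intptr_t stride2,
                   int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int d = pix1[x] - pix2[x];  /* promoted to int, may be negative */
            sum += d * d;
        }
        pix1 += stride1;  /* advance both blocks to the next row */
        pix2 += stride2;
    }
    return sum;
}
```

For example, two tightly packed 4x8 blocks whose pixels all differ by 2
give 32 * 2^2 = 128. The 10-bit builds (checkasm10) operate on 16-bit
pixel storage, so a sketch for them would take `uint16_t` pointers.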
    
    Command executed: ./checkasm10 --bench=ssd
    Testbed: Alibaba g8y instance based on Yitian 710 CPU
    Results:
    ssd_4x4_c: 256
    ssd_4x4_neon: 226
    ssd_4x4_sve: 153
    ssd_4x8_c: 460
    ssd_4x8_neon: 369
    ssd_4x8_sve: 215
    ssd_4x16_c: 852
    ssd_4x16_neon: 651
    ssd_4x16_sve: 340
    
    Command executed: ./checkasm8 --bench=ssd
    Testbed: AWS Graviton3
    Results:
    ssd_4x4_c: 295
    ssd_4x4_neon: 288
    ssd_4x4_sve: 228
    ssd_4x8_c: 454
    ssd_4x8_neon: 431
    ssd_4x8_sve: 294
    ssd_4x16_c: 779
    ssd_4x16_neon: 631
    ssd_4x16_sve: 438
    ssd_8x4_c: 463
    ssd_8x4_neon: 247
    ssd_8x4_sve: 246
    ssd_8x8_c: 781
    ssd_8x8_neon: 413
    ssd_8x8_sve: 353
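To read the tables as relative gains, the NEON timing can be divided by
the SVE timing: a ratio above 1.0 means the SVE version is faster. The
sketch below applies that ratio to the Graviton3 8-bit ssd numbers
listed above. The helper names `speedup` and `print_speedups` are
hypothetical and are not part of checkasm.

```c
#include <stddef.h>
#include <stdio.h>

/* Ratio of the NEON timing to the SVE timing. Checkasm timings are
 * lower-is-better, so a value above 1.0 means SVE is faster. */
static double speedup(double neon, double sve)
{
    return neon / sve;
}

/* checkasm8 ssd timings on AWS Graviton3, copied from the results above */
static const struct { const char *name; double neon, sve; } ssd_g3[] = {
    { "ssd_4x4",  288, 228 },
    { "ssd_4x8",  431, 294 },
    { "ssd_4x16", 631, 438 },
    { "ssd_8x4",  247, 246 },
    { "ssd_8x8",  413, 353 },
};

/* Print one "name ratio" line per benchmarked block size. */
static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof ssd_g3 / sizeof ssd_g3[0]; i++)
        printf("%-9s %.2fx\n", ssd_g3[i].name,
               speedup(ssd_g3[i].neon, ssd_g3[i].sve));
}
```

Calling `print_speedups()` prints ratios of about 1.26x, 1.47x, 1.44x,
1.00x and 1.17x. On this testbed the gain comes mostly from the 4-pixel-wide
blocks, while ssd_8x4 is essentially unchanged.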
    
    Command executed: ./checkasm10 --bench=ssd
    Testbed: AWS Graviton3
    Results:
    ssd_4x4_c: 322
    ssd_4x4_neon: 335
    ssd_4x4_sve: 240
    ssd_4x8_c: 522
    ssd_4x8_neon: 448
    ssd_4x8_sve: 294
    ssd_4x16_c: 832
    ssd_4x16_neon: 603
    ssd_4x16_sve: 440
    
    Command executed: ./checkasm8 --bench=sa8d
    Testbed: Alibaba g8y instance based on Yitian 710 CPU
    Results:
    sa8d_8x8_c: 2103
    sa8d_8x8_neon: 619
    sa8d_8x8_sve: 617
    
    Command executed: ./checkasm8 --bench=sa8d
    Testbed: AWS Graviton3
    Results:
    sa8d_8x8_c: 2021
    sa8d_8x8_neon: 597
    sa8d_8x8_sve: 580
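SA8D is a SATD-style metric that uses an 8x8 Hadamard transform: the
difference between two 8x8 blocks is transformed along rows and columns,
and the absolute values of all 64 coefficients are summed. The C sketch
below shows that computation with an unnormalized Walsh-Hadamard
butterfly. The function names are hypothetical, and x264's actual
routine may additionally scale or round the result; this sketch returns
the raw sum.

```c
#include <stdint.h>
#include <stdlib.h>

/* In-place unnormalized 8-point Walsh-Hadamard transform on the elements
 * v[0], v[step], ..., v[7*step] (iterative butterfly form). */
static void hadamard8(int *v, int step)
{
    for (int len = 1; len < 8; len <<= 1)
        for (int i = 0; i < 8; i += len << 1)
            for (int j = i; j < i + len; j++) {
                int a = v[j * step], b = v[(j + len) * step];
                v[j * step] = a + b;
                v[(j + len) * step] = a - b;
            }
}

/* Hypothetical sketch: sum of absolute 8x8 Hadamard-transformed
 * differences between two 8-bit blocks, without any final scaling. */
static int sa8d_8x8_sketch(const uint8_t *p1, intptr_t s1,
                           const uint8_t *p2, intptr_t s2)
{
    int d[64];
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            d[y * 8 + x] = p1[y * s1 + x] - p2[y * s2 + x];
    for (int r = 0; r < 8; r++) hadamard8(d + r * 8, 1);  /* rows */
    for (int c = 0; c < 8; c++) hadamard8(d + c, 8);      /* columns */
    int sum = 0;
    for (int i = 0; i < 64; i++)
        sum += abs(d[i]);
    return sum;
}
```

A constant difference of 1 across the block puts all of the energy in the
DC coefficient (64), so the sketch returns 64 for that input.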
    
    Command executed: ./checkasm8 --bench=var
    Testbed: Alibaba g8y instance based on Yitian 710 CPU
    Results:
    var_8x8_c: 595
    var_8x8_neon: 262
    var_8x8_sve: 262
    var_8x16_c: 1193
    var_8x16_neon: 435
    var_8x16_sve: 419
    
    Command executed: ./checkasm8 --bench=var
    Testbed: AWS Graviton3
    Results:
    var_8x8_c: 616
    var_8x8_neon: 229
    var_8x8_sve: 222
    var_8x16_c: 1207
    var_8x16_neon: 399
    var_8x16_sve: 389
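A `var_WxH` kernel accumulates the information needed to compute the
variance of a single pixel block: the sum of the pixels and the sum of
their squares. The C sketch below computes both and then derives
N times the variance as sqr - sum^2 / N. The struct `var_sums` and both
function names are hypothetical, and how x264 packs or consumes these
sums is not shown here.

```c
#include <stdint.h>

/* Hypothetical result type holding the two accumulators. */
typedef struct { uint32_t sum; uint32_t sqr; } var_sums;

/* Accumulate the pixel sum and the sum of squared pixels over a w x h
 * block of 8-bit pixels; stride is the row distance in pixels. */
static var_sums var_wxh_sketch(const uint8_t *pix, intptr_t stride,
                               int w, int h)
{
    var_sums r = { 0, 0 };
    for (int y = 0; y < h; y++, pix += stride)
        for (int x = 0; x < w; x++) {
            r.sum += pix[x];
            r.sqr += (uint32_t)pix[x] * pix[x];
        }
    return r;
}

/* N times the population variance, sqr - sum^2 / N, with N = w * h.
 * The division truncates, as integer-only code typically does. */
static uint32_t var_times_n(var_sums r, int n)
{
    return r.sqr - (uint32_t)(((uint64_t)r.sum * r.sum) / (uint64_t)n);
}
```

For an 8x8 block alternating between 0 and 2, sum = 64 and sqr = 128,
so var_times_n returns 128 - 4096 / 64 = 64, i.e. a variance of 1.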
    
    Command executed: ./checkasm8 --bench=hadamard_ac
    Testbed: Alibaba g8y instance based on Yitian 710 CPU
    Results:
    hadamard_ac_8x8_c: 2330
    hadamard_ac_8x8_neon: 635
    hadamard_ac_8x8_sve: 635
    hadamard_ac_8x16_c: 4500
    hadamard_ac_8x16_neon: 1152
    hadamard_ac_8x16_sve: 1151
    hadamard_ac_16x8_c: 4499
    hadamard_ac_16x8_neon: 1151
    hadamard_ac_16x8_sve: 1150
    hadamard_ac_16x16_c: 8812
    hadamard_ac_16x16_neon: 2187
    hadamard_ac_16x16_sve: 2186
    
    Command executed: ./checkasm8 --bench=hadamard_ac
    Testbed: AWS Graviton3
    Results:
    hadamard_ac_8x8_c: 2266
    hadamard_ac_8x8_neon: 517
    hadamard_ac_8x8_sve: 513
    hadamard_ac_8x16_c: 4444
    hadamard_ac_8x16_neon: 867
    hadamard_ac_8x16_sve: 849
    hadamard_ac_16x8_c: 4443
    hadamard_ac_16x8_neon: 880
    hadamard_ac_16x8_sve: 868
    hadamard_ac_16x16_c: 8595
    hadamard_ac_16x16_neon: 1656
    hadamard_ac_16x16_sve: 1622
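The hadamard_ac kernels measure the AC (non-DC) energy of a source
block in the Hadamard domain. The C sketch below is a simplified
illustration of that idea for a single 4x4 block: it transforms the
pixels and sums the absolute values of every coefficient except DC.
The function names are hypothetical, and the exact combination of 4x4
and 8x8 sums and the scaling that x264 uses are not reproduced.

```c
#include <stdint.h>
#include <stdlib.h>

/* In-place unnormalized 4-point Hadamard transform on the elements
 * v[0], v[step], v[2*step], v[3*step]; v[0] receives the DC term. */
static void hadamard4(int *v, int step)
{
    int a = v[0], b = v[step], c = v[2 * step], d = v[3 * step];
    int s01 = a + b, d01 = a - b, s23 = c + d, d23 = c - d;
    v[0]        = s01 + s23;
    v[step]     = d01 + d23;
    v[2 * step] = s01 - s23;
    v[3 * step] = d01 - d23;
}

/* Hypothetical sketch: sum of absolute AC coefficients of the 2-D
 * 4x4 Hadamard transform of one 8-bit pixel block. */
static int hadamard_ac_4x4_sketch(const uint8_t *pix, intptr_t stride)
{
    int t[16];
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            t[y * 4 + x] = pix[y * stride + x];
    for (int r = 0; r < 4; r++) hadamard4(t + r * 4, 1);  /* rows */
    for (int c = 0; c < 4; c++) hadamard4(t + c, 4);      /* columns */
    int sum = 0;
    for (int i = 1; i < 16; i++)  /* skip t[0], the DC coefficient */
        sum += abs(t[i]);
    return sum;
}
```

A flat block has no AC energy and yields 0, while a single nonzero pixel
spreads equally over all 16 coefficients and yields 15 for the AC part.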