Commits on Source (8)
-
David Chen authored
Place NEON dct-a macros that are intended to be used by SVE/SVE2 functions as well in a common file.
b6190c6f -
David Chen authored
Improve the performance of NEON functions of aarch64/dct-a.S by using the SVE/SVE2 instruction set. Below, the specific functions are listed together with the improved performance results.

Command executed: ./checkasm8 --bench=sub
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
sub4x4_dct_c: 528
sub4x4_dct_neon: 322
sub4x4_dct_sve: 247

Command executed: ./checkasm8 --bench=sub
Testbed: AWS Graviton3
Results:
sub4x4_dct_c: 562
sub4x4_dct_neon: 376
sub4x4_dct_sve: 255

Command executed: ./checkasm8 --bench=add
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
add4x4_idct_c: 698
add4x4_idct_neon: 386
add4x4_idct_sve2: 345

Command executed: ./checkasm8 --bench=zigzag
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
zigzag_interleave_8x8_cavlc_frame_c: 582
zigzag_interleave_8x8_cavlc_frame_neon: 273
zigzag_interleave_8x8_cavlc_frame_sve: 257

Command executed: ./checkasm8 --bench=zigzag
Testbed: AWS Graviton3
Results:
zigzag_interleave_8x8_cavlc_frame_c: 587
zigzag_interleave_8x8_cavlc_frame_neon: 257
zigzag_interleave_8x8_cavlc_frame_sve: 249
5c382660 -
David Chen authored
Place NEON deblock-a macros that are intended to be used by SVE/SVE2 functions as well in a common file.
37949a99 -
David Chen authored
Improve the performance of NEON functions of aarch64/deblock-a.S by using the SVE/SVE2 instruction set. Below, the specific functions are listed together with the improved performance results.

Command executed: ./checkasm8 --bench=deblock
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
deblock_chroma[1]_c: 735
deblock_chroma[1]_neon: 427
deblock_chroma[1]_sve: 353

Command executed: ./checkasm8 --bench=deblock
Testbed: AWS Graviton3
Results:
deblock_chroma[1]_c: 719
deblock_chroma[1]_neon: 442
deblock_chroma[1]_sve: 345
5ad5e5d8 -
David Chen authored
Place NEON mc-a macros and functions that are intended to be used by SVE/SVE2 functions as well in a common file.
21a788f1 -
David Chen authored
Improve the performance of NEON functions of aarch64/mc-a.S by using the SVE/SVE2 instruction set. Below, the specific functions are listed together with the improved performance results.

Command executed: ./checkasm8 --bench=avg
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
avg_4x2_c: 274
avg_4x2_neon: 215
avg_4x2_sve: 171
avg_4x4_c: 461
avg_4x4_neon: 343
avg_4x4_sve: 225
avg_4x8_c: 806
avg_4x8_neon: 619
avg_4x8_sve: 334
avg_4x16_c: 1523
avg_4x16_neon: 1168
avg_4x16_sve: 558

Command executed: ./checkasm8 --bench=avg
Testbed: AWS Graviton3
Results:
avg_4x2_c: 267
avg_4x2_neon: 213
avg_4x2_sve: 167
avg_4x4_c: 467
avg_4x4_neon: 350
avg_4x4_sve: 221
avg_4x8_c: 784
avg_4x8_neon: 624
avg_4x8_sve: 302
avg_4x16_c: 1445
avg_4x16_neon: 1182
avg_4x16_sve: 485
06dcf3f9 -
David Chen authored
Place NEON pixel-a macros and constants that are intended to be used by SVE/SVE2 functions as well in a common file.
0ac52d29 -
David Chen authored
Improve the performance of NEON functions of aarch64/pixel-a.S by using the SVE/SVE2 instruction set. Below, the specific functions are listed together with the improved performance results.

Command executed: ./checkasm8 --bench=ssd
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
ssd_4x4_c: 235
ssd_4x4_neon: 226
ssd_4x4_sve: 151
ssd_4x8_c: 409
ssd_4x8_neon: 363
ssd_4x8_sve: 201
ssd_4x16_c: 781
ssd_4x16_neon: 653
ssd_4x16_sve: 313
ssd_8x4_c: 402
ssd_8x4_neon: 192
ssd_8x4_sve: 192
ssd_8x8_c: 728
ssd_8x8_neon: 275
ssd_8x8_sve: 275

Command executed: ./checkasm10 --bench=ssd
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
ssd_4x4_c: 256
ssd_4x4_neon: 226
ssd_4x4_sve: 153
ssd_4x8_c: 460
ssd_4x8_neon: 369
ssd_4x8_sve: 215
ssd_4x16_c: 852
ssd_4x16_neon: 651
ssd_4x16_sve: 340

Command executed: ./checkasm8 --bench=ssd
Testbed: AWS Graviton3
Results:
ssd_4x4_c: 295
ssd_4x4_neon: 288
ssd_4x4_sve: 228
ssd_4x8_c: 454
ssd_4x8_neon: 431
ssd_4x8_sve: 294
ssd_4x16_c: 779
ssd_4x16_neon: 631
ssd_4x16_sve: 438
ssd_8x4_c: 463
ssd_8x4_neon: 247
ssd_8x4_sve: 246
ssd_8x8_c: 781
ssd_8x8_neon: 413
ssd_8x8_sve: 353

Command executed: ./checkasm10 --bench=ssd
Testbed: AWS Graviton3
Results:
ssd_4x4_c: 322
ssd_4x4_neon: 335
ssd_4x4_sve: 240
ssd_4x8_c: 522
ssd_4x8_neon: 448
ssd_4x8_sve: 294
ssd_4x16_c: 832
ssd_4x16_neon: 603
ssd_4x16_sve: 440

Command executed: ./checkasm8 --bench=sa8d
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
sa8d_8x8_c: 2103
sa8d_8x8_neon: 619
sa8d_8x8_sve: 617

Command executed: ./checkasm8 --bench=sa8d
Testbed: AWS Graviton3
Results:
sa8d_8x8_c: 2021
sa8d_8x8_neon: 597
sa8d_8x8_sve: 580

Command executed: ./checkasm8 --bench=var
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
var_8x8_c: 595
var_8x8_neon: 262
var_8x8_sve: 262
var_8x16_c: 1193
var_8x16_neon: 435
var_8x16_sve: 419

Command executed: ./checkasm8 --bench=var
Testbed: AWS Graviton3
Results:
var_8x8_c: 616
var_8x8_neon: 229
var_8x8_sve: 222
var_8x16_c: 1207
var_8x16_neon: 399
var_8x16_sve: 389

Command executed: ./checkasm8 --bench=hadamard_ac
Testbed: Alibaba g8y instance based on Yitian 710 CPU
Results:
hadamard_ac_8x8_c: 2330
hadamard_ac_8x8_neon: 635
hadamard_ac_8x8_sve: 635
hadamard_ac_8x16_c: 4500
hadamard_ac_8x16_neon: 1152
hadamard_ac_8x16_sve: 1151
hadamard_ac_16x8_c: 4499
hadamard_ac_16x8_neon: 1151
hadamard_ac_16x8_sve: 1150
hadamard_ac_16x16_c: 8812
hadamard_ac_16x16_neon: 2187
hadamard_ac_16x16_sve: 2186

Command executed: ./checkasm8 --bench=hadamard_ac
Testbed: AWS Graviton3
Results:
hadamard_ac_8x8_c: 2266
hadamard_ac_8x8_neon: 517
hadamard_ac_8x8_sve: 513
hadamard_ac_8x16_c: 4444
hadamard_ac_8x16_neon: 867
hadamard_ac_8x16_sve: 849
hadamard_ac_16x8_c: 4443
hadamard_ac_16x8_neon: 880
hadamard_ac_16x8_sve: 868
hadamard_ac_16x16_c: 8595
hadamard_ac_16x16_neon: 1656
hadamard_ac_16x16_sve: 1622
c1c9931d
Showing
- Makefile: 10 additions, 1 deletion
- common/aarch64/dct-a-common.S: 40 additions, 0 deletions
- common/aarch64/dct-a-sve.S: 88 additions, 0 deletions
- common/aarch64/dct-a-sve2.S: 89 additions, 0 deletions
- common/aarch64/dct-a.S: 1 addition, 11 deletions
- common/aarch64/dct.h: 9 additions, 0 deletions
- common/aarch64/deblock-a-common.S: 43 additions, 0 deletions
- common/aarch64/deblock-a-sve.S: 98 additions, 0 deletions
- common/aarch64/deblock-a.S: 1 addition, 14 deletions
- common/aarch64/deblock.h: 3 additions, 0 deletions
- common/aarch64/mc-a-common.S: 66 additions, 0 deletions
- common/aarch64/mc-a-sve.S: 108 additions, 0 deletions
- common/aarch64/mc-a.S: 1 addition, 23 deletions
- common/aarch64/mc-c.c: 72 additions, 57 deletions
- common/aarch64/pixel-a-common.S: 44 additions, 0 deletions
- common/aarch64/pixel-a-sve.S: 523 additions, 0 deletions
- common/aarch64/pixel-a.S: 1 addition, 15 deletions
- common/aarch64/pixel.h: 31 additions, 1 deletion
- common/dct.c: 18 additions, 0 deletions
- common/deblock.c: 6 additions, 0 deletions
common/aarch64/dct-a-common.S
0 → 100644
common/aarch64/dct-a-sve.S
0 → 100644
common/aarch64/dct-a-sve2.S
0 → 100644
common/aarch64/deblock-a-common.S
0 → 100644
common/aarch64/deblock-a-sve.S
0 → 100644
common/aarch64/mc-a-common.S
0 → 100644
common/aarch64/mc-a-sve.S
0 → 100644
common/aarch64/pixel-a-common.S
0 → 100644
common/aarch64/pixel-a-sve.S
0 → 100644