Konstantinos Margaritis authored
Provide implementations of the functions listed below using the SDOT/UDOT instructions from the Armv8 DotProd extension.

Functions implemented:
sad_16x8, sad_16x16,
sad_x3_16x8, sad_x3_16x16,
sad_x4_16x8, sad_x4_16x16,
ssd_8x4, ssd_8x8, ssd_8x16, ssd_16x8, ssd_16x16,
pixel_vsad
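
For readers unfamiliar with the extension, here is a minimal scalar sketch in C of how a dot-product instruction can express SAD and SSD. It is not the assembly from this commit. The function names (udot_lane, sad_16xh_model, ssd_16xh_model) are hypothetical and exist only for illustration. The sketch assumes the usual pattern: take per-byte absolute differences, then use UDOT against a vector of ones for SAD, or UDOT of the differences with themselves for SSD.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of one 32-bit lane of UDOT:
 * acc += a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3], all unsigned 8-bit inputs. */
uint32_t udot_lane(uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)a[i] * (uint32_t)b[i];
    return acc;
}

/* Hypothetical SAD model for a 16-pixel-wide block of height h.
 * Each row: absolute differences per byte, then UDOT against all-ones,
 * which sums four differences into each of the four 32-bit lanes. */
uint32_t sad_16xh_model(const uint8_t *p1, ptrdiff_t s1,
                        const uint8_t *p2, ptrdiff_t s2, int h)
{
    static const uint8_t ones[4] = { 1, 1, 1, 1 };
    uint32_t acc[4] = { 0, 0, 0, 0 };

    for (int y = 0; y < h; y++, p1 += s1, p2 += s2) {
        uint8_t d[16];
        for (int x = 0; x < 16; x++)
            d[x] = (uint8_t)abs((int)p1[x] - (int)p2[x]);
        for (int lane = 0; lane < 4; lane++)
            acc[lane] = udot_lane(acc[lane], &d[4 * lane], ones);
    }
    /* Final horizontal reduction across the four lanes. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}

/* Hypothetical SSD model: same structure, but the absolute differences
 * are multiplied by themselves, so each lane accumulates squared differences. */
uint32_t ssd_16xh_model(const uint8_t *p1, ptrdiff_t s1,
                        const uint8_t *p2, ptrdiff_t s2, int h)
{
    uint32_t acc[4] = { 0, 0, 0, 0 };

    for (int y = 0; y < h; y++, p1 += s1, p2 += s2) {
        uint8_t d[16];
        for (int x = 0; x < 16; x++)
            d[x] = (uint8_t)abs((int)p1[x] - (int)p2[x]);
        for (int lane = 0; lane < 4; lane++)
            acc[lane] = udot_lane(acc[lane], &d[4 * lane], &d[4 * lane]);
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

In this model the multiply and the widening accumulation happen in a single step per lane, which is the work a plain Neon version would typically spread over separate widening and add instructions.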

In the figures below, the speedup over Neon ranges from about 6% (sad_16x8) to about 82% (ssd_16x16).
The following is the output of ./checkasm8 --bench, run on a Graviton4 system:

sad_16x8_c: 1323
sad_16x8_neon: 224
sad_16x8_dotprod: 211
sad_16x16_c: 2619
sad_16x16_neon: 365
sad_16x16_dotprod: 320
sad_x3_16x8_c: 3836
sad_x3_16x8_neon: 403
sad_x3_16x8_dotprod: 317
sad_x3_16x16_c: 7725
sad_x3_16x16_neon: 714
sad_x3_16x16_dotprod: 532
sad_x4_16x8_c: 5080
sad_x4_16x8_neon: 438
sad_x4_16x8_dotprod: 375
sad_x4_16x16_c: 10260
sad_x4_16x16_neon: 794
sad_x4_16x16_dotprod: 655
ssd_8x4_c: 381
ssd_8x4_neon: 157
ssd_8x4_dotprod: 115
ssd_8x4_sve: 150
ssd_8x8_c: 695
ssd_8x8_neon: 238
ssd_8x8_dotprod: 161
ssd_8x8_sve: 228
ssd_8x16_c: 1335
ssd_8x16_neon: 388
ssd_8x16_dotprod: 267
ssd_16x8_c: 1342
ssd_16x8_neon: 285
ssd_16x8_dotprod: 166
ssd_16x16_c: 2623
ssd_16x16_neon: 503
ssd_16x16_dotprod: 277
vsad_c: 2786
vsad_neon: 311
vsad_dotprod: 235
fe9e4a7f