
Konstantinos Margaritis
authored
Provide implementations for functions using the instructions SDOT/UDOT in the DotProd Armv8 extension.

Functions implemented:
sad_16x8, sad_16x16, sad_x3_16x8_neon, sad_x3_16x16_neon, sad_x4_16x8_neon, sad_x4_16x16_neon, ssd_8x4, ssd_8x8, ssd_8x16, ssd_16x8, ssd_16x16, pixel_vsad.

Performance improvement against Neon ranges from 5% to 188%.

Following is the output of ./checkasm8 --bench (run on a Graviton4 system):

sad_16x8_c: 1323
sad_16x8_neon: 224
sad_16x8_dotprod: 211
sad_16x16_c: 2619
sad_16x16_neon: 365
sad_16x16_dotprod: 320
sad_x3_16x8_c: 3836
sad_x3_16x8_neon: 403
sad_x3_16x8_dotprod: 317
sad_x3_16x16_c: 7725
sad_x3_16x16_neon: 714
sad_x3_16x16_dotprod: 532
sad_x4_16x8_c: 5080
sad_x4_16x8_neon: 438
sad_x4_16x8_dotprod: 375
sad_x4_16x16_c: 10260
sad_x4_16x16_neon: 794
sad_x4_16x16_dotprod: 655
ssd_8x4_c: 381
ssd_8x4_neon: 157
ssd_8x4_dotprod: 115
ssd_8x4_sve: 150
ssd_8x8_c: 695
ssd_8x8_neon: 238
ssd_8x8_dotprod: 161
ssd_8x8_sve: 228
ssd_8x16_c: 1335
ssd_8x16_neon: 388
ssd_8x16_dotprod: 267
ssd_16x8_c: 1342
ssd_16x8_neon: 285
ssd_16x8_dotprod: 166
ssd_16x16_c: 2623
ssd_16x16_neon: 503
ssd_16x16_dotprod: 277
vsad_c: 2786
vsad_neon: 311
vsad_dotprod: 235
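
For context, here is one way UDOT can be applied to SAD and SSD. This is a minimal scalar sketch in portable C, not the assembly from this commit, and the commit message does not say that the kernels are structured this way. The names udot_model, sad_16xh_model and ssd_16xh_model are hypothetical. It assumes the usual pattern: take per-byte absolute differences, then use UDOT against a vector of ones (for SAD) or against the differences themselves (for SSD), so that widening and accumulation happen in a single instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of one UDOT step on 128-bit vectors: each of the four
 * 32-bit lanes accumulates the dot product of four unsigned 8-bit pairs. */
static void udot_model(uint32_t acc[4], const uint8_t a[16], const uint8_t b[16])
{
    for (int lane = 0; lane < 4; lane++)
        for (int k = 0; k < 4; k++)
            acc[lane] += (uint32_t)a[4 * lane + k] * b[4 * lane + k];
}

/* Per-byte absolute difference of one 16-pixel row (models UABD). */
static void absdiff_row(uint8_t out[16], const uint8_t *p1, const uint8_t *p2)
{
    for (int x = 0; x < 16; x++)
        out[x] = (uint8_t)abs(p1[x] - p2[x]);
}

/* Hypothetical SAD model for a 16-wide block: dot the absolute
 * differences with a vector of ones, then sum the four lanes. */
static uint32_t sad_16xh_model(const uint8_t *pix1, intptr_t stride1,
                               const uint8_t *pix2, intptr_t stride2, int height)
{
    assert(height > 0);
    uint8_t ones[16], d[16];
    uint32_t acc[4] = { 0, 0, 0, 0 };
    for (int i = 0; i < 16; i++)
        ones[i] = 1;
    for (int y = 0; y < height; y++) {
        absdiff_row(d, pix1, pix2);
        udot_model(acc, d, ones);
        pix1 += stride1;
        pix2 += stride2;
    }
    return acc[0] + acc[1] + acc[2] + acc[3]; /* models the final ADDV */
}

/* Hypothetical SSD model: dot the absolute differences with themselves. */
static uint32_t ssd_16xh_model(const uint8_t *pix1, intptr_t stride1,
                               const uint8_t *pix2, intptr_t stride2, int height)
{
    assert(height > 0);
    uint8_t d[16];
    uint32_t acc[4] = { 0, 0, 0, 0 };
    for (int y = 0; y < height; y++) {
        absdiff_row(d, pix1, pix2);
        udot_model(acc, d, d);
        pix1 += stride1;
        pix2 += stride2;
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

In this model the 8-bit differences go straight into 32-bit accumulators, so no intermediate 16-bit widening step is needed. That is one plausible source of the gains over plain Neon, though the actual speedups depend on the real assembly.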