Holger Lubitz authored
Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
16x16 SATD: +18% speed on K8 (64-bit), +22% on K10 (32-bit), +42% on Penryn (64-bit), +44% on Nehalem (64-bit), +50% on P4 (32-bit), +98% on Conroe (64-bit).
Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.
Overall performance boost is up to ~15% on 64-bit Conroe.
54e38917