    Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT · 54e38917
    Holger Lubitz authored
    Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
    16x16 SATD: +18% speed on K8 (64-bit), +22% on K10 (32-bit), +42% on Penryn (64-bit), +44% on Nehalem (64-bit), +50% on P4 (32-bit), +98% on Conroe (64-bit)
    Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.
    Overall performance boost is up to ~15% on 64-bit Conroe.
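
    For readers unfamiliar with the metric: SATD is the sum of absolute values of the Hadamard-transformed difference between two pixel blocks. The following is a minimal scalar C sketch for a single 4x4 block, included only to show what the optimized routines compute. It is not the code from this commit. The function name `satd_4x4_ref` and its parameters are hypothetical, and the sketch returns the raw, unnormalized sum (encoders often scale the result, which is not shown here).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference, NOT the routine changed by this commit.
 * Returns the raw (unnormalized) 4x4 SATD of blocks a and b. */
static int satd_4x4_ref(const uint8_t *a, int stride_a,
                        const uint8_t *b, int stride_b)
{
    int d[4][4], t[4][4];
    int sum = 0;

    /* Pixel-wise difference between the two blocks. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

    /* Horizontal pass: 4-point Hadamard butterfly on each row. */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* Vertical pass on each column, accumulating absolute values. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }
    return sum;
}
```

    Under this sketch's assumptions, a 16x16 SATD would be the sum of this function over the sixteen 4x4 sub-blocks. The transform consists only of additions and subtractions, which is why such functions lend themselves to SIMD implementations.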