"I have completed additional SAD implementations (8x16, 16x8 and 16x16)
 using SPARC VIS.  Overall speedup is roughly 90% over straight C.  I'm
 doing development and testing on a Sun Fire V220, with 2 x 1.5 GHz
     UltraSPARC-III CPUs.
 I've hand-unrolled each of the loops.  Sun's assembler does not appear
 to have built-in macro functionality, and I didn't want to establish an
 external dependency on m4.  Please let me know if you run into any
     trouble with the patch."
     Patch by Phil Jensen.
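For reference, a "straight C" SAD (sum of absolute differences) of the kind the VIS code is measured against looks roughly like the sketch below. It is not the code from this patch. The names sad_block, sad_16x16, sad_16x8 and sad_8x16 are hypothetical, and reading "8x16" as 8 pixels wide by 16 rows high is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical plain-C SAD: sums |a[x] - b[x]| over a w x h block of
 * 8-bit pixels.  stride is the distance in bytes between rows. */
static int sad_block(const uint8_t *a, const uint8_t *b, int stride,
                     int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            sum += abs(a[x] - b[x]);
        a += stride;   /* advance both blocks to the next row */
        b += stride;
    }
    return sum;
}

/* The three block shapes named above (width x height is assumed). */
int sad_16x16(const uint8_t *a, const uint8_t *b, int stride)
{
    return sad_block(a, b, stride, 16, 16);
}

int sad_16x8(const uint8_t *a, const uint8_t *b, int stride)
{
    return sad_block(a, b, stride, 16, 8);
}

int sad_8x16(const uint8_t *a, const uint8_t *b, int stride)
{
    return sad_block(a, b, stride, 8, 16);
}
```

The generic loop is what hand-unrolling removes: with fixed block sizes, each row becomes a straight sequence of operations with no inner-loop branch.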
