Detect Atom CPU, enable appropriate asm functions
I'm not going to actually optimize for this pile of garbage unless someone pays me. But it can't hurt to at least enable the correct functions based on benchmarks. Also save some cache on Intel CPUs that don't need the decimate LUT due to having fast bsr/bsf.
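For illustration only, a minimal sketch of how this kind of detection can work: decode the family and model from the CPUID leaf 1 EAX signature and set a tuning flag when they match the first-generation Atom (family 6, model 0x1C). The flag name and helper functions below are hypothetical and are not taken from this commit; reading the actual CPUID value is left to the caller.

```c
#include <stdint.h>

/* Hypothetical flag for this sketch; the commit's real flag names may differ. */
enum { CPU_FLAG_SLOW_ATOM = 1 << 0 };

/* Display family from the CPUID leaf 1 EAX signature:
 * base family in bits 11:8, extended family (bits 27:20) added when base is 0xF. */
static unsigned cpu_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xf;
    if (family == 0xf)
        family += (eax >> 20) & 0xff;
    return family;
}

/* Display model: base model in bits 7:4, with the extended model (bits 19:16)
 * as the high nibble when the base family is 6 or 0xF. */
static unsigned cpu_model(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xf;
    unsigned model  = (eax >> 4) & 0xf;
    if (family == 0x6 || family == 0xf)
        model |= ((eax >> 16) & 0xf) << 4;
    return model;
}

/* First-generation Atom reports family 6, model 0x1C. */
static int is_first_gen_atom(uint32_t eax)
{
    return cpu_family(eax) == 6 && cpu_model(eax) == 0x1c;
}

/* Map a signature to tuning flags; function selection would then key off
 * these flags rather than off the raw CPUID value. */
static unsigned tuning_flags(uint32_t eax)
{
    unsigned flags = 0;
    if (is_first_gen_atom(eax))
        flags |= CPU_FLAG_SLOW_ATOM;
    return flags;
}
```

The point of going through a flag is that the per-function choices (which asm version to enable on Atom) stay driven by benchmarks and live in one place, separate from the detection itself.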
Showing with 90 additions and 29 deletions