Fiona Glaser authored
Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
Use pinsrd (SSE4) for faster width4 SSD.
Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe).
Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
These optimizations help on Nehalem, Phenom, and Penryn CPUs.
69e69197
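
The satd 4x8 trick in the message above can be illustrated with a small scalar sketch. This is not the assembly from the commit: the function names (`satd_4x4`, `satd_8x4`, `satd_4x8_ref`, `satd_4x8_via_8x4`) are hypothetical, the SATD is left unnormalized, and the rearrangement is done with an explicit copy, whereas an optimized version would presumably do it while loading registers. It assumes a 4x8 SATD is the sum of two vertically stacked 4x4 SATDs and an 8x4 SATD is the sum of two side-by-side 4x4 SATDs, so moving the lower 4x4 block next to the upper one lets the 8x4 routine produce the 4x8 result.

```c
#include <stdint.h>
#include <stdlib.h>

/* Unnormalized 4x4 SATD: sum of absolute values of the 2D Hadamard
 * transform of the difference block (illustrative, not the commit's code). */
static int satd_4x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int d[4][4], t[4][4], sum = 0;

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * sa + x] - b[y * sb + x];

    /* Horizontal 4-point Hadamard on each row. */
    for (int y = 0; y < 4; y++) {
        int s0 = d[y][0] + d[y][1], s1 = d[y][0] - d[y][1];
        int s2 = d[y][2] + d[y][3], s3 = d[y][2] - d[y][3];
        t[y][0] = s0 + s2; t[y][1] = s1 + s3;
        t[y][2] = s0 - s2; t[y][3] = s1 - s3;
    }

    /* Vertical 4-point Hadamard on each column, accumulating magnitudes. */
    for (int x = 0; x < 4; x++) {
        int s0 = t[0][x] + t[1][x], s1 = t[0][x] - t[1][x];
        int s2 = t[2][x] + t[3][x], s3 = t[2][x] - t[3][x];
        sum += abs(s0 + s2) + abs(s1 + s3) + abs(s0 - s2) + abs(s1 - s3);
    }
    return sum;
}

/* 8 wide, 4 tall: two 4x4 blocks side by side. */
static int satd_8x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    return satd_4x4(a, sa, b, sb) + satd_4x4(a + 4, sa, b + 4, sb);
}

/* 4 wide, 8 tall, computed directly: two 4x4 blocks stacked vertically. */
static int satd_4x8_ref(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    return satd_4x4(a, sa, b, sb) + satd_4x4(a + 4 * sa, sa, b + 4 * sb, sb);
}

/* 4x8 via 8x4: place the lower 4x4 block to the right of the upper one,
 * then reuse the 8x4 routine on the rearranged data. */
static int satd_4x8_via_8x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    uint8_t ta[4 * 8], tb[4 * 8];

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            ta[y * 8 + x]     = a[y * sa + x];
            ta[y * 8 + 4 + x] = a[(y + 4) * sa + x];
            tb[y * 8 + x]     = b[y * sb + x];
            tb[y * 8 + 4 + x] = b[(y + 4) * sb + x];
        }
    return satd_8x4(ta, 8, tb, 8);
}
```

Both 4x8 variants should return the same value for any input, since each 4x4 sub-block is transformed independently and only its position changes.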