    Faster width4 SSD+SATD, SSE4 optimizations · 69e69197
    Fiona Glaser authored
    Do satd 4x8 by placing its two 4x4 blocks side by side and running satd 8x4.
    Use pinsrd (SSE4.1) for faster width4 SSD.
    Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe).
    Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
    These optimizations help on Nehalem, Phenom, and Penryn CPUs.
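The satd 4x8 change assumes that SATD at these sizes is a sum of independent 4x4 Hadamard SATDs. Under that assumption, the two 4x4 halves of a 4x8 block can be laid out side by side and passed to the 8x4 routine, which can then handle both halves in one SIMD pass. The portable C sketch below checks that equivalence. It also emulates the width4 SSD packing, using `memcpy` into a 16-byte buffer as a stand-in for filling a 128-bit register one dword at a time with SSE4.1 `pinsrd`. It is not x264's assembly. The names `satd_4x4`, `satd_wxh`, `satd_4x8_via_8x4` and `ssd_4xh_packed` are hypothetical, and the SATD is left unnormalized.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Unnormalized 4x4 SATD: sum of |coefficients| of the 2-D 4-point
 * Hadamard transform of the difference block a - b. */
static int satd_4x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int t[4][4], sum = 0;
    for (int y = 0; y < 4; y++) {           /* horizontal transform per row */
        int d0 = a[y*sa+0] - b[y*sb+0], d1 = a[y*sa+1] - b[y*sb+1];
        int d2 = a[y*sa+2] - b[y*sb+2], d3 = a[y*sa+3] - b[y*sb+3];
        int s01 = d0 + d1, m01 = d0 - d1, s23 = d2 + d3, m23 = d2 - d3;
        t[y][0] = s01 + s23; t[y][1] = s01 - s23;
        t[y][2] = m01 + m23; t[y][3] = m01 - m23;
    }
    for (int x = 0; x < 4; x++) {           /* vertical transform + abs sum */
        int s01 = t[0][x] + t[1][x], m01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], m23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23) + abs(m01 + m23) + abs(m01 - m23);
    }
    return sum;
}

/* WxH SATD (W and H multiples of 4) as a sum of independent 4x4 SATDs. */
static int satd_wxh(int w, int h, const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int sum = 0;
    for (int y = 0; y < h; y += 4)
        for (int x = 0; x < w; x += 4)
            sum += satd_4x4(a + y*sa + x, sa, b + y*sb + x, sb);
    return sum;
}

/* 4x8 SATD via the 8x4 kernel: move the lower 4x4 block to the right of
 * the upper one, then run SATD 8x4 on the rearranged 8x4 buffers. */
static int satd_4x8_via_8x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    uint8_t ra[4*8], rb[4*8];                   /* 8 wide, 4 tall, stride 8 */
    for (int y = 0; y < 4; y++) {
        memcpy(ra + y*8,     a + y*sa,     4);  /* upper block -> left half  */
        memcpy(ra + y*8 + 4, a + (y+4)*sa, 4);  /* lower block -> right half */
        memcpy(rb + y*8,     b + y*sb,     4);
        memcpy(rb + y*8 + 4, b + (y+4)*sb, 4);
    }
    return satd_wxh(8, 4, ra, 8, rb, 8);
}

/* 4xH SSD (H a multiple of 4): gather four 4-byte rows into one 16-byte
 * buffer (emulating four pinsrd inserts into a 128-bit register), then
 * square and sum the 16 differences. */
static int ssd_4xh_packed(int h, const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int sum = 0;
    for (int y0 = 0; y0 < h; y0 += 4) {
        uint8_t pa[16], pb[16];
        for (int i = 0; i < 4; i++) {
            memcpy(pa + 4*i, a + (y0+i)*sa, 4); /* ~ one pinsrd per row */
            memcpy(pb + 4*i, b + (y0+i)*sb, 4);
        }
        for (int i = 0; i < 16; i++) {
            int d = pa[i] - pb[i];
            sum += d * d;
        }
    }
    return sum;
}
```

Each 4x4 sub-block is transformed independently, so `satd_wxh(4, 8, ...)` and `satd_4x8_via_8x4(...)` return the same value for any input. In a SIMD version, the speedup would come from filling both halves of an 8-byte-wide register row instead of leaving one half idle.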