    SSSE3/SSE4 9-way fully merged i4x4 analysis (sad/satd_x9) · 3d82e875
    Loren Merritt authored
    i4x4 analysis cycles (per partition):
    penryn   sandybridge
    184-> 75  157-> 54  preset=superfast (sad)
    281->165  225->124  preset=faster    (satd with early termination)
    332->165  263->124  preset=medium
    379->165  297->124  preset=slower    (satd without early termination)
    
    This is the first code in x264 that intentionally produces different behavior
    on different CPUs: satd_x9 is implemented only on SSSE3+ and checks all nine
    intra directions, whereas the old code (on fast presets) may terminate early
    after checking only some of them. There is no systematic difference on slow
    presets, though the two paths still occasionally disagree about tiebreaks.
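
    To illustrate why an exhaustive search and an early-terminating search can
    pick different modes, here is a minimal sketch. It is not x264's code: the
    function names, the cost array, and the threshold rule are all hypothetical,
    and x264's real early-termination heuristic may look quite different.

```c
#define N_I4X4_MODES 9  /* number of i4x4 intra prediction directions */

/* Exhaustive search (what a 9-way merged function effectively does):
 * inspect every candidate cost and keep the first minimum. */
static int best_mode_exhaustive(const int cost[N_I4X4_MODES])
{
    int best = 0;
    for (int m = 1; m < N_I4X4_MODES; m++)
        if (cost[m] < cost[best])
            best = m;
    return best;
}

/* Early-terminating search (hypothetical rule): stop as soon as the best
 * cost seen so far drops below `threshold`, leaving later modes unchecked. */
static int best_mode_early_term(const int cost[N_I4X4_MODES], int threshold)
{
    int best = 0;
    for (int m = 0; m < N_I4X4_MODES; m++) {
        if (cost[m] < cost[best])
            best = m;
        if (cost[best] < threshold)
            break;
    }
    return best;
}
```

    With costs where a late mode is cheapest but an earlier mode is already
    below the threshold, the two functions return different modes, which is
    the kind of CPU-dependent divergence described above.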
    
    For ease of debugging, add an option "--cpu-independent" to disable satd_x9
    and any analogous future code.
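
    One way such an option could gate a CPU-specific code path is sketched
    below. Every name here (hyp_config, HYP_CPU_SSSE3, hyp_select_satd_x9) is
    made up for illustration and is not taken from the x264 source; the sketch
    only assumes that a null function pointer means "use the generic path".

```c
#include <stddef.h>

#define HYP_CPU_SSSE3 (1u << 0)  /* hypothetical CPU capability bit */

typedef int (*hyp_satd_x9_fn)(const int cost[9]);

typedef struct {
    unsigned cpu_flags;    /* detected CPU capabilities */
    int cpu_independent;   /* nonzero when --cpu-independent was given */
} hyp_config;

/* Stand-in for an SSSE3-only 9-way function: returns the cheapest mode. */
static int hyp_satd_x9(const int cost[9])
{
    int best = 0;
    for (int m = 1; m < 9; m++)
        if (cost[m] < cost[best])
            best = m;
    return best;
}

/* Return the 9-way function only if the CPU supports it and the user has
 * not asked for CPU-independent output; NULL selects the generic path. */
static hyp_satd_x9_fn hyp_select_satd_x9(const hyp_config *cfg)
{
    if (cfg->cpu_independent)
        return NULL;
    if (cfg->cpu_flags & HYP_CPU_SSSE3)
        return hyp_satd_x9;
    return NULL;
}
```

    Keeping the decision in a single selection function means any analogous
    future code path can be disabled by the same flag check.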