    Phenom CPU optimizations · 80ea99c0
    Fiona Glaser authored
    Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
    Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values are not worth it).
    Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
    Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
    Merge cpu-32.asm and cpu-64.asm
    Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
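    For readers unfamiliar with the trade-off named above, here is a minimal portable C sketch of what "emulated PALIGNR" and "unaligned load" each compute. It is not the x264 assembly: the helper names (emulated_palignr, unaligned_load, loads_agree) are hypothetical, and plain byte arrays stand in for SSE registers. On CPUs without SSSE3, PALIGNR is typically emulated with two byte shifts and an OR over two aligned loads; a single unaligned load yields the same 16 bytes directly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of emulated PALIGNR: given two adjacent aligned
 * 16-byte blocks (lo, then hi), produce the 16 bytes that start `shift`
 * bytes into lo. Real SSE2 code would shift each register and OR them. */
static void emulated_palignr(uint8_t dst[16], const uint8_t lo[16],
                             const uint8_t hi[16], int shift)
{
    for (int i = 0; i < 16; i++) {
        int j = i + shift;
        dst[i] = (j < 16) ? lo[j] : hi[j - 16];
    }
}

/* Hypothetical model of an unaligned load: read 16 bytes from any address. */
static void unaligned_load(uint8_t dst[16], const uint8_t *src)
{
    memcpy(dst, src, 16);
}

/* Returns 1 if both approaches give identical bytes for every offset 0..15. */
static int loads_agree(void)
{
    uint8_t buf[32], a[16], b[16];
    for (int i = 0; i < 32; i++)
        buf[i] = (uint8_t)(i * 7 + 3);   /* arbitrary test pattern */
    for (int shift = 0; shift < 16; shift++) {
        emulated_palignr(a, buf, buf + 16, shift);
        unaligned_load(b, buf + shift);
        if (memcmp(a, b, 16) != 0)
            return 0;
    }
    return 1;
}
```

    Both paths return the same data; the difference is purely in cost. The emulated path needs two loads plus extra shift and OR work per result, which is why a CPU with cheap unaligned loads can come out ahead with the simpler form.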
cpu.c 9.16 KB