    AArch64 Neon: Use CMLT instead of SSHR to compute sign · 4e412738
    Jonathan Wright authored and Martin Storsjö committed
    The CMLT instruction has twice the throughput of SSHR on all modern
    out-of-order Arm cores. The Software Optimization Guides (SWOG) for
    the Cortex-A76, Cortex-A77 and Neoverse-N1 cores are being updated to
    reflect this. (The current versions of the SWOGs for these cores state
    that CMLT and SSHR have the same execution throughput.)
    This patch changes all instances of sign computation to use CMLT
    instead of SSHR.
    Change-Id: Ice5747fee4e3bdd98ae8fbc036d735f55e492249
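    The substitution is safe because both instructions produce the same
    per-lane sign mask: all ones for a negative lane, zero otherwise. The
    sketch below is a scalar model in portable C, not code from the patch.
    It assumes 16-bit lanes (so the shift amount is 15), and the function
    names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of "SSHR Vd, Vn, #15" (assumed lane width).
 * An arithmetic shift by 15 replicates the sign bit across the lane. It is
 * written as "extract the sign bit, then negate" so that the C code does not
 * rely on implementation-defined right shifts of negative values. */
static inline int16_t sign_mask_sshr(int16_t x) {
    int sign_bit = (uint16_t)x >> 15;   /* 0 or 1 */
    return (int16_t)-sign_bit;          /* 0 or -1 (0xFFFF) */
}

/* Scalar model of one 16-bit lane of "CMLT Vd, Vn, #0":
 * all ones if the lane is less than zero, otherwise zero. */
static inline int16_t sign_mask_cmlt(int16_t x) {
    return (int16_t)(x < 0 ? -1 : 0);
}

/* Exhaustively compares the two models over every 16-bit value.
 * Returns 1 if they agree everywhere, 0 otherwise. */
static inline int sign_masks_agree(void) {
    for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
        if (sign_mask_sshr((int16_t)v) != sign_mask_cmlt((int16_t)v)) {
            return 0;
        }
    }
    return 1;
}

/* Convenience self-check that aborts if the models ever disagree. */
static inline void check_sign_masks(void) {
    assert(sign_masks_agree());
}
```

    Since the results match bit for bit in this model, swapping one
    instruction for the other changes only which instruction computes
    the mask, not the value computed.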