    arm64: cdef: Use a smarter padding constant · 8f8dc928
    Martin Storsjö authored
    Pad with a value which works both as a large unsigned value and a
    negative signed value. This allows doing the max operation using
    signed max, avoiding the conditional altogether.
    
    Based on the same idea for x86 by Kyle Siefring.
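
The padding trick described above can be sketched in C as follows. This is an illustration only, not the NEON code from this commit. The constant `INT16_MIN` (bit pattern 0x8000), the helper names `smax16`, `umin16` and `min_max_padded`, and the use of an unsigned min for the lower bound are all assumptions made for the example. The commit message itself only states that the max is done as a signed max.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed padding value: bit pattern 0x8000.
 * Read as signed it is -32768, below any pixel value.
 * Read as unsigned it is 32768, above any 8..12 bit pixel value. */
#define PAD INT16_MIN

/* Signed max: a padded entry is the smallest possible value, so it never wins. */
static int16_t smax16(int16_t a, int16_t b) {
    return a > b ? a : b;
}

/* Unsigned min: the same bit pattern is now a large value, so it never wins. */
static int16_t umin16(int16_t a, int16_t b) {
    return (uint16_t)a < (uint16_t)b ? a : b;
}

/* Hypothetical helper: min and max over a set of taps that may contain
 * padding, starting from the center pixel, with no per-tap "is this
 * padding?" branch. */
static void min_max_padded(const int16_t *px, int n, int16_t center,
                           int16_t *min, int16_t *max) {
    assert(center != PAD); /* the center pixel is assumed to be a real pixel */
    int16_t lo = center, hi = center;
    for (int i = 0; i < n; i++) {
        lo = umin16(lo, px[i]);
        hi = smax16(hi, px[i]);
    }
    *min = lo;
    *max = hi;
}
```

With a large positive padding value instead, the max would need a compare and select per tap to skip padded entries. Here both reductions map to a single instruction each in SIMD code.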
    
    Before:                  Cortex A53     A72     A73
    cdef_filter_4x4_8bpc_neon:    645.5   401.9   422.5
    cdef_filter_4x8_8bpc_neon:   1193.7   756.6   782.4
    cdef_filter_8x8_8bpc_neon:   2162.4  1361.9  1375.6
    After:
    cdef_filter_4x4_8bpc_neon:    596.3   377.8   384.8
    cdef_filter_4x8_8bpc_neon:   1097.4   705.5   707.1
    cdef_filter_8x8_8bpc_neon:   1967.4  1232.3  1239.9