    arm64: cdef: Use a smarter padding constant · 8f8dc928
    Martin Storsjö authored
    Pad with a value which works both as a large unsigned value and a
    negative signed value. This allows doing the max operation using
    signed max, avoiding the conditional altogether.
    
    Based on the same idea for x86 by Kyle Siefring.
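
The padding idea described above can be sketched in scalar C. This is only an illustration, not the NEON code from the commit. The commit message does not state the constant, so `PAD_VALUE = 0x8000` is an assumption chosen to fit the description, and the helper names `tap_min` and `tap_max` are hypothetical. The sketch assumes real pixel values stay below 0x8000, so they read the same whether interpreted as signed or unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed padding marker (not taken from the commit): bit pattern 0x8000.
 * As uint16_t it is 32768, larger than any real pixel value.
 * As int16_t it is -32768, smaller than any real pixel value. */
#define PAD_VALUE ((uint16_t)0x8000)

/* Hypothetical helper: minimum of the centre pixel and its taps.
 * Unsigned comparison, so padding entries (32768) never win. */
static uint16_t tap_min(const uint16_t *taps, size_t n, uint16_t px)
{
    assert(px < PAD_VALUE);
    uint16_t m = px;
    for (size_t i = 0; i < n; i++)
        if (taps[i] < m)
            m = taps[i];
    return m;
}

/* Hypothetical helper: maximum of the centre pixel and its taps.
 * The same bits are compared as signed values, so padding entries
 * (-32768) never win and no "is this padding?" branch is needed.
 * The uint16_t to int16_t cast wraps on two's complement targets
 * (implementation-defined before C23). */
static uint16_t tap_max(const uint16_t *taps, size_t n, uint16_t px)
{
    assert(px < PAD_VALUE);
    int16_t m = (int16_t)px;
    for (size_t i = 0; i < n; i++) {
        int16_t v = (int16_t)taps[i];
        if (v > m)
            m = v;
    }
    return (uint16_t)m;
}
```

With a marker that is only large (for example a maximum positive value), the maximum would have to test each tap against the marker before comparing. Here the choice of signed versus unsigned comparison does that filtering on its own.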
    
    Before:                  Cortex A53     A72     A73
    cdef_filter_4x4_8bpc_neon:    645.5   401.9   422.5
    cdef_filter_4x8_8bpc_neon:   1193.7   756.6   782.4
    cdef_filter_8x8_8bpc_neon:   2162.4  1361.9  1375.6
    After:
    cdef_filter_4x4_8bpc_neon:    596.3   377.8   384.8
    cdef_filter_4x8_8bpc_neon:   1097.4   705.5   707.1
    cdef_filter_8x8_8bpc_neon:   1967.4  1232.3  1239.9
Name
doc
include
snap
src
tests
tools
.gitignore
.gitlab-ci.yml
CONTRIBUTING.md
COPYING
NEWS
README.md
THANKS.md
meson.build
meson_options.txt