    Make per-width versions of cfl_ac · 70fb01d8
    Ronald S. Bultje authored
    Also use aligned reads and writes in sub_loop, and integrate sum_loop into
    the main loop.
    
    before:
    cfl_ac_420_w4_8bpc_c: 367.4
    cfl_ac_420_w4_8bpc_avx2: 72.8
    cfl_ac_420_w8_8bpc_c: 621.6
    cfl_ac_420_w8_8bpc_avx2: 85.1
    cfl_ac_420_w16_8bpc_c: 983.4
    cfl_ac_420_w16_8bpc_avx2: 141.0
    
    after:
    cfl_ac_420_w4_8bpc_c: 376.2
    cfl_ac_420_w4_8bpc_avx2: 28.5
    cfl_ac_420_w8_8bpc_c: 607.2
    cfl_ac_420_w8_8bpc_avx2: 29.9
    cfl_ac_420_w16_8bpc_c: 962.1
    cfl_ac_420_w16_8bpc_avx2: 48.8
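
The two ideas in the message above (one entry point per block width, and accumulating the sum inside the main subsampling loop instead of in a separate pass) can be sketched in plain C. This is only an illustration under stated assumptions, not the code from this commit: the `_sketch` function names, the argument layout, and the use of a shared inline helper are all hypothetical, and the actual change is in the AVX2 assembly. It assumes 8-bit 4:2:0 input, where each output sample is the sum of a 2x2 luma block shifted left by one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared helper. `w` and `h` are the chroma block dimensions;
 * `luma` points to a (2*w) x (2*h) luma block with the given stride. */
static inline void cfl_ac_420_sketch(int16_t *ac, const uint8_t *luma,
                                     ptrdiff_t stride, int w, int h)
{
    int sum = 0;
    int16_t *dst = ac;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* 2x2 luma sum, scaled by 2 (assumed 4:2:0 subsampling). */
            const int v = (luma[2 * x] + luma[2 * x + 1] +
                           luma[2 * x + stride] +
                           luma[2 * x + 1 + stride]) << 1;
            dst[x] = (int16_t) v;
            sum += v; /* accumulated here rather than in a second loop */
        }
        dst += w;
        luma += 2 * stride;
    }

    /* Subtract the rounded average so only the AC part remains. */
    const int n = w * h;
    const int avg = (sum + n / 2) / n;
    for (int i = 0; i < n; i++)
        ac[i] = (int16_t) (ac[i] - avg);
}

/* Per-width entry points: with `w` a compile-time constant, the compiler
 * can unroll and specialize the inner loop for each width. */
void cfl_ac_420_w4_sketch(int16_t *ac, const uint8_t *luma,
                          ptrdiff_t stride, int h)
{
    cfl_ac_420_sketch(ac, luma, stride, 4, h);
}

void cfl_ac_420_w8_sketch(int16_t *ac, const uint8_t *luma,
                          ptrdiff_t stride, int h)
{
    cfl_ac_420_sketch(ac, luma, stride, 8, h);
}

void cfl_ac_420_w16_sketch(int16_t *ac, const uint8_t *luma,
                           ptrdiff_t stride, int h)
{
    cfl_ac_420_sketch(ac, luma, stride, 16, h);
}
```

A caller would pick the entry point matching the block width, for example `cfl_ac_420_w4_sketch(ac, luma, stride, 2)` for a 4x2 chroma block backed by an 8x4 luma block.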
Name
doc
include
src
tests
tools
.gitignore
.gitlab-ci.yml
CONTRIBUTING.md
COPYING
NEWS
README.md
THANKS.md
meson.build
meson_options.txt