    Make per-width versions of cfl_ac · 70fb01d8
    Ronald S. Bultje authored
    Also use aligned reads and writes in sub_loop, and integrate sum_loop into
    the main loop.
    
    before:
    cfl_ac_420_w4_8bpc_c: 367.4
    cfl_ac_420_w4_8bpc_avx2: 72.8
    cfl_ac_420_w8_8bpc_c: 621.6
    cfl_ac_420_w8_8bpc_avx2: 85.1
    cfl_ac_420_w16_8bpc_c: 983.4
    cfl_ac_420_w16_8bpc_avx2: 141.0
    
    after:
    cfl_ac_420_w4_8bpc_c: 376.2
    cfl_ac_420_w4_8bpc_avx2: 28.5
    cfl_ac_420_w8_8bpc_c: 607.2
    cfl_ac_420_w8_8bpc_avx2: 29.9
    cfl_ac_420_w16_8bpc_c: 962.1
    cfl_ac_420_w16_8bpc_avx2: 48.8
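
The two ideas in the commit message (accumulating the sum inside the main loop, and generating one function per block width) can be illustrated with a minimal scalar C sketch. This is not the code from this commit, which changes the AVX2 assembly; all names ending in `_sketch` and the `CFL_AC_420_FN` macro are hypothetical, and padding handling (`w_pad`/`h_pad`) is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch, not the actual dav1d implementation.
 * Computes the CfL AC values for a w x h chroma block from 4:2:0 luma:
 * each output is the sum of a 2x2 luma group, scaled by 2, minus the
 * rounded block average. Assumes w * h is a power of two. */
static inline void cfl_ac_420_sketch(int16_t *ac, const uint8_t *ypx,
                                     ptrdiff_t stride, int w, int h)
{
    int16_t *row = ac;
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            const int v = (ypx[2 * x] + ypx[2 * x + 1] +
                           ypx[2 * x + stride] +
                           ypx[2 * x + 1 + stride]) << 1;
            row[x] = (int16_t)v;
            sum += v; /* sum is accumulated here, not in a separate pass */
        }
        row += w;
        ypx += 2 * stride;
    }

    /* Subtract the rounded average so the block has (near) zero mean. */
    const int n = w * h;
    const int dc = (sum + (n >> 1)) / n;
    for (int i = 0; i < n; i++)
        ac[i] = (int16_t)(ac[i] - dc);
}

/* One entry point per width: with w a compile-time constant, the
 * compiler can fully unroll and specialize the inner loop. */
#define CFL_AC_420_FN(w) \
static void cfl_ac_420_w##w##_sketch(int16_t *ac, const uint8_t *ypx, \
                                     ptrdiff_t stride, int h) \
{ \
    cfl_ac_420_sketch(ac, ypx, stride, w, h); \
}

CFL_AC_420_FN(4)
CFL_AC_420_FN(8)
CFL_AC_420_FN(16)
```

Folding the accumulation into the first loop saves one full pass over the `ac` buffer, and fixing the width per function removes a run-time loop bound from the hot path. Whether these account for the measured difference above is not something the sketch can show.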
ipred.c 8.8 KB