Ronald S. Bultje authored
Also use aligned reads and writes in sub_loop, and integrate sum_loop into the main loop.

before:
cfl_ac_420_w4_8bpc_c: 367.4
cfl_ac_420_w4_8bpc_avx2: 72.8
cfl_ac_420_w8_8bpc_c: 621.6
cfl_ac_420_w8_8bpc_avx2: 85.1
cfl_ac_420_w16_8bpc_c: 983.4
cfl_ac_420_w16_8bpc_avx2: 141.0

after:
cfl_ac_420_w4_8bpc_c: 376.2
cfl_ac_420_w4_8bpc_avx2: 28.5
cfl_ac_420_w8_8bpc_c: 607.2
cfl_ac_420_w8_8bpc_avx2: 29.9
cfl_ac_420_w16_8bpc_c: 962.1
cfl_ac_420_w16_8bpc_avx2: 48.8
70fb01d8
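
For readers unfamiliar with the change, the idea of folding the sum pass into the main loop can be sketched in scalar C. This is only an illustration under stated assumptions, not the AVX2 code from this commit: the function name cfl_ac_420_sketch is hypothetical, padding handling is omitted, and the rounding of the mean is assumed to be round-half-up. It accumulates the running sum while the subsampled values are being written, so the only remaining second pass is the mean subtraction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch (not the code from this commit): 4:2:0 CfL AC
 * for 8-bit luma. w and h are the chroma block dimensions, assumed to be
 * powers of two. Padding (w_pad/h_pad) is deliberately left out. */
static void cfl_ac_420_sketch(int16_t *ac, const uint8_t *luma,
                              ptrdiff_t stride, int w, int h)
{
    assert(w > 0 && h > 0 && (w & (w - 1)) == 0 && (h & (h - 1)) == 0);

    int sum = 0;
    int16_t *dst = ac;

    /* Main loop: subsample 2x2 luma samples and accumulate the sum in the
     * same pass, instead of re-reading ac[] in a separate sum loop. */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            const uint8_t *p = luma + 2 * x;
            const int v = (p[0] + p[1] + p[stride] + p[stride + 1]) << 1;
            dst[x] = (int16_t)v;
            sum += v;
        }
        dst += w;
        luma += 2 * stride;
    }

    /* Sub loop: subtract the rounded mean so the output has (near) zero DC. */
    const int n = w * h;
    const int avg = (sum + n / 2) / n;
    for (int i = 0; i < n; i++)
        ac[i] = (int16_t)(ac[i] - avg);
}
```

In a SIMD version the same restructuring means the values are still in registers when they are summed, which saves a full pass over the buffer; the aligned reads and writes mentioned above would apply to the final subtraction pass over ac.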