Move deblocking/hpel into sliced threads
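Instead of running deblocking and half-pel (hpel) interpolation as a separate pass after encoding, do both during the main encode, inside each slice thread. This requires disabling deblocking across slice boundaries (disable_deblocking_filter_idc == 2), so that no slice's filtering depends on another slice's pixels. Overall performance gain is about 11% on --preset superfast with sliced threads. This does not reduce the amount of computation done; it only parallelizes it better.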
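To illustrate why mode 2 makes per-slice deblocking possible, here is a minimal sketch of the edge decision implied by the three values of disable_deblocking_filter_idc in H.264. The helper name `filter_mb_edge`, the enum names, and the convention of passing -1 for a missing neighbor are assumptions made for illustration; this is not code from the patch or from x264.

```c
#include <stdbool.h>

/* Values of disable_deblocking_filter_idc as defined by H.264. */
enum {
    DEBLOCK_ALL          = 0, /* filter every edge, including slice boundaries */
    DEBLOCK_NONE         = 1, /* no deblocking at all for this slice */
    DEBLOCK_WITHIN_SLICE = 2  /* filter edges, but not across slice boundaries */
};

/* Hypothetical helper (not an x264 function): decide whether the edge between
 * the current macroblock and a neighboring macroblock should be filtered.
 * neighbor_slice < 0 is this sketch's convention for "no neighbor", i.e. the
 * edge lies on the picture border. */
static bool filter_mb_edge(int idc, int cur_slice, int neighbor_slice)
{
    if (neighbor_slice < 0)
        return false; /* picture borders are never filtered */
    if (idc == DEBLOCK_NONE)
        return false;
    if (idc == DEBLOCK_WITHIN_SLICE && cur_slice != neighbor_slice)
        return false; /* edge would need pixels owned by another slice */
    return true;
}
```

With mode 2, every edge that gets filtered has both sides in the same slice, so a thread that owns one slice can deblock its own rows without waiting on any other thread.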
Showing with 146 additions and 93 deletions