Optimize coefficient decoding

Henrik Gramner requested to merge gramner/dav1d:decode_coefs into master

Separate the eob, ac, and dc cases, eliminate some branches, and make some generic integer arithmetic improvements.
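
To illustrate what separating the eob, ac, and dc cases can look like, here is a toy C sketch. It is not dav1d's decode_coefs. ToyReader, read_eob_level, read_ac_level, read_dc_level and check_same are hypothetical stand-ins, and eob is assumed to be the scan index of the last coded coefficient, with coefficients decoded in reverse scan order down to dc at index 0.

```c
#include <assert.h>
#include <string.h>

#define MAX_COEFS 64

/* Hypothetical toy reader: symbols come from a byte array instead of an
 * arithmetic decoder, so the two loop shapes can be compared directly. */
typedef struct {
    const unsigned char *buf;
    int pos;
} ToyReader;

/* Hypothetical per-case decoders; real AV1 contexts are far more involved. */
static int read_eob_level(ToyReader *r) { return 1 + (r->buf[r->pos++] & 3); } /* never zero */
static int read_ac_level(ToyReader *r)  { return r->buf[r->pos++] & 7; }
static int read_dc_level(ToyReader *r)  { return r->buf[r->pos++] & 15; }

/* One loop: every iteration re-tests which case it is in. */
static void decode_combined(ToyReader *r, int eob, int *coefs) {
    for (int i = eob; i >= 0; i--) {
        if (i == eob)
            coefs[i] = read_eob_level(r);
        else if (i == 0)
            coefs[i] = read_dc_level(r);
        else
            coefs[i] = read_ac_level(r);
    }
}

/* Split: eob and dc are each handled once, outside the ac loop. */
static void decode_split(ToyReader *r, int eob, int *coefs) {
    coefs[eob] = read_eob_level(r);
    if (eob == 0)
        return;                       /* the dc coefficient was the eob */
    for (int i = eob - 1; i > 0; i--) /* ac loop: no case-selection branches */
        coefs[i] = read_ac_level(r);
    coefs[0] = read_dc_level(r);
}

/* Both versions must consume the same symbols and produce the same coefs.
 * The stream must hold at least eob + 1 bytes. */
static void check_same(const unsigned char *stream, int eob) {
    int a[MAX_COEFS] = {0}, b[MAX_COEFS] = {0};
    ToyReader ra = { stream, 0 }, rb = { stream, 0 };
    assert(eob >= 0 && eob < MAX_COEFS);
    decode_combined(&ra, eob, a);
    decode_split(&rb, eob, b);
    assert(ra.pos == rb.pos);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

The split form moves the `i == eob` and `i == 0` tests out of the loop, leaving only a single `eob == 0` check per block; whether that translates into a measurable speedup depends on the compiler and the real decoding contexts.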

Share of total runtime spent in decode_coefs when decoding Chimera 1080p on Skylake-X, before and after:

    12.06%  dav1d   [.] decode_coefs
     9.88%  dav1d   [.] decode_coefs

