itx: clip according to spec, fixes #103, #158
This does not adjust the AVX2 asm, which for performance reasons clips to the required range (signed 16-bit) in many places. No mismatches were observed in 10,000 checkasm runs with coefficients generated by the forward transform.
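The following is a minimal C sketch of the kind of clamp the C code applies. It assumes the signed 16-bit range mentioned above. The names `clip3`, `clip_int16` and `clip_row_int16` are hypothetical and are not dav1d's actual helpers. Where exactly the spec requires clamping, and whether the range depends on bit depth, should be checked against the AV1 specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp v into [lo, hi] (AV1-style Clip3). */
static inline int32_t clip3(int32_t lo, int32_t hi, int32_t v)
{
    assert(lo <= hi);
    return v < lo ? lo : v > hi ? hi : v;
}

/* Hypothetical helper: clamp an intermediate coefficient to the
 * signed 16-bit range [-32768, 32767]. */
static inline int32_t clip_int16(int32_t v)
{
    return clip3(INT16_MIN, INT16_MAX, v);
}

/* Hypothetical sketch: clip every coefficient of a row in place, as a
 * reference implementation might do between 1-D transform passes. */
static void clip_row_int16(int32_t *row, int n)
{
    for (int i = 0; i < n; i++)
        row[i] = clip_int16(row[i]);
}
```

SIMD code can saturate in more places than this without changing results, provided every value that reaches a spec-mandated clamp point ends up in the same range.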