Fix buffer overflow in 64x16 ssse3 idct

With frame threading enabled, the 64x16 SSSE3 inverse transform could
previously clobber the coefficients of the next block.

Update the checkasm test to check for this.
