## Transforms aren't compliant

The AV1 spec requires clamping in the add and sub steps of the inverse transforms, but dav1d's code currently does no clamping there.

As an example, take:

```
inv_dct4_1d(const coef *const in, const ptrdiff_t in_s,
            coef *const out, const ptrdiff_t out_s)
{
    const int in0 = in[0 * in_s], in1 = in[1 * in_s];
    const int in2 = in[2 * in_s], in3 = in[3 * in_s];

    int t0 = ((in0 + in2) * 2896 + 2048) >> 12;
    int t1 = ((in0 - in2) * 2896 + 2048) >> 12;
    int t2 = (in1 * 1567 - in3 * 3784 + 2048) >> 12;
    int t3 = (in1 * 3784 + in3 * 1567 + 2048) >> 12;

    out[0 * out_s] = t0 + t3;
    out[1 * out_s] = t1 + t2;
    out[2 * out_s] = t1 - t2;
    out[3 * out_s] = t0 - t3;
}
```

The final add/sub stage, shown here, requires clamping each result to r bits (see the spec):

```
out[0 * out_s] = t0 + t3;
out[1 * out_s] = t1 + t2;
out[2 * out_s] = t1 - t2;
out[3 * out_s] = t0 - t3;
```
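For illustration, here is a minimal sketch of what a clamped version of that stage could look like. It is not dav1d's code: `clip_r`, `inv_dct4_1d_clamped`, the extra `r` parameter, and the `coef` typedef are all hypothetical, and I'm assuming "r bits" means the signed range `[-(1 << (r - 1)), (1 << (r - 1)) - 1]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t coef; /* assumption: stand-in for dav1d's coef type */

/* Hypothetical helper: saturate v to a signed r-bit integer. */
static inline int clip_r(const int v, const int r) {
    assert(r >= 2 && r <= 31);
    const int max = (1 << (r - 1)) - 1;
    const int min = -max - 1;
    return v < min ? min : v > max ? max : v;
}

/* Same arithmetic as inv_dct4_1d, but the final adds/subs are clamped.
 * The r parameter is an assumption; the caller would have to supply it. */
static void inv_dct4_1d_clamped(const coef *const in, const ptrdiff_t in_s,
                                coef *const out, const ptrdiff_t out_s,
                                const int r)
{
    const int in0 = in[0 * in_s], in1 = in[1 * in_s];
    const int in2 = in[2 * in_s], in3 = in[3 * in_s];

    /* Multiply stage: left unclamped, as in the original. */
    const int t0 = ((in0 + in2) * 2896 + 2048) >> 12;
    const int t1 = ((in0 - in2) * 2896 + 2048) >> 12;
    const int t2 = (in1 * 1567 - in3 * 3784 + 2048) >> 12;
    const int t3 = (in1 * 3784 + in3 * 1567 + 2048) >> 12;

    /* Add/sub stage: each result saturated to r bits. */
    out[0 * out_s] = clip_r(t0 + t3, r);
    out[1 * out_s] = clip_r(t1 + t2, r);
    out[2 * out_s] = clip_r(t1 - t2, r);
    out[3 * out_s] = clip_r(t0 - t3, r);
}
```

With `r = 16` and inputs `{32767, 0, 32767, 0}`, `t0` comes out as 46335, so the unclamped code would store an out-of-range value in `out[0]` and `out[3]`, while this sketch saturates both to 32767.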

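The multiply stage, shown next, does not need clamping. Instead, the spec requires compliant streams/videos to produce results (t0/t1/t2/t3) that fit in integers of r bits, so a decoder can assume these values are already in range.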

```
int t0 = ((in0 + in2) * 2896 + 2048) >> 12;
int t1 = ((in0 - in2) * 2896 + 2048) >> 12;
int t2 = (in1 * 1567 - in3 * 3784 + 2048) >> 12;
int t3 = (in1 * 3784 + in3 * 1567 + 2048) >> 12;
```
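To make the "fits in r bits" requirement concrete, here is a small sketch of a range check that a debug build or a stream validator might run on these intermediates. The helper names `fits_r_bits` and `dct4_stage_fits` are hypothetical, and I'm assuming the spec's r-bit integers are signed.

```c
#include <stdbool.h>

/* Hypothetical helper: true if v is representable as a signed r-bit integer. */
static bool fits_r_bits(const int v, const int r) {
    const int max = (1 << (r - 1)) - 1;
    return v >= -max - 1 && v <= max;
}

/* Hypothetical check: recompute the multiply stage of inv_dct4_1d and
 * report whether all four intermediates stay within r bits. */
static bool dct4_stage_fits(const int in0, const int in1,
                            const int in2, const int in3, const int r)
{
    const int t0 = ((in0 + in2) * 2896 + 2048) >> 12;
    const int t1 = ((in0 - in2) * 2896 + 2048) >> 12;
    const int t2 = (in1 * 1567 - in3 * 3784 + 2048) >> 12;
    const int t3 = (in1 * 3784 + in3 * 1567 + 2048) >> 12;

    return fits_r_bits(t0, r) && fits_r_bits(t1, r) &&
           fits_r_bits(t2, r) && fits_r_bits(t3, r);
}
```

A check like this only detects non-compliant input; it does not change the decoded output.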

`inv_txfm_add_c` also requires additional clamping. Looking at libaom's code, I see two places with additional clamping, but I can only find a reference to one of them in the spec:

> Between the row and column transforms, Residual[ i ][ j ] is set equal to Clip3( - ( 1 << ( colClampRange - 1 ) ), ( 1 << ( colClampRange - 1 ) ) - 1, Residual[ i ][ j ] ) for i = 0..(h-1), for j = 0..(w-1).
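As a rough sketch, the clamp quoted from the spec could be written like this. The function names, the `int32_t` buffer type, and the `stride` parameter are my own assumptions for illustration; `col_clamp_range` stands for the spec's colClampRange and is taken as a parameter rather than derived here.

```c
#include <stddef.h>
#include <stdint.h>

/* Clip3(lo, hi, x) as used in the spec text: clamp x to [lo, hi]. */
static inline int32_t clip3(const int32_t lo, const int32_t hi, const int32_t x) {
    return x < lo ? lo : x > hi ? hi : x;
}

/* Hypothetical helper: clamp a w x h residual block in place between the
 * row pass and the column pass. stride is in elements, not bytes. */
static void clamp_residual(int32_t *const residual, const int w, const int h,
                           const ptrdiff_t stride, const int col_clamp_range)
{
    const int32_t hi = (INT32_C(1) << (col_clamp_range - 1)) - 1;
    const int32_t lo = -hi - 1;

    for (int i = 0; i < h; i++)
        for (int j = 0; j < w; j++)
            residual[i * stride + j] = clip3(lo, hi, residual[i * stride + j]);
}
```

In a two-pass implementation this would run once, after all rows have been transformed and before the first column transform.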

The avx2 code also seems non-compliant. Notably, it calls into adst from dct in some cases. I don't believe this should work, both because of how dct and adst work and because the signed range is asymmetric (abs(INT_MIN) != INT_MAX).
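The asymmetry point can be shown in isolation. The sketch below does not model dav1d's avx2 code at all. It only demonstrates, with hypothetical helpers, that negation and saturation to a signed r-bit range do not commute at the extremes, which is why sharing code paths via sign flips seems risky once clamping is involved.

```c
/* Hypothetical helper: saturate v to a signed r-bit integer. */
static int sat_r(const int v, const int r) {
    const int max = (1 << (r - 1)) - 1;
    const int min = -max - 1;
    return v < min ? min : v > max ? max : v;
}

/* Negate first, then saturate. */
static int neg_then_sat(const int v, const int r) {
    return sat_r(-v, r);
}

/* Saturate first, then negate. The result can leave the r-bit range,
 * because the negated minimum is one past the maximum. */
static int sat_then_neg(const int v, const int r) {
    return -sat_r(v, r);
}
```

For `r = 16` and `v = 32768`, the first order gives -32768 and the second gives -32767. For `v = -32768`, saturate-then-negate gives 32768, which no longer fits in 16 bits.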