arm64: filmgrain: Fix overflows in gen_grain

After multiplying two int8_t values, the maximum possible product is -128 * -128 = 16384. Two such products can't be added in an int16_t, since 2 * 16384 = 32768 exceeds INT16_MAX (32767), even though the sum of any other pair of int8_t products does fit.

Previously, the summing used 16-bit intermediates for the sum of two products and only lengthened the result to 32 bits when accumulating three or more products.
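
As a rough illustration of the arithmetic above, here is a scalar C model of the two summing schemes. It is only a sketch and not the NEON code from this patch: the function names (wrap_to_int16, sum2_narrow, sum2_wide) are hypothetical, and the 16-bit lane is assumed to wrap modulo 2^16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reduce a value modulo 2^16 and reinterpret it as
 * signed, modelling what a wrapping 16-bit lane would hold. */
static int32_t wrap_to_int16(int32_t v)
{
    uint16_t u = (uint16_t)v; /* conversion to unsigned is modulo 2^16 */
    return u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* Model of the old scheme: the sum of two int8_t products is kept in a
 * 16-bit intermediate. */
static int32_t sum2_narrow(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int32_t p0 = (int32_t)a * b;
    int32_t p1 = (int32_t)c * d;
    return wrap_to_int16(p0 + p1);
}

/* Model of the fixed scheme: the products are widened to 32 bits before
 * they are added, so no sum of two products can overflow. */
static int32_t sum2_wide(int8_t a, int8_t b, int8_t c, int8_t d)
{
    return (int32_t)a * b + (int32_t)c * d;
}

/* Self-check of the worst case described in the commit message. */
static void check_worst_case(void)
{
    /* -128 * -128 twice: 32768 does not fit in int16_t and wraps. */
    assert(sum2_wide(-128, -128, -128, -128) == 32768);
    assert(sum2_narrow(-128, -128, -128, -128) == -32768);
    /* The next largest sum still fits: 16384 + 16256 = 32640. */
    assert(sum2_narrow(-128, -128, -128, -127) == 32640);
}
```

In this model, only the input where all four operands are -128 makes the narrow and wide sums differ, which matches the single overflowing case described above.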

Before:                    Cortex A53       A72       A73   Apple M1
gen_grain_y_ar1_8bpc_neon:   112598.5   71309.2   74889.8   372.2
gen_grain_y_ar2_8bpc_neon:   139932.4   91442.3   95788.4   387.3
gen_grain_y_ar3_8bpc_neon:   185607.6  115691.6  131655.8   403.0
After:
gen_grain_y_ar1_8bpc_neon:   112968.8   71897.9   76171.2   371.2
gen_grain_y_ar2_8bpc_neon:   142768.8   94517.9   97934.4   387.5
gen_grain_y_ar3_8bpc_neon:   191625.2  121083.0  135975.3   405.6
