x86/filmgrain: simplify post-horizontal filter blending
This commit makes a handful of minor changes:
- in horizontal blending, use shufps or vpblendd. If we change fewer pixels
  than can be used as one source operand for the given instruction (8 or 4
  bytes), we abuse 0,32 as an edge/cur pair weight, so that the resulting
  blended register contains an unmodified cur grain. This replaces more
  complicated vpblendw + vpblendd or pand/pandn/por blending combinations.
- for scaling LUTs, always use psrld instead of pand, since the latter
  requires a register.
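
As a rough scalar illustration of why a 0,32 weight pair leaves the cur grain untouched, here is a minimal C sketch. It assumes the blend is a weighted sum rounded by a 5-bit shift; the helper names and the 27,17 example weights are illustrative only and are not taken from the assembly, which works on whole SIMD registers.

```c
#include <assert.h>

/* Hypothetical scalar model of one edge/cur blend (assumption: the sum of
 * the weighted grains is rounded and shifted right by 5 bits).
 * Note: right-shifting a negative int is implementation-defined in C; this
 * sketch assumes the usual arithmetic shift. */
static int blend_grain(int edge, int cur, int w_edge, int w_cur)
{
    return (edge * w_edge + cur * w_cur + 16) >> 5;
}

/* With weights 0,32 the result is (32 * cur + 16) >> 5 == cur, so the lane
 * passes through unmodified regardless of the edge value. */
static void check_identity_weight(void)
{
    for (int cur = -128; cur <= 127; cur++)
        for (int edge = -128; edge <= 127; edge++)
            assert(blend_grain(edge, cur, 0, 32) == cur);
}
```

Under that assumption, lanes that should not change can go through the same multiply/round path as the blended lanes, so no separate mask-and-merge step is needed afterwards.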
Edited by Ronald S. Bultje