blend, blend_h, blend_v pwr9 implementation

| Function | Time | Speedup |
|---|---|---|
| blend_w4_8bpc_pwr9 | 14.4 | 1.90x |
| blend_w8_8bpc_pwr9 | 19.9 | 3.62x |
| blend_w16_8bpc_pwr9 | 50.6 | 5.17x |
| blend_w32_8bpc_pwr9 | 125.8 | 5.33x |
| blend_h_w2_8bpc_pwr9 | 18.4 | 1.20x |
| blend_h_w4_8bpc_pwr9 | 27.2 | 1.26x |
| blend_h_w8_8bpc_pwr9 | 27.9 | 2.22x |
| blend_h_w16_8bpc_pwr9 | 35.1 | 3.28x |
| blend_h_w32_8bpc_pwr9 | 57.4 | 3.88x |
| blend_h_w64_8bpc_pwr9 | 97.9 | 4.70x |
| blend_h_w128_8bpc_pwr9 | 207.6 | 5.18x |
| blend_v_w2_8bpc_pwr9 | 25.0 | 1.12x |
| blend_v_w4_8bpc_pwr9 | 79.3 | 1.35x |
| blend_v_w8_8bpc_pwr9 | 79.5 | 2.43x |
| blend_v_w16_8bpc_pwr9 | 108.0 | 3.58x |
| blend_v_w32_8bpc_pwr9 | 153.5 | 4.69x |

Edited by Luca Barbato