mc_tmpl: w_mask: reduce register usage
Use the distributive law to reduce register usage. This gives a performance gain of around 7% (9% max).
Why not have fancy LaTeX?

$$
\begin{aligned}
& tmp1 \cdot m + tmp2 \cdot (64 - m) \\
&= tmp1 \cdot m + tmp2 \cdot 64 - tmp2 \cdot m \\
&= tmp1 \cdot m - tmp2 \cdot m + 64 \cdot tmp2 \\
&= (tmp1 - tmp2) \cdot m + 64 \cdot tmp2
\end{aligned}
$$
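The identity above can be checked in plain integer arithmetic. The following is only a minimal sketch, not the actual code from `mc_tmpl`: the function names `blend_before`, `blend_after` and `count_mismatches` are hypothetical, the inputs are assumed to fit in `int16_t`, and the rounding, shift and clipping steps of the real function are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: tmp1 and tmp2 are each multiplied by a mask-derived weight.
 * (Hypothetical helper; the weight m is assumed to lie in [0, 64].) */
int blend_before(int tmp1, int tmp2, int m) {
    assert(m >= 0 && m <= 64);
    return tmp1 * m + tmp2 * (64 - m);
}

/* Rewritten form after applying the distributive law: only the difference
 * is multiplied by m, and tmp2 is scaled by the constant 64. */
int blend_after(int tmp1, int tmp2, int m) {
    assert(m >= 0 && m <= 64);
    return (tmp1 - tmp2) * m + 64 * tmp2;
}

/* Count disagreements between the two forms over every weight and a
 * strided sweep of the assumed int16_t input range. */
long count_mismatches(void) {
    long mismatches = 0;
    for (int m = 0; m <= 64; m++)
        for (int a = INT16_MIN; a <= INT16_MAX; a += 257)
            for (int b = INT16_MIN; b <= INT16_MAX; b += 251)
                if (blend_before(a, b, m) != blend_after(a, b, m))
                    mismatches++;
    return mismatches;
}
```

Calling `count_mismatches()` from a small test program should report no disagreements, since both forms are the same integer expression and, under the assumed input range, no intermediate product overflows `int`.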
Tested on AMD HX370
I think we don't actually need the last commit.
Function | Before | After | Gain (%) |
---------------------------------------------------------------------
w_mask_420_w4_8bpc_c | 335.3 | 312.6 | 6.78 |
w_mask_420_w4_16bpc_c | 354.5 | 326.4 | 7.94 |
w_mask_420_w8_8bpc_c | 1056.4 | 979.3 | 7.30 |
w_mask_420_w8_16bpc_c | 1068.2 | 996.4 | 6.73 |
w_mask_420_w16_8bpc_c | 3416.1 | 3169.6 | 7.22 |
w_mask_420_w16_16bpc_c | 3435.4 | 3218.0 | 6.34 |
w_mask_420_w32_8bpc_c | 13479.7 | 12550.0 | 6.91 |
w_mask_420_w32_16bpc_c | 13833.3 | 12632.7 | 8.68 |
w_mask_420_w64_8bpc_c | 32557.6 | 30166.7 | 7.35 |
w_mask_420_w64_16bpc_c | 32529.8 | 30407.0 | 6.54 |
w_mask_420_w128_8bpc_c | 81802.8 | 75856.5 | 7.27 |
w_mask_420_w128_16bpc_c | 81187.8 | 76133.9 | 6.23 |
w_mask_422_w4_8bpc_c | 331.3 | 327.1 | 1.27 |
w_mask_422_w4_16bpc_c | 365.1 | 341.2 | 6.53 |
w_mask_422_w8_8bpc_c | 1052.7 | 1003.5 | 4.68 |
w_mask_422_w8_16bpc_c | 1095.9 | 1022.6 | 6.69 |
w_mask_422_w16_8bpc_c | 3479.8 | 3248.8 | 6.67 |
w_mask_422_w16_16bpc_c | 3504.2 | 3279.5 | 6.41 |
w_mask_422_w32_8bpc_c | 13702.5 | 12801.4 | 6.58 |
w_mask_422_w32_16bpc_c | 13738.9 | 12830.5 | 6.61 |
w_mask_422_w64_8bpc_c | 32517.9 | 30818.0 | 5.23 |
w_mask_422_w64_16bpc_c | 33199.4 | 30865.3 | 7.03 |
w_mask_422_w128_8bpc_c | 82867.1 | 77978.7 | 5.90 |
w_mask_422_w128_16bpc_c | 84937.9 | 77629.8 | 8.60 |
w_mask_444_w4_8bpc_c | 340.4 | 315.6 | 7.28 |
w_mask_444_w4_16bpc_c | 361.6 | 335.0 | 7.35 |
w_mask_444_w8_8bpc_c | 1057.6 | 988.9 | 6.50 |
w_mask_444_w8_16bpc_c | 1104.3 | 1030.8 | 6.67 |
w_mask_444_w16_8bpc_c | 3414.4 | 3180.7 | 6.85 |
w_mask_444_w16_16bpc_c | 3477.4 | 3182.4 | 8.48 |
w_mask_444_w32_8bpc_c | 13455.8 | 12469.4 | 7.33 |
w_mask_444_w32_16bpc_c | 13666.9 | 12378.8 | 9.42 |
w_mask_444_w64_8bpc_c | 33587.2 | 31239.7 | 7.00 |
w_mask_444_w64_16bpc_c | 34283.3 | 30969.5 | 9.67 |
w_mask_444_w128_8bpc_c | 82084.2 | 76206.3 | 7.16 |
w_mask_444_w128_16bpc_c | 82649.4 | 75166.4 | 8.91 |
---------------------------------------------------------------------
avg | - | - | 6.95 |
Edited by Sungjoon Moon