1) Using xxpermdi + merge instead of 2 merges improves quant_8x8
   performance by 5%.
2) Use vec_splats instead of vec_splat.

checkasm timings when compiled with gcc:

                      C:    AltiVec before:   AltiVec after:
quant_2x2_dc:         57    163               46
quant_4x4_dc:         141   162               57
dequant_4x4_cmp:      104   101               45
dequant_4x4_flat:     104   106               46
dequant_8x8_cmp:      412   208               147
dequant_8x8_flat:     414   212               149
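
For readers unfamiliar with the two intrinsics named in point 2, here is a minimal scalar sketch of how they differ. It is not the code from this commit: the type model_vec_s16 and both model_* functions are hypothetical stand-ins that only mirror the documented lane semantics of vec_splats (broadcast a scalar) and vec_splat (broadcast one lane of an existing vector), so that the example builds on any platform.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector of eight 16-bit lanes. */
typedef struct { int16_t lane[8]; } model_vec_s16;

/* Models vec_splats(x): copy a scalar value into every lane. */
static model_vec_s16 model_vec_splats(int16_t x)
{
    model_vec_s16 r;
    for (int i = 0; i < 8; i++)
        r.lane[i] = x;
    return r;
}

/* Models vec_splat(a, n): copy lane n of an existing vector into every lane.
 * The real intrinsic takes a compile-time constant index; masking with 7
 * here is an assumption that just keeps the model in bounds. */
static model_vec_s16 model_vec_splat(model_vec_s16 a, unsigned n)
{
    model_vec_s16 r;
    for (int i = 0; i < 8; i++)
        r.lane[i] = a.lane[n & 7];
    return r;
}
```

In this model, vec_splat needs its source value to already sit in a vector, while vec_splats starts directly from the scalar. Whether that is the reason for the change here is an assumption; the message itself only records the substitution.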
303c484e