ppc: Improve SATD by using vec_extract

Around 1-10% speedup of intra_satd_x3 and satd_ functions by using
vec_extract instead of vec_splat and vec_ste.

Microbenchmark results:

Power 8:
satd_4x4_altivec: 148 ==> satd_4x4_altivec: 87
satd_4x8_altivec: 186 ==> satd_4x8_altivec: 128
satd_8x4_altivec: 177 ==> satd_8x4_altivec: 114
satd_8x8_altivec: 188 ==> satd_8x8_altivec: 136
satd_8x16_altivec: 300 ==> satd_8x16_altivec: 262
satd_16x8_altivec: 269 ==> satd_16x8_altivec: 228
satd_16x16_altivec: 517 ==> satd_16x16_altivec: 485
intra_satd_x3_4x4_altivec: 528 ==> intra_satd_x3_4x4_altivec: 444
intra_satd_x3_8x8c_altivec: 679 ==> intra_satd_x3_8x8c_altivec: 593
intra_satd_x3_16x16_altivec: 1815 ==> intra_satd_x3_16x16_altivec: 1724

Power 9:
satd_4x4_altivec: 131 ==> satd_4x4_altivec: 113
satd_4x8_altivec: 175 ==> satd_4x8_altivec: 155
satd_8x4_altivec: 150 ==> satd_8x4_altivec: 135
satd_8x8_altivec: 174 ==> satd_8x8_altivec: 161
satd_8x16_altivec: 290 ==> satd_8x16_altivec: 277
satd_16x8_altivec: 272 ==> satd_16x8_altivec: 270
satd_16x16_altivec: 563 ==> satd_16x16_altivec: 566
intra_satd_x3_4x4_altivec: 424 ==> intra_satd_x3_4x4_altivec: 400
intra_satd_x3_8x8c_altivec: 687 ==> intra_satd_x3_8x8c_altivec: 616
intra_satd_x3_16x16_altivec: 2047 ==> intra_satd_x3_16x16_altivec: 2062
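
For readers unfamiliar with the two idioms, the following is a minimal sketch of the kind of substitution described, not the actual diff. The helper names (load4, lane_via_splat_ste, lane_via_extract) and the choice of lane 3 are assumptions made for illustration. The #else branch is a plain-C stand-in so the sketch also builds on non-PowerPC hosts; it exercises no AltiVec code.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
#include <altivec.h>

typedef vector signed int v4i32; /* hypothetical typedef for this sketch */

/* Copy four 32-bit values into a vector register (illustrative helper). */
static v4i32 load4(const int32_t src[4])
{
    v4i32 v;
    memcpy(&v, src, sizeof(v));
    return v;
}

/* Older idiom: vec_ste stores whichever element matches the destination
 * address, so the wanted lane is first broadcast to all lanes with
 * vec_splat and the result then goes through memory. */
static int32_t lane_via_splat_ste(const int32_t src[4])
{
    v4i32 v = load4(src);
    int32_t out;
    v = vec_splat(v, 3);
    vec_ste(v, 0, &out);
    return out;
}

/* Newer idiom: vec_extract reads one lane directly into a scalar. */
static int32_t lane_via_extract(const int32_t src[4])
{
    return vec_extract(load4(src), 3);
}

#else

/* Portable stand-ins (assumption: only used off PowerPC). */
static int32_t lane_via_splat_ste(const int32_t src[4])
{
    int32_t splat[4] = { src[3], src[3], src[3], src[3] };
    return splat[0];
}

static int32_t lane_via_extract(const int32_t src[4])
{
    return src[3];
}

#endif
```

Both helpers return the same value for the same input; the difference is only in how the lane reaches a scalar, which is where the measured savings would come from.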
parent
0bf44a4c