PowerPC optimization roadmap
Optimizing the x264 library for the PowerPC architecture puts it on par with the other architectures, which already offer close-to-optimal performance to the library's users. This issue lists the pivotal changes needed to optimize x264 for PowerPC and keep it on track with the x86 and Arm architectures.
- Build the infrastructure for implementing functions in assembly language !43
- Handle endianness variants with macros defined in asm.S !43
- Implement the functions that are not yet optimized, i.e. those without a C-intrinsics implementation !43
- Use Power ISA v3.0 instructions for further optimizations
- Optimize additional functions

Optimizations using Power ISA v3.0:
| Macro | Power ISA v3.0 instruction |
|---|---|
| LOAD_16_BYTE | lxvb16x (Load VSX Vector Byte*16 Indexed) |
| STORE_16_BYTE | stxvb16x (Store VSX Vector Byte*16 Indexed) |
| LOAD_8_HALFWORD | lxvh8x (Load VSX Vector Halfword*8 Indexed) |
| STORE_8_HALFWORD | stxvh8x (Store VSX Vector Halfword*8 Indexed) |
| LOAD_8_BYTE_H | lxvll (Load VSX Vector with Length Left-justified) |
| STORE_8_BYTE_H | stxvll (Store VSX Vector with Length Left-justified) |
| LOAD_4_BYTE_H | lxvll (Load VSX Vector with Length Left-justified) |
| STORE_4_BYTE_H | stxvll (Store VSX Vector with Length Left-justified) |
| ABS_BYTE | vabsdub (Vector Absolute Difference Unsigned Byte) |
| ABS_HALFWORD | vabsduh (Vector Absolute Difference Unsigned Halfword) |
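
As a rough illustration of what ABS_BYTE gains from ISA v3.0, the scalar C model below computes the per-byte absolute difference that a single vabsdub produces for 16 lanes, next to the max-minus-min sequence (vmaxub, vminub, vsububm) that is a common way to get the same result without ISA v3.0. This is a sketch only: the function names are hypothetical and nothing here is taken from x264's sources; sad_u8x16 merely hints at how a SAD-style function such as pixel_vsad would consume the result.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-by-lane model of one vabsdub: out[i] = |a[i] - b[i]| on unsigned bytes. */
static void absd_u8x16(uint8_t out[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}

/* Pre-ISA-3.0 idiom: max(a, b) - min(a, b), i.e. vmaxub + vminub + vsububm.
 * The subtraction never wraps because hi >= lo in every lane. */
static void absd_u8x16_maxmin(uint8_t out[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t hi = a[i] > b[i] ? a[i] : b[i];
        uint8_t lo = a[i] > b[i] ? b[i] : a[i];
        out[i] = (uint8_t)(hi - lo);
    }
}

/* Hypothetical SAD over 16 pixels, built on the vabsdub model. */
static unsigned sad_u8x16(const uint8_t a[16], const uint8_t b[16])
{
    uint8_t d[16];
    unsigned sum = 0;
    absd_u8x16(d, a, b);
    for (int i = 0; i < 16; i++)
        sum += d[i];
    assert(sum <= 16u * 255u); /* cannot exceed 16 lanes of 255 */
    return sum;
}
```

Both models agree for every input pair; the benefit of the ISA v3.0 macro is collapsing three vector instructions into one.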

| Function | Power ISA v3.0 instruction(s) |
|---|---|
| zigzag_scan_8x8_field | vinserth (Vector Insert Halfword from VSR using immediate-specified index) |
| zigzag_sub_8x8 | vinsertb (Vector Insert Byte from VSR using immediate-specified index) |
| zigzag_sub_4x4_field_neon | vinsertw, vinserth, stxvll (Vector Insert Word/Halfword, Store VSX Vector with Length Left-justified) |
| zigzag_sub_4x4ac_field_neon | vinsertw, vinserth, stxvll (Vector Insert Word/Halfword, Store VSX Vector with Length Left-justified) |
| zigzag_sub_4x4_frame_neon | vinsertw, vinserth, stxvll (Vector Insert Word/Halfword, Store VSX Vector with Length Left-justified) |
| zigzag_sub_4x4ac_frame_neon | vinsertw, vinserth, stxvll (Vector Insert Word/Halfword, Store VSX Vector with Length Left-justified) |
| pixel_ssd_nv12_core_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| pixel_var2_8x8_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| pixel_var2_8x16_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| pixel_vsad_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_v_chroma_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_h_chroma_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_h_chroma_422_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_h_chroma_mbaff_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_v_chroma_intra_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| decimate_score64_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_last15_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_last16_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_last64_altivec | vctzw (Vector Count Trailing Zeros Word) |
| coeff_level_run8_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_level_run15_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_level_run16_altivec | cnttzd (Count Trailing Zeros Doubleword) |
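
The count-trailing-zeros entries rest on a simple idea, sketched below in portable C under assumptions of our own: build a bitmask with one bit per nonzero coefficient, then locate the last nonzero index with a single bit scan instead of a backward loop. The sketch sets the bits in reversed order so that the lowest set bit maps to the highest nonzero index, which is what makes a trailing-zero count applicable; cnttzd is modelled with the GCC/Clang builtin __builtin_ctzll. The function names, the int16_t coefficient type, and the -1 result for an all-zero block are assumptions for illustration; the vector code may build its mask differently.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: scan backwards for the last nonzero coefficient; -1 if none. */
static int coeff_last_scan(const int16_t *coef, int n)
{
    int i = n - 1;
    while (i >= 0 && coef[i] == 0)
        i--;
    return i;
}

/* Same result through one count-trailing-zeros operation.
 * Bit (n-1-i) is set when coef[i] != 0, so the lowest set bit of the mask
 * belongs to the highest nonzero index. __builtin_ctzll stands in for
 * cnttzd; it is undefined for 0, hence the early return. */
static int coeff_last_ctz(const int16_t *coef, int n)
{
    assert(n > 0 && n <= 64);
    uint64_t mask = 0;
    for (int i = 0; i < n; i++)
        if (coef[i] != 0)
            mask |= UINT64_C(1) << (n - 1 - i);
    if (mask == 0)
        return -1;
    return n - 1 - __builtin_ctzll(mask);
}
```

The same mask serves 15-, 16- and 64-coefficient blocks; only n changes, which is why one instruction covers the coeff_last15/16 rows above.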

Additional functions to implement:
| Function |
|---|
| zigzag_scan_8x8_field |
| zigzag_sub_8x8 |
| mbtree_fix8_pack |
| mbtree_fix8_unpack |
| deblock_v_luma_intra |
| deblock_h_luma_intra |
| deblock_v8_luma |
| deblock_v8_luma_intra |
| predict_8x16c_p |
| dequant_4x4dc |
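
Some of these functions touch the endianness question directly. The mbtree_fix8_pack/mbtree_fix8_unpack pair converts between floats and 16-bit signed 8.8 fixed-point values. The sketch below assumes, as we understand x264's C fallback to do, that the packed values are stored big-endian regardless of host byte order, so a little-endian POWER build has to byte-swap. The _ref names and the store_be16/load_be16 helpers are hypothetical, and byte order is produced with explicit shifts rather than any x264 helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return a uint16_t whose in-memory bytes hold v in big-endian order,
 * independent of the host's endianness (assumed storage layout). */
static uint16_t store_be16(uint16_t v)
{
    uint8_t b[2] = { (uint8_t)(v >> 8), (uint8_t)(v & 0xFF) };
    uint16_t r;
    memcpy(&r, b, sizeof r);
    return r;
}

/* Inverse of store_be16: read the two stored bytes as a big-endian value. */
static uint16_t load_be16(uint16_t stored)
{
    uint8_t b[2];
    memcpy(b, &stored, sizeof b);
    return (uint16_t)((b[0] << 8) | b[1]);
}

/* Hypothetical reference: float -> signed 8.8 fixed point, stored big-endian.
 * Assumes every src[i] lies in [-128, 128) so the int16_t cast is defined. */
static void mbtree_fix8_pack_ref(uint16_t *dst, const float *src, int count)
{
    for (int i = 0; i < count; i++) {
        assert(src[i] >= -128.0f && src[i] < 128.0f);
        dst[i] = store_be16((uint16_t)(int16_t)(src[i] * 256.0f));
    }
}

/* Hypothetical reference: big-endian signed 8.8 fixed point -> float. */
static void mbtree_fix8_unpack_ref(float *dst, const uint16_t *src, int count)
{
    for (int i = 0; i < count; i++) {
        int v = load_be16(src[i]);
        if (v >= 0x8000)
            v -= 0x10000; /* reinterpret as two's-complement int16 */
        dst[i] = (float)v * (1.0f / 256.0f);
    }
}
```

On a big-endian host store_be16 and load_be16 are identity operations; on little-endian POWER they become byte swaps, which is the kind of variant the endianness macros are meant to hide.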
Edited by Mamone Tarsha