# PowerPC optimization roadmap
Optimizing the x264 library for the PowerPC architecture brings it on par with the other supported architectures, which already offer close-to-optimal performance to the library's users. This issue lists the pivotal changes needed to optimize x264 for PowerPC and keep it on track with the x86 and Arm ports.
- Build an infrastructure to use assembly language for function implementation (!43)
- Handle endianness variants with macros defined in asm.S (!43)
- Implement non-optimized functions (those not implemented using C intrinsics) (!43)
- Use Power ISA v3.0 for further optimizations
- Optimize additional functions
Using Power ISA v3.0:
| Macro | Power ISA v3.0 instruction |
| --- | --- |
| LOAD_16_BYTE | lxvb16x (Load VSX Vector Byte*16 Indexed) |
| STORE_16_BYTE | stxvb16x (Store VSX Vector Byte*16 Indexed) |
| LOAD_8_HALFWORD | lxvh8x (Load VSX Vector Halfword*8 Indexed) |
| STORE_8_HALFWORD | stxvh8x (Store VSX Vector Halfword*8 Indexed) |
| LOAD_8_BYTE_H | lxvll (Load VSX Vector with Length Left-justified) |
| STORE_8_BYTE_H | stxvll (Store VSX Vector with Length Left-justified) |
| LOAD_4_BYTE_H | lxvll (Load VSX Vector with Length Left-justified) |
| STORE_4_BYTE_H | stxvll (Store VSX Vector with Length Left-justified) |
| ABS_BYTE | vabsdub (Vector Absolute Difference Unsigned Byte) |
| ABS_HALFWORD | vabsduh (Vector Absolute Difference Unsigned Halfword) |
| Function | Power ISA v3.0 instruction |
| --- | --- |
| zigzag_scan_8x8_field | vinserth (Vector Insert Halfword from VSR using immediate-specified index) |
| zigzag_sub_8x8 | vinsertb (Vector Insert Byte from VSR using immediate-specified index) |
| zigzag_sub_4x4_field_neon | vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector) |
| zigzag_sub_4x4ac_field_neon | vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector) |
| zigzag_sub_4x4_frame_neon | vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector) |
| zigzag_sub_4x4ac_frame_neon | vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector) |
| pixel_ssd_nv12_core_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| pixel_var2_8x8_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| pixel_var2_8x16_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| pixel_vsad_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_v_chroma_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_h_chroma_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_h_chroma_422_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_h_chroma_mbaff_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_v_chroma_intra_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| decimate_score64_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_last15_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_last16_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_last64_altivec | vctzw (Vector Count Trailing Zeros Word) |
| coeff_level_run8_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_level_run15_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_level_run16_altivec | cnttzd (Count Trailing Zeros Doubleword) |
Implementing additional functions:
| Function |
| --- |
| zigzag_scan_8x8_field |
| zigzag_sub_8x8 |
| mbtree_fix8_pack |
| mbtree_fix8_unpack |
| deblock_v_luma_intra |
| deblock_h_luma_intra |
| deblock_v8_luma |
| deblock_v8_luma_intra |
| predict_8x16c_p |
| dequant_4x4dc |