PowerPC optimization roadmap

Optimizing the x264 library for the PowerPC architecture brings it on par with other architectures that already offer close-to-optimal performance to the library's users. This issue lists the pivotal changes needed to optimize x264 for PowerPC and keep it on track with the x86 and Arm architectures.

  • Build the infrastructure for implementing functions in assembly language !43
  • Handle endianness variants with macros defined in asm.S !43
  • Implement functions that are not yet optimized (those with no C intrinsics implementation) !43
  • Use Power ISA v3.0 for further optimizations
  • Optimize additional functions

Using Power ISA v3.0:

| Macro | Power ISA v3.0 instruction |
| --- | --- |
| LOAD_16_BYTE | lxvb16x (Load VSX Vector Byte*16 Indexed) |
| STORE_16_BYTE | stxvb16x (Store VSX Vector Byte*16 Indexed) |
| LOAD_8_HALFWORD | lxvh8x (Load VSX Vector Halfword*8 Indexed) |
| STORE_8_HALFWORD | stxvh8x (Store VSX Vector Halfword*8 Indexed) |
| LOAD_8_BYTE_H | lxvll (Load VSX Vector with Length Left-justified) |
| STORE_8_BYTE_H | stxvll (Store VSX Vector with Length Left-justified) |
| LOAD_4_BYTE_H | lxvll (Load VSX Vector with Length Left-justified) |
| STORE_4_BYTE_H | stxvll (Store VSX Vector with Length Left-justified) |
| ABS_BYTE | vabsdub (Vector Absolute Difference Unsigned Byte) |
| ABS_HALFWORD | vabsduh (Vector Absolute Difference Unsigned Halfword) |

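
As a reading aid, the following C sketch models, byte by byte, what two of the instructions in the macro list compute: vabsdub (per-byte unsigned absolute difference) and lxvll (load a given number of bytes into the leftmost bytes of the vector and zero the rest). It is a scalar reference written under assumptions, not the assembly from !43. The function names `abs_diff_u8` and `load_left_justified` are hypothetical, and index 0 of each 16-byte array stands for the leftmost byte of the vector register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of vabsdub: out[i] = |a[i] - b[i]| for 16 unsigned bytes.
 * Hypothetical helper name, used only for illustration. */
void abs_diff_u8(uint8_t out[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}

/* Scalar model of lxvll: copy nb bytes from src into the leftmost bytes
 * of a 16-byte vector and clear the remaining bytes. Lengths above 16
 * are treated as 16. Hypothetical helper name. */
void load_left_justified(uint8_t out[16], const uint8_t *src, unsigned nb)
{
    assert(src != NULL || nb == 0);
    if (nb > 16)
        nb = 16;
    memset(out, 0, 16);   /* bytes past nb read as zero */
    memcpy(out, src, nb); /* first nb bytes land at the left edge */
}
```

Under this model, LOAD_8_BYTE_H and LOAD_4_BYTE_H would correspond to calling `load_left_justified` with lengths 8 and 4, which is presumably why both macros map to the same instruction.
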

| Function | Power ISA v3.0 instruction |
| --- | --- |
| zigzag_scan_8x8_field | vinserth (Vector Insert Halfword from VSR using immediate-specified index) |
| zigzag_sub_8x8 | vinsertb (Vector Insert Byte from VSR using immediate-specified index) |
| zigzag_sub_4x4_field_neon | vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector) |
| zigzag_sub_4x4ac_field_neon | vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector) |
| zigzag_sub_4x4_frame_neon | vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector) |
| zigzag_sub_4x4ac_frame_neon | vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector) |
| pixel_ssd_nv12_core_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| pixel_var2_8x8_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| pixel_var2_8x16_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| pixel_vsad_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_v_chroma_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_h_chroma_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_h_chroma_422_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_h_chroma_mbaff_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| deblock_v_chroma_intra_altivec | vabsdub (Vector Absolute Difference Unsigned Byte) |
| decimate_score64_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_last15_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_last16_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_last64_altivec | vctzw (Vector Count Trailing Zeros Word) |
| coeff_level_run8_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_level_run15_altivec | cnttzd (Count Trailing Zeros Doubleword) |
| coeff_level_run16_altivec | cnttzd (Count Trailing Zeros Doubleword) |
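
To illustrate why a trailing-zero count suits the coeff_last family, here is a minimal C sketch that finds the index of the last nonzero coefficient from a bit mask. It rests on several assumptions that are not taken from this issue: the mask places coefficient 0 in the most significant bit (mirroring big-endian bit numbering), the function name `coeff_last_sketch` is hypothetical, returning -1 for an all-zero block is a choice made for the sketch, and the GCC/Clang builtin `__builtin_ctzll` stands in for cnttzd. The scalar loop that builds the mask would be a vector compare in a real implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the index of the last nonzero coefficient among coef[0..n-1],
 * or -1 if all are zero. Hypothetical helper, for illustration only. */
int coeff_last_sketch(const int16_t *coef, int n)
{
    assert(n > 0 && n <= 64);

    /* Assumed mask layout: coefficient i maps to bit (63 - i), so the
     * highest nonzero index ends up as the lowest set bit. */
    uint64_t mask = 0;
    for (int i = 0; i < n; i++)
        mask |= (uint64_t)(coef[i] != 0) << (63 - i);

    if (mask == 0)
        return -1; /* the trailing-zero count is undefined for 0 */

    /* Trailing zeros = 63 - (last nonzero index). */
    return 63 - __builtin_ctzll(mask);
}
```

With this layout a single trailing-zero count replaces a backward scan over the coefficients, which is the kind of saving the cnttzd entries above appear to target.
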

Implementing additional functions:

Function
zigzag_scan_8x8_field
zigzag_sub_8x8
mbtree_fix8_pack
mbtree_fix8_unpack
deblock_v_luma_intra
deblock_h_luma_intra
deblock_v8_luma
deblock_v8_luma_intra
predict_8x16c_p
dequant_4x4dc
Edited by Mamone Tarsha