PowerPC optimization roadmap

Optimizing the x264 library for the PowerPC architecture brings it on par with other architectures that already offer close-to-optimal performance to the library's users. This issue lists the pivotal changes needed to optimize x264 for PowerPC and keep it on track with the x86 and Arm architectures.

Build the infrastructure for implementing functions in assembly language !43
Handle endianness variants with macros defined in asm.S !43
Implement functions that are not yet optimized (those with no C intrinsics implementation) !43
Use Power ISA v3.0 for further optimizations
Optimize additional functions

Using Power ISA v3.0:

Macro	Power ISA v3.0 instruction
LOAD_16_BYTE	lxvb16x (Load VSX Vector Byte*16 Indexed)
STORE_16_BYTE	stxvb16x (Store VSX Vector Byte*16 Indexed)
LOAD_8_HALFWORD	lxvh8x (Load VSX Vector Halfword*8 Indexed)
STORE_8_HALFWORD	stxvh8x (Store VSX Vector Halfword*8 Indexed)
LOAD_8_BYTE_H	lxvll (Load VSX Vector with Length Left-justified)
STORE_8_BYTE_H	stxvll (Store VSX Vector with Length Left-justified)
LOAD_4_BYTE_H	lxvll (Load VSX Vector with Length Left-justified)
STORE_4_BYTE_H	stxvll (Store VSX Vector with Length Left-justified)
ABS_BYTE	vabsdub (Vector Absolute Difference Unsigned Byte)
ABS_HALFWORD	vabsduh (Vector Absolute Difference Unsigned Halfword)
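
As a reference for what the ABS_BYTE macro is expected to compute, here is a minimal scalar sketch in C of the per-byte result of vabsdub on one 16-byte vector. The function names are hypothetical and are not part of x264. The second function mirrors one common pre-v3.0 idiom (max minus min, which takes three vector instructions); treating that as the fallback is an assumption, not something this issue confirms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for vabsdub on one 16-byte vector:
 * dst[i] = |a[i] - b[i]| on unsigned bytes, with no wraparound. */
static void abs_diff_u8x16(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < 16; i++)
        dst[i] = a[i] > b[i] ? (uint8_t)(a[i] - b[i]) : (uint8_t)(b[i] - a[i]);
}

/* Same result computed as max(a, b) - min(a, b), the shape of a
 * three-instruction sequence (vmaxub, vminub, vsububm) that a single
 * vabsdub could replace. Assumed fallback, for illustration only. */
static void abs_diff_u8x16_maxmin(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < 16; i++) {
        uint8_t hi = a[i] > b[i] ? a[i] : b[i];
        uint8_t lo = a[i] > b[i] ? b[i] : a[i];
        dst[i] = (uint8_t)(hi - lo);
    }
}
```

Both functions produce identical output for every input, which is the property that would let the macro switch between the two sequences depending on the target ISA level.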

Function	Power ISA v3.0 instruction
zigzag_scan_8x8_field	vinserth (Vector Insert Halfword from VSR using immediate-specified index)
zigzag_sub_8x8	vinsertb (Vector Insert Byte from VSR using immediate-specified index)
zigzag_sub_4x4_field_neon	vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector)
zigzag_sub_4x4ac_field_neon	vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector)
zigzag_sub_4x4_frame_neon	vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector)
zigzag_sub_4x4ac_frame_neon	vinsertw, vinserth, stxvll (Vector Insert, Store VSX Vector)
pixel_ssd_nv12_core_altivec	vabsdub (Vector Absolute Difference Unsigned Byte)
pixel_var2_8x8_altivec	vabsdub (Vector Absolute Difference Unsigned Byte)
pixel_var2_8x16_altivec	vabsdub (Vector Absolute Difference Unsigned Byte)
pixel_vsad_altivec	vabsdub (Vector Absolute Difference Unsigned Byte)
deblock_v_chroma_altivec	vabsdub (Vector Absolute Difference Unsigned Byte)
deblock_h_chroma_altivec	vabsdub (Vector Absolute Difference Unsigned Byte)
deblock_h_chroma_422_altivec	vabsdub (Vector Absolute Difference Unsigned Byte)
deblock_h_chroma_mbaff_altivec	vabsdub (Vector Absolute Difference Unsigned Byte)
deblock_v_chroma_intra_altivec	vabsdub (Vector Absolute Difference Unsigned Byte)
decimate_score64_altivec	cnttzd (Count Trailing Zeros Doubleword)
coeff_last15_altivec	cnttzd (Count Trailing Zeros Doubleword)
coeff_last16_altivec	cnttzd (Count Trailing Zeros Doubleword)
coeff_last64_altivec	vctzw (Vector Count Trailing Zeros Word)
coeff_level_run8_altivec	cnttzd (Count Trailing Zeros Doubleword)
coeff_level_run15_altivec	cnttzd (Count Trailing Zeros Doubleword)
coeff_level_run16_altivec	cnttzd (Count Trailing Zeros Doubleword)
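
To illustrate how a trailing-zero count can serve the coeff_last family, the sketch below finds the index of the last non-zero coefficient from a bit mask. It is portable C and not the actual implementation: the function names, the use of int16_t for coefficients, and the mask layout (coefficient i mapped to bit count - 1 - i, so the last coefficient lands in bit 0) are all assumptions chosen for illustration. The loop in count_trailing_zeros64 stands in for a single cnttzd instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for cnttzd: number of trailing zero bits in a
 * 64-bit value, returning 64 when the input is zero. */
static int count_trailing_zeros64(uint64_t x)
{
    int n = 0;
    if (x == 0)
        return 64;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Returns the index of the last non-zero coefficient, or -1 if all
 * coefficients are zero. Assumed mask layout: coefficient i sets bit
 * (count - 1 - i), so the trailing-zero count measures the distance
 * from the end of the block. */
static int coeff_last_sketch(const int16_t *coef, int count)
{
    uint64_t mask = 0;
    assert(coef != 0 && count > 0 && count <= 64);
    for (int i = 0; i < count; i++)
        if (coef[i] != 0)
            mask |= (uint64_t)1 << (count - 1 - i);
    if (mask == 0)
        return -1;
    return count - 1 - count_trailing_zeros64(mask);
}
```

In a vectorized version the mask would presumably come from a vector compare followed by a bit-gather step rather than from a scalar loop; only the final count is what cnttzd would replace.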

Implementing additional functions:

Function
zigzag_scan_8x8_field
zigzag_sub_8x8
mbtree_fix8_pack
mbtree_fix8_unpack
deblock_v_luma_intra
deblock_h_luma_intra
deblock_v8_luma
deblock_v8_luma_intra
predict_8x16c_p
dequant_4x4dc

Edited Jun 19, 2021 by Mamone Tarsha
