AArch64: Optimize Armv8.0 Neon path of HBD horizontal 6-tap filters
The 6-tap horizontal subpel filters can be further improved by using some pointer arithmetic and by saving some instructions (EXTs) in their data rearrangement code.

Relative runtime of micro benchmarks after this patch on some Cortex CPU cores (lower is better):

    regular:   X1      A78     A76     A55
    mc w8:     0.915x  0.937x  0.900x  0.982x
    mc w16:    0.917x  0.947x  0.911x  0.971x
    mc w32:    0.914x  0.938x  0.873x  0.961x
    mc w64:    0.918x  0.932x  0.882x  0.964x
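For readers less familiar with the data rearrangement mentioned above: a horizontal 6-tap filter computes each output pixel as a weighted sum of six neighbouring input samples, so a vectorized version needs six copies of the input row, each shifted by one lane. The scalar C sketch below models two ways of building those shifted windows for 8 x 16-bit lanes: loading two vectors and extracting windows with EXT (modeled by `ext_u16x8`), versus loading each window directly from an offset pointer. It is an illustration under assumptions, not the patch's code: the tap values in `kTaps`, the single round-and-shift by 7, and all function names are hypothetical, and the commit message does not say the EXTs were replaced in exactly this way.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 6-tap kernel for illustration only; taps sum to 128. */
static const int16_t kTaps[6] = { 2, -14, 76, 76, -14, 2 };

/* Scalar model of AArch64 EXT on 8 x 16-bit lanes (byte imm = 2 * n):
 * result = lanes n..7 of lo followed by lanes 0..n-1 of hi. */
static void ext_u16x8(uint16_t dst[8], const uint16_t lo[8],
                      const uint16_t hi[8], int n) {
    uint16_t cat[16];
    memcpy(cat, lo, 8 * sizeof(uint16_t));
    memcpy(cat + 8, hi, 8 * sizeof(uint16_t));
    memcpy(dst, cat + n, 8 * sizeof(uint16_t));
}

/* Weighted sum of the six windows, rounded, shifted by 7 and clipped
 * to [0, pixel_max]. Negative sums clamp to 0 before shifting. */
static void apply_taps(uint16_t dst[8], uint16_t win[6][8], int pixel_max) {
    for (int i = 0; i < 8; i++) {
        int32_t sum = 64; /* rounding offset for the >> 7 below */
        for (int k = 0; k < 6; k++)
            sum += kTaps[k] * win[k][i];
        int32_t v = sum < 0 ? 0 : sum >> 7;
        dst[i] = (uint16_t)(v > pixel_max ? pixel_max : v);
    }
}

/* Strategy A: two 8-lane loads starting at s - 2, then one EXT per
 * tap (k = 0 is just the first load). Reads s[-2] .. s[13]. */
static void filter8_ext(uint16_t dst[8], const uint16_t *s, int pixel_max) {
    uint16_t v0[8], v1[8], win[6][8];
    memcpy(v0, s - 2, sizeof v0);
    memcpy(v1, s + 6, sizeof v1);
    for (int k = 0; k < 6; k++)
        ext_u16x8(win[k], v0, v1, k);
    apply_taps(dst, win, pixel_max);
}

/* Strategy B: pointer arithmetic, one load per tap from s - 2 + k,
 * so no EXT is needed. Reads s[-2] .. s[10]. */
static void filter8_offsets(uint16_t dst[8], const uint16_t *s,
                            int pixel_max) {
    uint16_t win[6][8];
    for (int k = 0; k < 6; k++)
        memcpy(win[k], s - 2 + k, sizeof win[k]);
    apply_taps(dst, win, pixel_max);
}

/* Returns 1 if both strategies produce identical 10-bit output. */
static int strategies_agree(void) {
    uint16_t row[24], a[8], b[8];
    for (int i = 0; i < 24; i++)
        row[i] = (uint16_t)((i * 389 + 17) % 1024);
    filter8_ext(a, row + 2, 1023);
    filter8_offsets(b, row + 2, 1023);
    return memcmp(a, b, sizeof a) == 0;
}

/* Filters a flat 10-bit row; a unity-gain kernel returns v unchanged. */
static uint16_t flat_output(uint16_t v) {
    uint16_t row[24], out[8];
    for (int i = 0; i < 24; i++)
        row[i] = v;
    filter8_offsets(out, row + 2, 1023);
    return out[3];
}
```

Both strategies build the same windows (`row[k..k+7]` for tap `k`), so the choice is purely about instruction count and load pattern; on real hardware the trade-off between extra EXTs and extra unaligned loads depends on the core, which is consistent with the per-core differences in the table.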
parent 109b2427