
AArch64: Optimize prep_neon function

Arpad Panyik requested to merge arpadpanyik-arm/dav1d:mc_prep_opt into master

Optimize the prep_neon function. Details are in the commit messages.

Relative performance in the micro benchmarks with all commits applied (lower is better):

              mct_w4   mct_w8   mct_w16  mct_w32  mct_w64  mct_w128
Cortex-A55    0.795x   0.913x   0.912x   0.838x   1.025x   1.002x
Cortex-A510   0.760x   0.636x   0.640x   0.854x   0.864x   0.995x
Cortex-A72    0.616x   0.854x   0.756x   1.052x   1.044x   0.702x
Cortex-A76    0.837x   0.797x   0.841x   0.804x   0.948x   0.904x
Cortex-A78    -        -        0.542x   0.725x   0.741x   0.745x
Cortex-A715   -        -        0.561x   0.720x   0.740x   0.748x
Cortex-X1     -        -        -        0.886x   0.882x   0.917x
Cortex-X3     -        -        -        0.835x   0.803x   0.808x
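
For readers who want to turn these ratios into more familiar figures, here is a minimal sketch. It assumes each value is new runtime divided by old runtime, which "lower is better" suggests but the description does not state outright. The helper names `speedup_from_ratio` and `time_saved_percent` are hypothetical and not part of dav1d.

```python
def speedup_from_ratio(ratio: float) -> float:
    """Convert a relative runtime (assumed new / old) into a speedup factor (old / new)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


def time_saved_percent(ratio: float) -> float:
    """Percentage of runtime saved; negative values mean a slowdown."""
    return (1.0 - ratio) * 100.0


# Two values taken from the results above: one clear gain, one slight regression.
samples = {
    "Cortex-A78 mct_w16": 0.542,
    "Cortex-A72 mct_w32": 1.052,
}

for name, ratio in samples.items():
    print(
        f"{name}: {speedup_from_ratio(ratio):.2f}x speedup, "
        f"{time_saved_percent(ratio):+.1f}% time saved"
    )
```

Under that assumption, values below 1.000x are improvements and values above 1.000x are regressions.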
