AArch64: Simplify DotProd path of subpel filters

Arpad Panyik requested to merge arpadpanyik-arm/dav1d:mc_sbd_dotprod_plus into master

The purpose of this merge request is to bring the DotProd code path in line with our upcoming i8mm version. The modifications include comment fixes, instruction reordering, TBL rewrites, load/store refactoring and macro simplifications. Together these give a simpler filter_8tap_fn macro with fewer branches, which reduces the line count and makes the code easier to understand. We also tried to avoid introducing any performance regressions.

Some i8mm tunings were back-ported to this DotProd path as well, most notably:

  • horizontal filters that used 2-register TBL instructions are simplified to use only 1-register TBLs, which improves performance on small cores such as Cortex-A510 and newer.
  • the accumulators of vertical filters are initialised in a way that lets CPUs use zero-latency move instructions.
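
For readers unfamiliar with the DotProd approach, the following is a minimal scalar sketch in C of how an 8-tap filter can be built from two 4-element signed dot products whose accumulator starts from a constant rather than from zero. It is not taken from this merge request: the function names, the 128 range shift for unsigned pixels and the bias formula are illustrative assumptions about how signed dot products are commonly applied to 8-bit pixels.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of a signed dot-product instruction:
 * acc += a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]. */
static int32_t sdot_lane(int32_t acc, const int8_t a[4], const int8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

/* Hypothetical helper (not a dav1d function): one output sample of an
 * 8-tap filter, before the final shift and narrowing. */
static int32_t filter_8tap_px(const uint8_t *src, const int8_t taps[8],
                              int32_t rounding)
{
    assert(src && taps);

    int8_t s[8];
    int32_t tap_sum = 0;
    for (int i = 0; i < 8; i++) {
        /* Assumed range shift: map unsigned 0..255 to signed -128..127. */
        s[i] = (int8_t)(src[i] - 128);
        tap_sum += taps[i];
    }

    /* The accumulator is initialised from a constant instead of zero.
     * The constant undoes the range shift and folds in the rounding term,
     * so no separate correction is needed after the dot products. */
    int32_t acc = 128 * tap_sum + rounding;

    acc = sdot_lane(acc, s, taps);         /* taps 0..3 */
    acc = sdot_lane(acc, s + 4, taps + 4); /* taps 4..7 */
    return acc;
}
```

In a vector implementation the initial constant would live in a register that is copied into each accumulator, which is the kind of plain register move that some CPUs can execute with zero latency.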

Details can be seen in the commit messages.

Edited by Arpad Panyik
