
AArch64: Optimize the init of DotProd+ 2D subpel filters

Merged Arpad Panyik requested to merge arpadpanyik-arm/dav1d:mc_sbd_dotprod_init_hv into master
1 unresolved thread

Removed some unnecessary vector register copies from the initial horizontal filter parts of the HV subpel filters. The performance gains are largest for the smaller filter block sizes.

The narrowing shifts at the end of *filter8* were also rewritten, because the previous form was only beneficial on the Cortex-A55 among the DotProd capable CPU cores. On the out-of-order or newer CPUs the UZP1+SHRN instruction combination is better.
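
For readers less familiar with the two instructions named above, their lane semantics can be sketched as a scalar C model. This is only an illustration: the function names are hypothetical, and the exact instruction sequences used in *filter8* before and after this change are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `shrn Vd.4h, Vn.4s, #shift`: shift each 32-bit lane right by
 * `shift` and keep the low 16 bits. The result is truncated, with no
 * rounding and no saturation. Lanes are treated as raw bit patterns. */
static void shrn_4s_to_4h(uint16_t dst[4], const uint32_t src[4], int shift)
{
    assert(shift >= 1 && shift <= 16); /* immediate range for this form */
    for (int i = 0; i < 4; i++)
        dst[i] = (uint16_t)(src[i] >> shift);
}

/* Model of `uzp1 Vd.8h, Vn.8h, Vm.8h` with both sources viewed as four
 * 32-bit lanes: the even-numbered 16-bit elements are the low halves of
 * those lanes, so the result packs two .4s vectors into one .8h vector. */
static void uzp1_8h_of_4s(uint16_t dst[8], const uint32_t n[4],
                          const uint32_t m[4])
{
    for (int i = 0; i < 4; i++) {
        dst[i]     = (uint16_t)n[i];
        dst[4 + i] = (uint16_t)m[i];
    }
}
```

The model uses unsigned lanes because, for shifts of at most 16 bits, the 16 bits that SHRN keeps are the same whether the shift is logical or arithmetic.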

Relative performance of micro benchmarks (lower is better):

Cortex-A55:

  mct regular w4:  0.980x    mct sharp w4:    0.983x
  mct regular w8:  1.007x    mct sharp w8:    1.012x
  mct regular w16: 1.007x    mct sharp w16:   1.005x

Cortex-A510:

  mct regular w4:  0.935x    mct sharp w4:    0.927x
  mct regular w8:  0.984x    mct sharp w8:    0.983x
  mct regular w16: 0.986x    mct sharp w16:   0.987x

Cortex-A78:

  mct regular w4:  0.974x    mct sharp w4:    0.971x
  mct regular w8:  0.988x    mct sharp w8:    0.987x
  mct regular w16: 0.991x    mct sharp w16:   0.979x

Cortex-A715:

  mct regular w4:  0.958x    mct sharp w4:    0.974x
  mct regular w8:  0.993x    mct sharp w8:    0.991x
  mct regular w16: 0.998x    mct sharp w16:   0.997x

Cortex-X1:

  mct regular w4:  0.983x    mct sharp w4:    0.974x
  mct regular w8:  0.993x    mct sharp w8:    0.990x
  mct regular w16: 0.996x    mct sharp w16:   0.995x

Cortex-X3:

  mct regular w4:  0.953x    mct sharp w4:    0.981x
  mct regular w8:  0.993x    mct sharp w8:    0.993x
  mct regular w16: 0.997x    mct sharp w16:   0.995x


Activity

         add             \src, \src, #2

         bl              L(\type\()_hv_filter4_\isa)
-        mov             v16.16b, v22.16b
+        shrn            v16.4h, v22.4s, #2
  • LGTM, nice!

  • Martin Storsjö approved this merge request


  • requested review from @mstorsjo

  • changed milestone to %1.4.2

  • added 4 commits


  • Jean-Baptiste Kempf enabled an automatic merge when the pipeline for a6d57b11 succeeds

