AArch64: Optimize vertical i8mm subpel filters

Merged Arpad Panyik requested to merge arpadpanyik-arm/dav1d:mc_sbd_i8mm_v into master
1 unresolved thread

Replace the accumulator initializations of the vertical subpel filters with zero fills of the registers, which are usually zero-latency operations in this feature class. As a consequence, the prep cases now use rounding shifts at the end. Out-of-order CPU cores can benefit from this change.
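The trade described above can be illustrated with a small scalar model. This is only a sketch and not code from dav1d: the function names, the shift amount `SHIFT` and the coefficients in `example_taps` are assumptions chosen for illustration. It shows that starting the accumulator at the rounding constant and shifting plainly gives the same result as starting at zero and applying a rounding shift at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model for illustration; not taken from the dav1d sources. */
enum { SHIFT = 2 }; /* assumed final shift amount */

/* Illustrative 8-tap coefficients summing to 64 (not dav1d's filter table). */
static const int8_t example_taps[8] = { -1, 3, -7, 37, 37, -7, 3, -1 };

/* Earlier style: the accumulator starts at the rounding constant,
 * so a plain (truncating) shift at the end is sufficient. */
static int16_t prep_bias_init(const uint8_t *src, const int8_t *taps)
{
    int32_t acc = 1 << (SHIFT - 1);
    for (int i = 0; i < 8; i++)
        acc += src[i] * taps[i];
    /* Right shift of a negative value is implementation-defined in C;
     * mainstream compilers perform an arithmetic shift. */
    return (int16_t)(acc >> SHIFT);
}

/* Style described in this merge request: the accumulator starts at zero
 * and the rounding is folded into the final shift. */
static int16_t prep_zero_init(const uint8_t *src, const int8_t *taps)
{
    int32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc += src[i] * taps[i];
    /* Rounding shift: add half of the divisor, then shift. */
    return (int16_t)((acc + (1 << (SHIFT - 1))) >> SHIFT);
}

/* Both variants agree on a sample column of pixels. */
static void self_check(void)
{
    const uint8_t column[8] = { 10, 20, 30, 40, 50, 60, 70, 80 };
    assert(prep_bias_init(column, example_taps) ==
           prep_zero_init(column, example_taps));
}
```

In the scalar model the two variants are arithmetically identical. The benefit claimed in the description comes from the vector code: a zeroed register is typically free to produce, whereas a register holding the rounding constant has to be materialized before the multiply-accumulate chain can start.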

Relative performance of micro benchmarks (lower is better):

Cortex-X3:

mct_8tap_sharp_w16_v_8bpc_i8mm:  0.910x
mct_8tap_sharp_w8_v_8bpc_i8mm:   0.986x

mc_8tap_sharp_w16_v_8bpc_i8mm:   0.864x
mc_8tap_sharp_w8_v_8bpc_i8mm:    0.882x
mc_8tap_sharp_w4_v_8bpc_i8mm:    0.933x
mc_8tap_sharp_w2_v_8bpc_i8mm:    0.926x

Cortex-A715:

mct_8tap_sharp_w16_v_8bpc_i8mm:  0.855x
mct_8tap_sharp_w8_v_8bpc_i8mm:   0.784x
mct_8tap_sharp_w4_v_8bpc_i8mm:   1.069x

mc_8tap_sharp_w16_v_8bpc_i8mm:   0.850x
mc_8tap_sharp_w8_v_8bpc_i8mm:    0.779x
mc_8tap_sharp_w4_v_8bpc_i8mm:    0.971x
mc_8tap_sharp_w2_v_8bpc_i8mm:    0.975x

Cortex-A510:

mct_8tap_sharp_w16_v_8bpc_i8mm:  1.001x
mct_8tap_sharp_w8_v_8bpc_i8mm:   0.979x
mct_8tap_sharp_w4_v_8bpc_i8mm:   0.998x

mc_8tap_sharp_w16_v_8bpc_i8mm:   0.998x
mc_8tap_sharp_w8_v_8bpc_i8mm:    1.004x
mc_8tap_sharp_w4_v_8bpc_i8mm:    1.003x
mc_8tap_sharp_w2_v_8bpc_i8mm:    0.996x

Activity

  • Looks ok, but please update the commit message to mention the register load/duplication ordering change as well.

  • Arpad Panyik added 1 commit

    • 886eb979 - AArch64: Optimize vertical i8mm subpel filters

  • Arpad Panyik added 5 commits

  • Arpad Panyik added 1 commit

    • b2eca1ac - AArch64: Optimize vertical i8mm subpel filters

  • Martin Storsjö approved this merge request

  • Martin Storsjö enabled an automatic merge when the pipeline for b2eca1ac succeeds

  • changed milestone to %1.4.2
