AArch64: Optimize 6-tap SBD HV Neon convolution

Merged. Arpad Panyik requested to merge arpadpanyik-arm/dav1d:mc_6tap_hv into master

This is a follow-up to !1595 (merged). It optimizes the 6-tap standard bit-depth (SBD) horizontal-vertical (HV) combined convolution by skipping unnecessary reads and horizontal convolution steps at the beginning and end of the algorithm. The change also removes some instructions from the final binary.
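To illustrate why a 6-tap path can skip reads and horizontal steps, here is a minimal Python sketch of a separable horizontal-then-vertical filter. It is not the Neon code from this merge request. The helper `hv_filter`, the coefficient values, and the block geometry are all hypothetical, and rounding and clipping are omitted. It assumes the two outer coefficients of the 8-tap kernel are zero, in which case the 6-tap form gives the same output while running the horizontal pass over two fewer source rows.

```python
def hv_filter(src, x0, y0, w, h, taps_h, taps_v):
    """Separable horizontal-then-vertical filter (plain integer sums).

    Returns the w x h output block and the number of source rows that
    went through the horizontal pass.
    """
    nh, nv = len(taps_h), len(taps_v)
    # How far the kernel reaches before the block origin (3 for 8 taps, 2 for 6).
    back_h, back_v = nh // 2 - 1, nv // 2 - 1

    # Horizontal pass: the vertical pass needs h + nv - 1 intermediate rows.
    mid = []
    for r in range(y0 - back_v, y0 - back_v + h + nv - 1):
        row = src[r]
        mid.append([
            sum(taps_h[k] * row[x0 + c - back_h + k] for k in range(nh))
            for c in range(w)
        ])

    # Vertical pass over the intermediate rows.
    out = [
        [sum(taps_v[k] * mid[r + k][c] for k in range(nv)) for c in range(w)]
        for r in range(h)
    ]
    return out, len(mid)


# Made-up coefficients for illustration only; the outer two taps are zero.
TAPS8 = [0, 1, -5, 36, 36, -5, 1, 0]
TAPS6 = TAPS8[1:7]

# Deterministic 16x16 test image.
SRC = [[(r * 31 + c * 17) % 256 for c in range(16)] for r in range(16)]

out8, rows8 = hv_filter(SRC, 4, 4, 4, 4, TAPS8, TAPS8)
out6, rows6 = hv_filter(SRC, 4, 4, 4, 4, TAPS6, TAPS6)

print(out8 == out6)   # same result from both kernels
print(rows8, rows6)   # rows filtered horizontally: h + 7 versus h + 5
```

In this sketch the saving is a fixed two rows per block, so its relative share is largest for short blocks. Whether that explains the per-size pattern in the benchmark results is not something the sketch can confirm.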

Performance of this function improves by up to 5.5%, depending on block size.

Microbenchmark results on AWS Graviton 3:

mc_8tap_regular_w2_hv_8bpc_neon:       50.2  ->    50.2 (  0.00 % )
mc_8tap_regular_w4_hv_8bpc_neon:       63.5  ->    61.3 ( -3.46 % )
mc_8tap_regular_w8_hv_8bpc_neon:       94.3  ->    89.4 ( -5.19 % )
mc_8tap_regular_w16_hv_8bpc_neon:     253.9  ->   243.5 ( -4.09 % )
mc_8tap_regular_w32_hv_8bpc_neon:     761.4  ->   735.8 ( -3.36 % )
mc_8tap_regular_w64_hv_8bpc_neon:    2622.6  ->  2547.6 ( -2.85 % )
mc_8tap_regular_w128_hv_8bpc_neon:   7286.8  ->  7110.8 ( -2.41 % )

mct_8tap_regular_w4_hv_8bpc_neon:      44.6  ->    42.5 ( -4.70 % )
mct_8tap_regular_w8_hv_8bpc_neon:     100.3  ->    97.2 ( -3.09 % )
mct_8tap_regular_w16_hv_8bpc_neon:    309.0  ->   303.2 ( -1.87 % )
mct_8tap_regular_w32_hv_8bpc_neon:   1170.4  ->  1162.9 ( -0.64 % )
mct_8tap_regular_w64_hv_8bpc_neon:   2792.0  ->  2788.9 ( -0.11 % )
mct_8tap_regular_w128_hv_8bpc_neon:  6879.2  ->  6870.5 ( -0.12 % )
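
The table's percentages and the "up to 5.5%" figure appear to measure slightly different things. The short sketch below assumes the two columns are before and after timings where lower is better. Under that assumption the parenthesised value is the relative change in time, while the throughput gain is the ratio of the two timings. The helper names are hypothetical, and the inputs are the rounded `w8` values from the table, so the last digit may differ from the printed figure.

```python
def percent_change(before, after):
    """Relative change in the measured value, in percent (negative = faster)."""
    return (after - before) / before * 100.0


def speedup_percent(before, after):
    """Throughput gain implied by a drop in time, in percent."""
    return (before / after - 1.0) * 100.0


# mc_8tap_regular_w8_hv_8bpc_neon, values as printed in the table.
before, after = 94.3, 89.4
change = percent_change(before, after)
speedup = speedup_percent(before, after)

print(f"time change: {change:.2f} %")
print(f"speedup:     {speedup:.2f} %")
```

A time reduction of roughly 5.2% therefore corresponds to a speedup of roughly 5.5%, which is one reading that reconciles the headline number with the largest entry in the table.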
