AArch64: Optimize vertical i8mm subpel filters
Replace the accumulator initializations of the vertical subpel filters with zero fills of the registers, which are usually zero-latency operations in this feature class. As a consequence, the prep cases need rounding shifts at the end. Out-of-order CPU cores can benefit from this change.
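The idea can be sketched in scalar C. This is only a minimal model and not the assembly from this change: the tap values, the `SHIFT` amount, and all function names are hypothetical and were chosen for illustration. It compares an accumulator preloaded with the rounding constant and followed by a plain shift against a zero-initialized accumulator followed by a rounding shift (which AArch64 provides as single instructions, for example SRSHR).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shift amount and 8-tap coefficients (sum = 64),
 * used only to illustrate the transformation. */
#define SHIFT 2
static const int8_t taps[8] = { -1, 3, -7, 37, 37, -7, 3, -1 };

/* Models a rounding right shift: a plain shift plus the last bit
 * shifted out. Assumes arithmetic right shift of negative values,
 * which holds on common compilers. */
static int32_t rounding_shift_right(int32_t v, int shift)
{
    return (v >> shift) + ((v >> (shift - 1)) & 1);
}

/* Before: the accumulator starts at the rounding constant,
 * so a plain shift is enough at the end. */
static int32_t filter_bias_init(const uint8_t *px)
{
    int32_t acc = 1 << (SHIFT - 1);
    for (int i = 0; i < 8; i++)
        acc += taps[i] * px[i];
    return acc >> SHIFT;
}

/* After: the accumulator starts at zero and the rounding
 * moves into the final shift. */
static int32_t filter_zero_init(const uint8_t *px)
{
    int32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc += taps[i] * px[i];
    return rounding_shift_right(acc, SHIFT);
}

/* Checks that both roundings agree over a range of sums,
 * including negative ones. */
static void check_equivalence(void)
{
    for (int32_t v = -100000; v <= 100000; v++)
        assert(((v + (1 << (SHIFT - 1))) >> SHIFT) ==
               rounding_shift_right(v, SHIFT));
}
```

Both variants return the same value for any input, so the change affects only how the accumulator is set up and which shift closes the computation, not the result.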
Relative performance of micro benchmarks (lower is better):
Cortex-X3:
mct_8tap_sharp_w16_v_8bpc_i8mm: 0.910x
mct_8tap_sharp_w8_v_8bpc_i8mm: 0.986x
mc_8tap_sharp_w16_v_8bpc_i8mm: 0.864x
mc_8tap_sharp_w8_v_8bpc_i8mm: 0.882x
mc_8tap_sharp_w4_v_8bpc_i8mm: 0.933x
mc_8tap_sharp_w2_v_8bpc_i8mm: 0.926x
Cortex-A715:
mct_8tap_sharp_w16_v_8bpc_i8mm: 0.855x
mct_8tap_sharp_w8_v_8bpc_i8mm: 0.784x
mct_8tap_sharp_w4_v_8bpc_i8mm: 1.069x
mc_8tap_sharp_w16_v_8bpc_i8mm: 0.850x
mc_8tap_sharp_w8_v_8bpc_i8mm: 0.779x
mc_8tap_sharp_w4_v_8bpc_i8mm: 0.971x
mc_8tap_sharp_w2_v_8bpc_i8mm: 0.975x
Cortex-A510:
mct_8tap_sharp_w16_v_8bpc_i8mm: 1.001x
mct_8tap_sharp_w8_v_8bpc_i8mm: 0.979x
mct_8tap_sharp_w4_v_8bpc_i8mm: 0.998x
mc_8tap_sharp_w16_v_8bpc_i8mm: 0.998x
mc_8tap_sharp_w8_v_8bpc_i8mm: 1.004x
mc_8tap_sharp_w4_v_8bpc_i8mm: 1.003x
mc_8tap_sharp_w2_v_8bpc_i8mm: 0.996x
added 3 commits
- bbb45cc9...22390124 - 2 commits from branch videolan:master
- 33120291 - AArch64: Optimize vertical i8mm subpel filters
It's not a problem, thanks for the reviews in advance! Have a nice long weekend!
Edited by Arpad Panyik
requested review from @mstorsjo
added ARM performance labels
- Resolved by Arpad Panyik
added 1 commit
- 886eb979 - AArch64: Optimize vertical i8mm subpel filters
added 5 commits
- 886eb979...d1bdf4f1 - 4 commits from branch videolan:master
- a74a4682 - AArch64: Optimize vertical i8mm subpel filters
added 1 commit
- b2eca1ac - AArch64: Optimize vertical i8mm subpel filters
enabled an automatic merge when the pipeline for b2eca1ac succeeds
changed milestone to %1.4.2