tools: Make ARM cpu flags imply relevant lower level flags
The `--cpumask` flag only takes a single flag name; one can't set a combination like `neon+dotprod`.
Therefore, apply the same pattern as for x86 by adding mask values that contain all the implied lower-level flags.
This is somewhat complicated, as the set of features isn't entirely linear: in particular, SVE implies neither dotprod nor i8mm, and SVE2 implies dotprod but not i8mm.
This makes sure that `dav1d --cpumask dotprod` actually uses any SIMD at all; previously it only set the dotprod flag but not neon, which essentially opted out of all SIMD.
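A minimal sketch of the mask table described above, in C. The flag names, bit values, and the `arm_cpu_mask_for()` helper are hypothetical stand-ins rather than dav1d's actual identifiers; the point is only that each mask ORs in the flags its feature implies, so selecting `dotprod` also keeps `neon` enabled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit assignments for the individual ARM feature flags;
 * dav1d's real enum may use different names and values. */
enum {
    ARM_FLAG_NEON    = 1 << 0,
    ARM_FLAG_DOTPROD = 1 << 1,
    ARM_FLAG_I8MM    = 1 << 2,
    ARM_FLAG_SVE     = 1 << 3,
    ARM_FLAG_SVE2    = 1 << 4,
};

/* Each mask holds the flag itself plus every flag it implies.
 * The relations are not linear: sve skips dotprod/i8mm, sve2 skips i8mm. */
enum {
    ARM_MASK_NEON    = ARM_FLAG_NEON,
    ARM_MASK_DOTPROD = ARM_FLAG_DOTPROD | ARM_MASK_NEON,
    ARM_MASK_I8MM    = ARM_FLAG_I8MM    | ARM_MASK_DOTPROD,
    ARM_MASK_SVE     = ARM_FLAG_SVE     | ARM_MASK_NEON,
    ARM_MASK_SVE2    = ARM_FLAG_SVE2    | ARM_MASK_SVE | ARM_FLAG_DOTPROD,
};

struct cpu_mask_entry {
    const char *name;
    unsigned mask;
};

/* Hypothetical name -> mask table, as used by a --cpumask parser. */
static const struct cpu_mask_entry arm_cpu_masks[] = {
    { "neon",    ARM_MASK_NEON    },
    { "dotprod", ARM_MASK_DOTPROD },
    { "i8mm",    ARM_MASK_I8MM    },
    { "sve",     ARM_MASK_SVE     },
    { "sve2",    ARM_MASK_SVE2    },
};

/* Look up the mask for a single flag name; returns 0 for unknown names. */
static unsigned arm_cpu_mask_for(const char *name) {
    for (size_t i = 0; i < sizeof(arm_cpu_masks) / sizeof(arm_cpu_masks[0]); i++)
        if (!strcmp(arm_cpu_masks[i].name, name))
            return arm_cpu_masks[i].mask;
    return 0;
}
```

With this table, `arm_cpu_mask_for("dotprod")` yields the neon and dotprod bits, while `arm_cpu_mask_for("sve")` deliberately leaves out dotprod and i8mm.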
CC @another - this is relevant for !1644 (merged).
CC @arpadpanyik-arm - are the feature implications correct? For example, setting i8mm implies dotprod. But does having SVE imply having dotprod or i8mm? SVE2 obviously implies SVE, but I guessed that SVE2 also implies i8mm; is that right?
If these flags are more decoupled than this, we should probably improve the flag parser to allow setting combinations of more than one flag.
requested review from @gramner
mentioned in merge request !1644 (merged)
- Resolved by Martin Storsjö
The dependencies of features are a bit complicated, but in short:
- `i8mm` implies `DotProd`.
- `SVE` implies neither `DotProd` nor `i8mm`.
- `SVE2` implies Armv9.0-a, which implies Armv8.5-a, so it has `DotProd` (Armv8.4-a).
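Rather than writing each mask by hand, the masks could also be derived from the direct dependencies listed above by expanding a flag set until it stops growing. This is a hypothetical C sketch: the `FLAG_*` bits and `close_under_implications()` are illustrative names, and it assumes that every other feature implies neon, which the list above does not state explicitly.

```c
/* Hypothetical single-bit feature flags. */
enum {
    FLAG_NEON    = 1 << 0,
    FLAG_DOTPROD = 1 << 1,
    FLAG_I8MM    = 1 << 2,
    FLAG_SVE     = 1 << 3,
    FLAG_SVE2    = 1 << 4,
};
enum { NUM_FLAGS = 5 };

/* Direct implications only, indexed by bit position.
 * Assumption: neon is implied by every other feature. */
static const unsigned direct_implies[NUM_FLAGS] = {
    [0] = 0,                        /* neon */
    [1] = FLAG_NEON,                /* dotprod */
    [2] = FLAG_DOTPROD,             /* i8mm -> dotprod */
    [3] = FLAG_NEON,                /* sve: no dotprod, no i8mm */
    [4] = FLAG_SVE | FLAG_DOTPROD,  /* sve2 -> sve, dotprod (Armv8.5-a), not i8mm */
};

/* Keep OR-ing in direct implications until the set reaches a fixed point,
 * which yields the transitive closure of the given flags. */
static unsigned close_under_implications(unsigned flags) {
    unsigned prev;
    do {
        prev = flags;
        for (int bit = 0; bit < NUM_FLAGS; bit++)
            if (flags & (1u << bit))
                flags |= direct_implies[bit];
    } while (flags != prev);
    return flags;
}
```

Since the expansion works on whole sets, it also handles combinations: `close_under_implications(FLAG_SVE | FLAG_I8MM)` picks up dotprod through i8mm, but nothing pulls in sve2.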
A64FX has `SVE` but no `DotProd`, while AWS Graviton 3 has `SVE` and `i8mm` too. All Armv9 cores designed by Arm include `i8mm`.

Edited by Arpad Panyik
added 1 commit
- 319ae790 - tools: Make ARM cpu flags imply relevant lower level flags
enabled an automatic merge when the pipeline for 236e1d19 succeeds
A different way of dealing with the nonlinearity of these flags could be to simply make each of the masks imply all lower bits, e.g. `mask_feat = flag_feat | (flag_feat-1)`, or to just include the mask from the previous line, disregarding the real relations between features. So if we set e.g. `--cpumask sve`, the mask contains `dotprod` and `i8mm`. But if the runtime check didn't indicate that those flags were available, it's a no-op, as the mask is ANDed with the flags from the runtime detection. That makes things a little bit simpler, but it implies an order/hierarchy which doesn't really exist.
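The alternative scheme can be sketched in a few lines of C. The bit order (neon lowest, sve2 highest) and the names `linear_mask()` and `effective_flags()` are assumptions for illustration; the sketch only shows that an over-broad mask is harmless once it is ANDed with the detected flags.

```c
/* Hypothetical ordering: one bit per feature, lowest bit = "lowest" feature. */
enum {
    FLAG_NEON    = 1 << 0,
    FLAG_DOTPROD = 1 << 1,
    FLAG_I8MM    = 1 << 2,
    FLAG_SVE     = 1 << 3,
    FLAG_SVE2    = 1 << 4,
};

/* Alternative scheme: a mask implies every lower bit, regardless of the
 * real architectural relations. `flag` must be a single nonzero bit. */
static unsigned linear_mask(unsigned flag) {
    return flag | (flag - 1);
}

/* The user-selected mask is only ever ANDed with what runtime detection
 * reported, so an over-broad mask cannot enable unsupported features. */
static unsigned effective_flags(unsigned detected, unsigned user_mask) {
    return detected & user_mask;
}
```

For a CPU that reports only neon and SVE (as described for A64FX above), `effective_flags(FLAG_NEON | FLAG_SVE, linear_mask(FLAG_SVE))` still comes out as just neon and SVE, even though the mask itself also contains dotprod and i8mm.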
What do you think about that? As it stands right now, it's impossible to express e.g. `sve+i8mm`. Admittedly, this argument isn't very valuable in the real world; it only matters for benchmarking different feature flags, and for running the argon tests to try to get coverage for all SIMD implementations.
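For the benchmarking and coverage use cases, a parser accepting `+`-separated names would make `sve+i8mm` expressible by ORing the individual masks. This is a hypothetical C sketch: the table and `parse_cpumask()` are illustrative rather than dav1d's actual CLI code, and the masks reuse the implications from this change.

```c
#include <string.h>

/* Hypothetical single-bit feature flags. */
enum {
    FLAG_NEON = 1 << 0, FLAG_DOTPROD = 1 << 1, FLAG_I8MM = 1 << 2,
    FLAG_SVE  = 1 << 3, FLAG_SVE2    = 1 << 4,
};

/* Per-name masks including implied flags, as in this change. */
static const struct { const char *name; unsigned mask; } masks[] = {
    { "neon",    FLAG_NEON },
    { "dotprod", FLAG_DOTPROD | FLAG_NEON },
    { "i8mm",    FLAG_I8MM | FLAG_DOTPROD | FLAG_NEON },
    { "sve",     FLAG_SVE | FLAG_NEON },
    { "sve2",    FLAG_SVE2 | FLAG_SVE | FLAG_DOTPROD | FLAG_NEON },
};

/* Parse a '+'-separated list such as "sve+i8mm" into the union of the
 * individual masks. Returns 0 on success, -1 on an unknown name. */
static int parse_cpumask(const char *arg, unsigned *out) {
    unsigned result = 0;
    while (*arg) {
        size_t len = strcspn(arg, "+");  /* length of the current name */
        size_t i;
        for (i = 0; i < sizeof(masks) / sizeof(masks[0]); i++)
            if (strlen(masks[i].name) == len && !strncmp(masks[i].name, arg, len))
                break;
        if (i == sizeof(masks) / sizeof(masks[0]))
            return -1;                   /* unknown or empty name */
        result |= masks[i].mask;
        arg += len;
        if (*arg == '+')
            arg++;                       /* skip the separator */
    }
    *out = result;
    return 0;
}
```

`parse_cpumask("sve+i8mm", &m)` sets `m` to neon, dotprod, i8mm and SVE; an unknown name makes the whole argument fail rather than being silently ignored.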
changed milestone to %1.4.2
added ARM label
added tools label