mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
llvm-mirror/test/Transforms
David Green b177d35a9a [LV][ARM] Inloop reduction cost modelling
This adds cost modelling for the inloop vectorization added in
745bf6cf4471. Up until now they have been modelled as the original
underlying instruction, usually an add. This happens to work OK for MVE
with instructions that are reducing into the same type as they are
working on. But MVE's instructions can perform the equivalent of an
extended MLA as a single instruction:

  %sa = sext <16 x i8> A to <16 x i32>
  %sb = sext <16 x i8> B to <16 x i32>
  %m = mul <16 x i32> %sa, %sb
  %r = vecreduce.add(%m)
  ->
  R = VMLADAV A, B
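
As a rough illustration (not part of the patch), the scalar loop below is the kind of source that, once vectorized with an in-loop reduction, gives the sext/mul/vecreduce.add pattern above. The name `dot_i8` is hypothetical, and whether a compiler actually selects VMLADAV for it depends on the target and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical example: i8 inputs are widened to i32 before the multiply,
// then summed into a single i32 accumulator (an extended MLA reduction).
std::int32_t dot_i8(const std::vector<std::int8_t> &a,
                    const std::vector<std::int8_t> &b) {
  assert(a.size() == b.size() && "inputs must have the same length");
  std::int32_t sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Corresponds to sext(A), sext(B), the mul, and the add reduction.
    sum += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  }
  return sum;
}
```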

There are other instructions for performing add reductions of
v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64
(VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV).
The i64 variants are particularly interesting as there are no native i64
add/mul instructions, leading to the i64 add and mul naturally getting
very high costs.

Also worth mentioning, under NEON there is the concept of a sdot/udot
instruction which performs a partial reduction from a v16i8 to a v4i32.
They extend and mul/sum the first four elements from the inputs into the
first element of the output, repeating for each of the four output
lanes. They could possibly be represented in the same way as above in
llvm, so long as a vecreduce.add could perform a partial reduction. The
vectorizer would then produce a combination of in-loop and outer-loop
reductions to efficiently use the sdot and udot instructions. Although
this patch does not do that yet, it does suggest that separating the
input reduction type from the produced result type is a useful concept
to model. It also shows that a MLA reduction as a single instruction is
fairly common.
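
The partial reduction described above can be sketched as a scalar reference model. This is only an assumption-laden illustration of the semantics as described in this message (signed variant, with an accumulator input); the name `sdot_ref` is hypothetical and is not an LLVM or ACLE API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of an sdot-style partial reduction:
// each of the four i32 output lanes accumulates the sum of products
// of the corresponding group of four sign-extended i8 elements.
std::array<std::int32_t, 4> sdot_ref(std::array<std::int32_t, 4> acc,
                                     const std::array<std::int8_t, 16> &a,
                                     const std::array<std::int8_t, 16> &b) {
  for (std::size_t lane = 0; lane < 4; ++lane) {
    for (std::size_t j = 0; j < 4; ++j) {
      const std::size_t idx = 4 * lane + j;
      acc[lane] += static_cast<std::int32_t>(a[idx]) *
                   static_cast<std::int32_t>(b[idx]);
    }
  }
  return acc;
}
```

Summing the four output lanes afterwards gives the full reduction, which is why an in-loop partial reduction followed by an outer-loop reduction would fit these instructions.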

This patch attempts to improve the cost modelling of in-loop reductions
by:
 - Adding some pattern matching in the loop vectorizer cost model to
   match reduction patterns that are optionally extended and/or MLA
   patterns. This costs the reduction instruction correctly and marks
   the sext/zext/mul leading up to it as free, which is otherwise
   difficult to tell and may get a very high cost. (In the long run this
   can hopefully be replaced by vplan producing a single node and costing
   it correctly, but that is not yet something that vplan can do).
 - getExtendedAddReductionCost is added to query the cost of these
   extended reduction patterns.
 - Expanded the ARM costs to account for these expanded sizes, which is a
   fairly simple change in itself.
 - Some minor alterations to allow in-loop reductions larger than the
   highest vector width and i64 MVE reductions.
 - An extra InLoopReductionImmediateChains map was added to the vectorizer
   for it to efficiently detect which instructions are reductions in the
   cost model.
 - The tests have some updates to show what I believe is optimal
   vectorization and where we are now.
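
The pattern matching in the first item can be pictured with a toy model. This is a minimal sketch on a made-up miniature IR, not the code in this patch: `Node`, `Op`, `MatchResult` and `matchExtendedReduction` are all hypothetical names, and the real cost model works on LLVM instructions and target cost queries.

```cpp
#include <set>

// Hypothetical miniature IR, for illustration only (not LLVM's classes).
enum class Op { Load, Ext, Mul, ReduceAdd };

struct Node {
  Op op;
  const Node *lhs = nullptr;
  const Node *rhs = nullptr;
};

struct MatchResult {
  bool matched = false;             // reduction folds its inputs
  bool isMLA = false;               // a multiply feeds the reduction
  std::set<const Node *> freeNodes; // ext/mul nodes to cost as free
};

// Matches reduce.add(ext(A)), reduce.add(mul(A, B)) and
// reduce.add(mul(ext(A), ext(B))). This sketch requires the mul
// operands to be both extended or both not extended.
MatchResult matchExtendedReduction(const Node &red) {
  MatchResult r;
  if (red.op != Op::ReduceAdd || red.lhs == nullptr)
    return r;
  const Node *in = red.lhs;
  if (in->op == Op::Ext) {
    r.matched = true;
    r.freeNodes.insert(in);
    return r;
  }
  if (in->op == Op::Mul && in->lhs != nullptr && in->rhs != nullptr) {
    const bool lhsExt = in->lhs->op == Op::Ext;
    const bool rhsExt = in->rhs->op == Op::Ext;
    if (lhsExt != rhsExt)
      return r; // mixed case: left unmatched in this sketch
    r.matched = true;
    r.isMLA = true;
    r.freeNodes.insert(in);
    if (lhsExt) {
      r.freeNodes.insert(in->lhs);
      r.freeNodes.insert(in->rhs);
    }
  }
  return r;
}
```

A caller would then charge a single extended-reduction cost for the matched reduction and zero for every node in `freeNodes`, instead of summing the individual ext, mul and add costs.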

Put together this can greatly improve performance for reduction loops
under MVE.

Differential Revision: https://reviews.llvm.org/D93476
2021-01-21 21:03:41 +00:00
..
ADCE [SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 1 2021-01-01 03:25:22 +03:00
AddDiscriminators
AggressiveInstCombine
AlignmentFromAssumptions
ArgumentPromotion [ArgPromotion] Delay dead GEP removal until doPromotion. 2021-01-04 09:51:20 +00:00
AtomicExpand
Attributor Allow nonnull/align attribute to accept poison 2021-01-20 11:31:23 +09:00
BDCE
BlockExtractor
BranchFolding
CalledValuePropagation
CallSiteSplitting
CanonicalizeAliases
CanonicalizeFreezeInLoops
CodeExtractor
CodeGenPrepare
ConstantHoisting
ConstantMerge
ConstraintElimination [ConstraintElimination] Add support for select form of and/or 2020-12-30 21:27:36 +09:00
Coroutines [Coroutine] Remain alignment information when merging frame variables 2021-01-20 18:59:00 +08:00
CorrelatedValuePropagation [LVI] Handle unions of conditions 2021-01-01 16:46:21 +01:00
CrossDSOCFI
DCE
DeadArgElim
DeadStoreElimination [DSE] Add tests with stores of existing values. 2021-01-13 21:56:21 +00:00
DivRemPairs
EarlyCSE [noalias.decl] Look through llvm.experimental.noalias.scope.decl 2021-01-19 20:09:42 +01:00
EliminateAvailableExternally
EntryExitInstrumenter
ExpandMemCmp
FixIrreducible
Float2Int
ForcedFunctionAttrs
FunctionAttrs [FunctionAttrs] Infer willreturn for functions without loops 2021-01-21 20:29:33 +01:00
FunctionImport
GCOVProfiling
GlobalDCE
GlobalMerge
GlobalOpt
GlobalSplit
GuardWidening
GVN [InstSimplify] Fold call null/undef to poison 2021-01-06 21:09:30 +01:00
GVNHoist
GVNSink [SimplifyCFG] Teach HoistThenElseCodeToIf() to preserve DomTree 2020-12-30 00:48:10 +03:00
HardwareLoops [ARM] Remove LLC tests from transform/hardware loop tests. 2021-01-16 18:30:21 +00:00
HelloNew
HotColdSplit
IndirectBrExpand
IndVarSimplify [LoopDeletion] Break backedge of outermost loops when known not taken 2021-01-10 16:02:33 -08:00
InferAddressSpaces
InferFunctionAttrs [FunctionAttrs] Infer willreturn for functions without loops 2021-01-21 20:29:33 +01:00
Inline Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" 2021-01-20 13:33:43 -08:00
InstCombine [InstCombine] avoid crashing on attribute propagation 2021-01-21 08:13:26 -05:00
InstMerge
InstNamer
InstSimplify [InstSimplify] Fold x*C1/C2 <= x (PR48744) 2021-01-17 16:02:55 +01:00
InterleavedAccess [CodeGen] Update transformations to use poison for shufflevector/insertelem's initial vector elem 2021-01-10 18:03:51 +09:00
Internalize
IRCE
IROutliner [IROutliner] Adapting to hoisted bitcasts in CodeExtractor 2021-01-13 11:10:37 -06:00
JumpThreading [JumpThreading][NewPM] Skip when target has divergent CF 2021-01-04 16:08:08 -08:00
LCSSA
LICM [InferAttrs] Mark some library functions as willreturn. 2021-01-18 13:40:21 +00:00
LoadStoreVectorizer
LoopDataPrefetch
LoopDeletion [LoopDeletion] Break backedge of outermost loops when known not taken 2021-01-10 16:02:33 -08:00
LoopDistribute [LoopDistribute] Add tests with uncomputable BTCs. 2021-01-01 13:57:03 +00:00
LoopFlatten
LoopFusion
LoopIdiom [Tests] Added test for memcpy loop idiom recognization 2021-01-13 14:55:46 +01:00
LoopInstSimplify
LoopInterchange Scalar: Don't visit constants in findInnerReductionPhi in LoopInterchange 2021-01-21 12:33:06 -08:00
LoopLoadElim [LoopLoadElim] Add tests with uncomputable BTCs. 2021-01-01 13:57:02 +00:00
LoopPredication
LoopReroll
LoopRotate [LoopRotate] Calls not lowered to calls should not block rotation. 2021-01-19 14:37:36 +00:00
LoopSimplify
LoopSimplifyCFG
LoopStrengthReduce BreakCriticalEdges: do not split the critical edge from a CallBr indirect successor 2021-01-15 13:51:47 -08:00
LoopTransformWarning
LoopUnroll Loop peeling: check that latch is conditional branch 2021-01-20 11:01:16 -05:00
LoopUnrollAndJam
LoopUnswitch [LoopUnswitch] Implement first version of partial unswitching. 2021-01-21 09:46:41 +00:00
LoopVectorize [LV][ARM] Inloop reduction cost modelling 2021-01-21 21:03:41 +00:00
LoopVersioning
LoopVersioningLICM
LowerAtomic
LowerConstantIntrinsics
LowerExpectIntrinsic
LowerGuardIntrinsic
LowerInvoke
LowerMatrixIntrinsics Use unary CreateShuffleVector if possible 2020-12-30 22:36:08 +09:00
LowerSwitch
LowerTypeTests
LowerWidenableCondition
MakeGuardsExplicit
Mem2Reg
MemCpyOpt [BasicAA] Move assumption tracking into AAQI 2021-01-17 10:34:35 +01:00
MergeFunc
MergeICmps
MetaRenamer
NameAnonGlobals
NaryReassociate
NewGVN [PredicateInfo] Handle logical and/or 2021-01-20 21:03:07 +01:00
ObjCARC
OpenMP [OpenMP] Add support for mapping names in mapper API 2021-01-21 09:26:44 -05:00
PartialInlining
PartiallyInlineLibCalls
PGOProfile [SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 2 2021-01-01 03:25:24 +03:00
PhaseOrdering [SLP] match maxnum/minnum intrinsics as FP reduction ops 2021-01-18 17:37:16 -05:00
PlaceSafepoints
PreISelIntrinsicLowering
PruneEH [FuncAttrs] Infer noreturn 2021-01-05 13:25:42 -08:00
Reassociate
Reg2Mem
RewriteStatepointsForGC
SafeStack
SampleProfile [SampleFDO] Add the support to split the function profiles with context into 2021-01-19 15:16:19 -08:00
ScalarizeMaskedMemIntrin
Scalarizer [Scalarizer] Use poison as insertelement's placeholder 2021-01-04 00:35:28 +09:00
SCCP [PredicateInfo] Handle logical and/or 2021-01-20 21:03:07 +01:00
SeparateConstOffsetFromGEP
SimpleLoopUnswitch [NewPM] Only non-trivially loop unswitch at -O3 and for non-optsize functions 2021-01-13 14:54:49 -08:00
SimplifyCFG [SimplifyCFG] Reapply update_test_checks.py (NFC) 2021-01-20 12:41:30 +09:00
Sink
SLPVectorizer Revert "[SLP]Merge reorder and reuse shuffles." 2021-01-19 11:48:04 -08:00
SpeculateAroundPHIs
SpeculativeExecution
SROA Use unary CreateShuffleVector if possible 2020-12-30 22:36:08 +09:00
StraightLineStrengthReduce
StripDeadPrototypes
StripSymbols
StructurizeCFG [test] Pin backedge-id-bug-xfail.ll to legacy PM 2021-01-04 13:09:42 -08:00
SyntheticCountsPropagation
TailCallElim
ThinLTOBitcodeWriter
TypePromotion/ARM
UnifyFunctionExitNodes
UnifyLoopExits
UniqueInternalLinkageNames Add sample-profile-suffix-elision-policy attribute with -funique-internal-linkage-names. 2021-01-12 15:15:53 -08:00
Util [PredicateInfo] Handle logical and/or 2021-01-20 21:03:07 +01:00
VectorCombine [Constant] Update ConstantVector::get to return poison if all input elems are poison 2021-01-07 09:26:07 +09:00
WholeProgramDevirt
lit.local.cfg