2013-01-07 04:08:10 +01:00
|
|
|
//===- llvm/Analysis/TargetTransformInfo.cpp ------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
|
|
|
|
|
2013-01-07 04:08:10 +01:00
|
|
|
#include "llvm/Analysis/TargetTransformInfo.h"
|
Sink all InitializePasses.h includes
InitializePasses.h lists every pass in LLVM and was included by Pass.h,
which is very popular. Every time we added, removed, or renamed a pass in
LLVM, it caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles  touches  affected_files  header
      342380       95            3604  llvm/include/llvm/ADT/STLExtras.h
      314730      234            1345  llvm/include/llvm/InitializePasses.h
      307036      118            2602  llvm/include/llvm/ADT/APInt.h
      213049       59            3611  llvm/include/llvm/Support/MathExtras.h
      170422       47            3626  llvm/include/llvm/Support/Compiler.h
      162225       45            3605  llvm/include/llvm/ADT/Optional.h
      158319       63            2513  llvm/include/llvm/ADT/Triple.h
      140322       39            3598  llvm/include/llvm/ADT/StringRef.h
      137647       59            2333  llvm/include/llvm/Support/Error.h
      131619       73            1803  llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 22:15:01 +01:00
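The sort key in the commit message above is just touches multiplied by affected files. A minimal sketch of that arithmetic follows; `HeaderStats` and `recompiles` are hypothetical names introduced here for illustration and are not part of LLVM.

```cpp
#include <cstdint>
#include <string>

// Hypothetical record for one row of the table; not an LLVM type.
struct HeaderStats {
  std::string Header;
  std::uint64_t Touches;       // commits that modified the header
  std::uint64_t AffectedFiles; // object files that depend on the header
};

// Estimated recompiles = touches * affected files (the table's sort key).
inline std::uint64_t recompiles(const HeaderStats &S) {
  return S.Touches * S.AffectedFiles;
}
```

With the figures quoted in the message, 234 touches at 1345 dependents gives 314730. If the same 234 touches had each hit only the 550 dependents reported after the change, the product would be 128700.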
|
|
|
#include "llvm/Analysis/CFG.h"
|
|
|
|
#include "llvm/Analysis/LoopIterator.h"
|
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because chaining-based delegation meant that nothing checked for
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason, of course, is that this is a *much* more natural fit for
the new pass manager. This will lay the groundwork for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 04:43:40 +01:00
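The type-erasure design described in the commit message above can be sketched as follows. This is a cut-down illustration under stated assumptions, not the real TTI surface: `CostInfo`, `Concept`, `CostImplBase`, `NoTargetImpl`, `WideTargetImpl` and `getScaledThreshold` are hypothetical names. Only the `Model<T>` wrapper name and `getInliningThresholdMultiplier` also appear in this file.

```cpp
#include <memory>
#include <utility>

// Value-semantic wrapper that hides the concrete implementation type.
class CostInfo {
  // Abstract interface kept private to the wrapper (name is an assumption).
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getInliningThresholdMultiplier() const = 0;
    virtual unsigned getScaledThreshold(unsigned Base) const = 0;
  };

  // Adapts any type T with matching member functions to Concept.
  template <typename T> struct Model final : Concept {
    explicit Model(T I) : Wrapped(std::move(I)) {}
    unsigned getInliningThresholdMultiplier() const override {
      return Wrapped.getInliningThresholdMultiplier();
    }
    unsigned getScaledThreshold(unsigned Base) const override {
      return Wrapped.getScaledThreshold(Base);
    }
    T Wrapped;
  };

  std::unique_ptr<Concept> Impl;

public:
  template <typename T>
  explicit CostInfo(T I) : Impl(new Model<T>(std::move(I))) {}
  CostInfo(CostInfo &&) = default;
  CostInfo &operator=(CostInfo &&) = default;

  // Public API simply forwards through the erased implementation.
  unsigned getInliningThresholdMultiplier() const {
    return Impl->getInliningThresholdMultiplier();
  }
  unsigned getScaledThreshold(unsigned Base) const {
    return Impl->getScaledThreshold(Base);
  }
};

// CRTP mix-in: supplies defaults and calls back into the most derived type
// when one query is implemented in terms of another.
template <typename Derived> struct CostImplBase {
  unsigned getInliningThresholdMultiplier() const { return 1; }
  unsigned getScaledThreshold(unsigned Base) const {
    const auto *Self = static_cast<const Derived *>(this);
    return Base * Self->getInliningThresholdMultiplier();
  }
};

// Default implementation: inherits every default from the mix-in.
struct NoTargetImpl : CostImplBase<NoTargetImpl> {};

// Hypothetical target that overrides one query only.
struct WideTargetImpl : CostImplBase<WideTargetImpl> {
  unsigned getInliningThresholdMultiplier() const { return 4; }
};
```

Wrapping `WideTargetImpl{}` in a `CostInfo` and calling `getScaledThreshold(10)` goes through `Model`, then the CRTP base, then back into the derived multiplier. Clients never see the concrete type.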
|
|
|
#include "llvm/Analysis/TargetTransformInfoImpl.h"
|
2019-10-01 09:53:28 +02:00
|
|
|
#include "llvm/IR/CFG.h"
|
2013-01-21 02:27:39 +01:00
|
|
|
#include "llvm/IR/DataLayout.h"
|
|
|
|
#include "llvm/IR/Instruction.h"
|
|
|
|
#include "llvm/IR/Instructions.h"
|
2014-01-07 12:48:04 +01:00
|
|
|
#include "llvm/IR/IntrinsicInst.h"
|
2015-02-01 11:11:22 +01:00
|
|
|
#include "llvm/IR/Module.h"
|
2014-01-07 12:48:04 +01:00
|
|
|
#include "llvm/IR/Operator.h"
|
2017-09-09 00:29:17 +02:00
|
|
|
#include "llvm/IR/PatternMatch.h"
|
2019-11-13 22:15:01 +01:00
|
|
|
#include "llvm/InitializePasses.h"
|
2017-07-07 04:00:06 +02:00
|
|
|
#include "llvm/Support/CommandLine.h"
|
2012-10-19 01:22:48 +02:00
|
|
|
#include "llvm/Support/ErrorHandling.h"
|
2016-05-27 16:27:24 +02:00
|
|
|
#include <utility>
|
2012-10-19 01:22:48 +02:00
|
|
|
|
|
|
|
using namespace llvm;
|
2017-09-09 00:29:17 +02:00
|
|
|
using namespace PatternMatch;
|
2012-10-19 01:22:48 +02:00
|
|
|
|
2014-04-22 04:48:03 +02:00
|
|
|
#define DEBUG_TYPE "tti"
|
|
|
|
|
2017-09-09 00:29:17 +02:00
|
|
|
static cl::opt<bool> EnableReduxCost("costmodel-reduxcost", cl::init(false),
                                     cl::Hidden,
                                     cl::desc("Recognize reduction patterns."));
|
|
|
|
|
2015-01-31 12:17:59 +01:00
|
|
|
namespace {
/// No-op implementation of the TTI interface using the utility base
/// classes.
///
/// This is used when no target specific information is available.
struct NoTTIImpl : TargetTransformInfoImplCRTPBase<NoTTIImpl> {
  explicit NoTTIImpl(const DataLayout &DL)
      : TargetTransformInfoImplCRTPBase<NoTTIImpl>(DL) {}
};
} // namespace
|
2015-01-31 12:17:59 +01:00
|
|
|
|
2019-06-26 14:02:43 +02:00
|
|
|
bool HardwareLoopInfo::canAnalyze(LoopInfo &LI) {
  // If the loop has irreducible control flow, it cannot be converted to a
  // hardware loop.
  LoopBlocksRPO RPOT(L);
  RPOT.perform(&LI);
  if (containsIrreducibleCFG<const BasicBlock *>(RPOT, LI))
    return false;
  return true;
}
|
|
|
|
|
2019-06-19 03:26:31 +02:00
|
|
|
bool HardwareLoopInfo::isHardwareLoopCandidate(ScalarEvolution &SE,
                                               LoopInfo &LI, DominatorTree &DT,
                                               bool ForceNestedLoop,
                                               bool ForceHardwareLoopPHI) {
  SmallVector<BasicBlock *, 4> ExitingBlocks;
  L->getExitingBlocks(ExitingBlocks);

  for (BasicBlock *BB : ExitingBlocks) {
    // If we pass the updated counter back through a phi, we need to know
    // which latch the updated value will be coming from.
    if (!L->isLoopLatch(BB)) {
      if (ForceHardwareLoopPHI || CounterInReg)
        continue;
    }

    const SCEV *EC = SE.getExitCount(L, BB);
    if (isa<SCEVCouldNotCompute>(EC))
      continue;
    if (const SCEVConstant *ConstEC = dyn_cast<SCEVConstant>(EC)) {
      if (ConstEC->getValue()->isZero())
        continue;
    } else if (!SE.isLoopInvariant(EC, L))
      continue;

    if (SE.getTypeSizeInBits(EC->getType()) > CountType->getBitWidth())
      continue;

    // If this exiting block is contained in a nested loop, it is not eligible
    // for insertion of the branch-and-decrement since the inner loop would
    // end up messing up the value in the CTR.
    if (!IsNestingLegal && LI.getLoopFor(BB) != L && !ForceNestedLoop)
      continue;

    // We now have a loop-invariant count of loop iterations (which is not the
    // constant zero) for which we know that this loop will not exit via this
    // existing block.

    // We need to make sure that this block will run on every loop iteration.
    // For this to be true, we must dominate all blocks with backedges. Such
    // blocks are in-loop predecessors to the header block.
    bool NotAlways = false;
    for (BasicBlock *Pred : predecessors(L->getHeader())) {
      if (!L->contains(Pred))
        continue;

      if (!DT.dominates(BB, Pred)) {
        NotAlways = true;
        break;
      }
    }

    if (NotAlways)
      continue;

    // Make sure this block ends with a conditional branch.
    Instruction *TI = BB->getTerminator();
    if (!TI)
      continue;

    if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
      if (!BI->isConditional())
        continue;

      ExitBranch = BI;
    } else
      continue;

    // Note that this block may not be the loop latch block, even if the loop
    // has a latch block.
    ExitBlock = BB;
    ExitCount = EC;
    break;
  }

  if (!ExitBlock)
    return false;
  return true;
}
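
The "runs on every iteration" test in the function above can be illustrated on a toy graph. This is a sketch under stated assumptions, not LLVM code: `ToyCFG`, `dominates` and `runsEveryIteration` are hypothetical, blocks are plain indices, and the immediate-dominator array is supplied by hand rather than computed by a `DominatorTree`.

```cpp
#include <vector>

// Toy CFG: blocks are indices. IDom[B] is the immediate dominator of B,
// and the entry block is recorded as its own immediate dominator.
struct ToyCFG {
  std::vector<int> IDom;
  std::vector<std::vector<int>> Preds; // predecessor lists per block
  std::vector<bool> InLoop;            // membership in the loop of interest
};

// A dominates B if A is reached while walking up B's dominator chain.
inline bool dominates(const ToyCFG &G, int A, int B) {
  while (true) {
    if (A == B)
      return true;
    int Up = G.IDom[B];
    if (Up == B)
      return false; // reached the entry block without meeting A
    B = Up;
  }
}

// The exiting block runs on every iteration only if it dominates every
// in-loop predecessor of the header, i.e. every backedge source.
inline bool runsEveryIteration(const ToyCFG &G, int Header, int Exiting) {
  for (int Pred : G.Preds[Header]) {
    if (!G.InLoop[Pred])
      continue; // loop entry edge, not a backedge
    if (!dominates(G, Exiting, Pred))
      return false;
  }
  return true;
}
```

For a diamond-shaped loop body, the latch and the header pass this test, while either arm of the diamond fails it because the backedge source can be reached without going through that arm.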
|
|
|
|
|
2015-07-09 04:08:42 +02:00
|
|
|
TargetTransformInfo::TargetTransformInfo(const DataLayout &DL)
    : TTIImpl(new Model<NoTTIImpl>(NoTTIImpl(DL))) {}
|
|
|
|
|
2015-01-31 04:43:40 +01:00
|
|
|
TargetTransformInfo::~TargetTransformInfo() {}
|
2012-10-19 01:22:48 +02:00
|
|
|
|
2015-01-31 04:43:40 +01:00
|
|
|
TargetTransformInfo::TargetTransformInfo(TargetTransformInfo &&Arg)
    : TTIImpl(std::move(Arg.TTIImpl)) {}
|
Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTransformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least one
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 02:37:14 +01:00
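The delegation chaining described in the commit message above (each layer forwards what it does not implement to the previous layer, and every layer keeps a pointer to the top-most one) can be sketched roughly as follows. All class and method names here are hypothetical and the cost numbers are made up for illustration. The real analysis-group machinery was considerably more involved.

```cpp
// One layer in a hypothetical cost-query stack.
class CostLayer {
protected:
  CostLayer *Prev = nullptr; // layer to delegate to when a query is not overridden
  CostLayer *Top = this;     // top-most layer currently in the stack

  // Tell this layer and everything below it who is on top.
  void setTop(CostLayer *T) {
    for (CostLayer *L = this; L; L = L->Prev)
      L->Top = T;
  }

public:
  virtual ~CostLayer() = default;

  // Stack this layer on top of Below.
  void pushOnto(CostLayer &Below) {
    Prev = &Below;
    setTop(this);
  }

  // Default behaviour: delegate to the previous layer.
  virtual unsigned getAddCost() const { return Prev->getAddCost(); }
  virtual unsigned getMulCost() const { return Prev->getMulCost(); }
};

// Bottom of the stack: must answer every query itself.
struct NoCostLayer : CostLayer {
  unsigned getAddCost() const override { return 1; }
  unsigned getMulCost() const override { return 1; }
};

// Middle layer: implements one query in terms of another by going through
// Top, so it benefits from more precise answers provided above it.
struct BasicCostLayer : CostLayer {
  unsigned getMulCost() const override { return 4 * Top->getAddCost(); }
};

// Target layer: overrides only what it knows better.
struct TargetCostLayer : CostLayer {
  unsigned getAddCost() const override { return 2; }
};
```

With `NoCostLayer`, `BasicCostLayer` and `TargetCostLayer` stacked in that order, a multiply query on the target layer falls through to the basic layer, which in turn asks the top of the stack for the add cost and so picks up the target's answer.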
|
|
|
|
2015-01-31 04:43:40 +01:00
|
|
|
TargetTransformInfo &TargetTransformInfo::operator=(TargetTransformInfo &&RHS) {
  TTIImpl = std::move(RHS.TTIImpl);
  return *this;
}
|
|
|
|
|
2016-04-15 03:38:48 +02:00
|
|
|
unsigned TargetTransformInfo::getInliningThresholdMultiplier() const {
  return TTIImpl->getInliningThresholdMultiplier();
}
|
|
|
|
|
[AMDGPU] Tune inlining parameters for AMDGPU target
Summary:
Since the target gains no significant advantage from vectorization,
the vector instructions threshold bonus should be optional. The default
value of the amdgpu-inline-arg-alloca-cost parameter and the target
InliningThresholdMultiplier value are tuned accordingly.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64642
llvm-svn: 366348
2019-07-17 18:51:29 +02:00
|
|
|
int TargetTransformInfo::getInlinerVectorBonusPercent() const {
  return TTIImpl->getInlinerVectorBonusPercent();
}
|
|
|
|
|
2016-07-08 23:48:05 +02:00
|
|
|
int TargetTransformInfo::getGEPCost(Type *PointeeType, const Value *Ptr,
                                    ArrayRef<const Value *> Operands) const {
  return TTIImpl->getGEPCost(PointeeType, Ptr, Operands);
}
|
|
|
|
|
2017-07-15 04:12:16 +02:00
|
|
|
int TargetTransformInfo::getExtCost(const Instruction *I,
                                    const Value *Src) const {
  return TTIImpl->getExtCost(I, Src);
}
|
|
|
|
|
2020-04-15 14:43:26 +02:00
|
|
|
int TargetTransformInfo::getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
                                          ArrayRef<const Value *> Arguments,
                                          const User *U) const {
  int Cost = TTIImpl->getIntrinsicCost(IID, RetTy, Arguments, U);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}
|
|
|
|
|
2020-04-15 14:43:26 +02:00
|
|
|
unsigned TargetTransformInfo::getEstimatedNumberOfCaseClusters(
    const SwitchInst &SI, unsigned &JTSize, ProfileSummaryInfo *PSI,
    BlockFrequencyInfo *BFI) const {
  return TTIImpl->getEstimatedNumberOfCaseClusters(SI, JTSize, PSI, BFI);
|
[InlineCost] Improve the cost heuristic for Switch
Summary:
The motivating example below has 13 cases but only 2 distinct targets:
```
lor.lhs.false2: ; preds = %if.then
switch i32 %Status, label %if.then27 [
i32 -7012, label %if.end35
i32 -10008, label %if.end35
i32 -10016, label %if.end35
i32 15000, label %if.end35
i32 14013, label %if.end35
i32 10114, label %if.end35
i32 10107, label %if.end35
i32 10105, label %if.end35
i32 10013, label %if.end35
i32 10011, label %if.end35
i32 7008, label %if.end35
i32 7007, label %if.end35
i32 5002, label %if.end35
]
```
It is compiled into a balanced binary tree like this on AArch64 (X86 is similar):
```
.LBB853_9: // %lor.lhs.false2
mov w8, #10012
cmp w19, w8
b.gt .LBB853_14
// BB#10: // %lor.lhs.false2
mov w8, #5001
cmp w19, w8
b.gt .LBB853_18
// BB#11: // %lor.lhs.false2
mov w8, #-10016
cmp w19, w8
b.eq .LBB853_23
// BB#12: // %lor.lhs.false2
mov w8, #-10008
cmp w19, w8
b.eq .LBB853_23
// BB#13: // %lor.lhs.false2
mov w8, #-7012
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_14: // %lor.lhs.false2
mov w8, #14012
cmp w19, w8
b.gt .LBB853_21
// BB#15: // %lor.lhs.false2
mov w8, #-10105
add w8, w19, w8
cmp w8, #9 // =9
b.hi .LBB853_17
// BB#16: // %lor.lhs.false2
orr w9, wzr, #0x1
lsl w8, w9, w8
mov w9, #517
and w8, w8, w9
cbnz w8, .LBB853_23
.LBB853_17: // %lor.lhs.false2
mov w8, #10013
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_18: // %lor.lhs.false2
mov w8, #-7007
add w8, w19, w8
cmp w8, #2 // =2
b.lo .LBB853_23
// BB#19: // %lor.lhs.false2
mov w8, #5002
cmp w19, w8
b.eq .LBB853_23
// BB#20: // %lor.lhs.false2
mov w8, #10011
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_21: // %lor.lhs.false2
mov w8, #14013
cmp w19, w8
b.eq .LBB853_23
// BB#22: // %lor.lhs.false2
mov w8, #15000
cmp w19, w8
b.ne .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number
of distinct targets and the cost of the above switch is just 2 InstrCosts.
The function containing this switch is then inlined about 900 times.
This change uses the general switch-lowering approach for the inline heuristic. It
estimates the number of case clusters, including the suitability check for a jump
table or bit test. Considering the binary search tree built for the clusters, this
change modifies the model to be linear with the size of the balanced binary
tree. The model is off by default for now:
-inline-generic-switch-cost=false
This change was originally proposed by Haicheng in D29870.
Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier
Reviewed By: hans
Subscribers: joerg, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D31085
llvm-svn: 301649
2017-04-28 18:04:03 +02:00
|
|
|
}
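The difference between the two cost models described in the commit message above can be sketched as follows. This is only an illustration, not the inliner's code: the helper names, the closed form of roughly n + n/2 - 1 comparisons for a balanced tree over n clusters, the assumption of two instructions per comparison, and the InstrCost value passed in are all assumptions made here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): expected number of comparisons in a
// balanced binary search tree built over NumCaseClusters clusters. The closed
// form n + n/2 - 1, written as 3*n/2 - 1 in integer arithmetic, is an
// assumption of this sketch.
int64_t expectedCompares(int64_t NumCaseClusters) {
  assert(NumCaseClusters >= 1 && "a switch needs at least one cluster");
  return 3 * NumCaseClusters / 2 - 1;
}

// Hypothetical tree-based model: assume each comparison lowers to a compare
// plus a branch, i.e. two instructions of cost InstrCost each.
int64_t treeSwitchCost(int64_t NumCaseClusters, int64_t InstrCost) {
  return expectedCompares(NumCaseClusters) * 2 * InstrCost;
}

// The older model from the commit message: linear in the number of distinct
// branch targets, regardless of how many case values lead to them.
int64_t distinctTargetSwitchCost(int64_t NumDistinctTargets,
                                 int64_t InstrCost) {
  return NumDistinctTargets * InstrCost;
}
```

Under these assumptions, a switch with 2 distinct targets costs 2 InstrCosts in the older model no matter how many case values it has, while the tree-based estimate grows with the number of clusters that lowering would actually have to test.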
|
|
|
|
|
2017-06-29 15:42:12 +02:00
|
|
|
int TargetTransformInfo::getUserCost(const User *U,
|
2020-04-15 14:43:26 +02:00
|
|
|
ArrayRef<const Value *> Operands) const {
|
2017-06-29 15:42:12 +02:00
|
|
|
int Cost = TTIImpl->getUserCost(U, Operands);
|
2015-08-05 20:08:10 +02:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-21 02:27:39 +01:00
|
|
|
}
|
|
|
|
|
2013-07-27 02:01:07 +02:00
|
|
|
bool TargetTransformInfo::hasBranchDivergence() const {
|
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is a *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 04:43:40 +01:00
|
|
|
return TTIImpl->hasBranchDivergence();
|
2013-07-27 02:01:07 +02:00
|
|
|
}
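The type-erased design described in the commit message above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the real class: `CostInfo`, `GenericImpl` and `ToyGpuImpl` are hypothetical names, only two of the many hooks are shown, and the immediate range used by the toy target is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical, heavily reduced stand-in for a type-erased TTI-style wrapper.
class CostInfo {
  // Abstract interface that every erased implementation must satisfy.
  struct Concept {
    virtual ~Concept() = default;
    virtual bool hasBranchDivergence() const = 0;
    virtual bool isLegalAddImmediate(int64_t Imm) const = 0;
  };

  // Adapter that wraps any type T providing the same member functions.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    bool hasBranchDivergence() const override {
      return Impl.hasBranchDivergence();
    }
    bool isLegalAddImmediate(int64_t Imm) const override {
      return Impl.isLegalAddImmediate(Imm);
    }
  };

  std::unique_ptr<Concept> TTIImpl;

public:
  // Build the wrapper from any implementation; T needs no common base class.
  template <typename T>
  explicit CostInfo(T Impl)
      : TTIImpl(std::make_unique<Model<T>>(std::move(Impl))) {}

  // Public methods only forward, like the wrappers in this file.
  bool hasBranchDivergence() const {
    assert(TTIImpl && "wrapper used after being moved from");
    return TTIImpl->hasBranchDivergence();
  }
  bool isLegalAddImmediate(int64_t Imm) const {
    assert(TTIImpl && "wrapper used after being moved from");
    return TTIImpl->isLegalAddImmediate(Imm);
  }
};

// Hypothetical conservative default implementation.
struct GenericImpl {
  bool hasBranchDivergence() const { return false; }
  bool isLegalAddImmediate(int64_t) const { return false; }
};

// Hypothetical target implementation; the 12-bit signed range is invented.
struct ToyGpuImpl {
  bool hasBranchDivergence() const { return true; }
  bool isLegalAddImmediate(int64_t Imm) const {
    return Imm >= -2048 && Imm < 2048;
  }
};
```

Clients hold a `CostInfo` by value and never see the concrete implementation type, which is what makes every public method a one-line forward and explains the verbosity the commit message mentions.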
|
|
|
|
|
Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI
Summary:
Enable the new divergence analysis by default for AMDGPU.
Resubmit with test updates since GPUDA was causing failures on Windows.
Reviewers: rampitec, nhaehnle, arsenm, thakis
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73315
2020-01-20 16:25:20 +01:00
|
|
|
bool TargetTransformInfo::useGPUDivergenceAnalysis() const {
|
|
|
|
return TTIImpl->useGPUDivergenceAnalysis();
|
|
|
|
}
|
|
|
|
|
Divergence analysis for GPU programs
Summary:
Some optimizations such as jump threading and loop unswitching can negatively
affect performance when applied to divergent branches. The divergence analysis
added in this patch conservatively estimates which branches in a GPU program
can diverge. This information can then help LLVM to run certain optimizations
selectively.
Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
Reviewers: resistor, hfinkel, eliben, meheff, jholewinski
Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8576
llvm-svn: 234567
2015-04-10 07:03:50 +02:00
|
|
|
bool TargetTransformInfo::isSourceOfDivergence(const Value *V) const {
|
|
|
|
return TTIImpl->isSourceOfDivergence(V);
|
|
|
|
}
|
|
|
|
|
2017-06-15 21:33:10 +02:00
|
|
|
bool llvm::TargetTransformInfo::isAlwaysUniform(const Value *V) const {
|
|
|
|
return TTIImpl->isAlwaysUniform(V);
|
|
|
|
}
|
|
|
|
|
2017-01-31 00:02:12 +01:00
|
|
|
unsigned TargetTransformInfo::getFlatAddressSpace() const {
|
|
|
|
return TTIImpl->getFlatAddressSpace();
|
|
|
|
}
|
|
|
|
|
2019-08-14 20:13:00 +02:00
|
|
|
bool TargetTransformInfo::collectFlatAddressOperands(
|
2020-04-15 14:43:26 +02:00
|
|
|
SmallVectorImpl<int> &OpIndexes, Intrinsic::ID IID) const {
|
2019-08-14 20:13:00 +02:00
|
|
|
return TTIImpl->collectFlatAddressOperands(OpIndexes, IID);
|
|
|
|
}
|
|
|
|
|
2020-04-15 14:43:26 +02:00
|
|
|
bool TargetTransformInfo::rewriteIntrinsicWithAddressSpace(IntrinsicInst *II,
|
|
|
|
Value *OldV,
|
|
|
|
Value *NewV) const {
|
2019-08-14 20:13:00 +02:00
|
|
|
return TTIImpl->rewriteIntrinsicWithAddressSpace(II, OldV, NewV);
|
|
|
|
}
|
|
|
|
|
2013-01-22 12:26:02 +01:00
|
|
|
bool TargetTransformInfo::isLoweredToCall(const Function *F) const {
|
2015-01-31 04:43:40 +01:00
|
|
|
return TTIImpl->isLoweredToCall(F);
|
2013-01-22 12:26:02 +01:00
|
|
|
}
|
|
|
|
|
2019-06-07 09:35:30 +02:00
|
|
|
bool TargetTransformInfo::isHardwareLoopProfitable(
|
2020-04-15 14:43:26 +02:00
|
|
|
Loop *L, ScalarEvolution &SE, AssumptionCache &AC,
|
|
|
|
TargetLibraryInfo *LibInfo, HardwareLoopInfo &HWLoopInfo) const {
|
2019-06-07 09:35:30 +02:00
|
|
|
return TTIImpl->isHardwareLoopProfitable(L, SE, AC, LibInfo, HWLoopInfo);
|
|
|
|
}
|
|
|
|
|
2020-04-15 14:43:26 +02:00
|
|
|
bool TargetTransformInfo::preferPredicateOverEpilogue(
|
|
|
|
Loop *L, LoopInfo *LI, ScalarEvolution &SE, AssumptionCache &AC,
|
|
|
|
TargetLibraryInfo *TLI, DominatorTree *DT,
|
|
|
|
const LoopAccessInfo *LAI) const {
|
2019-11-06 10:58:36 +01:00
|
|
|
return TTIImpl->preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);
|
|
|
|
}
|
|
|
|
|
2015-01-31 04:43:40 +01:00
|
|
|
void TargetTransformInfo::getUnrollingPreferences(
|
[LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D34531
llvm-svn: 306554
2017-06-28 17:53:17 +02:00
|
|
|
Loop *L, ScalarEvolution &SE, UnrollingPreferences &UP) const {
|
|
|
|
return TTIImpl->getUnrollingPreferences(L, SE, UP);
|
2013-09-11 21:25:43 +02:00
|
|
|
}
|
|
|
|
|
2013-01-05 12:43:11 +01:00
|
|
|
bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const {
|
2015-01-31 04:43:40 +01:00
|
|
|
return TTIImpl->isLegalAddImmediate(Imm);
|
2013-01-05 12:43:11 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalICmpImmediate(int64_t Imm) const {
|
2015-01-31 04:43:40 +01:00
|
|
|
return TTIImpl->isLegalICmpImmediate(Imm);
|
2013-01-05 12:43:11 +01:00
|
|
|
}
|
|
|
|
|
2015-01-31 04:43:40 +01:00
|
|
|
bool TargetTransformInfo::isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,
|
|
|
|
int64_t BaseOffset,
|
2020-04-15 14:43:26 +02:00
|
|
|
bool HasBaseReg, int64_t Scale,
|
2017-07-21 13:59:37 +02:00
|
|
|
unsigned AddrSpace,
|
|
|
|
Instruction *I) const {
|
2015-01-31 04:43:40 +01:00
|
|
|
return TTIImpl->isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg,
|
2017-07-21 13:59:37 +02:00
|
|
|
Scale, AddrSpace, I);
|
2014-12-04 10:40:44 +01:00
|
|
|
}
|
|
|
|
|
2017-06-06 01:37:00 +02:00
|
|
|
bool TargetTransformInfo::isLSRCostLess(LSRCost &C1, LSRCost &C2) const {
|
|
|
|
return TTIImpl->isLSRCostLess(C1, C2);
|
|
|
|
}
|
|
|
|
|
2018-02-06 00:43:05 +01:00
|
|
|
bool TargetTransformInfo::canMacroFuseCmp() const {
|
|
|
|
return TTIImpl->canMacroFuseCmp();
|
|
|
|
}
|
|
|
|
|
2019-07-03 03:49:03 +02:00
|
|
|
bool TargetTransformInfo::canSaveCmp(Loop *L, BranchInst **BI,
|
|
|
|
ScalarEvolution *SE, LoopInfo *LI,
|
|
|
|
DominatorTree *DT, AssumptionCache *AC,
|
|
|
|
TargetLibraryInfo *LibInfo) const {
|
|
|
|
return TTIImpl->canSaveCmp(L, BI, SE, LI, DT, AC, LibInfo);
|
|
|
|
}
|
|
|
|
|
2018-03-26 15:10:09 +02:00
|
|
|
bool TargetTransformInfo::shouldFavorPostInc() const {
|
|
|
|
return TTIImpl->shouldFavorPostInc();
|
|
|
|
}
|
|
|
|
|
2019-02-07 14:32:54 +01:00
|
|
|
bool TargetTransformInfo::shouldFavorBackedgeIndex(const Loop *L) const {
|
|
|
|
return TTIImpl->shouldFavorBackedgeIndex(L);
|
|
|
|
}
|
|
|
|
|
2019-10-14 12:00:21 +02:00
|
|
|
bool TargetTransformInfo::isLegalMaskedStore(Type *DataType,
|
|
|
|
MaybeAlign Alignment) const {
|
|
|
|
return TTIImpl->isLegalMaskedStore(DataType, Alignment);
|
2014-12-04 10:40:44 +01:00
|
|
|
}
|
|
|
|
|
2019-10-14 12:00:21 +02:00
|
|
|
bool TargetTransformInfo::isLegalMaskedLoad(Type *DataType,
|
|
|
|
MaybeAlign Alignment) const {
|
|
|
|
return TTIImpl->isLegalMaskedLoad(DataType, Alignment);
|
2013-01-05 12:43:11 +01:00
|
|
|
}
|
|
|
|
|
2019-06-17 19:20:08 +02:00
|
|
|
bool TargetTransformInfo::isLegalNTStore(Type *DataType,
|
2019-09-27 14:54:21 +02:00
|
|
|
Align Alignment) const {
|
2019-06-17 19:20:08 +02:00
|
|
|
return TTIImpl->isLegalNTStore(DataType, Alignment);
|
|
|
|
}
|
|
|
|
|
2019-09-27 14:54:21 +02:00
|
|
|
bool TargetTransformInfo::isLegalNTLoad(Type *DataType, Align Alignment) const {
|
2019-06-17 19:20:08 +02:00
|
|
|
return TTIImpl->isLegalNTLoad(DataType, Alignment);
|
|
|
|
}
|
|
|
|
|
2019-12-18 09:42:53 +01:00
|
|
|
bool TargetTransformInfo::isLegalMaskedGather(Type *DataType,
|
|
|
|
MaybeAlign Alignment) const {
|
|
|
|
return TTIImpl->isLegalMaskedGather(DataType, Alignment);
|
2015-10-25 16:37:55 +01:00
|
|
|
}
|
|
|
|
|
2019-12-18 09:42:53 +01:00
|
|
|
bool TargetTransformInfo::isLegalMaskedScatter(Type *DataType,
|
|
|
|
MaybeAlign Alignment) const {
|
|
|
|
return TTIImpl->isLegalMaskedScatter(DataType, Alignment);
|
2015-10-25 16:37:55 +01:00
|
|
|
}
|
|
|
|
|
2019-03-21 18:38:52 +01:00
|
|
|
bool TargetTransformInfo::isLegalMaskedCompressStore(Type *DataType) const {
|
|
|
|
return TTIImpl->isLegalMaskedCompressStore(DataType);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalMaskedExpandLoad(Type *DataType) const {
|
|
|
|
return TTIImpl->isLegalMaskedExpandLoad(DataType);
|
|
|
|
}
|
|
|
|
|
2017-09-09 15:38:18 +02:00
|
|
|
bool TargetTransformInfo::hasDivRemOp(Type *DataType, bool IsSigned) const {
|
|
|
|
return TTIImpl->hasDivRemOp(DataType, IsSigned);
|
|
|
|
}
|
|
|
|
|
2017-10-24 22:31:44 +02:00
|
|
|
bool TargetTransformInfo::hasVolatileVariant(Instruction *I,
|
|
|
|
unsigned AddrSpace) const {
|
|
|
|
return TTIImpl->hasVolatileVariant(I, AddrSpace);
|
|
|
|
}
|
|
|
|
|
2017-05-24 15:42:56 +02:00
|
|
|
bool TargetTransformInfo::prefersVectorizedAddressing() const {
|
|
|
|
return TTIImpl->prefersVectorizedAddressing();
|
|
|
|
}
|
|
|
|
|
2013-05-31 23:29:03 +02:00
|
|
|
int TargetTransformInfo::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
|
|
|
|
int64_t BaseOffset,
|
2020-04-15 14:43:26 +02:00
|
|
|
bool HasBaseReg, int64_t Scale,
|
2015-06-07 22:12:03 +02:00
|
|
|
unsigned AddrSpace) const {
|
2015-08-05 20:08:10 +02:00
|
|
|
int Cost = TTIImpl->getScalingFactorCost(Ty, BaseGV, BaseOffset, HasBaseReg,
|
|
|
|
Scale, AddrSpace);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-05-31 23:29:03 +02:00
|
|
|
}
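The cost-returning wrappers in this file, such as `getScalingFactorCost` above, share one pattern: forward to the implementation, assert that the result is non-negative, then return it. A minimal sketch of that pattern follows, assuming a toy implementation; `ToyImpl`, `ToyTTI`, and the single-argument signature are hypothetical and much simpler than the real interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical implementation object standing in for TTIImpl.
struct ToyImpl {
  // Toy cost model: a scale of 1 is free, any other scale costs 1.
  int getScalingFactorCost(std::int64_t Scale) const {
    return Scale == 1 ? 0 : 1;
  }
};

// Hypothetical wrapper that mirrors the forward-and-check shape used above.
class ToyTTI {
  const ToyImpl *Impl;

public:
  explicit ToyTTI(const ToyImpl &I) : Impl(&I) {}

  int getScalingFactorCost(std::int64_t Scale) const {
    int Cost = Impl->getScalingFactorCost(Scale);
    // Checked in one place so every implementation is held to the same rule.
    assert(Cost >= 0 && "TTI should not produce negative costs!");
    return Cost;
  }
};
```

Because the check uses `assert`, it is compiled out when `NDEBUG` is defined, so it guards development builds rather than release builds.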
|
|
|
|
|
2017-07-21 13:59:37 +02:00
|
|
|
bool TargetTransformInfo::LSRWithInstrQueries() const {
|
|
|
|
return TTIImpl->LSRWithInstrQueries();
|
|
|
|
}
|
|
|
|
|
2013-01-05 12:43:11 +01:00
|
|
|
bool TargetTransformInfo::isTruncateFree(Type *Ty1, Type *Ty2) const {
|
2015-01-31 04:43:40 +01:00
|
|
|
return TTIImpl->isTruncateFree(Ty1, Ty2);
|
2013-01-05 12:43:11 +01:00
|
|
|
}
|
|
|
|
|
2015-02-23 20:15:16 +01:00
|
|
|
bool TargetTransformInfo::isProfitableToHoist(Instruction *I) const {
|
|
|
|
return TTIImpl->isProfitableToHoist(I);
|
|
|
|
}
|
|
|
|
|
2018-03-29 00:28:50 +02:00
|
|
|
bool TargetTransformInfo::useAA() const { return TTIImpl->useAA(); }
|
|
|
|
|
2013-01-05 12:43:11 +01:00
|
|
|
bool TargetTransformInfo::isTypeLegal(Type *Ty) const {
|
2015-01-31 04:43:40 +01:00
|
|
|
return TTIImpl->isTypeLegal(Ty);
|
2013-01-05 12:43:11 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::shouldBuildLookupTables() const {
|
2015-01-31 04:43:40 +01:00
|
|
|
return TTIImpl->shouldBuildLookupTables();
|
2013-01-05 12:43:11 +01:00
|
|
|
}
|
2020-04-15 14:43:26 +02:00
|
|
|
bool TargetTransformInfo::shouldBuildLookupTablesForConstant(
|
|
|
|
Constant *C) const {
|
2016-10-07 10:48:24 +02:00
|
|
|
return TTIImpl->shouldBuildLookupTablesForConstant(C);
|
|
|
|
}
|
2013-01-05 12:43:11 +01:00
|
|
|
|
2018-01-30 17:17:22 +01:00
|
|
|
bool TargetTransformInfo::useColdCCForColdCall(Function &F) const {
|
|
|
|
return TTIImpl->useColdCCForColdCall(F);
|
|
|
|
}
|
|
|
|
|
2020-04-15 14:43:26 +02:00
|
|
|
unsigned TargetTransformInfo::getScalarizationOverhead(Type *Ty, bool Insert,
|
|
|
|
bool Extract) const {
|
2017-01-26 08:03:25 +01:00
|
|
|
return TTIImpl->getScalarizationOverhead(Ty, Insert, Extract);
|
|
|
|
}
|
|
|
|
|
2020-04-15 14:43:26 +02:00
|
|
|
unsigned TargetTransformInfo::getOperandsScalarizationOverhead(
|
|
|
|
ArrayRef<const Value *> Args, unsigned VF) const {
|
2017-01-26 08:03:25 +01:00
|
|
|
return TTIImpl->getOperandsScalarizationOverhead(Args, VF);
|
|
|
|
}
|
|
|
|
|
2017-04-12 14:41:37 +02:00
|
|
|
bool TargetTransformInfo::supportsEfficientVectorElementLoadStore() const {
|
|
|
|
return TTIImpl->supportsEfficientVectorElementLoadStore();
|
|
|
|
}
|
|
|
|
|
2020-04-15 14:43:26 +02:00
|
|
|
bool TargetTransformInfo::enableAggressiveInterleaving(
|
|
|
|
bool LoopHasReductions) const {
|
2015-03-07 00:12:04 +01:00
|
|
|
return TTIImpl->enableAggressiveInterleaving(LoopHasReductions);
|
|
|
|
}
|
|
|
|
|
2019-06-25 10:04:13 +02:00
|
|
|
TargetTransformInfo::MemCmpExpansionOptions
|
|
|
|
TargetTransformInfo::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
|
|
|
|
return TTIImpl->enableMemCmpExpansion(OptSize, IsZeroCmp);
|
2017-05-31 19:12:38 +02:00
|
|
|
}
|
|
|
|
|
2015-08-10 16:50:54 +02:00
|
|
|
bool TargetTransformInfo::enableInterleavedAccessVectorization() const {
|
|
|
|
return TTIImpl->enableInterleavedAccessVectorization();
|
|
|
|
}
|
|
|
|
|
2018-10-14 10:50:06 +02:00
|
|
|
bool TargetTransformInfo::enableMaskedInterleavedAccessVectorization() const {
|
|
|
|
return TTIImpl->enableMaskedInterleavedAccessVectorization();
|
|
|
|
}
|
|
|
|
|
2016-04-14 22:42:18 +02:00
|
|
|
bool TargetTransformInfo::isFPVectorizationPotentiallyUnsafe() const {
|
|
|
|
return TTIImpl->isFPVectorizationPotentiallyUnsafe();
|
|
|
|
}
|
|
|
|
|
2016-08-04 18:38:44 +02:00
|
|
|
bool TargetTransformInfo::allowsMisalignedMemoryAccesses(LLVMContext &Context,
|
|
|
|
unsigned BitWidth,
|
2016-07-11 22:46:17 +02:00
|
|
|
unsigned AddressSpace,
|
|
|
|
unsigned Alignment,
|
|
|
|
bool *Fast) const {
|
2020-04-15 14:43:26 +02:00
|
|
|
return TTIImpl->allowsMisalignedMemoryAccesses(Context, BitWidth,
|
|
|
|
AddressSpace, Alignment, Fast);
|
2016-07-11 22:46:17 +02:00
|
|
|
}
|
|
|
|
|
2013-01-07 04:16:03 +01:00
|
|
|
TargetTransformInfo::PopcntSupportKind
|
|
|
|
TargetTransformInfo::getPopcntSupport(unsigned IntTyWidthInBit) const {
|
2015-01-31 04:43:40 +01:00
|
|
|
return TTIImpl->getPopcntSupport(IntTyWidthInBit);
|
2013-01-05 12:43:11 +01:00
|
|
|
}
|
|
|
|
|
2013-08-23 12:27:02 +02:00
|
|
|
bool TargetTransformInfo::haveFastSqrt(Type *Ty) const {
|
2015-01-31 04:43:40 +01:00
  return TTIImpl->haveFastSqrt(Ty);
}
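
The type-erased design described in the commit message above can be sketched in isolation. The following is a minimal illustration, not LLVM's actual implementation: `CostInfo`, `BasicImpl` and `SlowDoubleImpl` are hypothetical names, and the cost values are made up. It shows a value-semantic wrapper that hides any implementation type behind an internal concept/model pair, and forwards calls with the same non-negative-cost check used by the wrappers in this file. It assumes C++14 or later for `std::make_unique`.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a cost interface; not LLVM's actual classes.
class CostInfo {
  // Internal abstract interface that hides the concrete implementation type.
  struct Concept {
    virtual ~Concept() = default;
    virtual int getFPOpCost(unsigned Bits) const = 0;
  };

  // Adapts any type T that provides `int getFPOpCost(unsigned) const`.
  template <typename T> struct Model final : Concept {
    explicit Model(T I) : Impl(std::move(I)) {}
    int getFPOpCost(unsigned Bits) const override {
      return Impl.getFPOpCost(Bits);
    }
    T Impl;
  };

  std::unique_ptr<Concept> Impl;

public:
  // Build the wrapper from any suitable implementation object.
  template <typename T>
  explicit CostInfo(T ImplObj)
      : Impl(std::make_unique<Model<T>>(std::move(ImplObj))) {}

  // Forwarding wrapper with a sanity check on the returned cost.
  int getFPOpCost(unsigned Bits) const {
    int Cost = Impl->getFPOpCost(Bits);
    assert(Cost >= 0 && "cost should not be negative");
    return Cost;
  }
};

// Two unrelated implementations; neither inherits from a common base class.
struct BasicImpl {
  int getFPOpCost(unsigned) const { return 1; }
};
struct SlowDoubleImpl {
  // Made-up rule: operations wider than 32 bits cost more.
  int getFPOpCost(unsigned Bits) const { return Bits > 32 ? 4 : 1; }
};
```

Clients only ever see `CostInfo`, so adding a new implementation does not require touching a shared base class; the price is the forwarding boilerplate that the commit message calls verbose.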

bool TargetTransformInfo::isFCmpOrdCheaperThanFCmpZero(Type *Ty) const {
  return TTIImpl->isFCmpOrdCheaperThanFCmpZero(Ty);
}

int TargetTransformInfo::getFPOpCost(Type *Ty) const {
  int Cost = TTIImpl->getFPOpCost(Ty);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx,
                                               const APInt &Imm,
                                               Type *Ty) const {
  int Cost = TTIImpl->getIntImmCodeSizeCost(Opcode, Idx, Imm, Ty);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getIntImmCost(const APInt &Imm, Type *Ty) const {
  int Cost = TTIImpl->getIntImmCost(Imm, Ty);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getIntImmCostInst(unsigned Opcode, unsigned Idx,
                                           const APInt &Imm, Type *Ty) const {
  int Cost = TTIImpl->getIntImmCostInst(Opcode, Idx, Imm, Ty);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
                                             const APInt &Imm, Type *Ty) const {
  int Cost = TTIImpl->getIntImmCostIntrin(IID, Idx, Imm, Ty);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, the interleave count and vectorization factor depend on the number of target registers. Currently, it does not
estimate register pressure separately for each register class (in particular, for scalar types,
float values should not be counted in the same pool as int values), so the estimate is not accurate. Specifically,
it causes too much interleaving/unrolling, resulting in too many register spills in the loop body and hurting performance.
So we need to classify the register classes at the IR level. Importantly, these are abstract register classes,
and are not the target register classes that the backend provides in the td file. They are used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit some set of those types.
For example, on the POWER target the register count is special when VSX is enabled. When VSX is enabled, there are 32 int scalar registers (GPR) and
64 float scalar registers (VSR), and both int and float vector registers number 64 (VSR). So there should be 2 kinds of register class when VSX is enabled,
and 3 kinds of register class when VSX is NOT enabled.
On the POWER target, this gives a large (about +30%) performance improvement in one specific benchmark (503.bwaves_r) of SPEC2017, with no other obvious regressions.
Differential revision: https://reviews.llvm.org/D67148
llvm-svn: 374634
2019-10-12 04:53:04 +02:00
unsigned TargetTransformInfo::getNumberOfRegisters(unsigned ClassID) const {
  return TTIImpl->getNumberOfRegisters(ClassID);
}

unsigned TargetTransformInfo::getRegisterClassForType(bool Vector,
                                                      Type *Ty) const {
  return TTIImpl->getRegisterClassForType(Vector, Ty);
}

const char *TargetTransformInfo::getRegisterClassName(unsigned ClassID) const {
  return TTIImpl->getRegisterClassName(ClassID);
}
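
The abstract register classes described in the PowerPC commit message can be illustrated with a small standalone sketch. Everything here is hypothetical: the `RegClass` names, the `Target` struct and the free functions are not LLVM's API. The VSX counts (32 GPR, 64 VSR) come from the commit message; the count of 32 for the remaining classes is an assumption made only so the example is complete.

```cpp
#include <cassert>

// Hypothetical abstract register classes; names are illustrative only.
enum RegClass : unsigned { GPR, FPR, VR, VSR };

// Hypothetical target description with a single feature flag.
struct Target {
  bool HasVSX;
};

// Map a value kind (vector or scalar, float or int) to an abstract class.
// With VSX, float scalars and all vectors share one class, giving 2 classes;
// without VSX there are 3 separate classes.
unsigned getRegisterClassForType(const Target &T, bool Vector, bool IsFloat) {
  if (T.HasVSX)
    return (Vector || IsFloat) ? VSR : GPR;
  if (Vector)
    return VR;
  return IsFloat ? FPR : GPR;
}

// Number of registers available in an abstract class.
unsigned getNumberOfRegisters(const Target &T, unsigned ClassID) {
  assert(ClassID <= VSR && "unknown register class");
  if (ClassID == VSR)
    return T.HasVSX ? 64 : 0;
  return 32; // Assumption: not quantified in the commit message.
}
```

A vectorizer-like client would track live ranges per class ID and compare each count against `getNumberOfRegisters` for that class, rather than against a single global register count.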

unsigned TargetTransformInfo::getRegisterBitWidth(bool Vector) const {
  return TTIImpl->getRegisterBitWidth(Vector);
}

unsigned TargetTransformInfo::getMinVectorRegisterBitWidth() const {
  return TTIImpl->getMinVectorRegisterBitWidth();
}

bool TargetTransformInfo::shouldMaximizeVectorBandwidth(bool OptSize) const {
  return TTIImpl->shouldMaximizeVectorBandwidth(OptSize);
}

unsigned TargetTransformInfo::getMinimumVF(unsigned ElemWidth) const {
  return TTIImpl->getMinimumVF(ElemWidth);
}

bool TargetTransformInfo::shouldConsiderAddressTypePromotion(
    const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const {
  return TTIImpl->shouldConsiderAddressTypePromotion(
      I, AllowPromotionWithoutCommonHeader);
}

unsigned TargetTransformInfo::getCacheLineSize() const {
  return TTIImpl->getCacheLineSize();
}

llvm::Optional<unsigned>
TargetTransformInfo::getCacheSize(CacheLevel Level) const {
Model cache size and associativity in TargetTransformInfo
Summary:
We add the precise cache sizes and associativity for the following Intel
architectures:
- Penryn
- Nehalem
- Westmere
- Sandy Bridge
- Ivy Bridge
- Haswell
- Broadwell
- Skylake
- Kabylake
For several months, Polly has used a performance model for BLAS computations that
derives optimal cache and register tile sizes from cache and latency
information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS", by Tze Meng Low, published in TOMS 2016).
While bootstrapping this model, these target values have been kept in Polly.
However, as our implementation is now rather mature, it seems time to teach
LLVM itself about cache sizes.
Interestingly, L1 and L2 cache sizes are pretty constant across
micro-architectures, hence a set of architecture specific default values
seems like a good start. They can be expanded to more target specific values,
in case certain newer architectures require different values. For now, values
for a set of Intel architectures are provided.
Just as a little teaser, for a simple gemm kernel this model allows us to
improve performance from 1.2s to 0.27s. For gemm kernels with less optimal
memory layouts even larger speedups can be reported.
Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn, sebpop, efriedma, asb
Reviewed By: fhahn, asb
Subscribers: lsaba, asb, pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D37051
llvm-svn: 311647
2017-08-24 11:46:25 +02:00
  return TTIImpl->getCacheSize(Level);
}

llvm::Optional<unsigned>
TargetTransformInfo::getCacheAssociativity(CacheLevel Level) const {
  return TTIImpl->getCacheAssociativity(Level);
}

unsigned TargetTransformInfo::getPrefetchDistance() const {
  return TTIImpl->getPrefetchDistance();
}

unsigned TargetTransformInfo::getMinPrefetchStride(
    unsigned NumMemAccesses, unsigned NumStridedMemAccesses,
    unsigned NumPrefetches, bool HasCall) const {
  return TTIImpl->getMinPrefetchStride(NumMemAccesses, NumStridedMemAccesses,
                                       NumPrefetches, HasCall);
}

unsigned TargetTransformInfo::getMaxPrefetchIterationsAhead() const {
  return TTIImpl->getMaxPrefetchIterationsAhead();
}

bool TargetTransformInfo::enableWritePrefetching() const {
  return TTIImpl->enableWritePrefetching();
}

unsigned TargetTransformInfo::getMaxInterleaveFactor(unsigned VF) const {
  return TTIImpl->getMaxInterleaveFactor(VF);
}

TargetTransformInfo::OperandValueKind
TargetTransformInfo::getOperandInfo(Value *V, OperandValueProperties &OpProps) {
  OperandValueKind OpInfo = OK_AnyValue;
  OpProps = OP_None;

  if (auto *CI = dyn_cast<ConstantInt>(V)) {
    if (CI->getValue().isPowerOf2())
      OpProps = OP_PowerOf2;
    return OK_UniformConstantValue;
  }

  // A broadcast shuffle creates a uniform value.
  // TODO: Add support for non-zero index broadcasts.
  // TODO: Add support for different source vector width.
  if (auto *ShuffleInst = dyn_cast<ShuffleVectorInst>(V))
    if (ShuffleInst->isZeroEltSplat())
      OpInfo = OK_UniformValue;

  const Value *Splat = getSplatValue(V);

  // Check for a splat of a constant or for a non uniform vector of constants
  // and check if the constant(s) are all powers of two.
  if (isa<ConstantVector>(V) || isa<ConstantDataVector>(V)) {
    OpInfo = OK_NonUniformConstantValue;
    if (Splat) {
      OpInfo = OK_UniformConstantValue;
      if (auto *CI = dyn_cast<ConstantInt>(Splat))
        if (CI->getValue().isPowerOf2())
          OpProps = OP_PowerOf2;
    } else if (auto *CDS = dyn_cast<ConstantDataSequential>(V)) {
      OpProps = OP_PowerOf2;
      for (unsigned I = 0, E = CDS->getNumElements(); I != E; ++I) {
        if (auto *CI = dyn_cast<ConstantInt>(CDS->getElementAsConstant(I)))
          if (CI->getValue().isPowerOf2())
            continue;
        OpProps = OP_None;
        break;
      }
    }
  }

  // Check for a splat of a uniform value. This is not loop aware, so return
  // true only for the obviously uniform cases (argument, globalvalue)
  if (Splat && (isa<Argument>(Splat) || isa<GlobalValue>(Splat)))
    OpInfo = OK_UniformValue;

  return OpInfo;
}
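
The element loop in `getOperandInfo` marks a constant vector as `OP_PowerOf2` only if every element is a power of two. A minimal sketch of that rule on plain integers follows. It assumes nothing from LLVM: `classifyConstants` and the local `isPowerOf2` are hypothetical helpers, and `uint64_t` stands in for `APInt`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum OperandProps { OP_None, OP_PowerOf2 };

// True if exactly one bit is set, so zero is not a power of two.
bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// A single element that is not a power of two resets the property to
// OP_None, mirroring the early `break` in the loop above.
OperandProps classifyConstants(const std::vector<uint64_t> &Elts) {
  assert(!Elts.empty() && "expected at least one element");
  for (uint64_t E : Elts)
    if (!isPowerOf2(E))
      return OP_None;
  return OP_PowerOf2;
}
```

Cost models can use this property to price, for example, a division by such a vector as a cheaper shift.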

int TargetTransformInfo::getArithmeticInstrCost(
    unsigned Opcode, Type *Ty, OperandValueKind Opd1Info,
    OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo,
[ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instructions that the
shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
2019-12-08 16:33:24 +01:00
    OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args,
    const Instruction *CxtI) const {
  int Cost = TTIImpl->getArithmeticInstrCost(
      Opcode, Ty, Opd1Info, Opd2Info, Opd1PropInfo, Opd2PropInfo, Args, CxtI);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getShuffleCost(ShuffleKind Kind, VectorType *Ty,
                                        int Index, VectorType *SubTp) const {
  int Cost = TTIImpl->getShuffleCost(Kind, Ty, Index, SubTp);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
                                          const Instruction *I) const {
  assert((I == nullptr || I->getOpcode() == Opcode) &&
         "Opcode should reflect passed instruction.");
  int Cost = TTIImpl->getCastInstrCost(Opcode, Dst, Src, I);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getExtractWithExtendCost(unsigned Opcode, Type *Dst,
                                                  VectorType *VecTy,
                                                  unsigned Index) const {
  int Cost = TTIImpl->getExtractWithExtendCost(Opcode, Dst, VecTy, Index);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getCFInstrCost(unsigned Opcode) const {
  int Cost = TTIImpl->getCFInstrCost(Opcode);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
                                            Type *CondTy,
                                            const Instruction *I) const {
  assert((I == nullptr || I->getOpcode() == Opcode) &&
         "Opcode should reflect passed instruction.");
  int Cost = TTIImpl->getCmpSelInstrCost(Opcode, ValTy, CondTy, I);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getVectorInstrCost(unsigned Opcode, Type *Val,
                                            unsigned Index) const {
  int Cost = TTIImpl->getVectorInstrCost(Opcode, Val, Index);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getMemoryOpCost(unsigned Opcode, Type *Src,
                                         MaybeAlign Alignment,
                                         unsigned AddressSpace,
                                         const Instruction *I) const {
  assert((I == nullptr || I->getOpcode() == Opcode) &&
         "Opcode should reflect passed instruction.");
  int Cost = TTIImpl->getMemoryOpCost(Opcode, Src, Alignment, AddressSpace, I);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
                                               unsigned Alignment,
                                               unsigned AddressSpace) const {
  int Cost =
      TTIImpl->getMaskedMemoryOpCost(Opcode, Src, Alignment, AddressSpace);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
                                                Value *Ptr, bool VariableMask,
                                                unsigned Alignment,
                                                const Instruction *I) const {
  int Cost = TTIImpl->getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,
                                             Alignment, I);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getInterleavedMemoryOpCost(
[LoopVectorize] Teach Loop Vectorizer about interleaved memory accesses.
Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector.
E.g. for (i = 0; i < N; i+=2) {
       a = A[i];         // load of even element
       b = A[i+1];       // load of odd element
       ...               // operations on a, b, c, d
       A[i] = c;         // store of even element
       A[i+1] = d;       // store of odd element
     }
The loads of even and odd elements are identified as an interleave load group, which will be transformed into vectorized IR like:
%wide.vec = load <8 x i32>, <8 x i32>* %ptr
%vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
The stores of even and odd elements are identified as an interleave store group, which will be transformed into vectorized IR like:
%interleaved.vec = shufflevector <4 x i32> %vec.even, %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr
This optimization is currently disabled by default. To try it, add '-enable-interleaved-mem-accesses=true'.
llvm-svn: 239291
2015-06-08 08:39:56 +02:00
    unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
    unsigned Alignment, unsigned AddressSpace, bool UseMaskForCond,
    bool UseMaskForGaps) const {
  int Cost = TTIImpl->getInterleavedMemoryOpCost(
      Opcode, VecTy, Factor, Indices, Alignment, AddressSpace, UseMaskForCond,
      UseMaskForGaps);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
2015-06-08 08:39:56 +02:00
}
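
The interleaved-access commit message above describes de-interleaving a wide load into even/odd vectors and re-interleaving them before a wide store. As a rough illustration only, the shuffle masks from that example could be computed as below. This is a standalone sketch in plain C++, not LLVM code; `stridedMask` and `interleaveMask` are hypothetical helper names introduced here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: mask selecting VF elements with stride Factor,
// starting at lane Start (e.g. the even lanes of a wide vector).
std::vector<int> stridedMask(unsigned Start, unsigned Factor, unsigned VF) {
  assert(Factor > 0 && "stride must be positive");
  std::vector<int> Mask;
  for (unsigned i = 0; i < VF; ++i)
    Mask.push_back(static_cast<int>(Start + i * Factor));
  return Mask;
}

// Hypothetical helper: mask that interleaves NumVecs concatenated vectors
// of VF lanes each, taking lane i of every vector before lane i + 1.
std::vector<int> interleaveMask(unsigned VF, unsigned NumVecs) {
  assert(NumVecs > 0 && "need at least one input vector");
  std::vector<int> Mask;
  for (unsigned i = 0; i < VF; ++i)
    for (unsigned j = 0; j < NumVecs; ++j)
      Mask.push_back(static_cast<int>(j * VF + i));
  return Mask;
}
```

With these definitions, `stridedMask(0, 2, 4)` yields `<0, 2, 4, 6>`, `stridedMask(1, 2, 4)` yields `<1, 3, 5, 7>`, and `interleaveMask(4, 2)` yields `<0, 4, 1, 5, 2, 6, 3, 7>`, matching the masks in the IR of that commit message.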

2015-08-05 20:08:10 +02:00
int TargetTransformInfo::getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
2020-03-11 11:13:11 +01:00
                                               ArrayRef<Type *> Tys,
                                               FastMathFlags FMF,
                                               unsigned ScalarizationCostPassed,
                                               const Instruction *I) const {
2017-03-14 07:35:36 +01:00
  int Cost = TTIImpl->getIntrinsicInstrCost(ID, RetTy, Tys, FMF,
2020-03-11 11:13:11 +01:00
                                            ScalarizationCostPassed, I);
2015-08-05 20:08:10 +02:00
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
2013-01-05 12:43:11 +01:00
}

2015-12-28 21:10:59 +01:00
int TargetTransformInfo::getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
2020-03-11 11:13:11 +01:00
                                               ArrayRef<Value *> Args,
                                               FastMathFlags FMF, unsigned VF,
                                               const Instruction *I) const {
  int Cost = TTIImpl->getIntrinsicInstrCost(ID, RetTy, Args, FMF, VF, I);
2015-12-28 21:10:59 +01:00
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

2015-08-05 20:08:10 +02:00
int TargetTransformInfo::getCallInstrCost(Function *F, Type *RetTy,
                                          ArrayRef<Type *> Tys) const {
  int Cost = TTIImpl->getCallInstrCost(F, RetTy, Tys);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
2015-03-17 20:26:23 +01:00
}

2013-01-05 12:43:11 +01:00
unsigned TargetTransformInfo::getNumberOfParts(Type *Tp) const {
2015-01-31 04:43:40 +01:00
  return TTIImpl->getNumberOfParts(Tp);
2013-01-05 12:43:11 +01:00
}

2015-08-05 20:08:10 +02:00
int TargetTransformInfo::getAddressComputationCost(Type *Tp,
2017-01-05 15:03:41 +01:00
                                                   ScalarEvolution *SE,
                                                   const SCEV *Ptr) const {
  int Cost = TTIImpl->getAddressComputationCost(Tp, SE, Ptr);
2015-08-05 20:08:10 +02:00
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
2013-02-08 15:50:48 +01:00
}
2013-01-05 12:43:11 +01:00

2019-04-30 12:28:50 +02:00
int TargetTransformInfo::getMemcpyCost(const Instruction *I) const {
  int Cost = TTIImpl->getMemcpyCost(I);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

2020-04-17 14:29:31 +02:00
int TargetTransformInfo::getArithmeticReductionCost(unsigned Opcode,
                                                    VectorType *Ty,
2017-07-31 16:19:32 +02:00
                                                    bool IsPairwiseForm) const {
  int Cost = TTIImpl->getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm);
2015-08-05 20:08:10 +02:00
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
2015-01-31 04:43:40 +01:00
}

2020-04-17 14:29:31 +02:00
int TargetTransformInfo::getMinMaxReductionCost(VectorType *Ty,
                                                VectorType *CondTy,
2017-09-08 15:49:36 +02:00
                                                bool IsPairwiseForm,
                                                bool IsUnsigned) const {
  int Cost =
      TTIImpl->getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm, IsUnsigned);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

2015-01-31 04:43:40 +01:00
unsigned
TargetTransformInfo::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) const {
  return TTIImpl->getCostOfKeepingLiveOverCall(Tys);
Costmodel: Add support for horizontal vector reductions
Upcoming SLP vectorization improvements will want to be able to estimate costs
of horizontal reductions. Add infrastructure to support this.
We model reductions as a series of (shufflevector,add) tuples ultimately
followed by an extractelement. For example, for an add-reduction of <4 x float>
we could generate the following sequence:
(v0, v1, v2, v3)
\ \ / /
\ \ /
+ +
(v0+v2, v1+v3, undef, undef)
\ /
((v0+v2) + (v1+v3), undef, undef)
%rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
<4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
%bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
%rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
<4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
%r = extractelement <4 x float> %bin.rdx8, i32 0
This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
that will allow clients to ask for the cost of such a reduction (as backends
might generate more efficient code than the cost of the individual instructions
summed up). This interface is exercised by the CostModel analysis pass which
looks for reduction patterns like the one above - starting at extractelements -
and if it sees a matching sequence will call the cost model interface.
We will also support a second form of pairwise reduction that is well supported
on common architectures (haddps, vpadd, faddp).
(v0, v1, v2, v3)
\ / \ /
(v0+v1, v2+v3, undef, undef)
\ /
((v0+v1)+(v2+v3), undef, undef, undef)
%rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
<4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
%rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
<4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
%bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
%rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
<4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
%rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
<4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
%r = extractelement <4 x float> %bin.rdx.1, i32 0
llvm-svn: 190876
2013-09-17 20:06:50 +02:00
}
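
The horizontal-reduction commit message above models two reduction shapes: one that splits the vector in half at every level, and a pairwise one that adds adjacent lanes. A minimal scalar sketch of the two lane orders follows, assuming a power-of-two number of lanes. The function names `reduceSplitting` and `reducePairwise` are hypothetical, and this only illustrates the data flow; it is not how the cost model itself is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splitting form: at each level, add the upper half of the live lanes onto
// the lower half, e.g. (v0+v2, v1+v3) and then (v0+v2)+(v1+v3).
float reduceSplitting(std::vector<float> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 &&
         "sketch assumes a power-of-two lane count");
  for (std::size_t Width = V.size() / 2; Width >= 1; Width /= 2)
    for (std::size_t i = 0; i < Width; ++i)
      V[i] += V[i + Width];
  return V[0];
}

// Pairwise form: at each level, add adjacent lanes,
// e.g. (v0+v1, v2+v3) and then (v0+v1)+(v2+v3).
float reducePairwise(std::vector<float> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 &&
         "sketch assumes a power-of-two lane count");
  for (std::size_t Width = V.size() / 2; Width >= 1; Width /= 2)
    for (std::size_t i = 0; i < Width; ++i)
      V[i] = V[2 * i] + V[2 * i + 1];
  return V[0];
}
```

For `{1, 2, 3, 4}` both forms return 10. They differ only in which lanes are combined at each level, which is the distinction the `IsPairwiseForm` flag lets a target cost separately.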
2015-01-31 04:43:40 +01:00
bool TargetTransformInfo::getTgtMemIntrinsic(IntrinsicInst *Inst,
                                             MemIntrinsicInfo &Info) const {
  return TTIImpl->getTgtMemIntrinsic(Inst, Info);
2014-08-05 14:30:34 +02:00
}

2017-06-06 18:45:25 +02:00
unsigned TargetTransformInfo::getAtomicMemIntrinsicMaxElementSize() const {
  return TTIImpl->getAtomicMemIntrinsicMaxElementSize();
}

2015-01-26 23:51:15 +01:00
Value *TargetTransformInfo::getOrCreateResultFromMemIntrinsic(
    IntrinsicInst *Inst, Type *ExpectedType) const {
2015-01-31 04:43:40 +01:00
  return TTIImpl->getOrCreateResultFromMemIntrinsic(Inst, ExpectedType);
2015-01-26 23:51:15 +01:00
}

2020-04-15 14:43:26 +02:00
Type *TargetTransformInfo::getMemcpyLoopLoweringType(
    LLVMContext &Context, Value *Length, unsigned SrcAddrSpace,
    unsigned DestAddrSpace, unsigned SrcAlign, unsigned DestAlign) const {
2020-02-14 19:22:53 +01:00
  return TTIImpl->getMemcpyLoopLoweringType(Context, Length, SrcAddrSpace,
2020-04-15 14:43:26 +02:00
                                            DestAddrSpace, SrcAlign, DestAlign);
2017-07-07 04:00:06 +02:00
}

void TargetTransformInfo::getMemcpyLoopResidualLoweringType(
    SmallVectorImpl<Type *> &OpsOut, LLVMContext &Context,
2020-04-15 14:43:26 +02:00
    unsigned RemainingBytes, unsigned SrcAddrSpace, unsigned DestAddrSpace,
2020-02-14 19:22:53 +01:00
    unsigned SrcAlign, unsigned DestAlign) const {
2017-07-07 04:00:06 +02:00
  TTIImpl->getMemcpyLoopResidualLoweringType(OpsOut, Context, RemainingBytes,
2020-02-14 19:22:53 +01:00
                                             SrcAddrSpace, DestAddrSpace,
2017-07-07 04:00:06 +02:00
                                             SrcAlign, DestAlign);
}

2015-07-30 00:09:48 +02:00
bool TargetTransformInfo::areInlineCompatible(const Function *Caller,
                                              const Function *Callee) const {
  return TTIImpl->areInlineCompatible(Caller, Callee);
2018-03-26 15:10:09 +02:00
}

2019-01-16 06:15:31 +01:00
bool TargetTransformInfo::areFunctionArgsABICompatible(
    const Function *Caller, const Function *Callee,
    SmallPtrSetImpl<Argument *> &Args) const {
  return TTIImpl->areFunctionArgsABICompatible(Caller, Callee, Args);
}

2018-03-26 15:10:09 +02:00
bool TargetTransformInfo::isIndexedLoadLegal(MemIndexedMode Mode,
                                             Type *Ty) const {
  return TTIImpl->isIndexedLoadLegal(Mode, Ty);
}

bool TargetTransformInfo::isIndexedStoreLegal(MemIndexedMode Mode,
                                              Type *Ty) const {
  return TTIImpl->isIndexedStoreLegal(Mode, Ty);
2015-07-02 03:11:47 +02:00
}

2016-10-03 12:31:34 +02:00
unsigned TargetTransformInfo::getLoadStoreVecRegBitWidth(unsigned AS) const {
  return TTIImpl->getLoadStoreVecRegBitWidth(AS);
}

bool TargetTransformInfo::isLegalToVectorizeLoad(LoadInst *LI) const {
  return TTIImpl->isLegalToVectorizeLoad(LI);
}

bool TargetTransformInfo::isLegalToVectorizeStore(StoreInst *SI) const {
  return TTIImpl->isLegalToVectorizeStore(SI);
}

bool TargetTransformInfo::isLegalToVectorizeLoadChain(
    unsigned ChainSizeInBytes, unsigned Alignment, unsigned AddrSpace) const {
  return TTIImpl->isLegalToVectorizeLoadChain(ChainSizeInBytes, Alignment,
                                              AddrSpace);
}

bool TargetTransformInfo::isLegalToVectorizeStoreChain(
    unsigned ChainSizeInBytes, unsigned Alignment, unsigned AddrSpace) const {
  return TTIImpl->isLegalToVectorizeStoreChain(ChainSizeInBytes, Alignment,
                                               AddrSpace);
}

unsigned TargetTransformInfo::getLoadVectorFactor(unsigned VF,
                                                  unsigned LoadSize,
                                                  unsigned ChainSizeInBytes,
                                                  VectorType *VecTy) const {
  return TTIImpl->getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);
}

unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,
                                                   unsigned StoreSize,
                                                   unsigned ChainSizeInBytes,
                                                   VectorType *VecTy) const {
  return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
}

2020-04-15 14:43:26 +02:00
bool TargetTransformInfo::useReductionIntrinsic(unsigned Opcode, Type *Ty,
                                                ReductionFlags Flags) const {
2017-05-09 12:43:25 +02:00
  return TTIImpl->useReductionIntrinsic(Opcode, Ty, Flags);
}

2017-05-10 11:42:49 +02:00
bool TargetTransformInfo::shouldExpandReduction(const IntrinsicInst *II) const {
  return TTIImpl->shouldExpandReduction(II);
}
2017-05-09 12:43:25 +02:00

2019-06-18 01:20:29 +02:00
unsigned TargetTransformInfo::getGISelRematGlobalCost() const {
  return TTIImpl->getGISelRematGlobalCost();
}

2017-09-09 00:29:17 +02:00
int TargetTransformInfo::getInstructionLatency(const Instruction *I) const {
  return TTIImpl->getInstructionLatency(I);
}

static bool matchPairwiseShuffleMask(ShuffleVectorInst *SI, bool IsLeft,
                                     unsigned Level) {
  // We don't need a shuffle if we just want to have element 0 in position 0 of
  // the vector.
  if (!SI && Level == 0 && IsLeft)
    return true;
  else if (!SI)
    return false;

2020-04-09 21:19:23 +02:00
  SmallVector<int, 32> Mask(SI->getType()->getNumElements(), -1);
2017-09-09 00:29:17 +02:00

  // Build a mask of 0, 2, ... (left) or 1, 3, ... (right) depending on whether
  // we look at the left or right side.
  for (unsigned i = 0, e = (1 << Level), val = !IsLeft; i != e; ++i, val += 2)
    Mask[i] = val;

2020-03-31 22:08:59 +02:00
  ArrayRef<int> ActualMask = SI->getShuffleMask();
2017-09-09 00:29:17 +02:00
  return Mask == ActualMask;
}
|
|
|
|
|
|
|
|
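// Illustrative note (not part of the LLVM sources): a minimal standalone
// sketch of the mask that matchPairwiseShuffleMask compares against, assuming
// plain std::vector in place of SmallVector/ArrayRef. The helper name
// expectedPairwiseMask and its NumElts parameter are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the mask built in matchPairwiseShuffleMask.
// -1 marks an undef lane, as in LLVM shuffle masks.
std::vector<int> expectedPairwiseMask(unsigned NumElts, bool IsLeft,
                                      unsigned Level) {
  assert((1u << Level) <= NumElts && "level does not fit the vector width");
  std::vector<int> Mask(NumElts, -1);
  unsigned Val = IsLeft ? 0 : 1; // left side starts at lane 0, right at lane 1
  for (unsigned I = 0, E = 1u << Level; I != E; ++I, Val += 2)
    Mask[I] = static_cast<int>(Val);
  return Mask;
}
```

// For a 4-lane vector at level 1 this gives <0, 2, undef, undef> on the left
// side and <1, 3, undef, undef> on the right side.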
namespace {
/// Kind of the reduction data.
enum ReductionKind {
  RK_None,           ///< Not a reduction.
  RK_Arithmetic,     ///< Binary reduction data.
  RK_MinMax,         ///< Min/max reduction data.
  RK_UnsignedMinMax, ///< Unsigned min/max reduction data.
};
/// Contains opcode + LHS/RHS parts of the reduction operations.
struct ReductionData {
  ReductionData() = delete;
  ReductionData(ReductionKind Kind, unsigned Opcode, Value *LHS, Value *RHS)
      : Opcode(Opcode), LHS(LHS), RHS(RHS), Kind(Kind) {
    assert(Kind != RK_None && "expected binary or min/max reduction only.");
  }
  unsigned Opcode = 0;
  Value *LHS = nullptr;
  Value *RHS = nullptr;
  ReductionKind Kind = RK_None;
  bool hasSameData(ReductionData &RD) const {
    return Kind == RD.Kind && Opcode == RD.Opcode;
  }
};
} // namespace

static Optional<ReductionData> getReductionData(Instruction *I) {
  Value *L, *R;
  if (m_BinOp(m_Value(L), m_Value(R)).match(I))
    return ReductionData(RK_Arithmetic, I->getOpcode(), L, R);
  if (auto *SI = dyn_cast<SelectInst>(I)) {
    if (m_SMin(m_Value(L), m_Value(R)).match(SI) ||
        m_SMax(m_Value(L), m_Value(R)).match(SI) ||
        m_OrdFMin(m_Value(L), m_Value(R)).match(SI) ||
        m_OrdFMax(m_Value(L), m_Value(R)).match(SI) ||
        m_UnordFMin(m_Value(L), m_Value(R)).match(SI) ||
        m_UnordFMax(m_Value(L), m_Value(R)).match(SI)) {
      auto *CI = cast<CmpInst>(SI->getCondition());
      return ReductionData(RK_MinMax, CI->getOpcode(), L, R);
    }
    if (m_UMin(m_Value(L), m_Value(R)).match(SI) ||
        m_UMax(m_Value(L), m_Value(R)).match(SI)) {
      auto *CI = cast<CmpInst>(SI->getCondition());
      return ReductionData(RK_UnsignedMinMax, CI->getOpcode(), L, R);
    }
  }
  return llvm::None;
}

static ReductionKind matchPairwiseReductionAtLevel(Instruction *I,
                                                   unsigned Level,
                                                   unsigned NumLevels) {
  // Match one level of pairwise operations.
  // %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
  //       <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
  // %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
  //       <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  // %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
  if (!I)
    return RK_None;

  assert(I->getType()->isVectorTy() && "Expecting a vector type");

  Optional<ReductionData> RD = getReductionData(I);
  if (!RD)
    return RK_None;

  ShuffleVectorInst *LS = dyn_cast<ShuffleVectorInst>(RD->LHS);
  if (!LS && Level)
    return RK_None;
  ShuffleVectorInst *RS = dyn_cast<ShuffleVectorInst>(RD->RHS);
  if (!RS && Level)
    return RK_None;

  // On level 0 we can omit one shufflevector instruction.
  if (!Level && !RS && !LS)
    return RK_None;

  // Shuffle inputs must match.
  Value *NextLevelOpL = LS ? LS->getOperand(0) : nullptr;
  Value *NextLevelOpR = RS ? RS->getOperand(0) : nullptr;
  Value *NextLevelOp = nullptr;
  if (NextLevelOpR && NextLevelOpL) {
    // If we have two shuffles their operands must match.
    if (NextLevelOpL != NextLevelOpR)
      return RK_None;

    NextLevelOp = NextLevelOpL;
  } else if (Level == 0 && (NextLevelOpR || NextLevelOpL)) {
    // On the first level we can omit the shufflevector <0, undef,...>. So the
    // input to the other shufflevector <1, undef> must match with one of the
    // inputs to the current binary operation.
    // Example:
    //  %NextLevelOpL = shufflevector %R, <1, undef ...>
    //  %BinOp        = fadd          %NextLevelOpL, %R
    if (NextLevelOpL && NextLevelOpL != RD->RHS)
      return RK_None;
    else if (NextLevelOpR && NextLevelOpR != RD->LHS)
      return RK_None;

    NextLevelOp = NextLevelOpL ? RD->RHS : RD->LHS;
  } else
    return RK_None;

  // Check that the next level's binary operation exists and matches with the
  // current one.
  if (Level + 1 != NumLevels) {
    Optional<ReductionData> NextLevelRD =
        getReductionData(cast<Instruction>(NextLevelOp));
    if (!NextLevelRD || !RD->hasSameData(*NextLevelRD))
      return RK_None;
  }

  // Shuffle mask for pairwise operation must match.
  if (matchPairwiseShuffleMask(LS, /*IsLeft=*/true, Level)) {
    if (!matchPairwiseShuffleMask(RS, /*IsLeft=*/false, Level))
      return RK_None;
  } else if (matchPairwiseShuffleMask(RS, /*IsLeft=*/true, Level)) {
    if (!matchPairwiseShuffleMask(LS, /*IsLeft=*/false, Level))
      return RK_None;
  } else {
    return RK_None;
  }

  if (++Level == NumLevels)
    return RD->Kind;

  // Match next level.
  return matchPairwiseReductionAtLevel(cast<Instruction>(NextLevelOp), Level,
                                       NumLevels);
}

static ReductionKind matchPairwiseReduction(const ExtractElementInst *ReduxRoot,
                                            unsigned &Opcode,
                                            VectorType *&Ty) {
  if (!EnableReduxCost)
    return RK_None;

  // Need to extract the first element.
  ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
  unsigned Idx = ~0u;
  if (CI)
    Idx = CI->getZExtValue();
  if (Idx != 0)
    return RK_None;

  auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
  if (!RdxStart)
    return RK_None;
  Optional<ReductionData> RD = getReductionData(RdxStart);
  if (!RD)
    return RK_None;

  auto *VecTy = cast<VectorType>(RdxStart->getType());
  unsigned NumVecElems = VecTy->getNumElements();
  if (!isPowerOf2_32(NumVecElems))
    return RK_None;

  // We look for a sequence of shuffle,shuffle,add triples like the following
  // that builds a pairwise reduction tree.
  //
  // (X0, X1, X2, X3)
  // (X0 + X1, X2 + X3, undef, undef)
  // ((X0 + X1) + (X2 + X3), undef, undef, undef)
  //
  // %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
  //       <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
  // %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
  //       <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  // %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
  // %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
  //       <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  // %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
  //       <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  // %bin.rdx8 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
  // %r = extractelement <4 x float> %bin.rdx8, i32 0
  if (matchPairwiseReductionAtLevel(RdxStart, 0, Log2_32(NumVecElems)) ==
      RK_None)
    return RK_None;

  Opcode = RD->Opcode;
  Ty = VecTy;

  return RD->Kind;
}

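// Illustrative note (not part of the LLVM sources): a minimal sketch of the
// pairwise reduction tree described in the comment above, assuming integer
// addition and plain std::vector. The helper name pairwiseReduceAdd is
// hypothetical. Unlike the IR pattern, which keeps the vector width and pads
// with undef lanes, this sketch simply halves the vector at each level.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reduce a power-of-two sized vector by repeatedly
// adding even-indexed lanes to their odd-indexed neighbours.
int pairwiseReduceAdd(std::vector<int> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 &&
         "expects a power-of-two number of lanes");
  while (V.size() > 1) {
    std::vector<int> Next(V.size() / 2);
    for (std::size_t I = 0; I != Next.size(); ++I)
      Next[I] = V[2 * I] + V[2 * I + 1]; // lanes <0,2,...> + lanes <1,3,...>
    V = Next;
  }
  return V[0]; // plays the role of extractelement ..., i32 0
}
```

// With four lanes this computes (X0 + X1) + (X2 + X3), i.e. Log2(4) = 2
// levels, which is the level count passed to matchPairwiseReductionAtLevel.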
static std::pair<Value *, ShuffleVectorInst *>
getShuffleAndOtherOprd(Value *L, Value *R) {
  ShuffleVectorInst *S = nullptr;

  if ((S = dyn_cast<ShuffleVectorInst>(L)))
    return std::make_pair(R, S);

  S = dyn_cast<ShuffleVectorInst>(R);
  return std::make_pair(L, S);
}

static ReductionKind
matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,
                              unsigned &Opcode, VectorType *&Ty) {
  if (!EnableReduxCost)
    return RK_None;

  // Need to extract the first element.
  ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
  unsigned Idx = ~0u;
  if (CI)
    Idx = CI->getZExtValue();
  if (Idx != 0)
    return RK_None;

  auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
  if (!RdxStart)
    return RK_None;
  Optional<ReductionData> RD = getReductionData(RdxStart);
  if (!RD)
    return RK_None;

  auto *VecTy = cast<VectorType>(ReduxRoot->getOperand(0)->getType());
  unsigned NumVecElems = VecTy->getNumElements();
  if (!isPowerOf2_32(NumVecElems))
    return RK_None;

  // We look for a sequence of shuffles and adds like the following matching one
  // fadd, shuffle vector pair at a time.
  //
  // %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
  //                           <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  // %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
  // %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
  //                          <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  // %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
  // %r = extractelement <4 x float> %bin.rdx8, i32 0

  unsigned MaskStart = 1;
  Instruction *RdxOp = RdxStart;
  SmallVector<int, 32> ShuffleMask(NumVecElems, 0);
  unsigned NumVecElemsRemain = NumVecElems;
  while (NumVecElemsRemain - 1) {
    // Check for the right reduction operation.
    if (!RdxOp)
      return RK_None;
    Optional<ReductionData> RDLevel = getReductionData(RdxOp);
    if (!RDLevel || !RDLevel->hasSameData(*RD))
      return RK_None;

    Value *NextRdxOp;
    ShuffleVectorInst *Shuffle;
    std::tie(NextRdxOp, Shuffle) =
        getShuffleAndOtherOprd(RDLevel->LHS, RDLevel->RHS);

    // Check that the current reduction operation and the shuffle use the same
    // value.
    if (Shuffle == nullptr)
      return RK_None;
    if (Shuffle->getOperand(0) != NextRdxOp)
      return RK_None;

    // Check that the shuffle mask matches.
    for (unsigned j = 0; j != MaskStart; ++j)
      ShuffleMask[j] = MaskStart + j;
    // Fill the rest of the mask with -1 for undef.
    std::fill(&ShuffleMask[MaskStart], ShuffleMask.end(), -1);

    ArrayRef<int> Mask = Shuffle->getShuffleMask();
    if (ShuffleMask != Mask)
      return RK_None;

    RdxOp = dyn_cast<Instruction>(NextRdxOp);
    NumVecElemsRemain /= 2;
    MaskStart *= 2;
  }

  Opcode = RD->Opcode;
  Ty = VecTy;
  return RD->Kind;
}

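// Illustrative note (not part of the LLVM sources): a minimal sketch of the
// sequence of shuffle masks that the loop in matchVectorSplittingReduction
// expects, assuming plain std::vector. The helper name expectedSplitMasks is
// hypothetical. Masks are listed in the order the loop visits them, starting
// from the operation that feeds the extractelement.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper mirroring the MaskStart / NumVecElemsRemain bookkeeping.
std::vector<std::vector<int>> expectedSplitMasks(unsigned NumElts) {
  assert(NumElts != 0 && (NumElts & (NumElts - 1)) == 0 &&
         "expects a power-of-two number of lanes");
  std::vector<std::vector<int>> Masks;
  for (unsigned MaskStart = 1, Remain = NumElts; Remain != 1;
       Remain /= 2, MaskStart *= 2) {
    std::vector<int> Mask(NumElts, -1); // -1 marks an undef lane
    for (unsigned J = 0; J != MaskStart; ++J)
      Mask[J] = static_cast<int>(MaskStart + J);
    Masks.push_back(Mask);
  }
  return Masks;
}
```

// For four lanes this yields <1, undef, undef, undef> followed by
// <2, 3, undef, undef>, matching %rdx.shuf7 and %rdx.shuf in the comment.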
int TargetTransformInfo::getInstructionThroughput(const Instruction *I) const {
  switch (I->getOpcode()) {
  case Instruction::GetElementPtr:
    return getUserCost(I);

  case Instruction::Ret:
  case Instruction::PHI:
  case Instruction::Br: {
    return getCFInstrCost(I->getOpcode());
  }
  case Instruction::Add:
  case Instruction::FAdd:
  case Instruction::Sub:
  case Instruction::FSub:
  case Instruction::Mul:
  case Instruction::FMul:
  case Instruction::UDiv:
  case Instruction::SDiv:
  case Instruction::FDiv:
  case Instruction::URem:
  case Instruction::SRem:
  case Instruction::FRem:
  case Instruction::Shl:
  case Instruction::LShr:
  case Instruction::AShr:
  case Instruction::And:
  case Instruction::Or:
  case Instruction::Xor: {
    TargetTransformInfo::OperandValueKind Op1VK, Op2VK;
    TargetTransformInfo::OperandValueProperties Op1VP, Op2VP;
    Op1VK = getOperandInfo(I->getOperand(0), Op1VP);
    Op2VK = getOperandInfo(I->getOperand(1), Op2VP);
    SmallVector<const Value *, 2> Operands(I->operand_values());
    return getArithmeticInstrCost(I->getOpcode(), I->getType(), Op1VK, Op2VK,
[ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
2019-12-08 16:33:24 +01:00
                                  Op1VP, Op2VP, Operands, I);
  }
  case Instruction::FNeg: {
    TargetTransformInfo::OperandValueKind Op1VK, Op2VK;
    TargetTransformInfo::OperandValueProperties Op1VP, Op2VP;
    Op1VK = getOperandInfo(I->getOperand(0), Op1VP);
    Op2VK = OK_AnyValue;
    Op2VP = OP_None;
    SmallVector<const Value *, 2> Operands(I->operand_values());
    return getArithmeticInstrCost(I->getOpcode(), I->getType(), Op1VK, Op2VK,
                                  Op1VP, Op2VP, Operands, I);
  }
  case Instruction::Select: {
    const SelectInst *SI = cast<SelectInst>(I);
    Type *CondTy = SI->getCondition()->getType();
    return getCmpSelInstrCost(I->getOpcode(), I->getType(), CondTy, I);
  }
  case Instruction::ICmp:
  case Instruction::FCmp: {
    Type *ValTy = I->getOperand(0)->getType();
    return getCmpSelInstrCost(I->getOpcode(), ValTy, I->getType(), I);
  }
  case Instruction::Store: {
    const StoreInst *SI = cast<StoreInst>(I);
    Type *ValTy = SI->getValueOperand()->getType();
    return getMemoryOpCost(I->getOpcode(), ValTy,
                           MaybeAlign(SI->getAlignment()),
                           SI->getPointerAddressSpace(), I);
  }
  case Instruction::Load: {
    const LoadInst *LI = cast<LoadInst>(I);
    return getMemoryOpCost(I->getOpcode(), I->getType(),
                           MaybeAlign(LI->getAlignment()),
                           LI->getPointerAddressSpace(), I);
  }
  case Instruction::ZExt:
  case Instruction::SExt:
  case Instruction::FPToUI:
  case Instruction::FPToSI:
  case Instruction::FPExt:
  case Instruction::PtrToInt:
  case Instruction::IntToPtr:
  case Instruction::SIToFP:
  case Instruction::UIToFP:
  case Instruction::Trunc:
  case Instruction::FPTrunc:
  case Instruction::BitCast:
  case Instruction::AddrSpaceCast: {
    Type *SrcTy = I->getOperand(0)->getType();
    return getCastInstrCost(I->getOpcode(), I->getType(), SrcTy, I);
  }
  case Instruction::ExtractElement: {
    const ExtractElementInst *EEI = cast<ExtractElementInst>(I);
    ConstantInt *CI = dyn_cast<ConstantInt>(I->getOperand(1));
    unsigned Idx = -1;
    if (CI)
      Idx = CI->getZExtValue();

    // Try to match a reduction sequence (series of shufflevector and vector
    // adds followed by an extractelement).
    unsigned ReduxOpCode;
    VectorType *ReduxType;

    switch (matchVectorSplittingReduction(EEI, ReduxOpCode, ReduxType)) {
    case RK_Arithmetic:
      return getArithmeticReductionCost(ReduxOpCode, ReduxType,
                                        /*IsPairwiseForm=*/false);
    case RK_MinMax:
      return getMinMaxReductionCost(
          ReduxType, cast<VectorType>(CmpInst::makeCmpResultType(ReduxType)),
          /*IsPairwiseForm=*/false, /*IsUnsigned=*/false);
    case RK_UnsignedMinMax:
      return getMinMaxReductionCost(
          ReduxType, cast<VectorType>(CmpInst::makeCmpResultType(ReduxType)),
          /*IsPairwiseForm=*/false, /*IsUnsigned=*/true);
    case RK_None:
      break;
    }

    switch (matchPairwiseReduction(EEI, ReduxOpCode, ReduxType)) {
    case RK_Arithmetic:
      return getArithmeticReductionCost(ReduxOpCode, ReduxType,
                                        /*IsPairwiseForm=*/true);
    case RK_MinMax:
      return getMinMaxReductionCost(
          ReduxType, cast<VectorType>(CmpInst::makeCmpResultType(ReduxType)),
          /*IsPairwiseForm=*/true, /*IsUnsigned=*/false);
    case RK_UnsignedMinMax:
      return getMinMaxReductionCost(
          ReduxType, cast<VectorType>(CmpInst::makeCmpResultType(ReduxType)),
          /*IsPairwiseForm=*/true, /*IsUnsigned=*/true);
    case RK_None:
      break;
    }

    return getVectorInstrCost(I->getOpcode(), EEI->getOperand(0)->getType(),
                              Idx);
  }
  case Instruction::InsertElement: {
    const InsertElementInst *IE = cast<InsertElementInst>(I);
    ConstantInt *CI = dyn_cast<ConstantInt>(IE->getOperand(2));
    unsigned Idx = -1;
    if (CI)
      Idx = CI->getZExtValue();
    return getVectorInstrCost(I->getOpcode(), IE->getType(), Idx);
  }
[CostModel] Model all `extractvalue`s as free.
Summary:
As discussed in https://reviews.llvm.org/D65148#1606412,
`extractvalue` don't actually generate any code,
so we should treat them as free.
Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen
Reviewed By: jmolloy
Subscribers: javed.absar, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66098
llvm-svn: 370339
2019-08-29 13:50:30 +02:00
  case Instruction::ExtractValue:
    return 0; // Model all ExtractValue nodes as free.
  case Instruction::ShuffleVector: {
    const ShuffleVectorInst *Shuffle = cast<ShuffleVectorInst>(I);
    auto *Ty = cast<VectorType>(Shuffle->getType());
    auto *SrcTy = cast<VectorType>(Shuffle->getOperand(0)->getType());

    // TODO: Identify and add costs for insert subvector, etc.
    int SubIndex;
    if (Shuffle->isExtractSubvectorMask(SubIndex))
      return TTIImpl->getShuffleCost(SK_ExtractSubvector, SrcTy, SubIndex, Ty);

    if (Shuffle->changesLength())
      return -1;

    if (Shuffle->isIdentity())
      return 0;

    if (Shuffle->isReverse())
      return TTIImpl->getShuffleCost(SK_Reverse, Ty, 0, nullptr);

    if (Shuffle->isSelect())
      return TTIImpl->getShuffleCost(SK_Select, Ty, 0, nullptr);

    if (Shuffle->isTranspose())
      return TTIImpl->getShuffleCost(SK_Transpose, Ty, 0, nullptr);

    if (Shuffle->isZeroEltSplat())
      return TTIImpl->getShuffleCost(SK_Broadcast, Ty, 0, nullptr);

    if (Shuffle->isSingleSource())
      return TTIImpl->getShuffleCost(SK_PermuteSingleSrc, Ty, 0, nullptr);

    return TTIImpl->getShuffleCost(SK_PermuteTwoSrc, Ty, 0, nullptr);
  }
  case Instruction::Call:
    if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
      SmallVector<Value *, 4> Args(II->arg_operands());

      FastMathFlags FMF;
      if (auto *FPMO = dyn_cast<FPMathOperator>(II))
        FMF = FPMO->getFastMathFlags();

      return getIntrinsicInstrCost(II->getIntrinsicID(), II->getType(), Args,
                                   FMF, 1, II);
    }
    return -1;
  default:
    // We don't have any information on this instruction.
    return -1;
  }
}

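// Illustrative note (not part of the LLVM sources): a minimal sketch of how
// the ShuffleVector case above picks the first matching shuffle kind. It is a
// simplification that assumes a single-source, length-preserving mask and
// covers only the identity, reverse and zero-element-splat checks; the helper
// name classifySingleSourceMask and the returned strings are hypothetical and
// do not reproduce the ShuffleVectorInst predicates exactly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical single-source classifier; -1 marks an undef lane.
std::string classifySingleSourceMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  bool Identity = true, Reverse = true, Splat = true;
  for (int I = 0; I != N; ++I) {
    if (Mask[I] < 0)
      continue; // an undef lane is compatible with every pattern
    Identity = Identity && Mask[I] == I;
    Reverse = Reverse && Mask[I] == N - 1 - I;
    Splat = Splat && Mask[I] == 0;
  }
  // Same priority order as the checks in the switch case above.
  if (Identity)
    return "identity";
  if (Reverse)
    return "reverse";
  if (Splat)
    return "broadcast";
  return "permute-single-src";
}
```

// The order of the checks matters: a mask that satisfies several patterns is
// costed as the first one tested, so an identity shuffle is treated as free
// before any of the more expensive kinds are considered.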
TargetTransformInfo::Concept::~Concept() {}

TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}

TargetIRAnalysis::TargetIRAnalysis(
    std::function<Result(const Function &)> TTICallback)
    : TTICallback(std::move(TTICallback)) {}

TargetIRAnalysis::Result TargetIRAnalysis::run(const Function &F,
                                               FunctionAnalysisManager &) {
  return TTICallback(F);
}

AnalysisKey TargetIRAnalysis::Key;

TargetIRAnalysis::Result TargetIRAnalysis::getDefaultTTI(const Function &F) {
  return Result(F.getParent()->getDataLayout());
}

// Register the basic pass.
INITIALIZE_PASS(TargetTransformInfoWrapperPass, "tti",
                "Target Transform Information", false, true)
char TargetTransformInfoWrapperPass::ID = 0;

[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions were not implemented in all places
because the chaining-based delegation meant nothing checked for
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is a *much* more natural fit for
the new pass manager. This will lay the groundwork for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 04:43:40 +01:00
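The type-erased design with value semantics described in the commit message above can be sketched in isolation. This is only an illustrative sketch, not the actual TTI code: `CostInfo`, `getAddCost`, `GenericCosts`, and `FancyTargetCosts` are hypothetical names, and the nested `Concept`/`Model` pair is just the usual shape of this idiom. The wrapper is copied and moved like a value, while the concrete implementation it holds is hidden behind a private virtual interface.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a cost-model interface; not the real TTI API.
class CostInfo {
  // Private abstract interface: clients never see or derive from it.
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getAddCost() const = 0;
    virtual std::unique_ptr<Concept> clone() const = 0;
  };

  // Adapts any type T that provides getAddCost() to the Concept interface.
  template <typename T> struct Model final : Concept {
    explicit Model(T Impl) : Impl(std::move(Impl)) {}
    unsigned getAddCost() const override { return Impl.getAddCost(); }
    std::unique_ptr<Concept> clone() const override {
      return std::unique_ptr<Concept>(new Model(Impl));
    }
    T Impl;
  };

  std::unique_ptr<Concept> Impl;

public:
  // Build the wrapper from any implementation type; T is erased here.
  template <typename T>
  CostInfo(T Impl) : Impl(new Model<T>(std::move(Impl))) {}

  // Value semantics: copying the wrapper clones the hidden implementation.
  CostInfo(const CostInfo &Other)
      : Impl(Other.Impl ? Other.Impl->clone() : nullptr) {}
  CostInfo(CostInfo &&) = default;
  CostInfo &operator=(CostInfo Other) {
    Impl = std::move(Other.Impl);
    return *this;
  }

  // Forwarding function: every query is repeated once in Concept, once in
  // Model, and once here, which is where the verbosity comes from.
  unsigned getAddCost() const {
    assert(Impl && "Use of a moved-from CostInfo");
    return Impl->getAddCost();
  }
};

// Two unrelated implementations; neither inherits from a common base class.
struct GenericCosts {
  unsigned getAddCost() const { return 1; }
};
struct FancyTargetCosts {
  unsigned Scale;
  unsigned getAddCost() const { return 2 * Scale; }
};
```

The forwarding in `getAddCost` illustrates the verbosity the message complains about: each added query has to be written in three places.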
|
|
|
void TargetTransformInfoWrapperPass::anchor() {}
|
2013-01-05 12:43:11 +01:00
|
|
|
|
2015-01-31 04:43:40 +01:00
|
|
|
TargetTransformInfoWrapperPass::TargetTransformInfoWrapperPass()
|
2015-02-01 13:26:09 +01:00
|
|
|
: ImmutablePass(ID) {
|
2015-01-31 04:43:40 +01:00
|
|
|
initializeTargetTransformInfoWrapperPassPass(
|
|
|
|
*PassRegistry::getPassRegistry());
|
|
|
|
}
|
|
|
|
|
|
|
|
TargetTransformInfoWrapperPass::TargetTransformInfoWrapperPass(
|
2015-02-01 13:26:09 +01:00
|
|
|
TargetIRAnalysis TIRA)
|
|
|
|
: ImmutablePass(ID), TIRA(std::move(TIRA)) {
|
2015-01-31 04:43:40 +01:00
|
|
|
initializeTargetTransformInfoWrapperPassPass(
|
|
|
|
*PassRegistry::getPassRegistry());
|
|
|
|
}
|
2013-01-05 12:43:11 +01:00
|
|
|
|
2015-09-17 01:38:13 +02:00
|
|
|
TargetTransformInfo &TargetTransformInfoWrapperPass::getTTI(const Function &F) {
|
2016-08-09 02:28:15 +02:00
|
|
|
FunctionAnalysisManager DummyFAM;
|
2016-06-17 02:11:01 +02:00
|
|
|
TTI = TIRA.run(F, DummyFAM);
|
2015-02-01 13:26:09 +01:00
|
|
|
return *TTI;
|
|
|
|
}
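One consequence of `getTTI` as written above is that every call reruns the analysis and overwrites the single stored result, so a reference obtained for one function refers to the same storage after a later call for another function. The following is a minimal standalone model of that behaviour. All `Fake*` types are hypothetical stand-ins rather than LLVM classes, and a plain member replaces the dereferenced `TTI` member to keep the sketch simple.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; none of these are the real LLVM classes.
struct FakeFunction {
  std::string Name;
};
struct FakeTTI {
  std::string ForFunction;
};

// Plays the role of the analysis: builds a fresh result for one function.
struct FakeAnalysis {
  FakeTTI run(const FakeFunction &F) { return FakeTTI{F.Name}; }
};

class FakeWrapperPass {
  FakeAnalysis TIRA;
  FakeTTI TTI; // single slot, overwritten on every query

public:
  // Recompute for F, store the result in the slot, and hand out a
  // reference to the slot (mirrors the assign-then-return shape above).
  FakeTTI &getTTI(const FakeFunction &F) {
    TTI = TIRA.run(F);
    return TTI;
  }
};
```

In this model a caller that needs results for two functions at once has to copy the first result before asking for the second.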
|
|
|
|
|
2015-01-31 12:17:59 +01:00
|
|
|
ImmutablePass *
|
2015-02-01 13:26:09 +01:00
|
|
|
llvm::createTargetTransformInfoWrapperPass(TargetIRAnalysis TIRA) {
|
|
|
|
return new TargetTransformInfoWrapperPass(std::move(TIRA));
|
2013-01-05 12:43:11 +01:00
|
|
|
}
|