2008-05-14 22:38:44 +02:00
|
|
|
//===- SimplifyCFGPass.cpp - CFG Simplification Pass ----------------------===//
|
2005-04-22 01:48:37 +02:00
|
|
|
//
|
2019-01-19 09:50:56 +01:00
|
|
|
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
|
|
// See https://llvm.org/LICENSE.txt for license information.
|
|
|
|
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
2005-04-22 01:48:37 +02:00
|
|
|
//
|
2003-10-20 21:43:21 +02:00
|
|
|
//===----------------------------------------------------------------------===//
|
2002-05-21 22:49:37 +02:00
|
|
|
//
|
2007-12-03 20:43:18 +01:00
|
|
|
// This file implements dead code elimination and basic block merging, along
|
|
|
|
// with a collection of other peephole control flow optimizations. For example:
|
2002-05-21 22:49:37 +02:00
|
|
|
//
|
2007-11-04 17:15:04 +01:00
|
|
|
// * Removes basic blocks with no predecessors.
|
|
|
|
// * Merges a basic block into its predecessor if there is only one and the
|
2002-05-21 22:49:37 +02:00
|
|
|
// predecessor only has one successor.
|
2007-11-04 17:15:04 +01:00
|
|
|
// * Eliminates PHI nodes for basic blocks with a single predecessor.
|
|
|
|
// * Eliminates a basic block that only contains an unconditional branch.
|
2007-12-03 20:43:18 +01:00
|
|
|
// * Changes invoke instructions to nounwind functions to be calls.
|
|
|
|
// * Change things like "if (x) if (y)" into "if (x&y)".
|
|
|
|
// * etc..
|
2002-05-21 22:49:37 +02:00
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
#include "llvm/ADT/MapVector.h"
|
2012-12-03 17:50:05 +01:00
|
|
|
#include "llvm/ADT/SmallPtrSet.h"
|
|
|
|
#include "llvm/ADT/SmallVector.h"
|
|
|
|
#include "llvm/ADT/Statistic.h"
|
2016-12-19 09:22:17 +01:00
|
|
|
#include "llvm/Analysis/AssumptionCache.h"
|
2016-03-29 06:08:57 +02:00
|
|
|
#include "llvm/Analysis/CFG.h"
|
2020-12-16 19:48:29 +01:00
|
|
|
#include "llvm/Analysis/DomTreeUpdater.h"
|
2016-05-27 16:27:24 +02:00
|
|
|
#include "llvm/Analysis/GlobalsModRef.h"
|
|
|
|
#include "llvm/Analysis/TargetTransformInfo.h"
|
2013-01-02 12:36:10 +01:00
|
|
|
#include "llvm/IR/Attributes.h"
|
2014-03-04 12:45:46 +01:00
|
|
|
#include "llvm/IR/CFG.h"
|
2013-01-02 12:36:10 +01:00
|
|
|
#include "llvm/IR/Constants.h"
|
|
|
|
#include "llvm/IR/DataLayout.h"
|
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-15 19:36:23 +01:00
|
|
|
#include "llvm/IR/Dominators.h"
|
2013-01-02 12:36:10 +01:00
|
|
|
#include "llvm/IR/Instructions.h"
|
|
|
|
#include "llvm/IR/IntrinsicInst.h"
|
|
|
|
#include "llvm/IR/Module.h"
|
[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers
If i change it to AssertingVH instead, a number of existing tests fail,
which means we don't consistently remove from the set when deleting blocks,
which means newly-created blocks may happen to appear in that set
if they happen to occupy the same memory chunk as did some block
that was in the set originally.
There are many places where we delete blocks,
and while we could probably consistently delete from LoopHeaders
when deleting a block in transforms located in SimplifyCFG.cpp itself,
transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may
delete blocks, and it doesn't seem good to teach them to deal with it.
Since we at most only ever delete from LoopHeaders,
let's just delegate to WeakVH to do that automatically.
But to be honest, personally, i'm not sure that the idea
behind LoopHeaders is sound.
2021-01-23 14:23:11 +01:00
|
|
|
#include "llvm/IR/ValueHandle.h"
|
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 22:15:01 +01:00
|
|
|
#include "llvm/InitializePasses.h"
|
2002-05-21 22:49:37 +02:00
|
|
|
#include "llvm/Pass.h"
|
[SimplifyCFG] threshold for folding branches with common destination
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.
The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:
__global__ void foo(int a, int b, int c, int d, int e, int n,
const int *input, int *output) {
int sum = 0;
for (int i = 0; i < n; ++i)
sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
*output = sum;
}
The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:
sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];
Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.
We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!
Test Plan: Added one test case to check the threshold is in effect
Reviewers: nadav, eliben, meheff, resistor, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D5529
llvm-svn: 218711
2014-10-01 00:23:38 +02:00
|
|
|
#include "llvm/Support/CommandLine.h"
|
2015-02-01 12:34:21 +01:00
|
|
|
#include "llvm/Transforms/Scalar.h"
|
2016-05-27 16:27:24 +02:00
|
|
|
#include "llvm/Transforms/Scalar/SimplifyCFG.h"
|
2021-05-19 10:31:53 +02:00
|
|
|
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
|
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 22:15:01 +01:00
|
|
|
#include "llvm/Transforms/Utils/Local.h"
|
2020-07-16 11:52:55 +02:00
|
|
|
#include "llvm/Transforms/Utils/SimplifyCFGOptions.h"
|
2016-05-27 16:27:24 +02:00
|
|
|
#include <utility>
|
2004-01-09 07:02:20 +01:00
|
|
|
using namespace llvm;
|
2003-11-11 23:41:34 +01:00
|
|
|
|
2014-04-22 04:55:47 +02:00
|
|
|
#define DEBUG_TYPE "simplifycfg"
|
|
|
|
|
2017-10-28 20:43:07 +02:00
|
|
|
static cl::opt<unsigned> UserBonusInstThreshold(
|
|
|
|
"bonus-inst-threshold", cl::Hidden, cl::init(1),
|
|
|
|
cl::desc("Control the number of bonus instructions (default = 1)"));
|
|
|
|
|
|
|
|
static cl::opt<bool> UserKeepLoops(
|
|
|
|
"keep-loops", cl::Hidden, cl::init(true),
|
|
|
|
cl::desc("Preserve canonical loop structure (default = true)"));
|
|
|
|
|
|
|
|
static cl::opt<bool> UserSwitchToLookup(
|
|
|
|
"switch-to-lookup", cl::Hidden, cl::init(false),
|
|
|
|
cl::desc("Convert switches to lookup tables (default = false)"));
|
|
|
|
|
|
|
|
static cl::opt<bool> UserForwardSwitchCond(
|
|
|
|
"forward-switch-cond", cl::Hidden, cl::init(false),
|
|
|
|
cl::desc("Forward switch condition to phi ops (default = false)"));
|
[SimplifyCFG] threshold for folding branches with common destination
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.
The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:
__global__ void foo(int a, int b, int c, int d, int e, int n,
const int *input, int *output) {
int sum = 0;
for (int i = 0; i < n; ++i)
sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
*output = sum;
}
The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:
sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];
Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.
We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!
Test Plan: Added one test case to check the threshold is in effect
Reviewers: nadav, eliben, meheff, resistor, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D5529
llvm-svn: 218711
2014-10-01 00:23:38 +02:00
|
|
|
|
2020-07-20 08:56:38 +02:00
|
|
|
static cl::opt<bool> UserHoistCommonInsts(
|
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec2183d466dad64b763bab4e15fd8804239
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.
It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric
So let's just reland this.
Original commit message:
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.
After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.
Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.
I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.
Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.
Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.
As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S
llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement
The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.
If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
* -14 (-73.68%) loops not rotated due to the header size (yay)
* -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
* -3937 (-64.19%) common instructions hoisted
* +561 (+0.06%) x86 asm instructions
* -2 basic blocks
* +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable {F12360201}
* -36396 (-65.29%) common instructions hoisted
* +1676 (+0.02%) x86 asm instructions
* +662 (+0.06%) basic blocks
* +4395 (+0.04%) IR instructions
It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D84108
This reverts commit 503deec2183d466dad64b763bab4e15fd8804239.
2020-09-07 22:54:06 +02:00
|
|
|
"hoist-common-insts", cl::Hidden, cl::init(false),
|
|
|
|
cl::desc("hoist common instructions (default = false)"));
|
2020-07-20 08:56:38 +02:00
|
|
|
|
2017-12-14 23:05:20 +01:00
|
|
|
static cl::opt<bool> UserSinkCommonInsts(
|
|
|
|
"sink-common-insts", cl::Hidden, cl::init(false),
|
|
|
|
cl::desc("Sink common instructions (default = false)"));
|
|
|
|
|
|
|
|
|
2006-12-19 22:40:18 +01:00
|
|
|
STATISTIC(NumSimpl, "Number of blocks simplified");
|
2002-10-02 00:38:41 +02:00
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
static bool tailMergeBlocksWithSimilarFunctionTerminators(Function &F,
|
|
|
|
DomTreeUpdater *DTU) {
|
|
|
|
SmallMapVector<unsigned /*TerminatorOpcode*/, SmallVector<BasicBlock *, 2>, 4>
|
|
|
|
Structure;
|
2012-07-24 12:51:42 +02:00
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
// Scan all the blocks in the function, record the interesting-ones.
|
|
|
|
for (BasicBlock &BB : F) {
|
|
|
|
if (DTU && DTU->isBBPendingDeletion(&BB))
|
|
|
|
continue;
|
2020-12-16 21:04:41 +01:00
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
// We are only interested in function-terminating blocks.
|
|
|
|
if (!succ_empty(&BB))
|
|
|
|
continue;
|
2012-07-24 12:51:42 +02:00
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
auto *Term = BB.getTerminator();
|
|
|
|
|
2021-06-24 20:25:06 +02:00
|
|
|
// Fow now only support `ret`/`resume` function terminators.
|
2021-06-23 13:28:51 +02:00
|
|
|
// FIXME: lift this restriction.
|
2021-06-24 20:25:06 +02:00
|
|
|
switch (Term->getOpcode()) {
|
|
|
|
case Instruction::Ret:
|
|
|
|
case Instruction::Resume:
|
|
|
|
break;
|
|
|
|
default:
|
2021-01-11 22:35:15 +01:00
|
|
|
continue;
|
2021-06-24 20:25:06 +02:00
|
|
|
}
|
2012-07-24 12:51:42 +02:00
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
// We can't tail-merge block that contains a musttail call.
|
|
|
|
if (BB.getTerminatingMustTailCall())
|
|
|
|
continue;
|
|
|
|
|
|
|
|
// Calls to experimental_deoptimize must be followed by a return
|
|
|
|
// of the value computed by experimental_deoptimize.
|
|
|
|
// I.e., we can not change `ret` to `br` for this block.
|
|
|
|
if (auto *CI =
|
|
|
|
dyn_cast_or_null<CallInst>(Term->getPrevNonDebugInstruction())) {
|
|
|
|
if (Function *F = CI->getCalledFunction())
|
|
|
|
if (Intrinsic::ID ID = F->getIntrinsicID())
|
|
|
|
if (ID == Intrinsic::experimental_deoptimize)
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
// PHI nodes cannot have token type, so if the terminator has an operand
|
|
|
|
// with token type, we can not tail-merge this kind of function terminators.
|
|
|
|
if (any_of(Term->operands(),
|
|
|
|
[](Value *Op) { return Op->getType()->isTokenTy(); }))
|
|
|
|
continue;
|
2012-07-24 12:51:42 +02:00
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
// Canonical blocks are uniqued based on the terminator type (opcode).
|
|
|
|
Structure[Term->getOpcode()].emplace_back(&BB);
|
|
|
|
}
|
2012-07-24 12:51:42 +02:00
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
bool Changed = false;
|
|
|
|
|
|
|
|
std::vector<DominatorTree::UpdateType> Updates;
|
|
|
|
|
|
|
|
for (ArrayRef<BasicBlock *> BBs : make_second_range(Structure)) {
|
|
|
|
SmallVector<PHINode *, 1> NewOps;
|
|
|
|
|
|
|
|
// We don't want to change IR just because we can.
|
|
|
|
// Only do that if there are at least two blocks we'll tail-merge.
|
|
|
|
if (BBs.size() < 2)
|
2020-03-04 17:11:40 +01:00
|
|
|
continue;
|
|
|
|
|
2009-12-22 07:07:30 +01:00
|
|
|
Changed = true;
|
2012-07-24 12:51:42 +02:00
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
if (DTU)
|
|
|
|
Updates.reserve(Updates.size() + BBs.size());
|
|
|
|
|
|
|
|
BasicBlock *CanonicalBB;
|
|
|
|
Instruction *CanonicalTerm;
|
|
|
|
{
|
|
|
|
auto *Term = BBs[0]->getTerminator();
|
|
|
|
|
|
|
|
// Create a canonical block for this function terminator type now,
|
|
|
|
// placing it *before* the first block that will branch to it.
|
|
|
|
CanonicalBB = BasicBlock::Create(
|
|
|
|
F.getContext(), Twine("common.") + Term->getOpcodeName(), &F, BBs[0]);
|
|
|
|
// We'll also need a PHI node per each operand of the terminator.
|
|
|
|
NewOps.resize(Term->getNumOperands());
|
|
|
|
for (auto I : zip(Term->operands(), NewOps)) {
|
|
|
|
std::get<1>(I) = PHINode::Create(std::get<0>(I)->getType(),
|
|
|
|
/*NumReservedValues=*/BBs.size(),
|
|
|
|
CanonicalBB->getName() + ".op");
|
|
|
|
CanonicalBB->getInstList().push_back(std::get<1>(I));
|
2020-12-16 21:04:41 +01:00
|
|
|
}
|
2021-06-23 13:28:51 +02:00
|
|
|
// Make it so that this canonical block actually has the right
|
|
|
|
// terminator.
|
|
|
|
CanonicalTerm = Term->clone();
|
|
|
|
CanonicalBB->getInstList().push_back(CanonicalTerm);
|
|
|
|
// If the canonical terminator has operands, rewrite it to take PHI's.
|
|
|
|
for (auto I : zip(NewOps, CanonicalTerm->operands()))
|
|
|
|
std::get<1>(I) = std::get<0>(I);
|
2009-12-22 07:07:30 +01:00
|
|
|
}
|
2012-07-24 12:51:42 +02:00
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
// Now, go through each block (with the current terminator type)
|
|
|
|
// we've recorded, and rewrite it to branch to the new common block.
|
|
|
|
const DILocation *CommonDebugLoc = nullptr;
|
|
|
|
for (BasicBlock *BB : BBs) {
|
|
|
|
auto *Term = BB->getTerminator();
|
|
|
|
|
|
|
|
// Aha, found a new non-canonical function terminator. If it has operands,
|
|
|
|
// forward them to the PHI nodes in the canonical block.
|
|
|
|
for (auto I : zip(Term->operands(), NewOps))
|
|
|
|
std::get<1>(I)->addIncoming(std::get<0>(I), BB);
|
|
|
|
|
|
|
|
// Compute the debug location common to all the original terminators.
|
|
|
|
if (!CommonDebugLoc)
|
|
|
|
CommonDebugLoc = Term->getDebugLoc();
|
|
|
|
else
|
|
|
|
CommonDebugLoc =
|
|
|
|
DILocation::getMergedLocation(CommonDebugLoc, Term->getDebugLoc());
|
|
|
|
|
|
|
|
// And turn BB into a block that just unconditionally branches
|
|
|
|
// to the canonical block.
|
|
|
|
Term->eraseFromParent();
|
|
|
|
BranchInst::Create(CanonicalBB, BB);
|
|
|
|
if (DTU)
|
|
|
|
Updates.push_back({DominatorTree::Insert, BB, CanonicalBB});
|
2009-12-22 07:07:30 +01:00
|
|
|
}
|
2012-07-24 12:51:42 +02:00
|
|
|
|
2021-06-23 13:28:51 +02:00
|
|
|
CanonicalTerm->setDebugLoc(CommonDebugLoc);
|
2020-12-16 21:04:41 +01:00
|
|
|
}
|
|
|
|
|
2021-05-19 10:31:53 +02:00
|
|
|
if (DTU)
|
2021-01-04 19:38:03 +01:00
|
|
|
DTU->applyUpdates(Updates);
|
2021-05-19 10:31:53 +02:00
|
|
|
|
2009-12-22 07:07:30 +01:00
|
|
|
return Changed;
|
|
|
|
}
|
|
|
|
|
2015-06-24 22:40:57 +02:00
|
|
|
/// Call SimplifyCFG on all the blocks in the function,
|
2007-11-13 08:32:38 +01:00
|
|
|
/// iterating until no more changes are made.
|
2013-01-07 04:53:25 +01:00
|
|
|
static bool iterativelySimplifyCFG(Function &F, const TargetTransformInfo &TTI,
|
2020-12-16 20:34:05 +01:00
|
|
|
DomTreeUpdater *DTU,
|
2017-09-27 16:54:16 +02:00
|
|
|
const SimplifyCFGOptions &Options) {
|
2007-11-13 08:32:38 +01:00
|
|
|
bool Changed = false;
|
2016-03-28 20:07:40 +02:00
|
|
|
bool LocalChange = true;
|
2016-03-29 06:08:57 +02:00
|
|
|
|
|
|
|
SmallVector<std::pair<const BasicBlock *, const BasicBlock *>, 32> Edges;
|
|
|
|
FindFunctionBackedges(F, Edges);
|
[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers
If i change it to AssertingVH instead, a number of existing tests fail,
which means we don't consistently remove from the set when deleting blocks,
which means newly-created blocks may happen to appear in that set
if they happen to occupy the same memory chunk as did some block
that was in the set originally.
There are many places where we delete blocks,
and while we could probably consistently delete from LoopHeaders
when deleting a block in transforms located in SimplifyCFG.cpp itself,
transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may
delete blocks, and it doesn't seem good to teach them to deal with it.
Since we at most only ever delete from LoopHeaders,
let's just delegate to WeakVH to do that automatically.
But to be honest, personally, i'm not sure that the idea
behind LoopHeaders is sound.
2021-01-23 14:23:11 +01:00
|
|
|
SmallPtrSet<BasicBlock *, 16> UniqueLoopHeaders;
|
2016-03-29 06:08:57 +02:00
|
|
|
for (unsigned i = 0, e = Edges.size(); i != e; ++i)
|
[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers
If i change it to AssertingVH instead, a number of existing tests fail,
which means we don't consistently remove from the set when deleting blocks,
which means newly-created blocks may happen to appear in that set
if they happen to occupy the same memory chunk as did some block
that was in the set originally.
There are many places where we delete blocks,
and while we could probably consistently delete from LoopHeaders
when deleting a block in transforms located in SimplifyCFG.cpp itself,
transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may
delete blocks, and it doesn't seem good to teach them to deal with it.
Since we at most only ever delete from LoopHeaders,
let's just delegate to WeakVH to do that automatically.
But to be honest, personally, i'm not sure that the idea
behind LoopHeaders is sound.
2021-01-23 14:23:11 +01:00
|
|
|
UniqueLoopHeaders.insert(const_cast<BasicBlock *>(Edges[i].second));
|
|
|
|
|
|
|
|
SmallVector<WeakVH, 16> LoopHeaders(UniqueLoopHeaders.begin(),
|
|
|
|
UniqueLoopHeaders.end());
|
2016-03-29 06:08:57 +02:00
|
|
|
|
2002-05-21 22:49:37 +02:00
|
|
|
while (LocalChange) {
|
|
|
|
LocalChange = false;
|
2012-07-24 12:51:42 +02:00
|
|
|
|
2015-06-24 22:42:33 +02:00
|
|
|
// Loop over all of the basic blocks and remove them if they are unneeded.
|
2010-08-14 02:29:42 +02:00
|
|
|
for (Function::iterator BBIt = F.begin(); BBIt != F.end(); ) {
|
2021-01-11 22:36:18 +01:00
|
|
|
BasicBlock &BB = *BBIt++;
|
|
|
|
if (DTU) {
|
|
|
|
assert(
|
|
|
|
!DTU->isBBPendingDeletion(&BB) &&
|
|
|
|
"Should not end up trying to simplify blocks marked for removal.");
|
|
|
|
// Make sure that the advanced iterator does not point at the blocks
|
|
|
|
// that are marked for removal, skip over all such blocks.
|
|
|
|
while (BBIt != F.end() && DTU->isBBPendingDeletion(&*BBIt))
|
|
|
|
++BBIt;
|
|
|
|
}
|
[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers
If i change it to AssertingVH instead, a number of existing tests fail,
which means we don't consistently remove from the set when deleting blocks,
which means newly-created blocks may happen to appear in that set
if they happen to occupy the same memory chunk as did some block
that was in the set originally.
There are many places where we delete blocks,
and while we could probably consistently delete from LoopHeaders
when deleting a block in transforms located in SimplifyCFG.cpp itself,
transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may
delete blocks, and it doesn't seem good to teach them to deal with it.
Since we at most only ever delete from LoopHeaders,
let's just delegate to WeakVH to do that automatically.
But to be honest, personally, i'm not sure that the idea
behind LoopHeaders is sound.
2021-01-23 14:23:11 +01:00
|
|
|
if (simplifyCFG(&BB, TTI, DTU, Options, LoopHeaders)) {
|
2002-05-21 22:49:37 +02:00
|
|
|
LocalChange = true;
|
|
|
|
++NumSimpl;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
Changed |= LocalChange;
|
|
|
|
}
|
|
|
|
return Changed;
|
|
|
|
}
|
2007-11-13 08:32:38 +01:00
|
|
|
|
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-15 19:36:23 +01:00
|
|
|
static bool simplifyFunctionCFGImpl(Function &F, const TargetTransformInfo &TTI,
|
|
|
|
DominatorTree *DT,
|
|
|
|
const SimplifyCFGOptions &Options) {
|
2020-12-16 19:48:29 +01:00
|
|
|
DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
|
|
|
|
|
|
|
|
bool EverChanged = removeUnreachableBlocks(F, DT ? &DTU : nullptr);
|
2021-06-23 13:28:51 +02:00
|
|
|
EverChanged |=
|
|
|
|
tailMergeBlocksWithSimilarFunctionTerminators(F, DT ? &DTU : nullptr);
|
2020-12-16 20:34:05 +01:00
|
|
|
EverChanged |= iterativelySimplifyCFG(F, TTI, DT ? &DTU : nullptr, Options);
|
2010-02-05 23:03:18 +01:00
|
|
|
|
2007-11-13 08:32:38 +01:00
|
|
|
// If neither pass changed anything, we're done.
|
|
|
|
if (!EverChanged) return false;
|
|
|
|
|
2012-09-06 02:59:08 +02:00
|
|
|
// iterativelySimplifyCFG can (rarely) make some loops dead. If this happens,
|
2013-08-13 00:38:43 +02:00
|
|
|
// removeUnreachableBlocks is needed to nuke them, which means we should
|
2007-11-13 08:32:38 +01:00
|
|
|
// iterate between the two optimizations. We structure the code like this to
|
2015-06-24 22:42:33 +02:00
|
|
|
// avoid rerunning iterativelySimplifyCFG if the second pass of
|
2013-08-13 00:38:43 +02:00
|
|
|
// removeUnreachableBlocks doesn't do anything.
|
2020-12-16 19:48:29 +01:00
|
|
|
if (!removeUnreachableBlocks(F, DT ? &DTU : nullptr))
|
2007-11-13 08:32:38 +01:00
|
|
|
return true;
|
2010-02-05 23:03:18 +01:00
|
|
|
|
2007-11-13 08:32:38 +01:00
|
|
|
do {
|
2020-12-16 20:34:05 +01:00
|
|
|
EverChanged = iterativelySimplifyCFG(F, TTI, DT ? &DTU : nullptr, Options);
|
2020-12-16 19:48:29 +01:00
|
|
|
EverChanged |= removeUnreachableBlocks(F, DT ? &DTU : nullptr);
|
2007-11-13 08:32:38 +01:00
|
|
|
} while (EverChanged);
|
2010-02-05 23:03:18 +01:00
|
|
|
|
2007-11-13 08:32:38 +01:00
|
|
|
return true;
|
|
|
|
}
|
2015-02-01 12:34:21 +01:00
|
|
|
|
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-15 19:36:23 +01:00
|
|
|
static bool simplifyFunctionCFG(Function &F, const TargetTransformInfo &TTI,
|
|
|
|
DominatorTree *DT,
|
|
|
|
const SimplifyCFGOptions &Options) {
|
|
|
|
assert((!RequireAndPreserveDomTree ||
|
|
|
|
(DT && DT->verify(DominatorTree::VerificationLevel::Full))) &&
|
|
|
|
"Original domtree is invalid?");
|
|
|
|
|
|
|
|
bool Changed = simplifyFunctionCFGImpl(F, TTI, DT, Options);
|
|
|
|
|
|
|
|
assert((!RequireAndPreserveDomTree ||
|
|
|
|
(DT && DT->verify(DominatorTree::VerificationLevel::Full))) &&
|
|
|
|
"Failed to maintain validity of domtree!");
|
|
|
|
|
|
|
|
return Changed;
|
|
|
|
}
|
|
|
|
|
2017-10-28 20:43:07 +02:00
|
|
|
// Command-line settings override compile-time settings.
|
2020-07-16 14:04:47 +02:00
|
|
|
static void applyCommandLineOverridesToOptions(SimplifyCFGOptions &Options) {
|
|
|
|
if (UserBonusInstThreshold.getNumOccurrences())
|
|
|
|
Options.BonusInstThreshold = UserBonusInstThreshold;
|
|
|
|
if (UserForwardSwitchCond.getNumOccurrences())
|
|
|
|
Options.ForwardSwitchCondToPhi = UserForwardSwitchCond;
|
|
|
|
if (UserSwitchToLookup.getNumOccurrences())
|
|
|
|
Options.ConvertSwitchToLookupTable = UserSwitchToLookup;
|
|
|
|
if (UserKeepLoops.getNumOccurrences())
|
|
|
|
Options.NeedCanonicalLoop = UserKeepLoops;
|
2020-07-20 08:56:38 +02:00
|
|
|
if (UserHoistCommonInsts.getNumOccurrences())
|
|
|
|
Options.HoistCommonInsts = UserHoistCommonInsts;
|
2020-07-16 14:04:47 +02:00
|
|
|
if (UserSinkCommonInsts.getNumOccurrences())
|
|
|
|
Options.SinkCommonInsts = UserSinkCommonInsts;
|
|
|
|
}
|
|
|
|
|
2020-09-15 22:03:05 +02:00
|
|
|
SimplifyCFGPass::SimplifyCFGPass() : Options() {
|
|
|
|
applyCommandLineOverridesToOptions(Options);
|
|
|
|
}
|
|
|
|
|
2020-07-16 14:04:47 +02:00
|
|
|
SimplifyCFGPass::SimplifyCFGPass(const SimplifyCFGOptions &Opts)
|
|
|
|
: Options(Opts) {
|
|
|
|
applyCommandLineOverridesToOptions(Options);
|
2017-10-28 20:43:07 +02:00
|
|
|
}
|
2015-02-01 12:34:21 +01:00
|
|
|
|
|
|
|
PreservedAnalyses SimplifyCFGPass::run(Function &F,
|
2016-08-09 02:28:15 +02:00
|
|
|
FunctionAnalysisManager &AM) {
|
2016-03-11 12:05:24 +01:00
|
|
|
auto &TTI = AM.getResult<TargetIRAnalysis>(F);
|
2017-10-04 22:26:25 +02:00
|
|
|
Options.AC = &AM.getResult<AssumptionAnalysis>(F);
|
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-15 19:36:23 +01:00
|
|
|
DominatorTree *DT = nullptr;
|
|
|
|
if (RequireAndPreserveDomTree)
|
|
|
|
DT = &AM.getResult<DominatorTreeAnalysis>(F);
|
2020-11-14 03:34:34 +01:00
|
|
|
if (F.hasFnAttribute(Attribute::OptForFuzzing)) {
|
|
|
|
Options.setSimplifyCondBranch(false).setFoldTwoEntryPHINode(false);
|
|
|
|
} else {
|
|
|
|
Options.setSimplifyCondBranch(true).setFoldTwoEntryPHINode(true);
|
|
|
|
}
|
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-15 19:36:23 +01:00
|
|
|
if (!simplifyFunctionCFG(F, TTI, DT, Options))
|
2016-06-08 15:32:23 +02:00
|
|
|
return PreservedAnalyses::all();
|
|
|
|
PreservedAnalyses PA;
|
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-15 19:36:23 +01:00
|
|
|
if (RequireAndPreserveDomTree)
|
|
|
|
PA.preserve<DominatorTreeAnalysis>();
|
2016-06-08 15:32:23 +02:00
|
|
|
return PA;
|
2015-02-01 12:34:21 +01:00
|
|
|
}
|
|
|
|
|
2016-05-05 22:47:53 +02:00
|
|
|
namespace {
|
2017-10-28 20:43:07 +02:00
|
|
|
struct CFGSimplifyPass : public FunctionPass {
|
|
|
|
static char ID;
|
|
|
|
SimplifyCFGOptions Options;
|
2016-05-05 22:47:53 +02:00
|
|
|
std::function<bool(const Function &)> PredicateFtor;
|
2017-10-28 20:43:07 +02:00
|
|
|
|
2020-07-15 23:48:36 +02:00
|
|
|
CFGSimplifyPass(SimplifyCFGOptions Options_ = SimplifyCFGOptions(),
|
2017-10-28 20:43:07 +02:00
|
|
|
std::function<bool(const Function &)> Ftor = nullptr)
|
2020-07-15 23:48:36 +02:00
|
|
|
: FunctionPass(ID), Options(Options_), PredicateFtor(std::move(Ftor)) {
|
2017-10-28 20:43:07 +02:00
|
|
|
|
|
|
|
initializeCFGSimplifyPassPass(*PassRegistry::getPassRegistry());
|
|
|
|
|
|
|
|
// Check for command-line overrides of options for debug/customization.
|
2020-07-16 14:04:47 +02:00
|
|
|
applyCommandLineOverridesToOptions(Options);
|
2016-05-05 22:47:53 +02:00
|
|
|
}
|
2017-10-28 20:43:07 +02:00
|
|
|
|
2016-05-05 22:47:53 +02:00
|
|
|
bool runOnFunction(Function &F) override {
|
|
|
|
if (skipFunction(F) || (PredicateFtor && !PredicateFtor(F)))
|
|
|
|
return false;
|
|
|
|
|
2017-10-28 20:43:07 +02:00
|
|
|
Options.AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
|
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-15 19:36:23 +01:00
|
|
|
DominatorTree *DT = nullptr;
|
|
|
|
if (RequireAndPreserveDomTree)
|
|
|
|
DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
|
2020-10-22 02:08:56 +02:00
|
|
|
if (F.hasFnAttribute(Attribute::OptForFuzzing)) {
|
|
|
|
Options.setSimplifyCondBranch(false)
|
|
|
|
.setFoldTwoEntryPHINode(false);
|
|
|
|
} else {
|
|
|
|
Options.setSimplifyCondBranch(true)
|
|
|
|
.setFoldTwoEntryPHINode(true);
|
|
|
|
}
|
|
|
|
|
2017-10-28 20:43:07 +02:00
|
|
|
auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
|
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-15 19:36:23 +01:00
|
|
|
return simplifyFunctionCFG(F, TTI, DT, Options);
|
2016-05-05 22:47:53 +02:00
|
|
|
}
|
|
|
|
void getAnalysisUsage(AnalysisUsage &AU) const override {
|
2016-12-19 09:22:17 +01:00
|
|
|
AU.addRequired<AssumptionCacheTracker>();
|
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-15 19:36:23 +01:00
|
|
|
if (RequireAndPreserveDomTree)
|
|
|
|
AU.addRequired<DominatorTreeWrapperPass>();
|
2016-05-05 22:47:53 +02:00
|
|
|
AU.addRequired<TargetTransformInfoWrapperPass>();
|
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped,
and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.
2020-12-15 19:36:23 +01:00
|
|
|
if (RequireAndPreserveDomTree)
|
|
|
|
AU.addPreserved<DominatorTreeWrapperPass>();
|
2016-05-05 22:47:53 +02:00
|
|
|
AU.addPreserved<GlobalsAAWrapperPass>();
|
|
|
|
}
|
|
|
|
};
|
2015-06-23 11:49:53 +02:00
|
|
|
}
|
2015-02-01 12:34:21 +01:00
|
|
|
|
|
|
|
char CFGSimplifyPass::ID = 0;
|
|
|
|
INITIALIZE_PASS_BEGIN(CFGSimplifyPass, "simplifycfg", "Simplify the CFG", false,
|
|
|
|
false)
|
|
|
|
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
|
2016-12-19 09:22:17 +01:00
|
|
|
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
|
2021-01-01 14:31:11 +01:00
|
|
|
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
|
2015-02-01 12:34:21 +01:00
|
|
|
INITIALIZE_PASS_END(CFGSimplifyPass, "simplifycfg", "Simplify the CFG", false,
|
|
|
|
false)
|
|
|
|
|
|
|
|
// Public interface to the CFGSimplification pass
|
2015-06-08 20:50:43 +02:00
|
|
|
FunctionPass *
|
2020-07-16 11:52:55 +02:00
|
|
|
llvm::createCFGSimplificationPass(SimplifyCFGOptions Options,
|
2017-03-26 08:44:08 +02:00
|
|
|
std::function<bool(const Function &)> Ftor) {
|
2020-07-16 11:52:55 +02:00
|
|
|
return new CFGSimplifyPass(Options, std::move(Ftor));
|
2017-03-26 08:44:08 +02:00
|
|
|
}
|