This recommit r317351 after fixing a buildbot failure.
Original commit message:
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
llvm-svn: 317362
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
Reviewers: davidxl, huntergr, chandlerc, mcrosier, eraman, davide
Reviewed By: davidxl
Subscribers: sdesmalen, ashutosh.nema, fhahn, mssimpso, aemerson, mgorny, mehdi_amini, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D39137
llvm-svn: 317351
The old PM sets the options of what used to be known as "latesimplifycfg" on the
instantiation after the vectorizers have run, so that's what we'redoing here.
FWIW, there's a later SimplifyCFGPass instantiation in both PMs where we do not
set the "late" options. I'm not sure if that's intentional or not.
Differential Revision: https://reviews.llvm.org/D39407
llvm-svn: 316869
This patch adds a new pass for attaching !callees metadata to indirect call
sites. The pass propagates values to call sites by performing an IPSCCP-like
analysis using the generic sparse propagation solver. For indirect call sites
having a small set of possible callees, the attached metadata indicates what
those callees are. The metadata can be used to facilitate optimizations like
intersecting the function attributes of the possible callees, refining the call
graph, performing indirect call promotion, etc.
Differential Revision: https://reviews.llvm.org/D37355
llvm-svn: 316576
This pass adds pgo-memop-opt pass to the new pass manager.
It is in the old pass manager but somehow left out in the new pass manager.
Differential Revision: http://reviews.llvm.org/D39145
llvm-svn: 316384
This is the same exact change we did for the current pass manager
in rL314997, but the new pass manager pipeline already happened
to run GlobalOpt after the inliner, so we just insert a run of
GDCE here.
llvm-svn: 315003
Summary: In SamplePGO ThinLTO compile phase, we will not invoke ICP as it may introduce confusion to the 2nd annotation. This patch extracted that logic and makes it clearer before profile annotation. In the mean time, we need to make function importing process both inlined callsites as well as not promoted indirect callsites.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D38094
llvm-svn: 314619
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented
as an independent pass, so there's no stretching of scope and feature creep for an existing pass.
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028
The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.
Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
Differential Revision: https://reviews.llvm.org/D37121
llvm-svn: 312862
printing techniques with a DEBUG_TYPE controlling them.
It was a mistake to start re-purposing the pass manager `DebugLogging`
variable for generic debug printing -- those logs are intended to be
very minimal and primarily used for testing. More detailed and
comprehensive logging doesn't make sense there (it would only make for
brittle tests).
Moreover, we kept forgetting to propagate the `DebugLogging` variable to
various places making it also ineffective and/or unavailable. Switching
to `DEBUG_TYPE` makes this a non-issue.
llvm-svn: 310695
Summary: Part of r310296 will disable PGOIndirectCallPromotion in ThinLTO backend if PGOOpt is None. However, as PGOOpt is not passed down to ThinLTO backend for instrumentation based PGO, that change would actually disable ICP entirely in ThinLTO backend, making it behave differently in instrumentation PGO mode. This change reverts that change, and only disable ICP there when it is SamplePGO.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: sanjoy, mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36566
llvm-svn: 310550
Summary: SampleProfileLoader pass do need to happen after some early cleanup passes so that inlining can happen correctly inside the SampleProfileLoader pass.
Reviewers: chandlerc, davidxl, tejohnson
Reviewed By: chandlerc, tejohnson
Subscribers: sanjoy, mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36333
llvm-svn: 310296
Summary:
This is largely NFC*, in preparation for utilizing ProfileSummaryInfo
and BranchFrequencyInfo analyses. In this patch I am only doing the
splitting for the New PM, but I can do the same for the legacy PM as
a follow-on if this looks good.
*Not NFC since for partial unrolling we lose the updates done to the
loop traversal (adding new sibling and child loops) - according to
Chandler this is not very useful for partial unrolling, but it also
means that the debugging flag -unroll-revisit-child-loops no longer
works for partial unrolling.
Reviewers: chandlerc
Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36157
llvm-svn: 309886
Summary: In ThinLTO backend compile, OPTOptions are not set so that the ICP in ThinLTO backend does not know if it is a SamplePGO build, in which profile count needs to be annotated directly on call instructions. This patch cleaned up the PGOOptions handling logic and passes down PGOOptions to ThinLTO backend.
Reviewers: chandlerc, tejohnson, davidxl
Reviewed By: chandlerc
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36052
llvm-svn: 309780
Summary:
Now that SamplePGOSupport is part of PGOOpt, there are several places that need tweaking:
1. AddDiscriminator pass should *not* be invoked at ThinLTOBackend (as it's already invoked in the PreLink phase)
2. addPGOInstrPasses should only be invoked when either ProfileGenFile or ProfileUseFile is non-empty.
3. SampleProfileLoaderPass should only be invoked when SampleProfileFile is non-empty.
4. PGOIndirectCallPromotion should only be invoked in ProfileUse phase, or in ThinLTOBackend of SamplePGO.
Reviewers: chandlerc, tejohnson, davidxl
Reviewed By: chandlerc
Subscribers: sanjoy, mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36040
llvm-svn: 309478
Summary: The new PM needs to invoke add-discriminator pass when building with -fdebug-info-for-profiling.
Reviewers: chandlerc, davidxl
Reviewed By: chandlerc
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D35744
llvm-svn: 309121
Summary:
This patch adds a callback registration API to the PassBuilder,
enabling registering out-of-tree passes with it.
Through the Callback API, callers may register callbacks with the
various stages at which passes are added into pass managers, including
parsing of a pass pipeline as well as at extension points within the
default -O pipelines.
Registering utilities like `require<>` and `invalidate<>` needs to be
handled manually by the caller, but a helper is provided.
Additionally, adding passes at pipeline extension points is exposed
through the opt tool. This patch adds a `-passes-ep-X` commandline
option for every extension point X, which opt parses into pipelines
inserted into that extension point.
Reviewers: chandlerc
Reviewed By: chandlerc
Subscribers: lksbhm, grosser, davide, mehdi_amini, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D33464
llvm-svn: 307532
Summary:
For SamplePGO + ThinLTO, because profile annotation is done twice at both PrepareForThinLTO pipeline and backend compiler, the following changes are needed at the PrepareForThinLTO phase to ensure the IR is not changed dramatically. Otherwise the profile annotation will be inaccurate in the backend compiler.
* disable hot-caller heuristic
* disable loop unrolling
* disable indirect call promotion
This will unblock the new PM testing for sample PGO (tools/clang/test/CodeGen/pgo-sample-thinlto-summary.c), which will be covered in another cfe patch.
Reviewers: chandlerc, tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, Prazek, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D34895
llvm-svn: 307437
Previously it doesn't actually invoke the designated new PM builder
functions.
This patch moves NameAnonGlobalPass out from PassBuilder, as Chandler
points out that PassBuilder is used for non-O0 builds, and for
optimizations only.
Differential Revision: https://reviews.llvm.org/D34728
llvm-svn: 306756
Summary:
Use MemorySSA for memory dependency checking in the EarlyCSE pass at the
start of the function simplification portion of the pipeline. We rely
on the fact that GVNHoist runs just after this pass of EarlyCSE to
amortize the MemorySSA construction cost since GVNHoist uses MemorySSA
and EarlyCSE preserves it.
This is turned off by default. A follow-up change will turn it on to
allow for easier reversion in case it breaks something.
llvm-svn: 305146
With this, the two pipelines should be in sync again (modulo
LoopUnswitch, but Chandler is actively working on that).
Differential Revision: https://reviews.llvm.org/D33810
llvm-svn: 304671
GVNHoist was moved as part of simplification passes for the current
pass manager (but not for the new), so they're out-of-sync.
Differential Revision: https://reviews.llvm.org/D33806
llvm-svn: 304490
Based on the original patch by Davide, but I've adjusted the API exposed
to just be different entry points rather than exposing more state
parameters. I've factored all the common logic out so that we don't have
any duplicate pipelines, we just stitch them together in different ways.
I think this makes the build easier to reason about and understand.
This adds a direct method for getting the module simplification pipeline
as well as a method to get the optimization pipeline. While not my
express goal, this seems nice and gives a good place comment about the
restrictions that are imposed on them.
I did make some minor changes to the way the pipelines are structured
here, but hopefully not ones that are significant or controversial:
1) I sunk the PGO indirect call promotion to only be run when we have
PGO enabled (or as part of the special ThinLTO pipeline).
2) I made the extra GlobalOpt run in ThinLTO just happen all the time
and at a slightly more powerful place (before we remove available
externaly functions). This seems like general goodness and not a big
compile time sink, so it didn't make sense to *only* use it in
ThinLTO. Fewer differences in the pipeline makes everything simpler
IMO.
3) I hoisted the ThinLTO stop point pre-link above the the RPO function
attr inference. The RPO inference won't infer anything terribly
meaningful pre-link (recursiveness?) so it didn't make a lot of
sense. But if the placement of RPO inference starts to matter, we
should move it to the canonicalization phase anyways which seems like
a better place for it (and there is a FIXME to this effect!). But
that seemed a bridge too far for this patch.
If we ever need to parameterize these pipelines more heavily, we can
always sink the logic to helper functions with parameters to keep those
parameters out of the public API. But the changes above seemed minor
that we could possible get away without the parameters entirely.
I added support for parsing 'thinlto' and 'thinlto-pre-link' names in
pass pipelines to make it easy to test these routines and play with them
in larger pipelines. I also added a really basic manifest of passes test
that will show exactly how the pipelines behave and work as well as
making updates to them clear.
Lastly, this factoring does introduce a nesting layer of module pass
managers in the default pipeline. I don't think this is a big deal and
the flexibility of decoupling the pipelines seems easily worth it.
Differential Revision: https://reviews.llvm.org/D33540
llvm-svn: 304407
instrumenting code.
This is important in the new pass manager. The old pass manager's
inliner has a small DCE routine embedded within it. The new pass manager
relies on the actual GlobalDCE pass for this.
Without this patch, instrumentation profiling with the new PM results in
massive code bloat in the object files because the instrumentation
itself ends up preventing DCE from working to remove the code.
We should probably change the instrumentation (and/or DCE) so that we
can eliminate dead code even if instrumented, but we shouldn't even
spend the time generating instrumentation for that code so this still
seems like a good patch.
Differential Revision: https://reviews.llvm.org/D33535
llvm-svn: 303845
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
and instead incrementally updates it.
I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.
My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.
This pass makes two very significant assumptions that enable most of these
improvements:
1) Focus on *full* unswitching -- that is, completely removing whatever
control flow construct is being unswitched from the loop. In the case
of trivial unswitching, this means removing the trivial (exiting)
edge. In non-trivial unswitching, this means removing the branch or
switch itself. This is in opposition to *partial* unswitching where
some part of the unswitched control flow remains in the loop. Partial
unswitching only really applies to switches and to folded branches.
These are very similar to full unrolling and partial unrolling. The
full form is an effective canonicalization, the partial form needs
a complex cost model, cannot be iterated, isn't canonicalizing, and
should be a separate pass that runs very late (much like unrolling).
2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
dates from a time when a great deal of LLVM's loop infrastructure was
missing, ineffective, and/or unreliable. As a consequence, a lot of
complexity was added which we no longer need.
With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.
Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.
One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.
Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.
Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.
Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.
I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.
Differential Revision: https://reviews.llvm.org/D32409
llvm-svn: 301576
also a discussion about exactly what we should do prior to re-enabling
it.
The current bug is http://llvm.org/PR32821 and the discussion about this
is in the review thread for r300200.
llvm-svn: 301505
Summary:
Otherwise we might end up with some empty basic blocks or
single-entry-single-exit basic blocks.
This fixes PR32085
Reviewers: chandlerc, danielcdh
Subscribers: mehdi_amini, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D30468
llvm-svn: 301395
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transform/Utils.
NFC.
llvm-svn: 299980
This patch optimizes two memory intrinsic operations: memset and memcpy based
on the profiled size of the operation. The high level transformation is like:
mem_op(..., size)
==>
switch (size) {
case s1:
mem_op(..., s1);
goto merge_bb;
case s2:
mem_op(..., s2);
goto merge_bb;
...
default:
mem_op(..., size);
goto merge_bb;
}
merge_bb:
Differential Revision: http://reviews.llvm.org/D28966
llvm-svn: 299446
Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic.
Reviewers: davidxl, eraman, xur
Reviewed By: davidxl, xur
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D30282
llvm-svn: 296028
default pipeline.
A clang with this patch built with ASan and asserts can build all of the
test-suite as well, so it seems to not uncover any latent problems.
Differential Revision: https://reviews.llvm.org/D29853
llvm-svn: 294888
All the invalidation issues and bugs in this seem to be fixed, it has
survived a full build of the test suite plus SPEC with asserts and ASan
enabled on the Clang binary used.
Differential Revision: https://reviews.llvm.org/D29815
llvm-svn: 294887
This needs explicit requires of the optimization remark emission before
loop pass pipelines containing LICM as we no longer get it from the
inliner -- Argument Promotion may invalidate it. Technically the inliner
could also have broken this, but it never came up in testing.
Differential Revision: https://reviews.llvm.org/D29595
llvm-svn: 294670
Now that the call graph supports efficient replacement of a function and
spurious reference edges, we can port ArgumentPromotion to the new pass
manager very easily.
The old PM-specific bits are sunk into callbacks that the new PM simply
doesn't use. Unlike the old PM, the new PM simply does argument
promotion and afterward does the update to LCG reflecting the promoted
function.
Differential Revision: https://reviews.llvm.org/D29580
llvm-svn: 294667