The CGSCC pass manager interplay with the FunctionAnalysisManagerCGSCCProxy is 'special' in the sense that the former will rerun the latter if there are changes to a SCC structure; that being said, some of the functions in the SCC may be unchanged. In that case, the function simplification pipeline will be re-run, which impacts compile time[1].
This patch allows the function simplification pipeline be skipped if it was already run and the function was not modified since.
The behavior is currently disabled by default. This is because, currently, the rerunning of the function simplification pipeline on an unchanged function may still result in changes. The patch simplifies investigating and fixing those cases where repeated function pass runs do actually positively impact code quality, while offering an easy workaround for those impacted negatively by compile time regressions, and not impacting mainline scenarios.
[1] A [[ http://llvm-compile-time-tracker.com/compare.php?from=eb37d3546cd0c6e67798496634c45e501f7806f1&to=ac722d1190dc7bbdd17e977ef7ec95e69eefc91e&stat=instructions | compile time tracker ]] run with the option enabled.
Differential Revision: https://reviews.llvm.org/D98103
LazyBlockFrequenceInfoPass, LazyBranchProbabilityInfoPass and
LoopAccessLegacyAnalysis all cache pointers to their nestled required
analysis passes. One need to use addRequiredTransitive to describe
that the nestled passes can't be freed until those analysis passes
no longer are used themselves.
There is still a bit of a mess considering the getLazyBPIAnalysisUsage
and getLazyBFIAnalysisUsage functions. Those functions are used from
both Transform, CodeGen and Analysis passes. I figure it is OK to
use addRequiredTransitive also when being used from Transform and
CodeGen passes. On the other hand, I figure we must do it when
used from other Analysis passes. So using addRequiredTransitive should
be more correct here. An alternative solution would be to add a
bool option in those functions to let the user tell if it is a
analysis pass or not. Since those lazy passes will be obsolete when
new PM has conquered the world I figure we can leave it like this
right now.
Intention with the patch is to fix PR49950. It at least solves the
problem for the reproducer in PR49950. However, that reproducer
need five passes in a specific order, so there are lots of various
"solutions" that could avoid the crash without actually fixing the
root cause.
This is a reapply of commit 3655f0757f2b4b, that was reverted in
33ff3c20498ef5c2057 due to problems with assertions in the polly
lit tests. That problem is supposed to be solved by also adjusting
ScopPass to explicitly preserve LazyBlockFrequencyInfo and
LazyBranchProbabilityInfo (it already preserved
OptimizationRemarkEmitter which depends on those lazy passes).
Differential Revision: https://reviews.llvm.org/D100958
LazyBlockFrequenceInfoPass, LazyBranchProbabilityInfoPass and
LoopAccessLegacyAnalysis all cache pointers to their nestled required
analysis passes. One need to use addRequiredTransitive to describe
that the nestled passes can't be freed until those analysis passes
no longer are used themselves.
There is still a bit of a mess considering the getLazyBPIAnalysisUsage
and getLazyBFIAnalysisUsage functions. Those functions are used from
both Transform, CodeGen and Analysis passes. I figure it is OK to
use addRequiredTransitive also when being used from Transform and
CodeGen passes. On the other hand, I figure we must do it when
used from other Analysis passes. So using addRequiredTransitive should
be more correct here. An alternative solution would be to add a
bool option in those functions to let the user tell if it is a
analysis pass or not. Since those lazy passes will be obsolete when
new PM has conquered the world I figure we can leave it like this
right now.
Intention with the patch is to fix PR49950. It at least solves the
problem for the reproducer in PR49950. However, that reproducer
need five passes in a specific order, so there are lots of various
"solutions" that could avoid the crash without actually fixing the
root cause.
Differential Revision: https://reviews.llvm.org/D100958
GlobalsAA is only created at the beginning of the inliner pipeline. If
an AAManager is cached from previous passes, it won't get rebuilt to
include the newly created GlobalsAA.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D101379
Relative look table converter pass caused an issue when full lto
is enabled (reported in https://reviews.llvm.org/D94355).
This patch disables that pass from full lto pre-link phase optimization
pipeline until the issue is fixed.
Differential Revision: https://reviews.llvm.org/D101664
Summary:
This patch registers OpenMPOpt as a Module pass in addition to a CGSCC
pass. This is so certain optimzations that are sensitive to intact
call-sites can happen before inlining. The old `openmpopt` pass name is
changed to `openmp-opt-cgscc` and `openmp-opt` calls the Module pass.
The current module pass only runs a single check but will be expanded in
the future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D99202
Have funcattrs expand all implied attributes into the IR. This expands the infrastructure from D100400, but for definitions not declarations this time.
Somewhat subtly, this mostly isn't semantic. Because the accessors did the inference, any client which used the accessor was already getting the stronger result. Clients that directly checked presence of attributes (there are some), will see a stronger result now.
The old behavior can end up quite confusing for two reasons:
* Without this change, we have situations where function-attrs appears to fail when inferring an attribute (as seen by a human reading IR), but that consuming code will see that it should have been implied. As a human trying to sanity check test results and study IR for optimization possibilities, this is exceeding error prone and confusing. (I'll note that I wasted several hours recently because of this.)
* We can have transforms which trigger without the IR appearing (on inspection) to meet the preconditions. This change doesn't prevent this from happening (as the accessors still involve multiple checks), but it should make it less frequent.
I'd argue in favor of deleting the extra checks out of the accessors after this lands, but I want that in it's own review as a) it's purely stylistic, and b) I already know there's some disagreement.
Once this lands, I'm also going to do a cleanup change which will delete some now redundant duplicate predicates in the inference code, but again, that deserves to be a change of it's own.
Differential Revision: https://reviews.llvm.org/D100226
Being lazy with printing the banner seems hard to reason with, we should print it
unconditionally first (it could also lead to duplicate banners if we
have multiple functions in -filter-print-funcs).
The printIR() functions were doing too many things. I separated out the
call from PrintPassInstrumentation since we were essentially doing two
completely separate things in printIR() from different callers.
There were multiple ways to generate the name of some IR. That's all
been moved to getIRName(). The printing of the IR name was also
inconsistent, now it's always "IR Dump on $foo" where "$foo" is the
name. For a function, it's the function name. For a loop, it's what's
printed by Loop::print(), which is more detailed. For an SCC, it's the
list of functions in parentheses. For a module it's "[module]", to
differentiate between a possible SCC with a function called "module".
To preserve D74814, we have to check if we're going to print anything at
all first. This is unfortunate, but I would consider this a special
case that shouldn't be handled in the core logic.
Reviewed By: jamieschmeiser
Differential Revision: https://reviews.llvm.org/D100231
This reverts commit ab98f2c7129a52e216fd7e088b964cf4af27b0f2 and 98eea392cdbcdb7360e58b46e9329573f092cd96.
It includes a fix for the clang test which triggered the revert. I failed to notice this one because there was another AMDGPU llvm test with a similiar name and the exact same text in the error message. Odd. Since only one build bot reported the clang test, I didn't notice that one.
Breaks check-clang, see comments on D100400
Also revert follow-up "[NFC] Move a recently added utility into a location to enable reuse"
This reverts commit 3ce61fb6d697d49db471c7077b88b3b9ec9dec66.
This reverts commit 61a85da88235983da565bda0160367461fa0f382.
We have some cases today where attributes can be inferred from another on access, but the result is not explicitly materialized in IR. This change is a step towards changing that.
Why? Two main reasons:
* Human clarity. It's really confusing trying to figure out why a transform is triggering when the IR doesn't appear to have the required attributes.
* This avoids the need to special case declarations in e.g. functionattrs. Since we can assume the attribute is present, we can work directly from attributes (and only attributes) without also needing to query accessors on Function to avoid missing cases due to unannotated (but infered on use) declarations. (This piece will appear must easier to follow once D100226 also lands.)
Differential Revision: https://reviews.llvm.org/D100400
These were added in 37935405efbebc4bd9f1ffac9152571c6a8469dc,
but they fail on macOS (and on Windows with MSYS based tools, before
relanding D98859). Remove the tests that exercise "not not echo", as
the primary thing to test is the plain echo patterns above.
This avoids breaking clang-tidy/infrastructure/validate-check-names.cpp
if 'not' is evaluated as a lit internal tool (making TestRunner
invoke 'grep' directly in that test, instead of invoking 'not', which
then invokes 'grep').
The quoting of arguments is still brittle if the executable is an
MSYS based tool though, as MSYS based tools incorrectly unescape
backslashes in quoted arguments (contrary to regular win32 argument
parsing rules), see D99406 and
https://github.com/msys2/msys2-runtime/issues/36 for more examples
of the issues.
Differential Revision: https://reviews.llvm.org/D99938
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
Retry of 330619a3a623 that includes a clang test update.
Original commit message:
If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898 <https://reviews.llvm.org/D98898>.
In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.
We make the same change to the old pass manager to keep things synchronized.
Differential Revision: https://reviews.llvm.org/D100213
-filter-print-funcs -print-changed was crashing after the filter func
was removed by a pass with
Assertion failed: After.find("*** IR Dump") == 0 && "Unexpected banner format."
We weren't printing the banner because when we have -filter-print-funcs,
we print each function separately, letting the print function filter out
unwanted functions.
Reviewed By: jamieschmeiser
Differential Revision: https://reviews.llvm.org/D100237
If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898.
In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.
We make the same change to the old pass manager to keep things synchronized.
Differential Revision: https://reviews.llvm.org/D100213
It breaks up the function pass manager in the codegen pipeline.
With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.
Add a check that we only have one function pass manager in the codegen
pipeline.
This required reverting commit 9583a3f2625818b78c0cf6d473cdedb9f23ad82c:
"[AsmPrinter] Delete dead takeDeletedSymbsForFunction()".
This was not NFC as initially thought. By coalescing two function
psas managers, this exposed the reverted code as necessary.
addr-label.ll was crashing due to an emitted blockaddress's block being
removed but the label not emitted.
Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99707
After D99249 we use three different loop pass managers for LICM,
LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI
and LazyBPI are not preserved by LoopRotate (note that D74640
is no longer needed). Avoid this by marking them as preserved.
My understanding of D86156 is that it is okay to simply preserve
them (which LoopUnswitch already does for the same reason) and
rely on callbacks to deal with deleted blocks.
Differential Revision: https://reviews.llvm.org/D99843
The reason for the NewPM redesign is described in the commit
cba3e783389a: [NewPM] Disable PreservedCFGChecker ...
The checker introduces an internal custom CFG analysis that tracks
current up-to date CFG snapshot. The analysis is invalidated along
any other CFG related analysis (the key is CFGAnalyses). If the CFG
analysis is not invalidated at a functional pass exit then the checker
asserts that the CFG snapshot taken from this analysis is equals to
a snapshot of the current CFG.
Along the way:
- the function CFG::printDiff() is simplified by removing function
name calculation. The name is printed by the caller;
- fixed CFG invalidated condition (see CFG::invalidate());
- StandardInstrumentations::registerCallbacks() gets additional
optional parameter of type FunctionAnalysisManager*, which is
needed by the checker to get the custom CFG analysis;
- several PM related tests updated to explicitly set
-verify-cfg-preserved=1 as they need.
This patch is safe to land as the CFGChecker is left switched off
(the options -verify-cfg-preserved is false by default). It will be
switched on by a separate patch to minimize possible reverts.
Reviewed By: skatkov, kuhar
Differential Revision: https://reviews.llvm.org/D91327
Change several pass sequence sensitive tests to be indifferent
to the PreserveCFGChecker by explicitly settting the option
-verify-cfg-preserved=0. It is a preparation step that allows
a redesign of PreserveCFGChecker.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D99878
This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in attributor and the discussion on the review of the change which introduced the nosync attribute (0626367202c).
This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive.
Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work.
Differential Revision: https://reviews.llvm.org/D99749
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
Summary:
The colour characters currently added to the output of -print-changed=diff
and -print-changed=diff-quiet cause difficulties when capturing the output
and examining it in an editor. Change the function to not have the colour
characters and add 2 new choices (-print-changed=cdiff and
-print-changed=cdiff-quiet) to retain the existing functionality of adding
the colour characters.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks) yrouban (Yevgeny Rouban)
Differential Revision: https://reviews.llvm.org/D97398
With cost-benefit analysis for inlining, we bypass the cost-threshold by returning inline result from call analyzer early.
However the cost and threshold are still available from call analyzer, and when cost is actually higher than threshold, we incorrect set the reason.
The change makes the decision from cost-benefit analysis explicit. It's mostly NFC, except that it allows the priority-based sample loader inliner used by CSSPGO to use cost-benefit heuristic.
Differential Revision: https://reviews.llvm.org/D99302
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
Summary:
The IR is saved in its print form before each pass is started and a
signal handler is registered. If the compilation crashes, the signal
handler will print the saved IR to dbgs(). This option
can be modified using -print-module-scope to get the IR for the complete
module. Note that this option only works with the new pass manager.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks) yrouban (Yevgeny Rouban)
Differential Revision: https://reviews.llvm.org/D86657
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
Now that intrinsic name mangling can cope with unnamed types, the custom name mangling in PredicateInfo (introduced by D49126) can be removed.
(See D91250, D48541)
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D91661
This reverts commit 329aeb5db43f5e69df038fb20d2def77fe6f8595,
and relands commit 61f006ac655431bd44b9e089f74c73bec0c1a48c.
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.
Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.
Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D97608
This now analyzes calls to both intrinsics and functions.
For intrinsics, grab the ones we know and care about (mem* family) and
analyze the arguments.
For calls, use TLI to get more information about the libcalls, then
analyze the arguments if known.
```
auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks]
int var[1024];
^
```
Differential Revision: https://reviews.llvm.org/D97489
We're running into undefined references using ThinLTO with -O0 on
Windows/Chrome. This fixes that.
This matches the legacy PM.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D97414
I think we can use here same logic as for nonnull.
strlen(X) - X must be noundef => valid pointer.
for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones)
Reviewed By: jdoerfert, aqjune
Differential Revision: https://reviews.llvm.org/D95122
This enables use of MemorySSA instead of MemDep in MemCpyOpt. To
allow this without significant compile-time impact, the MemCpyOpt
pass is moved directly before DSE (in the cases where this was not
already the case), which allows us to reuse the existing MemorySSA
analysis.
Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt
can also perform simple optimizations across basic blocks.
Differential Revision: https://reviews.llvm.org/D94376
The NPM LTO pipeline has a lot of fixme's and missing passes, causing a
lot of regressions after the switch in c70737b. Notably unrolling and
vectorization were both disabled, but many other passes are missing
compared to the old pass manager. This attempt to enable the most
obvious missing passes like the unroller, vectorization and other loop
passes, fixing the existing FIXME comments.
Differential Revision: https://reviews.llvm.org/D96780
This version of the patch includes a fix for the cfi failures.
(undoes the revert commit 7db390cc7738a9ba0ed7d4ca59ab6ea2e69c47e9)
It also undoes reverts of follow-up patches that also needed reverting
originally:
* [LTO] Add option enable NewPM with LTOCodeGenerator.
(undoes revert commit 0a17664b47c153aa26a0d31b4835f26375440ec6)
* [LTOCodeGenerator] Use lto::Config for options (NFC)."
(undoes revert commit b0a8e41cfff717ff067bf63412d6edb0280608cd)
It seems nicer to list passes given a flag rather than displaying all
passes in opt --help.
This is awkwardly structured because a PassBuilder is required, but
reusing the PassBuilder in runPassPipeline() doesn't work because we
read the input IR before getting to runPassPipeline(). So printing the
list of passes needs to happen before reading the input IR. If we remove
the legacy PM code in main() and move everything from NewPMDriver.cpp
into opt.cpp, we can create the PassBuilder before reading IR and check
if we should print the list of passes and exit. But until then this hack
seems fine.
Compared to the legacy PM, the new PM passes are lacking descriptions.
We'll need to figure out a way to add descriptions if we think this is
important.
Also, this only works for passes specified in PassRegistry.def. If we
want to print other custom registered passes, we'll need a different
mechanism.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D96101
Link: https://lists.llvm.org/pipermail/llvm-dev/2020-October/146162.html "[RFC] FileCheck: (dis)allowing unused prefixes"
If a downstream project using lit needs time for transition,
add the following to `lit.local.cfg`:
```
from lit.llvm.subst import ToolSubst
fc = ToolSubst('FileCheck', unresolved='fatal')
config.substitutions.insert(0, (fc.regex, 'FileCheck --allow-unused-prefixes'))
```
Differential Revision: https://reviews.llvm.org/D95849
Summary:
Introduce base classes that hold a textual represent of the IR
based on basic blocks and a base class for comparing this
representation. A new change printer is introduced that uses these
classes to save and compare representations of the IR before and after
each pass. It only reports when changes are made by a pass (similar to
-print-changed) except that the changes are shown in a patch-like format
with those lines that are removed shown in red prefixed with '-' and those
added shown in green with '+'. This functionality was introduced in my
tutorial at the 2020 virtual developer's meeting.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D91890
This patch adds an option to enable the new pass manager in
LTOCodeGenerator. It also updates a few tests with legacy PM specific
tests, which started failing after 6a59f0560648 when
LLVM_ENABLE_NEW_PASS_MANAGER=true.
As it looks like NewPM generally is using SimpleLoopUnswitch
instead of LoopUnswitch, this patch also use SimpleLoopUnswitch
in the ExtraVectorizerPasses sequence (compared with LegacyPM
which use the LoopUnswitch pass).
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D95457
We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.
PR48779
Initially reverted due to BasicAA running analyses in an unspecified
order (multiple function calls as parameters), fixed by fetching
analyses before the call to construct BasicAA.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D95117
This patch sets the default for llvm tests, with the exception of tests
under Reduce, because quite a few of them use 'FileCheck' as parameter
to a tool, and including a flag as that parameter would complicate
matters.
The rest of the patch undo-es the lit.local.cfg changes we progressively
introduced as temporary measure to avoid regressions under various
directories.
Differential Revision: https://reviews.llvm.org/D95111
We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.
PR48779
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D95117
RISC-V would like to use a struct of scalable vectors to return multiple
values from intrinsics. This woud also be needed for target independent
intrinsics like llvm.sadd.overflow.
This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D94142
Expanding from D94808 - we ensure the same InlineAdvisor is used by both
InlinerPass instances. The notion of mandatory inlining is moved into
the core InlineAdvisor: advisors anyway have to handle that case, so
this change also factors out that a bit better.
Differential Revision: https://reviews.llvm.org/D94825
Summary:
Introduce a new mode of operation for -print-changed that only reports
after a pass changes the IR with all of the other messages suppressed (ie,
no initial IR and no messages about ignored, filtered or non-modifying
passes).
The option processing for -print-changed is changed to take an optional
string indicating options for print-changed. Initially, the only option
supported is quiet (as described above). This new quiet mode is specified
with -print-changed=quiet while -print-changed will continue to function
in the same way. It is intended that there will be more options in the
future.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D92589
This currently blocks --print-before/after with a legacy PM pass, for
example when we use the new PM for the optimization pipeline but the
legacy PM for the codegen pipeline. Also in the future when the codegen
pipeline works with the new PM there will be multiple places to specify
passes, so even when everything is using the new PM, there will still be
multiple places that can accept different pass names.
Reviewed By: hoy, ychen
Differential Revision: https://reviews.llvm.org/D94283
We have modules with metadata on declarations, and out-of-tree passes
use that metadata, and we need to clone those modules. We really expect
such metadata is kept during the clone operation.
Reviewed by: arsenm, aprantl
Differential Revision: https://reviews.llvm.org/D93451
`UniqueInternalLinkageNamesPass` is useful to CSSPGO, especially when pseudo probe is used. It solves naming conflict for static functions which otherwise will share a merged profile and likely have a profile quality issue with mismatched CFG checksums. Since the pseudo probe instrumentation happens very early in the pipeline, I'm moving `UniqueInternalLinkageNamesPass` right before it. This is being done only to the new pass manager.
Reviewed By: dblaikie, aeubanks
Differential Revision: https://reviews.llvm.org/D93656
CGSCCOptimizerLateEPCallbacks are supposed to be run before the function
simplification pipeline, like in the legacy PM and as specified in the
comments for registerCGSCCOptimizerLateEPCallback().
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D93871
The AnnotationRemarks pass is already run at the end of the module
pipeline. This patch also adds it before bailing out for -O0, so remarks
are also generated with -O0.
Currently, -ftime-report + new pass manager emits one line of report for each
pass run. This potentially causes huge output text especially with regular LTO
or large single file (Obeserved in private tests and was reported in D51276).
The behaviour of -ftime-report + legacy pass manager is
emitting one line of report for each pass object which has relatively reasonable
text output size. This patch adds a flag `-ftime-report=` to control time report
aggregation for new pass manager.
The flag is for new pass manager only. Using it with legacy pass manager gives
an error. It is a driver and cc1 flag. `per-pass` is the new default so
`-ftime-report` is aliased to `-ftime-report=per-pass`. Before this patch,
functionality-wise `-ftime-report` is aliased to `-ftime-report=per-pass-run`.
* Adds an boolean variable TimePassesHandler::PerRun to control per-pass vs per-pass-run.
* Adds a new clang CodeGen flag CodeGenOptions::TimePassesPerRun to work with the existing CodeGenOptions::TimePasses.
* Remove FrontendOptions::ShowTimers, its uses are replaced by the existing CodeGenOptions::TimePasses.
* Remove FrontendTimesIsEnabled (It was introduced in D45619 which was largely reverted.)
Differential Revision: https://reviews.llvm.org/D92436
Currently PassBuilder.cpp is by far the file that takes longest to
compile. This is due to tons of templates being instantiated per pass.
Follow PassManager by using wrappers around passes to avoid making
the adaptors templated on the pass type. This allows us to move various
adaptors' run methods into .cpp files.
This reduces the compile time of PassBuilder.cpp on my machine from 66
to 39 seconds. It also reduces the size of opt from 685M to 676M.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D92616
This changes --print-before/after to be a list of strings rather than
legacy passes. (this also has the effect of not showing the entire list
of passes in --help-hidden after --print-before/after, which IMO is
great for making it less verbose).
Currently PrintIRInstrumentation passes the class name rather than pass
name to llvm::shouldPrintBeforePass(), meaning
llvm::shouldPrintBeforePass() never functions as intended in the NPM.
There is no easy way of converting class names to pass names outside of
within an instance of PassBuilder.
This adds a map of pass class names to their short names in
PassRegistry.def within PassInstrumentationCallbacks. It is populated
inside the constructor of PassBuilder, which takes a
PassInstrumentationCallbacks.
Add a pointer to PassInstrumentationCallbacks inside
PrintIRInstrumentation and use the newly created map.
This is a bit hacky, but I can't think of a better way since the short
id to class name only exists within PassRegistry.def. This also doesn't
handle passes not in PassRegistry.def but rather added via
PassBuilder::registerPipelineParsingCallback().
llvm/test/CodeGen/Generic/print-after.ll doesn't seem very useful now
with this change.
Reviewed By: ychen, jamieschmeiser
Differential Revision: https://reviews.llvm.org/D87216
This is the #2 of 2 changes that make remarks hotness threshold option
available in more tools. The changes also allow the threshold to sync with
hotness threshold from profile summary with special value 'auto'.
This change expands remarks hotness threshold option
-fdiagnostics-hotness-threshold in clang and *-remarks-hotness-threshold in
other tools to utilize hotness threshold from profile summary.
Remarks hotness filtering relies on several driver options. Table below lists
how different options are correlated and affect final remarks outputs:
| profile | hotness | threshold | remarks printed |
|---------|---------|-----------|-----------------|
| No | No | No | All |
| No | No | Yes | None |
| No | Yes | No | All |
| No | Yes | Yes | None |
| Yes | No | No | All |
| Yes | No | Yes | None |
| Yes | Yes | No | All |
| Yes | Yes | Yes | >=threshold |
In the presence of profile summary, it is often more desirable to directly use
the hotness threshold from profile summary. The new argument value 'auto'
indicates threshold will be synced with hotness threshold from profile summary
during compilation. The "auto" threshold relies on the availability of profile
summary. In case of missing such information, no remarks will be generated.
Differential Revision: https://reviews.llvm.org/D85808
Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before th full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: less call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.
Note that this patch does not yet enable much in terms of post-always
inline function optimization.
Differential Revision: https://reviews.llvm.org/D91567
Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does.
I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand,
but i'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as it can be seen in the phase-ordering test.
LoopIdiom runs on two types of loops: countable ones, and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, well, they should have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter if IndVars have already canonicalized them.
So i don't really see why we'd want the current ordering.
Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we then could adjust accordingly.
While this is quite likely beneficial in-the-wild already,
it's a required part for the full motivational pattern
behind `left-shift-until-bittest` loop idiom (D91038).
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D91800
This matches the legacy PM's EP_ModuleOptimizerEarly. Some backends use
this extension point and adding the pass somewhere else like
PipelineStartEPCallback doesn't work.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D91804
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.
This fixes some tests testing this exact behavior in the legacy PM.
Instead of checking WeakTrackingVHs for CallBases at the very beginning
and end of the pass it wraps, check every time
updateCGAndAnalysisManagerForPass() is called.
check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline run the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).
http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks
shows that 7zip has ~1% compile time regression. I looked at it and saw
that there indeed was devirtualization happening that was not previously
caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI.
The initial land assumed CallBase WeakTrackingVHs would always be
CallBases, but they can be RAUW'd with undef.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89587
This reuses the existing lower-matrix-intrinsics pass rather than going
the legacy pass route of creating a new pass.
Use this new variant in the NPM -O0 pipeline.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D91811
This moves handling of alwaysinline, coroutines, matrix lowering, PGO,
and LTO-required passes into PassBuilder. Much of this is replicated
between Clang and opt. Other out-of-tree users also replicate some of
this, such as Rust [1] replicating the alwaysinline, LTO, and PGO
passes.
The LTO passes are also now run in
build(Thin)LTOPreLinkDefaultPipeline() since they are semantically
required for (Thin)LTO.
[1]: f5230fbf76/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp (L896)
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D91585
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.anotations, which is generated using
__attribute__((annotate("_name"))) on functions in Clang.
This has been discussed on llvm-dev as part of
RFC: Combining Annotation Metadata and Remarks
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D91195
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.
It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.
The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.
To motivate this, consider these specific questions we would like to get answered:
* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?
Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
Reviewed By: thegameg, jdoerfert
Differential Revision: https://reviews.llvm.org/D91188
Summary:
Add an option -print-before-changed that modifies the print-changed
behaviour so that it prints the IR before a pass that changed it in
addition to printing the IR after the pass. Note that the option
does nothing in isolation. The filtering options work as expected.
Lit tests are included.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D88757
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Since callbacks may end up not adding passes, we need to check if the
pass managers are empty before adding them, so PassManager now has an
isEmpty() function. For example, polly adds callbacks but doesn't always
add passes in those callbacks, so this is necessary to keep
-debug-pass-manager tests' output from changing depending on if polly is
enabled or not.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
Reviewed By: asbirlea, Meinersbur
Differential Revision: https://reviews.llvm.org/D89158
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.
With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.
Differential Revision: https://reviews.llvm.org/D89896
This instruments a should-run-optional-pass callback using the existing
OptBisect class to decide if new passes should be skipped. Passes that
force isRequired never reach this at all, so they are not included in
"BISECT:" output nor its pass count.
The test case is resurrected from r267022, an early version of D19172
that had new pass manager support (later reverted and redone without).
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D87951
This reverts commit ae38540042668675dd16c642d850115f217ea59f.
As well as some follow-up test fixes.
The original change causes new-pass-manager.ll to fail when polly is enabled.