1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00
Commit Graph

9731 Commits

Author SHA1 Message Date
Eli Friedman
1e3d2f90ba [InstCombine] Fix a couple crashes with extractelement on a scalable vector.
Differential Revision: https://reviews.llvm.org/D86989
2020-09-02 18:02:07 -07:00
Jordan Rupprecht
e72bcb7267 [NFC] Fix unused var in release build 2020-09-01 13:05:56 -07:00
Florian Hahn
2248875209 [Loads] Add canReplacePointersIfEqual helper.
This patch adds an initial, incomeplete and unsound implementation of
canReplacePointersIfEqual to check if a pointer value A can be replaced
by another pointer value B, that are deemed to be equivalent through
some means (e.g. information from conditions).

Note that is in general not sound to blindly replace pointers based on
equality, for example if they are based on different underlying objects.

LLVM's memory model is not completely settled as of now; see
https://bugs.llvm.org/show_bug.cgi?id=34548 for a more detailed
discussion.

The initial version of canReplacePointersIfEqual only rejects a very
specific case: replacing a pointer with a constant expression that is
not dereferenceable. Such a replacement is problematic and can be
restricted relatively easily without impacting most code. Using it to
limit replacements in GVN/SCCP/CVP only results in small differences in
7 programs out of MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto.

This patch is supposed to be an initial step to improve the current
situation and the helper should be made stricter in the future. But this
will require careful analysis of the impact on performance.

Reviewed By: aqjune

Differential Revision: https://reviews.llvm.org/D85524
2020-09-01 20:57:41 +01:00
Alina Sbirlea
deb30c80fe [MemorySSA] Update phi map with replacement value. 2020-09-01 11:56:40 -07:00
Alina Sbirlea
c9a8d991e6 [MemorySSA] Clean up single value phis.
MemoryPhis with a single value are correct, but can lead to errors when
updating. Clean up single entry Phis newly added when cloning blocks.
Resolves PR46574.
2020-08-31 19:26:08 -07:00
Nikita Popov
b469dc9b48 [InstSimplify] Reduce code duplication in simplifySelectWithICmpCond (NFC)
Canonicalize icmp ne to icmp eq and implement all the folds only once.
2020-08-29 22:38:49 +02:00
Nikita Popov
556e0d5173 [InstSimplify] Protect against more poison in SimplifyWithOpReplaced (PR47322)
Replace the check for poison-producing instructions in
SimplifyWithOpReplaced() with the generic helper canCreatePoison()
that properly handles poisonous shifts and thus avoids the problem
from PR47322.

This additionally fixes a bug in IIQ.UseInstrInfo=false mode, which
previously could have caused this code to ignore poison flags.
Setting UseInstrInfo=false should reduce the possible optimizations,
not increase them.

This is not a full solution to the problem, as poison could be
introduced more indirectly. This is just a minimal, easy to backport
fix.

Differential Revision: https://reviews.llvm.org/D86834
2020-08-29 21:59:39 +02:00
Nikita Popov
d37c50d614 [LVI] Remove unnecessary lambda capture (NFC) 2020-08-29 21:33:19 +02:00
Nikita Popov
4e796fd7b6 Reapply [LVI] Normalize pointer behavior
This got reverted because a dependency was reverted. It has since
been reapplied, so reapply this as well.

-----

Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.

This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this more powerful than the previous implementation as
a side-effect, and this does actually seem benefitial in practice.

Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.

Differential Revision: https://reviews.llvm.org/D69914
2020-08-29 21:17:03 +02:00
Roman Lebedev
2a46a04b28 [NFC][InstructionSimplify] Add a warning about not simplifying to not def-reachable
See
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824235.html
and
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824967.html

InstSimply is not allowed to perform simplifications to instructions
that are not def-reachable from the original instruction.
2020-08-29 09:58:08 +03:00
Owen Anderson
df34423d50 Revert "[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block"
This reverts commit 6102310d814ad73eab60a88b21dd70874f7a056f.  It
appears to cause compilation non-determinism and caused stage3
mismatches.
2020-08-28 23:43:42 +00:00
serge-sans-paille
41f672146f Skip analysis re-computation when no changes are reported
This is a follow-up to https://reviews.llvm.org/D80707, generalized to
CallGraphSCC, Loop and Region

Differential Revision: https://reviews.llvm.org/D86442
2020-08-28 21:41:01 +02:00
David Sherwood
56b8c35591 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Florian Hahn
1e84da9f17 [MemLoc] Support memcmp in MemoryLocation::getForArgument.
This patch adds support for memcmp in MemoryLocation::getForArgument.
memcmp reads from the first 2 arguments up to the number of bytes of the
third argument.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86725
2020-08-28 10:19:54 +01:00
Martin Storsjö
47182191ac [ValueTracking] Remove a stray semicolon. NFC.
This silences warnings when built with GCC at least.
2020-08-28 09:24:10 +03:00
serge-sans-paille
a12b4db565 (Expensive) Check for Loop, SCC and Region pass return status
This generalizes the logic introduced in https://reviews.llvm.org/D80916 to
other passes.

It's needed by https://reviews.llvm.org/D86442 to assert passes correctly report
their status.

Differential Revision: https://reviews.llvm.org/D86589
2020-08-28 07:56:35 +02:00
Alina Sbirlea
ee5e19fe38 [MemorySSA] Assert defining access is not a MemoryUse. 2020-08-27 18:21:10 -07:00
Vitaly Buka
2df7657efb [ValueTracking] Replace recursion with Worklist
Now findAllocaForValue can handle nontrivial phi cycles.
2020-08-27 14:44:49 -07:00
Vitaly Buka
1490706ac2 [StackSafety] Ignore allocas with partial lifetime markers
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D86672
2020-08-27 13:54:41 -07:00
Vitaly Buka
4f88f9a474 [NFC][ValueTracking] Add OffsetZero into findAllocaForValue
For StackLifetime after finding alloca we need to check that
values ponting to the begining of alloca.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D86692
2020-08-27 13:46:22 -07:00
Roman Lebedev
0c6e84d654 [InstSimplify] SimplifyPHINode(): check that instruction is in basic block first
As pointed out in post-commit review, this can legally be called
on instructions that are not inserted into basic blocks,
so don't blindly assume that there is basic block.
2020-08-27 22:32:03 +03:00
Simon Moll
2afc6e83ea [sda][nfc] clang-formatting 2020-08-27 18:27:44 +02:00
Roman Lebedev
2088bfe3c4 [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block
Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify,
nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places..

While i could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks, same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what i wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.

So i would think this is pretty uncontroversial.

On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE                             | 0         | 23779     |  23779 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942328   | 7942392   |     64 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 273069192 | 273084704 |  15512 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18539     |    127 |    0.69% |    0.69% |
| early-cse.NumCSE                                   | 2183283   | 2183227   |    -56 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 542090    |  -8015 |   -1.46% |    1.46% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640264   | 3664769   |  24505 |    0.67% |    0.67% |
| instcombine.NumDeadInst                            | 1778193   | 1783183   |   4990 |    0.28% |    0.28% |
| instcount.NumCallInst                              | 1758401   | 1758799   |    398 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330533    |    -24 |   -0.01% |    0.01% |
| instcount.TotalInsts                               | 8831952   | 8832286   |    334 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019808   | 999607    | -20201 |   -1.98% |    1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.

That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI CSE'd in SimplifyCFG of all places..

I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node,
so if we just naively compare them, we may false-negatively say that
the nodes are not equal when the only difference is operand order,
which is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.

Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.

As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).
2020-08-27 18:47:04 +03:00
Vitaly Buka
afd0abc1f8 [ValueTracking] Support select in findAllocaForValue 2020-08-27 02:13:52 -07:00
Nikita Popov
d8d374b2b3 [InstSimplify] Fold min/max intrinsic based on icmp of operands
This is a reboot of D84655, now performing the inner icmp
simplification query without undef folds.

It should be possible to handle the current foldMinMaxSharedOp()
fold based on this, by moving the logic into icmp of min/max instead,
making it more general. We can't drop the folds for constant operands,
because those also allow undef, which we exclude here.

The tests use assumes for exhaustive coverage, and have a few
more examples of misc folds we get based on icmp simplification.

Differential Revision: https://reviews.llvm.org/D85929
2020-08-26 22:02:57 +02:00
Arthur Eubanks
4c381d496d [InstSimplify] Simplify to vector constants when possible
InstSimplify should do all transformations that ConstProp does, but
one thing that ConstProp does that InstSimplify wouldn't is inline
vector instructions that are constants, e.g. into a ret.

Previously vector instructions wouldn't be inlined in InstSimplify
because llvm::Simplify*Instruction() would return nullptr for specific
instructions, such as vector instructions that were actually constants,
if it couldn't simplify them.

This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and
SimplifyShuffleVectorInst to return a vector constant when possible.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85946
2020-08-26 11:40:36 -07:00
Juneyoung Lee
5d4d13a2ae [IR] Add NoUndef attribute to Intrinsics.td
This patch adds NoUndef to Intrinsics.td.
The attribute is attached to llvm.assume's operand, because llvm.assume(undef)
is UB.
It is attached to pointer operands of several memory accessing intrinsics
as well.

This change makes ValueTracking::getGuaranteedNonPoisonOps' intrinsic check
unnecessary, so it is removed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86576
2020-08-27 02:54:48 +09:00
Mircea Trofin
b186c6758c [MLInliner] Simplify TFUTILS_SUPPORTED_TYPES
We only need the C++ type and the corresponding TF Enum. The other
parameter was used for the output spec json file, but we can just
standardize on the C++ type name there.

Differential Revision: https://reviews.llvm.org/D86549
2020-08-25 14:19:39 -07:00
Juneyoung Lee
65c4cb9e7b [ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands
This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.

Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86477
2020-08-26 04:40:21 +09:00
Nikita Popov
f861b3c934 [MemDep] Use BatchAA when computing pointer dependencies
We're not changing IR while running a single MemDep query, so it's
safe to cache alias analysis results using BatchAA. This adds BatchAA
usage to getSimplePointerDependencyFrom(), which is non-intrusive --
covering larger parts (like a whole processNonLocalLoad query) is
also possible, but requires threading BatchAA through a bunch of APIs.

For the ThinLTO configuration, this is a 1% geomean improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D85583
2020-08-25 21:34:34 +02:00
Ta-Wei Tu
9380a91af9 [LoopNest] False negative of arePerfectlyNested with LCSSA loops
Summary: The LCSSA pass (required for all loop passes) sometimes adds
additional blocks containing LCSSA variables, and checkLoopsStructure
may return false even when the loops are perfectly nested in this case.
This is because the successor of the exit block of the inner loop now
points to the LCSSA block instead of the latch block of the outer loop.
Examples are shown in the test nests-with-lcssa.ll.

To fix the issue, the successor of the exit block of the inner loop can
now point to a block in which all instructions are LCSSA phi node
(except the terminator), and the sole successor of that block should
point to the latch block of the outer loop.

Reviewed By: Whitney, etiotto

Differential Revision: https://reviews.llvm.org/D86133
2020-08-25 16:20:52 +00:00
Mircea Trofin
9c71d4e1d1 [MLInliner] Support training that doesn't require partial rewards
If we use training algorithms that don't need partial rewards, we don't
need to worry about an ir2native model. In that case, training logs
won't contain a 'delta_size' feature either (since that's the partial
reward).

Differential Revision: https://reviews.llvm.org/D86481
2020-08-24 17:36:29 -07:00
Sam Parker
0828b21ed0 [SCEV] Still trying to fix windows buildbots 2020-08-24 10:26:48 +01:00
Alina Sbirlea
d3e372ea8f [DomTree] Extend update API to allow a post CFG view.
Extend the `applyUpdates` in DominatorTree to allow a post CFG view,
different from the current CFG.
This patch implements the functionality of updating an already up to
date DT, to the desired PostCFGView.
Combining a set of updates towards an up to date DT and a PostCFGView is
not yet supported.

Differential Revision: https://reviews.llvm.org/D85472
2020-08-21 17:23:08 -07:00
Arthur Eubanks
5e4555a20b [opt][NewPM] Add basic-aa in legacy PM compatibility mode
The legacy PM alias analysis pipeline by default includes basic-aa.
When running `opt -foo-pass` under the NPM and -disable-basic-aa is not
specified, use basic-aa.

This decreases the number of check-llvm failures under NPM from 913 to 752.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86167
2020-08-21 14:05:07 -07:00
Roman Lebedev
2893862e92 [NFC] Port InstCount pass to new pass manager 2020-08-21 12:39:42 +03:00
Yevgeny Rouban
1bdf10a116 [NewPM][PassInstrumentation] Add PreservedAnalyses parameter to AfterPass* callbacks
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.

Reviewers: fedor.sergeev

Differential Revision: https://reviews.llvm.org/D81555
2020-08-21 16:10:42 +07:00
David Green
53fac1f9ad [ARM][LV] Add a preferPredicatedReductionSelect target hook
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.

Differential Revision: https://reviews.llvm.org/D85980
2020-08-21 08:48:12 +01:00
Sanjay Patel
042574c236 [ValueTracking] define/use max recursion depth in header
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191

But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.

Differential Revision: https://reviews.llvm.org/D86113
2020-08-19 16:56:59 -04:00
Mehdi Amini
db235b2187 Revert "Revert "[NFC][llvm] Make the contructors of ElementCount private.""
Was reverted because MLIR/Flang builds were broken, these APIs have been
fixed in the meantime.
2020-08-19 17:26:36 +00:00
Mehdi Amini
4386b1823a Revert "[NFC][llvm] Make the contructors of ElementCount private."
This reverts commit 264afb9e6aebc98c353644dd0700bec808501cab.
(and dependent 6b742cc48 and fc53bd610f)

MLIR/Flang are broken.
2020-08-19 17:21:37 +00:00
Francesco Petrogalli
d75808bc7f [NFC][llvm] Make the contructors of ElementCount private.
Differential Revision: https://reviews.llvm.org/D86120
2020-08-19 16:26:44 +00:00
Mircea Trofin
3359e4e021 [MLInliner] In development mode, obtain the output specs from a file
Different training algorithms may produce models that, besides the main
policy output (i.e. inline/don't inline), produce additional outputs
that are necessary for the next training stage. To facilitate this, in
development mode, we require the training policy infrastructure produce
a description of the outputs that are interesting to it, in the form of
a JSON file. We special-case the first entry in the JSON file as the
inlining decision - we care about its value, so we can guide inlining
during training - but treat the rest as opaque data that we just copy
over to the training log.

Differential Revision: https://reviews.llvm.org/D85674
2020-08-17 16:56:47 -07:00
Tyker
dfa36c6cdf [AssumeBundles] Fix Bug in Assume Queries
this bug was causing miscompile.
now clang cant properly selfhost with -mllvm --enable-knowledge-retention

Reviewed By: jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D83507
2020-08-17 21:36:53 +02:00
Dávid Bolvanský
26599cbe3f Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 50c743fa713002fe4e0c76d23043e6c1f9e9fe6f. Patch will be split to smaller ones.
2020-08-17 20:44:33 +02:00
Simon Pilgrim
8366289c89 [DemandedBits] Improve accuracy of Add propagator
The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.

I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.

This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.

Known A:     0_______0_______0_______0_______
Known B:     0_______0_______0_______0_______
AOut:        00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch:   00000000001111111000000000000000

Committed on behalf of: @rrika (Erika)

Differential Revision: https://reviews.llvm.org/D72423
2020-08-17 12:54:09 +01:00
Cullen Rhodes
ed95f77522 [InlineCost] Fix scalable vectors in visitAlloca
Discovered as part of the VLS type work (see D85128).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85848
2020-08-17 10:34:27 +00:00
Vitaly Buka
42145d2b15 [NFC][StackSafety] Move out sort from the loop 2020-08-17 03:30:14 -07:00
Vitaly Buka
6f71d99b21 [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-16 18:05:52 -07:00
Vitaly Buka
7e7fd55416 [StackSafety] Change how callee searched in index
Handle other than local linkage types.
2020-08-16 04:37:19 -07:00
Wenlei He
8c3d7a1d09 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.

This is a resubmit of https://reviews.llvm.org/D83743
2020-08-15 20:17:21 -07:00
Vitaly Buka
320ac778a0 [StackSafety] Use ValueInfo in ParamAccess::Call
This avoid GUID lookup in Index.findSummaryInModule.
Follow up for D81242.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D85269
2020-08-14 12:42:44 -07:00
Matt Morehouse
9fdad1bbaa Revert "[NFC][StackSafety] Move out sort from the loop"
This reverts commit 0426e28419799c35cf52fe3d773c5bab9928c699 due to ASan
buildbot failure.
2020-08-14 08:17:35 -07:00
Vitaly Buka
9ad15cd174 [NFC][StackSafety] Change map key comparison 2020-08-14 04:23:15 -07:00
Vitaly Buka
aecc9e0fb0 [NFC][StackSafety] Move out sort from the loop 2020-08-14 04:19:10 -07:00
Vitaly Buka
f936ac90a2 [NFC][StackSafety] Dedup callees 2020-08-14 01:14:52 -07:00
Dávid Bolvanský
7129f2d26c [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 19:54:27 +02:00
Simon Pilgrim
f02d625e34 Fix unused variable warning. NFC.
Reduce the dyn_cast<> to a isa<> as that's all non-assert builds require, and move the cast<> inside the assert.
2020-08-13 15:43:20 +01:00
Dávid Bolvanský
baa55bd4d6 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 44587e2f7e732604cd6340061d40ac21e7e188e5. Sanitizer tests need to be updated.
2020-08-13 14:37:40 +02:00
Dávid Bolvanský
f4c1a714d0 [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 14:23:58 +02:00
Dávid Bolvanský
aecc53e597 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 385c9d673f217e176b18e7bf6fe055154ac589c6.
2020-08-13 12:59:15 +02:00
Dávid Bolvanský
b38379d5d6 [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 12:45:40 +02:00
Ali Tamur
e7d6dfa5d7 Revert "[SCEV] Look through single value PHIs."
This reverts commit e441b7a7a0a72c28daf5a8e594559c667e5b4534.

This patch causes a compile error in tensorflow opensource project. The stack trace looks like:

Point of crash:
llvm/include/llvm/Analysis/LoopInfoImpl.h : line 35

(gdb) ptype *this
type = const class llvm::LoopBase<llvm::BasicBlock, llvm::Loop> [with BlockT = llvm::BasicBlock, LoopT = llvm::Loop]

(gdb) p *this
$1 = {ParentLoop = 0x0, SubLoops = std::vector of length 0, capacity 0, Blocks = std::vector of length 0, capacity 1,
  DenseBlockSet = {<llvm::SmallPtrSetImpl<llvm::BasicBlock const*>> = {<llvm::SmallPtrSetImplBase> = {<llvm::DebugEpochBase> = {Epoch = 3}, SmallArray = 0x1b2bf6c8, CurArray = 0x1b2bf6c8,
        CurArraySize = 8, NumNonEmpty = 0, NumTombstones = 0}, <No data fields>}, SmallStorage = {0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, IsInvalid = true}

(gdb) p *this->DenseBlockSet->CurArray
$2 = (const void *) 0xfffffffffffffffe

I will try to get a case from tensorflow or use creduce to get a small case.
2020-08-12 23:13:24 -07:00
Nikita Popov
3cea08454d [ValueTracking] Add abs intrinsics support to computeConstantRange()
Implementation is the same as for SPF_ABS.
2020-08-12 22:28:46 +02:00
Nikita Popov
2a8504a89f [ValueTracking] Support min/max intrinsics in computeConstantRange()
The implementation is the same as for the SPF_* case.
2020-08-12 22:07:29 +02:00
Craig Topper
0308690c50 Recommit "[InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms" and its follow up patches
This recommits the following patches now that D85684 has landed

1cf6f210a2e [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2d [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc9 [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2f [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329af [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
2020-08-12 10:45:27 -07:00
Florian Hahn
371c1e57df [SCEV] Look through single value PHIs.
Now that SCEVExpander can preserve LCSSA form,
we do not have to worry about LCSSA form when
trying to look through PHIs. SCEVExpander will take
care of inserting LCSSA PHI nodes as required.

This increases precision of the analysis in some cases.

Reviewed By: mkazantsev, bmahjour

Differential Revision: https://reviews.llvm.org/D71539
2020-08-12 10:03:42 +01:00
Nikita Popov
587bdc1d95 [InstSimplify] Respect CanUseUndef in more places
Similar to what we do in IIQ, add an isUndefValue() helper that
checks for undef values while respective CanUseUndef. This makes
it much easier to search for places that don't respect the flag
yet.
2020-08-11 21:53:33 +02:00
Dávid Bolvanský
ee8d84179f [BPI] Teach BPI about bcmp function
bcmp is similar to memcmp
2020-08-11 20:44:53 +02:00
Nikita Popov
577d874016 [InstSimplify] Forbid undef folds in expandBinOp
This is the replacement for D84250 based on D84792. As we recursively
fold with the same value twice, we need to disable undef folds,
to prevent an undef from being folded to two different values.

Reverting rG00f3579aea6e3d4a4b7464c3db47294f71cef9e4 and using the
test case from https://reviews.llvm.org/D83360#2145793, it no longer
performs the incorrect fold.

Differential Revision: https://reviews.llvm.org/D85684
2020-08-11 18:39:24 +02:00
Sanjay Patel
5b7d18ac79 [InstSimplify] fold min/max with matching min/max operands
I think this is the last remaining translation of an existing
instcombine transform for the corresponding cmp+sel idiom.

This interpretation is more general though - we can remove
mismatched signed/unsigned combinations in addition to the
more obvious cases.

min/max(X, Y) must produce X or Y as the result, so this is
just another clause in the existing transform that was already
matching a min/max of min/max.
2020-08-11 11:23:15 -04:00
Florian Hahn
fc2f262900 [SCEV] ] If RHS >= Start, simplify (Start smax RHS) to RHS for trip counts.
This is the max version of D85046.

This change causes binary changes in 44 out of 237 benchmarks (out of
MultiSource/SPEC2000/SPEC2006)

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D85189
2020-08-11 13:20:24 +01:00
Juneyoung Lee
6e8034136f [LazyValueInfo] Let getEdgeValueLocal look into freeze instructions
This patch makes getEdgeValueLocal more precise when a freeze instruction is
given, by adding support for freeze into constantFoldUser

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84629
2020-08-11 16:39:34 +09:00
Thomas Lively
1da4ba1036 [WebAssembly][ConstantFolding] Fold fp-to-int truncation intrinsics
Constant fold both the trapping and saturating versions of the
WebAssembly truncation intrinsics. The tests are adapted from the
WebAssembly spec tests for the corresponding instructions.

Requested in PR46982.

Differential Revision: https://reviews.llvm.org/D85392
2020-08-10 12:40:05 -07:00
Mircea Trofin
0ec177f3b4 [NFC][MLInliner] remove curly braces for a few sinle-line loops 2020-08-10 09:32:21 -07:00
Mircea Trofin
f46e151000 [NFC][MLInliner] Set up the logger outside the development mode advisor
This allows us to subsequently configure the logger for the case when we
use a model evaluator and want to log additional outputs.

Differential Revision: https://reviews.llvm.org/D85577
2020-08-10 09:22:17 -07:00
Vitaly Buka
82a20d8b12 [NFC][StackSafety] Add a couple of early returns 2020-08-09 23:42:09 -07:00
Vitaly Buka
4f3a3f0cc2 [NFC][StackSafety] Count dataflow inputs 2020-08-09 23:32:41 -07:00
Vitaly Buka
55c588271d [StackSafety] Fix union which produces wrapped sets 2020-08-09 23:20:17 -07:00
Vitaly Buka
6795706f92 [NFC][StackSafety] Avoid assert in getBaseObjec 2020-08-09 23:20:17 -07:00
Vitaly Buka
00ccee5400 [StackSafety] Don't keep FullSet in index
Optimization. Missing record is enterpreted as FullSet anyway.
2020-08-09 15:01:46 -07:00
Florian Hahn
564f5c4ac7 [InstSimplify/NewGVN] Add option to control the use of undef.
Making use of undef is not safe if the simplification result is not used
to replace all uses of the result. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.

This patch adds an option to SimplifyQuery to disable the use of undef.

Note that I've only guarded uses if isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84792
2020-08-09 19:16:56 +01:00
Vitaly Buka
3e043ca45f [NFC][StackSafety] Fix statistics 2020-08-07 16:18:52 -07:00
Mircea Trofin
be6dadeffd [NFC][MLInliner] Refactor logging implementation
This prepares it for logging externally-specified outputs.

Differential Revision: https://reviews.llvm.org/D85451
2020-08-07 14:56:56 -07:00
Vitaly Buka
6669d78639 Revert "[StackSafety] Skip ambiguous lifetime analysis"
This reverts commit 0b2616a8045cb776ea1514c3401d0a8577de1060.

Crashes with safe-stack.
2020-08-07 14:02:50 -07:00
Vitaly Buka
ae3ad63414 [StackSafety,NFC] Add Stats counters 2020-08-07 14:02:50 -07:00
Vitaly Buka
f8fc07d99e [StackSafety,NFC] Fix tests in debug 2020-08-06 20:46:39 -07:00
Vitaly Buka
15a533fc32 [StackSafety,NFC] Add debug counters 2020-08-06 19:24:02 -07:00
Vitaly Buka
4124f20201 [StackSafety,NFC] Use CHECK-EMPTY in tests 2020-08-06 19:19:51 -07:00
Vitaly Buka
3b944733de [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-06 19:10:33 -07:00
Vitaly Buka
567e88646c [LTO,NFC] Skip generateParamAccessSummary when empty
addGlobalValueSummary can check newly added FunctionSummary
and set HasParamAccess to mark that generateParamAccessSummary
is needed.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D85182
2020-08-06 19:01:19 -07:00
Jessica Paquette
e43ddd00f6 [GlobalISel] Fix computing known bits for loads with range metadata
In GlobalISel, if you have a load into a small type with a range, you'll hit
an assert if you try to compute known bits on it starting at a larger type.

e.g.

```
%x:_(s8) = G_LOAD %whatever(p0) :: (load 1 ... !range !n)
...
%y:_(s32) = G_SOMETHING %x
```

When we walk through G_SOMETHING and hit the load, the width of our known bits
is 32. However, the width of the range is going to be 8. This will cause us
to hit an assert.

To fix this, make computeKnownBitsFromRangeMetadata zero extend or truncate
the range type to match the bitwidth of the known bits we're calculating.

Add a testcase in CodeGen/GlobalISel/KnownBitsTest.cpp to reflect that this
works now.

https://reviews.llvm.org/D85375
2020-08-06 16:47:07 -07:00
Sanjay Patel
01843e6c65 [InstSimplify] avoid crashing by trying to rem-by-zero
Bug was noted in the post-commit comments for:
rGe8760bb9a8a3
2020-08-06 16:06:31 -04:00
Mircea Trofin
1f12cdca6a [NFC]{MLInliner] Point out the tests' model dependencies 2020-08-06 09:57:26 -07:00
Mircea Trofin
b790a0e2cc [llvm][MLInliner] Don't log 'mandatory' events
We don't want mandatory events in the training log. We do want to handle
them, to keep the native size accounting accurate, but that's all.

Fixed the code, also expanded the test to capture this.

Differential Revision: https://reviews.llvm.org/D85373
2020-08-06 09:04:15 -07:00
David Green
77d21dcd3f [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
2020-08-06 10:10:50 +01:00
Sanjay Patel
82aa9a6b56 [InstSimplify] fold icmp with mul nsw and constant operands
https://rise4fun.com/Alive/slvl

  Name: mul nsw with icmp eq
  Pre: (C2 % C1) != 0
  %a = mul nsw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = false

  Name: mul nsw with icmp ne
  Pre: (C2 % C1) != 0
  %a = mul nsw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = true

Follow-up to the 'nuw' variation added with:
rGf879c9b79621
2020-08-05 14:38:39 -04:00
Sanjay Patel
60815c576a [InstSimplify] fold icmp with mul nuw and constant operands
https://rise4fun.com/Alive/pZEr

  Name: mul nuw with icmp eq
  Pre: (C2 %u C1) != 0
  %a = mul nuw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = false

  Name: mul nuw with icmp ne
  Pre: (C2 %u C1) != 0
  %a = mul nuw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = true

There are potentially several other transforms we need to add based on:
D51625
...but it doesn't look like there was follow-up to that patch.
2020-08-05 14:32:17 -04:00
Jordan Rupprecht
eb9074b6d8 Revert "[LoopVectorizer] Inloop vector reductions"
This reverts commit e9761688e41cb979a1fa6a79eb18145a75104933. It breaks the build:

```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>'
  return ReductionOperations;
```
2020-08-05 10:24:15 -07:00
Mircea Trofin
3bd1a7f753 [TFUtils] Expose untyped accessor to evaluation result tensors
These were implementation detail, but become necessary for generic data
copying.

Also added const variations to them, and move assignment, since we had a
move ctor (and the move assignment helps in a subsequent patch).

Differential Revision: https://reviews.llvm.org/D85262
2020-08-05 10:22:45 -07:00
David Green
8e671cc375 [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
2020-08-05 18:14:05 +01:00
Sanjay Patel
fb7a244b8e [InstSimplify] reduce code duplication in simplifyICmpWithMinMax(); NFC 2020-08-05 11:39:28 -04:00
Evgeniy Brevnov
08e662f69a [BPI][NFC] Unify handling of normal and SCC based loops
This is one more NFC part extracted from D79485. Normal and SCC based loops have very different representation and have to be handled separatly each time we deal with loops. D79485 is going to introduce much more extensive use of loops what will be problematic with out this change.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D84838
2020-08-05 11:19:24 +07:00
Mircea Trofin
96e978c534 [llvm] Expose type and element count-related APIs on TensorSpec
Added a mechanism to check the element type, get the total element
count, and the size of an element.

Differential Revision: https://reviews.llvm.org/D85250
2020-08-04 17:32:16 -07:00
Mircea Trofin
d0d6d07a22 [llvm][NFC] Moved implementation of TrainingLogger outside of its decl
Also renamed a method - printTensor - to print; and added comments.
2020-08-04 14:35:35 -07:00
Xavier Denis
fc5d02688b [InstSimplify] Peephole optimization for icmp (urem X, Y), X
This revision adds the following peephole optimization
and it's negation:

    %a = urem i64 %x, %y
    %b = icmp ule i64 %a, %x
    ====>
    %b = true

With John Regehr's help this optimization was checked with Alive2
which suggests it should be valid.

This pattern occurs in the bound checks of Rust code, the program

    const N: usize = 3;
    const T = u8;

    pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
        let len = slice.len() / N;
        slice.split_at(len * N)
    }

the method call slice.split_at will check that len * N is within
the bounds of slice, this bounds check is after some transformations
turned into the urem seen above and then LLVM fails to optimize it
any further. Adding this optimization would cause this bounds check
to be fully optimized away.

ref: https://github.com/rust-lang/rust/issues/74938

Differential Revision: https://reviews.llvm.org/D85092
2020-08-04 20:48:37 +02:00
Sanjay Patel
d960c0f146 [InstSimplify] refactor min/max folds with shared operand; NFC 2020-08-04 12:21:05 -04:00
Sanjay Patel
46f44cd641 [InstSimplify] fold nested min/max intrinsics with constant operands
This is based on the existing code for the non-intrinsic idioms
in InstCombine.

The vector constant constraint is non-obvious: undefs should be
ok in the outer call, but they can't propagate safely from the
inner call in all cases. Example:

https://alive2.llvm.org/ce/z/-2bVbM
  define <2 x i8> @src(<2 x i8> %x) {
  %0:
    %m = umin <2 x i8> %x, { 7, undef }
    %m2 = umin <2 x i8> { 9, 9 }, %m
    ret <2 x i8> %m2
  }
  =>
  define <2 x i8> @tgt(<2 x i8> %x) {
  %0:
    %m = umin <2 x i8> %x, { 7, undef }
    ret <2 x i8> %m
  }
  Transformation doesn't verify!
  ERROR: Value mismatch

  Example:
  <2 x i8> %x = < undef, undef >

  Source:
  <2 x i8> %m = < #x00 (0)	[based on undef value], #x00 (0) >
  <2 x i8> %m2 = < #x00 (0), #x00 (0) >

  Target:
  <2 x i8> %m = < #x07 (7), #x10 (16) >
  Source value: < #x00 (0), #x00 (0) >
  Target value: < #x07 (7), #x10 (16) >
2020-08-04 08:44:48 -04:00
Sanjay Patel
5cdd72d32f [InstSimplify] reduce code for min/max analysis; NFC
This should probably be moved up to some common area eventually
when there's another user.
2020-08-04 08:02:33 -04:00
David Green
9fdafb1a2b [BasicAA] Enable -basic-aa-recphi by default
This option was added a while back, to help improve AA around pointer
phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can
then prove more precise aliasing info.

Differential Revision: https://reviews.llvm.org/D82998
2020-08-04 10:43:42 +01:00
Alina Sbirlea
592b072474 [MemorySSA] Restrict optimizations after a PhiTranslation.
Merging alias results from different paths, when a path did phi
translation is not necesarily correct. Conservatively terminate such paths.
Aimed to fix PR46156.

Differential Revision: https://reviews.llvm.org/D84905
2020-08-03 14:46:41 -07:00
Sanjay Patel
e545f2a08e [InstSimplify] fold variations of max-of-min with common operand
https://alive2.llvm.org/ce/z/ZtxpZ3
2020-08-03 15:02:46 -04:00
Mircea Trofin
1cbf2902fb [llvm] Add a parser from JSON to TensorSpec
A JSON->TensorSpec utility we will use subsequently to specify
additional outputs needed for certain training scenarios.

Differential Revision: https://reviews.llvm.org/D84976
2020-08-03 09:49:31 -07:00
Florian Hahn
6d82efa764 [SCEV] If Start>=RHS, simplify (Start smin RHS) = RHS for trip counts.
In some cases, it seems like we can get rid of unnecessary s/umins by
using information from the loop guards (unless I am missing something).

One place where this seems to be helpful in practice is when computing
loop trip counts. This patch just changes howManyGreaterThans for now.
Note that this requires a loop for which we can check 'is guarded'.

On SPEC2000/SPEC2006/MultiSource, there are some notable changes for
some programs in the number of loops unrolled and trip counts computed.

```
Same hash: 179 (filtered out)
Remaining: 58
Metric: scalar-evolution.NumTripCountsComputed

Program                                        base    patch   diff
 test-suite...langs-C/compiler/compiler.test    25.00   31.00  24.0%
 test-suite.../Applications/SPASS/SPASS.test   2020.00 2323.00 15.0%
 test-suite...langs-C/allroots/allroots.test    29.00   32.00  10.3%
 test-suite.../Prolangs-C/loader/loader.test    17.00   18.00   5.9%
 test-suite...fice-ispell/office-ispell.test   253.00  265.00   4.7%
 test-suite...006/450.soplex/450.soplex.test   3552.00 3692.00  3.9%
 test-suite...chmarks/MallocBench/gs/gs.test   453.00  470.00   3.8%
 test-suite...ngs-C/assembler/assembler.test    29.00   30.00   3.4%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test   263.00  270.00   2.7%
 test-suite...rks/FreeBench/pifft/pifft.test   722.00  741.00   2.6%
 test-suite...count/automotive-bitcount.test    41.00   42.00   2.4%
 test-suite...0/253.perlbmk/253.perlbmk.test   1417.00 1451.00  2.4%
 test-suite...000/197.parser/197.parser.test   387.00  396.00   2.3%
 test-suite...lications/sqlite3/sqlite3.test   1168.00 1189.00  1.8%
 test-suite...000/255.vortex/255.vortex.test   173.00  176.00   1.7%

Metric: loop-unroll.NumUnrolled

Program                                        base   patch  diff
 test-suite...langs-C/compiler/compiler.test     1.00   3.00 200.0%
 test-suite.../Applications/SPASS/SPASS.test   134.00 234.00 74.6%
 test-suite...count/automotive-bitcount.test     3.00   4.00 33.3%
 test-suite.../Prolangs-C/loader/loader.test     3.00   4.00 33.3%
 test-suite...langs-C/allroots/allroots.test     3.00   4.00 33.3%
 test-suite...Source/Benchmarks/sim/sim.test    10.00  12.00 20.0%
 test-suite...fice-ispell/office-ispell.test    21.00  25.00 19.0%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test    32.00  38.00 18.8%
 test-suite...006/450.soplex/450.soplex.test   300.00 352.00 17.3%
 test-suite...rks/FreeBench/pifft/pifft.test    60.00  69.00 15.0%
 test-suite...chmarks/MallocBench/gs/gs.test    57.00  63.00 10.5%
 test-suite...ngs-C/assembler/assembler.test    10.00  11.00 10.0%
 test-suite...0/253.perlbmk/253.perlbmk.test   145.00 157.00  8.3%
 test-suite...000/197.parser/197.parser.test    43.00  46.00  7.0%
 test-suite...TimberWolfMC/timberwolfmc.test   205.00 214.00  4.4%
 Geomean difference                                           7.6%
```

Fixes https://bugs.llvm.org/show_bug.cgi?id=46939
Fixes https://bugs.llvm.org/show_bug.cgi?id=46924 on X86.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D85046
2020-08-03 17:22:42 +01:00
Vitaly Buka
5f096386c6 [StackSafety, NFC] Don't insert empty objects into the map
Result should be the same but it makes generateParamAccessSummary 5x
faster.
2020-08-02 13:58:56 -07:00
Sanjay Patel
a4319ccb7e [InstSimplify] fold max (max X, Y), X --> max X, Y
https://alive2.llvm.org/ce/z/VGgG3M
2020-08-02 11:50:58 -04:00
Nikita Popov
86f94d1395 [InstSimplify] Reduce code duplication in icmp of binop folds (NFC)
For folds where we check for the binop on both the LHS and RHS,
extract a function that expects it on the LHS and call it with
swapped order.
2020-08-02 15:47:18 +02:00
Kazu Hirata
6e4cee6f1e Use llvm::is_contained where appropriate (NFC)
Use llvm::is_contained where appropriate (NFC)

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D85083
2020-08-01 21:51:06 -07:00
Craig Topper
183d6fbbe7 [InstSimplify] Fold abs(abs(x)) -> abs(x)
It's always safe to pick the earlier abs regardless of the nsw flag. We'll just lose it if it is on the outer abs but not the inner abs.

Differential Revision: https://reviews.llvm.org/D85053
2020-08-01 13:25:00 -07:00
Sanjay Patel
afc3df8955 [InstSimplify] simplify abs if operand is known non-negative
abs() should be rare enough that using value tracking is not going
to be a compile-time cost burden, so use it to reduce a variety of
potential patterns. We do this in DAGCombiner too.

Differential Revision: https://reviews.llvm.org/D85043
2020-08-01 07:47:06 -04:00
Craig Topper
9eeafe7bcb [ValueTracking] Improve llvm.abs handling in computeKnownBits.
Add the optimizations we have in the SelectionDAG version.
Known non-negative copies all known bits. Any known one other than
the sign bit makes result non-negative.

Differential Revision: https://reviews.llvm.org/D85000
2020-07-31 15:55:03 -07:00
Sanjay Patel
65a64fcb5b [ConstantFolding] fold abs intrinsic
The handling for minimum value is similar to cttz/ctlz with 0 just above this case.

Differential Revision: https://reviews.llvm.org/D84942
2020-07-31 14:08:44 -04:00
Craig Topper
2c4eee97f8 [ValueTracking] Add ComputeNumSignBits support for llvm.abs intrinsic
If absolute value needs turn a negative number into a positive number it reduces the number of sign bits by at most 1.

Differential Revision: https://reviews.llvm.org/D84971
2020-07-31 10:59:12 -07:00
Vitaly Buka
1bae08d2a5 [NFC] Remove unused GetUnderlyingObject paramenter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka
4ee4573a60 [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Vitaly Buka
0093612032 [ValueTracking] Remove AllocaForValue parameter
findAllocaForValue uses AllocaForValue to cache resolved values.
The function is used only to resolve arguments of lifetime
intrinsic which usually are not fare for allocas. So result reuse
is likely unnoticeable.

In followup patches I'd like to replace the function with
GetUnderlyingObjects.

Depends on D84616.

Differential Revision: https://reviews.llvm.org/D84617
2020-07-30 18:48:34 -07:00
Vitaly Buka
fe28af466f [NFC] Move findAllocaForValue into ValueTracking.h
Differential Revision: https://reviews.llvm.org/D84616
2020-07-30 18:22:59 -07:00
Craig Topper
5b76494391 [ValueTracking] Add basic computeKnownBits support for llvm.abs intrinsic
This includes basic support for computeKnownBits on abs. I've left FIXMEs for more complicated things we could do.

Differential Revision: https://reviews.llvm.org/D84963
2020-07-30 16:26:54 -07:00
Florian Hahn
db60ce547b [LAA] Avoid adding pointers to the checks if they are not needed.
Currently we skip alias sets with only reads or a single write and no
reads, but still add the pointers to the list of pointers in RtCheck.

This can lead to cases where we try to access a pointer that does not
exist when grouping checks.  In most cases, the way we access
PositionMap masked that, as the value would default to index 0.

But in the example in PR46854 it causes a crash.

This patch updates the logic to avoid adding pointers for alias sets
that do not need any checks. It makes things slightly more verbose, by
first checking the numbers of reads/writes and bailing out early if we don't
need checks for the alias set.

I think this makes the logic a bit simpler to follow.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D84608
2020-07-30 19:21:14 +01:00
Yuanfang Chen
e1803bebb8 [NewPM][PassInstrument] Add PrintPass callback to StandardInstrumentations
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.

Solution:
Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder)

This patch move debug logging for pass as a PassInstrument callback. We could be sure that all running passes are logged and in the correct order.

This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want.

The test fixes looks messy. It includes changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)

Reviewed By: asbirlea, aeubanks

Differential Revision: https://reviews.llvm.org/D84774
2020-07-30 10:07:57 -07:00
Mircea Trofin
ff4bf8bfb5 [llvm][NFC] TensorSpec abstraction for ML evaluator
Further abstracting the specification of a tensor, to more easily
support different types and shapes of tensor, and also to perform
initialization up-front, at TFModelEvaluator construction time.

Differential Revision: https://reviews.llvm.org/D84685
2020-07-29 16:29:21 -07:00
Sanjay Patel
486acf841a [InstSimplify] fold min/max intrinsic with undef operand 2020-07-29 17:03:50 -04:00
Sanjay Patel
b787afe9ff [InstSimplify] fold min/max with opposite of limit value 2020-07-29 17:03:50 -04:00
Nikita Popov
fca04145c2 [ConstantRange] Add API for intrinsics (NFC)
This adds a common API for compute constant ranges of intrinsics.
The intention here is that
a) we can reuse the same code across different passes that handle
   constant ranges, i.e. this can be reused in SCCP
b) we only have to add knowledge about supported intrinsics to
   ConstantRange, not any consumers.

Differential Revision: https://reviews.llvm.org/D84587
2020-07-29 22:16:27 +02:00
Craig Topper
7c95515f0a [LV] Add abs/smin/smax/umin/umax intrinsics to isTriviallyVectorizable
This patch adds support for vectorizing these intrinsics.

Differential Revision: https://reviews.llvm.org/D84796
2020-07-29 10:23:07 -07:00
Sanjay Patel
26020de8d8 [InstSimplify] try constant folding intrinsics before general simplifications
This matches the behavior of simplify calls for regular opcodes -
rely on ConstantFolding before spending time on folds with variables.

I am not aware of any diffs from this re-ordering currently, but there was
potential for unintended behavior from the min/max intrinsics because that
code is implicitly assuming that only 1 of the input operands is constant.
2020-07-29 13:18:40 -04:00
Sanjay Patel
6422893580 [InstSimplify] allow partial undef constants for vector min/max folds 2020-07-29 11:53:41 -04:00
Sanjay Patel
41638d7550 [InstSimplify] fold integer min/max intrinsic with same args 2020-07-29 11:53:41 -04:00
Sanjay Patel
af9b68aef4 [ConstantFolding] fold integer min/max intrinsics
If both operands are undef, return undef.
If one operand is undef, clamp to limit constant.
2020-07-29 11:01:13 -04:00
David Green
49873f2449 [Analysis] TTI: Add CastContextHint for getCastInstrCost
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types.  Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization.  Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.

For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.

To fix that, this path adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.

Original patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79162
2020-07-29 13:32:53 +01:00
Sanjay Patel
5fbd1942fd [InstSimplify] allow undefs in icmp with vector constant folds
This is the main icmp simplification shortcoming seen in D84655.

Alive2 agrees that the basic examples are correct at least:

define <2 x i1> @src(<2 x i8> %x) {
%0:
  %r = icmp sle <2 x i8> { undef, 128 }, %x
  ret <2 x i1> %r
}
=>
define <2 x i1> @tgt(<2 x i8> %x) {
%0:
  ret <2 x i1> { 1, 1 }
}
Transformation seems to be correct!

define <2 x i1> @src(<2 x i32> %X) {
%0:
  %A = or <2 x i32> %X, { 63, 63 }
  %B = icmp ult <2 x i32> %A, { undef, 50 }
  ret <2 x i1> %B
}
=>
define <2 x i1> @tgt(<2 x i32> %X) {
%0:
  ret <2 x i1> { 0, 0 }
}
Transformation seems to be correct!

https://alive2.llvm.org/ce/z/omt2ee
https://alive2.llvm.org/ce/z/GW4nP_

Differential Revision: https://reviews.llvm.org/D84762
2020-07-28 15:13:53 -04:00
Evgeniy Brevnov
7532dd9434 [BPI] Fix memory leak reported by sanitizer bots
There is a silly mistake where release() is used instead of reset() for free resources of unique pointer.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D84747
2020-07-28 19:53:46 +07:00
Evgeniy Brevnov
06a0c0fb9c [BPI][NFC] Consolidate code to deal with SCCs under a dedicated data structure.
In order to facilitate review of D79485 here is a small NFC change which restructures code around handling of SCCs in BPI.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D84514
2020-07-28 17:42:33 +07:00
Alina Sbirlea
bb6015d9c6 [GraphDiff] Use class method getChildren instead of GraphTraits.
Summary:
Use getChildren() method in GraphDiff instead of GraphTraits.

This simplifies the code and allows for refactorigns inside GraphDiff.
All usecase need not have a light-weight/copyable range.
Clean GraphTraits implementation.

Reviewers: dblaikie

Subscribers: hiraditya, llvm-commits, george.burgess.iv

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84562
2020-07-27 16:12:34 -07:00
Kazu Hirata
0d01902704 Use llvm::is_contained where appropriate (NFC)
Summary:
This patch replaces std::find with llvm::is_contained where
appropriate.

Reviewers: efriedma, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84489
2020-07-27 10:20:44 -07:00
Sergey Dmitriev
02505ee23a [CallGraph] Preserve call records vector when replacing call edge
Summary:
Try not to resize vector of call records in a call graph node when
replacing call edge. That would prevent invalidation of iterators
stored in the CG SCC pass manager's scc_iterator.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84295
2020-07-27 06:02:55 -07:00
Sanjay Patel
3c7b643ee6 [InstSimplify] fold integer min/max intrinsics with limit constant 2020-07-26 09:41:54 -04:00
Sanjay Patel
c36efeb0c6 [InstSimplify] fold fcmp using isKnownNeverInfinity + isKnownNeverNaN
Follow-up to D84035 / rG7393d7574c09.
This sidesteps a question of FMF/poison on fcmp raised in PR46077:
http://bugs.llvm.org/PR46077

https://alive2.llvm.org/ce/z/TCsyzD
  define i1 @src(float %x) {
  %0:
    %x42 = fadd nnan ninf float %x, 42.000000
    %r = fcmp ueq float %x42, inf
    ret i1 %r
  }
  =>
  define i1 @tgt(float %x) {
  %0:
    ret i1 0
  }
  Transformation seems to be correct!

https://alive2.llvm.org/ce/z/FQaH7a
  define i1 @src(i8 %x) {
  %0:
    %cast = uitofp i8 %x to float
    %r = fcmp one float inf, %cast
    ret i1 %r
  }
  =>
  define i1 @tgt(i8 %x) {
  %0:
    ret i1 1
  }
  Transformation seems to be correct!
2020-07-26 09:04:37 -04:00
Juneyoung Lee
2b46786139 [ConstantFolding] Fold freeze if it is never undef or poison
This is a simple patch that adds constant folding for freeze
instruction.

IIUC, it isn't needed to update ConstantFold.cpp because there is no freeze
constexpr.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84597
2020-07-26 21:54:44 +09:00
Juneyoung Lee
e7db975cc1 [ValueTracking] Instruction::isBinaryOp should be used for constexprs
This is a simple patch that makes canCreateUndefOrPoison use
Instruction::isBinaryOp because BinaryOperator inherits Instruction.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84596
2020-07-26 21:48:51 +09:00