1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 02:33:06 +01:00
Commit Graph

10700 Commits

Author SHA1 Message Date
Sanjay Patel
2a668ab374 [ValueTracking] soften assert for invertible recurrence matching
There's a TODO comment in the code and discussion in D99912
about generalizing this, but I wasn't sure how to implement that,
so just going with a potential minimal fix to avoid crashing.

The test is a reduction beyond useful code (there's no user of
%user...), but it is based on https://llvm.org/PR50191, so this
is asserting on real code.

Differential Revision: https://reviews.llvm.org/D101772
2021-05-03 15:57:40 -04:00
Juneyoung Lee
33fda542f4 Fix MSan crash after 1977c53b 2021-05-02 13:44:43 +09:00
Arthur Eubanks
0086e8ade4 [NFC] Use getParamByValType instead of pointee type
To reduce dependence on pointee types for opaque pointers.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101706
2021-05-01 21:22:41 -07:00
Juneyoung Lee
b4b7d6d0cc [ValueTracking] ctpop propagates poison
This is a patch that adds ctpop intrinsics to propagatesPoison.

Splitted from D101191
2021-05-02 13:04:37 +09:00
Juneyoung Lee
75528017f8 [ValueTracking] Improve impliesPoison to look into overflow intrinsics
This update supports the following transformation:

```
select(extract(mul_with_overflow(a, _), _), (a == 0), false)
=>
and(extract(mul_with_overflow(a, _), _), (a == 0))
```

which is correct because if `a` was poison the select's condition was
also poison.

This update is splitted from D101423.
2021-05-02 12:03:55 +09:00
Juneyoung Lee
c112624b0b [InstCombine] Fold overflow bit of [u|s]mul.with.overflow in a poison-safe way
As discussed in D101191, this patch adds a poison-safe folding of overflow bit check:
```
  %Op0 = icmp ne i4 %X, 0
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = select i1 %Op0, i1 %Op1, i1 false
=>
  %Y.fr = freeze %Y
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y.fr)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = %Op1
```

https://alive2.llvm.org/ce/z/zgPUGT
https://alive2.llvm.org/ce/z/h2gZ_6

Note that there are cases where inserting freeze is not necessary: e.g. %Y is `noundef`.
In this case, LLVM is already good because `%ret` is already successfully folded into `and`,
triggering the pre-existing optimization in InstSimplify: https://godbolt.org/z/v6qena15K

Differential Revision: https://reviews.llvm.org/D101423
2021-05-02 11:54:12 +09:00
Nikita Popov
adfe3b159b [LVI] Handle mask not equal zero conditions
If V & Mask != 0, we know that at least one of the bits in Mask
must be set, so the value must be >= the lowest bit in Mask.
2021-05-01 23:08:49 +02:00
Nikita Popov
b39114fa53 [SCEV] Simplify backedge count clearing (NFC)
This seems to be a leftover from when the BackedgeTakenInfo
stored multiple exit counts with manual memory management. At
some point this was switchted to a simple vector, and there should
be no need to micro-manage the clearing anymore. We can simply
drop the loop from the map and the the destructor do its job.
2021-05-01 17:50:01 +02:00
Adrian Prantl
32b1ee25e7 Revert "[VP,Integer,#2] ExpandVectorPredication pass"
This reverts commit 43bc584dc05e24c6d44ece8e07d4bff585adaf6d.

The commit broke the -DLLVM_ENABLE_MODULES=1 builds.

http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/31603/consoleFull#2136199809a1ca8a51-895e-46c6-af87-ce24fa4cd561
2021-04-30 17:02:28 -07:00
Nikita Popov
8d71021388 [ValueTracking] Slightly clean up programUndefinedIfUndefOrPoison() (NFC)
Use contains() to check set membership, and adjust an oddly
structured loop.
2021-04-30 23:05:41 +02:00
Nikita Popov
1aee99e8ca [ValueTracking] Limit scan when checking poison UB (PR50155)
The current code can scan an unlimited number of instructions,
if the containing basic block is very large. The test case from
PR50155 contains a basic block with approximately 100k instructions.

To avoid this, limit the number of instructions we inspect. At
the same time, drop the limit on the number of basic blocks, as
this will be implicitly limited by the number of instructions as
well.
2021-04-30 23:04:49 +02:00
Duncan P. N. Exon Smith
ee59f57519 Support: Stop using F_{None,Text,Append} compatibility synonyms, NFC
Stop using the compatibility spellings of `OF_{None,Text,Append}`
left behind by 1f67a3cba9b09636c56e2109d8a35ae96dc15782. A follow-up
will remove them.

Differential Revision: https://reviews.llvm.org/D101650
2021-04-30 11:00:03 -07:00
Simon Moll
63ed8031a5 [VP,Integer,#2] ExpandVectorPredication pass
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).

VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D78203
2021-04-30 15:47:28 +02:00
Roman Lebedev
9713d9790b [InlineCost] CallAnalyzer: use TTI info for extractvalue - they are free (PR50099)
It seems incorrect to use TTI data in some places,
and override it in others. In this case, TTI says
that `extractvalue` are free, yet we bill them.

While this doesn't address https://bugs.llvm.org/show_bug.cgi?id=50099 yet,
it reduces the cost from 55 to 50 while the threshold is 45.

Differential Revision: https://reviews.llvm.org/D101228
2021-04-30 13:55:11 +03:00
Arthur Eubanks
c48d82de15 [InlineCost] Remove visitUnaryInstruction()
The simplifyInstruction() in visitUnaryInstruction() does not trigger
for all of check-llvm. Looking at all delegates to UnaryInstruction in
InstVisitor, the only instructions that either don't have a visitor in
CallAnalyzer, or redirect to UnaryInstruction, are VAArgInst and Alloca.
VAArgInst will never get simplified, and visitUnaryInstruction(Alloca)
would always return false anyway.

Reviewed By: mtrofin, lebedev.ri

Differential Revision: https://reviews.llvm.org/D101577
2021-04-29 20:33:30 -07:00
jasonliu
f6fc5dd17d [XCOFF] Handle the case when personality routine is an alias
Summary:
Personality routine could be an alias to another personality routine.
Fix the situation when we compile the file that contains the personality
routine and the file also have functions that need to refer to the
personality routine.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D101401
2021-04-29 22:03:30 +00:00
Philip Reames
a17954e6f1 Revert "Generalize getInvertibleOperand recurrence handling slightly"
This reverts commit 0c01b37eeb18a51a7e9c9153330d8009de0f600e while a problem reported is investigated.
2021-04-29 13:06:26 -07:00
Sanjay Patel
8e0f77764c [ConstantFolding] propagate poison through vector reduction intrinsics 2021-04-29 12:54:20 -04:00
Sanjay Patel
b8ab3f33f1 [ConstantFolding] refactor helper for vector reductions; NFC
We should handle other cases (undef/poison), so reduce
the duplication of repeated switches.
2021-04-29 12:09:22 -04:00
Craig Topper
dd5a612546 [RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31.
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.

This allows us to remove sext and zext if user happens to assign
the size_t result into an int and then uses it as a VL intrinsic
argument which is size_t.

Reviewed By: frasercrmck, rogfer01, arcbbb

Differential Revision: https://reviews.llvm.org/D101472
2021-04-29 08:07:59 -07:00
Philip Reames
5b3993fcf6 Generalize getInvertibleOperand recurrence handling slightly
Follow up to D99912, specifically the revert, fix, and reapply thereof.

This generalizes the invertible recurrence logic in two ways:
* By allowing mismatching operand numbers of the phi, we can recurse through a pair of phi recurrences whose operand orders have not been canonicalized.
* By allowing recurrences through operand 1, we can invert these odd (but legal) recurrence.

Differential Revision: https://reviews.llvm.org/D100884
2021-04-28 14:38:07 -07:00
Philip Reames
f9ae80f6b1 [SCEV] Avoid range intersection idiom in getRangeForUnkownRecurrence [NFC]
Addresses a review comment from D101181
2021-04-28 12:48:17 -07:00
Philip Reames
cf98cdf20e [SCEV] Compute ranges for ashr recurrences
Straight forward extension to the recently added infrastructure which was pioneered with shl. This was originally posted as part of D99687, but split off for ease of review.

(I also decided to exclude the unknown start sign case explicitly for simplicity of understanding.)

Differential Revision: https://reviews.llvm.org/D101181
2021-04-28 12:36:20 -07:00
Florian Hahn
939b677f1c [LAA] Support pointer phis in loop by analyzing each incoming pointer.
SCEV does not look through non-header PHIs inside the loop. Such phis
can be analyzed by adding separate accesses for each incoming pointer
value.

This results in 2 more loops vectorized in SPEC2000/186.crafty and
avoids regressions when sinking instructions before vectorizing.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101286
2021-04-28 20:19:40 +01:00
Arthur Eubanks
6c8d16d78d [ConstFold] Use const-folded operands in more places
Previously we were const folding operands but not passing them.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101394
2021-04-27 14:30:19 -07:00
Nikita Popov
e7bbddfa9f [SCEV] Handle uge/ugt predicates in applyLoopGuards()
These can be handled the same way as ule/ult, just using umax
instead of umin. This is useful in cases where the umax prevents
the upper bound from overflowing.

Differential Revision: https://reviews.llvm.org/D101196
2021-04-27 22:41:05 +02:00
Andy Kaylor
64bce9007c [Dependence Analysis] Fix ExactSIV producing wrong analysis
Patch by Artem Radzikhovskyy!

Symptom: ExactSIV test produced incorrect analysis of dependencies see LIT tests
Bug: At the end of the algorithm when determining dependence direction original author forgot to divide intermediate results by gcd and round result toward zero

Although this bug can be fixed with significantly fewer changes I opted to write the code in such a way that reflects the original algorithm that Banerjee proposed, for easier reference in the future. This surprisingly results in shorter code, and fewer quotient and max/min calculations.

Changes Summary:

- fixed findGCD to return valid x and y so that they match the function description where: ax - by = gcd(a,b)
- Fixed ExactSIV test, to produce proper results
- Documented the extension of Banerjee's algorithm that the original code author introduced. Banerjee's original algorithm only tested whether Dst depends on Src, the extension also allows us to test whether Src depends on Dst, in one pass.
- ExactRDIV test worked fine. Since it uses findGCD(), it needed to be updated.Since ExactRDIV test has very few changes from the core algorithm of ExactSIV I modified the test to have consistent format as ExactSIV.
- Updated the LIT tests to be testing for correct values.

Differential Revision: https://reviews.llvm.org/D100331
2021-04-27 12:24:00 -07:00
dfukalov
584872cffa [TTI] NFC: Change getScalarizationOverhead and getOperandsScalarizationOverhead to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101283
2021-04-27 08:51:48 +03:00
Hongtao Yu
7416a011e2 [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2.
As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudu probe instrumention.

The optimizations unblocked are:
		- In-block load propagation.
		- In-block dead store elimination
		- Memory copy optimization that turns stores to consecutive memories into a memset.

These optimizations are local to a block, so they shouldn't affect the profile quality.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100075
2021-04-26 16:52:33 -07:00
Vineet Kumar
01779a312f Implementation for TargetTransformInfo::hasActiveVectorLength()
This patch adds the missing implementation for
TargetTransformInfo::hasActiveVectorLength() without which using
hasActiveVectorLength() causes linker error.

Patch by Vineet Kumar!

Differential Revision: https://reviews.llvm.org/D100941
2021-04-26 21:20:05 +00:00
Nikita Popov
de1f20abe1 [SCEV] Fix applyLoopGuards() chaining for ne predicates
ICMP_NE predicates directly overwrote the rewritten result,
instead of chaining it with previous rewrites, as was done for
ICMP_ULT and ICMP_ULE. This means that some guards were effectively
discarded, depending on their order.
2021-04-24 21:43:46 +02:00
dfukalov
4398288fa7 [GVN] Clobber partially aliased loads.
Use offsets stored in `AliasResult` implemented in D98718.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-04-24 14:14:20 +03:00
Sander de Smalen
b71b6a828f [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
Sander de Smalen
b447679db3 [TTI] NFC: Change getScalingFactorCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100564
2021-04-23 16:06:36 +01:00
Sander de Smalen
a6cde052ae [TTI] NFC: Change getMemcpyCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100563
2021-04-23 16:06:35 +01:00
Sander de Smalen
9eda3a0c2e [TTI] NFC: Change getGEPCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100562
2021-04-23 16:06:35 +01:00
Sander de Smalen
9390d5a48f [TTI] NFC: Change getAddressComputationCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100561
2021-04-23 16:06:35 +01:00
dfukalov
9779f666df [TTI] NFC: Use InstructionCost to store ScalarizationCost in IntrinsicCostAttributes.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D101151
2021-04-23 18:02:00 +03:00
Philip Reames
555456c598 [SCEV] Compute ranges for lshr recurrences
Straight forward extension to the recently added infrastructure which was pioneered with shl.

Differential Revision: https://reviews.llvm.org/D99687
2021-04-22 11:06:31 -07:00
Wenlei He
e734b4c21b [CSSPGO][llvm-profdata] Support trimming cold context when merging profiles
The change adds support for triming and merging cold context when mergine CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen, however the flexibility to trim cold context after profile is generated can be useful.

Differential Revision: https://reviews.llvm.org/D100528
2021-04-22 00:42:37 -07:00
Sanjay Patel
b3bc645e79 [InstSimplify] generalize ctlz-of-shifted-constant
https://alive2.llvm.org/ce/z/zWL_VQ
2021-04-21 14:23:55 -04:00
Nico Weber
25b1225bca [Support] Don't include VirtualFileSystem.h in CommandLine.h
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.

(Already remarked by jhenderson on D70769.)

No behavior change.

Differential Revision: https://reviews.llvm.org/D100957
2021-04-21 10:19:01 -04:00
Yang Fan
0c89a953db [SCEV] Fix -Wunused-variable warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp: In member function ‘const llvm::SCEV* llvm::ScalarEvolution::getLosslessPtrToIntExpr(const llvm::SCEV*, unsigned int)::SCEVPtrToIntSinkingRewriter::visitUnknown(const llvm::SCEVUnknown*)’:
/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:1152:13: warning: unused variable ‘ExprPtrTy’ [-Wunused-variable]
 1152 |       Type *ExprPtrTy = Expr->getType();
      |             ^~~~~~~~~
```
2021-04-21 16:01:46 +08:00
Nikita Popov
f8a60b9733 Revert "[InstSimplify] Bypass no-op and-mask, using known bits (PR49543)"
This reverts commit ea1a0d7c9ae3e5232a4163fc67efad4aabd51f2b.

While this is strictly more powerful, it is also strictly slower.
InstSimplify intentionally does not perform many folds that it
is allowed to perform, if doing so requires a KnownBits calculation
that will be repeated in InstCombine.

Maybe it's worthwhile to do this here, but that needs a more
explicitly stated motivation, evaluated in a review.
2021-04-21 09:55:25 +02:00
Philip Reames
5209f18c1c Revert "Allow invokable sub-classes of IntrinsicInst"
This reverts commit d87b9b81ccb95217181ce75515c6c68bbb408ca4.

Post commit review raised concerns, reverting while discussion happens.
2021-04-20 15:38:38 -07:00
Philip Reames
4f3ea7d288 Allow invokable sub-classes of IntrinsicInst
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.

This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.

After this lands and has baked for a couple days, planned cleanups:
    Make GCStatepointInst a IntrinsicInst subclass.
    Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
    Do the same in SelectionDAG.
    Do the same in FastISEL.

Differential Revision: https://reviews.llvm.org/D99976
2021-04-20 15:03:49 -07:00
Roman Lebedev
b7b50a3bdd [InstSimplify] Bypass no-op and-mask, using known bits (PR49543)
We already special-cased a few interesting patterns,
but that is strictly less powerful than using KnownBits.

So instead get the known bits for the operand of `and`,
and iff all the unset bits of the `and`-mask are known to be zeros
in the operand, we can omit said `and`.
2021-04-21 00:31:46 +03:00
Philip Reames
6215df8065 Reapply "Look through invertible recurrences in isKnownNonEqual"
I'd reverted this in commit 3b6acb179708ea2f3caf95ace0f134fcbc460333 due to buildbot failures.  This patch contains the fix for said issue.  I'd forgotten to handle the case where two phis in the same block have different operand order.  We canonicalize away from this, but it's still valid IR.  The tests included in this change (as opposed to simply having test output changed), crashed without the fix.

Original commit message follows...

This extends the phi handling in isKnownNonEqual with a special case based on invertible recurrences. If we can prove the recurrence is invertible (which many common ones are), we can recurse through the start operands of the recurrence skipping the phi cycle.

(Side note: Instcombine currently does not push back through these cases. I will implement that in a follow up change w/separate review.)

Differential Revision: https://reviews.llvm.org/D99912
2021-04-20 12:47:59 -07:00
Philip Reames
952b6e81cd Revert "Look through invertible recurrences in isKnownNonEqual"
This reverts commit be20eae25f50f5ef648aeefa1143e1c31e4410fc.  It appears to have caused a crash on a buildbot (https://lab.llvm.org/buildbot#builders/77/builds/5653).  Reverting while investigating.
2021-04-20 11:47:10 -07:00
Philip Reames
de5d0e597e Rearrange code to reduce diff for D99687 [nfc]
Adding the switches to reduce diffs.  I'm about to split that into an lshr part and an ashr part, doing the NFC part first makes it easier to maintain both diffs.
2021-04-20 11:40:15 -07:00
Roman Lebedev
b30f6f3772 [NFC][SCEV] Split getLosslessPtrToIntExpr out of getPtrToIntExpr() 2021-04-20 21:29:21 +03:00
Philip Reames
c647071b9b Look through invertible recurrences in isKnownNonEqual
This extends the phi handling in isKnownNonEqual with a special case based on invertible recurrences. If we can prove the recurrence is invertible (which many common ones are), we can recurse through the start operands of the recurrence skipping the phi cycle.

(Side note: Instcombine currently does not push back through these cases. I will implement that in a follow up change w/separate review.)

Differential Revision: https://reviews.llvm.org/D99912
2021-04-20 10:52:22 -07:00
Dávid Bolvanský
5cbb47ede4 [MemoryBuiltins] Added support for memalign
memalign is older aligned_alloc.
2021-04-20 12:39:54 +02:00
Joe Ellis
a248bdd4af [AArch64] Constant fold sve_convert_from_svbool(zero) to zero
Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D100463
2021-04-20 10:02:49 +00:00
Arthur Eubanks
8d6572654c Explicitly pass type to cast load constant folding result
Previously we would use the type of the pointee to determine what to
cast the result of constant folding a load. To aid with opaque pointer
types, we should explicitly pass the type of the load rather than
looking at pointee types.

ConstantFoldLoadThroughBitcast() converts the const prop'd value to the
proper load type (e.g. [1 x i32] -> i32). Instead of calling this in
every intermediate step like bitcasts, we only call this when we
actually see the global initializer value.

In some existing uses of this API, we don't know the exact type we're
loading from immediately (e.g. first we visit a bitcast, then we visit
the load using the bitcast). In those cases we have to manually call
ConstantFoldLoadThroughBitcast() when simplifying the load to make sure
that we cast to the proper type.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100718
2021-04-20 00:53:21 -07:00
Sanjay Patel
5c6b4c85d3 [LowerConstantIntrinsics] avoid crashing on alloca with unexpected operand type
The test here is reduced from the fuzzer-generated crasher in:
https://llvm.org/PR50023
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33395

I don't know if this is the best or complete solution, but the
zext of the i42 type appears to match the behavior if we run a
weird type example like this through the IR optimizer with -O1.

Differential Revision: https://reviews.llvm.org/D100766
2021-04-19 13:06:29 -04:00
Roman Lebedev
a1bd0e9ac4 [NFC][SCEV] Assert that we don't try to create SCEVPtrToIntExpr of a non-integral pointer
ptr<->int casts are only valid for integral pointes,
defensively assert that we don't try to break that here.
2021-04-19 18:38:38 +03:00
Simon Pilgrim
0dc98b49a6 [Analysis] ImportedFunctionsInliningStatistics.h - add <memory> and remove unused <string> include. NFCI.
Move <string> include to ImportedFunctionsInliningStatistics.cpp and add missing <memory> include as we have explicit uses of std::unique_ptr in the header.
2021-04-19 16:20:56 +01:00
Cullen Rhodes
9d799468bf [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100377
2021-04-19 11:01:34 +00:00
Roman Lebedev
5a174914f6 Revert "[SCEV] Model ashr exact x, C as (abs(x) EXACT/u (1<<C)) * signum(x)"
As being discussed in https://reviews.llvm.org/D100721,
this modelling is lossy, we can't reconstruct `ash`/`ashr exact`
from it, which means that whenever we actually expand the IR,
we've just pessimized the code..

It would be good to model this pattern, after all it comes up every time
you want to compute a distance between two pointers, but not at this cost.

This reverts commit ec54867df5e7f20e12146e628af34f0384308bcb.
2021-04-18 16:26:45 +03:00
Serge Guelton
b2fe6dcbc2 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Thomas Lively
135ccc6ee6 [WebAssembly] Remove saturating fp-to-int target intrinsics
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.

Differential Revision: https://reviews.llvm.org/D100596
2021-04-16 12:11:20 -07:00
Sanjay Patel
8bb5944166 [ValueTracking] don't recursively compute known bits using multiple llvm.assumes
This is an alternative to D99759 to avoid the compile-time explosion seen in:
https://llvm.org/PR49785

Another potential solution would make the exclusion logic stronger to avoid
blowing up, but note that we reduced the complexity of the exclusion mechanism
in D16204 because it was too costly.

So I'm questioning the need for recursion/exclusion entirely - what is the
optimization value vs. cost of recursively computing known bits based on
assumptions?
This was built into the implementation from the start with 60db058,
and we have kept adding code/cost to deal with that capability.

By clearing the query's AssumptionCache inside computeKnownBitsFromAssume(),
this patch retains all existing assume functionality except refining known
bits based on even more assumptions.

We have 1 regression test that shows a difference in optimization power.

Differential Revision: https://reviews.llvm.org/D100573
2021-04-16 08:43:35 -04:00
Mircea Trofin
ed178714f6 [MLGO] Fix use of AM.invalidate post D100519
The ML inline advisors more aggressively invalidate certain analyses
after each call site inlining, to more accurately capture the problem
state.
2021-04-15 18:45:39 -07:00
Arthur Eubanks
2a2896669d [NewPM] Cleanup IR printing instrumentation
Being lazy with printing the banner seems hard to reason with, we should print it
unconditionally first (it could also lead to duplicate banners if we
have multiple functions in -filter-print-funcs).

The printIR() functions were doing too many things. I separated out the
call from PrintPassInstrumentation since we were essentially doing two
completely separate things in printIR() from different callers.

There were multiple ways to generate the name of some IR. That's all
been moved to getIRName(). The printing of the IR name was also
inconsistent, now it's always "IR Dump on $foo" where "$foo" is the
name. For a function, it's the function name. For a loop, it's what's
printed by Loop::print(), which is more detailed. For an SCC, it's the
list of functions in parentheses. For a module it's "[module]", to
differentiate between a possible SCC with a function called "module".

To preserve D74814, we have to check if we're going to print anything at
all first. This is unfortunate, but I would consider this a special
case that shouldn't be handled in the core logic.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D100231
2021-04-15 09:50:55 -07:00
dfukalov
7b0a514671 [AA] Updates for D95543.
Addressing latter comments in D95543:
- `AliasResult::Result` renamed to `AliasResult::Kind`
- Offset printing added for `PartialAlias` case in `-aa-eval`
- Removed VisitedPhiBBs check from BasicAA'

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D100454
2021-04-15 12:22:03 +03:00
Nikita Popov
79b9e7549e Revert "[SCEV] Don't walk uses of phis without SCEV expression when forgetting"
This reverts commit faf9f11589ce892b31d271917cf840f8ca903221.

Issues with this patch have been reported in
https://reviews.llvm.org/D100264#2689917 and
https://bugs.llvm.org/show_bug.cgi?id=49967.
2021-04-15 09:43:52 +02:00
William S. Moses
706fc0221d [SROA][TBAA] Handle shift of regular TBAA nodes
SROA shifts TBAA nodes in a way that may present a problem for !tbaa but not !tbaa.struct nodes.

Differential Revision: https://reviews.llvm.org/D99851
2021-04-14 14:35:20 -04:00
Nikita Popov
e6b1858425 [ValueTracking] Don't require strictly positive for mul nsw recurrence
Just like in the mul nuw case, it's sufficient that the step is
non-zero. If the step is negative, then the values will jump
between positive and negative, "crossing" zero, but the value of
the recurrence is never actually zero.
2021-04-14 19:39:59 +02:00
Nikita Popov
14134a9eac [ValueTracking] Don't require non-zero step for add nuw
It's okay if the step is zero, we'll just stay at the same non-zero
value in that case. The valuable part of this is that the step
doesn't even need to be a constant anymore.
2021-04-14 19:06:18 +02:00
Sander de Smalen
fed6e0656a [TTI] NFC: Change getArithmeticInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100317
2021-04-14 17:20:36 +01:00
Sander de Smalen
995f28b2cc [TTI] NFC: Change getFPOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D100316
2021-04-14 17:20:36 +01:00
Sander de Smalen
5aaf843fd4 [TTI] NFC: Change getVectorInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100315
2021-04-14 17:20:35 +01:00
Sander de Smalen
40eaded9c5 [TTI] NFC: Change getShuffleCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100314
2021-04-14 17:20:35 +01:00
Sander de Smalen
2dfc199700 [TTI] NFC: Change getCFInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100313
2021-04-14 17:20:34 +01:00
Sander de Smalen
882e61a85c [TTI] NFC: Change getCallInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D100312
2021-04-14 17:20:34 +01:00
Sanjay Patel
f26eaa7622 [InstSimplify] improve efficiency for detecting non-zero value
Stepping through callstacks in the example from D99759 reveals
this potential compile-time improvement.

The savings come from avoiding ValueTracking's computing known
bits if we have already dealt with special-case patterns.

Further improvements in this direction seem possible.

This makes a degenerate test based on PR49785 about 40x faster
(25 sec -> 0.6 sec), but it does not address the larger question
of how to limit computeKnownBitsFromAssume(). Ie, the original
test there is still infinite-time for all practical purposes.

Differential Revision: https://reviews.llvm.org/D100408
2021-04-14 09:04:15 -04:00
Sanjay Patel
7d78140278 [ValueTracking] match negative-stepping non-zero recurrence
This is pulled out of D100408.

This avoids a regression that would be exposed by making the
calling code from InstSimplify more efficient.
2021-04-14 08:57:53 -04:00
Sanjay Patel
b568083407 [ValueTracking] reduce code duplication; NFC
The start value can't be null for something to be a non-zero
recurrence, so hoist that common check out of the switch.

Subsequent checks may be incomplete or over-specified as noted in:
D100408
2021-04-14 08:32:42 -04:00
Philip Reames
3ccbf06fa0 fix whitespace type 2021-04-13 19:02:41 -07:00
Nikita Popov
a79b75d192 [SCEV] Don't walk uses of phis without SCEV expression when forgetting
I've run into some cases where a large fraction of compile-time is
spent invalidating SCEV. One of the causes is forgetLoop(), which
walks all values that are def-use reachable from the loop header
phis. When invalidating a topmost loop, that might be close to all
values in a function. Additionally, it's fairly common for there to
not actually be anything to invalidate, but we'll still be performing
this walk again and again.

My first thought was that we don't need to continue walking the uses
if the current value doesn't have a SCEV expression. However, this
isn't quite right, because SCEV construction can skip over values
(e.g. for a chain of adds, we might only create a SCEV expression
for the final value).

What this patch does instead is to only walk the (full) def-use chain
of loop phis that have a SCEV expression. If there's no expression
for a phi, then we also don't have any dependent expressions to
invalidate.

Differential Revision: https://reviews.llvm.org/D100264
2021-04-13 20:28:17 +02:00
Sander de Smalen
7016294be8 [TTI] NFC: Change get[Interleaved]MemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100205
2021-04-13 14:21:02 +01:00
Sander de Smalen
32a70b87c0 [TTI] NFC: Change getMaskedMemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100204
2021-04-13 14:21:01 +01:00
Sander de Smalen
c121181edc [TTI] NFC: Change getCmpSelInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100203
2021-04-13 14:21:01 +01:00
Sander de Smalen
be189a5f92 [TTI] NFC: Change getMinMaxReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100202
2021-04-13 14:21:00 +01:00
Sander de Smalen
7bd4ae29c1 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
Sander de Smalen
ee0f916cf3 [TTI] NFC: Change getGatherScatterOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100200
2021-04-13 14:20:59 +01:00
Sander de Smalen
cb03a318d0 [TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100199
2021-04-13 14:20:58 +01:00
Gulfem Savrun Yeniceri
c650eac142 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-04-13 01:29:41 +00:00
Yuanfang Chen
4d84232908 Reland "Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom""
This reverts commit a3fabc79ae9d7dd76545b2abc2a3bfb66c6d3175 (relands
f4d682d6ce6c5b3a41a0acf297507c82f5c21eef with fix for the compile-time
regression issue).
2021-04-12 14:50:54 -07:00
Nikita Popov
bc4863dd89 Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom"
This reverts commit f4d682d6ce6c5b3a41a0acf297507c82f5c21eef.

This caused a significant compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=4b7bad9eaea2233521a94f6b096aaa88dc584e23&to=f4d682d6ce6c5b3a41a0acf297507c82f5c21eef&stat=instructions

Possibly this is due to overeager parsing of target triples.
2021-04-12 22:55:59 +02:00
Arthur Eubanks
de5a1417da [Inliner] Propagate SROA analysis through invariant group intrinsics
SROA can handle invariant group intrinsics, let the inliner know that
for better heuristics when the intrinsics are present.

This fixes size issues in a couple files when turning on
-fstrict-vtable-pointers in Chrome.

Reviewed By: rnk, mtrofin

Differential Revision: https://reviews.llvm.org/D100249
2021-04-12 10:54:22 -07:00
Hamza Sood
34a3716b46 Replace uses of std::iterator with explicit using
This patch removes all uses of `std::iterator`, which was deprecated in C++17.
While this isn't currently an issue while compiling LLVM, it's useful for those using LLVM as a library.

For some reason there're a few places that were seemingly able to use `std` functions unqualified, which no longer works after this patch. I've updated those places, but I'm not really sure why it worked in the first place.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D67586
2021-04-12 10:47:14 -07:00
Yuanfang Chen
9c6645c02c [InstCombine] when calling conventions are compatible, don't convert the call to undef idiom
D24453 enabled libcalls simplication for ARM PCS. This may cause
caller/callee calling conventions mismatch in some situations such as
LTO. This patch makes instcombine aware that the compatible calling
conventions differences are benign (not emitting undef idom).

Differential Revision: https://reviews.llvm.org/D99773
2021-04-12 09:32:23 -07:00
Roman Lebedev
645cab47a5 [NFCI][DomTreeUpdater] applyUpdates(): reserve space for updates first
While, indeed, we may end up pushing less updates that we'd reserve space
for, self-dominating updates aren't often enough for that to matter.
But this should matter for normal updates.
2021-04-11 23:56:22 +03:00
Roman Lebedev
fcd012c5b0 [CVP] @llvm.[us]{min,max}() intrinsics handling
If we can tell that either one of the arguments is taken,
bypass the intrinsic.

Notably, we are indeed fine with non-strict predicate:
* UL: https://alive2.llvm.org/ce/z/69qVW9 https://alive2.llvm.org/ce/z/kNFTKf
      https://alive2.llvm.org/ce/z/AvaPw2 https://alive2.llvm.org/ce/z/oxo53i
* UG: https://alive2.llvm.org/ce/z/wxHeGH https://alive2.llvm.org/ce/z/Lf76qx
* SL: https://alive2.llvm.org/ce/z/hkeTGS https://alive2.llvm.org/ce/z/eR_b-W
* SG: https://alive2.llvm.org/ce/z/wEqRm7 https://alive2.llvm.org/ce/z/FpAsVr

Much like with all other comparison handling in CVP,
while we could sort-of handle two Value's,
at least for plain ICmpInst it does not appear to be worthwhile.

This only fires 78 times on test-suite + dt + rs,
but we don't canonicalize to these yet. (only SCEV produces them)
2021-04-11 00:33:47 +03:00
Nikita Popov
28b15148fb [IVUsers] Check LoopSimplify cache earlier (NFC)
Check the cache before calling isLoopSimplifyForm(). Otherwise we'd
always perform the check for the innermost loop and only skip it
for dominating loops.
2021-04-10 22:58:13 +02:00
Roman Lebedev
d9b2a4b5e7 [NFC][ConstantRange] Add 'icmp' helper method
"Does the predicate hold between two ranges?"

Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
2021-04-10 19:38:55 +03:00
Roman Lebedev
4fdceaffc0 Revert "[NFC][ConstantRange] Add 'icmp' helper method"
This reverts commit 17cf2c94230bc107e7294ef84fad3b47f4cd1b73.
2021-04-10 19:37:53 +03:00
Roman Lebedev
c6f9ab66c8 [NFC][ConstantRange] Add 'icmp' helper method
"Does the predicate hold between two ranges?"

Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
2021-04-10 19:09:52 +03:00
dfukalov
5a24e56a24 [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".

Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96805
2021-04-10 09:20:24 +03:00
Roman Lebedev
f51bde334f [Analysis] isDereferenceableAndAlignedPointer(): recurse into select's hands
By doing this within the method itself,
we support traversing multiple levels of selects (TODO: PHI's),
fixing the SROA `std::clamp()` testcase.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47271
Mostly fixes https://bugs.llvm.org/show_bug.cgi?id=49909
2021-04-10 00:56:28 +03:00
Alina Sbirlea
6b9a9ec24d [MSSA] Rename uses in IDF regardless of new def position in basic block.
When inserting a new def and renaming of uses is asked, always compute
IDF and do the renaming for the blocks with Phis in that IDF.
Resolves PR49859.

Differential Revision: https://reviews.llvm.org/D100163
2021-04-09 12:32:37 -07:00
Momchil Velikov
c20a0e76af For non-null pointer checks, do not descend through out-of-bounds GEPs
In LazyValueInfoImpl::isNonNullAtEndOfBlock we populate a set of
pointers, known to be non-null at the end of a block (e.g. because we
did a load through them). We then infer that any pointer, based on an
element of this set is non-null as well ("based" here meaning a
non-null pointer is the underlying object). This is incorrect, even if
the base pointer was non-null, the value of a GEP, that lacks the
inbounds` attribute, may be null.

This issue appeared as miscompilation of the following test case:

int puts(const char *);

typedef struct iter {
  int *val;
} iter_t;

static long distance(iter_t first, iter_t last) {
  long r = 0;
  for (; first.val != last.val; first.val++)
    ++r;
  return r;
}

int main() {
  int arr[2] = {0};
  iter_t i, j;
  i.val = arr;
  j.val = arr + 1;
  if (distance(i, j) >= 2)
    puts("failed");
  else
    puts("passed");
}

This fixes PR49662.

Differential Revision: https://reviews.llvm.org/D99642
2021-04-09 14:09:23 +01:00
dfukalov
1f5e832e39 [AA][NFC] Convert AliasResult to class containing offset for PartialAlias case.
Add an ability to store `Offset` between partially aliased location. Use this
storage within returned `ResultAlias` instead of caching it in `AAQueryInfo`.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98718
2021-04-09 13:26:09 +03:00
dfukalov
cb0d7fd331 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Arthur Eubanks
1debc51397 [GVN] Properly invalidate ICF cache when we simplify a value
This fixes a "Cached first special instruction is wrong!" assert.

The assert fires because replacing a value with another can cause an
instruction to no longer be "special" to ICF. In this case,
devirtualization happened, turning an indirect call to a
call to a willreturn function which is no longer special.

Reviewed By: nikic, rnk

Differential Revision: https://reviews.llvm.org/D99977
2021-04-08 14:01:57 -07:00
Stanislav Mekhanoshin
3071fb3566 Set IgnoreLLVMUsed to false in CallGraph::addToCallGraph()
clang++ uses llvm.compiler.used in certain cases to preserve
symbol which is fully inlined. D96087 has resulted in undefined
symbols in such cases. Set it to false by default to preserve
old behavior but keep the option for specific uses where we
want to ignore these (e.g. to detect a potential indirect call
to a function).

Differential Revision: https://reviews.llvm.org/D99897
2021-04-08 11:14:09 -07:00
Max Kazantsev
c4958f4b06 [SCEV] Fix false-positive recognition of simple recurrences. PR49856
A value from reachable block may come to a Phi node as its input from
unreachable block. This may confuse matchSimpleRecurrence  which
has no access to DomTree and can falsely recognize something as a recurrency
because of this effect, as the attached test shows.

Patch `ae7b1e` deals with half of this problem, but it only accounts from
the case when an unreachable instruction comes to Phi as an input.

This patch provides a generalization by checking that no Phi block's
predecessor is unreachable (no matter what the input is).

Differential Revision: https://reviews.llvm.org/D99929
Reviewed By: reames
2021-04-07 13:55:17 +07:00
Philip Reames
09ede8de1f Use AssumeInst in a few more places [nfc]
Follow up to a6d2a8d6f5.  These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
2021-04-06 13:18:53 -07:00
Philip Reames
631f83b6d7 Plumb AssumeInst through operand bundle apis [nfc]
Follow up to a6d2a8d6f5.  This covers all the public interfaces of the bundle related code.  I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.
2021-04-06 12:53:53 -07:00
Philip Reames
411b9be9f5 Add a subclass of IntrinsicInst for llvm.assume [nfc]
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class.  A follow up change will do the same for the newer assumption query/bundle mechanisms.
2021-04-06 11:16:22 -07:00
Florian Hahn
5c0788f272 [SimplifyInst] Use correct type for GEPs with vector indices.
The current code does not properly handle vector indices unless they are
the first index.

At the moment LangRef gives the impression that the vector index must be
the one and only index (https://llvm.org/docs/LangRef.html#getelementptr-instruction).

But vector indices can appear at any position and according to the
verifier there may be multiple vector indices. If that's the case, the
number of elements must match.

This patch updates SimplifyGEPInst to properly handle those additional
cases.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99961
2021-04-06 17:56:10 +01:00
Philip Reames
5c891f93c3 Move GCRelocateInst and GCResultInst to IntrinsicInst.h [nfc]
These two are part of the IntrinsicInst class hierarchy and it helps to cut down on some redundant includes.
2021-04-06 08:33:15 -07:00
Kerry McLaughlin
fdd1bfb79d [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization
Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
be an input to the horizontal reduction, e.g:

  %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
  %load = load <8 x float>
  %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)

This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D98435
2021-04-06 14:45:34 +01:00
Kerry McLaughlin
92b30fa263 [LoopVectorize] Change the identity element for FAdd
Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd.

Reviewed By: dmgreen, spatel

Differential Revision: https://reviews.llvm.org/D98963
2021-04-06 12:13:43 +01:00
Simon Pilgrim
8cd06abbb1 [KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI.
As promised in D98866
2021-04-06 10:11:41 +01:00
Philip Reames
a0ffcb69ee Comment adjustments for a rename 2021-04-05 21:07:42 -07:00
Philip Reames
103039b1aa Exact ashr/lshr don't loose any set bits and are thus trivially invertible
Use that fact to improve isKnownNonEqual.
2021-04-05 19:22:36 -07:00
Philip Reames
7f5ff4ee52 Address minor post commit feedback on 0e59dd 2021-04-05 18:22:17 -07:00
Sanjay Patel
032ed4beee [InstSimplify] fix potential miscompile in select value equivalence
This is the sibling fix to c590a9880d7a -
as there, we can't subsitute a vector value the equality
compare replacement that we are trying requires that the
comparison is true for the entire value. Vector select
can be partly true/false.
2021-04-05 16:52:34 -04:00
Philip Reames
25318785e3 Extract a helper for figuring out if an operator is invertible [nfc]
For use in an uncoming patch.  Left out the phi case (which could otherwise fit in this framework) as it would cause infinite recursion in said patch.  We can probably also leverage this in instcombine to ensure we keep the two sets of related analysis and transforms in sync.
2021-04-05 12:14:21 -07:00
Nikita Popov
ceee5076bf [LVI] Don't bail on overdefined value in select
Even if one of the operands is overdefined, we may still produce
a non-overdefined result, e.g. due to a min/max operation. This
matches our handling elsewhere, e.g. for binary operators.

The slot poisoning comment refers to a much older LVI cache
implementation.
2021-04-04 11:11:01 +02:00
Mircea Trofin
f5b32d9a38 [mlgo] fix build rules
This was prompted by D95727, which had the side-effect to break the
'release' mode build bot for ML-driven policies. The problem is that now
the pre-compiled object files don't get transitively carried through as
'source' anymore; that being said, the previous way of consuming them
was problematic, because it was only working for static builds; in
dynamic builds, the whole tf_xla_runtime was linked, which is
undesirable.

The alternative is to treat tf_xla_runtime as an archive, which then
leads to the desired effect.

Differential Revision: https://reviews.llvm.org/D99829
2021-04-03 12:49:03 -07:00
Nikita Popov
725bd76263 [Loads] Forward constant vector store to load of first element
InstCombine performs simple forwarding from stores to loads, but
currently only handles the case where the load and store have the
same size. This extends it to also handle a store of a constant
with a larger size followed by a load with a smaller size.

This is implemented through ConstantFoldLoadThroughBitcast() which
is fairly primitive (e.g. does not allow storing a large integer
and then loading a small one), but at least can forward the first
element of a vector store. Unfortunately it seems that we currently
don't have a generic helper for "read a constant value as a different
type", it's all tangled up with other logic in either
ConstantFolding or VNCoercion.

Differential Revision: https://reviews.llvm.org/D98114
2021-04-03 12:10:31 +02:00
Nikita Popov
108a3b9085 [BasicAA] Don't store AATags in cache key (NFC)
The AAMDNodes part of the MemoryLocation is not used by the BasicAA
cache, so don't store it. This reduces the size of each cache entry
from 112 bytes to 48 bytes.
2021-04-03 11:32:01 +02:00
Nikita Popov
4c92cfe28b [BasicAA] Don't pass through AA metadata (NFCI)
BasicAA itself doesn't make use of AA metadata, but passes it
through to recursive queries and makes it part of the cache key.
Aliasing decisions that are based on AA metadata (i.e. TBAA and
ScopedAA) are based *only* on AA metadata, so checking them with
different pointer values or sizes is not useful, the result will
always be the same.

While this change is a mild compile-time improvement by itself,
the actual goal here is to reduce the size of AA cache keys in
a followup change.

Differential Revision: https://reviews.llvm.org/D90098
2021-04-03 11:21:50 +02:00
Simon Pilgrim
cad94dd97b [KnownBits] Add KnownBits::haveNoCommonBitsSet helper. NFCI.
Include exhaustive test coverage.
2021-04-02 21:44:33 +01:00
Nikita Popov
08587aa237 [LVI] Use range metadata on intrinsics
If we don't know how to handle an intrinsic, we should still
make use of normal call range metadata.
2021-04-02 16:45:31 +02:00
Sander de Smalen
1ba98aa252 Always emit error for wrong interfaces to scalable vectors, unless cmdline flag is passed.
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.

This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.

The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.

This allows us to do away with the following pattern in many of the SVE tests:
  RUN: .... 2>%t
  RUN: cat %t | FileCheck --check-prefix=WARN
  WARN-NOT: warning: ...

The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.

This patch also has fixes the following tests:
 unittests:
   ScalableVectorMVTsTest.SizeQueries
   SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
   AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG

 regression tests:
   Transforms/InstCombine/vscale_gep.ll

Reviewed By: paulwalker-arm, ctetreau

Differential Revision: https://reviews.llvm.org/D98856
2021-04-02 10:55:22 +01:00
Philip Reames
5615539472 Infer dereferenceability from malloc and friends
Hookup TLI when inferring object size from allocation calls. This allows the analysis to prove dereferenceability for known allocation functions (such as malloc/new/etc) in addition to those marked explicitly with the allocsize attribute.

This is a follow up to 0129cd5 now that the bug fixed by e2c6621e6 is resolved.

As noted in the test, this relies on being able to prove that there is no free between allocation and context (e.g. hoist location). At the moment, this is handled conservatively. I'm working strengthening out ability to reason about no-free regions separately.

Differential Revision: https://reviews.llvm.org/D99737
2021-04-01 11:33:35 -07:00
Philip Reames
a931c3545b Extract isVolatile helper on Instruction [NFCI]
We have this logic duplicated in several cases, none of which were exhaustive.  Consolidate it in one place.

I don't believe this actually impacts behavior of the callers.  I think they all filter their inputs such that their partial implementations were correct.  If not, this might be fixing a cornercase bug.
2021-04-01 11:24:02 -07:00
Philip Reames
0cedfde64d [deref-at-point] restrict inference of dereferenceability based on allocsize attribute
Support deriving dereferenceability facts from allocation sites with known object sizes while correctly accounting for any possibly frees between allocation and use site. (At the moment, we're conservative and only allowing it in functions where we know we can't free.)

This is part of the work on deref-at-point semantics. I'm making the change unconditional as the miscompile in this case is way too easy to trip by accident, and the optimization was only recently added (by me).

There will be a follow up patch wiring through TLI since that should now be doable without introducing widespread miscompiles.

Differential Revision: https://reviews.llvm.org/D95815
2021-04-01 08:34:40 -07:00
Philip Reames
5a0375405f [ValueTracking] Handle non-zero ashr/lshr recurrences
If we know we don't shift out bits (e.g. exact), all we need to know is that input is non-zero.
2021-03-31 16:48:32 -07:00
George Mitenkov
b6034064b6 [ConstantFolding] Fixing addo/subo with undef
When folding addo/subo with undef, the current
convention is to use { -1, false } for addo and
{ 0, false } for subo. This was fixed for InstSimplify in
https://reviews.llvm.org/rGf094d65beaa492e845b03561eddd75b5be653a01,
but not in ConstantFolding.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D99564
2021-03-31 21:47:29 +03:00
Juneyoung Lee
3e94a07500 [ValueTracking] Add with.overflow intrinsics to poison analysis functions
This is a patch teaching ValueTracking that `s/u*.with.overflow` intrinsics do not
create undef/poison and they propagate poison.
I couldn't write a nice example like the one with ctpop; ValueTrackingTest.cpp were simply updated
to check these instead.
This patch helps reducing regression while fixing https://llvm.org/pr49688 .

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99671
2021-04-01 02:41:38 +09:00
Philip Reames
4da7b4cf4f [SCEV] Handle unreachable binop when matching shift recurrence
This fixes an issue introduced with my change d4648e, and reported in pr49768.

The root problem is that dominance collapses in unreachable code, and that LoopInfo explicitly only models reachable code.  Since the recurrence matcher doesn't filter by reachability (and can't easily because not all consumers have domtree), we need to bailout before assuming that finding a recurrence implies we found a loop.
2021-03-31 10:33:34 -07:00
Sander de Smalen
b60a1a8c9a NFC: Change getIntrinsicInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97468

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D97469
2021-03-31 14:04:41 +01:00
Liqiang Tao
7888599625 [InlineCost] Remove TODO comment that consider other forms of savings in the cost-benefit analysis
Attempts to compute savings more accurately cannot impact the set of critically important call sites.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D98577
2021-03-31 20:11:32 +08:00
Sander de Smalen
6ecee9c6fa NFC: Change getUserCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97382

Reviewed By: ctetreau, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97466
2021-03-31 10:13:09 +01:00
Krasimir Georgiev
72bca5f483 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5178ffc7cf92527557ae16e86d0fa90d538c2a19.

Compiling `llvm-profdata` with a compiler build from this produces a
crashing binary.
2021-03-30 14:13:37 +02:00
Sander de Smalen
76e5989934 [InstructionCost] Don't conflate Invalid costs with Unknown costs.
We previously made a change to getUserCost to return a Invalid cost
when one of the TTI costs returned '-1' (meaning 'unknown' or
'infinitely expensive'). It makes no sense to say that:

  shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

has an invalid cost. Perhaps the cost is not known, but the IR is valid
and can be code-generated. Invalid should only be used for IR that
cannot possibly be code-generated and where a cost is nonsensical.

With more passes now asserting that the cost must be valid, it is possible
that those assertions will fail for perfectly valid IR. An incomplete
cost-model probably shouldn't be a reason for the compiler to break.

It's better to consider these costs as 'very expensive' and ignore them
for other reasons. At some point, we should consider replacing -1 with
some other mechanism.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D99502
2021-03-30 09:29:42 +01:00
Gulfem Savrun Yeniceri
abf79b4a39 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-29 21:53:32 +00:00
Nikita Popov
ab6e561d30 [BasicAA] Make sure types match in constant offset heuristic
This can only happen if offset types that are larger than the
pointer size are involved. The previous implementation did not
assert in this case because it initialized the APInts to the
width of one of the variables -- though I strongly suspect it
did not compute correct results in this case.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32621
reported by fhahn.
2021-03-28 21:38:09 +02:00
Nikita Popov
bdb84921a1 [BasicAA] Handle gep with unknown sizes earlier (NFCI)
If the sizes of both memory locations are unknown, we can only
perform a check on the underlying objects. There's no point in
going through GEP decomposition in this case.
2021-03-28 15:48:49 +02:00
Nikita Popov
f2b0645b2f [BasicAA] Refactor linear expression decomposition
The current linear expression decomposition handles zext/sext by
decomposing the casted operand, and then checking NUW/NSW flags
to determine whether the extension can be distributed. This has
some disadvantages:

First, it is not possible to perform a partial decomposition. If
we have zext((x + C1) +<nuw> C2) then we will fail to decompose
the expression entirely, even though it would be safe and
profitable to decompose it to zext(x + C1) +<nuw> zext(C2)

Second, we may end up performing unnecessary decompositions,
which will later be discarded because they lack nowrap flags
necessary for extensions.

Third, correctness of the code is not entirely obvious: At a high
level, we encounter zext(x -<nuw> C) in the form of a zext on the
linear expression x + (-C) with nuw flag set. Notably, this case
must be treated as zext(x) + -zext(C) rather than zext(x) + zext(-C).
The code handles this correctly by speculatively zexting constants
to the final bitwidth, and performing additional fixup if the
actual extension turns out to be an sext. This was not immediately
obvious to me.

This patch inverts the approach: An ExtendedValue represents a
zext(sext(V)), and linear expression decomposition will try to
decompose V further, either by absorbing another sext/zext into the
ExtendedValue, or by distributing zext(sext(x op C)) over a binary
operator with appropriate nsw/nuw flags. At each step we can
determine whether distribution is legal and abort with a partial
decomposition if not. We also know which extensions we need to
apply to constants, and don't need to speculate or fixup.
2021-03-27 23:31:58 +01:00
Nikita Popov
4cf39be5a9 [BasicAA] Correct handle implicit sext in decomposition
While explicit sext instructions were handled correctly, the
implicit sext that occurs if the offset is smaller than the
pointer size blindly assumed that sext(X * Scale + Offset) is the
same as sext(X) * Scale + Offset, which is obviously not correct.

Fix this by extracting the code that handles linear expression
extension and reusing it for the implicit sext as well.
2021-03-27 15:15:47 +01:00
Nikita Popov
13fbfb64c9 [BasicAA] Clarify entry values of GetLinearExpression() (NFC)
A number of variables need to be correctly initialized on entry
to GetLinearExpression() for the implementation to behave reasonably.

The fact that SExtBits can currenlty be non-zero on entry is a bug,
as demonstrated by the added test: For implicit sexts by the GEP,
we do currently skip legality checks.
2021-03-27 14:50:09 +01:00
Nikita Popov
50e408c21a [BasicAA] Bail out earlier for invalid shift amount
Currently, we'd produce an incorrect decomposition, because we
already recursively called GetLinearExpression(), so the Scale=1,
Offset=0 will not necessarily be relative to the shl itself.

Now, this doesn't actually matter for functional correctness,
because such a shift is poison anyway, so its okay to return
an incorrect decomposition. It's still unnecessarily confusing
though, and we can easily avoid this by checking the bitwidth
earlier.
2021-03-27 12:41:16 +01:00
Nikita Popov
f99cc32fd4 [BasicAA] Retain shl nowrap flags in GetLinearExpression()
Nowrap flags between mul and shl differ in that mul nsw allows
multiplication of 1 * INT_MIN, while shl nsw does not. This means
that it is always fine to transfer shl nowrap flags to muls, but
not necessarily the other way around. In this case the NUW/NSW
results refer to mul/add operations, so it's fine to retain the
flags from the shl.
2021-03-27 12:26:22 +01:00
Nikita Popov
86d338c7f7 [ValueTracking] Handle shl pair in isKnownNonEqual()
Handle (x << s) != (y << s) where x != y and the shifts are
non-wrapping. Once again, this establishes parity with the
corresponing mul fold that already exists. The shift case is
more powerful because we don't need to guard against multiplies
by zero.
2021-03-26 20:21:05 +01:00
Nikita Popov
81d01bceaa [ValueTracking] Handle shl in isKnownNonEqual()
This handles the pattern X != X << C for non-zero X and C and a
non-overflowing shift. This establishes parity with the corresponing
fold for multiplies.
2021-03-26 20:21:05 +01:00
Nikita Popov
56b9e804f2 [ValueTracking] Handle non-zero shl recurrence
In this case we don't care about the step at all, and only require
that the starting value is non-zero.
2021-03-26 18:39:06 +01:00
Nikita Popov
0926f8f2a9 [ValueTracking] Handle non-zero add/mul recurrences more precisely
This is mainly for clarity: It doesn't make sense to do any
negative/positive checks when dealing with a nuw add/mul. These
only make sense to nsw add/mul.
2021-03-26 18:30:07 +01:00
Kazu Hirata
5ac8ba2c65 Reapply [InlineCost] Enable the cost benefit analysis on FDO
This patch enables the cost-benefit-analysis-based inliner by default
if we have instrumentation profile.

- SPEC CPU 2017 shows a 0.4% improvement.

- An internal large benchmark shows a 0.9% reduction in the cycle
  count along with 14.6% reduction in the number of call instructions
  executed.

Differential Revision: https://reviews.llvm.org/D98213
2021-03-25 21:51:38 -07:00
Kazu Hirata
3899383bc1 [InlineCost] Reject a zero entry count
This patch teaches the cost-benefit-analysis-based inliner to reject a
zero entry count so that we don't trigger a divide-by-zero.
2021-03-25 21:51:36 -07:00
Richard Smith
7389d10425 Fix a miscompile introduced by 99203f2.
getPointersDiff would previously round down the difference between two
pointers to a multiple of the element size of the pointee, which could
result in a pointer value being decreased a little.

Alexey Bataev has graciously agreed to add a testcase for this;
submitting the bugfix now to unblock.
2021-03-25 16:53:58 -07:00
Jingu Kang
a1f6b21702 [ValueTracking] Handle two PHIs in isKnownNonEqual()
loop:
  %cmp.0 = phi i32 [ 3, %entry ], [ %inc, %loop ]
  %pos.0 = phi i32 [ 1, %entry ], [ %cmp.0, %loop ]
  ...
  %inc = add i32 %cmp.0, 1
  br label %loop

On above example, %pos.0 uses previous iteration's %cmp.0 with backedge
according to PHI's instruction's defintion. If the %inc is not same among
iterations, we can say the two PHIs are not same.

Differential Revision: https://reviews.llvm.org/D98422
2021-03-25 22:56:05 +00:00
Nico Weber
124a82a5ee Revert "[InlineCost] Enable the cost benefit analysis on FDO"
This reverts commit ef69aa961d12dee2141a79b05c9637d8cc9c0c74.
Makes clang assert in PGO builds, see repro tgz in
https://bugs.chromium.org/p/chromium/issues/detail?id=1192783#c6
2021-03-25 16:42:19 -04:00
Nikita Popov
2369372c85 [IR] Lift attribute handling for assume bundles into CallBase
Rather than special-casing assume in BasicAA getModRefBehavior(),
do this one level higher, in the attribute handling of CallBase.

For assumes with operand bundles, the inaccessiblememonly attribute
applies regardless of operand bundles.
2021-03-25 21:15:39 +01:00
Philip Reames
eafb369a5a Plumb TLI through isSafeToExecuteUnconditionally [NFC]
Split from D95815 to reduce patch size.  Isn't (yet) used for anything, only the client side is wired up.
2021-03-24 17:52:04 -07:00
Wenlei He
4391957539 [InlineCost] Make cost-benefit decision explicit
With cost-benefit analysis for inlining, we bypass the cost-threshold by returning inline result from call analyzer early.

However the cost and threshold are still available from call analyzer, and when cost is actually higher than threshold, we incorrect set the reason.

The change makes the decision from cost-benefit analysis explicit. It's mostly NFC, except that it allows the priority-based sample loader inliner used by CSSPGO to use cost-benefit heuristic.

Differential Revision: https://reviews.llvm.org/D99302
2021-03-24 16:10:58 -07:00
Kazu Hirata
4cf653e83d [InlineCost] Enable the cost benefit analysis on FDO
This patch enables the cost-benefit-analysis-based inliner by default
if we have instrumentation profile.

- SPEC CPU 2017 shows a 0.4% improvement.

- An internal large benchmark shows a 0.9% reduction in the cycle
  count along with 14.6% reduction in the number of call instructions
  executed.

Differential Revision: https://reviews.llvm.org/D98213
2021-03-24 15:36:49 -07:00
Sanjay Patel
33a0e046e2 [ValueTracking] peek through min/max to find isKnownToBeAPowerOfTwo
This is similar to the select logic just ahead of the new code.
Min/max choose exactly one value from the inputs, so if both of
those are a power-of-2, then the result must be a power-of-2.

This might help with D98152, but we likely still need other
pieces of the puzzle to avoid regressions.

The change in PatternMatch.h is needed to build with clang.
It's possible there is a better way to deal with the 'const'
incompatibities.

Differential Revision: https://reviews.llvm.org/D99276
2021-03-24 17:54:38 -04:00
Nikita Popov
a718ad096c [SCEV] Improve handling of not expressions in isImpliedCond()
SCEV currently tries to prove implications of x pred y by also
trying to imply ~y pred ~x. This is expensive in terms of
compile-time (in fact, the majority of isImpliedCond compile-time
is spent here) and generally not fruitful. The issue is that this
also swaps the operands and thus breaks canonical ordering. If
originally we were trying to prove an implication like
X > C1 -> Y > C2, then we'll now try to prove X > C1 -> C3 > ~Y,
which will not work.

The only real case where we can get some use out of this transform
is if the original conditions were in the form X > C1 -> Y < C2, were
then swapped to X > C1 -> C2 > Y and are then swapped again here to
X > C1 -> ~Y > C3.

As such, handle this at a higher level, where we are doing the
swapping in the first place. There's four different ways that we
can line up a predicate and a swapped predicate, so we use some
heuristics to pick some profitable way.

Because we now try this transform at a higher level
(isImpliedCondOperands rather than isImpliedCondOperandsHelper),
we can also prove additional facts. Of the added tests, one was
proven previously while the other wasn't.

Differential Revision: https://reviews.llvm.org/D90926
2021-03-24 21:53:02 +01:00
Gulfem Savrun Yeniceri
54e2d4cdab Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5fd001a5ffbad403053c4a06bf4b2b76dc52bba8
because it broke clang-with-thin-lto-ubuntu bot.
2021-03-24 18:59:33 +00:00
Gulfem Savrun Yeniceri
93b265f8c0 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-24 17:31:18 +00:00
Thomas Preud'homme
365b05fe29 Make FindAvailableLoadedValue TBAA aware
FindAvailableLoadedValue() relies on FindAvailablePtrLoadStore() to run
the alias analysis when searching for an equivalent value. However,
FindAvailablePtrLoadStore() calls the alias analysis framework with a
memory location for the load constructed from an address and a size,
which thus lacks TBAA metadata info. This commit modifies
FindAvailablePtrLoadStore() to accept an optional memory location as
parameter to allow FindAvailableLoadedValue() to create it based on the
load instruction, which would then have TBAA metadata info attached.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99206
2021-03-24 17:20:26 +00:00
Sander de Smalen
1c27e91e31 [TTI] Return a TypeSize from getRegisterBitWidth.
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98874
2021-03-24 14:45:13 +00:00
Alexey Bataev
1b88ea3100 [LoopAnalysis][NFC]Remove redundant code.
Removed redundant code for IsConsecutive variable.
2021-03-24 05:37:19 -07:00
Yang Fan
404db5968d [InstSimplify] Fix unused variable warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp: In function ‘llvm::Value* SimplifyWithOpReplaced(llvm::Value*, llvm::Value*, llvm::Value*, const llvm::SimplifyQuery&, bool, unsigned int)’:
/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp:3993:15: warning: unused variable ‘SI’ [-Wunused-variable]
 3993 |     if (auto *SI = dyn_cast<SelectInst>(I))
      |               ^~
```
2021-03-24 09:56:36 +08:00
Jingu Kang
af9f082c95 [ValueTracking] Handle increasing mul recurrence in isKnownNonZero()
Differential Revision: https://reviews.llvm.org/D99069
2021-03-23 23:04:41 +00:00
Matteo Favaro
a4c86712c5 [MSSA] Extending IsGuaranteedLoopInvariant to support an instruction defined in the entry block
As mentioned in [[ https://reviews.llvm.org/D96979 | D96979 ]], I'm extending the **IsGuaranteedLoopInvariant** check also to the `MemorySSA.cpp` file.

@fhahn For now I didn't unify the function into `MemorySSA.h` because, as you mentioned, it's not directly MSSA related. I'm open to suggestions to find a better place so we can improve the unification process.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97155
2021-03-23 21:50:56 +00:00
Alexey Bataev
147e88fa15 [Analysis]Add getPointersDiff function to improve compile time.
Added getPointersDiff function to LoopAccessAnalysis and used it instead
direct calculatoin of the distance between pointers and/or
isConsecutiveAccess function in SLP vectorizer to improve compile time
and detection of stores consecutive chains.

Part of D57059

Differential Revision: https://reviews.llvm.org/D98967
2021-03-23 14:25:36 -07:00
Nikita Popov
57bd8da517 [BasicAA] Handle assumes with operand bundles
This fixes a regression reported on D99022: If a call has operand
bundles, then the inaccessiblememonly attribute on the function
will be ignored, as operand bundles can affect modref behavior in
the general case. However, for assume operand bundles in particular
this is not the case.

Adjust getModRefBehavior() to always report inaccessiblememonly
for assumes, regardless of presence of operand bundles.
2021-03-23 21:21:19 +01:00
Alexey Bataev
2cb8308176 Revert "[Analysis]Add getPointersDiff function to improve compile time."
This reverts commit 065a14a12d2694f26f4e894641f5ab8cfc5da8bd to
investigate and fix crash in SLP vectorizer.
2021-03-23 13:17:54 -07:00
Alexey Bataev
9601d11568 [Analysis]Add getPointersDiff function to improve compile time.
Added getPointersDiff function to LoopAccessAnalysis and used it instead
direct calculatoin of the distance between pointers and/or
isConsecutiveAccess function in SLP vectorizer to improve compile time
and detection of stores consecutive chains.

Part of D57059

Differential Revision: https://reviews.llvm.org/D98967
2021-03-23 12:58:42 -07:00
Craig Topper
b271af80f3 [ValueTracking] Teach canCreateUndefOrPoison that ctpop does not create undef or poison.
This select of ctpop with 0 pattern can get left behind after
loop idiom recognize converts a loop to ctpop. LLVM 10 was able
to optimize this, but LLVM 11 and later is not. The difference
seems to be that some select transforms are now limited based
on canCreateUndefOrPoison.

Teaching canCreateUndefOrPoison about ctpop restores the
LLVM 10 codegen.

Differential Revision: https://reviews.llvm.org/D99207
2021-03-23 12:42:18 -07:00
Juneyoung Lee
ba5c93c7e6 Reland "[InstCombine] Add simplification of two logical and/ors"
This relands 07c3b97e184d5bd828b8a680cdce46e73f3db9fc (D96945) which was reverted by
commit f49354838e526671e616d16199ebdee653b9f6fa.
The two-stage compilation successfully tests passes on my machine.
2021-03-23 16:24:50 +09:00
Philip Reames
08288bc355 Minor format tweak to deref analysis printer 2021-03-22 18:44:18 -07:00
Gulfem Savrun Yeniceri
61bfb34ac2 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 78a65cd945d006ff02f9d24d9cc20a302ed93b08 which
caused buildbot failures.
2021-03-23 00:43:16 +00:00
Gulfem Savrun Yeniceri
59cc51764b [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-22 22:09:02 +00:00
Nikita Popov
c053a6554f [InstCombine] Whitelist non-refining folds in SimplifyWithOpReplaced
This is an alternative to D98391/D98585, playing things more
conservatively. If AllowRefinement == false, then we don't use
InstSimplify methods at all, and instead explicitly implement a
small number of non-refining folds. Most cases are handled by
constant folding, and I only had to add three folds to cover
our unit tests / test-suite. While this may lose some optimization
power, I think it is safer to approach from this direction, given
how many issues this code has already caused.

Differential Revision: https://reviews.llvm.org/D99027
2021-03-22 22:12:56 +01:00
Nikita Popov
99aeb8e3a5 [IR] Mark assume/annotation as InaccessibleMemOnly
These intrinsics don't need to be marked as arbitrary writing,
it's sufficient to write inaccessible memory (aka "side effect")
to preserve control dependencies. This means less special-casing
in BasicAA. This is intended as an alternative to D98925.

Differential Revision: https://reviews.llvm.org/D99022
2021-03-22 22:01:03 +01:00
Juneyoung Lee
b8eb63ecee [SCEV] Use logical and/or matcher
This is a minor patch that updates ScalarEvolution::isImpliedCond to use logical and/or matcher.
2021-03-23 06:00:54 +09:00
Sanjay Patel
a1536cccaf [TargetTransformInfo] move branch probability query from TargetLoweringInfo
This is no-functional-change intended (NFC), but needed to allow
optimizer passes to use the API. See D98898 for a proposed usage
by SimplifyCFG.

I'm simplifying the code by removing the cl::opt. That was added
back with the original commit in D19488, but I don't see any
evidence in regression tests that it was used. Target-specific
overrides can use the usual patterns to adjust as necessary.
We could also restore that cl::opt, but it was not clear to me
exactly how to do it in the convoluted TTI class structure.
2021-03-22 15:55:34 -04:00
Philip Reames
56d303cbd3 2nd attempt at a speculative fix for windows builders after d4648eea 2021-03-22 10:32:57 -07:00
Philip Reames
8dd6b19a47 Speculative fix for windows builders after d4648eea 2021-03-22 10:22:01 -07:00
Philip Reames
cf5ef4e304 [SCEV] Use trip count information to improve shift recurrence ranges
This patch exploits the knowledge that we may be running many fewer than bitwidth iterations of the loop, and may be able to disallow the overflow case. This patch specifically implements only the shl case, but this can be generalized to ashr and lshr without difficulty.

Differential Revision: https://reviews.llvm.org/D98222
2021-03-22 09:38:43 -07:00
Wenlei He
88fb6d7a0e [CSSPGO][llvm-profgen] Use profile summary based threshold for context trimming and merging
Switch to use cold threshold from profile summary for cold context merging and trimming, instead of relying on hard coded values. Minor refactoring included for switch names, etc.

Differential Revision: https://reviews.llvm.org/D98921
2021-03-22 08:56:59 -07:00
Nikita Popov
60f065900e [ValueTracking] Improve mul handling in isKnownNonEqual()
X != X * C is true if:
 * C is not 0 or 1
 * X is not 0
 * mul is nsw or nuw

Proof: https://alive2.llvm.org/ce/z/uwF29z

This is motivated by one of the cases in D98422.
2021-03-21 18:41:35 +01:00
Nikita Popov
4766db4e38 Reapply [ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).

Adjust the type check to consider vectors as well. The previous
version of the patch dropped the type check entirely, but it
turns out that getAggregateElement() does require the constant
to be an aggregate in some edge cases: For Poison/Undef the
getNumElements() API is called, without checking in advance that
we're dealing with an aggregate. Possibly the implementation should
avoid doing that, but for now I'm adding an assert so the next
person doesn't fall into this trap.
2021-03-21 17:48:21 +01:00
Nikita Popov
befc0b53a8 [InstSimplify] Clean up SimplifyReplacedWithOp implementation (NFCI)
Replace Op with RepOp up-front, and then always work with the new
operands, rather than checking for replacement in various places.
2021-03-21 15:30:30 +01:00
Juneyoung Lee
4bb78e2cb0 [CFLGraph] Fix a crash due to missing handling of freeze
https://reviews.llvm.org/D85534#2636321
2021-03-21 02:14:13 +09:00
Philip Reames
eaf092af50 Update basic deref API to account for possiblity of free [NFC]
This patch is plumbing to support work towards the goal outlined in the recent llvm-dev post "[llvm-dev] RFC: Decomposing deref(N) into deref(N) + nofree".

The point of this change is purely to simplify iteration on other pieces on way to making the switch. Rebuilding with a change to Value.h is slow and painful, so I want to get the API change landed. Once that's done, I plan to more closely audit each caller, add the inference rules in their own patch, then post a patch with the langref changes and test diffs. The value of the command line flag is that we can exercise the inference logic in standalone patches without needing the whole switch ready to go just yet.

Differential Revision: https://reviews.llvm.org/D98908
2021-03-19 11:17:19 -07:00
Philip Reames
1a629f23e4 [SCEV] Factor out a lambda for strict condition splitting [NFC] 2021-03-19 10:07:12 -07:00
Max Kazantsev
8d28637a8e [SCEV] Add false->any implication
By definition of Implication operator, `false -> true` and `false -> false`. It means that
`false` implies any predicate, no matter true or false. We don't need to go any further
trying to prove the statement we need and just always say that `false` implies it in this case.

In practice it means that we are trying to prove something guarded by `false` condition,
which means that this code is unreachable, and we can safely prove any fact or perform any
transform in this code.

Differential Revision: https://reviews.llvm.org/D98706
Reviewed By: lebedev.ri
2021-03-19 11:29:48 +07:00
Mircea Trofin
2ca653b1d6 Reapply "[NPM][CGSCC] FunctionAnalysisManagerCGSCCProxy: do not clear immutable function passes"
This reverts commit 11b70b9e3a7458b5b78c30020b56e8ca563a4801.

The bot failure was due to ArgumentPromotion deleting functions
without deleting their analyses. This was separately fixed in 4b1c807.
2021-03-18 09:44:34 -07:00
Max Kazantsev
fb109837da [SCEV][NFC] API for predicate evaluation
Provides API that allows to check predicate for being true or
false with one call. Current implementation is naive and just
calls isKnownPredicate twice, but further we can rework this
logic trying to use one check to prove both facts.
2021-03-18 19:21:29 +07:00
Philip Reames
2b6a185756 [LCSSA] Extract a utility for deciding if a new use requires a new lcssa phi [NFC]
(Triggered by a review comment on D98728, but otherwise unrelated.)
2021-03-17 12:14:01 -07:00
David Green
f5b24f17f1 [TTI] Add a Mask to getShuffleCost
This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask
can be provided a more accurate cost can be provided by the backend.
For example VREV costs could be returned by the ARM backend. This should
be an NFC until then, laying the groundwork for that to be added.

Differential Revision: https://reviews.llvm.org/D98206
2021-03-17 17:46:26 +00:00
Bardia Mahjour
487229f50e [CGSCC] Print CG node itself instead of its address
Fix the debug output from cgscc
2021-03-17 12:36:55 -04:00
Max Kazantsev
c38d6febb5 [BasicAA] Drop dependency on Loop Info. PR43276
BasicAA stores a reference to LoopInfo inside. This imposes an implicit
requirement of keeping it up to date whenever we modify the IR (in particular,
whenever we modify terminators of blocks that belong to loops). Failing
to do so leads to incorrect state of the LoopInfo.

Because general AA does not require loop info updates and provides to API to
update it properly, the users of AA reasonably assume that there is no need to
update the loop info. It may be a reason of bugs, as example in PR43276 shows.

This patch drops dependence of BasicAA on LoopInfo to avoid this problem.

This may potentially pessimize the result of queries to BasicAA.

Differential Revision: https://reviews.llvm.org/D98627
Reviewed By: nikic
2021-03-17 11:43:44 +07:00
Zequan Wu
380ae2df93 Revert "[ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()"
That commit caused chromium build to crash: https://bugs.chromium.org/p/chromium/issues/detail?id=1188885

This reverts commit edf7004851519464f86b0f641da4d6c9506decb1.
2021-03-16 14:36:21 -07:00
Simonas Kazlauskas
59b63b74d5 [InstSimplify] Restrict a GEP transform to avoid provenance changes
This is a follow-up to D98588, and fixes the inline `FIXME` about a GEP-related simplification not
preserving the provenance.

https://alive2.llvm.org/ce/z/qbQoAY

Additional tests were added in {rGf125f28afdb59eba29d2491dac0dfc0a7bf1b60b}

Depends on D98672

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98611
2021-03-16 18:53:05 +02:00
Thomas Preud'homme
efde4985b2 [MemDepAnalysis] Remove redundant comment.
Exact same comment is found 2 lines above.
2021-03-16 15:51:17 +00:00
Max Kazantsev
9f606251f1 [SCEV][NFC] Move check up the stack
One of (and primary) callers of isBasicBlockEntryGuardedByCond is
isKnownPredicateAt, which makes isKnownPredicate check before it.
It already makes non-recursive check inside. So, on this execution
path this check is made twice. The only other caller is
isLoopEntryGuardedByCond. Moving the check there should save some
compile time.
2021-03-16 22:09:17 +07:00
Simonas Kazlauskas
533dfa60d2 [InstSimplify] Match PtrToInt more directly in a GEP transform (NFC)
In preparation for D98611, the upcoming change will need to apply additional checks to `P` and `V`,
and so this refactor paves the way for adding additional checks in a less awkward way.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98672
2021-03-16 15:45:19 +02:00
Johannes Doerfert
60c629d360 [NVPTX] CUDA does provide malloc/free since compute capability 2.X
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D98606
2021-03-15 22:45:56 -05:00
Sanjay Patel
69f00a58ef [InstSimplify] ctlz({signbit} >>u x) --> x
The motivating pattern was handled in 0a2d69480d ,
but we should have this for symmetry.

But this really highlights that we could generalize for
any shifted constant if we match this in instcombine.

https://alive2.llvm.org/ce/z/MrmVNt
2021-03-15 12:03:35 -04:00
Nikita Popov
7180decf64 Revert "[NFCI][ValueTracking] getUnderlyingObject(): gracefully handle cycles"
This reverts commit aa440ba24dc25e4c95f6dcf8ff647024f3b12661.

This has a non-trivial compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=0c5b789c7342ee8384507c3242fc256e23248c4d&to=aa440ba24dc25e4c95f6dcf8ff647024f3b12661&stat=instructions

I don't believe this is the correct way to address the issue in
this case.
2021-03-15 13:12:39 +01:00
Roman Lebedev
39d655ee4f [NFCI][ValueTracking] getUnderlyingObject(): gracefully handle cycles
Normally, this function just doesn't bother about cycles,
and hopes that the caller supplied small-enough depth
so that at worst it will take a potentially large,
but limited amount of time. But that obviously doesn't work
if there is no depth limit.

This reapples 36f1c3db66f7268ea3183bcf0bbf05b3e1c570b4,
but without asserting, just bailout once cycle is detected.
2021-03-15 13:51:02 +03:00
Roman Lebedev
a289925031 Revert "[NFCI][ValueTracking] getUnderlyingObject(): assert that no cycles are encountered"
This reverts commit 36f1c3db66f7268ea3183bcf0bbf05b3e1c570b4.
Seems to make bots unhappy.
2021-03-15 12:00:59 +03:00
Roman Lebedev
b4b4a1122d [NFCI][ValueTracking] getUnderlyingObject(): assert that no cycles are encountered
Jeroen Dobbelaere in
https://lists.llvm.org/pipermail/llvm-dev/2021-March/149206.html
is reporting that this function can end up in an endless loop
when called from SROA w/ full restrict patches.

For now, simply ensure that such problems are caught earlier/easier.
2021-03-15 11:52:31 +03:00
Roman Lebedev
19cfa09971 Reland [SCEV] Improve modelling for (null) pointer constants
This reverts commit 329aeb5db43f5e69df038fb20d2def77fe6f8595,
and relands commit 61f006ac655431bd44b9e089f74c73bec0c1a48c.

This is a continuation of D89456.

As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.

This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98147
2021-03-13 16:05:34 +03:00
Nikita Popov
902b0f7b38 [MemorySSA] Don't bail on phi starting access
When calling getClobberingMemoryAccess() with MemoryLocation on a
MemoryPHI starting access, the walker currently immediately bails
and returns the starting access. This makes sense for the API that
does not accept a location (as we wouldn't know what clobber we
should be checking for), but doesn't make sense for the
MemoryLocation-based API. This means that it can't look through
a MemoryPHI if it's the starting access, but can if there is one
more non-clobbering def in between. This patch removes the limitation.

Differential Revision: https://reviews.llvm.org/D98557
2021-03-13 10:53:13 +01:00
Roman Lebedev
7c83c3a8e7 Temporairly evert "[SCEV] Improve modelling for (null) pointer constants"
This appears to have broken ubsan bot:
https://lab.llvm.org/buildbot/#/builders/85/builds/3062
https://reviews.llvm.org/D98147#2623549

It looks like LSR needs some kind of a change around insertion point handling.
Reverting until i have a fix.

This reverts commit 61f006ac655431bd44b9e089f74c73bec0c1a48c.
2021-03-13 09:10:28 +03:00
Roman Lebedev
678779eec7 [SCEV] Improve modelling for (null) pointer constants
This is a continuation of D89456.

As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.

This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98147
2021-03-12 22:11:58 +03:00
Stanislav Mekhanoshin
5aafd8a5a0 [AMDGPU] Fix -amdgpu-inline-arg-alloca-cost
Before D94153 this threshold was in a pre-scaled units.
After D94153 inlining threshold multiplier is not applied
to this portion of the threshold anymore. Restore the
threshold by applying the multiplier.

Differential Revision: https://reviews.llvm.org/D98362
2021-03-12 10:19:50 -08:00
serge-sans-paille
8a1cc679e3 [NFC] Use llvm::raw_string_ostream instead of std::stringstream
That's more efficient and we don't loose any valuable feature when doing so.
2021-03-12 18:43:59 +01:00
Bjorn Pettersson
bcace0dff8 [InstSimplify] Simplify smul.fix and smul.fix.sat
Add simplification of smul.fix and smul.fix.sat according to
  X * 0 -> 0
  X * undef -> 0
  X * (1 << scale) -> X

This includes the commuted patterns and splatted vectors.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D98299
2021-03-12 09:09:58 +01:00
Bjorn Pettersson
5d0c366826 [ConstantFold] Handle undef/poison when constant folding smul_fix/smul_fix_sat
Do constant folding according to
  posion * C -> poison
  C * poison -> poison
  undef * C -> 0
  C * undef -> 0
for smul_fix and smul_fix_sat intrinsics (for any scale).

Reviewed By: nikic, aqjune, nagisa

Differential Revision: https://reviews.llvm.org/D98410
2021-03-12 09:09:58 +01:00
Johannes Doerfert
6c444365cb [FIX] Allow non-constant assume operand bundle operands.
Fixes PR49545

Reviewed By: zequanwu, fhahn, lebedev.ri

Differential Revision: https://reviews.llvm.org/D98444
2021-03-11 23:31:09 -06:00
Mircea Trofin
87d29d8dee Revert "[NPM][CGSCC] FunctionAnalysisManagerCGSCCProxy: do not clear immutable function passes"
This reverts commit 5eaeb0fa67e57391f5584a3f67fdb131e93afda6.

It appears there are analyses that assume clearing - example:
https://lab.llvm.org/buildbot#builders/36/builds/5964
2021-03-11 18:31:19 -08:00
Mircea Trofin
c85bdf8369 [NPM][CGSCC] FunctionAnalysisManagerCGSCCProxy: do not clear immutable function passes
Check with the analysis result by calling invalidate instead of clear on
the analysis manager.

Differential Revision: https://reviews.llvm.org/D98440
2021-03-11 18:15:28 -08:00
Nikita Popov
a833e60074 Reapply [LICM] Make promotion faster
Relative to the previous implementation, this always uses
aliasesUnknownInst() instead of aliasesPointer() to correctly
handle atomics. The added test case was previously miscompiled.

-----

Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-11 10:50:28 +01:00
Juneyoung Lee
f828df253e Resolve unused variable warning (NFC) 2021-03-11 12:03:03 +09:00
Juneyoung Lee
2a52d0ee68 [InstSimplify] Pass SimplifyQuery to computePointerICmp (NFC) 2021-03-11 11:13:46 +09:00
Ta-Wei Tu
95819d3c63 Revert "[LoopInterchange] Replace tightly-nesting-ness check with the one from LoopNest"
This reverts commit df9158c9a45a6902c2b0394f9bd6512e3e441f31.
2021-03-11 01:24:43 +08:00
Alex Richardson
6ae0a3f9c3 [SLC] Simplify strcpy and friends with non-zero address spaces
The current logic in TargetLibraryInfoImpl::getLibFunc() was only treating
strcpy, etc. with i8* arguments in address space zero as a valid library
function. However, in the CHERI and Morello targets we expect all libc
functions to use address space 200 arguments.

This commit updates isValidProtoForLibFunc() to check that the argument
is a pointer type. This also drops the check for i8* since we should not
be checking the pointee type any more.

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D95142
2021-03-10 11:17:34 +00:00
William S. Moses
1194a6394c [MemoryDependence] Fix invariant group store
Fix bug in MemoryDependence [and thus GVN] for invariant group.

Previously MemDep didn't verify that the store was storing into a
pointer rather than a store simply using a pointer.

Differential Revision: https://reviews.llvm.org/D98267
2021-03-09 19:03:39 -05:00
Juneyoung Lee
0ac1e53c7e Revert "[InstCombine] Add simplification of two logical and/ors"
This reverts commit 07c3b97e184d5bd828b8a680cdce46e73f3db9fc due to a reported failure in two-stage build.
2021-03-10 05:48:31 +09:00
Philip Reames
54b4ece1ff [SCEV] Infer known bits from known sign bits
This was suggested by lebedev.ri over on D96534.  You'll note lack of tests.  During review, we weren't actually able to find a case which exercises it, but both I and lebedev.ri feel it's a reasonable change, straight forward, and near free.

Differential Revision: https://reviews.llvm.org/D97064
2021-03-09 12:37:17 -08:00
Benjamin Kramer
5672913dc3 [ValueTracking] Move matchSimpleRecurrence out of line
The header only has a forward declaration of PHINode available, and this
function doesn't seem to get much out of inlining.
2021-03-09 00:04:47 +01:00
Sanjay Patel
cc5fb7e4a2 [ValueTracking] move/add helper to get inverse min/max; NFC
We will need to this functionality to improve min/max folds
in instcombine when we canonicalize to intrinsics.
2021-03-08 17:38:22 -05:00
Sanjay Patel
7358b98bc7 [InstSimplify] cttz(1<<x) --> x
https://alive2.llvm.org/ce/z/TDacYu
https://alive2.llvm.org/ce/z/KF84S3
2021-03-08 16:30:14 -05:00
Alina Sbirlea
87fdb76351 Revert "[LICM] Make promotion faster"
Revert 3d8f842712d49b0767832b6e3f65df2d3f19af4e
Revision triggers a miscompile sinking a store incorrectly outside a
threading loop. Detected by tsan.
Reverting while investigating.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-08 12:53:03 -08:00
Philip Reames
a7e3a0cf49 constify getUnderlyingObject implementation [nfc] 2021-03-08 11:32:54 -08:00
Ta-Wei Tu
df760e157d [LoopInterchange] Replace tightly-nesting-ness check with the one from LoopNest
The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`.
In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour.
Replacing it with the much robust version provided by `LoopNest` reduces code duplications and fixes https://bugs.llvm.org/show_bug.cgi?id=48113.

`LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`.
Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D97290
2021-03-08 11:36:08 +08:00
Juneyoung Lee
ad32346391 [InstCombine] Add simplification of two logical and/ors
This is a patch that adds folding of two logical and/ors that share one variable:

a && (a && b) -> a && b
a && (a & b)  -> a && b
...

This is towards removing the poison-unsafe select optimization (D93065 has more context).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96945
2021-03-08 02:38:43 +09:00
Juneyoung Lee
dbc2db1042 [ValueTracking] update directlyImpliesPoison to look into select's condition
This is a minor update in directlyImpliesPoison and makes it look into select's
condition.
Splitted from https://reviews.llvm.org/D96945
2021-03-07 23:16:44 +09:00
Fangrui Song
c788c9781a [ModuleSummaryAnalysis] Avoid duplicate elements in Worklist. NFC 2021-03-06 14:19:22 -08:00
Nikita Popov
fbb0e66968 [Loads] Restructure getAvailableLoadStore implementation (NFC)
Separate out some conditions with early exits, to make it easier to
support additional cases.
2021-03-06 16:58:11 +01:00
Nikita Popov
fa0178256e [ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).

Rather than adjusting the type check, simply drop it entirely,
as getAggregateElement() is well-defined for non-aggregates: It
simply returns null in that case.
2021-03-06 12:17:56 +01:00
Nikita Popov
9f2cfa55ab [LVI] Simplify and generalize handling of clamp patterns
Instead of handling a number of special cases for selects, handle
this generally when inferring ranges from conditions. We already
infer ranges from `x + C pred C2` to `x`, so doing the same for
`x pred C2` to `x + C` is straightforward.
2021-03-06 10:42:41 +01:00
Nikita Popov
634e7e2d73 [LVI] Pass offset by reference (NFC)
Instead of by pointer. This allows us to use offsets that are not
materialized in the IR.
2021-03-06 10:24:44 +01:00
Wei Mi
ad2a6f2861 [SampleFDO] Another fix to prevent repeated indirect call promotion in
sample loader pass.

In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
to prevent repeated indirect call promotion for the same indirect call
and the same target, we used zero-count value profile to indicate an
indirect call has been promoted for a certain target. We removed
PromotedInsns cache in the same patch. However, there was a problem in
that patch described below, and that problem led me to add PromotedInsns
back as a mitigation in
https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf.

When we get value profile from metadata by calling getValueProfDataFromInst,
we need to specify the maximum possible number of values we expect to read.
We uses MaxNumPromotions in the last patch so the maximum number of value
information extracted from metadata is MaxNumPromotions. If we have many
values including zero-count values when we write the metadata, some of them
will be dropped when we read them because we only read MaxNumPromotions
values. It will allow repeated indirect call promotion again. We need to
make sure if there are values indicating promoted targets, those values need
to be saved in metadata with higher priority than other values.

The patch fixed that problem. We change to use -1 to represent the count
of a promoted target instead of 0 so it is easier to sort the values.
When we prepare to update the metadata in updateIDTMetaData, we will sort
the values in the descending count order and extract only MaxNumPromotions
values to write into metadata. Since -1 is the max uint64_t number, if we
have equal to or less than MaxNumPromotions of -1 count values, they will
all be kept in metadata. If we have more than MaxNumPromotions of -1 count
values, we will only save MaxNumPromotions such values maximally. In such
case, we have logic in place in doesHistoryAllowICP to guarantee no more
promotion in sample loader pass will happen for the indirect call, because
it has been promoted enough.

With this change, now we can remove PromotedInsns without problem.

Differential Revision: https://reviews.llvm.org/D97350
2021-03-04 18:44:12 -08:00
dfukalov
7524927003 [NFC][AliasSetTracker] Remove implicit conversion AliasResult to integer.
Preparation to make AliasResult scoped enumeration.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D97973
2021-03-05 00:53:27 +03:00
Philip Reames
883f87b472 [basicaa] Recurse through a single phi input
BasicAA knows how to analyze phis, but to control compile time, we're fairly limited in doing so. This patch loosens that restriction just slightly when there is exactly one phi input (after discounting induction variable increments). The result of this is that we can handle more cases around nested and sibling loops with pointer induction variables.

A few points to note.
* This is deliberately extremely restrictive about recursing through at most one input of the phi.  There's a known general problem with BasicAA sometimes hitting exponential compile time already, and this patch makes every effort not to compound the problem.  Once the root issue is fixed, we can probably loosen the restrictions here a bit.
* As seen in the test file, we're still missing cases which aren't *directly* based on phis (e.g. using the indvar increment). I believe this to be a separate problem and am going to explore this in another patch once this one lands.
* As seen in the test file, this results in the unfortunate fact that using phivalues sometimes results in worse quality results. I believe this comes down to an oversight in how recursive phi detection was implemented for phivalues. I'm happy to tackle this in a follow up change.

Differential Revision: https://reviews.llvm.org/D97401
2021-03-04 13:07:06 -08:00
Akira Hatanaka
4055195f29 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb12bd42214ca4fb17d196d49561c0c7, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5ff874676f3559c069fcd6737838f5c0.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00