r243382 changed the behavior to always require a set of memchecks to be
passed to LoopVer. This change restores the prior behavior as an
alternative to the new behavior. This allows the checks to be
implicitly taken from the LAA object.
Patch by Ashutosh Nema!
llvm-svn: 244763
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).
InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).
I also moved all the relevant combine tests into InstCombine/blend_x86.ll
Differential Revision: http://reviews.llvm.org/D11934
llvm-svn: 244723
`InstCombiner::OptimizeOverflowCheck` was asserting an
invariant (operands to binary operations are ordered by decreasing
complexity) that wasn't really an invariant. Fix this by instead having
`InstCombiner::OptimizeOverflowCheck` establish the invariant if it does
not hold.
llvm-svn: 244676
Summary: This patch adds check for dead blocks and skip them for processSwitchInst(). This will help reduce compilation time.
Reviewers: reames, hans
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11953
llvm-svn: 244656
Summary: LowerSwitch crashed with the attached test case after deleting the default block. This happened because the current implementation of deleting dead blocks is wrong. After the default block being deleted, it contains no instruction or terminator, and it should no be traversed anymore. However, since the iterator is advanced before processSwitchInst() function is executed, the block advanced to could be deleted inside processSwitchInst(). The deleted block would then be visited next and crash dyn_cast<SwitchInst>(Cur->getTerminator()) because Cur->getTerminator() returns a nullptr. This patch fixes this problem by recording dead default blocks into a list, and delete them after all processSwitchInst() has been done. It still possible to visit dead default blocks and waste time process them. But it is a compile time issue, and I plan to have another patch to add support to skip dead blocks.
Reviewers: kariddi, resistor, hans, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11852
llvm-svn: 244642
Summary:
For LTO we need to enable this pass in the LTO pipeline,
as it is skipped during the "-flto -c" compile step (when PrepareForLTO is
set).
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11919
llvm-svn: 244622
The select pattern recognition in ValueTracking (as used by InstCombine
and SelectionDAGBuilder) only knew about integer patterns. This teaches
it about minimum and maximum operations.
matchSelectPattern() has been extended to return a struct containing the
existing Flavor and a new enum defining the pattern's behavior when
given one NaN operand.
C minnum() is defined to return the non-NaN operand in this case, but
the idiomatic C "a < b ? a : b" would return the NaN operand.
ARM and AArch64 at least have different instructions for these different cases.
llvm-svn: 244580
This adds somewhat basic preparation functionality including:
- Formation of funclets via coloring basic blocks.
- Cloning of polychromatic blocks to ensure that funclets have unique
program counters.
- Demotion of values used between different funclets.
- Some amount of cleanup once we have removed predecessors from basic
blocks.
- Verification that we are left with a CFG that makes some amount of
sense.
N.B. Arguments and numbering still need to be done.
Differential Revision: http://reviews.llvm.org/D11750
llvm-svn: 244558
This patch and a relatec clang patch solve the problem of having to explicitly enable analysis when specifying a loop hint pragma to get the diagnostics. Passing AlwasyPrint as the pass name (see below) causes the front-end to print the diagnostic if the user has specified '-Rpass-analysis' without an '=<target-pass>’. Users of loop hints can pass that compiler option without having to specify the pass and they will get diagnostics for only those loops with loop hints.
llvm-svn: 244555
This patch moves checking the threshold of runtime pointer checks to the vectorization requirements (late diagnostics) and emits a diagnostic that infroms the user the loop would be vectorized if not for exceeding the pointer-check threshold. Clang will also append the options that can be used to allow vectorization.
llvm-svn: 244523
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.
Differential Revision: http://reviews.llvm.org/D11886
llvm-svn: 244495
This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options would that allow floating-point commutativity. Specifically those are enableing fast-math or specifying a loop hint.
llvm-svn: 244489
Sometimes interleaving is not beneficial, as determined by the cost-model and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done.
llvm-svn: 244485
This change adds the unroll metadata "llvm.loop.unroll.enable" which directs
the optimizer to unroll a loop fully if the trip count is known at compile time, and
unroll partially if the trip count is not known at compile time. This differs from
"llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not
known at compile time.
The "llvm.loop.unroll.enable" is intended to be added for loops annotated with
"#pragma unroll".
llvm-svn: 244466
Summary:
This adds a hook to TTI which enables us to selectively turn on by default
interleaved access vectorization for targets on which we have have performed
the required benchmarking.
Reviewers: rengolin
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11901
llvm-svn: 244449
The scalarizer can cache incorrect entries when walking up a chain of
insertelement instructions. This occurs when it encounters more than one
instruction that it is not actively searching for, as it unconditionally caches
every element it finds. The fix is to only cache the first element that it
isn't searching for so we don't overwrite correct entries.
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D11559
llvm-svn: 244448
This is the full set of checks that clients can further filter. IOW,
it's client-agnostic. This makes LAA complete in the sense that it now
provides the two main results of its analysis precomputed:
1. memory dependences via getDepChecker().getInsterestingDependences()
2. run-time checks via getRuntimePointerCheck().getChecks()
However, as a consequence we now compute this information pro-actively.
Thus if the client decides to skip the loop based on the dependences
we've computed the checks unnecessarily. In order to see whether this
was a significant overhead I checked compile time on SPEC2k6 LTO bitcode
files. The change was in the noise.
The checks are generated in canCheckPtrAtRT, at the same place where we
used to call groupChecks to merge checks.
llvm-svn: 244368
Summary: llvm::ConstantFoldTerminator function can convert SwitchInst with single case (and default) to a conditional BranchInst. This patch adds support to preserve make.implicit metadata on this conversion.
Reviewers: sanjoy, weimingz, chenli
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D11841
llvm-svn: 244348
This patch fixes the sse2/avx2 vector shift by constant instcombine call to correctly deal with the fact that the shift amount is formed from the entire lower 64-bit and not just the lowest element as it currently assumes.
e.g.
%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)
In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15) - giving a zero result.
In addition, this review also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821).
Differential Revision: http://reviews.llvm.org/D11760
llvm-svn: 244341
As a follow-up to r244181, resolve uniquing cycles underneath distinct
nodes on the fly. This prevents uniquing cycles in early operands from
affecting later operands. It also removes an iteration through distinct
nodes' operands.
No real functional change here, just more prompt resolution of temporary
nodes.
llvm-svn: 244302
After r244074, we now have a successors() method to iterate over
all the successors of a TerminatorInst. This commit changes a bunch
of eligible loops to use it.
llvm-svn: 244260
iisUnmovableInstruction() had a list of instructions hardcoded which are
considered unmovable. The list lacked (at least) an entry for the va_arg
and cmpxchg instructions.
Fix this by introducing a new Instruction::mayBeMemoryDependent()
instead of maintaining another instruction list.
Patch by Matthias Braun <matze@braunis.de>.
Differential Revision: http://reviews.llvm.org/D11577
rdar://problem/22118647
llvm-svn: 244244
This is the first mechanical step in preparation for making this and all
the other alias analysis passes available to the new pass manager. I'm
factoring out all the totally boring changes I can so I'm moving code
around here with no other changes. I've even minimized the formatting
churn.
I'll reformat and freshen comments on the interface now that its located
in the right place so that the substantive changes don't triger this.
llvm-svn: 244197
around a DataLayout interface in favor of directly querying DataLayout.
This wrapper specifically helped handle the case where this no
DataLayout, but LLVM now requires it simplifynig all of this. I've
updated callers to directly query DataLayout. This in turn exposed
a bunch of places where we should have DataLayout readily available but
don't which I've fixed. This then in turn exposed that we were passing
DataLayout around in a bunch of arguments rather than making it readily
available so I've also fixed that.
No functionality changed.
llvm-svn: 244189
Rotate the algorithm for remapping distinct nodes in order to simplify
how uniquing cycles get resolved. This removes some of the recursion,
and, most importantly, exposes all uniquing cycles at the top-level.
Besides being a little more efficient -- temporary MDNodes won't live as
long -- the clearer logic should help protect against bugs like those
fixed in r243961 and r243976.
What are uniquing cycles? Why do they present challenges when remapping
metadata?
!0 = !{!1}
!1 = !{!0}
!0 and !1 form a simple uniquing cycle. When remapping from one
metadata graph to another, every uniquing cycle gets "duplicated"
through a dance:
!0-temp = !{!1?} ; map(!0): clone !0, VM[!0] = !0-temp
!1-temp = !{!0?} ; ..map(!1): clone !1, VM[!1] = !1-temp
!1-temp = !{!0-temp} ; ..map(!1): remap !1's operands
!2 = !{!0-temp} ; ..map(!1): uniquify: !1-temp => !2
!0-temp = !{!2} ; map(!0): remap !0's operands
!3 = !{!2} ; map(!0): uniquify: !0-temp => !3
; Result
!2 = !{!3}
!3 = !{!2}
(In the two "uniquify" steps above, the operands of !X-temp are compared
to the operands of !X. If they're the same, then !X-temp gets RAUW'ed
to !X; if they're different, then !X-temp is promoted to a new unique
node. The latter case always hits in for uniquing cycles, so we
duplicate all the nodes involved.)
Why is this a problem? Uniquable Metadata nodes that have temporary
node as transitive operands keep RAUW support until the temporary nodes
get finalized. With non-cycles, this happens automatically: when a
uniquable node's count of unresolved operands drops to zero, it
immediately sheds its own RAUW support (possibly triggering the same in
any node that references it). However, uniquing cycles create a
reference cycle, and uniqued nodes that transitively reference a
uniquing cycle are "stuck" in an unresolved state until someone calls
`MDNode::resolveCycles()` on a node in the unresolved subgraph.
Distinct nodes should help here (and mostly do): since they aren't
uniqued anywhere, they are guaranteed not to be RAUW'ed. They
effectively form a barrier between uniqued nodes, breaking some uniquing
cycles, and shielding uniqued nodes from uniquing cycles.
Unfortunately, with this barrier in place, the unresolved subgraph(s)
can be disjoint from the top-level node. The mapping algorithm needs to
find at least one representative from each disjoint subgraph. But which
nodes are *stuck*, and which will get resolved automatically? And which
nodes are in the unresolved subgraph? The old logic was conservative.
This commit rotates the logic for distinct nodes, so that we have access
to unresolved nodes at the top-level call to `llvm::MapMetadata()`.
Each time we return to the top-level, we know that all temporaries have
been RAUW'ed away. Here, it's safe (and necessary) to call
`resolveCycles()` immediately on unresolved operands.
This should also perform better than the old algorithm. The recursion
stack is shorter, temporary nodes don't live as long, and there are
fewer tracking references to unresolved nodes. As the debug info graph
introduces more 'distinct' nodes, remapping should incrementally get
cheaper and cheaper.
Aside from possible performance improvements (and reduced cruft in the
`LLVMContext`), there should be no functionality change here.
llvm-svn: 244181
Rename `remap()` to `remapOperands()`, and restrict its contract to
remapping operands. Previously, it also called `mapToMetadata()`, but
this logic is hard to reason about externally. In particular, this
refactors `mapUniquedNode()` to avoid redundant mapping calls, taking
advantage of the RAUWs that are already in place.
llvm-svn: 244168
The only place that tries to return a CallGraph by value
(CallGraphAnalysis::run) doesn't seem to be used right now, but it's a
reasonable bit of cleanup anyway.
llvm-svn: 244122