Summary:
This patch prevents MergeICmps to performn the transformation if the address operand GEP of the load instruction has a use outside of the load's parent block. Without this patch, compiler crashes with the given test case because the use of `%first.i` is still around when the basic block is erased from https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/MergeICmps.cpp#L620. I think checking `isUsedOutsideOfBlock` with `GEP` is the original intention of the code, as the checking for `LoadI` is already performed in the same function.
This patch is incomplete though, as this makes the pass overly conservative and fails the test `tuple-four-int8.ll`. I believe what needs to be done is checking if GEP has a use outside of block that is not the part of "Comparisons" chain. Submit the patch as of now to prevent compiler crash.
Reviewers: courbet, trentxintong
Reviewed By: courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54089
llvm-svn: 346151
There was no coverage for at least 2 out of the 4 patterns because
of fcmp canonicalization. The tests and code should be moved to
InstSimplify in a follow-up because this doesn't create any new values.
llvm-svn: 346150
As stated in IEEE-754 and discussed in:
https://bugs.llvm.org/show_bug.cgi?id=38086
...the sign of zero does not affect any FP compare predicate.
Known regressions were fixed with:
rL346097 (D54001)
rL346143
The transform will help reduce pattern-matching complexity to solve:
https://bugs.llvm.org/show_bug.cgi?id=39475
...as well as improve CSE and codegen (a zero constant is almost always
easier to produce than 0x80..00).
llvm-svn: 346147
It looks like we correctly removed edge cases with 0.0 from D50714,
but we were a bit conservative because getBinOpIdentity() doesn't
distinguish between +0.0 and -0.0 and 'nsz' is effectively always
true for fcmp (see discussion in:
https://bugs.llvm.org/show_bug.cgi?id=38086
Without this change, we would get regressions by canonicalizing
to +0.0 in all fcmp, and that's a step towards solving:
https://bugs.llvm.org/show_bug.cgi?id=39475
llvm-svn: 346143
We currently seem to underestimate the size of functions with loops in them,
both in terms of absolute code size and in the difficulties of dealing with
such code. (Calls, for example, can be tail merged to further reduce
codesize). At -Oz, we can then increase code size by inlining small loops
multiple times.
This attempts to penalise functions with loops at -Oz by adding a CallPenalty
for each top level loop in the function. It uses LI (and hence DT) to calculate
the number of loops. As we are dealing with minsize, the inline threshold is
small and functions at this point should be relatively small, making the
construction of these cheap.
Differential Revision: https://reviews.llvm.org/D52716
llvm-svn: 346134
Using TargetTransformInfo allows the splitting pass to factor in the
code size cost of instructions as it decides whether or not outlining is
profitable.
This did not regress the overall amount of outlining seen on the handful
of internal frameworks I tested.
Thanks to Jun Bum Lim for suggesting this!
Differential Revision: https://reviews.llvm.org/D53835
llvm-svn: 346108
In PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475
..we may fail to recognize/simplify fabs() in some cases because we do not
canonicalize fcmp with a -0.0 operand.
Adding that canonicalization can cause regressions on min/max FP tests, so
that's this patch: for the purpose of determining whether something is min/max,
let the value returned by the select determine how we treat a 0.0 operand in the fcmp.
This patch doesn't actually change the -0.0 to +0.0. It just changes the analysis, so
we don't fail to recognize equivalent min/max patterns that only differ in the
signbit of 0.0.
Differential Revision: https://reviews.llvm.org/D54001
llvm-svn: 346097
This patch gives the IR ComputeNumSignBits the same functionality as the
DAG version (the code is derived from the existing code).
This an extension of the single input shuffle analysis added with D53659.
Differential Revision: https://reviews.llvm.org/D53987
llvm-svn: 346071
Summary:
-mldst-motion creates a new phi node without any debug info. Use the merged debug location from the incoming stores to fix this.
Fixes PR38177. The test case here is (somewhat) simplified from:
```
struct S {
int foo;
void fn(int bar);
};
void S::fn(int bar) {
if (bar)
foo = 1;
else
foo = 0;
}
```
Reviewers: dblaikie, gbedwell, aprantl, vsk
Reviewed By: vsk
Subscribers: vsk, JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D54019
llvm-svn: 346027
Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.
This makes it more readable and just slightly more accurate generally.
Review: Ulrich Weigand
https://reviews.llvm.org/D53071
llvm-svn: 345998
Fix PR39417, PR39497
The loop vectorizer may generate runtime SCEV checks for overflow and stride==1
cases, leading to execution of original scalar loop. The latter is forbidden
when optimizing for size. An assert introduced in r344743 triggered the above
PR's showing it does happen. This patch fixes this behavior by preventing
vectorization in such cases.
Differential Revision: https://reviews.llvm.org/D53612
llvm-svn: 345959
Inner-loop only reductions require additional checks to make sure they
form a load-phi-store cycle across inner and outer loop. Otherwise the
reduction value is not properly preserved. This patch disables
interchanging such loops for now, as it causes miscompiles in some
cases and it seems to apply only for a tiny amount of loops. Across the
test-suite, SPEC2000 and SPEC2006, 61 instead of 62 loops are
interchange with inner loop reduction support disabled. With
-loop-interchange-threshold=-1000, 3256 instead of 3267.
See the discussion and history of D53027 for an outline of how such legality
checks could look like.
Reviewers: efriedma, mcrosier, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D53027
llvm-svn: 345877
When rewriting loop exit values, IndVars considers this transform not profitable if
the loop instruction has a loop user which it believes cannot be optimized away.
In current implementation only calls that immediately use the instruction are considered
as such.
This patch extends the definition of "hard" users to any side-effecting instructions
(which usually cannot be optimized away from the loop) and also allows handling
of not just immediate users, but use chains.
Differentlai Revision: https://reviews.llvm.org/D51584
Reviewed By: etherzhhb
llvm-svn: 345814
Unlike its legacy counterpart new pass manager's LoopUnrollPass does
not provide any means to select which flavors of unroll to run
(runtime, peeling, partial), relying on global defaults.
In some cases having ability to run a restricted LoopUnroll that
does more than LoopFullUnroll is needed.
Introduced LoopUnrollOptions to select optional unroll behaviors.
Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing.
Reviewers: chandlerc, tejohnson
Differential Revision: https://reviews.llvm.org/D53440
llvm-svn: 345723
For some unclear reason rewriteLoopExitValues considers recalculation
after the loop profitable if it has some "soft uses" outside the loop (i.e. any
use other than call and return), even if we have proved that it has a user inside
the loop which we think will not be optimized away.
There is no existing unit test that would explain this. This patch provides an
example when rematerialisation of exit value is not profitable but it passes
this check due to presence of a "soft use" outside the loop.
It makes no sense to recalculate value on exit if we are going to compute it
due to some irremovable within the loop. This patch disallows applying this
transform in the described situation.
Differential Revision: https://reviews.llvm.org/D51581
Reviewed By: etherzhhb
llvm-svn: 345708
optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
llvm-svn: 345705
InstCombine features an optimization that essentially replaces:
if (a)
free(a)
into:
free(a)
Right now, this optimization is gated by the minsize attribute and therefore
we only perform it if we can prove that we are going to be able to eliminate
the branch and the destination block.
However when casts are involved the optimization would fail to apply, because
the optimization was not smart enough to realize that it is possible to also
move the casts away from the destination block and that is harmless to the
performance since they are just noops.
E.g.,
foo(int *a)
if (a)
free((char*)a)
Wouldn't be optimized by instcombine, because
- We would refuse to hoist the `bitcast i32* %a to i8` in the source block
- We would fail to see that `bitcast i32* %a to i8` and %a are the same value.
This patch fixes both these problems:
- It teaches the pattern matching of the comparison how to look
through casts.
- It checks that whether the additional instruction in the destination block
can be hoisted and are harmless performance-wise.
- It hoists all the code of the destination block in the source block.
Differential Revision: D53356
llvm-svn: 345644
shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC'
The motivating case is at least a couple of steps away: I noticed that
SLPVectorizer does not analyze shuffles as well as sequences of
insert/extract in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
...so SLP may fail to vectorize when source code has shuffles to start
with or instcombine has converted insert/extract to shuffles.
Independent of that, an insertelement is always a simpler op for IR
analysis vs. a shuffle, so we should transform to insert when possible.
I don't think there's any codegen concern here - if a target can't insert
a scalar directly to some fixed element in a vector (x86?), then this
should get expanded to the insert+shuffle that we started with.
Differential Revision: https://reviews.llvm.org/D53507
llvm-svn: 345607
This commit is a combination of two patches:
* "Fix in getScalarizationOverhead()"
If target returns false in TTI.prefersVectorizedAddressing(), it means the
address registers will not need to be extracted. Therefore, there should
be no operands scalarization overhead for a load instruction.
* "Don't pass the instruction pointer from getMemInstScalarizationCost."
Since VF is always > 1, this is a cost query for an instruction in the
vectorized loop and it should not be evaluated within the scalar
context of the instruction.
Review: Ulrich Weigand, Hal Finkel
https://reviews.llvm.org/D52351https://reviews.llvm.org/D52417
llvm-svn: 345603
This fixes an assertion when constant folding a GEP when the part of the offset
was in i32 (IndexSize, as per DataLayout) and part in the i64 (PointerSize) in
the newly created test case.
Differential Revision: https://reviews.llvm.org/D52609
llvm-svn: 345585
It can be profitable to outline single-block cold regions because they
may be large.
Allow outlining single-block regions if they have over some threshold of
non-debug, non-terminator instructions. I chose 3 as the threshold after
experimenting with several internal frameworks.
In practice, reducing the threshold further did not give much
improvement, whereas increasing it resulted in substantial regressions.
Differential Revision: https://reviews.llvm.org/D53824
llvm-svn: 345524
As K has to dominate I, IIUC I's range metadata must be a subset of
K's. After Eli's recent clarification to the LangRef, loading a value
outside of the range is undefined behavior.
Therefore if I's range contains elements outside of K's range and we would load
one such value, K would cause undefined behavior.
In cases like hoisting/sinking, we still want the most generic range
over all code paths to/from the hoist/sink point. As suggested in the
patches related to D47339, I will refactor the handling of those
scenarios and try to decouple it from this function as follow up, once
we switched to a similar handling of metadata in most of
combineMetadata.
I updated some tests checking mostly the merging of metadata to keep the
metadata of to dominating load. The most interesting one is probably test8 in
test/Transforms/JumpThreading/thread-loads.ll. It contained a comment
about the alias metadata preventing us to eliminate the branch, but it
seem like the actual problem currently is that we merge the ranges of
both loads and cannot eliminate the icmp afterwards. With this patch, we
manage to eliminate the icmp, as the range of the first load excludes 8.
Reviewers: efriedma, nlopes, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D51629
llvm-svn: 345456
The motivating case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549
The analysis improvement allows us to form a vector 'select' out of
bitwise logic (the use of ComputeNumSignBits was added at rL345149).
The smaller test shows another InstCombine improvement - we use
ComputeNumSignBits to add 'nsw' to shift-left. But the negative
test shows an example where we must not add 'nsw' - when the shuffle
mask contains undef elements.
Differential Revision: https://reviews.llvm.org/D53659
llvm-svn: 345429
Summary:
The visitICmp analysis function would record compares of pointer types, as size 0. This causes the resulting memcmp() call to have the wrong total size.
Found with "self-build" of clang/LLVM on Windows.
Reviewers: christylee, trentxintong, courbet
Reviewed By: courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53536
llvm-svn: 345413
This patch adds support of `llvm.experimental.guard` intrinsics to non-trivial
simple loop unswitching. These intrinsics represent implicit control flow which
has pretty much the same semantics as usual conditional branches. The
algorithm of dealing with them is following:
- Consider guards as unswitching candidates;
- If a guard is considered the best candidate, turn it into a branch;
- Apply normal unswitching algorithm on this branch.
The patch has no compile time effect on code that does not contain any guards.
Differential Revision: https://reviews.llvm.org/D53744
Reviewed By: chandlerc
llvm-svn: 345387
Replacing BinaryOperator::isFNeg(...) to avoid regressions when we
separate FNeg from the FSub IR instruction.
Differential Revision: https://reviews.llvm.org/D53650
llvm-svn: 345295
This mirrors what we already do for AArch64 as the cores are similar.
As discussed in the review, enabling the machine scheduler causes
more variations in performance changes so it is not enabled for now.
This patch improves LNT scores by a geomean of 1.57% at -O3.
Differential Revision: https://reviews.llvm.org/D53562
llvm-svn: 345272