Summary:
This change teaches SCEV to see reduce `(extractvalue
0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if
possible).
This was first checked in at r265912 but reverted in r265950 because it
exposed some issues around how SCEV handled post-inc add recurrences.
Those issues have now been fixed.
Reviewers: atrick, regehr
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18684
llvm-svn: 271152
Fixes PR27315.
The post-inc version of an add recurrence needs to "follow the same
rules" as a normal add or subtract expression. Otherwise we miscompile
programs like
```
int main() {
int a = 0;
unsigned a_u = 0;
volatile long last_value;
do {
a_u += 3;
last_value = (long) ((int) a_u);
if (will_add_overflow(a, 3)) {
// Leave, and don't actually do the increment, so no UB.
printf("last_value = %ld\n", last_value);
exit(0);
}
a += 3;
} while (a != 46);
return 0;
}
```
This patch changes SCEV to put no-wrap flags on post-inc add recurrences
only when the poison from a potential overflow will go ahead to cause
undefined behavior.
To avoid regressing performance too much, I've assumed infinite loops
without side effects is undefined behavior to prove poison<->UB
equivalence in more cases. This isn't ideal, but is not new to LLVM as
a whole, and far better than the situation I'm trying to fix.
llvm-svn: 271151
Summary:
Unroll factor (Count) calculations moved to a new function.
Early exits on pragma and "-unroll-count" defined factor added.
New type of unrolling "Force" introduced (previously used implicitly).
New unroll preference "AllowRemainder" introduced and set "true" by default.
(should be set to false for architectures that suffers from it).
Reviewers: hfinkel, mzolotukhin, zzheng
Differential Revision: http://reviews.llvm.org/D19553
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 271071
r270777 improved the precision of alloca vs. inbounbds GEP alias queries: if
we have (a) an inbounds GEP and (b) a pointer based on an alloca, and the
beginning of the object the GEP points to would have a negative offset with
respect to the alloca, then the GEP can not alias pointer (b).
This makes the same logic fire when (b) is based on a GlobalVariable instead
of an alloca.
Differential Revision: http://reviews.llvm.org/D20652
llvm-svn: 270893
If a we have (a) a GEP and (b) a pointer based on an alloca, and the
beginning of the object the GEP points would have a negative offset with
repsect to the alloca, then the GEP can not alias pointer (b).
For example, consider code like:
struct { int f0, int f1, ...} foo;
...
foo alloca;
foo *random = bar(alloca);
int *f0 = &alloca.f0
int *f1 = &random->f1;
Which is lowered, approximately, to:
%alloca = alloca %struct.foo
%random = call %struct.foo* @random(%struct.foo* %alloca)
%f0 = getelementptr inbounds %struct, %struct.foo* %alloca, i32 0, i32 0
%f1 = getelementptr inbounds %struct, %struct.foo* %random, i32 0, i32 1
Assume %f1 and %f0 alias. Then %f1 would point into the object allocated
by %alloca. Since the %f1 GEP is inbounds, that means %random must also
point into the same object. But since %f0 points to the beginning of %alloca,
the highest %f1 can be is (%alloca + 3). This means %random can not be higher
than (%alloca - 1), and so is not inbounds, a contradiction.
Differential Revision: http://reviews.llvm.org/D20495
llvm-svn: 270777
Getting accurate locations for loops is important, because those locations are
used by the frontend to generate optimization remarks. Currently, optimization
remarks for loops often appear on the wrong line, often the first line of the
loop body instead of the loop itself. This is confusing because that line might
itself be another loop, or might be somewhere else completely if the body was
inlined function call. This happens because of the way we find the loop's
starting location. First, we look for a preheader, and if we find one, and its
terminator has a debug location, then we use that. Otherwise, we look for a
location on an instruction in the loop header.
The fallback heuristic is not bad, but will almost always find the beginning of
the body, and not the loop statement itself. The preheader location search
often fails because there's often not a preheader, and even when there is a
preheader, depending on how it was formed, it sometimes carries the location of
some preceeding code.
I don't see any good theoretical way to fix this problem. On the other hand,
this seems like a straightforward solution: Put the debug location in the
loop's llvm.loop metadata. A companion Clang patch will cause Clang to insert
llvm.loop metadata with appropriate locations when generating debugging
information. With these changes, our loop remarks have much more accurate
locations.
Differential Revision: http://reviews.llvm.org/D19738
llvm-svn: 270771
Also s/Cycles/Iters/ in NumCyclesForStoreLoadThroughMemory to make it
clear that this is not about clock cycles but loop cycles/iterations.
llvm-svn: 269667
Summary:
...loop after the last iteration.
This is really hard to do correctly. The core problem is that we need to
model liveness through the induction PHIs from iteration to iteration in
order to get the correct results, and we need to correctly de-duplicate
the common subgraphs of instructions feeding some subset of the
induction PHIs. All of this can be driven either from a side effect at
some iteration or from the loop values used after the loop finishes.
This patch implements this by storing the forward-propagating analysis
of each instruction in a cache to recall whether it was free and whether
it has become live and thus counted toward the total unroll cost. Then,
at each sink for a value in the loop, we recursively walk back through
every value that feeds the sink, including looping back through the
iterations as needed, until we have marked the entire input graph as
live. Because we cache this, we never visit instructions more than twice
-- once when we analyze them and put them into the cache, and once when
we count their cost towards the unrolled loop. Also, because the cache
is only two bits and because we are dealing with relatively small
iteration counts, we can store all of this very densely in memory to
avoid this from becoming an excessively slow analysis.
The code here is still pretty gross. I would appreciate suggestions
about better ways to factor or split this up, I've stared too long at
the algorithmic side to really have a good sense of what the design
should probably look at.
Also, it might seem like we should do all of this bottom-up, but I think
that is a red herring. Specifically, the simplification power is *much*
greater working top-down. We can forward propagate very effectively,
even across strange and interesting recurrances around the backedge.
Because we use data to propagate, this doesn't cause a state space
explosion. Doing this level of constant folding, etc, would be very
expensive to do bottom-up because it wouldn't be until the last moment
that you could collapse everything. The current solution is essentially
a top-down simplification with a bottom-up cost accounting which seems
to get the best of both worlds. It makes the simplification incremental
and powerful while leaving everything dead until we *know* it is needed.
Finally, a core property of this approach is its *monotonicity*. At all
times, the current UnrolledCost is a conservatively low estimate. This
ensures that we will never early-exit from the analysis due to exceeding
a threshold when if we had continued, the cost would have gone back
below the threshold. These kinds of bugs can cause incredibly hard to
track down random changes to behavior.
We could use a techinque similar (but much simpler) within the inliner
as well to avoid considering speculated code in the inline cost.
Reviewers: chandlerc
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11758
llvm-svn: 269388
Ported DA to the new PM by splitting the former DependenceAnalysis Pass
into a DependenceInfo result type and DependenceAnalysisWrapperPass type
and adding a new PM-style DependenceAnalysis analysis pass returning the
DependenceInfo.
Patch by Philip Pfaffe, most of the review by Justin.
Differential Revision: http://reviews.llvm.org/D18834
llvm-svn: 269370
The plan is to eventually make this logic simpler, however I expect it to
be a little tricky for the foreseeable future (at least until we're rid of
pointee types), so move it here so that it can be reused to build a summary
index for devirtualization.
Differential Revision: http://reviews.llvm.org/D20005
llvm-svn: 269081
We can use calls to @llvm.experimental.guard to prove predicates,
relying on the fact that in all locations domianted by a call to
@llvm.experimental.guard the predicate it is guarding is known to be
true.
llvm-svn: 268997
This test was crashing, and currently it breaks bootstrapping clang with debuginfo
Differential Revision: http://reviews.llvm.org/D20008
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268715
A loop pass that didn't preserve this entire set of passes wouldn't
play well with other loop passes, since these are generally a basic
requirement to do any interesting transformations to a loop.
Adds a helper to get the set of analyses a loop pass should preserve,
and checks that any loop pass we run satisfies the requirement.
llvm-svn: 268444
that it computes. Currently this is used for testing and precision
tuning, but it might be used by optimizations later.
Differential Revision: http://reviews.llvm.org/D19179
llvm-svn: 268291
Summary:
Historically, we had a switch in the Makefiles for turning on "expensive
checks". This has never been ported to the cmake build, but the
(dead-ish) code is still around.
This will also make it easier to turn it on in buildbots.
Reviewers: chandlerc
Subscribers: jyknight, mzolotukhin, RKSimon, gberry, llvm-commits
Differential Revision: http://reviews.llvm.org/D19723
llvm-svn: 268050
I tried to be as close as possible to the strongest check that
existed before; cleaning these up properly is left for future work.
Differential Revision: http://reviews.llvm.org/D19469
llvm-svn: 267758
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
This is required to use this function from isSafeToSpeculativelyExecute
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16231
llvm-svn: 267692
Summary:
Remove the GlobalValueInfo and change the ModuleSummaryIndex to directly
reference summary objects. The info structure was there to support lazy
parsing of the combined index summary objects, which is no longer
needed and not supported.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19462
llvm-svn: 267344
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual
functions defined in other DSOs. The unnamed_addr attribute means that the
function's address is not significant, so we're allowed to substitute it
with the address of a PLT entry.
Also includes a bonus feature: addends for COFF image-relative references.
Differential Revision: http://reviews.llvm.org/D17938
llvm-svn: 267211
Summary:
Also adds a small comment blurb on control flow + no-wrap flags, since
that question came up a few days back on llvm-dev.
Reviewers: bjarke.roune, broune
Subscribers: sanjoy, mcrosier, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D19209
llvm-svn: 267110
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
Summary:
IntrReadWriteArgMem simply becomes IntrArgMemOnly.
So there are fewer intrinsic properties that express their orthogonality
better, and correspond more closely to the corresponding IR attributes.
Suggested by: Philip Reames
Reviewers: joker.eph, reames, tstellarAMD
Subscribers: jholewinski, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19291
llvm-svn: 267021
No matter what value you OR in to A, the result of (or A, B) is going to be UGE A. When A and B are positive, it's SGE too. If A is negative, OR'ing a value into it can't make it positive, but can increase its value closer to -1, therefore (or A, B) is SGE A. Working through all possible combinations produces this truth table:
```
A is
+, -, +/-
F F F + B is
T F ? -
? F ? +/-
```
The related optimizations are flipping the 'slt' for 'sge' which always NOTs the result (if the result is known), and swapping the LHS and RHS while swapping the comparison predicate.
There are more idioms left to implement (aren't there always!) but I've stopped here because any more would risk becoming unreasonable for reviewers.
llvm-svn: 266939
The functionality contained within getIntrinsicIDForCall is two-fold: it
checks if a CallInst's callee is a vectorizable intrinsic. If it isn't
an intrinsic, it attempts to map the call's target to a suitable
intrinsic.
Move the mapping functionality into getIntrinsicForCallSite and rename
getIntrinsicIDForCall to getVectorIntrinsicIDForCall while
reimplementing it in terms of getIntrinsicForCallSite.
llvm-svn: 266801
This patch improves SimplifyCFG to catch cases like:
if (a < b) {
if (a > b) <- known to be false
unreachable;
}
Phabricator Revision: http://reviews.llvm.org/D18905
llvm-svn: 266767
Summary:
Need to use predecessors for reverse graph, successors for forward graph.
succ_iterator/pred_iterator are not compatible, this patch is all the work necessary to work around that (which is what everywhere else does). Not sure if there is a better way, so cc'ing some random folks to take a gander :)
Reviewers: dblaikie, qcolombet, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18796
llvm-svn: 266718
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
Summary:
InlineCost's threshold is multiplied by this value. This lets us adjust
the inlining threshold up or down on a per-target basis. For example,
we might want to increase the threshold on targets where calls are
unusually expensive.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18560
llvm-svn: 266405
If the size of an AST entry changes, we also need to make sure we perform
necessary alias set merges, as the new size may overlap pointers in other sets.
We happen to run into this with memset, because memset allows an entry for a
i8* pointer to have a decidedly non-i8 size.
This fixes PR27262.
Differential Revision: http://reviews.llvm.org/D18939
llvm-svn: 266381
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.
This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).
For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.
This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).
As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.
Fixes PR16275.
llvm-svn: 266363
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.
Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.
The additional checking also acts as white-box testing for the getAsAddRec method.
Reviewers: anemet, sanjoy
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18792
llvm-svn: 266334
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.
It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the
-NAN semantics.
N.B. Of course, it is only safe to do this if the intrinsic call is
marked nnan.
llvm-svn: 266279