contained in it each time we try to add it to the worklist, just check
this when pulling it off the worklist. That way we do it at most once
per instruction with the cost of the worklist set we would need to pay
anyways.
llvm-svn: 229060
vector.
In addition to dramatically reducing the work required for contrived
example loops, this also has to correct some serious latent bugs in the
cost computation. Previously, we might add an instruction onto the
worklist once for every load which it used and was simplified. Then we
would visit it many times and accumulate "savings" each time.
I mean, fortunately this couldn't matter for things like calls with 100s
of operands, but even for binary operators this code seems like it must
be double counting the savings.
I just noticed this by inspection and due to the runtime problems it can
introduce, I don't have any test cases for cases where the cost produced
by this routine is unacceptable.
llvm-svn: 229059
In the unroll analyzer, it is checking each user to see if that user
will become dead. However, it first checked if that user was missing
from the simplified values map, and then if was also missing from the
dead instructions set. We add everything from the simplified values map
to the dead instructions set, so the first step is completely subsumed
by the second. Moreover, the first step requires *inserting* something
into the simplified value map which isn't what we want at all.
This also replaces a dyn_cast with a cast as an instruction cannot be
used by a non-instruction.
llvm-svn: 229057
check.
Also hoist this into the enqueue process as it is faster even than
testing the worklist set, we should just directly filter these out much
like we filter out constants and such.
llvm-svn: 229056
We don't just want to handle duplicate operands within an instruction,
but also duplicates across operands of different instructions. I should
have gone straight to this, but I had convinced myself that it wasn't
going to be necessary briefly. I've come to my senses after chatting
more with Nick, and am now happier here.
llvm-svn: 229054
reasonably quickly.
I don't have a reduced test case, but for a version of FFMPEG, this
makes the loop unroller start finishing at all (after over 15 minutes of
running, it hadn't terminated for me, no idea if it was a true infloop
or just exponential work).
The key thing here is to check the DeadInstructions set when pulling
things off the worklist. Without this, we would re-walk the user list of
already dead instructions again and again and again. Consider phi nodes
with many, many operands and other patterns.
The other important aspect of this is that because we would keep
re-visiting instructions that were already known dead, we kept adding
their cost savings to this! This would cause our cost savings to be
*insanely* inflated from this.
While I was here, I also rotated the operand walk out of the worklist
loop to make the code easier to read. There is still work to be done to
minimize worklist traffic because we don't de-duplicate operands. This
means we may add the same instruction onto the worklist 1000s of times
if it shows up in 1000s of operansd to a PHI node for example.
Still, with this patch, the ffmpeg testcase I have finishes quickly and
I can't measure the runtime impact of the unroll analysis any more. I'll
probably try to do a few more cleanups to this code, but not sure how
much cleanup I can justify right now.
llvm-svn: 229038
readable.
The biggest thing that was causing me problems is recognizing the
references vs. poniters here. I also found that for maps naming the loop
variable as KeyValue helps make it obvious why you don't actually use it
directly. Finally, using 'auto' instead of 'User *' doesn't seem like
a good tradeoff. Much like with the other cases, I like to know its
a pointer, and 'User' is just as long and tells the reader a lot more.
llvm-svn: 229033
propagating of metadata.
We were propagating !nonnull metadata even when the newly formed load is
no longer of a pointer type. This is clearly broken and results in LLVM
failing the verifier and aborting. This patch just restricts the
propagation of !nonnull metadata to when we actually have a pointer
type.
This bug report and the initial version of this patch was provided by
Charles Davis! Many thanks for finding this!
We still need to add logic to round-trip the metadata correctly if we
combine from pointer types to integer types and then back by using range
metadata for the integer type loads. But this is the minimal and safe
version of the patch, which is important so we can backport it into 3.6.
llvm-svn: 229029
hard to type and read for me, and is inconsistent with the other
abbreviation in the base class "Inst". For most of these (where they are
used widely) I prefer just spelling it out as Instruction. I've changed
two of the short-lived variables to use "Inst" to match the base class.
llvm-svn: 229028
This is much more efficient. In particular, the query with the user
instruction has to insert a false for every missing instruction into the
set. This is just a cleanup a long the way to fixing the underlying
algorithm problems here.
llvm-svn: 228994
When we try to estimate number of potentially removed instructions in
loop unroller, we analyze first N iterations and then scale the
computed number by TripCount/N. We should bail out early if N is 0.
llvm-svn: 228988
We can't solve the full subgraph isomorphism problem. But we can
allow obvious cases, where for example two instructions of different
types are out of order. Due to them having different types/opcodes,
there is no ambiguity.
llvm-svn: 228931
I've built some tests in WebRTC with and without this change. With this change number of __tsan_read/write calls is reduced by 20-40%, binary size decreases by 5-10% and execution time drops by ~5%. For example:
$ ls -l old/modules_unittests new/modules_unittests
-rwxr-x--- 1 dvyukov 41708976 Jan 20 18:35 old/modules_unittests
-rwxr-x--- 1 dvyukov 38294008 Jan 20 18:29 new/modules_unittests
$ objdump -d old/modules_unittests | egrep "callq.*__tsan_(read|write|unaligned)" | wc -l
239871
$ objdump -d new/modules_unittests | egrep "callq.*__tsan_(read|write|unaligned)" | wc -l
148365
http://reviews.llvm.org/D7069
llvm-svn: 228917
Apparently some code finally started to tickle this after my
canonicalization changes to instcombine.
The bug stems from trying to form a vector type out of scalars that
aren't compatible at all. In this example, from x86_mmx values. The code
in the vectorizer that checks for reasonable types whas checking for
aggregates or vectors, but there are lots of other types that should
just never reach the vectorizer.
Debugging this was made more confusing by the lie in an assert in
VectorType::get() -- it isn't that the types are *primitive*. The types
must be integer, pointer, or floating point types. No other types are
allowed.
I've improved the assert and added a helper to the vectorizer to handle
the element type validity checks. It now re-uses the VectorType static
function and then further excludes weird target-specific types that we
probably shouldn't be touching here (x86_fp80 and ppc_fp128). Neither of
these are really reachable anyways (neither 80-bit nor 128-bit things
will get vectorized) but it seems better to just eagerly exclude such
nonesense.
I've added a test case, but while it definitely covers two of the paths
through this code there may be more paths that would benefit from test
coverage. I'm not familiar enough with the SLP vectorizer to synthesize
test cases for all of these, but was able to update the code itself by
inspection.
llvm-svn: 228899
I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It
actually depends on the index too, which means we need to be careful about how
the results are combined before return. In particular if a single Use returns
Live, that counts for the entire object, at the granularity we're considering.
llvm-svn: 228885
Summary:
When trying to canonicalize negative constants out of
multiplication expressions, we need to check that the
constant is not INT_MIN which cannot be negated.
Reviewers: mcrosier
Reviewed By: mcrosier
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7286
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 228872
analysis.
We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now so there is a slight test churn.
Test changes:
* combine-comparisons-by-cse.ll: Removed unneeded branch check.
* 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
* coalesce-subregs.ll: Superfluous block check.
* 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
* PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
* select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.
llvm-svn: 228826
A DAGRootSet models an induction variable being used in a rerollable
loop. For example:
x[i*3+0] = y1
x[i*3+1] = y2
x[i*3+2] = y3
Base instruction -> i*3
+---+----+
/ | \
ST[y1] +1 +2 <-- Roots
| |
ST[y2] ST[y3]
There may be multiple DAGRootSets, for example:
x[i*2+0] = ... (1)
x[i*2+1] = ... (1)
x[i*2+4] = ... (2)
x[i*2+5] = ... (2)
x[(i+1234)*2+5678] = ... (3)
x[(i+1234)*2+5679] = ... (3)
This concept is similar to the "Scale" member used previously, but allows
multiple independent sets of roots based off the same induction variable.
llvm-svn: 228821
This allows IDEs to recognize the entire set of header files for
each of the core LLVM projects.
Differential Revision: http://reviews.llvm.org/D7526
Reviewed By: Chris Bieneman
llvm-svn: 228798
Add handling for __llvm_coverage_mapping to the InstrProfiling
pass. We need to make sure the constant and any profile names it
refers to are in the correct sections, which is easier and cleaner to
do here where we have to know about profiling sections anyway.
This is really tricky to test without a frontend, so I'm committing
the test for the fix in clang. If anyone knows a good way to test this
within LLVM, please let me know.
Fixes PR22531.
llvm-svn: 228793
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.
Also add some landingpads to invalid LLVM IR test cases that lack them.
Over-the-shoulder reviewed by David Majnemer.
llvm-svn: 228782
Unless we meet an insertvalue on a path from some value to a return, that value
will be live if *any* of the return's components are live, so all of those
components must be added to the MaybeLiveUses.
Previously we were deleting arguments if sub-value 0 turned out to be dead.
llvm-svn: 228731
This commit isn't using the correct context, and is transfoming calls
that are operands to loads rather than calls that are operands to an
icmp feeding into an assume. I've replied on the original review thread
with a very reduced test case and some thoughts on how to rework this.
llvm-svn: 228677
I realized that my early fix for this was overly complicated. Rather than scatter checks around in a bunch of places, just exit early when we visit the poll function itself.
Thinking about it a bit, the whole inlining mechanism used with gc.safepoint_poll could probably be cleaned up a bit. Originally, poll insertion was fused with gc relocation rewriting. It might be worth going back to see if we can simplify the chain of events now that these two are seperated. As one thought, maybe it makes sense to rewrite calls inside the helper function before inlining it to the many callers. This would require us to visit the poll function before any other functions though..
llvm-svn: 228634
for any padding introduced by SROA. In particular, do not emit debug info
for an alloca that represents only the padding introduced by a previous
iteration.
Fixes PR22495.
llvm-svn: 228632
intermediate representation. This
- increases consistency by using the same granularity everywhere
- allows for pieces < 1 byte
- DW_OP_piece didn't actually allow storing an offset.
Part of PR22495.
llvm-svn: 228631
Summary:
It's important that our users immediately know what gc.safepoint_poll
is. Also fix the style of the declaration of CreateGCStatepoint, in
preparation for another change that will wrap it.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7517
llvm-svn: 228626
`DIExpression` deals with `uint64_t`, so it doesn't make sense that
`createExpression()` is created from `int64_t`. Switch to `uint64_t` to
unify them.
I've temporarily left in the `int64_t` version, which forwards to the
`uint64_t` version. I'll delete it once I've updated the callers.
llvm-svn: 228619
This is just adding really simple tests which should have been part of the original submission. When doing so, I discovered that I'd mistakenly removed required pieces when preparing the patch for upstream submission. I fixed two such bugs in this submission.
llvm-svn: 228610
wrong basic block.
This would happen when the result of an invoke was used by a phi instruction
in the invoke's normal destination block. An instruction to reload the invoke's
value would get inserted before the critical edge was split and a new basic
block (which is the correct insertion point for the reload) was created. This
commit fixes the bug by splitting the critical edge before all the reload
instructions are inserted.
Also, hoist up the code which computes the insertion point to the only place
that need that computation.
rdar://problem/15978721
llvm-svn: 228566
Some parts of DeadArgElim were only considering the individual fields
of StructTypes separately, but others (where insertvalue &
extractvalue instructions occur) also looked into ArrayTypes.
This one is an actual bug; the mismatch can lead to an argument being
considered used by a return sub-value that isn't being tracked (and
hence is dead by default). It then gets incorrectly eliminated.
llvm-svn: 228559
Previously, a non-extractvalue use of an aggregate return value meant
the entire return was considered live (the algorithm gave up
entirely). This was correct, but conservative. It's better to actually
look at that Use, making the analysis results apply to all sub-values
under consideration.
E.g.
%val = call { i32, i32 } @whatever()
[...]
ret { i32, i32 } %val
The return is using the entire aggregate (sub-values 0 and 1). We can
still simplify @whatever if we can prove that this return is itself
unused.
Also unifies the logic slightly between aggregate and non-aggregate
cases..
llvm-svn: 228558
Make assume (load (call|invoke) != null) set nonNull return attribute
for the call and invoke. Also include tests.
Differential Revision: http://reviews.llvm.org/D7107
llvm-svn: 228556
Summary:
The alias.scope metadata represents sets of things an instruction might
alias with. When generically combining the metadata from two
instructions the result must be the union of the original sets, because
the new instruction might alias with anything any of the original
instructions aliased with.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7490
llvm-svn: 228525
The only difference between deleteIfDeadInstruction and
RecursivelyDeleteTriviallyDeadInstructions is that the former also
manually invalidates SCEV. That's unnecessary because SCEV automatically
gets informed when an instruction is deleted via a ValueHandle. NFC.
llvm-svn: 228508
An atomic store always make the target location fully initialized (in the
current implementation). It should not store origin. Initialized memory can't
have meaningful origin, and, due to origin granularity (4 bytes) there is a
chance that this extra store would overwrite meaningfull origin for an adjacent
location.
llvm-svn: 228444
If complete-unroll could help us to optimize away N% of instructions, we
might want to do this even if the final size would exceed loop-unroll
threshold. However, we don't want to unroll huge loop, and we are add
AbsoluteThreshold to avoid that - this threshold will never be crossed,
even if we expect to optimize 99% instructions after that.
llvm-svn: 228434
It is a variation of SimplifyBinOp, but it takes into account
FastMathFlags.
It is needed in inliner and loop-unroller to accurately predict the
transformation's outcome (previously we dropped the flags and were too
conservative in some cases).
Example:
float foo(float *a, float b) {
float r;
if (a[1] * b)
r = /* a lot of expensive computations */;
else
r = 1;
return r;
}
float boo(float *a) {
return foo(a, 0.0);
}
Without this patch, we don't inline 'foo' into 'boo'.
llvm-svn: 228432