a) Making it a per call site bonus for functions that we can move from
indirect to direct calls.
b) Reduces the bonus from 500 to 100 per call site.
c) Subtracts the size of the possible newly inlineable call from the
bonus to only add a bonus if we can inline a small function to devirtualize
it.
Also changes the bonus from a positive that's subtracted to a negative
that's added.
Fixes the remainder of rdar://8546196 by reducing the object file size
after inlining by 84%.
llvm-svn: 124916
Modified patch by Adam Preuss.
This builds on the existing framework for block tracing, edge profiling and optimal edge profiling.
See -help-hidden for new flags.
For documentation, see the technical report "Implementation of Path Profiling..." in llvm.org/pubs.
llvm-svn: 124515
benchmarks, and that it can be simplified to X/Y. (In general you can only
simplify (Z*Y)/Y to Z if the multiplication did not overflow; if Z has the
form "X/Y" then this is the case). This patch implements that transform and
moves some Div logic out of instcombine and into InstructionSimplify.
Unfortunately instcombine gets in the way somewhat, since it likes to change
(X/Y)*Y into X-(X rem Y), so I had to teach instcombine about this too.
Finally, thanks to the NSW/NUW flags, sometimes we know directly that "Z*Y"
does not overflow, because the flag says so, so I added that logic too. This
eliminates a bunch of divisions and subtractions in 447.dealII, and has good
effects on some other benchmarks too. It seems to have quite an effect on
tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions
changed, resulting in massive changes all over.
llvm-svn: 124487
a few loops accordingly. Should be no functional change.
This is a step for more accurate cost/benefit analysis of devirt/inlining
bonuses.
llvm-svn: 124275
optimized code are:
(non-negative number)+(power-of-two) != 0 -> true
and
(x | 1) != 0 -> true
Instcombine knows about the second one of course, but only does it if X|1
has only one use. These fire thousands of times in the testsuite.
llvm-svn: 124183
with BasicAA's DecomposeGEPExpression, which recently began
using a TargetData. This fixes PR8968, though the testcase
is awkward to reduce.
Also, update several off GetUnderlyingObject's users
which happen to have a TargetData handy to pass it in.
llvm-svn: 124134
computation, the Ancestor field is always set to the Parent, so we can remove
the explicit link entirely and merge the Parent and Ancestor fields. Instead of
checking for whether an ancestor exists for a node or not, we simply check
whether the node has already been processed. This is simpler if Compress is
inlined into Eval, so I did that as well.
This is about a 3% speedup running -domtree on test-suite + SPEC2000 & SPEC2006,
but it also opens up some opportunities for further improvement.
llvm-svn: 124061
While there, I noticed that the transform "undef >>a X -> undef" was wrong.
For example if X is 2 then the top two bits must be equal, so the result can
not be anything. I fixed this in the constant folder as well. Also, I made
the transform for "X << undef" stronger: it now folds to undef always, even
though X might be zero. This is in accordance with the LangRef, but I must
admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef"
following the LangRef and the constant folder, likewise fairly aggressive.
llvm-svn: 123417
Add methods for accessing the (single) entry / exit edge of a region. If no such
edge exists, null is returned. Both accessors return the start block of the
corresponding edge. The edge can finally be formed by utilizing
Region::getEntry() or Region::getExit();
Contributed by: Andreas Simbuerger <simbuerg@fim.uni-passau.de>
llvm-svn: 123410
void f(int* begin, int* end) { std::fill(begin, end, 0); }
which turns into a != exit expression where one pointer is
strided and (thanks to step #1) known to not overflow, and
the other is loop invariant.
The observation here is that, though the IV is strided by
4 in this case, that the IV *has* to become equal to the
end value. It cannot "miss" the end value by stepping over
it, because if it did, the strided IV expression would
eventually wrap around.
Handle this by turning A != B into "A-B != 0" where the A-B
part is known to be NUW.
llvm-svn: 123131
a pointer value has potentially become escaping. Implementations can choose to either fall back to
conservative responses for that value, or may recompute their analysis to accomodate the change.
llvm-svn: 122777
update a callGraph when performing the common operation of splicing the body to
a new function and updating all callers (such as via RAUW).
No users yet, though this is intended for DeadArgumentElimination as part of
PR8887.
llvm-svn: 122728
compile, and everyone's tests have shown it to be slower in practice, even for
quite large graphs.
I also hope to do an optimization that is only correct with the simpler data
structure, which would break this even further.
llvm-svn: 122684
naively implemented, the Lengauer-Tarjan algorithm requires a separate bucket
for each vertex. However, this is unnecessary, because each vertex is only
placed into a single bucket (that of its semidominator), and each vertex's
bucket is processed before it is added to any bucket itself.
Instead of using a bucket per vertex, we use a single array Buckets that has two
purposes. Before the vertex V with DFS number i is processed, Buckets[i] stores
the index of the first element in V's bucket. After V's bucket is processed,
Buckets[i] stores the index of the next element in the bucket to which V now
belongs, if any.
Reading from the buckets can also be optimized. Instead of processing the bucket
of V's parent at the end of processing V, we process the bucket of V itself at
the beginning of processing V. This means that the case of the root vertex can
be simplified somewhat. It also means that we don't need to look up the DFS
number of the semidominator of every node in the bucket we are processing,
since we know it is the current index being processed.
This is a 6.5% speedup running -domtree on test-suite + SPEC2000/2006, with
larger speedups of around 12% on the larger benchmarks like GCC.
llvm-svn: 122680