mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00

4994 Commits

Author SHA1 Message Date
Duncan P. N. Exon Smith
6ccaf0fa23 Support: Extract ScaledNumbers::MinScale and MaxScale
llvm-svn: 211558
2014-06-24 00:15:19 +00:00
Duncan P. N. Exon Smith
a21f5c3569 BFI: Change language from "exponent" to "scale"
llvm-svn: 211557
2014-06-23 23:57:12 +00:00
Duncan P. N. Exon Smith
1c9633c62e BFI: Rename UnsignedFloat => ScaledNumber
A lot of the docs and API are out of date, but I'll leave that for a
separate commit.

llvm-svn: 211555
2014-06-23 23:36:17 +00:00
Benjamin Kramer
8d54e9ca1f SCEVExpander: Fold constant PHIs harder. The logic below only understands proper IVs.
PR20093.

llvm-svn: 211433
2014-06-21 11:47:18 +00:00
Richard Trieu
b7d5af56cb Add back functionality removed in r210497.
Instead of asserting, output a message stating that a null pointer was found.

llvm-svn: 211430
2014-06-21 02:43:02 +00:00
Duncan P. N. Exon Smith
db0cbc8b8a Support: Write ScaledNumber::getQuotient() and getProduct()
llvm-svn: 211409
2014-06-20 21:47:47 +00:00
Jingyue Wu
52b8eafe4c [ValueTracking] Extend range metadata to call/invoke
Summary:
With this patch, range metadata can be added to call/invoke including
IntrinsicInst. Previously, it could only be added to load.

Rename computeKnownBitsLoad to computeKnownBitsFromRangeMetadata because
range metadata is not only used by load.

Update the language reference to reflect this change.

Test Plan:
Add several tests in range-2.ll to confirm the verifier is happy with
having range metadata on call/invoke.

Add two tests in AddOverFlow.ll to confirm annotating range metadata to
call/invoke can benefit InstCombine.

Reviewers: meheff, nlewycky, reames, hfinkel, eliben

Reviewed By: eliben

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4187

llvm-svn: 211281
2014-06-19 16:50:16 +00:00
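As a rough illustration of why range metadata helps a known-bits analysis, the sketch below derives the bits that must be zero for every value in an unsigned half-open range [Lo, Hi). The function name, the fixed 32-bit width, and the non-wrapping precondition are assumptions made for this example; it is not the code of computeKnownBitsFromRangeMetadata.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM's API): returns a mask of the bits that are
// zero in every value of the unsigned range [Lo, Hi). Assumes Lo < Hi, i.e.
// the range is non-empty and does not wrap.
inline std::uint32_t knownZeroFromRange(std::uint32_t Lo, std::uint32_t Hi) {
  assert(Lo < Hi && "expected a non-empty, non-wrapping range");
  const std::uint32_t Max = Hi - 1; // largest value the range can hold
  std::uint32_t KnownZero = 0;
  // Every bit above the highest set bit of Max is zero for all values <= Max.
  for (int Bit = 31; Bit >= 0 && ((Max >> Bit) & 1u) == 0; --Bit)
    KnownZero |= std::uint32_t{1} << Bit;
  return KnownZero;
}
```

For a call annotated with the range [0, 16), this reports the top 28 bits as known zero, which is the kind of fact a later simplification could rely on.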
Nick Lewycky
051f63ab97 Move optimization of some cases of (A & C1)|(B & C2) from instcombine to instsimplify. Patch by Rahul Jain, plus some last minute changes by me -- you can blame me for any bugs.
llvm-svn: 211252
2014-06-19 03:51:46 +00:00
Nick Lewycky
4eb68b1ca7 Make instsimplify's analysis of icmp eq/ne use computeKnownBits to determine whether the icmp is always true or false. Patch by Suyog Sarda!
llvm-svn: 211251
2014-06-19 03:35:49 +00:00
Richard Trieu
8c7b353cd7 Removing an "if (!this)" check from two print methods. The condition will
never be true in a well-defined context.  The checking for null pointers
has been moved into the caller logic so it does not rely on undefined behavior.

llvm-svn: 210497
2014-06-09 22:53:16 +00:00
Alp Toker
a9e2748af6 Remove old fenv.h workaround for a historic clang driver bug
Tested and works fine with clang using libstdc++.

All indications are that this was fixed some time ago and isn't a problem with
any clang version we support.

I've added a note in PR6907 which is still open for some reason.

llvm-svn: 210485
2014-06-09 19:00:52 +00:00
Alp Toker
a026ddb3ba Fold FEnv.h into the implementation
Support headers shouldn't use config.h definitions, and they should never be
undefined like this.

ConstantFolding.cpp was the only user of this facility and already includes
config.h for other math features, so it makes sense to move the checks there at
point of use.

(The implicit config.h was also quite dangerous -- removing the FEnv.h include
would have silently disabled math constant folding without causing any tests to
fail. Need to investigate -Wundef once the cleanup is done.)

This eliminates the last config.h include from LLVM headers, paving the way for
more consistent configuration checks.

llvm-svn: 210483
2014-06-09 18:28:53 +00:00
Tobias Grosser
e914a50dc9 ScalarEvolution: Derive element size from the type of the loaded element
Before, we were looking at the size of the pointer type that specifies the
location from which to load the element. This did not make any sense at all.

This change fixes a bug in the delinearization where we failed to delinearize
certain load instructions.

llvm-svn: 210435
2014-06-08 19:21:20 +00:00
Tom Roeder
740d86dc79 Add a new attribute called 'jumptable' that creates jump-instruction tables for functions marked with this attribute.
It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables.

This also adds backend support for generating the jump-instruction tables on ARM and X86.
Note that since the jumptable attribute creates a second function pointer for a
function, any function marked with jumptable must also be marked with unnamed_addr.

llvm-svn: 210280
2014-06-05 19:29:43 +00:00
Rafael Espindola
0746266d63 Add a Constant version of stripPointerCasts.
Thanks to rnk for the suggestion.

llvm-svn: 210205
2014-06-04 19:01:48 +00:00
Sebastian Pop
e038cb3e5a implement missing SCEVDivision case
without this case we would end up in an infinite recursion: the remainder is zero,
so Numerator - Remainder is equal to Numerator and so we would recursively ask
for the division of Numerator by Denominator.

llvm-svn: 209838
2014-05-29 19:44:09 +00:00
Sebastian Pop
a5d17facf7 fail to find dimensions when ElementSize is nullptr
when ScalarEvolution::getElementSize returns nullptr it is safe to early return
in ScalarEvolution::findArrayDimensions such that we avoid later problems when
we try to divide the terms by ElementSize.

llvm-svn: 209837
2014-05-29 19:44:05 +00:00
Sanjay Patel
3591bead10 test check-in: added missing parenthesis in comment
llvm-svn: 209763
2014-05-28 19:03:33 +00:00
Sebastian Pop
6efdf0e296 avoid type mismatch when building SCEVs
This is a corner case I have stumbled upon when dealing with ARM64 type
conversions. I was not able to extract a testcase for the community codebase to
fail on. The patch conservatively discards a division that would have ended up
in an ICE due to a type mismatch when building a multiply expression. I have
also added code to a place that builds add expressions and in which we should be
careful not to pass in operands of different types.

llvm-svn: 209694
2014-05-27 22:42:00 +00:00
Sebastian Pop
fa763d3c07 do not use the GCD to compute the delinearization strides
We do not need to compute the GCD anymore after we removed the constant
coefficients from the terms: the terms are now all parametric expressions and
there is no need to recognize constant terms that divide only a subset of the
terms. We only rely on the size of the terms, i.e., the number of operands in
the multiply expressions, to sort the terms and recognize the parametric
dimensions.

llvm-svn: 209693
2014-05-27 22:41:56 +00:00
Sebastian Pop
721b704445 remove BasePointer before delinearizing
No functional change is intended: instead of relying on the delinearization to
come up with the base pointer as a remainder of the divisions in the
delinearization, we just compute it from the array access and use that value.
We subtract the base pointer from the SCEV to be delinearized and that
simplifies the work of the delinearizer.

llvm-svn: 209692
2014-05-27 22:41:51 +00:00
Sebastian Pop
1664c3c2ec remove constant terms
The delinearization is needed only to remove the non linearity induced by
expressions involving multiplications of parameters and induction variables.
There is no problem in dealing with constant times parameters, or constant times
an induction variable.

For this reason, the current patch discards all constant terms and multipliers
before running the delinearization algorithm on the terms. The only thing
remaining in the term expressions are parameters and multiply expressions of
parameters: these simplified term expressions are passed to the array shape
recognizer that will not recognize constant dimensions anymore: these will be
recognized as different strides in parametric subscripts.

The only important special case of a constant dimension is the size of elements.
Instead of relying on the delinearization to infer the size of an element,
compute the element size from the base address type. This is a much more precise
way of computing the element size than before, as we would have mixed together
the size of an element with the strides of the innermost dimension.

llvm-svn: 209691
2014-05-27 22:41:45 +00:00
Michael Zolotukhin
406287c5b7 Some cleanup for r209568.
llvm-svn: 209634
2014-05-26 14:49:46 +00:00
Michael Zolotukhin
df83a19a09 Implement sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformation in Scalar
Evolution.

That helps SLP-vectorizer to recognize consecutive loads/stores.

<rdar://problem/14860614>

llvm-svn: 209568
2014-05-24 08:09:57 +00:00
Andrew Trick
3b4463f718 Fix and improve SCEV ComputeBackedgeTakenCount.
This is a follow-up to r209358: PR19799: Indvars miscompile due to an
incorrect max backedge taken count from SCEV.

That fix was incomplete as pointed out by Arnold and Michael Z. The
code was also too confusing. It needed a careful rewrite with more
unit tests. This version will also happen to optimize more cases.

<rdar://17005101> PR19799: Indvars miscompile...

llvm-svn: 209545
2014-05-23 19:47:13 +00:00
Justin Bogner
5e6887dc27 ScalarEvolution: Fix handling of AddRecs in isKnownPredicate
ScalarEvolution::isKnownPredicate() can wrongly reduce a comparison
when both the LHS and RHS are SCEVAddRecExprs. This checks that both
LHS and RHS are guarded in the case when both are SCEVAddRecExprs.

The test case is against indvars because I could not find a way to
directly test SCEV.

Patch by Sanjay Patel!

llvm-svn: 209487
2014-05-23 00:06:56 +00:00
Andrew Trick
102d4404fb Fix a bug in SCEV's backedge taken count computation from my prior fix in Jan.
This has to do with the trip count computation for loops with multiple
exits, which is quite subtle. Most passes just ask for a single trip
count number, so we must be conservative assuming any exit could be
taken.  Normally, we rely on the "exact" trip count, which was
correctly given as "unknown". However, SCEV also gives a "max"
back-edge taken count. The loop's max BE taken count is conservatively
a maximum over the max of each exit's non-exiting iterations
count. Note that some exit tests can be skipped so the max loop
back-edge taken count can actually exceed the max non-exiting
iterations for some exits. However, when we know the loop *latch*
cannot be skipped, we can directly use its max taken count
disregarding other exits. I previously took the minimum here without
checking whether the other exit could be skipped. The correct, and
simpler thing to do here is just to directly use the loop latch's max
non-exiting iterations as the loop's max back-edge count.

In the problematic test case, the first loop exit had a max of zero
non-exiting iterations, but could be skipped. The loop latch was known
not to be skipped but had max of one non-exiting iteration. We
incorrectly claimed the loop back-edge could be taken zero times, when
it is actually taken one time.

Fixes Loop %for.body.i: <multiple exits> Unpredictable backedge-taken count.
Loop %for.body.i: max backedge-taken count is 1.

llvm-svn: 209358
2014-05-22 00:37:03 +00:00
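The problematic case described above can be replayed with a small sketch. The struct, field names, and both functions are hypothetical and only model the reasoning in the message; they are not SCEV's implementation. The fallback to a maximum over all exits when no unskippable latch exit is known is an assumption based on the "conservatively a maximum" wording.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical summary of one loop exit; not an LLVM data structure.
struct ExitBound {
  unsigned MaxNonExitingIters; // upper bound on iterations passing this exit
  bool IsUnskippableLatch;     // exit test is in the latch and always runs
};

// The earlier, incorrect idea: take the minimum over all exits. This is only
// sound if no exit test can be skipped on some iteration.
inline unsigned minOverExits(const std::vector<ExitBound> &Exits) {
  assert(!Exits.empty() && "expected at least one exit");
  unsigned Min = Exits.front().MaxNonExitingIters;
  for (const ExitBound &E : Exits)
    Min = std::min(Min, E.MaxNonExitingIters);
  return Min;
}

// The approach the message describes: use the latch's bound directly when the
// latch cannot be skipped; otherwise fall back to the maximum over all exits.
inline unsigned latchBasedBound(const std::vector<ExitBound> &Exits) {
  assert(!Exits.empty() && "expected at least one exit");
  unsigned Max = 0;
  for (const ExitBound &E : Exits) {
    if (E.IsUnskippableLatch)
      return E.MaxNonExitingIters;
    Max = std::max(Max, E.MaxNonExitingIters);
  }
  return Max;
}
```

With a first exit bounded by zero non-exiting iterations and a latch bounded by one, the minimum yields 0 while the latch-based bound yields 1, matching the counts quoted in the message.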
Eric Christopher
262770bdee Clean up language and grammar.
Based on a patch by jfcaron3@gmail.com!
PR19806

llvm-svn: 209216
2014-05-20 17:11:11 +00:00
Nick Lewycky
ea4c3a9a9c Teach isKnownNonNull that a nonnull return is not null. Add a test for this case as well as the case of a nonnull attribute (already handled but not tested).
llvm-svn: 209193
2014-05-20 05:13:21 +00:00
Nick Lewycky
de84a8bb51 Add 'nonnull', a new parameter and return attribute which indicates that the pointer is not null. Instcombine will elide comparisons between these and null. Patch by Luqman Aden!
llvm-svn: 209185
2014-05-20 01:23:40 +00:00
Peter Collingbourne
6b9c51d275 Check the alwaysinline attribute on the call as well as on the caller.
Differential Revision: http://reviews.llvm.org/D3815

llvm-svn: 209150
2014-05-19 18:25:54 +00:00
David Majnemer
ef2cb1fc63 InstSimplify: Improve handling of ashr/lshr
Summary:
Analyze the range of values produced by ashr/lshr cst, %V when it is
being used in an icmp.

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3774

llvm-svn: 209000
2014-05-16 17:14:03 +00:00
David Majnemer
186633e0f8 InstSimplify: Optimize using dividend in sdiv
Summary:
The dividend in an sdiv tells us the largest and smallest possible
results.  Use this fact to optimize comparisons against an sdiv with a
constant dividend.

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3795

llvm-svn: 208999
2014-05-16 16:57:04 +00:00
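A minimal sketch of the fact this commit relies on: for a constant dividend C and any nonzero divisor, the quotient of a signed division lies in [-|C|, |C|]. The names below are hypothetical, plain int stands in for an IR integer type, and the brute-force check only covers 8-bit divisors; this is not InstSimplify's code.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Inclusive signed range; a hypothetical type for this example.
struct SignedRange {
  int Lo;
  int Hi;
};

// Bounds on (Dividend sdiv X) for any X != 0. INT_MIN is excluded because its
// magnitude is not representable as an int.
inline SignedRange sdivResultRange(int Dividend) {
  assert(Dividend != INT_MIN && "magnitude of INT_MIN is not representable");
  const int Mag = std::abs(Dividend);
  return {-Mag, Mag};
}

// Brute-force confirmation over all nonzero 8-bit divisors.
inline bool quotientsStayInRange(int Dividend) {
  const SignedRange R = sdivResultRange(Dividend);
  for (int X = -128; X <= 127; ++X) {
    if (X == 0)
      continue; // division by zero is undefined
    const int Q = Dividend / X;
    if (Q < R.Lo || Q > R.Hi)
      return false;
  }
  return true;
}

// "(Dividend sdiv X) > RHS" can never hold once RHS reaches the upper bound.
inline bool sgtIsAlwaysFalse(int Dividend, int RHS) {
  return RHS >= sdivResultRange(Dividend).Hi;
}
```

For a dividend of 100, a signed greater-than comparison against 100 can be folded to false, while a comparison against 99 cannot, since 100 sdiv 1 is 100.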
Juergen Ributzka
271cad0970 Add C API for thread yielding callback.
Sometimes an LLVM compilation may take more time than a client would like to
wait for. The problem is that it is not possible to safely suspend the LLVM
thread from the outside. When the timing is bad it might be possible that the
LLVM thread holds a global mutex and this would block any progress in any other
thread.

This commit adds a new yield callback function that can be registered with a
context. LLVM will try to yield by calling this callback function, but there is
no guaranteed frequency. LLVM will only do so if it can guarantee that
suspending the thread won't block any forward progress in other LLVM contexts
in the same process.

Once the client receives the callback it can suspend the thread safely and
resume it at another time.

Related to <rdar://problem/16728690>

llvm-svn: 208945
2014-05-16 02:33:15 +00:00
Jay Foad
2827803889 Instead of littering asserts throughout the code after every call to
computeKnownBits, consolidate them into one assert at the end of
computeKnownBits itself.

llvm-svn: 208876
2014-05-15 12:12:55 +00:00
Chandler Carruth
0d35e1f8fc Teach the constant folder to look through bitcast constant expressions
much more effectively when trying to constant fold a load of a constant.
Previously, we only handled bitcasts by trying to find a totally generic
byte representation of the constant and use that. Now, we look through
the bitcast to see what constant we might fold the load into, and then
try to form a constant expression cast of the found value that would be
equivalent to loading the value.

You might wonder why on earth this actually matters. Well, turns out
that the Itanium ABI causes us to create a single array for a vtable
where the first elements are virtual base offsets, followed by the
virtual function pointers. Because the array is homogeneous the element
type is consistently i8* and we inttoptr the virtual base offsets into
the initial elements.

Then constructors bitcast these pointers to i64 pointers prior to
loading them. Boom, no more constant folding of virtual base offsets.
This is the first fix to LLVM to address the *insane* performance Eric
Niebler discovered with Clang on his range comprehensions[1]. There is
more to come though, this doesn't *really* fix the problem fully.

[1]: http://ericniebler.com/2014/04/27/range-comprehensions/

llvm-svn: 208856
2014-05-15 09:56:28 +00:00
Alp Toker
18115693f7 Fix typos
llvm-svn: 208839
2014-05-15 01:52:21 +00:00
Jay Foad
e0eac700cb Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
David Majnemer
6098432810 InstSimplify: Optimize signed icmp of -(zext V)
Summary:
We know that -(zext V) will always be <= zero, simplify signed icmps
that have these.

Uncovered using http://www.cs.utah.edu/~regehr/souper/

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3754

llvm-svn: 208809
2014-05-14 20:16:28 +00:00
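The claim that -(zext V) is never positive can be checked exhaustively for a narrow source type. The sketch below models a zero-extension from 8 to 32 bits with ordinary C++ integers; the function names are made up for this example and nothing here is InstSimplify's code.

```cpp
#include <cassert>
#include <cstdint>

// Models "sub 0, (zext i8 V to i32)": widen without sign, then negate.
inline std::int32_t negatedZext(std::uint8_t V) {
  const std::int32_t Result = -static_cast<std::int32_t>(V);
  // The zero-extended value is in [0, 255], so its negation is in [-255, 0].
  assert(Result <= 0 && "negating a zero-extended value cannot be positive");
  return Result;
}

// Exhaustive check over every 8-bit input.
inline bool negatedZextNeverPositive() {
  for (int V = 0; V <= 255; ++V)
    if (negatedZext(static_cast<std::uint8_t>(V)) > 0)
      return false;
  return true;
}
```

Because the result is always at most zero, a signed comparison such as "greater than zero" on it can be folded to false.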
Jay Foad
df682f6c8b Update the comments for ComputeMaskedBits, which lost its Mask parameter
in r154011.

llvm-svn: 208757
2014-05-14 08:00:07 +00:00
Sebastian Pop
25e94ba142 use nullptr instead of NULL
llvm-svn: 208622
2014-05-12 20:11:01 +00:00
Sebastian Pop
de2f65cfdd do not assert when delinearization fails
llvm-svn: 208615
2014-05-12 19:01:53 +00:00
Sebastian Pop
28499bfbb2 use isZero()
llvm-svn: 208614
2014-05-12 19:01:49 +00:00
Benjamin Kramer
28edee20f3 SCEV: Use range-based for loop and fold variable into assert.
llvm-svn: 208476
2014-05-10 17:47:18 +00:00
Sebastian Pop
f6b4cc99cd move findArrayDimensions to ScalarEvolution
we do not use the information from SCEVAddRecExpr to compute the shape of the array,
so a better place for this function is in ScalarEvolution.

llvm-svn: 208456
2014-05-09 22:45:07 +00:00
Sebastian Pop
76196fee4f fix typo in debug message
llvm-svn: 208455
2014-05-09 22:45:02 +00:00
Tobias Grosser
f264562cb9 Correct formatting.
Sorry for the commit spam. My clang-format crashed on me and the vim
plugin did not print an error, but instead just left the formatting
untouched.

llvm-svn: 208358
2014-05-08 21:43:19 +00:00
Tobias Grosser
7888d0c465 Use std::remove_if to remove elements from a vector
Suggested-by: Benjamin Kramer <benny.kra@gmail.com>
llvm-svn: 208357
2014-05-08 21:32:59 +00:00
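For reference, the erase-remove idiom named in this commit can be sketched as follows. The element type and the predicate (dropping zero entries) are assumptions chosen for the example, not what the SCEV code filters on.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: drop every zero entry from Terms in place.
inline void dropZeroTerms(std::vector<int> &Terms) {
  auto IsZero = [](int T) { return T == 0; };
  // remove_if shifts the kept elements to the front and returns the new
  // logical end; erase then trims the leftover tail in one step.
  Terms.erase(std::remove_if(Terms.begin(), Terms.end(), IsZero), Terms.end());
  assert(std::none_of(Terms.begin(), Terms.end(), IsZero) &&
         "all matching elements should be gone");
}
```

Compared with erasing inside a hand-written loop, this form avoids iterator-invalidation mistakes and moves each kept element at most once.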
Rafael Espindola
c6c3ed654b Use a range loop.
llvm-svn: 208343
2014-05-08 17:57:50 +00:00
Tobias Grosser
358e9a97e7 Revert "SCEV: Use I = vector<>.erase(I) to iterate and delete at the same time"
as committed in r208282. The original commit was incorrect.

llvm-svn: 208286
2014-05-08 07:55:34 +00:00
Tobias Grosser
4c447db9fb SCEV: Use I = vector<>.erase(I) to iterate and delete at the same time
llvm-svn: 208282
2014-05-08 07:12:44 +00:00
Sebastian Pop
866eb1eecf avoid segfaulting
*Quotient and *Remainder don't have to be initialized.

llvm-svn: 208238
2014-05-07 19:00:37 +00:00
Sebastian Pop
8f355b84ae do not collect undef terms
llvm-svn: 208237
2014-05-07 19:00:32 +00:00
Sebastian Pop
d5cb815565 split delinearization pass in 3 steps
To compute the dimensions of the array in a unique way, we split the
delinearization analysis in three steps:

- find parametric terms in all memory access functions
- compute the array dimensions from the set of terms
- compute the delinearized access functions for each dimension

The first step is executed on all the memory access functions such that we
gather all the patterns in which an array is accessed. The second step reduces
all this information in a unique description of the sizes of the array. The
third step is delinearizing each memory access function following the common
description of the shape of the array computed in step 2.

This rewrite of the delinearization pass also solves a problem we had with the
previous implementation: because the previous algorithm was by induction on the
structure of the SCEV, it would not correctly recognize the shape of the array
when the memory access was not following the nesting of the loops: for example,
see polly/test/ScopInfo/multidim_only_ivs_3d_reverse.ll

; void foo(long n, long m, long o, double A[n][m][o]) {
;
;   for (long i = 0; i < n; i++)
;     for (long j = 0; j < m; j++)
;       for (long k = 0; k < o; k++)
;         A[i][k][j] = 1.0;

Starting with this patch we no longer delinearize access functions that do not
contain parameters, for example in test/Analysis/DependenceAnalysis/GCD.ll

;;  for (long int i = 0; i < 100; i++)
;;    for (long int j = 0; j < 100; j++) {
;;      A[2*i - 4*j] = i;
;;      *B++ = A[6*i + 8*j];

these accesses will not be delinearized as the upper bounds of the loops are
constants, and their access functions do not contain SCEVUnknown parameters.

llvm-svn: 208232
2014-05-07 18:01:20 +00:00
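To make the idea of recovering subscripts from a flat offset concrete, here is a numeric sketch for an array declared as A[n][m][o]. It works on concrete integers with known dimension sizes, whereas the pass described above operates on symbolic SCEV expressions with parametric sizes; the function names are hypothetical and the subscripts are assumed to be in bounds.

```cpp
#include <array>
#include <cassert>

// Flat element offset of A[I][J][K] in an array with inner sizes M and O.
inline long linearize(long I, long J, long K, long M, long O) {
  return (I * M + J) * O + K;
}

// Recover {I, J, K} from a flat offset by dividing by the dimension sizes,
// innermost first. Valid only when 0 <= J < M and 0 <= K < O.
inline std::array<long, 3> delinearize(long Offset, long M, long O) {
  assert(Offset >= 0 && M > 0 && O > 0 && "expected a non-negative offset "
                                          "and positive dimension sizes");
  const long K = Offset % O;    // innermost subscript
  const long Rest = Offset / O; // equals I * M + J
  return {Rest / M, Rest % M, K};
}
```

With m = 5 and o = 7, the access A[2][3][4] has flat offset 95, and dividing by the dimension sizes recovers the subscripts 2, 3, 4.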
Tobias Grosser
8128149f4a [C++11] Add NArySCEV->Operands iterator range
llvm-svn: 208158
2014-05-07 06:07:47 +00:00
Duncan P. N. Exon Smith
cbf47b5244 blockfreq: Move include to .cpp
llvm-svn: 208035
2014-05-06 01:57:42 +00:00
Chandler Carruth
2ccafe8399 [LCG] Add the last (and most complex) of the edge insertion mutation
operations on the call graph. This one forms a cycle, and while not as
complex as removing an internal edge from an SCC, it involves
a reasonable amount of work to find all of the nodes newly connected in
a cycle.

Also somewhat alarming is the worst case complexity here: it might have
to walk roughly the entire SCC inverse DAG to insert a single edge. This
is carefully documented in the API (I hope).

llvm-svn: 207935
2014-05-04 09:38:32 +00:00
Juergen Ributzka
b855191ef0 [TBAA] Fix handling of mixed TBAA (path-aware and non-path-aware TBAA).
This fix simply ensures that both metadata nodes are path-aware before
performing path-aware alias analysis.

This issue isn't normally triggered in LLVM, because we perform an autoupgrade
of the TBAA metadata to the new format when reading in LL or BC files. This
issue only appears when a client creates the IR manually and mixes old and new
TBAA metadata format.

This fixes <rdar://problem/16760860>.

llvm-svn: 207923
2014-05-03 22:32:52 +00:00
Chandler Carruth
143c70588a [LCG] Add the other simple edge insertion API to the call graph. This
just connects an SCC to one of its descendants directly. Not much of an
impact. The last one is the hard one -- connecting an SCC to one of its
ancestors, and thereby forming a cycle such that we have to merge all
the SCCs participating in the cycle.

llvm-svn: 207751
2014-05-01 12:18:20 +00:00
Chandler Carruth
6f4d8c2889 [LCG] Don't lookup the child SCC twice. Spotted this by inspection, and
no functionality changed.

llvm-svn: 207750
2014-05-01 12:16:31 +00:00
Chandler Carruth
91cf62ad50 [LCG] Add some basic methods for querying the parent/child relationships
of SCCs in the SCC DAG. Exercise them in the big graph test case. These
will be especially useful for establishing invariants in insertion
logic.

llvm-svn: 207749
2014-05-01 12:12:42 +00:00
Chandler Carruth
bd97884116 [LCG] Add the really, *really* boring edge insertion case: adding an
edge entirely within an existing SCC. Shockingly, making the connected
component more connected is ... a total snooze fest. =]

Anyways, it's wired up, and I even added a test case to make sure it
pretty much sorta works. =D

llvm-svn: 207631
2014-04-30 10:48:36 +00:00
Chandler Carruth
aa6122effe [LCG] Actually test the *basic* edge removal bits (IE, the non-SCC
bits), and discover that it's totally broken. Yay tests. Boo bug. Fix
the basic edge removal so that it works by nulling out the removed edges
rather than actually removing them. This leaves the indices valid in the
map from callee to index, and preserves some of the locality for
iterating over edges. The iterator is made bidirectional to reflect that
it now has to skip over null entries, and the skipping logic is layered
onto it.

As future work, I would like to track essentially the "load factor" of
the edge list, and when it falls below a threshold do a compaction.

An alternative I considered (and continue to consider) is storing the
callees in a doubly linked list where each element of the list is in
a set (which is essentially the classical linked-hash-table
datastructure). The problem with that approach is that either you need
to heap allocate the linked list nodes and use pointers to them, or use
a bucket hash table (with even *more* linked list pointer overhead!),
etc. It's pretty easy to get 5x overhead for values that are just
pointers. So far, I think punching holes in the vector, and periodic
compaction is likely to be much more efficient overall in the space/time
tradeoff.

llvm-svn: 207619
2014-04-30 07:45:27 +00:00
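The hole-punching scheme described above can be sketched with a small container. The class and member names are hypothetical and much simpler than the lazy call graph's real node type; the point is only that removal nulls a slot so that the indices stored in the callee-to-index map stay valid, and that iteration skips the holes.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical callee node.
struct Node {
  int Id;
};

// Hypothetical edge list: a vector of callees plus a map from callee to slot.
class EdgeList {
  std::vector<Node *> Slots;                       // nullptr marks a removed edge
  std::unordered_map<Node *, std::size_t> IndexOf; // callee -> slot index

public:
  void insert(Node &Callee) {
    if (IndexOf.count(&Callee))
      return; // edge already present
    IndexOf.emplace(&Callee, Slots.size());
    Slots.push_back(&Callee);
  }

  // Null out the slot instead of erasing it, so no other index shifts.
  void remove(Node &Callee) {
    auto It = IndexOf.find(&Callee);
    if (It == IndexOf.end())
      return;
    assert(Slots[It->second] == &Callee && "index map out of sync");
    Slots[It->second] = nullptr;
    IndexOf.erase(It);
  }

  // Iterate in insertion order, skipping the holes left by removals.
  std::vector<int> liveIds() const {
    std::vector<int> Ids;
    for (Node *N : Slots)
      if (N)
        Ids.push_back(N->Id);
    return Ids;
  }

  std::size_t slotCount() const { return Slots.size(); }
};
```

The gap between slotCount() and the number of live entries is the "load factor" the message mentions; a compaction pass, not shown here, could rebuild both containers once that gap grows too large.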
Benjamin Kramer
4f8fb8ff6c raw_ostream: Forward declare OpenFlags and include FileSystem.h only where necessary.
llvm-svn: 207593
2014-04-29 23:26:49 +00:00
Duncan P. N. Exon Smith
705fc7169e blockfreq: Defer to BranchProbability::scale()
`BlockMass` can now defer to `BranchProbability::scale()`.

llvm-svn: 207547
2014-04-29 16:20:05 +00:00
Duncan P. N. Exon Smith
583ed8f3b0 blockfreq: Remove more extra typenames from r207438
llvm-svn: 207440
2014-04-28 20:22:29 +00:00
Duncan P. N. Exon Smith
2eaef1aa01 Reapply "blockfreq: Approximate irreducible control flow"
This reverts commit r207287, reapplying r207286.

I'm hoping that declaring an explicit struct and instantiating
`addBlockEdges()` directly works around the GCC crash from r207286.
This is a lot more boilerplate, though.

llvm-svn: 207438
2014-04-28 20:02:29 +00:00
Chandler Carruth
08eb8582cd [LCG] Add the most basic of edge insertion to the lazy call graph. This
just handles the pre-DFS case. Also add some test cases for this case to
make sure it works.

llvm-svn: 207411
2014-04-28 11:10:23 +00:00
Chandler Carruth
4098580cb2 [LCG] Make the return of the IntraSCC removal method actually match its
contract (and be much more useful). It now provides exactly the
post-order traversal a caller might need to perform on newly formed
SCCs.

llvm-svn: 207410
2014-04-28 10:49:06 +00:00
Chandler Carruth
02b3960e8a [inliner] Significantly improve the compile time in cases like PR19499
by avoiding inlining massive switches merely because they have no
instructions in them. These switches still show up where we fail to form
lookup tables, and in those cases they are actually going to cause
a very significant code size hit anyways, so inlining them is not the
right call. The right way to fix any performance regressions stemming
from this is to enhance the switch-to-lookup-table logic to fire in more
places.

This makes PR19499 about 5x less bad. It uncovers a second compile time
problem in that test case that is unrelated (surprisingly!).

llvm-svn: 207403
2014-04-28 08:52:44 +00:00
Craig Topper
b663bffa27 [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Chandler Carruth
1b5573df25 [LCG] Re-organize the methods for mutating a call graph to make their
API requirements much more obvious.

The key here is that there are two totally different use cases for
mutating the graph. Prior to doing any SCC formation, it is very easy to
mutate the graph. There may be users that want to do small tweaks here,
and then use the already-built graph for their SCC-based operations.
This method remains on the graph itself and is documented carefully as
being cheap but unavailable once SCCs are formed.

Once SCCs are formed, and there is some in-flight DFS building them, we
have to be much more careful in how we mutate the graph. These mutation
operations are sunk onto the SCCs themselves, which both simplifies
things (the code was already there!) and helps make it obvious that
these interfaces are only applicable within that context. The other
primary constraint is that the edge being mutated is actually related to
the SCC on which we call the method. This helps make it obvious that you
cannot arbitrarily mutate some other SCC.

I've tried to write much more complete documentation for the interesting
mutation API -- intra-SCC edge removal. Currently one aspect of this
documentation is a lie (the result list of SCCs) but we also don't even
have tests for that API. =[ I'm going to add tests and fix it to match
the documentation next.

llvm-svn: 207339
2014-04-27 01:59:50 +00:00
Chandler Carruth
864b47743f [LCG] Rather than removing nodes from the SCC entry set when we process
them, just skip over any DFS-numbered nodes when finding the next root
of a DFS. This allows the entry set to just be a vector as we populate
it from a uniqued source. It also removes the possibility for a linear
scan of the entry set to actually do the removal which can make things
go quadratic if we get unlucky.

llvm-svn: 207312
2014-04-26 09:45:55 +00:00
Chandler Carruth
4dbb64e3cd [LCG] Rotate the full SCC finding algorithm to avoid round-trips through
the DFS stack for leaves in the call graph. As mentioned in my previous
commit, this is particularly interesting for graphs which have high fan
out but low connectivity resulting in many leaves. For such graphs, this
can remove a large % of the DFS stack traffic even though it doesn't
make the stack much smaller.

It's a bit easier to formulate this for the full algorithm because that
one stops completely for each SCC. For example, I was able to directly
eliminate the "Recurse" boolean used to continue an outer loop from the
inner loop.

llvm-svn: 207311
2014-04-26 09:28:00 +00:00
Chandler Carruth
3a16e3f5fa [LCG] Hoist the main DFS loop out of the edge removal function. This
makes working through the worklist much cleaner, and makes it possible
to avoid the 'bool-to-continue-the-outer-loop' hack. Not a huge
difference, but I think this is approaching as polished as I can make
it.

llvm-svn: 207310
2014-04-26 09:06:53 +00:00
Chandler Carruth
87d8609624 [LCG] In the incremental SCC re-formation, lift the node currently being
processed in the DFS out of the stack completely. Keep it exclusively in
a variable. Re-shuffle some code structure to make this easier. This can
have a very dramatic effect in some cases because call graphs tend to
look like a high fan-out spanning tree. As a consequence, there are
a large number of leaf nodes in the graph, and this technique causes
leaf nodes to never even go into the stack. While this only reduces the
max depth by 1, it may cause the total number of round trips through the
stack to drop by a lot.

Now, most of this isn't really relevant for the incremental version. =]
But I wanted to prototype it first here as this variant is in ways more
complex. As long as I can get the code factored well here, I'll next
make the primary walk look the same. There are several refactorings this
exposes I think.

llvm-svn: 207306
2014-04-26 03:36:42 +00:00
Chandler Carruth
04ce1b92d9 [LCG] Special case the removal of self edges. These don't impact the SCC
graph in any way because we don't track edges in the SCC graph, just
nodes. This also lets us add a nice assert about the invariant that
we're working on at least a certain number of nodes within the SCC.

llvm-svn: 207305
2014-04-26 03:36:37 +00:00
Chandler Carruth
0e388582e5 [LCG] Refactor the duplicated code I added in my last commit here into
a helper function. Also factor the other two places where we did the
same thing into the helper function. =] Much cleaner this way. NFC.

llvm-svn: 207300
2014-04-26 01:03:46 +00:00
Duncan P. N. Exon Smith
c54b3a7e23 Revert "blockfreq: Approximate irreducible control flow"
This reverts commit r207286.  It causes an ICE on the
cmake-llvm-x86_64-linux buildbot [1]:

    llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function:
    llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035

[1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio

llvm-svn: 207287
2014-04-25 23:16:58 +00:00
Duncan P. N. Exon Smith
3189616c35 blockfreq: Approximate irreducible control flow
Previously, irreducible backedges were ignored.  With this commit,
irreducible SCCs are discovered on the fly, and modelled as loops with
multiple headers.

This approximation specifies the headers of irreducible sub-SCCs as its
entry blocks and all nodes that are targets of a backedge within it
(excluding backedges within true sub-loops).  Block frequency
calculations act as if we insert a new block that intercepts all the
edges to the headers.  All backedges and entries to the irreducible SCC
point to this imaginary block.  This imaginary block has an edge (with
even probability) to each header block.

The result is now reasonable enough that I've added a number of
testcases for irreducible control flow.  I've outlined in
`BlockFrequencyInfoImpl.h` ways to improve the approximation.

<rdar://problem/14292693>

llvm-svn: 207286
2014-04-25 23:08:57 +00:00
Duncan P. N. Exon Smith
0f4795de58 blockfreq: Further shift logic to LoopData
Move a lot of the loop-related logic that was sprinkled around the code
into `LoopData`.

<rdar://problem/14292693>

llvm-svn: 207258
2014-04-25 18:47:04 +00:00
Duncan P. N. Exon Smith
ea68e6a3d5 SCC: Change clients to use const, NFC
It's fishy to be changing the `std::vector<>` owned by the iterator, and
no one actually does it, so I'm going to remove the ability in a
subsequent commit.  First, update the users.

<rdar://problem/14292693>

llvm-svn: 207252
2014-04-25 18:24:50 +00:00
Chandler Carruth
f9a129a8ff [LCG] During the incremental update of an SCC, switch to using the
SCCMap to test for nodes that have been re-added to the root SCC rather
than a set vector. We already have done the SCCMap lookup, we just need
to test it in two different ways. In turn, do most of the processing of
these nodes as they go into the root SCC rather than lazily. This
simplifies the final loop to just stitch the root SCC into its
children's parent sets. No functionality changed.

However, this makes a few things painfully obvious, which was my intent.
=] There is tons of repeated code introduced here and elsewhere. I'm
splitting the refactoring of that code into helpers from this change so
it's clear that this is the change which switches the datastructures used
around, and the other is a pure factoring & deduplication of code
change.

llvm-svn: 207217
2014-04-25 09:52:44 +00:00
Chandler Carruth
099d43c4dc [LCG] During the incremental re-build of an SCC after removing an edge,
remove the nodes in the SCC from the SCC map entirely prior to the DFS
walk. This allows the SCC map to represent both the state of
not-yet-re-added-to-an-SCC and added-back-to-this-SCC independently. The
first is being missing from the SCC map, the second is mapping back to
'this'. In a subsequent commit, I'm going to use this property to
simplify the new node list for this SCC.

In theory, I think this also makes the contract for orphaning a node
from the graph slightly less confusing. Now it is also orphaned from the
SCC graph. Still, this isn't quite right either, and so I'm not adding
test cases here. I'll add test cases for the behavior of orphaning nodes
when the code *actually* supports it. The change here is mostly
incidental; my goal is simplifying the algorithm.

llvm-svn: 207213
2014-04-25 09:08:10 +00:00
Chandler Carruth
92048ceb62 [LCG] Rather than doing a linear time SmallSetVector removal of each
child from the worklist, wait until we actually need to pop another
element off of the worklist and skip over any that were already visited
by the DFS. This also enables swapping the nodes of the SCC into the
worklist. No functionality changed.
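
The lazy-skip pattern described above can be sketched as follows. This is an illustration under simplifying assumptions (integer node ids, a `std::vector` as the worklist), not the LazyCallGraph code; `popNextUnvisited` is a hypothetical name.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical helper: instead of erasing each visited node from the
// worklist as soon as the DFS reaches it (a linear-time removal), leave
// stale entries in place and discard them when they are popped.
// Returns -1 once the worklist is exhausted; node ids must be >= 0.
int popNextUnvisited(std::vector<int> &Worklist,
                     const std::unordered_set<int> &Visited) {
  while (!Worklist.empty()) {
    int Node = Worklist.back();
    Worklist.pop_back();
    assert(Node >= 0 && "negative ids are reserved for the sentinel");
    if (Visited.count(Node) == 0)
      return Node; // First entry the DFS has not already handled.
  }
  return -1;
}
```

Each entry is popped at most once, so the total cost of skipping is bounded by the size of the worklist.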

llvm-svn: 207212
2014-04-25 09:08:05 +00:00
Chandler Carruth
a937a4a8a9 [LCG] Remove a completely unnecessary loop. It wasn't even doing
anything, just mucking up the code. I feel bad that I even wrote this loop.
Very sorry. The diff is huge because of the indent change, but I promise
all this is doing is realizing that the outer two loops were actually
the exact same loops, and we didn't need two of them.

llvm-svn: 207202
2014-04-25 06:45:06 +00:00
Chandler Carruth
6a4aae8f97 [LCG] Now that the loop structure of the core SCC finding routine is
factored into a more reasonable form, replace the tail call with
a simple outer-loop continuation. It's sad that C++ makes this so
awkward to write, but it seems more direct and clear than the tail call
at this point.

llvm-svn: 207201
2014-04-25 06:38:58 +00:00
Duncan P. N. Exon Smith
6117699d7e blockfreq: Only one mass distribution per node
Remove the concepts of "forward" and "general" mass distributions, which
were wrong.  The split might have made sense in an early version of the
algorithm, but it's definitely wrong now.

<rdar://problem/14292693>

llvm-svn: 207195
2014-04-25 04:38:43 +00:00
Duncan P. N. Exon Smith
eebda28f4c blockfreq: Document assertion
<rdar://problem/14292693>

llvm-svn: 207194
2014-04-25 04:38:40 +00:00
Duncan P. N. Exon Smith
20240d1036 blockfreq: Document high-level functions
<rdar://problem/14292693>

llvm-svn: 207191
2014-04-25 04:38:32 +00:00
Duncan P. N. Exon Smith
23f427f288 blockfreq: Scale LoopData::Scale on the way down
Rather than scaling loop headers and then scaling all the loop members
by the header frequency, scale `LoopData::Scale` itself, and scale the
loop members by it.  It's much more obvious what's going on this way,
and doesn't cost any extra multiplies.
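
A rough sketch of scaling on the way down, assuming a simplified tree of loops with `double` frequencies. `LoopSketch`, `EntryFreq`, and `unwrap` are hypothetical names for illustration; only the idea of a per-loop `Scale` comes from the commit message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for per-loop data.
struct LoopSketch {
  double Scale;                     // Loop's own scale before unwrapping.
  double EntryFreq;                 // Loop's frequency relative to its parent.
  std::vector<double> MemberFreqs;  // Members, relative to the loop itself.
  std::vector<LoopSketch> SubLoops; // Directly nested loops.
};

// Fold the enclosing frequency into the loop's Scale once, then scale
// every member by Scale: one multiply per member, as before.
void unwrap(LoopSketch &L, double EnclosingFreq) {
  assert(EnclosingFreq >= 0.0 && "frequencies are non-negative");
  L.Scale *= EnclosingFreq;
  for (double &Freq : L.MemberFreqs)
    Freq *= L.Scale;
  for (LoopSketch &Sub : L.SubLoops)
    unwrap(Sub, L.Scale * Sub.EntryFreq);
}
```

Calling `unwrap(Outer, 1.0)` on the outermost loop pushes the accumulated scale down through every nested loop.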

<rdar://problem/14292693>

llvm-svn: 207189
2014-04-25 04:38:27 +00:00
Duncan P. N. Exon Smith
103f12e173 blockfreq: unwrapLoopPackage() => unwrapLoop()
<rdar://problem/14292693>

llvm-svn: 207188
2014-04-25 04:38:25 +00:00
Duncan P. N. Exon Smith
4f4a0c2fa2 blockfreq: Pass the Loop directly into unwrapLoopPackage()
<rdar://problem/14292693>

llvm-svn: 207187
2014-04-25 04:38:23 +00:00
Duncan P. N. Exon Smith
3fb457a1d2 blockfreq: Unwrap from Loops
When unwrapping loops, just visit the loops rather than all nodes.

<rdar://problem/14292693>

llvm-svn: 207186
2014-04-25 04:38:20 +00:00
Duncan P. N. Exon Smith
20bb8bb185 blockfreq: Separate unwrapLoops() from finalizeMetrics()
<rdar://problem/14292693>

llvm-svn: 207185
2014-04-25 04:38:17 +00:00
Duncan P. N. Exon Smith
bed52c2a81 blockfreq: Expose getPackagedNode()
Make `getPackagedNode()` a member function of
`BlockFrequencyInfoImplBase` so that it's available for templated code.

<rdar://problem/14292693>

llvm-svn: 207183
2014-04-25 04:38:12 +00:00
Duncan P. N. Exon Smith
b9744991af blockfreq: Store the header with the members
<rdar://problem/14292693>

llvm-svn: 207182
2014-04-25 04:38:09 +00:00
Duncan P. N. Exon Smith
2a2a2175c7 blockfreq: Encapsulate LoopData::Header
<rdar://problem/14292693>

llvm-svn: 207181
2014-04-25 04:38:06 +00:00
Duncan P. N. Exon Smith
e670959c2a blockfreq: Use LoopData directly
Instead of passing around loop headers, pass around `LoopData` directly.

<rdar://problem/14292693>

llvm-svn: 207179
2014-04-25 04:38:01 +00:00
Duncan P. N. Exon Smith
0b338fa955 blockfreq: Use a std::list for Loops
As pointed out by David Blaikie in code review, a `std::list<T>` is
simpler than a `std::vector<std::unique_ptr<T>>`.  Another option is a
`std::deque<T>` (which allocates in chunks), but I'd like to leave open
the option of inserting in the middle of the sequence for handling
irreducible control flow on the fly.
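
To illustrate the container trade-off mentioned above: inserting into the middle of a `std::list<T>` leaves pointers to existing elements valid, which a `std::vector<T>` does not guarantee. The sketch below uses a hypothetical `LoopRecord` type and `insertLoopBefore` helper; neither is taken from the LLVM sources.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical stand-in for per-loop data.
struct LoopRecord {
  int Header;
};

// Insert a new record before Pos and return its address. Because
// std::list never relocates elements, addresses returned by earlier
// calls remain valid after later insertions.
LoopRecord *insertLoopBefore(std::list<LoopRecord> &Loops,
                             std::list<LoopRecord>::iterator Pos,
                             int Header) {
  assert(Header >= 0 && "header ids are non-negative in this sketch");
  return &*Loops.insert(Pos, LoopRecord{Header});
}
```

A `std::deque<T>` also keeps references valid, but only for insertions at either end, which is why the middle-insertion case favours the list here.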

<rdar://problem/14292693>

llvm-svn: 207177
2014-04-25 04:30:06 +00:00