1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 22:12:57 +02:00
Commit Graph

4883 Commits

Author SHA1 Message Date
Chandler Carruth
bd97884116 [LCG] Add the really, *really* boring edge insertion case: adding an
edge entirely within an existing SCC. Shockingly, making the connected
component more connected is ... a total snooze fest. =]

Anyways, its wired up, and I even added a test case to make sure it
pretty much sorta works. =D

llvm-svn: 207631
2014-04-30 10:48:36 +00:00
Chandler Carruth
aa6122effe [LCG] Actually test the *basic* edge removal bits (IE, the non-SCC
bits), and discover that it's totally broken. Yay tests. Boo bug. Fix
the basic edge removal so that it works by nulling out the removed edges
rather than actually removing them. This leaves the indices valid in the
map from callee to index, and preserves some of the locality for
iterating over edges. The iterator is made bidirectional to reflect that
it now has to skip over null entries, and the skipping logic is layered
onto it.

As future work, I would like to track essentially the "load factor" of
the edge list, and when it falls below a threshold do a compaction.

An alternative I considered (and continue to consider) is storing the
callees in a doubly linked list where each element of the list is in
a set (which is essentially the classical linked-hash-table
datastructure). The problem with that approach is that either you need
to heap allocate the linked list nodes and use pointers to them, or use
a bucket hash table (with even *more* linked list pointer overhead!),
etc. It's pretty easy to get 5x overhead for values that are just
pointers. So far, I think punching holes in the vector, and periodic
compaction is likely to be much more efficient overall in the space/time
tradeoff.

llvm-svn: 207619
2014-04-30 07:45:27 +00:00
Benjamin Kramer
4f8fb8ff6c raw_ostream: Forward declare OpenFlags and include FileSystem.h only where necessary.
llvm-svn: 207593
2014-04-29 23:26:49 +00:00
Duncan P. N. Exon Smith
705fc7169e blockfreq: Defer to BranchProbability::scale()
`BlockMass` can now defer to `BranchProbability::scale()`.

llvm-svn: 207547
2014-04-29 16:20:05 +00:00
Duncan P. N. Exon Smith
583ed8f3b0 blockfreq: Remove more extra typenames from r207438
llvm-svn: 207440
2014-04-28 20:22:29 +00:00
Duncan P. N. Exon Smith
2eaef1aa01 Reapply "blockfreq: Approximate irreducible control flow"
This reverts commit r207287, reapplying r207286.

I'm hoping that declaring an explicit struct and instantiating
`addBlockEdges()` directly works around the GCC crash from r207286.
This is a lot more boilerplate, though.

llvm-svn: 207438
2014-04-28 20:02:29 +00:00
Chandler Carruth
08eb8582cd [LCG] Add the most basic of edge insertion to the lazy call graph. This
just handles the pre-DFS case. Also add some test cases for this case to
make sure it works.

llvm-svn: 207411
2014-04-28 11:10:23 +00:00
Chandler Carruth
4098580cb2 [LCG] Make the return of the IntraSCC removal method actually match its
contract (and be much more useful). It now provides exactly the
post-order traversal a caller might need to perform on newly formed
SCCs.

llvm-svn: 207410
2014-04-28 10:49:06 +00:00
Chandler Carruth
02b3960e8a [inliner] Significantly improve the compile time in cases like PR19499
by avoiding inlining massive switches merely because they have no
instructions in them. These switches still show up where we fail to form
lookup tables, and in those cases they are actually going to cause
a very significant code size hit anyways, so inlining them is not the
right call. The right way to fix any performance regressions stemming
from this is to enhance the switch-to-lookup-table logic to fire in more
places.

This makes PR19499 about 5x less bad. It uncovers a second compile time
problem in that test case that is unrelated (surprisingly!).

llvm-svn: 207403
2014-04-28 08:52:44 +00:00
Craig Topper
b663bffa27 [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Chandler Carruth
1b5573df25 [LCG] Re-organize the methods for mutating a call graph to make their
API requirements much more obvious.

The key here is that there are two totally different use cases for
mutating the graph. Prior to doing any SCC formation, it is very easy to
mutate the graph. There may be users that want to do small tweaks here,
and then use the already-built graph for their SCC-based operations.
This method remains on the graph itself and is documented carefully as
being cheap but unavailable once SCCs are formed.

Once SCCs are formed, and there is some in-flight DFS building them, we
have to be much more careful in how we mutate the graph. These mutation
operations are sunk onto the SCCs themselves, which both simplifies
things (the code was already there!) and helps make it obvious that
these interfaces are only applicable within that context. The other
primary constraint is that the edge being mutated is actually related to
the SCC on which we call the method. This helps make it obvious that you
cannot arbitrarily mutate some other SCC.

I've tried to write much more complete documentation for the interesting
mutation API -- intra-SCC edge removal. Currently one aspect of this
documentation is a lie (the result list of SCCs) but we also don't even
have tests for that API. =[ I'm going to add tests and fix it to match
the documentation next.

llvm-svn: 207339
2014-04-27 01:59:50 +00:00
Chandler Carruth
864b47743f [LCG] Rather than removing nodes from the SCC entry set when we process
them, just skip over any DFS-numbered nodes when finding the next root
of a DFS. This allows the entry set to just be a vector as we populate
it from a uniqued source. It also removes the possibility for a linear
scan of the entry set to actually do the removal which can make things
go quadratic if we get unlucky.

llvm-svn: 207312
2014-04-26 09:45:55 +00:00
Chandler Carruth
4dbb64e3cd [LCG] Rotate the full SCC finding algorithm to avoid round-trips through
the DFS stack for leaves in the call graph. As mentioned in my previous
commit, this is particularly interesting for graphs which have high fan
out but low connectivity resulting in many leaves. For such graphs, this
can remove a large % of the DFS stack traffic even though it doesn't
make the stack much smaller.

It's a bit easier to formulate this for the full algorithm because that
one stops completely for each SCC. For example, I was able to directly
eliminate the "Recurse" boolean used to continue an outer loop from the
inner loop.

llvm-svn: 207311
2014-04-26 09:28:00 +00:00
Chandler Carruth
3a16e3f5fa [LCG] Hoist the main DFS loop out of the edge removal function. This
makes working through the worklist much cleaner, and makes it possible
to avoid the 'bool-to-continue-the-outer-loop' hack. Not a huge
difference, but I think this is approaching as polished as I can make
it.

llvm-svn: 207310
2014-04-26 09:06:53 +00:00
Chandler Carruth
87d8609624 [LCG] In the incremental SCC re-formation, lift the node currently being
processed in the DFS out of the stack completely. Keep it exclusively in
a variable. Re-shuffle some code structure to make this easier. This can
have a very dramatic effect in some cases because call graphs tend to
look like a high fan-out spanning tree. As a consequence, there are
a large number of leaf nodes in the graph, and this technique causes
leaf nodes to never even go into the stack. While this only reduces the
max depth by 1, it may cause the total number of round trips through the
stack to drop by a lot.

Now, most of this isn't really relevant for the incremental version. =]
But I wanted to prototype it first here as this variant is in ways more
complex. As long as I can get the code factored well here, I'll next
make the primary walk look the same. There are several refactorings this
exposes I think.

llvm-svn: 207306
2014-04-26 03:36:42 +00:00
Chandler Carruth
04ce1b92d9 [LCG] Special case the removal of self edges. These don't impact the SCC
graph in any way because we don't track edges in the SCC graph, just
nodes. This also lets us add a nice assert about the invariant that
we're working on at least a certain number of nodes within the SCC.

llvm-svn: 207305
2014-04-26 03:36:37 +00:00
Chandler Carruth
0e388582e5 [LCG] Refactor the duplicated code I added in my last commit here into
a helper function. Also factor the other two places where we did the
same thing into the helper function. =] Much cleaner this way. NFC.

llvm-svn: 207300
2014-04-26 01:03:46 +00:00
Duncan P. N. Exon Smith
c54b3a7e23 Revert "blockfreq: Approximate irreducible control flow"
This reverts commit r207286.  It causes an ICE on the
cmake-llvm-x86_64-linux buildbot [1]:

    llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function:
    llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035

[1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio

llvm-svn: 207287
2014-04-25 23:16:58 +00:00
Duncan P. N. Exon Smith
3189616c35 blockfreq: Approximate irreducible control flow
Previously, irreducible backedges were ignored.  With this commit,
irreducible SCCs are discovered on the fly, and modelled as loops with
multiple headers.

This approximation specifies the headers of irreducible sub-SCCs as its
entry blocks and all nodes that are targets of a backedge within it
(excluding backedges within true sub-loops).  Block frequency
calculations act as if we insert a new block that intercepts all the
edges to the headers.  All backedges and entries to the irreducible SCC
point to this imaginary block.  This imaginary block has an edge (with
even probability) to each header block.

The result is now reasonable enough that I've added a number of
testcases for irreducible control flow.  I've outlined in
`BlockFrequencyInfoImpl.h` ways to improve the approximation.

<rdar://problem/14292693>

llvm-svn: 207286
2014-04-25 23:08:57 +00:00
Duncan P. N. Exon Smith
0f4795de58 blockfreq: Further shift logic to LoopData
Move a lot of the loop-related logic that was sprinkled around the code
into `LoopData`.

<rdar://problem/14292693>

llvm-svn: 207258
2014-04-25 18:47:04 +00:00
Duncan P. N. Exon Smith
ea68e6a3d5 SCC: Change clients to use const, NFC
It's fishy to be changing the `std::vector<>` owned by the iterator, and
no one actual does it, so I'm going to remove the ability in a
subsequent commit.  First, update the users.

<rdar://problem/14292693>

llvm-svn: 207252
2014-04-25 18:24:50 +00:00
Chandler Carruth
f9a129a8ff [LCG] During the incremental update of an SCC, switch to using the
SCCMap to test for nodes that have been re-added to the root SCC rather
than a set vector. We already have done the SCCMap lookup, we juts need
to test it in two different ways. In turn, do most of the processing of
these nodes as they go into the root SCC rather than lazily. This
simplifies the final loop to just stitch the root SCC into its
children's parent sets. No functionlatiy changed.

However, this makes a few things painfully obvious, which was my intent.
=] There is tons of repeated code introduced here and elsewhere. I'm
splitting the refactoring of that code into helpers from this change so
its clear that this is the change which switches the datastructures used
around, and the other is a pure factoring & deduplication of code
change.

llvm-svn: 207217
2014-04-25 09:52:44 +00:00
Chandler Carruth
099d43c4dc [LCG] During the incremental re-build of an SCC after removing an edge,
remove the nodes in the SCC from the SCC map entirely prior to the DFS
walk. This allows the SCC map to represent both the state of
not-yet-re-added-to-an-SCC and added-back-to-this-SCC independently. The
first is being missing from the SCC map, the second is mapping back to
'this'. In a subsequent commit, I'm going to use this property to
simplify the new node list for this SCC.

In theory, I think this also makes the contract for orphaning a node
from the graph slightly less confusing. Now it is also orphaned from the
SCC graph. Still, this isn't quite right either, and so I'm not adding
test cases here. I'll add test cases for the behavior of orphaning nodes
when the code *actually* supports it. The change here is mostly
incidental, my goal is simplifying the algorithm.

llvm-svn: 207213
2014-04-25 09:08:10 +00:00
Chandler Carruth
92048ceb62 [LCG] Rather than doing a linear time SmallSetVector removal of each
child from the worklist, wait until we actually need to pop another
element off of the worklist and skip over any that were already visited
by the DFS. This also enables swapping the nodes of the SCC into the
worklist. No functionality changed.

llvm-svn: 207212
2014-04-25 09:08:05 +00:00
Chandler Carruth
a937a4a8a9 [LCG] Remove a completely unnecessary loop. It wasn't even doing any
thing, just mucking up the code. I feel bad that I even wrote this loop.
Very sorry. The diff is huge because of the indent change, but I promise
all this is doing is realizing that the outer two loops were actually
the exact same loops, and we didn't need two of them.

llvm-svn: 207202
2014-04-25 06:45:06 +00:00
Chandler Carruth
6a4aae8f97 [LCG] Now that the loop structure of the core SCC finding routine is
factored into a more reasonable form, replace the tail call with
a simple outer-loop continuation. It's sad that C++ makes this so
awkward to write, but it seems more direct and clear than the tail call
at this point.

llvm-svn: 207201
2014-04-25 06:38:58 +00:00
Duncan P. N. Exon Smith
6117699d7e blockfreq: Only one mass distribution per node
Remove the concepts of "forward" and "general" mass distributions, which
was wrong.  The split might have made sense in an early version of the
algorithm, but it's definitely wrong now.

<rdar://problem/14292693>

llvm-svn: 207195
2014-04-25 04:38:43 +00:00
Duncan P. N. Exon Smith
eebda28f4c blockfreq: Document assertion
<rdar://problem/14292693>

llvm-svn: 207194
2014-04-25 04:38:40 +00:00
Duncan P. N. Exon Smith
20240d1036 blockfreq: Document high-level functions
<rdar://problem/14292693>

llvm-svn: 207191
2014-04-25 04:38:32 +00:00
Duncan P. N. Exon Smith
23f427f288 blockfreq: Scale LoopData::Scale on the way down
Rather than scaling loop headers and then scaling all the loop members
by the header frequency, scale `LoopData::Scale` itself, and scale the
loop members by it.  It's much more obvious what's going on this way,
and doesn't cost any extra multiplies.

<rdar://problem/14292693>

llvm-svn: 207189
2014-04-25 04:38:27 +00:00
Duncan P. N. Exon Smith
103f12e173 blockfreq: unwrapLoopPackage() => unwrapLoop()
<rdar://problem/14292693>

llvm-svn: 207188
2014-04-25 04:38:25 +00:00
Duncan P. N. Exon Smith
4f4a0c2fa2 blockfreq: Pass the Loop directly into unwrapLoopPackage()
<rdar://problem/14292693>

llvm-svn: 207187
2014-04-25 04:38:23 +00:00
Duncan P. N. Exon Smith
3fb457a1d2 blockfreq: Unwrap from Loops
When unwrapping loops, just visit the loops rather than all nodes.

<rdar://problem/14292693>

llvm-svn: 207186
2014-04-25 04:38:20 +00:00
Duncan P. N. Exon Smith
20bb8bb185 blockfreq: Separate unwrapLoops() from finalizeMetrics()
<rdar://problem/14292693>

llvm-svn: 207185
2014-04-25 04:38:17 +00:00
Duncan P. N. Exon Smith
bed52c2a81 blockfreq: Expose getPackagedNode()
Make `getPackagedNode()` a member function of
`BlockFrequencyInfoImplBase` so that it's available for templated code.

<rdar://problem/14292693>

llvm-svn: 207183
2014-04-25 04:38:12 +00:00
Duncan P. N. Exon Smith
b9744991af blockfreq: Store the header with the members
<rdar://problem/14292693>

llvm-svn: 207182
2014-04-25 04:38:09 +00:00
Duncan P. N. Exon Smith
2a2a2175c7 blockfreq: Encapsulate LoopData::Header
<rdar://problem/14292693>

llvm-svn: 207181
2014-04-25 04:38:06 +00:00
Duncan P. N. Exon Smith
e670959c2a blockfreq: Use LoopData directly
Instead of passing around loop headers, pass around `LoopData` directly.

<rdar://problem/14292693>

llvm-svn: 207179
2014-04-25 04:38:01 +00:00
Duncan P. N. Exon Smith
0b338fa955 blockfreq: Use a std::list for Loops
As pointed out by David Blaikie in code review, a `std::list<T>` is
simpler than a `std::vector<std::unique_ptr<T>>`.  Another option is a
`std::deque<T>` (which allocates in chunks), but I'd like to leave open
the option of inserting in the middle of the sequence for handling
irreducible control flow on the fly.

<rdar://problem/14292693>

llvm-svn: 207177
2014-04-25 04:30:06 +00:00
Chandler Carruth
59e03f926f [LCG] Switch a weird do/while loop that actually couldn't fail its
condition into an obviously infinite loop with an assert about the
degenerate condition. No functionality changed.

llvm-svn: 207147
2014-04-24 21:19:30 +00:00
Chandler Carruth
9e4513f082 [LCG] Incorporate the core trick of improvements on the naive Tarjan's
algorithm here: http://dl.acm.org/citation.cfm?id=177301.

The idea of isolating the roots has even more relevance when using the
stack not just to implement the DFS but also to implement the recursive
step. Because we use it for the recursive step, to isolate the roots we
need to maintain two stacks: one for our recursive DFS walk, and another
of the nodes that have been walked. The nice thing is that the latter
will be half the size. It also fixes a complete hack where we scanned
backwards over the stack to find the next potential-root to continue
processing. Now that is always the top of the DFS stack.

While this is a really nice improvement already (IMO) it further opens
the door for two important simplifications:

1) De-duplicating some of the code across the two different walks. I've
   actually made the duplication a bit worse in some senses with this
   patch because the two are starting to converge.
2) Dramatically simplifying the loop structures of both walks.

I wanted to do those separately as they'll be essentially *just* CFG
restructuring. This patch on the other hand actually uses different
datastructures to implement the algorithm itself.

llvm-svn: 207098
2014-04-24 11:05:20 +00:00
Chandler Carruth
9e16f14789 [LCG] Rotate logic applied to the top of the DFSStack to instead be
applied prior to pushing a node onto the DFSStack. This is the first
step toward avoiding the stack entirely for leaf nodes. It also
simplifies things a bit and I think is pointing the way toward factoring
some more of the shared logic out of the two implementations.

It is also making it more obvious how to restructure the loops
themselves to be a bit easier to read (although no different in terms of
functionality).

llvm-svn: 207095
2014-04-24 09:59:59 +00:00
Chandler Carruth
cd39f4c2e6 [LCG] Switch the parent SCC tracking from a SmallSetVector to
a SmallPtrSet. Currently, there is no need for stable iteration in this
dimension, and I now thing there won't need to be going forward.

If this is ever re-introduced in any form, it needs to not be
a SetVector based solution because removal cannot be linear. There will
be many SCCs with large numbers of parents. When encountering these, the
incremental SCC update for intra-SCC edge removal was quadratic due to
linear removal (kind of).

I'm really hoping we can avoid having an ordering property here at all
though...

llvm-svn: 207091
2014-04-24 09:22:31 +00:00
Chandler Carruth
ccccef94ac [LCG] We don't actually need a set in each SCC to track the nodes. We
can use the node -> SCC mapping in the top-level graph to test this on
the rare occasions we need it.

llvm-svn: 207090
2014-04-24 08:55:36 +00:00
Craig Topper
c7c3a99ec2 [C++] Use 'nullptr'.
llvm-svn: 207083
2014-04-24 06:44:33 +00:00
Chandler Carruth
a18f590cc4 [LCG] Normalize the post-order SCC iterator to just iterate over the SCC
values rather than having pointers in weird places.

llvm-svn: 207053
2014-04-23 23:51:07 +00:00
Chandler Carruth
1d124691ed [LCG] Switch the primary node iterator to be a *much* more normal C++
iterator, returning a Node by reference on dereference.

llvm-svn: 207048
2014-04-23 23:34:48 +00:00
Chandler Carruth
18f0202abb [LCG] Make the insertion and query paths into the LCG which cannot fail
return references to better model this property.

No functionality changed.

llvm-svn: 207047
2014-04-23 23:20:36 +00:00
Chandler Carruth
ddc1da4ac6 [LCG] Switch the SCC lookup to be in terms of call graph nodes rather
than functions. So far, this access pattern is *much* more common. It
seems likely that any user of this interface is going to have nodes at
the point that they are querying the SCCs.

No functionality changed.

llvm-svn: 207045
2014-04-23 23:12:06 +00:00
Chandler Carruth
e064af9075 [LCG] Switch the primary SCC building code to use the negative low-link
values rather than an expensive dense map query to test whether children
have already been popped into an SCC. This matches the incremental SCC
building code. I've also included the assert that I put there but
updated both of their text.

No functionality changed here.

I still don't have any great ideas for sharing the code between the two
implementations, but I may try a brute-force approach to factoring it at
some point.

llvm-svn: 207042
2014-04-23 22:28:13 +00:00