1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 22:12:57 +02:00
Commit Graph

83 Commits

Author SHA1 Message Date
Chandler Carruth
0fe0191603 [MBP] Fix a really horrible bug in MachineBlockPlacement, but behind
a flag for now.

First off, thanks to Daniel Jasper for really pointing out the issue
here. It's been here forever (at least, I think it was there when
I first wrote this code) without getting really noticed or fixed.

The key problem is what happens when two reasonably common patterns
happen at the same time: we outline multiple cold regions of code, and
those regions in turn have diamonds or other CFGs for which we can't
just topologically lay them out. Consider some C code that looks like:

  if (a1()) { if (b1()) c1(); else d1(); f1(); }
  if (a2()) { if (b2()) c2(); else d2(); f2(); }
  done();

Now consider the case where a1() and a2() are unlikely to be true. In
that case, we might lay out the first part of the function like:

  a1, a2, done;

And then we will be out of successors in which to build the chain. We go
to find the best block to continue the chain with, which is perfectly
reasonable here, and find "b1" let's say. Laying out successors gets us
to:

  a1, a2, done; b1, c1;

At this point, we will refuse to lay out the successor to c1 (f1)
because there are still un-placed predecessors of f1 and we want to try
to preserve the CFG structure. So we go get the next best block, d1.

... wait for it ...

Except that the next best block *isn't* d1. It is b2! d1 is waaay down
inside these conditionals. It is much less important than b2. Except
that this is exactly what we didn't want. If we keep going we get the
entire set of the rest of the CFG *interleaved*!!!

  a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;

So we clearly need a better strategy here. =] My current favorite
strategy is to actually try to place the block whose predecessor is
closest. This very simply ensures that we unwind these kinds of CFGs the
way that is natural and fitting, and should minimize the number of cache
lines instructions are spread across.

It also happens to be *dead simple*. It's like the datastructure was
specifically set up for this use case or something. We only push blocks
onto the work list when the last predecessor for them is placed into the
chain. So the back of the worklist *is* the nearest next block.

Unfortunately, a change like this is going to cause *soooo* many
benchmarks to swing wildly. So for now I'm adding this under a flag so
that we and others can validate that this is fixing the problems
described, that it seems possible to enable, and hopefully that it fixes
more of our problems long term.

llvm-svn: 231238
2015-03-04 12:18:08 +00:00
Daniel Jasper
3a2dd1b264 Add a flag to experiment with outlining optional branches.
In a CFG with the edges A->B->C and A->C, B is an optional branch.

LLVM's default behavior is to lay the blocks out naturally, i.e. A, B,
C, in order to improve code locality and fallthroughs. However, if a
function contains many of those optional branches only a few of which
are taken, this leads to a lot of unnecessary icache misses. Moving B
out of line can work around this.

Review: http://reviews.llvm.org/D7719
llvm-svn: 231230
2015-03-04 11:05:34 +00:00
Daniel Jasper
1a2370862a NFC: Use range-based for loops and more consistent naming.
No functional changes intended.

(I plan on doing some modifications to this function and would like to
have as few unrelated changes as possible in the patch)

llvm-svn: 229649
2015-02-18 08:19:16 +00:00
Daniel Jasper
3f53d83cd0 Remove experimental options to control machine block placement.
This reverts r226034. Benchmarking with those flags has not revealed
anything interesting.

llvm-svn: 229648
2015-02-18 08:18:07 +00:00
Duncan P. N. Exon Smith
33685ffe5d CodeGen: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

Also, add `Function::getFnStackAlignment()`, and canonicalize:

getAttributes().getStackAlignment(AttributeSet::FunctionIndex)
  => getFnStackAlignment()

llvm-svn: 229208
2015-02-14 01:44:41 +00:00
Chandler Carruth
ba862f13cb [MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement.
Some benchmarks have shown that this could lead to a potential
performance benefit, and so adding some flags to try to help measure the
difference.

A possible explanation. In diamond-shaped CFGs (A followed by either
B or C both followed by D), putting B and C both in between A and
D leads to the code being less dense than it could be. Always either
B or C have to be skipped increasing the chance of cache misses etc.
Moving either B or C to after D might be beneficial on average.

In the long run, but we should probably do a better job of analyzing the
basic block and branch probabilities to move the correct one of B or
C to after D. But even if we don't use this in the long run, it is
a good baseline for benchmarking.

Original patch authored by Daniel Jasper with test tweaks and a second
flag added by me.

Differential Revision: http://reviews.llvm.org/D6969

llvm-svn: 226034
2015-01-14 20:19:29 +00:00
Hal Finkel
2aea11b438 [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference
The existing code provided for specifying a global loop alignment preference.
However, the preferred loop alignment might depend on the loop itself. For
recent POWER cores, loops between 5 and 8 instructions should have 32-byte
alignment (while the others are better with 16-byte alignment) so that the
entire loop will fit in one i-cache line.

To support this, getPrefLoopAlignment has been made virtual, and can be
provided with an optional MachineLoop* so the target can inspect the loop
before answering the query. The default behavior, as before, is to return the
value set with setPrefLoopAlignment. MachineBlockPlacement now queries the
target for each loop instead of only once per function. There should be no
functional change for other targets.

llvm-svn: 225117
2015-01-03 17:58:24 +00:00
David Blaikie
60e6c80905 Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Eric Christopher
67c04e77e5 Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838
2014-08-05 02:39:49 +00:00
Eric Christopher
99307e99a2 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Alp Toker
97022b0c1f Revert "Introduce a string_ostream string builder facilty"
Temporarily back out commits r211749, r211752 and r211754.

llvm-svn: 211814
2014-06-26 22:52:05 +00:00
Alp Toker
fd9ead3b6f Introduce a string_ostream string builder facilty
string_ostream is a safe and efficient string builder that combines opaque
stack storage with a built-in ostream interface.

small_string_ostream<bytes> additionally permits an explicit stack storage size
other than the default 128 bytes to be provided. Beyond that, storage is
transferred to the heap.

This convenient class can be used in most places an
std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair
would previously have been used, in order to guarantee consistent access
without byte truncation.

The patch also converts much of LLVM to use the new facility. These changes
include several probable bug fixes for truncated output, a programming error
that's no longer possible with the new interface.

llvm-svn: 211749
2014-06-26 00:00:48 +00:00
Chandler Carruth
2361db41db [Modules] Remove potential ODR violations by sinking the DEBUG_TYPE
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.

Other sub-trees will follow.

llvm-svn: 206837
2014-04-22 02:02:50 +00:00
Craig Topper
30281a67fb [C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr.
llvm-svn: 206142
2014-04-14 00:51:57 +00:00
Paul Robinson
266d563df6 Disable each MachineFunctionPass for 'optnone' functions, unless that
pass normally runs at optimization level None, or is part of the
register allocation pipeline.

llvm-svn: 205228
2014-03-31 17:43:35 +00:00
Craig Topper
b3cfc7916b [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 203220
2014-03-07 09:26:03 +00:00
Benjamin Kramer
e4eb1b495f [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Benjamin Kramer
803ba41365 Now that we have C++11, turn simple functors into lambdas and remove a ton of boilerplate.
No intended functionality change.

llvm-svn: 202588
2014-03-01 11:47:00 +00:00
Nico Weber
7e53ec0698 Add a LLVM_DUMP_METHOD macro.
The motivation is to mark dump methods as used in debug builds so that they can
be called from lldb, but to not do so in release builds so that they can be
dead-stripped.

There's lots of potential follow-up work suggested in the thread
"Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev,
but everyone seems to agreen on this subset.

Macro name chosen by fair coin toss.

llvm-svn: 198456
2014-01-03 22:53:37 +00:00
Michael Gottesman
4149bc8110 [block-freq] Update MachineBlockPlacement and RegAllocGreedy to use the new MachineBlockFrequencyInfo methods.
llvm-svn: 197290
2013-12-14 00:25:45 +00:00
Matt Arsenault
ceacb41269 Fix gcc warnings.
Unused variable and unused typedef in release build.

llvm-svn: 196947
2013-12-10 18:55:37 +00:00
Matt Arsenault
f4228e2204 Revert part of GCC warning fix to fix debug build.
The typedef is used inside the DEBUG(), and apparently can't be moved
inside of it.

llvm-svn: 196528
2013-12-05 20:02:18 +00:00
Matt Arsenault
57be52c610 Fix minor GCC warnings.
Unused typedefs and unused variables.

llvm-svn: 196526
2013-12-05 19:37:36 +00:00
Chandler Carruth
0e71e8676c Output a bit more information in the debug printing for MBP. This was
useful when analyzing parts of zlib's behavior here.

llvm-svn: 195588
2013-11-25 00:43:41 +00:00
Benjamin Kramer
40f6475264 MachineBlockPlacement: Strengthen the source order bias when picking an exit block.
We now only allow breaking source order if the exit block frequency is
significantly higher than the other exit block. The actual bias is
currently under a flag so the best cut-off can be found; the flag
defaults to the old behavior. The idea is to get some benchmark coverage
over different values for the flag and pick the best one.

When we require the new frequency to be at least 20% higher than the old
frequency I see a 5% speedup on zlib's deflate when compressing a random
file on x86_64/westmere. Hal reported a small speedup on Fhourstones on
a BG/Q and no regressions in the test suite.

The test case is the full long_match function from zlib's deflate. I was
reluctant to add it for previous tweaks to branch probabilities because
it's large and potentially fragile, but changed my mind since it's an
important use case and more likely to break with all the current work
going into the PGO infrastructure.

Differential Revision: http://llvm-reviews.chandlerc.com/D2202

llvm-svn: 195265
2013-11-20 19:08:44 +00:00
Shuxin Yang
ea044d82e0 Fix a defect in code-layout pass, improving Benchmarks/Olden/em3d/em3d by about 30%
(4.58s vs 3.2s on an oldish Mac Tower). 

  The corresponding src is excerpted bellow. The lopp accounts for about 90% of execution time.
  --------------------
    cat -n test-suite/MultiSource/Benchmarks/Olden/em3d/make_graph.c
     90 
     91         for (k=0; k<j; k++)
     92           if (other_node == cur_node->to_nodes[k]) break;

  The defective layout is sketched bellow, where the two branches need to swap.
  ------------------------------------------------------------------------
      L:
         ...
      if (cond) goto out-of-loop
      goto L

  While this code sequence is defective, I don't understand why it incurs 1/3 of 
execution time. CPU-event-profiling indicates the poor laoyout dose not increase
in br-misprediction; it dosen't increase stall cycle at all, and it dosen't 
prevent the CPU detect the loop (i.e. Loop-Stream-Detector seems to be working fine
as well)... 

   The root cause of the problem is that the layout pass calls AnalyzeBranch() 
with basic-block which is not updated to reflect its current layout.

rdar://13966341

llvm-svn: 183174
2013-06-04 01:00:57 +00:00
Nadav Rotem
9cfc1da8c2 Don't disable block layout when forcing block alignment.
llvm-svn: 179355
2013-04-12 01:24:16 +00:00
Nadav Rotem
662256bafa Add a flag to align all basic blocks in the function.
When debugging performance regressions we often ask ourselves if the regression
that we see is due to poor isel/sched/ra or due to some micro-architetural
problem.  When comparing two code sequences one good way to rule out front-end
bottlenecks (and other the issues) is to force code alignment. This pass adds
a flag that forces the alignment of all of the basic blocks in the program.

llvm-svn: 179353
2013-04-12 00:48:32 +00:00
Nadav Rotem
6d5518d53d Fix a typo
llvm-svn: 178346
2013-03-29 16:34:23 +00:00
Benjamin Kramer
7097792a88 Split TargetLowering into a CodeGen and a SelectionDAG part.
This fixes some of the cycles between libCodeGen and libSelectionDAG. It's still
a complete mess but as long as the edges consist of virtual call it doesn't
cause breakage. BasicTTI did static calls and thus broke some build
configurations.

llvm-svn: 172246
2013-01-11 20:05:37 +00:00
Bill Wendling
e0920e4122 Remove the Function::getFnAttributes method in favor of using the AttributeSet
directly.

This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.

llvm-svn: 171253
2012-12-30 10:32:01 +00:00
Bill Wendling
56d9c4b832 Rename the 'Attributes' class to 'Attribute'. It's going to represent a single attribute in the future.
llvm-svn: 170502
2012-12-19 07:18:57 +00:00
Chandler Carruth
a490793037 Use the new script to sort the includes of every file under lib.
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.

Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]

llvm-svn: 169131
2012-12-03 16:50:05 +00:00
Bill Wendling
b53357de39 Create enums for the different attributes.
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.

llvm-svn: 165488
2012-10-09 07:45:08 +00:00
Bill Wendling
92f3ab845d Remove the `hasFnAttr' method from Function.
The hasFnAttr method has been replaced by querying the Attributes explicitly. No
intended functionality change.

llvm-svn: 164725
2012-09-26 21:48:26 +00:00
Duncan Sands
f1fad3ee2e Remove silly dead store. Patch by Ettl Martin.
llvm-svn: 163882
2012-09-14 09:00:11 +00:00
Chandler Carruth
49d4e3f282 Add a much more conservative strategy for aligning branch targets.
Previously, MBP essentially aligned every branch target it could. This
bloats code quite a bit, especially non-looping code which has no real
reason to prefer aligned branch targets so heavily.

As Andy said in review, it's still a bit odd to do this without a real
cost model, but this at least has much more plausible heuristics.

Fixes PR13265.

llvm-svn: 161409
2012-08-07 09:45:24 +00:00
Manman Ren
3769ac64a6 Reverse order of the two branches at end of a basic block if it is profitable.
We branch to the successor with higher edge weight first.
Convert from
     je    LBB4_8  --> to outer loop
     jmp   LBB4_14 --> to inner loop
to
     jne   LBB4_14
     jmp   LBB4_8

PR12750
rdar: 11393714

llvm-svn: 161018
2012-07-31 01:11:07 +00:00
Chandler Carruth
872747b0f0 Update a bunch of stale comments that dated from when this folled the
very first (and worst) placement algorithm. These should now more
accurately reflect the reality of the pass.

llvm-svn: 159185
2012-06-26 05:16:37 +00:00
Benjamin Kramer
bb30e1face Fix typos found by http://github.com/lyda/misspell-check
llvm-svn: 157885
2012-06-02 10:20:22 +00:00
Chandler Carruth
fbb6219d5b Add a somewhat hacky heuristic to do something different from whole-loop
rotation. When there is a loop backedge which is an unconditional
branch, we will end up with a branch somewhere no matter what. Try
placing this backedge in a fallthrough position above the loop header as
that will definitely remove at least one branch from the loop iteration,
where whole loop rotation may not.

I haven't seen any benchmarks where this is important but loop-blocks.ll
tests for it, and so this will be covered when I flip the default.

llvm-svn: 154812
2012-04-16 13:33:36 +00:00
Chandler Carruth
33b200ad13 Tweak the loop rotation logic to check whether the loop is naturally
laid out in a form with a fallthrough into the header and a fallthrough
out of the bottom. In that case, leave the loop alone because any
rotation will introduce unnecessary branches. If either side looks like
it will require an explicit branch, then the rotation won't add any, do
it to ensure the branch occurs outside of the loop (if possible) and
maximize the benefit of the fallthrough in the bottom.

llvm-svn: 154806
2012-04-16 09:31:23 +00:00
Chandler Carruth
fc5ab5d388 Rewrite how machine block placement handles loop rotation.
This is a complex change that resulted from a great deal of
experimentation with several different benchmarks. The one which proved
the most useful is included as a test case, but I don't know that it
captures all of the relevant changes, as I didn't have specific
regression tests for each, they were more the result of reasoning about
what the old algorithm would possibly do wrong. I'm also failing at the
moment to craft more targeted regression tests for these changes, if
anyone has ideas, it would be welcome.

The first big thing broken with the old algorithm is the idea that we
can take a basic block which has a loop-exiting successor and a looping
successor and use the looping successor as the layout top in order to
get that particular block to be the bottom of the loop after layout.
This happens to work in many cases, but not in all.

The second big thing broken was that we didn't try to select the exit
which fell into the nearest enclosing loop (to which we exit at all). As
a consequence, even if the rotation worked perfectly, it would result in
one of two bad layouts. Either the bottom of the loop would get
fallthrough, skipping across a nearer enclosing loop and thereby making
it discontiguous, or it would be forced to take an explicit jump over
the nearest enclosing loop to earch its successor. The point of the
rotation is to get fallthrough, so we need it to fallthrough to the
nearest loop it can.

The fix to the first issue is to actually layout the loop from the loop
header, and then rotate the loop such that the correct exiting edge can
be a fallthrough edge. This is actually much easier than I anticipated
because we can handle all the hard parts of finding a viable rotation
before we do the layout. We just store that, and then rotate after
layout is finished. No inner loops get split across the post-rotation
backedge because we check for them when selecting the rotation.

That fix exposed a latent problem with our exitting block selection --
we should allow the backedge to point into the middle of some inner-loop
chain as there is no real penalty to it, the whole point is that it
*won't* be a fallthrough edge. This may have blocked the rotation at all
in some cases, I have no idea and no test case as I've never seen it in
practice, it was just noticed by inspection.

Finally, all of these fixes, and studying the loops they produce,
highlighted another problem: in rotating loops like this, we sometimes
fail to align the destination of these backwards jumping edges. Fix this
by actually walking the backwards edges rather than relying on loopinfo.

This fixes regressions on heapsort if block placement is enabled as well
as lots of other cases where the previous logic would introduce an
abundance of unnecessary branches into the execution.

llvm-svn: 154783
2012-04-16 01:12:56 +00:00
Chandler Carruth
3c9796d9b0 Make a somewhat subtle change in the logic of block placement. Sometimes
the loop header has a non-loop predecessor which has been pre-fused into
its chain due to unanalyzable branches. In this case, rotating the
header into the body of the loop in order to place a loop exit at the
bottom of the loop is a Very Bad Idea as it makes the loop
non-contiguous.

I'm working on a good test case for this, but it's a bit annoynig to
craft. I should get one shortly, but I'm submitting this now so I can
begin the (lengthy) performance analysis process. An initial run of LNT
looks really, really good, but there is too much noise there for me to
trust it much.

llvm-svn: 154395
2012-04-10 13:35:57 +00:00
Chandler Carruth
5ec9b9fd94 Remove an over zealous assert. The assert was trying to catch places
where a chain outside of the loop block-set ended up in the worklist for
scheduling as part of the contiguous loop. However, asserting the first
block in the chain is in the loop-set isn't a valid check -- we may be
forced to drag a chain into the worklist due to one block in the chain
being part of the loop even though the first block is *not* in the loop.
This occurs when we have been forced to form a chain early due to
un-analyzable branches.

No test case here as I have no idea how to even begin reducing one, and
it will be hopelessly fragile. We have to somehow end up with a loop
header of an inner loop which is a successor of a basic block with an
unanalyzable pair of branch instructions. Ow. Self-host triggers it so
it is unlikely it will regress.

This at least gets block placement back to passing selfhost and the test
suite. There are still a lot of slowdown that I don't like coming out of
block placement, although there are now also a lot of speedups. =[ I'm
seeing swings in both directions up to 10%. I'm going to try to find
time to dig into this and see if we can turn this on for 3.1 as it does
a really good job of cleaning up after some loops that degraded with the
inliner changes.

llvm-svn: 154287
2012-04-08 14:37:02 +00:00
Chandler Carruth
2fe7a17703 Add a debug-only 'dump' method to the BlockChain structure to ease
debugging.

llvm-svn: 154286
2012-04-08 14:37:01 +00:00
Andrew Trick
b9d2e9e81d Codegen pass definition cleanup. No functionality.
Moving toward a uniform style of pass definition to allow easier target configuration.
Globally declare Pass ID.
Globally declare pass initializer.
Use INITIALIZE_PASS consistently.
Add a call to the initializer from CodeGen.cpp.
Remove redundant "createPass" functions and "getPassName" methods.

While cleaning up declarations, cleaned up comments (sorry for large diff).

llvm-svn: 150100
2012-02-08 21:23:13 +00:00
Jakub Staszak
4226104d8a Revert patch from 147090. There is not point to make code less readable if we
don't get any serious benefit there.

llvm-svn: 147101
2011-12-21 23:02:08 +00:00
Jakub Staszak
3a3ffa2d1e - Change a few operator[] to lookup which is cheaper.
- Add some constantness.

llvm-svn: 147090
2011-12-21 20:18:54 +00:00
Jakub Staszak
a8a18f2cf5 Remove unneeded semicolon.
Skip two looking up at BlockChain.

llvm-svn: 146053
2011-12-07 19:46:10 +00:00