Interval. This generalizes the isDefinedOnce mechanism that we used before
to help us coallesce ranges that overlap. As part of this, every logical
range with a different value is assigned a different number in the interval.
For example, for code that looks like this:
0 X = ...
4 X += ...
...
N = X
We now generate a live interval that contains two ranges: [2,6:0),[6,?:1)
reflecting the fact that there are two different values in the range at
different positions in the code.
Currently we are not using this information at all, so this just slows down
liveintervals. In the future, this will change.
Note that this change also substantially refactors the joinIntervalsInMachineBB
method to merge the cases for virt-virt and phys-virt joining into a single
case, adds comments, and makes the code a bit easier to follow.
llvm-svn: 15154
* Fix comment typeo
* add dump() methods
* add a few new methods like getLiveRangeContaining, removeRange & joinable
(which is currently the same as overlaps)
* Remove the unused operator==
Bigger change:
* In LiveInterval, instead of using a boolean isDefinedOnce to keep track of
if there are > 1 definitions in a particular interval, keep a counter,
NumValues to keep track of exactly how many there are.
* In LiveRange, add a new ValId element to indicate which of the numbered
values each LiveRange belongs to. We now no longer merge LiveRanges if
they are of differing value ID's even if they are neighbors.
llvm-svn: 15152
want to insert a new range into the middle of the vector, then delete ranges
one at a time next to the inserted one as they are merged.
Instead, if the inserted interval overlaps, just start merging. The only time
we insert into the middle of the vector is when we don't overlap at all. Also
delete blocks of live ranges if we overlap with many of them.
This patch speeds up joining by .7 seconds on a large testcase, but more
importantly gets all of the range adding code into addRangeFrom.
llvm-svn: 15141
will soon be renamed) into their own file. The new file should not emit
DEBUG output or have other side effects. The LiveInterval class also now
doesn't know whether its working on registers or some other thing.
In the future we will want to use the LiveInterval class and friends to do
stack packing. In addition to a code simplification, this will allow us to
do it more easily.
llvm-svn: 15134
Use an explicit LiveRange class to represent ranges instead of an std::pair.
This is a minor cleanup, but is really intended to make a future patch simpler
and less invasive.
Alkis, could you please take a look at LiveInterval::liveAt? I suspect that
you can add an operator<(unsigned) to LiveRange, allowing us to speed up the
upper_bound call by quite a bit (this would also apply to other callers of
upper/lower_bound). I would do it myself, but I still don't understand that
crazy liveAt function, despite the comment. :)
Basically I would like to see this:
LiveRange dummy(index, index+1);
Ranges::const_iterator r = std::upper_bound(ranges.begin(),
ranges.end(),
dummy);
Turn into:
Ranges::const_iterator r = std::upper_bound(ranges.begin(),
ranges.end(),
index);
llvm-svn: 15130
interfere. Because these intervals have a single definition, and one of them
is a copy instruction, they are always safe to merge even if their lifetimes
interfere. This slightly reduces the amount of spill code, for example on
252.eon, from:
12837 spiller - Number of loads added
7604 spiller - Number of stores added
5842 spiller - Number of register spills
18155 liveintervals - Number of identity moves eliminated after coalescing
to:
12754 spiller - Number of loads added
7585 spiller - Number of stores added
5803 spiller - Number of register spills
18262 liveintervals - Number of identity moves eliminated after coalescing
The much much bigger win would be to merge intervals with multiple definitions
(aka phi nodes) but this is not that day.
llvm-svn: 15124
compilation of gcc:
* Use vectors instead of lists for the intervals sets
* Use a heap for the unhandled set to keep intervals always sorted and
makes insertions back to the heap very fast (compared to scanning a
list)
llvm-svn: 15103
fortunately, they are easy to handle if we know about them. This patch fixes
some serious pessimization of code produced by the linscan register allocator.
llvm-svn: 15092
* vreg <-> vreg joining now works, enable it unconditionally when joining
is enabled (which is the default).
* Fix a serious pessimization of spill code where we were saying that a
spilled DEF operand was live into the subsequent instruction. This allows
for substantially better code when spilling starts to happen.
llvm-svn: 14993
order, causing the inactive list in the linearscan list to get unsorted, which
basically fuxored everything up severely.
These seems to fix the joiner, so with more testing I will enable it by default.
llvm-svn: 14992
Heavily refactor handleVirtualRegisterDef, adding comments and making it more
efficient. It is also much easier to follow and convince ones self that it is
correct :)
Add -debug output to the joine, showing the result of joining the intervals.
llvm-svn: 14989
basic block clear()'s all of the operands lists, including phis. This
caused removePredecessor to get confused later. Because of this, we just
nuke (without prejudice) PHI nodes in unreachable blocks.
llvm-svn: 14635
pass is required to paper over problems in the code generator (primarily
live variables and its clients) which doesn't really have any well defined
semantics for unreachable code.
The proper solution to this problem is to have instruction selectors not
select blocks that are unreachable. Until we have a instruction selection
framework available for use, however, we can't expect all instruction
selector writers to do this. Until then, this pass should be used.
llvm-svn: 14563
instructions. Instead, keep a map of instructions -> MCFI objects in the
already sparc-specific class MachineFunctionInfo. This will slow down the
sparc backend a bit, but it does not penalize the rest of LLVM!
llvm-svn: 14438
The vector may actually be empty if the register that we are marking as
recently used is not actually allocatable. This happens for physical registers
that are not allocatable, like the ST(x) registers on X86.
llvm-svn: 14195
broke obsequi and a lot of other things. It all boiled down to MBB being
overloaded in an inner scope and me confusing it with the one in the outer
scope. Ugh!
llvm-svn: 13517
MBBs start out as #-1. When a MBB is added to a MachineFunction, it
gets the next available unique MBB number. If it is removed from a
MachineFunction, it goes back to being #-1.
llvm-svn: 13514
in the basic block being processed. This fixes PhiElimination on kimwitu++
from taking 105s to taking a much more reasonable 0.6s (in a debug build).
llvm-svn: 13453
than before. Because this is the case, we can compute the first non-phi
instruction once when de-phi'ing a block. This shaves ~4s off of
phielimination of _Z7yyparsev in kimwitu++ from 109s -> 105s. There are
still much more important gains to come.
llvm-svn: 13452
when we see a read of a register. This is important in cases like:
AL = ...
AH = ...
= AX
The read of AX must make both the AL and AH defs live until the use.
llvm-svn: 13444
use MachineBasicBlocks. To do this, we traverse the Machine CFG instead of
the LLVM CFG, which is also *MUCH* more efficient by having fewer levels of
indirections and mappings.
llvm-svn: 13301
(16) into certain areas of the SPARC V9 back-end. I'm fairly sure the US IIIi's
dcache has 32-byte lines, so I'm not sure where the 16 came from. However, in
the interest of not breaking things any more than they already are, I'm going
to leave the constant alone.
llvm-svn: 12043
allocator.
The implementation is completely rewritten and now employs several
optimizations not exercised before. For example for 164.gzip we have
997 loads and 699 stores vs the 1221 loads and 880 stores we have
before.
llvm-svn: 11798
block into MachineBasicBlock::getFirstTerminator().
This also fixes a bug in the implementation of the above in both
RegAllocLocal and InstrSched, where instructions where added after the
terminator if the basic block's only instruction was a terminator (it
shouldn't matter for RegAllocLocal since this case never occurs in
practice).
llvm-svn: 11748
1. LiveIntervals now implement a 4 slot per instruction model. Load,
Use, Def and a Store slot. This is required in order to correctly
represent caller saved register clobbering on function calls,
register reuse in the same instruction (def resues last use) and
also spill code added later by the allocator. The previous
representation (2 slots per instruction) was insufficient and as a
result was causing subtle bugs.
2. Fixes in spill code generation. This was the major cause of
failures in the test suite.
3. Linear scan now has core support for folding memory operands. This
is untested and not enabled (the live interval update function does
not attempt to fold loads/stores in instructions).
4. Lots of improvements in the debugging output of both live intervals
and linear scan. Give it a try... it is beautiful :-)
In summary the above fixes all the issues with the recent reserved
register elimination changes and get the allocator very close to the
next big step: folding memory operands.
llvm-svn: 11654
that need them. This is very useful on CISCy targets like the X86 because it
reduces the total spill pressure, and makes better use of it's (large)
instruction set. Though the X86 backend doesn't know how to rewrite many
instructions yet, this already makes a substantial difference on 176.gcc for
example:
Before:
Time:
8.0099 ( 31.2%) 0.0100 ( 12.5%) 8.0199 ( 31.2%) 7.7186 ( 30.0%) Local Register Allocator
Code quality:
734559 asm-printer - Number of machine instrs printed
111395 ra-local - Number of registers reloaded
79902 ra-local - Number of registers spilled
231554 x86-peephole - Number of peephole optimization performed
After:
Time:
7.8700 ( 30.6%) 0.0099 ( 19.9%) 7.8800 ( 30.6%) 7.7892 ( 30.2%) Local Register Allocator
Code quality:
733083 asm-printer - Number of machine instrs printed
2379 ra-local - Number of reloads fused into instructions
109046 ra-local - Number of registers reloaded
79881 ra-local - Number of registers spilled
230658 x86-peephole - Number of peephole optimization performed
So by fusing 2300 instructions, we reduced the static number of instructions
by 1500, and reduces the number of peepholes (and thus the work) by about 900.
This also clearly reduces the number of reload/spill instructions that are
emitted.
llvm-svn: 11542
MRegisterInfo::getNumRegs() instead of
MRegisterInfo::FirstVirtualRegister.
Also use MRegisterInfo::is{Physical,Virtual}Register where
appropriate.
llvm-svn: 11477
Rename SetMachineOperandConst's formal parameters to match other methods here.
Mark some methods as being used only by the SPARC back-end.
Fix a missing-paren bug in OutputValue().
llvm-svn: 11363
MachineBasicBlock. Also change opcode to a short and numImplicitRefs
to an unsigned char so that overall MachineInstr's size stays the
same.
llvm-svn: 11357
ilist of MachineInstr objects. This allows constant time removal and
insertion of MachineInstr instances from anywhere in each
MachineBasicBlock. It also allows for constant time splicing of
MachineInstrs into or out of MachineBasicBlocks.
llvm-svn: 11340
instead of randomly groping about inside its outEdges array.
Make SchedGraph::addDummyEdges() use getNumOutEdges() instead of
outEdges.size().
Get rid of ifdefed-out code in SchedGraph::buildGraph().
llvm-svn: 11238
the Virt2PhysRegMap std::map with an std::vector. This speeds up the
register allocator another (almost) 40%, from .72->.45s in a release build
of LLC on 253.perlbmk.
llvm-svn: 11219
from physical registers, and they are always dense, it makes sense to not have
a ton of RBtree overhead. This change speeds up regalloclocal about ~30% on
253.perlbmk, from .35s -> .27s in the JIT (in LLC, it goes from .74 -> .55).
Now live variable analysis is the slowest codegen pass. Of course it doesn't
help that we have to run it twice, because regalloclocal doesn't update it,
but even if it did it would be the slowest pass (now it's just the 2x slowest
pass :(
llvm-svn: 11215
slots each. As a concequence they get numbered as 0, 2, 4 and so
on. The first slot is used for operand uses and the second for
defs. Here's an example:
0: A = ...
2: B = ...
4: C = A + B ;; last use of A
The live intervals should look like:
A = [1, 5)
B = [3, x)
C = [5, y)
llvm-svn: 11141