number of copies, potentially defining live ranges that appear to have
differing value numbers that become identical when coalesced. Among other
things, this fixes CodeGen/X86/shift-coalesce.ll and PR687.
llvm-svn: 29968
paves the way for future changes, increases coalescing opportunities (in
theory, not witnessed in practice), and eliminates the really expensive
LiveIntervals::overlapsAliases method.
llvm-svn: 29890
instructions which define each value#) to simplify and improve the coalescer.
In particular, this patch:
1. Implements iterative coalescing.
2. Reverts an unsafe hack from handlePhysRegDef, superseding it with a
better solution.
3. Implements PR865, "coalescing" away the second copy in code like:
A = B
...
B = A
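A minimal C++ sketch of why value numbers make the second copy removable, assuming a toy straight-line instruction list. The Inst struct and findRedundantCopies are hypothetical illustrations, not LLVM APIs: each register tracks the value number it currently holds, so "B = A" is redundant exactly when B already holds A's value.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy straight-line instruction (hypothetical, not LLVM's MachineInstr):
// "Dst = Src" is a copy; an empty Src means Dst gets a brand-new value.
struct Inst {
  std::string Dst;
  std::string Src;
};

// Returns the indices of copies whose destination already holds the same
// value number as their source, so the copy can be deleted.
std::vector<std::size_t> findRedundantCopies(const std::vector<Inst> &Code) {
  std::map<std::string, unsigned> ValNo;  // register -> current value number
  unsigned NextValNo = 0;
  std::vector<std::size_t> Redundant;
  for (std::size_t I = 0; I != Code.size(); ++I) {
    const Inst &MI = Code[I];
    assert(!MI.Dst.empty() && "every instruction defines a register");
    if (MI.Src.empty()) {
      ValNo[MI.Dst] = NextValNo++;  // fresh definition, fresh value number
      continue;
    }
    // A source never defined before is live-in: give it a fresh value number.
    auto SrcIt = ValNo.find(MI.Src);
    if (SrcIt == ValNo.end())
      SrcIt = ValNo.emplace(MI.Src, NextValNo++).first;
    auto DstIt = ValNo.find(MI.Dst);
    if (DstIt != ValNo.end() && DstIt->second == SrcIt->second)
      Redundant.push_back(I);  // e.g. "B = A" right after "A = B"
    else
      ValNo[MI.Dst] = SrcIt->second;  // the copy forwards the value number
  }
  return Redundant;
}
```

If B is redefined between the two copies, its value number changes and the second copy is kept.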
This also includes changes to symbolically print registers in intervals
when possible.
llvm-svn: 29862
(an unused method).
Fix the merger so that it can merge ranges like [10:12)[16:40) with
[12:38) into [10:40) instead of producing bogus ranges. This sort of input
will be possible for the merger coming shortly.
llvm-svn: 23865
Fix a *bug* in the extendIntervalEndTo method. In particular, if adding
[2:10) to an interval containing [0:2),[10:30), we produced [0:10),[10:30),
which is not the smartest thing to do. Now produce [0:30).
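The fix amounts to letting the extended range swallow any following range it now reaches. A C++ sketch, assuming an interval is stored as a sorted vector of half-open [Start, End) ranges; the Range struct and extendEndTo function are hypothetical stand-ins, not the actual LiveIntervals code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [Start, End).
struct Range {
  unsigned Start, End;
};

// Extend Ranges[Idx] so that it ends at NewEnd, merging every following
// range that the extended range now overlaps or touches. Ranges must be
// sorted and non-overlapping. (Hypothetical helper for illustration.)
void extendEndTo(std::vector<Range> &Ranges, std::size_t Idx, unsigned NewEnd) {
  assert(Idx < Ranges.size() && NewEnd >= Ranges[Idx].End);
  auto Cur = Ranges.begin() + Idx;
  auto Next = Cur + 1;
  // Swallow each later range that starts at or before the new end; a
  // swallowed range may push the end further out.
  while (Next != Ranges.end() && Next->Start <= NewEnd) {
    NewEnd = std::max(NewEnd, Next->End);
    ++Next;
  }
  Cur->End = NewEnd;
  Ranges.erase(Cur + 1, Next);  // drop the absorbed ranges in one step
}
```

With {[0:2), [10:30)}, extending the first range to 10 yields the single range [0:30) rather than two adjacent pieces.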
llvm-svn: 23841
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
llvm-svn: 16137
aggressively coalesce live ranges even if they overlap. Consider this LLVM
code for example:
int %test(int %X) {
%Y = mul int %X, 1 ;; Codegens to Y = X
%Z = add int %X, %Y
ret int %Z
}
The mul is just there to get a copy into the code stream. This produces
this machine code:
(0x869e5a8, LLVM BB @0x869b9a0):
%reg1024 = mov <fi#-2>, 1, %NOREG, 0 ;; "X"
%reg1025 = mov %reg1024 ;; "Y" (subsumed by X)
%reg1026 = add %reg1024, %reg1025
%EAX = mov %reg1026
ret
Note that the lifetimes of reg1024 and reg1025 overlap, even though they
contain the same value. This results in this machine code:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, %EAX
add %EAX, %ECX
ret
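The key observation is that overlap alone should not block a join; overlap between different values should. A C++ sketch of that check, assuming each interval is a sorted list of segments tagged with a value number and that the caller supplies a SameValue predicate (for example, "Y's value was defined by a copy of X's value"). Segment and canJoin are hypothetical names, and this is a simplification rather than the coalescer's actual algorithm:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One segment of a live interval: [Start, End) holding value number ValNo.
struct Segment {
  unsigned Start, End, ValNo;
};

// Returns true if two intervals (each a sorted list of non-overlapping
// segments) may be joined: wherever their segments overlap, SameValue must
// confirm they hold the same value. Overlap by itself is not a conflict.
template <typename Pred>
bool canJoin(const std::vector<Segment> &LHS, const std::vector<Segment> &RHS,
             Pred SameValue) {
  std::size_t I = 0, J = 0;
  while (I < LHS.size() && J < RHS.size()) {
    const Segment &A = LHS[I], &B = RHS[J];
    assert(A.Start < A.End && B.Start < B.End && "empty segment");
    bool Overlap = A.Start < B.End && B.Start < A.End;
    if (Overlap && !SameValue(A.ValNo, B.ValNo))
      return false;  // different values live at the same time: real conflict
    // The segment that ends first cannot overlap anything later in the other
    // list, so advance past it.
    if (A.End <= B.End)
      ++I;
    else
      ++J;
  }
  return true;
}
```

For reg1024 and reg1025 above, the segments overlap but hold the same value, so a check like this reports that the two registers can share one interval and the copy can go away.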
Another, worse case involves loops and PHI nodes. Consider this trivial loop
testcase:
int %test2(int %X) {
entry:
br label %Loop
Loop:
%Y = phi int [%X, %entry], [%Z, %Loop]
%Z = add int %Y, 1
%cond = seteq int %Z, 100
br bool %cond, label %Out, label %Loop
Out:
ret int %Z
}
Because of interactions between the PHI elimination pass and the register
allocator, this got compiled to this code:
test2:
mov %ECX, DWORD PTR [%ESP + 4]
.LBBtest2_1:
*** mov %EAX, %ECX
inc %EAX
cmp %EAX, 100
*** mov %ECX, %EAX
jne .LBBtest2_1
ret
Or on powerpc, this code:
_test2:
mflr r0
stw r0, 8(r1)
stwu r1, -60(r1)
.LBB_test2_1:
addi r2, r3, 1
cmpwi cr0, r2, 100
*** or r3, r2, r2
bne cr0, .LBB_test2_1
*** or r3, r2, r2
lwz r0, 68(r1)
mtlr r0
addi r1, r1, 60
blr 0
With this improvement in place, we now generate this code for these two
testcases, which is what we want:
test:
mov %EAX, DWORD PTR [%ESP + 4]
add %EAX, %EAX
ret
test2:
mov %EAX, DWORD PTR [%ESP + 4]
.LBBtest2_1:
inc %EAX
cmp %EAX, 100
jne .LBBtest2_1 # Loop
ret
Or on PPC:
_test2:
mflr r0
stw r0, 8(r1)
stwu r1, -60(r1)
.LBB_test2_1:
addi r3, r3, 1
cmpwi cr0, r3, 100
bne cr0, .LBB_test2_1
lwz r0, 68(r1)
mtlr r0
addi r1, r1, 60
blr 0
Static numbers for spill code loads/stores/reg-reg copies (smaller is better):
em3d: before: 47/25/26 after: 44/22/24
164.gzip: before: 433/245/310 after: 403/231/278
175.vpr: before: 3721/2189/1581 after: 4144/2081/1423
176.gcc: before: 26195/8866/9235 after: 25942/8082/8275
186.crafty: before: 4295/2587/3079 after: 4119/2519/2916
252.eon: before: 12754/7585/5803 after: 12508/7425/5643
256.bzip2: before: 463/226/315 after: 482/241/309
Runtime perf number samples on X86:
gzip: before: 41.09 after: 39.86
bzip2: before: 56.71s after: 57.07s
gcc: before: 6.16 after: 6.12
eon: before: 2.03s after: 2.00s
llvm-svn: 15194
us back to taking about 10.5s on gcc, instead of taking 15.6s! The net result
is that my big patches have had no significant effect on compile time or code
quality. heh.
llvm-svn: 15156
* Fix comment typo
* add dump() methods
* add a few new methods like getLiveRangeContaining, removeRange & joinable
(which is currently the same as overlaps)
* Remove the unused operator==
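A lookup like getLiveRangeContaining can be done with a binary search when the ranges are sorted and non-overlapping. A C++ sketch, assuming a plain vector of half-open ranges; the free-function form and the LiveRange struct here are illustrative, not the method's real signature:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct LiveRange {
  unsigned Start, End;  // half-open [Start, End)
};

// Returns the range containing Idx, or nullptr if no range does. Because the
// ranges are sorted by Start and never overlap, only the last range that
// starts at or before Idx can contain it, and upper_bound finds it in O(log n).
const LiveRange *getLiveRangeContaining(const std::vector<LiveRange> &Ranges,
                                        unsigned Idx) {
  // First range that starts strictly after Idx.
  auto It = std::upper_bound(
      Ranges.begin(), Ranges.end(), Idx,
      [](unsigned V, const LiveRange &R) { return V < R.Start; });
  if (It == Ranges.begin())
    return nullptr;  // Idx precedes every range
  --It;
  assert(It->Start <= Idx);
  return Idx < It->End ? &*It : nullptr;
}
```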
Bigger change:
* In LiveInterval, instead of using a boolean isDefinedOnce to keep track of
whether there is more than one definition in a particular interval, keep a
counter, NumValues, to track exactly how many there are.
* In LiveRange, add a new ValId element to indicate which of the numbered
values each LiveRange belongs to. We no longer merge LiveRanges that have
differing value IDs, even if they are neighbors.
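A C++ sketch of the bigger change, assuming a heavily simplified LiveInterval that only appends ranges in order (the real class supports arbitrary insertion). The member names NumValues and ValId follow the commit message; getNextValue and appendRange are hypothetical helpers:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified mirror of the structures described above.
struct LiveRange {
  unsigned Start, End;  // half-open [Start, End)
  unsigned ValId;       // which numbered value this range belongs to
};

struct LiveInterval {
  std::vector<LiveRange> Ranges;  // sorted, non-overlapping
  unsigned NumValues = 0;         // a counter instead of a bool isDefinedOnce

  // Allocate the next value number for a new definition.
  unsigned getNextValue() { return NumValues++; }

  // Append a range that starts at or after the last one. Touching neighbors
  // are merged only when they carry the same value ID.
  void appendRange(LiveRange LR) {
    assert(LR.ValId < NumValues && "unknown value number");
    if (!Ranges.empty()) {
      LiveRange &Last = Ranges.back();
      assert(LR.Start >= Last.End && "ranges must be appended in order");
      if (Last.End == LR.Start && Last.ValId == LR.ValId) {
        Last.End = LR.End;  // same value: just extend the neighbor
        return;
      }
    }
    Ranges.push_back(LR);  // different value (or a gap): keep it separate
  }
};
```

Keeping differently numbered neighbors apart preserves the information about where each value is live, which a single merged range would lose.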
llvm-svn: 15152
want to insert a new range into the middle of the vector, then delete ranges
one at a time next to the inserted one as they are merged.
Instead, if the inserted interval overlaps, just start merging. The only time
we insert into the middle of the vector is when we don't overlap at all. Also
delete blocks of live ranges if we overlap with many of them.
This patch speeds up joining by 0.7 seconds on a large testcase, but, more
importantly, gets all of the range-adding code into addRangeFrom.
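The insertion strategy can be sketched in C++ as follows, assuming a sorted, non-overlapping vector of half-open ranges and treating touching ranges as mergeable. The Range struct and addRange function are hypothetical; they only mirror the idea of inserting into the middle solely when nothing overlaps, and otherwise growing one existing range and erasing the absorbed block in a single call:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Range {
  unsigned Start, End;  // half-open [Start, End)
};

// Add [Start, End) to a sorted, non-overlapping vector of ranges.
void addRange(std::vector<Range> &Ranges, unsigned Start, unsigned End) {
  assert(Start < End && "empty range");
  // First range whose End reaches Start: the earliest one we could merge with.
  auto First = std::lower_bound(
      Ranges.begin(), Ranges.end(), Start,
      [](const Range &R, unsigned S) { return R.End < S; });
  // No overlap at all: the only case that inserts into the vector.
  if (First == Ranges.end() || First->Start > End) {
    Ranges.insert(First, Range{Start, End});
    return;
  }
  // Overlap: find the end of the block of ranges the new range reaches.
  auto Last = First;
  while (Last != Ranges.end() && Last->Start <= End)
    ++Last;
  // Grow First to cover the new range and the whole block...
  First->Start = std::min(First->Start, Start);
  First->End = std::max((Last - 1)->End, End);
  // ...then delete the rest of the block at once, not one range at a time.
  Ranges.erase(First + 1, Last);
}
```

For instance, adding [12:38) to [10:12)[16:40) leaves the single range [10:40), while adding [4:6) to [0:2)[10:12) inserts a new range between the two.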
llvm-svn: 15141
will soon be renamed) into their own file. The new file should not emit
DEBUG output or have other side effects. The LiveInterval class also no
longer knows whether it is working on registers or something else.
In the future we will want to use the LiveInterval class and friends to do
stack packing. In addition to a code simplification, this will allow us to
do it more easily.
llvm-svn: 15134