two spillers produce perfectly identical code (at least on povray and eon),
but the simple spiller is substantially faster than the local spiller. Once
the local spiller is improved, we can switch back.
Switching cuts 5.2% off of the llc time for povray (about 1.3s).
llvm-svn: 16608
use a simple vector. This speeds up -spiller=simple from taking 22s to taking
.1s on povray (debug build). This change does not modify the generated code.
llvm-svn: 16607
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
llvm-svn: 16137
lists. Instead of scanning the vector backwards, scan it forward and
swap each element we want to erase. Then at the end erase all removed
intervals at once. This doesn't save much: 0.08s out of 4s when
compiling 176.gcc.
llvm-svn: 16136
Regression.CodeGen.Generic.2004-04-09-SameValueCoalescing.llx and the
code size problem.
This bug prevented us from doing most register coallesces.
llvm-svn: 16031
with emitting .xwords when not on an 8-byte boundary (.xword 0 is not the
same as 8 .byte 0's). Because we do not know when or when we are not aligned,
just emit bytes like the old V9 asmprinter did.
llvm-svn: 16006
Add support for targets that must spill certain physregs at certain locations.
Patch contributed by Nate Begeman, slightly hacked by me.
llvm-svn: 15701
These side-effects seem to make a difference when using llc -march=sparcv9
in Release mode (i.e., with -DNDEBUG); when they are left out, lots of
instructions just get dropped on the floor, because they never end up
in the schedule.
llvm-svn: 15339
aggressively coallesce live ranges even if they overlap. Consider this LLVM
code for example:
int %test(int %X) {
%Y = mul int %X, 1 ;; Codegens to Y = X
%Z = add int %X, %Y
ret int %Z
}
The mul is just there to get a copy into the code stream. This produces
this machine code:
(0x869e5a8, LLVM BB @0x869b9a0):
%reg1024 = mov <fi#-2>, 1, %NOREG, 0 ;; "X"
%reg1025 = mov %reg1024 ;; "Y" (subsumed by X)
%reg1026 = add %reg1024, %reg1025
%EAX = mov %reg1026
ret
Note that the life times of reg1024 and reg1025 overlap, even though they
contain the same value. This results in this machine code:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, %EAX
add %EAX, %ECX
ret
Another, worse case involves loops and PHI nodes. Consider this trivial loop:
testcase:
int %test2(int %X) {
entry:
br label %Loop
Loop:
%Y = phi int [%X, %entry], [%Z, %Loop]
%Z = add int %Y, 1
%cond = seteq int %Z, 100
br bool %cond, label %Out, label %Loop
Out:
ret int %Z
}
Because of interactions between the PHI elimination pass and the register
allocator, this got compiled to this code:
test2:
mov %ECX, DWORD PTR [%ESP + 4]
.LBBtest2_1:
*** mov %EAX, %ECX
inc %EAX
cmp %EAX, 100
*** mov %ECX, %EAX
jne .LBBtest2_1
ret
Or on powerpc, this code:
_test2:
mflr r0
stw r0, 8(r1)
stwu r1, -60(r1)
.LBB_test2_1:
addi r2, r3, 1
cmpwi cr0, r2, 100
*** or r3, r2, r2
bne cr0, .LBB_test2_1
*** or r3, r2, r2
lwz r0, 68(r1)
mtlr r0
addi r1, r1, 60
blr 0
With this improvement in place, we now generate this code for these two
testcases, which is what we want:
test:
mov %EAX, DWORD PTR [%ESP + 4]
add %EAX, %EAX
ret
test2:
mov %EAX, DWORD PTR [%ESP + 4]
.LBBtest2_1:
inc %EAX
cmp %EAX, 100
jne .LBBtest2_1 # Loop
ret
Or on PPC:
_test2:
mflr r0
stw r0, 8(r1)
stwu r1, -60(r1)
.LBB_test2_1:
addi r3, r3, 1
cmpwi cr0, r3, 100
bne cr0, .LBB_test2_1
lwz r0, 68(r1)
mtlr r0
addi r1, r1, 60
blr 0
Static numbers for spill code loads/stores/reg-reg copies (smaller is better):
em3d: before: 47/25/26 after: 44/22/24
164.gzip: before: 433/245/310 after: 403/231/278
175.vpr: before: 3721/2189/1581 after: 4144/2081/1423
176.gcc: before: 26195/8866/9235 after: 25942/8082/8275
186.crafty: before: 4295/2587/3079 after: 4119/2519/2916
252.eon: before: 12754/7585/5803 after: 12508/7425/5643
256.bzip2: before: 463/226/315 after: 482:241/309
Runtime perf number samples on X86:
gzip: before: 41.09 after: 39.86
bzip2: runtime: before: 56.71s after: 57.07s
gcc: before: 6.16 after: 6.12
eon: before: 2.03s after: 2.00s
llvm-svn: 15194
same as the PHI use. This is not correct as the PHI use value is different
depending on which branch is taken. This fixes espresso with aggressive
coallescing, and perhaps others.
llvm-svn: 15189
us back to taking about 10.5s on gcc, instead of taking 15.6s! The net result
is that my big patches have hand no significant effect on compile time or code
quality. heh.
llvm-svn: 15156
Interval. This generalizes the isDefinedOnce mechanism that we used before
to help us coallesce ranges that overlap. As part of this, every logical
range with a different value is assigned a different number in the interval.
For example, for code that looks like this:
0 X = ...
4 X += ...
...
N = X
We now generate a live interval that contains two ranges: [2,6:0),[6,?:1)
reflecting the fact that there are two different values in the range at
different positions in the code.
Currently we are not using this information at all, so this just slows down
liveintervals. In the future, this will change.
Note that this change also substantially refactors the joinIntervalsInMachineBB
method to merge the cases for virt-virt and phys-virt joining into a single
case, adds comments, and makes the code a bit easier to follow.
llvm-svn: 15154