When a call is placed to spill an interval this spiller will first try to
break the interval up into its component values. Single value intervals and
intervals which have already been split (or are the result of previous splits)
are spilled by the default spiller.
Splitting intervals as described above may improve the performance of generated
code in some circumstances. This work is experimental however, and it still
miscompiles many benchmarks. It's not recommended for general use yet.
llvm-svn: 90951
This introduces a new pass, SlotIndexes, which is responsible for numbering
instructions for register allocation (and other clients). SlotIndexes numbering
is designed to match the existing scheme, so this patch should not cause any
changes in the generated code.
For consistency, and to avoid naming confusion, LiveIndex has been renamed
SlotIndex.
The processImplicitDefs method of the LiveIntervals analysis has been moved
into its own pass so that it can be run prior to SlotIndexes. This was
necessary to match the existing numbering scheme.
llvm-svn: 85979
a new class, MachineInstrIndex, which hides arithmetic details from
most clients. This is a step towards allowing the register allocator
to update/insert code during allocation.
llvm-svn: 81040
range's weight properly. This is turned off right now in the sense that
you'll get an assert if you get into a situation that can only be caused
by an iterative coalescer. All other code paths operate exactly as
before so there is no functional change with this patch. The asserts
should be disabled if/when an iterative coalescer gets added to trunk.
llvm-svn: 76680
as an (index,bool) pair. The bool flag records whether the kill is a
PHI kill or not. This code will be used to enable splitting of live
intervals containing PHI-kills.
A slight change to live interval weights introduced an extra spill
into lsr-code-insertion (outside the critical sections). The test
condition has been updated to reflect this.
llvm-svn: 75097
- Change register allocation hint to a pair of unsigned integers. The hint type is zero (which means prefer the register specified as second part of the pair) or entirely target dependent.
- Allow targets to specify alternative register allocation orders based on allocation hint.
Part 2.
- Use the register allocation hint system to implement more aggressive load / store multiple formation.
- Aggressively form LDRD / STRD. These are formed *before* register allocation. It has to be done this way to shorten live interval of base and offset registers. e.g.
v1025 = LDR v1024, 0
v1026 = LDR v1024, 0
=>
v1025,v1026 = LDRD v1024, 0
If this transformation isn't done before allocation, v1024 will overlap v1025 which means it more difficult to allocate a register pair.
- Even with the register allocation hint, it may not be possible to get the desired allocation. In that case, the post-allocation load / store multiple pass must fix the ldrd / strd instructions. They can either become ldm / stm instructions or back to a pair of ldr / str instructions.
This is work in progress, not yet enabled.
llvm-svn: 73381
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.
This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.
Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.
Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.
Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.
Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurrs with the new
instruction selection.
llvm-svn: 68576
1. Use the same value# to represent unknown values being merged into sub-registers.
2. When coalescer commute an instruction and the destination is a physical register, update its sub-registers by merging in the extended ranges.
llvm-svn: 66610
RA problem by expanding the live interval of an
earlyclobber def back one slot. Remove
overlap-earlyclobber throughout. Remove
earlyclobber bits and their handling from
live internals.
llvm-svn: 56539
the source register will be coalesced to the super register of the LHS. Properly
merge in the live ranges of the resulting coalesced interval that were part of
the original source interval to the live interval of the super-register.
llvm-svn: 42961
(almost) a register copy. However, it always coalesced to the register of the
RHS (the super-register). All uses of the result of a EXTRACT_SUBREG are sub-
register uses which adds subtle complications to load folding, spiller rewrite,
etc.
llvm-svn: 42899
Changes related modules so VNInfo's are not copied. This decrease
copy coalescing time by 45% and overall compilation time by 10% on siod.
llvm-svn: 41579
1. Eliminate the costly live interval "swapping".
2. Change ValueNumberInfo container from SmallVector to std::vector. The former
performs slowly when the vector size is very large.
llvm-svn: 41536
kill instruction #, and source register number (iff the value# is defined by a
copy).
- Now def instruction # is set for every value#, not just for copy defined ones.
- Update some outdated code related inactive live ranges.
- Kill info not yet set. That's next patch.
llvm-svn: 40913
rework the hacks that had us passing OStream in. We pass in std::ostream*
instead, check for null, and then dispatch to the correct print() method.
llvm-svn: 32636
Turn on -Wunused and -Wno-unused-parameter. Clean up most of the resulting
fall out by removing unused variables. Remaining warnings have to do with
unused functions (I didn't want to delete code without review) and unused
variables in generated code. Maintainers should clean up the remaining
issues when they see them. All changes pass DejaGnu tests and Olden.
llvm-svn: 31380
number of copies, potentially defining live ranges that appear to have
differing value numbers that become identical when coallsced. Among other
things, this fixes CodeGen/X86/shift-coalesce.ll and PR687.
llvm-svn: 29968
paves the way for future changes, increases coallescing opportunities (in
theory, not witnessed in practice), and eliminates the really expensive
LiveIntervals::overlapsAliases method.
llvm-svn: 29890
instructions which define each value#) to simplify and improve the coallescer.
In particular, this patch:
1. Implements iterative coallescing.
2. Reverts an unsafe hack from handlePhysRegDef, superceeding it with a
better solution.
3. Implements PR865, "coallescing" away the second copy in code like:
A = B
...
B = A
This also includes changes to symbolically print registers in intervals
when possible.
llvm-svn: 29862
(an unused method).
Fix the merger so that it can merge ranges like this [10:12)[16:40) with
[12:38) into [10:40) instead of bogus ranges. This sort of input will be
possible for the merger coming shortly
llvm-svn: 23865
Fix a *bug* in the extendIntervalEndTo method. In particular, if adding
[2:10) to an interval containing [0:2),[10:30), we produced [0:10),[10,30).
Which is not the most smart thing to do. Now produce [0:30).
llvm-svn: 23841
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
llvm-svn: 16137
aggressively coallesce live ranges even if they overlap. Consider this LLVM
code for example:
int %test(int %X) {
%Y = mul int %X, 1 ;; Codegens to Y = X
%Z = add int %X, %Y
ret int %Z
}
The mul is just there to get a copy into the code stream. This produces
this machine code:
(0x869e5a8, LLVM BB @0x869b9a0):
%reg1024 = mov <fi#-2>, 1, %NOREG, 0 ;; "X"
%reg1025 = mov %reg1024 ;; "Y" (subsumed by X)
%reg1026 = add %reg1024, %reg1025
%EAX = mov %reg1026
ret
Note that the life times of reg1024 and reg1025 overlap, even though they
contain the same value. This results in this machine code:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, %EAX
add %EAX, %ECX
ret
Another, worse case involves loops and PHI nodes. Consider this trivial loop:
testcase:
int %test2(int %X) {
entry:
br label %Loop
Loop:
%Y = phi int [%X, %entry], [%Z, %Loop]
%Z = add int %Y, 1
%cond = seteq int %Z, 100
br bool %cond, label %Out, label %Loop
Out:
ret int %Z
}
Because of interactions between the PHI elimination pass and the register
allocator, this got compiled to this code:
test2:
mov %ECX, DWORD PTR [%ESP + 4]
.LBBtest2_1:
*** mov %EAX, %ECX
inc %EAX
cmp %EAX, 100
*** mov %ECX, %EAX
jne .LBBtest2_1
ret
Or on powerpc, this code:
_test2:
mflr r0
stw r0, 8(r1)
stwu r1, -60(r1)
.LBB_test2_1:
addi r2, r3, 1
cmpwi cr0, r2, 100
*** or r3, r2, r2
bne cr0, .LBB_test2_1
*** or r3, r2, r2
lwz r0, 68(r1)
mtlr r0
addi r1, r1, 60
blr 0
With this improvement in place, we now generate this code for these two
testcases, which is what we want:
test:
mov %EAX, DWORD PTR [%ESP + 4]
add %EAX, %EAX
ret
test2:
mov %EAX, DWORD PTR [%ESP + 4]
.LBBtest2_1:
inc %EAX
cmp %EAX, 100
jne .LBBtest2_1 # Loop
ret
Or on PPC:
_test2:
mflr r0
stw r0, 8(r1)
stwu r1, -60(r1)
.LBB_test2_1:
addi r3, r3, 1
cmpwi cr0, r3, 100
bne cr0, .LBB_test2_1
lwz r0, 68(r1)
mtlr r0
addi r1, r1, 60
blr 0
Static numbers for spill code loads/stores/reg-reg copies (smaller is better):
em3d: before: 47/25/26 after: 44/22/24
164.gzip: before: 433/245/310 after: 403/231/278
175.vpr: before: 3721/2189/1581 after: 4144/2081/1423
176.gcc: before: 26195/8866/9235 after: 25942/8082/8275
186.crafty: before: 4295/2587/3079 after: 4119/2519/2916
252.eon: before: 12754/7585/5803 after: 12508/7425/5643
256.bzip2: before: 463/226/315 after: 482:241/309
Runtime perf number samples on X86:
gzip: before: 41.09 after: 39.86
bzip2: runtime: before: 56.71s after: 57.07s
gcc: before: 6.16 after: 6.12
eon: before: 2.03s after: 2.00s
llvm-svn: 15194