experimental pass that allocates locals relative to one another before
register allocation and then assigns them to actual stack slots as a block
later in PEI. This will eventually allow targets with limited index offset
range to allocate additional base registers (not just FP and SP) to
more efficiently reference locals, as well as handle situations where
locals cannot be referenced via SP or FP at all (dynamic stack realignment
together with variable sized objects, for example). It's currently
incomplete and almost certainly buggy. Work in progress.
Disabled by default and gated via the -enable-local-stack-alloc command
line option.
rdar://8277890
llvm-svn: 111059
The earliestStart argument is entirely specific to linear scan allocation, and
can be easily calculated by RegAllocLinearScan.
Replace std::vector with SmallVector.
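A minimal sketch of the pattern, with a placeholder element type rather than the actual member that was changed:

  #include "llvm/ADT/SmallVector.h"
  // Before: std::vector<unsigned> Regs;
  // After: keep the common small case inline on the stack and only fall
  // back to heap allocation when the vector grows past 8 elements.
  llvm::SmallVector<unsigned, 8> Regs;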
llvm-svn: 111055
When a live range is contained in a single block, we can split it around
instruction clusters. The current approach is very primitive, splitting before
and after the largest gap between uses.
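A simplified sketch of that heuristic, using plain integer instruction numbers in place of SlotIndexes (UseSlots and this loop are illustrative, not the actual SplitKit code):

  // Assume UseSlots holds the positions of the uses in ascending order.
  // Pick the largest gap between adjacent uses; the split points go just
  // after the use before the gap and just before the use after it.
  unsigned BestIdx = 0, BestGap = 0;
  for (unsigned i = 1, e = UseSlots.size(); i != e; ++i) {
    unsigned Gap = UseSlots[i] - UseSlots[i - 1];
    if (Gap > BestGap) { BestGap = Gap; BestIdx = i; }
  }
  // Split between UseSlots[BestIdx - 1] and UseSlots[BestIdx].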
llvm-svn: 111043
numbers match. The old check could accidentally leave holes in openli.
Also let useIntv add all ranges for the phi-def value inserted by
enterIntvAtEnd. This works as long as the value mapping is established in
enterIntvAtEnd.
llvm-svn: 110995
This can happen if the original interval has been broken into two disconnected
parts. Ideally, we should be able to detect when the graph is disconnected and
create separate intervals, but that code is not implemented yet.
Example:
Two basic blocks are both branching to a loop header. Our interval is defined in
both basic blocks, and live into the loop along both edges.
We decide to split the interval around the loop. The interval is split into an
inside part and an outside part. The outside part now has two disconnected
segments, one in each basic block.
If we later decide to split the outside interval into single blocks, we get one
interval per basic block and an empty dupli for the remainder.
llvm-svn: 110976
Before spilling a live range, we split it into a separate range for each basic
block where it is used. That way we only get one reload per basic block if the
new smaller ranges can allocate to a register.
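For example, a register used in three different blocks is split into three new intervals, each confined to its block; each per-block interval that gets a register needs only a single reload in that block rather than one at every use.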
This type of splitting is already present in the standard spiller.
llvm-svn: 110934
operands. We don't currently have a hook to provide "the largest super class of
A where all registers' getSubReg(subidx) is valid and in B".
llvm-svn: 110730
The live interval may be used for a spill slot as well, and that spill slot
could be shared by split registers. We cannot shrink it, even if we know the
current register won't need the spill slot in that range.
llvm-svn: 110721
When splitting a live range, the new registers have fewer uses and the
permissible register class may be less constrained. Recompute the register class
constraint from the uses of new registers created for a split. This may let them
be allocated from a larger set, possibly avoiding a spill.
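A rough sketch of the recomputation; the iteration and constraint helpers below are hypothetical stand-ins for the real operand-constraint queries, not actual LLVM APIs:

  // Intersect the register class constraints of every instruction that
  // uses the new register. Because the split register has fewer uses,
  // the intersection may be a larger class than the original's.
  const TargetRegisterClass *NewRC = mostPermissiveClassFor(NewReg);
  for (const MachineInstr &MI : instructionsUsing(NewReg))
    if (const TargetRegisterClass *OpRC = operandConstraint(MI, NewReg))
      NewRC = commonSubClass(NewRC, OpRC);
  MRI.setRegClass(NewReg, NewRC);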
llvm-svn: 110703
register at a time. This turns out to be slightly faster than iterating over
instructions, but more importantly, it allows us to compute spill weights for
new registers created after the spill weight pass has run.
Also compute the allocation hint at the same time as the spill weight. This
allows us to use the spill weight as a cost metric for copies, and choose the
most profitable hint if there is more than one possibility.
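A simplified sketch of combining the two computations; the weight formula and helpers are illustrative, not the real CalcSpillWeights logic:

  float Weight = 0;
  DenseMap<unsigned, float> HintBenefit; // candidate hint reg -> copy weight
  for (const MachineInstr &MI : instructionsUsing(Reg)) {
    float W = weightOfUse(MI); // e.g. scaled by the loop depth of MI's block
    Weight += W;
    if (unsigned Other = copyPartner(MI, Reg)) // MI copies Reg to/from Other
      HintBenefit[Other] += W;
  }
  // The most profitable hint is the candidate with the largest accumulated
  // copy weight, since eliminating those copies saves the most.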
The new hints provide a very small (< 0.1%) but universal code size improvement.
llvm-svn: 110631
If we are emitting COPY instructions for the REG_SEQUENCE, make sure the kill
flag goes on the last COPY. Otherwise we may be using a killed register.
<rdar://problem/8287792>
llvm-svn: 110589
relatively expensive comparison analyzer on each instruction. Also rename the
comparison analyzer method to something more in line with what it actually does.
This pass will eventually be folded into the Machine CSE pass.
llvm-svn: 110539
necessary.
Sometimes, live range splitting doesn't shrink the current interval, but simply
changes some instructions to use a new interval. That makes the original more
suitable for spilling. In this case, we don't need to duplicate the original.
llvm-svn: 110481
After heavy editing of a live interval, it is much easier to simply renumber the
live values instead of trying to keep track of the unused ones.
llvm-svn: 110463
When a physical register is in use, some alias of that register has a live
interval with a relevant live range. That is the sad state of intervals after
physreg coalescing of subregs, and it is good enough for correct register
allocation.
llvm-svn: 110452
This pass tries to remove comparison instructions when possible. For instance,
if you have this code:
sub r1, 1
cmp r1, 0
bz L1
and "sub" either sets the same flag as the "cmp" instruction or could be
converted to set the same flag, then we can eliminate the "cmp" instruction
altogether. This is important for ARM, where the ALU instructions can set the
CPSR flags but need a special suffix ('s') to do so.
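Concretely, the pair in the example could be rewritten as a single flag-setting subtraction (a "subs" on ARM) followed directly by the branch, with the "cmp" deleted.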
llvm-svn: 110423
LiveVariables becomes horribly wrong while the coalescer is running, but the
analysis is not zapped until after the coalescer pass has run. This causes tons
of false reports when calling verify from the coalescer.
llvm-svn: 110402
We verify that the LiveInterval is live at uses and defs, and that all
instructions have a SlotIndex.
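A simplified sketch of the first check; the iteration helper is a stand-in for the verifier's walk over the machine function:

  // For example, every instruction that uses LI.reg must read it at a
  // point where the interval is live.
  for (const MachineInstr &MI : instructionsUsing(LI.reg)) {
    SlotIndex Idx = Indexes->getInstructionIndex(&MI);
    assert(LI.liveAt(Idx) && "LiveInterval not live at use");
  }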
Stuff we don't check yet:
- Is the LiveInterval minimal?
- Do all defs correspond to instructions or phis?
- Do all defs dominate all their live ranges?
- Are all live ranges continually reachable from their def?
llvm-svn: 110386
be killed before being redefined.
These checks are usually disabled, and usually fail when enabled. We de facto
allow live registers to be redefined without a kill; the corresponding
assertions in RegScavenger were removed long ago.
llvm-svn: 110362
When the normalizeSpillWeights function was introduced, I forgot to remove this
normalization.
This change could affect register allocation. Hopefully for the better.
llvm-svn: 110119
multiple defs, like t2LDRSB_POST.
The first def could accidentally steal the physreg that the second, tied def was
required to be allocated to.
Now, the tied use-def is treated more like an early clobber, and the physreg is
reserved before allocating the other defs.
This would never be a problem when the tied def was the only def, which is the
usual case.
This fixes MallocBench/gs for thumb2 -O0.
llvm-svn: 109715
protectors, to be near the stack protectors on the stack. Accomplish this by
tagging the stack object with a predicate that indicates that it would trigger
this. In the prolog-epilog inserter, assign these objects to the stack after the
stack protector but before the other objects.
llvm-svn: 109481
appropriate for targets without detailed instruction itineraries.
The scheduler schedules for increased instruction level parallelism in
low register pressure situations; it schedules to reduce register pressure
when the register pressure becomes high.
On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
by 16%.
llvm-svn: 109300
to be of a different register class. For example, in Thumb1 if the live-in is
a high register, we want the vreg to be a low register. rdar://8224931
llvm-svn: 109291
it's too late to start backing off aggressive latency scheduling when most
of the registers are in use, so the threshold should be a bit tighter.
- Correctly handle live-outs, extract_subreg, etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
For ARM, this is almost always a win on # of instructions. It's runtime
neutral for most of the tests. But for some kernels with high register
pressure it can be a huge win, e.g. 464.h264ref reduced the number of spills by
54 and sped up by 20%.
llvm-svn: 109279
Make MDNode::destroy private.
Fix the one thing that used MDNode::destroy, outside of MDNode itself.
One should never delete or destroy an MDNode explicitly. MDNodes
implicitly go away when there are no references to them (implementation
details aside).
llvm-svn: 109028
This is a work in progress. So far we have some basic loop analysis to help
determine where it is useful to split a live range around a loop.
The actual loop splitting code from Splitter.cpp is also going to move in here.
llvm-svn: 108842
loop, for the reasons in the comments. This is a
major win on 253.perlbmk on ARM Darwin. I expect it
to be a good heuristic in general, but it's possible
some things will regress; I'll be watching.
7940152.
llvm-svn: 108792
- Unfortunate, but necessary for now to handle subtarget instruction matching. Eventually we should factor out the lower level target machine information so we don't need to do this.
llvm-svn: 108664
I am assured by people more knowledgeable than me that there are no rounding issues in eliminating this.
This fixed <rdar://problem/8197504>.
llvm-svn: 108639
Still very much under development. Comments and fixes will be forthcoming.
(This commit includes some small tweaks to LiveIntervals & LoopInfo to support the splitter)
llvm-svn: 108615
any command line parameter changed the register allocation produced by
PBQP.
Turns out variety is not the spice of life.
Fixed some comparators, added others. All good now.
llvm-svn: 108613
void foo() { __builtin_unreachable(); }
It will output the following on Darwin X86:
_foo:
Leh_func_begin0:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
Leh_func_end0:
This prolog adds a new Call Frame Information (CFI) row to the FDE with an
address that is not within the address range of the code it describes -- it is
equal to the end of the function -- and therefore results in an invalid EH
frame. If we emit a nop in this situation, then the CFI row falls within the
function's address range.
llvm-svn: 108568
since it doesn't work for front-ends which don't emit column information
(which includes llvm-gcc in its present configuration), and doesn't
work for clang for K&R style variables where the variables are declared
in a different order from the parameter list.
Instead, make a separate pass through the instructions to collect the
llvm.dbg.declare instructions in order. This ensures that the debug
information for variables is emitted in this order.
llvm-svn: 108538
occasions, caused code to be generated in a different order.
All cases I've seen involved float softening in the type
legalizer, and this could perhaps be fixed there, but
it's better not to generate things differently in the first
place. 7797940 (6/29/2010..7/15/2010).
llvm-svn: 108484
the function. We'll just turn it into a "trap" instruction instead.
The problem with not handling this is that it might generate a prologue without
the equivalent epilogue to go with it:
$ cat t.ll
define void @foo() {
entry:
unreachable
}
$ llc -o - t.ll -relocation-model=pic -disable-fp-elim -unwind-tables
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.align 4, 0x90
_foo: ## @foo
Leh_func_begin0:
## BB#0: ## %entry
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
Leh_func_end0:
...
The unwind tables then have bad data in them, causing all sorts of problems.
Fixes <rdar://problem/8096481>.
llvm-svn: 108473
-enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen FP math optimizations only care whether the arguments and results of FP arithmetic can never be NaN.
llvm-svn: 108465
to keep "Text" in sync with the "pure instructions" section attribute.
Lack of this attribute was preventing the assembler from emitting
multibyte noop instructions for templates (and inlines, and other
coalesced stuff) and was causing the assembler to mismatch .o files.
This fixes rdar://8018335
llvm-svn: 108461
independent of the order that isel happens to visit the dbg_declare
intrinsics. This fixes a bug in which the formal arguments were
being printed in reverse order, now that fast isel is going bottom up.
llvm-svn: 108369
LiveInterval::overlapsFrom dereferences end() if it is called on an empty
interval.
It would be reasonable to just return false -- an empty interval doesn't overlap
anything -- but I want to know who is doing it first.
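Presumably the guard looks something like the following one-liner (a sketch, not necessarily the exact assertion added):

  assert(!empty() && "LiveInterval::overlapsFrom called on an empty interval");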
llvm-svn: 108264
they already have one.
This fixes the himenobmtxpa miscompilation on ARM.
The PostRA scheduler got confused by the double memoperand and hoisted a stack
slot load above a store to the same slot.
llvm-svn: 108219
AggressiveAntiDepBreaker should not be using getPhysicalRegisterRegClass. An
instruction might be using a register that can only be replaced with one from
a subclass of the class returned by getPhysicalRegisterRegClass.
With this patch we use getMinimalPhysRegClass. This is correct, but
conservative. We should check the uses of the register and select the
largest register class that can be used in all of them.
llvm-svn: 108122
physical register can be allocated in the class of the virtual are sufficient.
I think that the test for virtual registers is more strict than it needs to be;
it should be possible to coalesce two virtual registers when the class of one
is a subclass of the other.
llvm-svn: 108118
getMinimalPhysRegClass. It was used to produce spills, and it is better to
use the most specific class if possible.
Update getLoadStoreRegOpcode to handle GR32_AD.
llvm-svn: 108115
Targets must now implement TargetInstrInfo::copyPhysReg instead. There is no
longer a default implementation forwarding to copyRegToReg.
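A minimal sketch of a target override; the target namespace and move opcode are placeholders, and the signature is only roughly the one introduced here:

  void XYZInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
                                 MachineBasicBlock::iterator I, DebugLoc DL,
                                 unsigned DestReg, unsigned SrcReg,
                                 bool KillSrc) const {
    // Emit this target's plain register-to-register move.
    BuildMI(MBB, I, DL, get(XYZ::MOVrr), DestReg)
        .addReg(SrcReg, getKillRegState(KillSrc));
  }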
llvm-svn: 108095
The first one was used just to call isSafeToMoveRegClassDefs. In
general, using a more specific reg class is better; in practice only
x86 implements that method and the results are always the same.
The second one is in FindFreeRegister and is used to check if a register
is in a register class; a much more direct call to contains is better, as
it should cover more cases and is faster.
llvm-svn: 108093