uninteresting, just put all the operands on one list and make
GenerateReassociations make the decision about what's interesting.
This is simpler, and it avoids an extra ScalarEvolution::getAddExpr call.
llvm-svn: 111133
implementations of equality comparison and hash computation. This
can be used to optimize node lookup by avoiding creating lots of
temporary ID values just for hashing and comparison purposes.
llvm-svn: 111130
- Eliminate redundant successors.
- Convert an indirectbr with one successor into a direct branch.
Also, generalize SimplifyCFG to be able to be run on a function entry block.
It knows quite a few simplifications which are applicable to the entry
block, and it only needs a few checks to avoid trouble with the entry block.
llvm-svn: 111060
experimental pass that allocates locals relative to one another before
register allocation and then assigns them to actual stack slots as a block
later in PEI. This will eventually allow targets with limited index offset
range to allocate additional base registers (not just FP and SP) to
more efficiently reference locals, as well as handle situations where
locals cannot be referenced via SP or FP at all (dynamic stack realignment
together with variable sized objects, for example). It's currently
incomplete and almost certainly buggy. Work in progress.
Disabled by default and gated via the -enable-local-stack-alloc command
line option.
rdar://8277890
llvm-svn: 111059
The earliestStart argument is entirely specific to linear scan allocation, and
can be easily calculated by RegAllocLinearScan.
Replace std::vector with SmallVector.
llvm-svn: 111055
When a live range is contained a single block, we can split it around
instruction clusters. The current approach is very primitive, splitting before
and after the largest gap between uses.
llvm-svn: 111043
numbers match. The old check could accidentally leave holes in openli.
Also let useIntv add all ranges for the phi-def value inserted by
enterIntvAtEnd. This works as long at the value mapping is established in
enterIntvAtEnd.
llvm-svn: 110995
This can happen if the original interval has been broken into two disconnected
parts. Ideally, we should be able to detect when the graph is disconnected and
create separate intervals, but that code is not implemented yet.
Example:
Two basic blocks are both branching to a loop header. Our interval is defined in
both basic blocks, and live into the loop along both edges.
We decide to split the interval around the loop. The interval is split into an
inside part and an outside part. The outside part now has two disconnected
segments, one in each basic block.
If we later decide to split the outside interval into single blocks, we get one
interval per basic block and an empty dupli for the remainder.
llvm-svn: 110976
- Make foldMemoryOperandImpl aware of 256-bit zero vectors folding and support the 128-bit counterparts of AVX too.
- Make sure MOV[AU]PS instructions are only selected when SSE1 is enabled, and duplicate the patterns to match AVX.
- Add a testcase for a simple 128-bit zero vector creation.
llvm-svn: 110946
Before spilling a live range, we split it into a separate range for each basic
block where it is used. That way we only get one reload per basic block if the
new smaller ranges can allocate to a register.
This type of splitting is already present in the standard spiller.
llvm-svn: 110934
having it finish processing all of the muliply operands before
starting the whole getAddExpr process over again, instead of
immediately after the first simplification.
llvm-svn: 110916
by having it finish processing the whole operand list before
starting the whole getAddExpr process over again, instead of
immediately after the first duplicate is found.
llvm-svn: 110914
target triple and straightens it out. This does less than gcc's script
config.sub, for example it turns i386-mingw32 into i386--mingw32 not
i386-pc-mingw32, but it does a decent job of turning funky triples into
something that the rest of the Triple class can understand. The plan
is to use this to canonicalize triple's when they are first provided
by users, and have the rest of LLVM only deal with canonical triples.
Once this is done the special case workarounds in the Triple constructor
can be removed, making the class more regular and easier to use. The
comments and unittests for the Triple class are already adjusted in this
patch appropriately for this brave new world of increased uniformity.
llvm-svn: 110909
term goal here is to be able to match enough of vector_shuffle and build_vector
so all avx intrinsics which aren't mapped to their own built-ins but to
shufflevector calls can be codegen'd. This is the first (baby) step, support
building zeroed vectors.
llvm-svn: 110897
entry for ARM STRBT is actually a super-instruction for A8.6.199 STRBT A1 & A2.
Recover by looking for ARM:USAT encoding pattern before delegating to the auto-
gened decoder.
Added a "usat" test case to arm-tests.txt.
llvm-svn: 110894
When a register is defined by a partial load:
%reg1234:sub_32 = MOV32mr <fi#-1>; GR64:%reg1234
That load cannot be folded into an instruction using the full 64-bit register.
It would become a 64-bit load.
This is related to the recent change to have isLoadFromStackSlot return false on
a sub-register load.
llvm-svn: 110874
- remove ashr which never worked.
- fix lshr and shl and add tests.
- remove dead function "intersect1Wrapped".
- add a new sub method to subtract ranges, with test.
llvm-svn: 110861
that many of these things, so the memory savings isn't significant,
and there are now situations where there can be alignments greater
than 128.
llvm-svn: 110836
avoids trouble if the return type of TD->getPointerSize() is
changed to something which doesn't promote to a signed type,
and is simpler anyway.
Also, use getCopyFromReg instead of getRegister to read a
physical register's value.
llvm-svn: 110835
float t1(int argc) {
return (argc == 1123) ? 1.234f : 2.38213f;
}
We would generate truly awful code on ARM (those with a weak stomach should look
away):
_t1:
movw r1, #1123
movs r2, #1
movs r3, #0
cmp r0, r1
mov.w r0, #0
it eq
moveq r0, r2
movs r1, #4
cmp r0, #0
it ne
movne r3, r1
adr r0, #LCPI1_0
ldr r0, [r0, r3]
bx lr
The problem was that legalization was creating a cascade of SELECT_CC nodes, for
for the comparison of "argc == 1123" which was fed into a SELECT node for the ?:
statement which was itself converted to a SELECT_CC node. This is because the
ARM back-end doesn't have custom lowering for SELECT nodes, so it used the
default "Expand".
I added a fairly simple "LowerSELECT" to the ARM back-end. It takes care of this
testcase, but can obviously be expanded to include more cases.
Now we generate this, which looks optimal to me:
_t1:
movw r1, #1123
movs r2, #0
cmp r0, r1
adr r0, #LCPI0_0
it eq
moveq r2, #4
ldr r0, [r0, r2]
bx lr
.align 2
LCPI0_0:
.long 1075344593 @ float 2.382130e+00
.long 1067316150 @ float 1.234000e+00
llvm-svn: 110799
memory and synchronization barrier dmb and dsb instructions.
- Change instruction names to something more sensible (matching name of actual
instructions).
- Added tests for memory barrier codegen.
llvm-svn: 110785
(I discovered 2 more copies of the ARM instruction format list, bringing the
total to 4!! Two of them were already out of sync. I haven't yet gotten into
the disassembler enough to know the best way to fix this, but something needs
to be done.) Add support for encoding these instructions.
llvm-svn: 110754
Apply the same approach of SSE4.1 ptest intrinsics but
create a new x86 node "testp" since AVX introduces
vtest{ps}{pd} instructions which set ZF and CF depending
on sign bit AND and ANDN of packed floating-point sources.
This is slightly different from what the "ptest" does.
Tests comming with the other 256 intrinsics tests.
llvm-svn: 110744
patterns generated by clang for transpose of a matrix in generic vectors. This is made
of two parts:
1) Propagating vector extracts of hi/lo half into their users
2) Recognizing an insertion of even elements followed by the odd elements as an unpack.
Testcase to come, but this shrinks the # of shuffle instructions generated on x86 from ~40 to the minimal 8.
llvm-svn: 110734
operands. We don't currently have a hook to provide "the largest super class of
A where all registers' getSubReg(subidx) is valid and in B".
llvm-svn: 110730
The live interval may be used for a spill slot as well, and that spill slot
could be shared by split registers. We cannot shrink it, even if we know the
current register won't need the spill slot in that range.
llvm-svn: 110721
Also added a test case to check for the added benefit of this patch: it's optimizing away the unnecessary restore of sp from fp for some non-leaf functions.
llvm-svn: 110707
When splitting a live range, the new registers have fewer uses and the
permissible register class may be less constrained. Recompute the register class
constraint from the uses of new registers created for a split. This may let them
be allocated from a larger set, possibly avoiding a spill.
llvm-svn: 110703
reserved, not available for general allocation. This eliminates all the
extra checks for Darwin.
This change also fixes the use of FP to access frame indices in leaf
functions and cleaned up some confusing code in epilogue emission.
llvm-svn: 110655
register at a time. This turns out to be slightly faster than iterating over
instructions, but more importantly, it allows us to compute spill weights for
new registers created after the spill weight pass has run.
Also compute the allocation hint at the same time as the spill weight. This
allows us to use the spill weight as a cost metric for copies, and choose the
most profitable hint if there is more than one possibility.
The new hints provide a very small (< 0.1%) but universal code size improvement.
llvm-svn: 110631
previously collected info from the .file directives and outputs the encoded
bytes for it. For now this is only in the Mach-O streamer but at some point
will move to a more generic place.
llvm-svn: 110617
This will always be false before PEI:
(DisableFramePointerElim(MF) && MFI->adjustsStack())
Which means it's going to make r11 available as a general purpose register even
if -disable-fp-elim is specified. It's working on Darwin only because r7 is
always reserved. But it's obviously broken for other targets.
llvm-svn: 110614
Next time the build is broken due to wrong library dependencies, just
try building again (if you are on some Unix and are building all LLVM
targets) or ask someone to commit the regenerated LLVMLibDeps.cmake.
llvm-svn: 110593
If we are emitting COPY instructions for the REG_SEQUENCE, make sure the kill
flag goes on the last COPY. Otherwise we may be using a killed register.
<rdar://problem/8287792>
llvm-svn: 110589
relatively expensive comparison analyzer on each instruction. Also rename the
comparison analyzer method to something more in line with what it actually does.
This pass is will eventually be folded into the Machine CSE pass.
llvm-svn: 110539
necessary.
Sometimes, live range splitting doesn't shrink the current interval, but simply
changes some instructions to use a new interval. That makes the original more
suitable for spilling. In this case, we don't need to duplicate the original.
llvm-svn: 110481
implementation of the function is equivalent, so no need to provide
the target-specific version until/unless it needs to do something.
llvm-svn: 110465
After heavy editing of a live interval, it is much easier to simply renumber the
live values instead of trying to keep track of the unused ones.
llvm-svn: 110463
When a physical register is in use, some alias of that register has a live
interval with a relevant live range. That is the sad state of intervals after
physreg coalescing of subregs, and it is good enough for correct register
allocation.
llvm-svn: 110452
Without this what was happening was:
* R3 is not marked as "used"
* ARM backend thinks it has to save it to the stack because of vaarg
* Offset computation correctly ignores it
* Offsets are wrong
llvm-svn: 110446
Further clean up the comparison function by removing overly generalized
"domains".
Remove all understanding of ELF aliases and simplify folding code and comments.
llvm-svn: 110434
This pass tries to remove comparison instructions when possible. For instance,
if you have this code:
sub r1, 1
cmp r1, 0
bz L1
and "sub" either sets the same flag as the "cmp" instruction or could be
converted to set the same flag, then we can eliminate the "cmp" instruction all
together. This is a important for ARM where the ALU instructions could set the
CPSR flag, but need a special suffix ('s') to do so.
llvm-svn: 110423
AliasAnalysis base class and into BasicAliasAnalyais. This avoids confusion
about where such logic is happening when there are other AliasAnalysis
implementations present.
Move the logic for translating two-callsite getModRefInfo queries into
other AliasAnalysis queries out of BasicAliasAnalysis and into the
AliasAnalysis base class, as it is useful for other AliasAnalysis
implementations.
llvm-svn: 110421
LiveVariables becomes horribly wrong while the coalescer is running, but the
analysis is not zapped until after the coalescer pass has run. This causes tons
of false reports when calling verify form the coalescer.
llvm-svn: 110402
We verify that the LiveInterval is live at uses and defs, and that all
instructions have a SlotIndex.
Stuff we don't check yet:
- Is the LiveInterval minimal?
- Do all defs correspond to instructions or phis?
- Do all defs dominate all their live ranges?
- Are all live ranges continually reachable from their def?
llvm-svn: 110386
response from getModRefInfo is not useful here. Instead, check for identical
calls only in the NoModRef case.
Reapply r110270, and strengthen it to compensate for the memdep changes.
When both calls are readonly, there is no dependence between them.
llvm-svn: 110382
register for local access when it's closer to the stack slot being refererenced
than the stack pointer. Make sure to take into account any argument frame
SP adjustments that are in affect at the time.
rdar://8256090
llvm-svn: 110366
be killed before being redefined.
These checks are usually disabled, and usually fail when enabled. We de facto
allow live registers to be redefined without a kill, the corresponding
assertions in RegScavenger were removed long ago.
llvm-svn: 110362
to return Ref if the left callsite only reads memory read or written
by the right callsite; fix BasicAliasAnalysis to implement this.
Add AliasAnalysisEvaluator support for testing the two-callsite
form of getModRefInfo.
llvm-svn: 110270
simplify the call frame pseudo instructions. In that situation, the
calculations for estimating the stack size will be way off, leading to
not having an emergency spill slot when we need one. It should be possible
to be more precise about tracking the adjustment values, but not really
necessary for correctness. Upcoming cleanups for PEI in general will
render that moot.
llvm-svn: 110258