TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.
Machine LICM:
1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
2. Fix a bug which prevent CSE of instructions which are not re-materializable.
3. Use improved form of produceSameValue.
ARM:
1. Teach ARM produceSameValue to look pass some PIC labels.
2. Look for operands from different loads of different constant pool entries
which have same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
to re-materialize the instruction, allow machine LICM to hoist the set of
instructions out of the loop and make it possible to CSE them. It's a bit
hacky, but it significantly improve code quality.
4. Some minor bug fixes as well.
With the fixes, using movw + movt to materialize GAs significantly outperform the
load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap
and 176.gcc ~10%.
llvm-svn: 123905
with an invalid type then split the result and perform the overflow check
normally.
Fixes the 32-bit parts of rdar://8622122 and rdar://8774702.
llvm-svn: 123864
movw r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
movt r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
LPC0_0:
add r0, pc, r0
It's not yet enabled by default as some tests are failing. I suspect bugs in
down stream tools.
llvm-svn: 123619
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.
The test changes are needed to keep those spill-q tests from testing aligned
spills and restores. If the only aligned stack objects are spill slots, we
no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.
llvm-svn: 122995
If the basic block containing the BCCi64 (or BCCZi64) instruction ends with
an unconditional branch, that branch needs to be deleted before appending
the expansion of the BCCi64 to the end of the block.
llvm-svn: 122521
Type legalization splits up i64 values into pairs of i32 values, which leads
to poor quality code when inserting or extracting i64 vector elements.
If the vector element is loaded or stored, it can be treated as an f64 value
and loaded or stored directly from a VPR register. Use the pre-legalization
DAG combiner to cast those vector elements to f64 types so that the type
legalizer won't mess them up. Radar 8755338.
llvm-svn: 122319
may be called. If the entry block is empty, the insertion point iterator will be
the "end()" value. Calling ->getParent() on it (among others) causes problems.
Modify materializeFrameBaseRegister to take the machine basic block and insert
the frame base register at the beginning of that block. (It's very similar to
what the code does all ready. The only difference is that it will always insert
at the beginning of the entry block instead of after a previous materialization
of the frame base register. I doubt that that matters here.)
<rdar://problem/8782198>
llvm-svn: 122104
BUILD_VECTOR operands where the element type is not legal. I had previously
changed this code to insert TRUNCATE operations, but that was just wrong.
llvm-svn: 122102
Alignments smaller than the total size of the memory being loaded or stored,
unless the alignment is 8 bytes, are not allowed. Add tests for this, too.
llvm-svn: 121506
Added test to check bl __aeabi_read_tp gets emitted properly for ELF/ASM
as well as ELF/OBJ (including fixup)
Also added support for ELF::R_ARM_TLS_IE32
llvm-svn: 121312
vpush instructions to save / restore VFP / NEON registers like this:
vpush {d8,d10,d11}
vpop {d8,d10,d11}
vpush and vpop do not allow gaps in the register list.
rdar://8728956
llvm-svn: 121197
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
additional pipeline stall. So it's frequently better to single codegen
vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
vmla + vmla is very bad. But this isn't ideal either:
vmul
vadd
vmla
Instead, we want to expand the second vmla:
vmla
vmul
vadd
Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
faster.
Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
vmla / vmls will trigger one of the special hazards.
Work in progress, only A+B are enabled.
llvm-svn: 120960
Lifted adjustFixupValue() from Darwin for sharing w ELF.
Test added
TODO:
refactor ELFObjectWriter::RecordRelocation more.
Possibly share more code with Darwin?
Lots more relocations...
llvm-svn: 120534
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777
llvm-svn: 120501
We need to check if the individual vector elements are sign/zero-extended
values. For now this only handles constants values. Radar 8687140.
llvm-svn: 120034
state. Previously Thumb2 would restore sp from fp like this:
mov sp, r7
sub, sp, #4
If an interrupt is taken after the 'mov' but before the 'sub', callee-saved
registers might be clobbered by the interrupt handler. Instead, try
restoring directly from sp:
add sp, #4
Or, if necessary (with VLA, etc.) use a scratch register to compute sp and
then restore it:
sub.w r4, r7, #8
mov sp, r7
rdar://8465407
llvm-svn: 119977
Remove movePastCSLoadStoreOps and associated code for simple pointer
increments. Update routines that depended upon other opcodes for save/restore.
Adjust all testcases accordingly.
llvm-svn: 119725
It is generally not sufficient to check if the starting offset is in range
of the maximum offset that can be efficiently used for the target.
llvm-svn: 119565
This makes it more clear that the symbol is an internal, compiler-generated
name and gives a little more description about its contents.
llvm-svn: 119564