misched used GetUnderlyingObject in order to break false load/store
dependencies, and the -enable-aa-sched-mi feature similarly relied on
GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis.
Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so
(especially due to LSR) all of these mechanisms failed for
induction-variable-dependent loads and stores inside loops.
This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects
(which will recurse through phi and select instructions) in misched.
Andy reviewed, tested and simplified this patch; Thanks!
llvm-svn: 169744
Before this patch, when you objdump an LLVM-compiled file, objdump tried to
decode data-in-code sections as if they were code. This patch adds the missing
Mapping Symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D).
Patch based on work by Greg Fitzgerald.
llvm-svn: 169609
This is much simpler to reason about, more efficient, and
fixes some corner cases involving implicit super-register defs.
Fixed rdar://12797931.
llvm-svn: 169425
The count attribute is more accurate with regards to the size of an array. It
also obviates the upper bound attribute in the subrange. We can also better
handle an unbound array by setting the count to -1 instead of the lower bound to
1 and upper bound to 0.
llvm-svn: 169312
on 64-bit PowerPC ELF.
The patch includes code to handle external assembly and MC output with the
integrated assembler. It intentionally does not support the "old" JIT.
For the initial-exec TLS model, the ABI requires the following to calculate
the address of external thread-local variable x:
Code sequence Relocation Symbol
ld 9,x@got@tprel(2) R_PPC64_GOT_TPREL16_DS x
add 9,9,x@tls R_PPC64_TLS x
The register 9 is arbitrary here. The linker will replace x@got@tprel
with the offset relative to the thread pointer to the generated GOT
entry for symbol x. It will replace x@tls with the thread-pointer
register (13).
The two test cases verify correct assembly output and relocation output
as just described.
PowerPC-specific selection node variants are added for the two
instructions above: LD_GOT_TPREL and ADD_TLS. These are inserted
when an initial-exec global variable is encountered by
PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to
machine instructions LDgotTPREL and ADD8TLS. LDgotTPREL is a pseudo
that uses the same LDrs support added for medium code model's LDtocL,
with a different relocation type.
The rest of the processing is straightforward.
llvm-svn: 169281
The count field is necessary because there isn't a difference between the 'lo'
and 'hi' attributes for a one-element array and a zero-element array. When the
count is '0', we know that this is a zero-element array. When it's >=1, then
it's a normal constant sized array. When it's -1, then the array is unbounded.
llvm-svn: 169218
the alignment is clamped to TargetFrameLowering.getStackAlignment if the target
does not support stack realignment or the option "realign-stack" is off.
This will cause miscompile if the address is treated as aligned and add is
replaced with or in DAGCombine.
Added a bool StackRealignable to TargetFrameLowering to check whether stack
realignment is implemented for the target. Also added a bool RealignOption
to MachineFrameInfo to check whether the option "realign-stack" is on.
rdar://12713765
llvm-svn: 169197
The TwoAddressInstructionPass takes the machine code out of SSA form by
expanding REG_SEQUENCE instructions into copies. It is no longer
necessary to rewrite the registers used by a REG_SEQUENCE instruction
because the new coalescer algorithm can do it now.
REG_SEQUENCE is just converted to a sequence of sub-register copies now.
llvm-svn: 169067
Codegen was failing with an assertion because of unexpected vector
operands when legalizing the selection DAG for a MUL instruction.
The asserting code was legalizing multiplies for vectors of size 128
bits. It uses a custom lowering to try and detect cases where it can
use a VMULL instruction instead of a VMOVL + VMUL. The code was
looking for input operands to the MUL that had been sign or zero
extended. If it found the extended operands it would drop the
sign/zero extension and use the original vector size as input to a
VMULL instruction.
The code assumed that the original input vector was 64 bits so that
after dropping the extension it would fit directly into a D register
and could be used as an operand of a VMULL instruction. The input
code that trigger the failure used a vector of <4 x i8> that was
sign extended to <4 x i32>. It was not safe to drop the sign
extension in this case because the original vector is only 32 bits
wide. The fix is to insert a sign extension for the vector to reach
the required 64 bit size. In this particular example, the vector would
need to be sign extented to a <4 x i16>.
llvm-svn: 169024
instruction (vmaddfp) to conform with IEEE to ensure the sign of a zero
result when resulting product is -0.0.
The -0.0 vector addend to vmaddfp is generated by a creating a vector
with full bits sets and then shifting each elements by 31-bits to the
left, resulting in a vector of 0x80000000 (or -0.0 as float).
The 'buildvec_canonicalize.ll' was adjusted to reflect this change and
the 'vec_mul.ll' was complemented with the float vector multiplication
test.
llvm-svn: 168998
the last invoke instruction in the function. This also removes the last landing
pad in an function. This is fine, but with SjLj EH code, we've already placed a
bunch of code in the 'entry' block, which expects the landing pad to stick
around.
When we get to the situation where CGP has removed the last landing pad, go
ahead and nuke the SjLj instructions from the 'entry' block.
<rdar://problem/12721258>
llvm-svn: 168930
If we need to split the operand of a VSELECT, it must be the mask operand. We
split the entire VSELECT operand with EXTRACT_SUBVECTOR.
llvm-svn: 168883
This is a simple, cheap infrastructure for analyzing the shape of a
DAG. It recognizes uniform DAGs that take the shape of bottom-up
subtrees, such as the included matrix multiplication example. This is
useful for heuristics that balance register pressure with ILP. Two
canonical expressions of the heuristic are implemented in scheduling
modes: -misched-ilpmin and -misched-ilpmax.
llvm-svn: 168773
This fixes a hole in the "cheap" alias analysis logic implemented within
the DAG builder itself, regardless of whether proper alias analysis is
enabled. It now handles this pattern produced by LSR+CodeGenPrepare.
%sunkaddr1 = ptrtoint * %obj to i64
%sunkaddr2 = add i64 %sunkaddr1, %lsr.iv
%sunkaddr3 = inttoptr i64 %sunkaddr2 to i32*
store i32 %v, i32* %sunkaddr3
llvm-svn: 168768
When the CodeGenInfo is to be created for the PPC64 target machine,
a default code-model selection is converted to CodeModel::Medium
provided we are not targeting the Darwin OS. Defaults for Darwin
are unaffected.
llvm-svn: 168747
boundaries.
Given the following case:
BB0
%vreg1<def> = SUBrr %vreg0, %vreg7
%vreg2<def> = COPY %vreg7
BB1
%vreg10<def> = SUBrr %vreg0, %vreg2
We should be able to CSE between SUBrr in BB0 and SUBrr in BB1.
rdar://12462006
llvm-svn: 168717
when the destination register is wider than the memory load.
These load instructions load from m32 or m64 and set the upper bits to zero,
while the folded instructions may accept m128.
rdar://12721174
llvm-svn: 168710