bits, use a union of a SimpleValueType enum and a regular Type*.
This increases the size of MVT on 64-bit hosts from 32 bits to 64 bits.
In most cases, this doesn't add significant overhead. There are places
in codegen that use arrays of MVTs, so these are now larger, but
they're small in common cases.
This eliminates restrictions on the size of integer types and vector
types that can be represented in codegen. As the included testcase
demonstrates, it's now possible to codegen very large add operations.
There are still some complications with using very large types. PR2880
is still open so they can't be used as return values on normal targets,
there are no libcalls defined for very large integers so operations
like multiply and divide aren't supported.
This also introduces a minimal tablgen Type library, capable of
handling IntegerType and VectorType. This will allow parts of
TableGen that don't depend on using SimpleValueType values to handle
arbitrary integer and vector types.
llvm-svn: 58623
so that va_start/va_arg/et.al. will walk arguments correctly for Cell SPU.
N.B.: Because neither clang nor llvm-gcc-4.2 can be built for CellSPU, this is
still unexorcised code.
llvm-svn: 58415
ppcf128 to i32 conversion and expand it into a code
sequence like in LegalizeDAG. This needs custom
ppc lowering of FP_ROUND_INREG, so turn that on and
make it work with LegalizeTypes. Probably PPC should
simply custom lower the original conversion.
llvm-svn: 58329
a memset using 16-byte XMM stores, but where the stack realignment code
didn't work. Until it does (PR2962) disable use of xmm regs in memcpy
and memset formation for linux and other targets with insufficiently
aligned stacks.
This is part of PR2888
llvm-svn: 58317
LHS is a foldable load, then LHS and RHS are swapped
and SetCCOpcode is changed to SETUGT. But the later
code is expecting operands to be the wrong way round
for SETUGT, but they are not in this case, resulting
in an inverted compare. The solution is to move the
load normalization before the correction for SETUGT.
This bug was tickled by LegalizeTypes which happened
to legalize the testcase slightly differently to
LegalizeDAG.
llvm-svn: 58092
in the 32-bit signed offset field of addresses. Even though this
may be intended, some linkers refuse to relocate code where the
relocated address computation overflows.
Also, fix the sign-extension of constant offsets to use the
actual pointer size, rather than the size of the GlobalAddress
node, which may be different, for example on x86-64 where MVT::i32
is used when the address is being fit into the 32-bit displacement
field.
llvm-svn: 57885
Where previously LLVM might emit code like this:
ucomisd %xmm1, %xmm0
setne %al
setp %cl
orb %al, %cl
jne .LBB4_2
it now emits this:
ucomisd %xmm1, %xmm0
jne .LBB4_2
jp .LBB4_2
It has fewer instructions and uses fewer registers, but it does
have more branches. And in the case that this code is followed by
a non-fallthrough edge, it may be followed by a jmp instruction,
resulting in three branch instructions in sequence. Some effort
is made to avoid this situation.
To achieve this, X86ISelLowering.cpp now recognizes FCMP_OEQ and
FCMP_UNE in lowered form, and replace them with code that emits
two branches, except in the case where it would require converting
a fall-through edge to an explicit branch.
Also, X86InstrInfo.cpp's branch analysis and transform code now
knows now to handle blocks with multiple conditional branches. It
uses loops instead of having fixed checks for up to two
instructions. It can now analyze and transform code generated
from FCMP_OEQ and FCMP_UNE.
llvm-svn: 57873
the copy instruction from the instruction list before asking the
target to create the new instruction. This gets the old instruction
out of the way so that it doesn't interfere with the target's
rematerialization code. In the case of x86, this helps it find
more cases where EFLAGS is not live.
Also, in the X86InstrInfo.cpp, teach isSafeToClobberEFLAGS to check
to see if it reached the end of the block after scanning each
instruction, instead of just before. This lets it notice when the
end of the block is only two instructions away, without doing any
additional scanning.
These changes allow rematerialization to clobber EFLAGS in more
cases, for example using xor instead of mov to set the return value
to zero in the included testcase.
llvm-svn: 57872
for strange asm conditions earlier. In this case, we have a
double being passed in an integer reg class. Convert to like
sized integer register so that we allocate the right number
for the class (two i32's for the f64 in this case).
llvm-svn: 57862
the previous patch this one actually passes make check.
"Fix PR2356 on PowerPC: if we have an input and output that are tied together
that have different sizes (e.g. i32 and i64) make sure to reserve registers for
the bigger operand."
llvm-svn: 57771
and add a TargetLowering hook for it to use to determine when this
is legal (i.e. not in PIC mode, etc.)
This allows instruction selection to emit folded constant offsets
in more cases, such as the included testcase, eliminating the need
for explicit arithmetic instructions.
This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp
that attempted to achieve the same effect, but wasn't as effective.
Also, fix handling of offsets in GlobalAddressSDNodes in several
places, including changing GlobalAddressSDNode's offset from
int to int64_t.
The Mips, Alpha, Sparc, and CellSPU targets appear to be
unaware of GlobalAddress offsets currently, so set the hook to
false on those targets.
llvm-svn: 57748
in 32-bit mode instead of assigning a register pair. This has nothing to
do with PR2356, but I happened to notice it while working on it.
llvm-svn: 57704
use a SUB instruction instead of an ADD, because -128 can be
encoded in an 8-bit signed immediate field, while +128 can't be.
This avoids the need for a 32-bit immediate field in this case.
A similar optimization applies to 64-bit adds with 0x80000000,
with the 32-bit signed immediate field.
To support this, teach tablegen how to handle 64-bit constants.
llvm-svn: 57663
shift counts, and patterns that match dynamic shift counts
when the subtract is obscured by a truncate node.
Add DAGCombiner support for recognizing rotate patterns
when the shift counts are defined by truncate nodes.
Fix and simplify the code for commuting shld and shrd
instructions to work even when the given instruction doesn't
have a parent, and when the caller needs a new instruction.
These changes allow LLVM to use the shld, shrd, rol, and ror
instructions on x86 to replace equivalent code using two
shifts and an or in many more cases.
llvm-svn: 57662
i.e. conditions that cannot be checked with a single instruction. For example,
SETONE and SETUEQ on x86.
- Teach legalizer to implement *illegal* setcc as a and / or of a number of
legal setcc nodes. For now, only implement FP conditions. e.g. SETONE is
implemented as SETO & SETNE, SETUEQ is SETUO | SETEQ.
- Move x86 target over.
llvm-svn: 57542
create a new DAG node to represent the new shift to keep the
DAG consistent, even though it'll almost always be folded into
the address.
If a user of the resulting address has multiple uses, the
nodes may get revisited by a later MatchAddress call, in which
case DAG inconsistencies do matter.
This fixes PR2849.
llvm-svn: 57465
instead.
So now: -fast-isel or -fast-isel=true enable fast-isel, and
-fast-isel=false disables it. Fast-isel is also on by default
with -fast, and off by default otherwise.
llvm-svn: 57270
are Inexact. (These are not Inexact as defined
by IEEE754, but that seems like a reasonable way
to abstract what happens: information is lost.)
llvm-svn: 57218
was setting kill flags on tied uses in two-address instructions.
The kill flags were causing the allocator to think it could
allocate the use and its tied def in different registers.
llvm-svn: 57039
"If a re-materializable instruction has a register
operand, the spiller will change the register operand's
spill weight to HUGE_VAL to avoid it being spilled.
However, if the operand is already in the queue ready
to be spilled, avoid re-materializing it".
llvm-svn: 56837
with an earlyclobber operand elsewhere. Propagate
this bit and the earlyclobber bit through SDISel.
Change linear-scan RA not to allocate regs in a way
that conflicts with an earlyclobber. See also comments.
llvm-svn: 56290
cases. See the comment above OptimizeSMax for the full story, and
the testcase for an example. This cancels out a pessimization
commonly attributed to indvars, and will allow us to lift some of
the artificial throttles in indvars, rather than add new ones.
llvm-svn: 56230
vr2 = OR vr0, vr1
=>
vr2 = OR vr1, vr1 // after coalescing vr0 with vr1
Update the value# of the destination register with the copy instruction if that happens.
llvm-svn: 56165
vr1024 = extract_subreg vr1025, 1
...
vr1024 = mov8rr AH
If vr1024 is coalesced with AH, the extract_subreg is now illegal since AH does not have a super-reg whose sub-register 1 is AH.
llvm-svn: 56118
i32>. This is a little messy, but it works.
We should really get rid of the intrinsics, though, since they map
perfectly well to standard LLVM instructions.
llvm-svn: 55864
instructions in CellSPU as "Expand" so that they won't be generated. I added a
"FIXME" so that this hack can be addressed and reverted once ISD::ROTR is
supported in the .td files.
llvm-svn: 55582
combiner can now generate ROTR if the backend says that it can handle it. Cell
SPU says this, but gets an error from code gen saying that it can't select
ROTR. I'm xfailing this test until this can be fixed.
llvm-svn: 55579