chain operand to point to the load being folded. Now we relax this: traverse
up the chain, and if it doesn't reach the load, then folding is OK. We will
create a TokenFactor (of all the chain operands and the load's chain) to
capture all the control flow dependencies.
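To make the new rule concrete, here is a minimal self-contained sketch in plain
C++ (toy Node/TokenFactor stand-ins invented for this note, not the actual
SelectionDAG code): walk up the chain operands of the would-be user; if the
walk never reaches the load, folding is safe, and a TokenFactor of all the
chains keeps every dependency:
#include <vector>

struct Node {
  std::vector<Node*> ChainOps;   // incoming chain operands
};

// Does any chain path starting at N lead back to Load?
static bool reachesLoad(const Node *N, const Node *Load) {
  if (N == Load)
    return true;
  for (const Node *Op : N->ChainOps)
    if (reachesLoad(Op, Load))
      return true;
  return false;
}

// If no chain operand of User reaches Load, folding is safe; merge all of
// User's chain operands plus the load's own chain into one TokenFactor-style
// node so that no control dependency is dropped.
static Node *tryFoldLoad(Node *User, Node *Load, Node *LoadChain) {
  for (Node *Op : User->ChainOps)
    if (reachesLoad(Op, Load))
      return nullptr;                       // folding would form a cycle
  Node *TokenFactor = new Node();
  TokenFactor->ChainOps = User->ChainOps;
  TokenFactor->ChainOps.push_back(LoadChain);
  return TokenFactor;
}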
llvm-svn: 30897
The dag/inst combiners often 'simplify' the masked value based on whether
or not the bits are live or known zero/one. This is good and dandy, but
often causes special case patterns to fail, such as alpha's CMPBGE pattern,
which looks like "(set GPRC:$RC, (setuge (and GPRC:$RA, 255), (and GPRC:$RB, 255)))".
Here the pattern for (and X, 255) should match actual dags like (and X, 254) if
the dag combiner proved that the missing bits are already zero (one for 'or').
For CodeGen/Alpha/cmpbge.ll:test2 for example, this results in:
sll $16,1,$0
cmpbge $0,$17,$0
ret $31,($26),1
instead of:
sll $16,1,$0
and $0,254,$0
and $17,255,$1
cmpule $1,$0,$0
ret $31,($26),1
... and requires no target-specific code.
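A hedged sketch of the intended matching rule (standalone C++ written for this
note; maskedAndMatches and its parameters are illustrative, not the dag
combiner or isel code):
#include <cassert>
#include <cstdint>

// A pattern written as (and X, PatMask) may also match an actual node
// (and X, ActMask) when every bit the pattern keeps but the actual mask
// drops is already known to be zero in X.
static bool maskedAndMatches(uint64_t PatMask, uint64_t ActMask,
                             uint64_t KnownZero) {
  if (ActMask & ~PatMask)                  // actual keeps bits the pattern clears
    return false;
  uint64_t Missing = PatMask & ~ActMask;   // bits the combiner dropped
  return (Missing & ~KnownZero) == 0;      // ...they must all be known zero
}

int main() {
  // cmpbge.ll:test2 above: X = Y << 1, so bit 0 of X is known zero and the
  // combiner turned (and X, 255) into (and X, 254); the pattern still matches.
  assert(maskedAndMatches(255, 254, /*KnownZero=*/0x1));
  // Without that knowledge, (and X, 254) must not match an (and X, 255) pattern.
  assert(!maskedAndMatches(255, 254, /*KnownZero=*/0x0));
  return 0;
}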
llvm-svn: 30871
AggregateString += "\0\0";
Doesn't add two nuls to the AggregateString (for obvious reasons), which
broke the asmprinter when the first character of an asm string was not
literal text.
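A minimal standalone reproduction of the problem, plus one possible fix
(illustrative C++; only the variable name is taken from the code above):
#include <cassert>
#include <string>

int main() {
  std::string AggregateString;
  // operator+=(const char*) treats its argument as a C string, so the
  // embedded nul terminates it immediately and nothing is appended.
  AggregateString += "\0\0";
  assert(AggregateString.size() == 0);
  // One way to really append two nul bytes:
  AggregateString.append(2, '\0');
  assert(AggregateString.size() == 2);
  return 0;
}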
llvm-svn: 30625
the branch's chain is also produced by cmp.
        [ch, r : ld]
         ^   ^
         |   |
[XX]-----/   \-----[flag : cmp]
  ^                      ^
  |                      |
  \------[br flag]------/
Remove an isel check which prevents loads from being folded into cmp / test
instructions.
2) Whenever possible, delete a selected node to allow more load folding
opportunities. Note that not all nodes can be deleted after they have been
selected: some may simply have morphed; some have not changed at all (e.g.
EntryToken).
llvm-svn: 30242
actually *removes* one of the operands, instead of just assigning both operands
the same register. This makes reasoning about instructions unnecessarily complex,
because you need to know whether you are before or after register allocation to
match up operand #'s with the target description file.
Changing this also gets rid of a bunch of hacky code in various places.
This patch also includes changes to fold loads into cmp/test instructions in
the X86 backend, along with a significant simplification to the X86 spill
folding code.
llvm-svn: 30108
- Clean up the code generated by tablegen:
* AddToISelQueue now takes one argument.
* ComplexPattern matching condition can now be shared.
* Eliminate passing unnecessary arguments to emit routines.
* Eliminate some unneeded SDOperand declarations in select routines.
* Other minor clean ups.
- This reduces the footprint slightly: X86ISelDAGToDAG.o is reduced from 971k
to 823k.
llvm-svn: 29892
introduced by previous commit.
- SelectCode now returns an SDNode*. If it is not null, the selected node
produces the same number of results as the input node. The selection loop
is responsible for calling ReplaceAllUsesWith() to replace the input node
with the output target node. For other cases, e.g. when a load is folded,
the selection code is responsible for calling ReplaceAllUsesOfValueWith(),
and SelectCode returns NULL.
- Other clean ups.
llvm-svn: 29602
in the start of an array and a count of operands where applicable. In many
cases, the number of operands is known, so this static array can be allocated
on the stack, avoiding the heap. In many other cases, a SmallVector can be
used, which has the same benefit in the common cases.
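A self-contained sketch of the difference (toy Value/makeNode names invented
for this note, not the real getNode signatures): a pointer plus a count lets a
caller with a known operand count pass a stack array, while the vector-taking
overload forces a heap allocation:
#include <vector>

struct Value { int Id; };

// New style: pointer to the start of an array plus an operand count.
static Value makeNode(unsigned Opcode, const Value *Ops, unsigned NumOps);

// Old style: always goes through a (typically heap-allocated) vector.
static Value makeNode(unsigned Opcode, const std::vector<Value> &Ops) {
  return makeNode(Opcode, Ops.data(), static_cast<unsigned>(Ops.size()));
}

static Value makeNode(unsigned Opcode, const Value *Ops, unsigned NumOps) {
  (void)Ops; (void)NumOps;
  return Value{static_cast<int>(Opcode)};
}

void example(Value LHS, Value RHS) {
  Value Ops[] = {LHS, RHS};        // operand count is known: stack storage
  makeNode(/*Opcode=*/1, Ops, 2);  // no heap traffic at all
}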
I updated a lot of code calling getNode that takes a vector, but ran out of
time. The rest of the code should be updated, and these methods should be
removed.
We should also do the same thing to eliminate the methods that take a
vector of MVT::ValueTypes.
It would be extra nice to convert the dagiselemitter to avoid creating vectors
for operands when calling getTargetNode.
llvm-svn: 29566
per possible ValueType of the node. e.g. Select_add is split into Select_add_i8,
Select_add_i16, etc.
For opcodes which do not produce a non-chain result, the select function is
split on the ValueType of the first non-chain operand, e.g. Select_store.
On X86 / Mac OS X, Select_store used to be the largest function. It had a stack
frame size of 8.5k. Now the largest one is Store_i32 with a frame size of 3.1k.
llvm-svn: 29404
instructions not handled would have a case value of #0, throwing things off.
This marginally shrinks the X86 asmprinter, but shrinks the sparc asmwriter
by 25 lines.
llvm-svn: 29187
series of identical commands, handle them all with one switch. In the case
of the x86 at&t asm printer, only 3 switches are needed for all instructions.
llvm-svn: 29184
based and less switch-statements-with-hundreds-of-cases based. This shrinks
the x86 asmprinters to about 1/3 their previous size.
Other improvements coming.
llvm-svn: 29177
code that emits target-specific nodes into emit functions that are uniquified
and shared among selection routines.
e.g. This reduces X86ISelDAGToDAG.o (release) from ~2M to ~1.5M. Stack frame
size of Select_store from ~13k down to ~8k.
This is the first step. Further work to enable more sharing will follow.
llvm-svn: 29158
and index into it, instead of emitting it like this:
static const char * const OpStrs[] = {
"PHINODE\n", // PHI
0, // INLINEASM
"adc ", // ADC32mi
"adc ", // ADC32mi8
...
The old way required thousands of relocations, which slow down link time and
dynamic load times.
This also cuts about 10K off each of the X86 asmprinters, and should shrink
the others as well.
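To make the relocation argument concrete, here is a small self-contained sketch
of the two encodings (abbreviated to three entries; only the strings are taken
from the snippet above, the rest is illustrative):
// Old way: an array of pointers.  Every entry is an address that must be
// patched at link/load time, i.e. one relocation per instruction.
static const char *const OpStrsPointers[] = {
  "PHINODE\n",
  "adc ",
  "adc ",
};

// New way: one contiguous string table plus plain integer offsets.  The
// offsets need no relocations, and identical strings can share one entry.
static const char OpStrTable[] =
  "PHINODE\n\0"
  "adc \0";
static const unsigned OpStrOffsets[] = {
  0,   // PHINODE\n
  9,   // adc  (ADC32mi)
  9,   // adc  (ADC32mi8 shares the same bytes)
};

static const char *getOpStr(unsigned Opcode) {
  return OpStrTable + OpStrOffsets[Opcode];
}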
llvm-svn: 29152
because information about one can help refine the other. This allows us to
write:
def : Pat<(i32 (extload xaddr:$src, i8)),
(LBZX xaddr:$src)>;
as:
def : Pat<(extload xaddr:$src, i8),
(LBZX xaddr:$src)>;
because tblgen knows LBZX returns i32.
llvm-svn: 28865
instruction, and the result type of the instruction to refine the pattern.
This allows us to write things like this:
def : Pat<(v2i64 (bitconvert (v16i8 VR128:$src))), (v2i64 VR128:$src)>;
as:
def : Pat<(v2i64 (bitconvert (v16i8 VR128:$src))), (VR128:$src)>
and fixes a ppc64 issue.
llvm-svn: 28863
operands. e.g.
def CALL32r : I<0xFF, MRM2r, (ops GR32:$dst, variable_ops),
"call {*}$dst", [(X86call GR32:$dst)]>;
TableGen should emit operand information for the "required" operands.
Added a target instruction info flag M_VARIABLE_OPS to indicate the target
instruction may have more operands in addition to the minimum required
operands.
llvm-svn: 28791
a cycle. This increases the search space and will increase compile time (in
practice the increase appears to be small, e.g. 176.gcc goes from 62 sec to 65
sec); this will be addressed later.
llvm-svn: 28476
patterns that look like this:
def : Pat<(i32 (X86Wrapper tconstpool :$dst)), (MOV32ri tconstpool :$dst)>;
InsertOneTypeCheck should copy the type from the resolved pattern to the
unresolved one as long as their types are different.
llvm-svn: 28389
1. The use expects a chain output.
2. The node is expanded into multiple target ops.
3. One of the inner nodes produces a chain, but the outermost one doesn't.
llvm-svn: 28209
x86 and ppc for 100% dense switch statements when relocations are non-PIC.
This support will be extended and enhanced in the coming days to support
PIC, and less dense forms of jump tables.
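As a rough illustration of why density matters (plain C++ written for this
note, not the lowering code): when the case values form a contiguous range with
no holes, the switch can be compiled into one bounds check plus an indexed jump
through a table, which the second function below spells out with data instead
of block addresses:
int dense(int x) {
  switch (x) {        // cases 0..3, 100% dense
  case 0: return 10;
  case 1: return 20;
  case 2: return 30;
  case 3: return 40;
  default: return -1;
  }
}

int denseAsTable(int x) {
  static const int Results[4] = {10, 20, 30, 40};
  // The real jump table holds basic-block addresses (absolute ones in the
  // non-PIC case); indexing replaces the chain of compares.
  return static_cast<unsigned>(x) < 4 ? Results[x] : -1;
}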
llvm-svn: 27947
validate the prototype of intrinsic functions. This prevents GCC from going
crazy and inlining too much stuff, eventually running out of memory.
llvm-svn: 27283
independently, batch up checks so that identically typed intrinsics share
verifier code. This dramatically reduces the size of the verifier function,
which should help avoid GCC running out of memory compiling Verifier.cpp.
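A conceptual sketch of the batching (toy enum and checks invented for this
note, not the actual Verifier code): identically typed intrinsics fall through
to a single shared check instead of each getting its own block:
enum IntrinsicID { ID_sqrt, ID_sin, ID_cos, ID_ctpop, ID_ctlz };

struct Prototype {
  int NumArgs;
  bool FloatingPoint;   // stand-in for full return/argument type checks
};

static bool verifyIntrinsic(IntrinsicID ID, const Prototype &P) {
  switch (ID) {
  case ID_sqrt:   // one unary floating-point signature shared by all three
  case ID_sin:
  case ID_cos:
    return P.NumArgs == 1 && P.FloatingPoint;
  case ID_ctpop:  // one unary integer signature shared by both
  case ID_ctlz:
    return P.NumArgs == 1 && !P.FloatingPoint;
  }
  return false;
}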
llvm-svn: 27281
mismatch against the enum table.
This is a part of Sabre's master plan to drive me nuts with subtle bugs that
happen to only affect the x86 be. :-)
llvm-svn: 27237