1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 22:12:47 +01:00
Commit Graph

1013 Commits

Author SHA1 Message Date
Torok Edwin
358888da3a Implement changes from Chris's feedback.
Finish converting lib/Target.

llvm-svn: 75043
2009-07-08 20:53:28 +00:00
Torok Edwin
980729667e Convert more abort() calls to llvm_report_error().
Also remove trailing semicolon.

llvm-svn: 75027
2009-07-08 19:04:27 +00:00
Torok Edwin
ad3be984b7 Start converting to new error handling API.
cerr+abort -> llvm_report_error
assert(0)+abort -> LLVM_UNREACHABLE (assert(0)+llvm_unreachable-> abort() included)

llvm-svn: 75018
2009-07-08 18:01:40 +00:00
Dale Johannesen
5487047295 Don't accept globals as matching 'i' constraint
in PIC modes (in accordance with existing comment).
gcc.apple/asm-block-25.c

llvm-svn: 74886
2009-07-07 00:18:49 +00:00
Tilmann Scheller
cea3c16aa5 Add NumFixedArgs attribute to CallSDNode which indicates the number of fixed arguments in a vararg call.
With the SVR4 ABI on PowerPC, vector arguments for vararg calls are passed differently depending on whether they are a fixed or a variable argument. Variable vector arguments always go into memory, fixed vector arguments are put 
into vector registers. If there are no free vector registers available, fixed vector arguments are put on the stack.

The NumFixedArgs attribute allows to decide for an argument in a vararg call whether it belongs to the fixed or variable portion of the parameter list.

llvm-svn: 74764
2009-07-03 06:44:53 +00:00
Bill Wendling
fdd5badace Update comments to make it clear that the function alignment is the Log2 of the
bytes and not bytes.

llvm-svn: 74624
2009-07-01 18:50:55 +00:00
Bill Wendling
c0fb316bd3 Add an "alignment" field to the MachineFunction object. It makes more sense to
have the alignment be calculated up front, and have the back-ends obey whatever
alignment is decided upon.

This allows for future work that would allow for precise no-op placement and the
like.

llvm-svn: 74564
2009-06-30 22:38:32 +00:00
David Greene
0bf8cb7487 Add a 256-bit register class and YMM registers.
llvm-svn: 74469
2009-06-29 22:50:51 +00:00
Owen Anderson
d0e12300d9 Add a target-specific DAG combine on X86 to fold the common pattern of
fence-atomic-fence down to just the atomic op.  This is possible thanks to
X86's relatively strong memory model, which guarantees that locked instructions
(which are used to implement atomics) are implicit fences.

llvm-svn: 74435
2009-06-29 18:04:45 +00:00
David Greene
21d2c76116 Add more vector ValueTypes for AVX and other extended vector instruction
sets.

llvm-svn: 74427
2009-06-29 16:47:10 +00:00
Chris Lattner
9571347ce0 pull @GOT, @GOTOFF, @GOTPCREL handling into isel from the asmprinter.
llvm-svn: 74378
2009-06-27 05:39:56 +00:00
Chris Lattner
19eb0dad26 Reimplement rip-relative addressing in the X86-64 backend. The new
implementation primarily differs from the former in that the asmprinter
doesn't make a zillion decisions about whether or not something will be
RIP relative or not.  Instead, those decisions are made by isel lowering
and propagated through to the asm printer.  To achieve this, we:

1. Represent RIP relative addresses by setting the base of the X86 addr
   mode to X86::RIP.
2. When ISel Lowering decides that it is safe to use RIP, it lowers to
   X86ISD::WrapperRIP.  When it is unsafe to use RIP, it lowers to
   X86ISD::Wrapper as before.
3. This removes isRIPRel from X86ISelAddressMode, representing it with
   a basereg of RIP instead.
4. The addressing mode matching logic in isel is greatly simplified.
5. The asmprinter is greatly simplified, notably the "NotRIPRel" predicate
   passed through various printoperand routines is gone now.
6. The various symbol printing routines in asmprinter now no longer infer
   when to emit (%rip), they just print the symbol.

I think this is a big improvement over the previous situation.  It does have
two small caveats though: 1. I implemented a horrible "no-rip" modifier for
the inline asm "P" constraint modifier.  This is a short term hack, there is
a much better, but more involved, solution.  2. I had to xfail an 
-aggressive-remat testcase because it isn't handling the use of RIP in the
constant-pool reading instruction.  This specific test is easy to fix without
-aggressive-remat, which I intend to do next.

llvm-svn: 74372
2009-06-27 04:16:01 +00:00
Chris Lattner
1f3d17f45d Move all the TLS processing logic into isel, don't do it in asmprinter at all.
llvm-svn: 74327
2009-06-26 21:20:29 +00:00
Chris Lattner
0a0494b4f9 move magic for PIC constantpool references from asmprinter to isel.
llvm-svn: 74313
2009-06-26 19:22:52 +00:00
Chris Lattner
05eb63598b start adding logic in isel to determine asm printer semantics, step N of M.
llvm-svn: 74246
2009-06-26 00:43:52 +00:00
Chris Lattner
e358de060d indentation fix
llvm-svn: 73840
2009-06-21 02:22:34 +00:00
Eli Friedman
c80a4f18de Misc accumulated tweaks to legalization logic for various targets.
llvm-svn: 73476
2009-06-16 06:40:59 +00:00
Chris Lattner
eb664fc504 I got J and K backward, many thanks to Eli for spotting this!
llvm-svn: 73372
2009-06-15 04:39:05 +00:00
Chris Lattner
e427a956ca implement support for the 'K' asm constraint, PR4347
llvm-svn: 73366
2009-06-15 04:01:39 +00:00
Arnold Schwaighofer
780e3addf8 Fix Bug 4278: X86-64 with -tailcallopt calling convention
out of sync with regular cc.

The only difference between the tail call cc and the normal
cc was that one parameter register - R9 - was reserved for
calling functions through a function pointer. After time the
tail call cc has gotten out of sync with the regular cc. 

We can use R11 which is also caller saved but not used as
parameter register for potential function pointers and
remove the special tail call cc on x86-64.

llvm-svn: 73233
2009-06-12 16:26:57 +00:00
Anton Korobeynikov
1447d902e3 Silence a warning
llvm-svn: 73152
2009-06-09 23:00:39 +00:00
Eli Friedman
1609a6524f Get rid of some unnecessary code.
llvm-svn: 73017
2009-06-07 07:28:45 +00:00
Eli Friedman
d4b463b0dc Slightly generalize the code that handles shuffles of consecutive loads
on x86 to handle more cases.  Fix a bug in said code that would cause it 
to read past the end of an object.  Rewrite the code in 
SelectionDAGLegalize::ExpandBUILD_VECTOR to be a bit more general. 
Remove PerformBuildVectorCombine, which is no longer necessary with 
these changes.  In addition to simplifying the code, with this change, 
we can now catch a few more cases of consecutive loads.

llvm-svn: 73012
2009-06-07 06:52:44 +00:00
Eli Friedman
4395222136 Avoid crashing on a variable-index insertelement with element type i16.
llvm-svn: 72991
2009-06-06 06:32:50 +00:00
Eli Friedman
e546f94ef5 Get rid of some bogus patterns for X86vzmovl. Don't create VZEXT_MOVL
nodes for vectors with an i16 element type.  Add an optimization for 
building a vector which is all zeros/undef except for the bottom 
element, where the bottom element is an i8 or i16.

llvm-svn: 72988
2009-06-06 06:05:10 +00:00
Eli Friedman
05eef883e8 PR2598: make sure to expand illegal forms of integer/floating-point
conversions for x86, like <2 x i32> -> <2 x float> and <4 x i16> -> 
<4 x float>.

llvm-svn: 72983
2009-06-06 03:57:58 +00:00
Devang Patel
8d170194e8 Add new function attribute - noimplicitfloat
Update code generator to use this attribute and remove NoImplicitFloat target option.
Update llc to set this attribute when -no-implicit-float command line option is used.

llvm-svn: 72959
2009-06-05 21:57:13 +00:00
Nate Begeman
058d4eeccf Adapt the x86 build_vector dagcombine to the current state of the legalizer.
build vectors with i64 elements will only appear on 32b x86 before legalize.
Since vector widening occurs during legalize, and produces i64 build_vector 
elements, the dag combiner is never run on these before legalize splits them
into 32b elements.

Teach the build_vector dag combine in x86 back end to recognize consecutive 
loads producing the low part of the vector.

Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes
since that was required implicitly.

Add a testcase for the transform.

Old:
	subl	$28, %esp
	movl	32(%esp), %eax
	movl	4(%eax), %ecx
	movl	%ecx, 4(%esp)
	movl	(%eax), %eax
	movl	%eax, (%esp)
	movaps	(%esp), %xmm0
	pmovzxwd	%xmm0, %xmm0
	movl	36(%esp), %eax
	movaps	%xmm0, (%eax)
	addl	$28, %esp
	ret

New:
	movl	4(%esp), %eax
	pmovzxwd	(%eax), %xmm0
	movl	8(%esp), %eax
	movaps	%xmm0, (%eax)
	ret

llvm-svn: 72957
2009-06-05 21:37:30 +00:00
Devang Patel
d0745140a3 Evan thinks NoImplicitFloat check is not required here.
llvm-svn: 72954
2009-06-05 18:48:29 +00:00
Dan Gohman
273546fbdc Remove unnecessary #includes.
llvm-svn: 72782
2009-06-03 16:47:12 +00:00
Dale Johannesen
8b6ee9e312 Revert 72707 and 72709, for the moment.
llvm-svn: 72712
2009-06-02 03:12:52 +00:00
Dale Johannesen
c08669561e Make the implicit inputs and outputs of target-independent
ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to)
instead of MVT::Flag.  Remove CARRY_FALSE in favor of 0; adjust
all target-independent code to use this format.

Most targets will still produce a Flag-setting target-dependent
version when selection is done.  X86 is converted to use i32
instead, which means TableGen needs to produce different code
in xxxGenDAGISel.inc.  This keys off the new supportsHasI1 bit
in xxxInstrInfo, currently set only for X86; in principle this
is temporary and should go away when all other targets have
been converted.  All relevant X86 instruction patterns are
modified to represent setting and using EFLAGS explicitly.  The
same can be done on other targets.

The immediate behavior change is that an ADC/ADD pair are no
longer tightly coupled in the X86 scheduler; they can be
separated by instructions that don't clobber the flags (MOV).
I will soon add some peephole optimizations based on using
other instructions that set the flags to feed into ADC.

llvm-svn: 72707
2009-06-01 23:27:20 +00:00
Bill Wendling
8235a05c1a Untabification.
llvm-svn: 72604
2009-05-30 01:09:53 +00:00
Evan Cheng
40810c4d1b Added optimization that narrow load / op / store and the 'op' is a bit twiddling instruction and its second operand is an immediate. If bits that are touched by 'op' can be done with a narrower instruction, reduce the width of the load and store as well. This happens a lot with bitfield manipulation code.
e.g.
orl     $65536, 8(%rax)
=>
orb     $1, 10(%rax)

Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, dag combiner consults with the target before performing the optimization.

llvm-svn: 72507
2009-05-28 00:35:15 +00:00
Eli Friedman
9a87deee7e Ger rid of some dead code.
llvm-svn: 72494
2009-05-27 20:39:00 +00:00
Eli Friedman
b8c9f7ee35 Don't abuse the quirky behavior of LegalizeDAG for XINT_TO_FP and
FP_TO_XINT.  Necessary for some cleanups I'm working on.  Updated 
from the previous version (r72431) to fix a bug and make some things a 
bit clearer.

llvm-svn: 72445
2009-05-27 00:47:34 +00:00
Daniel Dunbar
75f52bda74 Back out r72431, it is causing a number of compilation crashes with clang.
llvm-svn: 72436
2009-05-26 21:27:02 +00:00
Eli Friedman
f7d0c01ed6 Don't abuse the quirky behavior of LegalizeDAG for XINT_TO_FP and
FP_TO_XINT.  Necessary for some cleanups I'm working on. 

llvm-svn: 72431
2009-05-26 19:18:56 +00:00
Eli Friedman
f4d25bb2b6 Make the X86 backend mark EXTRACT_SUBVECTOR as Expand, at least for the
moment.

llvm-svn: 72350
2009-05-23 22:44:52 +00:00
Eli Friedman
d877b76d14 Make the x86 backend custom-lower UINT_TO_FP and FP_TO_UINT on 32-bit
systems instead of attempting to promote them to a 64-bit SINT_TO_FP or 
FP_TO_SINT.  This is in preparation for removing the type legalization 
code from LegalizeDAG: once type legalization is gone from LegalizeDAG, 
it won't be able to handle the i64 operand/result correctly.

This isn't quite ideal, but I don't think any other operation for any 
target ends up in this situation, so treating this case specially seems 
reasonable.

llvm-svn: 72324
2009-05-23 09:59:16 +00:00
Evan Cheng
9bd08f0cde Run code placement optimization for targets that want it (arm and x86 for now).
llvm-svn: 71726
2009-05-13 21:42:09 +00:00
Chris Lattner
7b2dabcac9 Fix PR4152: asm constraint validation happens before dag combine, so we
need to work a bit to combine things like (x+c1+c2) into x+c3.

llvm-svn: 71232
2009-05-08 18:23:14 +00:00
Nate Begeman
b407809122 Fix infinite recursion in the C++ code which handles movddup by making it unnecessary.
llvm-svn: 70425
2009-04-29 22:47:44 +00:00
Nate Begeman
414534b3eb Implement review feedback for vector shuffle work.
llvm-svn: 70372
2009-04-29 05:20:52 +00:00
Nate Begeman
9d121924fd 2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan.
PR2957

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

llvm-svn: 70225
2009-04-27 18:41:29 +00:00
Rafael Espindola
4e7a0bf1f1 Fix PR 4004 by including the call to __tls_get_addr in X86tlsaddr. This is not
very elegant, but neither is the tls specification :-(

llvm-svn: 69968
2009-04-24 12:59:40 +00:00
Rafael Espindola
0b1037ad26 Revert 69952. Causes testsuite failures on linux x86-64.
llvm-svn: 69967
2009-04-24 12:40:33 +00:00
Nate Begeman
c1a09c7dfa PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.

llvm-svn: 69952
2009-04-24 03:42:54 +00:00
Duncan Sands
58c9c564a9 Get rid of what looks like a copy-and-pasted typo.
Spotted by gcc-4.5.

llvm-svn: 69673
2009-04-21 09:44:39 +00:00
Bob Wilson
f7e9ff1d28 Move duplicated AddLiveIn function from X86 and ARM backends to be a method
in the MachineFunction class, renaming it to addLiveIn for consistency with
the same method in MachineBasicBlock.  Thanks for Anton for suggesting this.

llvm-svn: 69615
2009-04-20 18:36:57 +00:00
Rafael Espindola
d74132e2c5 For general dynamic TLS access we must use
leaq	foo@TLSGD(%rip), %rdi

as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.

llvm-svn: 69350
2009-04-17 14:35:58 +00:00
Rafael Espindola
72347bffce X86-64 TLS support for local exec and initial exec.
llvm-svn: 68947
2009-04-13 13:02:49 +00:00
Dan Gohman
8121b3f88d Remove the obsolete SelectionDAG::getNodeValueTypes and simplify
code that uses it by using SelectionDAG::getVTList instead.

llvm-svn: 68744
2009-04-09 23:54:40 +00:00
Dan Gohman
6cb1387261 Fix grammaros in comments.
llvm-svn: 68666
2009-04-09 02:06:09 +00:00
Rafael Espindola
7eb72dc5f2 Re-apply 68552.
Tested by bootstrapping llvm-gcc and using that to build llvm.

llvm-svn: 68645
2009-04-08 21:14:34 +00:00
Rafael Espindola
d4563305fd Avoid a hard coded constant.
llvm-svn: 68603
2009-04-08 08:09:33 +00:00
Dan Gohman
c9ce27d6b7 Implement support for using modeling implicit-zero-extension on x86-64
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.

This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.

Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.

Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.

Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.

Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurrs with the new
instruction selection.

llvm-svn: 68576
2009-04-08 00:15:30 +00:00
Bill Wendling
6e702cf68c Temporarily revert r68552. This was causing a failure in the self-hosting LLVM
builds.

--- Reverse-merging (from foreign repository) r68552 into '.':
U    test/CodeGen/X86/tls8.ll
U    test/CodeGen/X86/tls10.ll
U    test/CodeGen/X86/tls2.ll
U    test/CodeGen/X86/tls6.ll
U    lib/Target/X86/X86Instr64bit.td
U    lib/Target/X86/X86InstrSSE.td
U    lib/Target/X86/X86InstrInfo.td
U    lib/Target/X86/X86RegisterInfo.cpp
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86CodeEmitter.cpp
U    lib/Target/X86/X86FastISel.cpp
U    lib/Target/X86/X86InstrInfo.h
U    lib/Target/X86/X86ISelDAGToDAG.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U    lib/Target/X86/X86ISelLowering.h
U    lib/Target/X86/X86InstrInfo.cpp
U    lib/Target/X86/X86InstrBuilder.h
U    lib/Target/X86/X86RegisterInfo.td

llvm-svn: 68560
2009-04-07 22:35:25 +00:00
Rafael Espindola
0324937229 Reduce code duplication on the TLS implementation.
This introduces a small regression on the generated code
quality in the case we are just computing addresses, not
loading values.

Will work on it and on X86-64 support.

llvm-svn: 68552
2009-04-07 21:37:46 +00:00
Mon P Wang
f829fb5cab Added a x86 dag combine to increase the chances to use a
movq for v2i64 on x86-32.

llvm-svn: 68368
2009-04-03 02:43:30 +00:00
Chris Lattner
f1719bf7b5 silence warning in release-asserts build.
llvm-svn: 68253
2009-04-01 22:14:45 +00:00
Evan Cheng
44fdb5d570 i128 shift libcalls are not available on x86.
llvm-svn: 68133
2009-03-31 19:38:51 +00:00
Evan Cheng
3e30bcbd69 When optimzing a mul by immediate into two, the resulting mul's should get a x86 specific node to avoid dag combiner from hacking on them further.
llvm-svn: 68066
2009-03-30 21:36:47 +00:00
Rafael Espindola
37522e768a Have only one definition of X86AddrNumOperands.
llvm-svn: 67949
2009-03-28 18:55:31 +00:00
Evan Cheng
a15fdaa292 Optimize some 64-bit multiplication by constants into two lea's or one lea + shl since imulq is slow (latency 5). e.g.
x * 40
=>
shlq    $3, %rdi
leaq    (%rdi,%rdi,4), %rax

This has the added benefit of allowing more multiply to be folded into addressing mode. e.g.
a * 24 + b
=>
leaq    (%rdi,%rdi,2), %rax
leaq    (%rsi,%rax,8), %rax

llvm-svn: 67917
2009-03-28 05:57:29 +00:00
Rafael Espindola
38604d9598 I am trying to add a segment to the X86 addresses matching to
improve TLS support (see http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090309/075220.html), but that code is VERY brittle.

This patch just makes it a bit more resistant.

llvm-svn: 67843
2009-03-27 15:26:30 +00:00
Evan Cheng
ab6e38c88d -no-implicit-float means explicit fp operations are legal.
llvm-svn: 67784
2009-03-26 23:06:32 +00:00
Bill Wendling
f4247ff478 Pull transform from target-dependent code into target-independent code.
llvm-svn: 67742
2009-03-26 06:14:09 +00:00
Bill Wendling
f79eccc675 Match this pattern so that we can generate simpler code:
%a = ...
  %b = and i32 %a, 2
  %c = srl i32 %b, 1
  %d = br i32 %c, 

into

  %a = ...
  %b = and %a, 2
  %c = X86ISD::CMP %b, 0
  %d = X86ISD::BRCOND %c ...

This applies only when the AND constant value has one bit set and the SRL
constant is equal to the log2 of the AND constant. The back-end is smart enough
to convert the result into a TEST/JMP sequence.

llvm-svn: 67728
2009-03-26 01:47:50 +00:00
Bill Wendling
2fe64f48aa These instructions have special lowering that may lower them to SSE
instructions. Prevent that if we don't want implicit uses of SSE.

llvm-svn: 66877
2009-03-13 08:41:47 +00:00
Evan Cheng
f9951d1557 Fix some significant problems with constant pools that resulted in unnecessary paddings between constant pool entries, larger than necessary alignments (e.g. 8 byte alignment for .literal4 sections), and potentially other issues.
1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.


Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses cheaper data structure to group constant pool entries.
6. Asm printer compute entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.

llvm-svn: 66875
2009-03-13 07:51:59 +00:00
Chris Lattner
cbbdd230dd generalize the previous code to use the full generality of LEA
for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this).  On the example, we now
generate:

_test:
	movl	4(%esp), %eax
	cmpl	$42, (%eax)
	setl	%al
	movzbl	%al, %eax
	leal	4(%eax,%eax,8), %eax
	ret

instead of:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	movl	$4, %ecx
	movl	$13, %eax
	cmovg	%ecx, %eax
	ret

llvm-svn: 66869
2009-03-13 05:53:31 +00:00
Chris Lattner
878d951f8f optimize the case of cond ? 42 : 41 and friends. This compiles the
example to:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	setg	%al
	movzbl	%al, %eax
	orl	$4294967294, %eax
	ret

instead of:

        movl    4(%esp), %eax
        cmpl    $41, (%eax)
	movl	$4294967294, %ecx
	movl	$4294967295, %eax
	cmova	%ecx, %eax
	ret

which is smaller in code size and faster. rdar://6668608

llvm-svn: 66868
2009-03-13 05:22:11 +00:00
Chris Lattner
26a971c4ec Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add, x, c))"
related transformations out of target-specific dag combine into the
ARM backend.  These were added by Evan in r37685 with no testcases
and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).

Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0  -> (zext(cond) << 3).  This happens frequently
with the recently added cp constant select optimization, but is a
very general xform.  For example, we now compile the second example
in const-select.ll to:

_test:
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        seta    %al
        movzbl  %al, %eax
        movl    4(%esp), %ecx
        movsbl  (%ecx,%eax,4), %eax
        ret

instead of:

_test:
        movl    4(%esp), %eax
        leal    4(%eax), %ecx
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        cmovbe  %eax, %ecx
        movsbl  (%ecx), %eax
        ret

This passes multisource and dejagnu.

llvm-svn: 66779
2009-03-12 06:52:53 +00:00
Evan Cheng
46e903d2f6 On x86, if the only use of a i64 load is a i64 store, generate a pair of double load and store instead.
llvm-svn: 66776
2009-03-12 05:59:15 +00:00
Bill Wendling
fca05e3a5c Add a -no-implicit-float flag. This acts like -soft-float, but may generate
floating point instructions that are explicitly specified by the user.

llvm-svn: 66719
2009-03-11 22:30:01 +00:00
Mon P Wang
287e422039 For yonah, fix a vector shuffle case for v16i8 where we didn't properly clear some bits.
llvm-svn: 66684
2009-03-11 18:47:57 +00:00
Mon P Wang
2867737ad2 Fixed a v8i16 shuffle case that should generate a pshufb instead of a pshuflw/hw.
llvm-svn: 66645
2009-03-11 06:35:11 +00:00
Chris Lattner
eb9327f335 formatting change, reduce indentation. No functionality change.
llvm-svn: 66642
2009-03-11 05:48:52 +00:00
Dan Gohman
b9c32f1aca Arithmetic instructions don't set EFLAGS bits OF and CF bits
the same say the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.

llvm-svn: 66318
2009-03-07 01:58:32 +00:00
Dan Gohman
1e9db7c1a1 When creating X86ISD::INC and X86ISD::DEC nodes, only add one operand.
The extra operand didn't appear to cause any trouble, but it was
erroneous regardless.

llvm-svn: 66206
2009-03-05 21:29:28 +00:00
Dan Gohman
f6f684b206 Fix the "test" optimization to recognize "dec" as an add of
negative one, as subtracts of immediates are canonicalized
to adds.

llvm-svn: 66180
2009-03-05 19:32:48 +00:00
Dan Gohman
31fb085c2e Re-apply 66008, now that the unfoldMemoryOperand bug is fixed.
llvm-svn: 66058
2009-03-04 19:44:21 +00:00
Dan Gohman
6831e2c2a6 Revert r66004 for now; it's causing a variety of test failures.
llvm-svn: 66008
2009-03-04 03:54:19 +00:00
Dan Gohman
c6c669cc1e Teach the x86 backend to eliminate "test" instructions by using the EFLAGS
result from add, sub, inc, and dec instructions in simple cases.

llvm-svn: 66004
2009-03-04 02:33:24 +00:00
Rafael Espindola
880e63bf01 Refactor TLS code and add some tests. The tests and expected results are:
pic |  declaration | linkage  | visibility |

!pic |  declaration | external | default    | tls1.ll     tls2.ll     | local exec
 pic |  declaration | external | default    | tls1-pic.ll tls2-pic.ll | general dynamic
!pic | !declaration | external | default    | tls3.ll     tls4.ll     | initial exec
 pic | !declaration | external | default    | tls3-pic.ll tls4-pic.ll | general dynamic

!pic |  declaration | external | hidden     | tls7.ll     tls8.ll     | local exec
 pic |  declaration | external | hidden     | X                       | local dynamic
!pic | !declaration | external | hidden     | tls9.ll     tls10.ll    | local exec
 pic | !declaration | external | hidden     | X                       | local dynamic

!pic |  declaration | internal | default    | tls5.ll     tls6.ll     | local exec
 pic |  declaration | internal | default    | X                       | local dynamic

The ones marked with an X have not been implemented since local dynamic is not implemented.

llvm-svn: 65632
2009-02-27 13:37:18 +00:00
Evan Cheng
ec34226c2b Revert BuildVectorSDNode related patches: 65426, 65427, and 65296.
llvm-svn: 65482
2009-02-25 22:49:59 +00:00
Evan Cheng
dd139e795c Only v1i16 (i.e. _m64) is returned via RAX / RDX.
llvm-svn: 65313
2009-02-23 09:03:22 +00:00
Nate Begeman
e0093d2501 Generate better code for v8i16 shuffles on SSE2
Generate better code for v16i8 shuffles on SSE2 (avoids stack)
Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops.
Document the shuffle matching logic and add some FIXMEs for later further
  cleanups.
New tests that test the above.

Examples:

New:
_shuf2:
	pextrw	$7, %xmm0, %eax
	punpcklqdq	%xmm1, %xmm0
	pshuflw	$128, %xmm0, %xmm0
	pinsrw	$2, %eax, %xmm0

Old:
_shuf2:
	pextrw	$2, %xmm0, %eax
	pextrw	$7, %xmm0, %ecx
	pinsrw	$2, %ecx, %xmm0
	pinsrw	$3, %eax, %xmm0
	movd	%xmm1, %eax
	pinsrw	$4, %eax, %xmm0
	ret

=========

New:
_shuf4:
	punpcklqdq	%xmm1, %xmm0
	pshufb	LCPI1_0, %xmm0

Old:
_shuf4:
	pextrw	$3, %xmm0, %eax
	movsd	%xmm1, %xmm0
	pextrw	$3, %xmm1, %ecx
	pinsrw	$4, %ecx, %xmm0
	pinsrw	$5, %eax, %xmm0

========

New:
_shuf1:
	pushl	%ebx
	pushl	%edi
	pushl	%esi
	pextrw	$1, %xmm0, %eax
	rolw	$8, %ax
	movd	%xmm0, %ecx
	rolw	$8, %cx
	pextrw	$5, %xmm0, %edx
	pextrw	$4, %xmm0, %esi
	pextrw	$3, %xmm0, %edi
	pextrw	$2, %xmm0, %ebx
	movaps	%xmm0, %xmm1
	pinsrw	$0, %ecx, %xmm1
	pinsrw	$1, %eax, %xmm1
	rolw	$8, %bx
	pinsrw	$2, %ebx, %xmm1
	rolw	$8, %di
	pinsrw	$3, %edi, %xmm1
	rolw	$8, %si
	pinsrw	$4, %esi, %xmm1
	rolw	$8, %dx
	pinsrw	$5, %edx, %xmm1
	pextrw	$7, %xmm0, %eax
	rolw	$8, %ax
	movaps	%xmm1, %xmm0
	pinsrw	$7, %eax, %xmm0
	popl	%esi
	popl	%edi
	popl	%ebx
	ret

Old:
_shuf1:
	subl	$252, %esp
	movaps	%xmm0, (%esp)
	movaps	%xmm0, 16(%esp)
	movaps	%xmm0, 32(%esp)
	movaps	%xmm0, 48(%esp)
	movaps	%xmm0, 64(%esp)
	movaps	%xmm0, 80(%esp)
	movaps	%xmm0, 96(%esp)
	movaps	%xmm0, 224(%esp)
	movaps	%xmm0, 208(%esp)
	movaps	%xmm0, 192(%esp)
	movaps	%xmm0, 176(%esp)
	movaps	%xmm0, 160(%esp)
	movaps	%xmm0, 144(%esp)
	movaps	%xmm0, 128(%esp)
	movaps	%xmm0, 112(%esp)
	movzbl	14(%esp), %eax
	movd	%eax, %xmm1
	movzbl	22(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	42(%esp), %eax
	movd	%eax, %xmm1
	movzbl	50(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm1, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	77(%esp), %eax
	movd	%eax, %xmm1
	movzbl	84(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	104(%esp), %eax
	movd	%eax, %xmm1
	punpcklbw	%xmm1, %xmm0
	punpcklbw	%xmm2, %xmm0
	movaps	%xmm0, %xmm1
	punpcklbw	%xmm3, %xmm1
	movzbl	127(%esp), %eax
	movd	%eax, %xmm0
	movzbl	135(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	155(%esp), %eax
	movd	%eax, %xmm0
	movzbl	163(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm0, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	188(%esp), %eax
	movd	%eax, %xmm0
	movzbl	197(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	217(%esp), %eax
	movd	%eax, %xmm4
	movzbl	225(%esp), %eax
	movd	%eax, %xmm0
	punpcklbw	%xmm4, %xmm0
	punpcklbw	%xmm2, %xmm0
	punpcklbw	%xmm3, %xmm0
	punpcklbw	%xmm1, %xmm0
	addl	$252, %esp
	ret

llvm-svn: 65311
2009-02-23 08:49:38 +00:00
Scott Michel
3f8637305f Introduce the BuildVectorSDNode class that encapsulates the ISD::BUILD_VECTOR
instruction. The class also consolidates the code for detecting constant
splats that's shared across PowerPC and the CellSPU backends (and might be
useful for other backends.) Also introduces SelectionDAG::getBUID_VECTOR() for
generating new BUILD_VECTOR nodes.

llvm-svn: 65296
2009-02-22 23:36:09 +00:00
Evan Cheng
4385f393f7 Be bug compatible with gcc by returning MMX values in RAX.
llvm-svn: 65274
2009-02-22 08:05:12 +00:00
Evan Cheng
c40c3e28f7 Support return of MMX values in 64-bit mode.
llvm-svn: 65152
2009-02-20 20:43:02 +00:00
Scott Michel
4c5fa6c982 Remove trailing whitespace to reduce later commit patch noise.
(Note: Eventually, commits like this will be handled via a pre-commit hook that
 does this automagically, as well as expand tabs to spaces and look for 80-col
 violations.)

llvm-svn: 64827
2009-02-17 22:15:04 +00:00
Evan Cheng
9041a71923 Teach x86 target -soft-float.
llvm-svn: 64496
2009-02-13 22:36:38 +00:00
Dale Johannesen
47321cf01f Arrange to print constants that match "n" and "i" constraints
in inline asm as signed (what gcc does).  Add partial support
for x86-specific "e" and "Z" constraints, with appropriate
signedness for printing.

llvm-svn: 64400
2009-02-12 20:58:09 +00:00
Dale Johannesen
b22cb23f6f Use getDebugLoc forwarder instead of getNode()->getDebugLoc.
No functional change.

llvm-svn: 64026
2009-02-07 19:59:05 +00:00
Dan Gohman
4105a38248 Constify TargetInstrInfo::EmitInstrWithCustomInserter, allowing
ScheduleDAG's TLI member to use const.

llvm-svn: 64018
2009-02-07 16:15:20 +00:00
Dale Johannesen
a259483aae Get rid of the last non-DebugLoc versions of getNode!
Many targets build placeholder nodes for special operands, e.g.
GlobalBaseReg on X86 and PPC for the PIC base.  There's no
sensible way to associate debug info with these.  I've left
them built with getNode calls with explicit DebugLoc::getUnknownLoc operands. 
I'm not too happy about this but don't see a good improvement;
I considered adding a getPseudoOperand or something, but it
seems to me that'll just make it harder to read.

llvm-svn: 63992
2009-02-07 00:55:49 +00:00
Dale Johannesen
1580ab6b7f Remove more non-DebugLoc getNode variants. Use
getCALLSEQ_{END,START} to permit passing no DebugLoc
there.  UNDEF doesn't logically have DebugLoc; add
getUNDEF to encapsulate this.

llvm-svn: 63978
2009-02-06 23:05:02 +00:00
Dale Johannesen
c405486235 Remove more non-DebugLoc versions of getNode.
llvm-svn: 63969
2009-02-06 21:50:26 +00:00