1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00
Commit Graph

819 Commits

Author SHA1 Message Date
Chris Lattner
c5c00890e5 If an xmm register is referenced explicitly in an inline asm, make sure to
assign it to a version of the xmm register with the regclass that matches its
type.  This fixes PR2715, a bug handling some crazy xpcom case in mozilla.

llvm-svn: 55358
2008-08-26 06:19:02 +00:00
Evan Cheng
569b489cf5 Try approach to moving call address load inside of callseq_start. Now it's done during the preprocess of x86 isel. callseq_start's chain is changed to load's chain node; while load's chain is the last of callseq_start or the loads or copytoreg nodes inserted to move arguments to the right spot.
llvm-svn: 55338
2008-08-25 21:27:18 +00:00
Bill Wendling
5728cf59fd Temporarily reverting r55292. It's causing a bootstraping failure:
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc ... src/libiberty/make-temp-file.c -o make-temp-file.o
Assertion failed: (Node2Index[SU->NodeNum] > Node2Index[I->Dep->NodeNum] && "Wrong topological sorting"), function InitDAGTopologicalSorting, file /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp, line 508.
../../../../llvm-gcc.src/libiberty/hashtab.c:955: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
make[4]: *** [hashtab.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [multi-do] Error 1
make[2]: *** [all] Error 2
make[1]: *** [all-target-libiberty] Error 2
make: *** [all] Error 2

llvm-svn: 55295
2008-08-24 21:45:30 +00:00
Evan Cheng
a600778748 Move callseq_start above the call address load to allow load to be folded into the call node.
llvm-svn: 55292
2008-08-24 19:19:55 +00:00
Bill Wendling
f105f92904 If part of the mask is "undef", then ignore it as we don't care what goes into it.
llvm-svn: 55147
2008-08-21 22:36:36 +00:00
Bill Wendling
170bd0a562 Fix whitespace. No functionality change.
llvm-svn: 55146
2008-08-21 22:35:37 +00:00
Evan Cheng
ef2509b3ba Fix a number of byval / memcpy / memset related codegen issues.
1. x86-64 byval alignment should be max of 8 and alignment of type. Previously the code was not doing what the commit message was saying.
2. Do not use byte repeat move and store operations. These are slow.

llvm-svn: 55139
2008-08-21 21:00:15 +00:00
Mon P Wang
bf7b94fd29 Treat floating point ST1 the same as ST0 when lowering for a call result
llvm-svn: 55135
2008-08-21 19:54:16 +00:00
Dan Gohman
ddebe95287 Simplify FastISel's constructor argument list, make the FastISel
class hold a MachineRegisterInfo member, and make the
MachineBasicBlock be passed in to SelectInstructions rather
than the FastISel constructor.

llvm-svn: 55076
2008-08-20 21:05:57 +00:00
Dale Johannesen
69c9d47dce Add remaining 64-bit atomic patterns for x86-64.
llvm-svn: 55029
2008-08-20 00:48:50 +00:00
Bill Wendling
ab390189dc Revert r55018 and apply the correct "fix" for the 64-bit sub_and_fetch atomic.
Just expand it like the other X-bit sub_and_fetches.

llvm-svn: 55023
2008-08-20 00:28:16 +00:00
Dan Gohman
b1ba73eeed Instantiate FastISel for X86.
llvm-svn: 55011
2008-08-19 21:45:35 +00:00
Dan Gohman
36e732b8fc The X86 target will soon have an implementation of createFastISel.
llvm-svn: 55010
2008-08-19 21:32:53 +00:00
Dale Johannesen
15b76de064 Add support for 8 and 16 bit forms of __sync
builtins on X86.

Change "lock" instructions to be on a separate line.
This is needed to work around a bug in the Darwin
assembler.

llvm-svn: 54999
2008-08-19 18:47:28 +00:00
Evan Cheng
6534c78383 Fix a (u)comiss intrinsic lowering bug. It was using anyext which can return junk in higher bits. Patch by Nate Begeman.
llvm-svn: 54903
2008-08-17 19:22:34 +00:00
Anton Korobeynikov
d475141eea Use correct name for TLS address resolution routine on x86-64
llvm-svn: 54845
2008-08-16 12:58:29 +00:00
Dan Gohman
7534da85c9 Also avoid pinsrw and pinsrb with a variable insertelement index.
llvm-svn: 54803
2008-08-14 22:53:18 +00:00
Dan Gohman
c530d2983d Don't try to use the insertps instruction for vector
element inserts with non-constant indices. This fixes
CodeGen/X86/vector-variable-idx.ll on machines that
have SSE4.1.

llvm-svn: 54801
2008-08-14 22:43:26 +00:00
Evan Cheng
f4d1119fbd Fix PR2620: Fix X86cmppd selection code so it expects operands to be v2f64.
llvm-svn: 54376
2008-08-05 22:19:15 +00:00
Dan Gohman
5d0df78ae0 Add an assert to catch invalid VECTOR_SHUFFLE mask indices.
llvm-svn: 54329
2008-08-04 23:09:15 +00:00
Andrew Lenharth
377c046675 Add atomic sub for other sizes
llvm-svn: 54314
2008-08-03 20:17:34 +00:00
Dan Gohman
9742f7772d Rename SDOperand to SDValue.
llvm-svn: 54128
2008-07-27 21:46:04 +00:00
Dan Gohman
47c5cdbc34 Tidy SDNode::use_iterator, and complete the transition to have it
parallel its analogue, Value::value_use_iterator. The operator* method
now returns the user, rather than the use.

llvm-svn: 54127
2008-07-27 20:43:25 +00:00
Nate Begeman
5523d40e4b Disable mov{L, LP, HP, HLP, *DUP} shuffles for mmx
mmx needs its own fancy shuffle logic based on unpack; for now we get correct but awful code.

Also commit Mon Ping's VSETCC patch

llvm-svn: 54039
2008-07-25 19:05:58 +00:00
Evan Cheng
20c9cdbe69 Fix PR2485: do all 4-element SSE shuffles in max. of 2 shuffle instructions.
Based on patch by Nicolas Capens.

llvm-svn: 53939
2008-07-23 00:22:17 +00:00
Evan Cheng
ff0bd19937 Factor out SSE 4 wide shuffle lowering code into its own function. No functionality changes.
llvm-svn: 53933
2008-07-22 21:13:36 +00:00
Evan Cheng
901d469e05 Fix PR2574: implement v2f32 scalar_to_vector.
llvm-svn: 53927
2008-07-22 18:39:19 +00:00
Duncan Sands
6e31474e71 Add VerifyNode, a place to put sanity checks on
generic SDNode's (nodes with their own constructors
should do sanity checking in the constructor).  Add
sanity checks for BUILD_VECTOR and fix all the places
that were producing bogus BUILD_VECTORs, as found by
"make check".  My favorite is the BUILD_VECTOR with
only two operands that was being used to build a
vector with four elements!

llvm-svn: 53850
2008-07-21 10:20:31 +00:00
Bill Wendling
98b6e63176 Fix for first part of PR2562. Generate the "pinsrw" instruction for inserts
into v4i16 vectors.

llvm-svn: 53807
2008-07-20 02:32:23 +00:00
Nate Begeman
af01bfff99 SSE codegen for vsetcc nodes
llvm-svn: 53719
2008-07-17 16:51:19 +00:00
Mon P Wang
57cd9d6e5a When lowering certain atomics, we need to copy the memoperand from the old
atomic operation to the new one.

llvm-svn: 53714
2008-07-17 04:54:06 +00:00
Evan Cheng
face16f9d8 x86-64 PIC JIT fixes: do not generate the extra load for external GV's.
llvm-svn: 53661
2008-07-16 01:34:02 +00:00
Dan Gohman
4c18394001 Include a frame index in the "fixed stack" pseudo source value
instead of using the frame index for the SVOffset, which was
inconsistent.

llvm-svn: 53486
2008-07-11 22:44:52 +00:00
Bill Wendling
9f17caa9a9 The frame address on an x86-64 box needs to be offset by -8, not -4.
llvm-svn: 53450
2008-07-11 07:18:52 +00:00
Dan Gohman
cd25487258 Pool-allocation for MachineInstrs, MachineBasicBlocks, and
MachineMemOperands. The pools are owned by MachineFunctions.

This drastically reduces the number of calls to malloc/free made
during the "Emit" phase of scheduling, as well as later phases
in CodeGen. Combined with other changes, this speeds up the
"instruction selection" phase of CodeGen by 10% in some cases.

llvm-svn: 53212
2008-07-07 23:14:23 +00:00
Duncan Sands
3ea6f15708 Rather than having a different custom legalization
hook for each way in which a result type can be
legalized (promotion, expansion, softening etc),
just use one: ReplaceNodeResults, which returns
a node with exactly the same result types as the
node passed to it, but presumably with a bunch of
custom code behind the scenes.  No change if the
new LegalizeTypes infrastructure is not turned on.

llvm-svn: 53137
2008-07-04 11:47:58 +00:00
Duncan Sands
21e2a711e3 Add a new getMergeValues method that does not need
to be passed the list of value types, and use this
where appropriate.  Inappropriate places are where
the value type list is already known and may be
long, in which case the existing method is more
efficient.

llvm-svn: 53035
2008-07-02 17:40:58 +00:00
Duncan Sands
d8d11501c9 Highlight that getMergeValues optimization is
being suppressed here.

llvm-svn: 52952
2008-07-01 08:00:49 +00:00
Dan Gohman
c8097f8c8c Split ISD::LABEL into ISD::DBG_LABEL and ISD::EH_LABEL, eliminating
the need for a flavor operand, and add a new SDNode subclass,
LabelSDNode, for use with them to eliminate the need for a label id
operand.

Change instruction selection to let these label nodes through
unmodified instead of creating copies of them. Teach the MachineInstr
emitter how to emit a MachineInstr directly from an ISD label node.

This avoids the need for allocating SDNodes for the label id and
flavor value, as well as SDNodes for each of the post-isel label,
label id, and label flavor.

llvm-svn: 52943
2008-07-01 00:05:16 +00:00
Dan Gohman
e58f07e5d6 Update comments to new-style syntax.
llvm-svn: 52925
2008-06-30 21:00:56 +00:00
Dan Gohman
6cc648891b Rename ISD::LOCATION to ISD::DBG_STOPPOINT to better reflect its
purpose, and give it a custom SDNode subclass so that it doesn't
need to have line number, column number, filename string, and
directory string, all existing as individual SDNodes to be the
operands.

This was the only user of ISD::STRING, StringSDNode, etc., so
remove those and some associated code.

This makes stop-points considerably easier to read in
-view-legalize-dags output, and reduces overhead (creating new
nodes and copying std::strings into them) on code containing
debugging information.

llvm-svn: 52924
2008-06-30 20:59:49 +00:00
Duncan Sands
c882a4eba9 Revert the SelectionDAG optimization that makes
it impossible to create a MERGE_VALUES node with
only one result: sometimes it is useful to be able
to create a node with only one result out of one of
the results of a node with more than one result, for
example because the new node will eventually be used
to replace a one-result node using ReplaceAllUsesWith,
cf X86TargetLowering::ExpandFP_TO_SINT.  On the other
hand, most users of MERGE_VALUES don't need this and
for them the optimization was valuable.  So add a new
utility method getMergeValues for creating MERGE_VALUES
nodes which by default performs the optimization.
Change almost everywhere to use getMergeValues (and
tidy some stuff up at the same time).

llvm-svn: 52893
2008-06-30 10:19:09 +00:00
Evan Cheng
71fbfe73c1 - Fix a x86 vector isel bug: illegal transformation of a vector_shuffle into a
shift.
- Add a readme entry for a missing vector_shuffle optimization that results in
  awful codegen.

llvm-svn: 52740
2008-06-25 20:52:59 +00:00
Dan Gohman
404964dbc0 Remove the OrigVT member from AtomicSDNode, as it is redundant with
the base SDNode's VTList.

llvm-svn: 52722
2008-06-25 16:07:49 +00:00
Mon P Wang
7d89d61387 Added MemOperands to Atomic operations since Atomics touches memory.
Added abstract class MemSDNode for any Node that have an associated MemOperand
Changed atomic.lcs => atomic.cmp.swap, atomic.las => atomic.load.add, and
atomic.lss => atomic.load.sub

llvm-svn: 52706
2008-06-25 08:15:39 +00:00
Dale Johannesen
fdf8fe6c03 Add v2f32 (MMX) type to X86. Support is primitive:
load,store,call,return,bitcast.  This is enough to
make call and return work.

llvm-svn: 52691
2008-06-24 22:01:44 +00:00
Dan Gohman
c1aa753f00 Remove unnecessary #includes.
llvm-svn: 52613
2008-06-22 19:21:26 +00:00
Eli Friedman
570aa6f801 Fix a bug with <8 x i16> shuffle lowering on X86 where parts of the
shuffle could be skipped.  The check is invalid because the loop index i 
doesn't correspond to the element actually inserted. The correct check is
already done a few lines earlier, for whether the element is already in 
the right spot, so this shouldn't have any effect on the codegen for 
code that was already correct.

llvm-svn: 52486
2008-06-19 06:09:51 +00:00
Evan Cheng
89e2e3292d Rather than avoiding to wrap ISD::DECLARE GV operand in X86ISD::Wrapper, simply handle it at dagisel time with x86 specific isel code.
llvm-svn: 52377
2008-06-17 02:01:22 +00:00
Andrew Lenharth
327c3e7559 add missing atomic intrinsic from gcc
llvm-svn: 52270
2008-06-14 05:48:15 +00:00
Anton Korobeynikov
74422b3cd0 Properly lower DYNAMIC_STACKALLOC - bracket all black magic with
CALLSEQ_BEGIN & CALLSEQ_END.

llvm-svn: 52225
2008-06-11 20:16:42 +00:00
Duncan Sands
fe2a970a5c Remove comparison methods for MVT. The main cause
of apint codegen failure is the DAG combiner doing
the wrong thing because it was comparing MVT's using
< rather than comparing the number of bits.  Removing
the < method makes this mistake impossible to commit.
Instead, add helper methods for comparing bits and use
them.

llvm-svn: 52098
2008-06-08 20:54:56 +00:00
Duncan Sands
d634afe3aa Wrap MVT::ValueType in a struct to get type safety
and better control the abstraction.  Rename the type
to MVT.  To update out-of-tree patches, the main
thing to do is to rename MVT::ValueType to MVT, and
rewrite expressions like MVT::getSizeInBits(VT) in
the form VT.getSizeInBits().  Use VT.getSimpleVT()
to extract a MVT::SimpleValueType for use in switch
statements (you will get an assert failure if VT is
an extended value type - these shouldn't exist after
type legalization).
This results in a small speedup of codegen and no
new testsuite failures (x86-64 linux).

llvm-svn: 52044
2008-06-06 12:08:01 +00:00
Dan Gohman
e256337a1a Expand small memmovs using inline code. Set the X86 threshold for expanding
memmove to a more plausible value, now that it's actually being used.

llvm-svn: 51696
2008-05-29 19:42:22 +00:00
Evan Cheng
04c0915a2f Implement vector shift up / down and insert zero with ps{rl}lq / ps{rl}ldq.
llvm-svn: 51667
2008-05-29 08:22:04 +00:00
Nate Begeman
23dd264da6 Don't attempt to create VZEXT_LOAD out of an extload. This an issue where the
code generator would do something like this:

f64 = load f32 <anyext>, f32mem
v2f64 = insertelt undef, %0, 0
v2f64 = insertelt %1, 0.0, 1

into 

v2f64 = vzext_load f32mem

which on x86 is movsd, when you really wanted a cvtss2sd/movsd pair.

llvm-svn: 51624
2008-05-28 00:24:25 +00:00
Dan Gohman
6cc0b4f262 Use PMULDQ for v2i64 multiplies when SSE4.1 is available. And add
load-folding table entries for PMULDQ and PMULLD.

llvm-svn: 51489
2008-05-23 17:49:40 +00:00
Evan Cheng
73dadf21ce Fix typos and comments.
llvm-svn: 51165
2008-05-15 22:13:02 +00:00
Evan Cheng
778a5e27b0 Make use of vector load and store operations to implement memcpy, memmove, and memset. Currently only X86 target is taking advantage of these.
llvm-svn: 51140
2008-05-15 08:39:06 +00:00
Dan Gohman
f9d5689496 Change target-specific classes to use more precise static types.
This eliminates the need for several awkward casts, including
the last dynamic_cast under lib/Target.

llvm-svn: 51091
2008-05-14 01:58:56 +00:00
Evan Cheng
9e15622879 Instead of a vector load, shuffle and then extract an element. Load the element from address with an offset.
pshufd $1, (%rdi), %xmm0
        movd %xmm0, %eax
=>
        movl 4(%rdi), %eax

llvm-svn: 51026
2008-05-13 08:35:03 +00:00
Evan Cheng
fcbdc8bd6e Xform bitconvert(build_pair(load a, load b)) to a single load if the load locations are at the right offset from each other.
llvm-svn: 51008
2008-05-12 23:04:07 +00:00
Nate Begeman
2ae55cecc6 Initial X86 codegen support for VSETCC.
llvm-svn: 51000
2008-05-12 20:34:32 +00:00
Evan Cheng
c7e9acfed7 Refactor isConsecutiveLoad from X86 to TargetLowering so DAG combiner can make use of it.
llvm-svn: 50991
2008-05-12 19:56:52 +00:00
Dan Gohman
8212eaa43a Fix a compile error on compilers that still want a return value
in a non-void function that calls abort.

llvm-svn: 50969
2008-05-12 16:17:19 +00:00
Evan Cheng
c19c639ad7 When transforming a vector_shuffle to a load, the base address must not be an undef.
llvm-svn: 50940
2008-05-10 06:46:49 +00:00
Dan Gohman
4b23d9e60a For now, abort when an ISD::VAARG is encountered on x86-64, rather
than silently generate invalid code.

llvm-gcc does not currently use VAArgInst; it lowers va_arg in the
front-end.

llvm-svn: 50930
2008-05-10 01:26:14 +00:00
Evan Cheng
79230955a8 If movl top bits are undef, let it be selected to movlps, etc.
llvm-svn: 50928
2008-05-10 00:58:41 +00:00
Evan Cheng
3493e43afd Handle a few more cases of folding load i64 into xmm and zero top bits.
Note, some of the code will be moved into target independent part of DAG combiner in a subsequent patch.

llvm-svn: 50918
2008-05-09 21:53:03 +00:00
Evan Cheng
f97e716511 Handle vector move / load which zero the destination register top bits (i.e. movd, movq, movss (addr), movsd (addr)) with X86 specific dag combine.
llvm-svn: 50838
2008-05-08 00:57:18 +00:00
Mon P Wang
34b3f18a70 Improved generated code for atomic operators
llvm-svn: 50677
2008-05-05 22:56:23 +00:00
Evan Cheng
44d49e72a1 Code clean up. No functionality change.
llvm-svn: 50675
2008-05-05 22:12:23 +00:00
Mon P Wang
84a269e023 Added addition atomic instrinsics and, or, xor, min, and max.
llvm-svn: 50663
2008-05-05 19:05:59 +00:00
Anton Korobeynikov
04c974b1b2 Add General Dynamic TLS model for X86-64. Some parts looks really ugly (look for tlsaddr pattern),
but should work. Work is in progress, more models will follow

llvm-svn: 50630
2008-05-04 21:36:32 +00:00
Evan Cheng
a7747df955 Select vector shift with non-immediate i32 shift amount operand by first moving the operand into the right register.
llvm-svn: 50619
2008-05-04 09:15:50 +00:00
Arnold Schwaighofer
f58a35e2ec Tail call optimization improvements:
Move platform independent code (lowering of possibly overwritten
arguments, check for tail call optimization eligibility) from
target X86ISelectionLowering.cpp to TargetLowering.h and
SelectionDAGISel.cpp.

Initial PowerPC tail call implementation:

Support ppc32 implemented and tested (passes my tests and
test-suite llvm-test).  
Support ppc64 implemented and half tested (passes my tests).
On ppc tail call optimization is performed if 
  caller and callee are fastcc
  call is a tail call (in tail call position, call followed by ret)
  no variable argument lists or byval arguments
  option -tailcallopt is enabled
Supported:
 * non pic tail calls on linux/darwin
 * module-local tail calls on linux(PIC/GOT)/darwin(PIC)
 * inter-module tail calls on darwin(PIC)
If constraints are not met a normal call will be emitted.

A test checking the argument lowering behaviour on x86-64 was added.

llvm-svn: 50477
2008-04-30 09:16:33 +00:00
Dan Gohman
0285c1e9bb Fix the SVOffset values for loads and stores produced by
memcpy/memset expansion. It was a bug for the SVOffset value
to be used in the actual address calculations.

llvm-svn: 50359
2008-04-28 17:15:20 +00:00
Anton Korobeynikov
1c5d228377 Properly lower vararg's FORMAL_ARGUMENTS node on win64
llvm-svn: 50325
2008-04-27 23:15:03 +00:00
Chris Lattner
b5bd654163 A few inline asm cleanups:
- Make targetlowering.h fit in 80 cols.
  - Make LowerAsmOperandForConstraint const.
  - Make lowerXConstraint -> LowerXConstraint
  - Make LowerXConstraint return a const char* instead of taking a string byref.

llvm-svn: 50312
2008-04-26 23:02:14 +00:00
Evan Cheng
318e7e042c Extract the lower 64-bit if a MMX value is passed in a XMM register.
llvm-svn: 50292
2008-04-25 20:13:28 +00:00
Evan Cheng
11f101a800 Special handling for MMX values being passed in either GPR64 or lower 64-bits of XMM registers.
llvm-svn: 50289
2008-04-25 19:11:04 +00:00
Evan Cheng
39ae78cadb MMX argument passing fixes:
On Darwin / Linux x86-32, v8i8, v4i16, v2i32 values are passed in MM[0-2].                                                                                                                                      
On Darwin / Linux x86-32, v1i64 values are passed in memory.                                                                                                                                                    
On Darwin x86-64, v8i8, v4i16, v2i32 values are passed in XMM[0-7].                                                                                                                                     
On Darwin x86-64, v1i64 values are passed in 64-bit GPRs.

llvm-svn: 50257
2008-04-25 07:56:45 +00:00
Evan Cheng
484060ba4a Fix bug in x86 memcpy / memset lowering. If there are trailing bytes not handled by rep instructions, a new memcpy / memset is introduced for them. However, since source / destination addresses are already adjusted, their offsets should be zero.
llvm-svn: 50239
2008-04-25 00:26:43 +00:00
Dan Gohman
93b5be1824 Implement an x86-64 ABI detail of passing structs by hidden first
argument. The x86-64 ABI requires the incoming value of %rdi to
be copied to %rax on exit from a function that is returning a
large C struct.

Also, add a README-X86-64 entry detailing the missed optimization
opportunity and proposing an alternative approach.

llvm-svn: 50075
2008-04-21 23:59:07 +00:00
Chris Lattner
f390d62b7f Switch to using Simplified ConstantFP::get API.
llvm-svn: 49977
2008-04-20 00:41:09 +00:00
Dan Gohman
98ca33cb59 Fix the handling of va_copy on x86-64. As of llvm-gcc r49920
llvm-gcc is now lowering va_copy on x86-64, so this completes
the fix for PR2230.

llvm-svn: 49922
2008-04-18 20:55:41 +00:00
Roman Levenstein
728d59166f Ongoing work on improving the instruction selection infrastructure:
Rename SDOperandImpl back to SDOperand.
Introduce the SDUse class that represents a use of the SDNode referred by
an SDOperand. Now it is more similar to Use/Value classes.

Patch is approved by Dan Gohman.

llvm-svn: 49795
2008-04-16 16:15:27 +00:00
Dan Gohman
be8f2b452b Add support for the form of the SSE41 extractps instruction that
puts its result in a 32-bit GPR.

llvm-svn: 49762
2008-04-16 02:32:24 +00:00
Dan Gohman
cf79877623 Recreate the size SDNode instead of reusing the old one in the x86
memcpy lowering code; this ensures that the size node has the desired
result type. This fixes a regression from r49572 with @llvm.memcpy.i64
on x86-32.

llvm-svn: 49761
2008-04-16 01:32:32 +00:00
Dan Gohman
8d46278998 Fix const-correctness issues with the SrcValue handling in the
memory intrinsic expansion code.

llvm-svn: 49666
2008-04-14 17:55:48 +00:00
Arnold Schwaighofer
82af0e6a43 This patch corrects the handling of byval arguments for tailcall
optimized x86-64 (and x86) calls so that they work (... at least for
my test cases).

Should fix the following problems:

Problem 1: When i introduced the optimized handling of arguments for
tail called functions (using a sequence of copyto/copyfrom virtual
registers instead of always lowering to top of the stack) i did not
handle byval arguments correctly e.g they did not work at all :).

Problem 2: On x86-64 after the arguments of the tail called function
are moved to their registers (which include ESI/RSI etc), tail call
optimization performs byval lowering which causes xSI,xDI, xCX
registers to be overwritten. This is handled in this patch by moving
the arguments to virtual registers first and after the byval lowering
the arguments are moved from those virtual registers back to
RSI/RDI/RCX.

llvm-svn: 49584
2008-04-12 18:11:06 +00:00
Dan Gohman
15edbf989f Drop ISD::MEMSET, ISD::MEMMOVE, and ISD::MEMCPY, which are not Legal
on any current target and aren't optimized in DAGCombiner. Instead
of using intermediate nodes, expand the operations, choosing between
simple loads/stores, target-specific code, and library calls,
immediately.

Previously, the code to emit optimized code for these operations
was only used at initial SelectionDAG construction time; now it is
used at all times. This fixes some cases where rep;movs was being
used for small copies where simple loads/stores would be better.

This also cleans up code that checks for alignments less than 4;
let the targets make that decision instead of doing it in
target-independent code. This allows x86 to use rep;movs in
low-alignment cases.

Also, this fixes a bug that resulted in the use of rep;stos for
memsets of 0 with non-constant memory size when the alignment was
at least 4. It's better to use the library in this case, which
can be significantly faster when the size is large.

This also preserves more SourceValue information when memory
intrinsics are lowered into simple loads/stores.

llvm-svn: 49572
2008-04-12 04:36:06 +00:00
Dan Gohman
41f9d24d52 Fix a bug that prevented x86-64 from using rep.movsq for
8-byte-aligned data.

llvm-svn: 49571
2008-04-12 02:35:39 +00:00
Dan Gohman
b3a511b236 Make isVectorClearMaskLegal's operand list const.
llvm-svn: 49446
2008-04-09 20:09:42 +00:00
Roman Levenstein
b40d332929 Re-commit of the r48822, where the infinite looping problem discovered
by Dan Gohman is fixed.

llvm-svn: 49330
2008-04-07 10:06:32 +00:00
Evan Cheng
4d7b2ab16f Favors pshufd over shufps when shuffling elements from one vector. pshufd is faster than shufps.
llvm-svn: 49244
2008-04-05 00:30:36 +00:00
Evan Cheng
497c607fae Backing out 48222 temporarily.
llvm-svn: 49124
2008-04-03 03:13:16 +00:00
Dan Gohman
a3e01dc1ec Don't use __bzero for memset if the second argument isn't zero.
llvm-svn: 49050
2008-04-01 20:56:18 +00:00
Dan Gohman
168b2b1300 Speculatively micro-optimize memory-zeroing calls on Darwin 10.
llvm-svn: 49048
2008-04-01 20:38:36 +00:00
Dale Johannesen
1336104c02 Accept 'y' constraint (MMX) in inline asm.
llvm-svn: 49011
2008-04-01 00:57:48 +00:00
Dan Gohman
227e702cae Fix a tokenfactor node to use the load chain rather than the
load value. This fixes PR2177.

llvm-svn: 48932
2008-03-28 23:45:16 +00:00
Roman Levenstein
55b8822511 Use a linked data structure for the uses lists of an SDNode, just like
LLVM Value/Use does and MachineRegisterInfo/MachineOperand does.
This allows constant time for all uses list maintenance operations.

The idea was suggested by Chris. Reviewed by Evan and Dan.
Patch is tested and approved by Dan.

On normal use-cases compilation speed is not affected. On very big basic
blocks there are compilation speedups in the range of 15-20% or even better. 

llvm-svn: 48822
2008-03-26 12:39:26 +00:00
Evan Cheng
dbdf48276a - SSE4.1 extractfps extracts a f32 into a gr32 register. Very useful! Not. Fix the instruction specification and teaches lowering code to use it only when the only use is a store instruction.
llvm-svn: 48746
2008-03-24 21:52:23 +00:00
Anton Korobeynikov
dad919f561 Add convenient helper for win64 check. Simplify things slightly.
llvm-svn: 48691
2008-03-22 20:57:27 +00:00
Anton Korobeynikov
27c8ad4020 Initial support for Win64 calling conventions. Still in early state.
llvm-svn: 48690
2008-03-22 20:37:30 +00:00
Duncan Sands
4153fc30c9 Introduce a new node for holding call argument
flags.  This is needed by the new legalize types
infrastructure which wants to expand the 64 bit
constants previously used to hold the flags on
32 bit machines.  There are two functional changes:
(1) in LowerArguments, if a parameter has the zext
attribute set then that is marked in the flags;
before it was being ignored; (2) PPC had some bogus
code for handling two word arguments when using the
ELF 32 ABI, which was hard to convert because of
the bogusness.  As suggested by the original author
(Nicolas Geoffray), I've disabled it for the moment.
Tested with "make check" and the Ada ACATS testsuite.

llvm-svn: 48640
2008-03-21 09:14:45 +00:00
Chris Lattner
edfc239ced remove Evan's "ugly hack" that sorta attempted to get
x86-64 return conventions correct, but was never enabled.
We can now do the "right thing" with multiple return values.

llvm-svn: 48635
2008-03-21 06:50:21 +00:00
Evan Cheng
8ecb189245 Fix this xform: (sra (shl X, m), result_size) -> (sign_extend (trunc (shl X, result_size - n - m)))
llvm-svn: 48578
2008-03-20 02:18:41 +00:00
Christopher Lamb
958b0494c3 Fix X86's isTruncateFree to not claim that truncate to i1 is free. This fixes Bill's testcase that failed for r48491.
llvm-svn: 48542
2008-03-19 08:30:06 +00:00
Evan Cheng
5ac87b837e Fix a x86-64 isel lowering bug that's been around forever. A x86-64 varargs function implicitly reads X86::AL, don't clobber it!
llvm-svn: 48515
2008-03-18 23:36:35 +00:00
Chris Lattner
7925cc72c0 Reimplement the parameter attributes support, phase #1. hilights:
1. There is now a "PAListPtr" class, which is a smart pointer around
   the underlying uniqued parameter attribute list object, and manages
   its refcount.  It is now impossible to mess up the refcount.
2. PAListPtr is now the main interface to the underlying object, and
   the underlying object is now completely opaque.
3. Implementation details like SmallVector and FoldingSet are now no
   longer part of the interface.
4. You can create a PAListPtr with an arbitrary sequence of
   ParamAttrsWithIndex's, no need to make a SmallVector of a specific 
   size (you can just use an array or scalar or vector if you wish).
5. All the client code that had to check for a null pointer before
   dereferencing the pointer is simplified to just access the 
   PAListPtr directly.
6. The interfaces for adding attrs to a list and removing them is a
   bit simpler.

Phase #2 will rename some stuff (e.g. PAListPtr) and do other less 
invasive changes.

llvm-svn: 48289
2008-03-12 17:45:29 +00:00
Chris Lattner
b3fefb1e5c start handling the 'f' x87 constraint.
llvm-svn: 48239
2008-03-11 19:06:29 +00:00
Chris Lattner
9826c9365e Change the model for FP Stack return to use fp operands on the
RET instruction instead of using FpSET_ST0_32.  This also generalizes
the code to handling returning of multiple FP results.

llvm-svn: 48209
2008-03-11 03:23:40 +00:00
Chris Lattner
d393772580 Eliminate the FP_GET_ST0/FP_SET_ST0 target-specific dag nodes, just lower to
copyfromreg/copytoreg instead.

llvm-svn: 48174
2008-03-10 21:08:41 +00:00
Evan Cheng
7d9e5a7680 Default ISD::PREFETCH to expand.
llvm-svn: 48169
2008-03-10 19:38:10 +00:00
Scott Michel
bb8e8fca47 Give TargetLowering::getSetCCResultType() a parameter so that ISD::SETCC's
return ValueType can depend its operands' ValueType.

This is a cosmetic change, no functionality impacted.

llvm-svn: 48145
2008-03-10 15:42:14 +00:00
Dale Johannesen
e6b0009792 Increase ISD::ParamFlags to 64 bits. Increase the ByValSize
field to 32 bits, thus enabling correct handling of ByVal
structs bigger than 0x1ffff.  Abstract interface a bit.
Fixes gcc.c-torture/execute/pr23135.c and 
gcc.c-torture/execute/pr28982b.c in gcc testsuite (were ICE'ing
on ppc32, quietly producing wrong code on x86-32.)

llvm-svn: 48122
2008-03-10 02:17:22 +00:00
Chris Lattner
2e7537b60b rename FP_SETRESULT -> FP_SET_ST0
llvm-svn: 48094
2008-03-09 07:08:44 +00:00
Chris Lattner
826402e365 rename FpGETRESULT32 -> FpGET_ST0_32 etc. Add support for
isel'ing value preserving FP roundings from one fp stack reg to another
into a noop, instead of stack traffic.

llvm-svn: 48093
2008-03-09 07:05:32 +00:00
Chris Lattner
b628208161 Finish implementing a readme entry: when inserting an i64 variable
into a vector of zeros or undef, and when the top part is obviously
zero, we can just use movd + shuffle.  This allows us to compile
vec_set-B.ll into:

_test3:
	movl	$1234567, %eax
	andl	4(%esp), %eax
	movd	%eax, %xmm0
	ret

instead of:

_test3:
	subl	$28, %esp
	movl	$1234567, %eax
	andl	32(%esp), %eax
	movl	%eax, (%esp)
	movl	$0, 4(%esp)
	movq	(%esp), %xmm0
	addl	$28, %esp
	ret

llvm-svn: 48090
2008-03-09 05:42:06 +00:00
Chris Lattner
17f68a3075 Implement a readme entry, compiling
#include <xmmintrin.h>
__m128i doload64(short x) {return _mm_set_epi16(0,0,0,0,0,0,0,1);}

into:
	movl	$1, %eax
	movd	%eax, %xmm0
	ret

instead of a constant pool load.

llvm-svn: 48063
2008-03-09 01:05:04 +00:00
Chris Lattner
81deb3bc9c 1) Improve comments.
2) Don't try to insert an i64 value into the low part of a 
   vector with movq on an x86-32 target.  This allows us to 
   compile:

__m128i doload64(short x) {return _mm_set_epi16(0,0,0,0,0,0,0,1);}

into:

_doload64:
	movaps	LCPI1_0, %xmm0
	ret

instead of:

_doload64:
	subl	$28, %esp
	movl	$0, 4(%esp)
	movl	$1, (%esp)
	movq	(%esp), %xmm0
	addl	$28, %esp
	ret

llvm-svn: 48057
2008-03-08 22:59:52 +00:00
Chris Lattner
405f2c6356 minor simplifications to this code, don't create a dead
SCALAR_TO_VECTOR on paths that end up not using it.

llvm-svn: 48056
2008-03-08 22:48:29 +00:00
Evan Cheng
dba1dfe962 Implement x86 support for @llvm.prefetch. It corresponds to prefetcht{0|1|2} and prefetchnta instructions.
llvm-svn: 48042
2008-03-08 00:58:38 +00:00
Chris Lattner
aa81dc7d21 mark frem as expand for all legal fp types on x86, regardless of whether
we're using SSE or not.  This fixes PR2122.

llvm-svn: 48006
2008-03-07 06:36:32 +00:00
Andrew Lenharth
95c88272c6 64bit CAS on 32bit x86.
llvm-svn: 47929
2008-03-05 01:15:49 +00:00
Andrew Lenharth
f5674915c5 x86-64 atomics
llvm-svn: 47903
2008-03-04 21:13:33 +00:00
Dan Gohman
ccc0bc5878 Add support for lowering i64 SRA_PARTS and friends on x86-64.
llvm-svn: 47865
2008-03-03 22:22:09 +00:00
Andrew Lenharth
f6c220738c make CAS work
llvm-svn: 47799
2008-03-01 22:27:48 +00:00
Andrew Lenharth
b91c664226 all but CAS working on x86
llvm-svn: 47798
2008-03-01 21:52:34 +00:00
Evan Cheng
f8b1257d2e Add a quick and dirty "loop aligner pass". x86 uses it to align its loops to 16-byte boundaries.
llvm-svn: 47703
2008-02-28 00:43:03 +00:00
Chris Lattner
1f46cc2345 Make X86TargetLowering::LowerSINT_TO_FP return without creating a dead
stack slot and store if the  SINT_TO_FP is actually legal.  This allows
us to compile:

double a(double b) {return (unsigned)b;}

to:

_a:
	cvttsd2siq	%xmm0, %rax
	movl	%eax, %eax
	cvtsi2sdq	%rax, %xmm0
	ret

instead of:

_a:
	subq	$8, %rsp
	cvttsd2siq	%xmm0, %rax
	movl	%eax, %eax
	cvtsi2sdq	%rax, %xmm0
	addq	$8, %rsp
	ret

crazy.

llvm-svn: 47660
2008-02-27 05:57:41 +00:00
Arnold Schwaighofer
642dc28734 Refactor according to Evan's and Anton's suggestions.
llvm-svn: 47635
2008-02-26 22:21:54 +00:00
Arnold Schwaighofer
7466041819 Correct function comments.
llvm-svn: 47606
2008-02-26 17:50:59 +00:00
Arnold Schwaighofer
ddebd886dc Add support for intermodule tail calls on x86/32bit with
GOT-style position independent code. Before only tail calls to
protected/hidden functions within the same module were optimized.
Now all function calls are tail call optimized.

llvm-svn: 47594
2008-02-26 10:21:54 +00:00
Arnold Schwaighofer
6383666085 Change the lowering of arguments for tail call optimized
calls. Before arguments that could overwrite each other were
explicitly lowered to a stack slot, not giving the register allocator
a chance to optimize. Now a sequence of copyto/copyfrom virtual
registers ensures that arguments are loaded in (virtual) registers
before they are lowered to the stack slot (and might overwrite each
other). Also parameter stack slots are marked mutable for
(potentially) tail calling functions.

llvm-svn: 47593
2008-02-26 09:19:59 +00:00
Dale Johannesen
c67fa85683 Revise previous patch per review.
llvm-svn: 47573
2008-02-25 22:29:22 +00:00
Dale Johannesen
574bb5e1e2 Expand removal of MMX memory copies to allow 1 level
of TokenFactor underneath chain (seems to be enough)

llvm-svn: 47554
2008-02-25 19:20:14 +00:00
Dale Johannesen
ae08bdb4cf Split ParameterAttributes.h, putting the complicated
stuff into ParamAttrsList.h.  Per feedback from
ParamAttrs changes.

llvm-svn: 47504
2008-02-22 22:17:59 +00:00
Chris Lattner
a64d4179d4 copy mmx values from/to memory with GPRs on x86-32
instead of with mmx registers.  This horribleness is apparently
done by gcc to avoid having to insert emms in places that really 
should have it.  This is the second half of rdar://5741668.

llvm-svn: 47474
2008-02-22 05:18:04 +00:00
Chris Lattner
e70bc39d74 Start using GPR's to copy around mmx value instead of mmx regs.
GCC apparently does this, and code depends on not having to do
emms when this happens.  This is x86-64 only so far, second half
should handle x86-32.

rdar://5741668

llvm-svn: 47470
2008-02-22 02:09:43 +00:00
Anton Korobeynikov
4f6e612973 Remove bunch of gcc 4.3-related warnings from Target
llvm-svn: 47369
2008-02-20 11:22:39 +00:00
Evan Cheng
bb577266bf - When DAG combiner is folding a bit convert into a BUILD_VECTOR, it should check if it's essentially a SCALAR_TO_VECTOR. Avoid turning (v8i16) <10, u, u, u> to <10, 0, u, u, u, u, u, u>. Instead, simply convert it to a SCALAR_TO_VECTOR of the proper type.
- X86 now normalize SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC.

llvm-svn: 47290
2008-02-18 23:04:32 +00:00
Dan Gohman
0d0f16ca85 Chris pointed out that it's not necessary to set i64 MUL to Expand
on x86-32 since i64 itself is not a Legal type. And, update some
comments.

llvm-svn: 47282
2008-02-18 19:34:53 +00:00
Dan Gohman
70b9b2f77f Don't mark scalar integer multiplication as Expand on x86, since x86
has plain one-result scalar integer multiplication instructions.
This avoids expanding such instructions into MUL_LOHI sequences that
must be special-cased at isel time, and avoids the problem with that
code that provented memory operands from being folded.

This fixes PR1874, addressesing the most common case. The uncommon
cases of optimizing multiply-high operations will require work
in DAGCombiner.

llvm-svn: 47277
2008-02-18 17:55:26 +00:00
Andrew Lenharth
da54523742 I cannot find a libgcc function for this builtin. Therefor expanding it to a noop (which is how it use to be treated). If someone who knows the x86 backend better than me could tell me how to get a lock prefix on an instruction, that would be nice to complete x86 support.
llvm-svn: 47213
2008-02-16 14:46:26 +00:00
Duncan Sands
0056f1e823 In TargetLowering::LowerCallTo, don't assert that
the return value is zero-extended if it isn't
sign-extended.  It may also be any-extended.
Also, if a floating point value was returned
in a larger floating point type, pass 1 as the
second operand to FP_ROUND, which tells it
that all the precision is in the original type.
I think this is right but I could be wrong.
Finally, when doing libcalls, set isZExt on
a parameter if it is "unsigned".  Currently
isSExt is set when signed, and nothing is
set otherwise.  This should be right for all
calls to standard library routines.

llvm-svn: 47122
2008-02-14 17:28:50 +00:00
Nate Begeman
1ef1013b6c Change how FP immediates are handled.
1) ConstantFP is now expand by default
2) ConstantFP is not turned into TargetConstantFP during Legalize
   if it is legal.

This allows ConstantFP to be handled like Constant, allowing for 
targets that can encode FP immediates as MachineOperands.

As a bonus, fix up Itanium FP constants, which now correctly match,
and match more constants!  Hooray.

llvm-svn: 47121
2008-02-14 08:57:00 +00:00
Dan Gohman
737856bd0d Assigning an APInt to 0 with plain assignment gives it a one-bit
size. Initialize these APInts to properly-sized zero values.

llvm-svn: 47099
2008-02-13 23:07:24 +00:00
Dan Gohman
99b38405e3 Simplify some logic in ComputeMaskedBits. And change ComputeMaskedBits
to pass the mask APInt by value, not by reference. 

llvm-svn: 47096
2008-02-13 22:28:48 +00:00