1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-28 14:32:51 +01:00
Commit Graph

625 Commits

Author SHA1 Message Date
Evan Cheng
097e95b1f7 Bug: rcpps can only folds a load if the address is 16-byte aligned. Fixed many 'ps' load folding patterns in X86InstrSSE.td which are missing the proper alignment checks.
Also fixed some 80 col. violations.

llvm-svn: 51462
2008-05-23 00:37:07 +00:00
Evan Cheng
dc3a3d3a2c Add a couple of test cases.
llvm-svn: 51441
2008-05-22 21:19:19 +00:00
Evan Cheng
d1373cd497 Add missing patterns.
llvm-svn: 51435
2008-05-22 18:56:56 +00:00
Chris Lattner
477239c56d testcase for PR2267
llvm-svn: 51408
2008-05-22 04:45:22 +00:00
Evan Cheng
8e02953de8 Fix PR2343. An *interesting* coalescer bug.
BB1:                                                                                                                                                  
  vr1025 = copy vr1024                                                                                                                                
  ..                                                                                                                                                  
BB2:                                                                                                                                                  
  vr1024 = op                                                                                                                                         
         = op vr1025                                                                                                                                     
  <loop eventually branch back to BB1>

Even though vr1025 is copied from vr1024, it's not safe to coalesced them since live range of vr1025 intersects the def of vr1024. This happens when vr1025 is assigned the value of the previous iteration of vr1024 in the loop.

llvm-svn: 51394
2008-05-21 22:34:12 +00:00
Gabor Greif
807c2df887 sabre brings to my attention that the 'tr' suffix is also obsolete
llvm-svn: 51349
2008-05-20 21:00:03 +00:00
Gabor Greif
d8a4dbb5da Rename the last test with .llx extension to .ll, resolve duplicate test by renaming to isnan2. Now that no test has llx ending there is no need to search for them from dg.exp too.
llvm-svn: 51328
2008-05-20 19:52:04 +00:00
Dan Gohman
7681889f75 Run vortex-bug as x86-64, which is what the original bug was triggered on.
llvm-svn: 51289
2008-05-20 00:54:39 +00:00
Dale Johannesen
4e46c5601d Use common where we mean common, not weak.
llvm-svn: 51173
2008-05-16 00:52:30 +00:00
Dan Gohman
2da4145cd8 Fix a bug in LoopStrengthReduce that caused it to emit IR with
use-before-def. The problem comes up in code with multiple PHIs where
one PHI is being rewritten in terms of the other, but the other needs
to be casted first. LLVM rules requre the cast instruction to be
inserted after any PHI instructions, but when instructions were
inserted to replace the second PHI value with a function of the first,
they were ended up going before the cast instruction. Avoid this
problem by remembering the location of the cast instruction, when one
is needed, and inserting the expansion of the new value after it.

This fixes a bug that surfaced in 255.vortex on x86-64 when
instcombine was removed from the middle of the loop optimization
passes. 

llvm-svn: 51169
2008-05-15 23:26:57 +00:00
Dan Gohman
cd29e1fa60 When bit-twiddling CondCode values for integer comparisons produces
SETOEQ, is it does with (SETEQ & SETULE), map it to SETEQ.

llvm-svn: 51112
2008-05-14 18:17:09 +00:00
Evan Cheng
9e15622879 Instead of a vector load, shuffle and then extract an element. Load the element from address with an offset.
pshufd $1, (%rdi), %xmm0
        movd %xmm0, %eax
=>
        movl 4(%rdi), %eax

llvm-svn: 51026
2008-05-13 08:35:03 +00:00
Evan Cheng
e4ee4c2870 On x86, it's safe to treat i32 load anyext as a normal i32 load. Ditto for i8 anyext load to i16.
llvm-svn: 51019
2008-05-13 00:54:02 +00:00
Evan Cheng
fcbdc8bd6e Xform bitconvert(build_pair(load a, load b)) to a single load if the load locations are at the right offset from each other.
llvm-svn: 51008
2008-05-12 23:04:07 +00:00
Dale Johannesen
b54491d31a New test for tail merging
llvm-svn: 51007
2008-05-12 22:59:44 +00:00
Evan Cheng
c19c639ad7 When transforming a vector_shuffle to a load, the base address must not be an undef.
llvm-svn: 50940
2008-05-10 06:46:49 +00:00
Evan Cheng
fc4b8e1d96 Add nounwind.
llvm-svn: 50931
2008-05-10 02:22:25 +00:00
Evan Cheng
cf4d5567d5 If all sources of a PHI node are defined by an implicit_def, just emit an implicit_def instead of a copy.
llvm-svn: 50927
2008-05-10 00:17:50 +00:00
Evan Cheng
2adea48f7e Add a pattern to do move the low element of a v4f32 and zero extend the rest.
llvm-svn: 50922
2008-05-09 23:37:55 +00:00
Evan Cheng
3493e43afd Handle a few more cases of folding load i64 into xmm and zero top bits.
Note, some of the code will be moved into target independent part of DAG combiner in a subsequent patch.

llvm-svn: 50918
2008-05-09 21:53:03 +00:00
Evan Cheng
a0688bf1cb Simplify test.
llvm-svn: 50911
2008-05-09 19:56:32 +00:00
Evan Cheng
f824b47188 Use movq to move low half of XMM register and zero-extend the rest.
llvm-svn: 50874
2008-05-08 22:35:02 +00:00
Evan Cheng
f97e716511 Handle vector move / load which zero the destination register top bits (i.e. movd, movq, movss (addr), movsd (addr)) with X86 specific dag combine.
llvm-svn: 50838
2008-05-08 00:57:18 +00:00
Evan Cheng
7ff000c175 Add nounwind.
llvm-svn: 50837
2008-05-07 22:59:08 +00:00
Evan Cheng
c86c035346 Yet another nasty spiller bug.
%ecx = op
store %cl<kill>, (addr)
(addr) = op %al

It's not safe to unfold the last operand and eliminate store even though %cl is marked kill. It's a sub-register use which means one of its super-register(s) may be used below.

llvm-svn: 50794
2008-05-07 00:49:28 +00:00
Anton Korobeynikov
a9ff11ea5f Use target triple in tests, not 'realign-stack=0' option. Per request.
llvm-svn: 50778
2008-05-06 23:09:29 +00:00
Evan Cheng
a84ed06284 Fix PR2287. Darwin passes mmx values in register in 64-mode, not Linux.
llvm-svn: 50716
2008-05-06 07:23:50 +00:00
Mon P Wang
84a269e023 Added addition atomic instrinsics and, or, xor, min, and max.
llvm-svn: 50663
2008-05-05 19:05:59 +00:00
Chris Lattner
ca94848f66 no need for eh info
llvm-svn: 50658
2008-05-05 18:24:33 +00:00
Dan Gohman
c860d9c77c Add AsmPrinter support for emitting a directive to declare that
the code being generated does not require an executable stack.

Also, add target-specific code to make use of this on Linux
on x86. 

llvm-svn: 50634
2008-05-05 00:28:39 +00:00
Evan Cheng
a7747df955 Select vector shift with non-immediate i32 shift amount operand by first moving the operand into the right register.
llvm-svn: 50619
2008-05-04 09:15:50 +00:00
Evan Cheng
c1c2adbfc6 Add separate intrinsics for MMX / SSE shifts with i32 integer operands. This allow us to simplify the horribly complicated matching code.
llvm-svn: 50601
2008-05-03 00:52:09 +00:00
Chris Lattner
8cc3e89b87 specify an arch for non-x86 hosts.
llvm-svn: 50576
2008-05-02 15:11:58 +00:00
Chris Lattner
e9bbe8e6b6 don't randomly miscompile seto/setuo just because we are in
ffastmath mode.  This fixes rdar://5902801, a miscompilation
of gcc.dg/builtins-8.c.

Bill, please pull this into Tak.

llvm-svn: 50523
2008-05-01 07:26:11 +00:00
Arnold Schwaighofer
29a654857a Really commit the test checking the argument lowering behaviour on x86-64 :).
llvm-svn: 50478
2008-04-30 09:19:47 +00:00
Chris Lattner
0f63b8fecc make the vector conversion magic handle multiple results.
We now compile test2/test3 to:

_test2:
	## InlineAsm Start
	set %xmm0, %xmm1
	## InlineAsm End
	addps	%xmm1, %xmm0
	ret
_test3:
	## InlineAsm Start
	set %xmm0, %xmm1
	## InlineAsm End
	paddd	%xmm1, %xmm0
	ret

as expected.

llvm-svn: 50389
2008-04-29 04:48:56 +00:00
Chris Lattner
e75d09711d add support for multiple return values in inline asm. This is a step
towards PR2094.  It now compiles the attached .ll file to:

_sad16_sse2:
	movslq	%ecx, %rax
	## InlineAsm Start
	%ecx %rdx %rax %rax %r8d %rdx %rsi
	## InlineAsm End
	## InlineAsm Start
	set %eax
	## InlineAsm End
	ret

which is pretty decent for a 3 output, 4 input asm.

llvm-svn: 50386
2008-04-29 04:29:54 +00:00
Evan Cheng
381b094a2b Another extract_subreg coalescing bug.
e.g.
vr1024<2> extract_subreg vr1025, 2
If vr1024 do not have the same register class as vr1025, it's not safe to coalesce this away. For example, vr1024 might be a GPR32 while vr1025 might be a GPR64.

llvm-svn: 50385
2008-04-29 01:41:44 +00:00
Evan Cheng
8696eb18c1 Add -march=x86.
llvm-svn: 50380
2008-04-28 23:31:41 +00:00
Evan Cheng
3167c716f2 Test case.
llvm-svn: 50377
2008-04-28 22:14:34 +00:00
Chris Lattner
39a4281deb Implement a signficant optimization for inline asm:
When choosing between constraints with multiple options,
like "ir", test to see if we can use the 'i' constraint and
go with that if possible.  This produces more optimal ASM in
all cases (sparing a register and an instruction to load it),
and fixes inline asm like this:

void test () {
  asm volatile (" %c0 %1 " : : "imr" (42), "imr"(14));
}

Previously we would dump "42" into a memory location (which
is ok for the 'm' constraint) which would cause a problem
because the 'c' modifier is not valid on memory operands.

Isn't it great how inline asm turns 'missed optimization'
into 'compile failed'??

Incidentally, this was the todo in 
PowerPC/2007-04-24-InlineAsm-I-Modifier.ll

Please do NOT pull this into Tak.

llvm-svn: 50315
2008-04-27 00:37:18 +00:00
Nate Begeman
1723f6af2b Feedback from chris
llvm-svn: 50305
2008-04-25 21:47:35 +00:00
Nate Begeman
acd7e1c464 Add a testcase for the recent "handle variable vector insert elt in mem" patch
llvm-svn: 50303
2008-04-25 21:26:59 +00:00
Evan Cheng
db1497fa77 Update tests.
llvm-svn: 50293
2008-04-25 20:13:47 +00:00
Evan Cheng
11f101a800 Special handling for MMX values being passed in either GPR64 or lower 64-bits of XMM registers.
llvm-svn: 50289
2008-04-25 19:11:04 +00:00
Evan Cheng
39ae78cadb MMX argument passing fixes:
On Darwin / Linux x86-32, v8i8, v4i16, v2i32 values are passed in MM[0-2].                                                                                                                                      
On Darwin / Linux x86-32, v1i64 values are passed in memory.                                                                                                                                                    
On Darwin x86-64, v8i8, v4i16, v2i32 values are passed in XMM[0-7].                                                                                                                                     
On Darwin x86-64, v1i64 values are passed in 64-bit GPRs.

llvm-svn: 50257
2008-04-25 07:56:45 +00:00
Chris Lattner
8c9f6c929a Loosen up an assertion to allow intrinsics. I really have no
idea what this code (findNonImmUse) does, so I'm only guessing 
that this is the right thing.  It would be really really nice
if this had comments and perhaps switched to SmallPtrSet
(hint hint) :)

This fixes rdar://5886601, a crash on gcc.target/i386/sse4_1-pblendw.c

llvm-svn: 50252
2008-04-25 05:13:01 +00:00
Evan Cheng
484060ba4a Fix bug in x86 memcpy / memset lowering. If there are trailing bytes not handled by rep instructions, a new memcpy / memset is introduced for them. However, since source / destination addresses are already adjusted, their offsets should be zero.
llvm-svn: 50239
2008-04-25 00:26:43 +00:00
Anton Korobeynikov
244a615291 Disable stack realignment for these tests
llvm-svn: 50172
2008-04-23 18:25:44 +00:00
Anton Korobeynikov
1898ce20e2 Fix test becase ABI stack alignment dropped to 'normal' value
llvm-svn: 50171
2008-04-23 18:25:16 +00:00
Anton Korobeynikov
15c5a2ce26 Fix test, instruction count is valid only if stack is not realigned
llvm-svn: 50170
2008-04-23 18:24:48 +00:00
Dan Gohman
93b5be1824 Implement an x86-64 ABI detail of passing structs by hidden first
argument. The x86-64 ABI requires the incoming value of %rdi to
be copied to %rax on exit from a function that is returning a
large C struct.

Also, add a README-X86-64 entry detailing the missed optimization
opportunity and proposing an alternative approach.

llvm-svn: 50075
2008-04-21 23:59:07 +00:00
Chris Lattner
2c5b96fbee A better fix for my previous patch, MOVZQI2PQIrr just requires SSE2.
llvm-svn: 49986
2008-04-20 05:52:46 +00:00
Chris Lattner
8503e2d236 Not all x86-64 machines have sse3 apparently.
llvm-svn: 49985
2008-04-20 05:47:56 +00:00
Chris Lattner
c310b1f1f3 rename *.llx -> *.ll
llvm-svn: 49970
2008-04-19 22:29:10 +00:00
Evan Cheng
073659986f Be more careful with insert_subreg and extract_subreg where either source or destination operand has already been coalesced with another register that's defined by a insert_subreg or extract_subreg.
llvm-svn: 49843
2008-04-17 07:58:04 +00:00
Evan Cheng
7c2c3333ca Fix a sub-register indice propagation bug.
llvm-svn: 49832
2008-04-17 00:06:42 +00:00
Evan Cheng
e2e899b5c2 Don't forget about sub-register indices when rematting instructions.
llvm-svn: 49830
2008-04-16 23:44:44 +00:00
Evan Cheng
4b16ea6247 Really test what's intended.
llvm-svn: 49802
2008-04-16 18:21:55 +00:00
Evan Cheng
6d05ce493b Rewrite LiveVariable liveness computation. The new implementation is much simplified. It eliminated the nasty recursive routines and removed the partial def / use bookkeeping. There is also potential for performance improvement by replacing the conservative handling of partial physical register definitions. The code is currently disabled until live interval analysis is taught of the name scheme.
This patch also fixed a couple of nasty corner cases.

llvm-svn: 49784
2008-04-16 09:46:40 +00:00
Dan Gohman
be8f2b452b Add support for the form of the SSE41 extractps instruction that
puts its result in a 32-bit GPR.

llvm-svn: 49762
2008-04-16 02:32:24 +00:00
Dan Gohman
cf79877623 Recreate the size SDNode instead of reusing the old one in the x86
memcpy lowering code; this ensures that the size node has the desired
result type. This fixes a regression from r49572 with @llvm.memcpy.i64
on x86-32.

llvm-svn: 49761
2008-04-16 01:32:32 +00:00
Dan Gohman
7d27552962 Add movd instructions to move from MMX registers
to 64-bit GPR registers on x86-64.

llvm-svn: 49757
2008-04-15 23:55:07 +00:00
Dan Gohman
3b99b3c807 Treat EntryToken nodes as "passive" so that they aren't added to the
ScheduleDAG; they don't correspond to any actual instructions so they
don't need to be scheduled.

This fixes a bug where the EntryToken was being scheduled multiple
times in some cases, though it ended up not causing any trouble because 
EntryToken doesn't expand into anything. With this fixed the schedulers
reliably schedule the expected number of units, so we can check this
with an assertion.

This requires a tweak to test/CodeGen/X86/loop-hoist.ll because it
ends up getting scheduled differently in a trivial way, though it was
enough to fool the prcontext+grep that the test does.

llvm-svn: 49701
2008-04-15 01:22:18 +00:00
Dale Johannesen
d9a9c746d8 Remove -unwind-tables-optional everywhere, since
this is now the default.

llvm-svn: 49667
2008-04-14 17:56:54 +00:00
Arnold Schwaighofer
82af0e6a43 This patch corrects the handling of byval arguments for tailcall
optimized x86-64 (and x86) calls so that they work (... at least for
my test cases).

Should fix the following problems:

Problem 1: When i introduced the optimized handling of arguments for
tail called functions (using a sequence of copyto/copyfrom virtual
registers instead of always lowering to top of the stack) i did not
handle byval arguments correctly e.g they did not work at all :).

Problem 2: On x86-64 after the arguments of the tail called function
are moved to their registers (which include ESI/RSI etc), tail call
optimization performs byval lowering which causes xSI,xDI, xCX
registers to be overwritten. This is handled in this patch by moving
the arguments to virtual registers first and after the byval lowering
the arguments are moved from those virtual registers back to
RSI/RDI/RCX.

llvm-svn: 49584
2008-04-12 18:11:06 +00:00
Dan Gohman
15edbf989f Drop ISD::MEMSET, ISD::MEMMOVE, and ISD::MEMCPY, which are not Legal
on any current target and aren't optimized in DAGCombiner. Instead
of using intermediate nodes, expand the operations, choosing between
simple loads/stores, target-specific code, and library calls,
immediately.

Previously, the code to emit optimized code for these operations
was only used at initial SelectionDAG construction time; now it is
used at all times. This fixes some cases where rep;movs was being
used for small copies where simple loads/stores would be better.

This also cleans up code that checks for alignments less than 4;
let the targets make that decision instead of doing it in
target-independent code. This allows x86 to use rep;movs in
low-alignment cases.

Also, this fixes a bug that resulted in the use of rep;stos for
memsets of 0 with non-constant memory size when the alignment was
at least 4. It's better to use the library in this case, which
can be significantly faster when the size is large.

This also preserves more SourceValue information when memory
intrinsics are lowered into simple loads/stores.

llvm-svn: 49572
2008-04-12 04:36:06 +00:00
Dan Gohman
41f9d24d52 Fix a bug that prevented x86-64 from using rep.movsq for
8-byte-aligned data.

llvm-svn: 49571
2008-04-12 02:35:39 +00:00
Chris Lattner
3b289289a7 Fix the x86-64 side of PR2108 by adding a v2f64 version of
MOVZQI2PQIrr.  This would be better handled as a dag combine 
(with the goal of eliminating the bitconvert) but I don't know
how to do that safely.  Thoughts welcome.

llvm-svn: 49463
2008-04-10 05:13:43 +00:00
Evan Cheng
1803e20a62 Teach branch folding pass about implicit_def instructions. Unfortunately we can't just eliminate them since register scavenger expects every register use to be defined. However, we can delete them when there are no intra-block uses. Carefully removing some implicit def's which enable more blocks to be optimized away.
llvm-svn: 49461
2008-04-10 02:32:10 +00:00
Evan Cheng
def576f9e6 - More aggressively coalescing away copies whose source is defined by an implicit_def.
- Added insert_subreg coalescing support.

llvm-svn: 49448
2008-04-09 20:57:25 +00:00
Evan Cheng
f35cc57821 Missed a hasInterval check.
llvm-svn: 49415
2008-04-09 01:30:15 +00:00
Dale Johannesen
5ac0a0ed21 Rename -disable-required-unwind-tables to -unwind-tables-optional.
llvm-svn: 49391
2008-04-08 18:10:08 +00:00
Dale Johannesen
3f992b224e Add -disable-required-unwind-tables to tests
that need it (usually, grepping for some string
found in unwind info)

llvm-svn: 49364
2008-04-08 00:14:17 +00:00
Evan Cheng
6c58f2397d Fix test.
llvm-svn: 49343
2008-04-07 17:02:18 +00:00
Chris Lattner
f88214caca fix this testcase to pass and remove a duplicate instance of itself.
llvm-svn: 49281
2008-04-06 21:39:17 +00:00
Torok Edwin
34e6889671 Prefer to expand mask for xor to -1, so we have a chance to turn it into a not.
If it cannot be expanded, it will keep the old behaviour and try to shrink the constant.
Part of enhancement for PR2191.

llvm-svn: 49280
2008-04-06 21:23:02 +00:00
Evan Cheng
4d7b2ab16f Favors pshufd over shufps when shuffling elements from one vector. pshufd is faster than shufps.
llvm-svn: 49244
2008-04-05 00:30:36 +00:00
Evan Cheng
f045d86660 New test case.
llvm-svn: 49190
2008-04-03 21:25:03 +00:00
Dale Johannesen
ebfa6edc65 Testcase for EH with functions whose names are stripped.
llvm-svn: 49111
2008-04-02 20:16:41 +00:00
Dan Gohman
168b2b1300 Speculatively micro-optimize memory-zeroing calls on Darwin 10.
llvm-svn: 49048
2008-04-01 20:38:36 +00:00
Dale Johannesen
d9a5b77269 Mark functions in some tests as 'nounwind'. Generating
EH info for these functions causes the tests to fail for
random reasons (e.g. looking for 'or' or counting lines
with asm-printer; labels count as lines.)

llvm-svn: 49003
2008-03-31 23:20:09 +00:00
Evan Cheng
a3ce7b4c76 It's not safe to fold a load from GV stub or constantpool into a two-address use.
llvm-svn: 49002
2008-03-31 23:19:51 +00:00
Dan Gohman
f223eaafcd Fix a DAGCombiner optimization to respect volatile qualification.
llvm-svn: 48994
2008-03-31 20:32:52 +00:00
Dan Gohman
227e702cae Fix a tokenfactor node to use the load chain rather than the
load value. This fixes PR2177.

llvm-svn: 48932
2008-03-28 23:45:16 +00:00
Evan Cheng
6cbce6b602 Fix a memory bug: increment an iterator of a deleted machine instr.
llvm-svn: 48853
2008-03-27 01:27:25 +00:00
Evan Cheng
8d222d6221 Avoid commuting a def MI in order to coalesce a copy instruction away if any use of the same val# is a copy instruction that has already been coalesced.
llvm-svn: 48833
2008-03-26 19:03:01 +00:00
Dale Johannesen
8c1e95810f Use ## for comment delimiter on darwin x86-32, so
llvm's output .s files will go through gcc -std=c99
without triggering preprocesser errors.  Approach
suggested by Daveed Vandevoorde.

llvm-svn: 48808
2008-03-25 23:29:30 +00:00
Evan Cheng
8cb64d8e8b Handle a special case xor undef, undef -> 0. Technically this should be transformed to undef. But this is such a common idiom (misuse) we are going to handle it.
llvm-svn: 48792
2008-03-25 20:08:07 +00:00
Dan Gohman
58ad056286 Add CMP32mr and friends to the load-unfolding table. Among
other things, this allows the scheduler to unfold a load operand
in the 2008-01-08-SchedulerCrash.ll testcase, so it now successfully
clones the comparison to avoid a pushf+popf.

llvm-svn: 48777
2008-03-25 16:53:19 +00:00
Tanya Lattner
b6a27ed83f Byebye llvm-upgrade!
llvm-svn: 48762
2008-03-25 04:26:08 +00:00
Evan Cheng
dbdf48276a - SSE4.1 extractfps extracts a f32 into a gr32 register. Very useful! Not. Fix the instruction specification and teaches lowering code to use it only when the only use is a store instruction.
llvm-svn: 48746
2008-03-24 21:52:23 +00:00
Dan Gohman
b9c5e6258f APIntify SelectionDAG's EXTRACT_ELEMENT code.
llvm-svn: 48726
2008-03-24 16:38:05 +00:00
Evan Cheng
874aee2eec Teach DAG combiner to commute commutable binary nodes in order to achieve sdisel CSE.
llvm-svn: 48673
2008-03-22 01:55:50 +00:00
Dan Gohman
59aeac6320 Handle getresult instructions in different basic blocks
from their aggregate operands by moving the getresult
instructions.

llvm-svn: 48657
2008-03-21 21:01:32 +00:00
Chris Lattner
8a4fa95cae Add support for calls that return two FP values in
ST(0)/ST(1).

llvm-svn: 48634
2008-03-21 06:38:26 +00:00
Chris Lattner
933d0d318b disable a bogus assertion.
llvm-svn: 48633
2008-03-21 06:01:05 +00:00
Chris Lattner
260473f983 Enable support for returning two long-double values in ST(0)/ST(1).
This allows us to compile fp-stack-2results.ll into:

_test:
	fldz
	fld1
	ret

which returns 1 in ST(0) and 0 in ST(1).  This is needed for x86-64
_Complex long double.

llvm-svn: 48632
2008-03-21 05:57:20 +00:00
Evan Cheng
4ae9fee64c Undo 48570. Correctly match mmx shift instructions with an immediate operand.
llvm-svn: 48627
2008-03-21 00:40:09 +00:00
Evan Cheng
8ecb189245 Fix this xform: (sra (shl X, m), result_size) -> (sign_extend (trunc (shl X, result_size - n - m)))
llvm-svn: 48578
2008-03-20 02:18:41 +00:00
Evan Cheng
6f729b2820 Add intrinsics to match mmx shift builtin's with immediate operand.
llvm-svn: 48569
2008-03-19 23:38:52 +00:00
Christopher Lamb
958b0494c3 Fix X86's isTruncateFree to not claim that truncate to i1 is free. This fixes Bill's testcase that failed for r48491.
llvm-svn: 48542
2008-03-19 08:30:06 +00:00
Evan Cheng
e9aa507edc Fixed a coalescer bug caused by a typo.
llvm-svn: 48526
2008-03-19 02:26:36 +00:00
Evan Cheng
3d9309c11d Fix live variables issues:
1. If part of a register is re-defined, an implicit kill and an implicit def are added to denote read / mod / write. However, this should only be necessary if the register is actually read later. This is a performance issue.
2. If a sub-register is being defined, and it doesn't have a previous use, do not add a implicit kill to the last use of a super-register:
   = EAX, AX<imp-use,kill>
...
AX =
In this case, EAX is live but AX is killed, this is wrong and will cause the coalescer to do bad things.

llvm-svn: 48521
2008-03-19 00:52:20 +00:00
Evan Cheng
5ac87b837e Fix a x86-64 isel lowering bug that's been around forever. A x86-64 varargs function implicitly reads X86::AL, don't clobber it!
llvm-svn: 48515
2008-03-18 23:36:35 +00:00
Bill Wendling
c8f3fc7c3d It might be nice to have this run as x86 on non-x86 platforms...
llvm-svn: 48511
2008-03-18 22:38:22 +00:00
Bill Wendling
5ea2aec3ac Temporarily revert r48491. It's breaking test/CodeGen/X86/xorl.ll.
llvm-svn: 48510
2008-03-18 22:29:51 +00:00
Christopher Lamb
1d70509b55 Target independent DAG transform to use truncate for field extraction + sign extend on targets where this is profitable. Passes nightly on x86-64.
llvm-svn: 48491
2008-03-18 16:46:39 +00:00
Chris Lattner
bb335409c2 ensure we continue matching x86-64 rotates.
llvm-svn: 48437
2008-03-17 01:35:03 +00:00
Evan Cheng
3612a7ed30 Fix PR2138. Apparently any modification to a std::multimap (including remove entries for a different key) can invalidate multimap iterators.
llvm-svn: 48371
2008-03-14 20:44:01 +00:00
Evan Cheng
53c3dd0267 New test case.
llvm-svn: 48338
2008-03-13 08:05:02 +00:00
Evan Cheng
a76bb6e64e A test case I forgot to check in.
llvm-svn: 48335
2008-03-13 06:42:46 +00:00
Evan Cheng
0b8b1647dd TwoAddressInstructionPass enhancement. After it converts a two address instruction into a 3-address one, sink it past the instruction that kills the read-mod-write register if its definition is used past the kill. This reduces the number of live register by one.
llvm-svn: 48333
2008-03-13 06:37:55 +00:00
Evan Cheng
620fd19798 Experimental scheduler change to schedule / coalesce the copies added for function livein's. Take 2008-03-10-RegAllocInfLoop.ll, the schedule looks like this after these copies are inserted:
entry: 0x12049d0, LLVM BB @0x1201fd0, ID#0:
Live Ins: %EAX %EDX %ECX
        %reg1031<def> = MOVPC32r 0
        %reg1032<def> = ADD32ri %reg1031, <es:_GLOBAL_OFFSET_TABLE_>, %EFLAGS<imp-def>
        %reg1028<def> = MOV32rr %EAX
        %reg1029<def> = MOV32rr %EDX
        %reg1030<def> = MOV32rr %ECX
        %reg1027<def> = MOV8rm %reg0, 1, %reg0, 0, Mem:LD(1,1) [0x1201910 + 0]
        %reg1025<def> = MOV32rr %reg1029
        %reg1026<def> = MOV32rr %reg1030
        %reg1024<def> = MOV32rr %reg1028

The copies unnecessarily increase register pressure and it will end up requiring a physical register to be spilled.

With -schedule-livein-copies:
entry: 0x12049d0, LLVM BB @0x1201fa0, ID#0:
Live Ins: %EAX %EDX %ECX
        %reg1031<def> = MOVPC32r 0
        %reg1032<def> = ADD32ri %reg1031, <es:_GLOBAL_OFFSET_TABLE_>, %EFLAGS<imp-def>
        %reg1024<def> = MOV32rr %EAX
        %reg1025<def> = MOV32rr %EDX
        %reg1026<def> = MOV32rr %ECX
        %reg1027<def> = MOV8rm %reg0, 1, %reg0, 0, Mem:LD(1,1) [0x12018e0 + 0]

Much better!

llvm-svn: 48307
2008-03-12 22:19:41 +00:00
Dan Gohman
2ec41788ab Fix this test on hosts that don't have sse2.
llvm-svn: 48296
2008-03-12 20:40:51 +00:00
Dan Gohman
2a124430e9 Make this test x86-specific for now; targets that don't use
the automated CallingConv code to handle return values typically
don't support multiple return values.

llvm-svn: 48265
2008-03-12 00:25:14 +00:00
Anton Korobeynikov
d2fd135594 Testcase for PR2137
llvm-svn: 48258
2008-03-11 22:43:42 +00:00
Anton Korobeynikov
efa9405b94 Update testcase for recent aliases change
llvm-svn: 48250
2008-03-11 21:42:20 +00:00
Dan Gohman
f9f25bd41b Add a test to ensure that all-ones vectors are materialized with pcmpeqd.
llvm-svn: 48247
2008-03-11 21:37:00 +00:00
Dan Gohman
1fece90de9 Use the correct value for InSignBit.
llvm-svn: 48245
2008-03-11 21:29:43 +00:00
Chris Lattner
fd2c24af72 Implement basic support for the 'f' register class constraint. This basically
works, but probably won't if you mix it with 't' or 'u' yet.

llvm-svn: 48243
2008-03-11 19:50:13 +00:00
Evan Cheng
af1c76846d When the register allocator runs out of registers, spill a physical register around the def's and use's of the interval being allocated to make it possible for the interval to target a register and spill it right away and restore a register for uses. This likely generates terrible code but is before than aborting.
llvm-svn: 48218
2008-03-11 07:19:34 +00:00
Chris Lattner
f0684bfd16 Don't emit FP_REG_KILL into a block that just returns. Nothing
can be live out of the block anyway, so it isn't needed.

llvm-svn: 48192
2008-03-10 23:34:12 +00:00
Dan Gohman
47137eba06 Fix mul expansion to check the correct number of bits for
zero extension when checking if an unsigned multiply is
safe.

llvm-svn: 48171
2008-03-10 20:42:19 +00:00
Dale Johannesen
62a0b6a79b These tests don't work unless SSE2 is active.
Judging from the checking comments this is intentional,
so add the flag (makes them pass on non-x86 host).

llvm-svn: 48157
2008-03-10 17:33:57 +00:00
Dale Johannesen
c9ecee85c4 There is no "-mattr=+sse1" flag; fix test for non-x86 hosts.
llvm-svn: 48156
2008-03-10 17:13:37 +00:00
Evan Cheng
02b66c3a32 - Fix a subtle bug in RemoveCopyByCommutingDef. ALR is the live range where the source is defined; BLR is the live range which is defined by the copy.
If ALR and BLR overlaps and end of BLR extends beyond end of ALR, e.g.                                                                                                 
 A = or A, B                                                                                                                                                            
 ...                                                                                                                                                                    
 B = A                                                                                                                                                                  
 ...                                                                                                                                                                    
 C = A<kill>                                                                                                                                                            
 ...                                                                                                                                                                    
   = B                                                                                                                                                                  
                                                                                                                                                                        
then do not add kills of A to the newly created B interval.
- Also fix some kill info update bug.

llvm-svn: 48141
2008-03-10 08:11:32 +00:00
Evan Cheng
3c0ddc999f Avoid creating BUILD_VECTOR of all zero elements of "non-normalized" type (e.g. v8i16 on x86) after legalizer. Instruction selection does not expect to see them. In all likelihood this can only be an issue in a bugpoint reduced test case.
llvm-svn: 48136
2008-03-10 07:19:13 +00:00
Chris Lattner
b6bfedbcfd teach X86InstrInfo::copyRegToReg how to copy into ST(0) from
an RFP register class.

Teach ScheduleDAG how to handle CopyToReg with different src/dst 
reg classes.

This allows us to compile trivial inline asms that expect stuff
on the top of x87-fp stack.

llvm-svn: 48107
2008-03-09 09:15:31 +00:00
Chris Lattner
8d0203478f Add ScheduleDAG support for copytoreg where the src/dst register are
in different register classes, e.g. copy of ST(0) to RFP*.  This gets
some really trivial inline asm working that plops things on the top of
stack (PR879)

llvm-svn: 48105
2008-03-09 08:49:15 +00:00
Chris Lattner
b9a4c86fbf reduce this testcase more
llvm-svn: 48092
2008-03-09 06:57:21 +00:00
Chris Lattner
b628208161 Finish implementing a readme entry: when inserting an i64 variable
into a vector of zeros or undef, and when the top part is obviously
zero, we can just use movd + shuffle.  This allows us to compile
vec_set-B.ll into:

_test3:
	movl	$1234567, %eax
	andl	4(%esp), %eax
	movd	%eax, %xmm0
	ret

instead of:

_test3:
	subl	$28, %esp
	movl	$1234567, %eax
	andl	32(%esp), %eax
	movl	%eax, (%esp)
	movl	$0, 4(%esp)
	movq	(%esp), %xmm0
	addl	$28, %esp
	ret

llvm-svn: 48090
2008-03-09 05:42:06 +00:00
Chris Lattner
17f68a3075 Implement a readme entry, compiling
#include <xmmintrin.h>
__m128i doload64(short x) {return _mm_set_epi16(0,0,0,0,0,0,0,1);}

into:
	movl	$1, %eax
	movd	%eax, %xmm0
	ret

instead of a constant pool load.

llvm-svn: 48063
2008-03-09 01:05:04 +00:00
Chris Lattner
24031c9426 make this test harder
llvm-svn: 48061
2008-03-09 00:30:06 +00:00
Chris Lattner
7173d3bd70 Teach SD some vector identities, allowing us to compile vec_set-9 into:
_test3:
	movd	%rdi, %xmm1
	#IMPLICIT_DEF %xmm0
	punpcklqdq	%xmm1, %xmm0
	ret

instead of:

_test3:
	#IMPLICIT_DEF %rax
	movd	%rax, %xmm0
	movd	%rdi, %xmm1
	punpcklqdq	%xmm1, %xmm0
	ret

This is still not ideal.  There is no reason to two xmm regs.

llvm-svn: 48058
2008-03-08 23:43:36 +00:00
Evan Cheng
dba1dfe962 Implement x86 support for @llvm.prefetch. It corresponds to prefetcht{0|1|2} and prefetchnta instructions.
llvm-svn: 48042
2008-03-08 00:58:38 +00:00
Chris Lattner
aa81dc7d21 mark frem as expand for all legal fp types on x86, regardless of whether
we're using SSE or not.  This fixes PR2122.

llvm-svn: 48006
2008-03-07 06:36:32 +00:00
Chris Lattner
a9fcb187af Generalize FP constant shrinking optimization to apply to any vt
except ppc long double.  This allows us to shrink constant pool
entries for x86 long double constants, which in turn allows us to
use flds/fldl instead of fldt.

llvm-svn: 47938
2008-03-05 06:48:13 +00:00
Evan Cheng
e0b3c221ab Add a target lowering hook to control whether it's worthwhile to compress fp constant.
For x86, if sse2 is available, it's not a good idea since cvtss2sd is slower than a movsd load and it prevents load folding. On x87, it's important to shrink fp constant since fldt is very expensive.

llvm-svn: 47931
2008-03-05 01:30:59 +00:00
Evan Cheng
14f556a6d7 Really fix the test.
llvm-svn: 47882
2008-03-04 08:01:56 +00:00
Evan Cheng
7a67175fcc Fix broken test.
llvm-svn: 47881
2008-03-04 07:59:13 +00:00
Evan Cheng
3123d6ced3 Add PR1501 test case.
llvm-svn: 47874
2008-03-04 00:47:45 +00:00
Chris Lattner
299977b5ca Evan implemented these.
llvm-svn: 47828
2008-03-02 18:05:14 +00:00
Evan Cheng
e1d3e0958b Set to default: x86 no longer fold and into test if it has more than one use.
llvm-svn: 47711
2008-02-28 07:46:38 +00:00
Evan Cheng
da92e34fe3 Fix a bug in dead spill slot elimination.
llvm-svn: 47687
2008-02-27 19:57:11 +00:00
Chris Lattner
e51c23341d actually run llc, thanks Dan :)
llvm-svn: 47677
2008-02-27 17:46:54 +00:00
Evan Cheng
295ae42ede Don't track max alignment during stack object allocations since they can be deleted later. Let PEI compute it.
llvm-svn: 47668
2008-02-27 10:04:56 +00:00
Chris Lattner
1f46cc2345 Make X86TargetLowering::LowerSINT_TO_FP return without creating a dead
stack slot and store if the  SINT_TO_FP is actually legal.  This allows
us to compile:

double a(double b) {return (unsigned)b;}

to:

_a:
	cvttsd2siq	%xmm0, %rax
	movl	%eax, %eax
	cvtsi2sdq	%rax, %xmm0
	ret

instead of:

_a:
	subq	$8, %rsp
	cvttsd2siq	%xmm0, %rax
	movl	%eax, %eax
	cvtsi2sdq	%rax, %xmm0
	addq	$8, %rsp
	ret

crazy.

llvm-svn: 47660
2008-02-27 05:57:41 +00:00
Chris Lattner
bc686e546a Compile x86-64-and-mask.ll into:
_test:
	movl	%edi, %eax
	ret

instead of:

_test:
        movl    $4294967295, %ecx
        movq    %rdi, %rax
        andq    %rcx, %rax
        ret

It would be great to write this as a Pat pattern that used subregs 
instead of a 'pseudo' instruction, but I don't know how to do that
in td files.

llvm-svn: 47658
2008-02-27 05:47:54 +00:00
Evan Cheng
7553230e3a Spiller now remove unused spill slots.
llvm-svn: 47657
2008-02-27 03:04:06 +00:00