mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-28 22:42:52 +01:00
Commit Graph

276 Commits

Author SHA1 Message Date
Evan Cheng
e5e0b4660d Eliminate x86.sse2.punpckh.qdq and x86.sse2.punpckl.qdq.
llvm-svn: 51533
2008-05-24 02:56:30 +00:00
Evan Cheng
564238c841 Eliminate x86.sse2.movs.d, x86.sse2.shuf.pd, x86.sse2.unpckh.pd, and x86.sse2.unpckl.pd intrinsics. These will be lowered into shuffles.
llvm-svn: 51531
2008-05-24 02:14:05 +00:00
Evan Cheng
98a292a302 Remove x86.sse2.loadh.pd and x86.sse2.loadl.pd. These will be lowered into load and shuffle instructions.
llvm-svn: 51522
2008-05-24 00:07:29 +00:00
Evan Cheng
4f660778f0 Use movlps / movhps to modify the low / high half of a 16-byte memory location.
llvm-svn: 51501
2008-05-23 21:23:16 +00:00
Evan Cheng
ec8bd19399 Fix a duplicated pattern.
llvm-svn: 51490
2008-05-23 18:00:18 +00:00
Dan Gohman
6cc0b4f262 Use PMULDQ for v2i64 multiplies when SSE4.1 is available, and add
load-folding table entries for PMULDQ and PMULLD.

llvm-svn: 51489
2008-05-23 17:49:40 +00:00
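
To illustrate the change above, a minimal sketch in modern IR syntax (function name hypothetical) of the kind of code this lowering targets: a plain v2i64 multiply, which pmuldq/pmulld-based sequences make cheaper to select than scalarized 64-bit multiplies.

	define <2 x i64> @mul_v2i64(<2 x i64> %a, <2 x i64> %b) {
	  ; lowered using pmuldq/pmulld-based sequences when SSE4.1 is available
	  %p = mul <2 x i64> %a, %b
	  ret <2 x i64> %p
	}
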
Evan Cheng
097e95b1f7 Bug: rcpps can only fold a load if the address is 16-byte aligned. Fixed many 'ps' load folding patterns in X86InstrSSE.td which were missing the proper alignment checks.
Also fixed some 80 col. violations.

llvm-svn: 51462
2008-05-23 00:37:07 +00:00
Evan Cheng
d1373cd497 Add missing patterns.
llvm-svn: 51435
2008-05-22 18:56:56 +00:00
Evan Cheng
d694e78e36 movsd and movq do not require 16-byte alignment. This fixes vec_set-5.ll on Linux.
llvm-svn: 51327
2008-05-20 18:24:47 +00:00
Nate Begeman
c290daf581 Fix one more encoding bug.
llvm-svn: 51057
2008-05-13 17:52:09 +00:00
Nate Begeman
b9a3d141aa Fix an encoding error in the psrad xmm, imm8 instruction.
llvm-svn: 51020
2008-05-13 01:47:52 +00:00
Nate Begeman
5d939498c3 Teach Legalize how to scalarize VSETCC
Teach X86 a few more vsetcc patterns.  Custom lowering for unsupported ones is next.

llvm-svn: 51009
2008-05-12 23:09:43 +00:00
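
A minimal sketch in modern IR syntax (function name hypothetical) of a compare that reaches the backend as a VSETCC node; SSE compares produce all-ones/all-zeros masks, hence the sext.

	define <4 x i32> @setcc_v4i32(<4 x i32> %a, <4 x i32> %b) {
	  ; becomes a VSETCC node in the selection DAG
	  %c = icmp sgt <4 x i32> %a, %b
	  ; widen the i1 results to the all-ones/all-zeros mask form
	  %m = sext <4 x i1> %c to <4 x i32>
	  ret <4 x i32> %m
	}
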
Nate Begeman
2ae55cecc6 Initial X86 codegen support for VSETCC.
llvm-svn: 51000
2008-05-12 20:34:32 +00:00
Evan Cheng
6a3fa28b38 Some clean up.
llvm-svn: 50929
2008-05-10 00:59:18 +00:00
Evan Cheng
2adea48f7e Add a pattern to move the low element of a v4f32 and zero-extend the rest.
llvm-svn: 50922
2008-05-09 23:37:55 +00:00
Evan Cheng
3493e43afd Handle a few more cases of folding an i64 load into an XMM register and zeroing the top bits.
Note, some of this code will be moved into the target-independent part of the DAG combiner in a subsequent patch.

llvm-svn: 50918
2008-05-09 21:53:03 +00:00
Evan Cheng
f824b47188 Use movq to move the low half of an XMM register and zero-extend the rest.
llvm-svn: 50874
2008-05-08 22:35:02 +00:00
Evan Cheng
f97e716511 Handle vector moves / loads that zero the top bits of the destination register (i.e. movd, movq, movss (addr), movsd (addr)) with an X86-specific DAG combine.
llvm-svn: 50838
2008-05-08 00:57:18 +00:00
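
A hedged sketch, in modern IR syntax (function name hypothetical), of the shape this DAG combine recognizes: a scalar load inserted into element 0 of a zero vector, selectable as a single movd rather than a load plus shuffle.

	define <4 x i32> @load_low_zero_rest(ptr %p) {
	  %s = load i32, ptr %p
	  ; element 0 holds the loaded value; all other elements are zero
	  %v = insertelement <4 x i32> zeroinitializer, i32 %s, i32 0
	  ret <4 x i32> %v
	}
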
Evan Cheng
c1c2adbfc6 Add separate intrinsics for MMX / SSE shifts with i32 integer operands. This allows us to simplify the horribly complicated matching code.
llvm-svn: 50601
2008-05-03 00:52:09 +00:00
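
For illustration, one intrinsic from this family (usage sketch, function name hypothetical): llvm.x86.sse2.pslli.w takes the shift amount as a plain i32 rather than as a vector, mapping directly onto the immediate form of psllw.

	declare <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16>, i32)

	define <8 x i16> @shl_by_3(<8 x i16> %v) {
	  ; shift each i16 lane left by 3; selects psllw xmm, 3
	  %r = call <8 x i16> @llvm.x86.sse2.pslli.w(<8 x i16> %v, i32 3)
	  ret <8 x i16> %r
	}
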
Evan Cheng
583a346ec6 80 column violation.
llvm-svn: 50575
2008-05-02 07:53:32 +00:00
Chris Lattner
2c5b96fbee A better fix for my previous patch: MOVZQI2PQIrr just requires SSE2.
llvm-svn: 49986
2008-04-20 05:52:46 +00:00
Dan Gohman
be8f2b452b Add support for the form of the SSE4.1 extractps instruction that
puts its result in a 32-bit GPR.

llvm-svn: 49762
2008-04-16 02:32:24 +00:00
Chris Lattner
3b289289a7 Fix the x86-64 side of PR2108 by adding a v2f64 version of
MOVZQI2PQIrr.  This would be better handled as a dag combine 
(with the goal of eliminating the bitconvert) but I don't know
how to do that safely.  Thoughts welcome.

llvm-svn: 49463
2008-04-10 05:13:43 +00:00
Evan Cheng
4d7b2ab16f Favor pshufd over shufps when shuffling elements from one vector; pshufd is faster than shufps.
llvm-svn: 49244
2008-04-05 00:30:36 +00:00
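
A minimal sketch (function name hypothetical) of a one-input shuffle this change affects; every result element comes from %a, so pshufd is preferred over shufps.

	define <4 x float> @reverse(<4 x float> %a) {
	  ; all lanes sourced from %a: selected as pshufd, not shufps
	  %s = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	  ret <4 x float> %s
	}
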
Evan Cheng
6323ea8467 Fix some SSE4.1 instruction encoding bugs.
llvm-svn: 48815
2008-03-26 08:11:49 +00:00
Evan Cheng
dbdf48276a - SSE4.1 extractps extracts an f32 into a GR32 register. Very useful! Not. Fix the instruction specification and teach the lowering code to use it only when the only use is a store instruction.
llvm-svn: 48746
2008-03-24 21:52:23 +00:00
Nate Begeman
f9691b8236 Add a couple of missing SSE4 instructions
llvm-svn: 48430
2008-03-16 21:14:46 +00:00
Evan Cheng
11d2c09adc Replace all target specific implicit def instructions with a target independent one: TargetInstrInfo::IMPLICIT_DEF.
llvm-svn: 48380
2008-03-15 00:03:38 +00:00
Evan Cheng
877c5ecabd Fix some 80 col violations.
llvm-svn: 48361
2008-03-14 07:46:48 +00:00
Evan Cheng
fc6645a382 Fix a number of encoding bugs. SSE4.1 instructions MPSADBWrri, PINSRDrr, etc. have an 8-bit immediate field (ImmT == Imm8).
llvm-svn: 48360
2008-03-14 07:39:27 +00:00
Evan Cheng
df92afe7d3 Clean up my own mess.
X86 lowering normalizes vector 0 to v4i32. However, DAGCombine can fold (sub x, x) -> 0 after legalization, which can create a zero vector of a type that's not expected (e.g. v8i16). We don't want to disable the optimization, since leaving a (sub x, x) around is really bad. Add isel patterns for the other vector-0 types to ensure correctness. This is highly unlikely to happen outside of bugpoint-reduced test cases.

llvm-svn: 48279
2008-03-12 07:02:50 +00:00
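
A hedged reconstruction of the kind of reduced case described above (function name hypothetical): after legalization, DAGCombine folds the subtraction to a v8i16 zero vector, which the new patterns now match.

	define <8 x i16> @sub_self(<8 x i16> %x) {
	  ; folded to a v8i16 zeroinitializer by DAGCombine after legalization
	  %z = sub <8 x i16> %x, %x
	  ret <8 x i16> %z
	}
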
Evan Cheng
dba1dfe962 Implement x86 support for @llvm.prefetch. It corresponds to the prefetcht{0|1|2} and prefetchnta instructions.
llvm-svn: 48042
2008-03-08 00:58:38 +00:00
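
A usage sketch in today's IR syntax (the intrinsic has since gained a fourth cache-type operand and an overload suffix; at the time of this commit it took three operands). Locality 3 maps to prefetcht0 and locality 0 to prefetchnta.

	declare void @llvm.prefetch.p0(ptr, i32, i32, i32)

	define void @warm(ptr %p) {
	  ; rw = 0 (read), locality = 3 (prefetcht0), cache type = 1 (data)
	  call void @llvm.prefetch.p0(ptr %p, i32 0, i32 3, i32 1)
	  ret void
	}
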
Evan Cheng
a36562006a isTwoAddress = 1 -> Constraints.
llvm-svn: 47941
2008-03-05 08:19:16 +00:00
Evan Cheng
6c2bb7c67e PSLLWri etc. are two-address instructions.
llvm-svn: 47940
2008-03-05 08:11:27 +00:00
Evan Cheng
bb577266bf - When the DAG combiner is folding a bit convert into a BUILD_VECTOR, it should check whether it's essentially a SCALAR_TO_VECTOR. Avoid turning (v8i16) <10, u, u, u> into <10, 0, u, u, u, u, u, u>; instead, simply convert it to a SCALAR_TO_VECTOR of the proper type.
- X86 now normalizes SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC.

llvm-svn: 47290
2008-02-18 23:04:32 +00:00
Andrew Lenharth
c178981b85 Add llvm.memory.barrier, with implementations for x86 and Alpha.
llvm-svn: 47204
2008-02-16 01:24:58 +00:00
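
A usage sketch of the intrinsic as it existed in this era (it was later removed in favor of the fence instruction); to the best of recollection, the five i1 flags select load-load, load-store, store-load, and store-store ordering, plus a device flag.

	declare void @llvm.memory.barrier(i1, i1, i1, i1, i1)

	define void @full_barrier() {
	  ; enforce all four memory orderings; not a device barrier
	  call void @llvm.memory.barrier(i1 true, i1 true, i1 true, i1 true, i1 false)
	  ret void
	}
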
Nate Begeman
810b85bde8 SSE4.1 64-bit integer insert/extract pattern support
Move formats into the formats file

llvm-svn: 47035
2008-02-12 22:51:28 +00:00
Nate Begeman
5a4e290b70 Enable SSE4 codegen and pattern matching.
Add some notes to the README.

llvm-svn: 46949
2008-02-11 04:19:36 +00:00
Nate Begeman
297d683980 Add xmm0-based variable blends
llvm-svn: 46931
2008-02-10 18:47:57 +00:00
Nate Begeman
2627ffd14b memopv16i8 had the wrong alignment requirement, which would have broken pabsb.
pabs{b,w,d} are not two-address.
Fix extract-to-mem SSE4 ops.
Add SSE4 vector sign-extend nodes.

llvm-svn: 46915
2008-02-09 23:46:37 +00:00
Nate Begeman
a78c35a368 Skeleton of insert and extract matching, more to come
llvm-svn: 46902
2008-02-09 01:38:08 +00:00
Nate Begeman
16830a34c4 Add the SSE4.1 intrinsic patterns that are obvious to me. Getting
Evan's help with the rest.

llvm-svn: 46697
2008-02-04 06:00:24 +00:00
Nate Begeman
ee7503810f Some more SSE4.1 intrinsic patterns.
llvm-svn: 46696
2008-02-04 05:34:34 +00:00
Nate Begeman
ead8dfeef2 Add SSE4.1 intrinsics and detection
llvm-svn: 46681
2008-02-03 07:18:54 +00:00
Chris Lattner
16a8f126d3 Significantly simplify and improve handling of FP function results on x86-32.
This case returns the value in ST(0) and then has to convert it to an SSE
register.  This causes significant codegen ugliness in some cases.  For 
example in the trivial fp-stack-direct-ret.ll testcase we used to generate:

_bar:
	subl	$28, %esp
	call	L_foo$stub
	fstpl	16(%esp)
	movsd	16(%esp), %xmm0
	movsd	%xmm0, 8(%esp)
	fldl	8(%esp)
	addl	$28, %esp
	ret

because we move the result of foo() into an XMM register, then have to
move it back for the return of bar.

Instead of hacking ever-more special cases into the call result lowering code
we take a much simpler approach: on x86-32, fp return is modeled as always 
returning into an f80 register which is then truncated to f32 or f64 as needed.
Similarly, when returning a result, we model it as an extension to f80 followed by the return.

This exposes the truncate and extensions to the dag combiner, allowing target
independent code to hack on them, eliminating them in this case.  This gives 
us this code for the example above:

_bar:
	subl	$12, %esp
	call	L_foo$stub
	addl	$12, %esp
	ret

The nasty aspect of this is that these conversions are not legal, but we want
the second pass of dag combiner (post-legalize) to be able to hack on them.
To handle this, we lie to legalize and say they are legal, then custom expand
them on entry to the isel pass (PreprocessForFPConvert).  This is gross, but
less gross than the code it is replacing :)

This also allows us to generate better code in several other cases.  For 
example on fp-stack-ret-conv.ll, we now generate:

_test:
	subl	$12, %esp
	call	L_foo$stub
	fstps	8(%esp)
	movl	16(%esp), %eax
	cvtss2sd	8(%esp), %xmm0
	movsd	%xmm0, (%eax)
	addl	$12, %esp
	ret

where before we produced (incidentally, the old bad code is identical to what
gcc produces):

_test:
	subl	$12, %esp
	call	L_foo$stub
	fstpl	(%esp)
	cvtsd2ss	(%esp), %xmm0
	cvtss2sd	%xmm0, %xmm0
	movl	16(%esp), %eax
	movsd	%xmm0, (%eax)
	addl	$12, %esp
	ret

Note that we generate slightly worse code on pr1505b.ll due to a scheduling 
deficiency that is unrelated to this patch.

llvm-svn: 46307
2008-01-24 08:07:48 +00:00
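
A hypothetical reconstruction of the shape of the fp-stack-direct-ret.ll testcase discussed above: bar returns foo's f64 result unchanged, so the f80 extend/truncate pair introduced by this model cancels in the dag combiner.

	declare double @foo()

	define double @bar() {
	  ; result arrives in ST(0); modeled as f80, then truncated to f64
	  %v = call double @foo()
	  ; return is modeled as extend to f80 + return; the pair folds away
	  ret double %v
	}
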
Chris Lattner
cb62477ec2 Add some missing flags.
llvm-svn: 45859
2008-01-11 06:59:07 +00:00
Chris Lattner
9d7971791b Start inferring side effect information more aggressively, and fix many bugs in the
x86 backend where instructions were not marked maystore/mayload, and perf issues where
instructions were not marked neverHasSideEffects.  It would be really nice if we could
write patterns for copy instructions.

I have audited all the x86 instructions down to MOVDQAmr. The flags on the others and
on other targets are probably not right in all cases, but no clients that are enabled
by default currently use this info.

llvm-svn: 45829
2008-01-10 07:59:24 +00:00
Chris Lattner
6ad01a9965 Remove explicit sets of 'neverHasSideEffects' that can now be
inferred from the instr patterns.

llvm-svn: 45824
2008-01-10 05:45:39 +00:00
Chris Lattner
14310afe42 Rename isLoad -> isSimpleLoad due to Evan's desire to have such a predicate.
llvm-svn: 45667
2008-01-06 23:38:27 +00:00
Chris Lattner
ad9a6ccb83 Remove attribution from file headers, per discussion on llvmdev.
llvm-svn: 45418
2007-12-29 20:36:04 +00:00