mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 14:02:50 +01:00
1372 Commits

Author SHA1 Message Date
Daniel Dunbar
c8af4f3a0a Revert r112938 "Fix comment", which depends on r112934, which introduced some
infinite loop and select failures.

llvm-svn: 112999
2010-09-03 19:38:08 +00:00
Daniel Dunbar
4ece67890b Revert r112942, "Use punpckh and unpckh family of nodes instead of using unpckh
mask pattern fragment", which depends on r112934, which introduced some infinite
loop and select failures.

llvm-svn: 112998
2010-09-03 19:38:05 +00:00
Bruno Cardoso Lopes
70f376e9da Use punpckh and unpckh family of nodes instead of using unpckh mask pattern fragment
llvm-svn: 112942
2010-09-03 01:39:08 +00:00
Bruno Cardoso Lopes
b107a092a5 Fix comment
llvm-svn: 112938
2010-09-03 01:28:51 +00:00
Bruno Cardoso Lopes
e1ad6555a8 - Use specific nodes to match unpckl masks.
- Teach getShuffleScalarElt how to handle more target
specific nodes, so the DAGCombine can make use of it.
- Add another hack to avoid the node update problem
during legalization. More description in the comments

llvm-svn: 112934
2010-09-03 01:24:00 +00:00
Anton Korobeynikov
a65910e5ca Revert win64 changes. They seem to be incomplete
llvm-svn: 112885
2010-09-02 22:31:32 +00:00
Anton Korobeynikov
339ab60a5b Properly allocate win64 shadow reg area.
Patch by Jan Sjodin!

llvm-svn: 112875
2010-09-02 22:16:28 +00:00
Bruno Cardoso Lopes
659f549638 Replace unpckl_undef and unpckh_undef matching with target specific opcodes
llvm-svn: 112806
2010-09-02 05:23:12 +00:00
Bruno Cardoso Lopes
9d4a11d4c6 Move condition out to prepare for more matching
llvm-svn: 112805
2010-09-02 04:20:26 +00:00
Bruno Cardoso Lopes
1b9095fff1 Remove checking for isUNPCKL_v_undef_Mask, the specific node is already emitted for it
llvm-svn: 112804
2010-09-02 03:57:58 +00:00
Bruno Cardoso Lopes
dcdab94661 become more strict about when it's safe to use X86ISD::MOVLPS
llvm-svn: 112799
2010-09-02 02:35:51 +00:00
Bruno Cardoso Lopes
b73f0cbc7a Revert r112689; avoid those kinds of checks because they interfere with MMX
llvm-svn: 112760
2010-09-01 22:59:03 +00:00
Bruno Cardoso Lopes
9375b2f67d Use movlps, movlpd, movss and movsd specific nodes instead of pattern matching with movlp pattern fragment
llvm-svn: 112694
2010-09-01 05:08:25 +00:00
Bruno Cardoso Lopes
b69568ab33 minor change, simplify some logic
llvm-svn: 112689
2010-09-01 00:57:08 +00:00
Bruno Cardoso Lopes
c31697f68c Move some functions around so they can be used by functions that are still to come
llvm-svn: 112687
2010-09-01 00:51:36 +00:00
Bruno Cardoso Lopes
80613a070e Use x86 specific MOVSLDUP node, add more patterns to match it and remove useless load nodes
llvm-svn: 112661
2010-08-31 22:35:05 +00:00
Bruno Cardoso Lopes
8fc83b1960 Use x86 specific MOVSHDUP node and add more patterns to match it
llvm-svn: 112657
2010-08-31 22:22:11 +00:00
Bruno Cardoso Lopes
dfa177cf81 Use MOVHLPS node instead of matching using movhlps and movhlps_undef pattern fragments
llvm-svn: 112644
2010-08-31 21:38:49 +00:00
Bruno Cardoso Lopes
6fbe7b9ddd Use MOVLHPS and MOVHLPS x86 nodes whenever possible. Also remove some useless nodes
llvm-svn: 112642
2010-08-31 21:15:21 +00:00
Bruno Cardoso Lopes
08d5d62dcb Use X86ISD::MOVSS and MOVSD to represent the movl mask pattern, also fix the handling of those nodes when seeking for scalars inside vector shuffles
llvm-svn: 112570
2010-08-31 02:26:40 +00:00
Chris Lattner
8cb4abbc0e fix the buildvector->insertp[sd] logic to not always create a redundant
insertp[sd] $0, which is a noop.  Before:

_f32:                                   ## @f32
	pshufd	$1, %xmm1, %xmm2
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm2, %xmm3
	addss	%xmm1, %xmm0
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
	insertps	$0, %xmm0, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

after:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movdqa	%xmm2, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

The extra movs are due to a random (poor) scheduling decision.

llvm-svn: 112379
2010-08-28 17:59:08 +00:00
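
Why `insertps $0, %xmm0, %xmm0` in r112379 is a no-op can be seen from a small scalar model of the instruction. The sketch below is illustrative only and is not LLVM code: the `Vec4` alias and the `insertps` helper are hypothetical names, and the model assumes the register-source form, where imm8 bits 7:6 pick the source element, bits 5:4 pick the destination slot, and bits 3:0 are a zero mask.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4 x float vector type used only for this illustration.
using Vec4 = std::array<float, 4>;

// Scalar model of SSE4.1 INSERTPS with a register source operand.
//   imm8[7:6] = index of the source element to copy
//   imm8[5:4] = index of the destination slot to overwrite
//   imm8[3:0] = mask of result elements that are forced to zero
Vec4 insertps(Vec4 dst, const Vec4 &src, std::uint8_t imm) {
  unsigned srcIdx = (imm >> 6) & 3;
  unsigned dstIdx = (imm >> 4) & 3;
  dst[dstIdx] = src[srcIdx];
  for (unsigned i = 0; i < 4; ++i)
    if (imm & (1u << i))
      dst[i] = 0.0f;
  return dst;
}
```

With an immediate of 0 and the same register as source and destination, element 0 is copied onto itself and nothing is zeroed, so the result equals the input. An immediate of 16 copies source element 0 into destination slot 1, which is the only insert the "after" sequence still needs.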
Chris Lattner
c3b630d64b fix the BuildVector -> unpcklps logic to not do pointless shuffles
when the top elements of a vector are undefined.  This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined.  For example, on:

_Complex float f32(_Complex float A, _Complex float B) {
  return A+B;
}

We used to produce (with SSE2, SSE4.1+ uses insertps):

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$16, %xmm2, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	unpcklps	%xmm1, %xmm0
	ret

We now produce:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movaps	%xmm2, %xmm0
	unpcklps	%xmm3, %xmm0
	ret

This implements rdar://8368414

llvm-svn: 112378
2010-08-28 17:28:30 +00:00
Chris Lattner
7fa5fa1207 improve comments in the unpcklps generating logic, introduce
a new EltStride variable instead of reusing NumElems variable
for a non-obvious purpose.  No functionality change.

llvm-svn: 112377
2010-08-28 17:15:43 +00:00
Bruno Cardoso Lopes
1052e6d5d9 Clean up the logic of vector shuffles -> vector shifts.
Also teach this logic how to handle target specific shuffles if
needed, this is necessary while searching recursively for zeroed
scalar elements in vector shuffle operands.

llvm-svn: 112348
2010-08-28 02:46:39 +00:00
Anton Korobeynikov
62a9879ef4 Properly handle passing of FP stuff to varargs function on Win64:
value should be copied to the corresponding shadow reg as well.
Patch by Cameron Esfahani!

llvm-svn: 112262
2010-08-27 14:43:06 +00:00
Bruno Cardoso Lopes
6150648a64 zap the now unused MVT::getIntVectorWithNumElements
llvm-svn: 112218
2010-08-26 20:53:12 +00:00
Chris Lattner
148485f707 implement SplitVecOp_CONCAT_VECTORS, fixing the included testcase with SSE1.
llvm-svn: 112171
2010-08-26 05:51:22 +00:00
Chris Lattner
5256226fc8 fix sse1 only codegen in x86-64 mode, which is something we
apparently try to support.

llvm-svn: 112168
2010-08-26 05:24:29 +00:00
Bruno Cardoso Lopes
28f3261dbd Revert this for now, PUNPCKLDQ doesn't operate on v4f32
llvm-svn: 112090
2010-08-25 21:26:37 +00:00
Anton Korobeynikov
1544f79e36 Fix nasty mingw32 bug, which e.g. prevented llvm-gcc bootstrap there.
Mark _alloca call as clobbering EFLAGS, otherwise some DCE might remove
other flags-clobbering stuff (e.g. cmp instructions) occurring after
_alloca call.

llvm-svn: 112034
2010-08-25 07:50:11 +00:00
Bruno Cardoso Lopes
af72dd7362 PUNPCKLDQ should also be used for v4f32
llvm-svn: 112020
2010-08-25 02:55:40 +00:00
Bruno Cardoso Lopes
33aa4f7d1c teach lowering to get target specific nodes for pshufd, emulating the same isel behavior for now, so we can pass all vector shuffle tests
llvm-svn: 112017
2010-08-25 02:35:37 +00:00
Dan Gohman
e400c660e4 Fix X86's isLegalAddressingMode to recognize that static addresses
need not be RIP-relative in small mode.

llvm-svn: 111917
2010-08-24 15:55:12 +00:00
Bruno Cardoso Lopes
7939025262 Use pshufhw and pshuflw in more cases and fix getTargetShuffleNode number of arguments
llvm-svn: 111890
2010-08-24 01:16:15 +00:00
Bruno Cardoso Lopes
ed9ff8d8d0 Start using target specific nodes for shuffles: pshufhw and pshuflw
llvm-svn: 111837
2010-08-23 20:41:02 +00:00
Anton Korobeynikov
a68e2a53a1 Revert invalid r111792. Jump tables are not broken on x86-64 / coff,
it's COFF emitter which does not support differences of two symbols
(and needs to be fixed). GAS is pretty fine with code produced.

llvm-svn: 111801
2010-08-23 07:38:51 +00:00
Michael J. Spencer
c52ac23659 Workaround broken jump tables on x86-64 COFF.
llvm-svn: 111792
2010-08-23 04:45:37 +00:00
Bruno Cardoso Lopes
1998fbbf1a Prepare LowerVECTOR_SHUFFLEv8i16 to use x86 target specific nodes directly
llvm-svn: 111704
2010-08-21 01:32:18 +00:00
Bruno Cardoso Lopes
28d9071635 This is the first step towards refactoring the x86 vector shuffle code. The
general idea here is to have a group of x86 target specific nodes which are
going to be selected during lowering and then directly matched in isel.

The commit includes the addition of those specific nodes and a *bunch* of
patterns, and incrementally we're going to switch between them and what we
have right now. Both the patterns and target specific nodes can change as
we move forward with this work.

llvm-svn: 111691
2010-08-20 22:55:05 +00:00
Anton Korobeynikov
f0600e9e8a More fixes for win64:
- Do not clobber al during variadic calls, this is AMD64 ABI-only feature
- Emit wincall64, where necessary
Patch by Cameron Esfahani!

llvm-svn: 111289
2010-08-17 21:06:07 +00:00
Eric Christopher
1470fe415c Rework how the non-sse2 memory barrier is lowered so that the
encoding is correct for the built-in assembler.

Based on a patch from Chris.

llvm-svn: 111083
2010-08-14 21:51:50 +00:00
Chris Lattner
8426971169 improve indentation
llvm-svn: 111073
2010-08-14 17:26:09 +00:00
Bruno Cardoso Lopes
8b07859f3a Fix comment to reflect code, and remove an unused argument
llvm-svn: 111022
2010-08-13 17:50:47 +00:00
Bruno Cardoso Lopes
bb491bd56c Begin to support some vector operations for AVX 256-bit instructions. The long
term goal here is to be able to match enough of vector_shuffle and build_vector
so all avx intrinsics which aren't mapped to their own built-ins but to
shufflevector calls can be codegen'd. This is the first (baby) step, support
building zeroed vectors.

llvm-svn: 110897
2010-08-12 02:06:36 +00:00
Dan Gohman
d91d51116b Use ISD::ADD instead of ISD::SUB with a negated constant. This
avoids trouble if the return type of TD->getPointerSize() is
changed to something which doesn't promote to a signed type,
and is simpler anyway.

Also, use getCopyFromReg instead of getRegister to read a
physical register's value.

llvm-svn: 110835
2010-08-11 18:14:00 +00:00
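
The promotion hazard that r110835 avoids can be shown in plain C++. This is a minimal sketch, not the LLVM code: `pointerSizeNarrow` and `pointerSizeUnsigned` are hypothetical stand-ins for a size query with two possible return types, and the example assumes a 32-bit `unsigned`.

```cpp
#include <cstdint>

// Assumption for this illustration: unsigned is 32 bits wide.
static_assert(sizeof(unsigned) == 4, "example assumes 32-bit unsigned");

// Hypothetical stand-ins for a pointer-size query with two return types.
unsigned char pointerSizeNarrow() { return 8; }  // promotes to (signed) int
unsigned pointerSizeUnsigned() { return 8; }     // stays unsigned

// Negating the promoted int gives -8, which converts to 0xFFFFFFFFFFFFFFF8.
std::uint64_t negatedNarrow() { return -pointerSizeNarrow(); }

// Negating a 32-bit unsigned wraps to 0xFFFFFFF8 and is then zero-extended.
std::uint64_t negatedUnsigned() { return -pointerSizeUnsigned(); }

// "Subtract a negated constant" versus "add the constant".
std::uint64_t subNegatedNarrow(std::uint64_t base) { return base - negatedNarrow(); }
std::uint64_t subNegatedUnsigned(std::uint64_t base) { return base - negatedUnsigned(); }
std::uint64_t addSize(std::uint64_t base) { return base + pointerSizeUnsigned(); }
```

Only the narrow return type makes the subtract-a-negated-constant form agree with the plain addition. Adding the size directly gives the intended result whichever type the query returns, which matches the commit's remark that the ADD form is simpler anyway.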
Bruno Cardoso Lopes
6eb24fd744 Add AVX matching patterns to Packed Bit Test intrinsics.
Apply the same approach of SSE4.1 ptest intrinsics but
create a new x86 node "testp" since AVX introduces
vtest{ps}{pd} instructions which set ZF and CF depending
on sign bit AND and ANDN of packed floating-point sources.

This is slightly different from what the "ptest" does.
Tests are coming with the other 256-bit intrinsics tests.

llvm-svn: 110744
2010-08-10 23:25:42 +00:00
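
The difference between "ptest" and the new "testp" node described in r110744 can be sketched with a scalar model of the flag results. The code below is an illustration under stated assumptions and is not taken from LLVM: `Lanes`, `Flags`, `ptestModel` and `vtestpsModel` are hypothetical names, and the model covers only a 128-bit operand split into four 32-bit lanes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical types for this illustration: 128 bits as four 32-bit lanes.
using Lanes = std::array<std::uint32_t, 4>;
struct Flags {
  bool zf;
  bool cf;
};

// PTEST-style test: every bit of the operands participates.
//   ZF = 1 if (a AND b) is all zeros, CF = 1 if ((NOT a) AND b) is all zeros.
Flags ptestModel(const Lanes &a, const Lanes &b) {
  std::uint32_t andBits = 0, andnBits = 0;
  for (unsigned i = 0; i < 4; ++i) {
    andBits |= a[i] & b[i];
    andnBits |= ~a[i] & b[i];
  }
  return {andBits == 0, andnBits == 0};
}

// VTESTPS-style test: only the sign bit of each 32-bit lane participates.
Flags vtestpsModel(const Lanes &a, const Lanes &b) {
  const std::uint32_t signBit = 0x80000000u;
  std::uint32_t andBits = 0, andnBits = 0;
  for (unsigned i = 0; i < 4; ++i) {
    andBits |= (a[i] & b[i]) & signBit;
    andnBits |= (~a[i] & b[i]) & signBit;
  }
  return {andBits == 0, andnBits == 0};
}
```

Two operands that share only a low-order bit make the PTEST-style model clear ZF, while the VTESTPS-style model still sets it because no sign bit is set in the AND. That mismatch is why the two instructions cannot share one node.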
Bruno Cardoso Lopes
2a7ed4b5c9 Support AVX 256-bit load and store intrinsics
llvm-svn: 110645
2010-08-10 01:43:16 +00:00
Bruno Cardoso Lopes
a26a97510a Support very basic (doesn't include ABI support in the front-end, varargs, ...) 256-bit argument passing and return for AVX
llvm-svn: 110394
2010-08-05 23:35:51 +00:00
Eric Christopher
0e09eb9f77 Make x86-64 membarriers work without sse and clean up some of the
uses.

llvm-svn: 110274
2010-08-04 23:03:04 +00:00
Bruno Cardoso Lopes
0c0dd2173c Support all 128-bit AVX vector intrinsics. Most of them were already
declared during the addition of the assembler support; the additional
changes are:
- Add missing intrinsics
- Move all SSE conversion instructions in X86InstInfo64.td to the SSE.td file.
- Duplicate some patterns to AVX mode.
- Step into PCMPEST/PCMPIST custom inserter and add AVX versions.

llvm-svn: 109878
2010-07-30 19:54:33 +00:00