llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00

Author	SHA1	Message	Date
Eli Friedman	5cccb60bad	Fix for PR2484: add an SSE1 pattern for a shuffle we normally prefer to handle with an SSE2 instruction. llvm-svn: 73760	2009-06-19 07:00:55 +00:00
Eli Friedman	539325c8e7	Fix an obvious typo. llvm-svn: 72987	2009-06-06 05:55:37 +00:00
Bill Wendling	dd6cbdb28c	The MONITOR and MWAIT instructions have insufficient information for decoding. Essentially, they both map to the same column in the "opcode extensions for one- and two-byte opcodes" table in the x86 manual. The RawFrm complicates decoding this. Instead, use opcode 0x01, prefix 0x01, and form MRM1r. Then have the code emitter special case these, a la [SML]FENCE. llvm-svn: 72556	2009-05-28 23:40:46 +00:00
Evan Cheng	3d35b7e54c	Fix MOVMSKPDrr encoding. llvm-svn: 72535	2009-05-28 18:55:28 +00:00
Evan Cheng	99643b717c	Fix PSIGND encoding bug. Patch by Sean Callanan. llvm-svn: 72534	2009-05-28 18:48:53 +00:00
Bill Wendling	8a8d271d29	"The instructions MMX_PSADBWrm and MMX_PSADBWrr have opcode 0b11100000 (e0), but the Intel manual (screenshot) says it should be 0b11110110 (f6). The existing encoding causes a disassembly conflict with MMX_PAVGBrm, which really should be 0f e0." Patch by Sean Callanan! llvm-svn: 72508	2009-05-28 02:04:00 +00:00
Evan Cheng	9fd570a8a1	Fix sfence jit encoding. Patch by Sean Callanan. llvm-svn: 72488	2009-05-27 18:38:01 +00:00
Evan Cheng	7f78e63cf3	80 col violations. llvm-svn: 71582	2009-05-12 20:17:52 +00:00
Nate Begeman	b407809122	Fix infinite recursion in the C++ code which handles movddup by making it unnecessary. llvm-svn: 70425	2009-04-29 22:47:44 +00:00
Nate Begeman	9d121924fd	2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan. PR2957 ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes as the shuffle mask. A value of -1 represents UNDEF. In addition to eliminating the creation of illegal BUILD_VECTORS just to represent shuffle masks, we are better about canonicalizing the shuffle mask, resulting in substantially better code for some classes of shuffles. llvm-svn: 70225	2009-04-27 18:41:29 +00:00
Rafael Espindola	0b1037ad26	Revert 69952. Causes testsuite failures on linux x86-64. llvm-svn: 69967	2009-04-24 12:40:33 +00:00
Nate Begeman	c1a09c7dfa	PR2957 ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes as the shuffle mask. A value of -1 represents UNDEF. In addition to eliminating the creation of illegal BUILD_VECTORS just to represent shuffle masks, we are better about canonicalizing the shuffle mask, resulting in substantially better code for some classes of shuffles. A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next. llvm-svn: 69952	2009-04-24 03:42:54 +00:00
Rafael Espindola	7eb72dc5f2	Re-apply 68552. Tested by bootstrapping llvm-gcc and using that to build llvm. llvm-svn: 68645	2009-04-08 21:14:34 +00:00
Bill Wendling	6e702cf68c	Temporarily revert r68552. This was causing a failure in the self-hosting LLVM builds. --- Reverse-merging (from foreign repository) r68552 into '.': U test/CodeGen/X86/tls8.ll U test/CodeGen/X86/tls10.ll U test/CodeGen/X86/tls2.ll U test/CodeGen/X86/tls6.ll U lib/Target/X86/X86Instr64bit.td U lib/Target/X86/X86InstrSSE.td U lib/Target/X86/X86InstrInfo.td U lib/Target/X86/X86RegisterInfo.cpp U lib/Target/X86/X86ISelLowering.cpp U lib/Target/X86/X86CodeEmitter.cpp U lib/Target/X86/X86FastISel.cpp U lib/Target/X86/X86InstrInfo.h U lib/Target/X86/X86ISelDAGToDAG.cpp U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h U lib/Target/X86/X86ISelLowering.h U lib/Target/X86/X86InstrInfo.cpp U lib/Target/X86/X86InstrBuilder.h U lib/Target/X86/X86RegisterInfo.td llvm-svn: 68560	2009-04-07 22:35:25 +00:00
Rafael Espindola	0324937229	Reduce code duplication on the TLS implementation. This introduces a small regression on the generated code quality in the case we are just computing addresses, not loading values. Will work on it and on X86-64 support. llvm-svn: 68552	2009-04-07 21:37:46 +00:00
Evan Cheng	4014a9a5b8	ADDS{D\|S}rr_Int and MULS{D\|S}rr_Int are not commutable. The users of these intrinsics expect the high bits will not be modified. llvm-svn: 65499	2009-02-26 03:12:02 +00:00
Nate Begeman	e0093d2501	Generate better code for v8i16 shuffles on SSE2 Generate better code for v16i8 shuffles on SSE2 (avoids stack) Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops. Document the shuffle matching logic and add some FIXMEs for later further cleanups. New tests that test the above. Examples: New: _shuf2: pextrw $7, %xmm0, %eax punpcklqdq %xmm1, %xmm0 pshuflw $128, %xmm0, %xmm0 pinsrw $2, %eax, %xmm0 Old: _shuf2: pextrw $2, %xmm0, %eax pextrw $7, %xmm0, %ecx pinsrw $2, %ecx, %xmm0 pinsrw $3, %eax, %xmm0 movd %xmm1, %eax pinsrw $4, %eax, %xmm0 ret ========= New: _shuf4: punpcklqdq %xmm1, %xmm0 pshufb LCPI1_0, %xmm0 Old: _shuf4: pextrw $3, %xmm0, %eax movsd %xmm1, %xmm0 pextrw $3, %xmm1, %ecx pinsrw $4, %ecx, %xmm0 pinsrw $5, %eax, %xmm0 ======== New: _shuf1: pushl %ebx pushl %edi pushl %esi pextrw $1, %xmm0, %eax rolw $8, %ax movd %xmm0, %ecx rolw $8, %cx pextrw $5, %xmm0, %edx pextrw $4, %xmm0, %esi pextrw $3, %xmm0, %edi pextrw $2, %xmm0, %ebx movaps %xmm0, %xmm1 pinsrw $0, %ecx, %xmm1 pinsrw $1, %eax, %xmm1 rolw $8, %bx pinsrw $2, %ebx, %xmm1 rolw $8, %di pinsrw $3, %edi, %xmm1 rolw $8, %si pinsrw $4, %esi, %xmm1 rolw $8, %dx pinsrw $5, %edx, %xmm1 pextrw $7, %xmm0, %eax rolw $8, %ax movaps %xmm1, %xmm0 pinsrw $7, %eax, %xmm0 popl %esi popl %edi popl %ebx ret Old: _shuf1: subl $252, %esp movaps %xmm0, (%esp) movaps %xmm0, 16(%esp) movaps %xmm0, 32(%esp) movaps %xmm0, 48(%esp) movaps %xmm0, 64(%esp) movaps %xmm0, 80(%esp) movaps %xmm0, 96(%esp) movaps %xmm0, 224(%esp) movaps %xmm0, 208(%esp) movaps %xmm0, 192(%esp) movaps %xmm0, 176(%esp) movaps %xmm0, 160(%esp) movaps %xmm0, 144(%esp) movaps %xmm0, 128(%esp) movaps %xmm0, 112(%esp) movzbl 14(%esp), %eax movd %eax, %xmm1 movzbl 22(%esp), %eax movd %eax, %xmm2 punpcklbw %xmm1, %xmm2 movzbl 42(%esp), %eax movd %eax, %xmm1 movzbl 50(%esp), %eax movd %eax, %xmm3 punpcklbw %xmm1, %xmm3 punpcklbw %xmm2, %xmm3 movzbl 77(%esp), %eax movd %eax, %xmm1 movzbl 84(%esp), %eax movd %eax, %xmm2 punpcklbw %xmm1, %xmm2 movzbl 104(%esp), %eax movd %eax, %xmm1 punpcklbw %xmm1, %xmm0 punpcklbw %xmm2, %xmm0 movaps %xmm0, %xmm1 punpcklbw %xmm3, %xmm1 movzbl 127(%esp), %eax movd %eax, %xmm0 movzbl 135(%esp), %eax movd %eax, %xmm2 punpcklbw %xmm0, %xmm2 movzbl 155(%esp), %eax movd %eax, %xmm0 movzbl 163(%esp), %eax movd %eax, %xmm3 punpcklbw %xmm0, %xmm3 punpcklbw %xmm2, %xmm3 movzbl 188(%esp), %eax movd %eax, %xmm0 movzbl 197(%esp), %eax movd %eax, %xmm2 punpcklbw %xmm0, %xmm2 movzbl 217(%esp), %eax movd %eax, %xmm4 movzbl 225(%esp), %eax movd %eax, %xmm0 punpcklbw %xmm4, %xmm0 punpcklbw %xmm2, %xmm0 punpcklbw %xmm3, %xmm0 punpcklbw %xmm1, %xmm0 addl $252, %esp ret llvm-svn: 65311	2009-02-23 08:49:38 +00:00
Evan Cheng	cdb35e3f0f	Handle llvm.x86.sse2.maskmov.dqu in 64-bit. llvm-svn: 64240	2009-02-10 22:06:28 +00:00
Evan Cheng	87def37f67	A few more isAsCheapAsAMove. llvm-svn: 63852	2009-02-05 08:42:55 +00:00
Evan Cheng	2a965124b7	The memory alignment requirement on some of the mov{h\|l}p{d\|s} patterns are 16-byte. That is overly strict. These instructions read / write f64 memory locations without alignment requirement. llvm-svn: 63195	2009-01-28 08:35:02 +00:00
Dan Gohman	0e86745357	Whitespace and other minor adjustments to make SSE instructions have the same formatting as their corresponding SSE2 instructions, for consistency. llvm-svn: 61971	2009-01-09 02:27:34 +00:00
Mon P Wang	9f8945c5b9	Fixed x86 code generation of multiple for v2i64. It was incorrect for SSE4.1. llvm-svn: 61211	2008-12-18 21:42:19 +00:00
Dan Gohman	5dad0993a9	Rename isSimpleLoad to canFoldAsLoad, to better reflect its meaning. llvm-svn: 60487	2008-12-03 18:15:48 +00:00
Dan Gohman	ac6561793c	Mark x86's V_SET0 and V_SETALLONES with isSimpleLoad, and teach X86's foldMemoryOperand how to "fold" them, by converting them into constant-pool loads. When they aren't folded, they use xorps/cmpeqd, but for example when register pressure is high, they may now be folded as memory operands, which reduces register pressure. Also, mark V_SET0 isAsCheapAsAMove so that two-address-elimination will remat it instead of copying zeros around (V_SETALLONES was already marked). llvm-svn: 60461	2008-12-03 05:21:24 +00:00
Evan Cheng	fa61b6a4ba	Fix lfence and mfence encoding. These look like MRM5r and MRM6r instructions except they do not have any operands. The RegModRM byte is encoded with register number 0. llvm-svn: 57692	2008-10-17 17:14:20 +00:00
Dan Gohman	6dba6b2384	Fix the predicate for memop64 to be a regular load, not just an unindexed load. llvm-svn: 57612	2008-10-16 00:03:00 +00:00
Dan Gohman	65702b2eb8	Now that predicates can be composed, simplify several of the predicates by extending simple predicates to create more complex predicates instead of duplicating the logic for the simple predicates. This doesn't reduce much redundancy in DAGISelEmitter.cpp's generated source yet; that will require improvements to DAGISelEmitter.cpp's instruction sorting, to make it more effectively group nodes with similar predicates together. llvm-svn: 57565	2008-10-15 06:50:19 +00:00
Dale Johannesen	03d0a7d70c	Fix SSE4.1 roundss, roundsd. While the instructions have the same pattern as roundpd/roundps, the Intel compiler builtins do not: rounds* has an extra operand. Fixes gcc.target/i386/sse4_1-rounds[sd]-[1234].c llvm-svn: 57370	2008-10-10 23:51:03 +00:00
Anders Carlsson	a9c42526f8	Certain patterns involving the "movss" instruction were marked as requiring SSE2, when in reality movss is an SSE1 instruction. llvm-svn: 57246	2008-10-07 16:14:11 +00:00
Bill Wendling	1c97db4090	"The original bug was a complaint that _mm_srli_si128 mis-compiled when passed a constant vector ("{0x123, 0x456}" syntax). The fix is to simplify the _mm_srli_si128 macro, and move the "* 8" from the macro into the compiler back-end. I can't change the existing __builtins because so many people are using them :-(." Patch by Stuart Hastings! llvm-svn: 56944	2008-10-02 05:56:52 +00:00
Evan Cheng	d63fc80c1e	Implement "punpckldq %xmm0, $xmm0" as "pshufd $0x50, %xmm0, %xmm" unless optimizing for code size. llvm-svn: 56711	2008-09-26 23:41:32 +00:00
Evan Cheng	5ce14d532c	unpckhps requires sse1, punpckhdq requires sse2. llvm-svn: 56697	2008-09-26 21:26:30 +00:00
Evan Cheng	d190aeb62d	With sse3 and when the source is a load or has multiple uses, favors movddup over shuffp*, pshufd, etc. Without sse3 or when the source is from a register, make use of movlhps llvm-svn: 56620	2008-09-25 20:50:48 +00:00
Evan Cheng	d8a6a7beaa	pmovsxbq etc. requires sse4.1. llvm-svn: 56600	2008-09-25 00:49:51 +00:00
Evan Cheng	efd1f614ff	Fix patterns for SSE4.1 move and sign extend instructions. Also add instructions which fold VZEXT_MOVL and VZEXT_LOAD. llvm-svn: 56594	2008-09-24 23:27:55 +00:00
Dan Gohman	89660301e3	Rename ConstantSDNode::getValue to getZExtValue, for consistency with ConstantInt. This led to fixing a bug in TargetLowering.cpp using getValue instead of getAPIntValue. llvm-svn: 56159	2008-09-12 16:56:44 +00:00
Eli Friedman	fecea4b498	Fix for PR2687: Add patterns to match sint_to_fp and fp_to_sint for <2 x i32>. This is a little messy, but it works. We should really get rid of the intrinsics, though, since they map perfectly well to standard LLVM instructions. llvm-svn: 55864	2008-09-05 23:07:03 +00:00
Evan Cheng	419506a149	FsFLD0S{S\|D} and V_SETALLONES are as cheap as moves. llvm-svn: 55466	2008-08-28 07:52:25 +00:00
Dan Gohman	ebba07cccf	Tablegen generated code already tests the opcode value, so it's not necessary to use dyn_cast in these predicates. llvm-svn: 55055	2008-08-20 15:24:22 +00:00
Dan Gohman	ac992cdc1c	Add an EXTRACTPSmr pattern to match the pattern that X86ISelLowering creates. llvm-svn: 54544	2008-08-08 18:30:21 +00:00
Evan Cheng	f4d1119fbd	Fix PR2620: Fix X86cmppd selection code so it expects operands to be v2f64. llvm-svn: 54376	2008-08-05 22:19:15 +00:00
Nate Begeman	61f6c21028	Fix a typo in last commit llvm-svn: 53720	2008-07-17 17:04:58 +00:00
Nate Begeman	af01bfff99	SSE codegen for vsetcc nodes llvm-svn: 53719	2008-07-17 16:51:19 +00:00
Evan Cheng	02a618dc56	Fix for PR2472. Use movss to set lower 32-bits of a zero XMM vector. llvm-svn: 53386	2008-07-10 01:08:23 +00:00
Evan Cheng	4e7b7b21a2	Horizontal-add instructions are not commutative. llvm-svn: 52363	2008-06-16 21:16:24 +00:00
Evan Cheng	acd614c262	mpsadbw is commutable. llvm-svn: 52352	2008-06-16 20:25:59 +00:00
Duncan Sands	40c8db881a	Disable some DAG combiner optimizations that may be wrong for volatile loads and stores. In fact this is almost all of them! There are three types of problems: (1) it is wrong to change the width of a volatile memory access. These may be used to do memory mapped i/o, in which case a load can have an effect even if the result is not used. Consider loading an i32 but only using the lower 8 bits. It is wrong to change this into a load of an i8, because you are no longer tickling the other three bytes. It is also unwise to make a load/store wider. For example, changing an i16 load into an i32 load is wrong no matter how aligned things are, since the fact of loading an additional 2 bytes can have i/o side-effects. (2) it is wrong to change the number of volatile load/stores: they may be counted by the hardware. (3) it is wrong to change a volatile load/store that requires one memory access into one that requires several. For example on x86-32, you can store a double in one processor operation, but to store an i64 requires two (two i32 stores). In a multi-threaded program you may want to bitcast an i64 to a double and store as a double because that will occur atomically, and be indivisible to other threads. So it would be wrong to convert the store-of-double into a store of an i64, because this will become two i32 stores - no longer atomic. My policy here is to say that the number of processor operations for an illegal operation is undefined. So it is alright to change a store of an i64 (requires at least two stores; but could be validly lowered to memcpy for example) into a store of double (one processor op). In short, if the new store is legal and has the same size then I say that the transform is ok. It would also be possible to say that transforms are always ok if before they were illegal, whether after they are illegal or not, but that's more awkward to do and I doubt it buys us anything much. However this exposed an interesting thing - on x86-32 a store of i64 is considered legal! That is because operations are marked legal by default, regardless of whether the type is legal or not. In some ways this is clever: before type legalization this means that operations on illegal types are considered legal; after type legalization there are no illegal types so now operations are only legal if they really are. But I consider this to be too cunning for mere mortals. Better to do things explicitly by testing AfterLegalize. So I have changed things so that operations with illegal types are considered illegal - indeed they can never map to a machine operation. However this means that the DAG combiner is more conservative because before it was "accidentally" performing transforms where the type was illegal because the operation was nonetheless marked legal. So in a few such places I added a check on AfterLegalize, which I suppose was actually just forgotten before. This causes the DAG combiner to do slightly more than it used to, which resulted in the X86 backend blowing up because it got a slightly surprising node it wasn't expecting, so I tweaked it. llvm-svn: 52254	2008-06-13 19:07:40 +00:00
Evan Cheng	04c0915a2f	Implement vector shift up / down and insert zero with ps{rl}lq / ps{rl}ldq. llvm-svn: 51667	2008-05-29 08:22:04 +00:00
Dan Gohman	a5549a2f9c	Fix the encoding for two more "rm" instructions that were using MRMSrcReg. llvm-svn: 51630	2008-05-28 01:50:19 +00:00
Mon P Wang	8e37b2d13e	Fixed X86 encoding error CVTPS2PD and CVTPD2PS when the source operand is a memory location llvm-svn: 51626	2008-05-28 00:42:27 +00:00

1 2 3 4 5 ...

326 Commits