llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00

Author	SHA1	Message	Date
Craig Topper	c6d0bc2afc	Add SSE4A MOVNTSS/MOVNTSD instructions. llvm-svn: 156281	2012-05-07 05:36:19 +00:00
Nadav Rotem	021d75713c	AVX: Add additional vbroadcast replacement sequences for integers. Remove the v2f64 patterns because it does not match any vbroadcast instruction. llvm-svn: 155461	2012-04-24 18:09:59 +00:00
Nadav Rotem	d060c25823	AVX: We lower VECTOR_SHUFFLE and BUILD_VECTOR nodes into vbroadcast instructions using the pattern (vbroadcast (i32load src)). In some cases, after we generate this pattern new users are added to the load node, which prevent the selection of the blend pattern. This commit provides fallback patterns which perform in-vector broadcast (using in-vector vbroadcast in AVX2 and pshufd on AVX1). llvm-svn: 155437	2012-04-24 11:07:03 +00:00
Elena Demikhovsky	35721fc4f8	ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2 llvm-svn: 155309	2012-04-22 09:39:03 +00:00
Craig Topper	db4fcf7088	Replace vpermd/vpermps intrinsic patterns with custom lowering to target specific nodes. llvm-svn: 154801	2012-04-16 07:13:00 +00:00
Craig Topper	129dccdc84	Flip the arguments when converting vpermd/vpermps intrinsics into instructions. The intrinsic has the mask as the last operand, but the instruction has it as the second. llvm-svn: 154797	2012-04-16 06:26:15 +00:00
Craig Topper	1b15347812	Merge vpermps/vpermd and vpermpd/vpermq SD nodes. llvm-svn: 154782	2012-04-16 00:41:45 +00:00
Craig Topper	788250eec1	Remove AVX2 vpermq and vpermpd intrinsics. These can now be handled with normal shuffle vectors. llvm-svn: 154778	2012-04-15 22:43:31 +00:00
Nadav Rotem	2a4e2ef10c	Fix PR12529. The Vxx family of instructions are only supported by AVX. Use non-vex instructions for SSE4. llvm-svn: 154770	2012-04-15 19:36:44 +00:00
Elena Demikhovsky	92fb3e613e	Added VPERM optimization for AVX2 shuffles llvm-svn: 154761	2012-04-15 11:18:59 +00:00
Craig Topper	448790d566	Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions. llvm-svn: 154580	2012-04-12 07:23:00 +00:00
Nadav Rotem	c922b4f2a3	Reapply 154396 after fixing a test. Original message: Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendV uses a register for the selection while Vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154483	2012-04-11 06:40:27 +00:00
Eric Christopher	f8886e8f48	Temporarily revert this patch to see if it brings the buildbots back. llvm-svn: 154425	2012-04-10 19:33:16 +00:00
Nadav Rotem	74f87a6bd8	Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendv uses a register for the selection while vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154396	2012-04-10 14:33:13 +00:00
Craig Topper	a6412fb8c0	Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1. llvm-svn: 154272	2012-04-07 22:32:29 +00:00
Craig Topper	1ddf62dc2c	Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns. llvm-svn: 154268	2012-04-07 21:57:43 +00:00
Craig Topper	ce6c05e0df	Add support for AVX enhanced comparison predicates. Patch from Kay Tiong Khoo. llvm-svn: 153935	2012-04-03 05:20:24 +00:00
Chad Rosier	17f25ea47b	[avx] Add patterns for combining vextractf128 + vmovaps/vmovups/vmovdqu to vextractf128 with 128-bit mem dest. Combines vextractf128 $0, %ymm0, %xmm0 vmovaps %xmm0, (%rdi) to vextractf128 $0, %ymm0, (%rdi) rdar://11082570 llvm-svn: 153139	2012-03-20 21:43:40 +00:00
Chad Rosier	f6d522341c	[avx] Add the AddedComplexity to the VINSERTI128 avx2 patterns to give precedence over the VINSERTF128 avx1 patterns. llvm-svn: 153114	2012-03-20 19:45:07 +00:00
Chad Rosier	73d8191b27	Whitespace. llvm-svn: 153105	2012-03-20 18:38:33 +00:00
Chad Rosier	ffd2dbd676	[avx] Move the vextractf128 patterns closer to the vextractf128 def. Remove whitespace from test case. No functional change intended. llvm-svn: 153103	2012-03-20 18:24:55 +00:00
Chad Rosier	143f33dc92	[avx] Adjust the VINSERTF128rm pattern to allow for unaligned loads. This results in things such as vmovups 16(%rdi), %xmm0 vinsertf128 $1, %xmm0, %ymm0, %ymm0 to be combined to vinsertf128 $1, 16(%rdi), %ymm0, %ymm0 rdar://11076953 llvm-svn: 153092	2012-03-20 17:08:51 +00:00
Chad Rosier	bd3e55d39c	[avx] Add patterns for VINSERTF128rm. This results in things such as vmovaps -96(%rbx), %xmm1 vinsertf128 $1, %xmm1, %ymm0, %ymm0 to be combined to vinsertf128 $1, -96(%rbx), %ymm0, %ymm0 rdar://10643481 llvm-svn: 152762	2012-03-15 00:45:30 +00:00
Kay Tiong Khoo	aaa4140718	*fix typo in comment; test of commit access llvm-svn: 152507	2012-03-10 21:29:49 +00:00
Chad Rosier	a10cf5e1b9	Fix a regression from r147481. Original commit message from r147481: DAGCombine for transforming 128->256 casts into a vmovaps, rather than a vxorps + vinsertf128 pair if the original vector came from a load. Fix: Unaligned loads need to generate a vmovups. rdar://10974078 llvm-svn: 152366	2012-03-09 02:00:48 +00:00
Preston Gurd	e0609ed607	This patch adds instruction latencies for the SSE instructions to the instruction scheduler for the Intel Atom. llvm-svn: 151590	2012-02-27 23:35:03 +00:00
Pete Cooper	135769381b	Turn avx insert intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove duplicate patterns for selecting the intrinsics llvm-svn: 151342	2012-02-24 03:51:49 +00:00
Jia Liu	6bb2f0f0e4	some comment fix for X86 and ARM llvm-svn: 150902	2012-02-19 02:03:36 +00:00
Jia Liu	b077b6085d	Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore. llvm-svn: 150878	2012-02-18 12:03:15 +00:00
Craig Topper	a25b84d986	Remove the last of the old vector_shuffle patterns from X86 isel. llvm-svn: 150795	2012-02-17 07:02:34 +00:00
Craig Topper	a754fb54b1	Move old movl vector_shuffle patterns. Not needed anymore since vector_shuffles shouldn't reach isel. llvm-svn: 150462	2012-02-14 08:14:53 +00:00
Craig Topper	f70ffc68b2	Still more vector_shuffle pattern removal. llvm-svn: 150365	2012-02-13 07:23:41 +00:00
Craig Topper	3073033e59	Remove more vector_shuffle patterns for unpack. These should be target specific nodes when they get to isel. llvm-svn: 150363	2012-02-13 05:48:49 +00:00
Craig Topper	5cb2de69d8	Recommit r150328. Previous test failures should be fixed by r150360. llvm-svn: 150362	2012-02-13 05:10:10 +00:00
NAKAMURA Takumi	50b0952aa9	Revert r150328, "Remove more vector_shuffle patterns." It caused 3 failures on pre-penryn and non-x86(generic) hosts. llvm-svn: 150357	2012-02-13 00:10:15 +00:00
Craig Topper	76547b82f2	Remove more vector_shuffle patterns. llvm-svn: 150328	2012-02-12 08:14:35 +00:00
Craig Topper	96a1a31c5c	Remove more vector_shuffle patterns. llvm-svn: 150321	2012-02-12 01:07:34 +00:00
Craig Topper	7c85e63244	Remove more vector_shuffle patterns. llvm-svn: 150314	2012-02-11 23:31:01 +00:00
Craig Topper	59398d7eb3	Remove some patterns for matching vector_shuffle instructions since vector_shuffles should be custom lowered before isel. llvm-svn: 150299	2012-02-11 07:43:35 +00:00
Craig Topper	f19c739d2d	Remove a couple unneeded intrinsic patterns llvm-svn: 150067	2012-02-08 08:29:30 +00:00
Craig Topper	6dbd5e534c	Remove GCC builtins for vpermilp* intrinsics as clang no longer needs them. Custom lower the intrinsics to the vpermilp target specific node and remove intrinsic patterns. llvm-svn: 150060	2012-02-08 06:36:57 +00:00
Craig Topper	a8a69356e1	Add instruction selection for 256-bit VPSHUFD and 128-bit VPERMILPS/VPERMILPD. llvm-svn: 149968	2012-02-07 06:28:42 +00:00
Craig Topper	07b9d056fa	Add target specific node for PMULUDQ. Change patterns to use it and custom lower intrinsics to it. Use it instead of intrinsic to handle 64-bit vector multiplies. llvm-svn: 149807	2012-02-05 03:14:49 +00:00
Elena Demikhovsky	7ca11b6e3f	Optimization for SIGN_EXTEND operation on AVX. Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32 extensions. llvm-svn: 149600	2012-02-02 09:10:43 +00:00
Andrew Trick	d09b64fc25	Instruction scheduling itinerary for Intel Atom. Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT. Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches. Adds a test to verify that the scheduler is working. Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP. Patch by Preston Gurd! llvm-svn: 149558	2012-02-01 23:20:51 +00:00
Craig Topper	9a8c6c1633	Fix pattern for memory form of PSHUFD for use with FP vectors to remove bitcast to an integer vector that normal code wouldn't have. Also remove bitcasts from code that turns splat vector loads into a shuffle as it was making the broken pattern necessary. llvm-svn: 149232	2012-01-30 07:50:31 +00:00
Craig Topper	9259da4826	Move some patterns back near their instructions and use AddedComplexity to fix priority. Merge some patterns into their instruction definition. llvm-svn: 149122	2012-01-27 07:09:40 +00:00
Victor Umansky	bf35274368	Fix for the following bug in AVX codegen for double-to-int conversions: "fptosi" and "fptoui" IR instructions are defined with round-to-zero rounding mode. Currently for AVX mode for <4xdouble> and <8xdouble> the "VCVTPD2DQ.128" and "VCVTPD2DQ.256" instructions are selected (for "fp_to_sint" DAG node operation) by AVX codegen. However they use round-to-nearest-even rounding mode. Consequently, the conversion produces incorrect numbers. The fix is to replace selection of VCVTPD2DQ instructions with VCVTTPD2DQ instructions. The latter use truncate (i.e. round-to-zero) rounding mode. As "fp_to_sint" DAG node operation is used only for lowering of "fptosi" and "fptoui" IR instructions, the fix in X86InstrSSE.td definition file doesn't have an impact on other LLVM flows. The patch includes changes in the .td file, LIT test for the changes and a fix in a legacy LIT test (which produced asm code conflicting with LLVM IR spec). llvm-svn: 149056	2012-01-26 08:51:39 +00:00
Craig Topper	814a037ac6	Fix AVX vs SSE patterns ordering issue for VPCMPESTRM and VPCMPISTRM. llvm-svn: 149053	2012-01-26 07:31:30 +00:00
Craig Topper	7ed643d290	Remove some more patterns by custom lowering intrinsics to target specific nodes. llvm-svn: 149052	2012-01-26 07:18:03 +00:00
Craig Topper	c2b030401c	Custom lower PSIGN and PSHUFB intrinsics to their corresponding target specific nodes so we can remove the isel patterns. llvm-svn: 148933	2012-01-25 06:43:11 +00:00
Craig Topper	9edaa5c15a	Custom lower phadd and phsub intrinsics to target specific nodes. Remove the patterns that are no longer necessary. llvm-svn: 148927	2012-01-25 05:37:32 +00:00
Craig Topper	c968a4ccc9	Remove AVX 256-bit unaligned load intrinsics. 128-bit versions had been removed a while ago. llvm-svn: 148922	2012-01-25 04:42:03 +00:00
Craig Topper	f4166abe42	Merge intrinsic pattern and no pattern versions of VCVTSD2SI instruction definitions. Matches non-AVX version of same instructions. llvm-svn: 148914	2012-01-25 03:52:09 +00:00
Craig Topper	606872615f	Custom lower PCMPEQ/PCMPGT intrinsics to target specific nodes and remove the intrinsic patterns. llvm-svn: 148687	2012-01-23 08:18:28 +00:00
Craig Topper	360c9f28cf	Custom lower vector shift intrinsics to target specific nodes and remove the patterns that are no longer needed. llvm-svn: 148684	2012-01-23 06:16:53 +00:00
Craig Topper	03b49e88a2	Remove pattern fragments for v32i8, v16i16, v8i32, v16i8, v8i16, and v4i32 loads. All integer vector loads are promoted to v2i64 or v4i64 so these pattern fragments can never match. Fix or remove patterns that used these fragments. llvm-svn: 148672	2012-01-23 00:06:44 +00:00
Craig Topper	b80bb890b6	Combine X86 CMPPD and CMPPS node types. Simplifies selection code and pattern matching. llvm-svn: 148670	2012-01-22 23:36:02 +00:00
Craig Topper	558395cb4e	Merge PCMPEQB/PCMPEQW/PCMPEQD/PCMPEQQ and PCMPGTB/PCMPGTW/PCMPGTD/PCMPGTQ X86 ISD node types into only two node types. Simplifying opcode selection and pattern matching. llvm-svn: 148667	2012-01-22 22:42:16 +00:00
Craig Topper	2b6951f7c4	Add target specific ISD node types for SSE/AVX vector shuffle instructions and change all the code that used to create intrinsic nodes to create the new nodes instead. llvm-svn: 148664	2012-01-22 19:15:14 +00:00
Craig Topper	bb07c0da2d	Move some vector shift patterns into their instruction definitions. llvm-svn: 148643	2012-01-22 00:41:20 +00:00
Craig Topper	63c59673a3	Add memory patterns for some of the fp<->integer conversion instructions. Fold some patterns into instruction definitions. llvm-svn: 148641	2012-01-21 18:37:15 +00:00
Craig Topper	e2d3f3060d	Add support for selecting 256-bit PALIGNR. llvm-svn: 148532	2012-01-20 05:53:00 +00:00
Craig Topper	d1f51dc860	Give priority to AVX over SSE for 128-bit floating point unpck instructions. llvm-svn: 148233	2012-01-16 09:56:42 +00:00
Craig Topper	0c4ab86d2c	Fix the memop type on a couple 256-bit AVX instructions that were using f128mem instead of f256mem. llvm-svn: 148196	2012-01-14 18:29:57 +00:00
Chad Rosier	4a705ae81a	Fix pasto from r146196. llvm-svn: 148167	2012-01-14 01:50:21 +00:00
Craig Topper	c1e3d46e07	Convert SHUFPD with the same register for both sources to PSHUFD if it would prevent a register copy. Similar to SHUFPS, but requires the mask to be converted. llvm-svn: 148112	2012-01-13 09:21:41 +00:00
Craig Topper	e52c0484de	Make X86 instruction selection use 256-bit VPXOR for build_vector of all ones if AVX2 is enabled. This gives the ExeDepsFix pass a chance to choose FP vs int as appropriate. Also use v8i32 as the type for getZeroVector if AVX2 is enabled. This is consistent with SSE2 preferring v4i32. llvm-svn: 148108	2012-01-13 08:12:35 +00:00
Craig Topper	71ea42cc29	Add patterns for v16i16 and v32i8 immAllZerosV to select VPXOR to match v4i64 and v8i32. llvm-svn: 148106	2012-01-13 06:59:47 +00:00
Chad Rosier	7bab07a5f1	Add missing VEX predicates to VMOVSDto64rr/VMOVSDto64mr. This fixes a few failing test cases on our internal AVX nightly tester. rdar://10663637 llvm-svn: 147881	2012-01-10 22:14:06 +00:00
Craig Topper	c9756440ea	Instruction selection priority fixes to remove the XMM/XMMInt/orAVX predicates. Another commit will remove orAVX functions from X86SubTarget. llvm-svn: 147841	2012-01-10 06:30:56 +00:00
Craig Topper	915bc4edaa	Add HasAVX predicate to some of the AVX patterns. llvm-svn: 147769	2012-01-09 08:34:00 +00:00
Craig Topper	2710d2dadb	Reorder a bunch of patterns to put the AVX version first thus giving it priority over the SSE version. Another step towards trying to remove the AVX hack that disables SSE from X86Subtarget. llvm-svn: 147768	2012-01-09 08:10:38 +00:00
Craig Topper	ee2dabebe3	Clean up patterns for MOVNT*. Not sure why there were floating point types on MOVNTPS and MOVNTDQ. And v4i64 was completely missing. llvm-svn: 147767	2012-01-09 06:52:46 +00:00
Craig Topper	8ce58f4687	Mark MOVNTI as being supported in SSE2 OR AVX mode. This instruction has no AVX equivalent so we should use the SSE version. llvm-svn: 147766	2012-01-09 06:38:55 +00:00
Craig Topper	bd6f3004ba	Move SSE2 logical operations PAND/POR/PXOR/PANDN above SSE1 logical operations ANDPS/ORPS/XORPS/ANDNPS. This fixes a pattern ordering issue that meant that the SSE2 instructions could never be directly selected since the SSE1 patterns would always match first. This is largely moot with the ExeDepsFix pass, but I'm trying to audit for all such ordering issues. llvm-svn: 147765	2012-01-09 05:07:01 +00:00
Chad Rosier	afcaa8f38a	Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather than a vxorps + vinsertf128 pair if the original vector came from a load. rdar://10594409 llvm-svn: 147481	2012-01-03 21:05:52 +00:00
Craig Topper	69e0a09d8f	Make CanXFormVExtractWithShuffleIntoLoad reject loads with multiple uses. Also make it return false if there's not even a load at all. This makes the code better match the code in DAGCombiner that it tries to match. These two changes prevent some cases where vector_shuffles were making it to instruction selection and causing the older shuffle selection code to be triggered. Also needed to fix a bad pattern that this change exposed. This is the first step towards getting rid of the old shuffle selection support. No test cases yet because there's no way to tell whether a shuffle was handled in the legalize stage or at instruction selection. llvm-svn: 147428	2012-01-02 08:46:48 +00:00
Craig Topper	d8ae2d9f27	Fix sfence, lfence, mfence, and clflush to be able to be selected when AVX is enabled. Fix monitor and mwait to require SSE3 or AVX, previously they worked even if SSE3 was disabled. Make prefetch instructions not set the execution domain since they don't use XMM registers. llvm-svn: 147409	2012-01-01 19:40:22 +00:00
Craig Topper	ef59fe1ad4	Merge X86 SHUFPS and SHUFPD node types. llvm-svn: 147394	2011-12-31 23:50:21 +00:00
Craig Topper	0311c45aed	Add patterns for integer forms of SHUFPD/VSHUFPD with a memory load. llvm-svn: 147393	2011-12-31 23:24:49 +00:00
Craig Topper	c01ce759d7	Fix typo in a SHUFPD and VSHUFPD pattern that prevented SHUFPD/VSHUFPD with a load from being selected. llvm-svn: 147392	2011-12-31 23:15:11 +00:00
Craig Topper	04b3b369de	Remove the separate explicit AES instruction patterns. They are equivalent to the patterns specified by the instructions. Also remove unnecessary bitconverts from the AES patterns. llvm-svn: 147342	2011-12-29 17:41:56 +00:00
Chad Rosier	98251404f7	Fix 80-column violations. llvm-svn: 147095	2011-12-21 20:59:09 +00:00
Elena Demikhovsky	b37883fe87	This is the second fix related to VZEXT_MOVL node. The failure that I see in the current version is: LLVM ERROR: Cannot select: 0x18b8f70: v4i64 = X86ISD::VZEXT_MOVL 0x18beee0 [ID=14] 0x18beee0: v4i64 = insert_subvector 0x18b8c70, 0x18b9170, 0x18b9570 [ID=13] 0x18b8c70: v4i64 = insert_subvector 0x18b9870, 0x18bf4e0, 0x18b9970 [ID=12] 0x18b9870: v4i64 = undef [ID=4] 0x18bf4e0: v2i64 = bitcast 0x18bf3e0 [ID=10] 0x18bf3e0: v4i32 = BUILD_VECTOR 0x18b9770, 0x18b9770, 0x18b9770, 0x18b9770 [ID=8] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9970: i32 = Constant<0> [ID=3] 0x18b9170: v2i64 = undef [ORD=1] [ID=1] 0x18b9570: i32 = Constant<2> [ID=5] llvm-svn: 146975	2011-12-20 13:34:28 +00:00
Eli Friedman	f626b19bda	Make sure we correctly note the existence of an i8 immediate for vblendvps and friends, so we compute fixups correctly. PR11586. llvm-svn: 146709	2011-12-15 23:46:18 +00:00
Chad Rosier	62ebee9859	Add missing zmovl AVX patterns which were causing crashes. Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>! llvm-svn: 146689	2011-12-15 22:11:31 +00:00
Benjamin Kramer	06cd66b1d7	X86: Add patterns for the various rounding ops for SSE4.1 and AVX. llvm-svn: 146257	2011-12-09 15:44:03 +00:00
Benjamin Kramer	66bfc0739d	X86: Split (v)rounds[sd] into a normal and an intrinsic version. llvm-svn: 146256	2011-12-09 15:43:55 +00:00
Evan Cheng	ad8debd736	Add 256-bit variant vmovss and vmovsd patterns. rdar://10538417 llvm-svn: 146196	2011-12-08 22:30:45 +00:00
Evan Cheng	d8a73b8918	Add various missing AVX patterns which were causing crashes. Sadly, the generated code looks pretty bad compared to SSE. rdar://10538793 llvm-svn: 146191	2011-12-08 22:05:28 +00:00
Evan Cheng	93e29adc2f	Many of the SSE patterns should not be selected when AVX is available. This led to the following code in X86Subtarget.cpp if (HasAVX) X86SSELevel = NoMMXSSE; This is so patterns that are predicated on hasSSE3, etc. would not be selected when avx is available. Instead, the AVX variant is selected. However, this breaks instructions which do not have AVX variants. The right way to fix this is for the SSE but not-AVX patterns to predicate on something like hasSSE3() && !hasAVX(). Then we can take out the hack in X86Subtarget.cpp. Patterns which do not have AVX variants do not need to change. However, we need to audit all the patterns before we make the change. This patch is workaround that fixes one specific case, the prefetch instructions. rdar://10538297 llvm-svn: 146163	2011-12-08 19:00:42 +00:00
Craig Topper	6b3cc1405f	Fix a bunch of SSE/AVX patterns to use proper memop types. In particular, not using integer loads other than v2i64/v4i64 since the others are all promoted. llvm-svn: 146031	2011-12-07 08:30:53 +00:00
Craig Topper	8b05e7d035	Fix a bunch of SSE/AVX patterns to use v2i64/v4i64 loads since all other integer vector loads are promoted to those. llvm-svn: 145927	2011-12-06 09:04:59 +00:00
Craig Topper	846d53deed	Merge floating point and integer UNPCK X86ISD node types. llvm-svn: 145926	2011-12-06 08:21:25 +00:00
Craig Topper	0ac9bb8aa1	Merge VPERM2F128/VPERM2I128 ISD node types. llvm-svn: 145485	2011-11-30 07:47:51 +00:00
Craig Topper	43b885cff4	Merge decoding of VPERMILPD and VPERMILPS shuffle masks. Merge X86ISD node type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128. llvm-svn: 145483	2011-11-30 06:25:25 +00:00
Evan Cheng	5c1efd630b	Add another missing pattern. llvm-gcc likes f64 but clang likes i64 so it was generating poor code for some SSE builtins. llvm-svn: 145448	2011-11-29 22:48:34 +00:00
Jakob Stoklund Olesen	5d6a4584d9	Make X86::FsFLD0SS / FsFLD0SD real pseudo-instructions. Like V_SET0, these instructions are expanded by ExpandPostRA to xorps / vxorps so they can participate in execution domain swizzling. This also makes the AVX variants redundant. llvm-svn: 145440	2011-11-29 22:27:25 +00:00
Elena Demikhovsky	735cff1fa8	Fixed vsqrt.ss intrinsic usage - order of input operands was wrong. Added a test. Thanks Bruno for reviewing the patch. llvm-svn: 145403	2011-11-29 15:00:45 +00:00
Craig Topper	4550fc2649	Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD. llvm-svn: 145390	2011-11-29 07:49:05 +00:00
Craig Topper	aca91b9f14	Fix VINSERTF128/VEXTRACTF128 to be marked as FP instructions. Allow execution dependency fix pass to convert them to their integer equivalents when AVX2 is enabled. llvm-svn: 145376	2011-11-29 05:37:58 +00:00
Craig Topper	a6c1d25798	Correctly mark VPERM2F128 as being an FP instruction and add execution domain fixing support to convert it to VPERM2I128 for AVX2. llvm-svn: 145370	2011-11-29 03:57:34 +00:00
Evan Cheng	1ed975b097	Add missing avx pattern. llvm-svn: 145272	2011-11-28 20:27:23 +00:00
Craig Topper	6f5a0bc4e3	Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar. llvm-svn: 145238	2011-11-28 10:14:51 +00:00
Craig Topper	563854a230	Merge 128-bit and 256-bit X86ISD node types for VPERMILPS and VPERMILPD. Simplify some shuffle lowering code since V1 can never be UNDEF due to canonicalizing that occurs when shuffle nodes are created. llvm-svn: 145153	2011-11-26 22:55:48 +00:00
Craig Topper	65f8dcdb7d	Collapse X86ISD node types for PUNPCKH, PUNPCKL, UNPCKLP, and UNPCKHP to not be type specific. Now we just have integer high and low and floating point high and low. Pattern matching will choose the correct instruction based on the vector type. llvm-svn: 145148	2011-11-26 20:47:44 +00:00
Craig Topper	e761f42368	Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type distinguish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64. llvm-svn: 145126	2011-11-24 22:57:10 +00:00
Craig Topper	7cf04d32e9	Remove AVX2 specific X86ISD node types for PUNPCKH/L and instead just reuse the 128-bit versions and let the vector type distinguish. llvm-svn: 145125	2011-11-24 22:20:08 +00:00
Craig Topper	866214a486	Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled. llvm-svn: 145028	2011-11-21 08:26:50 +00:00
Craig Topper	14cedf481a	Add support for lowering 256-bit shuffles to VPUNPCKL/H for i16, i32, i64 if AVX2 is enabled. llvm-svn: 145026	2011-11-21 06:57:39 +00:00
Craig Topper	e878c775cf	Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine. llvm-svn: 145005	2011-11-20 00:12:05 +00:00
Craig Topper	6ed413c495	Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled. llvm-svn: 145004	2011-11-19 22:34:59 +00:00
Craig Topper	3e24dc25b2	Remove some of the special classes that worked around an old tablegen limitation of not being able to remove redundant bitconverts from patterns. llvm-svn: 145003	2011-11-19 21:01:54 +00:00
Craig Topper	c6a4cbdc04	Custom lower AVX2 variable shift intrinsics to shl/srl/sra nodes and remove the intrinsic patterns. llvm-svn: 144999	2011-11-19 17:46:46 +00:00
Craig Topper	a64e2604a2	Synthesize SSSE3/AVX 128-bit horizontal integer add/sub instructions from add/sub of appropriate shuffle vectors. llvm-svn: 144989	2011-11-19 09:02:40 +00:00
Craig Topper	117ffc9a0c	Collapse X86 PSIGNB/PSIGNW/PSIGND node types. llvm-svn: 144988	2011-11-19 07:33:10 +00:00
Craig Topper	536f9d9434	Extend VPBLENDVB and VPSIGN lowering to work for AVX2. llvm-svn: 144987	2011-11-19 07:07:26 +00:00
Craig Topper	0deee76383	Remove unused parameters from the AVX maskmov classes. llvm-svn: 144985	2011-11-19 04:49:22 +00:00
Nadav Rotem	08f8a75c2c	Add AVX2 vpbroadcast support llvm-svn: 144967	2011-11-18 02:49:55 +00:00
Craig Topper	7297509c73	Fix SSE/AVX integer comparison patterns to understand that all integer vector loads are promoted to i64 vector loads so patterns need a bitconvert. Also slightly simplify the AVX2 variable shift patterns by using the predefined bitconvert pattern fragments. llvm-svn: 144896	2011-11-17 07:49:38 +00:00
Craig Topper	4d39196041	Remove seemingly unnecessary duplicate VROUND definitions. llvm-svn: 144885	2011-11-17 07:04:00 +00:00
Evan Cheng	5bae2333cb	Another missing X86ISD::MOVLPD pattern. rdar://10450317 llvm-svn: 144839	2011-11-16 22:24:44 +00:00
Craig Topper	7a4d482aaa	Fix the execution domain on a bunch of SSE/AVX instructions. llvm-svn: 144784	2011-11-16 07:30:46 +00:00
Evan Cheng	2034ff3b0b	Add a missing pattern for X86ISD::MOVLPD. rdar://10436044 llvm-svn: 144566	2011-11-14 20:35:52 +00:00
Craig Topper	e0b34012db	Add neverHasSideEffects, mayLoad, and mayStore to many patternless SSE/AVX instructions. Remove MMX check from LowerVECTOR_SHUFFLE since MMX vector types won't go through it anyway. llvm-svn: 144522	2011-11-14 06:46:21 +00:00
Craig Topper	0458cdf64a	Add more AVX2 shift lowering support. Move AVX2 variable shift to use patterns instead of custom lowering code. llvm-svn: 144457	2011-11-12 09:58:49 +00:00
Craig Topper	50df7c3842	Add lowering for AVX2 shift instructions. llvm-svn: 144380	2011-11-11 07:39:23 +00:00
Nadav Rotem	e3d8f1a069	AVX2: Add variable shift from memory. Note: These patterns only work in some cases because many times the load sd node is bitcasted from a load node of a different type. llvm-svn: 144266	2011-11-10 06:54:20 +00:00
Nadav Rotem	ddc6bfa543	AVX2: Add patterns for variable shift operations llvm-svn: 144212	2011-11-09 21:22:13 +00:00
Nadav Rotem	e66a72a2c4	Add AVX2 support for vselect of v32i8 llvm-svn: 144187	2011-11-09 13:21:28 +00:00
Craig Topper	7ff77dc2b1	Add instruction selection for AVX2 integer comparisons. llvm-svn: 144176	2011-11-09 08:06:13 +00:00
Evan Cheng	4a63100fe3	Add x86 isel logic and patterns to match movlps from clang generated IR for _mm_loadl_pi(). rdar://10134392, rdar://10050222 llvm-svn: 144052	2011-11-08 00:31:58 +00:00
Craig Topper	7eab73f510	Add AVX2 variable shift instructions and intrinsics. llvm-svn: 143915	2011-11-07 08:26:24 +00:00
Craig Topper	b1ef950217	Add AVX2 VPMOVMASK instructions and intrinsics. llvm-svn: 143904	2011-11-07 03:20:35 +00:00
Craig Topper	d422190c0f	Add AVX2 VEXTRACTI128 and VINSERTI128 instructions. Fix VPERM2I128 to be qualified with HasAVX2 instead of HasAVX. Mark VINSERTF128 and VEXTRACTF128 as never having side effects. llvm-svn: 143902	2011-11-07 02:00:04 +00:00
Craig Topper	01b852b95a	More AVX2 instructions and their intrinsics. llvm-svn: 143895	2011-11-06 23:04:08 +00:00
Craig Topper	31b1d79474	Add more AVX2 instructions and intrinsics. llvm-svn: 143861	2011-11-06 06:12:20 +00:00
Craig Topper	80cdc1ee12	Add intrinsics for X86 vcvtps2ph and vcvtph2ps instructions llvm-svn: 143683	2011-11-04 06:59:49 +00:00
Craig Topper	124b2fd08c	Add new X86 AVX2 VBROADCAST instructions. llvm-svn: 143612	2011-11-03 07:35:53 +00:00
Craig Topper	a2a55bd0b4	More AVX2 instructions and intrinsics. llvm-svn: 143536	2011-11-02 06:54:17 +00:00
Craig Topper	c5482eb697	Add a bunch more X86 AVX2 instructions and their corresponding intrinsics. llvm-svn: 143529	2011-11-02 04:42:13 +00:00
Craig Topper	6eaf58df7c	Begin adding AVX2 instructions. No selection support yet other than intrinsics. llvm-svn: 143331	2011-10-31 02:15:10 +00:00
Jakob Stoklund Olesen	d7827f928d	V_SET0 has no side effects. TableGen will mark any pattern-less instruction as having unmodeled side effects. This is extra bad for V_SET0 which gets rematerialized a lot. This was part of the cause for PR11125, but the real bug was fixed in r141923. llvm-svn: 141924	2011-10-14 00:39:50 +00:00
Craig Topper	0d25fa802f	Add 'implicit EFLAGS' to patterns for popcnt and lzcnt llvm-svn: 141853	2011-10-13 06:18:52 +00:00
Craig Topper	881d972428	Add HasPOPCNT predicate to the POPCNT instructions. Also mark POPCNT as modifying EFLAGS. llvm-svn: 141656	2011-10-11 07:13:09 +00:00
Craig Topper	db2d702bff	Make Ivy Bridge 16-bit floating point conversion instructions require AVX. llvm-svn: 141654	2011-10-11 07:01:37 +00:00
Craig Topper	9b7ab95570	Add Ivy Bridge 16-bit floating point conversion instructions for the X86 disassembler. llvm-svn: 141505	2011-10-09 07:31:39 +00:00
Craig Topper	9d32602cfd	Add support in the disassembler for ignoring the L-bit on certain VEX instructions. Mark instructions that have this behavior. Fixes PR10676. llvm-svn: 141065	2011-10-04 06:30:42 +00:00
Craig Topper	df04bee9b2	Add support for MOVBE and RDRAND instructions for the assembler and disassembler. Includes feature flag checking, but no intrinsic support. Fixes PR10832, PR11026 and PR11027. llvm-svn: 141007	2011-10-03 17:28:23 +00:00
Jakob Stoklund Olesen	76da38e8e8	Expand the x86 V_SET0* pseudos right after register allocation. This also makes it possible to reduce the number of pseudo instructions and get rid of the encoding information. llvm-svn: 140776	2011-09-29 05:10:54 +00:00
Duncan Sands	6d3fe8d11a	Implement Chris's suggestion of legalizing the various SSE and AVX hadd/hsub intrinsics into the new fhadd/fhsub X86 node. llvm-svn: 140383	2011-09-23 16:10:22 +00:00
Duncan Sands	1da590b589	Synthesize SSE3/AVX 128 bit horizontal add/sub instructions from floating point add/sub of appropriate shuffle vectors. Does not synthesize the 256 bit AVX versions because they work differently. llvm-svn: 140332	2011-09-22 20:15:48 +00:00
Bruno Cardoso Lopes	629e7c2410	Revert r140097, working on a better approach llvm-svn: 140203	2011-09-20 23:19:29 +00:00
Bruno Cardoso Lopes	906f64c461	The wrong relocation was being emitted for several SSSE3 instructions. This fixes PR10963. Thanks to Benjamin for finding the wrong tablegen declaration. llvm-svn: 140184	2011-09-20 21:39:21 +00:00
Bruno Cardoso Lopes	de0dc10d6d	Fix PR10949. Fix the encoding of VMOVPQIto64rr. llvm-svn: 140098	2011-09-19 23:36:59 +00:00
Bruno Cardoso Lopes	7cf7f02c3d	Based on the small opt Zvi's patch was trying to achieve, eliminate 128-bit undef subvector insertion into a 256-bit vector llvm-svn: 140097	2011-09-19 23:36:50 +00:00
Bruno Cardoso Lopes	9e5ef44daf	Match X86ISD::FSETCCsd and X86ISD::FSETCCss while in AVX mode. This fixes PR10955 and PR10948. llvm-svn: 140069	2011-09-19 21:29:24 +00:00
Bruno Cardoso Lopes	f611f6c371	Describe more AVX 128-bit convert instructions without patterns to have mayLoad = 1 llvm-svn: 139973	2011-09-16 23:41:29 +00:00
Bruno Cardoso Lopes	396b8136bf	Add mayLoad attribute to AVX convert instructions, since none of them are declared with load patterns. This fixes the crash in PR10941. No testcases, since a fold is triggered and then converted back to the register form afterwards. llvm-svn: 139953	2011-09-16 22:02:14 +00:00
Craig Topper	60719c7bfb	Fix mem type for VEX.128 form of VROUNDP*. Remove filter preventing VROUND from being recognized by disassembler. llvm-svn: 139691	2011-09-14 06:41:26 +00:00
Bruno Cardoso Lopes	27a7ace4b4	Teach the foldable tables about 128-bit AVX instructions and make the alignment check for 256-bit classes more strict. There're no testcases but we catch more folding cases for AVX while running single and multi sources in the llvm testsuite. Since some 128-bit AVX instructions have different number of operands than their SSE counterparts, they are placed in different tables. 256-bit AVX instructions should also be added in the table soon. And there are a few more 128-bit versions to be handled, which should come in the following commits. llvm-svn: 139687	2011-09-14 02:36:58 +00:00
Nadav Rotem	f1730712f7	swap vselect operand order - pr10907 llvm-svn: 139630	2011-09-13 19:56:38 +00:00
Bruno Cardoso Lopes	f02589db47	Add 256-bit versions of alignedstore and alignedload, to be more strict about the alignment checking. This was found by inspection and I don't have any testcases so far, although the llvm testsuite runs without any problem. llvm-svn: 139625	2011-09-13 19:33:03 +00:00
Craig Topper	03c833ff84	Remove filter that was preventing MOVDQU/MOVDQA and their VEX forms from being disassembled. Also added encodings for the other register/register form of these instructions. Fixes PR10848. llvm-svn: 139588	2011-09-13 06:54:58 +00:00
Craig Topper	6eeb5396f8	Fix encoding of VMOVDQU to not simultaneously be 'TB OpSize' and 'XS'. 'XS' is correct and seems to have been taking priority. llvm-svn: 139587	2011-09-13 06:39:34 +00:00
Bruno Cardoso Lopes	a4d2bdfa40	Fix PR10845. SUBREG_TO_REG shouldn't be used when the input and destination types are equal! llvm-svn: 139553	2011-09-12 22:59:23 +00:00
Bruno Cardoso Lopes	e2fc394ed2	Organize a bit the operand names for CMPPS and CMPPD llvm-svn: 139527	2011-09-12 19:30:36 +00:00
Bruno Cardoso Lopes	fc1c90ac48	Realign BLEND patterns to match the general style for patterns in .td file. llvm-svn: 139526	2011-09-12 19:30:33 +00:00
Bruno Cardoso Lopes	f0e65e0f13	Fix 80-columns llvm-svn: 139525	2011-09-12 19:30:29 +00:00
Nadav Rotem	06ce2ac074	Format patterns, remove unused X86blend patterns llvm-svn: 139491	2011-09-12 08:41:50 +00:00
Craig Topper	5ffd0cb080	Fix disassembling of one of the register/register forms of MOVUPS/MOVUPD/MOVAPS/MOVAPD/MOVSS/MOVSD and their VEX equivalents. Fixes PR10877. llvm-svn: 139486	2011-09-11 23:19:54 +00:00
Nadav Rotem	abb5bb41d4	CR fixes per Bruno's request. Undo the changes from r139285 which added custom lowering to vselect. Add tablegen lowering for vselect. llvm-svn: 139479	2011-09-11 15:02:23 +00:00
Nadav Rotem	ccb46031e6	Implement vector-select support for avx256. Refactor the vblend implementation to have tablegen match the instruction by the node type llvm-svn: 139400	2011-09-09 20:29:17 +00:00
Bruno Cardoso Lopes	54962ac233	Add a AVX version of a simple i64 -> f64 bitcast. This could be triggered using llc with -O0, which wouldn't let it be folded and expose the lack of this pattern. llvm-svn: 139320	2011-09-08 21:52:33 +00:00
Bruno Cardoso Lopes	74a67e22b0	Add AVX versions of blend vector operations and fix some issues noticed in Nadav's r139285 and r139287 commits. 1) Rename vsel.ll to a more descriptive name 2) Change the order of BLEND operands to "Op1, Op2, Cond", this is necessary because PBLENDVB is already used in different places with this order, and it was being emitted in the wrong way for vselect 3) Add AVX patterns and tests for the same SSE41 instructions llvm-svn: 139305	2011-09-08 18:05:08 +00:00
Bruno Cardoso Lopes	84c53e3965	Fix PR10844: Add patterns to cover non foldable versions of X86vzmovl. Triggered using llc -O0. Also fix some SET0PS patterns to their AVX forms and test it on the testcase. llvm-svn: 139304	2011-09-08 18:05:02 +00:00
Nadav Rotem	b461f2190e	Add X86-SSE4 codegen support for vector-select. llvm-svn: 139285	2011-09-08 08:11:19 +00:00
Bruno Cardoso Lopes	02157d584a	Add AVX versions to match AESENC/AESDEC intrinsics. This hopefully ends the cycle of missing AVX counterparts of already present SSE* patterns llvm-svn: 139073	2011-09-03 00:47:08 +00:00
Bruno Cardoso Lopes	c72ce24240	Add AVX version of a SSE4.1 VPBLENDVB pattern llvm-svn: 139072	2011-09-03 00:47:05 +00:00
Bruno Cardoso Lopes	a25fc6f941	Add AVX versions of SSE4.1 EXTRACTPS patterns llvm-svn: 139071	2011-09-03 00:47:03 +00:00
Bruno Cardoso Lopes	45d02d5eca	Add AVX versions for SSE4.1 MOVZX* patterns llvm-svn: 139070	2011-09-03 00:47:01 +00:00
Bruno Cardoso Lopes	cadec3711c	Add one more AVX pattern for MOVZPQILo2PQI llvm-svn: 139069	2011-09-03 00:46:58 +00:00
Bruno Cardoso Lopes	48eeb79003	Move PUNPCKLQDQ splat pattern close to the instruction definition and duplicate it for AVX mode. llvm-svn: 139068	2011-09-03 00:46:56 +00:00
Bruno Cardoso Lopes	ca90af60bd	Add AVX pattern versions for PSHUFB,PSIGN{B,W,D} llvm-svn: 139067	2011-09-03 00:46:54 +00:00
Bruno Cardoso Lopes	7fae5ca308	Add AVX versions of MOVZDI2PDI patterns. Use SUBREG_TO_REG to indicate that the AVX versions (even the 128-bit ones) all clear the upper part of the destination register. llvm-svn: 139066	2011-09-03 00:46:51 +00:00
Bruno Cardoso Lopes	e749426ece	Enforce subtarget checks in a few places to be explicit when the pattern should be matched llvm-svn: 139065	2011-09-03 00:46:49 +00:00
Bruno Cardoso Lopes	323a5b334e	Tidy up code moving patterns to their appropriate place! llvm-svn: 139064	2011-09-03 00:46:47 +00:00
Bruno Cardoso Lopes	ea1931b9d0	Add AVX versions of FsMOVAPS and FsMOVAPS. Teach X86InstrInfo how to use it! llvm-svn: 139063	2011-09-03 00:46:45 +00:00
Bruno Cardoso Lopes	86c67e11c9	Fix 80-column and style llvm-svn: 139061	2011-09-03 00:46:40 +00:00
Bruno Cardoso Lopes	beb7a448e7	Tidy up some SSE/AVX convert intrinsics. Also add an AVX version of OptForSize pattern llvm-svn: 139060	2011-09-03 00:46:38 +00:00
Bruno Cardoso Lopes	8771512b75	Move more code around and duplicate AVX patterns: MOVHPS and MOVLPS llvm-svn: 138897	2011-08-31 21:15:32 +00:00
Bruno Cardoso Lopes	22aceefbf7	Move MOVAPS,MOVUPS patterns close to the instructions definition llvm-svn: 138896	2011-08-31 21:15:29 +00:00
Bruno Cardoso Lopes	4823fe07e6	Remove "_Int" forms of MOVUPSmr and MOVAPSmr llvm-svn: 138895	2011-08-31 21:15:22 +00:00
Bruno Cardoso Lopes	5bd6e92f99	- Move all MOVSS and MOVSD patterns close to their definitions - Duplicate some store patterns to their AVX forms! - Caught a bug while restricting the patterns subtarget, fix it and update a testcase to check it properly llvm-svn: 138851	2011-08-31 03:04:20 +00:00
Bruno Cardoso Lopes	a9c2c56e13	Remove unnecessary AVX checks llvm-svn: 138850	2011-08-31 03:04:14 +00:00
Evan Cheng	bbabe9ff60	Fix (movhps load) lowering / pattern to match more cases. rdar://10050549 llvm-svn: 138848	2011-08-31 02:05:24 +00:00
Bruno Cardoso Lopes	3a09888a72	Move non-instruction patterns to a more appropriate place! llvm-svn: 138744	2011-08-29 17:51:24 +00:00
Craig Topper	b20cee1e19	Fix disassembling of VCVTSD2SI llvm-svn: 138623	2011-08-26 04:49:29 +00:00
Bruno Cardoso Lopes	e6119d18de	Do the same as r138461. Mark VZEROALL as clobbering all YMM registers llvm-svn: 138592	2011-08-25 22:23:58 +00:00
Bruno Cardoso Lopes	5b3d2c9e17	Add support for AVX 256-bit version of MOVDDUP! llvm-svn: 138588	2011-08-25 21:40:37 +00:00
Craig Topper	a6085b9757	Add more missing TB encodings to VEX instructions to allow them to be disassembled. Fixes remainder of PR10678. llvm-svn: 138553	2011-08-25 08:11:01 +00:00
Craig Topper	06ed6cb856	Add TB encoding to VZEROALL, VZEROUPPER, and VCVTPS2PD to allow them to be disassembled. Fixes PR10723. llvm-svn: 138551	2011-08-25 06:57:46 +00:00
Bruno Cardoso Lopes	5d34219953	Add support for 256-bit versions of VSHUFPD and VSHUFPS. llvm-svn: 138546	2011-08-25 02:58:26 +00:00
Bruno Cardoso Lopes	dfa5cf4620	Create a section for non-instructions patterns in the beginning of the file, and move more code around! llvm-svn: 138521	2011-08-24 23:18:11 +00:00
Bruno Cardoso Lopes	719b357628	Move code around! llvm-svn: 138520	2011-08-24 23:18:09 +00:00
Bruno Cardoso Lopes	3824b766ac	Organize UNPCK* patterns, also add remaining for AVX. llvm-svn: 138519	2011-08-24 23:18:06 +00:00
Bruno Cardoso Lopes	82c8bc7efd	Move remaining MOVDDUP patterns close to MOVDDUP definition and duplicate the missing ones for AVX. llvm-svn: 138518	2011-08-24 23:18:04 +00:00
Bruno Cardoso Lopes	d315a6b6e6	Organize and tidy up MOVDDUP section. Also update comments! llvm-svn: 138517	2011-08-24 23:18:02 +00:00
Bruno Cardoso Lopes	762fb13cc9	Move MOVHLPS patterns close to MOVHLPS definition, and duplicate the pattern for 128-bit AVX mode. llvm-svn: 138516	2011-08-24 23:17:59 +00:00
Bruno Cardoso Lopes	d62766849f	Move all PSHUF* patterns close to the PSHUF* definitions. Also be explicit about which subtarget they refer to, and add AVX versions of the ones we currently don't. Remove old and now wrong comments! llvm-svn: 138515	2011-08-24 23:17:57 +00:00
Bruno Cardoso Lopes	122f7cfc92	Move all SHUFP* patterns close to the SHUFP* definitions. Also be explicit about which subtarget they refer to, and add AVX versions of the ones we currently don't. Make the mask check more strict, to be clear it won't be used to match to 256-bit versions! llvm-svn: 138514	2011-08-24 23:17:55 +00:00
Bruno Cardoso Lopes	734febce18	Mark VZEROALL as clobbering all YMM registers llvm-svn: 138461	2011-08-24 18:48:33 +00:00
Bruno Cardoso Lopes	8959b54713	Fix a nasty bug where a v4i64 was being wrongly emitted with 32-bit permutations. Also tidy up some patterns and make them close to their instruction definition! llvm-svn: 138392	2011-08-23 22:06:37 +00:00
Craig Topper	67b22aedb4	Add support for breaking 256-bit v16i16 and v32i8 VSETCC into two 128-bit ones, avoiding scalarization. Add vex form of pcmpeqq and pcmpgtq. Fixes more cases for PR10712. llvm-svn: 138321	2011-08-23 04:36:33 +00:00
Bruno Cardoso Lopes	23ff325f5b	Add 128-bit AVX codegen for PCMP* family of integer instructions llvm-svn: 138270	2011-08-22 20:31:00 +00:00
Craig Topper	f68d77215d	Add TB encoding to VEX versions of SSE fp logical operations to fix disassembler llvm-svn: 138034	2011-08-19 05:28:50 +00:00
Bruno Cardoso Lopes	0d458d4bb3	Re-encoded 128-bit AVX versions of SQRT, RSQRT, RCP have 3 operands instead of 2. They were already defined this way in their regular version, but not for the intrinsics versions (_Int), and that would work for assembly emission but not for object code, since a MachineOperand would be missing. This commit fix PR10697. Also removed the {VSQRT,VRSQRT,VRCP}r_Int forms and match the intrinsic via INSERT_SUBREG+EXTRACT_SUBREG patterns. The same couldn't be done for memory versions because sse_load_f32/sse_load_f64 operand need special handling and don't work like regular "addr" operands. There are right now 114 "_Int" and 98 "Int_*" forms! I'm slowly removing them as I step through, but hope we can get rid of these someday, they are really annoying :) llvm-svn: 138012	2011-08-18 23:59:21 +00:00
Bruno Cardoso Lopes	c174d8ac48	Cleanup vector logical ops in AVX and add use int versions for simple v2i64 llvm-svn: 137919	2011-08-18 02:11:34 +00:00
Bruno Cardoso Lopes	98531dfd08	Introduce matching patterns for vbroadcast AVX instruction. The idea is to match splats in the form (splat (scalar_to_vector (load ...))) whenever the load can be folded. All the logic and instruction emission is working but because of PR8156, there are no ways to match loads, cause they can never be folded for splats. Thus, the tests are XFAILed, but I've tested and exercised all the logic using a relaxed version for checking the foldable loads, as if the bug was already fixed. This should work out of the box once PR8156 gets fixed since MayFoldLoad will work as expected. llvm-svn: 137810	2011-08-17 02:29:19 +00:00
Bruno Cardoso Lopes	f026c60f3d	While I'm here, remove the "_alt" hacks to a series of INSERT_SUBREG and also add the AVX versions of the 128-bit patterns llvm-svn: 137685	2011-08-15 23:36:51 +00:00
Bruno Cardoso Lopes	1e817d1451	Reorder declarations of vmovmskp* and also put the necessary AVX predicate and TB encoding fields. This fixes the encoding for the attached testcase. This fixes PR10625. llvm-svn: 137684	2011-08-15 23:36:45 +00:00
Bruno Cardoso Lopes	2d100ca13c	The VPERM2F128 is a AVX instruction which permutes between two 256-bit vectors. It operates on 128-bit elements instead of regular scalar types. Recognize shuffles that are suitable for VPERM2F128 and teach the x86 legalizer how to handle them. llvm-svn: 137519	2011-08-12 21:48:26 +00:00
Bruno Cardoso Lopes	17ae896095	Move code around and add comments llvm-svn: 137518	2011-08-12 21:48:22 +00:00
Bruno Cardoso Lopes	4106caa9af	Cleanup: Remove Int_ CVTSS2SI* forms llvm-svn: 137297	2011-08-11 02:52:36 +00:00
Bruno Cardoso Lopes	565ab1542a	The following X86 pattern is incorrect: def : Pat<(X86Movss VR128:$src1, (bc_v4i32 (v2i64 (load addr:$src2)))), (MOVLPSrm VR128:$src1, addr:$src2)>; This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase is added and illustrates the bug and also modified the one that was already present. Patch by Tanya Lattner. llvm-svn: 137227	2011-08-10 17:45:17 +00:00
Bruno Cardoso Lopes	7461b930f3	Add v16i16 and v32i8 store patterns llvm-svn: 137166	2011-08-09 22:39:53 +00:00
Bruno Cardoso Lopes	028c6aa951	Use fp unpack instructions to unpack int types. Until we have AVX2, this is the best we can do for these patterns. This fixes PR10554. llvm-svn: 137161	2011-08-09 22:18:37 +00:00
Bruno Cardoso Lopes	633400ee00	Reapply a more appropriate solution than in r137114. AVX supports v4f64 = sitofp v4i32. This fixes PR10559. Also add support for v4i32 = fptosi v4f64. llvm-svn: 137128	2011-08-09 17:39:13 +00:00
Bruno Cardoso Lopes	d521431558	Add support for avx vector fextend llvm-svn: 137105	2011-08-09 03:04:29 +00:00
Bruno Cardoso Lopes	09a727298f	Add AVX versions of 128-bit sitofp and fptosi llvm-svn: 137104	2011-08-09 03:04:25 +00:00
Bruno Cardoso Lopes	1025d1eb3b	Add two patterns to match special vmovss and vmovsd cases. Also fix the patterns already there to be more strict regarding the predicate. This fixes PR10558 llvm-svn: 137100	2011-08-09 01:43:09 +00:00
Bruno Cardoso Lopes	d7eac41193	Make LowerVSETCC aware of AVX types and add patterns to match them. llvm-svn: 137090	2011-08-09 00:46:57 +00:00
Bruno Cardoso Lopes	771876cade	Add v4f64 -> v2f32 fp_round support. Also add a testcase to exercise the legalizer. This commit together with the two previous ones fixes PR10495. llvm-svn: 136654	2011-08-01 21:54:09 +00:00
Bruno Cardoso Lopes	473d982caf	Add v8i32 and v4i64 vpermil patterns llvm-svn: 136451	2011-07-29 01:31:07 +00:00
Bruno Cardoso Lopes	02bbf20b02	Cleanup PALIGNR handling and remove the old palign pattern fragment. Also make PALIGNR masks not match 256-bits, which isn't supported. It's also a step to solve PR10489 llvm-svn: 136448	2011-07-29 01:30:59 +00:00
Bruno Cardoso Lopes	e24a043703	Add patterns to generate copies for extract_subvector instead of using vextractf128. This will reduce the number of issued instruction for several avx codes. llvm-svn: 136323	2011-07-28 01:26:50 +00:00
Bruno Cardoso Lopes	73945bf79a	movd/movq write zeros in the high 128-bit part of the vector. Use them to match 256-bit scalar_to_vector+zext. llvm-svn: 136322	2011-07-28 01:26:46 +00:00
Bruno Cardoso Lopes	1f63a37172	Add a few patterns to match allzeros without having to use the fp unit. Take advantage that the 128-bit vpxor zeros the higher part and use it. This also fixes PR10491 llvm-svn: 136321	2011-07-28 01:26:43 +00:00
Bruno Cardoso Lopes	06d8be564f	Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move a convert pattern close to the instruction definition. llvm-svn: 136320	2011-07-28 01:26:39 +00:00
Kevin Enderby	9adbbfffd0	Fix llvm-mc handling of x86 instructions that take 8-bit unsigned immediates. llvm-mc gives an "invalid operand" error for instructions that take an unsigned immediate which have the high bit set such as: pblendw $0xc5, %xmm2, %xmm1 llvm-mc treats all x86 immediates as signed values and range checks them. A small number of x86 instructions use the imm8 field as a set of bits. This change only changes those instructions and where the high bit is not ignored. The others remain unchanged. llvm-svn: 136287	2011-07-27 23:01:50 +00:00
Bruno Cardoso Lopes	8830fde434	The vpermilps and vpermilpd have different behaviour regarding the usage of the shuffle bitmask. Both work in 128-bit lanes without crossing, but in the former the mask of the high part is the same used by the low part while in the latter both lanes have independent masks. Handle this properly and add support for vpermilpd. llvm-svn: 136200	2011-07-27 00:56:34 +00:00
Bruno Cardoso Lopes	e53bb853ea	Recognize unpckh* masks and match 256-bit versions. The new versions are different from the previous 128-bit because they work in lanes. Update a few comments and add testcases llvm-svn: 136157	2011-07-26 22:03:40 +00:00
Bruno Cardoso Lopes	a493ad3938	Remove now unused patterns. 0 insertions(+), 98 deletions(-) llvm-svn: 136109	2011-07-26 18:22:39 +00:00
Bruno Cardoso Lopes	b24e958ffb	Cleanup old matching for PUNPCK* variants llvm-svn: 136108	2011-07-26 18:22:27 +00:00
Bruno Cardoso Lopes	ab40a57cce	Add 256-bit isel for movsldup/movshdup llvm-svn: 136051	2011-07-26 02:39:32 +00:00
Bruno Cardoso Lopes	cde45ac9ca	Add 128-bit AVX versions of movshdup/movsldup llvm-svn: 136048	2011-07-26 02:39:23 +00:00
Bruno Cardoso Lopes	25698b90e9	Cleanup movsldup/movshdup matching. 27 insertions(+), 62 deletions(-) llvm-svn: 136047	2011-07-26 02:39:13 +00:00
Bruno Cardoso Lopes	c94d6a2d2c	Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128 This also fixes PR10452 llvm-svn: 136004	2011-07-25 23:05:32 +00:00
Bruno Cardoso Lopes	f457bc8120	Add remaining 256-bit vector bitcasts. This also fixes PR10451 llvm-svn: 136003	2011-07-25 23:05:28 +00:00

1079 Commits