llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00

Author	SHA1	Message	Date
Jakob Stoklund Olesen	85ef53eb8d	Ignore register mask operands when lowering instructions to MC. This is similar to implicit register operands. MC doesn't understand register liveness and call clobbers. llvm-svn: 148437	2012-01-18 23:52:19 +00:00
Devang Patel	ee49d825b1	Process instructions after match to select alternative encoding which may be more desirable. llvm-svn: 148431	2012-01-18 22:42:29 +00:00
Jim Grosbach	f88f0b08c4	Tidy up. MCAsmBackend naming conventions. llvm-svn: 148400	2012-01-18 18:52:16 +00:00
Jakob Stoklund Olesen	e9c53bc69b	Add a CoveredBySubRegs property to Register descriptions. When set, this bit indicates that a register is completely defined by the value of its sub-registers. Use the CoveredBySubRegs property to infer which super-registers are call-preserved given a list of callee-saved registers. For example, the ARM registers D8-D15 are callee-saved. This now automatically implies that Q4-Q7 are call-preserved. Conversely, Win64 callees save XMM6-XMM15, but the corresponding YMM6-YMM15 registers are not call-preserved because they are not fully defined by their sub-registers. llvm-svn: 148363	2012-01-18 00:16:39 +00:00
Jakob Stoklund Olesen	fbb3f38a1a	Move X86 callee saved register lists to the X86CallConv .td file. Add a trivial implementation of the getCallPreservedMask() hook. llvm-svn: 148347	2012-01-17 22:47:01 +00:00
Devang Patel	e03a4f051f	Intel syntax: Fix parser match class to check memory operand size. llvm-svn: 148338	2012-01-17 21:48:03 +00:00
Devang Patel	4585b536ee	Intel syntax: Parse "BYTE PTR [RDX + RCX]" llvm-svn: 148334	2012-01-17 21:25:10 +00:00
Devang Patel	91c178c1f7	Untabify. llvm-svn: 148322	2012-01-17 19:09:22 +00:00
Devang Patel	388aa8feb0	Intel syntax: Do not unncessarily create plus expression for memory operand displacement. llvm-svn: 148321	2012-01-17 19:08:07 +00:00
Devang Patel	b1e07175c4	Intel syntax: Robustify memory operand parsing. llvm-svn: 148312	2012-01-17 18:00:18 +00:00
Nadav Rotem	ab6d6e6d56	Fix warning. llvm-svn: 148301	2012-01-17 09:31:09 +00:00
Nadav Rotem	c460c870b2	Fix 11769. In CanXFormVExtractWithShuffleIntoLoad we assumed that EXTRACT_VECTOR_ELT can be later handled by the DAGCombiner. However, in some cases on AVX, the EXTRACT_VECTOR_ELT is legalized to EXTRACT_SUBVECTOR + EXTRACT_VECTOR_ELT, which currently is not handled by the DAGCombiner. In this patch I added a check that we only extract from the XMM part. llvm-svn: 148298	2012-01-17 09:13:19 +00:00
Craig Topper	c1c438b32f	Remove unnecessary AVX check from an assert. hasSSE2 is enough. llvm-svn: 148295	2012-01-17 08:23:44 +00:00
Craig Topper	e230e9df7d	Fix a crasher when PerformShiftCombine receives a BUILD_VECTOR of all UNDEF. Probably could use better handling in DAG combine or getNode. Fixes PR11772. llvm-svn: 148285	2012-01-17 04:44:50 +00:00
Eli Friedman	a343d87eac	Make sure the non-SSE lowering for fences correctly clobbers EFLAGS. PR11768. llvm-svn: 148240	2012-01-16 16:42:21 +00:00
Eli Friedman	a2b480b010	Get rid of unused codegen-only instruction. llvm-svn: 148239	2012-01-16 16:29:35 +00:00
Craig Topper	d1f51dc860	Give priority to AVX over SSE for 128-bit floating point unpck instructions. llvm-svn: 148233	2012-01-16 09:56:42 +00:00
Nadav Rotem	b36c029e3b	[AVX] Optimize x86 VSELECT instructions using SimplifyDemandedBits. We know that the blend instructions only use the MSB, so if the mask is sign-extended then we can convert it into a SHL instruction. This is a common pattern because the type-legalizer sign-extends the i1 type which is used by the LLVM-IR for the condition. Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL. llvm-svn: 148225	2012-01-15 19:27:55 +00:00
Benjamin Kramer	14443a8cf6	Return an ArrayRef from ShuffleVectorSDNode::getMask and push it through CodeGen. llvm-svn: 148218	2012-01-15 13:16:05 +00:00
Craig Topper	0c4ab86d2c	Fix the memop type on a couple 256-bit AVX instructions that were using f128mem instead of f256mem. llvm-svn: 148196	2012-01-14 18:29:57 +00:00
Craig Topper	ddc2e1091e	Add a bunch of AVX instructions to the folding tables. Also fixed the alignment on 256-bit AVX2 instructions. llvm-svn: 148194	2012-01-14 18:14:53 +00:00
Chad Rosier	4a705ae81a	Fix pasto from r146196. llvm-svn: 148167	2012-01-14 01:50:21 +00:00
Devang Patel	410d6214f9	Revert r148131, it was committed before it was ready. llvm-svn: 148134	2012-01-13 19:28:58 +00:00
Devang Patel	194ad5ead8	Refactor. llvm-svn: 148131	2012-01-13 19:12:18 +00:00
Craig Topper	c1e3d46e07	Convert SHUFPD with the same register for both sources to PSHUFD if it would prevent a register copy. Similar to SHUFPS, but requires the mask to be converted. llvm-svn: 148112	2012-01-13 09:21:41 +00:00
Craig Topper	e75115a861	use v8i32 as optimal mem type over v8f32 if AVX2 is enabled. Similar to SSE2 vs SSE1. llvm-svn: 148109	2012-01-13 08:32:21 +00:00
Craig Topper	e52c0484de	Make X86 instruction selection use 256-bit VPXOR for build_vector of all ones if AVX2 is enabled. This gives the ExeDepsFix pass a chance to choose FP vs int as appropriate. Also use v8i32 as the type for getZeroVector if AVX2 is enabled. This is consistent with SSE2 using prefering v4i32. llvm-svn: 148108	2012-01-13 08:12:35 +00:00
Craig Topper	71ea42cc29	Add patterns for v16i16 and v32i8 immAllZerosV to select VPXOR to match v4i64 and v8i32. llvm-svn: 148106	2012-01-13 06:59:47 +00:00
Craig Topper	0e34a8e58c	Use 8i32 constant pool entry for converting AVX2_SETALLONES. Possibly fixes PR11750. llvm-svn: 148101	2012-01-13 06:12:41 +00:00
Craig Topper	32812741b7	Fix typo in PerformAddCombine that caused any vector type to be checked for horizontal add/sub if AVX2 is enabled. This caused an assert to fail for non 128/256-bit vectors when done before type legalizing. Fixes PR11749. llvm-svn: 148096	2012-01-13 05:04:25 +00:00
Bill Wendling	c8f27fdf02	Fix off-by-one error. llvm-svn: 148077	2012-01-13 00:41:53 +00:00
Bill Wendling	ce528914e7	Fix the code that was WRONG. The registers are placed into the saved registers list in the reverse order, which is why the original loop was written to loop backwards. llvm-svn: 148064	2012-01-12 23:05:03 +00:00
Elena Demikhovsky	beb66de0f9	Fixed a bug in LowerVECTOR_SHUFFLE caused assertion failure lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&& "Op 1 of shuffle should not be undef"' failed. Added a test. llvm-svn: 148044	2012-01-12 20:33:10 +00:00
Rafael Espindola	959adf57db	Support segmented stacks on 64-bit FreeBSD. This patch uses tcb_spare field in the tcb structure to store info. Patch by Jyun-Yan You. llvm-svn: 148041	2012-01-12 20:24:30 +00:00
Rafael Espindola	dda46f4081	Support segmented stacks on win32. Uses the pvArbitrary slot of the TIB, which is reserved for applications. We only support frames with a static size. llvm-svn: 148040	2012-01-12 20:22:08 +00:00
Devang Patel	d568674834	Rename X86ATTAsmParser -> X86AsmParser We are using one parser to parse att as well as intel style syntax. llvm-svn: 148032	2012-01-12 18:03:40 +00:00
Benjamin Kramer	7678692172	After Jakob's r147938 exception handling on i386 was completely broken. Restore the (obviously wrong) behavior from before r147938 without relying on undefined behavior. Add a fat FIXME note. This should fix nightly tester failures. llvm-svn: 148030	2012-01-12 17:37:18 +00:00
Nadav Rotem	618722de09	Fix a bug in the AVX 256-bit shuffle code in cases where the splat element is on the boundary of two 128-bit vectors. The attached testcase was stuck in an endless loop. llvm-svn: 148027	2012-01-12 15:31:55 +00:00
Benjamin Kramer	ae4ad5f924	X86: Generalize the x << (y & const) optimization to also catch masks with more set bits set than 31 or 63. llvm-svn: 148024	2012-01-12 12:41:34 +00:00
Devang Patel	2096c1a697	Add predicate method check match memory operand size, if available. In att style asm syntax memory operand size is derived from suffix attached with mnemonic. In intel style asm syntax it is part of memory operand hence predicate method check is required to select appropriate instruction. llvm-svn: 148006	2012-01-12 01:51:42 +00:00
Devang Patel	ed9a353b59	Add intel style operand parser skeleton. This is a work in progress. llvm-svn: 148002	2012-01-12 01:36:43 +00:00
Chandler Carruth	751438a272	Switch all of the uses of my InsertDAGNode helper to follow the exact same pattern. We already had this pattern is a few places, but others tried to make a rough approximation of an actual DAG structure. As not everywhere went to this trouble, nothing could rely on this being done. In fact, I've checked all references to these node Ids, and the ones that are using the topo-sort properties are actually satisfied with a strict-weak-ordering. The requirement appears to be that Use >= Def. I've added a big blurb of comments to this bit of the transform to clarify why the order is so important for the next reader of the code. I'm starting with this change as it is very small, and trivially reverted if something breaks or the >= above really does need to be >. If that proves the case, we can hide the problem by reverting this patch, but the problem exists elsewhere as well, and so a more comprehensive solution will be needed. llvm-svn: 148001	2012-01-12 01:34:44 +00:00
Rafael Espindola	cd3a456eef	Support segmented stacks on mac. This uses TLS slot 90, which actually belongs to JavaScriptCore. We only support frames with static size Patch by Brian Anderson. llvm-svn: 147960	2012-01-11 19:00:37 +00:00
Rafael Espindola	f7e3528d92	Generate the segmented stack prologue for fastcc too. Patch by Brian Anderson. llvm-svn: 147958	2012-01-11 18:41:19 +00:00
Chandler Carruth	e3facd7860	Revert r147945 which disabled an addressing mode transformation. I had hoped this would revive one of the llvm-gcc selfhost build bots, but it didn't so it doesn't appear that my transform is the culprit. If anyone else is seeing failures, please let me know! llvm-svn: 147957	2012-01-11 18:36:12 +00:00
Rafael Espindola	12313ca273	Use unsigned comparison in segmented stack prologue. This is a comparison of two addresses, and GCC does the comparison unsigned. Patch by Brian Anderson. llvm-svn: 147954	2012-01-11 18:23:35 +00:00
Rafael Espindola	dd2f5fe210	Explicitly set the scale to 1 on some segstack prologue instrs. Patch by Brian Anderson. llvm-svn: 147952	2012-01-11 18:14:03 +00:00
Jan Sjödin	cb98434ebe	Add XOP Intrinsics and tests llvm-svn: 147949	2012-01-11 15:20:20 +00:00
Nadav Rotem	01bf090299	Fix a bug in the lowering of BUILD_VECTOR for AVX. SCALAR_TO_VECTOR does not zero untouched elements. Use INSERT_VECTOR_ELT instead. llvm-svn: 147948	2012-01-11 14:07:51 +00:00
Chandler Carruth	e443e5e55d	Disable the transformation I added in r147936 to see if it fixes some strange build bot failures that look like a miscompile into an infloop. I'll investigate this tomorrow, but I'd both like to know whether my patch is the culprit, and get the bots back to green. llvm-svn: 147945	2012-01-11 12:17:47 +00:00
Chandler Carruth	9729a20760	Hoist a really redundant code pattern into a helper function, and delete lots of lines of code. No functionality changed. llvm-svn: 147942	2012-01-11 11:04:36 +00:00
Chandler Carruth	f44e3c7745	Simplify the AND-rooted mask+shift checking code to match that of the SRL-rooted code. llvm-svn: 147941	2012-01-11 09:35:04 +00:00
Chandler Carruth	5a0ea8a5fd	Unify the interface of the three mask+shift transform helpers, and factor the differences that were hiding in one of them into its other caller, the SRL handling code. No change in behavior. llvm-svn: 147940	2012-01-11 09:35:02 +00:00
Chandler Carruth	7d8eac052d	Clarify and make explicit some of the requirements for transforming mask+shift pairs at the beginning of the ISD::AND case block, and then hoist the final pattern into a helper function, simplifying and reflowing it appropriately. This should have no observable behavior change, but several simplifications fell out of this such as directly computing the new mask constant, etc. llvm-svn: 147939	2012-01-11 09:35:00 +00:00
Jakob Stoklund Olesen	9baf09c11a	Fix undefined code and reenable test case. I don't think the compact encoding code is right, but at least is has defined behavior now. llvm-svn: 147938	2012-01-11 09:08:04 +00:00
Chandler Carruth	068b0ca15a	Hoist the logic to transform shift+mask combinations into sub-register extracts and scaled addressing modes into its own helper function. No functionality changed here, just hoisting and layout fixes falling out of that hoisting. llvm-svn: 147937	2012-01-11 08:48:20 +00:00
Chandler Carruth	b3371fa250	Teach the X86 instruction selection to do some heroic transforms to detect a pattern which can be implemented with a small 'shl' embedded in the addressing mode scale. This happens in real code as follows: unsigned x = my_accelerator_table[input >> 11]; Here we have some lookup table that we look into using the high bits of 'input'. Each entity in the table is 4-bytes, which means this implicitly gets turned into (once lowered out of a GEP): (unsigned)((char)my_accelerator_table + ((input >> 11) << 2)); The shift right followed by a shift left is canonicalized to a smaller shift right and masking off the low bits. That hides the shift right which x86 has an addressing mode designed to support. We now detect masks of this form, and produce the longer shift right followed by the proper addressing mode. In addition to saving a (rather large) instruction, this also reduces stalls in Intel chips on benchmarks I've measured. In order for all of this to work, one part of the DAG needs to be canonicalized still further* than it currently is. This involves removing pointless 'trunc' nodes between a zextload and a zext. Without that, we end up generating spurious masks and hiding the pattern. llvm-svn: 147936	2012-01-11 08:41:08 +00:00
Lang Hames	11ff139f8f	Fixed order of operands in comment to match code. llvm-svn: 147890	2012-01-10 22:53:20 +00:00
Joerg Sonnenberger	fccad68f65	Default stack alignment for 32bit x86 should be 4 Bytes, not 8 Bytes. Add a test that checks the stack alignment of a simple function for Darwin, Linux and NetBSD for 32bit and 64bit mode. llvm-svn: 147888	2012-01-10 22:43:53 +00:00
Chad Rosier	7bab07a5f1	Add missing VEX predicates to VMOVSDto64rr/VMOVSDto64mr. This fixes a few failing test cases on our internal AVX nightly tester. rdar://10663637 llvm-svn: 147881	2012-01-10 22:14:06 +00:00
Bill Wendling	4a4be29bb5	For i386, don't use the generic code. As the comment around 7746 says, it's better to use the x87 extended precision here than SSE. And the generic code doesn't know how to do that. It also regains the speed lost for the uint64_to_float.c testcase. <rdar://problem/10669858> llvm-svn: 147869	2012-01-10 19:41:30 +00:00
Devang Patel	9a08a580a7	Add definition for intel asm variant. Right now, this just adds additional entries in match table. The parser does not use them yet. llvm-svn: 147859	2012-01-10 17:51:54 +00:00
David Blaikie	8d47bb30e3	Remove unnecessary default cases in switches that cover all enum values. llvm-svn: 147855	2012-01-10 16:47:17 +00:00
Benjamin Kramer	1f1a76c7af	Add definitions for AMD's bobcat (aka btver1) llvm-svn: 147846	2012-01-10 11:50:02 +00:00
Craig Topper	01eba20904	Fix a crash in AVX2 when trying to broadcast a double into a 128-bit vector. There is no vbroadcastsd xmm, but we do need to support 64-bit integers broadcasted into xmm. Also factor the AVX check into the isVectorBroadcast function. This makes more sense since the AVX2 check was already inside. llvm-svn: 147844	2012-01-10 08:23:59 +00:00
Craig Topper	9beee30168	Remove hasXMM/hasXMMInt functions. Move callers to hasSSE1/hasSSE2. This is the final piece to remove the AVX hack that disabled SSE. llvm-svn: 147843	2012-01-10 06:54:16 +00:00
Craig Topper	5f6f96da91	Remove hasSSEorAVX functions and change all callers to use just hasSSE. AVX is now an SSE level and no longer disables SSE checks. llvm-svn: 147842	2012-01-10 06:37:29 +00:00
Craig Topper	c9756440ea	Instruction selection priority fixes to remove the XMM/XMMInt/orAVX predicates. Another commit will remove orAVX functions from X86SubTarget. llvm-svn: 147841	2012-01-10 06:30:56 +00:00
Devang Patel	88a31798bc	Fix asm string wrt variants. llvm-svn: 147805	2012-01-09 21:32:02 +00:00
Devang Patel	921a16318d	Split AsmParser into two components - AsmParser and AsmParserVariant AsmParser holds info specific to target parser. AsmParserVariant holds info specific to asm variants supported by the target. llvm-svn: 147787	2012-01-09 19:13:28 +00:00
Chandler Carruth	f762ca322b	Don't rely on the fact that shift values are never very large, and thus this substraction will result in small negative numbers at worst which become very large positive numbers on assignment and are thus caught by the <=4 check on the next line. The >0 check clearly intended to catch these as negative numbers. Spotted by inspection, and impossible to trigger given the shift widths that can be used. llvm-svn: 147773	2012-01-09 09:47:25 +00:00
Craig Topper	be3b3744b8	Remove AVX hack in X86Subtarget. AVX/AVX2 are now treated as an SSE level. Predicate functions have been altered to maintain previous names and behavior. llvm-svn: 147770	2012-01-09 09:02:13 +00:00
Craig Topper	915bc4edaa	Add HasAVX predicate to some of the AVX patterns. llvm-svn: 147769	2012-01-09 08:34:00 +00:00
Craig Topper	2710d2dadb	Reorder a bunch of patterns to put the AVX version first thus giving it priority over the SSE version. Another step towards trying to remove the AVX hack that disables SSE from X86Subtarget. llvm-svn: 147768	2012-01-09 08:10:38 +00:00
Craig Topper	ee2dabebe3	Clean up patterns for MOVNT*. Not sure why there were floating point types on MOVNTPS and MOVNTDQ. And v4i64 was completely missing. llvm-svn: 147767	2012-01-09 06:52:46 +00:00
Craig Topper	8ce58f4687	Mark MOVNTI as being supported in SSE2 OR AVX mode. This instruction has no AVX equivalent so we should use the SSE version. llvm-svn: 147766	2012-01-09 06:38:55 +00:00
Craig Topper	bd6f3004ba	Move SSE2 logical operations PAND/POR/PXOR/PANDN above SSE1 logical operations ANDPS/ORPS/XORPS/ANDNPS. This fixes a pattern ordering issue that meant that the SSE2 instructions could never be directly selected since the SSE1 patterns would always match first. This is largely moot with the ExeDepsFix pass, but I'm trying to audit for all such ordering issues. llvm-svn: 147765	2012-01-09 05:07:01 +00:00
Craig Topper	dc81848e5b	Change some places that were checking for AVX OR SSE1/2 to use hasXMM/hasXMMInt instead. Also fix one place that checked SSE3, but accidentally excluded AVX to use hasSSE3orAVX. This is a step towards removing the AVX hack from the X86Subtarget.h llvm-svn: 147764	2012-01-09 02:28:15 +00:00
Craig Topper	1fdcf7071d	Don't disable MMX support when AVX is enabled. Fix predicates for MMX instructions that were added along with SSE instructions to check for AVX in addition to SSE level. llvm-svn: 147762	2012-01-09 00:11:29 +00:00
Craig Topper	3fb8201fcb	Enable FISTTP* instructions when AVX is enabled. llvm-svn: 147758	2012-01-08 23:04:21 +00:00
Victor Umansky	5d24f5f51a	Reverted commit #147601 upon Evan's request. llvm-svn: 147748	2012-01-08 17:20:33 +00:00
Craig Topper	260a034757	Fix typo in the X86 backend readme. Patch from Jaeden Amero. llvm-svn: 147739	2012-01-07 20:35:21 +00:00
Benjamin Kramer	0ce9fd3032	Remove VectorExtras. This unused helper was written for a type of API that is discouraged now. llvm-svn: 147738	2012-01-07 19:42:13 +00:00
Craig Topper	17901c2d33	Remove unnecessary check of hasAVX(). It's already included in hasXMM(). llvm-svn: 147734	2012-01-07 18:48:43 +00:00
Eric Christopher	09a41e8939	Make the 'x' constraint work for AVX registers as well. Fixes rdar://10614894 llvm-svn: 147704	2012-01-07 01:02:09 +00:00
Craig Topper	4f47613598	Mark scalar FMA4 instructions as ignoring the VEX.L bit. llvm-svn: 147602	2012-01-05 08:56:10 +00:00
Victor Umansky	87d5ada510	Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX. Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX) Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov llvm-svn: 147601	2012-01-05 08:46:19 +00:00
Bill Wendling	6d5ac8b8df	Replace the uint64_t -> double convertion algorithm with one that's more efficient. This small bit of ASM code is sufficient to do what the old algorithm did: movq %rax, %xmm0 punpckldq (c0), %xmm0 // c0: (uint4){ 0x43300000U, 0x45300000U, 0U, 0U } subpd (c1), %xmm0 // c1: (double2){ 0x1.0p52, 0x1.0p52 * 0x1.0p32 } #ifdef __SSE3__ haddpd %xmm0, %xmm0 #else pshufd $0x4e, %xmm0, %xmm1 addpd %xmm1, %xmm0 #endif It's arguably faster. One caveat, the 'haddpd' instruction isn't very fast on all processors. <rdar://problem/7719814> llvm-svn: 147593	2012-01-05 02:13:20 +00:00
Benjamin Kramer	26f7d2c8cc	Silence warnings of a mysterious compiler that still defaults to C89. llvm-svn: 147553	2012-01-04 22:06:45 +00:00
Evan Cheng	caba5d2fc2	For x86, canonicalize max (x > y) ? x : y => (x >= y) ? x : y So for something like (x - y) > 0 : (x - y) ? 0 It will be (x - y) >= 0 : (x - y) ? 0 This makes is possible to test sign-bit and eliminate a comparison against zero. e.g. subl %esi, %edi testl %edi, %edi movl $0, %eax cmovgl %edi, %eax => xorl %eax, %eax subl %esi, $edi cmovsl %eax, %edi rdar://10633221 llvm-svn: 147512	2012-01-04 01:41:39 +00:00
Chad Rosier	74a3944974	Fix 80-column violations. llvm-svn: 147495	2012-01-03 23:19:12 +00:00
Nadav Rotem	79f1692fe0	Revert 147426 because it caused pr11696. llvm-svn: 147485	2012-01-03 22:19:42 +00:00
Chad Rosier	afcaa8f38a	Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather then a vxorps + vinsertf128 pair if the original vector came from a load. rdar://10594409 llvm-svn: 147481	2012-01-03 21:05:52 +00:00
Devang Patel	131611237d	Intel style asm variant does not need '%' prefix. llvm-svn: 147453	2012-01-03 18:22:10 +00:00
Craig Topper	dbc281bcbe	Miscellaneous shuffle lowering cleanup. No functional changes. Primarily converting the indexing loops to unsigned to be consistent across functions. llvm-svn: 147430	2012-01-02 09:17:37 +00:00
Craig Topper	69e0a09d8f	Make CanXFormVExtractWithShuffleIntoLoad reject loads with multiple uses. Also make it return false if there's not even a load at all. This makes the code better match the code in DAGCombiner that it tries to match. These two changes prevent some cases where vector_shuffles were making it to instruction selection and causing the older shuffle selection code to be triggered. Also needed to fix a bad pattern that this change exposed. This is the first step towards getting rid of the old shuffle selection support. No test cases yet because there's no way to tell whether a shuffle was handled in the legalize stage or at instruction selection. llvm-svn: 147428	2012-01-02 08:46:48 +00:00
Nadav Rotem	6929a8868b	Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit. llvm-svn: 147426	2012-01-02 08:05:46 +00:00
Craig Topper	f7c9bf17dd	Allow CRC32 instructions to be selected when AVX is enabled. llvm-svn: 147411	2012-01-01 19:51:58 +00:00
Craig Topper	d8ae2d9f27	Fix sfence, lfence, mfence, and clflush to be able to be selected when AVX is enabled. Fix monitor and mwait to require SSE3 or AVX, previously they worked even if SSE3 was disabled. Make prefetch instructions not set the execution domain since they don't use XMM registers. llvm-svn: 147409	2012-01-01 19:40:22 +00:00
Benjamin Kramer	210900f7c2	X86Disassembler: Fix undefined behavior found by GCC 4.6 llvm-svn: 147404	2012-01-01 17:55:36 +00:00
Craig Topper	ef59fe1ad4	Merge X86 SHUFPS and SHUFPD node types. llvm-svn: 147394	2011-12-31 23:50:21 +00:00
Craig Topper	0311c45aed	Add patterns for integer forms of SHUFPD/VSHUFPD with a memory load. llvm-svn: 147393	2011-12-31 23:24:49 +00:00
Craig Topper	c01ce759d7	Fix typo in a SHUFPD and VSHUFPD pattern that prevented SHUFPD/VSHUFPD with a load from being selected. llvm-svn: 147392	2011-12-31 23:15:11 +00:00
Craig Topper	4065311852	Make FMA4 imply AVX so that YMM registers would be available. Necessitates removing from Bulldozer CPU types since it would enable AVX code generation implicitly. Also make SSE4A imply SSE3. Without some level of SSE implied, XMM registers wouldn't be legal. llvm-svn: 147369	2011-12-30 07:16:00 +00:00
Craig Topper	b4db8689ee	Add disassembler support for VPERMIL2PD and VPERMIL2PS. llvm-svn: 147368	2011-12-30 06:23:39 +00:00
Craig Topper	089be4fefa	Add FMA4 instructions to disassembler. llvm-svn: 147367	2011-12-30 05:20:36 +00:00
Craig Topper	44a5136fac	Separate the concept of having memory access in operand 4 from the concept of having the W bit set for XOP instructons. Removes ORing W-bits in the encoder and will similarly simplify the disassembler implementation. llvm-svn: 147366	2011-12-30 04:48:54 +00:00
Craig Topper	97ba7b38eb	Combine FMA4 SS/SD patterns with the instruction definitions. llvm-svn: 147365	2011-12-30 03:33:59 +00:00
Craig Topper	bdcc86a43c	Combine FMA4 PS/PD patterns with the instruction definitions. llvm-svn: 147364	2011-12-30 03:17:15 +00:00
Craig Topper	33091db89a	Change FMA4 memory forms to use memopv* instead of alignedloadv*. No need to force alignment on these instructions. Add a couple testcases for memory forms. llvm-svn: 147361	2011-12-30 02:18:36 +00:00
Craig Topper	e066262284	Fix load size for FMA4 SS/SD instructions. They need to use f32 and f64 size, but with the special handling to be compatible with the intrinsic expecting a vector. Similar handling is already used elsewhere. llvm-svn: 147360	2011-12-30 01:49:53 +00:00
Craig Topper	97e84c23a1	Fix execution domains for PS/PD FMA3 instructions. Add SS/SD forms o FMA3 instructions. llvm-svn: 147353	2011-12-29 20:43:40 +00:00
Craig Topper	bcfd070378	Expose FMA3 instructions to the disassembler. llvm-svn: 147351	2011-12-29 20:03:14 +00:00
Craig Topper	9d664349d6	Make FMA3 imply AVX needs to be enabled. Particularly because 256-bit types aren't valid unless AVX is enabled. llvm-svn: 147349	2011-12-29 19:46:19 +00:00
Craig Topper	21029d1f81	Change XOP detection to use the correct CPUID bit instead of using the FMA4 bit. llvm-svn: 147348	2011-12-29 19:25:56 +00:00
Craig Topper	93d614dd3a	Add FeaturePOPCNT to all CPU types that lost it was removed from SSE42/SSE4A in r147339. llvm-svn: 147347	2011-12-29 18:47:31 +00:00
Craig Topper	ba73cefabb	Mark non-VEX forms of PCLMUL instructions as requiring SSE2 to be enabled along with CLMUL. That's required for the XMM registers to be valid for integer data. Doesn't change any behavior since the CLMUL instructions don't have patterns yet. llvm-svn: 147345	2011-12-29 18:08:36 +00:00
Craig Topper	63bb77ebe7	Mark non-VEX forms of AES instructions as requiring SSE2 to be enabled along with AES. Since that's required for the XMM registers to be valid for integer data. Doesn't change any behavior though since you can't use an intrinsic with an illegal type anyway. Just makes it consistent with the VEX forms. llvm-svn: 147344	2011-12-29 18:00:08 +00:00
Craig Topper	04b3b369de	Remove the separate explicit AES instruction patterns. They are equivalent to the patterns specified by the instructions. Also remove unnecessary bitconverts from the AES patterns. llvm-svn: 147342	2011-12-29 17:41:56 +00:00
Craig Topper	3ff20898e9	Make SSE42 and SSE4A not imply POPCNT. POPCNT should be able to be disabled on its own without disabling SSE4.2 or SSE4A. llvm-svn: 147339	2011-12-29 15:51:45 +00:00
Craig Topper	0ff84d82cc	Make LowerBUILD_VECTOR keep node vector types consistent when creating MOVL for v16i16 and v32i8. llvm-svn: 147337	2011-12-29 03:34:54 +00:00
Craig Topper	cb884f28d9	Remove some elses after returns. llvm-svn: 147336	2011-12-29 03:20:51 +00:00
Craig Topper	4cbe88ceba	Remove trailing spaces. Fix an assert to use && instead of \|\| before string. Add same assert on similar code path. llvm-svn: 147335	2011-12-29 03:09:33 +00:00
Eli Friedman	db54b4b68f	Fix type-checking for load transformation which is not legal on floating-point types. PR11674. llvm-svn: 147323	2011-12-28 21:24:44 +00:00
Elena Demikhovsky	9b4613ff14	Fixed a bug in LowerVECTOR_SHUFFLE and LowerBUILD_VECTOR. Matching MOVLP mask for AVX (265-bit vectors) was wrong. The failure was detected by conformance tests. llvm-svn: 147308	2011-12-28 08:14:01 +00:00
Craig Topper	9c07745da9	Add handling of x86_avx2_pmovmskb to computeMaskedBitsForTargetNode for consistency. Add comments and an assert for BMI instructions to PerformXorCombine since the enabling of the combine is conditional on it, but the function itself isn't. llvm-svn: 147287	2011-12-27 06:27:23 +00:00
Rafael Espindola	504588a7a3	Section relative fixups are a coff concept, not a x86 one. Replace the x86 specific reloc_coff_secrel32 with a generic FK_SecRel_4. llvm-svn: 147252	2011-12-24 14:47:52 +00:00
Chandler Carruth	7a5c52fadf	Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the LZCNT instructions are available. Force promotion to i32 to get a smaller encoding since the fix-ups necessary are just as complex for either promoted type We can't do standard promotion for CTLZ when lowering through BSR because it results in poor code surrounding the 'xor' at the end of this instruction. Essentially, if we promote the entire CTLZ node to i32, we end up doing the xor on a 32-bit CTLZ implementation, and then subtracting appropriately to get back to an i8 value. Instead, our custom logic just uses the knowledge of the incoming size to compute a perfect xor. I'd love to know of a way to fix this, but so far I'm drawing a blank. I suspect the legalizer could be more clever and/or it could collude with the DAG combiner, but how... ;] llvm-svn: 147251	2011-12-24 12:12:34 +00:00
Chandler Carruth	82b7a7478b	Add systematic testing for cttz as well, and fix the bug I spotted by inspection earlier. llvm-svn: 147250	2011-12-24 11:46:10 +00:00
Benjamin Kramer	facca1025b	Chandler fixed this. llvm-svn: 147247	2011-12-24 11:23:32 +00:00
Chandler Carruth	48f5be6ce0	Expand more when we have a nice 'tzcnt' instruction, to avoid generating 'bsf' instructions here. This one is actually debatable to my eyes. It's not clear that any chip implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding. Still, this restores the old behavior with 'tzcnt' enabled for now. llvm-svn: 147246	2011-12-24 11:11:38 +00:00
Chandler Carruth	9ef50ef1f7	Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a lot* to be improve on in this type of code. llvm-svn: 147244	2011-12-24 10:55:54 +00:00
Rafael Espindola	2fc741dfac	Move x86 specific bits of the COFF writer to lib/Target/X86. llvm-svn: 147231	2011-12-24 02:14:02 +00:00
Chad Rosier	32c63b4265	Fix 80-column violations. llvm-svn: 147192	2011-12-22 22:35:21 +00:00
Chad Rosier	98251404f7	Fix 80-column violations. llvm-svn: 147095	2011-12-21 20:59:09 +00:00
Chad Rosier	4e4bfcaa90	No case stmt for BUILD_VECTOR in PerformDAGCombine(), so I assume this isn't necessary. Please chime in if I'm mistaken. llvm-svn: 147065	2011-12-21 19:14:52 +00:00
Rafael Espindola	8c9b0dea02	Move the X86 specific bits of the ELF writer to the Target/X86 directory. Other targets will follow shortly. llvm-svn: 147060	2011-12-21 17:30:17 +00:00
Rafael Espindola	f9c7f9e3f3	Reduce the exposure of Triple::OSType in the ELF object writer. This will avoid including ADT/Triple.h in many places when the target specific bits are moved. llvm-svn: 147059	2011-12-21 17:00:36 +00:00
Craig Topper	496932a88a	Remove mode specific disassembler classes and just call X86GenericDisassembler constructor with appropriate argument in the creation functions. This removes a few tables that needed to be anchored. llvm-svn: 147046	2011-12-21 08:06:52 +00:00
Craig Topper	e2e670bee5	Fix typo in a couple comments llvm-svn: 147045	2011-12-21 06:30:53 +00:00
Elena Demikhovsky	b37883fe87	This is the second fix related to VZEXT_MOVL node. The failure that I see in the current version is: LLVM ERROR: Cannot select: 0x18b8f70: v4i64 = X86ISD::VZEXT_MOVL 0x18beee0 [ID=14] 0x18beee0: v4i64 = insert_subvector 0x18b8c70, 0x18b9170, 0x18b9570 [ID=13] 0x18b8c70: v4i64 = insert_subvector 0x18b9870, 0x18bf4e0, 0x18b9970 [ID=12] 0x18b9870: v4i64 = undef [ID=4] 0x18bf4e0: v2i64 = bitcast 0x18bf3e0 [ID=10] 0x18bf3e0: v4i32 = BUILD_VECTOR 0x18b9770, 0x18b9770, 0x18b9770, 0x18b9770 [ID=8] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9970: i32 = Constant<0> [ID=3] 0x18b9170: v2i64 = undef [ORD=1] [ID=1] 0x18b9570: i32 = Constant<2> [ID=5] llvm-svn: 146975	2011-12-20 13:34:28 +00:00
Chandler Carruth	7564e8371a	Begin teaching the X86 target how to efficiently codegen patterns that use the zero-undefined variants of CTTZ and CTLZ. These are just simple patterns for now, there is more to be done to make real world code using these constructs be optimized and codegen'ed properly on X86. The existing tests are spiffed up to check that we no longer generate unnecessary cmov instructions, and that we generate the very important 'xor' to transform bsr which counts the index of the most significant one bit to the number of leading (most significant) zero bits. Also they now check that when the variant with defined zero result is used, the cmov is still produced. llvm-svn: 146974	2011-12-20 11:19:37 +00:00
Chandler Carruth	1663697160	Fix up the CMake build for the new files added in r146960, they're likely to stay either way that discussion ends up resolving itself. llvm-svn: 146966	2011-12-20 08:42:11 +00:00
David Blaikie	576aba04f1	Unweaken vtables as per http://llvm.org/docs/CodingStandards.html#ll_virtual_anch llvm-svn: 146960	2011-12-20 02:50:00 +00:00
Jakob Stoklund Olesen	0aa2f7755a	Emit a getMatchingSuperRegClass() implementation for every target. Use information computed while inferring new register classes to emit accurate, table-driven implementations of getMatchingSuperRegClass(). Delete the old manual, error-prone implementations in the targets. llvm-svn: 146873	2011-12-19 16:53:34 +00:00
Benjamin Kramer	942f1bd653	Another variadics tweak. llvm-svn: 146852	2011-12-18 20:51:31 +00:00
Benjamin Kramer	e16533eded	Use the fancy new VariadicFunction template instead of a plain variadic function. Some compilers were complaining about passing StringRef to it. llvm-svn: 146850	2011-12-18 19:59:20 +00:00
Craig Topper	9696094d07	Remove an unused X86ISD node type. llvm-svn: 146833	2011-12-17 19:16:44 +00:00
Benjamin Kramer	65c1236c63	X86: Factor the bswap asm matching to be slightly less horrible to read. llvm-svn: 146831	2011-12-17 14:36:05 +00:00
Rafael Espindola	549d0683b1	Add back the MC bits of 126425. Original patch by Nathan Jeffords. I added the asm parsing and testcase. llvm-svn: 146801	2011-12-17 01:14:52 +00:00
Lang Hames	e32ef23ba8	Make sure that the lower bits on the VSELECT condition are properly set. llvm-svn: 146800	2011-12-17 01:08:46 +00:00
Craig Topper	88e2bfef0a	Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes. llvm-svn: 146726	2011-12-16 08:06:31 +00:00
Eli Friedman	f626b19bda	Make sure we correctly note the existence of an i8 immediate for vblendvps and friends, so we compute fixups correctly. PR11586. llvm-svn: 146709	2011-12-15 23:46:18 +00:00
Chad Rosier	62ebee9859	Add missing zmovl AVX patterns which were causing crashes. Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>! llvm-svn: 146689	2011-12-15 22:11:31 +00:00
Chad Rosier	e74b3b1469	Fix assert in LowerBUILD_VECTOR for v16i16 type on AVX. Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>! llvm-svn: 146684	2011-12-15 21:34:44 +00:00
Lang Hames	d86b47a279	Fix VSELECT operand order. Was previously backwards, causing bogus vector shift results - <rdar://problem/10559581>. llvm-svn: 146671	2011-12-15 18:57:27 +00:00
Chad Rosier	dcfc5e1dd0	Use SmallVector/assign(), rather than std::vector/push_back(). llvm-svn: 146627	2011-12-15 01:16:09 +00:00
Chad Rosier	b93733686c	Add support for lowering fneg when AVX is enabled. rdar://10566486 llvm-svn: 146625	2011-12-15 01:02:25 +00:00
Bill Wendling	e9bd145105	The saved registers weren't being processed in the correct order. This lead to the compact unwind claiming that one register was saved before another, which isn't all that great in general. Process them in the natural order. Reverse the list only when necessary for the algorithm. llvm-svn: 146612	2011-12-14 23:53:24 +00:00
Evan Cheng	68ba5536f3	- Add MachineInstrBundle.h and MachineInstrBundle.cpp. This includes a function to finalize MI bundles (i.e. add BUNDLE instruction and computing register def and use lists of the BUNDLE instruction) and a pass to unpack bundles. - Teach more of MachineBasic and MachineInstr methods to be bundle aware. - Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to prevent IT blocks from being broken apart. llvm-svn: 146542	2011-12-14 02:11:42 +00:00
Chandler Carruth	e0484f6b37	Initial CodeGen support for CTTZ/CTLZ where a zero input produces an undefined result. This adds new ISD nodes for the new semantics, selecting them when the LLVM intrinsic indicates that the undef behavior is desired. The new nodes expand trivially to the old nodes, so targets don't actually need to do anything to support these new nodes besides indicating that they should be expanded. I've done this for all the operand types that I could figure out for all the targets. Owners of various targets, please review and let me know if any of these are incorrect. Note that the expand behavior is conservatively correct, and exactly matches LLVM's current behavior with these operations. Ideally this patch will not change behavior in any way. For example the regtest suite finds the exact same instruction sequences coming out of the code generator. That's why there are no new tests here -- all of this is being exercised by the existing test suite. Thanks to Duncan Sands for reviewing the various bits of this patch and helping me get the wrinkles ironed out with expanding for each target. Also thanks to Chris for clarifying through all the discussions that this is indeed the approach he was looking for. That said, there are likely still rough spots. Further review much appreciated. llvm-svn: 146466	2011-12-13 01:56:10 +00:00
Daniel Dunbar	b72534060e	LLVMBuild: Introduce a common section which currently has a list of the subdirectories to traverse into. - Originally I wanted to avoid this and just autoscan, but this has one key flaw in that new subdirectories can not automatically trigger a rerun of the llvm-build tool. This is particularly a pain when switching back and forth between trees where one has added a subdirectory, as the dependencies will tend to be wrong. This will also eliminates FIXME implicitly. llvm-svn: 146436	2011-12-12 22:45:54 +00:00
Daniel Dunbar	30d6a45140	LLVMBuild: Remove trailing newline, which irked me. llvm-svn: 146409	2011-12-12 19:48:00 +00:00
Jan Sjödin	b9e2da0d9a	XOP instructions and encoding tests. llvm-svn: 146407	2011-12-12 19:37:49 +00:00
Jan Sjödin	b4602e048f	XOP encoding bits and logic. llvm-svn: 146397	2011-12-12 19:12:26 +00:00
Craig Topper	1e13feed8c	Remove some remants of the old palign pattern fragment that were still hanging around. Also remove a cast from inside getShuffleVPERM2X128Immediate and getShuffleVPERMILPImmediate since the only caller already had done the cast. llvm-svn: 146344	2011-12-11 19:12:35 +00:00
Rafael Espindola	9b9d35cc05	Handle expressions of the form _GLOBAL_OFFSET_TABLE_-symbol the same way gas does. The _GLOBAL_OFFSET_TABLE_ is still magical in that we get a R_386_GOTPC, but it doesn't change the immediate in the same way as when the expression has no right hand side symbol. llvm-svn: 146311	2011-12-10 02:28:43 +00:00
Benjamin Kramer	54f11215e8	This is now implemented. llvm-svn: 146258	2011-12-09 15:45:57 +00:00
Benjamin Kramer	06cd66b1d7	X86: Add patterns for the various rounding ops for SSE4.1 and AVX. llvm-svn: 146257	2011-12-09 15:44:03 +00:00
Benjamin Kramer	66bfc0739d	X86: Split (v)rounds[sd] into a normal and an intrinsic version. llvm-svn: 146256	2011-12-09 15:43:55 +00:00
Evan Cheng	e186276896	Remove hasSSE1orAVX(). It's the same as hasXMM(). llvm-svn: 146246	2011-12-09 06:32:46 +00:00
Evan Cheng	ad8debd736	Add 256-bit variant vmovss and vmovsd patterns. rdar://10538417 llvm-svn: 146196	2011-12-08 22:30:45 +00:00
Evan Cheng	d8a73b8918	Add various missing AVX patterns which was causing crashes. Sadly, the generated code looks pretty bad compared to SSE. rdar://10538793 llvm-svn: 146191	2011-12-08 22:05:28 +00:00
Owen Anderson	b622630e01	Don't explicitly marked libm rounding ops as legal on SSE4.1/AVX. There don't seem to be patterns for these, so I don't know why they were marked legal in the first place. Fixes failures caused by r146171. llvm-svn: 146180	2011-12-08 20:51:38 +00:00
Owen Anderson	d003a613e7	Teach SelectionDAG to match more calls to libm functions onto existing SDNodes. Mark these nodes as illegal by default, unless the target declares otherwise. llvm-svn: 146171	2011-12-08 19:32:14 +00:00
Evan Cheng	93e29adc2f	Many of the SSE patterns should not be selected when AVX is available. This led to the following code in X86Subtarget.cpp if (HasAVX) X86SSELevel = NoMMXSSE; This is so patterns that are predicated on hasSSE3, etc. would not be selected when avx is available. Instead, the AVX variant is selected. However, this breaks instructions which do not have AVX variants. The right way to fix this is for the SSE but not-AVX patterns to predicate on something like hasSSE3() && !hasAVX(). Then we can take out the hack in X86Subtarget.cpp. Patterns which do not have AVX variants do not need to change. However, we need to audit all the patterns before we make the change. This patch is workaround that fixes one specific case, the prefetch instructions. rdar://10538297 llvm-svn: 146163	2011-12-08 19:00:42 +00:00
Jan Sjödin	fb32802944	Src2 and src3 were accidentally swapped for the FMA4 rr patterns. Undo this and fix the encoding. llvm-svn: 146151	2011-12-08 14:43:19 +00:00
Craig Topper	6b3cc1405f	Fix a bunch of SSE/AVX patterns to use proper memop types. In particular, not using integer loads other than v2i64/v4i64 since the others are all promoted. llvm-svn: 146031	2011-12-07 08:30:53 +00:00
Bill Wendling	4741665fb1	Adjust the stack by one pointer size for all frameless stacks. llvm-svn: 146030	2011-12-07 07:58:55 +00:00
Bill Wendling	757cba38ba	Fix off-by-one error when encoding the stack size for a frameless stack. llvm-svn: 146029	2011-12-07 07:49:49 +00:00
Evan Cheng	1acd685d87	Add bundle aware API for querying instruction properties and switch the code generator to it. For non-bundle instructions, these behave exactly the same as the MC layer API. For properties like mayLoad / mayStore, look into the bundle and if any of the bundled instructions has the property it would return true. For properties like isPredicable, only return true if all of the bundled instructions have the property. For properties like canFoldAsLoad, isCompare, conservatively return false for bundles. llvm-svn: 146026	2011-12-07 07:15:52 +00:00
Bill Wendling	7380d4e412	Explicitly check for the different SUB instructions. llvm-svn: 145976	2011-12-06 22:14:27 +00:00
Bill Wendling	5fabb465e2	Encode the total stack if there isn't a frame. llvm-svn: 145969	2011-12-06 21:34:01 +00:00
Bill Wendling	075cd4a296	* Add a macro to remove a magic number. * Rename variables to reflect what they're actually used for. llvm-svn: 145968	2011-12-06 21:23:42 +00:00
Bill Wendling	9f248310fc	Check the correct value for small stack sizes. Also modify some comments. llvm-svn: 145954	2011-12-06 19:16:17 +00:00
Bill Wendling	5229d524ea	For a small sized stack, we encode that value directly with no "stack adjust" value. llvm-svn: 145952	2011-12-06 19:09:06 +00:00
Craig Topper	26f41cda03	Add X86ISD::HADD/HSUB to getTargetNodeName llvm-svn: 145929	2011-12-06 09:31:36 +00:00
Craig Topper	8b05e7d035	Fix a bunch of SSE/AVX patterns to use v2i64/v4i64 loads since all other integer vector loads are promoted to those. llvm-svn: 145927	2011-12-06 09:04:59 +00:00
Craig Topper	846d53deed	Merge floating point and integer UNPCK X86ISD node types. llvm-svn: 145926	2011-12-06 08:21:25 +00:00
Craig Topper	e6e44c24dd	Clean up some of the shuffle decoding code for UNPCK instructions. Add instruction commenting for AVX/AVX2 forms for integer UNPCKs. llvm-svn: 145924	2011-12-06 05:31:16 +00:00
Craig Topper	72b41227d8	Merge isSHUFPMask and isCommutedSHUFPMask into single function that can do both. Do the same for the 256-bit version. Use loops to reduce size of isVSHUFPYMask. Fix test cases that were incorrectly passing due to isCommutedSHUFPMask not checking for the vector being 128-bit. This caused some 256-bit shuffles to be incorrectly commuted. llvm-svn: 145921	2011-12-06 04:59:07 +00:00
Bill Wendling	e7c74617e5	Add a comment. llvm-svn: 145896	2011-12-06 01:57:48 +00:00
Jakob Stoklund Olesen	e53ed273d9	Use logarithmic units for basic block alignment. This was actually a bit of a mess. TLI.setPrefLoopAlignment was clearly documented as taking log2(bytes) units, but the x86 target would still set a preferred loop alignment of '16'. CodePlacementOpt passed this number on to the basic block, and AsmPrinter interpreted it as bytes. Now both MachineFunction and MachineBasicBlock use logarithmic alignments. Obviously, MachineConstantPool still measures alignments in bytes, so we can emulate the thrill of using as. llvm-svn: 145889	2011-12-06 01:26:19 +00:00
Bill Wendling	6e3adcc60a	The compact encoding of the registers are 3-bits each. Make sure we shift the value over that much. llvm-svn: 145888	2011-12-06 01:26:14 +00:00
Jim Grosbach	633ce3426c	Move target-specific logic out of generic MCAssembler. Whether a fixup needs relaxation for the associated instruction is a target-specific function, as the FIXME indicated. Create a hook for that and use it. llvm-svn: 145881	2011-12-06 00:47:03 +00:00
Craig Topper	e873d61004	Remove some leftover remnants that once tried to create 64-bit MMX PALIGNR instructions. llvm-svn: 145804	2011-12-05 07:27:14 +00:00
Craig Topper	02e1c33dca	Clean up and optimizations to the X86 shuffle lowering code. No functional change. llvm-svn: 145803	2011-12-05 06:56:46 +00:00
Sanjoy Das	fe35e107cd	Check for stack space more intelligently. libgcc sets the stack limit field in TCB to 256 bytes above the actual allocated stack limit. This means if the function's stack frame needs less than 256 bytes, we can just compare the stack pointer with the stack limit. This should result in lesser calls to __morestack. llvm-svn: 145766	2011-12-03 09:32:07 +00:00
Sanjoy Das	d1c3d82afe	Fix a bug in the x86-32 code generated for segmented stacks. Currently LLVM pads the call to __morestack with a add and sub of 8 bytes to esp. This isn't correct since __morestack expects the call to be followed directly by a ret. This commit also adjusts the relevant test-case. llvm-svn: 145765	2011-12-03 09:21:07 +00:00
Nick Lewycky	2b2f028dcc	Creating multiple JITs on X86 in multiple threads causes multiple writes (of the same value) to this variable. This code could be refactored, but it doesn't matter since the old JIT is going away. Add tsan annotations to ignore the race. llvm-svn: 145745	2011-12-03 02:45:50 +00:00
Nick Lewycky	7d0d3c2d58	Move global variables in TargetMachine into new TargetOptions class. As an API change, now you need a TargetOptions object to create a TargetMachine. Clang patch to follow. One small functionality change in PTX. PTX had commented out the machine verifier parts in their copy of printAndVerify. That now calls the version in LLVMTargetMachine. Users of PTX who need verification disabled should rely on not passing the command-line flag to enable it. llvm-svn: 145714	2011-12-02 22:16:29 +00:00
Jan Sjödin	81ee97461c	Add XOP feature flag. llvm-svn: 145682	2011-12-02 15:14:37 +00:00
Craig Topper	c67b7d7e2c	Reduce duplicate code in isHorizontalBinOp and add some asserts to protect assumptions llvm-svn: 145681	2011-12-02 08:18:41 +00:00
Craig Topper	d381116357	Add instruction selection support for horizontal add/sub of 256-bit floating point vectors. Also add the test case for 256-bit integer vectors. llvm-svn: 145680	2011-12-02 07:16:01 +00:00
Sanjoy Das	5cc5730434	Dummy commit to check commit access. llvm-svn: 145619	2011-12-01 19:15:08 +00:00
Eric Christopher	4aa8024569	For 64-bit the rest of the general regs are ok for the q constraint. Make sure we can emit both the high and low versions of those registers. Fixes rdar://10392864 llvm-svn: 145579	2011-12-01 08:12:41 +00:00
Eli Friedman	ad916965a6	Pass AVX vectors which are arguments to varargs functions on the stack. <rdar://problem/10463281>. llvm-svn: 145573	2011-12-01 04:49:21 +00:00
Jan Sjödin	2dfb343ffa	Support for encoding all FMA4 instructions and tablegen patterns for all remaining FMA4 instructions and intrinsics with tests. llvm-svn: 145525	2011-11-30 22:09:42 +00:00
Benjamin Kramer	fbdf17c667	X86: Turns out bulldozer also supports sse42 and lzcnt. While at it remove the barcelona/instanbul/shanghai subtargets, they're unsupported by GCC and look pretty broken. llvm-svn: 145494	2011-11-30 15:48:16 +00:00
Benjamin Kramer	359fde3539	X86: Add subtargets for AMD's bulldozer. llvm-svn: 145493	2011-11-30 15:27:46 +00:00
Nadav Rotem	f8e096f4ee	X86: PerformOrCombine introduced a vselect node with a wrong order of operands. This bug was introduced when a dedicated blend sdnode was replaced with the vselect node (in 139479). llvm-svn: 145488	2011-11-30 10:13:37 +00:00
Craig Topper	2a0fc05456	Add instruction selection support for AVX2 horizontal add/sub instructions. llvm-svn: 145487	2011-11-30 09:10:50 +00:00
Craig Topper	0ac9bb8aa1	Merge VPERM2F128/VPERM2I128 ISD node types. llvm-svn: 145485	2011-11-30 07:47:51 +00:00
Craig Topper	43b885cff4	Merge decoding of VPERMILPD and VPERMILPS shuffle masks. Merge X86ISD node type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128. llvm-svn: 145483	2011-11-30 06:25:25 +00:00
Evan Cheng	5c1efd630b	Add another missing pattern. llvm-gcc likes f64 but clang likes i64 so it was generating poor code for some SSE builtins. llvm-svn: 145448	2011-11-29 22:48:34 +00:00
Jakob Stoklund Olesen	5d6a4584d9	Make X86::FsFLD0SS / FsFLD0SD real pseudo-instructions. Like V_SET0, these instructions are expanded by ExpandPostRA to xorps / vxorps so they can participate in execution domain swizzling. This also makes the AVX variants redundant. llvm-svn: 145440	2011-11-29 22:27:25 +00:00
Daniel Dunbar	4e00f5f8fd	build/CMake: Finish removal of add_llvm_library_dependencies. llvm-svn: 145420	2011-11-29 19:25:30 +00:00
Michael J. Spencer	5fade79478	MC/X86/COFF: Allow quotes in names when targeting MS/Windows, as MC is the only assembler we support. This splits MS/Windows and GNU/Windows ASM infos into two seperate classes. While there is currently only one difference, full MS C++ ABI support will require many more. llvm-svn: 145409	2011-11-29 18:00:06 +00:00
Elena Demikhovsky	735cff1fa8	Fixed vsqrt.ss intrinsic usage - order of input operands was wrong. Added a test. Thanks Bruno for reviewing the patch. llvm-svn: 145403	2011-11-29 15:00:45 +00:00
Craig Topper	637e60afdd	Fix shuffle decoding for memory forms for (V)SHUFPS/D. llvm-svn: 145392	2011-11-29 07:58:09 +00:00
Craig Topper	4550fc2649	Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD. llvm-svn: 145390	2011-11-29 07:49:05 +00:00
Craig Topper	aca91b9f14	Fix VINSERTF128/VEXTRACTF128 to be marked as FP instructions. Allow execution dependency fix pass to convert them to their integer equivalents when AVX2 is enabled. llvm-svn: 145376	2011-11-29 05:37:58 +00:00
Craig Topper	a6c1d25798	Correctly mark VPERM2F128 as being an FP instruction and add execution domain fixing support to convert it to VPERM2I128 for AVX2. llvm-svn: 145370	2011-11-29 03:57:34 +00:00
Evan Cheng	1ed975b097	Add missing avx pattern. llvm-svn: 145272	2011-11-28 20:27:23 +00:00
Craig Topper	6f5a0bc4e3	Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar. llvm-svn: 145238	2011-11-28 10:14:51 +00:00
Craig Topper	bf4b640497	Make isCommutedVSHUFP more like the way isCommutedSHUFP is handled. llvm-svn: 145218	2011-11-28 01:14:24 +00:00
Craig Topper	251feef715	Merge detecting and handling for VSHUFPSY and VSHUFPDY since a lot of the code was similar for both. llvm-svn: 145199	2011-11-27 21:41:12 +00:00
Craig Topper	563854a230	Merge 128-bit and 256-bit X86ISD node types for VPERMILPS and VPERMILPD. Simplify some shuffle lowering code since V1 can never be UNDEF due to canonalizing that occurs when shuffle nodes are created. llvm-svn: 145153	2011-11-26 22:55:48 +00:00
Craig Topper	65f8dcdb7d	Collapse X86ISD node types for PUNPCKH, PUNPCKL, UNPCKLP, and UNPCKHP to not be type specific. Now we just have integer high and low and floating point high and low. Pattern matching will choose the correct instruction based on the vector type. llvm-svn: 145148	2011-11-26 20:47:44 +00:00
Bruno Cardoso Lopes	626d04cc6f	This patch contains support for encoding FMA4 instructions and tablegen patterns for scalar FMA4 operations and intrinsic. Also add tests for vfmaddsd. Patch by Jan Sjodin llvm-svn: 145133	2011-11-25 19:33:42 +00:00
Craig Topper	e761f42368	Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type disinquish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64. llvm-svn: 145126	2011-11-24 22:57:10 +00:00
Craig Topper	7cf04d32e9	Remove AVX2 specific X86ISD node types for PUNPCKH/L and instead just reuse the 128-bit versions and let the vector type distinguish. llvm-svn: 145125	2011-11-24 22:20:08 +00:00
Benjamin Kramer	d03fc374bd	X86: alias cqo to cqto. llvm-svn: 145121	2011-11-24 12:02:46 +00:00
Benjamin Kramer	825892c47f	X86: Use btq for bit tests if the immediate can't be encoded in 32 bits. Before: movabsq $4294967296, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00] testq %rax, %rdi ## encoding: [0x48,0x85,0xf8] jne LBB0_2 ## encoding: [0x75,A] After: btq $32, %rdi ## encoding: [0x48,0x0f,0xba,0xe7,0x20] jb LBB0_2 ## encoding: [0x72,A] btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off saving one register and a giant movabsq. llvm-svn: 145103	2011-11-23 13:54:17 +00:00
Elena Demikhovsky	681438f0b2	I added several lines in X86 code generator that allow to choose VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask. The patch was reviewed by Bruno. llvm-svn: 145099	2011-11-23 10:23:16 +00:00
Jakob Stoklund Olesen	2b76f5685c	Fix PR11422. This was a bug in keeping track of the available domains when merging domain values. The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr to the integer domain which is only available in AVX2. Also add an assertion to catch future attempts at emitting AVX2 instructions. llvm-svn: 145096	2011-11-23 04:03:08 +00:00
Craig Topper	40377194a6	More fixes to the X86InstComments for shuffle instructions. In particular add AVX flavors of many instructions and fix the destination operand for some of the existing AVX entries. llvm-svn: 145063	2011-11-22 14:27:57 +00:00
Craig Topper	33f60fca12	Fix shuffle decoding logic to handle UNPCKLPS/UNPCKLPD on 256-bit vectors correctly. Add support for decoding UNPCKHPS/UNPCKHPD for AVX 128-bit and 256-bit forms. llvm-svn: 145055	2011-11-22 01:57:35 +00:00
Craig Topper	4e39324173	Add methods for querying minimum SSE version along with AVX. Simplifies all the places that had to check a version of SSE and AVX. llvm-svn: 145053	2011-11-22 00:44:41 +00:00
Craig Topper	866214a486	Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled. llvm-svn: 145028	2011-11-21 08:26:50 +00:00
Craig Topper	14cedf481a	Add support for lowering 256-bit shuffles to VPUNPCKL/H for i16, i32, i64 if AVX2 is enabled. llvm-svn: 145026	2011-11-21 06:57:39 +00:00
Craig Topper	62ae335144	Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled. llvm-svn: 145022	2011-11-21 01:12:36 +00:00
Craig Topper	e878c775cf	Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine. llvm-svn: 145005	2011-11-20 00:12:05 +00:00
Craig Topper	6ed413c495	Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled. llvm-svn: 145004	2011-11-19 22:34:59 +00:00
Craig Topper	3e24dc25b2	Remove some of the special classes that worked around an old tablegen limitation of not being able to remove redundant bitconverts from patterns. llvm-svn: 145003	2011-11-19 21:01:54 +00:00
Craig Topper	c6a4cbdc04	Custom lower AVX2 variable shift intrinsics to shl/srl/sra nodes and remove the intrinsic patterns. llvm-svn: 144999	2011-11-19 17:46:46 +00:00
Craig Topper	a64e2604a2	Synthesize SSSE3/AVX 128-bit horizontal integer add/sub instructions from add/sub of appropriate shuffle vectors. llvm-svn: 144989	2011-11-19 09:02:40 +00:00
Craig Topper	117ffc9a0c	Collapse X86 PSIGNB/PSIGNW/PSIGND node types. llvm-svn: 144988	2011-11-19 07:33:10 +00:00
Craig Topper	536f9d9434	Extend VPBLENDVB and VPSIGN lowering to work for AVX2. llvm-svn: 144987	2011-11-19 07:07:26 +00:00
Craig Topper	0deee76383	Remove unused parameters from the AVX maskmov classes. llvm-svn: 144985	2011-11-19 04:49:22 +00:00

... 3 4 5 6 7 ...

8143 Commits