llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 22:42:46 +02:00

Author	SHA1	Message	Date
Chandler Carruth	75ff869d6a	Cleanup and FileCheck-ize a test. llvm-svn: 147603	2012-01-05 11:05:47 +00:00
Victor Umansky	87d5ada510	Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX. Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX) Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov llvm-svn: 147601	2012-01-05 08:46:19 +00:00
Benjamin Kramer	e5589bccdd	FileCheck hygiene. llvm-svn: 147580	2012-01-05 00:43:34 +00:00
NAKAMURA Takumi	6ebbc05c9d	test/CodeGen/X86/jump_sign.ll: Add -mcpu=pentiumpro for non-x86 hosts. It uses "cmov". llvm-svn: 147521	2012-01-04 03:52:23 +00:00
Evan Cheng	caba5d2fc2	For x86, canonicalize max (x > y) ? x : y => (x >= y) ? x : y So for something like (x - y) > 0 : (x - y) ? 0 It will be (x - y) >= 0 : (x - y) ? 0 This makes is possible to test sign-bit and eliminate a comparison against zero. e.g. subl %esi, %edi testl %edi, %edi movl $0, %eax cmovgl %edi, %eax => xorl %eax, %eax subl %esi, $edi cmovsl %eax, %edi rdar://10633221 llvm-svn: 147512	2012-01-04 01:41:39 +00:00
Nadav Rotem	79f1692fe0	Revert 147426 because it caused pr11696. llvm-svn: 147485	2012-01-03 22:19:42 +00:00
Nadav Rotem	cc49c4d74d	Fix incorrect widening of the bitcast sdnode in case the incoming operand is integer-promoted. llvm-svn: 147484	2012-01-03 22:12:28 +00:00
Chad Rosier	afcaa8f38a	Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather then a vxorps + vinsertf128 pair if the original vector came from a load. rdar://10594409 llvm-svn: 147481	2012-01-03 21:05:52 +00:00
Elena Demikhovsky	40f2c3077f	Fixed a bug in SelectionDAG.cpp. The failure seen on win32, when i64 type is illegal. It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR. The failure message is: llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT \|\| (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed. I added a special test that checks vector shuffle on win32. llvm-svn: 147445	2012-01-03 11:59:04 +00:00
Nadav Rotem	6929a8868b	Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit. llvm-svn: 147426	2012-01-02 08:05:46 +00:00
Craig Topper	f7c9bf17dd	Allow CRC32 instructions to be selected when AVX is enabled. llvm-svn: 147411	2012-01-01 19:51:58 +00:00
Craig Topper	d8ae2d9f27	Fix sfence, lfence, mfence, and clflush to be able to be selected when AVX is enabled. Fix monitor and mwait to require SSE3 or AVX, previously they worked even if SSE3 was disabled. Make prefetch instructions not set the execution domain since they don't use XMM registers. llvm-svn: 147409	2012-01-01 19:40:22 +00:00
Rafael Espindola	1cb17796db	Revert 147399. It broke CodeGen/ARM/vext.ll. llvm-svn: 147400	2012-01-01 17:36:23 +00:00
Elena Demikhovsky	9b74049783	Fixed a bug in SelectionDAG.cpp. The failure seen on win32, when i64 type is illegal. It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR. The failure message is: llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT \|\| (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed. I added a special test that checks vector shuffle on win32. llvm-svn: 147399	2012-01-01 16:22:47 +00:00
Craig Topper	0311c45aed	Add patterns for integer forms of SHUFPD/VSHUFPD with a memory load. llvm-svn: 147393	2011-12-31 23:24:49 +00:00
Craig Topper	c01ce759d7	Fix typo in a SHUFPD and VSHUFPD pattern that prevented SHUFPD/VSHUFPD with a load from being selected. llvm-svn: 147392	2011-12-31 23:15:11 +00:00
Craig Topper	33091db89a	Change FMA4 memory forms to use memopv* instead of alignedloadv*. No need to force alignment on these instructions. Add a couple testcases for memory forms. llvm-svn: 147361	2011-12-30 02:18:36 +00:00
Craig Topper	e066262284	Fix load size for FMA4 SS/SD instructions. They need to use f32 and f64 size, but with the special handling to be compatible with the intrinsic expecting a vector. Similar handling is already used elsewhere. llvm-svn: 147360	2011-12-30 01:49:53 +00:00
Eli Friedman	db54b4b68f	Fix type-checking for load transformation which is not legal on floating-point types. PR11674. llvm-svn: 147323	2011-12-28 21:24:44 +00:00
Nadav Rotem	d8c4880903	PR11662. Promotion of the mask operand needs to be done using PromoteTargetBoolean, and not padded with garbage. llvm-svn: 147309	2011-12-28 13:08:20 +00:00
Elena Demikhovsky	9b4613ff14	Fixed a bug in LowerVECTOR_SHUFFLE and LowerBUILD_VECTOR. Matching MOVLP mask for AVX (265-bit vectors) was wrong. The failure was detected by conformance tests. llvm-svn: 147308	2011-12-28 08:14:01 +00:00
Eli Friedman	064187912e	Make sure DAGCombiner doesn't introduce multiple loads from the same memory location. PR10747, part 2. llvm-svn: 147283	2011-12-26 22:49:32 +00:00
Chandler Carruth	7a5c52fadf	Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the LZCNT instructions are available. Force promotion to i32 to get a smaller encoding since the fix-ups necessary are just as complex for either promoted type We can't do standard promotion for CTLZ when lowering through BSR because it results in poor code surrounding the 'xor' at the end of this instruction. Essentially, if we promote the entire CTLZ node to i32, we end up doing the xor on a 32-bit CTLZ implementation, and then subtracting appropriately to get back to an i8 value. Instead, our custom logic just uses the knowledge of the incoming size to compute a perfect xor. I'd love to know of a way to fix this, but so far I'm drawing a blank. I suspect the legalizer could be more clever and/or it could collude with the DAG combiner, but how... ;] llvm-svn: 147251	2011-12-24 12:12:34 +00:00
Chandler Carruth	82b7a7478b	Add systematic testing for cttz as well, and fix the bug I spotted by inspection earlier. llvm-svn: 147250	2011-12-24 11:46:10 +00:00
Chandler Carruth	b52ba33d0a	Add i8 and i64 testing for ctlz on x86. Also simplify the i16 test. llvm-svn: 147249	2011-12-24 11:26:59 +00:00
Chandler Carruth	800a803717	Tidy up this rather crufty test. Put the declarations at the top to make my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like nounwind. Remove stray uses comments. Name things semantically rather than tN so that adding a new test in the middle doesn't cause pain, and so that new tests can be grouped semantically. This exposes how little systematic testing is going on here. I noticed this by finding several bugs via inspection and wondering why this test wasn't catching any of them. =[ llvm-svn: 147248	2011-12-24 11:26:57 +00:00
Chandler Carruth	48f5be6ce0	Expand more when we have a nice 'tzcnt' instruction, to avoid generating 'bsf' instructions here. This one is actually debatable to my eyes. It's not clear that any chip implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding. Still, this restores the old behavior with 'tzcnt' enabled for now. llvm-svn: 147246	2011-12-24 11:11:38 +00:00
Chandler Carruth	1846086903	Tidy up some of these tests. llvm-svn: 147245	2011-12-24 11:11:36 +00:00
Chandler Carruth	9ef50ef1f7	Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a lot* to be improve on in this type of code. llvm-svn: 147244	2011-12-24 10:55:54 +00:00
Chandler Carruth	514920d53b	Cleanup this test a bit, sorting things and grouping them more clearly. llvm-svn: 147243	2011-12-24 10:55:42 +00:00
Elena Demikhovsky	b37883fe87	This is the second fix related to VZEXT_MOVL node. The failure that I see in the current version is: LLVM ERROR: Cannot select: 0x18b8f70: v4i64 = X86ISD::VZEXT_MOVL 0x18beee0 [ID=14] 0x18beee0: v4i64 = insert_subvector 0x18b8c70, 0x18b9170, 0x18b9570 [ID=13] 0x18b8c70: v4i64 = insert_subvector 0x18b9870, 0x18bf4e0, 0x18b9970 [ID=12] 0x18b9870: v4i64 = undef [ID=4] 0x18bf4e0: v2i64 = bitcast 0x18bf3e0 [ID=10] 0x18bf3e0: v4i32 = BUILD_VECTOR 0x18b9770, 0x18b9770, 0x18b9770, 0x18b9770 [ID=8] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9970: i32 = Constant<0> [ID=3] 0x18b9170: v2i64 = undef [ORD=1] [ID=1] 0x18b9570: i32 = Constant<2> [ID=5] llvm-svn: 146975	2011-12-20 13:34:28 +00:00
Chandler Carruth	7564e8371a	Begin teaching the X86 target how to efficiently codegen patterns that use the zero-undefined variants of CTTZ and CTLZ. These are just simple patterns for now, there is more to be done to make real world code using these constructs be optimized and codegen'ed properly on X86. The existing tests are spiffed up to check that we no longer generate unnecessary cmov instructions, and that we generate the very important 'xor' to transform bsr which counts the index of the most significant one bit to the number of leading (most significant) zero bits. Also they now check that when the variant with defined zero result is used, the cmov is still produced. llvm-svn: 146974	2011-12-20 11:19:37 +00:00
Lang Hames	e32ef23ba8	Make sure that the lower bits on the VSELECT condition are properly set. llvm-svn: 146800	2011-12-17 01:08:46 +00:00
Craig Topper	88e2bfef0a	Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes. llvm-svn: 146726	2011-12-16 08:06:31 +00:00
Chad Rosier	62ebee9859	Add missing zmovl AVX patterns which were causing crashes. Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>! llvm-svn: 146689	2011-12-15 22:11:31 +00:00
Chad Rosier	e74b3b1469	Fix assert in LowerBUILD_VECTOR for v16i16 type on AVX. Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>! llvm-svn: 146684	2011-12-15 21:34:44 +00:00
Lang Hames	d5cee672a7	Set specific target cpu for testcase. llvm-svn: 146678	2011-12-15 20:22:34 +00:00
Lang Hames	0e361e816d	Added test case for r146671. llvm-svn: 146675	2011-12-15 19:56:07 +00:00
Eli Friedman	71c0914b64	Don't try to form FGETSIGN after legalization; it is possible in some cases, but the existing code can't do it correctly. PR11570. llvm-svn: 146630	2011-12-15 02:07:20 +00:00
Chad Rosier	b93733686c	Add support for lowering fneg when AVX is enabled. rdar://10566486 llvm-svn: 146625	2011-12-15 01:02:25 +00:00
Chandler Carruth	2bedf185c9	Manually upgrade the test suite to specify the flag to cttz and ctlz. I followed three heuristics for deciding whether to set 'true' or 'false': - Everything target independent got 'true' as that is the expected common output of the GCC builtins. - If the target arch only has one way of implementing this operation, set the flag in the way that exercises the most of codegen. For most architectures this is also the likely path from a GCC builtin, with 'true' being set. It will (eventually) require lowering away that difference, and then lowering to the architecture's operation. - Otherwise, set the flag differently dependending on which target operation should be tested. Let me know if anyone has any issue with this pattern or would like specific tests of another form. This should allow the x86 codegen to just iteratively improve as I teach the backend how to differentiate between the two forms, and everything else should remain exactly the same. llvm-svn: 146370	2011-12-12 11:59:10 +00:00
Evan Cheng	77f0fb0296	Update test to something more sensible. llvm-svn: 146282	2011-12-09 21:54:10 +00:00
Benjamin Kramer	06cd66b1d7	X86: Add patterns for the various rounding ops for SSE4.1 and AVX. llvm-svn: 146257	2011-12-09 15:44:03 +00:00
Evan Cheng	2c8bac6b4c	Forgot setting -march. llvm-svn: 146244	2011-12-09 06:15:00 +00:00
Evan Cheng	ad8debd736	Add 256-bit variant vmovss and vmovsd patterns. rdar://10538417 llvm-svn: 146196	2011-12-08 22:30:45 +00:00
Evan Cheng	d8a73b8918	Add various missing AVX patterns which was causing crashes. Sadly, the generated code looks pretty bad compared to SSE. rdar://10538793 llvm-svn: 146191	2011-12-08 22:05:28 +00:00
Evan Cheng	0e0e920975	Add test for r146163. llvm-svn: 146167	2011-12-08 19:21:39 +00:00
NAKAMURA Takumi	671c1da473	test/CodeGen/X86/vec_compare-2.ll: Add explicit -mtriple=i686-linux. llvm-svn: 146152	2011-12-08 15:24:09 +00:00
Nadav Rotem	341b30a457	Fix a bug in the integer-promotion of bitcast operations on vector types. We must not issue a bitcast operation for integer-promotion of vector types, because the location of the values in the vector may be different. llvm-svn: 146150	2011-12-08 13:10:01 +00:00
Eli Friedman	9e8d557cd1	Support vector bitcasts in the AsmPrinter. PR11495. llvm-svn: 146001	2011-12-07 00:50:54 +00:00
Eli Friedman	5545db0906	Fix an optimization involving EXTRACT_SUBVECTOR in DAGCombine so it behaves correctly. PR11494. llvm-svn: 145996	2011-12-07 00:11:56 +00:00
Craig Topper	8b05e7d035	Fix a bunch of SSE/AVX patterns to use v2i64/v4i64 loads since all other integer vector loads are promoted to those. llvm-svn: 145927	2011-12-06 09:04:59 +00:00
Craig Topper	72b41227d8	Merge isSHUFPMask and isCommutedSHUFPMask into single function that can do both. Do the same for the 256-bit version. Use loops to reduce size of isVSHUFPYMask. Fix test cases that were incorrectly passing due to isCommutedSHUFPMask not checking for the vector being 128-bit. This caused some 256-bit shuffles to be incorrectly commuted. llvm-svn: 145921	2011-12-06 04:59:07 +00:00
NAKAMURA Takumi	c6a187dfdd	test/CodeGen/X86/pointer-vector.ll: Add explicit -mtriple=i686-linux. llvm-svn: 145805	2011-12-05 07:54:57 +00:00
Nadav Rotem	1a91e4381d	Add support for vectors of pointers. llvm-svn: 145801	2011-12-05 06:29:09 +00:00
Sanjoy Das	fe35e107cd	Check for stack space more intelligently. libgcc sets the stack limit field in TCB to 256 bytes above the actual allocated stack limit. This means if the function's stack frame needs less than 256 bytes, we can just compare the stack pointer with the stack limit. This should result in lesser calls to __morestack. llvm-svn: 145766	2011-12-03 09:32:07 +00:00
Sanjoy Das	d1c3d82afe	Fix a bug in the x86-32 code generated for segmented stacks. Currently LLVM pads the call to __morestack with a add and sub of 8 bytes to esp. This isn't correct since __morestack expects the call to be followed directly by a ret. This commit also adjusts the relevant test-case. llvm-svn: 145765	2011-12-03 09:21:07 +00:00
Craig Topper	d381116357	Add instruction selection support for horizontal add/sub of 256-bit floating point vectors. Also add the test case for 256-bit integer vectors. llvm-svn: 145680	2011-12-02 07:16:01 +00:00
Eric Christopher	4aa8024569	For 64-bit the rest of the general regs are ok for the q constraint. Make sure we can emit both the high and low versions of those registers. Fixes rdar://10392864 llvm-svn: 145579	2011-12-01 08:12:41 +00:00
Eli Friedman	ad916965a6	Pass AVX vectors which are arguments to varargs functions on the stack. <rdar://problem/10463281>. llvm-svn: 145573	2011-12-01 04:49:21 +00:00
Jan Sjödin	2dfb343ffa	Support for encoding all FMA4 instructions and tablegen patterns for all remaining FMA4 instructions and intrinsics with tests. llvm-svn: 145525	2011-11-30 22:09:42 +00:00
Nadav Rotem	148d347bb7	Add test arch to make it pass on non x86 targets llvm-svn: 145498	2011-11-30 17:34:28 +00:00
Nadav Rotem	bef49f31be	Add a tripple to the test llvm-svn: 145489	2011-11-30 11:20:56 +00:00
Nadav Rotem	f8e096f4ee	X86: PerformOrCombine introduced a vselect node with a wrong order of operands. This bug was introduced when a dedicated blend sdnode was replaced with the vselect node (in 139479). llvm-svn: 145488	2011-11-30 10:13:37 +00:00
Evan Cheng	5c1efd630b	Add another missing pattern. llvm-gcc likes f64 but clang likes i64 so it was generating poor code for some SSE builtins. llvm-svn: 145448	2011-11-29 22:48:34 +00:00
Jakob Stoklund Olesen	5d6a4584d9	Make X86::FsFLD0SS / FsFLD0SD real pseudo-instructions. Like V_SET0, these instructions are expanded by ExpandPostRA to xorps / vxorps so they can participate in execution domain swizzling. This also makes the AVX variants redundant. llvm-svn: 145440	2011-11-29 22:27:25 +00:00
Elena Demikhovsky	735cff1fa8	Fixed vsqrt.ss intrinsic usage - order of input operands was wrong. Added a test. Thanks Bruno for reviewing the patch. llvm-svn: 145403	2011-11-29 15:00:45 +00:00
Craig Topper	637e60afdd	Fix shuffle decoding for memory forms for (V)SHUFPS/D. llvm-svn: 145392	2011-11-29 07:58:09 +00:00
Craig Topper	4550fc2649	Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD. llvm-svn: 145390	2011-11-29 07:49:05 +00:00
Craig Topper	aca91b9f14	Fix VINSERTF128/VEXTRACTF128 to be marked as FP instructions. Allow execution dependency fix pass to convert them to their integer equivalents when AVX2 is enabled. llvm-svn: 145376	2011-11-29 05:37:58 +00:00
Craig Topper	a6c1d25798	Correctly mark VPERM2F128 as being an FP instruction and add execution domain fixing support to convert it to VPERM2I128 for AVX2. llvm-svn: 145370	2011-11-29 03:57:34 +00:00
Evan Cheng	1435aa5fdc	Revert r145273 and fix in SelectionDAG::InferPtrAlignment() instead. Conservatively returns zero when the GV does not specify an alignment nor is it initialized. Previously it returns ABI alignment for type of the GV. However, if the type is a "packed" type, then the under-specified alignments is attached to the load / store instructions. In that case, the alignment of the type cannot be trusted. rdar://10464621 llvm-svn: 145300	2011-11-28 22:37:34 +00:00
Evan Cheng	567aa3dfb3	DAG combine should not increase alignment of loads / stores with alignment less than ABI alignment. These are loads / stores from / to "packed" data structures. Their alignments are intentionally under-specified. rdar://10301431 llvm-svn: 145273	2011-11-28 20:42:56 +00:00
Craig Topper	6f5a0bc4e3	Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar. llvm-svn: 145238	2011-11-28 10:14:51 +00:00
Chandler Carruth	bb4c250613	Take two on rotating the block ordering of loops. My previous attempt was centered around the premise of laying out a loop in a chain, and then rotating that chain. This is good for preserving contiguous layout, but bad for actually making sane rotations. In order to keep it safe, I had to essentially make it impossible to rotate deeply nested loops. The information needed to correctly reason about a deeply nested loop is actually available -- before we layout the loop. We know the inner loops are already fused into chains, etc. We lose information the moment we actually lay out the loop. The solution was the other alternative for this algorithm I discussed with Benjamin and some others: rather than rotating the loop after-the-fact, try to pick a profitable starting block for the loop's layout, and then use our existing layout logic. I was worried about the complexity of this "pick" step, but it turns out such complexity is needed to handle all the important cases I keep teasing out of benchmarks. This is, I'm afraid, a bit of a work-in-progress. It is still misbehaving on some likely important cases I'm investigating in Olden. It also isn't really tested. I'm going to try to craft some interesting nested-loop test cases, but it's likely to be extremely time consuming and I don't want to go there until I'm sure I'm testing the correct behavior. Sadly I can't come up with a way of getting simple, fine grained test cases for this logic. We need complex loop structures to even trigger much of it. llvm-svn: 145183	2011-11-27 13:34:33 +00:00
Chandler Carruth	e6374e6953	Rework a bit of the implementation of loop block rotation to not rely so heavily on AnalyzeBranch. That routine doesn't behave as we want given that rotation occurs mid-way through re-ordering the function. Instead merely check that there are not unanalyzable branching constructs present, and then reason about the CFG via successor lists. This actually simplifies my mental model for all of this as well. The concrete result is that we now will rotate more loop chains. I've added a test case from Olden highlighting the effect. There is still a bit more to do here though in order to regain all of the performance in Olden. llvm-svn: 145179	2011-11-27 09:22:53 +00:00
Chris Lattner	84bf52737a	remove autoupgrade support for old forms of llvm.prefetch and the old trampoline forms. Both of these were correct in LLVM 3.0, and we don't need to support LLVM 2.9 and earlier in mainline. llvm-svn: 145174	2011-11-27 07:42:04 +00:00
Chris Lattner	9d1e8420ff	Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic. llvm-svn: 145171	2011-11-27 06:54:59 +00:00
Chris Lattner	8067661775	remove some old autoupgrade logic llvm-svn: 145167	2011-11-27 06:10:54 +00:00
Chandler Carruth	0d073febb6	Introduce a loop block rotation optimization to the new block placement pass. This is designed to achieve one of the important optimizations that the old code placement pass did, but more simply. This is a somewhat rough and very conservative version of the transform. We could get a lot fancier here if there are profitable cases to do so. In particular, this only looks for a single pattern, it insists that the loop backedge being rotated away is the last backedge in the chain, and it doesn't provide any means of doing better in-loop placement due to the rotation. However, it appears that it will handle the important loops I am finding in the LLVM test suite. llvm-svn: 145158	2011-11-27 00:38:03 +00:00
Eli Friedman	448be745f6	Fix APFloat::convert so that it handles narrowing conversions correctly; it was returning incorrect values in rare cases, and incorrectly marking exact conversions as inexact in some more common cases. Fixes PR11406, and a missed optimization in test/CodeGen/X86/fp-stack-O0.ll. llvm-svn: 145141	2011-11-26 03:38:02 +00:00
Bruno Cardoso Lopes	626d04cc6f	This patch contains support for encoding FMA4 instructions and tablegen patterns for scalar FMA4 operations and intrinsic. Also add tests for vfmaddsd. Patch by Jan Sjodin llvm-svn: 145133	2011-11-25 19:33:42 +00:00
Craig Topper	e761f42368	Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type disinquish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64. llvm-svn: 145126	2011-11-24 22:57:10 +00:00
Chandler Carruth	f6e96b54f8	Fix a silly use-after-free issue. A much earlier version of this code need lots of fanciness around retaining a reference to a Chain's slot in the BlockToChain map, but that's all gone now. We can just go directly to allocating the new chain (which will update the mapping for us) and using it. Somewhat gross mechanically generated test case replicates the issue Duncan spotted when actually testing this out. llvm-svn: 145120	2011-11-24 11:23:15 +00:00
Chandler Carruth	1d3f68ffd0	When adding blocks to the list of those which no longer have any CFG conflicts, we should only be adding the first block of the chain to the list, lest we try to merge into the middle of that chain. Most of the places we were doing this we already happened to be looking at the first block, but there is no reason to assume that, and in some cases it was clearly wrong. I've added a couple of tests here. One already worked, but I like having an explicit test for it. The other is reduced from a test case Duncan reduced for me and used to crash. Now it is handled correctly. llvm-svn: 145119	2011-11-24 08:46:04 +00:00
Benjamin Kramer	825892c47f	X86: Use btq for bit tests if the immediate can't be encoded in 32 bits. Before: movabsq $4294967296, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00] testq %rax, %rdi ## encoding: [0x48,0x85,0xf8] jne LBB0_2 ## encoding: [0x75,A] After: btq $32, %rdi ## encoding: [0x48,0x0f,0xba,0xe7,0x20] jb LBB0_2 ## encoding: [0x72,A] btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off saving one register and a giant movabsq. llvm-svn: 145103	2011-11-23 13:54:17 +00:00
NAKAMURA Takumi	364c7c80ec	test/CodeGen/X86/block-placement.ll: Add explicit -mtriple=i686-linux. X86 Win32 CodeGen does not support EH yet. llvm-svn: 145101	2011-11-23 12:18:22 +00:00
Chandler Carruth	dcf04fc35a	Relax an invariant that block placement was trying to assert a bit further. This invariant just wasn't going to work in the face of unanalyzable branches; we need to be resillient to the phenomenon of chains poking into a loop and poking out of a loop. In fact, we already were, we just needed to not assert on it. This was found during a bootstrap with block placement turned on. llvm-svn: 145100	2011-11-23 10:35:36 +00:00
Elena Demikhovsky	681438f0b2	I added several lines in X86 code generator that allow to choose VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask. The patch was reviewed by Bruno. llvm-svn: 145099	2011-11-23 10:23:16 +00:00
Chandler Carruth	c175221b2d	Handle the case of a no-return invoke correctly. It actually still has successors, they just are all landing pad successors. We handle this the same way as no successors. Comments attached for the next person to wade through here and another lovely test case courtesy of Benjamin Kramer's bugpoint reduction. llvm-svn: 145098	2011-11-23 08:23:54 +00:00
Bob Wilson	8722a53e52	Enable stack protectors for all arrays, not just char arrays. rdar://5875909 Patch by Bill Wendling. llvm-svn: 145097	2011-11-23 07:13:56 +00:00
Jakob Stoklund Olesen	2b76f5685c	Fix PR11422. This was a bug in keeping track of the available domains when merging domain values. The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr to the integer domain which is only available in AVX2. Also add an assertion to catch future attempts at emitting AVX2 instructions. llvm-svn: 145096	2011-11-23 04:03:08 +00:00
Chandler Carruth	a357b0a5ee	Fix a crash in block placement due to an inner loop that happened to be reversed in the function's original ordering, and we happened to encounter it while handling an outer unnatural CFG structure. Thanks to the test case reduced from GCC's source by Benjamin Kramer. This may also fix a crasher in gzip that Duncan reduced for me, but I haven't yet gotten to testing that one. llvm-svn: 145094	2011-11-23 03:03:21 +00:00
Chandler Carruth	59f0abf50e	Fix a devilish miscompile exposed by block placement. The updateTerminator code didn't correctly handle EH terminators in one very specific case. AnalyzeBranch would find no terminator instruction, and so the fallback in updateTerminator is to assume fallthrough. This is correct, but the destination of the fallthrough was assumed to be the first successor. This is almost always true, but in certain cases the loop transformations will cause the landing pad to be the first successor! Instead of this brittle logic, actually look through the successors for a non-landing-pad accessor, and to assert if more than one is found. This will hopefully fix some (if not all) of the self host miscompiles with block placement. Thanks to Benjamin Kramer for reporting, Nick Lewycky for an initial stab at a reduction, and Duncan for endless advice on EH (which I know nothing about) as well as reviewing the actual fix. llvm-svn: 145062	2011-11-22 13:13:16 +00:00
Rafael Espindola	05f88add77	Add triple to the test. llvm-svn: 145057	2011-11-22 06:36:25 +00:00
Rafael Espindola	b45f6ea527	If a register is both an early clobber and part of a tied use, handle the use before the clobber so that we copy the value if needed. Fixes pr11415. llvm-svn: 145056	2011-11-22 06:27:18 +00:00
Craig Topper	866214a486	Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled. llvm-svn: 145028	2011-11-21 08:26:50 +00:00
Craig Topper	05db995317	Test case for r145026 llvm-svn: 145027	2011-11-21 06:58:09 +00:00
Craig Topper	62ae335144	Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled. llvm-svn: 145022	2011-11-21 01:12:36 +00:00
NAKAMURA Takumi	0738f386b4	test/CodeGen/X86/block-placement.ll: Relax expressions for Win32. llvm-svn: 145011	2011-11-20 12:49:45 +00:00
Chandler Carruth	aac8e5082a	The logic for breaking the CFG in the presence of hot successors didn't properly account for the global probability of the edge being taken. This manifested as a very large number of unconditional branches to blocks being merged against the CFG even though they weren't particularly hot within the CFG. The fix is to check whether the edge being merged is both locally hot relative to other successors for the source block, and globally hot compared to other (unmerged) predecessors of the destination block. This introduces a new crasher on GCC single-source, but it's currently behind a flag, and Ben has offered to work on the reduction. =] llvm-svn: 145010	2011-11-20 11:22:06 +00:00
Chandler Carruth	fa7964c81f	Add some comments to the latest test case I added here to document what is actually being tested. Also add some FileCheck goodness to much more carefully ensure that the result is the desired result. Before this test would only have failed through an assert failure if the underlying fix were reverted. Also, add some weight metadata and a comment explaining exactly what is going on to a trick section of the test case. Originally, we were getting very unlucky and trying to form a block chain that isn't actually profitable. I'm working on a fix to avoid forming these unprofitable chains, and that would also have masked any failure from this test case. The easy solution is to add some metadata that makes it really profitable to form the bad chain here. llvm-svn: 145006	2011-11-20 09:30:40 +00:00
Craig Topper	e878c775cf	Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine. llvm-svn: 145005	2011-11-20 00:12:05 +00:00
Craig Topper	6ed413c495	Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled. llvm-svn: 145004	2011-11-19 22:34:59 +00:00
Chandler Carruth	f24d3f8fc7	Move the handling of unanalyzable branches out of the loop-driven chain formation phase and into the initial walk of the basic blocks. We essentially pre-merge all blocks where unanalyzable fallthrough exists, as we won't be able to update the terminators effectively after any reorderings. This is quite a bit more principled as there may be CFGs where the second half of the unanalyzable pair has some analyzable predecessor that gets placed first. Then it may get placed next, implicitly breaking the unanalyzable branch even though we never even looked at the part that isn't analyzable. I've included a test case that triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize some more general ones as I dig into related issues. Also, to make this new scheme work we have to be able to handle branches into the middle of a chain, so add this check. We always fallback on the incoming ordering. Finally, this starts to really underscore a known limitation of the current implementation -- we don't consider broken predecessors when merging successors. This can caused major missed opportunities, and is something I'm planning on looking at next (modulo more bug reports). llvm-svn: 144994	2011-11-19 10:26:02 +00:00
Craig Topper	44b06b0096	Test cases for SSSE3/AVX integer horizontal add/sub. llvm-svn: 144990	2011-11-19 09:03:33 +00:00
Craig Topper	536f9d9434	Extend VPBLENDVB and VPSIGN lowering to work for AVX2. llvm-svn: 144987	2011-11-19 07:07:26 +00:00
Nadav Rotem	08f8a75c2c	Add AVX2 vpbroadcast support llvm-svn: 144967	2011-11-18 02:49:55 +00:00
Devang Patel	a0973b0c53	DISubrange supports unsigned lower/upper array bounds, so let's not fake it in the end while emitting DWARF. If a FE needs to encode signed lower/upper array bounds then we need to extend DISubrange or ad DISignedSubrange. llvm-svn: 144937	2011-11-17 23:43:15 +00:00
Eli Friedman	51adc2ea5a	Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD. Fixes PR10747/PR11393. llvm-svn: 144863	2011-11-16 23:50:22 +00:00
Evan Cheng	5bae2333cb	Another missing X86ISD::MOVLPD pattern. rdar://10450317 llvm-svn: 144839	2011-11-16 22:24:44 +00:00
Evan Cheng	5242b6aaa1	Disable expensive two-address optimizations at -O0. rdar://10453055 llvm-svn: 144806	2011-11-16 18:44:48 +00:00
Eli Friedman	c2a46f1a90	Fix testcase. llvm-svn: 144769	2011-11-16 03:03:52 +00:00
Eli Friedman	1f3d774ba4	CONCAT_VECTORS can have more than two operands. PR11389. llvm-svn: 144768	2011-11-16 02:52:39 +00:00
Nadav Rotem	63be4a26a9	AVX: Add support for vbroadcast from BUILD_VECTOR and refactor some of the vbroadcast code. llvm-svn: 144720	2011-11-15 22:50:37 +00:00
NAKAMURA Takumi	f99c0f0fcd	test/CodeGen/X86/dec-eflags-lower.ll: Relax expression for win32 x64. llvm-svn: 144714	2011-11-15 22:30:37 +00:00
Pete Cooper	8441c08e0b	Added custom lowering for load->dec->store sequence in x86 when the EFLAGS registers is used by later instructions. Only done for DEC64m right now. Fixes <rdar://problem/6172640> llvm-svn: 144705	2011-11-15 21:57:53 +00:00
Rafael Espindola	95f4e0c409	We currently use a callback to handle an IL pass deleting a BB that still has a reference to it. Unfortunately, that doesn't work for codegen passes since we don't get notified of MBB's being deleted (the original BB stays). Use that fact to our advantage and after printing a function, check if any of the IL BBs corresponds to a symbol that was not printed. This fixes pr11202. llvm-svn: 144674	2011-11-15 19:08:46 +00:00
Jakob Stoklund Olesen	606d4e999b	Revert r144611 and r144613. These tests are actually correct, clang was miscompiling ExeDepsFix::processUses. Evan fixed the miscompilation in r144628. llvm-svn: 144630	2011-11-15 07:13:03 +00:00
Chandler Carruth	fdcba17bec	Rather than trying to use the loop block sequence or the function block sequence when recovering from unanalyzable control flow constructs, always use the function sequence. I'm not sure why I ever went down the path of trying to use the loop sequence, it is fundamentally not the correct sequence to use. We're trying to preserve the incoming layout in the cases of unreasonable control flow, and that is only encoded at the function level. We already have a filter to select exactly the sub-set of blocks within the function that we're trying to form into a chain. The resulting code layout is also significantly better because of this. In several places we were ending up with completely unreasonable control flow constructs due to the ordering chosen by the loop structure for its internal storage. This change removes a completely wasteful vector of basic blocks, saving memory allocation in the common case even though it costs us CPU in the fairly rare case of unnatural loops. Finally, it fixes the latest crasher reduced out of GCC's single source. Thanks again to Benjamin Kramer for the reduction, my bugpoint skills failed at it. llvm-svn: 144627	2011-11-15 06:26:43 +00:00
Craig Topper	a584521daa	Properly qualify AVX2 specific parts of execution dependency table. Also enable converting between 256-bit PS/PD operations when AVX1 is enabled. Fixes PR11370. llvm-svn: 144622	2011-11-15 05:55:35 +00:00
Jakob Stoklund Olesen	a0ceee932a	Really fix test. llvm-svn: 144613	2011-11-15 03:17:01 +00:00
Jakob Stoklund Olesen	a4ad7080ff	Allow for depencendy-breaking instructions before cvt*. This should unbreak clang-x86_64-darwin10-RA, but I can't actually reproduce the failure. llvm-svn: 144611	2011-11-15 02:29:48 +00:00
Jakob Stoklund Olesen	2709f65821	Break false dependencies before partial register updates. Two new TargetInstrInfo hooks lets the target tell ExecutionDepsFix about instructions with partial register updates causing false unwanted dependencies. The ExecutionDepsFix pass will break the false dependencies if the updated register was written in the previoius N instructions. The small loop added to sse-domains.ll runs twice as fast with dependency-breaking instructions inserted. llvm-svn: 144602	2011-11-15 01:15:30 +00:00
Evan Cheng	2034ff3b0b	Add a missing pattern for X86ISD::MOVLPD. rdar://10436044 llvm-svn: 144566	2011-11-14 20:35:52 +00:00
Evan Cheng	f19d257488	Teach two-address pass to re-schedule two-address instructions (or the kill instructions of the two-address operands) in order to avoid inserting copies. This fixes the few regressions introduced when the two-address hack was disabled (without regressing the improvements). rdar://10422688 llvm-svn: 144559	2011-11-14 19:48:55 +00:00
Pete Cooper	c9d6834f38	Changed SSE4/AVX <2 x i64> extract and insert ops to be Custom lowered Constant idx case is still done in tablegen but other cases are then expanded Fixes <rdar://problem/10435460> llvm-svn: 144557	2011-11-14 19:38:42 +00:00
Chandler Carruth	462bb16130	Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on the sum of the edge weights not overflowing uint32, and crashed when they did. This is generally safe as BranchProbabilityInfo tries to provide this guarantee. However, the CFG can get modified during codegen in a way that grows the sum of the edge weights. This doesn't seem unreasonable (imagine just adding more blocks all with the default weight of 16), but it is hard to come up with a case that actually triggers 32-bit overflow. Fortuately, the single-source GCC build is good at this. The solution isn't very pretty, but its no worse than the previous code. We're already summing all of the edge weights on each query, we can sum them, check for an overflow, compute a scale, and sum them again. I've included a greatly reduced test case out of the GCC source that triggers it. It's a pretty lame test, as it clearly is just barely triggering the overflow. I'd like to have something that is much more definitive, but I don't understand the fundamental pattern that triggers an explosion in the edge weight sums. The buggy code is duplicated within this file. I'll colapse them into a single implementation in a subsequent commit. llvm-svn: 144526	2011-11-14 08:50:16 +00:00
Chandler Carruth	b7f21af176	Teach machine block placement to cope with unnatural loops. These don't get loop info structures associated with them, and so we need some way to make forward progress selecting and placing basic blocks. The technique used here is pretty brutal -- it just scans the list of blocks looking for the first unplaced candidate. It keeps placing blocks like this until the CFG becomes tractable. The cost is somewhat unfortunate, it requires allocating a vector of all basic block pointers eagerly. I have some ideas about how to simplify and optimize this, but I'm trying to get the logic correct first. Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly there are other bugs that GCC is tickling that I'm reducing and working on now. llvm-svn: 144516	2011-11-14 00:00:35 +00:00
Chandler Carruth	e67c92282f	Rewrite #3 of machine block placement. This is based somewhat on the second algorithm, but only loosely. It is more heavily based on the last discussion I had with Andy. It continues to walk from the inner-most loop outward, but there is a key difference. With this algorithm we ensure that as we visit each loop, the entire loop is merged into a single chain. At the end, the entire function is treated as a "loop", and merged into a single chain. This chain forms the desired sequence of blocks within the function. Switching to a single algorithm removes my biggest problem with the previous approaches -- they had different behavior depending on which system triggered the layout. Now there is exactly one algorithm and one basis for the decision making. The other key difference is how the chain is formed. This is based heavily on the idea Andy mentioned of keeping a worklist of blocks that are viable layout successors based on the CFG. Having this set allows us to consistently select the best layout successor for each block. It is expensive though. The code here remains very rough. There is a lot that needs to be done to clean up the code, and to make the runtime cost of this pass much lower. Very much WIP, but this was a giant chunk of code and I'd rather folks see it sooner than later. Everything remains behind a flag of course. I've added a couple of tests to exercise the issues that this iteration was motivated by: loop structure preservation. I've also fixed one test that was exhibiting the broken behavior of the previous version. llvm-svn: 144495	2011-11-13 11:20:44 +00:00
Jakob Stoklund Olesen	3eaaa93104	Remove the -color-ss-with-regs option. It was off by default. The new register allocators don't have the problems that made it necessary to reallocate registers during stack slot coloring. llvm-svn: 144481	2011-11-13 00:31:23 +00:00
Jakob Stoklund Olesen	4aa9c6888f	Linear scan is going away. llvm-svn: 144472	2011-11-12 22:39:34 +00:00
Jakob Stoklund Olesen	43b7a3871b	Remove obsolete test. This test was committed with a bugfix to RemoveCopyByCommutingDef, but that optimization is no longer triggered by this test. llvm-svn: 144470	2011-11-12 22:39:27 +00:00
Jakob Stoklund Olesen	6a290484cb	Remove obsolete test. This test is for a very specific LocalRewriter bug. LocalRewriter is going away. llvm-svn: 144469	2011-11-12 22:39:24 +00:00
Jakob Stoklund Olesen	005eabf28a	Remove obsolete test. I don't think this test does what is was supposed to do, and LocalRewriter is going away anyway. llvm-svn: 144463	2011-11-12 20:37:57 +00:00
Jakob Stoklund Olesen	c11d7a9b4d	Eliminate more linear scan tests. llvm-svn: 144462	2011-11-12 20:35:26 +00:00
Jakob Stoklund Olesen	0fe59856fd	Switch a couple -O0 tests to RABasic. llvm-svn: 144461	2011-11-12 20:11:04 +00:00
Jakob Stoklund Olesen	f8fed2a3a7	Delete old test of a VirtRegRewriter feature. This test doesn't expose the issue with RAGreedy. I filed PR11363 to track the missing InlineSpiller feature. llvm-svn: 144459	2011-11-12 19:53:48 +00:00
Jakob Stoklund Olesen	49118cf9a5	Remove old test that doesn't make sense. The test is checking that the output doesn't contains any 'mov ' strings. It does contain movl, though. llvm-svn: 144458	2011-11-12 19:53:45 +00:00
Craig Topper	0458cdf64a	Add more AVX2 shift lowering support. Move AVX2 variable shift to use patterns instead of custom lowering code. llvm-svn: 144457	2011-11-12 09:58:49 +00:00
Craig Topper	50df7c3842	Add lowering for AVX2 shift instructions. llvm-svn: 144380	2011-11-11 07:39:23 +00:00
NAKAMURA Takumi	ea14fd81c6	test/CodeGen/X86/lsr-loop-exit-cond.ll: Try to appease linux and freebsd bots to specify explicit -mtriple=x86_64-darwin. I guess it expects -relocation-model=pic. llvm-svn: 144290	2011-11-10 14:18:59 +00:00
Evan Cheng	4760ff0763	Use a bigger hammer to fix PR11314 by disabling the "forcing two-address instruction lower optimization" in the pre-RA scheduler. The optimization, rather the hack, was done before MI use-list was available. Now we should be able to implement it in a better way, perhaps in the two-address pass until a MI scheduler is available. Now that the scheduler has to backtrack to handle call sequences. Adding artificial scheduling constraints is just not safe. Furthermore, the hack is not taking all the other scheduling decisions into consideration so it's just as likely to pessimize code. So I view disabling this optimization goodness regardless of PR11314. llvm-svn: 144267	2011-11-10 07:43:16 +00:00
Jakob Stoklund Olesen	bc48cd34b6	Strip old implicit operands after foldMemoryOperand. The TII.foldMemoryOperand hook preserves implicit operands from the original instruction. This is not what we want when those implicit operands refer to the register being spilled. Implicit operands referring to other registers are preserved. This fixes PR11347. llvm-svn: 144247	2011-11-10 00:17:03 +00:00
Nadav Rotem	ddc6bfa543	AVX2: Add patterns for variable shift operations llvm-svn: 144212	2011-11-09 21:22:13 +00:00
Duncan Sands	2934a0eaeb	Speculatively revert commit 144124 (djg) in the hope that the 32 bit dragonegg self-host buildbot will recover (it is complaining about object files differing between different build stages). Original commit message: Add a hack to the scheduler to disable pseudo-two-address dependencies in basic blocks containing calls. This works around a problem in which these artificial dependencies can get tied up in calling seqeunce scheduling in a way that makes the graph unschedulable with the current approach of using artificial physical register dependencies for calling sequences. This fixes PR11314. llvm-svn: 144188	2011-11-09 14:20:48 +00:00
Nadav Rotem	e66a72a2c4	Add AVX2 support for vselect of v32i8 llvm-svn: 144187	2011-11-09 13:21:28 +00:00
Craig Topper	432dd8d623	Enable execution dependency fix pass for YMM registers when AVX2 is enabled. Add AVX2 logical operations to list of replaceable instructions. llvm-svn: 144179	2011-11-09 09:37:21 +00:00
Craig Topper	7ff77dc2b1	Add instruction selection for AVX2 integer comparisons. llvm-svn: 144176	2011-11-09 08:06:13 +00:00
Craig Topper	d82abb7156	Add AVX2 instruction lowering for add, sub, and mul. llvm-svn: 144174	2011-11-09 07:28:55 +00:00
Jakob Stoklund Olesen	1239fed1e2	Collapse DomainValues across loop back-edges. During the initial RPO traversal of the basic blocks, remember the ones that are incomplete because of back-edges from predecessors that haven't been visited yet. After the initial RPO, revisit all those loop headers so the incoming DomainValues on the back-edges can be properly collapsed. This will properly fix execution domains on software pipelined code, like the included test case. llvm-svn: 144151	2011-11-09 01:06:56 +00:00
Dan Gohman	b6cf7c4e94	Add a hack to the scheduler to disable pseudo-two-address dependencies in basic blocks containing calls. This works around a problem in which these artificial dependencies can get tied up in calling seqeunce scheduling in a way that makes the graph unschedulable with the current approach of using artificial physical register dependencies for calling sequences. This fixes PR11314. llvm-svn: 144124	2011-11-08 21:29:06 +00:00
Pete Cooper	a1c151814a	Adding test for machine-licm operating on invariant load instructions llvm-svn: 144104	2011-11-08 19:06:53 +00:00
NAKAMURA Takumi	e7c7964113	test/CodeGen/X86/vec_shuffle-39.ll: Add explicit -mtriple=x86_64-linux. Passing packed value is not compatible on Win32 x64. llvm-svn: 144068	2011-11-08 03:46:39 +00:00
NAKAMURA Takumi	7094a0d830	test/CodeGen/X86/vec_shuffle-38.ll: Relax expression for Win32 x64. llvm-svn: 144067	2011-11-08 03:46:32 +00:00
NAKAMURA Takumi	8bc13fe0b2	test/CodeGen/X86/vec_shuffle.ll: Add explicit -mtriple=i686-linux. We may see some suboptimal frame (%ebp) emission on certain hosts. Possible [PR11031] llvm-svn: 144066	2011-11-08 03:46:25 +00:00
Eli Friedman	741d364aa9	Add a bunch of calls to RemoveDeadNode in LegalizeDAG, so legalization doesn't get confused by CSE later on. Fixes PR11318. Re-commit of r144034, with an extra fix so that RemoveDeadNode doesn't blow up. llvm-svn: 144055	2011-11-08 01:25:24 +00:00
Evan Cheng	4a63100fe3	Add x86 isel logic and patterns to match movlps from clang generated IR for _mm_loadl_pi(). rdar://10134392, rdar://10050222 llvm-svn: 144052	2011-11-08 00:31:58 +00:00
Bill Wendling	788df1dca1	Convert to the new EH model. llvm-svn: 144049	2011-11-08 00:17:28 +00:00
Bill Wendling	16499170c2	Convert tests to the new EH model. llvm-svn: 144048	2011-11-08 00:09:27 +00:00
Pete Cooper	2f5c35ae89	Added missing newline llvm-svn: 144046	2011-11-08 00:03:24 +00:00
Eli Friedman	8d138bf571	Revert r144034 while I try to track down a crash. llvm-svn: 144044	2011-11-07 23:53:20 +00:00
Jakob Stoklund Olesen	1900a5f521	Fix test for Windows as well. llvm-svn: 144038	2011-11-07 23:10:43 +00:00
Jakob Stoklund Olesen	9380d5daff	Kill and collapse outstanding DomainValues. DomainValues that are only used by "don't care" instructions are now collapsed to the first possible execution domain after all basic blocks have been processed. This typically means the PS domain on x86. For example, the vsel_i64 and vsel_double functions in sse2-blend.ll are completely collapsed to the PS domain instead of containing a mix of execution domains created by isel. llvm-svn: 144037	2011-11-07 23:08:21 +00:00
Pete Cooper	1d5d364e06	InstCombine now optimizes vector udiv by power of 2 to shifts Fixes r8429 llvm-svn: 144036	2011-11-07 23:04:49 +00:00
Eli Friedman	c1bb1b2b09	Add a bunch of calls to RemoveDeadNode in LegalizeDAG, so legalization doesn't get confused by CSE later on. Fixes PR11318. llvm-svn: 144034	2011-11-07 22:51:10 +00:00
Jakob Stoklund Olesen	d33a581d93	Fix test for Linux. llvm-svn: 144003	2011-11-07 20:47:23 +00:00
Jakob Stoklund Olesen	b53be3a67d	Expand V_SET0 to xorps by default. The xorps instruction is smaller than pxor, so prefer that encoding. The ExecutionDepsFix pass will switch the encoding to pxor and xorpd when appropriate. llvm-svn: 143996	2011-11-07 19:15:58 +00:00
Craig Topper	7eab73f510	Add AVX2 variable shift instructions and intrinsics. llvm-svn: 143915	2011-11-07 08:26:24 +00:00
Craig Topper	b1ef950217	Add AVX2 VPMOVMASK instructions and intrinsics. llvm-svn: 143904	2011-11-07 03:20:35 +00:00
Craig Topper	d422190c0f	Add AVX2 VEXTRACTI128 and VINSERTI128 instructions. Fix VPERM2I128 to be qualified with HasAVX2 instead of HasAVX. Mark VINSERTF128 and VEXTRACTF128 as never having side effects. llvm-svn: 143902	2011-11-07 02:00:04 +00:00
Craig Topper	01b852b95a	More AVX2 instructions and their intrinsics. llvm-svn: 143895	2011-11-06 23:04:08 +00:00
Craig Topper	31b1d79474	Add more AVX2 instructions and intrinsics. llvm-svn: 143861	2011-11-06 06:12:20 +00:00
Benjamin Kramer	4c8932e3b8	Add an option to pad an uleb128 to MCObjectWriter and remove the uleb128 encoding from the DWARF asm printer. As a side effect we now print dwarf ulebs with .ascii directives. llvm-svn: 143809	2011-11-05 11:52:44 +00:00
Eli Friedman	1478b657c8	Enhanced vzeroupper insertion pass that avoids inserting vzeroupper where it is unnecessary through local analysis. Patch from Bruno Cardoso Lopes, with some additional changes. I'm going to wait for any review comments and perform some additional testing before turning this on by default. llvm-svn: 143750	2011-11-04 23:46:11 +00:00
Craig Topper	6ae8fe6fbe	Add intrinsics for X86 vcvtps2ph and vcvtph2ps instructions llvm-svn: 143682	2011-11-04 06:59:21 +00:00
Dan Gohman	a5f382da8b	Reapply r143206, with fixes. Disallow physical register lifetimes across calls, and only check for nested dependences on the special call-sequence-resource register. llvm-svn: 143660	2011-11-03 21:49:52 +00:00
Pete Cooper	ad3d5b2eee	Reverted r143600 - selector reference change llvm-svn: 143646	2011-11-03 20:47:50 +00:00
Craig Topper	124b2fd08c	Add new X86 AVX2 VBROADCAST instructions. llvm-svn: 143612	2011-11-03 07:35:53 +00:00
Pete Cooper	c8a657a2b2	Treat objc selector reference globals as invariant so that MachineLICM can hoist them out of loops. Fixes <rdar://problem/6027699> llvm-svn: 143600	2011-11-03 00:56:36 +00:00
Nick Lewycky	691d7f80c2	Don't emit a directory entry for the value in DW_AT_comp_dir, that is always implied by directory index zero. llvm-svn: 143570	2011-11-02 20:55:33 +00:00
Craig Topper	a2a55bd0b4	More AVX2 instructions and intrinsics. llvm-svn: 143536	2011-11-02 06:54:17 +00:00
Craig Topper	c5482eb697	Add a bunch more X86 AVX2 instructions and their corresponding intrinsics. llvm-svn: 143529	2011-11-02 04:42:13 +00:00
Eli Friedman	c60a0ad611	Teach the x86 backend a couple tricks for dealing with v16i8 sra by a constant splat value. Fixes PR11289. llvm-svn: 143498	2011-11-01 21:18:39 +00:00
Craig Topper	361c873b52	Fix operand type for x86 pmadd_ub_sw intrinsic. llvm-svn: 143455	2011-11-01 07:25:22 +00:00
Craig Topper	dbf10927d7	Fix operand type for int_x86_ssse3_phadd_sw_128 intrinsic llvm-svn: 143336	2011-10-31 07:16:37 +00:00
Craig Topper	c0f93132bd	Test case for X86 FS/GS Base intrinsics llvm-svn: 143332	2011-10-31 02:15:47 +00:00
Craig Topper	6eaf58df7c	Begin adding AVX2 instructions. No selection support yet other than intrinsics. llvm-svn: 143331	2011-10-31 02:15:10 +00:00
Nick Lewycky	7308946be2	Switch new .file directive emission off by default, change llc's flag for it to -enable-dwarf-directory. llvm-svn: 143326	2011-10-31 01:06:02 +00:00
Benjamin Kramer	c0001c42c6	X86: Emit logical shift by constant splat of <16 x i8> as a <8 x i16> shift and zero out the bits where zeros should've been shifted in. llvm-svn: 143315	2011-10-30 17:31:21 +00:00
Craig Topper	e77289b243	Fix return type for X86 mpsadbw instrinsic. The instruction takes in a vector of 8-bit integers, but produces a vector of 16-bit integers. llvm-svn: 143313	2011-10-30 17:22:45 +00:00
Nadav Rotem	8282fc9e3b	Fix pr11266. On x86: (shl V, 1) -> add V,V Hardware support for vector-shift is sparse and in many cases we scalarize the result. Additionally, on sandybridge padd is faster than shl. llvm-svn: 143311	2011-10-30 13:24:22 +00:00
Nadav Rotem	68400d352b	Stabilize the test by specifying an exact cpu target llvm-svn: 143307	2011-10-30 08:07:50 +00:00
Nadav Rotem	6c79131e39	Add a new DAGCombine optimization for BUILD_VECTOR. If all of the inputs are zero/any_extended, create a new simple BV which can be further optimized by other BV optimizations. llvm-svn: 143297	2011-10-29 21:23:04 +00:00
Benjamin Kramer	24c4266ada	Force SSE for this test. llvm-svn: 143291	2011-10-29 19:43:44 +00:00
Dan Gohman	826cec9a4b	Revert r143206, as there are still some failing tests. llvm-svn: 143262	2011-10-29 00:41:52 +00:00
Dan Gohman	dedcc22bcd	Reapply r143177 and r143179 (reverting r143188), with scheduler fixes: Use a separate register, instead of SP, as the calling-convention resource, to avoid spurious conflicts with actual uses of SP. Also, fix unscheduling of calling sequences, which can be triggered by pseudo-two-address dependencies. llvm-svn: 143206	2011-10-28 17:55:38 +00:00
NAKAMURA Takumi	bcfac720a7	Dwarf: [PR11022] Fix emitting DW_AT_const_value(>i64), to be host-endian-neutral. Don't assume APInt::getRawData() would hold target-aware endianness nor host-compliant endianness. rawdata[0] holds most lower i64, even on big endian host. FIXME: Add a testcase for big endian target. FIXME: Ditto on CompileUnit::addConstantFPValue() ? llvm-svn: 143194	2011-10-28 14:12:22 +00:00
NAKAMURA Takumi	b5df9f3cc1	test/CodeGen/X86/2010-08-10-DbgConstant.ll: Add explicit -mtriple=i686-linux. It must be for elf! llvm-svn: 143189	2011-10-28 10:50:52 +00:00
Duncan Sands	a6507c4bcb	Speculatively disable Dan's commits 143177 and 143179 to see if it fixes the dragonegg self-host (it looks like gcc is miscompiled). Original commit messages: Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW on every node as it legalizes them. This makes it easier to use hasOneUse() heuristics, since unneeded nodes can be removed from the DAG earlier. Make LegalizeOps visit the DAG in an operands-last order. It previously used operands-first, because LegalizeTypes has to go operands-first, and LegalizeTypes used to be part of LegalizeOps, but they're now split. The operands-last order is more natural for several legalization tasks. For example, it allows lowering code for nodes with floating-point or vector constants to see those constants directly instead of seeing the lowered form (often constant-pool loads). This makes some things somewhat more complicated today, though it ought to allow things to be simpler in the future. It also fixes some bugs exposed by Legalizing using RAUW aggressively. Remove the part of LegalizeOps that attempted to patch up invalid chain operands on libcalls generated by LegalizeTypes, since it doesn't work with the new LegalizeOps traversal order. Instead, define what LegalizeTypes is doing to be correct, and transfer the responsibility of keeping calls from having overlapping calling sequences into the scheduler. Teach the scheduler to model callseq_begin/end pairs as having a physical register definition/use to prevent calls from having overlapping calling sequences. This is also somewhat complicated, though there are ways it might be simplified in the future. This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others. Please direct high-level questions about this patch to management. Delete #if 0 code accidentally left in. llvm-svn: 143188	2011-10-28 09:55:57 +00:00

... 2 3 4 5 6 ...

3241 Commits