llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00

Author	SHA1	Message	Date
Craig Topper	8b05e7d035	Fix a bunch of SSE/AVX patterns to use v2i64/v4i64 loads since all other integer vector loads are promoted to those. llvm-svn: 145927	2011-12-06 09:04:59 +00:00
Craig Topper	72b41227d8	Merge isSHUFPMask and isCommutedSHUFPMask into single function that can do both. Do the same for the 256-bit version. Use loops to reduce size of isVSHUFPYMask. Fix test cases that were incorrectly passing due to isCommutedSHUFPMask not checking for the vector being 128-bit. This caused some 256-bit shuffles to be incorrectly commuted. llvm-svn: 145921	2011-12-06 04:59:07 +00:00
NAKAMURA Takumi	c6a187dfdd	test/CodeGen/X86/pointer-vector.ll: Add explicit -mtriple=i686-linux. llvm-svn: 145805	2011-12-05 07:54:57 +00:00
Nadav Rotem	1a91e4381d	Add support for vectors of pointers. llvm-svn: 145801	2011-12-05 06:29:09 +00:00
Sanjoy Das	fe35e107cd	Check for stack space more intelligently. libgcc sets the stack limit field in TCB to 256 bytes above the actual allocated stack limit. This means if the function's stack frame needs less than 256 bytes, we can just compare the stack pointer with the stack limit. This should result in lesser calls to __morestack. llvm-svn: 145766	2011-12-03 09:32:07 +00:00
Sanjoy Das	d1c3d82afe	Fix a bug in the x86-32 code generated for segmented stacks. Currently LLVM pads the call to __morestack with a add and sub of 8 bytes to esp. This isn't correct since __morestack expects the call to be followed directly by a ret. This commit also adjusts the relevant test-case. llvm-svn: 145765	2011-12-03 09:21:07 +00:00
Craig Topper	d381116357	Add instruction selection support for horizontal add/sub of 256-bit floating point vectors. Also add the test case for 256-bit integer vectors. llvm-svn: 145680	2011-12-02 07:16:01 +00:00
Eric Christopher	4aa8024569	For 64-bit the rest of the general regs are ok for the q constraint. Make sure we can emit both the high and low versions of those registers. Fixes rdar://10392864 llvm-svn: 145579	2011-12-01 08:12:41 +00:00
Eli Friedman	ad916965a6	Pass AVX vectors which are arguments to varargs functions on the stack. <rdar://problem/10463281>. llvm-svn: 145573	2011-12-01 04:49:21 +00:00
Jan Sjödin	2dfb343ffa	Support for encoding all FMA4 instructions and tablegen patterns for all remaining FMA4 instructions and intrinsics with tests. llvm-svn: 145525	2011-11-30 22:09:42 +00:00
Nadav Rotem	148d347bb7	Add test arch to make it pass on non x86 targets llvm-svn: 145498	2011-11-30 17:34:28 +00:00
Nadav Rotem	bef49f31be	Add a tripple to the test llvm-svn: 145489	2011-11-30 11:20:56 +00:00
Nadav Rotem	f8e096f4ee	X86: PerformOrCombine introduced a vselect node with a wrong order of operands. This bug was introduced when a dedicated blend sdnode was replaced with the vselect node (in 139479). llvm-svn: 145488	2011-11-30 10:13:37 +00:00
Evan Cheng	5c1efd630b	Add another missing pattern. llvm-gcc likes f64 but clang likes i64 so it was generating poor code for some SSE builtins. llvm-svn: 145448	2011-11-29 22:48:34 +00:00
Jakob Stoklund Olesen	5d6a4584d9	Make X86::FsFLD0SS / FsFLD0SD real pseudo-instructions. Like V_SET0, these instructions are expanded by ExpandPostRA to xorps / vxorps so they can participate in execution domain swizzling. This also makes the AVX variants redundant. llvm-svn: 145440	2011-11-29 22:27:25 +00:00
Elena Demikhovsky	735cff1fa8	Fixed vsqrt.ss intrinsic usage - order of input operands was wrong. Added a test. Thanks Bruno for reviewing the patch. llvm-svn: 145403	2011-11-29 15:00:45 +00:00
Craig Topper	637e60afdd	Fix shuffle decoding for memory forms for (V)SHUFPS/D. llvm-svn: 145392	2011-11-29 07:58:09 +00:00
Craig Topper	4550fc2649	Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD. llvm-svn: 145390	2011-11-29 07:49:05 +00:00
Craig Topper	aca91b9f14	Fix VINSERTF128/VEXTRACTF128 to be marked as FP instructions. Allow execution dependency fix pass to convert them to their integer equivalents when AVX2 is enabled. llvm-svn: 145376	2011-11-29 05:37:58 +00:00
Craig Topper	a6c1d25798	Correctly mark VPERM2F128 as being an FP instruction and add execution domain fixing support to convert it to VPERM2I128 for AVX2. llvm-svn: 145370	2011-11-29 03:57:34 +00:00
Evan Cheng	1435aa5fdc	Revert r145273 and fix in SelectionDAG::InferPtrAlignment() instead. Conservatively returns zero when the GV does not specify an alignment nor is it initialized. Previously it returns ABI alignment for type of the GV. However, if the type is a "packed" type, then the under-specified alignments is attached to the load / store instructions. In that case, the alignment of the type cannot be trusted. rdar://10464621 llvm-svn: 145300	2011-11-28 22:37:34 +00:00
Evan Cheng	567aa3dfb3	DAG combine should not increase alignment of loads / stores with alignment less than ABI alignment. These are loads / stores from / to "packed" data structures. Their alignments are intentionally under-specified. rdar://10301431 llvm-svn: 145273	2011-11-28 20:42:56 +00:00
Craig Topper	6f5a0bc4e3	Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar. llvm-svn: 145238	2011-11-28 10:14:51 +00:00
Chandler Carruth	bb4c250613	Take two on rotating the block ordering of loops. My previous attempt was centered around the premise of laying out a loop in a chain, and then rotating that chain. This is good for preserving contiguous layout, but bad for actually making sane rotations. In order to keep it safe, I had to essentially make it impossible to rotate deeply nested loops. The information needed to correctly reason about a deeply nested loop is actually available -- before we layout the loop. We know the inner loops are already fused into chains, etc. We lose information the moment we actually lay out the loop. The solution was the other alternative for this algorithm I discussed with Benjamin and some others: rather than rotating the loop after-the-fact, try to pick a profitable starting block for the loop's layout, and then use our existing layout logic. I was worried about the complexity of this "pick" step, but it turns out such complexity is needed to handle all the important cases I keep teasing out of benchmarks. This is, I'm afraid, a bit of a work-in-progress. It is still misbehaving on some likely important cases I'm investigating in Olden. It also isn't really tested. I'm going to try to craft some interesting nested-loop test cases, but it's likely to be extremely time consuming and I don't want to go there until I'm sure I'm testing the correct behavior. Sadly I can't come up with a way of getting simple, fine grained test cases for this logic. We need complex loop structures to even trigger much of it. llvm-svn: 145183	2011-11-27 13:34:33 +00:00
Chandler Carruth	e6374e6953	Rework a bit of the implementation of loop block rotation to not rely so heavily on AnalyzeBranch. That routine doesn't behave as we want given that rotation occurs mid-way through re-ordering the function. Instead merely check that there are not unanalyzable branching constructs present, and then reason about the CFG via successor lists. This actually simplifies my mental model for all of this as well. The concrete result is that we now will rotate more loop chains. I've added a test case from Olden highlighting the effect. There is still a bit more to do here though in order to regain all of the performance in Olden. llvm-svn: 145179	2011-11-27 09:22:53 +00:00
Chris Lattner	84bf52737a	remove autoupgrade support for old forms of llvm.prefetch and the old trampoline forms. Both of these were correct in LLVM 3.0, and we don't need to support LLVM 2.9 and earlier in mainline. llvm-svn: 145174	2011-11-27 07:42:04 +00:00
Chris Lattner	9d1e8420ff	Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic. llvm-svn: 145171	2011-11-27 06:54:59 +00:00
Chris Lattner	8067661775	remove some old autoupgrade logic llvm-svn: 145167	2011-11-27 06:10:54 +00:00
Chandler Carruth	0d073febb6	Introduce a loop block rotation optimization to the new block placement pass. This is designed to achieve one of the important optimizations that the old code placement pass did, but more simply. This is a somewhat rough and very conservative version of the transform. We could get a lot fancier here if there are profitable cases to do so. In particular, this only looks for a single pattern, it insists that the loop backedge being rotated away is the last backedge in the chain, and it doesn't provide any means of doing better in-loop placement due to the rotation. However, it appears that it will handle the important loops I am finding in the LLVM test suite. llvm-svn: 145158	2011-11-27 00:38:03 +00:00
Eli Friedman	448be745f6	Fix APFloat::convert so that it handles narrowing conversions correctly; it was returning incorrect values in rare cases, and incorrectly marking exact conversions as inexact in some more common cases. Fixes PR11406, and a missed optimization in test/CodeGen/X86/fp-stack-O0.ll. llvm-svn: 145141	2011-11-26 03:38:02 +00:00
Bruno Cardoso Lopes	626d04cc6f	This patch contains support for encoding FMA4 instructions and tablegen patterns for scalar FMA4 operations and intrinsic. Also add tests for vfmaddsd. Patch by Jan Sjodin llvm-svn: 145133	2011-11-25 19:33:42 +00:00
Craig Topper	e761f42368	Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type disinquish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64. llvm-svn: 145126	2011-11-24 22:57:10 +00:00
Chandler Carruth	f6e96b54f8	Fix a silly use-after-free issue. A much earlier version of this code need lots of fanciness around retaining a reference to a Chain's slot in the BlockToChain map, but that's all gone now. We can just go directly to allocating the new chain (which will update the mapping for us) and using it. Somewhat gross mechanically generated test case replicates the issue Duncan spotted when actually testing this out. llvm-svn: 145120	2011-11-24 11:23:15 +00:00
Chandler Carruth	1d3f68ffd0	When adding blocks to the list of those which no longer have any CFG conflicts, we should only be adding the first block of the chain to the list, lest we try to merge into the middle of that chain. Most of the places we were doing this we already happened to be looking at the first block, but there is no reason to assume that, and in some cases it was clearly wrong. I've added a couple of tests here. One already worked, but I like having an explicit test for it. The other is reduced from a test case Duncan reduced for me and used to crash. Now it is handled correctly. llvm-svn: 145119	2011-11-24 08:46:04 +00:00
Benjamin Kramer	825892c47f	X86: Use btq for bit tests if the immediate can't be encoded in 32 bits. Before: movabsq $4294967296, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00] testq %rax, %rdi ## encoding: [0x48,0x85,0xf8] jne LBB0_2 ## encoding: [0x75,A] After: btq $32, %rdi ## encoding: [0x48,0x0f,0xba,0xe7,0x20] jb LBB0_2 ## encoding: [0x72,A] btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off saving one register and a giant movabsq. llvm-svn: 145103	2011-11-23 13:54:17 +00:00
NAKAMURA Takumi	364c7c80ec	test/CodeGen/X86/block-placement.ll: Add explicit -mtriple=i686-linux. X86 Win32 CodeGen does not support EH yet. llvm-svn: 145101	2011-11-23 12:18:22 +00:00
Chandler Carruth	dcf04fc35a	Relax an invariant that block placement was trying to assert a bit further. This invariant just wasn't going to work in the face of unanalyzable branches; we need to be resillient to the phenomenon of chains poking into a loop and poking out of a loop. In fact, we already were, we just needed to not assert on it. This was found during a bootstrap with block placement turned on. llvm-svn: 145100	2011-11-23 10:35:36 +00:00
Elena Demikhovsky	681438f0b2	I added several lines in X86 code generator that allow to choose VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask. The patch was reviewed by Bruno. llvm-svn: 145099	2011-11-23 10:23:16 +00:00
Chandler Carruth	c175221b2d	Handle the case of a no-return invoke correctly. It actually still has successors, they just are all landing pad successors. We handle this the same way as no successors. Comments attached for the next person to wade through here and another lovely test case courtesy of Benjamin Kramer's bugpoint reduction. llvm-svn: 145098	2011-11-23 08:23:54 +00:00
Bob Wilson	8722a53e52	Enable stack protectors for all arrays, not just char arrays. rdar://5875909 Patch by Bill Wendling. llvm-svn: 145097	2011-11-23 07:13:56 +00:00
Jakob Stoklund Olesen	2b76f5685c	Fix PR11422. This was a bug in keeping track of the available domains when merging domain values. The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr to the integer domain which is only available in AVX2. Also add an assertion to catch future attempts at emitting AVX2 instructions. llvm-svn: 145096	2011-11-23 04:03:08 +00:00
Chandler Carruth	a357b0a5ee	Fix a crash in block placement due to an inner loop that happened to be reversed in the function's original ordering, and we happened to encounter it while handling an outer unnatural CFG structure. Thanks to the test case reduced from GCC's source by Benjamin Kramer. This may also fix a crasher in gzip that Duncan reduced for me, but I haven't yet gotten to testing that one. llvm-svn: 145094	2011-11-23 03:03:21 +00:00
Chandler Carruth	59f0abf50e	Fix a devilish miscompile exposed by block placement. The updateTerminator code didn't correctly handle EH terminators in one very specific case. AnalyzeBranch would find no terminator instruction, and so the fallback in updateTerminator is to assume fallthrough. This is correct, but the destination of the fallthrough was assumed to be the first successor. This is almost always true, but in certain cases the loop transformations will cause the landing pad to be the first successor! Instead of this brittle logic, actually look through the successors for a non-landing-pad accessor, and to assert if more than one is found. This will hopefully fix some (if not all) of the self host miscompiles with block placement. Thanks to Benjamin Kramer for reporting, Nick Lewycky for an initial stab at a reduction, and Duncan for endless advice on EH (which I know nothing about) as well as reviewing the actual fix. llvm-svn: 145062	2011-11-22 13:13:16 +00:00
Rafael Espindola	05f88add77	Add triple to the test. llvm-svn: 145057	2011-11-22 06:36:25 +00:00
Rafael Espindola	b45f6ea527	If a register is both an early clobber and part of a tied use, handle the use before the clobber so that we copy the value if needed. Fixes pr11415. llvm-svn: 145056	2011-11-22 06:27:18 +00:00
Craig Topper	866214a486	Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled. llvm-svn: 145028	2011-11-21 08:26:50 +00:00
Craig Topper	05db995317	Test case for r145026 llvm-svn: 145027	2011-11-21 06:58:09 +00:00
Craig Topper	62ae335144	Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled. llvm-svn: 145022	2011-11-21 01:12:36 +00:00
NAKAMURA Takumi	0738f386b4	test/CodeGen/X86/block-placement.ll: Relax expressions for Win32. llvm-svn: 145011	2011-11-20 12:49:45 +00:00
Chandler Carruth	aac8e5082a	The logic for breaking the CFG in the presence of hot successors didn't properly account for the global probability of the edge being taken. This manifested as a very large number of unconditional branches to blocks being merged against the CFG even though they weren't particularly hot within the CFG. The fix is to check whether the edge being merged is both locally hot relative to other successors for the source block, and globally hot compared to other (unmerged) predecessors of the destination block. This introduces a new crasher on GCC single-source, but it's currently behind a flag, and Ben has offered to work on the reduction. =] llvm-svn: 145010	2011-11-20 11:22:06 +00:00

1 2 3 4 5 ...

3040 Commits