llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00

Author	SHA1	Message	Date
Simon Pilgrim	4d4989655e	[X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding support As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents. This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines. Differential Revision: http://reviews.llvm.org/D16956 llvm-svn: 260168	2016-02-08 23:03:46 +00:00
Andrew Kaylor	8fbb7931a3	[regalloc][WinEH] Do not mark intervals as not spillable if they contain a regmask Differential Revision: http://reviews.llvm.org/D16831 llvm-svn: 260164	2016-02-08 22:52:51 +00:00
Dan Gohman	32a15ba966	[WebAssembly] Update the br_if instructions' operand orders to match the spec. llvm-svn: 260152	2016-02-08 21:50:13 +00:00
Sanjay Patel	69b9326059	[x86] convert masked store of one element to scalar store Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set' transform in InstCombine, but this should be a win for any AVX machine. Code comments note that this transform could be extended for other targets / cases. Differential Revision: http://reviews.llvm.org/D16828 llvm-svn: 260145	2016-02-08 21:05:08 +00:00
Hans Wennborg	d934f6f749	Add triple to h-registers-3.ll to make bots happy after r260133 llvm-svn: 260136	2016-02-08 19:45:24 +00:00
Hans Wennborg	36d0cb3fd9	[X86] Don't zero/sign-extend i1, i8, or i16 return values to 32 bits (PR22532) This matches GCC and MSVC's behaviour, and saves on code size. We were already not extending i1 return values on x86_64 after r127766. This takes that patch further by applying it to the x86 target as well, and also to i8 and i16. The ABI docs have been unclear about the required behaviour here. The new i386 psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return values do not need to be extended beyond 8 bits. The x86_64 ABI doc is being updated to say the same [2]. Differential Revision: http://reviews.llvm.org/D16907 [1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf [2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ llvm-svn: 260133	2016-02-08 19:34:30 +00:00
Tim Northover	1dabed8e29	AArch64: match correct order in subtraction pattern. The accumulator in multiply-and-subtract instructions is actually subtracted from so these patterns were computing the wrong value. llvm-svn: 260131	2016-02-08 19:33:18 +00:00
Matt Arsenault	eed0ad4e3e	AMDGPU: Remove bfi and bfm intrinsics Nothing is using them. llvm-svn: 260123	2016-02-08 19:06:01 +00:00
Matt Arsenault	34d57039a9	SelectionDAG: Lower some range metadata to AssertZext If a range has a lower bound of 0, add an AssertZext from the nearest floor power of two. This allows operations with some workitem intrinsics with known maximum ranges to use fast 24-bit multiplies. llvm-svn: 260109	2016-02-08 16:28:19 +00:00
Michael Zuckerman	c705af63bb	[AVX512][PROLQ][PROLD] Change imm8 to int Differential Revision: http://reviews.llvm.org/D16983 llvm-svn: 260101	2016-02-08 15:13:32 +00:00
Craig Topper	bc3298f1ee	[X86] Change FeatureIFMA string to 'avx512ifma'. Matches gcc and fixes PR26461. llvm-svn: 260069	2016-02-08 01:23:15 +00:00
Simon Pilgrim	9ecb59bace	[X86][SSE] Resolve target shuffle inputs to sentinels to permit more combines The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input. This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle. Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if it's not a target shuffle. Differential Revision: http://reviews.llvm.org/D16683 llvm-svn: 260063	2016-02-07 22:51:06 +00:00
Simon Pilgrim	f8238985a5	[X86][SSE] Regenerate PSHUFB shuffle mask comments tests llvm-svn: 260061	2016-02-07 22:22:09 +00:00
Simon Pilgrim	8d73f8b49c	[X86][SSE] Added support for MOVHPD/MOVLPD + MOVHPS/MOVLPS shuffle decoding. llvm-svn: 260034	2016-02-07 15:39:22 +00:00
Asaf Badouh	5bbbfafa66	[X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode Differential Revision: http://reviews.llvm.org/D16629 llvm-svn: 260033	2016-02-07 14:59:13 +00:00
Igor Breger	60ac21f165	AVX512: VPBROADCASTB/W/D/Q from GPR intrinsics implementation. Differential Revision: http://reviews.llvm.org/D16813 llvm-svn: 260024	2016-02-07 08:30:50 +00:00
Simon Pilgrim	c7e8ed93d3	[X86][AVX2] Regenerated broadcast domain tests llvm-svn: 260010	2016-02-06 22:09:25 +00:00
Simon Pilgrim	3fac9b2c7b	[X86][SSE] Add tests for MOVHLPS/MOVLHPS shuffle lowering. As raised in PR26491, we don't make use of these instructions at the moment. llvm-svn: 260008	2016-02-06 20:11:52 +00:00
Simon Pilgrim	b7e95cd192	[X86][AVX512] Added support for VPMOVZX shuffle decoding. llvm-svn: 260007	2016-02-06 19:51:21 +00:00
Simon Pilgrim	cd7ea0187e	[X86][AVX512] Fixed prefix ordering for lzcnt tests. Let AVX512 targets share the same CHECKs. llvm-svn: 260000	2016-02-06 18:07:19 +00:00
Simon Pilgrim	9c023a124c	[X86][SSE] Regenerate vector shift tests llvm-svn: 259999	2016-02-06 17:57:15 +00:00
Simon Pilgrim	5b51d55c78	line endings fix llvm-svn: 259992	2016-02-06 15:38:25 +00:00
Simon Pilgrim	f1a97ef96e	[X86][SSE] Don't replace an existing 32-bit load with its duplicate If we are already loading a single 32-bit float/integer then just reuse it. Fix for regression in D16729 llvm-svn: 259991	2016-02-06 15:37:09 +00:00
Matt Arsenault	8009cfb6b5	AMDGPU: Account for LDS alignment The current situation isn't great, because the amount of padding required is determined by the inverse order of the first encountered use. We should eventually somehow sort these to minimize wasted space. Another problem is the alignment of kernel arguments isn't respected. The group_segment_alignment is always emitted as the default 16, and typed arguments with higher alignments or an explicitly set alignment are also ignored. llvm-svn: 259912	2016-02-05 19:47:29 +00:00
Matt Arsenault	3c264fc8c0	AMDGPU: Preserve alignments on newly created globals Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911	2016-02-05 19:47:23 +00:00
Wei Mi	5f6c0d3559	Some stackslots are allocated to vregs which have no real reference. LiveRangeEdit::eliminateDeadDef is used to remove dead define instructions after rematerialization. To remove a VNI for a vreg from its LiveInterval, LiveIntervals::removeVRegDefAt is used. However, after non-PHI VNIs are all removed, PHI VNIs are still left in the LiveInterval. Such unused vregs will be kept in RegsToSpill[] at the end of InlineSpiller::reMaterializeAll and the spiller will allocate stackslots for them. The fix is to get rid of an unused reg by checking whether it has a non-dbg reference instead of whether it has a non-empty interval. llvm-svn: 259895	2016-02-05 18:14:24 +00:00
Dan Gohman	f44d278023	[WebAssembly] Update the select instructions' operand orders to match the spec. llvm-svn: 259893	2016-02-05 17:14:59 +00:00
Nemanja Ivanovic	a54ae64cf1	Add the missing test case for PR26193 llvm-svn: 259888	2016-02-05 15:03:17 +00:00
Renato Golin	c3580d7a36	Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)." This reverts commit r259812 as it broke AArch64 self-hosting. llvm-svn: 259881	2016-02-05 12:14:30 +00:00
Nemanja Ivanovic	b7bc445a9f	Fix for PR 26356 Using the load immediate only when the immediate (whether signed or unsigned) can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and 0 to 65535 for unsigned. This patch also ensures that we sign-extend under the right conditions. llvm-svn: 259840	2016-02-04 23:14:42 +00:00
Nemanja Ivanovic	8a43819d85	Provide a test case for r259798 llvm-svn: 259835	2016-02-04 22:36:10 +00:00
Simon Pilgrim	cc76b8656c	[X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type. This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain..... llvm-svn: 259816	2016-02-04 19:27:51 +00:00
Chad Rosier	b8b5852fe4	[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3). This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769 and r259790. The tramp3d failure was caused by an incorrect refactoring in the patch. Specifically, we weren't always properly clearing the SExtIdx flag. llvm-svn: 259812	2016-02-04 18:59:49 +00:00
Silviu Baranga	22ab3adc5c	[AArch64] Multiply extended 32-bit ints with `[U|S]MADDL' During instruction selection, the AArch64 backend can recognise the following pattern and generate an [U|S]MADDL instruction, i.e. a multiply of two 32-bit operands with a 64-bit result: (mul (sext i32), (sext i32)) However, when one of the operands is constant, the sign extension gets folded into the constant in SelectionDAG::getNode(). This means that the instruction selection sees this: (mul (sext i32), i64) ...which doesn't match the pattern. Sign-extension and 64-bit multiply instructions are generated, which are slower than one 32-bit multiply. Add a pattern to match this and generate the correct instruction, for both signed and unsigned multiplies. Patch by Chris Diamand! llvm-svn: 259800	2016-02-04 16:47:09 +00:00
Benjamin Kramer	57b36f1497	The canonical way to XFAIL a test for all targets is XFAIL: *, not an empty XFAIL:. Fix the lit bug that enabled this "feature" (the empty triple is a substring of all possible target triples) and change the two outliers to use the documented syntax. llvm-svn: 259799	2016-02-04 16:21:38 +00:00
Renato Golin	f1ae93e13e	[PPC] Move PPC test to a PPC-specific dir llvm-svn: 259797	2016-02-04 16:14:59 +00:00
Simon Pilgrim	da26d272a9	[X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load. Differential Revision: http://reviews.llvm.org/D16729 llvm-svn: 259796	2016-02-04 16:12:56 +00:00
Chad Rosier	fcca55983b	Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR." This reverts commit r259790. tramp3d-v4 is still having problems. llvm-svn: 259795	2016-02-04 16:01:40 +00:00
Simon Pilgrim	e5e6320aa6	[X86][SSE] Added i686 target tests to make sure we are correctly loading consecutive entries as 64-bit integers llvm-svn: 259794	2016-02-04 15:51:55 +00:00
Elena Demikhovsky	86a7e2549e	AVX-512: Fixed a bug in FMA instruction selection on KNL The FMA instruction was selected from AVX2 set instead of AVX-512 Differential Revision: http://reviews.llvm.org/D16884 llvm-svn: 259792	2016-02-04 15:11:11 +00:00
Petar Jovanovic	7a49463224	[Power PC] softening long double type This patch implements softening of long double type (ppcf128) on ppc32 architecture and enables operations for this type for soft float. Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D15811 llvm-svn: 259791	2016-02-04 14:43:50 +00:00
Chad Rosier	52d5d7b161	[AArch64] Improve load/store optimizer to handle LDUR + LDR. This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769, which was reverted in r246782 due to a test-suite failure. I'm unable to reproduce the issue at this time. llvm-svn: 259790	2016-02-04 14:42:55 +00:00
Michael Zuckerman	d8de4a9888	[AVX512] add vfmadd132ss and vfmadd132sd Intrinsic Differential Revision: http://reviews.llvm.org/D16589 llvm-svn: 259789	2016-02-04 14:41:08 +00:00
Simon Pilgrim	b9c80e3f8d	[X86] Add AVX512 vector zext tests llvm-svn: 259786	2016-02-04 14:06:19 +00:00
Jingyue Wu	bb54579422	[NVPTX] Disable performance optimizations when OptLevel==None Reviewers: jholewinski, tra, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16874 llvm-svn: 259749	2016-02-04 04:15:36 +00:00
Nemanja Ivanovic	5a78d13728	Test case for PR 26381 llvm-svn: 259740	2016-02-04 01:58:20 +00:00
Wei Mi	00d0d9c981	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize values that already exist. This will introduce redundant computation which may not be cleaned up thoroughly by subsequent optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. The original commit triggered regressions in Polly tests. The regressions exposed two problems which have been fixed in the current version. 1. Polly will generate a new function based on the old one. To generate an instruction for the new function, it builds SCEV for the old instruction, applies some transformation on the generated SCEV, then expands the transformed SCEV and inserts the expanded value into the new function. Because SCEV expansion may reuse a value cached in ExprValueMap, a value from the old function may be inserted into the new function, which is wrong. In SCEVExpander::expand, there is logic to check that the cached value to be used dominates the insertion point. However, for the above case, the check always passes. That is because the insertion point is in a new function, which is unreachable from the old function, and DominatorTreeBase::dominates treats an unreachable node as dominated by any other node. The fix is to simply add a check that the cached value to be used in expansion is in the same function as the insertion point instruction. 2. When the SCEV is of scConstant type, expanding it directly is cheaper than reusing a normal cached value. Although there is a Constant value in the cached value set in ExprValueMap, it is not easy to find, because the cached value set is not sorted according to potential cost. Existing reuse logic in SCEVExpander::expand simply chooses the first legal element from the cached value set. The fix is that when the SCEV is of scConstant type, the reuse logic is skipped and the SCEV is simply expanded. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259736	2016-02-04 01:27:38 +00:00
Tim Shen	c6dc619045	[SelectionDAG] Fix CombineToPreIndexedLoadStore O(n^2) behavior This patch consists of two parts: a performance fix in DAGCombiner.cpp and a correctness fix in SelectionDAG.cpp. The test case tests the bug that's uncovered by the performance fix, and fixed by the correctness fix. The performance fix keeps the containers required by the hasPredecessorHelper (which is a lazy DFS) and reuses them. Since hasPredecessorHelper is called in a loop, the overall complexity is reduced from O(n^2) to O(n), where n is the number of SDNodes. The correctness fix keeps iterating the neighbor list even if it's time to return early. It returns only after adding all neighbors to Worklist, so that no neighbors are discarded due to the original early return. llvm-svn: 259691	2016-02-03 20:58:55 +00:00
Saleem Abdulrasool	d7405cba41	ARM: support TLS for WoA Add support for TLS access for Windows on ARM. This generates a similar access to MSVC for ARM. The changes to the tablegen data are needed to support loading an external symbol global that is not for a call. The adjustments to the DAG to DAG transforms are needed to preserve the 32-bit move. llvm-svn: 259676	2016-02-03 18:21:59 +00:00
Wei Mi	1ef051b016	Revert r259662, which caused regressions on polly tests. llvm-svn: 259675	2016-02-03 18:05:57 +00:00
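
The Tim Shen entry (r259691) combines two ideas: keeping the visited set and worklist of a lazy depth-first search alive across repeated predecessor queries, and finishing the expansion of a node's operands before returning early. The following is a minimal Python model of both ideas on a toy graph, assuming a DAG stored as a dict from node to operand list. It is not the actual hasPredecessorHelper code, and the class name LazyPredecessorSearch is hypothetical.

```python
from collections.abc import Hashable, Mapping, Sequence


class LazyPredecessorSearch:
    """Answers repeated "is `target` a transitive operand of `root`?" queries.

    The visited set and worklist persist between calls, so across all queries
    each node is popped at most once (O(V + E) total), instead of restarting
    the DFS from `root` for every query (O(V + E) per query).
    """

    def __init__(self, root: Hashable,
                 operands: Mapping[Hashable, Sequence[Hashable]]):
        self.operands = operands      # node -> its operand nodes (a DAG)
        self.visited: set = set()     # every node discovered so far
        self.worklist: list = [root]  # discovered nodes not yet expanded

    def has_predecessor(self, target: Hashable) -> bool:
        # Nodes discovered by an earlier query are answered immediately.
        if target in self.visited:
            return True
        while self.worklist:
            node = self.worklist.pop()
            found = False
            for op in self.operands.get(node, ()):
                if op not in self.visited:
                    self.visited.add(op)
                    self.worklist.append(op)
                if op == target:
                    # Don't return yet: the remaining operands of `node` must
                    # still be queued, or later queries would never see them.
                    found = True
            if found:
                return True
        return False


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
search = LazyPredecessorSearch("a", graph)
print(search.has_predecessor("d"))  # True
print(search.has_predecessor("e"))  # True
```

If has_predecessor returned as soon as it saw "d", node "c" would already have been popped without "e" being queued, and the second query would wrongly return False. This is the kind of discarded neighbor the correctness fix describes.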


14903 Commits
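
The PR 26356 entry (r259840) restricts the load immediate to values in the ranges it quotes (-32768 to 32767 signed, 0 to 65535 unsigned) and mentions sign-extending under the right conditions. The following is a minimal sketch of those range checks and of 16-bit sign extension. The helper names fits_signed16, fits_unsigned16 and sign_extend16 are hypothetical and do not come from the PowerPC backend.

```python
def fits_signed16(value: int) -> bool:
    """True if `value` is in the signed 16-bit range -32768..32767."""
    return -32768 <= value <= 32767


def fits_unsigned16(value: int) -> bool:
    """True if `value` is in the unsigned 16-bit range 0..65535."""
    return 0 <= value <= 65535


def sign_extend16(field: int) -> int:
    """Interpret the low 16 bits of `field` as a two's-complement value."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field
```

The choice of extension matters: the bit pattern 0xFFFF reads back as -1 when sign-extended but as 65535 when zero-extended. The backend must therefore know which interpretation the original immediate had.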