llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00

Author	SHA1	Message	Date
Sanjay Patel	0cd7759843	[x86] avoid crashing with illegal vector type (PR31672) https://llvm.org/bugs/show_bug.cgi?id=31672 llvm-svn: 292758	2017-01-22 17:06:12 +00:00
Craig Topper	35349180a5	[X86] Don't allow commuting to form phsub operations. Fixes PR31714. llvm-svn: 292713	2017-01-21 06:59:38 +00:00
Craig Topper	8fa93ef7d3	[X86] Add test cases that show bad commuting being allowed to create a phsub operation. llvm-svn: 292712	2017-01-21 06:59:35 +00:00
Sanjay Patel	fc6ca85023	[ValueTracking] recognize variations of 'clamp' to improve codegen (PR31693) By enhancing value tracking, we allow an existing min/max canonicalization to kick in and improve codegen for several targets that have min/max instructions. Unfortunately, recognizing min/max in value tracking may cause us to hit a hack in InstCombiner::visitICmpInst() more often: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109340.html ...but I'm hoping we can remove that soon. Correctness proofs based on Alive: Name: smaxmin Pre: C1 < C2 %cmp2 = icmp slt i8 %x, C2 %min = select i1 %cmp2, i8 %x, i8 C2 %cmp3 = icmp slt i8 %x, C1 %r = select i1 %cmp3, i8 C1, i8 %min => %cmp2 = icmp slt i8 %x, C2 %min = select i1 %cmp2, i8 %x, i8 C2 %cmp1 = icmp sgt i8 %min, C1 %r = select i1 %cmp1, i8 %min, i8 C1 Name: sminmax Pre: C1 > C2 %cmp2 = icmp sgt i8 %x, C2 %max = select i1 %cmp2, i8 %x, i8 C2 %cmp3 = icmp sgt i8 %x, C1 %r = select i1 %cmp3, i8 C1, i8 %max => %cmp2 = icmp sgt i8 %x, C2 %max = select i1 %cmp2, i8 %x, i8 C2 %cmp1 = icmp slt i8 %max, C1 %r = select i1 %cmp1, i8 %max, i8 C1 ---------------------------------------- Optimization: smaxmin Done: 1 Optimization is correct! ---------------------------------------- Optimization: sminmax Done: 1 Optimization is correct! Name: umaxmin Pre: C1 u< C2 %cmp2 = icmp ult i8 %x, C2 %min = select i1 %cmp2, i8 %x, i8 C2 %cmp3 = icmp ult i8 %x, C1 %r = select i1 %cmp3, i8 C1, i8 %min => %cmp2 = icmp ult i8 %x, C2 %min = select i1 %cmp2, i8 %x, i8 C2 %cmp1 = icmp ugt i8 %min, C1 %r = select i1 %cmp1, i8 %min, i8 C1 Name: uminmax Pre: C1 u> C2 %cmp2 = icmp ugt i8 %x, C2 %max = select i1 %cmp2, i8 %x, i8 C2 %cmp3 = icmp ugt i8 %x, C1 %r = select i1 %cmp3, i8 C1, i8 %max => %cmp2 = icmp ugt i8 %x, C2 %max = select i1 %cmp2, i8 %x, i8 C2 %cmp1 = icmp ult i8 %max, C1 %r = select i1 %cmp1, i8 %max, i8 C1 ---------------------------------------- Optimization: umaxmin Done: 1 Optimization is correct! ---------------------------------------- Optimization: uminmax Done: 1 Optimization is correct! llvm-svn: 292660	2017-01-20 22:18:47 +00:00
Sanjay Patel	29a4690f15	[x86] add tests to show missed min/max vector codegen (PR31693) llvm-svn: 292640	2017-01-20 20:14:11 +00:00
Wei Mi	10ac510628	[RegisterCoalescing] Recommit the patch "Remove partial redundant copy". The recommit fixes a bug related to live interval update after the partial redundant copy is moved. The original patch is to solve the performance problem described in PR27827. Register coalescing sometimes cannot remove a copy because of interference. But if we can find a reverse copy in one of the predecessor blocks of the copy, the copy is partially redundant and we may remove the copy partially by moving it to the predecessor block without the reverse copy. Differential Revision: https://reviews.llvm.org/D28585 llvm-svn: 292621	2017-01-20 17:38:54 +00:00
Craig Topper	2e09861346	[AVX-512] Fix a couple test cases to not pass an undef mask to gather intrinsic. This could break if any future optimizations take advantage of the undef. llvm-svn: 292585	2017-01-20 07:12:30 +00:00
Simon Pilgrim	fb321451a1	[SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293) This patch improves the knownbits logic for unsigned integer min/max opcodes. For UMIN we know that the result will have the maximum of the inputs' known leading zero bits; similarly, for UMAX, the maximum of the inputs' leading one bits. This is particularly useful for simplifying clamping patterns, e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set. Differential Revision: https://reviews.llvm.org/D28853 llvm-svn: 292528	2017-01-19 22:41:22 +00:00
Simon Pilgrim	5efa0b5fb4	[X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further llvm-svn: 292493	2017-01-19 16:25:02 +00:00
Simon Pilgrim	d64c8178c6	[X86][SSE] Added tests for pre-truncating arithmetic operations that have already been extended As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further llvm-svn: 292487	2017-01-19 15:03:00 +00:00
Mikael Holmen	ba88ebaa68	[DAG] Don't increase SDNodeOrder for dbg.value/declare. Summary: The SDNodeOrder is saved in the IROrder field in the SDNode, and this field may affect scheduling. Thus, letting dbg.value/declare increase the order numbers may in turn affect scheduling. Because of this change we also need to update the code deciding when dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues. Dbg values now have the same order as the SDNode they are connected to, not the following orders. Test cases provided by Florian Hahn. Reviewers: bogner, aprantl, sunfish, atrick Reviewed By: atrick Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB Differential Revision: https://reviews.llvm.org/D25318 llvm-svn: 292485	2017-01-19 13:55:55 +00:00
Elena Demikhovsky	8e858e78f5	Recommitting unsigned saturation with a bugfix. A test case that crashed is added to avx512-trunc.ll. (PR31589) llvm-svn: 292479	2017-01-19 12:08:21 +00:00
Craig Topper	c5ad71316f	[AVX-512] Add test cases that show where we are using two subvector inserts to broadcast a 128-bit subvector into a 512-bit vector. We'd be better off using something like SHUFF32X4. If the subvector comes from a load, we convert to SUBV_BROADCAST and use a broadcast instruction. But if there is no load we keep the inserts. I think we should create the SUBV_BROADCAST even without the load and let isel use the fallback patterns that are used if the load can't be folded. This will use the SHUFF32X4 or similar instruction for the 128-bit into 512-bit case and a single insert for 128 into 256 or 256 into 512. This should be fixed so subvector broadcast intrinsics can be replaced with native IR since some of those currently lower directly to SHUFF32X4. llvm-svn: 292475	2017-01-19 07:37:45 +00:00
Craig Topper	544c0172e3	[AVX-512] Support ADD/SUB/MUL of mask vectors Summary: Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND. We already do this for scalar i1 operations so I just extended it to vectors of i1. Reviewers: zvi, delena Reviewed By: delena Subscribers: guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D28888 llvm-svn: 292474	2017-01-19 07:12:35 +00:00
Craig Topper	abe1662e38	[AVX-512] Use VSHUF instructions instead of two inserts as fallback for subvector broadcasts that can't fold the load. llvm-svn: 292466	2017-01-19 02:34:29 +00:00
Craig Topper	d00e767f8f	[AVX-512] Add additional test cases for broadcast intrinsics that demonstrate that we don't fold the loads to use a broadcast instruction. llvm-svn: 292465	2017-01-19 02:34:25 +00:00
Michael Kuperstein	8be247dd9b	Revert r291670 because it introduces a crash. r291670 doesn't crash on the original testcase from PR31589, but it crashes on a slightly more complex one. PR31589 has the new reproducer. llvm-svn: 292444	2017-01-18 23:05:58 +00:00
Teresa Johnson	b4143d5d72	Don't create a comdat group for a dropped def with initializer Non-prevailing weak/linkonce odr symbols will be dropped by ThinLTO to available_externally when possible. If they had an initializer in the global_ctors list, a comdat group was being created. This code already had logic to skip available_externally defs, but now the EliminateAvailableExternally pass will drop these symbols to declarations earlier. Change the check to skip all declarations for linker (which includes available_externally along with declarations). Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28737 llvm-svn: 292408	2017-01-18 16:58:43 +00:00
Simon Pilgrim	db263515e7	[X86][SSE] Simplify umax knownbits test combineSRA doesn't detect sign bits splats that it does itself so just use -1 as the demanded input so that its already splatted llvm-svn: 292361	2017-01-18 11:20:31 +00:00
Michael Zuckerman	b21dc48a99	[X86] Improve mul combine for negative multiplier (2^c - 1) This patch improves the mul instruction combine function (combineMul) by adding a new layer of logic. In this patch, we are adding the ability to fold (mul x, -((1 << c) - 1)) or (mul x, -((1 << c) + 1)) into neg((x << c) - x) or neg((x << c) + x), respectively. Differential Revision: https://reviews.llvm.org/D28232 llvm-svn: 292358	2017-01-18 09:31:13 +00:00
Wei Mi	3f55206292	Revert rL292292 since it causes a SEGV on sanitizer-x86_64-linux-fuzzer build bot. llvm-svn: 292327	2017-01-18 01:53:53 +00:00
Wei Mi	4e944b6da4	[RegisterCoalescing] Remove partial redundant copy. The patch is to solve the performance problem described in PR27827. Register coalescing sometimes cannot remove a copy because of interference. But if we can find a reverse copy in one of the predecessor blocks of the copy, the copy is partially redundant and we may remove the copy partially by moving it to the predecessor block without the reverse copy. Differential Revision: https://reviews.llvm.org/D28585 llvm-svn: 292292	2017-01-17 23:39:07 +00:00
Simon Pilgrim	9895b41d9d	[X86][SSE] Split UMIN and UMAX known bits tests llvm-svn: 292277	2017-01-17 22:12:25 +00:00
Joerg Sonnenberger	d75b6a6a2d	Remove an overeager assert from r288844. llvm-svn: 292244	2017-01-17 19:29:15 +00:00
Bob Wilson	62491d9e34	Revert r291640 change to fold X86 comparison with atomic_load_add. Even with the fix from r291630, this still causes problems. I get widespread assertion failures in the Swift runtime's WeakRefCount::increment() function. I sent a reduced testcase in reply to the commit. llvm-svn: 292242	2017-01-17 19:18:57 +00:00
Simon Pilgrim	df3a78e117	[X86][AVX512] Add all_of/any_of avx512vl tests llvm-svn: 292235	2017-01-17 17:33:18 +00:00
Simon Pilgrim	fa295d2c6e	[X86][SSE] Tests showing horizontal all_of/any_of of vector comparison results llvm-svn: 292223	2017-01-17 15:02:01 +00:00
Craig Topper	5dedee2c4e	[AVX-512] Add support for taking a bitcast between a SUBV_BROADCAST and VSELECT and moving it to the input of the SUBV_BROADCAST if it will help with using a masked operation. llvm-svn: 292201	2017-01-17 06:49:59 +00:00
Craig Topper	40ae8bdf45	[AVX-512] Add test cases showing missed opportunities to fold subvector broadcasts with a mask operation. llvm-svn: 292200	2017-01-17 06:49:54 +00:00
Ahmed Bougacha	98077584bd	Revert "[TLI] Robustize SDAG proto checking by merging it into TLI." This reverts commit r292189, as it causes issues on SystemZ bots. llvm-svn: 292191	2017-01-17 03:31:00 +00:00
Ahmed Bougacha	2004f9b06a	[TLI] Robustize SDAG proto checking by merging it into TLI. SelectionDAGBuilder recognizes libfuncs using some homegrown parameter type-checking. Use TLI instead, removing another heap of redundant code. This isn't strictly NFC, as the SDAG code was too lax. Concretely, this means changes are required to two tests: - calling a non-variadic function via a variadic prototype isn't OK; it just happens to work on x86_64 (but not on, e.g., aarch64). - mempcpy has a size_t parameter; the SDAG code accepts any integer type, which meant using i32 on x86_64 worked. I don't think it's worth supporting either of these (IMO) broken testcases. Instead, fix them to be more correct. llvm-svn: 292189	2017-01-17 03:10:06 +00:00
Simon Pilgrim	7522ab562f	[SelectionDAG] Add knownbits support for BITREVERSE llvm-svn: 292130	2017-01-16 14:49:26 +00:00
Simon Pilgrim	5136255171	[X86][SSE] Test showing missing BITREVERSE knownbits support llvm-svn: 292118	2017-01-16 13:59:42 +00:00
Simon Pilgrim	a8edcf036b	[SelectionDAG] Add support for BITREVERSE constant folding We were relying on constant folding of the legalized instructions to do what constant folding we had previously llvm-svn: 292114	2017-01-16 13:39:00 +00:00
Simon Pilgrim	bcca9b527d	[X86][SSE] Tests showing missing BITREVERSE constant folding llvm-svn: 292111	2017-01-16 13:18:07 +00:00
Michael Zuckerman	ea3cad6409	Fix blend mask by switching the sides of the operands, since the Blend node uses the opposite mask from the Select node. llvm-svn: 292066	2017-01-15 16:43:14 +00:00
Simon Pilgrim	51be3b8ec7	[X86][XOP] Added support for VPMADCSWD 'extend+hadd' IFMA patterns VPMADCSWD act as VPADDD( VPMADDWD( x, y ), z ) - multiply+extend+hadd and add to v4i32 accumulator llvm-svn: 292021	2017-01-14 18:52:13 +00:00
Simon Pilgrim	8edc50af45	[X86][XOP] Added support for VPMACSDQH/VPMACSDQL 'extension' IFMA patterns VPMACSDQH/VPMACSDQL act as VPADDQ( VPMULDQ( x, y ), z ) - multiply+extending either the odd/even 4i32 input elements and adding to v2i64 accumulator llvm-svn: 292020	2017-01-14 18:08:54 +00:00
Simon Pilgrim	76d16b1002	[X86][XOP] Added support for VPMACSWW/VPMACSDD 'lossy' IFMA patterns VPMACSWW/VPMACSDD act as add( mul( x, y ), z ) - ignoring any upper bits from both the multiply and add stages llvm-svn: 292019	2017-01-14 17:13:52 +00:00
Simon Pilgrim	82b70a961d	[X86][XOP] Add tests for integer fused multiply add Tests showing missed opportunities to use XOP's integer fma instructions Some of these are pretty awkward to match as they often have implicit sext/trunc stages but many just ignore overflow bits which makes things pretty straightforward. llvm-svn: 292017	2017-01-14 13:07:22 +00:00
Craig Topper	f41154b488	[AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when it's beneficial. Isel now selects masked move instructions for vselect instead of blendm. But sometimes it is beneficial to register allocation to remove the tied register constraint by using blendm instructions. This also picks up cases where the masked move was created due to a masked load intrinsic. Differential Revision: https://reviews.llvm.org/D28454 llvm-svn: 292005	2017-01-14 07:50:52 +00:00
Craig Topper	8a8b277524	[AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible. We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX is available. Or if it's not, but register allocation has selected a non-extended register we will use VEX VXORPS. And if it's an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD. This makes it possible for the register allocator to have all 32 registers available to work with. llvm-svn: 292004	2017-01-14 07:29:24 +00:00
Craig Topper	27e612d748	[AVX-512] Change blend mask in lowerVectorShuffleAsBlend to a 64-bit value. Also add 32-bit mode command lines to the test case that exercises this just to make sure we sanely handle the 64-bit immediate there. This fixes an undefined sanitizer failure from r291888. llvm-svn: 291994	2017-01-14 04:19:35 +00:00
Simon Pilgrim	442bdb683b	[X86][AVX] Bad v4f64/v4i64 '1z3z' shuffle test case This lowers to SHUFPD if the input is zeroinitializer but not with a demanded elts optimized build vector. llvm-svn: 291924	2017-01-13 18:23:47 +00:00
Simon Pilgrim	bc7c4b4b3d	Regenerate test. llvm-svn: 291920	2017-01-13 17:44:28 +00:00
Simon Pilgrim	7e80d357cd	Regenerate test with update_llc_test_checks.py llvm-svn: 291910	2017-01-13 16:37:38 +00:00
Simon Pilgrim	774d523907	[X86][AVX512] Add support for variable ASHR v2i64/v4i64 support without VLX Use v8i64 variable ASHR instructions if we don't have VLX. This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll. Differential Revision: https://reviews.llvm.org/D28604 llvm-svn: 291901	2017-01-13 13:16:19 +00:00
Michael Zuckerman	61a330a224	[X86][AVX512] Adding missing shuffle lowering to blend mask instructions Some shuffles can be lowered to blend mask instruction (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ) . In this patch, I added new pattern match for this case. Reviewers: 1. craig.topper 2. guyblank 3. RKSimon 4. igorb Differential Revision: https://reviews.llvm.org/D28483 llvm-svn: 291888	2017-01-13 09:06:00 +00:00
Nikolai Bozhenov	3a4f22c55f	[X86] Replace AND+IMM64 with SRL/SHL Emit SHRQ/SHLQ instead of ANDQ with a 64 bit constant mask if the result is unused and the mask has only higher/lower bits set. For example, with this patch LLVM emits shrq $41, %rdi je instead of movabsq $0xFFFFFE0000000000, %rcx testq %rcx, %rdi je This reduces number of instructions, code size and register pressure. The transformation is applied only for cases where the mask cannot be encoded as an immediate value within TESTQ instruction. Differential Revision: https://reviews.llvm.org/D28198 llvm-svn: 291806	2017-01-12 19:54:27 +00:00
Nikolai Bozhenov	472aaf377c	[X86] Modify BypassSlowDivision tests to match their new names (NFC) - bypass-slow-division-32.ll: tests verifying correctness of divl-to-divb bypassing - bypass-slow-division-64.ll: tests verifying correctness of divq-to-divl bypassing - bypass-slow-division-tune.ll: tests verifying that bypassing is enabled only when appropriate Differential Revision: https://reviews.llvm.org/D28551 llvm-svn: 291804	2017-01-12 19:48:01 +00:00
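
Note on 3a4f22c55f ([X86] Replace AND+IMM64 with SRL/SHL): the equivalence that commit relies on can be checked with a small model. The sketch below is only an illustration in Python, not code from the LLVM tree; the helper names (high_mask_shift, is_zero_via_and, is_zero_via_shift) are hypothetical, and it covers only the "higher bits set" case with the mask value quoted in the commit message.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register


def high_mask_shift(mask):
    """Return k if mask has exactly bits k..63 set, else None (hypothetical helper)."""
    if mask == 0:
        return None
    k = (mask & -mask).bit_length() - 1  # index of the lowest set bit
    return k if mask == (MASK64 << k) & MASK64 else None


def is_zero_via_and(x, mask):
    # Models movabsq + testq: AND with the 64-bit mask, compare with zero.
    return (x & mask) == 0


def is_zero_via_shift(x, k):
    # Models shrq $k: shift out the low k bits, compare what remains with zero.
    return ((x & MASK64) >> k) == 0


mask = 0xFFFFFE0000000000  # mask from the commit message (bits 41..63)
k = high_mask_shift(mask)
for x in (0, 1, (1 << 41) - 1, 1 << 41, 1 << 63, MASK64):
    assert is_zero_via_and(x, mask) == is_zero_via_shift(x, k)
```

Both tests agree because the mask keeps exactly the bits that survive a logical right shift by k, so the zero flag is the same while the 64-bit constant no longer has to be materialized in a register.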


8874 Commits