llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 20:43:44 +02:00

Author	SHA1	Message	Date
Andrea Di Biagio	4de2e09295	[x86] Fix AVX maskload/store intrinsic prototypes. The mask value type for maskload/maskstore GCC builtins is never a vector of packed floats/doubles. This patch fixes the following issues: 1. The mask argument for builtin_ia32_maskloadpd and builtin_ia32_maskstorepd should be of type llvm_v2i64_ty and not llvm_v2f64_ty. 2. The mask argument for builtin_ia32_maskloadpd256 and builtin_ia32_maskstorepd256 should be of type llvm_v4i64_ty and not llvm_v4f64_ty. 3. The mask argument for builtin_ia32_maskloadps and builtin_ia32_maskstoreps should be of type llvm_v4i32_ty and not llvm_v4f32_ty. 4. The mask argument for builtin_ia32_maskloadps256 and builtin_ia32_maskstoreps256 should be of type llvm_v8i32_ty and not llvm_v8f32_ty. Differential Revision: http://reviews.llvm.org/D13776 llvm-svn: 250817	2015-10-20 11:20:13 +00:00
Simon Pilgrim	838c618b88	[X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead. LLVM counterpart to D12835 Differential Revision: http://reviews.llvm.org/D13002 llvm-svn: 248368	2015-09-23 08:48:33 +00:00
Simon Pilgrim	162ca60e5b	[X86][SSE] Intrinsics builtins test refresh. NFCI llvm-svn: 248122	2015-09-20 15:41:35 +00:00
Sanjay Patel	d7195a62c2	[X86, AVX] add an exedepfix entry for vmovq == vmovlps == vmovlpd This is the AVX extension of r235014: http://llvm.org/viewvc/llvm-project?view=revision&revision=235014 Review: http://reviews.llvm.org/D8691 llvm-svn: 235210	2015-04-17 17:02:37 +00:00
Sanjay Patel	fa74d9a602	Use utils/update_llc_test_checks.py to update all CHECKs The checks here were so vague that we could nuke intrinsics from existence and still pass the test because we'd match the function name. llvm-svn: 232647	2015-03-18 16:38:44 +00:00
Sanjay Patel	e60e76fab6	fixed to test features, not CPU model The 'vmovntdq' was only passing due to a fluke in SandyBridge codegen that splits 32-byte stores in half, but that meant that the test was not correctly checking for the 32-byte store that we thought we were generating. The lax checking in this file will be addressed in another commit. There are bigger problems here. llvm-svn: 232644	2015-03-18 16:07:10 +00:00
Sanjay Patel	aa9ea9aae7	[X86, AVX] replace vextractf128 intrinsics with generic shuffles Now that we've replaced the vinsertf128 intrinsics, do the same for their extract twins. This is very much like D8086 (checked in at r231794): We want to replace as much custom x86 shuffling via intrinsics as possible because pushing the code down the generic shuffle optimization path allows for better codegen and less complexity in LLVM. This is also the LLVM sibling to the cfe D8275 patch. Differential Revision: http://reviews.llvm.org/D8276 llvm-svn: 232045	2015-03-12 15:15:19 +00:00
Sanjay Patel	5c62e16cdb	[X86, AVX] replace vinsertf128 intrinsics with generic shuffles We want to replace as much custom x86 shuffling via intrinsics as possible because pushing the code down the generic shuffle optimization path allows for better codegen and less complexity in LLVM. This is the sibling patch for the Clang half of this change: http://reviews.llvm.org/D8088 Differential Revision: http://reviews.llvm.org/D8086 llvm-svn: 231794	2015-03-10 16:08:36 +00:00
Craig Topper	c3d656cc84	[X86] Remove the blendpd/blendps/pblendw/pblendd intrinsics. They can represented by shuffle_vector instructions. llvm-svn: 230860	2015-02-28 19:33:17 +00:00
David Blaikie	ab043ff680	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
Craig Topper	398dc737fa	[X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles. llvm-svn: 229640	2015-02-18 06:24:44 +00:00
Craig Topper	eaf6d626b1	[X86] Remove int_x86_sse2_psll_dq_bs and int_x86_sse2_psrl_dq_bs intrinsics. The builtins aren't used by clang. llvm-svn: 229069	2015-02-13 06:07:24 +00:00
Chandler Carruth	2da477ebfb	[x86] Remove some windows line endings that snuck into the tests here. Folks on Windows, remember to set up your subversion to strip these when submitting... llvm-svn: 225593	2015-01-11 01:36:20 +00:00
Simon Pilgrim	6a516dfd91	[X86][SSE] pslldq/psrldq shuffle mask decodes Patch to provide shuffle decodes and asm comments for the sse pslldq/psrldq SSE2/AVX2 byte shift instructions. Differential Revision: http://reviews.llvm.org/D5598 llvm-svn: 219738	2014-10-14 22:31:34 +00:00
Chandler Carruth	5063f25595	[x86] Enable the new vector shuffle lowering by default. Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidently or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was extremely little support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and many others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046	2014-10-04 03:52:55 +00:00
Chandler Carruth	5b09348e8e	[x86] Fix a pretty horrible bug and inconsistency in the x86 asm parsing (and latent bug in the instruction definitions). This is effectively a revert of r136287 which tried to address a specific and narrow case of immediate operands failing to be accepted by x86 instructions with a pretty heavy hammer: it introduced a new kind of operand that behaved differently. All of that is removed with this commit, but the test cases are both preserved and enhanced. The core problem that r136287 and this commit are trying to handle is that gas accepts both of the following instructions: insertps $192, %xmm0, %xmm1 insertps $-64, %xmm0, %xmm1 These will encode to the same byte sequence, with the immediate occupying an 8-bit entry. The first form was fixed by r136287 but that broke the prior handling of the second form! =[ Ironically, we would still emit the second form in some cases and then be unable to re-assemble the output. The reason why the first instruction failed to be handled is because prior to r136287 the operands ere marked 'i32i8imm' which forces them to be sign-extenable. Clearly, that won't work for 192 in a single byte. However, making thim zero-extended or "unsigned" doesn't really address the core issue either because it breaks negative immediates. The correct fix is to make these operands 'i8imm' reflecting that they can be either signed or unsigned but must be 8-bit immediates. This patch backs out r136287 and then changes those places as well as some others to use 'i8imm' rather than one of the extended variants. Naturally, this broke something else. The custom DAG nodes had to be updated to have a much more accurate type constraint of an i8 node, and a bunch of Pat immediates needed to be specified as i8 values. The fallout didn't end there though. We also then ceased to be able to match the instruction-specific intrinsics to the instructions so modified. Digging, this is because they too used i32 rather than i8 in their signature. So I've also switched those intrinsics to i8 arguments in line with the instructions. In order to make the intrinsic adjustments of course, I also had to add auto upgrading for the intrinsics. I suspect that the intrinsic argument types may have led everything down this rabbit hole. Pretty happy with the result. llvm-svn: 217310	2014-09-06 10:00:01 +00:00
Chandler Carruth	385e41a0b4	[x86] Add a much more powerful framework for combining x86 shuffle instructions in the legalized DAG, and leverage it to combine long sequences of instructions to PSHUFB. Eventually, the other x86-instruction-specific shuffle combines will probably all be driven out of this routine. But the real motivation is to detect after we have fully legalized and optimized a shuffle to the minimal number of x86 instructions whether it is profitable to replace the chain with a fully generic PSHUFB instruction even though doing so requires either a load from a constant pool or tying up a register with the mask. While the Intel manuals claim it should be used when it replaces 5 or more instructions (!!!!) my experience is that it is actually very fast on modern chips, and so I've gon with a much more aggressive model of replacing any sequence of 3 or more instructions. I've also taught it to do some basic canonicalization to special-purpose instructions which have smaller encodings than their generic counterparts. There are still quite a few FIXMEs here, and I've not yet implemented support for lowering blends with PSHUFB (where its power really shines due to being able to zero out lanes), but this starts implementing real PSHUFB support even when using the new, fancy shuffle lowering. =] llvm-svn: 214042	2014-07-27 01:15:58 +00:00
Adam Nemet	94b6f19596	[X86] Remove AVX1 vbroadcast intrinsics The corresponding CFE patch replaces these intrinsics with vector initializers in avxintrin.h. This patch removes the LLVM intrinsics from the backend. We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering that further to the intrinsics. The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue to use intrinsics. As explained in the CFE patch, the reason is that we currently don't generate as good code for them without the intrinsics. CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change. It checks that for a series of insertelements we generate the appropriate vbroadcast instruction. Also verified that there was no assembly change in the test-suite before and after this patch. llvm-svn: 209864	2014-05-29 23:35:36 +00:00
Nadav Rotem	fc6e3e3272	X86: Prefer using VPSHUFD over VPERMIL because it has better throughput. llvm-svn: 169624	2012-12-07 19:01:13 +00:00
Craig Topper	e907c5dd53	Remove intrinsic specific instructions for (V)MOVQUmr with patterns pointing to the normal instructions. llvm-svn: 169482	2012-12-06 07:31:16 +00:00
Craig Topper	f424da6ff9	Cleanup pcmp(e/i)str(m/i) instruction definitions and load folding support. llvm-svn: 167652	2012-11-10 01:23:36 +00:00
Craig Topper	dc1b95e7de	Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305. llvm-svn: 161318	2012-08-06 06:22:36 +00:00
Craig Topper	7a2d94e7d0	Update test to check for r161305 llvm-svn: 161307	2012-08-05 09:06:28 +00:00
Craig Topper	c6bd90e646	Add intrinsic for pclmulqdq instruction. llvm-svn: 157731	2012-05-31 04:37:40 +00:00
Craig Topper	77b1a4cee5	Remove 256-bit AVX non-temporal store intrinsics. Similar was previously done for 128-bit. llvm-svn: 156375	2012-05-08 06:58:15 +00:00
Craig Topper	448790d566	Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions. llvm-svn: 154580	2012-04-12 07:23:00 +00:00
Elena Demikhovsky	87a6e08d3a	Fixed a bug in printing "cmp" pseudo ops. > This IR code > %res = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a0, <8 x float> %a1, i8 14) > fails with assertion: > > llc: X86ATTInstPrinter.cpp:62: void llvm::X86ATTInstPrinter::printSSECC(const llvm::MCInst, unsigned int, llvm::raw_ostream&): Assertion `0 && "Invalid ssecc argument!"' failed. > 0 llc 0x0000000001355803 > 1 llc 0x0000000001355dc9 > 2 libpthread.so.0 0x00007f79a30575d0 > 3 libc.so.6 0x00007f79a23a1945 gsignal + 53 > 4 libc.so.6 0x00007f79a23a2f21 abort + 385 > 5 libc.so.6 0x00007f79a239a810 __assert_fail + 240 > 6 llc 0x00000000011858d5 llvm::X86ATTInstPrinter::printSSECC(llvm::MCInst const, unsigned int, llvm::raw_ostream&) + 119 I added the full testing for all possible pseudo-ops of cmp. I extended X86AsmPrinter.cpp and X86IntelInstPrinter.cpp. You'l also see lines alignments (unrelated to this fix) in X86IselLowering.cpp from my previous check-in. llvm-svn: 150068	2012-02-08 08:37:26 +00:00
Craig Topper	2b764de6ab	Remove pcmpgt/pcmpeq intrinsics as clang is not using them. llvm-svn: 149367	2012-01-31 06:52:44 +00:00
Craig Topper	f7c9bf17dd	Allow CRC32 instructions to be selected when AVX is enabled. llvm-svn: 147411	2012-01-01 19:51:58 +00:00
Craig Topper	d8ae2d9f27	Fix sfence, lfence, mfence, and clflush to be able to be selected when AVX is enabled. Fix monitor and mwait to require SSE3 or AVX, previously they worked even if SSE3 was disabled. Make prefetch instructions not set the execution domain since they don't use XMM registers. llvm-svn: 147409	2012-01-01 19:40:22 +00:00
Craig Topper	8b05e7d035	Fix a bunch of SSE/AVX patterns to use v2i64/v4i64 loads since all other integer vector loads are promoted to those. llvm-svn: 145927	2011-12-06 09:04:59 +00:00
Craig Topper	aca91b9f14	Fix VINSERTF128/VEXTRACTF128 to be marked as FP instructions. Allow execution dependency fix pass to convert them to their integer equivalents when AVX2 is enabled. llvm-svn: 145376	2011-11-29 05:37:58 +00:00
Chris Lattner	8067661775	remove some old autoupgrade logic llvm-svn: 145167	2011-11-27 06:10:54 +00:00
Craig Topper	a584521daa	Properly qualify AVX2 specific parts of execution dependency table. Also enable converting between 256-bit PS/PD operations when AVX1 is enabled. Fixes PR11370. llvm-svn: 144622	2011-11-15 05:55:35 +00:00
Jakob Stoklund Olesen	9380d5daff	Kill and collapse outstanding DomainValues. DomainValues that are only used by "don't care" instructions are now collapsed to the first possible execution domain after all basic blocks have been processed. This typically means the PS domain on x86. For example, the vsel_i64 and vsel_double functions in sse2-blend.ll are completely collapsed to the PS domain instead of containing a mix of execution domains created by isel. llvm-svn: 144037	2011-11-07 23:08:21 +00:00
Craig Topper	124b2fd08c	Add new X86 AVX2 VBROADCAST instructions. llvm-svn: 143612	2011-11-03 07:35:53 +00:00
Craig Topper	361c873b52	Fix operand type for x86 pmadd_ub_sw intrinsic. llvm-svn: 143455	2011-11-01 07:25:22 +00:00
Craig Topper	dbf10927d7	Fix operand type for int_x86_ssse3_phadd_sw_128 intrinsic llvm-svn: 143336	2011-10-31 07:16:37 +00:00
Craig Topper	e77289b243	Fix return type for X86 mpsadbw instrinsic. The instruction takes in a vector of 8-bit integers, but produces a vector of 16-bit integers. llvm-svn: 143313	2011-10-30 17:22:45 +00:00
Bill Wendling	0b9c16295a	As Dan pointed out, movzbl, movsbl, and friends are nicer than their alias (movzx/movsx) because they give more information. Revert that part of the patch. llvm-svn: 129498	2011-04-14 01:46:37 +00:00
Bill Wendling	d49591cf21	Have the X86 back-end emit the alias instead of what's being aliased. In most cases, it's much nicer and more informative reading the alias. llvm-svn: 129497	2011-04-14 01:11:51 +00:00
Bill Wendling	0984f4927e	Reapply r129401 with patch for clang. llvm-svn: 129419	2011-04-13 00:36:11 +00:00
Bill Wendling	f6446a0961	Revert r129401 for now. Clang is using the old way of doing things. llvm-svn: 129403	2011-04-12 22:59:27 +00:00
Bill Wendling	f9c9d3e05b	Remove the unaligned load intrinsics in favor of using native unaligned loads. Now that we have a first-class way to represent unaligned loads, the unaligned load intrinsics are superfluous. First part of <rdar://problem/8460511>. llvm-svn: 129401	2011-04-12 22:46:31 +00:00
Chris Lattner	297259f6f1	improve the setcc -> setcc_carry optimization to happen more consistently by moving it out of lowering into dag combine. Add some missing patterns for matching away extended versions of setcc_c. llvm-svn: 122201	2010-12-19 22:08:31 +00:00
Dale Johannesen	b78530f9b0	Fix pastos in handling of AVX cvttsd2si, PR8491. Bruno, please review, but I'm pretty sure this is right. Patch by Alex Mac! llvm-svn: 117514	2010-10-28 00:35:54 +00:00
Dale Johannesen	2f4f8f5705	Remove the rest of the nonexistent 64-bit AVX instructions. Bruno, please review. llvm-svn: 113014	2010-09-03 21:23:00 +00:00
Bruno Cardoso Lopes	f91bd70e9a	AVX doesn't support mm operations neither its instrinsics. The AVX versions of PALIGN and PABS* should only exist for 128-bit. Remove the unnecessary stuff. llvm-svn: 112944	2010-09-03 02:08:45 +00:00
Bruno Cardoso Lopes	2051068483	Add testcases for all AVX 256-bit intrinsics added in the last couple days llvm-svn: 110854	2010-08-11 21:12:09 +00:00
Bruno Cardoso Lopes	fa19084e79	Reapply r109881 using a more strict command line for llc. llvm-svn: 110833	2010-08-11 17:39:23 +00:00

1 2

52 Commits