llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00

Author	SHA1	Message	Date
Chandler Carruth	3301dd2d10	[x86] Fix PR22524: the DAG combiner was incorrectly handling illegal nodes when folding bitcasts of constants. We can't fold things and then check after-the-fact whether it was legal. Once we have formed the DAG node, arbitrary other nodes may have been collapsed to it. There is no easy way to go back. Instead, we need to test for the specific folding cases we're interested in and ensure those are legal first. This could in theory make this less powerful for bitcasting from an integer to some vector type, but AFAICT, that can't actually happen in the SDAG so it's fine. Now, we only whitelist specific int->fp and fp->int bitcasts for post-legalization folding. I've added the test case from the PR. (Also as a note, this does not appear to be in 3.6, no backport needed) llvm-svn: 228656	2015-02-10 02:25:56 +00:00
David Majnemer	9d9e2660fd	X86: Emit an ABI compliant prologue and epilogue for Win64 Win64 has specific constraints on what valid prologues and epilogues look like. This constraint is born from the flexibility and descriptiveness of Win64's unwind opcodes. Prologues previously emitted by LLVM could not be represented by the unwind opcodes, preventing operations powered by stack unwinding from working successfully. Differential Revision: http://reviews.llvm.org/D7520 llvm-svn: 228641	2015-02-10 00:57:42 +00:00
Sanjay Patel	ac6e29e4d7	fixed to test features, not CPUs llvm-svn: 228581	2015-02-09 17:17:09 +00:00
Sanjay Patel	aab4929127	fix test attributes; this is an SSE2 test, not a Nehalem test llvm-svn: 228546	2015-02-08 21:14:27 +00:00
Sanjay Patel	b2034cf7b4	fix test attributes; this is an x86-64 test, not a Nehalem test llvm-svn: 228545	2015-02-08 21:10:40 +00:00
Sanjay Patel	46a1da8b0e	fix test attributes; these are SSE2 tests, not Nehalem tests llvm-svn: 228544	2015-02-08 21:05:03 +00:00
Sanjay Patel	091da2db42	fix test attributes; these are SSE2 tests, not Nehalem tests llvm-svn: 228541	2015-02-08 20:50:58 +00:00
Sanjay Patel	fea32fd7dc	fix test attributes; these are x86-64 tests, not Nehalem tests llvm-svn: 228536	2015-02-08 20:05:53 +00:00
Sanjay Patel	fe04920f0e	fix test attributes; these are MMX tests, not Nehalem tests llvm-svn: 228535	2015-02-08 20:01:12 +00:00
Sanjay Patel	880c5c4f36	fix test attributes; these are SSE2 tests, not Nehalem tests llvm-svn: 228534	2015-02-08 19:50:55 +00:00
Sanjay Patel	c25859ef62	generalize test; nothing Nehalem-specific here llvm-svn: 228532	2015-02-08 19:38:25 +00:00
Simon Pilgrim	c65658ff6d	[X86][AVX2] AVX2 broadcast + permute memory folding tests. llvm-svn: 228528	2015-02-08 18:33:13 +00:00
Simon Pilgrim	a28dacc86e	[X86][AVX2] AVX2 integer stack folding tests. This adds tests for the remaining AVX2 instructions that currently support memory folding. llvm-svn: 228513	2015-02-07 23:28:16 +00:00
Simon Pilgrim	8cad14c8ba	[X86][AVX] Added missing stack folding support + test for vptest ymm instruction llvm-svn: 228509	2015-02-07 21:44:06 +00:00
Simon Pilgrim	9af0a10c0c	[X86][SSE] Added missing stack folding tests for (v)mpsadbw instruction llvm-svn: 228506	2015-02-07 21:20:11 +00:00
Simon Pilgrim	7b6313462e	[X86] Force fp stack folding tests to keep to specific domain. General boolean instructions (AND, ANDN, OR, XOR) need to use a specific domain instruction (and not just the default). llvm-svn: 228495	2015-02-07 16:14:55 +00:00
Simon Pilgrim	87231791d9	[X86][AVX2] More AVX2 integer stack folding tests. llvm-svn: 228494	2015-02-07 16:07:27 +00:00
David Majnemer	639c882cbe	MC: Emit COFF section flags in the "proper" order COFF section flags are not idempotent: 'rd' will make a read-write section because 'd' implies write 'dr' will make a read-only section because 'r' disables write llvm-svn: 228490	2015-02-07 08:26:40 +00:00
Simon Pilgrim	a8d15b6f93	[X86][AVX2] Begun adding AVX2 integer stack folding tests. llvm-svn: 228462	2015-02-06 23:12:15 +00:00
Reid Kleckner	fd6f58826f	Don't dllexport declarations Fixes PR22488 llvm-svn: 228411	2015-02-06 17:59:49 +00:00
Matthias Braun	d7bdc2cc14	X86: Test cleanup Use FileCheck, make it more consistent and do not rely on unoptimized or(cmp,cmp) getting combined for max to be matched. llvm-svn: 228361	2015-02-05 23:52:12 +00:00
Ahmed Bougacha	fccf28b772	[CodeGen] Add hook/combine to form vector extloads, enabled on X86. The combine that forms extloads used to be disabled on vector types, because "None of the supported targets knows how to perform load and sign extend on vectors in one instruction." That's not entirely true, since at least SSE4.1 X86 knows how to do those sextloads/zextloads (with PMOVS/ZX). But there are several aspects to getting this right. First, vector extloads are controlled by a profitability callback. For instance, on ARM, several instructions have folded extload forms, so it's not always beneficial to create an extload node (and trying to match extloads is a whole 'nother can of worms). The interesting optimization enables folding of s/zextloads to illegal (splittable) vector types, expanding them into smaller legal extloads. It's not ideal (it introduces some legalization-like behavior in the combine) but it's better than the obvious alternative: form illegal extloads, and later try to split them up. If you do that, you might generate extloads that can't be split up, but have a valid ext+load expansion. At vector-op legalization time, it's too late to generate this kind of code, so you end up forced to scalarize. It's better to just avoid creating egregiously illegal nodes. This optimization is enabled unconditionally on X86. Note that the splitting combine is happy with "custom" extloads. As is, this bypasses the actual custom lowering, and just unrolls the extload. But from what I've seen, this is still much better than the current custom lowering, which does some kind of unrolling at the end anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added FIXME). Also note that the existing combine that forms extloads is now also enabled on legal vectors. This doesn't have a big effect on X86 (because sext+load is usually combined to sext_inreg+aextload). On ARM it fires on some rare occasions; that's for a separate commit. Differential Revision: http://reviews.llvm.org/D6904 llvm-svn: 228325	2015-02-05 18:31:02 +00:00
Andrew Trick	c08a22d8e6	X86 ABI fix for return values > 24 bytes. The return value's address must be returned in %rax. i.e. the callee needs to copy the sret argument (%rdi) into the return value (%rax). This probably won't manifest as a bug when the caller is LLVM-compiled code. But it is an ABI guarantee and tools expect it. llvm-svn: 228321	2015-02-05 18:09:05 +00:00
Bruno Cardoso Lopes	559b43d1de	[X86][MMX] Handle i32->mmx conversion using movd Implement a BITCAST dag combine to transform i32->mmx conversion patterns into an X86-specific node (MMX_MOVW2D) and guarantee that moves between i32 and x86mmx are better handled, i.e., don't use store-load to do the conversion. llvm-svn: 228293	2015-02-05 13:23:07 +00:00
Bruno Cardoso Lopes	8a49236598	[X86][MMX] Add several bitcast tests Avoid regression in previously supported MMX code by adding different combinations of tests which exercise MMX bitcasts. Small improvements to these patterns should come next. llvm-svn: 228292	2015-02-05 13:22:57 +00:00
Rafael Espindola	b9eab9993a	Don't try to make sections in comdats SHF_MERGE. Parts of llvm were not expecting it and we wouldn't print the entity size of the section. Given what comdats are used for, having SHF_MERGE sections would be just a small improvement, so just disable it for now. Fixes pr22463. llvm-svn: 228196	2015-02-04 21:27:24 +00:00
Michael Kuperstein	69b881344b	Fixes a bug in vector load legalization that confused bits and bytes. Differential Revision: http://reviews.llvm.org/D7400 llvm-svn: 228168	2015-02-04 18:54:01 +00:00
Chandler Carruth	99f7e3a3dd	[x86] Give movss and movsd execution domains in the x86 backend. This associates movss and movsd with the packed single and packed double execution domains (resp.). While this is largely cosmetic, as we now don't have weird ping-pong-ing between single and double precision, it is also useful because it avoids the domain fixing algorithm from seeing domain breaks that don't actually exist. It will also be much more important if we have an execution domain default other than packed single, as that would cause us to mix movss and movsd with integer vector code on a regular basis, a very bad mixture. llvm-svn: 228135	2015-02-04 10:58:53 +00:00
Chandler Carruth	7381b41ffe	[x86] Remove a low-value test that was just checking how we cleared a register. We have lots of tests covering this. llvm-svn: 228133	2015-02-04 10:47:34 +00:00
Chandler Carruth	d418a34d82	[x86] Mechanically update a bunch of tests' check lines using the latest version of the script. Changes include: - Using the VEX prefix - Skipping more detail when we have useful shuffle comments to match - Matching more shuffle comments that have been added to the printer (yay!) - Matching the destination registers of some AVX instructions - Stripping trailing whitespace that crept in - Fixing indentation issues Nothing interesting going on here. I'm just trying really hard to ensure these changes don't show up in the diffs with actual changes to the backend. llvm-svn: 228132	2015-02-04 10:46:53 +00:00
Chandler Carruth	bc1e3c9278	[x86] Include the destination register in the check-lines for AVX instructions. No actual change here. llvm-svn: 228127	2015-02-04 09:18:27 +00:00
Chandler Carruth	4044a6e4ff	[x86] Add some tests I missed in the prior commit to cover blends with zero for v8i16 as well. These exhibit the same domain badness, but also exhibit other weaknesses in our blend lowering. More fixes to come. llvm-svn: 228126	2015-02-04 09:15:46 +00:00
Chandler Carruth	61ac2c112b	[x86] Start to introduce bit-masking based blend lowering. This is the simplest form of bit-math based blending which only fires when we are blending with zero and is relatively profitable. I've only enabled this path on very specific lowering strategies. I'm planning to widen its applicability in subsequent patches, but so far you'll notice that even though we get fewer shufps instructions, we still do the bit math in the FP execution port. I'm looking into why this is still happening. llvm-svn: 228124	2015-02-04 09:06:05 +00:00
Chandler Carruth	55467700cf	[x86] Add tests for blends-with-zero on 4-element vectors. llvm-svn: 228122	2015-02-04 09:05:58 +00:00
Chandler Carruth	2b858b7b20	[x86] Refresh the checks of a number of tests using update_llc_test_checks.py. The exact format of the checks has changed over time. This includes different indenting rules, new shuffle comments that have been added, and more operand hiding behind regular expressions. No functional change to the tests are expected here, but this will make subsequent patches have a clean diff as they change shuffle lowering. llvm-svn: 228097	2015-02-04 00:58:42 +00:00
Chandler Carruth	4d271d9dd1	[x86] Switch to using the long '--check-prefix' form which the update_llc_test_checks.py script uses, and refresh the checks in this test. No functionality changed here, just bringing this test up to work with automated updates using the python script. llvm-svn: 228096	2015-02-04 00:58:40 +00:00
Chandler Carruth	d489a3effe	[x86] Port this test to use utils/update_llc_test_checks.py. This will make it easy to update as I change some parts of the X86 backend, makes it more clear what instruction differences are introduced, and I find it makes it a bit easier to read as well. llvm-svn: 228095	2015-02-04 00:58:37 +00:00
Sanjay Patel	b1e7e01db7	improved CHECK llvm-svn: 228086	2015-02-04 00:24:06 +00:00
Simon Pilgrim	eee3b225d9	[X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2 Patch to match cases where shuffle masks can be reduced to bit shifts. Similar to byte shift shuffle matching from D5699. Differential Revision: http://reviews.llvm.org/D6649 llvm-svn: 228047	2015-02-03 21:58:29 +00:00
Chandler Carruth	e4646d63a8	[x86] Add two truly horrific test cases for the new vector shuffle lowering. I'm prepping patches to improve these, and this will let the delta of those patches show the improvement. =] llvm-svn: 228044	2015-02-03 21:56:28 +00:00
Chandler Carruth	7c1eb70e22	[x86] Update the indent and layout of some tests in this file. NFC This is just to remove noise from using the update_llc_test_checks script. llvm-svn: 228043	2015-02-03 21:56:24 +00:00
Chandler Carruth	0a9c0a2838	[x86] Tweak my update script to use test case function names starting with 'stress' to indicate that the specific output isn't interesting and relax them to only check the last instruction (a ret). I've updated the one test case that really uses this to name the one 'stress_test' which was actually producing output we can directly check. With this, the script doesn't introduce noise when run over the v16 test file. llvm-svn: 228033	2015-02-03 21:26:45 +00:00
Simon Pilgrim	6b1cbac334	[X86][SSE] Added general integer shuffle matching for MOVQ instruction This patch adds general shuffle pattern matching for the MOVQ zero-extend instruction (copy lower 64bits, zero upper) for all 128-bit integer vectors, it is added as a fallback test in lowerVectorShuffleAsZeroOrAnyExtend. llvm-svn: 228022	2015-02-03 20:09:18 +00:00
Simon Pilgrim	f6904ffa22	[X86][AVX2] Enabled shuffle matching for the AVX2 zero extension (128bit -> 256bit) vpmovzx* instructions. Differential Revision: http://reviews.llvm.org/D7251 llvm-svn: 228014	2015-02-03 19:34:09 +00:00
Rafael Espindola	b49c9ab599	Fix typo in test/CodeGen/X86/sibcall.ll (pr22331). llvm-svn: 228011	2015-02-03 19:20:26 +00:00
Sanjay Patel	dd58512572	Merge consecutive 16-byte loads into one 32-byte load (PR22329) This patch detects consecutive vector loads using the existing EltsFromConsecutiveLoads() logic. This fixes: http://llvm.org/bugs/show_bug.cgi?id=22329 This patch effectively reverts the tablegen additions of D6492 / http://reviews.llvm.org/rL224344 ...which in hindsight were a horrible hack. The test cases that were added with that patch are simply modified to load from varying offsets of a base pointer. These loads did not match the existing tablegen patterns. A happy side effect of doing this optimization earlier is that we can now fold the load into a math op where possible; this is shown in some of the updated checks in the test file. Differential Revision: http://reviews.llvm.org/D7303 llvm-svn: 228006	2015-02-03 18:54:00 +00:00
Sanjay Patel	fb26cf0017	Fix program crashes due to alignment exceptions generated for SSE memop instructions (PR22371). r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit. The commit log says that change did not affect anything, but that's not correct. That change allowed SSE instructions to have unaligned mem operands folded into math ops, and that's not allowed in the default specification for any SSE variant. The bug is exposed when compiling for an AVX-capable CPU that had this feature flag but without enabling AVX codegen. Another mistake in r224330 was not adding the feature flag to all AVX CPUs; the AMD chips were excluded. This is part of the fix for PR22371 ( http://llvm.org/bugs/show_bug.cgi?id=22371 ). This feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem". Changed the existing test case for the feature bit to reflect the new name and renamed the test file itself to better reflect the feature. Added runs to fold-vex.ll to check for the failing codegen. Note that the feature bit is not set by default on any CPU because it may require a configuration register setting to enable the enhanced unaligned behavior. llvm-svn: 227983	2015-02-03 17:13:04 +00:00
Sanjay Patel	35293871e5	Improve test to actually check for a folded load. This test was checking for lack of a "movaps" (an aligned load) rather than a "movups" (an unaligned load). It also included a store which complicated the checking. Add specific CPU runs to prevent subtarget feature flag overrides from inhibiting this optimization. llvm-svn: 227972	2015-02-03 15:37:18 +00:00
Bruno Cardoso Lopes	c6e5ceb6dd	[X86][MMX] Improve transfer from mmx to i32 Improve EXTRACT_VECTOR_ELT DAG combine to catch conversion patterns between x86mmx and i32 with more layers of indirection. Before: movq2dq %mm0, %xmm0 movd %xmm0, %eax After: movd %mm0, %eax llvm-svn: 227969	2015-02-03 14:46:49 +00:00
Alex Rosenberg	c81cb80d2a	Revert part of r227437 as it was unnecessary. Thanks to echristo for pointing this out. llvm-svn: 227897	2015-02-02 23:58:54 +00:00
Bruno Cardoso Lopes	b2e29f9645	[X86][MMX] Add tests for MMX extract element LLVM ToT produces poor MMX code compared to 3.5. However, part of the previous functionality can be achieved by using -x86-experimental-vector-widening-legalization. Add tests to be sure we don't regress again. llvm-svn: 227869	2015-02-02 22:00:48 +00:00
Bruno Cardoso Lopes	21e97803f6	[X86][MMX] Cleanup shuffle, bitcast and insert element tests - Merge MMX arg passing test files - Merge MMX bitcast, insert elt and shuffle tests llvm-svn: 227867	2015-02-02 21:56:11 +00:00
Sanjay Patel	963e49f948	fix typo llvm-svn: 227815	2015-02-02 17:47:30 +00:00
Michael Kuperstein	41ae9af2e3	[X86] Convert esp-relative movs of function arguments to pushes, step 2 This moves the transformation introduced in r223757 into a separate MI pass. This allows it to cover many more cases (not only cases where there must be a reserved call frame), and perform rudimentary call folding. It still doesn't have a heuristic, so it is enabled only for optsize/minsize, with stack alignment <= 8, where it ought to be a fairly clear win. (Re-commit of r227728) Differential Revision: http://reviews.llvm.org/D6789 llvm-svn: 227752	2015-02-01 16:56:04 +00:00
Michael Kuperstein	f73ce6a4c9	Revert r227728 due to bad line endings. llvm-svn: 227746	2015-02-01 16:15:07 +00:00
Michael Kuperstein	2f448f269c	[X86] Convert esp-relative movs of function arguments to pushes, step 2 This moves the transformation introduced in r223757 into a separate MI pass. This allows it to cover many more cases (not only cases where there must be a reserved call frame), and perform rudimentary call folding. It still doesn't have a heuristic, so it is enabled only for optsize/minsize, with stack alignment <= 8, where it ought to be a fairly clear win. Differential Revision: http://reviews.llvm.org/D6789 llvm-svn: 227728	2015-02-01 11:44:44 +00:00
Elena Demikhovsky	67c9d50a47	AVX2: Added 2 more tests for gather intrinsics. llvm-svn: 227718	2015-02-01 08:52:15 +00:00
Simon Pilgrim	45b04beea7	[X86][SSE] Shuffle mask decode support for zero extend, scalar float/double moves and integer load instructions This patch adds shuffle mask decodes for integer zero extends (pmovzx** and movq xmm,xmm) and scalar float/double loads/moves (movss/movsd). Also adds shuffle mask decodes for integer loads (movd/movq). Differential Revision: http://reviews.llvm.org/D7228 llvm-svn: 227688	2015-01-31 14:09:36 +00:00
Ahmed Bougacha	f27e4369af	[X86] Cleanup tabs in test vector-zext.ll. NFC. Some tests have tabs, some don't. In vector-[sz]ext.ll, space wins (well duh!). llvm-svn: 227615	2015-01-30 21:41:28 +00:00
Reid Kleckner	42ded7f3da	Win64: Put a REX_W prefix on all TAILJMP* instructions MSDN's x64 software conventions page says that this is one of the fixed list of legal epilogues: https://msdn.microsoft.com/en-us/library/tawsa7cb.aspx Presumably this is how the unwinder distinguishes epilogue jumps from in-function control flow. Also normalize the way we place "## TAILCALL" comments on such jumps. llvm-svn: 227611	2015-01-30 21:03:31 +00:00
Reid Kleckner	9f52049132	x86: Fix large model calls to __chkstk for dynamic allocas In the large code model, we now put __chkstk in %r11 before calling it. Refactor the code so that we only do this once. Simplify things by using __chkstk_ms instead of __chkstk on cygming. We already use that symbol in the prolog emission, and it simplifies our logic. Second half of PR18582. llvm-svn: 227519	2015-01-29 23:58:04 +00:00
Reid Kleckner	d9129cb1d5	Update comments to use unreachable instead of llvm.trap, as implemented now win64: Call __chkstk through a register with the large code model Fixes half of PR18582. True dynamic allocas will still have a CALL64pcrel32 which will fail. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D7267 llvm-svn: 227503	2015-01-29 22:33:00 +00:00
Robert Lougher	f135e638ae	[X86] Use single add/sub for large stack offsets For large stack offsets the compiler generates multiple immediate mode sub/add instructions in the prologue/epilogue. This patch makes the compiler place the final amount to be added/subtracted into a register, which is then added/subtracted with a single operation. Differential Revision: http://reviews.llvm.org/D7226 llvm-svn: 227458	2015-01-29 16:18:29 +00:00
Alex Rosenberg	8a41666e76	Make the test actually test what it's supposed to test. Add a test for the from memory variant of vcvtph2ps for 256-bit. llvm-svn: 227446	2015-01-29 15:19:54 +00:00
Alex Rosenberg	5e1a11c597	Cleanup a few tests on sse4a machines and FileCheckize along the way. llvm-svn: 227437	2015-01-29 13:31:32 +00:00
Rafael Espindola	8c74e872e9	Don't create multiple mergeable sections with -fdata-sections. ELF has support for sections that can be split into fixed size or null terminated entities. Since these sections can be split by the linker, it is not necessary to split them in codegen. This reduces the combined .o size in a llvm+clang build from 202,394,570 to 173,819,098 bytes. The time for linking clang with gold (on a VM, on a laptop) goes from 2.250089985 to 1.383001792 seconds. The flip side is the size of rodata in clang goes from 10,926,785 to 10,929,345 bytes. The increase seems to be because of http://sourceware.org/bugzilla/show_bug.cgi?id=17902. llvm-svn: 227431	2015-01-29 12:43:28 +00:00
Reid Kleckner	a789af4162	Add a Windows EH preparation pass that zaps resumes If the personality is not a recognized MSVC personality function, this pass delegates to the dwarf EH preparation pass. This chaining supports people on -windows-itanium or -windows-gnu targets. Currently this recognizes some personalities used by MSVC and turns resume instructions into traps to avoid link errors. Even if cleanups are not used in the source program, LLVM requires the frontend to emit a code path that resumes unwinding after an exception. Clang does this, and we get unreachable resume instructions. PR20300 covers cleaning up these unreachable calls to resume. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D7216 llvm-svn: 227405	2015-01-29 00:41:44 +00:00
Michael Kuperstein	139e0bbb66	[X86] Reduce some 32-bit imuls into lea + shl Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well. Differential Revision: http://reviews.llvm.org/D7196 llvm-svn: 227308	2015-01-28 14:08:22 +00:00
Michael Kuperstein	31413a17ef	[x32] Enable sibcall optimization on x32. This includes two things: 1) Fix TCRETURNdi and TCRETURN64di patterns to check the right thing (LP64 as opposed to target bitness). 2) Allow LEA64_32 in MatchingStackOffset. llvm-svn: 227307	2015-01-28 13:38:48 +00:00
Elena Demikhovsky	237f19f35f	AVX-512: Added FMA intrinsics with rounding mode By Asaf Badouh and Elena Demikhovsky Added special nodes for rounding: FMADD_RND, FMSUB_RND.. It will prevent merge between nodes with rounding and other standard nodes. llvm-svn: 227303	2015-01-28 10:21:27 +00:00
Quentin Colombet	5db4690593	Revert r227242 - Merge vector stores into wider vector stores (PR21711). This commit creates an infinite loop in DAG combine in the LLVM test-suite for aarch64 with mcpu=cyclone (just having neon may be enough to expose this). llvm-svn: 227272	2015-01-27 23:58:01 +00:00
Alexey Samsonov	fbc56e7e06	Revert "[x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector" This reverts commits r226953 and r226974. llvm-svn: 227248	2015-01-27 21:34:11 +00:00
Sanjay Patel	be6834ec4b	Merge vector stores into wider vector stores (PR21711) This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). The 'f3' test case in that report presents a situation where we have two 128-bit stores extracted from a 256-bit source vector. Instead of producing this: vmovaps %xmm0, (%rdi) vextractf128 $1, %ymm0, 16(%rdi) This patch merges the 128-bit stores into a single 256-bit store: vmovups %ymm0, (%rdi) Differential Revision: http://reviews.llvm.org/D7208 llvm-svn: 227242	2015-01-27 20:50:27 +00:00
Simon Pilgrim	8cbdc5422d	[X86][SSE] Float comparisons can sometimes be safely commuted For ordered, unordered, equal and not-equal tests, packed float and double comparison instructions can be safely commuted without affecting the results. This patch checks the comparison mode of the (v)cmpps + (v)cmppd instructions and commutes the result if it can. Differential Revision: http://reviews.llvm.org/D7178 llvm-svn: 227145	2015-01-26 22:29:24 +00:00
Simon Pilgrim	00b317ad36	[X86][PCLMUL] Enable commutation for PCLMUL instructions Patch to allow (v)pclmulqdq to be commuted - swaps the src registers and inverts the immediate (low/high) src mask. Differential Revision: http://reviews.llvm.org/D7180 llvm-svn: 227141	2015-01-26 22:00:18 +00:00
Simon Pilgrim	95cf90f418	Line endings fix. NFC. llvm-svn: 227138	2015-01-26 21:28:32 +00:00
Simon Pilgrim	718751993f	Line endings fix. NFC. llvm-svn: 227136	2015-01-26 21:15:42 +00:00
Bruno Cardoso Lopes	68ec9cc54a	[x86][MMX] Rename and cleanup tests: arith, intrinsics and shuffle - Rename mmx-builtins to mmx-intrinsics to match other intrinsic test naming. - Remove tests that duplicate functionality from mmx-intrinsics.ll. - Move arith related tests to mmx-arith.ll. - MMX related shuffle goes to vector-shuffle-mmx.ll. llvm-svn: 227130	2015-01-26 20:06:51 +00:00
Alex Rosenberg	6b9851d1e6	Use a different encoding for debugtrap on PS4. llvm-svn: 227116	2015-01-26 19:09:27 +00:00
Sanjay Patel	75e4c9502b	Model sqrtsd as a binary operation with one source operand tied to the destination (PR14221) This patch fixes the following miscompile: define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp { %0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind %a0 = extractelement <2 x double> %0, i32 0 %conv = fptrunc double %a0 to float %a1 = extractelement <2 x double> %0, i32 1 %conv3 = fptrunc double %a1 to float tail call void @callee2(float %conv, float %conv3) nounwind ret void } Current codegen: sqrtsd %xmm0, %xmm1 ## high element of %xmm1 is undef here xorps %xmm0, %xmm0 cvtsd2ss %xmm1, %xmm0 shufpd $1, %xmm1, %xmm1 cvtsd2ss %xmm1, %xmm1 ## operating on undef value jmp _callee This is a continuation of http://llvm.org/viewvc/llvm-project?view=revision&revision=224624 ( http://reviews.llvm.org/D6330 ) which was itself a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ). All of these patches are partial fixes for PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ); this should be the final patch needed to resolve that bug. Differential Revision: http://reviews.llvm.org/D6885 llvm-svn: 227111	2015-01-26 18:42:16 +00:00
Sanjay Patel	b7dee900e5	fix line-endings; NFC llvm-svn: 227095	2015-01-26 17:21:36 +00:00
Craig Topper	c3f12611ab	[X86] Change comparison immediate type to i8 in test cases for AVX512 floating point comparisons. The type was already changed in the definitions and was being auto upgraded to the new type. llvm-svn: 227064	2015-01-25 23:26:12 +00:00
Craig Topper	88781cf0d8	[X86] Use i8 immediate for comparison type on AVX512 packed integer instructions. This matches floating point equivalents. Includes autoupgrade support to convert old code. llvm-svn: 227063	2015-01-25 23:26:02 +00:00
Elena Demikhovsky	6889f421a2	AVX-512: Changes in operations on masks registers for KNL and SKX - Added KSHIFTB/D/Q for skx - Added KORTESTB/D/Q for skx - Fixed store operation for v8i1 type for KNL - Store size of v8i1, v4i1 and v2i1 are changed to 8 bits llvm-svn: 227043	2015-01-25 12:47:15 +00:00
Andrea Di Biagio	015201ba20	[DAG] Fix wrong canonicalization performed on shuffle nodes. This fixes a regression introduced by r226816. When replacing a splat shuffle node with a constant build_vector, make sure that the new build_vector has a valid number of elements. Thanks to Patrik Hagglund for reporting this problem and providing a small reproducible. llvm-svn: 227002	2015-01-24 11:54:29 +00:00
Reid Kleckner	66635592d6	Fix assertion when C++ EH filters are present in functions using SEH Should fix PR22305. llvm-svn: 226969	2015-01-23 23:51:25 +00:00
Bruno Cardoso Lopes	cdd0e7bf5f	[x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector Handle the poor codegen for i64/x86xmm->v2i64 (%mm -> %xmm) moves. Instead of using stack store/load pair to do the job, use scalar_to_vector directly, which in the MMX case can use movq2dq. This was the current behavior prior to improvements for vector legalization of extloads in r213897. This commit fixes the regression and as a side-effect also remove some unnecessary shuffles. In the new attached testcase, we go from: pshufw $-18, (%rdi), %mm0 movq %mm0, -8(%rsp) movq -8(%rsp), %xmm0 pshufd $-44, %xmm0, %xmm0 movd %xmm0, %eax ... To: pshufw $-18, (%rdi), %mm0 movq2dq %mm0, %xmm0 movd %xmm0, %eax ... Differential Revision: http://reviews.llvm.org/D7126 rdar://problem/19413324 llvm-svn: 226953	2015-01-23 22:44:16 +00:00
Reid Kleckner	f3d1116092	Classify functions by EH personality type rather than using the triple This mostly reverts commit r222062 and replaces it with a new enum. At some point this enum will grow at least for other MSVC EH personalities. Also beefs up the way we were sniffing the personality function. Previously we would emit the Itanium LSDA despite using __C_specific_handler. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6987 llvm-svn: 226920	2015-01-23 18:49:01 +00:00
Craig Topper	043ec61ef1	[x86] Change u8imm operands to always print as unsigned. This makes shuffle masks and the like make way more sense. llvm-svn: 226902	2015-01-23 08:00:59 +00:00
Simon Pilgrim	46d1b2f8de	[X86][AVX] Added (V)MOVDDUP / (V)MOVSLDUP / (V)MOVSHDUP memory folding + tests. Minor tweak now that D7042 is complete, we can enable stack folding for (V)MOVDDUP and do proper testing. Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP. llvm-svn: 226873	2015-01-22 22:39:59 +00:00
Simon Pilgrim	1165de73cd	Line endings fixes. NFC. llvm-svn: 226872	2015-01-22 22:27:37 +00:00
Simon Pilgrim	8bb4ad9673	[X86][SSE] Simplified PSUBUS tests Removed loops from PSUBUS tests - ensures folding is tested. Also renamed SSE2 tests SSSE3 to match cpu. This is a follow up commit agreed in http://reviews.llvm.org/D7094 llvm-svn: 226871	2015-01-22 22:19:58 +00:00
Ramkumar Ramachandra	550e92d3f7	Intrinsics: introduce llvm_any_ty aka ValueType Any Specifically, gc.result benefits from this greatly. Instead of: gc.result.int.* gc.result.float.* gc.result.ptr.* ... We now have a gc.result.* that can specialize to literally any type. Differential Revision: http://reviews.llvm.org/D7020 llvm-svn: 226857	2015-01-22 20:14:38 +00:00
Sanjay Patel	73374b7f96	merge consecutive stores of extracted vector elements (PR21711) This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. That patch was checked in at r224611, but reverted at r225031 because it caused a failure outside of the regression tests. The cause of the crash was not recognizing consecutive stores that have mixed source values (loads and vector element extracts), so this patch adds a check to bail out if any store value is not coming from a vector element extract. This patch also refactors the shared logic of the constant source and vector extracted elements source cases into a helper function. Differential Revision: http://reviews.llvm.org/D6850 llvm-svn: 226845	2015-01-22 18:21:26 +00:00
Michael Kuperstein	c63c75da0b	[DAGCombine] Produce better code for constant splats This solves PR22276. Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead. Differential Revision: http://reviews.llvm.org/D7093 Fixed recommit of r226811. llvm-svn: 226816	2015-01-22 13:07:28 +00:00
Michael Kuperstein	1c951c9953	Revert r226811, MSVC accepts code sane compilers don't. llvm-svn: 226814	2015-01-22 12:48:07 +00:00
Michael Kuperstein	91fb9ac13f	[DAGCombine] Produce better code for constant splats This solves PR22276. Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead. Differential Revision: http://reviews.llvm.org/D7093 llvm-svn: 226811	2015-01-22 12:37:23 +00:00
Elena Demikhovsky	4c1384e72f	Fixed a bug in type legalizer for masked load/store intrinsics. The problem occurs when after vectorization we have type <2 x i32>. This type is promoted to <2 x i64> and then requires additional efforts for expanding loads and truncating stores. I added EXPAND / TRUNCATE attributes to the masked load/store SDNodes. The code now contains additional shuffles. I've prepared changes in the cost estimation for masked memory operations, it will be submitted separately. llvm-svn: 226808	2015-01-22 12:07:59 +00:00
Elena Demikhovsky	cc330838cb	Fixed a bug in narrowing store operation. Type MVT::i1 became legal in KNL, but store operation can't be narrowed to this type, since the size of VT (1 bit) is not equal to its actual store size(8 bits). Added a test provided by David (dag@cray.com) llvm-svn: 226805	2015-01-22 09:39:08 +00:00
Reid Kleckner	82c69426ca	SEH: Finish writing the catch-all test case llvm-svn: 226768	2015-01-22 02:31:09 +00:00
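
Commit 139e0bbb66 above describes reducing a multiplication by a constant of the form k*2^c, with k in {3,5,9}, into a lea followed by a shl. The arithmetic behind that reduction can be sketched as follows. This is only an illustration of the identity, not the LLVM implementation: the function names, the 32-bit mask, and the fallback to a plain multiply are all assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit wraparound, as for imull

def decompose_lea_shl(multiplier):
    """Return (k, c) with multiplier == k * 2**c and k in {3, 5, 9}, else None.

    Hypothetical helper: strips factors of two, then checks the odd part.
    """
    if multiplier <= 0:
        return None
    c = 0
    while multiplier % 2 == 0:
        multiplier //= 2
        c += 1
    return (multiplier, c) if multiplier in (3, 5, 9) else None

def mul_via_lea_shl(x, multiplier):
    """Compute x * multiplier modulo 2**32 using the lea + shl shape if possible."""
    parts = decompose_lea_shl(multiplier)
    if parts is None:
        # No k * 2**c form: fall back to an ordinary multiply.
        return (x * multiplier) & MASK32
    k, c = parts
    # lea computes base + index * scale with scale in {2, 4, 8},
    # so x + x * (k - 1) covers k in {3, 5, 9}.
    lea = (x + x * (k - 1)) & MASK32
    # shl by c supplies the 2**c factor; c == 0 means the lea alone suffices.
    return (lea << c) & MASK32
```

For example, 40 = 5 * 2^3, so `mul_via_lea_shl(7, 40)` forms 7 + 7*4 = 35 and shifts it left by 3.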


5619 Commits