llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 13:33:37 +02:00

Author	SHA1	Message	Date
Sanjay Patel	4179b31ef6	typo in comment llvm-svn: 216609	2014-08-27 20:27:05 +00:00
Reid Kleckner	49b9ed4bd9	X86 MC: Handle instructions like fxsave that match multiple operand sizes Instructions like 'fxsave' and control flow instructions like 'jne' match any operand size. The loop I added to the Intel syntax matcher assumed that using a different size would give a different instruction. Now it handles the case where we get the same instruction for different memory operand sizes. This also allows us to remove the hack we had for unsized absolute memory operands, because we can successfully match things like 'jnz' without reporting ambiguity. Removing this hack uncovered a test case involving 'fadd' that was ambiguous. The memory operand could have been single or double precision. llvm-svn: 216604	2014-08-27 20:10:38 +00:00
Evgeniy Stepanov	bde2b50f23	Clang-format over X86AsmInstrumentation.* with LLVM style. r216536 mistakenly used -style=Google instead of LLVM. llvm-svn: 216543	2014-08-27 13:11:55 +00:00
Chandler Carruth	d55485e199	[x86] Fix a regression introduced with r213897 for 32-bit targets where we stopped efficiently lowering sextload using the SSE41 instructions for that operation. This is a consequence of a bad predicate I used thinking of the memory access needs. The code actually handles the cases where the predicate doesn't apply, and handles them much better. =] Simple fix and a test case added. Fixes PR20767. llvm-svn: 216538	2014-08-27 11:39:47 +00:00
Chandler Carruth	925d065888	[SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine. This combine is essentially combining target-specific nodes back into target independent nodes that it "knows" will be combined yet again by a target independent DAG combine into a different set of target-independent nodes that are legal (not custom though!) and thus "ok". This seems... deeply flawed. The crux of the problem is that we don't combine un-legalized shuffles that are introduced by legalizing other operations, and thus we don't see a very profitable combine opportunity. So the backend just forces the input to that combine to re-appear. However, for this to work, the conditions detected to re-form the unlegalized nodes must be exactly right. Previously, failing this would have caused poor code (if you're lucky) or a crasher when we failed to select instructions. After r215611 we would fall back into the legalizer. In some cases, this just "fixed" the crasher by producing bad code. But in the test case added it caused the legalizer and the dag combiner to iterate forever. The fix is to make the alignment checking in the x86 side of things match the alignment checking in the generic DAG combine exactly. This isn't really a satisfying or principled fix, but it at least makes the code work as intended. It also highlights that it would be nice to detect the availability of under aligned loads for a given type rather than bailing on this optimization. I've left a FIXME to document this. Original commit message for r215611 which covers the rest of the change: [SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it stops being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this. llvm-svn: 216537	2014-08-27 11:22:16 +00:00
Evgeniy Stepanov	b7f5d6f168	Clang-format over X86AsmInstrumentation.*. llvm-svn: 216536	2014-08-27 11:10:54 +00:00
Robert Khasanov	a051690c52	[SKX] Added new versions of cmp instructions in avx512_icmp_cc multiclass, added VL multiclass. Added encoding tests llvm-svn: 216532	2014-08-27 09:34:37 +00:00
Elena Demikhovsky	fec7633c50	AVX-512: Added intrinsic for VMOVSS store form with mask. llvm-svn: 216530	2014-08-27 07:38:43 +00:00
Reid Kleckner	363b599ec0	MC: Split the x86 asm matcher implementations by dialect The existing matcher has lots of AT&T assembly dialect assumptions baked into it. In particular, the hack for resolving the size of a memory operand by appending the four most common suffixes doesn't work at all. The Intel assembly dialect mnemonic table has ambiguous entries, so we need to try matching multiple times with different operand sizes, since that's the only way to choose different instruction variants. This makes us more compatible with gas's implementation of Intel assembly syntax. MSVC assumes you want byte-sized operations for the instructions that we reject as ambiguous. Reviewed By: grosbach Differential Revision: http://reviews.llvm.org/D4747 llvm-svn: 216481	2014-08-26 20:32:34 +00:00
Chandler Carruth	c34cce2ae1	[x86] Fix a bug in r216319 where I was missing a 'break'. This actually was caught by existing tests but those tests were disabled with an XFAIL because of PR20736. While working on fixing that, I noticed the test failure, and tracked it down to this. We even have a really nice Clang warning that would have caught this but it isn't enabled in LLVM! =[ I may look at enabling it. llvm-svn: 216391	2014-08-25 18:06:11 +00:00
Robert Khasanov	4316b2ca5f	[SKX] avx512_icmp_packed multiclass extension Extended avx512_icmp_packed multiclass by masking versions. Added avx512_icmp_packed_rmb multiclass for embedded broadcast versions. Added corresponding _vl multiclasses. Added encoding tests for CPCMP{EQ|GT}* instructions. Add more fields for X86VectorVTInfo. Added AVX512VLVectorVTInfo that include X86VectorVTInfo for 512/256/128-bit versions Differential Revision: http://reviews.llvm.org/D5024 llvm-svn: 216383	2014-08-25 14:49:34 +00:00
Karthik Bhat	d94045aa5a	Allow vectorization of division by uniform power of 2. This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates Cost model for Loop and SLP Vectorizer. The cost table is currently only updated for X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371	2014-08-25 04:56:54 +00:00
Craig Topper	c2e0ae6754	Use range based for loops to avoid needing to re-mention SmallPtrSet size. llvm-svn: 216351	2014-08-24 23:23:06 +00:00
Elena Demikhovsky	f1b643ba53	X86 intrinsics table - simplifies intrinsics lowering. The tables are initialized when X86TargetLowering object is created. llvm-svn: 216345	2014-08-24 09:19:56 +00:00
Chandler Carruth	090cb1bc30	[x86] Start fixing a really subtle and terrible form of miscompile in these DAG combines. The DAG auto-CSE thing is truly terrible. Due to it, when RAUW-ing a node with its operand, you can cause its uses to CSE to itself, which then causes their uses to become your uses which causes them to be picked up by the RAUW. For nodes that are determined to be "no-ops", this is "fine". But if the RAUW is one of several steps to enact a transformation, this causes the DAG to really silently eat and discard nodes that you would never expect. It took days for me to actually pinpoint a test case triggering this and a really frustrating amount of time to even comprehend the bug because I never even thought about the ability of RAUW to iteratively consume nodes due to CSE-ing them into itself. To fix this, we have to build up a brand-new chain of operations any time we are combining across (potentially) intervening nodes. But once the logic is added to do this, another issue surfaces: CombineTo eagerly deletes the one node combined, but no others. This is... really frustrating. If deleting it makes its operands become dead, those operand nodes often won't go onto the worklist in the order you would want -- they're already on it and not near the top. That means things higher on the worklist will get combined prior to these dead nodes being GCed out of the worklist, and if the chain is long, the immediate users won't be enough to re-detect where the root of the chain is that became single-use again after deleting the dead nodes. The better way to do this is to never immediately delete nodes, and instead to just enqueue them so we can recursively delete them. The combined-from node is typically not on the worklist anyways by virtue of having been popped off.... But that in turn breaks other tests that require CombineTo to delete unused nodes. :: sigh :: Fortunately, there is a better way. This whole routine should have been returning the replacement rather than using CombineTo which is quite hacky. Switch to that, and all the pieces fall together. I suspect the same kind of miscompile is possible in the half-shuffle folding code, and potentially the recursive folding code. I'll be switching those over to a pattern more like this one for safety's sake even though I don't immediately have any test cases for them. Note that the only way I got a test case for this instance was with heavily DAG combined 256-bit shuffle sequences generated by my fuzzer. ;] llvm-svn: 216319	2014-08-23 10:25:15 +00:00
Reid Kleckner	60c1740ad6	ARM / x86_64 varargs: Don't save regparms in prologue without va_start There's no need to do this if the user doesn't call va_start. In the future, we're going to have thunks that forward these register parameters with musttail calls, and they won't need these spills for handling va_start. Most of the test suite changes are adding va_start calls to existing tests to keep things working. llvm-svn: 216294	2014-08-22 21:59:26 +00:00
Duncan P. N. Exon Smith	9728ae62bf	Revert "X86: Align the stack on word boundaries in LowerFormalArguments()" This (mostly) reverts commit r216119. Somewhere during the review Reid committed r214980 which fixed this another way, and I neglected to check that the testcase still failed before committing. I've left test/CodeGen/X86/aligned-variadic.ll around in case it adds extra coverage. llvm-svn: 216246	2014-08-21 23:36:08 +00:00
Philip Reames	7265b924e1	Minor refactor to make applying patches from 'Add a "probe-stack" attribute' review thread out of order easier. llvm-svn: 216241	2014-08-21 22:53:49 +00:00
Philip Reames	27065e1dff	Whitespace change to reduce diff in future patch. Patch 2 of 11 in 'Add a "probe-stack" attribute' review thread Patch by: john.kare.alsaker@gmail.com llvm-svn: 216235	2014-08-21 22:19:16 +00:00
Philip Reames	5ea14801e1	[X86] Split out the logic to select the stack probe function (NFC) Patch 1 of 11 in 'Add a "probe-stack" attribute' review thread. Patch by: <john.kare.alsaker@gmail.com> llvm-svn: 216233	2014-08-21 22:15:20 +00:00
Adam Nemet	ab33858cc4	[AVX512] Add class to group common template arguments related to vector type We discussed the issue of generality vs. readability of the AVX512 classes recently. I proposed this approach to try to hide and centralize the mappings we commonly perform based on the vector type. A new class X86VectorVTInfo captures these. The idea is to pass an instance of this class to classes/multiclasses instead of the corresponding ValueType. Then the class/multiclass can use its field for things that derive from the type rather than passing all those as separate arguments. I modified avx512_valign to demonstrate this new approach. As you can see instead of 7 related template parameters we now have one. The downside is that we have to refer to fields for the derived values. I named the argument '_' in order to make this as invisible as possible. Please let me know if you absolutely hate this. (Also once we allow local initializations in multiclasses we can recover the original version by assigning the fields to local variables.) Another possible use-case for this class is to directly map things, e.g.: RegisterClass KRC = X86VectorVTInfo<32, i16>.KRC llvm-svn: 216209	2014-08-21 19:50:07 +00:00
Josh Klontz	7806ab827f	X86AsmPrinter MCJIT MSVC bug fix. Summary: This bug was introduced in r213006 which makes an assumption that MCSection is COFF for Windows MSVC. This assumption is broken for MCJIT users where ELF is used instead [1]. The fix is to change the MCSection cast to a dyn_cast. [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-December/068407.html. Reviewers: majnemer Reviewed By: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4872 llvm-svn: 216173	2014-08-21 12:55:27 +00:00
Benjamin Kramer	c0369ec35d	X86: Turn redundant if into an assertion. While there remove noop casts. llvm-svn: 216168	2014-08-21 10:31:37 +00:00
Robert Khasanov	84f9d2664b	[x86] Added _addcarry_ and _subborrow_ intrinsics llvm-svn: 216164	2014-08-21 09:43:43 +00:00
Robert Khasanov	350e87272b	[x86] SMAP: added HasSMAP attribute for CLAC/STAC, corrected attributes llvm-svn: 216163	2014-08-21 09:34:12 +00:00
Robert Khasanov	3a54da967f	[x86] Broadwell: ADOX/ADCX. Added _addcarryx_u{32|64} intrinsics to LLVM. llvm-svn: 216162	2014-08-21 09:27:00 +00:00
Robert Khasanov	741d0742f0	[x86] Enable Broadwell target. Added FeatureSMAP. Broadwell ISA includes Haswell ISA + ADX + RDSEED + SMAP llvm-svn: 216161	2014-08-21 09:16:12 +00:00
Sanjay Patel	3effcd99a5	Don't prevent a vselect of constants from becoming a single load (PR20648). Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648 This patch checks the operands of a vselect to see if all values are constants. If yes, bail out of any further attempts to create a blend or shuffle because SelectionDAGLegalize knows how to turn this kind of vselect into a single load. This already happens for machines without SSE4.1, so the added checks just send more targets down that path. Differential Revision: http://reviews.llvm.org/D4934 llvm-svn: 216121	2014-08-20 20:34:56 +00:00
Duncan P. N. Exon Smith	305e4c4ae6	X86: Align the stack on word boundaries in LowerFormalArguments() The goal of the patch is to implement section 3.2.3 of the AMD64 ABI correctly. The controlling sentence is, "The size of each argument gets rounded up to eightbytes. Therefore the stack will always be eightbyte aligned." The equivalent sentence in the i386 ABI page 37 says, "At all times, the stack pointer should point to a word-aligned area." For both architectures, the stack pointer is not being rounded up to the nearest eightbyte or word between the last normal argument and the first variadic argument. Patch by Thomas Jablin! llvm-svn: 216119	2014-08-20 19:40:59 +00:00
Keno Fischer	1dbe2693fc	Do not insert a tail call when returning multiple values on X86 Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530. The problem is that X86ISelLowering erroneously thought the third call was eligible for tail call elimination. It would have been if its return value was actually the one returned by the calling function, but here that is not the case and additional values are being returned. Test Plan: Test case from the original bug report is included. Reviewers: rafael Reviewed By: rafael Subscribers: rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D4968 llvm-svn: 216117	2014-08-20 19:00:37 +00:00
Pavel Chupin	77b41f178f	[x32] Fix FrameIndex check in SelectLEA64_32Addr Summary: Fixes http://llvm.org/bugs/show_bug.cgi?id=20016 reproducible on new lea-5.ll case. Also use RSP/RBP for x32 lea to save 1 byte used for 0x67 prefix in ESP/EBP case. Test Plan: lea tests modified to include x32/nacl and new test added Reviewers: nadav, dschuff, t.p.northover Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D4929 llvm-svn: 216065	2014-08-20 11:59:22 +00:00
Juergen Ributzka	9c8880d176	Reapply [FastISel][X86] Add large code model support for materializing floating-point constants (r215595). Note: This was originally reverted to track down a buildbot error. Reapply without any modifications. Original commit message: In the large code model for X86 floating-point constants are placed in the constant pool and materialized by loading from it. Since the constant pool could be far away, a PC relative load might not work. Therefore we first materialize the address of the constant pool with a movabsq and then load from there the floating-point value. Fixes <rdar://problem/17674628>. llvm-svn: 216012	2014-08-19 19:44:13 +00:00
Juergen Ributzka	0d6f36970b	Reapply [FastISel][X86] Use XOR to materialize the "0" value (r215594). Note: This was originally reverted to track down a buildbot error. Reapply without any modifications. llvm-svn: 216011	2014-08-19 19:44:10 +00:00
Juergen Ributzka	496a8f883b	Reapply [FastISel][X86] Emit more efficient instructions for integer constant materialization (r215593). Note: This was originally reverted to track down a buildbot error. Reapply without any modifications. Original commit message: This mostly affects the i64 value type, which always resulted in a 15-byte movabsq instruction to materialize any constant. The custom code checks the value of the immediate and tries to use a different and smaller mov instruction when possible. This fixes <rdar://problem/17420988>. llvm-svn: 216010	2014-08-19 19:44:06 +00:00
Akira Hatanaka	857513b388	[X86, X87 stackifier] Do not mark an operand of a debug instruction as kill. <rdar://problem/16952634> llvm-svn: 215962	2014-08-19 02:09:57 +00:00
Quentin Colombet	191766f771	[X86][Haswell][SchedModel] Tidy up. <rdar://problem/15607571> llvm-svn: 215924	2014-08-18 17:56:01 +00:00
Quentin Colombet	35ae8395d0	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Other instructions. <rdar://problem/15607571> llvm-svn: 215923	2014-08-18 17:55:59 +00:00
Quentin Colombet	339e7a4ae7	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Logic instructions. <rdar://problem/15607571> llvm-svn: 215922	2014-08-18 17:55:56 +00:00
Quentin Colombet	a553451324	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Math instructions. <rdar://problem/15607571> llvm-svn: 215921	2014-08-18 17:55:53 +00:00
Quentin Colombet	d6c4c7ce9b	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Arithmetic instructions. <rdar://problem/15607571> llvm-svn: 215920	2014-08-18 17:55:51 +00:00
Quentin Colombet	7c1df6f078	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Conversion instructions. <rdar://problem/15607571> llvm-svn: 215919	2014-08-18 17:55:49 +00:00
Quentin Colombet	f82b53ca5a	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point XMM and YMM instructions. Sub-group: Move instructions. <rdar://problem/15607571> llvm-svn: 215918	2014-08-18 17:55:46 +00:00
Quentin Colombet	2138e9d6a6	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer MMX and XMM instructions. Sub-group: Other instructions. <rdar://problem/15607571> llvm-svn: 215917	2014-08-18 17:55:43 +00:00
Quentin Colombet	5564e8d426	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer MMX and XMM instructions. Sub-group: Logic instructions. <rdar://problem/15607571> llvm-svn: 215916	2014-08-18 17:55:41 +00:00
Quentin Colombet	1e0ae9ec68	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer MMX and XMM instructions. Sub-group: Arithmetic instructions. <rdar://problem/15607571> llvm-svn: 215915	2014-08-18 17:55:39 +00:00
Quentin Colombet	2e17eeecda	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer MMX and XMM instructions. Sub-group: Move instructions. <rdar://problem/15607571> llvm-svn: 215914	2014-08-18 17:55:36 +00:00
Quentin Colombet	5a5bf20c9d	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point x87 instructions. Sub-group: Math instructions. <rdar://problem/15607571> llvm-svn: 215913	2014-08-18 17:55:32 +00:00
Quentin Colombet	7cb8772661	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point x87 instructions. Sub-group: Arithmetic instructions. <rdar://problem/15607571> llvm-svn: 215912	2014-08-18 17:55:29 +00:00
Quentin Colombet	e9298615cc	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Floating Point x87 instructions. Sub-group: Move instructions. <rdar://problem/15607571> llvm-svn: 215911	2014-08-18 17:55:26 +00:00
Quentin Colombet	1f6b927d67	[X86][Haswell][SchedModel] Add architecture specific scheduling models. Group: Integer instructions. Sub-group: Other instructions. <rdar://problem/15607571> llvm-svn: 215910	2014-08-18 17:55:23 +00:00

10605 Commits