llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00

Author	SHA1	Message	Date
Samuel Antao	83b3411742	Fix bug in GPR to FPR moves in PPC64LE. The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem. llvm-svn: 219441	2014-10-09 20:42:56 +00:00
Benjamin Kramer	4edebdbee3	Remove a compiler bug workaround from 2007. The affected versions of gcc are long gone. NFC. llvm-svn: 219433	2014-10-09 19:50:39 +00:00
Matt Arsenault	15f502932e	Fix typo llvm-svn: 219429	2014-10-09 19:15:15 +00:00
Tom Stellard	7496c3d0fc	R600/SI: Legalize CopyToReg during instruction selection The instruction emitter will crash if it encounters a CopyToReg node with a non-register operand like FrameIndex. llvm-svn: 219428	2014-10-09 19:06:00 +00:00
Lang Hames	e0ce993042	[PBQP] Add missing headers from r219421. llvm-svn: 219425	2014-10-09 18:36:59 +00:00
Lang Hames	1ac2927c37	[PBQP] Replace PBQPBuilder with composable constraints (PBQPRAConstraint). This patch removes the PBQPBuilder class and its subclasses and replaces them with a composable constraints class: PBQPRAConstraint. This allows constraints that are only required for optimisation (e.g. coalescing, soft pairing) to be mixed and matched. This patch also introduces support for target writers to supply custom constraints for their targets by overriding a TargetSubtargetInfo method: std::unique_ptr<PBQPRAConstraints> getCustomPBQPConstraints() const; This patch should have no effect on allocations. llvm-svn: 219421	2014-10-09 18:20:51 +00:00
Tom Stellard	ffb5a36502	R600/SI: Legalize INSERT_SUBREG instructions during PostISelFolding LLVM assumes INSERT_SUBREG will always have register operands, so we need to legalize non-register operands, like FrameIndexes, to avoid random assertion failures. llvm-svn: 219420	2014-10-09 18:09:15 +00:00
Bill Schmidt	ddf2d00a6b	[PPC64] VSX indexed-form loads use wrong instruction format The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x incorrectly use the XForm_1 instruction format, rather than the XX1Form instruction format. This is likely a pasto when creating these instructions, which were based on lvx and so forth. This patch uses the correct format. The existing reformatting test (test/MC/PowerPC/vsx.s) missed this because the two formats differ only in that XX1Form has an extension to the target register field in bit 31. The tests for these instructions used a target register of 7, so the default of 0 in bit 31 for XForm_1 didn't expose a problem. For register numbers 32-63 this would be noticeable. I've changed the test to use higher register numbers to verify my change is effective. llvm-svn: 219416	2014-10-09 17:51:35 +00:00
Kevin Qin	33566b5879	[AArch64] Enable partial & runtime unrolling on cortex-a57. llvm-svn: 219401	2014-10-09 10:13:27 +00:00
Robert Khasanov	625ba0e53e	[AVX512] Extended avx512_binop_rm for AVX512VL subsets. Added avx512_binop_rm_vl multiclass for VL subset Added encoding tests llvm-svn: 219390	2014-10-09 08:38:48 +00:00
Bob Wilson	c80c9c8124	Use triple's isiOS() and isOSDarwin() methods. These methods are already used in lots of places. This makes things more consistent. NFC. llvm-svn: 219386	2014-10-09 05:43:30 +00:00
Eric Christopher	f9e1101078	Remove unused argument to CreateTargetScheduleState and change the TargetMachine to a TargetSubtargetInfo since everything we wanted is off of that. llvm-svn: 219382	2014-10-09 01:59:35 +00:00
Adam Nemet	248c8e9281	[AVX512] Rename AVX512_masking* to AVX512_maskable* No functional change. This is the current AVX512_maskable multiclass hierarchy: maskable_custom / \ / \ maskable_common maskable_in_asm / \ / \ maskable maskable_3src llvm-svn: 219363	2014-10-08 23:25:39 +00:00
Adam Nemet	9fae2c02a0	[AVX512] Intrinsics for vextract*x4 This adds the Pat<>'s for the intrinsics. These are necessary because we don't lower these intrinsics to SDNodes but match them directly. See the rational in the previous commit. llvm-svn: 219362	2014-10-08 23:25:37 +00:00
Adam Nemet	671fc00888	[AVX512] Add asm-only support for vextractx4 masking variants These derive from the new asm-only masking definitions. Unfortunately I wasn't able to find a ISel pattern that we could legally generate for the masking variants. The problem is that since the destination is v4 we would need VK4 register classes and v4i1 value types to express the masking. These are however not legal types/classes in AVX512f but only in VL, so things get complicated pretty quickly. We can revisit this question later if we have a more pressing need to express something like this. So the ISel patterns are empty for the masking instructions and the next patch will add Pat<>s instead to match the intrinsics calls with instructions. llvm-svn: 219361	2014-10-08 23:25:33 +00:00
Adam Nemet	701fdeb2f8	[AVX512] Move DAG for all-zero node to X86VectorVTInfo No functional change. No change in X86.td.expanded except for the appearance of the new attributes. The new attributes will be used in the subsequent patch. llvm-svn: 219360	2014-10-08 23:25:31 +00:00
Adam Nemet	81fdcfd475	[AVX512] Peel off an asm-only class from AVX512_masking_common. No functional change. This enables the generation of masking instructions that don't provide a ISel pattern. llvm-svn: 219358	2014-10-08 23:25:23 +00:00
Robin Morisset	c53597d7c5	[X86] Don't transform atomic-load-add into an inc/dec when inc/dec is slow llvm-svn: 219357	2014-10-08 23:16:23 +00:00
Robin Morisset	2f2a857e2b	[X86] Avoid generating inc/dec when slow for x.atomic_store(1 + x.atomic_load()) Summary: I had forgotten to check for NotSlowIncDec in the patterns that can generate inc/dec for the above pattern (added in D4796). This currently applies to Atom Silvermont, KNL and SKX. Test Plan: New checks on atomic_mi.ll Reviewers: jfb, nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5677 llvm-svn: 219336	2014-10-08 19:38:18 +00:00
Robert Khasanov	6a848d2b84	[AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VPCMP/VPCMPU{BWDQ} Added CMP_MASK_CC intrinsic type. Added tests for intrinsics. Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> llvm-svn: 219316	2014-10-08 15:49:26 +00:00
Robert Khasanov	643ca10b43	[AVX512] Refactoring of avx512_binop_rm multiclass through AVX512_masking. Added new argrument for AVX512_masking: InstrItinClass and bit isCommutable. No functional change. llvm-svn: 219310	2014-10-08 14:37:45 +00:00
Renato Golin	23c290356f	Emit unaligned access build attribute for ARM Patch by Charlie Turner. llvm-svn: 219301	2014-10-08 12:26:22 +00:00
Renato Golin	89819e0935	Refactor isThumb1Only() && isMClass() into a predicate called isV6M() This must be enforced for all v6M cores, not just the cortex-m0, irregardless of the user-specified alignment. Patch by Charlie Turner. llvm-svn: 219300	2014-10-08 12:26:16 +00:00
Renato Golin	3465effae4	Simplify switch statement in ARM subtarget align access This switch can be reduced to a simpler if/else statement. Patch by Charlie Turner. llvm-svn: 219299	2014-10-08 12:26:13 +00:00
Eric Christopher	ce3f63df4d	Cache TargetLowering on SelectionDAGISel and update previous calls to getTargetLowering() with the cached variable. llvm-svn: 219284	2014-10-08 07:32:17 +00:00
Chad Rosier	75e17097bb	[AArch64] Generate vector signed/unsigned mul and mla/mls long. Phabricator Revision: http://reviews.llvm.org/D5589 Patch by Balaram Makam <bmakam@codeaurora.org>!! llvm-svn: 219276	2014-10-08 02:31:24 +00:00
Robin Morisset	a701fac7bc	[X86] Fix a bug with fetch_add(INT32_MIN) Summary: Fix pr21099 The pseudocode of what we were doing (spread through two functions) was: if (operand.doesNotFitIn32Bits()) Opc.initializeWithFoo(); if (operand < 0) operand = -operand; if (operand.doesFitIn8Bits()) Opc.initializeWithBar(); else if (operand.doesFitIn32Bits()) Opc.initializeWithBlah(); doStuff(Opc); So for operand == INT32_MIN, Opc was never initialized because the operand changes from fitting in 32 bits to not fitting, causing the various bugs/error messages noted by pr21099. This patch adds an extra test at the beginning for this case, and an llvm_unreachable to have better error message if the operand ends up not fitting in 32-bits at the end. Test Plan: new test + make check Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5655 llvm-svn: 219257	2014-10-07 23:53:57 +00:00
Tom Stellard	27b55923bb	R600/SI: Refactor VOP3 instruction defs llvm-svn: 219256	2014-10-07 23:51:41 +00:00
Tom Stellard	55377634e4	R600/SI: Refactor VOPC instruction defs llvm-svn: 219255	2014-10-07 23:51:39 +00:00
Tom Stellard	3b3e07ea49	R600/SI: Refactor VOP2 instruction defs llvm-svn: 219254	2014-10-07 23:51:38 +00:00
Tom Stellard	7f424810c6	R600/SI: Refactor VOP1 instruction defs llvm-svn: 219253	2014-10-07 23:51:34 +00:00
Matt Arsenault	78b9619d35	R600: Remove dead code llvm-svn: 219242	2014-10-07 21:29:56 +00:00
Tom Stellard	d5474640ef	R600: Remove some redundant initializations from AMDGPUMCAsmInfo llvm-svn: 219238	2014-10-07 21:09:25 +00:00
Tom Stellard	465af285b8	R600: Use MCAsmInfoELF as AMDGPUMCAsmInfo base class The main reason for this is that the MCAsmInfo class, which we were previously using as the base class, sets PrivateGlobalPrefix to "L", which causes all global functions that start with L to be treated as local symbols. MCAsmInfoELF sets PrivateGlobalPrefix to ".L", which is what we want, and it is probably a good idea to use this as the base class anyway, since we are emitting ELF binaries. llvm-svn: 219237	2014-10-07 21:09:23 +00:00
Tom Stellard	733db75950	R600/SI: Remove assertion in SIInstrInfo::areLoadsFromSameBasePtr() Added a FIXME coment instead, we need to handle the case where the two DS instructions being compared have different numbers of operands. llvm-svn: 219236	2014-10-07 21:09:20 +00:00
Yuri Gorshenin	e2f4949ea2	[asan-asm-instrumentation] CFI directives are generated for .S files. Summary: CFI directives are generated for .S files. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5520 llvm-svn: 219199	2014-10-07 11:03:09 +00:00
Daniel Sanders	1dd7a136d4	[mips] Return {f128} correctly for N32/N64. Summary: According to the ABI documentation, f128 and {f128} should both be returned in $f0 and $f2. However, this doesn't match GCC's behaviour which is to return f128 in $f0 and $f2, but {f128} in $f0 and $f1. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5578 llvm-svn: 219196	2014-10-07 09:29:59 +00:00
Craig Topper	645553b9fd	[X86] Fix a bug where the disassembler was ignoring the VEX.W bit in 32-bit mode for certain instructions it shouldn't. Unfortunately, this isn't easy to fix since there's no simple way to figure out from the disassembler tables whether the W-bit is being used to select a 64-bit GPR or if its a required part of the opcode. The fix implemented here just looks for "64" in the instruction name and ignores the W-bit in 32-bit mode if its present. Fixes PR21169. llvm-svn: 219194	2014-10-07 07:29:50 +00:00
Craig Topper	0d4a8e6f6c	Formatting fixes. Most putting 'else' on the same line as the preceding curly brace. llvm-svn: 219193	2014-10-07 07:29:48 +00:00
Craig Topper	114c41d35c	Fix filename in header and use C++ version of the C header files. llvm-svn: 219192	2014-10-07 07:29:46 +00:00
Juergen Ributzka	06c32d25cd	[FastISel][AArch64] Teach the address computation code to also fold sign-/zero-extends. The code already folds sign-/zero-extends, but only if they are arguments to mul and shift instructions. This extends the code to also fold them when they are direct inputs. llvm-svn: 219187	2014-10-07 03:40:06 +00:00
Juergen Ributzka	6619d4d4b3	[FastISel][AArch64] Teach the address computation to also fold sub instructions. Tiny enhancement to the address computation code to also fold sub instructions if the rhs is constant and can be folded into the offset. llvm-svn: 219186	2014-10-07 03:40:03 +00:00
Juergen Ributzka	d396480465	[FastISel][AArch64] Fix "Fold sign-/zero-extends into the load instruction." This commit fixes an issue with sign-/zero-extending loads that was discovered by Richard Barton. We use now the correct load instructions for sign-extending loads to 64bit. Also updated and added more unit tests. llvm-svn: 219185	2014-10-07 03:39:59 +00:00
NAKAMURA Takumi	0500a87f46	ARMInstPrinter.cpp: Suppress a warning for -Asserts. [-Wunused-variable] llvm-svn: 219172	2014-10-06 23:48:04 +00:00
Tim Northover	8dbc8e49a9	ARM: silence unused variable warning llvm-svn: 219128	2014-10-06 17:26:36 +00:00
Tim Northover	57191702fc	ARM: remove dead InstPrinting code This instruction form is handled by different AsmOperands now, so the code is completely dead (and wrong anyway). llvm-svn: 219127	2014-10-06 17:10:13 +00:00
Benjamin Kramer	55428105eb	X86: Drop the isConvertibleTo3Addr bit from shufps/shufpd now that we don't convert them anymore. llvm-svn: 219112	2014-10-06 09:56:40 +00:00
Eric Christopher	faca264c55	Add subtarget caches to aarch64, arm, ppc, and x86. These will make it easier to test further changes to the code generation and optimization pipelines as those are moved to subtargets initialized with target feature and target cpu. llvm-svn: 219106	2014-10-06 06:45:36 +00:00
Chandler Carruth	88628181c0	[x86] Remove the 2-addr-to-3-addr "optimization" from shufps to pshufd. This trades a (register-renamer-friendly) movaps for a floating point / integer domain cross. That is a very bad trade, even on architectures where domain crossing is relatively fast. On any chip where there is even a cycle stall, this is a Very Bad Idea. It doesn't even seem likely to cause a spill to be introduced because the reason for the copy is to destructively shuffle in place. Thanks to Ben Kramer for fixing a bug in this code that my new shuffle lowering exposed and highlighting that perhaps it should just go away. =] llvm-svn: 219090	2014-10-05 22:57:31 +00:00
Benjamin Kramer	fac673d6bc	X86: Don't drop half of the mask when converting 2-address shufps into 3-address pshufd. It's debatable whether this transform is useful at all, but for now make sure we don't generate invalid asm. llvm-svn: 219084	2014-10-05 16:14:29 +00:00
Elena Demikhovsky	0be5f4deeb	AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q. This instruction allows to broadacst mask vector to data vector. llvm-svn: 219083	2014-10-05 14:11:08 +00:00
Chandler Carruth	aa7f8c811b	[x86] Fix PR21139, one of the last remaining regressions found in the new vector shuffle lowering. This is loosely based on a patch by Marius Wachtler to the PR (thanks!). I refactored it a bi to use std::count_if and a mutable array ref but the core idea was exactly right. I also added some direct testing of this case. I believe PR21137 is now the only remaining regression. llvm-svn: 219081	2014-10-05 12:07:34 +00:00
Chandler Carruth	b8978f2ab2	[x86] Teach the new vector shuffle lowering how to lower 128-bit shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the few remaining regressions impacting benchmarks from the new vector shuffle lowering. You may note that it "regresses" many of the vperm2x128 test cases -- these were actually "improved" by the naive lowering that the new shuffle lowering previously did. This regression gave me fits. I had this patch ready-to-go about an hour after flipping the switch but wasn't sure how to have the best of both worlds here and thought the correct solution might be a completely different approach to lowering these vector shuffles. I'm now convinced this is the correct lowering and the missed optimizations shown in vperm2x128 are actually due to missing target-independent DAG combines. I've even written most of the needed DAG combine and will submit it shortly, but this part is ready and should help some real-world benchmarks out. llvm-svn: 219079	2014-10-05 11:41:36 +00:00
NAKAMURA Takumi	242f8cc95b	HexagonMCCodeEmitter.cpp: Prune 2nd redundant \brief. [-Wdocumentation] llvm-svn: 219073	2014-10-05 04:54:54 +00:00
NAKAMURA Takumi	ad3b39eff9	HexagonDesc: Update LLVMBuild.txt. llvm-svn: 219071	2014-10-05 04:54:29 +00:00
Benjamin Kramer	a961a36aad	[SystemZ] Make operator bool explicit. NFC. llvm-svn: 219069	2014-10-04 22:44:35 +00:00
Benjamin Kramer	860521c88b	Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it. llvm-svn: 219068	2014-10-04 22:44:29 +00:00
Benjamin Kramer	7db3ef45b9	Remove unnecessary copying or replace it with moves in a bunch of places. NFC. llvm-svn: 219061	2014-10-04 16:55:56 +00:00
Chandler Carruth	5063f25595	[x86] Enable the new vector shuffle lowering by default. Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidently or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was extremely little support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and many others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046	2014-10-04 03:52:55 +00:00
Jingyue Wu	4a186967a9	Add fake use to suppress defined-but-unused warnings llvm-svn: 219045	2014-10-04 03:50:10 +00:00
Chandler Carruth	7001fb9ace	[x86] Fix a bug in the VZEXT DAG combine that I just made more powerful. It turns out this combine was always somewhat flawed -- there are cases where nested VZEXT nodes can't be combined: if their types have a mismatch that can be observed in the result. While none of these show up in currently, once I switch to the new vector shuffle lowering a few test cases actually form such nested VZEXT nodes. I've not come up with any IR pattern that I can sensible write to exercise this, but it will be covered by tests once I flip the switch. llvm-svn: 219044	2014-10-04 02:51:03 +00:00
Chandler Carruth	b73b4f12a1	[x86] Sink a generic combine of VZEXT nodes from the lowering to VZEXT nodes to the DAG combining of them. This will allow the combine to fire on both old vector shuffle lowering and the new vector shuffle lowering and generally seems like a cleaner design. I've trimmed down the code a bit and tried to make it and the surrounding combine fairly clean while moving it around. llvm-svn: 219042	2014-10-04 01:05:48 +00:00
Matt Arsenault	c421684bad	R600/SI: Custom lower f64 -> i64 conversions llvm-svn: 219038	2014-10-03 23:54:56 +00:00
Matt Arsenault	7b24655980	R600: Custom lower [s\|u]int_to_fp for i64 -> f64 llvm-svn: 219037	2014-10-03 23:54:41 +00:00
Matt Arsenault	2456242394	R600/SI: Fix ftrunc f64 conformance failures. Re-add the tests since they were deleted at some point llvm-svn: 219036	2014-10-03 23:54:27 +00:00
Chandler Carruth	74c4b81b56	[x86] Add a really preposterous number of patterns for matching all of the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation. This then allows me to bail on the element insertion lowering path when we have SSE4.1 and are going to be doing a normal blend, which in turn restores the last of the blends lost from the new vector shuffle lowering when I got it to prioritize insertion in other cases (for example when we don't have a blend instruction). Without the patterns, using blends here would have regressed sse-scalar-fp-arith.ll completely with the new vector shuffle lowering. For completeness, I've added RUN-lines with the new lowering here. This is somewhat superfluous as I'm about to flip the default, but hey, it shows that this actually significantly changed behavior. The patterns I've added are just ridiculously repetative. Suggestions on making them better very much welcome. In particular, handling the commuted form of the v2f64 patterns is somewhat obnoxious. llvm-svn: 219033	2014-10-03 22:43:17 +00:00
Chandler Carruth	13d884e744	[x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available. For non-loads, blendps is much faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and three ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022	2014-10-03 21:38:49 +00:00
Richard Smith	e3953b4126	PR21145: Teach LLVM about C++14 sized deallocation functions. C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before. llvm-svn: 219014	2014-10-03 20:17:06 +00:00
Adam Nemet	3fe531df7b	[ISel] Keep matching state consistent when folding during X86 address match In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND which the code is prepared for but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again hitting an assert because the node is no longer a load even though this was checked before. Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it create a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction. I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.). So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here). The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case. Fixes rdar://problem/18206171 llvm-svn: 219009	2014-10-03 20:00:34 +00:00
Tom Stellard	e64393bd73	R600: Align functions to 256 bytes llvm-svn: 219002	2014-10-03 19:02:02 +00:00
Benjamin Kramer	4c9fb3d669	Eliminate some deep std::vector copies. NFC. llvm-svn: 218999	2014-10-03 18:33:16 +00:00
Robin Morisset	95772cea0c	[Power] Use lwsync for non-seq_cst fences Summary: hwsync is only required for seq_cst fences, acquire and release one can use the cheaper lwsync. Test Plan: Added some cases to atomics.ll + make check-all Reviewers: jfb, wschmidt Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5317 llvm-svn: 218995	2014-10-03 18:04:36 +00:00
Hans Wennborg	56b634e6ac	MipsAsmParser.cpp: fix VS2012 build llvm-svn: 218991	2014-10-03 17:16:24 +00:00
Hans Wennborg	52ee146fd6	HexagonMCCodeEmitter.h: deleted member functions are not supported in VS2012 llvm-svn: 218990	2014-10-03 17:02:28 +00:00
Daniel Sanders	208d0fa4ef	[mips] Print warning when using register names not available in N32/64 Summary: The register names t4-t7 are not available in the N32 and N64 ABIs. This patch prints a warning, when those names are used in N32/64, along with a fix-it with the correct register names. Patch by Vasileios Kalintiris Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5272 llvm-svn: 218989	2014-10-03 15:37:37 +00:00
Sid Manning	d00c41c965	Fix build break on Hexagon Differential Revision: http://reviews.llvm.org/D5600 llvm-svn: 218987	2014-10-03 13:59:01 +00:00
Sid Manning	4435e352ae	Adding skeleton for unit testing Hexagon Code Emission Adding and modifying CMakeLists.txt files to run unit tests under unittests/Target/* if the directory exists. Adding basic unit test to check that code emitter object can be retrieved. Differential Revision: http://reviews.llvm.org/D5523 Change by: Colin LeMahieu llvm-svn: 218986	2014-10-03 13:18:11 +00:00
Chandler Carruth	72b6e493f2	[x86] Teach the new vector shuffle lowering to aggressively form MOVSS and MOVSD nodes for single element vector inserts. This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the insertion bad code examples. However, it regresses a specific area: when available, blendps and blendpd are dramatically faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends aren't as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit. So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However that's a pretty invasive restructuring so it will need to be a follow-up patch. I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress. llvm-svn: 218985	2014-10-03 13:11:13 +00:00
Renato Golin	aea9ec6761	Revert 202433 - Provide a target override for the latest regalloc heuristic That commit was introduced in order to help investigate a problem in ARM codegen breaking from commit 202304 (Add a limit to the heuristic that register allocates instructions in local order). Recent analisys indicated that the problem no longer exists, so I'm reverting this change. See PR18996. llvm-svn: 218981	2014-10-03 12:20:53 +00:00
Chandler Carruth	8fe3dae595	[x86] Refactor the element insertion logic in the new vector shuffle lowering to handle the potential mirroring of 2-element vectors (because we can't reliably sort them one way) in the caller rather than in the insertion logic. This will simplify things considerably as more ways to fail to match the insertion are added because now we have a nice try and retry point. llvm-svn: 218980	2014-10-03 12:01:55 +00:00
Chandler Carruth	4c00e4aeb0	[x86] Significantly improve the ability of the new vector shuffle lowering to match VZEXT_MOVL patterns. I hadn't realized that these had sufficient pattern smarts in the backend to lower zext-ing from the low element of a vector without it being a scalar_to_vector node. They do, and this is how to match a bunch of patterns for movq, movss, etc. There is a weird propensity to end up using pshufd to place the element afterward even though it means domain crossing (or rather, to use xorps+movss to zext the element rather than movq) but that's an orthogonal problem with VZEXT_MOVL that someone should probably look at. llvm-svn: 218977	2014-10-03 11:25:58 +00:00
Chandler Carruth	9f743353a1	[x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widen element types to form illegal vector types. I've added a special SSE1 test case here that makes sure we don't break this going forward. llvm-svn: 218974	2014-10-03 10:11:39 +00:00
Eric Christopher	47c04156aa	constify TargetMachine parameter. llvm-svn: 218934	2014-10-03 00:42:41 +00:00
Eric Christopher	2cffcde1ca	constify TargetMachine argument. llvm-svn: 218930	2014-10-03 00:17:59 +00:00
Eric Christopher	f48003a13a	We can grab the options struct from the TargetMachine, no need to pass it down in the constructor. llvm-svn: 218929	2014-10-03 00:10:03 +00:00
Adam Nemet	9a91f09952	[AVX512] Pull pattern for subvector insert into the instruction definition No functional change intended. Very similar to the change I made for subvector extract in r218480. test/CodeGen/X86/avx512-insert-extract.ll covers this. llvm-svn: 218928	2014-10-02 23:18:30 +00:00
Adam Nemet	bcff277351	[AVX512] Refactor subvector inserts No functional change. Very similar to the extract refactoring I did in r218478. Compared X86.td.expanded before and after. llvm-svn: 218927	2014-10-02 23:18:28 +00:00
Adam Nemet	36d7986f4b	[AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm Just like in the case of extracts, the refactoring is uncovering some typos in the code. llvm-svn: 218926	2014-10-02 23:18:26 +00:00
Hal Finkel	2093a3cb26	[PowerPC] Modern Book-E cores support sync Older Book-E cores, such as the PPC 440, support only msync (which has the same encoding as sync 0), but not any of the other sync forms. Newer Book-E cores, however, do support sync, and for performance reasons we should allow the use of the more-general form. This refactors msync use into its own feature group so that it applies by default only to older Book-E cores (of the relevant cores, we only have definitions for the PPC440/450 currently). llvm-svn: 218923	2014-10-02 22:34:22 +00:00
Robin Morisset	8895df3e75	[Power] Improve the expansion of atomic loads/stores Summary: Atomic loads and store of up to the native size (32 bits, or 64 for PPC64) can be lowered to a simple load or store instruction (as the synchronization is already handled by AtomicExpand, and the atomicity is guaranteed thanks to the alignment requirements of atomic accesses). This is exactly what this patch does. Previously, these were implemented by complex load-linked/store-conditional loops.. an obvious performance problem. For example, this patch turns ``` define void @store_i8_unordered(i8* %mem) { store atomic i8 42, i8* %mem unordered, align 1 ret void } ``` from ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: rlwinm r2, r3, 3, 27, 28 li r4, 42 xori r5, r2, 24 rlwinm r2, r3, 0, 0, 29 li r3, 255 slw r4, r4, r5 slw r3, r3, r5 and r4, r4, r3 LBB4_1: ; =>This Inner Loop Header: Depth=1 lwarx r5, 0, r2 andc r5, r5, r3 or r5, r4, r5 stwcx. r5, 0, r2 bne cr0, LBB4_1 ; BB#2: blr ``` into ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: li r2, 42 stb r2, 0(r3) blr ``` which looks like a pretty clear win to me. Test Plan: fixed the tests + new test for indexed accesses + make check-all Reviewers: jfb, wschmidt, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5587 llvm-svn: 218922	2014-10-02 22:27:07 +00:00
Juergen Ributzka	2eab1ef77e	[Stackmaps] Make ithe frame-pointer required for stackmaps. Do not eliminate the frame pointer if there is a stackmap or patchpoint in the function. All stackmap references should be FP relative. This fixes PR21107. llvm-svn: 218920	2014-10-02 22:21:49 +00:00
Chandler Carruth	960ea9802e	[x86] Teach the new vector shuffle lowering to widen floating point elements as well as integer elements in order to form simpler shuffle patterns. This is the primary reason why we were failing to match some of the 2-and-2 floating point shuffles such as PR21140. Even after fixing this we need to support some extra patterns in the backend in order to match the resulting X86ISD::UNPCKL nodes into the correct instructions. This commit should fix PR21140 and includes more comprehensive testing of insertion patterns in v4 shuffles. Not all of the added tests are beautiful. For example, we don't have clever instructions to insert-via-load in the integer domain. There are also some places where we aren't sufficiently cunning with our use of movq and movd, but that's future work. llvm-svn: 218911	2014-10-02 21:37:14 +00:00
Tilmann Scheller	7617d101f0	[NVPTX] Remove dead code. Found by the Clang static analyzer. llvm-svn: 218874	2014-10-02 15:12:48 +00:00
Joerg Sonnenberger	ba894344cc	Support padding unaligned data in .text. llvm-svn: 218870	2014-10-02 13:41:42 +00:00
Chandler Carruth	3f933250d5	[x86] Improve and correct how the new vector shuffle lowering was matching and lowering 64-bit insertions. The first problem was that we weren't looking through bitcasts to discover that we could lower as insertions. Once fixed, we in turn weren't looking through bitcasts to discover that we could fold a load into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR node around the inserted element and instead were passing a scalar to a DAG node that expected a vector. It turns out there are some patterns that will "lower" this into the correct asm, but the rest of the X86 backend is very unhappy with such antics. This should fix a few more edge case regressions I've spotted going through the regression test suite to enable the new vector shuffle lowering. llvm-svn: 218839	2014-10-01 23:14:28 +00:00
Eric Christopher	3eb7c19a39	constify the TargetMachine argument used in the subtarget and lowering constructors. llvm-svn: 218832	2014-10-01 21:36:28 +00:00
Sanjay Patel	ae0d32c510	Lower FNEG ( FABS (x) ) -> FNABS (x) [X86 codegen] PR20578 Negative FABS of either a scalar or vector should be handled the same way on x86 with SSE/AVX: a single OR instruction of the FP operand with a constant to light up the sign bit(s). http://llvm.org/bugs/show_bug.cgi?id=20578 Differential Revision: http://reviews.llvm.org/D5201 llvm-svn: 218822	2014-10-01 21:20:06 +00:00
Eric Christopher	bce38d60f8	Now that the optimization level is adjusting the feature string before we hit the subtarget, remove the constructor parameter. llvm-svn: 218817	2014-10-01 21:05:35 +00:00
Eric Christopher	4ce55b7a5c	Rework the PPC TargetMachine so that the non-function specific overrides happen at TargetMachine creation and not on every subtarget creation. llvm-svn: 218805	2014-10-01 20:38:26 +00:00
Eric Christopher	5245670b4d	constify TargetMachine parameter for X86TargetLowering. llvm-svn: 218804	2014-10-01 20:38:22 +00:00
Sanjay Patel	923727403c	Don't repeat function/variable name in comment. NFC. llvm-svn: 218791	2014-10-01 19:39:32 +00:00
Tim Northover	64a066f76a	ARM: allow copying of CPSR when all else fails. As with x86 and AArch64, certain situations can arise where we need to spill CPSR in the middle of a calculation. These should be avoided where possible (MRS/MSR is rather expensive), which ARM is actually better at than the other two since it tries to Glue defs to uses, but as a last ditch effort, copying is better than crashing. rdar://problem/18011155 llvm-svn: 218789	2014-10-01 19:21:03 +00:00
Adrian Prantl	2b1df58ebe	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Reed Kotler	f0d9d7da37	Add fptrunc to mips fast-sel Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel Test Plan: fptrunc.ll checked also with 4 internal mips build bot flavors mip32r1/miprs32r2 and at -O0 and -O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: rfuhler Differential Revision: http://reviews.llvm.org/D5553 llvm-svn: 218785	2014-10-01 18:47:02 +00:00
Adrian Prantl	0959156fa3	Revert r218778 while investigating buldbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	229943585f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
Tom Stellard	7f89ae5275	R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table llvm-svn: 218776	2014-10-01 17:15:17 +00:00
Toma Tabacu	ff071cc027	[mips] Rename emit and parse functions for the .cpload assembler directive. NFC. Summary: It's better if we have a consistent name for .cpload-related functions. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5437 llvm-svn: 218768	2014-10-01 14:53:19 +00:00
Tom Stellard	ed0eb691ca	R600/SI: Add a generic pseudo EXP instruction llvm-svn: 218767	2014-10-01 14:44:45 +00:00
Tom Stellard	c3d0f56e3e	R600/SI: Add generic pseudo MTBUF instructions llvm-svn: 218766	2014-10-01 14:44:43 +00:00
Tom Stellard	a2e56fb3c0	R600/SI: Add generic pseudo SMRD instructions llvm-svn: 218765	2014-10-01 14:44:42 +00:00
Oliver Stannard	d5cdc39fe7	[ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5 Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions when targeting ARMv8, but they are actually present on any target with FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an M-profile core, but they have the same instructions so we model them both as FPARMv8 in the ARM backend. llvm-svn: 218763	2014-10-01 13:13:18 +00:00
Chandler Carruth	327441cee3	[x86] Fix a few more tiny patterns with the new vector shuffle lowering that keep cropping up in the regression test suite. This also addresses one of the issues raised on the mailing list with failing to form 'movsd' in as many cases as we realistically should. There will be corresponding patches forthcoming for v4f32 at least. This was a lot of fuss for a relatively small gain, but all the fuss was on my end trying different ways of holding the pieces of the x86 fragment patterns just right. Now that it works, the code is reasonably simple. In the new test cases I'm adding here, v2i64 sticks out as just plain horrible. I've not come up with any great ideas here other than that it would be nice to recognize when we're going to take a domain crossing hit and cross earlier to get the decent instructions. At least with AVX it is slightly less silly.... llvm-svn: 218756	2014-10-01 11:14:02 +00:00
Chandler Carruth	52fe0fc691	[x86] Delete some extraneous logic from the new vector shuffle lowering. Nothing was relying on this and there are potentially some edge cases that it would not be correct under. Removing it seems better than trying to "fix" it as nothing was relying on it. llvm-svn: 218755	2014-10-01 11:13:57 +00:00
Tom Coxon	50ff005894	[AArch64] Allow access to all system registers with MRS/MSR instructions. The A64 instruction set includes a generic register syntax for accessing implementation-defined system registers. The syntax for these registers is: S<op0>_<op1>_<CRn>_<CRm>_<op2> The encoding space permitted for implementation-defined system registers is: op0 op1 CRn CRm op2 11 xxx 1x11 xxxx xxx The full encoding space can now be accessed: op0 op1 CRn CRm op2 xx xxx xxxx xxxx xxx This is useful to anyone needing to write assembly code supporting new system registers before the assembler has learned the official names for them. llvm-svn: 218753	2014-10-01 10:13:59 +00:00
Asiri Rathnayake	0e12aa2ad9	Add missing natual vector cast. Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST was introduced in r217159 and r217138. This patch adds a missing cast from v2f32 to v1i64 which is causing some compilation failures. Also added test cases to cover various modimm types and BUILD_VECTORs with i64 elements. llvm-svn: 218751	2014-10-01 09:59:45 +00:00
Oliver Stannard	f4985610cd	[ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM) The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be modelled using the same target feature, and all double-precision operations are already disabled by the fp-only-sp target features. llvm-svn: 218747	2014-10-01 09:02:17 +00:00
Daniel Sanders	57fd199eaa	[mips] Fix disassembly of [ls][wd]c[23], cache, and pref Fixes PR21015, and PR20993. Patch by Jun Koi llvm-svn: 218745	2014-10-01 08:26:55 +00:00
Sasa Stankovic	f65f1c04c4	[mips] For indirect calls we don't need $gp to point to .got. Mips linker doesn't generate lazy binding stub for a function whose address is taken in the program. Differential Revision: http://reviews.llvm.org/D5067 llvm-svn: 218744	2014-10-01 08:22:21 +00:00
Nick Lewycky	abd4d05f7d	Fix typo in comment from r218733 llvm-svn: 218739	2014-10-01 03:37:34 +00:00
Chandler Carruth	6ffdf68855	[x86] Teach the new vector shuffle lowering to be even more aggressive in exposing the scalar value to the broadcast DAG fragment so that we can catch even reloads and fold them into the broadcast. This is somewhat magical I'm afraid but seems to work. It is also what the old lowering did, and I've switched an old test to run both lowerings demonstrating that we get the same result. Unlike the old code, I'm not lowering f32 or f64 scalars through this path when we only have AVX1. The target patterns include pretty heinous code to re-cast those as shuffles when the scalar happens to not be spilled because AVX1 provides no broadcast mechanism from registers what-so-ever. This is terribly brittle. I'd much rather go through our generic lowering code to get this. If needed, we can add a peephole to get even more opportunities to broadcast-from-spill-slots that are exposed post-RA, but my suspicion is this just doesn't matter that much. llvm-svn: 218734	2014-10-01 03:19:43 +00:00
Chandler Carruth	82405d868b	[x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is the same speed as pshufd but we can fold loads into the pmovzx instructions. This fixes some regressions that came up in the regression test suite for the new vector shuffle lowering. llvm-svn: 218733	2014-10-01 02:25:54 +00:00
Adam Nemet	d7923b8d25	[AVX512] Remove space before \t in AsmStrings. llvm-svn: 218725	2014-10-01 00:41:32 +00:00
Chandler Carruth	ed2c9efc13	[x86] Teach the new vector shuffle lowering about VBROADCAST and VPBROADCAST. This has the somewhat expected pervasive impact. I don't know why I forgot about this. Everything seems good with lots of significant improvements in the tests. llvm-svn: 218724	2014-10-01 00:41:21 +00:00
Sanjay Patel	c095454ca2	Split the estimate() interface into separate functions for each type. NFC. It was hacky to use an opcode as a switch because it won't always match (rsqrte != sqrte), and it looks like we'll need to add more special casing per arch than I had hoped for. Eg, x86 will prefer a different NR estimate implementation. ARM will want to use it's 'step' instructions. There also don't appear to be any new estimate instructions in any arch in a long, long time. Altivec vloge and vexpte may have been the first and last in that field... llvm-svn: 218698	2014-09-30 20:28:48 +00:00
Juergen Ributzka	4b2325a925	Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ. Note: This version fixed an issue with the TBZ/TBNZ instructions that were generated in FastISel. The issue was that the 64bit version of TBZ (TBZX) automagically sets the upper bit of the immediate field that is used to specify the bit we want to test. To test for any of the lower 32bits we have to first extract the subregister and use the 32bit version of the TBZ instruction (TBZW). Original commit message: Teach selectBranch to fold bit test and branch into a single instruction (TBZ or TBNZ). llvm-svn: 218693	2014-09-30 19:59:35 +00:00
Matt Arsenault	7bc4ebe27d	R600/SI: Fix printing of clamp and omod No tests for omod since nothing uses it yet, but this should get rid of the remaining annoying trailing zeros after some instructions. llvm-svn: 218692	2014-09-30 19:49:48 +00:00
Matt Arsenault	ace0d8c9eb	R600/SI: Update VOP3b to not include obsolete operands abs / neg are now part of the srcN_modifiers operands llvm-svn: 218691	2014-09-30 19:49:43 +00:00
Reed Kotler	b2d48eb5a0	Add numeric extend, trunctate to mips fast-isel Summary: Add numeric extend, trunctate to mips fast-isel Reactivates D4827 Test Plan: fpext.ll loadstoreconv.ll Reviewers: dsanders Subscribers: mcrosier Differential Revision: http://reviews.llvm.org/D5251 llvm-svn: 218681	2014-09-30 16:30:13 +00:00
Tom Coxon	856ab42e33	[AArch64] Remove unnecessary whitespace. (Test commit) llvm-svn: 218680	2014-09-30 16:23:16 +00:00
Robert Khasanov	f86e0f1129	[AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}. Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> llvm-svn: 218670	2014-09-30 12:15:52 +00:00
Robert Khasanov	61a1201dfa	[AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ} Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1. Now cmp intrinsics lower in the following way: (i8 (int_x86_avx512_mask_pcmpeq_q_128 (v2i64 %a), (v2i64 %b), (i8 %mask))) -> (i8 (bitcast (v8i1 (insert_subvector undef, (v2i1 (and (PCMPEQM %a, %b), (extract_subvector (v8i1 (bitcast %mask)), 0))), 0)))) llvm-svn: 218669	2014-09-30 11:41:54 +00:00
Robert Khasanov	2972b6033d	[AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW. Added new operand type for intrinsics (IIT_V64) llvm-svn: 218668	2014-09-30 11:32:22 +00:00
Robert Khasanov	06de3de89a	[AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ. Added CMP_MASK intrinsic type llvm-svn: 218667	2014-09-30 11:19:50 +00:00
Job Noorman	43505909b0	Make sure aggregates are properly alligned on MSP430. llvm-svn: 218665	2014-09-30 11:15:44 +00:00
Chandler Carruth	d8a4e1fee3	[x86] Revert r218588, r218589, and r218600. These patches were pursuing a flawed direction and causing miscompiles. Read on for details. Fundamentally, the premise of this patch series was to map VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because we are going to have to lower to VSELECT nodes for some blends to trigger the instruction selection patterns of variable blend instructions. This doesn't actually work out so well. In order to match performance with the existing VECTOR_SHUFFLE lowering code, we would need to re-slice the blend in order to fit it into either the integer or floating point blends available on the ISA. When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources) this works well because the X86 backend ensures that these types of operands to VSELECT get sign extended into '-1' and '0' for true and false, allowing us to re-slice the bits in whatever granularity without changing semantics. However, if the VSELECT condition comes from some other source, for example code lowering vector comparisons, it will likely only have the required bit set -- the high bit. We can't blindly slice up this style of VSELECT. Reid found some code using Halide that triggers this and I'm hopeful to eventually get a test case, but I don't need it to understand why this is A Bad Idea. There is another aspect that makes this approach flawed. When in VECTOR_SHUFFLE form, we have very distilled information that represents the constant blend mask. Converting back to a VSELECT form actually can lose this information, and so I think now that it is better to treat this as VECTOR_SHUFFLE until the very last moment and only use VSELECT nodes for instruction selection purposes. My plan is to: 1) Clean up and formalize the target pre-legalization DAG combine that converts a VSELECT with a constant condition operand into a VECTOR_SHUFFLE. 2) Remove any fancy lowering from VSELECT during legalization relying entirely on the DAG combine to catch cases where we can match to an immediate-controlled blend instruction. One additional step that I'm not planning on but would be interested in others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV which encodes a fully legalized VSELECT node. Then it would be easy to write isel patterns only in terms of this to ensure VECTOR_SHUFFLE legalization only ever forms the fully legalized construct and we can't cycle between it and VSELECT combining. llvm-svn: 218658	2014-09-30 02:52:28 +00:00
Matt Arsenault	7e02861e45	Fix missing C++ mode comment llvm-svn: 218654	2014-09-30 01:05:27 +00:00
Juergen Ributzka	040a60a3d3	[FastISel][AArch64] Fold sign-/zero-extends into the load instruction. The sign-/zero-extension of the loaded value can be performed by the memory instruction for free. If the result of the load has only one use and the use is a sign-/zero-extend, then we emit the proper load instruction. The extend is only a register copy and will be optimized away later on. Other instructions that consume the sign-/zero-extended value are also made aware of this fact, so they don't fold the extend too. This fixes rdar://problem/18495928. llvm-svn: 218653	2014-09-30 00:49:58 +00:00
Juergen Ributzka	d6d5162a97	[FastISel][AArch64] Factor out scale factor calculation. NFC. Factor out the code that determines the implicit scale factor of memory operations for a given value type. llvm-svn: 218652	2014-09-30 00:49:54 +00:00
Eric Christopher	e90c19b249	Simplify conditional. llvm-svn: 218643	2014-09-29 23:31:13 +00:00
Adam Nemet	b414c07349	[AVX512] Use X86VectorVTInfo in the masking helper classes and the FMAs No functionality change. Makes the code more compact (see the FMA part). This needs a new type attribute MemOpFrag in X86VectorVTInfo. For now I only defined this in the simple cases. See the commment before the attribute. Diff of X86.td.expanded before and after is empty except for the appearance of the new attribute. llvm-svn: 218637	2014-09-29 22:54:41 +00:00
Eric Christopher	dbc01e6de2	Add soft-float to the key for the subtarget lookup in the TargetMachine map, this makes sure that we can compile the same code for two different ABIs (hard and soft float) in the same module. Update one testcase accordingly (and fix some confusing naming) and add a new testcase as well with the ordering swapped which would highlight the problem. llvm-svn: 218632	2014-09-29 21:57:54 +00:00
Eric Christopher	9cf7af11f4	Fix spelling and reflow comments. llvm-svn: 218631	2014-09-29 21:57:52 +00:00
Dave Estes	a9a3195105	[AArch64] Refines the Cortex-A57 Machine Model Primarily refines all of the instructions with accurate latency and micro-op information. Refinements largely focus on the NEON instructions. Additionally, a few advanced features are modeled, including forwarding for MAC instructions and hazards for floating point SQRT and DIV. Lastly, the issue-width is reduced to three so that the scheduler will better accommodate the narrower decode and dispatch width. llvm-svn: 218627	2014-09-29 21:27:36 +00:00
Matt Arsenault	e1c9443ceb	Fix include order llvm-svn: 218611	2014-09-29 15:53:15 +00:00
Matt Arsenault	2f7198a3f7	R600/SI: Fix hardcoded values for modifiers. Move enums to SIDefines.h llvm-svn: 218610	2014-09-29 15:50:26 +00:00
Matt Arsenault	1eee08e29a	R600/SI: Also fix fsub + fadd a, a to mad combines llvm-svn: 218609	2014-09-29 14:59:38 +00:00
Matt Arsenault	7f25a33b1a	R600/SI: Fix using mad with multiplies by 2 These turn into fadds, so combine them into the target mad node. fadd (fadd (a, a), b) -> mad 2.0, a, b llvm-svn: 218608	2014-09-29 14:59:34 +00:00
Chad Rosier	df7883744f	[AArch64] Improve cost model to handle sdiv by a pow-of-two. This patch improves the target-specific cost model to better handle signed division by a power of two. The immediate result is that this enables the SLP vectorizer to do a better job. http://reviews.llvm.org/D5469 PR20714 llvm-svn: 218607	2014-09-29 13:59:31 +00:00
Oliver Stannard	e94bc91850	[Thumb2] ldrexd and strexd are not defined on v7M The Thumb2 ldrexd and strexd instructions are not defined for M-class architectures. llvm-svn: 218603	2014-09-29 10:57:29 +00:00
Chandler Carruth	63f6a480c4	[x86] Make the new vector shuffle lowering lower blends as VSELECT nodes, and rely exclusively on its logic. This removes a ton of duplication from the blend lowering and centralizes it in one place. One downside is that it requires a bunch of hacks to make this work with the current legalization framework. We have to manually speculate one aspect of legalizing VSELECT nodes to get everything to work nicely because the existing legalization framework isn't actually bottom-up. The other grossness is that we somewhat duplicate the analysis of constant blends. I'm on the fence here. If reviewers thing this would look better with VSELECT when it has constant operands dumping over tho VECTOR_SHUFFLE, we could go that way. But it would be a substantial change because currently all of the actual blend instructions are matched via patterns in the TD files based around VSELECT nodes (despite them not being perfect fits for that). Suggestions welcome, but at least this removes the rampant duplication in the backend. llvm-svn: 218600	2014-09-29 09:57:07 +00:00
Chandler Carruth	79ab812756	[x86] Delete a bunch of really bad and totally unnecessary code in the X86 target-specific DAG combining that tried to convert VSELECT nodes into VECTOR_SHUFFLE nodes that it "knew" would lower into immediate-controlled blend nodes. Turns out, we have perfectly good lowering of all these VSELECT nodes, and indeed that lowering already knows how to handle lowering through BLENDI to immediate-controlled blend nodes. The code just wasn't getting used much because this thing forced the world to go through the vector shuffle lowering. Yuck. This also exposes that I was too aggressive in avoiding domain crossing in v218588 with that lowering -- when the other option is to expand into two 128-bit vectors, it is worth domain crossing. Restore that behavior now that we have nice tests covering it. The test updates here fall into two camps. One is where previously we ended up with an unsigned encoding of the blend operand and now we get a signed encoding. In most of those places there were elaborate comments explaining exactly what these operands really mean. Rather than that, just switch these tests to use the nicely decoded comments that make it obvious that the final shuffle matches. The other updates are just removing pointless domain crossing by blending integers with PBLENDW rather than BLENDPS. llvm-svn: 218589	2014-09-29 02:01:20 +00:00
Chandler Carruth	99968311b6	[x86] Refactor all of the VSELECT-as-blend lowering code to avoid domain crossing and generally work more like the blend emission code in the new vector shuffle lowering. My goal is to have the new vector shuffle lowering just produce VSELECT nodes that are either matched here to BLENDI or are legal and matched in the .td files to specific blend instructions. That seems much cleaner as there are other ways to produce a VSELECT anyways. =] No observable functionality changed yet, mostly because this code appears to be near-dead. The behavior of this lowering routine did change though. This code being mostly dead and untestable will change with my next commit which will also point some new tests at it. llvm-svn: 218588	2014-09-29 01:32:54 +00:00
Chandler Carruth	49d664236d	[x86] Improve naming and comments for VSELECT lowering. No functionality changed. llvm-svn: 218586	2014-09-29 00:51:58 +00:00
Chandler Carruth	341be5ad1a	[x86] Add the dispatch skeleton to the new vector shuffle lowering for AVX-512. There is no interesting logic yet. Everything ends up eventually delegating to the generic code to split the vector and shuffle the halves. Interestingly, that logic does a significantly better job of lowering all of these types than the generic vector expansion code does. Mostly, it lets most of the cases fall back to nice AVX2 code rather than all the way back to SSE code paths. Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next up will be to incrementally add direct support for the basic instruction set to each type (adding tests first). llvm-svn: 218585	2014-09-29 00:37:27 +00:00
Chandler Carruth	f07e666623	[x86] Make the split-and-lower routine fully generic by relaxing the assertion, making the name generic, and improving the documentation. Step 1 in adding very primitive support for AVX-512. No functionality changed yet. llvm-svn: 218584	2014-09-29 00:21:49 +00:00
Chandler Carruth	f27811132f	[x86] Teach the new vector shuffle lowering to fall back on AVX-512 vectors. Someone will need to build the AVX512 lowering, which should follow AVX1 and AVX2 very closely for AVX512F and AVX512BW resp. I've added a dummy test which is a port of the v8f32 and v8i32 tests from AVX and AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this is enough information for someone to implement proper lowering here. If not, I'll be happy to help, but right now the AVX-512 support isn't a priority for me. llvm-svn: 218583	2014-09-28 23:53:10 +00:00
Chandler Carruth	b85e2503e5	[x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2 lowerings. This was hopelessly broken. First, the x86 backend wants '-1' to be the element value representing true in a boolean vector, and second the operand order for VSELECT is backwards from the actual x86 instructions. To make matters worse, the backend is just using '-1' as the true value to get the high bit to be set. It doesn't actually symbolically map the '-1' to anything. But on x86 this isn't quite how it works: there only the high bit is relevant. As a consequence weird non-'-1' values like 0x80 actually "work" once you flip the operands to be backwards. Anyways, thanks to Hal for helping me sort out what these should be. llvm-svn: 218582	2014-09-28 23:23:55 +00:00
Chandler Carruth	5878dec402	[x86] Fix a really silly bug that I introduced fixing another bug in the new vector shuffle target DAG combines -- it helps to actually test for the value you want rather than just using an integer in a boolean context. Have I mentioned that I loathe implicit conversions recently? :: sigh :: llvm-svn: 218576	2014-09-28 06:11:04 +00:00
Chandler Carruth	773e8426f8	[x86] Fix yet another bug in the new vector shuffle lowering's handling of widening masks. We can't widen a zeroing mask unless both elements that would be merged are either zeroed or undef. This is the only way to widen a mask if it has a zeroed element. Also clean up the code here by ordering the checks in a more logical way and by using the symoblic values for undef and zero. I'm actually torn on using the symbolic values because the existing code is littered with the assumption that -1 is undef, and moreover that entries '< 0' are the special entries. While that works with the values given to these constants, using the symbolic constants actually makes it a bit more opaque why this is the case. llvm-svn: 218575	2014-09-28 03:30:25 +00:00
Chandler Carruth	3dae5f12b2	[x86] Fix yet another issue with widening vector shuffle elements. I spotted this by inspection when debugging something else, so I have no test case what-so-ever, and am not even sure it is possible to realistically trigger the bug. But this is what was intended here. llvm-svn: 218565	2014-09-27 08:40:33 +00:00
Chandler Carruth	602ec05b20	[x86] Fix terrible bugs everywhere in the new vector shuffle lowering and in the target shuffle combining when trying to widen vector elements. Previously only one of these was correct, and we didn't correctly propagate zeroing target shuffle masks (which have a different sentinel value from undef in non- target shuffle masks now). This isn't just a missed optimization, this caused us to drop zeroing shuffles on the floor and miscompile code. The added test case is one example of that. There are other fixes to the test suite as a consequence of this as well as restoring the undef elements in some of the masks that were lost when I brought sanity to the actual value of the undef and zero sentinels. I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW combining code, but that code really needs to go. It was a nice initial attempt, but it isn't very principled and the recursive shuffle combiner is much more powerful. llvm-svn: 218562	2014-09-27 04:42:44 +00:00
Chandler Carruth	f284b12de2	[x86] Flip the sentinel values used in the target shuffle mask decoding to significantly more sane sentinels. Notably, everywhere else in the backend's representation of shuffles uses '-1' to represent undef. The target shuffle masks really shouldn't diverge from that, especially as in a few places they are manipulated by shared code. This causes us to lose some undef lanes in various test masks. I want to get these back, but technically it isn't invalid and there are a lot of bugs here so I want to try to establish a saner baseline for fixing some of the bugs by aligning the specific senitnel values used. llvm-svn: 218561	2014-09-27 04:42:39 +00:00
Sanjay Patel	98a98574c5	Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2). This is purely refactoring. No functional changes intended. PowerPC is the only target that is currently using this interface. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) And: z = y / x into: z = y * rcpe(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction along with the number of refinement steps needed to make the estimate usable. Differential Revision: http://reviews.llvm.org/D5484 llvm-svn: 218553	2014-09-26 23:01:47 +00:00
Chandler Carruth	0f4ad15770	[x86] Fix a moderately terrifying bug in the new 128-bit shuffle logic that managed to elude all of my fuzz testing historically. =/ Something changed to allow this code path to actually be exercised and it was doing bad things. It is especially heavily exercised by the patterns that emerge when doing AVX shuffles that end up lowered through the 128-bit code path. llvm-svn: 218540	2014-09-26 20:41:45 +00:00
Matt Arsenault	5ecfa9e38d	R600/SI: Use break instead of continue If an instruction doesn't have src1, it doesn't have src2 llvm-svn: 218536	2014-09-26 17:55:14 +00:00
Matt Arsenault	c05b3987ac	R600/SI: Add a note about the order of the operands to div_scale llvm-svn: 218534	2014-09-26 17:55:09 +00:00
Matt Arsenault	65a0508cab	R600/SI: Move finding SGPR operand to move to separate function llvm-svn: 218533	2014-09-26 17:55:06 +00:00
Matt Arsenault	17cc9ab1c4	R600/SI Allow same SGPR to be used for multiple operands Instead of moving the first SGPR that is different than the first, legalize the operand that requires the fewest moves if one SGPR is used for multiple operands. This saves extra moves and is also required for some instructions which require that the same operand be used for multiple operands. llvm-svn: 218532	2014-09-26 17:55:03 +00:00
Matt Arsenault	6c1b5eacff	R600/SI: Partially move operand legalization to post-isel hook. Disable the SGPR usage restriction parts of the DAG legalizeOperands. It now should only be doing immediate folding until it can be replaced later. The real legalization work is now done by the other SIInstrInfo::legalizeOperands llvm-svn: 218531	2014-09-26 17:54:59 +00:00
Matt Arsenault	f9cd06ee61	R600/SI: Implement findCommutedOpIndices The base implementation of commuteInstruction is used in some cases, but it turns out this has been broken for a long time since modifiers were inserted between the real operands. The base implementation of commuteInstruction also fails on immediates, which also needs to be fixed. llvm-svn: 218530	2014-09-26 17:54:54 +00:00
Matt Arsenault	1aaec4e375	R600/SI: Don't move operands that are required to be SGPRs e.g. v_cndmask_b32 requires the condition operand be an SGPR. If one of the source operands were an SGPR, that would be considered the one SGPR use and the condition operand would be illegally moved. llvm-svn: 218529	2014-09-26 17:54:52 +00:00
Matt Arsenault	48b91b1b68	R600/SI: Don't assert on exotic operand types This needs a test, but I'm not sure if it is currently possible and I originally hit it due to a bug. Right now the only global address operands have no reason to be VALU instructions, although it theoretically could be a problem. llvm-svn: 218528	2014-09-26 17:54:46 +00:00
Matt Arsenault	c34ac10b20	R600/SI: Fix using wrong operand indices when commuting No test since the current SIISelLowering::legalizeOperands effectively hides this, and the general uses seem to only fire on SALU instructions which don't have modifiers between the operands. When trying to use legalizeOperands immediately after instruction selection, it now sees a lot more patterns it did not see before which break on this. llvm-svn: 218527	2014-09-26 17:54:43 +00:00
Matt Arsenault	ebda7897a4	R600/SI: Remove apparently dead code in legalizeOperands No tests hit this, and I don't see any way a GlobalAddress node would survive beyond lowering on SI. It it would, the move should probably be inserted by selection. llvm-svn: 218526	2014-09-26 17:54:38 +00:00
Chandler Carruth	5dd21ccbdb	[x86] The mnemonic is SHUFPS not SHUPFS. =[ I'm very bad at spelling sadly. llvm-svn: 218524	2014-09-26 17:27:40 +00:00
Chandler Carruth	4096a52c28	[x86] In the new vector shuffle lowering, when trying to do another layer of tie-breaking sorting, it really helps to check that you're in a tie first. =] Otherwise the whole thing cycles infinitely. Test case added, another one found through fuzz testing. llvm-svn: 218523	2014-09-26 17:24:26 +00:00
Chandler Carruth	9381e0fa4c	[x86] Fix a large collection of bugs that crept in as I fleshed out the AVX support. New test cases included. Note that none of the existing test cases covered these buggy code paths. =/ Also, it is clear from this that SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[ These were all detected by fuzz-testing. (I <3 fuzz testing.) llvm-svn: 218522	2014-09-26 17:11:02 +00:00
Renato Golin	eb17383852	Elide repeated register operand in Thumb1 instructions This patch makes the ARM backend transform 3 operand instructions such as 'adds/subs' to the 2 operand version of the same instruction if the first two register operands are the same. Example: 'adds r0, r0, #1' will is transformed to 'adds r0, #1'. Currently for some instructions such as 'adds' if you try to assemble 'adds r0, r0, #8' for thumb v6m the assembler would throw an error message because the immediate cannot be encoded using 3 bits. The backend should be smart enough to transform the instruction to 'adds r0, #8', which allows for larger immediate constants. Patch by Ranjeet Singh. llvm-svn: 218521	2014-09-26 16:14:29 +00:00
Andrea Di Biagio	c61458a223	[X86][SchedModel] SSE reciprocal square root instruction latencies. The SSE rsqrt instruction (a fast reciprocal square root estimate) was grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very slow) SSE sqrt instruction. For code which uses rsqrt (possibly with newton-raphson iterations) this poor scheduling was affecting performances. This patch splits off the rsqrt instruction from the sqrt instruction scheduling classes and creates new IIC_SSE_RSQER* classes with latency values based on Agner's table. Differential Revision: http://reviews.llvm.org/D5370 Patch by Simon Pilgrim. llvm-svn: 218517	2014-09-26 12:56:44 +00:00
Daniel Sanders	31d5b7be78	Fix unused variable warning added in r218509 llvm-svn: 218510	2014-09-26 10:45:26 +00:00
Daniel Sanders	4ed8535c04	[mips] Generalize the handling of f128 return values to support f128 arguments. Summary: This will allow us to handle f128 arguments without duplicating code from CCState::AnalyzeFormalArguments() or CCState::AnalyzeCallOperands(). No functional change. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5292 llvm-svn: 218509	2014-09-26 10:06:12 +00:00
Robert Khasanov	3e0cfb4887	[AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables. Added lowering tests for these instructions. llvm-svn: 218508	2014-09-26 09:48:50 +00:00
David Majnemer	2bce899754	Fix build breakage on MSVC 2013 llvm-svn: 218499	2014-09-26 04:47:54 +00:00
David Majnemer	f21a7fecf5	Target: Fix build breakage. No functional change intended. llvm-svn: 218497	2014-09-26 02:57:05 +00:00
Eric Christopher	a4eb9e140f	Add the first backend support for on demand subtarget creation based on the Function. This is currently used to implement mips16 support in the mips backend via the existing module pass resetting the subtarget. Things to note: a) This involved running resetTargetOptions before creating a new subtarget so that code generation options like soft-float could be recognized when creating the new subtarget. This is to deal with initialization code in isel lowering that only paid attention to the initial value. b) Many of the existing testcases weren't using the soft-float feature correctly. I've corrected these based on the check values assuming that was the desired behavior. c) The mips port now pays attention to the target-cpu and target-features strings when generating code for a particular function. I've removed these from one function where the requested cpu and features didn't match the check lines in the testcase. llvm-svn: 218492	2014-09-26 01:44:08 +00:00
Eric Christopher	3b65e1ff31	Move resetTargetOptions from taking a MachineFunction to a Function since we are accessing the TargetMachine that we're a member function of. llvm-svn: 218489	2014-09-26 01:28:10 +00:00
Matt Arsenault	5def7ffc65	R600/SI: Fix emitting trailing whitespace after s_waitcnt llvm-svn: 218486	2014-09-26 01:09:46 +00:00
Adam Nemet	75bf4d85fb	[AVX512] Simplify use of !con() No change in X86.td.expanded. llvm-svn: 218485	2014-09-26 00:53:12 +00:00
Adam Nemet	2f9330edf6	[AVX512] Pull pattern for subvector extract into the instruction definition No functional change. I initially thought that pulling the Pat<> into the instruction pattern was not possible because it was doing a transform on the index in order to convert it from a per-element (extract_subvector) index into a per-chunk (vextract*x4) index. Turns out this also works inside the pattern because the vextract_extract PatFrag has an OperandTransform EXTRACT_get_vextract{128,256}_imm, so the index in $idx goes through the same conversion. The existing test CodeGen/X86/avx512-insert-extract.ll extended in the previous commit provides coverage for this change. llvm-svn: 218480	2014-09-25 23:48:49 +00:00
Adam Nemet	d8b4967294	[AVX512] Refactor subvector extracts No functional change. These are now implemented as two levels of multiclasses heavily relying on the new X86VectorVTInfo class. The multiclass at the first level that is called with float or int provides the 128 or 256 bit subvector extracts. The second level provides the register and memory variants and some more Pat<>s. I've compared the td.expanded files before and after. One change is that ExeDomain for 64x4 is SSEPackedDouble now. I think this is correct, i.e. a bugfix. (BTW, this is the change that was blocked on the recent tablegen fix. The class-instance values X86VectorVTInfo inside vextract_for_type weren't properly evaluated.) Part of <rdar://problem/17688758> llvm-svn: 218478	2014-09-25 23:48:45 +00:00
Adam Nemet	1dca2e1268	[AVX512] Fix typo F->I in VEXTRACTF32x4rr. llvm-svn: 218477	2014-09-25 23:48:42 +00:00
Tom Stellard	eaa04c42f6	ARM: Remove unneeded check for MI->hasPostISelHook() llvm-svn: 218459	2014-09-25 18:59:23 +00:00
Tom Stellard	d53a286419	R600/SI: Add support for global atomic add llvm-svn: 218457	2014-09-25 18:30:26 +00:00
Robin Morisset	98b0bed638	Lower idempotent RMWs to fence+load Summary: I originally tried doing this specifically for X86 in the backend in D5091, but it was rather brittle and generally running too late to be general. Furthermore, other targets may want to implement similar optimizations. So I reimplemented it at the IR-level, fitting it into AtomicExpandPass as it interacts with that pass (which could not be cleanly done before at the backend level). This optimization relies on a new target hook, which is only used by X86 for now, as the correctness of the optimization on other targets remains an open question. If it is found correct on other targets, it should be trivial to enable for them. Details of the optimization are discussed in D5091. Test Plan: make check-all + a new test Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5422 llvm-svn: 218455	2014-09-25 17:27:43 +00:00
Sid Manning	f1133d5d3f	Add missing attributes !cmp.[eq,gt,gtu] instructions. These instructions do not indicate they are extendable or the number of bits in the extendable operand. Rename to match architected names. Add a testcase for the intrinsics. llvm-svn: 218453	2014-09-25 13:09:54 +00:00
Daniel Sanders	f02986683b	Add llvm_unreachables() for [ASZ]ExtUpper to X86FastISel.cpp to appease the buildbots. llvm-svn: 218452	2014-09-25 13:08:51 +00:00
Daniel Sanders	c3ccff7583	[mips] Add CCValAssign::[ASZ]ExtUpper and CCPromoteToUpperBitsInType and handle struct's correctly on big-endian N32/N64 return values. Summary: The N32/N64 ABI's require that structs passed in registers are laid out such that spilling the register with 'sd' places the struct at the lowest address. For little endian this is trivial but for big-endian it requires that structs are shifted into the upper bits of the register. We also require that structs passed in registers have the 'inreg' attribute for big-endian N32/N64 to work correctly. This is because the tablegen-erated calling convention implementation only has access to the lowered form of struct arguments (one or more integers of up to 64-bits each) and is unable to determine the original type. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5286 llvm-svn: 218451	2014-09-25 12:15:05 +00:00
Renato Golin	075db70d6a	Add aliases for VAND imm to VBIC ~imm On ARM NEON, VAND with immediate (16/32 bits) is an alias to VBIC ~imm with the same type size. Adding that logic to the parser, and generating VBIC instructions from VAND asm files. This patch also fixes the validation routines for NEON splat immediates which were wrong. Fixes PR20702. llvm-svn: 218450	2014-09-25 11:31:24 +00:00
Chandler Carruth	65ea352f53	[x86] Teach the new vector shuffle lowering to use AVX2 instructions for v4f64 and v8f32 shuffles when they are lane-crossing. We have fully general lane-crossing permutation functions in AVX2 that make this easy. Part of this also changes exactly when and how these vectors are split up when we don't have AVX2. This isn't always a win but it usually is a win, so on the balance I think its better. The primary regressions are all things that just need to be fixed anyways such as modeling when a blend can be completely accomplished via VINSERTF128, etc. Also, this highlights one of the few remaining big features: we do a really poor job of inserting elements into AVX registers efficiently. This completes almost all of the big tricks I have in mind for AVX2. The only things left that I plan to add: 1) element insertion smarts 2) palignr and other fairly specialized lowerings when they happen to apply llvm-svn: 218449	2014-09-25 11:03:55 +00:00
Chandler Carruth	1bb2adbd64	[x86] Teach the new vector shuffle lowering a fancier way to lower 256-bit vectors with lane-crossing. Rather than immediately decomposing to 128-bit vectors, try flipping the 256-bit vector lanes, shuffling them and blending them together. This reduces our worst case shuffle by a pretty significant margin across the board. llvm-svn: 218446	2014-09-25 10:21:15 +00:00
Oliver Stannard	d97ba3befc	[Thumb2] BXJ should be undefined for v7M, v8A The Thumb2 BXJ instruction (Branch and Exchange Jazelle) is not defined for v7M or v8A. It is defined for all other Thumb2-supporting architectures (v6T2, v7A and v7R). llvm-svn: 218445	2014-09-25 10:02:05 +00:00
Chandler Carruth	ecbf282d6a	[x86] Fix an oversight in the v8i32 path of the new vector shuffle lowering where it only used the mask of the low 128-bit lane rather than the entire mask. This allows the new lowering to correctly match the unpack patterns for v8i32 vectors. For reference, the reason that we check for the the entire mask rather than checking the repeated mask is because the repeated masks don't abide by all of the invariants of normal masks. As a consequence, it is safer to use the full mask with functions like the generic equivalence test. llvm-svn: 218442	2014-09-25 04:10:27 +00:00
Chandler Carruth	23706ac8f2	[x86] Rearrange the code for v16i16 lowering a bit for clarity and to reduce the amount of checking we do here. The first realization is that only non-crossing cases between 128-bit lanes are handled by almost the entire function. It makes more sense to handle the crossing cases first. THe second is that until we actually are going to generate fancy shared lowering strategies that use the repeated semantics of the v8i16 lowering, we should waste time checking for repeated masks. It is simplest to directly test for the entire unpck masks anyways, so we gained nothing from this. This also matches the structure of v32i8 more closely. No functionality changed here. llvm-svn: 218441	2014-09-25 04:03:22 +00:00
Chandler Carruth	eea8a61d43	[x86] Implement AVX2 support for v32i8 in the new vector shuffle lowering. This completes the basic AVX2 feature support, but there are still some improvements I'd like to do to really get the last mile of performance here. llvm-svn: 218440	2014-09-25 02:52:12 +00:00
Chandler Carruth	c64c12c522	[x86] Remove the defunct X86ISD::BLENDV entry -- we use vector selects for this now. Should prevent folks from running afoul of this and not knowing why their code won't instruction select the way I just did... llvm-svn: 218436	2014-09-25 01:16:01 +00:00
Chandler Carruth	33e481717d	[x86] Fix the v16i16 blend logic I added in the prior commit and add the missing test cases for it. Unsurprisingly, without test cases, there were bugs here. Surprisingly, this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV. It isn't wired to anything. Oops. I'll fix than next. llvm-svn: 218434	2014-09-25 01:13:38 +00:00
Akira Hatanaka	aeb16ab9f1	[X86,AVX] Add an isel pattern for X86VBroadcast. This fixes PR21050 and rdar://problem/18434607. llvm-svn: 218431	2014-09-25 00:26:15 +00:00
Chandler Carruth	98785bb00d	[x86] Implement v16i16 support with AVX2 in the new vector shuffle lowering. This also implements the fancy blend lowering for v16i16 using AVX2 and teaches the X86 backend to print shuffle masks for 256-bit PSHUFB and PBLENDW instructions. It also makes the mask decoding correct for PBLENDW instructions. The yaks, they are legion. Tests are updated accordingly. There are some missing tests for the VBLENDVB lowering, but I'll add those in a follow-up as this commit has accumulated enough cruft already. llvm-svn: 218430	2014-09-25 00:24:19 +00:00
Chandler Carruth	45f40e3ffe	[x86] Factor out the logic to generically decombose a vector shuffle into unblended shuffles and a blend. This is the consistent fallback for the lowering paths that have fast blend operations available, and its getting quite repetitive. No functionality changed. llvm-svn: 218399	2014-09-24 18:20:09 +00:00
Moritz Roth	3559e5b2b3	[Thumb] Make load/store optimizer less conservative. If it's safe to clobber the condition flags, we can do a few extra things: it's then possible to reset the base register writeback using a SUBS, so we can try to merge even if the base register isn't dead after the merged instruction. This is effectively a (heavily bug-fixed) rewrite of r208992. llvm-svn: 218386	2014-09-24 16:35:50 +00:00
Oliver Stannard	8ccbd5af71	[Thumb] 32-bit encodings of 'cps' are not valid for v7M v7M only allows the 16-bit encoding of the 'cps' (Change Processor State) instruction, and does not have the 32-bit encoding which is valid from v6T2 onwards. llvm-svn: 218382	2014-09-24 14:20:01 +00:00
Aaron Ballman	58b3f6f4cc	Silencing an "enumeral and non-enumeral type in conditional expression" warning. NFC. llvm-svn: 218381	2014-09-24 13:54:56 +00:00
Chandler Carruth	5fdf21544a	[x86] Teach the instruction lowering to add comments describing constant pool data being loaded into a vector register. The comments take the form of: # ymm0 = [a,b,c,d,...] # xmm1 = <x,y,z...> The []s are used for generic sequential data and the <>s are used for specifically ConstantVector loads. Undef elements are printed as the letter 'u', integers in decimal, and floating point values as floating point values. Suggestions on improving the formatting or other aspects of the display are very welcome. My primary use case for this is to be able to FileCheck test masks passed to vector shuffle instructions in-register. It isn't fantastic for that (no decoding special zeroing semantics or other tricks), but it at least puts the mask onto an instruction line that could reasonably be checked. I've updated many of the new vector shuffle lowering tests to leverage this in their test cases so that we're actually checking the shuffle masks remain as expected. Before implementing this, I tried a bunch of different approaches. I looked into teaching the MCInstLower code to scan up the basic block and find a definition of a register used in a shuffle instruction and then decode that, but this seems incredibly brittle and complex. I talked to Hal a lot about the "right" way to do this: attach the raw shuffle mask to the instruction itself in some form of unencoded operands, and then use that to emit the comments. I still think that's the optimal solution here, but it proved to be beyond what I'm up for here. In particular, it seems likely best done by completing the plumbing of metadata through these layers and attaching the shuffle mask in metadata which could have fully automatic dropping when encoding an actual instruction. llvm-svn: 218377	2014-09-24 09:39:41 +00:00
Chandler Carruth	b025884a94	[x86] More refactoring of the shuffle comment emission. The previous attempt didn't work out so well. It looks like it will be much better for introducing extra logic to find a shuffle mask if the finding logic is totally separate. This also makes it easy to sink the opcode logic completely out of the routine so we don't re-dispatch across it. Still no functionality changed. llvm-svn: 218363	2014-09-24 03:06:37 +00:00
Chandler Carruth	aa73f0333d	[x86] Bypass the shuffle mask comment generation when not using verbose asm. This can be somewhat expensive and there is no reason to do it outside of tests or debugging sessions. I'm also likely to make it significantly more expensive to support more styles of shuffles. llvm-svn: 218362	2014-09-24 03:06:34 +00:00
Chandler Carruth	4d88297f25	[x86] Hoist the logic for extracting the relevant bits of information from the MachineInstr into the caller which is already doing a switch over the instruction. This will make it more clear how to compute different operands to feed the comment selection for example. Also, in a drive-by-fix, don't append an empty comment string (which is a no-op ultimately). No functionality changed. llvm-svn: 218361	2014-09-24 02:24:41 +00:00
Matt Arsenault	5768cf6310	R600/SI: Add new helper isSGPRClassID Move these into header since they are trivial llvm-svn: 218360	2014-09-24 02:17:12 +00:00
Matt Arsenault	443f7ab7b7	R600/SI: Fix hardcoded and wrong operand numbers. Also fix leftover debug printing llvm-svn: 218359	2014-09-24 02:17:09 +00:00
Matt Arsenault	c2a7952859	R600/SI: Enable named operand table for SALU instructions llvm-svn: 218358	2014-09-24 02:17:06 +00:00
Chandler Carruth	4f26b542c2	[x86] Start refactoring the comment printing logic in the MC lowering of vector shuffles. This is just the beginning by hoisting it into its own function and making use of early exit to dramatically simplify the flow of the function. I'm going to be incrementally refactoring this until it is a bit less magical how this applies to other instructions, and I can teach it how to dig a shuffle mask out of a register. Then I plan to hook it up to VPERMD so we get our mask comments for it. No functionality changed yet. llvm-svn: 218357	2014-09-24 02:16:12 +00:00
Tom Stellard	661491af38	R600/SI: Enable selecting SALU inside branches We can do this now that the FixSGPRLiveRanges pass is working. llvm-svn: 218353	2014-09-24 01:33:28 +00:00
Tom Stellard	acd4ad7b18	R600/SI: Move PHIs that define SGPRs to the VALU in most cases This fixes a bug that is uncovered by a future commit and will be tested by the test/CodeGen/R600/sgpr-control-flow.ll test case. llvm-svn: 218352	2014-09-24 01:33:26 +00:00
Tom Stellard	094617318a	R600/SI: Fix the FixSGPRLiveRanges pass The previous implementation was extending the live range of SGPRs by modifying the live intervals directly. This was causing a lot of machine verification errors when the machine scheduler was enabled. The new implementation adds pseudo instructions with implicit uses to extend the live ranges of SGPRs, which works much better. llvm-svn: 218351	2014-09-24 01:33:24 +00:00
Tom Stellard	0568bef7fe	R600/SI: Mark EXEC_LO and EXEC_HI as reserved These registers can be allocated and used like other 32-bit registers, but it seems like a likely source for bugs. llvm-svn: 218350	2014-09-24 01:33:23 +00:00
Tom Stellard	ed9636005c	R600/SI: Fix SIRegisterInfo::getPhysRegSubReg() Correctly handle special registers: EXEC, EXEC_LO, EXEC_HI, VCC_LO, VCC_HI, and M0. The previous implementation would assertion fail when passed these registers. llvm-svn: 218349	2014-09-24 01:33:22 +00:00
Tom Stellard	324ab0fc1e	R600/SI: Implement VGPR register spilling for compute at -O0 v3 VGPRs are spilled to LDS. This still needs more testing, but we need to at least enable it at -O0, because the fast register allocator spills all registers that are live at the end of blocks and without this some future commits will break the flat-address-space.ll test. v2: Only calculate thread id once v3: Move insertion of spill instructions to SIRegisterInfo::eliminateFrameIndex() llvm-svn: 218348	2014-09-24 01:33:17 +00:00
Chandler Carruth	557f07316c	[x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with the native AVX2 instructions. Note that the test case is really frustrating here because VPERMD requires the mask to be in the register input and we don't produce a comment looking through that to the constant pool. I'm going to attempt to improve this in a subsequent commit, but not sure if I will succeed. llvm-svn: 218347	2014-09-24 01:24:44 +00:00
Chandler Carruth	a3397f862a	[x86] Fix a really terrible bug in the repeated 128-bin-lane shuffle detection. It was incorrectly handling undef lanes by actually treating an undef lane in the first 128-bit lane as a numeric shuffle value. Fortunately, this almost always DTRT and disabled detecting repeated patterns. But not always. =/ This patch introduces a much more principled approach and fixes the miscompiles I spotted by inspection previously. llvm-svn: 218346	2014-09-24 01:03:57 +00:00
Chandler Carruth	5359ce7741	[x86] Teach the new vector shuffle lowering to lower v4i64 vector shuffles using the AVX2 instructions. This is the first step of cutting in real AVX2 support. Note that I have spotted at least one bug in the test cases already, but I suspect it was already present and just is getting surfaced. Will investigate next. llvm-svn: 218338	2014-09-23 22:39:02 +00:00
Jim Grosbach	41b2a508dd	AArch64: allow constant expressions for shifted reg literals e.g., add w1, w2, w3, lsl #(2 - 1) This sort of thing comes up in pre-processed assembly playing macro games. Still validate that it's an assembly time constant. The early exit error check was just a bit overzealous and disallowed a left paren. rdar://18430542 llvm-svn: 218336	2014-09-23 22:16:02 +00:00
Chandler Carruth	dc4c369444	[x86] Teach the rest of the 'target shuffle' machinery about blends and add VPBLENDD to the InstPrinter's comment generation so we get nice comments everywhere. Now that we have the nice comments, I can see the bug introduced by a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay tests that are easy to inspect. llvm-svn: 218335	2014-09-23 22:14:14 +00:00
Tom Stellard	0dd5a64514	R600/SI: Clean up checks for legality of immediate operands There are new register classes VCSrc_* which represent operands that can take an SGPR, VGPR or inline constant. The VSrc_* class is now used to represent operands that can take an SGPR, VGPR, or a 32-bit immediate. This allows us to have more accurate checks for legality of immediates, since before we had no way to distinguish between operands that supported any 32-bit immediate and operands which could only support inline constants. llvm-svn: 218334	2014-09-23 21:26:25 +00:00
Robin Morisset	1d079fe807	[X86] Make wide loads be managed by AtomicExpand Summary: AtomicExpand already had logic for expanding wide loads and stores on LL/SC architectures, and for expanding wide stores on CmpXchg architectures, but not for wide loads on CmpXchg architectures. This patch fills this hole, and makes use of this new feature in the X86 backend. Only one functionnal change: we now lose the SynchScope attribute. It is regrettable, but I have another patch that I will submit soon that will solve this for all of AtomicExpand (it seemed better to split it apart as it is a different concern). Test Plan: make check-all (lots of tests for this functionality already exist) Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5404 llvm-svn: 218332	2014-09-23 20:59:25 +00:00
Robin Morisset	64053dff5f	[Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate Summary: This patch makes use of AtomicExpandPass in Power for inserting fences around atomic as part of an effort to remove fence insertion from SelectionDAGBuilder. As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic lwsync) instead of sync 0 (heavyweight sync) in many cases. I also added a test, as there was no test for the barriers emitted by the Power backend for atomic loads and stores. Test Plan: new test + make check-all Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5180 llvm-svn: 218331	2014-09-23 20:46:49 +00:00
Robin Morisset	a7da34c778	Add AtomicExpandPass::bracketInstWithFences, and use it whenever getInsertFencesForAtomic would trigger in SelectionDAGBuilder Summary: The goal is to eventually remove all the code related to getInsertFencesForAtomic in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works mostly by accident because the backends are overly conservative), and repeats the same logic that goes in emitLeading/TrailingFence. In this patch, I make AtomicExpandPass insert the fences as it knows better where to put them. Because this requires getting the fences and not just passing an IRBuilder around, I had to change the return type of emitLeading/TrailingFence. This code only triggers on ARM for now. Because it is earlier in the pipeline than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so SelectionDAGBuilder does not add barriers anymore on ARM. If this patch is accepted I plan to implement emitLeading/TrailingFence for all backends that setInsertFencesForAtomic(true), which will allow both making them less conservative and simplifying SelectionDAGBuilder once they are all using this interface. This should not cause any functionnal change so the existing tests are used and not modified. Test Plan: make check-all, benefits from existing tests of atomics on ARM Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5179 llvm-svn: 218329	2014-09-23 20:31:14 +00:00
Lang Hames	719ff0ff4f	[MCJIT] Remove PPCRelocations.h - it's no longer used. This was overlooked in r218320, which removed the relocation headers for other targets. Thanks to Ulrich Weigand for catching it. llvm-svn: 218327	2014-09-23 19:17:48 +00:00
Robin Morisset	94e6d52fcb	Just add a fixme about a possibly faster implementation of some atomic loads on some ARM processors llvm-svn: 218326	2014-09-23 18:33:21 +00:00
Matt Arsenault	f1eaaf2d5f	Fix typo llvm-svn: 218324	2014-09-23 18:30:57 +00:00
Chandler Carruth	4951e449b6	[x86] Teach the new shuffle lowering's blend functionality to use AVX2's VPBLENDD where appropriate even on 128-bit vectors. According to Agner's tables, this instruction is significantly higher throughput (can execute on any port) on Haswell chips so we should aggressively try to form it when available. Sadly, this loses our delightful shuffle comments. I'll add those back for VPBLENDD next. llvm-svn: 218322	2014-09-23 18:16:12 +00:00
Lang Hames	f3ae47a1c7	[MCJIT] Nuke MachineRelocation and MachineCodeEmitter. Now that the old JIT is gone they're no longer needed. llvm-svn: 218320	2014-09-23 18:08:47 +00:00
Oliver Stannard	d5ddd9f5ee	Fix segfault in AArch64 backend with -g and -mbig-endian Fix a null pointer dereference when trying to swap the endianness of fixups in the .eh_frame section in the AArch64 backend. llvm-svn: 218311	2014-09-23 15:38:11 +00:00
Sid Manning	f50e99f69a	Loop instead of individual def's for each GPR. Differential Revision: http://reviews.llvm.org/D5450 llvm-svn: 218305	2014-09-23 13:55:50 +00:00
Chandler Carruth	d2677fff24	[x86] Teach the vector comment parsing and printing to correctly handle undef in the shuffle mask. This shows up when we're printing comments during lowering and we still have an IR-level constant hanging around that models undef. A nice consequence of this is much prettier test cases where the undef lanes actually show up as undef rather than as a particular set of values. This also allows us to print shuffle comments in cases that use undef such as the recently added variable VPERMILPS lowering. Now those test cases have nice shuffle comments attached with their details. The shuffle lowering for PSHUFB has been augmented to use undef, and the shuffle combining has been augmented to comprehend it. llvm-svn: 218301	2014-09-23 11:15:19 +00:00
Chandler Carruth	257fbb56ae	[x86] Teach the AVX1 path of the new vector shuffle lowering one more trick that I missed. VPERMILPS has a non-immediate memory operand mode that allows it to do asymetric shuffles in the two 128-bit lanes. Use this rather than two shuffles and a blend. However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that one offers no functional differenc from the immediate operand other than variability) wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a variable-masked VPERMILP instruction. Also plumb basic comment parsing and printing through so that the tests are reasonable. There are still a few tests which don't show the shuffle pattern. These are tests with undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in a follow-up. I've looked at the masks and they seem reasonable. llvm-svn: 218300	2014-09-23 10:08:29 +00:00
Chandler Carruth	322207fcc8	[x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the td pattern). Currently we only model the immediate operand variation of VPERMILPS and VPERMILPD, we should make that clear in the pseudos used. Will be adding support for the variable mask variant in my next commit. llvm-svn: 218282	2014-09-22 22:29:42 +00:00
Kaelyn Takata	282860afa5	Fix a "typo" from my previous commit. llvm-svn: 218281	2014-09-22 22:17:59 +00:00
Kaelyn Takata	d668801d94	Silence unused variable warnings in the new stub functions that occur when assertions are disabled. llvm-svn: 218280	2014-09-22 22:14:13 +00:00
Chandler Carruth	c0deb6c8a5	[x86] Stub out the integer lowering of 256-bit vectors with AVX2 support. No interesting functionality yet, but this will let me implement one vector type at a time. llvm-svn: 218277	2014-09-22 21:45:57 +00:00
Juergen Ributzka	96a3a7534a	[FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1). Shift-left immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. This should fix a bug found by Chad. llvm-svn: 218275	2014-09-22 21:08:53 +00:00
Ehsan Akhgari	6706f4ca4a	ms-inline-asm: Fix parsing label names inside bracket expressions Summary: This fixes a couple of issues. One is ensuring that AOK_Label rewrite rules have a lower priority than AOK_Skip rules, as AOK_Skip needs to be able to skip the brackets properly. The other part of the fix ensures that we don't overwrite Identifier when looking up the identifier, and that we use the locally available information to generate the AOK_Label rewrite in ParseIntelIdentifier. Doing that in CreateMemForInlineAsm would be problematic since the Start location there may point to the beginning of a bracket expression, and not necessarily the beginning of an identifier. This also means that we don't need to carry around the InternlName field, which helps simplify the code. Test Plan: This will be tested on the clang side. Reviewers: rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5445 llvm-svn: 218270	2014-09-22 20:40:36 +00:00
Sanjay Patel	f8bef0feec	Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2). We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors. This patch should preserve all existing behavior with regular optimization levels, but also use splats whenever possible when optimizing for size on any CPU with AVX or AVX2. The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save at least 8 bytes (up to 31 bytes) of constant pool data. Differential Revision: http://reviews.llvm.org/D5347 llvm-svn: 218263	2014-09-22 18:54:01 +00:00
Tom Stellard	a3a0e937f2	Revert "R600/SI: Add support for global atomic add" This reverts commit r218254. The global_atomics.ll test fails with asserts disabled. For some reason, the compiler fails to produce the atomic no return variants. llvm-svn: 218257	2014-09-22 16:44:04 +00:00
Tom Stellard	ed09a99692	R600/SI: Add support for global atomic add llvm-svn: 218254	2014-09-22 15:35:35 +00:00
Tom Stellard	677e3ecc4c	R600/SI: Remove modifier operands from V_CNDMASK_B32_e64 Modifiers don't work for this instruction. llvm-svn: 218253	2014-09-22 15:35:34 +00:00
Tom Stellard	8c1a958993	R600: Don't set BypassSlowDiv for 64-bit division BypassSlowDiv is used by codegen prepare to insert a run-time check to see if the operands to a 64-bit division are really 32-bit values and if they are it will do 32-bit division instead. This is not useful for R600, which has predicated control flow since both the 32-bit and 64-bit paths will be executed in most cases. It also increases code size which can lead to more instruction cache misses. llvm-svn: 218252	2014-09-22 15:35:32 +00:00
Tom Stellard	14b4dc8502	R600/SI: Use ISD::MUL instead of ISD::UMULO when lowering division ISD::MUL and ISD:UMULO are the same except that UMULO sets an overflow bit. Since we aren't using the overflow bit, we should use ISD::MUL. llvm-svn: 218251	2014-09-22 15:35:30 +00:00
Tom Stellard	fdb89d4663	R600/SI: Add enums for some hard-coded values llvm-svn: 218250	2014-09-22 15:35:29 +00:00
Pavel Chupin	49e1488b89	[x32] Fix segmented stacks support Summary: Update segmented-stacks*.ll tests with x32 target case and make corresponding changes to make them pass. Test Plan: tests updated with x32 target Reviewers: nadav, rafael, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5245 llvm-svn: 218247	2014-09-22 13:11:35 +00:00
Robert Lougher	404eb80020	Fix assert when decoding PSHUFB mask The PSHUFB mask decode routine used to assert if the mask index was out of range (<0 or greater than the size of the vector). The problem is, we can legitimately have a PSHUFB with a large index using intrinsics. The instruction only uses the least significant 4 bits. This change removes the assert and masks the index to match the instruction behaviour. llvm-svn: 218242	2014-09-22 11:54:38 +00:00
Ehsan Akhgari	0a0d085305	ms-inline-asm: Add a sema callback for looking up label names The implementation of the callback in clang's Sema will return an internal name for labels. Test Plan: Will be tested in clang. Reviewers: rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4587 llvm-svn: 218229	2014-09-22 02:21:35 +00:00
Chandler Carruth	973771fcb4	[x86] Back out a bad choice about lowering v4i64 and pave the way for a more sane approach to AVX2 support. Fundamentally, there is no useful way to lower integer vectors in AVX. None. We always end up with a VINSERTF128 in the end, so we might as well eagerly switch to the floating point domain and do everything there. This cleans up lots of weird and unlikely to be correct differences between integer and floating point shuffles when we only have AVX1. The other nice consequence is that by doing things this way we will make it much easier to write the integer lowering routines as we won't need to duplicate the logic to check for AVX vs. AVX2 in each one -- if we actually try to lower a 256-bit vector as an integer vector, we have AVX2 and can rely on it. I think this will make the code much simpler and more comprehensible. Currently, I've disabled all support for AVX2 so that we always fall back to AVX. This keeps everything working rather than asserting. That will go away with the subsequent series of patches that provide a baseline AVX2 implementation. Please note, I'm going to implement AVX2 without access to hardware. That means I cannot correctness test this path. I will be relying on those with access to AVX2 hardware to do correctness testing and fix bugs here, but as a courtesy I'm trying to sketch out the framework for the new-style vector shuffle lowering in the context of the AVX2 ISA. llvm-svn: 218228	2014-09-22 00:32:15 +00:00
Chandler Carruth	9a747d4524	[x86] Teach the new vector shuffle lowering how to cleverly lower single input v8f32 shuffles which are not 128-bit lane crossing but have different shuffle patterns in the low and high lanes. This removes most of the extract/insert traffic that was unnecessary and is particularly good at lowering cases where only one of the two lanes is shuffled at all. I've also added a collection of test cases with undef lanes because this lowering is somewhat more sensitive to undef lanes than others. llvm-svn: 218226	2014-09-21 23:46:13 +00:00
Matt Arsenault	b8b59b2699	Fix typo llvm-svn: 218223	2014-09-21 17:27:32 +00:00
Matt Arsenault	5ec2f665df	Use llvm_unreachable instead of assert(!) llvm-svn: 218222	2014-09-21 17:27:31 +00:00
Matt Arsenault	d266054496	R600/SI: Don't use strings for single characters llvm-svn: 218221	2014-09-21 17:27:28 +00:00
Sanjay Patel	1235f854b3	Refactor reciprocal square root estimate into target-independent function; NFC. This is purely a plumbing patch. No functional changes intended. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner. Next steps: The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added. Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner. X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added. Differential Revision: http://reviews.llvm.org/D5425 llvm-svn: 218219	2014-09-21 15:19:15 +00:00
Chandler Carruth	268a5c84e0	[x86] With the stronger canonicalization of shuffles added in r218216, the new vector shuffle lowering no longer needs to check both symmetric forms of UNPCK patterns for v4f64. llvm-svn: 218217	2014-09-21 13:37:51 +00:00
Chandler Carruth	f6d0b2971e	[x86] Teach the new vector shuffle lowering to re-use the SHUFPS lowering when it can use a symmetric SHUFPS across both 128-bit lanes. This required making the SHUFPS lowering tolerant of other vector types, and adjusting our canonicalization to canonicalize harder. This is the last of the clever uses of symmetry I've thought of for v8f32. The rest of the tricks I'm aware of here are to work around assymetry in the mask. llvm-svn: 218216	2014-09-21 13:35:14 +00:00
Chandler Carruth	85dc0ae984	[x86] Refactor the logic to form SHUFPS instruction patterns to lower a generic vector shuffle mask into a helper that isn't specific to the other things that influence which choice is made or the specific types used with the instruction. No functionality changed. llvm-svn: 218215	2014-09-21 13:03:00 +00:00
Chandler Carruth	f839a92cc3	[x86] Teach the new vector shuffle lowering the basics about insertion of a single element into a zero vector for v4f64 and v4i64 in AVX. Ironically, there is less to see here because xor+blend is so crazy fast that we can't really beat that to zero the high 128-bit lane. llvm-svn: 218214	2014-09-21 12:49:46 +00:00
Chandler Carruth	b67aa278e2	[x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and UNPCKHPS with AVX vectors by recognizing those patterns when they are repeated for both 128-bit lanes. With this, we now generate the exact same (really nice) code for Quentin's avx_test_case.ll which was the most significant regression reported for the new shuffle lowering. In fact, I'm out of specific test cases for AVX lowering, the rest were AVX2 I think. However, there are a bunch of pretty obvious remaining things to improve with AVX... llvm-svn: 218213	2014-09-21 12:20:44 +00:00
Chandler Carruth	f9f6492987	[x86] Begin teaching the new vector shuffle lowering among the most important bits of cleverness: to detect and lower repeated shuffle patterns between the two 128-bit lanes with a single instruction. This patch just teaches it how to lower single-input shuffles that fit this model using VPERMILPS. =] There is more that needs to happen here. llvm-svn: 218211	2014-09-21 12:01:19 +00:00
Chandler Carruth	9bce08955b	[x86] Explicitly lower to a blend early if it is trivial to do so for v8f32 shuffles in the new vector shuffle lowering code. This is very cheap to do and makes it much more clear that anything more expensive but overlapping with this lowering should be selected afterward (for example using AVX2's VPERMPS). However, no functionality changed here as without this code we would fall through to create no-op shuffles of each input and a blend. =] llvm-svn: 218209	2014-09-21 11:40:39 +00:00
Chandler Carruth	fe381c0450	[x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the reciprocal throughput on Ivy Bridge CPUs much like it does everywhere for 128-bits. There isn't a downside, so just eagerly use this instruction when it suffices. llvm-svn: 218208	2014-09-21 11:17:55 +00:00
Chandler Carruth	f39eeefb9c	[x86] Switch the blend implementation to use a MVT switch rather than awkward conditions. The readability improvement of this will be even more important as I generalize it to handle more types. No functionality changed. llvm-svn: 218205	2014-09-21 10:36:12 +00:00
Chandler Carruth	75c0bbd1cd	[x86] Remove some essentially lying comments from the v4f64 path of the new vector shuffle lowering. llvm-svn: 218204	2014-09-21 10:27:14 +00:00
Chandler Carruth	18803f920a	[x86] Fix a helper to reflect that what we actually care about is 128-bit lane crossings, not 'half' crossings. This came up in code review ages ago, but I hadn't really addresesd it. Also added some documentation for the helper. No functionality changed. llvm-svn: 218203	2014-09-21 09:35:25 +00:00
Chandler Carruth	aa86f1a455	[x86] Teach the new vector shuffle lowering the first step toward more actual support for complex AVX shuffling tricks. We can do independent blends of the low and high 128-bit lanes of an avx vector, so shuffle the inputs into place and then do the blend at 256 bits. This will in many cases remove one blend instruction. The next step is to permute the low and high halves in-place rather than extracting them and re-inserting them. llvm-svn: 218202	2014-09-21 09:35:22 +00:00
Chandler Carruth	8a3a7fd4ad	[x86] Teach the new vector shuffle lowering to use VPERMILPD for single-input shuffles with doubles. This allows them to fold memory operands into the shuffle, etc. This is just the analog to the v4f32 case in my prior commit. llvm-svn: 218193	2014-09-20 22:09:27 +00:00
Chandler Carruth	2d0034551f	[x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS instruction for single-vector floating point shuffles. This in turn allows the shuffles to fold a load into the instruction which is one of the common regressions hit with the new shuffle lowering. llvm-svn: 218190	2014-09-20 20:52:07 +00:00
Chandler Carruth	d75afb43fb	[x86] Teach the v4f32 path of the new shuffle lowering to handle the tricky case of single-element insertion into the zero lane of a zero vector. We can't just use the same pattern here as we do in every other vector type because the general insertion logic can handle insertion into the non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we have INSERTPS that is a much better choice than the generic one for such lowerings. But INSERTPS can do lots of other lowerings as well so factoring its logic into the general insertion logic doesn't work very well. We also can't just extract the core common part of the general insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that lower to MOVSS when they can) because VZEXT_MOVL is often faster than a blend while INSERTPS is slower! So instead we do a restrictive condition on attempting to use the generic insertion logic to narrow it to those cases where VZEXT_MOVL won't need a shuffle afterward and thus will do better than INSERTPS. Then we try blending. Then we go back to INSERTPS. This still doesn't generate perfect code for some silly reasons that can be fixed by tweaking the td files for lowering VZEXT_MOVL to use XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends up in a register rather than a load from memory -- BLENDPSrr has twice the reciprocal throughput of MOVSSrr. Don't you love this ISA? llvm-svn: 218177	2014-09-20 04:15:22 +00:00
Chandler Carruth	f6f79b02bf	[x86] Refactor the code for emitting INSERTPS to reuse the zeroable mask analysis used elsewhere. This removes the last duplicate of this logic. Also simplify the code here quite a bit. No functionality changed. llvm-svn: 218176	2014-09-20 03:57:01 +00:00
Chandler Carruth	e1d5ad4403	[x86] Generalize the single-element insertion lowering to work with floating point types and use it for both v2f64 and v2i64 single-element insertion lowering. This fixes the last non-AVX performance regression test case I've gotten of for the new vector shuffle lowering. There is obvious analogous lowering for v4f32 that I'll add in a follow-up patch (because with INSERTPS, v4f32 requires special treatment). After that, its AVX stuff. llvm-svn: 218175	2014-09-20 03:32:25 +00:00
Chandler Carruth	ba78806bbf	[x86] Replace some duplicated logic reasoning about whether particular vector lanes can be modeled as zero with a call to the new function that computes a bit-vector representing that information. No functionality changed here, but will allow doing more clever things with the zero-test. llvm-svn: 218174	2014-09-20 02:44:21 +00:00
Robin Morisset	c5140ae96c	[X86] Erase some obsolete comments from README.txt I just tried reproducing some of the optimization failures in README.txt in the X86 backend, and many of them could not be reproduced. In general the entire file appears quite bit-rotted, whatever interesting parts remain should be moved to bugzilla, and the rest deleted. I did not spend the time to do that, so I just deleted the few I tried reproducing which are obsolete, to save some time to whoever will find the courage to do it. llvm-svn: 218170	2014-09-19 23:56:46 +00:00
Eric Christopher	5049b04396	constify the TargetMachine being passed through the Mips subtarget creation. llvm-svn: 218169	2014-09-19 23:30:42 +00:00
Juergen Ributzka	c86ece1ad5	[FastIsel][AArch64] Fix a think-o in address computation. When looking through sign/zero-extensions the code would always assume there is such an extension instruction and use the wrong operand for the address. There was also a minor issue in the handling of 'AND' instructions. I accidentially used a 'cast' instead of a 'dyn_cast'. llvm-svn: 218161	2014-09-19 22:23:46 +00:00
Chandler Carruth	1df25c70c1	[x86] Hoist a function up to the rest of the non-type-specific lowering helpers, and re-flow the logic to use early exit and be a bit more readable. No functionality changed. llvm-svn: 218155	2014-09-19 21:52:10 +00:00
Chandler Carruth	9e0d932bae	[x86] Hoist the actual lowering logic into a helper function to separate it from the shuffle pattern matching logic. Also cleaned up variable names, comments, etc. No functionality changed. llvm-svn: 218152	2014-09-19 21:20:08 +00:00
Tom Stellard	9c86690318	R600/SI: Fix config value for number of gprs In r217636, the value stored in KernelInfo.Num[VS]GPRSs was changed from the highest GPR index used to the number of gprs in order to be consistent with the name of the variable. The code writing the config values still assumed that the value in this variable was the highest GPR index used, which caused the compiler to over report the number of GPRs being used. https://bugs.freedesktop.org/show_bug.cgi?id=84089 llvm-svn: 218150	2014-09-19 20:42:37 +00:00
Chandler Carruth	32aacd5feb	[x86] Fully generalize the zext lowering in the new vector shuffle lowering to support both anyext and zext and to custom lower for many different microarchitectures. Using this allows us to get exactly the right code for zext and anyext shuffles in all the vector sizes. For v16i8, the improvement is huge. The new SSE2 test case added I refused to add before this because it was sooooo muny instructions. llvm-svn: 218143	2014-09-19 20:00:32 +00:00
Hal Finkel	0c0c256ad7	Optionally enable more-aggressive FMA formation in DAGCombine The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one use, but this is overly-conservative on some systems. Specifically, if the FMA and the FADD have the same latency (and the FMA does not compete for resources with the FMUL any more than the FADD does), there is no need for the restriction, and furthermore, forming the FMA leaving the FMUL can still allow for higher overall throughput and decreased critical-path length. Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to elide the hasOneUse check. This is enabled for PowerPC by default, as most PowerPC systems will benefit. Patch by Olivier Sallenave, thanks! llvm-svn: 218120	2014-09-19 11:42:56 +00:00
Chandler Carruth	8a927d739f	[x86] Recognize that we can use duplication to widen v16i8 shuffles due to undef lanes as well as defined widenable lanes. This dramatically improves the lowering we use for undef-shuffles in a zext-ish pattern for SSE2. llvm-svn: 218115	2014-09-19 09:45:21 +00:00
Chandler Carruth	9fc20a3f92	[x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32 shuffles that are zext-ing. Not a lot to see here; the undef lane variant is better handled with pshufd, but this improves the actual zext pattern. llvm-svn: 218112	2014-09-19 08:37:44 +00:00
Chandler Carruth	2c46914a5d	[x86] Add a dedicated lowering path for zext-compatible vector shuffles to the new vector shuffle lowering code. This allows us to emit PMOVZX variants consistently for patterns where it is a viable lowering. This instruction is both fast and allows us to fold loads into it. This only hooks the new lowering up for i16 and i8 element widths, mostly so I could manage the change to the tests. I'll add the i32 one next, although it is significantly less interesting. One thing to note is that we already had some tests for these patterns but those tests had far less horrible instructions. The problem is that those tests weren't checking the strict start and end of the instruction sequence. =[ As a consequence something changed in the lowering making us generate TERRIBLE code for these patterns in SSE2 through SSSE3. I've consolidated all of the tests and spelled out the madness that we currently emit for these shuffles. I'm going to try to figure out what has gone wrong here. llvm-svn: 218102	2014-09-19 06:07:49 +00:00
Matt Arsenault	114b472f2c	R600: Better fix for bug 20982 Just do the left shift as unsigned to avoid the UB. llvm-svn: 218092	2014-09-19 00:42:06 +00:00
Quentin Colombet	7f70cb4039	[ARM] Do not perform a tail call when the caller returns several values. The fix is slightly different then x86 (see r216117) because the number of values attached to a return can vary even for a single returned value (e.g., f64 yields two returned values). <rdar://problem/18352998> llvm-svn: 218076	2014-09-18 21:17:50 +00:00
Robin Morisset	53ad29f76a	Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" Summary: This patch was originally in D5304 (I could not find a way to reopen that revision). It was accepted, commited and broke the build bots because the overloading of the constructor of ArrayRef for braced initializer lists is not supported by all toolchains. I then reverted it, and propose this fixed version that uses a plain C array instead in makeDMB (that array is then converted implicitly to an ArrayRef, but that is not behind an ifdef). Could someone confirm me whether initialization lists for plain C arrays are supported by every toolchain used to build llvm ? Otherwise I can just initialize the array in the old way: args[0] = ...; .. ; args[5] = ...; Below is the description of the original patch: ``` I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish: - dmb sy if a cortex-M with support for dmb - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB) These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug. ``` Test Plan: Added more cases to atomic-load-store.ll + make check-all Reviewers: jfb, t.p.northover, luqmana Subscribers: llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D5386 llvm-svn: 218066	2014-09-18 18:56:04 +00:00
Aaron Ballman	c9d2119dc2	Reverting NFC changes from r218050. Instead, the warning was disabled for GCC in r218059, so these changes are no longer required. llvm-svn: 218062	2014-09-18 17:34:23 +00:00

... 4 5 6 7 8 ...

30587 Commits