llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 05:53:07 +01:00

Author	SHA1	Message	Date
Chandler Carruth	74dc5ae1d0	[x86] Fix another miscompile found through fuzz testing the new vector shuffle lowering. This is closely related to the previous one. Here we failed to use the source offset when swapping in the other case -- where we end up swapping the final shuffle. The cause of this bug is a bit different: I simply wasn't thinking about the fact that this mask is actually a slice of a wide mask and thus has numbers that need SourceOffset applied. Simple fix. Would be even more simple with an algorithm-y thing to use here, but correctness first. =] llvm-svn: 215095	2014-08-07 10:37:35 +00:00
Chandler Carruth	c19c208775	[x86] Fix another miscompile in the new vector shuffle lowering found via the fuzz tester. Here I missed an offset when round-tripping a value through a shuffle mask. I got it right 2 lines below. See a problem? I do. ;] I'll probably be adding a little "swap" algorithm which accepts a range and two values and swaps those values where they occur in the range. Don't really have a name for it, let me know if you do. llvm-svn: 215094	2014-08-07 10:14:27 +00:00
Chandler Carruth	4d35998980	[x86] Fix another miscompile in the new vector shuffle lowering found through the new fuzzer. This one is great: bad operator precedence led the modulus to happen at the wrong point. All the asserts didn't fire because there were usually the right values past the end of the 4 element region we were looking at. Probably could have gotten a crash here with ASan + fuzzing, but the correctness tests pinpointed this really nicely. llvm-svn: 215092	2014-08-07 09:45:02 +00:00
Pavel Chupin	7f4a227354	[x32] Use ebp/esp as frame and stack pointer Summary: Since pointers are 32-bit on x32 we can use ebp and esp as frame and stack pointer. Some operations like PUSH/POP and CFI_INSTRUCTION still require 64-bit register, so using 64-bit MachineFramePtr where required. X86_64 NaCl uses 64-bit frame/stack pointers, however it's been found that both isTarget64BitLP64 and isTarget64BitILP32 are true for NaCl. Addressing this issue here as well by making isTarget64BitLP64 false. Also mark hasReservedSpillSlot unreachable on X86. See inlined comments. Test Plan: Add one new simple test and upgrade 2 existing with x32 target case. Reviewers: nadav, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D4617 llvm-svn: 215091	2014-08-07 09:41:19 +00:00
Chandler Carruth	291bc5d9a4	[x86] Fix a miscompile in the new shuffle lowering found through the new fuzz testing. The function which tested for adjacency did what it said on the tin, but when I called it, I wanted it to do something more thorough: I wanted to know if the pairs of shuffle elements were adjacent and started at 0 mod 2. In one place I had the decency to try to test for this, but in the other it was completely skipped, miscompiling this test case. Fix this by making the helper actually do what I wanted it to do everywhere I called it (and removing the now redundant code in one place). I really dislike the name "canWidenShuffleElements" for this predicate. If anyone can come up with a better name, please let me know. The other name I thought about was "canWidenShuffleMask" but is it really widening the mask to reduce the number of lanes shuffled? I don't know. Naming things is hard. llvm-svn: 215089	2014-08-07 08:11:31 +00:00
Pete Cooper	cbc13312c3	Update BitRecTy::convertValue to allow if expressions with bit values on both sides of the if llvm-svn: 215087	2014-08-07 05:47:10 +00:00
Pete Cooper	5d88ea715c	Change the { } expression in tablegen to accept sized binary literals which are not just 0 and 1. It also allows nested { } expressions, as now that they are sized, we can merge pull bits from the nested value. In the current behaviour, everything in { } must have been convertible to a single bit. However, now that binary literals are sized, it's useful to be able to initialize a range of bits. So, for example, it's now possible to do bits<8> x = { 0, 1, { 0b1001 }, 0, 0b0 } llvm-svn: 215086	2014-08-07 05:47:07 +00:00
Pete Cooper	5e735d5967	Change TableGen so that binary literals such as 0b001 are now sized. Instead of these becoming an integer literal internally, they now become bits<n> values. Prior to this change, 0b001 was 1 bit long. This is confusing as clearly the user gave 3 bits. This new type holds both the literal value and the size, and so can ensure sizes match on initializers. For example, this used to be legal bits<1> x = 0b00; but now it must be written as bits<2> x = 0b00; llvm-svn: 215084	2014-08-07 05:47:00 +00:00
Pete Cooper	91540288e1	TableGen: Change { } to only accept bits<n> entries when n == 1. Prior to this change, it was legal to do something like bits<2> opc = { 0, 1 }; bits<2> opc2 = { 1, 0 }; bits<2> a = { opc, opc2 }; This involved silently dropping bits from opc and opc2 which is very hard to debug. Now the above test would be an error. Having tested with an assert, none of LLVM/clang was relying on this behaviour. Thanks to Adam Nemet for the above test. llvm-svn: 215083	2014-08-07 05:46:57 +00:00
Pete Cooper	4afa5aa1cc	Fix a whole bunch of binary literals which were the wrong size. All were being silently zero extended to the correct width. The commit after this changes { } and 0bxx literals to be of type bits<n> and not int. This means we need to write exactly the right number of bits, and not rely on the values being silently zero extended for us. llvm-svn: 215082	2014-08-07 05:46:54 +00:00
Saleem Abdulrasool	37f9f1e4f7	MC: split Win64EHUnwindEmitter into a shared streamer This changes Win64EHEmitter into a utility WinEH UnwindEmitter that can be shared across multiple architectures and a target specific bit which is overridden (Win64::UnwindEmitter). This enables sharing the section selection code across X86 and the intended use in ARM for emitting unwind information for Windows on ARM. llvm-svn: 215050	2014-08-07 02:59:41 +00:00
Quentin Colombet	e540e9c357	[X86][SchedModel] Fixed missing/wrong scheduling model found by code inspection. Source: Agner Fog's Instruction tables. Related to <rdar://problem/15607571> llvm-svn: 215045	2014-08-07 00:20:44 +00:00
Reid Kleckner	f0567dde14	MC X86: Accept ".att_syntax prefix" and diagnose noprefix Fixes PR18916. I don't think we need to implement support for either hybrid syntax. Nobody should write Intel assembly with '%' prefixes on their registers or AT&T assembly without them. llvm-svn: 215031	2014-08-06 23:21:13 +00:00
David Blaikie	a8c5d79f89	Revert "Reapply "DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself."" This reverts commit r214761. Revert while Reid investigates & provides a reproduction for an assertion failure for this on Windows. llvm-svn: 214999	2014-08-06 22:30:12 +00:00
Sanjay Patel	8cd2aae34c	fix typo llvm-svn: 214995	2014-08-06 21:08:38 +00:00
Yaron Keren	baaa4b7845	getNewMemBuffer memsets the buffer to zeros, so the caller doesn't have to initialize it. llvm-svn: 214994	2014-08-06 20:59:09 +00:00
Rui Ueyama	c76432d7c2	Revert "r214897 - Remove dead zero store to calloc initialized memory" It broke msan. llvm-svn: 214989	2014-08-06 19:30:38 +00:00
Eric Christopher	4a1cdb2ba7	Remove the target machine from CCState. Previously it was only used to get the subtarget and that's accessible from the MachineFunction now. This helps clear the way for smaller changes where getting a subtarget will require passing in a MachineFunction/Function as well. llvm-svn: 214988	2014-08-06 18:45:26 +00:00
Adrian Prantl	9270b95c79	Improve performance of calculateDbgValueHistory. In r210492 the logic of calculateDbgValueHistory was changed to end register variable live ranges at the end of MBB conditionally on whether or not the register was clobbered by the function body. This requires an initial scan of all the operands of the function to collect all clobbered registers. In a second pass over all instructions, we compare this set with the set of clobbered registers for the current MachineInstruction. This modification incurred a compilation time regression on some benchmarks: the debug info emission phase takes ~10% more time. While a small performance hit is unavoidable due to the initial scan requirement, we can improve the situation by avoiding the creation of too many temporary sets and just using lambdas to work directly on the result of the initial scan. Fixes <rdar://problem/17884104> Patch by Frederic Riss! llvm-svn: 214987	2014-08-06 18:41:24 +00:00
Adrian Prantl	84367341ee	Cleanup collectChangingRegs The handling of the epilogue is best expressed as an early exit and there is no reason to look for register defs in DbgValue MIs. Patch by Frederic Riss! llvm-svn: 214986	2014-08-06 18:41:19 +00:00
Reid Kleckner	bec7530633	Don't count inreg params when mangling fastcall functions This is consistent with MSVC. llvm-svn: 214981	2014-08-06 18:09:04 +00:00
Reid Kleckner	e340c78d47	Round up the size of byval arguments to MinAlign Otherwise we can end up with an argument frame size that is not a multiple of stack slot size, which is very awkward. This fixes PR20547, which was a bug in x86_64 Sys V vararg handling. However, it's much easier to test this with x86 callee-cleanup functions, which previously ended in "retl $6" instead of "retl $8". This does affect behavior of all backends, but it presumably fixes the same bug in all of them. llvm-svn: 214980	2014-08-06 17:57:23 +00:00
Chad Rosier	0214f513b9	[AArch64] Add a few isTarget* API to AArch64 Subtarget. llvm-svn: 214977	2014-08-06 16:56:58 +00:00
Chad Rosier	8291c24b09	[AArch64] Fix OS ABI flag for aarch64-linux-gnu target. For triple aarch64-linux-gnu we were incorrectly setting IRIX. For triple aarch64 we are correctly setting SYSV. Patch by Ana Pazos <apazos@codeaurora.org>. llvm-svn: 214974	2014-08-06 16:05:02 +00:00
Sanjay Patel	d5cb9b68e1	use register iterators that include self to reduce code duplication in CriticalAntiDepBreaker This patch addresses 2 FIXME comments that I added to CriticalAntiDepBreaker while fixing PR20020. Initialize an MCSubRegIterator and an MCRegAliasIterator to include the self reg. Assuming that works as advertised, there should be no functional difference with this patch, just less code. Also, remove the associated asserts - we're setting those values just before, so the asserts don't do anything meaningful. Differential Revision: http://reviews.llvm.org/D4566 llvm-svn: 214973	2014-08-06 15:58:15 +00:00
Robert Khasanov	970483b673	[AVX512] Added load/store instructions to Register2Memory opcode tables. Added lowering tests for load/store. Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 214972	2014-08-06 15:40:34 +00:00
James Molloy	7518c61a09	[AArch64] Add a testcase for r214957. llvm-svn: 214965	2014-08-06 13:31:32 +00:00
James Molloy	0127d12e19	Add a new option -run-slp-after-loop-vectorization. This swaps the order of the loop vectorizer and the SLP/BB vectorizers. It is disabled by default so we can do performance testing - ideally we want to change to having the loop vectorizer running first, and the SLP vectorizer using its leftovers instead of the other way around. llvm-svn: 214963	2014-08-06 12:56:19 +00:00
Tim Northover	0028a9a97d	ARM: do not generate BLX instructions on Cortex-M CPUs. Particularly on MachO, we were generating "blx _dest" instructions on M-class CPUs, which don't actually exist. They happen to get fixed up by the linker into valid "bl _dest" instructions (which is why such a massive issue has remained largely undetected), but we shouldn't rely on that. llvm-svn: 214959	2014-08-06 11:13:14 +00:00
Tim Northover	7abd4db81b	ARM-MachO: materialize callee address correctly on v4t. llvm-svn: 214958	2014-08-06 11:13:06 +00:00
James Molloy	2bbe86fab0	[AArch64] Conditional selects are expensive on out-of-order cores. Specifically Cortex-A57. This probably applies to Cyclone too but I haven't enabled it for that as I can't test it. This gives ~4% improvement on SPEC 174.vpr, and ~1% in 471.omnetpp. llvm-svn: 214957	2014-08-06 10:42:18 +00:00
Chandler Carruth	2a63640957	[x86] Fix two independent miscompiles in the process of getting the same test case to actually generate correct code. The primary miscompile fixed here is that we weren't correctly handling in-place elements in one half of a single-input v8i16 shuffle when moving a dword of elements from that half to the other half. Sometimes, we would clobber the in-place elements in forming the dword to move across halves. The fix to this involves forcibly marking the in-place inputs even when there is no need to gather them into a dword, and to much more carefully re-arrange the elements when grouping them into a dword to move across halves. With these two changes we would generate correct shuffles for the test case, but found another miscompile. There are also some random perturbations of the generated shuffle pattern in SSE2. It looks like a wash; more instructions in some cases, fewer in others. The second miscompile would corrupt the results into nonsense. This is a buggy pattern in one of the added DAG combines. Mapping elements through a PSHUFD when pairing redundant half-shuffles is much harder than this code makes it out to be -- it requires reasoning about all of where the input is used in the PSHUFD, not just one part of where it is used. Plus, we can't combine a half shuffle into a PSHUFD but the code didn't guard against it. I think this was just a bad idea and I've just removed that aspect of the combine. No tests regress as a consequence so seems OK. llvm-svn: 214954	2014-08-06 10:16:36 +00:00
Chandler Carruth	8d598f2f29	[x86] Switch to a formulation of a for loop that is much more obviously not corrupting the mask by mutating it more times than intended. No functionality changed (the results were non-overlapping so the old version "worked" but was non-obvious). llvm-svn: 214953	2014-08-06 10:16:33 +00:00
Adam Nemet	aea49d4d5f	[X86] Fixes commit r214890 to match the posted patch This was another fallout from my local rebase where something went wrong :( llvm-svn: 214951	2014-08-06 07:13:12 +00:00
Matt Arsenault	7d4ad478b1	Correct comment llvm-svn: 214945	2014-08-06 00:44:25 +00:00
Peter Collingbourne	129ba2ac92	[dfsan] Try not to create too many additional basic blocks in functions which already have a large number of blocks. Works around a performance issue with the greedy register allocator. llvm-svn: 214944	2014-08-06 00:33:40 +00:00
Matt Arsenault	3005d01057	R600: Increase nearby load scheduling threshold. This partially fixes weird looking load scheduling in memcpy test. The load clustering doesn't seem particularly smart, but this method seems to be partially deprecated so it might not be worth trying to fix. llvm-svn: 214943	2014-08-06 00:29:49 +00:00
Matt Arsenault	68712599d9	R600/SI: Implement areLoadsFromSameBasePtr This currently has a noticeable effect on the kernel argument loads. LDS and global loads are more problematic, I think because of how copies are currently inserted to ensure that the address is a VGPR. llvm-svn: 214942	2014-08-06 00:29:43 +00:00
Quentin Colombet	8e3912c486	[X86][SchedModel] Fixed some wrong scheduling model found by code inspection. Source: Agner Fog's Instruction tables. Related to <rdar://problem/15607571> llvm-svn: 214940	2014-08-06 00:22:39 +00:00
David Blaikie	37a242b70f	DebugInfo: Assert that any CU for which debug_loc lists are emitted, has at least one range. This was coming in weird debug info that had variables (and hence debug_locs) but was in GMLT mode (because it was missing the 13th field of the compile_unit metadata) so no ranges were constructed. We should always have at least one range for any CU with a debug_loc in it - because the range should cover the debug_loc. The assertion just ensures that the "!= 1" range case inside the subsequent loop doesn't get entered for the case where there are no ranges at all, which should never reach here in the first place. llvm-svn: 214939	2014-08-06 00:21:25 +00:00
Matt Arsenault	1ecd6214f0	R600/SI: Add definitions for ds_read2st64_ / ds_write2st64_ llvm-svn: 214936	2014-08-05 23:53:20 +00:00
JF Bastien	314c089ba3	Fix typos in comments and doc Committing http://reviews.llvm.org/D4798 for Robin Morisset (morisset@google.com) llvm-svn: 214934	2014-08-05 23:27:34 +00:00
David Blaikie	fec0292411	DebugInfo: Move the reference to the CU from the location list entry to the list itself, since it is constant across an entire list. This simplifies construction and usage while making the data structure smaller. It was a holdover from the days when we didn't have a separate DebugLocList and all we had was a flat list of DebugLocEntries. llvm-svn: 214933	2014-08-05 23:14:16 +00:00
Rafael Espindola	c981388b03	Remove a virtual function from TargetMachine. NFC. llvm-svn: 214929	2014-08-05 22:10:21 +00:00
Jonathan Roelofs	7bcdffb32c	Re-apply r214881: Fix return sequence on armv4 thumb This reverts r214893, re-applying r214881 with the test case relaxed a bit to satiate the build bots. POP on armv4t cannot be used to change thumb state (unlike later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214928	2014-08-05 21:32:21 +00:00
Bill Schmidt	8159f9047b	[PowerPC] Swap arguments and adjust shift count for vsldoi on little endian Commits r213915 and r214718 fix recognition of shuffle masks for vmrg* and vpku*um instructions for a little-endian target, by swapping the input arguments. The vsldoi instruction requires similar treatment, and also needs its shift count adjusted for little endian. Reviewed by Ulrich Weigand. This is a bug fix candidate for release 3.5 (and hopefully the last of those for PowerPC). llvm-svn: 214923	2014-08-05 20:47:25 +00:00
Rafael Espindola	e145188cc8	Don't internalize all but main by default. This is mostly a cleanup, but it changes a fairly old behavior. Every "real" LTO user was already disabling the silly internalize pass and creating the internalize pass itself. The difference with this patch is for "opt -std-link-opts" and the C api. Now to get a usable behavior out of opt one doesn't need the funny looking command line: opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts llvm-svn: 214919	2014-08-05 20:10:38 +00:00
Chandler Carruth	045f351a7f	[x86] Fix a crasher due to shuffles which cancel each other out and add a test case. We also miscompile this test case which is showing a serious flaw in the single-input v8i16 shuffle code. I've left the specific instruction checks FIXME-ed out until I can address the bug in the single-input code, but I wanted to separate out a significant functionality change to produce correct code from a very simple and targeted crasher fix. The miscompile problem stems from keeping track of inputs by value rather than by index. As a consequence of doing this, we can't reliably update those inputs because they might swap and we can't detect this without copying the mask. The blend code now uses indices for the input lists and this seems strictly better. It also should make it easier to sort things and do other cleanups. I think the time has come to simplify The Great Lambda here. llvm-svn: 214914	2014-08-05 18:45:49 +00:00
Duncan P. N. Exon Smith	abe071addb	Remove dead code in condition Whether or not it's appropriate, labels have been first-class types since r51511. llvm-svn: 214908	2014-08-05 18:22:58 +00:00
NAKAMURA Takumi	b1c451ece2	X86CodeEmitter.cpp: Add SEH_Epilogue to ignored list for legacy JIT, corresponding to r214775. llvm-svn: 214905	2014-08-05 18:04:15 +00:00
Adam Nemet	83d4fdecfc	[X86] Improve comments for r214888 A rebase somehow ate my comments. This restores them. llvm-svn: 214903	2014-08-05 17:58:49 +00:00
Matt Arsenault	4273af4e86	R600/SI: Use register class instead of list of registers I'm not sure if this has any consequence or not. llvm-svn: 214902	2014-08-05 17:52:40 +00:00
Matt Arsenault	f63af80339	R600/SI: Add exec_lo and exec_hi subregisters. This allows accessing an SReg subregister with a normal subregister index, instead of getting a machine verifier error. Also be sure to include all of these subregisters in SReg_32. This fixes inferring SGPR instead of SReg when finding a super register class. llvm-svn: 214901	2014-08-05 17:52:37 +00:00
Duncan P. N. Exon Smith	7aaaba94bb	BitcodeReader: Fix non-determinism in use-list order `BasicBlockFwdRefs` (and `BlockAddrFwdRefs` before it) was being emptied in a non-deterministic order. When predicting use-list order I've worked around this another way, but even when parsing lazily (and we can't recreate use-list order) use-lists should be deterministic. Make them so by using a side-queue of functions with forward-referenced blocks that gets visited in order. llvm-svn: 214899	2014-08-05 17:49:48 +00:00
Philip Reames	7088e609a9	Remove dead zero store to calloc initialized memory Optimize the following IR: %1 = tail call noalias i8* @calloc(i64 1, i64 4) %2 = bitcast i8* %1 to i32* ; This store is dead and should be removed store i32 0, i32* %2, align 4 Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store. If the store is to an out of bounds address, it is undefined and thus also removable. Reviewed By: nicholas Differential Revision: http://reviews.llvm.org/D3942 llvm-svn: 214897	2014-08-05 17:48:20 +00:00
Jonathan Roelofs	5721e98d43	Revert r214881 because it broke lots of build-bots llvm-svn: 214893	2014-08-05 17:36:05 +00:00
Sanjay Patel	1f56d11869	Optimize vector fabs of bitcasted constant integer values. Allow vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs. So for code like this: %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000 %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast) %ret = bitcast <2 x float> %fabs to i64 Instead of generating something like this: movabsq (constant pool loadi of mask for sign bits) vmovq (move from integer register to vector/fp register) vandps (mask off sign bits) vmovq (move vector/fp register back to integer return register) We should generate: mov (put constant value in return register) I have also removed a redundant clause in the first 'if' statement: N0.getOperand(0).getValueType().isInteger() is the same thing as: IntVT.isInteger() Testcases for x86 and ARM added to existing files that deal with vector fabs. One existing testcase for x86 removed because it is no longer ideal. For more background, please see: http://reviews.llvm.org/D4770 And: http://llvm.org/bugs/show_bug.cgi?id=20354 Differential Revision: http://reviews.llvm.org/D4785 llvm-svn: 214892	2014-08-05 17:35:22 +00:00
Adam Nemet	d956b84b6b	[AVX512] Add masking variant and intrinsics for valignd/q This is similar to what I did with the two-source permutation recently. (It's almost too similar so that we should consider generating the masking variants with some tablegen help.) Both encoding and intrinsic tests are added as well. For the latter, this is the IR that the intrinsic test on the clang side generates. Part of <rdar://problem/17688758> llvm-svn: 214890	2014-08-05 17:23:04 +00:00
Adam Nemet	3d838a0ae1	[X86] Increase X86_MAX_OPERANDS from 5 to 6 This controls the number of operands in the disassembler's x86OperandSets table. The entries describe how the operand is encoded and its type. Not too surprisingly, 5 operands is insufficient for AVX512. Consider VALIGNDrrik in the next patch. These are its operand specifiers: { /* 328 */ { ENCODING_DUP, TYPE_DUP1 }, { ENCODING_REG, TYPE_XMM512 }, { ENCODING_WRITEMASK, TYPE_VK8 }, { ENCODING_VVVV, TYPE_XMM512 }, { ENCODING_RM_CD64, TYPE_XMM512 }, { ENCODING_IB, TYPE_IMM8 }, }, llvm-svn: 214889	2014-08-05 17:23:01 +00:00
Adam Nemet	6ce89969d7	[X86] Add lowering to VALIGN This was currently part of lowering to PALIGNR with some special-casing to make interlane shifting work. Since AVX512F has interlane alignr (valignd/q) and AVX512BW has vpalignr we need to support both of these at the same time, e.g. for SKX. This patch breaks out the common code and then adds support to check both of these lowering options from LowerVECTOR_SHUFFLE. I also added some FIXMEs where I think the AVX512BW and AVX512VL additions should probably go. llvm-svn: 214888	2014-08-05 17:22:59 +00:00
Adam Nemet	3cbf71a23b	[X86] Separate DAG node for valign and palignr They have different semantics (valign is interlane while palignr is intralane) and palignr is still needed even in the AVX512 context. According to the latest spec AVX512BW provides these. llvm-svn: 214887	2014-08-05 17:22:55 +00:00
Adam Nemet	1a6d2ace74	[AVX512] alignr: Use suffix rather than name argument to multiclass Again no functional change. This prepares for the suffix to be used with the intrinsic matching. llvm-svn: 214886	2014-08-05 17:22:52 +00:00
Adam Nemet	b6384f0bad	[AVX512] Pull everything alignr-related into the multiclass The packed integer pattern becomes the DAG pattern for rri and the packed float, another Pat<> inside the multiclass. No functional change. llvm-svn: 214885	2014-08-05 17:22:50 +00:00
Adam Nemet	04f70d285b	Wrap long lines llvm-svn: 214884	2014-08-05 17:22:47 +00:00
Jonathan Roelofs	32bc0b3143	Fix return sequence on armv4 thumb POP on armv4t cannot be used to change thumb state (unlike later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214881	2014-08-05 17:13:17 +00:00
David Blaikie	5dcea2d2d6	Partially revert r214761 that asserted that all concrete debug info variables had DIEs, due to a failure on Darwin. I'll work on a reduction and fix after this. llvm-svn: 214880	2014-08-05 16:47:23 +00:00
Joerg Sonnenberger	6eb6423316	Add accessors for the PPC 403 bank registers. llvm-svn: 214875	2014-08-05 15:45:15 +00:00
Keith Walker	1855cb5c25	Specify that the thumb setend and blx <immed> instructions are not valid on an m-class target llvm-svn: 214871	2014-08-05 15:11:59 +00:00
Keith Walker	505498fc84	Define stc2/stc2l/ldc2/ldc2l as thumb2 instructions llvm-svn: 214868	2014-08-05 14:58:05 +00:00
Joerg Sonnenberger	cc8f67da81	Accessors for SSR2 and SSR3 on PPC 403. llvm-svn: 214867	2014-08-05 14:53:05 +00:00
Tom Stellard	e624513d2d	R600/SI: Update MUBUF assembly string to match AMD proprietary compiler llvm-svn: 214866	2014-08-05 14:48:12 +00:00
Tom Stellard	e589f0e5f7	R600/SI: Avoid generating REGISTER_LOAD instructions. SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214865	2014-08-05 14:40:52 +00:00
Joerg Sonnenberger	bc6e9b7c12	Add dci/ici instructions for PPC 476 and friends. llvm-svn: 214864	2014-08-05 14:40:32 +00:00
Joerg Sonnenberger	f726eed55e	Add mftblo and mftbhi for PPC 4xx. llvm-svn: 214863	2014-08-05 14:18:16 +00:00
Joerg Sonnenberger	d97414d887	Add lswi / stswi for assembler use with a warning to not add patterns for them. llvm-svn: 214862	2014-08-05 13:34:01 +00:00
Yi Kong	a05145f812	AArch64: Add support for instruction prefetch intrinsic Instruction prefetch is not implemented for AArch64; it is incorrectly translated into a data prefetch instruction. Differential Revision: http://reviews.llvm.org/D4777 llvm-svn: 214860	2014-08-05 12:46:47 +00:00
James Molloy	ea323a2876	Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost. Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account. llvm-svn: 214859	2014-08-05 12:30:34 +00:00
Chandler Carruth	72da99caf3	[x86] Reformat some code I moved around in a prior commit but left poorly formatted. Sorry about that. llvm-svn: 214853	2014-08-05 10:35:30 +00:00
Joerg Sonnenberger	f820adb814	Allow binary and for tblgen math. llvm-svn: 214851	2014-08-05 09:43:25 +00:00
Chandler Carruth	b57f9d965b	[x86] Fix a crash and wrong-code bug in the new vector lowering all found by a single test reduced out of a failure on llvm-stress. The start of the problem (and the crash) came when we tried to use a find of a non-used slot in the move-to half of the move-mask as the target for two bad-half inputs. While if lucky this will be the first of a pair of slots which we can place the bad-half inputs into, it isn't actually guaranteed. This really isn't surprising, not sure what I was thinking. The correct way to find the two unused slots is to look for one of the used slots. We know it isn't that pair, and we can use some modular arithmetic to find the other pair by masking off the odd bit and adding 2 modulo 4. With this, we reliably found a viable pair of slots for the bad-half inputs. Sadly, that wasn't enough. We also had a wrong code bug that surfaced when I reduced the test case for this where we would use the same slot twice for the two bad inputs. This is because both of the bad inputs could be in odd slots originally and thus the mod-2 mapping would actually be the same. The whole point of the weird indexing into the pair of empty slots was to try to leverage when the end result needed the two bad-half inputs to be paired in a dword and pre-pair them in the correct orientation. This is less important with the powerful combining we're now doing, and also easier and more reliable to achieve by noting that we add the bad-half inputs in order. Thus, if they are in a dword pair, the low part of that will be the first input in the sequence. Always putting that in the low element will just do the right thing in addition to computing the correct result. Test case added. =] llvm-svn: 214849	2014-08-05 08:19:21 +00:00
Juergen Ributzka	8e89773258	[FastIsel][AArch64] Fix previous commit r214844 (Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.) The original code would fail for unsupported value types like i1, i8, and i16. This fix changes the code to only create a sub-register copy for i64 value types and all other types (i1/i8/i16/i32) just use the source register without any modifications. getRegClassFor() is now guarded by the i64 value type check, that guarantees that we always request a register for a valid value type. llvm-svn: 214848	2014-08-05 07:31:30 +00:00
Juergen Ributzka	ec5a9526be	[FastISel][AArch64] Implement the FastLowerArguments hook. This implements basic argument lowering for AArch64 in FastISel. It only handles a small subset of the C calling convention. It supports simple arguments that can be passed in GPR and FPR registers. This should cover most of the trivial cases without falling back to SelectionDAG. This fixes <rdar://problem/17890986>. llvm-svn: 214846	2014-08-05 05:43:48 +00:00
Kevin Qin	2c9a6f5e83	Revert "r214832 - MachineCombiner Pass for selecting faster instruction" It broke compilation of most benchmarks and internal tests, as clang crashed with a segmentation fault or an assertion failure. llvm-svn: 214845	2014-08-05 05:43:47 +00:00
Juergen Ributzka	20218b6fa0	[FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended. llvm-svn: 214844	2014-08-05 05:43:44 +00:00
Juergen Ributzka	2a3bd5d1f5	Provide convenient access to the zext/sext attributes of function arguments. NFC. llvm-svn: 214843	2014-08-05 05:43:41 +00:00
Eric Christopher	67c04e77e5	Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838	2014-08-05 02:39:49 +00:00
Gerolf Hoflehner	40e09fb7dd	MachineCombiner Pass for selecting faster instruction sequence on AArch64 Re-commit of r214669 without changes to test cases LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and LLVM::CodeGen/AArch64/dp-3source.ll This resolves the reported compfails of the original commit. llvm-svn: 214832	2014-08-05 01:16:13 +00:00
Joerg Sonnenberger	c84fe6e931	Add TCR register access llvm-svn: 214826	2014-08-04 23:53:42 +00:00
Joerg Sonnenberger	cd34d01e29	Add PPC 603's tlbld and tlbli instructions. llvm-svn: 214825	2014-08-04 23:49:45 +00:00
Renato Golin	710d9e32a8	Allow CP10/CP11 operations on ARMv5/v6 Those registers are VFP/NEON and vector instructions should be used instead, but old cores rely on those co-processors to enable VFP unwinding. This change was prompted by the libc++abi's unwinding routine and is also present in many legacy low-level bare-metal code that we ought to compile/assemble. Fixing bug PR20025 and allowing PR20529 to proceed with a fix in libc++abi. llvm-svn: 214802	2014-08-04 23:21:56 +00:00
Bill Schmidt	5e2cd3791c	[PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi My original LE implementation of the vsldoi instruction, with its altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect shufflevector operations in the LLVM IR. Correct code is generated because the back end handles the incorrect shufflevector in a consistent manner. This patch and a companion patch for Clang correct this problem by removing the fixup from altivec.h and the corresponding fixup from the PowerPC back end. Several test cases are also modified to reflect the now-correct LLVM IR. llvm-svn: 214800	2014-08-04 23:21:01 +00:00
Kevin Enderby	b8b03399a5	Enable Darwin vararg parameters support in assembler macros. Duplicate the vararg tests for linux and add a test which mixes vararg arguments with darwin positional parameters. Patch by: Janne Grunau <j@jannau.net> llvm-svn: 214799	2014-08-04 23:14:37 +00:00
Pedro Artigas	fa5203bdd3	Changed the liveness tracking in the RegisterScavenger to use register units instead of registers. reviewed by Jakob Stoklund Olesen. llvm-svn: 214798	2014-08-04 23:07:49 +00:00
Joerg Sonnenberger	b94b28812d	Add simplified aliases for access to DCCR, ICCR, DEAR and ESR llvm-svn: 214797	2014-08-04 22:56:42 +00:00
Juergen Ributzka	f39fd7e490	[FastISel][AArch64] Fix shift lowering for i8 and i16 value types. This fix changes the parameters #r and #s that are passed to the UBFM/SBFM instruction to get the zero/sign-extension for free. The original problem was that the shift left would use the 32-bit shift even for i8/i16 value types, which could leave the upper bits set with "garbage" values. The arithmetic shift right on the other side would use the wrong MSB as sign-bit to determine what bits to shift into the value. This fixes <rdar://problem/17907720>. llvm-svn: 214788	2014-08-04 21:49:51 +00:00
Chandler Carruth	c3af9263ee	[SDAG] Fix a really, really terrible bug in the DAG combiner. This code is completely wrong. It is also dead, as if it were to ever run, it would crash. Fortunately, after my work to the combiner, it is at least possible to reach the code, and llvm-stress has found a test case. Thanks to Patrick for reporting. It would be really good if anyone who remembers how this code works and what it was intended to do could add some more obvious test coverage instead of my completely contrived and reduced test case. My test case was so brittle I left a bread crumb comment in it to help the next person to stumble on it and not know what it was actually testing for. llvm-svn: 214785	2014-08-04 21:29:59 +00:00
Joerg Sonnenberger	0f4e0ad62e	tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs. llvm-svn: 214784	2014-08-04 21:28:22 +00:00
Eric Christopher	99307e99a2	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. llvm-svn: 214781	2014-08-04 21:25:23 +00:00
Chad Rosier	d843d13525	[AArch64] Extend the number of scalar instructions supported in the AdvSIMD scalar integer instruction pass. This is a patch I had lying around from a few months ago. The pass is currently disabled by default, so nothing too interesting. llvm-svn: 214779	2014-08-04 21:20:25 +00:00
Reid Kleckner	07d84ca71d	Fix failure to invoke exception handler on Win64 When the last instruction prior to a function epilogue is a call, we need to emit a nop so that the return address is not in the epilogue IP range. This is consistent with MSVC's behavior, and may be a workaround for a bug in the Win64 unwinder. Differential Revision: http://reviews.llvm.org/D4751 Patch by Vadim Chugunov! llvm-svn: 214775	2014-08-04 21:05:27 +00:00
Joerg Sonnenberger	c3da10cbbc	Recognize mftbl as alias for mftb, for symmetry with mttb. llvm-svn: 214769	2014-08-04 20:28:34 +00:00
David Blaikie	b182c6ba51	Reapply "DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself." Originally reverted in r213432 with flakey failures on an ASan self-host build. After reduction it seems to be the same issue fixed in r213805 (ArgPromo + DebugInfo: Handle updating debug info over multiple applications of argument promotion) and r213952 (by having LiveDebugVariables strip dbg_value intrinsics in functions that are not described by debug info). Though I cannot explain why this failure was flakey... llvm-svn: 214761	2014-08-04 19:30:08 +00:00
Matt Arsenault	1f6d88bf21	R600/SI: Fix definitions for ds_read2 / ds_write2 instructions. These were just wrong, using the wrong register classes and store2 was missing an operand. llvm-svn: 214756	2014-08-04 18:49:22 +00:00
Joerg Sonnenberger	3be1146503	Rename PPCLinuxMCAsmInfo to PPCELFMCAsmInfo to better reflect the systems it represents. llvm-svn: 214755	2014-08-04 18:46:13 +00:00
Joerg Sonnenberger	660877af60	Allow .lcomm with alignment on ELF targets. llvm-svn: 214754	2014-08-04 18:45:10 +00:00
Alex Lorenz	faf4f53aec	Coverage: add HasCodeBefore flag to a mapping region. This flag will be used by the coverage tool to help compute the execution counts for each line in a source file. Differential Revision: http://reviews.llvm.org/D4746 llvm-svn: 214740	2014-08-04 18:00:51 +00:00
Eric Christopher	dd8d4e476d	Move the R600 intrinsic support back to the target machine - there's nothing subtarget dependent about the intrinsic support in any backend as far as I can tell. llvm-svn: 214738	2014-08-04 17:37:43 +00:00
Justin Bogner	cb125e9a3e	Path: Stop claiming path::const_iterator is bidirectional path::const_iterator claims that it's a bidirectional iterator, but it doesn't satisfy all of the contracts for a bidirectional iterator. For example, n3376 24.2.5 p6 says "If a and b are both dereferenceable, then a == b if and only if a and b are bound to the same object", but this doesn't work with how we stash and recreate Components. This means that our use of reverse_iterator on this type is invalid and leads to many of the valgrind errors we're hitting, as explained by Tilmann Scheller here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140728/228654.html Instead, we admit that path::const_iterator is only an input_iterator, and implement a second input_iterator for path::reverse_iterator (by changing const_iterator::operator-- to reverse_iterator::operator++). All of the uses of this just traverse once over the path in one direction or the other anyway. llvm-svn: 214737	2014-08-04 17:36:41 +00:00
Joerg Sonnenberger	51a8467117	Refactor SPRG instructions. llvm-svn: 214733	2014-08-04 17:26:15 +00:00
Akira Hatanaka	ee3d211c82	[X86] Place parentheses around "isMask_32(STReturns) && N <= 2". This corrects r214672, which was committed to silence a gcc warning. llvm-svn: 214732	2014-08-04 17:23:38 +00:00
Joerg Sonnenberger	f6049406d6	Add support for m[ft][di]bat[ul] instructions. llvm-svn: 214731	2014-08-04 17:07:41 +00:00
Matt Arsenault	2d8adc4261	Use the known address space constant rather than checking it llvm-svn: 214729	2014-08-04 16:55:35 +00:00
Matt Arsenault	55071ec9a7	R600: Remove unused include llvm-svn: 214728	2014-08-04 16:55:33 +00:00
Eric Christopher	4ba92fd57d	Add a dummy subtarget to the CPP backend target machine. This will allow us to forward all of the standard TargetMachine calls to the subtarget and still return null as we were before. llvm-svn: 214727	2014-08-04 16:40:55 +00:00
Joerg Sonnenberger	b8c7e54901	Add features for PPC 4xx and e500/e500mc instructions. Move the test cases for them into separate files. llvm-svn: 214724	2014-08-04 15:47:38 +00:00
Robert Khasanov	35dfdfef2d	[SKX] Enabling load/store instructions: encoding Instructions: VMOVAPD, VMOVAPS, VMOVDQA8, VMOVDQA16, VMOVDQA32, VMOVDQA64, VMOVDQU8, VMOVDQU16, VMOVDQU32, VMOVDQU64, VMOVUPD, VMOVUPS. Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 214719	2014-08-04 14:35:15 +00:00
Ulrich Weigand	5df23aacbe	[PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl* by swapping the input arguments. As it turns out, the exact same fix is also required for the vpkuhum/vpkuwum patterns. This fixes another regression in llvmpipe when vector support is enabled. Reviewed by Bill Schmidt. llvm-svn: 214718	2014-08-04 13:53:40 +00:00
Aaron Ballman	cc57635052	Improving the name of the function parameter, which happens to solve two likely-less-than-useful MSVC warnings: warning C4258: 'I' : definition from the for loop is ignored; the definition from the enclosing scope is used. llvm-svn: 214717	2014-08-04 13:51:27 +00:00
Ulrich Weigand	bf94969247	[PowerPC] MULHU/MULHS are not legal for vector types I ran into some test failures where common code changed vector division by constant into a multiply-high operation (MULHU). But these are not implemented by the back-end, so we failed to recognize the insn. Fixed by marking MULHU/MULHS as Expand for vector types. llvm-svn: 214716	2014-08-04 13:27:12 +00:00
Ulrich Weigand	32b8ceb243	[PowerPC] Fix and improve vector comparisons This patch refactors code generation of vector comparisons. This fixes a wrong code-gen bug for ISD::SETGE for floating-point types, and improves generated code for vector comparisons in general. Specifically, the patch moves all logic deciding how to implement vector comparisons into getVCmpInst, which gets two extra boolean outputs indicating to its caller whether it needs to swap the input operands and/or negate the result of the comparison. Apart from implementing these two modifications as directed by getVCmpInst, there is no need to ever implement vector comparisons in any other manner; in particular, there is never a need to perform two separate comparisons (e.g. one for equal and one for greater-than, as code used to do before this patch). Reviewed by Bill Schmidt. llvm-svn: 214714	2014-08-04 13:13:57 +00:00
Daniel Sanders	ea3b302186	[mips] Add assembler support for '.set mipsX'. Summary: This patch also fixes an issue with the way the Mips assembler enables/disables architecture features. Before this patch, the assembler never disabled feature bits. For example, .set mips64 .set mips32r2 would result in the 'OR' of mips64 with mips32r2 feature bits which isn't right. Unfortunately this isn't trivial to fix because there's not an easy way to clear feature bits as the algorithm in MCSubtargetInfo (ToggleFeature) only clears the bits that imply the feature being cleared and not the implied bits by the feature (there's a better explanation to the code I added). Patch by Matheus Almeida and updated by Toma Tabacu Reviewers: vmedic, matheusalmeida, dsanders Reviewed By: dsanders Subscribers: tomatabacu, llvm-commits Differential Revision: http://reviews.llvm.org/D4123 llvm-svn: 214709	2014-08-04 12:20:00 +00:00
Chandler Carruth	6bda1ceee2	[x86] Just unilaterally prefer SSSE3-style PSHUFB lowerings over clever use of PACKUS. It's cleaner that way. I looked at implementing clever combine-based folding of PACKUS chains into PSHUFB but it is quite hard and doesn't seem likely to be worth it. The most annoying part would be detecting that the correct masking had been done to use PACKUS-style instructions as a blend operation rather than there being any saturating as is indicated by its name. We generate really nice code for what few test cases I've come up with that aren't completely contrived for this by just directly preferring PSHUFB and so let's go with that strategy for now. =] llvm-svn: 214707	2014-08-04 10:17:35 +00:00
Chandler Carruth	8632182226	[x86] Implement more aggressive use of PACKUS chains for lowering common patterns of v16i8 shuffles. This implements one of the more important FIXMEs for the SSE2 support in the new shuffle lowering. We now generate the optimal shuffle sequence for truncate-derived shuffles which show up essentially everywhere. Unfortunately, this exposes a weakness in other parts of the shuffle logic -- we can no longer form PSHUFB here. I'll add the necessary support for that and other things in a subsequent commit. llvm-svn: 214702	2014-08-04 09:40:02 +00:00
Kevin Qin	348a8dd760	Revert "r214669 - MachineCombiner Pass for selecting faster instruction" This commit broke "make check" for several hours, so get it reverted. llvm-svn: 214697	2014-08-04 05:10:33 +00:00
NAKAMURA Takumi	9474954f12	MemoryBuffer: Don't use mmap when FileSize is multiple of 4k on Cygwin. On Cygwin, getpagesize() returns 64k(AllocationGranularity). In r214580, the size of X86GenInstrInfo.inc became 1499136. FIXME: We should reorganize again getPageSize() on Win32. MapFile allocates address along AllocationGranularity but view is mapped by physical page. llvm-svn: 214681	2014-08-04 01:43:37 +00:00
Chandler Carruth	ea07cc714d	[x86] Handle single input shuffles in the SSSE3 case more intelligently. I spent some time looking into a better or more principled way to handle this. For example, by detecting arbitrary "unneeded" ORs... But really, there wasn't any point. We just shouldn't build blatantly wrong code so late in the pipeline rather than adding more stages and logic later on to fix it. Avoiding this is just too simple. llvm-svn: 214680	2014-08-04 01:14:24 +00:00
Peter Zotov	18e1ab4b09	[LLVM-C] Add LLVM{IsConstantString,GetAsString,GetElementAsConstant}. llvm-svn: 214676	2014-08-03 23:54:16 +00:00
Chandler Carruth	9918929946	[x86] Don't add nodes to the combined set (and prune subsequent combines) until they are legal. Doing it the old way could, when the stars align just right, cause a node to get into the combine set prior to being legalized. Then, when the same node showed up as an operand to another node later on (but not so much later on that it had been deleted as dead) we would fail to add it back to the worklist thinking it had already been combined. This would in turn cause it to not be legalized. Fortunately, we can also walk the operands looking for uncombined (and thus potentially un-legalized) nodes late. It will still ensure that we walk all operands of all nodes and send all of them through both the legalizer without changes and the combiner at least once. (Which was the original goal of this). I have a test case for this bug, but it is terribly brittle. For example, it will stop finding the bug the moment I enable the new shuffle lowering. I don't yet have any test case that reliably exercises this bug, and it isn't clear that it will be possible to craft one. It is entirely possible that with the new shuffle lowering the two forms of doing this are precisely equivalent. That doesn't mean we shouldn't take the more conservative approach of insisting on things in the combined set having survived the legalizer. llvm-svn: 214673	2014-08-03 23:10:59 +00:00
Saleem Abdulrasool	1a36f331c8	X86: silence warning (-Wparentheses) GCC 4.8.2 points out the ambiguity in evaluation of the assertion condition: lib/Target/X86/X86FloatingPoint.cpp:949:49: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] assert(STReturns == 0 || isMask_32(STReturns) && N <= 2); llvm-svn: 214672	2014-08-03 23:00:39 +00:00
Saleem Abdulrasool	189689a302	CodeGen: silence a warning GCC 4.8.2 objects to the tautological condition in the assert as the unsigned value is guaranteed to be >= 0. Simplify the assertion by dropping the tautological condition. llvm-svn: 214671	2014-08-03 23:00:38 +00:00
Sanjay Patel	70baef2f02	fix for PR20354 - Miscompile of fabs due to vectorization This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation. This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon. There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too. llvm-svn: 214670	2014-08-03 22:48:23 +00:00
Gerolf Hoflehner	265dd68643	MachineCombiner Pass for selecting faster instruction sequence - AArch64 target support This patch turns off madd/msub generation in the DAGCombiner and generates them in the MachineCombiner instead. It replaces the original code sequence with the combined sequence when it is beneficial to do so. When there is no machine model support it always generates the madd/msub instruction. This is true also when the objective is to optimize for code size: when the combined sequence is shorter it is always chosen and does not get evaluated. When there is a machine model the combined instruction sequence is evaluated for critical path and resource length using machine trace metrics and the original code sequence is replaced when it is determined to be faster. rdar://16319955 llvm-svn: 214669	2014-08-03 22:03:40 +00:00
Gerolf Hoflehner	b4a5e33ee0	MachineCombiner Pass for selecting faster instruction sequence - target independent framework When the DAGCombiner selects instruction sequences it could increase the critical path or resource length. For example, on arm64 there are multiply-accumulate instructions (madd, msub). If e.g. the equivalent multiply-add sequence is not on the critical path it makes sense to select it instead of the combined, single accumulate instruction (madd/msub). The reason is that the conversion from add+mul to the madd could lengthen the critical path by the latency of the multiply. But the DAGCombiner would always combine and select the madd/msub instruction. This patch uses machine trace metrics to estimate critical path length and resource length of an original instruction sequence vs a combined instruction sequence and picks the faster code based on its estimates. This patch only commits the target independent framework that evaluates and selects code sequences. The machine instruction combiner is turned off for all targets and expected to evolve over time by gradually handling DAGCombiner pattern in the target specific code. This framework lays the groundwork for fixing rdar://16319955 llvm-svn: 214666	2014-08-03 21:35:39 +00:00
Saleem Abdulrasool	ca07b08496	MC: virtualise EmitWindowsUnwindTables This makes EmitWindowsUnwindTables a virtual function and lowers the implementation of the function to the X86WinCOFFStreamer. This method is a target specific operation. This enables making the behaviour target dependent by isolating it entirely to the target specific streamer. llvm-svn: 214664	2014-08-03 18:51:26 +00:00
Saleem Abdulrasool	7b26bbada1	MC: rename Win64EHFrameInfo to WinEH::FrameInfo The frame information stored in this structure is driven by the requirements for Windows NT unwinding rather than Windows 64 specifically. As a result, this type can be shared across multiple architectures (ARM, AXP, MIPS, PPC, SH). Rename this class in preparation for adding support for unwinding information for Windows on ARM. Take the opportunity to constify the members as everything except the ChainedParent is read-only. This required some adjustment to the label handling. llvm-svn: 214663	2014-08-03 18:51:17 +00:00
Matt Arsenault	90c2681a28	R600/SI: Fix extra whitespace in asm str This slipped in in r214467, so something like V_MOV_B32_e32 v0, ... is now printed with 2 spaces between the instruction name and first operand. llvm-svn: 214660	2014-08-03 05:27:14 +00:00
Manman Ren	97601eacaa	[SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion. When we have a covered lookup table, make sure we don't delete PHINodes that are cached in PHIs. rdar://17887153 llvm-svn: 214642	2014-08-02 23:41:54 +00:00
Joerg Sonnenberger	1253396c1a	tlbia support llvm-svn: 214640	2014-08-02 20:16:29 +00:00
Joerg Sonnenberger	e208527e31	mfdcr / mtdcr support llvm-svn: 214639	2014-08-02 20:00:26 +00:00
Erik Eckstein	4cdbc63bd2	fix bug 20513 - Crash in SLP Vectorizer llvm-svn: 214638	2014-08-02 19:39:42 +00:00
Joerg Sonnenberger	ca3cb82714	Don't use additional arguments for dss and friends to satisfy DSS_Form, when let can do the same thing. Keep the 64bit variants as codegen-only. While they have a different register class, the encoding is the same for 32bit and 64bit mode. Having both present would otherwise confuse the disassembler. llvm-svn: 214636	2014-08-02 15:09:41 +00:00
James Molloy	240087c61a	[AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available. The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers! llvm-svn: 214634	2014-08-02 14:51:24 +00:00
Chandler Carruth	b86a753f28	[x86] Remove the FIXME that was implemented in r214628. Managed to forget to update the comment here... =/ llvm-svn: 214630	2014-08-02 11:34:23 +00:00
Chandler Carruth	ccbf36b272	[x86] Largely complete the use of PSHUFB in the new vector shuffle lowering with a small addition to it and adding PSHUFB combining. There is one obvious place in the new vector shuffle lowering where we should form PSHUFBs directly: when without them we will unpack a vector of i8s across two different registers and do a potentially 4-way blend as i16s only to re-pack them into i8s afterward. This is the crazy expensive fallback path for i8 shuffles and we can just directly use pshufb here as it will always be cheaper (the unpack and pack are two instructions so even a single shuffle between them hits our three instruction limit for forming PSHUFB). However, this doesn't generate very good code in many cases, and it leaves a bunch of common patterns not using PSHUFB. So this patch also adds support for extracting a shuffle mask from PSHUFB in the X86 lowering code, and uses it to handle PSHUFBs in the recursive shuffle combining. This allows us to combine through them, combine multiple ones together, and generally produce sufficiently high quality code. Extracting the PSHUFB mask is annoyingly complex because it could be either pre-legalization or post-legalization. At least this doesn't have to deal with re-materialized constants. =] I've added decode routines to handle the different patterns that show up at this level and we dispatch through them as appropriate. The two primary test cases are updated. For the v16 test case there is still a lot of room for improvement. Since I was going through it systematically I left behind a bunch of FIXME lines that I'm hoping to turn into ALL lines by the end of this. llvm-svn: 214628	2014-08-02 10:39:15 +00:00
Chandler Carruth	a780975d83	[x86] Switch to using the variable we extracted this operand into. Spotted this missed refactoring by inspection when reading code, and it doesn't change the functionality at all. llvm-svn: 214627	2014-08-02 10:29:36 +00:00
Chandler Carruth	9b3e1eb850	[x86] Fix a few typos in my comments spotted in passing. llvm-svn: 214626	2014-08-02 10:29:34 +00:00
Chandler Carruth	6875b68d8c	[x86] Teach the target shuffle mask extraction to recognize unary forms of normally binary shuffle instructions like PUNPCKL and MOVLHPS. This detects cases where a single register is used for both operands making the shuffle behave in a unary way. We detect this and adjust the mask to use the unary form which allows the existing DAG combine for shuffle instructions to actually work at all. As a consequence, this uncovered a number of obvious bugs in the existing DAG combine which are fixed. It also now canonicalizes several shuffles even with the existing lowering. These typically are trying to match the shuffle to the domain of the input where before we only really modeled them with the floating point variants. All of the cases which change to an integer shuffle here have something in the integer domain, so there are no more or fewer domain crosses here AFAICT. Technically, it might be better to go from a GPR directly to the floating point domain, but detecting floating point outputs despite integer inputs is a lot more code and seems unlikely to be worthwhile in practice. If folks are seeing domain-crossing regressions here though, let me know and I can hack something up to fix it. Also as a consequence, a bunch of missed opportunities to form pshufb now can be formed. Notably, splats of i8s now form pshufb. Interestingly, this improves the existing splat lowering too. We go from 3 instructions to 1. Yes, we may tie up a register, but it seems very likely to be worth it, especially if splatting the 0th byte (the common case) as then we can use a zeroed register as the mask. llvm-svn: 214625	2014-08-02 10:27:38 +00:00
Chandler Carruth	746864d542	[x86] Teach my pshufb comment printer to handle VPSHUFB forms as well as PSHUFB forms. This will be important to update some AVX tests when I add PSHUFB combining. llvm-svn: 214624	2014-08-02 10:08:17 +00:00
Chandler Carruth	f8f86e7b92	[SDAG] Refactor the code which deletes nodes in the DAG combiner to do so using a single helper which adds operands back onto the worklist. Several places didn't rigorously do this but a couple already did. Factoring them together and doing it rigorously is important to delete things recursively early on in the combiner and get a chance to see accurate hasOneUse values. While no existing test cases change, an upcoming patch to add DAG combining logic for PSHUFB requires this to work correctly. llvm-svn: 214623	2014-08-02 10:02:07 +00:00
Owen Anderson	6c991acdcf	Fix issues with ISD::FNEG and ISD::FMA SDNodes where they would not be constant-folded during DAGCombine in certain circumstances. Unfortunately, the circumstances required to trigger the issue seem to require a pretty specific interaction of DAGCombines, and I haven't been able to find a testcase that reproduces on X86, ARM, or AArch64. The functionality added here is replicated in essentially every other DAG combine, so it seems pretty obviously correct. llvm-svn: 214622	2014-08-02 08:45:33 +00:00
Justin Bogner	af51eedfd7	CodeGen: Remove commented out code These two lines have been commented out for over 4 years. They aren't helping anyone. llvm-svn: 214615	2014-08-02 06:47:07 +00:00
Akira Hatanaka	5a2758bfe7	[ARM] In dynamic-no-pic mode, ARM's post-RA pseudo expansion was incorrectly expanding pseudo LOAD_STACK_GUARD using instructions that are normally used in pic mode. This patch fixes the bug. <rdar://problem/17886592> llvm-svn: 214614	2014-08-02 05:40:40 +00:00
Lang Hames	f2bb6bf8f0	[MCJIT] Fix an overly-aggressive check in RuntimeDyldMachOARM. This should fix the MachO_ARM_PIC_relocations.s test failures on some 32-bit testers. llvm-svn: 214613	2014-08-02 03:00:49 +00:00
Matt Arsenault	ea70093fdd	R600/SI: Fix formatting. Avoid weird line wrapping of BuildMI dest register. llvm-svn: 214608	2014-08-02 01:10:28 +00:00
Alexey Samsonov	a08644f560	[ASan] Use metadata to pass source-level information from Clang to ASan. Instead of creating global variables for source locations and global names, just create metadata nodes and strings. They will be transformed into actual globals in the instrumentation pass (if necessary). This approach is more flexible: 1) we don't have to ensure that our custom globals survive all the optimizations 2) if globals are discarded for some reason, we will simply ignore metadata for them and won't have to erase corresponding globals 3) metadata for source locations can be reused for other purposes: e.g. we may attach source location metadata to alloca instructions and provide better descriptions for stack variables in ASan error reports. No functionality change. llvm-svn: 214604	2014-08-02 00:35:50 +00:00
Chandler Carruth	70ba0f5168	[SDAG] Allow the legalizer to delete an illegally typed intermediate introduced during legalization. This pattern is based on other patterns in the legalizer that I changed in the same way. Now, the legalizer eagerly collects its garbage when necessary so that we can survive leaving such nodes around for it. Instead, we add an assert to make sure the node will be correctly handled by that layer. llvm-svn: 214602	2014-08-02 00:24:54 +00:00
Chandler Carruth	d58d700ebe	[SDAG] Let the DAG combiner take care of dead nodes rather than manually deleting them. This already seems to work, as no tests fail without this. llvm-svn: 214601	2014-08-02 00:19:10 +00:00
Tyler Nowicki	0eb1e96567	Add diagnostics to the vectorizer cost model. When the cost model determines vectorization is not possible/profitable these remarks print an analysis of that decision. Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive. Reviewed by Arnold Schwaighofer llvm-svn: 214599	2014-08-02 00:14:03 +00:00
Duncan P. N. Exon Smith	1780c7d423	IR: Add Value::reverseUseList() I'm going to use this to improve `verify-uselistorder`. Part of PR5680. llvm-svn: 214594	2014-08-01 23:28:49 +00:00
Peter Collingbourne	ca6ae41998	PartiallyInlineLibCalls: Check sqrt result type before transforming it. Some configure scripts declare this with the wrong prototype, which can lead to an assertion failure. llvm-svn: 214593	2014-08-01 23:21:21 +00:00
Duncan P. N. Exon Smith	3116b2db7f	verify-uselistorder: Move shuffleUseLists() out of lib/IR `shuffleUseLists()` is only used in `verify-uselistorder`, so move it there to avoid bloating other executables. As a drive-by, update some of the header docs. This is part of PR5680. llvm-svn: 214592	2014-08-01 23:03:36 +00:00
Adrian Prantl	61248f44e7	Attempt to increase the overall happiness of the MSVC-based buildbots. llvm-svn: 214588	2014-08-01 22:56:10 +00:00
Justin Bogner	48a07cae49	InstrProf: Allow multiple functions with the same name This updates the instrumentation based profiling format so that when we have multiple functions with the same name (but different function hashes) we keep all of them instead of rejecting the later ones. There are a number of scenarios where this can come up where it's more useful to keep multiple function profiles: * Name collisions in unrelated libraries that are profiled together. * Multiple "main" functions from multiple tools built against a common library. * Combining profiles from different build configurations (ie, asserts and no-asserts) The profile format now stores the number of counters between the hash and the counts themselves, so that multiple sets of counts can be stored. Since this is backwards incompatible, I've bumped the format version and added some trivial logic to skip this when reading the old format. llvm-svn: 214585	2014-08-01 22:50:07 +00:00
Duncan P. N. Exon Smith	2097a3a4ce	UseListOrder: Guarantee that shuffles change use-list order Change shuffleUseLists() always to change use-list order by rejecting orders that have no changes. This is part of PR5680. llvm-svn: 214584	2014-08-01 22:50:04 +00:00
Duncan P. N. Exon Smith	03001780da	UseListOrder: Fix blockaddress use-list order `parseBitcodeFile()` uses the generic `getLazyBitcodeFile()` function as a helper. Since `parseBitcodeFile()` isn't actually lazy -- it calls `MaterializeAllPermanently()` -- bypass the unnecessary call to `materializeForwardReferencedFunctions()` by extracting out a common helper function. This removes the last of the use-list churn caused by blockaddresses. This highlights that we can't reproduce use-list order of globals and constants when parsing lazily -- but that's necessarily out of scope. When we're parsing lazily, we never have all the functions in memory, so the use-lists of globals (and constants that reference globals) are always incomplete. This is part of PR5680. llvm-svn: 214581	2014-08-01 22:27:19 +00:00
Akira Hatanaka	2cf112b51e	[X86] Simplify X87 stackifier pass. Stop using ST registers for function returns and inline-asm instructions and use FP registers instead. This allows removing a large amount of code in the stackifier pass that was needed to track register liveness and handle copies between ST and FP registers and function calls returning floating point values. It also fixes a bug which manifests when an ST register defined by an inline-asm instruction was live across another inline-asm instruction, as shown in the following sequence of machine instructions: 1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5> 2. INLINEASM <es:fldcw $0> 3. %FP0<def> = COPY %ST0 <rdar://problem/16952634> llvm-svn: 214580	2014-08-01 22:19:41 +00:00
Adrian Prantl	bce66c31aa	Debug info: Infrastructure to support debug locations for fragmented variables (for example, by-value struct arguments passed in registers, or large integer values split across several smaller registers). On the IR level, this adds a new type of complex address operation OpPiece to DIVariable that describes size and offset of a variable fragment. On the DWARF emitter level, all pieces describing the same variable are collected, sorted and emitted as DWARF expressions using the DW_OP_piece and DW_OP_bit_piece operators. http://reviews.llvm.org/D3373 rdar://problem/15928306 What this patch doesn't do / Future work: - This patch only adds the backend machinery to make this work, patches that change SROA and SelectionDAG's type legalizer to actually create such debug info will follow. (http://reviews.llvm.org/D2680) - Making the DIVariable complex expressions into an argument of dbg.value will reduce the memory footprint of the debug metadata. - The sorting/uniquing of pieces should be moved into DebugLocEntry, to facilitate the merging of multi-piece entries. llvm-svn: 214576	2014-08-01 22:11:58 +00:00
Chandler Carruth	697d6af472	[SDAG] MorphNodeTo recursively deletes dead operands of the old formulation of the node, which isn't really the desired behavior from within the combiner or legalizer, but is necessary within ISel. I've added a hopefully helpful comment and fixed the only two places where this took place. Yet another step toward the combiner and legalizer not needing to use update listeners with virtual calls to manage the worklists behind legalization and combining. llvm-svn: 214574	2014-08-01 22:09:43 +00:00
Tom Stellard	2e31693e97	Revert "R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp" This reverts commit r214566. I did not mean to commit this yet. llvm-svn: 214572	2014-08-01 21:55:50 +00:00
Duncan P. N. Exon Smith	323f635bbe	BitcodeReader: Change mechanics of BlockAddress forward references, NFC Now that we can reliably handle forward references to `BlockAddress` (r214563), change the mechanics to simplify predicting use-list order. Previously, we created dummy `GlobalVariable`s to represent block addresses. After every function was materialized, we'd go through any forward references to its blocks and RAUW them with a proper `BlockAddress` constant. This causes some (potentially a lot of) unnecessary use-list churn, since any constant expression that it's a part of will need to be rematerialized as well. Instead, pre-construct a `BasicBlock` immediately -- without attaching it to its (empty) `Function` -- and use that to construct a `BlockAddress`. This constant will not have to be regenerated. When the function body is parsed, hook this pre-constructed basic block up in the right place using `BasicBlock::insertInto()`. Both before and after this change, the IR is temporarily in an invalid state that gets resolved when `materializeForwardReferencedFunctions()` gets called. This is a prep commit that's part of PR5680, but the only functionality change is the reduction of churn in the constant pool. llvm-svn: 214570	2014-08-01 21:51:52 +00:00
Tom Stellard	db07c33258	R600/SI: Remove leftover debugging code llvm-svn: 214569	2014-08-01 21:51:05 +00:00
Tom Stellard	150fd6c318	R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214566	2014-08-01 21:50:47 +00:00
Duncan P. N. Exon Smith	d51a043e5b	IR: Add BasicBlock::insertInto() Although unlinked `BasicBlock`s can be created, there's currently no way to insert them into `Function`s after the fact. In particular, `moveAfter()` and `moveBefore()` require that the basic block is already linked. Extract the logic for initially linking a `BasicBlock` out of the constructor and into a member function that can be used for lazy insertion. - Asserts that the basic block is currently unlinked. - Matches the logic of the constructor. - Changed the constructor to use it since the logic matches. This is needed in a follow-up commit for PR5680. llvm-svn: 214563	2014-08-01 21:22:04 +00:00
Peter Collingbourne	dd57da004b	[dfsan] Correctly handle loads and stores of zero size. llvm-svn: 214561	2014-08-01 21:18:18 +00:00
Eric Christopher	dfc6457da2	Add a non-const subtarget returning function to the target machine so that we can use it to get the old-style JIT out of the subtarget. This code should be removed when the old-style JIT is removed (imminently). llvm-svn: 214560	2014-08-01 21:18:01 +00:00
Duncan P. N. Exon Smith	9829cda628	BitcodeReader: Fix some BlockAddress forward reference corner cases `BlockAddress`es are interesting in that they can reference basic blocks from outside the block's function. Since basic blocks are not global values, this presents particular challenges for lazy parsing. One corner case was found in PR11677 and fixed in r147425. In that case, a global variable references a block address. It's necessary to load the relevant function to resolve the forward reference before doing anything with the module. By inspection, I found (and have fixed here) two other cases: - An instruction from one function references a block address from another function, and only the first function is lazily loaded. I fixed this the same way as PR11677: by eagerly loading the referenced function. - A function whose block address is taken is dematerialized, leaving invalid references to it. I fixed this by refusing to dematerialize functions whose block addresses are taken (if you have to load it, you can't unload it). llvm-svn: 214559	2014-08-01 21:11:34 +00:00
Reid Kleckner	bfbea18b59	MS inline asm: Use memory constraints for functions instead of registers This is consistent with how we parse them in a standalone .s file, and inline assembly shouldn't differ. This fixes errors about requiring more registers than available in cases like this: void f(); void __declspec(naked) g() { __asm pusha __asm call f __asm popa __asm ret } There are no registers available to pass the address of 'f' into the asm blob. The asm should now directly call 'f'. Tests will land in Clang shortly. llvm-svn: 214550	2014-08-01 20:21:24 +00:00
Chandler Carruth	d0836ba062	[SDAG] Begin simplifying the way in which the legalizer deletes nodes. This lifts the (very few) places the legalizer would delete dead nodes into the outer loop around the legalizer. This is significantly simpler because it doesn't require the legalizer itself to manage the iterator validity, and it doesn't require the legalizer to be a DAG update listener in order to remove things from the legalized set. It also makes the interface much less contrived for the case of the legalizer running inside the last phase of DAG combining. I'm working on centralizing the deletion of nodes during both legalizing and combining as much as possible. My hope is to remove the need for DAG update listeners from the combiner next, which would remove a costly virtual dispatch chain on every deletion. This in turn should allow us to more aggressively delete DAG nodes during combining which will in turn allow us to combine more aggressively by exposing the actual nodes which have single users to the combine phases. llvm-svn: 214546	2014-08-01 19:49:59 +00:00
Juergen Ributzka	bb4322fe8b	[FastISel][AArch64] Fold offset into the memory operation. Fold simple offsets into the memory operation: add x0, x0, #8 ldr x0, [x0] --> ldr x0, [x0, #8] Fixes <rdar://problem/17887945>. llvm-svn: 214545	2014-08-01 19:40:16 +00:00
Rafael Espindola	1181f5627e	Include Archive.h MSVC was complaining about Archive being an incomplete type. llvm-svn: 214542	2014-08-01 19:28:15 +00:00
Rafael Espindola	e64bd7fea2	Move virtual method out of line. Should fix the MSVC build. llvm-svn: 214539	2014-08-01 18:49:24 +00:00
Philip Reames	6f2c62b7c7	Add support for StackMap section for ELF/Linux systems This patch adds code to emit the StackMap section on ELF systems. This section is required to support llvm.experimental.stackmap and llvm.experimental.patchpoint intrinsics. Reviewers: ributzka, echristo Differential Revision: http://reviews.llvm.org/D4574 llvm-svn: 214538	2014-08-01 18:47:09 +00:00
Juergen Ributzka	5fad1ddebc	[FastISel][AArch64] Add branch weights. Add branch weights to branch instructions, so that the following passes can optimize based on it (i.e. basic block ordering). Fixes <rdar://problem/17887137>. llvm-svn: 214537	2014-08-01 18:39:24 +00:00
Philip Reames	9ca9533b96	Explicitly report runtime stack realignment in StackMap section This change adds code to explicitly mark a function which requires runtime stack realignment as not having a fixed frame size in the StackMap section. As it happens, this is not actually a functional change. The size that would be reported without the check is also "-1", but as far as I can tell, that's an accident. The code change makes this explicit. Note: There's a separate bug in handling of stackmaps and patchpoints in functions which need dynamic frame realignment. The current code assumes that offsets can be calculated from RBP, but realigned frames must use RSP. (There's a variable gap between RBP and the spill slots.) This change set does not address that issue. Reviewers: atrick, ributzka Differential Revision: http://reviews.llvm.org/D4572 llvm-svn: 214534	2014-08-01 18:26:27 +00:00
Rafael Espindola	f2145bd790	Replace comment about ownership with std::unique_ptr. llvm-svn: 214533	2014-08-01 18:09:32 +00:00
Juergen Ributzka	51affe9d31	[FastISel][ARM] Do not emit stores for undef arguments. This is a followup patch for r214366, which added the same behavior to the AArch64 and X86 FastISel code. This fix reproduces the already existing behavior of SelectionDAG in FastISel. llvm-svn: 214531	2014-08-01 18:04:14 +00:00
Rafael Espindola	4b31591fa8	Use range loop. llvm-svn: 214530	2014-08-01 18:04:14 +00:00
Renato Golin	5b6b134ddb	Add missing breaks to AArch64InstrInfo::isGPRCopy llvm-svn: 214528	2014-08-01 17:27:31 +00:00
Matt Arsenault	550a31beed	R600/SI: Don't display GDS bit for read2 This isn't displayed for any other instructions anymore, and isn't ever used. llvm-svn: 214523	2014-08-01 17:00:26 +00:00
Chad Rosier	69caabf908	[AArch64] Generate tbz/tbnz when comparing against zero. The tbz/tbnz checks the sign bit to convert op w1, w1, w10 cmp w1, #0 b.lt .LBB0_0 to op w1, w1, w10 tbnz w1, #31, .LBB0_0 Differential Revision: http://reviews.llvm.org/D4440 llvm-svn: 214518	2014-08-01 14:48:56 +00:00
Ulrich Weigand	9dae728acd	[PowerPC] PR20280 - Slots for byval parameters are not immutable Found by inspection while looking at PR20280: code would mark slots in the parameter save area where a byval parameter is passed as "immutable". This is not correct since code is allowed to modify byval parameters in place in the parameter save area. llvm-svn: 214517	2014-08-01 14:35:58 +00:00
Rafael Espindola	19e7ab14ac	Remove some calls to std::move. Instead of moving out the data in a ErrorOr<std::unique_ptr<Foo>>, get a reference to it. Thanks to David Blaikie for the suggestion. llvm-svn: 214516	2014-08-01 14:31:55 +00:00
Rafael Espindola	18654adb14	[pr20127] Check for leading \1 in the Twine version of getNameWithPrefix. No functionality change, but will simplify an upcoming patch that uses the Twine version. llvm-svn: 214515	2014-08-01 14:16:40 +00:00
James Molloy	17c114f241	Allow only disassembling of M-class MSR masks that the assembler knows how to assemble back. Note: The current code in DecodeMSRMask() rejects the unpredictable A/R MSR mask '0000' with Fail. The code in the patch follows this style and rejects unpredictable M-class MSR masks also with Fail (instead of SoftFail). If SoftFail is preferred in this case then additional changes to ARMInstPrinter (to print non-symbolic masks) and ARMAsmParser (to parse non-symbolic masks) will be needed. Patch by Petr Pavlu! llvm-svn: 214505	2014-08-01 12:42:11 +00:00
Aaron Ballman	182f1c73e1	Improve some const-correctness to remove a -Wcast-qual warning. No functional changes intended. llvm-svn: 214503	2014-08-01 12:34:58 +00:00
Tilmann Scheller	2714ae3ed8	[ARM] Make the assembler reject unpredictable pre/post-indexed ARM LDRB/LDRSB instructions. The ARM ARM prohibits LDRB/LDRSB instructions with writeback into the destination register. With this commit this constraint is now enforced and we stop assembling LDRB/LDRSB instructions with unpredictable behavior. llvm-svn: 214500	2014-08-01 12:08:04 +00:00
Tilmann Scheller	ac50eef593	[ARM] Make the assembler reject unpredictable pre/post-indexed ARM LDRH/LDRSH instructions. The ARM ARM prohibits LDRH/LDRSH instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling LDRH/LDRSH instructions with unpredictable behavior. llvm-svn: 214499	2014-08-01 11:33:47 +00:00
Tilmann Scheller	4926154eef	[ARM] Make the assembler reject unpredictable pre/post-indexed ARM LDR instructions. The ARM ARM prohibits LDR instructions with writeback into the destination register. With this commit this constraint is now enforced and we stop assembling LDR instructions with unpredictable behavior. llvm-svn: 214498	2014-08-01 11:08:51 +00:00
Erik Eckstein	7fd2b55433	SLPVectorizer: fix build problem in Release configuration llvm-svn: 214496	2014-08-01 09:47:38 +00:00
Erik Eckstein	92257de62c	SLPVectorizer: improved scheduling algorithm. llvm-svn: 214494	2014-08-01 09:20:42 +00:00
Daniel Sanders	c0101f770f	[mips][PR19612] Fix va_arg for big-endian mode. Summary: Big-endian mode was not correctly adjusting the offset for types smaller than an ABI slot. Fixes PR19612 Reviewers: dsanders Reviewed By: dsanders Subscribers: sstankovic, llvm-commits Differential Revision: http://reviews.llvm.org/D4556 llvm-svn: 214493	2014-08-01 09:17:39 +00:00
Erik Eckstein	274e069a3b	SLP Vectorizer: added statistics counter llvm-svn: 214487	2014-08-01 08:14:28 +00:00
Erik Eckstein	49d49372e6	SLP Vectorizer: improve canonicalize tree operands of commutative binary operands. This reverts r214338 (except the test file) and replaces it with a more general algorithm. llvm-svn: 214485	2014-08-01 08:05:55 +00:00
Hal Finkel	f46410e6eb	[PowerPC] Generate unaligned vector loads using intrinsics instead of regular loads Altivec vector loads on PowerPC have an interesting property: They always load from an aligned address (by rounding down the address actually provided if necessary). In order to generate an actual unaligned load, you can generate two load instructions, one with the original address, one offset by one vector length, and use a special permutation to extract the bytes desired. When this was originally implemented, I generated these two loads using regular ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with this: The alignment of a load does not contribute to its identity, and SDNodes are uniqued. So, imagine that we have some load, L1, that is not aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned). Further imagine that there had already existed a load (L1+16)(unaligned) with the same chain operand as the load L1. When (L1+16)(aligned) is created as part of the lowering of L1, this load is also the (L1+16)(unaligned) node, just now marked as aligned (because the new alignment overwrites the old). But the original users of (L1+16)(unaligned) now get the data intended for the permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists to get its own permutation-based expansion. This was PR19991. A second potential problem has to do with the MMOs on these loads, which can be used by AA during instruction scheduling to break chain-based dependencies. If the new "aligned" loads get the MMO from the original unaligned load, this does not represent the fact that it will load data from below the original address. Normally, this would not matter, but this load might be combined with another load pair for a previous vector, and then the dependency on the otherwise-ignored lower bytes can matter. To fix both problems, instead of generating the necessary loads using regular ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used. These are provided with MMOs with a conservative address range. Unfortunately, I no longer have a failing test case (since PR19991 was reported, other changes in CodeGen have forced this bug back into hiding again). Nevertheless, this should fix the underlying problem. llvm-svn: 214481	2014-08-01 05:20:41 +00:00
Suyog Sarda	2935dc4c7e	This patch implements transform for pattern "(A & ~B) ^ (~A) -> ~(A & B)". Differential Revision: http://reviews.llvm.org/D4653 llvm-svn: 214479	2014-08-01 05:07:20 +00:00
Suyog Sarda	8e372e5c2f	This patch implements transform for pattern "(A \| B) & ((~A) ^ B) -> (A & B)". Differential Revision: http://reviews.llvm.org/D4628 llvm-svn: 214478	2014-08-01 04:59:26 +00:00
Suyog Sarda	e84d4ba7d3	This patch implements transform for pattern "( A & (~B)) \| (A ^ B) -> (A ^ B)" Differential Revision: http://reviews.llvm.org/D4652 llvm-svn: 214477	2014-08-01 04:50:31 +00:00
Suyog Sarda	b8765dbda2	This patch implements transform for pattern "(A & B) \| ((~A) ^ B) -> (~A ^ B)". Patch Credit to Ankit Jain ! Differential Revision: http://reviews.llvm.org/D4655 llvm-svn: 214476	2014-08-01 04:41:43 +00:00
Tom Stellard	fd915598f1	R600/SI: Fix build warning llvm-svn: 214475	2014-08-01 02:05:57 +00:00
Juergen Ributzka	1686869897	[FastISel][AArch64] Fix the immediate versions of the {s\|u}{add\|sub}.with.overflow intrinsics. ADDS and SUBS cannot encode negative immediates or immediates larger than 12 bits. This fix checks if the immediate version can be used under these constraints and if we can convert ADDS to SUBS or vice versa to support negative immediates. Also update the test cases to test the immediate versions. llvm-svn: 214470	2014-08-01 01:25:55 +00:00
Hal Finkel	3be61a8b81	[PowerPC] Recognize consecutive memory accesses from intrinsics When generating unaligned vector loads, we need to search for other loads or stores nearby offset by one vector width. If we find one, then we know that we can safely generate another aligned load at that address. Otherwise, we must generate the next load using an offset of the vector width minus one byte (so we don't read off the end of the allocation if the base unaligned address happened to be aligned at runtime). We had previously done this using only other vector loads and stores, but did not consider the PowerPC-specific vector load/store intrinsics. Now we'll also consider vector intrinsics. By itself, this change is a feature enhancement, but is a necessary step toward fixing the underlying problem behind PR19991. llvm-svn: 214469	2014-08-01 01:02:01 +00:00
Reid Kleckner	b48ce52729	MS inline asm: Fix null SMLoc when 'ptr' is missing after dword & co This improves the diagnostics from the regular assembler, but more importantly it fixes an assertion when parsing inline assembly. Test landing in Clang. llvm-svn: 214468	2014-08-01 00:59:22 +00:00
Tom Stellard	827479f1ca	R600/SI: Do abs/neg folding with ComplexPatterns Abs/neg folding has moved out of foldOperands and into the instruction selection phase using complex patterns. As a consequence of this change, we now prefer to select the 64-bit encoding for most instructions and the modifier operands have been dropped from integer VOP3 instructions. llvm-svn: 214467	2014-08-01 00:32:39 +00:00
Tom Stellard	f52d670860	R600/SI: Simplify and fix handling of VOP2 in SIInstrInfo::legalizeOperands We were incorrectly assuming that all VOP2 instructions can read SGPRs in Src0, but this is not true for instructions that read carry-in from VCC. The old logic has been replaced with new logic which checks the defined register classes of the VOP2 instruction to determine whether or not to legalize the operands. llvm-svn: 214465	2014-08-01 00:32:35 +00:00
Tom Stellard	313fcff563	R600/SI: Fold immediates when shrinking instructions This will prevent us from using extra MOV instructions once we prefer selecting 64-bit instructions. llvm-svn: 214464	2014-08-01 00:32:33 +00:00
Tom Stellard	3ac3ae86a9	R600/SI: Fix incorrect commute operation in shrink instructions pass We were commuting the instruction but still shrinking it using the original opcode. NOTE: This is a candidate for the 3.5 branch. llvm-svn: 214463	2014-08-01 00:32:28 +00:00
Kevin Enderby	0615385ba4	Add support for the X86 secure guard extensions instructions in assembler (SGX). This allows assembling the two new instructions, encls and enclu for the SKX processor model. Note the diffs are a bit bigger than one might think, but to fit the new MRM_CF and MRM_D7 in the right places, things had to be renumbered and shuffled down, causing a bit more diffs. rdar://16228228 llvm-svn: 214460	2014-07-31 23:57:38 +00:00
Reid Kleckner	d154a413b6	X86 MC: Don't crash on empty memory operand parens Instead, create an absolute memory operand. Fixes PR20504. llvm-svn: 214457	2014-07-31 23:26:35 +00:00
Reid Kleckner	2a45d5a920	X86 MC: Reject invalid segment registers before a memory operand colon Previously we would execute unreachable during object emission. llvm-svn: 214456	2014-07-31 23:03:22 +00:00
Louis Gerbarg	751c622bb4	White space fix. llvm-svn: 214455	2014-07-31 22:57:46 +00:00
Louis Gerbarg	8048e52537	Make sure no loads resulting from load->switch DAGCombine are marked invariant Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reorder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449	2014-07-31 21:45:05 +00:00
Tyler Nowicki	f5be5413e6	Improve the remark generated for -Rpass-missed. The current remark is ambiguous and makes it sound like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss. Reviewed by Arnold Schwaighofer llvm-svn: 214445	2014-07-31 21:22:22 +00:00
Eric Christopher	3f37143de7	Revert "Remove MCObjectDisassembler.cpp as it is untested and unused." as it is apparently used, but the build didn't return errors weirdly. This reverts commits 214437 and 214438. llvm-svn: 214444	2014-07-31 21:18:38 +00:00
Tyler Nowicki	192485e325	Improve the remark generated when a variable that is used outside the loop is not a reduction or induction variable. Reviewed by Arnold Schwaighofer llvm-svn: 214440	2014-07-31 21:02:40 +00:00
Aaron Ballman	6dc20e8b15	Fixing CMake problems with MCObjectDisassembler.cpp not existing. llvm-svn: 214438	2014-07-31 20:48:54 +00:00
Eric Christopher	f1ff92a15e	Remove MCObjectDisassembler.cpp as it is untested and unused. llvm-svn: 214437	2014-07-31 20:44:46 +00:00
Rafael Espindola	2ddb97e871	DWOHolder takes ownership of the argument constructor, use std::unique_ptr. Thanks to David Blaikie for noticing it. llvm-svn: 214434	2014-07-31 20:26:42 +00:00
Rafael Espindola	3127847c8f	Use a reference instead of a pointer. This makes using a std::unique_ptr in the caller more convenient. llvm-svn: 214433	2014-07-31 20:19:36 +00:00
Will Schmidt	4841f6aa42	Disable IsSub subregister assert. pr18663. This is a follow-up to the activity in the bug at http://llvm.org/bugs/show_bug.cgi?id=18663 . The underlying issue has to do with how the KILL pseudo-instruction is handled. I defer to Hal/Jakob/Uli for additional details and background. This will disable the (bad?) assert, add an associated fixme comment, and add a pair of tests. The code change and the pr18663-2.ll test are copied from the referenced bug. That test does not immediately fail in my environment, but I have added the pr18663.ll test which does. (Comment from Hal) to provide everyone else with some context, this assert was not bad when it was written. At that time, we only generated KILL pseudo instructions around subregister copies. This logic, unfortunately, had its own problems. In r199797, the relevant logic in MachineCopyPropagation was replaced to generate KILLs for other kinds of copies too. This change in semantics broke this now-problematic assumption in AggressiveAntiDepBreaker. The AggressiveAntiDepBreaker really needs a proper cleanup to deal with the change, but removing the assert (which just allows the function to return false) is a safe conservative behavior, and should do for the time being. llvm-svn: 214429	2014-07-31 19:50:53 +00:00
Rafael Espindola	b038e67ccd	Move MCObjectSymbolizer.h to MC/MCAnalysis. The cpp file is already in lib/MC/MCAnalysis. llvm-svn: 214424	2014-07-31 19:29:23 +00:00
Hal Finkel	89deb1a79e	Fix ScalarEvolutionExpander when creating a PHI in a block with duplicate predecessors It seems that when I fixed this, almost exactly a year ago, I did not quite do it correctly. When we have duplicate block predecessors, we can indeed not have different incoming values for the same block, but we must have duplicate entries. So, instead of skipping the duplicates, we explicitly add the duplicate incoming values. Fixes PR20442. llvm-svn: 214423	2014-07-31 19:13:38 +00:00
Duncan P. N. Exon Smith	ad02adcc91	UseListOrder: Handle self-users Correctly sort self-users (such as PHI nodes). I added a targeted test in `test/Bitcode/use-list-order.ll` and the final missing RUN line to tests in `test/Assembly`. This is part of PR5680. llvm-svn: 214417	2014-07-31 18:33:12 +00:00
Eric Christopher	90212bdd40	Fix loop end condition. Note: This code appears to be untested. llvm-svn: 214416	2014-07-31 18:28:08 +00:00
Aaron Ballman	e2e6979c9d	Fixing a -Woverloaded-virtual warning by exposing the hidden virtual function as well. No functional changes intended. llvm-svn: 214400	2014-07-31 12:58:50 +00:00
Aaron Ballman	eb534378e6	Fixing a -Wcast-qual warning in GCC. No functional changes. llvm-svn: 214399	2014-07-31 12:55:49 +00:00
Evgeniy Stepanov	ae18e84dcd	[msan] Fix handling of array types. Switch array type shadow from a single integer to an array of integers (i.e. make it per-element). This simplifies instrumentation of extractvalue and fixes PR20493. llvm-svn: 214398	2014-07-31 11:02:27 +00:00
Evgeniy Stepanov	5738c3882f	[asan] Support x86 REP MOVS asm instrumentation. Patch by Yuri Gorshenin. llvm-svn: 214395	2014-07-31 09:11:04 +00:00
Stepan Dyatkovskiy	a172d88bc2	MergeFunctions, tiny refactoring: cmpOperation has been renamed to cmpOperations (multiple form). llvm-svn: 214392	2014-07-31 07:16:59 +00:00
Juergen Ributzka	ed5bbc5130	[FastISel][AArch64] Add basic bitcast support for conversion between float and int. Fixes <rdar://problem/17867078>. llvm-svn: 214389	2014-07-31 06:25:37 +00:00
Juergen Ributzka	35e07f4cd0	[FastISel][AArch64] Add sqrt intrinsic support. Fixes <rdar://problem/17867067>. llvm-svn: 214388	2014-07-31 06:25:33 +00:00
David Majnemer	066fbe5798	InstCombine: Correctly propagate NSW/NUW for x-(-A) -> x+A We can only propagate the nsw bits if both subtraction instructions are marked with the appropriate bit. N.B. We only propagate the nsw bit in InstCombine because the nuw case is already handled in InstSimplify. This fixes PR20189. llvm-svn: 214385	2014-07-31 04:49:29 +00:00
David Majnemer	ad214c8e9e	InstSimplify: Simplify (X - (0 - Y)) if the second sub is NUW If the NUW bit is set for 0 - Y, we know that all values for Y other than 0 would produce a poison value. This allows us to replace (0 - Y) with 0 in the expression (X - (0 - Y)) which will ultimately leave us with X. This partially fixes PR20189. llvm-svn: 214384	2014-07-31 04:49:18 +00:00
Juergen Ributzka	622f41919a	[FastISel][AArch64] Add MachO large code model support for function calls. Currently the large code model for MachO uses the GOT to make function calls. Emit the required adrp and ldr instructions to load the address from the GOT. Related to <rdar://problem/17733076>. llvm-svn: 214381	2014-07-31 04:10:40 +00:00
Rafael Espindola	c2dc7844eb	A std::unique_ptr case I missed in the previous patch. llvm-svn: 214379	2014-07-31 03:36:00 +00:00
Rafael Espindola	191faa331e	Use std::unique_ptr to make the ownership explicit. llvm-svn: 214377	2014-07-31 03:12:45 +00:00
Pete Cooper	0141b69686	Don't fail tablegen immediately after failing to set a value. Instead allow the variable to be declared, but don't attach an initializer. This allows more than a single error to be emitted before we exit. Test case to follow soon in another patch. llvm-svn: 214375	2014-07-31 01:44:00 +00:00
Pete Cooper	fbba8abe2f	Add a better error message when failing to assign one tablegen value to another This is currently for assigning from one bit init to another. It can easily be extended to other types. Test to follow soon in another patch. llvm-svn: 214374	2014-07-31 01:43:57 +00:00
Pete Cooper	3421ff45fe	Fix bit initializer which was one bit too long, but worked so long as we silently dropped the leading 0 llvm-svn: 214373	2014-07-31 01:43:54 +00:00
Pete Cooper	03a09096b4	Fix bit initializer which was one bit too long, but worked so long as we silently dropped the leading 0 llvm-svn: 214372	2014-07-31 01:43:51 +00:00
Rafael Espindola	bb3f0d5cb5	Delete dead code. llvm-svn: 214370	2014-07-31 01:14:09 +00:00

71930 Commits