mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 21:13:02 +02:00
Commit Graph

71712 Commits

Author SHA1 Message Date
Adrian Prantl
9270b95c79 Improve performance of calculateDbgValueHistory.
In r210492 the logic of calculateDbgValueHistory was changed to end
register variable live ranges at the end of the MBB, conditionally on
whether or not the register was clobbered by the function body.

This requires an initial scan of all the operands of the function
to collect all clobbered registers. In a second pass over all
instructions, we compare this set with the set of clobbered
registers for the current MachineInstruction. This modification
incurred a compilation time regression on some benchmarks: the
debug info emission phase takes ~10% more time.

While a small performance hit is unavoidable due to the initial
scan requirement, we can improve the situation by avoiding the
creation of too many temporary sets, instead using lambdas to work
directly on the result of the initial scan.
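
A minimal self-contained sketch of that pattern (illustrative names and
standard containers only, not the actual LLVM code):

  // One scan collects every register clobbered anywhere in the function body;
  // the second pass then queries that single set through a lambda instead of
  // building a temporary set per instruction.
  #include <set>
  #include <vector>

  using Reg = unsigned;
  using RegSet = std::set<Reg>;

  // Initial scan: union of the registers defined by each instruction.
  RegSet collectClobberedRegs(const std::vector<std::vector<Reg>> &InstrDefs) {
    RegSet Clobbered;
    for (const auto &Defs : InstrDefs)
      Clobbered.insert(Defs.begin(), Defs.end());
    return Clobbered;
  }

  // Second pass: decide, per live register variable, whether its range may
  // be extended to the end of the basic block.
  void endLiveRangesAtBlockEnd(const RegSet &ClobberedByBody,
                               const std::vector<Reg> &LiveRegVariables) {
    auto isClobbered = [&](Reg R) { return ClobberedByBody.count(R) != 0; };
    for (Reg R : LiveRegVariables)
      if (!isClobbered(R)) {
        // ... extend this variable's range to the end of the MBB ...
      }
  }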

Fixes <rdar://problem/17884104>

Patch by Frederic Riss!

llvm-svn: 214987
2014-08-06 18:41:24 +00:00
Adrian Prantl
84367341ee Cleanup collectChangingRegs
The handling of the epilogue is best expressed as an early exit and
there is no reason to look for register defs in DbgValue MIs.
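
Roughly the shape this gives the loop (isEpilogueStart and recordDefs are
hypothetical helpers, shown only to illustrate the structure):

  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"

  // Hypothetical helpers, for illustration only.
  bool isEpilogueStart(const llvm::MachineInstr &MI);
  void recordDefs(const llvm::MachineInstr &MI);

  void collectChangingRegsSketch(const llvm::MachineBasicBlock &MBB) {
    for (const llvm::MachineInstr &MI : MBB) {
      if (isEpilogueStart(MI))   // epilogue handling expressed as an early exit
        break;
      if (MI.isDebugValue())     // DBG_VALUE instructions define no registers
        continue;
      recordDefs(MI);
    }
  }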

Patch by Frederic Riss!

llvm-svn: 214986
2014-08-06 18:41:19 +00:00
Reid Kleckner
bec7530633 Don't count inreg params when mangling fastcall functions
This is consistent with MSVC.

llvm-svn: 214981
2014-08-06 18:09:04 +00:00
Reid Kleckner
e340c78d47 Round up the size of byval arguments to MinAlign
Otherwise we can end up with an argument frame size that is not a
multiple of the stack slot size, which is very awkward.
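
The rounding itself is the usual align-up computation; a small self-contained
sketch (assuming a power-of-two alignment):

  #include <cassert>
  #include <cstdint>

  // Round Size up to the next multiple of Align (Align must be a power of two).
  uint64_t roundUpToAlignment(uint64_t Size, uint64_t Align) {
    assert(Align != 0 && (Align & (Align - 1)) == 0 && "power-of-two alignment");
    return (Size + Align - 1) & ~(Align - 1);
  }

  // e.g. a 6-byte byval argument area with 4-byte stack slots rounds up to
  // roundUpToAlignment(6, 4) == 8, matching the "retl $8" mentioned below.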

This fixes PR20547, which was a bug in x86_64 Sys V vararg handling.
However, it's much easier to test this with x86 callee-cleanup
functions, which previously ended in "retl $6" instead of "retl $8".

This does affect behavior of all backends, but it presumably fixes the
same bug in all of them.

llvm-svn: 214980
2014-08-06 17:57:23 +00:00
Chad Rosier
0214f513b9 [AArch64] Add a few isTarget* APIs to the AArch64 Subtarget.
llvm-svn: 214977
2014-08-06 16:56:58 +00:00
Chad Rosier
8291c24b09 [AArch64] Fix OS ABI flag for aarch64-linux-gnu target.
For triple aarch64-linux-gnu we were incorrectly setting IRIX.
For triple aarch64 we are correctly setting SYSV.

Patch by Ana Pazos <apazos@codeaurora.org>.

llvm-svn: 214974
2014-08-06 16:05:02 +00:00
Sanjay Patel
d5cb9b68e1 use register iterators that include self to reduce code duplication in CriticalAntiDepBreaker
This patch addresses 2 FIXME comments that I added to CriticalAntiDepBreaker while fixing PR20020.

Initialize an MCSubRegIterator and an MCRegAliasIterator to include the self reg.

Assuming that works as advertised, there should be no functional difference with this patch, just less code.

Also, remove the associated asserts - we're setting those values just before, so the asserts don't do anything meaningful.
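
For reference, a sketch of the include-self iteration against the MC
register-info API (signatures as I recall them; the actual patch may differ
in detail):

  #include "llvm/MC/MCRegisterInfo.h"
  #include <vector>

  // Mark Reg plus all of its sub-registers and aliases in one pass, with no
  // separate statement for the register itself.
  void markRegAndAliases(unsigned Reg, const llvm::MCRegisterInfo &MCRI,
                         std::vector<bool> &KeepRegs) {
    for (llvm::MCSubRegIterator SubRegs(Reg, &MCRI, /*IncludeSelf=*/true);
         SubRegs.isValid(); ++SubRegs)
      KeepRegs[*SubRegs] = true;

    for (llvm::MCRegAliasIterator AI(Reg, &MCRI, /*IncludeSelf=*/true);
         AI.isValid(); ++AI)
      KeepRegs[*AI] = true;
  }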

Differential Revision: http://reviews.llvm.org/D4566

llvm-svn: 214973
2014-08-06 15:58:15 +00:00
Robert Khasanov
970483b673 [AVX512] Added load/store instructions to Register2Memory opcode tables.
Added lowering tests for load/store.

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 214972
2014-08-06 15:40:34 +00:00
James Molloy
7518c61a09 [AArch64] Add a testcase for r214957.
llvm-svn: 214965
2014-08-06 13:31:32 +00:00
James Molloy
0127d12e19 Add a new option -run-slp-after-loop-vectorization.
This swaps the order of the loop vectorizer and the SLP/BB vectorizers. It is disabled by default so we can do performance testing - ideally we want to switch to running the loop vectorizer first, with the SLP vectorizer using its leftovers, instead of the other way around.

llvm-svn: 214963
2014-08-06 12:56:19 +00:00
Tim Northover
0028a9a97d ARM: do not generate BLX instructions on Cortex-M CPUs.
Particularly on MachO, we were generating "blx _dest" instructions on M-class
CPUs, which don't actually exist. They happen to get fixed up by the linker
into valid "bl _dest" instructions (which is why such a massive issue has
remained largely undetected), but we shouldn't rely on that.

llvm-svn: 214959
2014-08-06 11:13:14 +00:00
Tim Northover
7abd4db81b ARM-MachO: materialize callee address correctly on v4t.
llvm-svn: 214958
2014-08-06 11:13:06 +00:00
James Molloy
2bbe86fab0 [AArch64] Conditional selects are expensive on out-of-order cores.
Specifically Cortex-A57. This probably applies to Cyclone too but I haven't enabled it for that as I can't test it.

This gives ~4% improvement on SPEC 174.vpr, and ~1% in 471.omnetpp.

llvm-svn: 214957
2014-08-06 10:42:18 +00:00
Chandler Carruth
2a63640957 [x86] Fix two independent miscompiles in the process of getting the same
test case to actually generate correct code.

The primary miscompile fixed here is that we weren't correctly handling
in-place elements in one half of a single-input v8i16 shuffle when
moving a dword of elements from that half to the other half. Sometimes,
we would clobber the in-place elements in forming the dword to move
across halves.

The fix to this involves forcibly marking the in-place inputs even when
there is no need to gather them into a dword, and to much more carefully
re-arrange the elements when grouping them into a dword to move across
halves. With these two changes we would generate correct shuffles for
the test case, but found another miscompile. There are also some random
perturbations of the generated shuffle pattern in SSE2. It looks like
a wash; more instructions in some cases, fewer in others.

The second miscompile would corrupt the results into nonsense. This is
a buggy pattern in one of the added DAG combines. Mapping elements
through a PSHUFD when pairing redundant half-shuffles is *much* harder
than this code makes it out to be -- it requires reasoning about *all*
of where the input is used in the PSHUFD, not just one part of where it
is used. Plus, we can't combine a half shuffle *into* a PSHUFD but the
code didn't guard against it. I think this was just a bad idea and I've
just removed that aspect of the combine. No tests regress as
a consequence, so this seems OK.

llvm-svn: 214954
2014-08-06 10:16:36 +00:00
Chandler Carruth
8d598f2f29 [x86] Switch to a formulation of a for loop that is much more obviously
not corrupting the mask by mutating it more times than intended. No
functionality changed (the results were non-overlapping so the old
version "worked" but was non-obvious).

llvm-svn: 214953
2014-08-06 10:16:33 +00:00
Adam Nemet
aea49d4d5f [X86] Fixes commit r214890 to match the posted patch
This was another fallout from my local rebase where something went wrong :(

llvm-svn: 214951
2014-08-06 07:13:12 +00:00
Matt Arsenault
7d4ad478b1 Correct comment
llvm-svn: 214945
2014-08-06 00:44:25 +00:00
Peter Collingbourne
129ba2ac92 [dfsan] Try not to create too many additional basic blocks in functions which
already have a large number of blocks. Works around a performance issue with
the greedy register allocator.

llvm-svn: 214944
2014-08-06 00:33:40 +00:00
Matt Arsenault
3005d01057 R600: Increase nearby load scheduling threshold.
This partially fixes weird-looking load scheduling
in the memcpy test. The load clustering doesn't seem
particularly smart, but this method seems to be partially
deprecated so it might not be worth trying to fix.

llvm-svn: 214943
2014-08-06 00:29:49 +00:00
Matt Arsenault
68712599d9 R600/SI: Implement areLoadsFromSameBasePtr
This currently has a noticeable effect on the kernel argument loads.
LDS and global loads are more problematic, I think because of how copies
are currently inserted to ensure that the address is a VGPR.

llvm-svn: 214942
2014-08-06 00:29:43 +00:00
Quentin Colombet
8e3912c486 [X86][SchedModel] Fixed some wrong scheduling models found by code inspection.
Source: Agner Fog's Instruction tables.

Related to <rdar://problem/15607571>

llvm-svn: 214940
2014-08-06 00:22:39 +00:00
David Blaikie
37a242b70f DebugInfo: Assert that any CU for which debug_loc lists are emitted, has at least one range.
This was coming in weird debug info that had variables (and hence
debug_locs) but was in GMLT mode (because it was missing the 13th field
of the compile_unit metadata) so no ranges were constructed. We should
always have at least one range for any CU with a debug_loc in it -
because the range should cover the debug_loc.

The assertion just ensures that the "!= 1" range case inside the
subsequent loop doesn't get entered for the case where there are no
ranges at all, which should never reach here in the first place.

llvm-svn: 214939
2014-08-06 00:21:25 +00:00
Matt Arsenault
1ecd6214f0 R600/SI: Add definitions for ds_read2st64_ / ds_write2st64_
llvm-svn: 214936
2014-08-05 23:53:20 +00:00
JF Bastien
314c089ba3 Fix typos in comments and doc
Committing http://reviews.llvm.org/D4798 for Robin Morisset (morisset@google.com)

llvm-svn: 214934
2014-08-05 23:27:34 +00:00
David Blaikie
fec0292411 DebugInfo: Move the reference to the CU from the location list entry to the list itself, since it is constant across an entire list.
This simplifies construction and usage while making the data structure
smaller. It was a holdover from the days when we didn't have a separate
DebugLocList and all we had was a flat list of DebugLocEntries.

llvm-svn: 214933
2014-08-05 23:14:16 +00:00
Rafael Espindola
c981388b03 Remove a virtual function from TargetMachine. NFC.
llvm-svn: 214929
2014-08-05 22:10:21 +00:00
Jonathan Roelofs
7bcdffb32c Re-apply r214881: Fix return sequence on armv4 thumb
This reverts r214893, re-applying r214881 with the test case relaxed a bit to
satiate the build bots.

POP on armv4t cannot be used to change thumb state (unlike later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:

  POP {r3}
  ADD sp, #offset
  BX r3

This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:

  MOV ip, r3
  POP {r3}
  ADD sp, #offset
  MOV lr, r3
  MOV r3, ip
  BX lr

http://reviews.llvm.org/D4748

llvm-svn: 214928
2014-08-05 21:32:21 +00:00
Bill Schmidt
8159f9047b [PowerPC] Swap arguments and adjust shift count for vsldoi on little endian
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments.  The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.

Reviewed by Ulrich Weigand.

This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).

llvm-svn: 214923
2014-08-05 20:47:25 +00:00
Rafael Espindola
e145188cc8 Don't internalize all but main by default.
This is mostly a cleanup, but it changes a fairly old behavior.

Every "real" LTO user was already disabling the silly internalize pass
and creating the internalize pass itself. The difference with this
patch is for "opt -std-link-opts" and the C api.

Now to get a usable behavior out of opt one doesn't need the funny
looking command line:

opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts

llvm-svn: 214919
2014-08-05 20:10:38 +00:00
Chandler Carruth
045f351a7f [x86] Fix a crasher due to shuffles which cancel each other out and add
a test case.

We also miscompile this test case, which shows a serious flaw in the
single-input v8i16 shuffle code. I've left the specific instruction
checks FIXME-ed out until I can address the bug in the single-input
code, but I wanted to separate out a significant functionality change to
produce correct code from a very simple and targeted crasher fix.

The miscompile problem stems from keeping track of inputs by value
rather than by index. As a consequence of doing this, we can't reliably
update those inputs because they might swap and we can't detect this
without copying the mask.

The blend code now uses indices for the input lists and this seems
strictly better. It also should make it easier to sort things and do
other cleanups. I think the time has come to simplify The Great Lambda
here.

llvm-svn: 214914
2014-08-05 18:45:49 +00:00
Duncan P. N. Exon Smith
abe071addb Remove dead code in condition
Whether or not it's appropriate, labels have been first-class types
since r51511.

llvm-svn: 214908
2014-08-05 18:22:58 +00:00
NAKAMURA Takumi
b1c451ece2 X86CodeEmitter.cpp: Add SEH_Epilogue to ignored list for legacy JIT, corresponding to r214775.
llvm-svn: 214905
2014-08-05 18:04:15 +00:00
Adam Nemet
83d4fdecfc [X86] Improve comments for r214888
A rebase somehow ate my comments. This restores them.

llvm-svn: 214903
2014-08-05 17:58:49 +00:00
Matt Arsenault
4273af4e86 R600/SI: Use register class instead of list of registers
I'm not sure whether this has any consequence.

llvm-svn: 214902
2014-08-05 17:52:40 +00:00
Matt Arsenault
f63af80339 R600/SI: Add exec_lo and exec_hi subregisters.
This allows accessing an SReg subregister with a normal subregister
index, instead of getting a machine verifier error.

Also be sure to include all of these subregisters in SReg_32.
This fixes inferring SGPR instead of SReg when finding a
super register class.

llvm-svn: 214901
2014-08-05 17:52:37 +00:00
Duncan P. N. Exon Smith
7aaaba94bb BitcodeReader: Fix non-determinism in use-list order
`BasicBlockFwdRefs` (and `BlockAddrFwdRefs` before it) was being emptied
in a non-deterministic order.  When predicting use-list order I've
worked around this another way, but even when parsing lazily (and we
can't recreate use-list order) use-lists should be deterministic.

Make them so by using a side-queue of functions with forward-referenced
blocks that gets visited in order.
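
The fix follows a familiar pattern: keep the fast lookup map, but record each
function in a side vector the first time it gains a forward-referenced block,
and later drain that vector in insertion order. A self-contained sketch of the
pattern (standard containers standing in for the real BitcodeReader types):

  #include <string>
  #include <unordered_map>
  #include <vector>

  struct FwdRefInfo { /* forward-referenced basic blocks, etc. */ };

  // The real map is keyed on Function*; iterating it directly would visit
  // entries in pointer order, which is not deterministic across runs.
  std::unordered_map<std::string, FwdRefInfo> BasicBlockFwdRefs; // lookup only
  std::vector<std::string> BasicBlockFwdRefQueue;                // defines order

  void addFwdRef(const std::string &Fn) {
    if (BasicBlockFwdRefs.emplace(Fn, FwdRefInfo{}).second)
      BasicBlockFwdRefQueue.push_back(Fn);   // first reference: remember order
  }

  void materializeForwardReferencedFunctions() {
    for (const std::string &Fn : BasicBlockFwdRefQueue) {
      // ... resolve BasicBlockFwdRefs[Fn] here, in deterministic order ...
      BasicBlockFwdRefs.erase(Fn);
    }
    BasicBlockFwdRefQueue.clear();
  }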

llvm-svn: 214899
2014-08-05 17:49:48 +00:00
Philip Reames
7088e609a9 Remove dead zero store to calloc initialized memory
Optimize the following IR:

%1 = tail call noalias i8* @calloc(i64 1, i64 4)
%2 = bitcast i8* %1 to i32*
; This store is dead and should be removed
store i32 0, i32* %2, align 4

Memory returned by calloc is guaranteed to be zero-initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store.  If the store is to an out-of-bounds address, it is undefined and thus also removable.
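
A rough sketch of the core check, using LLVM helpers as I recall them (the
real change must also prove there is no intervening write to, or other
observation of, that memory):

  #include "llvm/Analysis/MemoryBuiltins.h"
  #include "llvm/Analysis/ValueTracking.h"
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // True if SI stores constant zero directly into memory returned by a
  // calloc-like call, i.e. memory already known to be zero.
  static bool isRedundantZeroStoreToCalloc(StoreInst *SI,
                                           const TargetLibraryInfo *TLI,
                                           const DataLayout *DL) {
    Constant *StoredVal = dyn_cast<Constant>(SI->getValueOperand());
    if (!StoredVal || !StoredVal->isNullValue())
      return false;
    Value *Underlying = GetUnderlyingObject(SI->getPointerOperand(), DL);
    return isCallocLikeFn(Underlying, TLI);
  }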

Reviewed By: nicholas

Differential Revision: http://reviews.llvm.org/D3942

llvm-svn: 214897
2014-08-05 17:48:20 +00:00
Jonathan Roelofs
5721e98d43 Revert r214881 because it broke lots of build-bots
llvm-svn: 214893
2014-08-05 17:36:05 +00:00
Sanjay Patel
1f56d11869 Optimize vector fabs of bitcasted constant integer values.
Allow vector fabs operations on bitcasted constant integer values to be optimized
in the same way that we already optimize scalar fabs.

So for code like this:
%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64

Instead of generating something like this:

movabsq (constant pool load of mask for sign bits)
vmovq   (move from integer register to vector/fp register)
vandps  (mask off sign bits)
vmovq   (move vector/fp register back to integer return register)

We should generate:

mov     (put constant value in return register)

I have also removed a redundant clause in the first 'if' statement:
N0.getOperand(0).getValueType().isInteger()

is the same thing as:
IntVT.isInteger()

Testcases for x86 and ARM added to existing files that deal with vector fabs.
One existing testcase for x86 removed because it is no longer ideal.

For more background, please see:
http://reviews.llvm.org/D4770

And:
http://llvm.org/bugs/show_bug.cgi?id=20354

Differential Revision: http://reviews.llvm.org/D4785

llvm-svn: 214892
2014-08-05 17:35:22 +00:00
Adam Nemet
d956b84b6b [AVX512] Add masking variant and intrinsics for valignd/q
This is similar to what I did with the two-source permutation recently.  (It's
almost too similar; we should consider generating the masking variants
with some tablegen help.)

Both encoding and intrinsic tests are added as well.  For the latter, this
matches the IR that the intrinsic test on the clang side generates.

Part of <rdar://problem/17688758>

llvm-svn: 214890
2014-08-05 17:23:04 +00:00
Adam Nemet
3d838a0ae1 [X86] Increase X86_MAX_OPERANDS from 5 to 6
This controls the number of operands in the disassembler's x86OperandSets
table.  The entries describe how the operand is encoded and its type.

Not too surprisingly, 5 operands is insufficient for AVX512.  Consider
VALIGNDrrik in the next patch.  These are its operand specifiers:

  { /* 328 */
    { ENCODING_DUP, TYPE_DUP1 },
    { ENCODING_REG, TYPE_XMM512 },
    { ENCODING_WRITEMASK, TYPE_VK8 },
    { ENCODING_VVVV, TYPE_XMM512 },
    { ENCODING_RM_CD64, TYPE_XMM512 },
    { ENCODING_IB, TYPE_IMM8 },
  },
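
The limit itself is a one-constant bump, roughly (the exact header is not
shown here):

  /* Each disassembler operand-set entry now has room for six specifiers. */
  #define X86_MAX_OPERANDS 6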

llvm-svn: 214889
2014-08-05 17:23:01 +00:00
Adam Nemet
6ce89969d7 [X86] Add lowering to VALIGN
This was previously part of lowering to PALIGNR with some special-casing to
make interlane shifting work.  Since AVX512F has interlane alignr (valignd/q)
and AVX512BW has vpalignr, we need to support both of these *at the same time*,
e.g. for SKX.

This patch breaks out the common code and then adds support for checking both of
these lowering options from LowerVECTOR_SHUFFLE.

I also added some FIXMEs where I think the AVX512BW and AVX512VL additions
should probably go.

llvm-svn: 214888
2014-08-05 17:22:59 +00:00
Adam Nemet
3cbf71a23b [X86] Separate DAG node for valign and palignr
They have different semantics (valign is interlane while palignr is intralane)
and palignr is still needed even in the AVX512 context.  According to the
latest spec, AVX512BW provides these.
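
To illustrate the difference, a tiny single-input model over 32-bit elements
(pure illustration of interlane vs. intralane behaviour, not the LLVM lowering
or the exact instruction semantics):

  #include <array>
  #include <cstddef>

  using V16 = std::array<int, 16>;   // sixteen dword elements, one ZMM register

  // valign-style: elements rotate across the whole register (interlane).
  V16 alignWholeRegister(const V16 &V, std::size_t Shift) {
    V16 R{};
    for (std::size_t i = 0; i < 16; ++i)
      R[i] = V[(i + Shift) % 16];
    return R;
  }

  // palignr-style: each 128-bit lane (4 dwords) rotates independently
  // (intralane); elements never cross a lane boundary.
  V16 alignPerLane(const V16 &V, std::size_t Shift) {
    V16 R{};
    for (std::size_t Lane = 0; Lane < 4; ++Lane)
      for (std::size_t i = 0; i < 4; ++i)
        R[Lane * 4 + i] = V[Lane * 4 + (i + Shift) % 4];
    return R;
  }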

llvm-svn: 214887
2014-08-05 17:22:55 +00:00
Adam Nemet
1a6d2ace74 [AVX512] alignr: Use suffix rather than name argument to multiclass
Again no functional change.  This prepares for the suffix to be used with the
intrinsic matching.

llvm-svn: 214886
2014-08-05 17:22:52 +00:00
Adam Nemet
b6384f0bad [AVX512] Pull everything alignr-related into the multiclass
The packed integer pattern becomes the DAG pattern for rri, and the packed
float becomes another Pat<> inside the multiclass.

No functional change.

llvm-svn: 214885
2014-08-05 17:22:50 +00:00
Adam Nemet
04f70d285b Wrap long lines
llvm-svn: 214884
2014-08-05 17:22:47 +00:00
Jonathan Roelofs
32bc0b3143 Fix return sequence on armv4 thumb
POP on armv4t cannot be used to change thumb state (unlike later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:

  POP {r3}
  ADD sp, #offset
  BX r3

This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:

  MOV ip, r3
  POP {r3}
  ADD sp, #offset
  MOV lr, r3
  MOV r3, ip
  BX lr

http://reviews.llvm.org/D4748

llvm-svn: 214881
2014-08-05 17:13:17 +00:00
David Blaikie
5dcea2d2d6 Partially revert r214761, which asserted that all concrete debug info variables had DIEs, due to a failure on Darwin.
I'll work on a reduction and fix after this.

llvm-svn: 214880
2014-08-05 16:47:23 +00:00
Joerg Sonnenberger
6eb6423316 Add accessors for the PPC 403 bank registers.
llvm-svn: 214875
2014-08-05 15:45:15 +00:00
Keith Walker
1855cb5c25 Specify that the thumb setend and blx <immed> instructions are not valid on an m-class target
llvm-svn: 214871
2014-08-05 15:11:59 +00:00