llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00

Author	SHA1	Message	Date
Hans Wennborg	c2c8ff3fab	LowerSwitch: replace unreachable default with popular case destination SimplifyCFG currently does this transformation, but I'm planning to remove that to allow other passes, such as this one, to exploit the unreachable default. This patch takes care to keep track of what case values are unreachable even after the transformation, allowing for more efficient lowering. Differential Revision: http://reviews.llvm.org/D6697 llvm-svn: 226934	2015-01-23 20:43:51 +00:00
Colin LeMahieu	956de337d8	[Objdump] Output information about common symbols in a way closer to GNU objdump. llvm-svn: 226932	2015-01-23 20:06:24 +00:00
Kevin Enderby	5bd811e58f	Add the option, -data-in-code, to llvm-objdump used with -macho to print the Mach-O data in code table. llvm-svn: 226921	2015-01-23 18:52:17 +00:00
Reid Kleckner	f3d1116092	Classify functions by EH personality type rather than using the triple This mostly reverts commit r222062 and replaces it with a new enum. At some point this enum will grow at least for other MSVC EH personalities. Also beefs up the way we were sniffing the personality function. Previously we would emit the Itanium LSDA despite using __C_specific_handler. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6987 llvm-svn: 226920	2015-01-23 18:49:01 +00:00
Adrian Prantl	1201080352	Debug Info / PR22309: Allow union types to be emitted as unsigned constants. llvm-svn: 226919	2015-01-23 18:01:39 +00:00
Toma Tabacu	33566cf552	[mips] Add new error message and improve testing for parsing the .module directive. Summary: We used to silently ignore any empty .module's and we used to give an error saying that we found an "unexpected token at start of statement" when the value of the option wasn't an identifier (e.g. if it was a number). We now give an error saying that we "expected .module option identifier" in both of those cases. I also fixed the other tests in mips-abi-bad.s, which all seemed to be broken. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7095 llvm-svn: 226905	2015-01-23 10:40:19 +00:00
Jyoti Allur	44807650d4	This patch fixes issue with lowering below mentioned pattern :- _foo: smull r0, r1, r1, r0 smull r2, r3, r3, r2 adds r0, r2, r0 adc r1, r3, r1 bx lr to _foo: smull r0, r1, r1, r0 smlal r0, r1, r3, r2 bx lr llvm-svn: 226904	2015-01-23 09:10:03 +00:00
Craig Topper	043ec61ef1	[x86] Change u8imm operands to always print as unsigned. This makes shuffle masks and the like make way more sense. llvm-svn: 226902	2015-01-23 08:00:59 +00:00
Rafael Espindola	0b8bea563f	Add STB_GNU_UNIQUE to the ELF writer. This lets llvm-mc assemble files produced by gcc. llvm-svn: 226895	2015-01-23 04:44:35 +00:00
Jan Vesely	042d995634	R600: Try to use lower types for 64bit division if possible v2: add and enable tests for SI Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 226881	2015-01-22 23:42:43 +00:00
Simon Pilgrim	46d1b2f8de	[X86][AVX] Added (V)MOVDDUP / (V)MOVSLDUP / (V)MOVSHDUP memory folding + tests. Minor tweak now that D7042 is complete, we can enable stack folding for (V)MOVDDUP and do proper testing. Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP. llvm-svn: 226873	2015-01-22 22:39:59 +00:00
Simon Pilgrim	1165de73cd	Line endings fixes. NFC. llvm-svn: 226872	2015-01-22 22:27:37 +00:00
Simon Pilgrim	8bb4ad9673	[X86][SSE] Simplified PSUBUS tests Removed loops from PSUBUS tests - ensures folding is tested. Also renamed SSE2 tests SSSE3 to match cpu. This is a follow up commit agreed in http://reviews.llvm.org/D7094 llvm-svn: 226871	2015-01-22 22:19:58 +00:00
Chandler Carruth	0147c610be	[PM] Actually add the new pass manager support for the assumption cache. I had already factored this analysis specifically to enable doing this, but hadn't actually committed the necessary wiring to get at this from the new pass manager. This also nicely shows how the separate cache object can be directly managed by the new pass manager. This analysis didn't have any direct tests and so I've added a printer pass and a boring test case. I chose to print the i1 value which is being assumed rather than the call to llvm.assume as that seems much more useful for testing... but suggestions on an even better printing strategy welcome. My main goal was to make sure things actually work. =] llvm-svn: 226868	2015-01-22 21:53:09 +00:00
Duncan P. N. Exon Smith	c7c851bb04	IR: Update references to temporaries before deleting During `MDNode::deleteTemporary()`, call `replaceAllUsesWith(nullptr)` to update all tracking references to `nullptr`. This fixes PR22280, where inverted destruction order between tracking references and the temporaries themselves caused a use-after-free in `LLParser`. An alternative fix would be to add an assertion that there are no users, and continue to fix inverted destruction order in clients (like `LLParser`), but instead I decided to make getting-teardown-right easy. (If someone disagrees let me know.) llvm-svn: 226866	2015-01-22 21:36:45 +00:00
Ramkumar Ramachandra	550e92d3f7	Intrinsics: introduce llvm_any_ty aka ValueType Any Specifically, gc.result benefits from this greatly. Instead of: gc.result.int.* gc.result.float.* gc.result.ptr.* ... We now have a gc.result.* that can specialize to literally any type. Differential Revision: http://reviews.llvm.org/D7020 llvm-svn: 226857	2015-01-22 20:14:38 +00:00
Reid Kleckner	ab481aa774	Revert "Don't remove a landing pad if the invoke requires a table entry." This reverts commit r176827. Björn Steinbrink pointed out that this didn't actually fix the bug (PR15555) it was attempting to fix. With this reverted, we can now remove landingpad cleanups that immediately resume unwinding, converting the invoke to a call. llvm-svn: 226850	2015-01-22 19:29:46 +00:00
Kevin Enderby	e13fcc2ba4	Add the option, -indirect-symbols, used with -macho to print the Mach-O indirect symbol table to llvm-objdump. llvm-svn: 226848	2015-01-22 18:55:27 +00:00
Sanjay Patel	73374b7f96	merge consecutive stores of extracted vector elements (PR21711) This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. That patch was checked in at r224611, but reverted at r225031 because it caused a failure outside of the regression tests. The cause of the crash was not recognizing consecutive stores that have mixed source values (loads and vector element extracts), so this patch adds a check to bail out if any store value is not coming from a vector element extract. This patch also refactors the shared logic of the constant source and vector extracted elements source cases into a helper function. Differential Revision: http://reviews.llvm.org/D6850 llvm-svn: 226845	2015-01-22 18:21:26 +00:00
David Blaikie	72a122ef32	Revert "PR21408: Workaround the appearance of duplicate variables due to problems when inlining two calls to the same function from the same call site." The underlying bug has been fixed in r226736 so there's no need to workaround this anymore. This reverts commit r220923. llvm-svn: 226842	2015-01-22 17:49:59 +00:00
Rafael Espindola	3f5bcef6aa	[pr21886] Change MCJIT/ELF to support MSVC C++ mangled symbol. The ELF format is used on Windows by the MCJIT engine. Thus, on Windows, the ELFObjectWriter can encounter symbols mangled using the MS Visual Studio C++ name mangling. Symbols mangled using the MSVC C++ name mangling can legally have "@@@" as a substring. The EFLObjectWriter should not interpret the "@@@" substring as specifying GNU-style symbol versioning. The ELFObjectWriter therefore check for the MSVC C++ name mangling prefix which is either "?", "@?", "imp_?" or "imp_?@". llvm-svn: 226830	2015-01-22 14:20:45 +00:00
Michael Kuperstein	c63c75da0b	[DAGCombine] Produce better code for constant splats This solves PR22276. Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead. Differential Revision: http://reviews.llvm.org/D7093 Fixed recommit of r226811. llvm-svn: 226816	2015-01-22 13:07:28 +00:00
Michael Kuperstein	1c951c9953	Revert r226811, MSVC accepts code sane compilers don't. llvm-svn: 226814	2015-01-22 12:48:07 +00:00
Michael Kuperstein	91fb9ac13f	[DAGCombine] Produce better code for constant splats This solves PR22276. Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead. Differential Revision: http://reviews.llvm.org/D7093 llvm-svn: 226811	2015-01-22 12:37:23 +00:00
Elena Demikhovsky	4c1384e72f	Fixed a bug in type legalizer for masked load/store intrinsics. The problem occurs when after vectorization we have type <2 x i32>. This type is promoted to <2 x i64> and then requires additional efforts for expanding loads and truncating stores. I added EXPAND / TRUNCATE attributes to the masked load/store SDNodes. The code now contains additional shuffles. I've prepared changes in the cost estimation for masked memory operations, it will be submitted separately. llvm-svn: 226808	2015-01-22 12:07:59 +00:00
Elena Demikhovsky	cc330838cb	Fixed a bug in narrowing store operation. Type MVT::i1 became legal in KNL, but store operation can't be narrowed to this type, since the size of VT (1 bit) is not equal to its actual store size(8 bits). Added a test provided by David (dag@cray.com) llvm-svn: 226805	2015-01-22 09:39:08 +00:00
Sanjoy Das	4e36ac5e45	Fix crashes in IRCE caused by mismatched types There are places where the inductive range check elimination pass depends on two llvm::Values or llvm::SCEVs to be of the same llvm::Type when they do not need to be. This patch relaxes those restrictions (by bailing out of the optimization if the types mismatch), and adds test cases to trigger those paths. These issues were found by bootstrapping clang with IRCE running in the -O3 pass ordering. Differential Revision: http://reviews.llvm.org/D7082 llvm-svn: 226793	2015-01-22 08:29:18 +00:00
Elena Demikhovsky	837f1e39e9	Fixed a bug in masked load/store in reversed loop. Added a test. The bug was submitted to bugzilla: http://llvm.org/bugs/show_bug.cgi?id=22225 llvm-svn: 226791	2015-01-22 08:20:06 +00:00
Chandler Carruth	001ed18423	[canonicalize] Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available. Regardless of whether this particular type is good or bad, it ensures we don't get weird differences in generated code (and resulting performance) from "equivalent" patterns that happen to end up using a slightly different type. After some discussion on llvmdev it seems everyone generally likes this canonicalization. However, there may be some parts of LLVM that handle it poorly and need to be fixed. I have at least verified that this doesn't impede GVN and instcombine's store-to-load forwarding powers in any obvious cases. Subtle cases are exactly what we need te flush out if they remain. Also note that this IR pattern should already be hitting LLVM from Clang at least because it is exactly the IR which would be produced if you used memcpy to copy a pointer or floating point between memory instead of a variable. llvm-svn: 226781	2015-01-22 05:08:12 +00:00
Saleem Abdulrasool	47bffde92c	ARM: fail less catastrophically on invalid Windows input Windows supports a restricted set of relocations (compared to ARM ELF). In some cases, we may end up generating an unsupported relocation. This can occur with bad input to the assembler in particular (the frontend should never generate code that cannot be compiled). Generate an error rather than just aborting. The change in the API is driven by the desire to provide a slightly more helpful message for debugging purposes. llvm-svn: 226779	2015-01-22 04:03:32 +00:00
Reid Kleckner	82c69426ca	SEH: Finish writing the catch-all test case llvm-svn: 226768	2015-01-22 02:31:09 +00:00
Reid Kleckner	6b2145b9c9	Win64 SEH: Emit the constant 1 for catch-all into xdata llvm-svn: 226767	2015-01-22 02:27:44 +00:00
Sanjoy Das	0520741ff9	Make ScalarEvolution less aggressive with respect to no-wrap flags. ScalarEvolution currently lowers a subtraction recurrence to an add recurrence with the same no-wrap flags as the subtraction. This is incorrect because `sub nsw X, Y` is not the same as `add nsw X, -Y` and `sub nuw X, Y` is not the same as `add nuw X, -Y`. This patch fixes the issue, and adds two test cases demonstrating the bug. Differential Revision: http://reviews.llvm.org/D7081 llvm-svn: 226755	2015-01-22 00:48:47 +00:00
Simon Pilgrim	c666c3a4b2	[X86][SSE] Missing SSE/AVX1 memory folding integer instructions Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1. The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests. Differential Revision: http://reviews.llvm.org/D7094 llvm-svn: 226745	2015-01-21 23:43:30 +00:00
Tim Northover	5b0e908c64	DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N)) It can help with argument juggling on some targets, and is generally a good idea. llvm-svn: 226740	2015-01-21 23:17:19 +00:00
David Blaikie	5ab2c04898	DebugInfo: Use distinct inlinedAt MDLocations to avoid separate inlined calls being coalesced When two calls from the same MDLocation are inlined they currently get treated as one inlined function call (creating difficulty debugging, duplicate variables, etc). Clang worked around this by including column information on inline calls which doesn't address LTO inlining or calls to the same function from the same line and column (such as through a macro). It also didn't address ctor and member function calls. By making the inlinedAt locations distinct, every call site has an explicitly distinct location that cannot be coalesced with any other call. This can produce linearly (2x in the worst case where every call is inlined and the call instruction has a non-call instruction at the same location) more debug locations. Any increase beyond that are in cases where the Clang workaround was insufficient and the new scheme is creating necessary distinct nodes that were being erroneously coalesced previously. After this change to LLVM the incomplete workarounds in Clang. That should reduce the number of debug locations (in a build without column info, the default on Darwin, not the default on Linux) by not creating pseudo-distinct locations for every call to an inline function. (oh, and I made the inlined-at chain rebuilding iterative instead of recursive because I was having trouble wrapping my head around it the way it was - open to discussion on the right design for that function (including going back to a recursive solution)) llvm-svn: 226736	2015-01-21 22:57:29 +00:00
Matt Arsenault	ab16558365	R600: Add checks for urem/srem by a constant Make sure this uses the faster expansion using magic constants to avoid the full division path. llvm-svn: 226734	2015-01-21 22:56:15 +00:00
Simon Pilgrim	b377a5e42b	[X86][SSE] Added support for SSE3 lane duplication shuffle instructions This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. The big use of these being that they avoid many single source shuffles from needing to use (pre-AVX) dual source instructions such as SHUFPD/SHUFPS: causing extra moves and preventing load folds. Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify theses instructions as unary shuffles. Also adds a missing tablegen pattern for MOVDDUP. Differential Revision: http://reviews.llvm.org/D7042 llvm-svn: 226716	2015-01-21 22:44:35 +00:00
Matt Arsenault	58bdd73a01	R600: Add missing tests for i64 srem llvm-svn: 226713	2015-01-21 22:43:19 +00:00
Jonathan Roelofs	74e85c180f	Fix load-store optimizer on thumbv4t Thumbv4t does not have lo->lo copies other than MOVS, and that can't be predicated. So emit MOVS when needed and bail if there's a predicate. http://reviews.llvm.org/D6592 llvm-svn: 226711	2015-01-21 22:39:43 +00:00
George Burgess IV	54d138f1fd	Added test to cover the CFLAA bitset indexing bug. llvm-svn: 226710	2015-01-21 22:39:35 +00:00
David Majnemer	f4834fe3ed	InstCombine: Don't strip bitcasts off of callsites marked 'thunk' The return type of a thunk is meaningless, we just want the arguments and return value to be forwarded. llvm-svn: 226708	2015-01-21 22:32:04 +00:00
Matt Arsenault	3c5cd0f801	R600/SI: Custom lower fround This fixes it for SI. It also removes the pattern used previously for Evergreen for f32. I'm not sure if the the new R600 output is better or not, but it uses 1 fewer instructions if BFI is available. llvm-svn: 226682	2015-01-21 18:18:25 +00:00
Colin LeMahieu	eacab3ed87	[Hexagon] Converting multiply and accumulate with immediate intrinsics to patterns. llvm-svn: 226681	2015-01-21 18:13:15 +00:00
Ahmed Bougacha	aea688cba4	[X86] Declare SSE4.1/AVX2 vector extloads covered by PMOV[SZ]X legal. Now that we can fully specify extload legality, we can declare them legal for the PMOVSX/PMOVZX instructions. This for instance enables a DAGCombine to fire on code such as (and (<zextload-equivalent> ...), <redundant mask>) to turn it into: (zextload ...) as seen in the testcase changes. There is one regression, in widen_load-2.ll: we're no longer able to do store-to-load forwarding with illegal extload memory types. This will be addressed separately. Differential Revision: http://reviews.llvm.org/D6533 llvm-svn: 226676	2015-01-21 17:07:06 +00:00
Tim Northover	c2963c8019	Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))" It hadn't gone through review yet, but was still on my local copy. This reverts commit r226663 llvm-svn: 226665	2015-01-21 15:48:52 +00:00
Tim Northover	4b020bd23b	AArch64: add backend option to reserve x18 (platform register) AAPCS64 says that it's up to the platform to specify whether x18 is reserved, and a first step on that way is to add a flag controlling it. From: Andrew Turner <andrew@fubar.geek.nz> llvm-svn: 226664	2015-01-21 15:43:31 +00:00
Tim Northover	c9cc73b336	DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N)) llvm-svn: 226663	2015-01-21 15:43:28 +00:00
Michael Kuperstein	de439866fe	[x32] Fast ISel should use LEA64_32r instead of LEA32r to adjust addresses in x32 mode. llvm-svn: 226661	2015-01-21 14:44:05 +00:00
Alexander Potapenko	127e7cced0	Use a smaller pragma unroll threshold to reduce test execution time. When opt is compiled with AddressSanitizer it takes more than 30 seconds to unroll the loop in unroll_1M(). llvm-svn: 226660	2015-01-21 13:52:02 +00:00
Evgeniy Stepanov	309980065d	[msan] Update origin for the entire destination range on memory store. Previously we always stored 4 bytes of origin at the destination address even for 8-byte (and longer) stores. This should fix rare missing, or incorrect, origin stacks in MSan reports. llvm-svn: 226658	2015-01-21 13:21:31 +00:00
Jozef Kolek	07bbffd274	[mips][microMIPS] MicroMIPS 16-bit unconditional branch instruction B Implement microMIPS 16-bit unconditional branch instruction B. Implemented 16-bit microMIPS unconditional instruction has real name B16, and B is an alias which expands to either B16 or BEQ according to the rules: b 256 --> b16 256 # R_MICROMIPS_PC10_S1 b 12256 --> beq $zero, $zero, 12256 # R_MICROMIPS_PC16_S1 b label --> beq $zero, $zero, label # R_MICROMIPS_PC16_S1 Differential Revision: http://reviews.llvm.org/D3514 llvm-svn: 226657	2015-01-21 12:39:30 +00:00
Jozef Kolek	544ed14227	[mips][microMIPS] Implement ADDIUPC instruction Differential Revision: http://reviews.llvm.org/D6582 llvm-svn: 226656	2015-01-21 12:10:11 +00:00
Vladimir Medic	73fd86160c	[Mips][Disassembler]When disassembler meets load/store from coprocessor 2 instructions for mips r6 it crashes as the access to operands array is out of range. This patch adds dedicated decoder method that properly handles decoding of these instructions. llvm-svn: 226652	2015-01-21 10:47:36 +00:00
Craig Topper	39f463653a	[X86] Convert all the i8imm used by SSE and AVX instructions to u8imm. This makes the assembler check their size and removes a hack from the disassembler to avoid sign extending the immediate. llvm-svn: 226645	2015-01-21 08:15:54 +00:00
Craig Topper	8282659ccb	[x86] Add assembly parser bounds checking to the immediate value for cmpss/cmpsd/cmpps/cmppd. llvm-svn: 226642	2015-01-21 06:07:53 +00:00
Adrian Prantl	42d17b28f3	Make DIExpression::Verify() stricter by checking that the number of elements and the ordering is sane and cleanup the accessors. llvm-svn: 226627	2015-01-21 00:59:20 +00:00
Simon Pilgrim	c9bbf23dd5	[X86][AVX] Simplified diff between AVX1 and SSE42 fp stack folding tests. NFC. Changed the AVX1 tests register spill tail call to return a xmm like the SSE42 version - makes doing diffs between them a lot easier without affecting the spills themselves. llvm-svn: 226623	2015-01-21 00:02:13 +00:00
Simon Pilgrim	0a0d5a1258	[X86][SSE] Added SSE/AVX1 integer stack folding tests. Some folding patterns + tests are missing (marked as TODO) - these will be added in a future patch for review. llvm-svn: 226622	2015-01-20 23:54:17 +00:00
Simon Pilgrim	b7181278f0	[X86][SSE] Added SSE fp stack folding tests. Some folding patterns + tests are missing (marked as TODO) - these will be added in a future patch for review. llvm-svn: 226621	2015-01-20 23:50:18 +00:00
Simon Pilgrim	e35020a95e	[X86][AVX] Renamed AVX1 fp stack folding tests. NFC. The SSE42 version of the AVX1 float stack folding tests will be added shortly, this renames the AVX1 file so that the files will be near each other in a directory listing to help ensure they are kept in sync. llvm-svn: 226620	2015-01-20 23:45:50 +00:00
Adrian Prantl	ce15a16bd9	DebugLocs without a scope should fail the verification. Follow-up to r226588. llvm-svn: 226616	2015-01-20 22:37:25 +00:00
Kevin Enderby	83c3448a11	For llvm-objdump, hook up existing options to work when using -macho (the Mach-O parser). llvm-svn: 226612	2015-01-20 21:47:46 +00:00
Colin LeMahieu	b11b24f79e	[Hexagon] Adding intrinsics for doubleword ALU operations. llvm-svn: 226606	2015-01-20 20:45:05 +00:00
Colin LeMahieu	4da4026191	[Hexagon] Removing unnecessary clutter in intrinsic tests. llvm-svn: 226602	2015-01-20 19:46:07 +00:00
Daniel Jasper	5a3e345e1b	Prevent binary-tree deterioration in sparse switch statements. This addresses part of llvm.org/PR22262. Specifically, it prevents considering the densities of sub-ranges that have fewer than TLI.getMinimumJumpTableEntries() elements. Those densities won't help jump tables. This is not a complete solution but works around the most pressing issue. Review: http://reviews.llvm.org/D7070 llvm-svn: 226600	2015-01-20 19:43:33 +00:00
Ramkumar Ramachandra	78c0138d51	[GC] Verify-pass void vararg functions in gc.statepoint With the appropriate Verifier changes, exactracting the result out of a statepoint wrapping a vararg function crashes. However, a void vararg function works fine: commit this first step. Differential Revision: http://reviews.llvm.org/D7071 llvm-svn: 226599	2015-01-20 19:42:46 +00:00
Adrian Prantl	5def96f7e9	Reapply: Teach SROA how to update debug info for fragmented variables. This reapplies r225379. ChangeLog: - The assertion that this commit previously ran into about the inability to handle indirect variables has since been removed and the backend can handle this now. - Testcases were upgrade to the new MDLocation format. - Instead of keeping a DebugDeclares map, we now use llvm::FindAllocaDbgDeclare(). Original commit message follows. Debug info: Teach SROA how to update debug info for fragmented variables. This allows us to generate debug info for extremely advanced code such as typedef struct { long int a; int b;} S; int foo(S s) { return s.b; } which at -O1 on x86_64 is codegen'd into define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 { ret i32 %s.coerce1, !dbg !24 } with this patch we emit the following debug info for this TAG_formal_parameter [3] AT_location( 0x00000000 0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004 0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 ) AT_name( "s" ) AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" ) Thanks to chandlerc, dblaikie, and echristo for their feedback on all previous iterations of this patch! llvm-svn: 226598	2015-01-20 19:42:22 +00:00
Tom Stellard	c7e82f2d5a	R600/SI: Fix simple-loop.ll test llvm-svn: 226596	2015-01-20 19:33:02 +00:00
Jozef Kolek	dc9fe311f2	Reverted revision 226577. llvm-svn: 226595	2015-01-20 19:29:28 +00:00
Tom Stellard	59a54d03fe	R600/SI: Add kill flag when copying scratch offset to a register This allows us to re-use the same register for the scratch offset when accessing large private arrays. llvm-svn: 226585	2015-01-20 17:49:45 +00:00
Tom Stellard	01ff7eb626	R600/SI: Don't store scratch buffer frame index in MUBUF offset field We don't have a good way of legalizing this if the frame index offset is more than the 12-bits, which is size of MUBUF's offset field, so now we store the frame index in the vaddr field. llvm-svn: 226584	2015-01-20 17:49:43 +00:00
Jozef Kolek	ca8263125b	[mips][microMIPS] MicroMIPS 16-bit unconditional branch instruction B Implement microMIPS 16-bit unconditional branch instruction B. Implemented 16-bit microMIPS unconditional instruction has real name B16, and B is an alias which expands to either B16 or BEQ according to the rules: b 256 --> b16 256 # R_MICROMIPS_PC10_S1 b 12256 --> beq $zero, $zero, 12256 # R_MICROMIPS_PC16_S1 b label --> beq $zero, $zero, label # R_MICROMIPS_PC16_S1 Differential Revision: http://reviews.llvm.org/D3514 llvm-svn: 226577	2015-01-20 16:45:27 +00:00
Kai Nacke	858f51b2b0	[mips] Add registers and ALL check prefix to octeon test case. No functional change. Reviewed by D. Sanders llvm-svn: 226574	2015-01-20 16:14:02 +00:00
Kai Nacke	84b6d0bc71	[mips] Add octeon branch instructions bbit0/bbit032/bbit1/bbit132 This commits adds the octeon branch instructions bbit0/bbit032/bbit1/bbit132. It also includes patterns for instruction selection and test cases. Reviewed by D. Sanders llvm-svn: 226573	2015-01-20 16:10:51 +00:00
Evgeniy Stepanov	7e7783512b	[msan] Optimize -msan-check-constant-shadow. The new code does not create new basic blocks in the case when shadow is a compile-time constant; it generates either an unconditional __msan_warning call or nothing instead. llvm-svn: 226569	2015-01-20 15:21:35 +00:00
Chandler Carruth	7cd55815b7	[PM] Port LoopInfo to the new pass manager, adding both a LoopAnalysis pass and a LoopPrinterPass with the expected associated wiring. I've added a RUN line to the only test case (!!!) we have that actually prints loops. Everything seems to be working. This is somewhat exciting as this is the first analysis using another analysis to go in for the new pass manager. =D I also believe it is the last analysis necessary for porting instcombine, but of course I may yet discover more. llvm-svn: 226560	2015-01-20 10:58:50 +00:00
Karthik Bhat	8fee134b62	Fix Operandreorder logic in SLPVectorizer to generate longer vectorizable chain. This patch fixes 2 issues in reorderInputsAccordingToOpcode 1) AllSameOpcodeLeft and AllSameOpcodeRight was being calculated incorrectly resulting in code not being vectorized in few cases. 2) Adds logic to reorder operands if we get longer chain of consecutive loads enabling vectorization. Handled the same for cases were we have AltOpcode. Thanks Michael for inputs and review. Review: http://reviews.llvm.org/D6677 llvm-svn: 226547	2015-01-20 06:11:00 +00:00
David Majnemer	4556f4a8ee	Bitcode: Don't create comdats when autoupgrading macho bitcode Don't infer COMDAT groups from older bitcode if the target is macho, it doesn't have COMDATs. llvm-svn: 226546	2015-01-20 05:58:07 +00:00
Frederic Riss	53e390879c	[dsymutil] Add the detected target triple to the debug map. It will be needed to instantiate the Target object that we will use to create all the MC objects for the dwarf emission. llvm-svn: 226525	2015-01-19 23:33:14 +00:00
Duncan P. N. Exon Smith	56c8d44827	AsmParser: Fix error location for missing fields llvm-svn: 226524	2015-01-19 23:32:36 +00:00
Simon Pilgrim	2ea397ac36	[X86][AVX] Missing AVX1 memory folding float instructions Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing). Now that scalar folding is working it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll) - this test appears to make no sense as its trying to ensure that a scalar subtraction isn't folded as it 'would zero the top elts of the loaded vector' - this test just appears to be wrong to me. Differential Revision: http://reviews.llvm.org/D7055 llvm-svn: 226513	2015-01-19 22:40:45 +00:00
Rafael Espindola	a02b7738f7	Add r224985 back with fixes. The fixes are to note that AArch64 has additional restrictions on when local relocations can be used. In particular, ld64 requires that relocations to cstring/cfstrings use linker visible symbols. Original message: In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 226503	2015-01-19 21:11:14 +00:00
Colin LeMahieu	9cc2ac99d5	[Hexagon] Updating muxir/ri/ii intrinsics. Setting predicate registers as compatible with i32 rather than doing custom type conversion. llvm-svn: 226500	2015-01-19 20:31:18 +00:00
Colin LeMahieu	89f087ffd8	[Hexagon] Converting intrinsics combine imm/imm, simple shifts and extends. llvm-svn: 226483	2015-01-19 18:56:19 +00:00
Colin LeMahieu	c9d3d690e7	[Hexagon] Converting remaining ALU32/ALU intrinsics. llvm-svn: 226480	2015-01-19 18:33:58 +00:00
Colin LeMahieu	dc770a4bae	[Hexagon] Converting ALU32/ALU intrinsics to new patterns. llvm-svn: 226478	2015-01-19 18:22:19 +00:00
Adrian Prantl	e6b30cee57	Remove support for DIVariable's FlagIndirectVariable and expect frontends to use a DIExpression with a DW_OP_deref instead. This is not only a much more natural place for this informationl; there is also a technical reason: The FlagIndirectVariable is used to mark a variable that is turned into a reference by virtue of the calling convention; this happens for example to aggregate return values. The inliner, for example, may actually need to undo this indirection to correctly represent the value in its new context. This is impossible to implement because the DIVariable can't be safely modified. We can however safely construct a new DIExpression on the fly. llvm-svn: 226476	2015-01-19 17:57:29 +00:00
Greg Fitzgerald	91e9dd41fa	[AArch64] Implement GHC calling convention Original patch by Luke Iannini. Minor improvements and test added by Erik de Castro Lopo. Differential Revision: http://reviews.llvm.org/D6877 From: Erik de Castro Lopo <erikd@mega-nerd.com> llvm-svn: 226473	2015-01-19 17:40:05 +00:00
Colin LeMahieu	9b7e71b043	[Hexagon] Converting halfword to double accumulating multiply intrinsics. llvm-svn: 226472	2015-01-19 17:36:32 +00:00
Rafael Espindola	bd29c767dc	Produce errors when an assignment expression would use a common symbol. An assignment will produce a symbol with a given section and offset. There is no way to represent something like "1 byte after a common symbol". This matches the behavior of GNU as. Part of PR22217. llvm-svn: 226470	2015-01-19 17:30:24 +00:00
Bradley Smith	ae052a6630	[ARM] SSAT/USAT with an 'asr #32 ' shift should result in an undefined encoding rather than unpredictable llvm-svn: 226469	2015-01-19 16:37:17 +00:00
Bradley Smith	7561d7842c	[ARM] Fixup sign extend instruction availability w.r.t. DSP extension llvm-svn: 226468	2015-01-19 16:36:02 +00:00
Rafael Espindola	004c23be3b	Bring r226038 back. No change in this commit, but clang was changed to also produce trivial comdats when needed. Original message: Don't create new comdats in CodeGen. This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848. llvm-svn: 226467	2015-01-19 15:16:06 +00:00
Michael Kuperstein	83ab484af5	[MIScheduler] Slightly better handling of constrainLocalCopy when both source and dest are local This fixes PR21792. Differential Revision: http://reviews.llvm.org/D6823 llvm-svn: 226433	2015-01-19 07:30:47 +00:00
Hal Finkel	89328f88bf	[PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2 Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call instructions in order to represent the dependency on the TOC base pointer value. Restricting this to ELF V2, however, does not seem to make sense: calls under ELF V1 have the same dependence, and indirect calls have an r2 dependence just as direct ones. Make sure the dependence is noted for all calls under both ELF V1 and ELF V2. llvm-svn: 226432	2015-01-19 07:20:27 +00:00
Matt Arsenault	e51db749de	R600: Remove redundant test This is already covered in ftrunc.ll llvm-svn: 226412	2015-01-18 19:30:32 +00:00
Daniel Sanders	b006cbb88e	[mips] 'CHECK :' is not a valid check directive. Fixed. llvm-svn: 226409	2015-01-18 18:43:10 +00:00
Daniel Sanders	26a5237a9a	[mips] Make whitespace in disassembler tests more consistent. NFC. The tests for the ISA's should now be approximately diffable. That is, the output of 'diff valid-mips1.txt valid-mips2.txt' should be emit the lines for instructions that were added/removed to/from MIPS-I by MIPS-II. This doesn't work perfectly at the moment due to ordering differences but it should be close. llvm-svn: 226408	2015-01-18 18:38:36 +00:00
Daniel Sanders	60099a2678	[mips] Make whitespace of disassembler tests more consistent by removing blank lines. NFC. llvm-svn: 226407	2015-01-18 18:21:19 +00:00
Simon Pilgrim	c7e9e7d766	[X86][SSE] Added scalar min/max folding tests. NFC. llvm-svn: 226406	2015-01-18 18:06:23 +00:00
Simon Pilgrim	fe4662d567	[X86][SSE] Added float extract and xmm extract/insert stack folding tests. NFC. llvm-svn: 226405	2015-01-18 17:04:32 +00:00
Simon Pilgrim	f6df496d0c	[X86][SSE] Added scalar conversion stack folding tests. NFC. llvm-svn: 226404	2015-01-18 16:22:15 +00:00
Simon Pilgrim	5e48e7f807	AVX1 stack folding tests. NFC. Begun adding more exhaustive tests - all floating point instructions should now be either tested or have placeholders. We do seem to have a number of missing instructions, I will add a patch for review once the remaining working instructions are added. I'll then move on to SSE tests and then the integer instructions. llvm-svn: 226400	2015-01-18 12:56:39 +00:00
Hal Finkel	e9b3a1a030	[PowerPC] Initial PPC64 calling-convention changes for fastcc The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is designed to work with both prototyped and non-prototyped/varargs functions. As a result, GPRs and stack space are allocated for every argument, even those that are passed in floating-point or vector registers. GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that do not have their address taken) to use the 'fast' calling convention. When functions are using the 'fast' calling convention, don't allocate GPRs for arguments passed in other types of registers, and don't allocate stack space for arguments passed in registers. Other changes for the fast calling convention may be added in the future. llvm-svn: 226399	2015-01-18 12:08:47 +00:00
Hal Finkel	470187b350	[PowerPC] Don't list R11 as a patchpoint scratch register R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is reserved for use as an "environment pointer" for compilation models that require such a thing. We don't, we also don't need a second scratch register, and because we support only "local" patchpoint call targets, we might as well let R11 be used for anyregcc patchpoints. llvm-svn: 226369	2015-01-17 03:57:34 +00:00
Mehdi Amini	e119c6afaf	Improve DAG combine pass on certain IR vector patterns Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector produced suboptimal code in AVX2 mode with certain IR combinations. In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32 (undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical, but then mysteriously generated rather bad code; the movq/movhpd combination didn't match. The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs would get promoted to 4f32 by the type legalizer, eventually resulting in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing these were both half the output size, concatted them and then produced a shuffle. However, the resulting concat + shuffle was more complex than it should be; in the case where the upper half of the output is undef, we probably want to generate shuffle + concat instead. This enhancement causes the vector_shuffle combine step to recognize this suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR in case the same suboptimal pattern occurs for other reasons. This results in the optimizer correctly producing the optimal movq + movhpd sequence for all three variations on this IR, even with AVX2. I've included a test case. Radar link: rdar://problem/19287012 Fix for PR 21943. From: Fiona Glaser <fglaser@apple.com> llvm-svn: 226360	2015-01-17 01:35:56 +00:00
Kevin Enderby	b88ab47be3	Change the test case for llvm-objdump’s -archive-headers option to not check the size while I once again try to figure out why only the clang-cmake-armv7-a15-full bot is getting that value wrong. llvm-svn: 226345	2015-01-16 23:29:07 +00:00
Matt Arsenault	ffd1bd1d5d	R600: Clean up floor tests These were using different naming schemes, not using multiple check prefixes and not using -LABEL. llvm-svn: 226333	2015-01-16 22:11:00 +00:00
Kevin Enderby	2473644cb2	Fix the Archive::Child::getRawSize() method used by llvm-objdump’s -archive-headers option and tweak its use in llvm-objdump. Add back the test case for the -archive-headers option. llvm-svn: 226332	2015-01-16 22:10:36 +00:00
Colin LeMahieu	b0ae008450	[Hexagon] Converting halfword to doubleword multiply intrinsics. llvm-svn: 226326	2015-01-16 21:41:57 +00:00
Colin LeMahieu	3f0206ff73	[Hexagon] Converting accumulating halfword multiply intrinsics to patterns. llvm-svn: 226324	2015-01-16 21:36:34 +00:00
Colin LeMahieu	6559af4ce0	[Hexagon] Beginning converting intrinsics to patterns instead of duplicated definitions. Converting halfword multiply intrinsics. llvm-svn: 226318	2015-01-16 20:38:54 +00:00
Adam Nemet	9fe8c32290	[AVX512] Add intrinsics for masked aligned FP loads and stores Similar to the unaligned cases. Test was generated with update_llc_test_checks.py. Part of <rdar://problem/17688758> llvm-svn: 226296	2015-01-16 18:50:09 +00:00
Adam Nemet	553d4195e9	[AVX512] Remove trailing whitespaces in this test llvm-svn: 226295	2015-01-16 18:50:07 +00:00
Duncan P. N. Exon Smith	97ed3e1e77	IR: Allow 16-bits for column info Raise the limit for column information from 8 bits to 16 bits. llvm-svn: 226291	2015-01-16 17:33:08 +00:00
Andrea Di Biagio	d2869df8d3	[X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0. This patch disables target specific combine on X86ISD::INSERTPS dag nodes if optlevel is CodeGenOpt::None. The backend currently implements a target specific combine rule that converts a vector load used by an INSERTPS dag node into a scalar load plus a scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of two instructions (i.e. a vector load plus INSERTPSrr). However, the existing target combine rule on INSERTPS nodes only works under the assumption that ISel will always be able to match an INSERTPSrm. This is not true in general at -O0, since the backend only allows folding a load into the memory operand of an instruction if the optimization level is not CodeGenOpt::None. In the example below: // __m128 test(__m128 a, __m128 b) { __m128 c = _mm_insert_ps(a, b, 1 << 6); return c; } // Before this patch, at -O0, the backend would have canonicalized the load to 'b' into a scalar load plus scalar_to_vector. Later on, ISel would have selected an INSERTPSrr leaving the insertps mask in an inconsistent state: movss 4(%rdi), %xmm1 insertps $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3]. With this patch, the backend avoids folding the vector load into the operand of the INSERTPS. The new codegen at -O0 is: movaps (%rdi), %xmm1 insertps $64, %xmm1, %xmm0 # %xmm1[1],xmm0[1,2,3]. llvm-svn: 226277	2015-01-16 14:55:26 +00:00
Simon Pilgrim	f787eeaa8f	[X86] Refactored stack memory folding tests to explicitly force register spilling The current 'big vectors' stack folded reload testing pattern is very bulky and makes it difficult to test all instructions as big vectors will tend to use only the ymm instruction implementations. This patch changes the tests to use a nop call that lists explicit xmm registers as sideeffects, with this we can force a partial register spill of the relevant registers and then check that the reload is correctly folded. The asm generated only adds the forced spill, a nop instruction and a couple of extra labels (a fraction of the current approach). More exhaustive tests will follow shortly, I've added some extra tests (the xmm versions of some of the existing folding tests) as a starting point. Differential Revision: http://reviews.llvm.org/D6932 llvm-svn: 226264	2015-01-16 09:32:54 +00:00
Timur Iskhodzhanov	6d120e1a54	Revert r226242 - Revert Revert Don't create new comdats in CodeGen This breaks AddressSanitizer (ninja check-asan) on Windows llvm-svn: 226251	2015-01-16 08:38:45 +00:00
Filipe Cabecinhas	47d5b20f32	Use report_fatal_error instead of llvm_unreachable, so we don't crash on user input llvm-svn: 226248	2015-01-16 04:54:12 +00:00
Hal Finkel	04316a019c	[PowerPC] Adjust PatchPoints for ppc64le Bill Schmidt pointed out that some adjustments would be needed to properly support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available as a scratch register, so we need to use R12. R12 is also available under ELF V1, so to maintain consistency, I flipped the order to make R12 the first scratch register in the array under both ABIs. llvm-svn: 226247	2015-01-16 04:40:58 +00:00
Mehdi Amini	a1e86a9849	Fix Reassociate handling of constant in presence of undef float http://reviews.llvm.org/D6993 llvm-svn: 226245	2015-01-16 03:00:58 +00:00
Rafael Espindola	f1394d41f0	Revert "Revert Don't create new comdats in CodeGen" This reverts commit r226173, adding r226038 back. No change in this commit, but clang was changed to also produce trivial comdats for costructors, destructors and vtables when needed. Original message: Don't create new comdats in CodeGen. This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848. llvm-svn: 226242	2015-01-16 02:22:55 +00:00
Kevin Enderby	454bb27b57	Work around to get the build bot clang-cmake-armv7-a15-full green by removing the macho-archive-headers.test added with r226228 that it is failing on for now while I try to figure out what is going on. llvm-svn: 226241	2015-01-16 02:08:11 +00:00
Kevin Enderby	769159a05f	Another attempt to fix the build bot clang-cmake-armv7-a15-full failing on the macho-archive-headers.test added with r226228. llvm-svn: 226239	2015-01-16 01:09:54 +00:00
Sanjoy Das	8ce28789d0	Add a new pass "inductive range check elimination" IRCE eliminates range checks of the form 0 <= A * I + B < Length by splitting a loop's iteration space into three segments in a way that the check is completely redundant in the middle segment. As an example, IRCE will convert len = < known positive > for (i = 0; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } to len = < known positive > limit = smin(n, len) // no first segment for (i = 0; i < limit; i++) { if (0 <= i && i < len) { // this check is fully redundant do_something(); } else { throw_out_of_bounds(); } } for (i = limit; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } IRCE can deal with multiple range checks in the same loop (it takes the intersection of the ranges that will make each of them redundant individually). Currently IRCE does not do any profitability analysis. That is a TODO. Please note that the status of this pass is experimental, and it is not part of any default pass pipeline. Having said that, I will love to get feedback and general input from people interested in trying this out. This pass was originally r226201. It was reverted because it used C++ features not supported by MSVC 2012. Differential Revision: http://reviews.llvm.org/D6693 llvm-svn: 226238	2015-01-16 01:03:22 +00:00
Matt Arsenault	5500d52bba	R600/SI: Add patterns for v_cvt_{flr\|rpi}_i32_f32 llvm-svn: 226230	2015-01-15 23:58:35 +00:00
Filipe Cabecinhas	f9b20bcc5c	Fix edge case when Start overflowed in 32 bit mode llvm-svn: 226229	2015-01-15 23:50:44 +00:00
Kevin Enderby	c8e3a999b3	Add the option, -archive-headers, used with -macho to print the Mach-O archive headers to llvm-objdump. llvm-svn: 226228	2015-01-15 23:19:11 +00:00
Matt Arsenault	2f04c34f62	R600/SI: Fix trailing comma with modifiers Instructions with 1 operand can still use source modifiers, so make sure we don't print an extra comma afterwards. llvm-svn: 226226	2015-01-15 23:17:03 +00:00
Colin LeMahieu	ef6291ec41	[Hexagon] Adding new-value store and bit reverse instructions. llvm-svn: 226224	2015-01-15 23:10:29 +00:00
Filipe Cabecinhas	f8083bfc50	Report fatal errors instead of segfaulting/asserting on a few invalid accesses while reading MachO files. Summary: Shift an older “invalid file” test to get a consistent naming for these tests. Bugs found by afl-fuzz Reviewers: rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6945 llvm-svn: 226219	2015-01-15 22:52:38 +00:00
Sanjoy Das	618e939258	Revert r226201 (Add a new pass "inductive range check elimination") The change used C++11 features not supported by MSVC 2012. I will fix the change to use things supported MSVC 2012 and recommit shortly. llvm-svn: 226216	2015-01-15 22:18:10 +00:00
Hal Finkel	dcf8b14857	[PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the POWER7, A2 and earlier cores) are really pointers to a function descriptor, a structure with three pointers: the actual pointer to the code to which to jump, the pointer to the TOC needed by the callee, and an environment pointer. We used to chain these loads, and make them opaque to the rest of the optimizer, so that they'd always occur directly before the call. This is not necessary, and in fact, highly suboptimal on embedded cores. Once the function pointer is known, the loads can be performed ahead of time; in fact, they can be hoisted out of loops. Now these function descriptors are almost always generated by the linker, and thus the contents of the descriptors are invariant. As a result, by default, we'll mark the associated loads as invariant (allowing them to be hoisted out of loops). I've added a target feature to turn this off, however, just in case someone needs that option (constructing an on-stack descriptor, casting it to a function pointer, and then calling it cannot be well-defined C/C++ code, but I can imagine some JIT-compilation system doing so). Consider this simple test: $ cat call.c typedef void (fp)(); void bar(fp x) { for (int i = 0; i < 1600000000; ++i) x(); } $ cat main.c typedef void (fp)(); void bar(fp x); void foo() {} int main() { bar(foo); } On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads as invariant brings the execution time down to ~8 seconds from ~32 seconds with the loads in the loop. The difference on the POWER7 is smaller. Compiling with: gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2] clang -O3 -mcpu=native call.c main.c : ~5.3 seconds clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds (looks like we'd benefit from additional loop unrolling here, as a first guess, because this is faster with the extra loads) The -mno-invariant-function-descriptors will be added to Clang shortly. llvm-svn: 226207	2015-01-15 21:17:34 +00:00
Colin LeMahieu	ad558c9627	[Hexagon] Updating indexed load-extend patterns and changing test to new expected output. llvm-svn: 226206	2015-01-15 21:07:52 +00:00
Sanjoy Das	a7eb1a0b3d	Add a new pass "inductive range check elimination" IRCE eliminates range checks of the form 0 <= A * I + B < Length by splitting a loop's iteration space into three segments in a way that the check is completely redundant in the middle segment. As an example, IRCE will convert len = < known positive > for (i = 0; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } to len = < known positive > limit = smin(n, len) // no first segment for (i = 0; i < limit; i++) { if (0 <= i && i < len) { // this check is fully redundant do_something(); } else { throw_out_of_bounds(); } } for (i = limit; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } IRCE can deal with multiple range checks in the same loop (it takes the intersection of the ranges that will make each of them redundant individually). Currently IRCE does not do any profitability analysis. That is a TODO. Please note that the status of this pass is experimental, and it is not part of any default pass pipeline. Having said that, I will love to get feedback and general input from people interested in trying this out. Differential Revision: http://reviews.llvm.org/D6693 llvm-svn: 226201	2015-01-15 20:45:46 +00:00
Hal Finkel	d48111840e	Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"" Reapply r226071 with fixes. Two fixes: 1. We need to manually remove the old and create the new 'deaf defs' associated with physical register definitions when we move the definition of the physical register from the copy point to the point of the original vreg def. This problem was picked up by the machinstr verifier, and could trigger a verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've turned on the verifier in the tests. 2. When moving the def point of the phys reg up, we need to make sure that it is neither defined nor read in between the two instructions. We don't, however, extend the live ranges of phys reg defs to cover uses, so just checking for live-range overlap between the pair interval and the phys reg aliases won't pick up reads. As a result, we manually iterate over the range and check for reads. A test soon to be committed to the PowerPC backend will test this change. Original commit message: [RegisterCoalescer] Remove copies to reserved registers This allows the RegisterCoalescer to join "non-flipped" range pairs with a physical destination register -- which allows the RegisterCoalescer to remove copies like this: <vreg> = something (maybe a load, for example) ... (things that don't use PHYSREG) PHYSREG = COPY <vreg> (with all of the restrictions normally applied by the RegisterCoalescer: having compatible register classes, etc. ) Previously, the RegisterCoalescer handled only the opposite case (copying from a physical register). I don't handle the problem fully here, but try to get the common case where there is only one use of <vreg> (the COPY). An upcoming commit to the PowerPC backend will make this pattern much more common on PPC64/ELF systems. llvm-svn: 226200	2015-01-15 20:32:09 +00:00
Matt Arsenault	1851a2058d	R600/SI: Improve fpext / fptrunc test coverage llvm-svn: 226197	2015-01-15 19:39:42 +00:00
Colin LeMahieu	14474ecfe7	[Hexagon] Removing old versions of vsplice, valign, cl0, ct0 and updating references to new versions. llvm-svn: 226194	2015-01-15 19:28:32 +00:00
Marek Olsak	6dc07f6738	R600/SI: Use 64-bit encoding by default for opcodes that are VOP3-only on VI llvm-svn: 226190	2015-01-15 18:43:01 +00:00
Colin LeMahieu	39bcb9a1e4	[Hexagon] Adding vmux instruction. Removing old transfer instructions and updating references. llvm-svn: 226184	2015-01-15 18:16:00 +00:00
Ramkumar Ramachandra	6658c23924	statepoint tests: use statepoint-example gc Mechanical conversion of statepoint tests to use the example-statepoint gc. llvm-svn: 226183	2015-01-15 18:10:44 +00:00
Joerg Sonnenberger	76e84ed36b	Support @PLT loads on 32bit x86. llvm-svn: 226182	2015-01-15 17:59:02 +00:00
Colin LeMahieu	55264a56e6	[Hexagon] Deleting old float comparison instruction and updating references to new ones. llvm-svn: 226179	2015-01-15 17:28:14 +00:00
Colin LeMahieu	6ce6625967	[Hexagon] Replacing old fadd/fsub instructions and updating references. llvm-svn: 226176	2015-01-15 16:30:07 +00:00
Timur Iskhodzhanov	7b5eababde	Revert Don't create new comdats in CodeGen It breaks AddressSanitizer on Windows. llvm-svn: 226173	2015-01-15 16:14:34 +00:00
Daniel Sanders	ebcb17dfb5	[mips] Fix a typo in the compare patterns for MIPS32r6/MIPS64r6. Summary: The patterns intended for the SETLE node were actually matching the SETLT node. Reviewers: atanasyan, sstankovic, vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6997 llvm-svn: 226171	2015-01-15 15:41:03 +00:00
Vladimir Medic	68db5bdb63	Add disassembler tests for mips64r6 platform. There are no functional changes. llvm-svn: 226166	2015-01-15 14:18:12 +00:00
Vladimir Medic	d7462ab6c7	Add disassembler tests for mips32r6 platform. There are no functional changes. llvm-svn: 226165	2015-01-15 14:11:38 +00:00
Vladimir Medic	4f5cb92c60	Add disassembler tests for mips64r2 platform. There are no functional changes. llvm-svn: 226164	2015-01-15 14:06:34 +00:00
Chandler Carruth	0a49f1bfc1	[PM] Port TargetLibraryInfo to the new pass manager, provided by the TargetLibraryAnalysis pass. There are actually no direct tests of this already in the tree. I've added the most basic test that the pass manager bits themselves work, and the TLI object produced will be tested by an upcoming patches as they port passes which rely on TLI. This is starting to point out the awkwardness of the invalidate API -- it seems poorly fitting on the result object. I suspect I will change it to live on the analysis instead, but that's not for this change, and I'd rather have a few more passes ported in order to have more experience with how this plays out. I believe there is only one more analysis required in order to start porting instcombine. =] llvm-svn: 226160	2015-01-15 11:39:46 +00:00
Vladimir Medic	df5781c7e5	Add disassembler tests for mips64 platform. There are no functional changes. llvm-svn: 226151	2015-01-15 08:50:20 +00:00
Hal Finkel	0d21c2bfbe	Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers" Reverting this while I investigate some bad behavior this is causing. As a possibly-related issue, adding -verify-machineinstrs to one of the test cases now fails because of this change: llc test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll -march=x86-64 -o - -verify-machineinstrs * Bad machine code: No instruction at def index * - function: foo - basic block: BB#0 return (0x10007e21f10) [0B;736B) - liverange: [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[78 4r,784d:0) 0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r - register: %DS Valno #3 is defined at 624r * Bad machine code: Live segment doesn't end at a valid instruction * - function: foo - basic block: BB#0 return (0x10007e21f10) [0B;736B) - liverange: [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[78 4r,784d:0) 0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r - register: %DS [624r,624d:3) LLVM ERROR: Found 2 machine code errors. where 624r corresponds exactly to the interval combining change: 624B %RSP<def> = COPY %vreg16; GR64:%vreg16 Considering merging %vreg16 with %RSP RHS = %vreg16 [608r,624r:0) 0@608r updated: 608B %RSP<def> = MOV64rm <fi#3>, 1, %noreg, 0, %noreg; mem:LD8[%saved_stack.1] Success: %vreg16 -> %RSP Result = %RSP llvm-svn: 226086	2015-01-15 03:08:59 +00:00
Sanjoy Das	4b5a4b2975	Fix PR22222 The bug was introduced in r225282. r225282 assumed that sub X, Y is the same as add X, -Y. This is not correct if we are going to upgrade the sub to sub nuw. This change fixes the issue by making the optimization ignore sub instructions. Differential Revision: http://reviews.llvm.org/D6979 llvm-svn: 226075	2015-01-15 01:46:09 +00:00
Hal Finkel	a919bf8508	[RegisterCoalescer] Remove copies to reserved registers This allows the RegisterCoalescer to join "non-flipped" range pairs with a physical destination register -- which allows the RegisterCoalescer to remove copies like this: <vreg> = something (maybe a load, for example) ... (things that don't use PHYSREG) PHYSREG = COPY <vreg> (with all of the restrictions normally applied by the RegisterCoalescer: having compatible register classes, etc. ) Previously, the RegisterCoalescer handled only the opposite case (copying from a physical register). I don't handle the problem fully here, but try to get the common case where there is only one use of <vreg> (the COPY). An upcoming commit to the PowerPC backend will make this pattern much more common on PPC64/ELF systems. llvm-svn: 226071	2015-01-15 01:25:28 +00:00
Hal Finkel	28c9c891ec	[PowerPC] Add assembler support for mcrfs and friends Fill out our support for the floating-point status and control register instructions (mcrfs and friends). As it turns out, these are necessary for compiling src/test/harness_fp.h in TBB for PowerPC. Thanks to Raf Schietekat for reporting the issue! llvm-svn: 226070	2015-01-15 01:00:53 +00:00
Richard Smith	cd37c22fb7	For PR21145: recognise a builtin call to a known deallocation function even if it's defined in the current module. Clang generates this situation for the C++14 sized deallocation functions, because it generates a weak definition in case one isn't provided by the C++ runtime library. llvm-svn: 226069	2015-01-15 01:00:33 +00:00
Ramkumar Ramachandra	3cbd3c16ec	[GC] CodeGenPrep transform: simplify offsetable relocate The transform is somewhat involved, but the basic idea is simple: find derived pointers that have been offset from the base pointer using gep and replace the relocate of the derived pointer with a gep to the relocated base pointer (with the same offset). llvm-svn: 226060	2015-01-14 23:27:07 +00:00
Philip Reames	a4eb622240	getMangledTypeStr: clarify how it mangles types, and add tests "Write a set of tests that show how name mangling is done for overloaded intrinsics." These happen to use gc.relocates to exercise the codepath in question, but is not a GC specific test. Patch by: artagnon@gmail.com Differential Revision: http://reviews.llvm.org/D6915 llvm-svn: 226056	2015-01-14 23:05:17 +00:00
Duncan P. N. Exon Smith	4a5feedcaa	IR: Move MDLocation into place This commit moves `MDLocation`, finishing off PR21433. There's an accompanying clang commit for frontend testcases. I'll attach the testcase upgrade script I used to PR21433 to help out-of-tree frontends/backends. This changes the schema for `DebugLoc` and `DILocation` from: !{i32 3, i32 7, !7, !8} to: !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8) Note that empty fields (line/column: 0 and inlinedAt: null) don't get printed by the assembly writer. llvm-svn: 226048	2015-01-14 22:27:36 +00:00
Duncan P. N. Exon Smith	c9a6b68ef2	IR: Always print MDLocation line Print `MDLocation`'s `line` field even when it's 0. llvm-svn: 226046	2015-01-14 22:14:26 +00:00
Rafael Espindola	80f6c3cd98	Don't create new comdats in CodeGen. This patch stops the implicit creation of comdats during codegen. Clang now sets the comdat explicitly when it is required. With this patch clang and gcc now produce the same result in pr19848. llvm-svn: 226038	2015-01-14 20:55:48 +00:00
Rafael Espindola	001c2e28c6	Add a test that would have found the issue with r225644. llvm-svn: 226035	2015-01-14 20:24:46 +00:00
Chandler Carruth	ba862f13cb	[MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement. Some benchmarks have shown that this could lead to a potential performance benefit, and so adding some flags to try to help measure the difference. A possible explanation. In diamond-shaped CFGs (A followed by either B or C both followed by D), putting B and C both in between A and D leads to the code being less dense than it could be. Always either B or C have to be skipped increasing the chance of cache misses etc. Moving either B or C to after D might be beneficial on average. In the long run, but we should probably do a better job of analyzing the basic block and branch probabilities to move the correct one of B or C to after D. But even if we don't use this in the long run, it is a good baseline for benchmarking. Original patch authored by Daniel Jasper with test tweaks and a second flag added by me. Differential Revision: http://reviews.llvm.org/D6969 llvm-svn: 226034	2015-01-14 20:19:29 +00:00
Bill Schmidt	6d9693229c	[PPC64] Add support for the ICBT instruction on POWER8. Patch by Kit Barton. Support for the ICBT instruction is currently present, but limited to embedded processors. This change adds a new FeatureICBT that can be used to identify whether the ICBT instruction is available on a specific processor. Two new tests are added: * Positive test to ensure the icbt instruction is present when using -mcpu=pwr8 * Negative test to ensure the icbt instruction is not generated when using -mcpu=pwr7 Both test cases use the Prefetch opcode in LLVM. They are based on the ppc64-prefetch.ll test case. llvm-svn: 226033	2015-01-14 20:17:10 +00:00
Rafael Espindola	d5fe70d054	Fix linking of shared libraries. In shared libraries the plugin can see non-weak declarations that are still undefined. llvm-svn: 226031	2015-01-14 20:08:46 +00:00
Rafael Espindola	75b9d612a7	Fix handling of extern_weak. This was broken by r225983. llvm-svn: 226026	2015-01-14 19:43:32 +00:00
David Majnemer	b89ab3fd88	InstCombine: Don't take A-B<0 into A<B if A-B has other uses This fixes PR22226. llvm-svn: 226023	2015-01-14 19:26:56 +00:00
Rafael Espindola	24f46a1c22	Revert "Add r224985 back with two fixes." This reverts commit r225644 while I debug a regression. llvm-svn: 226022	2015-01-14 19:07:23 +00:00
Rafael Espindola	ee0c3b182d	Add support for comdats with names larger than 256 characters. llvm-svn: 226012	2015-01-14 18:25:45 +00:00
Olivier Sallenave	02920d23af	Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding. llvm-svn: 225987	2015-01-14 15:36:28 +00:00
Rafael Espindola	ae29621e74	Handle a symbol being undefined. This can happen if: * It is present in a comdat in one file. * It is not present in the comdat of the file that is kept. * Is is not used. This should fix the LTO boostrap. Thanks to Takumi NAKAMURA for setting up the bot! llvm-svn: 225983	2015-01-14 13:53:50 +00:00
Vladimir Medic	05f989c2b2	Add disassembler tests for mips32r2 platform. There are no functional changes. llvm-svn: 225980	2015-01-14 11:35:22 +00:00
Jyoti Allur	52ab477f16	Correct POP handling for v7m llvm-svn: 225972	2015-01-14 10:48:16 +00:00
Chandler Carruth	354bc97f39	[PM] Port domtree to the new pass manager (at last). This adds the domtree analysis to the new pass manager. The analysis returns the same DominatorTree result entity used by the old pass manager and essentially all of the code is shared. We just have different boilerplate for running and printing the analysis. I've converted one test to run in both modes just to make sure this is exercised while both are live in the tree. llvm-svn: 225969	2015-01-14 10:19:28 +00:00
Kai Nacke	8d1e1991a8	[mips] Refine octeon instructions seq/seqi/sne/snei This commit refines the pattern for the octeon seq/seqi/sne/snei instructions. The target register is set to 0 or 1 according to the result of the comparison. In C, this is something like rd = (unsigned long)(rs == rt) This commit adds a zext to bring the result to i64. With this change the instruction is selected for this type of code. (gcc produces the same code for the above C code.) llvm-svn: 225968	2015-01-14 10:19:09 +00:00
Vladimir Medic	b4ee77ba8a	Add disassembler tests for mips32r2 platform. There are no functional changes. llvm-svn: 225967	2015-01-14 10:18:56 +00:00
Brad Smith	24327cda0b	Use the integrated assembler by default on SPARC. llvm-svn: 225957	2015-01-14 07:53:39 +00:00
JF Bastien	6c7aa853bb	Revert "Insert random noops to increase security against ROP attacks (llvm)" This reverts commit: http://reviews.llvm.org/D3392 llvm-svn: 225948	2015-01-14 05:24:33 +00:00
Saleem Abdulrasool	85b2a0e385	X86: validate 'int' instruction The int instruction takes as an operand an 8-bit immediate value. Validate that the input is valid rather than silently truncating the value. llvm-svn: 225941	2015-01-14 05:10:21 +00:00
NAKAMURA Takumi	dbaf514823	Disable a couple of tests, CodeGen/X86/noop-insert.ll and CodeGen/X86/noop-insert-percentage.ll, in r225908, to unbreak tests. llvm-svn: 225940	2015-01-14 04:21:33 +00:00
Chandler Carruth	801414c054	[dom] Add a basic dominator tree test. Correct, we have zero basic testing of the dominator tree in the regression test suite. There is a single test that even prints it out, and that test only checks a single line of the output. There are a handful of tests that check post dominators, but all of those are looking for bugs rather than just exercising the basic machinery. This test is super boring and unexciting. But hey, it's something. I needed there to be something so I could switch the basic test to run with both the old and new pass manager. llvm-svn: 225936	2015-01-14 03:34:55 +00:00
Tim Northover	dd963c2722	ARM: add test for crc32 instructions in CodeGen. Somehow we seem to have ended up without any actual tests of the CodeGen side. Easy enough to fix. llvm-svn: 225930	2015-01-14 01:43:33 +00:00
Hal Finkel	794dabb4dc	[PowerPC] Fix the noop-insert test The form of nops used is CPU-specific (some CPUs, such as the POWER7, have special group-terminating nops). We probably want a different callback for this kind of nop insertion (something more like MCAsmBackend::writeNopData), or for PPC to use a different mechanism for scheduling nops, but this will stop the test from failing for now. llvm-svn: 225928	2015-01-14 01:37:21 +00:00
Matt Arsenault	d3aa8b3a41	R600/SI: Remove some redudant load testcases. This reduces coverage for Evergreen, since the more complete tests have those run lines disabled. llvm-svn: 225927	2015-01-14 01:35:26 +00:00
Matt Arsenault	424a0025ca	R600/SI: Fix bad code with unaligned byte vector loads Don't do the v4i8 -> v4f32 combine if the load will need to be expanded due to alignment. This stops adding instructions to repack into a single register that the v_cvt_ubyteN_f32 instructions read. llvm-svn: 225926	2015-01-14 01:35:22 +00:00
Matt Arsenault	22a9f67443	Implement new way of expanding extloads. Now that the source and destination types can be specified, allow doing an expansion that doesn't use an EXTLOAD of the result type. Try to do a legal extload to an intermediate type and extend that if possible. This generalizes the special case custom lowering of extloads R600 has been using to work around this problem. This also happens to fix a bug that would incorrectly use more aligned loads than should be used. llvm-svn: 225925	2015-01-14 01:35:17 +00:00
Duncan P. N. Exon Smith	2bfbe34846	Utils: Handle remapping distinct MDLocations Part of PR21433. llvm-svn: 225921	2015-01-14 01:29:32 +00:00
Duncan P. N. Exon Smith	e0ad6d2442	Utils: Add mapping for uniqued MDLocations Still doesn't handle distinct ones. Part of PR21433. llvm-svn: 225914	2015-01-14 01:20:27 +00:00
Hal Finkel	a11b7ea471	Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"" This re-applies r225808, fixed to avoid problems with SDAG dependencies along with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs. These problems caused the original regression tests to assert/segfault on many (but not all) systems. Original commit message: This commit does two things: 1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this common code for lowering some common intrinsics, stackmap/patchpoint among them). 2. Adds support for stackmap/patchpoint lowering. For the most part, this is very similar to the support in the AArch64 target, with the obvious differences (different registers, NOP instructions, etc.). The test cases are adapted from the AArch64 test cases. One difference of note is that the patchpoint call sequence takes 24 bytes, so you can't use less than that (on AArch64 you can go down to 16). Also, as noted in the docs, we take the patchpoint address to be the actual code address (assuming the call is local in the TOC-sharing sense), which should yield higher performance than generating the full cross-DSO indirect-call sequence and is likely just as useful for JITed code (if not, we'll change it). StackMaps and Patchpoints are still marked as experimental, and so this support is doubly experimental. So go ahead and experiment! llvm-svn: 225909	2015-01-14 01:07:51 +00:00
JF Bastien	c2f3b58bb0	Insert random noops to increase security against ROP attacks (llvm) A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks. Command line options: -noop-insertion // Enable noop insertion. -noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion) -max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions. In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future). -fdiversify This is the llvm part of the patch. clang part: D3393 http://reviews.llvm.org/D3392 Patch by Stephen Crane (@rinon) llvm-svn: 225908	2015-01-14 01:07:26 +00:00
Reid Kleckner	b190c8f871	CodeGen support for x86_64 SEH catch handlers in LLVM This adds handling for ExceptionHandling::MSVC, used by the x86_64-pc-windows-msvc triple. It assumes that filter functions have already been outlined in either the frontend or the backend. Filter functions are used in place of the landingpad catch clause type info operands. In catch clause order, the first filter to return true will catch the exception. The C specific handler table expects the landing pad to be split into one block per handler, but LLVM IR uses a single landing pad for all possible unwind actions. This patch papers over the mismatch by synthesizing single instruction BBs for every catch clause to fill in the EH selector that the landing pad block expects. Missing functionality: - Accessing data in the parent frame from outlined filters - Cleanups (from __finally) are unsupported, as they will require outlining and parent frame access - Filter clauses are unsupported, as there's no clear analogue in SEH In other words, this is the minimal set of changes needed to write IR to catch arbitrary exceptions and resume normal execution. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6300 llvm-svn: 225904	2015-01-14 01:05:27 +00:00
Ahmed Bougacha	284d7745d1	[SimplifyLibCalls] Don't try to simplify indirect calls. It turns out, all callsites of the simplifier are guarded by a check for CallInst::getCalledFunction (i.e., to make sure the callee is direct). This check wasn't done when trying to further optimize a simplified fortified libcall, introduced by a refactoring in r225640. Fix that, add a testcase, and document the requirement. llvm-svn: 225895	2015-01-14 00:55:05 +00:00
Adrian Prantl	523d44398e	Debug Info: Move the complex expression handling (=the remainder) of emitDebugLocValue() into DwarfExpression. Ought to be NFC, but it actually uncovered a bug in the debug-loc-asan.ll testcase. The testcase checks that the address of variable "y" is stored at [RSP+16], which also lines up with the comment. It also check(ed) that the value of "y" is stored in RDI before that, but that is actually incorrect, since RDI is the very value that is stored in [RSP+16]. Here's the assembler output: movb 2147450880(%rcx), %r8b #DEBUG_VALUE: bar:y <- RDI cmpb $0, %r8b movq %rax, 32(%rsp) # 8-byte Spill movq %rsi, 24(%rsp) # 8-byte Spill movq %rdi, 16(%rsp) # 8-byte Spill .Ltmp3: #DEBUG_VALUE: bar:y <- [RSP+16] Fixed the comment to spell out the correct register and the check to expect an address rather than a value. Note that the range that is emitted for the RDI location was and is still wrong, it claims to begin at the function prologue, but really it should start where RDI is first assigned. llvm-svn: 225851	2015-01-13 23:39:11 +00:00
Adam Nemet	0795f8999d	[AVX512] Add 16x32 unpck tests as well Forgot this from r225838. llvm-svn: 225850	2015-01-13 23:27:55 +00:00
Chandler Carruth	edb0b67c45	[PM] Remove the defunt CGSCC-specific debug flag. Even before I sunk the debug flag into the opt tool this had been made obsolete by factoring the pass and analysis managers into a single set of templates that all used the core flag. No functionality changed here. llvm-svn: 225842	2015-01-13 22:45:13 +00:00
Adam Nemet	847eb51105	Fix function names in tests from r225838. llvm-svn: 225840	2015-01-13 22:40:15 +00:00
Adam Nemet	1859ba1d79	[AVX512] Unpack support in new shuffle lowering This now handles both 32 and 64-bit element sizes. In this version, the test are in vector-shuffle-512-v8.ll, canonicalized by Chandler's update_llc_test_checks.py. Part of <rdar://problem/17688758> llvm-svn: 225838	2015-01-13 22:20:18 +00:00
Duncan P. N. Exon Smith	85eaac222d	AsmParser/Bitcode: Add support for MDLocation This adds assembly and bitcode support for `MDLocation`. The assembly side is rather big, since this is the first `MDNode` subclass (that isn't `MDTuple`). Part of PR21433. (If you're wondering where the mountains of testcase updates are, we don't need them until I update `DILocation` and `DebugLoc` to actually use this class.) llvm-svn: 225830	2015-01-13 21:10:44 +00:00
Matt Arsenault	4cc5aca737	R600: Implement getRsqrtEstimate Only do for f32 since I'm unclear on both what this is expecting for the refinement steps in terms of accuracy, and what f64 instruction actually provides. llvm-svn: 225827	2015-01-13 20:53:18 +00:00
Matt Arsenault	ef34769c29	R600: Make cttz / ctlz cheap to speculate Speculating things is generally good. SI+ has instructions for these for 32-bit values. This is still probably better even with the expansion for 64-bit values, although it is odd that this callback doesn't have the size as a parameter. llvm-svn: 225822	2015-01-13 19:46:48 +00:00
Ulrich Weigand	4155c036c2	Use the integrated assembler as default on SystemZ This was already done in clang, this commit now uses the integrated assembler as default when using LLVM tools directly. A number of test cases deliberately using an invalid instruction in inline asm now have to use -no-integrated-as. llvm-svn: 225820	2015-01-13 19:45:16 +00:00
Ulrich Weigand	3ec182626d	Use the integrated assembler as default on PowerPC This was already done in clang, this commit now uses the integrated assembler as default when using LLVM tools directly. A number of test cases using inline asm had to be adapted, either by updating the expected output, or by using -no-integrated-as (for such tests that deliberately use an invalid instruction in inline asm). llvm-svn: 225819	2015-01-13 19:43:45 +00:00
Hal Finkel	c6fdfe466f	Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support" Reverting this while I investiage buildbot failures (segfaulting in GetCostForDef at ScheduleDAGRRList.cpp:314). llvm-svn: 225811	2015-01-13 18:25:05 +00:00
Will Schmidt	b64212df3e	Update multiline.ll testcase to handle (ppc64le) .localentry directive The ppc64le platform will emit a .localentry directive. This is triggering a false-positive against a CHECK-NOT: .loc in multiline.ll. Add a space "{{ }}" to the check-not line to allow for arguments, and prevent .localentry from matching. Differential Revision: http://reviews.llvm.org/D6935 llvm-svn: 225810	2015-01-13 18:17:08 +00:00
Hal Finkel	ed17decbc6	[PowerPC] Add StackMap/PatchPoint support This commit does two things: 1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this common code for lowering some common intrinsics, stackmap/patchpoint among them). 2. Adds support for stackmap/patchpoint lowering. For the most part, this is very similar to the support in the AArch64 target, with the obvious differences (different registers, NOP instructions, etc.). The test cases are adapted from the AArch64 test cases. One difference of note is that the patchpoint call sequence takes 24 bytes, so you can't use less than that (on AArch64 you can go down to 16). Also, as noted in the docs, we take the patchpoint address to be the actual code address (assuming the call is local in the TOC-sharing sense), which should yield higher performance than generating the full cross-DSO indirect-call sequence and is likely just as useful for JITed code (if not, we'll change it). StackMaps and Patchpoints are still marked as experimental, and so this support is doubly experimental. So go ahead and experiment! llvm-svn: 225808	2015-01-13 17:48:12 +00:00
Jozef Kolek	b17d503df7	[mips][microMIPS] Fix issue with 16b instructions in jr instruction delay slot 16 bit instructions are not allowed in jr delay slot. Same stands for PseudoIndirectBranch and PseudoReturn. Differential Revision: http://reviews.llvm.org/D6815 llvm-svn: 225798	2015-01-13 15:59:17 +00:00
Chandler Carruth	d27c0143d5	[PM] Refactor the new pass manager to use a single template to implement the generic functionality of the pass managers themselves. In the new infrastructure, the pass "manager" isn't actually interesting at all. It just pipelines a single chunk of IR through N passes. We don't need to know anything about the IR or the passes to do this really and we can replace the 3 implementations of the exact same functionality with a single generic PassManager template, complementing the single generic AnalysisManager template. I've left typedefs in place to give convenient names to the various obvious instantiations of the template. With this, I think I've nuked almost all of the redundant logic in the managers, and I think the overall design is actually simpler for having single templates that clearly indicate there is no special logic here. The logging is made somewhat more annoying by this change, but I don't think the difference is worth having heavy-weight traits to help log things. llvm-svn: 225783	2015-01-13 11:13:56 +00:00
Chandler Carruth	62054fc3fc	[PM] Fold all three analysis managers into a single AnalysisManager template. This consolidates three copies of nearly the same core logic. It adds "complexity" to the ModuleAnalysisManager in that it makes it possible to share a ModuleAnalysisManager across multiple modules... But it does so by deleting all of the code, so I'm OK with that. This will naturally make fixing bugs in this code much simpler, etc. The only down side here is that we have to use 'typename' and 'this->' in various places, and the implementation is lifted into the header. I'll take that for the code size reduction. The convenient names are still typedef-ed and used throughout so that users can largely ignore this aspect of the implementation. The follow-up change to this will do the exact same refactoring for the PassManagers. =D It turns out that the interesting different code is almost entirely in the adaptors. At the end, that should be essentially all that is left. llvm-svn: 225757	2015-01-13 02:51:47 +00:00
Reid Kleckner	033ced7470	Rename llvm.recoverframeallocation to llvm.framerecover This name is less descriptive, but it sort of puts things in the 'llvm.frame...' namespace, relating it to frameallocate and frameaddress. It also avoids using "allocate" and "allocation" together. llvm-svn: 225752	2015-01-13 01:51:34 +00:00
Reid Kleckner	002e480f22	Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics These intrinsics allow multiple functions to share a single stack allocation from one function's call frame. The function with the allocation may only perform one allocation, and it must be in the entry block. Functions accessing the allocation call llvm.recoverframeallocation with the function whose frame they are accessing and a frame pointer from an active call frame of that function. These intrinsics are very difficult to inline correctly, so the intention is that they be introduced rarely, or at least very late during EH preparation. Reviewers: echristo, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D6493 llvm-svn: 225746	2015-01-13 00:48:10 +00:00
Matt Arsenault	bdd98aed5a	Combine fcmp + select to fminnum / fmaxnum if no nans and legal Also require unsafe FP math for no since there isn't a way to test for signed zeros. llvm-svn: 225744	2015-01-13 00:43:00 +00:00
Reid Kleckner	2b47fc9382	musttail: Only set the inreg flag for fastcall and vectorcall Otherwise we'll attempt to forward ECX, EDX, and EAX for cdecl and stdcall thunks, leaving us with no scratch registers for indirect call targets. Fixes PR22052. llvm-svn: 225729	2015-01-12 23:28:23 +00:00
Adrian Prantl	f326dd1acd	Debug info: Factor out the creation of DWARF expressions from AsmPrinter into a new class DwarfExpression that can be shared between AsmPrinter and DwarfUnit. This is the first step towards unifying the two entirely redundant implementations of dwarf expression emission in DwarfUnit and AsmPrinter. Almost no functional change — Testcases were updated because asm comments that used to be on two lines now appear on the same line, which is actually preferable. llvm-svn: 225706	2015-01-12 22:19:22 +00:00
Ahmed Bougacha	8c30b8debe	[X86] Also create+widen FMIN/FMAX nodes for v2f32. This happens in the HINT benchmark, where the SLP-vectorizer created v2f32 fcmp/select code. The "correct" solution would have been to teach the vectorizer cost model that v2f32 isn't legal (because really, it isn't), but if we can vectorize we might as well do so. We legalize these v2f32 FMIN/FMAX nodes by widening to v4f32 later on. v3f32 were already widened to v4f32 by the generic unroll-and-build-vector legalization. rdar://15763436 Differential Revision: http://reviews.llvm.org/D6557 llvm-svn: 225691	2015-01-12 20:31:30 +00:00
Ahmed Bougacha	369ae03621	[X86] Make SSE min/max testcases more explicit. NFC. llvm-svn: 225687	2015-01-12 20:15:47 +00:00
Tom Stellard	d556c00330	R600/SI: Use RegisterOperands to specify which operands can accept immediates There are some operands which can take either immediates or registers and we were previously using different register class to distinguish between operands that could take immediates and those that could not. This patch switches to using RegisterOperands which should simplify the backend by reducing the number of register classes and also make it easier to implement the assembler. llvm-svn: 225662	2015-01-12 19:33:18 +00:00
Sanjay Patel	ed6b379845	GVN: propagate equalities for floating point compares Allow optimizations based on FP comparison values in the same way as integers. This resolves PR17713: http://llvm.org/bugs/show_bug.cgi?id=17713 Differential Revision: http://reviews.llvm.org/D6911 llvm-svn: 225660	2015-01-12 19:29:48 +00:00
Rafael Espindola	b90e1b4a20	Add r224985 back with two fixes. One is that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding to put a L symbol in the symbol table or not. The other is that ld64 requires the relocations to cstring to use linker visible symbols on AArch64. Thanks to Michael Zolotukhin for testing this! Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 225644	2015-01-12 18:13:07 +00:00
Jozef Kolek	c8014187bb	[mips][microMIPS] Implement BEQZ16 and BNEZ16 instructions Differential Revision: http://reviews.llvm.org/D5271 llvm-svn: 225627	2015-01-12 12:03:34 +00:00
Richard Smith	57a4ba2da9	Put this test's input in the Inputs directory where it belongs, rather than reusing a file from a different test directory. llvm-svn: 225621	2015-01-12 08:50:47 +00:00
Hal Finkel	08f3b7b05f	[PowerPC] Fix calls to non-function objects Looking at r225438 inspired me to see how the PowerPC backend handled the situation (calling a bitcasted TLS global), and it turns out we also produced an error (cannot select ...). What it means to "call" something that is not a function is implementation and platform specific, but in the name of doing something (besides crashing), this makes sure we do what GCC does (treat all such calls as calls through a function pointer -- meaning that the pointer is assumed, as is the convention on PPC, to point to a function descriptor structure holding the actual code address along with the function's TOC pointer and environment pointer). As GCC does, we now do the same for calling regular (non-TLS) non-function globals too. I'm not sure whether this is the most useful way to define the behavior, but at least we won't be alone. llvm-svn: 225617	2015-01-12 04:34:47 +00:00
David Majnemer	3153df330a	Revert most of r225597 We can't rely on a DataLayout enlightened constant folder. llvm-svn: 225599	2015-01-11 07:29:51 +00:00
David Majnemer	030393ac1e	X86: Properly decode shuffle masks when the constant pool type is weird It's possible for the constant pool entry for the shuffle mask to come from a completely different operation. This occurs when Constants have the same bit pattern but have different types. Make DecodePSHUFBMask tolerant of types which, after a bitcast, are appropriately sized vector types. This fixes PR22188. llvm-svn: 225597	2015-01-11 05:08:57 +00:00
Saleem Abdulrasool	7d9a7afdb1	X86: teach X86TargetLowering about L,M,O constraints Teach the ISelLowering for X86 about the L,M,O target specific constraints. Although, for the moment, clang performs constraint validation and prevents passing along inline asm which may have immediate constant constraints violated, the backend should be able to cope with the invalid inline asm a bit better. llvm-svn: 225596	2015-01-11 04:39:24 +00:00
Saleem Abdulrasool	d18c3b767f	ARM: add support for segment base relocations (SBREL) This adds support for parsing and emitting the SBREL relocation variant for the ARM target. Handling this relocation variant is necessary for supporting the full ARM ELF specification. Addresses PR22128. llvm-svn: 225595	2015-01-11 04:39:18 +00:00
Chandler Carruth	2da477ebfb	[x86] Remove some windows line endings that snuck into the tests here. Folks on Windows, remember to set up your subversion to strip these when submitting... llvm-svn: 225593	2015-01-11 01:36:20 +00:00
Sanjoy Das	f93f60b30a	Fix PR22179. We were incorrectly inferring nsw for certain SCEVs. We can be more aggressive here (see Richard Smith's comment on http://llvm.org/bugs/show_bug.cgi?id=22179) but this change just focuses on correctness. Differential Revision: http://reviews.llvm.org/D6914 llvm-svn: 225591	2015-01-10 23:41:24 +00:00
Simon Pilgrim	0737d28010	[X86][SSE] Improved (v)insertps shuffle matching In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable. This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs need to be referenced. We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function. Differential Revision: http://reviews.llvm.org/D6879 llvm-svn: 225589	2015-01-10 19:45:33 +00:00
Hal Finkel	0a750c5cd8	[PowerPC] Mark zext of a small scalar load as free This initial implementation of PPCTargetLowering::isZExtFree marks as free zexts of small scalar loads (that are not sign-extending). This callback is used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to determine whether a zext or an anyext is used to lower illegally-typed PHIs. Because later truncates of zero-extended values are nops, this allows for the elimination of later unnecessary truncations. Fixes the initial complaint associated with PR22120. llvm-svn: 225584	2015-01-10 08:21:59 +00:00
Saleem Abdulrasool	56f7ee08f0	tests: fix previous commit The previous commit accidentally missed changes to the test output checking, resulting in an errant failure. llvm-svn: 225577	2015-01-10 02:53:25 +00:00
Saleem Abdulrasool	7b637a49c1	test: merge ARM relocations test There is a fair number of relocations that are part of the AAELF specification. Simply merge the tests into a single test file, otherwise, we will end up with far too many test files to test each relocation type. NFC. llvm-svn: 225576	2015-01-10 02:48:29 +00:00
Saleem Abdulrasool	93aa06cca8	tests: convert a couple of ARM relocation tests to readobj These tests are checking the relocation generation. Use the readobj output as it is much easier to follow when glancing over the tests. llvm-svn: 225575	2015-01-10 02:48:25 +00:00
Justin Hibbits	b4eb439b90	Fully fix Bug #22115 . Summary: In the previous commit, the register was saved, but space was not allocated. This resulted in the parameter save area potentially clobbering r30, leading to nasty results. Test Plan: Tests updated Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6906 llvm-svn: 225573	2015-01-10 01:57:21 +00:00
Hal Finkel	c563f95905	[PowerPC] Readjust the loop unrolling threshold Now that the way that the partial unrolling threshold for small loops is used to compute the unrolling factor as been corrected, a slightly smaller threshold is preferable. This is expected; other targets may need to re-tune as well. llvm-svn: 225566	2015-01-10 00:31:10 +00:00
Hal Finkel	977fb5e0c3	[LoopUnroll] Fix the partial unrolling threshold for small loop sizes When we compute the size of a loop, we include the branch on the backedge and the comparison feeding the conditional branch. Under normal circumstances, these don't get replicated with the rest of the loop body when we unroll. This led to the somewhat surprising behavior that really small loops would not get unrolled enough -- they could be unrolled more and the resulting loop would be below the threshold, because we were assuming they'd take (LoopSize * UnrollingFactor) instructions after unrolling, instead of (((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation. llvm-svn: 225565	2015-01-10 00:30:55 +00:00
Rafael Espindola	3ca4721c27	Use the DiagnosticHandler to print diagnostics when reading bitcode. The bitcode reading interface used std::error_code to report an error to the callers and it is the callers job to print diagnostics. This is not ideal for error handling or diagnostic reporting: * For error handling, all that the callers care about is 3 possibilities: * It worked * The bitcode file is corrupted/invalid. * The file is not bitcode at all. * For diagnostic, it is user friendly to include far more information about the invalid case so the user can find out what is wrong with the bitcode file. This comes up, for example, when a developer introduces a bug while extending the format. The compromise we had was to have a lot of error codes. With this patch we use the DiagnosticHandler to communicate with the human and std::error_code to communicate with the caller. This allows us to have far fewer error codes and adds the infrastructure to print better diagnostics. This is so because the diagnostics are printed when he issue is found. The code that detected the problem in alive in the stack and can pass down as much context as needed. As an example the patch updates test/Bitcode/invalid.ll. Using a DiagnosticHandler also moves the fatal/non-fatal error decision to the caller. A simple one like llvm-dis can just use fatal errors. The gold plugin needs a bit more complex treatment because of being passed non-bitcode files. An hypothetical interactive tool would make all bitcode errors non-fatal. llvm-svn: 225562	2015-01-10 00:07:30 +00:00
Alexey Samsonov	8ccabc2dd8	Disable Go bindings test under UBSan. llvm-svn: 225557	2015-01-09 23:17:23 +00:00
Andrew Kaylor	8d6df13172	Fix the JIT event listeners and replace the associated tests. The changes to EventListenerCommon.h were contributed by Arch Robison. This fixes bug 22095. http://reviews.llvm.org/D6905 llvm-svn: 225554	2015-01-09 22:53:24 +00:00
Hans Wennborg	5ee44a1865	SimplifyCFG: check uses of constant-foldable instrs in switch destinations (PR20210) The previous code assumed that such instructions could not have any uses outside CaseDest, with the motivation that the instruction could not dominate CommonDest because CommonDest has phi nodes in it. That simply isn't true; e.g., CommonDest could have an edge back to itself. llvm-svn: 225552	2015-01-09 22:13:31 +00:00
Simon Pilgrim	02b7bb4769	[X86][SSE] Avoid vector byte shuffles with zero by using pshufb to create zeros pshufb can shuffle in zero bytes as well as bytes from a source vector - we can use this to avoid having to shuffle 2 vectors and ORing the result when the used inputs from a vector are all zeroable. Differential Revision: http://reviews.llvm.org/D6878 llvm-svn: 225551	2015-01-09 22:03:19 +00:00
Rafael Espindola	793f8a4d23	Add a testcase of llvm-lto error handling. llvm-svn: 225545	2015-01-09 20:55:09 +00:00
Kevin Enderby	901dddff2f	Add the option, -universal-headers, used with -macho to print the Mach-O universal headers to llvm-objdump. llvm-svn: 225537	2015-01-09 19:22:37 +00:00
Tim Northover	ef7d18507b	Re-reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" It's not really expected to stick around, last time it provoked a weird LTO build failure that I can't reproduce now, and the bot logs are long gone. I'll re-revert it if the failures recur. Original description: Perform Scalar PRE on gep indices that feed loads before doing Load PRE. llvm-svn: 225536	2015-01-09 19:19:56 +00:00
Daniel Sanders	8507fc3677	[mips] Add support for accessing $gp as a named register. Summary: Mips Linux uses $gp to hold a pointer to thread info structure and accesses it with a named register. This makes this work for LLVM. The N32 ABI doesn't quite work yet since the frontend generates incorrect IR for this case. It neglects to truncate the 64-bit GPR to a 32-bit value before converting to a pointer. Given correct IR (as in the testcase in this patch), it works correctly. Reviewers: sstankovic, vmedic, atanasyan Reviewed By: atanasyan Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6893 llvm-svn: 225529	2015-01-09 17:21:30 +00:00
Hal Finkel	556331a037	[PowerPC] Enable late partial unrolling on the POWER7 The P7 benefits from not have really-small loops so that we either have multiple dispatch groups in the loop and/or the ability to form more-full dispatch groups during scheduling. Setting the partial unrolling threshold to 44 seems good, empirically, for the P7. Compared to using no late partial unrolling, this yields the following test-suite speedups: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -66.3253% +/- 24.1975% SingleSource/Benchmarks/Misc-C++/oopack_v1p8 -44.0169% +/- 29.4881% SingleSource/Benchmarks/Misc/pi -27.8351% +/- 12.2712% SingleSource/Benchmarks/Stanford/Bubblesort -30.9898% +/- 22.4647% I've speculatively added a similar setting for the P8. Also, I've noticed that the unroller does not quite calculate the unrolling factor correctly for really tiny loops because it neglects to account for the fact that not every loop body replicant contains an ending branch and counter increment. I'll fix that later. llvm-svn: 225522	2015-01-09 15:51:16 +00:00
Saleem Abdulrasool	4e030253cf	ARM: add support for R_ARM_ABS16 Add support for R_ARM_ABS16 relocation mapping. Addresses PR22156. llvm-svn: 225510	2015-01-09 06:57:24 +00:00
Saleem Abdulrasool	a49e5cd052	test: add additional test for SVN r225507 Add an additional test case to ensure that we generate the relocation even if the thumb target is used. llvm-svn: 225509	2015-01-09 06:57:18 +00:00
Saleem Abdulrasool	91d3faf645	ARM: add support for R_ARM_ABS8 relocations Add support for R_ARM_ABS8 relocation. Addresses PR22126. llvm-svn: 225507	2015-01-09 05:59:12 +00:00
Matthias Braun	e2f9677f81	RegisterCoalescer: Fix removeCopyByCommutingDef with subreg liveness The code that eliminated additional coalescable copies in removeCopyByCommutingDef() used MergeValueNumberInto() which internally may merge A into B or B into A. In this case A and B had different Def points, so we have to reset ValNo.Def to the intended one after merging. llvm-svn: 225503	2015-01-09 03:01:31 +00:00
Hal Finkel	cfa765d60f	[PowerPC] Fold [sz]ext with fp_to_int lowering where possible On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32 to i64 into the load necessary for an i64 -> fp conversion. llvm-svn: 225493	2015-01-09 01:34:30 +00:00
Duncan P. N. Exon Smith	e06d77c9ae	Utils: Keep distinct MDNodes distinct in MapMetadata() Create new copies of distinct `MDNode`s instead of following the uniquing `MDNode` logic. Just like self-references (or other cycles), `MapMetadata()` creates a new node. In practice most calls use `RF_NoModuleLevelChanges`, in which case nothing is duplicated anyway. Part of PR22111. llvm-svn: 225476	2015-01-08 22:42:30 +00:00
Duncan P. N. Exon Smith	bc9ee9160a	IR: Add 'distinct' MDNodes to bitcode and assembly Propagate whether `MDNode`s are 'distinct' through the other types of IR (assembly and bitcode). This adds the `distinct` keyword to assembly. Currently, no one actually calls `MDNode::getDistinct()`, so these nodes only get created for: - self-references, which are never uniqued, and - nodes whose operands are replaced that hit a uniquing collision. The concept of distinct nodes is still not quite first-class, since distinct-ness doesn't yet survive across `MapMetadata()`. Part of PR22111. llvm-svn: 225474	2015-01-08 22:38:29 +00:00
Hal Finkel	cf6cbfbe30	[PowerPC] Mark all instructions as non-cheap for MachineLICM MachineLICM uses a callback named hasLowDefLatency to determine if an instruction def operand has a 'low' latency. If all relevant operands have a 'low' latency, the instruction is considered too cheap to hoist out of loops even in low-register-pressure situations. On PowerPC cores, both the embedded cores and the others, there is no reason to believe that this is a good choice: all instructions have a cost inside a loop, and hoisting them when not limited by register pressure is a reasonable default. llvm-svn: 225471	2015-01-08 22:11:49 +00:00
Akira Hatanaka	71e44f03e9	[ARM] Fix a bug in constant island pass that was triggering an assertion. The assert was being triggered when the distance between a constant pool entry and its user exceeded the maximally allowed distance after thumb2 branch shortening. A padding was inserted after a thumb2 branch instruction was shrunk, which caused the user to be out of range. This is wrong as the padding should have been inserted by the layout algorithm so that the distance between two instructions doesn't grow later during thumb2 instruction optimization. This commit fixes the code in ARMConstantIslands::createNewWater to call computeBlockSize and set BasicBlock::Unalign when a branch instruction is inserted to create new water after a basic block. A non-zero Unalign causes the worst-case padding to be inserted when adjustBBOffsetsAfter is called to recompute the basic block offsets. rdar://problem/19130476 llvm-svn: 225467	2015-01-08 20:44:50 +00:00
Matt Arsenault	4b66850c40	Fix fcmp + fabs instcombines when using the intrinsic This was only handling the libcall. This is another example of why only the intrinsic should ever be used when it exists. llvm-svn: 225465	2015-01-08 20:09:34 +00:00
Lang Hames	58e00a9ed2	[MCJIT] Remove a few redundant MCJIT tests, and drop the extraneous datalayout strings from the copies that remain. llvm-svn: 225460	2015-01-08 18:52:15 +00:00
Rafael Espindola	198d2d6974	Make this test a bit stricter. It now checks for the end of the line or the opening '{'. While at it, remove empty comments. llvm-svn: 225451	2015-01-08 16:11:18 +00:00
Justin Hibbits	68fee020d6	Add saving and restoring of r30 to the prologue and epilogue, respectively Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This does that. Test Plan: Tests updated. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6876 llvm-svn: 225450	2015-01-08 15:47:19 +00:00
Kristof Beyls	53a9880dcd	Fix large stack alignment codegen for ARM and Thumb2 targets This partially fixes PR13007 (ARM CodeGen fails with large stack alignment): for ARM and Thumb2 targets, but not for Thumb1, as it seems stack alignment for Thumb1 targets hasn't been supported at all. Producing an aligned stack pointer is done by zero-ing out the lower bits of the stack pointer. The BIC instruction was used for this. However, the immediate field of the BIC instruction only allows to encode an immediate that can zero out up to a maximum of the 8 lower bits. When a larger alignment is requested, a BIC instruction cannot be used; llvm was silently producing incorrect code in this case. This commit fixes code generation for large stack aligments by using the BFC instruction instead, when the BFC instruction is available. When not, it uses 2 instructions: a right shift, followed by a left shift to zero out the lower bits. The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code that unconditionally uses BIC to realign the stack pointer, so it very likely has the same problem. However, I wasn't able to produce a test case for that. This commit adds an assert so that the compiler will fail the assert instead of silently generating wrong code if this is ever reached. llvm-svn: 225446	2015-01-08 15:09:14 +00:00
Tom Stellard	d8d9d6ab95	R600/SI: Remove SIISelLowering::legalizeOperands() Its functionality has been replaced by calling SIInstrInfo::legalizeOperands() from SIISelLowering::AdjstInstrPostInstrSelection() and running the SIFoldOperands and SIShrinkInstructions passes. llvm-svn: 225445	2015-01-08 15:08:17 +00:00
Elena Demikhovsky	9ab7f6e415	Masked Load/Store - fixed a bug in type legalization. llvm-svn: 225441	2015-01-08 12:29:19 +00:00
Michael Kuperstein	f9bd4536d2	Fix a think-o in the test for r225438. llvm-svn: 225440	2015-01-08 12:05:02 +00:00
Michael Kuperstein	f890489bdb	[X86] Don't try to generate direct calls to TLS globals The call lowering assumes that if the callee is a global, we want to emit a direct call. This is correct for regular globals, but not for TLS ones. Differential Revision: http://reviews.llvm.org/D6862 llvm-svn: 225438	2015-01-08 11:50:58 +00:00
Craig Topper	057e795d39	Fix test case I missed in r225432. llvm-svn: 225434	2015-01-08 07:57:27 +00:00
Craig Topper	ac00edab84	[X86] Don't print 'dword ptr' or 'qword ptr' on the operand to some of the LEA variants in Intel syntax. The memory operand is inherently unsized. llvm-svn: 225432	2015-01-08 07:41:30 +00:00
Adrian Prantl	d3017f3565	Revert "Reapply: Teach SROA how to update debug info for fragmented variables." This reverts commit r225379 while investigating an assertion failure reported by Alexey. llvm-svn: 225424	2015-01-08 02:02:00 +00:00
Quentin Colombet	41c4d5ee6c	[RegAllocGreedy] Introduce a late pass to repair broken hints. A broken hint is a copy where both ends are assigned different colors. When a variable gets evicted in the neighborhood of such copies, it is likely we can reconcile some of them. Context Copies are inserted during the register allocation via splitting. These split points are required to relax the constraints on the allocation problem. When such a point is inserted, both ends of the copy would not share the same color with respect to the current allocation problem. When variables get evicted, the allocation problem becomes different and some split point may not be required anymore. However, the related variables may already have been colored. This usually shows up in the assembly with pattern like this: def A ... save A to B def A use A restore A from B ... use B Whereas we could simply have done: def B ... def A use A ... use B Proposed Solution A variable having a broken hint is marked for late recoloring if and only if selecting a register for it evict another variable. Indeed, if no eviction happens this is pointless to look for recoloring opportunities as it means the situation was the same as the initial allocation problem where we had to break the hint. Finally, when everything has been allocated, we look for recoloring opportunities for all the identified candidates. The recoloring is performed very late to rely on accurate copy cost (all involved variables are allocated). The recoloring is simple unlike the last change recoloring. It propagates the color of the broken hint to all its copy-related variables. If the color is available for them, the recoloring uses it, otherwise it gives up on that hint even if a more complex coloring would have worked. The recoloring happens only if it is profitable. The profitability is evaluated using the expected frequency of the copies of the currently recolored variable with a) its current color and b) with the target color. If a) is greater or equal than b), then it is profitable and the recoloring happen. Example Consider the following example: BB1: a = b = BB2: ... = b = a Let us assume b gets split: BB1: a = b = BB2: c = b ... d = c = d = a Because of how the allocation work, b, c, and d may be assigned different colors. Now, if a gets evicted to make room for c, assuming b and d were assigned to something different than a. We end up with: BB1: a = st a, SpillSlot b = BB2: c = b ... d = c = d e = ld SpillSlot = e This is likely that we can assign the same register for b, c, and d, getting rid of 2 copies. Performances Both ARM64 and x86_64 show performance improvements of up to 3% for the llvm-testsuite + externals with Os and O3. There are a few regressions too that comes from the (in)accuracy of the block frequency estimate. <rdar://problem/18312047> llvm-svn: 225422	2015-01-08 01:16:39 +00:00
Matthias Braun	22b8fb521f	RegisterCoalescer: Fix valuesIdentical() in some subrange merge cases. I got confused and assumed SrcIdx/DstIdx of the CoalescerPair is a subregister index in SrcReg/DstReg, but they are actually subregister indices of the coalesced register that get you back to SrcReg/DstReg when applied. Fixed the bug, improved comments and simplified code accordingly. Testcase by Tom Stellard! llvm-svn: 225415	2015-01-07 23:58:38 +00:00
Philip Reames	2ad347b060	[GC] improve testing around gc.relocate and fix a test Patch by: Ramkumar Ramachandra <artagnon@gmail.com> "This patch started out as an exploration of gc.relocate, and an attempt to write a simple test in call-lowering. I then noticed that the arguments of gc.relocate were not checked fully, so I went in and fixed a few things. Finally, the most important outcome of this patch is that my new error handling code caught a bug in a callsite in stackmap-format." Differential Revision: http://reviews.llvm.org/D6824 llvm-svn: 225412	2015-01-07 22:48:01 +00:00
Tom Stellard	1e2b9bc41b	R600/SI: Commute instructions to enable more folding opportunities llvm-svn: 225410	2015-01-07 22:44:19 +00:00
Tom Stellard	8d2d692413	R600/SI: Only fold immediates that have one use Folding the same immediate into multiple instruction will increase program size, which can hurt performance. llvm-svn: 225405	2015-01-07 22:18:27 +00:00
Duncan P. N. Exon Smith	0d2cbb7da2	Linker: Don't use MDNode::replaceOperandWith() `MDNode::replaceOperandWith()` changes all instances of metadata. Stop using it when linking module flags, since (due to uniquing) the flag values could be used by other metadata. Instead, use new API `NamedMDNode::setOperand()` to update the reference directly. llvm-svn: 225397	2015-01-07 21:32:27 +00:00
Alexey Samsonov	823c2375cf	XFAIL several MCJIT EH tests under ASan and MSan bootstrap. llvm-svn: 225393	2015-01-07 21:27:26 +00:00
Rafael Espindola	537f09c5f7	Add a test that would have found the issue in r224935. llvm-svn: 225385	2015-01-07 21:10:25 +00:00
Kevin Enderby	ab9fa91a6e	Slightly refactor things for llvm-objdump and the -macho option so it can be used with options other than just -disassemble so that universal files can be used with other options combined with -arch options. No functional change to existing options and use. One test case added for the additional functionality with a universal file an a -arch option. llvm-svn: 225383	2015-01-07 21:02:18 +00:00
Olivier Sallenave	82830c0d30	More FMA folding opportunities. llvm-svn: 225380	2015-01-07 20:54:17 +00:00
Adrian Prantl	f59b4b4d08	Reapply: Teach SROA how to update debug info for fragmented variables. The two buildbot failures were addressed in LLVM r225378 and CFE r225359. This rapplies commit 225272 without modifications. llvm-svn: 225379	2015-01-07 20:52:22 +00:00
Adrian Prantl	fec7723d82	Debug info: Allow aggregate types to be described by constants. llvm-svn: 225378	2015-01-07 20:48:58 +00:00
Colin LeMahieu	113d011e38	[Hexagon] Adding floating point classification and creation. llvm-svn: 225374	2015-01-07 20:28:57 +00:00
Tom Stellard	0014f9fa76	R600/SI: Add a V_MOV_B64 pseudo instruction This is used to simplify the SIFoldOperands pass and make it easier to fold immediates. llvm-svn: 225373	2015-01-07 20:27:25 +00:00
Colin LeMahieu	4e075604d7	[Hexagon] Adding encodings for v5 floating point instructions. llvm-svn: 225372	2015-01-07 20:24:09 +00:00
Colin LeMahieu	bb38f7d496	[Hexagon] Adding encoding for popcount, fastcorner, dword asr with rounding. llvm-svn: 225371	2015-01-07 20:07:28 +00:00
Tom Stellard	cf5b7b89ba	R600/SI: Teach SIFoldOperands to split 64-bit constants when folding This allows folding of sequences like: s[0:1] = s_mov_b64 4 v_add_i32 v0, s0, v0 v_addc_u32 v1, s1, v1 into v_add_i32 v0, 4, v0 v_add_i32 v1, 0, v1 llvm-svn: 225369	2015-01-07 19:56:17 +00:00
Philip Reames	813212cde9	Introduce an example statepoint GC strategy This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generate correct stackmaps that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.) Most of the validation logic added here as proof of concept will soon move in to the Verifier. That move is dependent on http://reviews.llvm.org/D6811 There was discussion in the review thread about addrspace(1) being reserved for something. I'm going to follow up on a seperate llvmdev thread. If needed, I'll update all the code at once. Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others. Differential Revision: http://reviews.llvm.org/D6808 llvm-svn: 225365	2015-01-07 19:07:50 +00:00
David Majnemer	84c4bd1a4c	X86: Allow the stack probe size to be configurable per function LLVM emits stack probes on Windows targets to ensure that the stack is correctly accessed. However, the amount of stack allocated before emitting such a probe is hardcoded to 4096. It is desirable to have this be configurable so that a function might opt-out of stack probes. Our level of granularity is at the function level instead of, say, the module level to permit proper generation of code after LTO. Patch by Andrew H! N.B. The inliner needs to be updated to properly consider what happens after inlining a function with a specific stack-probe-size into another function with a different stack-probe-size. llvm-svn: 225360	2015-01-07 18:14:07 +00:00
Ahmed Bougacha	cb2ee2e772	[X86] Teach FCOPYSIGN lowering to recognize constant magnitudes. For code like: float foo(float x) { return copysign(1.0, x); } We used to generate: andps <-0.000000e+00,0,0,0>, %xmm0 movss <1.000000e+00>, %xmm1 andps <nan>, %xmm1 orps %xmm0, %xmm1 Basically doing an abs(1.0f) in the two middle instructions. We now generate: andps <-0.000000e+00,0,0,0>, %xmm0 orps <1.000000e+00,0,0,0>, %xmm0 Builds on cleanups r223415, r223542. rdar://19049548 Differential Revision: http://reviews.llvm.org/D6555 llvm-svn: 225357	2015-01-07 17:33:03 +00:00
Charlie Turner	576ceb22ba	[ARM] Add missing Tag_DIV_use tests. llvm-svn: 225348	2015-01-07 11:37:40 +00:00
Chandler Carruth	8b09c4f3f8	[PM] Give slightly less horrible names to the utility pass templates for requiring and invalidating specific analyses. Also make their printed names match their class names. Writing these out as prose really doesn't make sense to me any more. llvm-svn: 225346	2015-01-07 11:14:51 +00:00
Karthik Bhat	d463305416	Revert r225165 and r225169 Even thouh gcc produces simialr instructions as Owen pointed out the two patterns aren’t equivalent in the case where the original subtraction could have caused an overflow. Reverting the same. llvm-svn: 225341	2015-01-07 06:34:34 +00:00
Chandler Carruth	2ca6c65ad5	[PM] Fix a pretty nasty bug where the new pass manager would invalidate passes too many time. I think this is actually the issue that someone raised with me at the developer's meeting and in an email, but that we never really got to the bottom of. Having all the testing utilities made it much easier to dig down and uncover the core issue. When a pass manager is running many passes over a single function, we need it to invalidate the analyses between each run so that they can be re-computed as needed. We also need to track the intersection of preserved higher-level analyses across all the passes that we run (for example, if there is one module analysis which all the function analyses preserve, we want to track that and propagate it). Unfortunately, this interacted poorly with any enclosing pass adaptor between two IR units. It would see the intersection of preserved analyses, and need to invalidate any other analyses, but some of the un-preserved analyses might have already been invalidated and recomputed! We would fail to propagate the fact that the analysis had already been invalidated. The solution to this struck me as really strange at first, but the more I thought about it, the more natural it seemed. After a nice discussion with Duncan about it on IRC, it seemed even nicer. The idea is that invalidating an analysis causes it to be preserved! Preserving the lack of result is trivial. If it is recomputed, great. Until something else invalidates it again, we're good. The consequence of this is that the invalidate methods on the analysis manager which operate over many passes now consume their PreservedAnalyses object, update it to "preserve" every analysis pass to which it delivers an invalidation (regardless of whether the pass chooses to be removed, or handles the invalidation itself by updating itself). Then we return this augmented set from the invalidate routine, letting the pass manager take the result and use the intersection of that across each pass run to compute the final preserved set. This accounts for all the places where the early invalidation of an analysis has already "preserved" it for a future run. I've beefed up the testing and adjusted the assertions to show that we no longer repeatedly invalidate or compute the analyses across nested pass managers. llvm-svn: 225333	2015-01-07 01:58:35 +00:00
Matt Arsenault	53120c2e9a	R600/SI: Add combine for isinfinite pattern llvm-svn: 225310	2015-01-06 23:00:46 +00:00
Matt Arsenault	b663657a06	R600/SI: Pattern match isinf to v_cmp_class instructions llvm-svn: 225307	2015-01-06 23:00:41 +00:00
Matt Arsenault	208e0172ef	R600/SI: Add basic DAG combines for fp_class llvm-svn: 225306	2015-01-06 23:00:39 +00:00
Matt Arsenault	08086327f3	R600/SI: Add class intrinsic llvm-svn: 225305	2015-01-06 23:00:37 +00:00
Matt Arsenault	0d86cea633	Fix using wrong intrinsic in test This is a leftover from renaming the intrinsic. It's surprising the unknown llvm. intrinsic wasn't rejected. llvm-svn: 225304	2015-01-06 23:00:33 +00:00
Rafael Espindola	20dc6c7571	Change the .ll syntax for comdats and add a syntactic sugar. In order to make comdats always explicit in the IR, we decided to make the syntax a bit more compact for the case of a GlobalObject in a comdat with the same name. Just dropping the $name causes problems for @foo = globabl i32 0, comdat $bar = comdat ... and declare void @foo() comdat $bar = comdat ... So the syntax is changed to @g1 = globabl i32 0, comdat($c1) @g2 = globabl i32 0, comdat and declare void @foo() comdat($c1) declare void @foo() comdat llvm-svn: 225302	2015-01-06 22:55:16 +00:00
Hal Finkel	930e5f41df	[PowerPC] Reuse a load operand in int->fp conversions int->fp conversions on PPC must be done through memory loads and stores. On a modern core, this process begins by storing the int value to memory, then loading it using a (sometimes special) FP load instruction. Unfortunately, we would do this even when the value to be converted was itself a load, and we can just use that same memory location instead of copying it to another first. There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs, because the fp_to_int operand has not been lowered when the int_to_fp is being lowered. We handle this specially by invoking fp_to_int's lowering logic (partially) and getting the necessary memory location (some trivial refactoring was done to make this possible). This is all somewhat ugly, and it would be nice if some later CodeGen stage could just clean this stuff up, but because doing so would involve modifying target-specific nodes (or instructions), it is not immediately clear how that would work. Also, remove a related entry from the README.txt for which we now generate reasonable code. llvm-svn: 225301	2015-01-06 22:31:02 +00:00
Colin LeMahieu	5dbfa1b1a1	[Hexagon] Adding compound jump encodings. llvm-svn: 225291	2015-01-06 20:03:31 +00:00
Tom Stellard	372a94c88c	R600/SI: Insert s_waitcnt before s_barrier instructions. This ensures that all memory operations are complete when all threads reach the barrier. llvm-svn: 225290	2015-01-06 19:52:07 +00:00

... 4 5 6 7 8 ...

28354 Commits