llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 12:02:58 +02:00

Author	SHA1	Message	Date
James Molloy	255241a790	[ARM] Do not use vtrn for vectorshuffle if the order is reversed The tests in isVTRNMask and isVTRN_v_undef_Mask should also check that the elements of the upper and lower half of the vectorshuffle occur in the correct order when both halves are used. Without this test the code assumes that it is correct to use vector transpose (vtrn) for the masks <1, 1, 0, 0> and <1, 3, 0, 2>, among others, but the transpose actually incorrectly generates shuffles for <0, 0, 1, 1> and <0, 2, 1, 3> in this case. Patch by Jeroen Ketema! llvm-svn: 247254	2015-09-10 08:42:28 +00:00
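The ordering check described in the VTRN commit can be sketched as follows. `isTransposeMask` and `isTransposeUndefMask` are hypothetical standalone helpers, not LLVM's `isVTRNMask`/`isVTRN_v_undef_Mask`. They assume that mask indices `0..N-1` name lanes of the first operand, `N..2N-1` name lanes of the second, and a negative entry means undef. When the mask is twice the vector length, the lower half must match transpose result 0 and the upper half result 1, in that order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// VTRN on a = [a0 a1 a2 a3], b = [b0 b1 b2 b3] yields result 0 = <0,4,2,6>
// and result 1 = <1,5,3,7>. A mask of length 2N asks for both results.
static bool matchesTranspose(const std::vector<int> &M, unsigned NumElts,
                             bool SameOperand) {
  assert(NumElts % 2 == 0 && "VTRN needs an even lane count");
  if (M.size() != NumElts && M.size() != 2 * NumElts)
    return false;
  for (std::size_t Base = 0; Base < M.size(); Base += NumElts) {
    // With both halves present, pin WhichResult to the half: this is the
    // check that rejects reversed masks such as <1,3,0,2>. Otherwise infer
    // it from the first lane (an undef first lane is treated as result 1).
    unsigned Which = M.size() == 2 * NumElts
                         ? static_cast<unsigned>(Base / NumElts)
                         : (M[0] == 0 ? 0u : 1u);
    // If both operands are the same vector, odd lanes also read operand 0.
    unsigned OddOffset = SameOperand ? 0 : NumElts;
    for (unsigned J = 0; J < NumElts; J += 2) {
      int Even = M[Base + J], Odd = M[Base + J + 1];
      if (Even >= 0 && static_cast<unsigned>(Even) != J + Which)
        return false;
      if (Odd >= 0 && static_cast<unsigned>(Odd) != J + OddOffset + Which)
        return false;
    }
  }
  return true;
}

bool isTransposeMask(const std::vector<int> &M, unsigned NumElts) {
  return matchesTranspose(M, NumElts, /*SameOperand=*/false);
}

bool isTransposeUndefMask(const std::vector<int> &M, unsigned NumElts) {
  return matchesTranspose(M, NumElts, /*SameOperand=*/true);
}
```

With two-lane inputs, `<0, 2, 1, 3>` is accepted while the reversed `<1, 3, 0, 2>` is rejected, and likewise `<0, 0, 1, 1>` versus `<1, 1, 0, 0>` for the single-operand form.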
Chandler Carruth	277f8d2d80	[ADT] Apply a large hammer to StringRef functions: attribute always_inline. The logic of this follows something Howard does in libc++ and something I discussed with Chris eons ago -- for a lot of functions, there is really no benefit to preserving "debug information" by leaving them out-of-line even in debug builds. This is especially true as we now do a very good job of preserving most debug information even in the face of inlining. There are a bunch of methods in StringRef that we are paying a completely unacceptable amount for with every debug build of every LLVM developer. Some day, we should fix Clang/LLVM so that developers can reasonably use a default of something other than '-O0' and not waste their lives waiting on completely unoptimized code to execute. We should have a default that doesn't impede debugging while providing at least plausible performance. But today is not that day. So today, I'm applying always_inline to the functions that are really hurting the critical path for stuff like 'check-llvm'. I'm being very cautious here, but there are a few other APIs that we really should do this for as a matter of pragmatism. Hopefully we can rip this out some day. With this change, TripleTest.Normalization runtime decreases by over 10%, and the total 'check-llvm' time on my 48-core box goes from 38s to just under 37s. llvm-svn: 247253	2015-09-10 08:29:35 +00:00
Chandler Carruth	57fe727884	[Support] Fix the always_inline attribute macro to not include the 'inline' specifier. That specifier may or may not be valid for a given function, or it may be required for correct linkage even when the compiler doesn't support the always_inline attribute. llvm-svn: 247252	2015-09-10 08:29:30 +00:00
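A minimal sketch of the macro convention from the commit above: the macro expands only to the attribute, and each declaration supplies `inline` itself when its linkage needs it. `SKETCH_ALWAYS_INLINE` and `TinyRef` are hypothetical names, and the compiler-detection branches are an assumption about typical GCC/Clang/MSVC spellings, not a copy of LLVM's `Compiler.h`.

```cpp
#include <cassert>
#include <cstddef>

// Expands to the attribute only, never to `inline`.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_ALWAYS_INLINE __forceinline
#else
#define SKETCH_ALWAYS_INLINE
#endif

struct TinyRef {
  const char *Data;
  std::size_t Length;

  TinyRef(const char *D, std::size_t L) : Data(D), Length(L) {
    assert((D != nullptr || L == 0) && "null data with a nonzero length");
  }

  // Defined in the class body, so already implicitly inline.
  SKETCH_ALWAYS_INLINE std::size_t size() const { return Length; }
};

// A namespace-scope function defined in a header still needs an explicit
// `inline` for correct linkage, even where the macro expands to nothing.
inline SKETCH_ALWAYS_INLINE bool isEmpty(const TinyRef &R) {
  return R.size() == 0;
}
```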
Chandler Carruth	0d407eafe0	[ADT] Micro-optimize the Triple constructor by doing a single split and re-using the resulting components rather than repeatedly splitting and re-splitting to compute each component as part of the initializer list. This is more work on PR23676. Sadly, it doesn't help much. It removes the constructor from my profile, but doesn't make a sufficient dent in the total time. But it should play together nicely with subsequent changes. llvm-svn: 247250	2015-09-10 07:51:43 +00:00
Chandler Carruth	e9c3d2d316	[ADT] Fix a confusing interface spec and some annoying peculiarities with the StringRef::split method when used with a MaxSplit argument other than '-1' (which nobody really does today, but which should actually work). The spec claimed both to split up to MaxSplit times, but also to append <= MaxSplit strings to the vector. One of these doesn't make sense. Given the name "MaxSplit", let's go with it being a max over how many splits occur, which means the max on how many strings get appended is MaxSplit+1. I'm not actually sure the implementation correctly provided this logic either, as it used a really opaque loop structure. The implementation was also playing weird games with nullptr in the data field to try to rely on a totally opaque hidden property of the split method that returns a pair. Nasty IMO. Replace all of this with what is (IMO) simpler code that doesn't use the pair returning split method, and instead just finds each separator and appends directly. I think this is a lot easier to read, and it most definitely matches the spec. Added some tests that exercise the corner cases around StringRef() and StringRef("") that all now pass. I'll start using this in code in the next commit. llvm-svn: 247249	2015-09-10 07:51:37 +00:00
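The split semantics the commit settles on (at most `MaxSplit` splits, so at most `MaxSplit + 1` pieces, found by searching for each separator and appending directly) can be sketched against `std::string_view`. `splitInto` is a hypothetical name, and the `KeepEmpty` parameter is modeled on the option `StringRef::split` offers. Treat this as an illustration of the spec, not LLVM's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// MaxSplit < 0 means "no limit" (the count never reaches zero in practice).
void splitInto(std::vector<std::string_view> &Out, std::string_view S,
               std::string_view Sep, int MaxSplit = -1, bool KeepEmpty = true) {
  assert(!Sep.empty() && "an empty separator would never advance");
  while (MaxSplit-- != 0) {
    std::size_t Idx = S.find(Sep);
    if (Idx == std::string_view::npos)
      break;
    if (KeepEmpty || Idx > 0)
      Out.push_back(S.substr(0, Idx)); // the piece before the separator
    S.remove_prefix(Idx + Sep.size()); // jump past the separator
  }
  // The tail is the last piece; an empty input yields one empty piece
  // unless empty pieces are being dropped.
  if (KeepEmpty || !S.empty())
    Out.push_back(S);
}
```

For example, splitting `"a,b,c"` on `","` with `MaxSplit = 1` performs one split and yields `"a"` and `"b,c"`.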
NAKAMURA Takumi	c05f204b7a	GlobalsAAResult(&&): Move all members. Otherwise, one of the MSVC builders failed with unexpected behavior. llvm-svn: 247247	2015-09-10 07:16:42 +00:00
Elena Demikhovsky	49ab60d17b	Added isUndef() interface for SDNode Differential Revision: http://reviews.llvm.org/D12720 llvm-svn: 247246	2015-09-10 06:33:13 +00:00
Chandler Carruth	c484a28f0a	[ADT] Switch a bunch of places in LLVM that were doing single-character splits to actually use the single character split routine which does less work, and in a debug build is substantially faster. llvm-svn: 247245	2015-09-10 06:12:31 +00:00
Chandler Carruth	5e0afb35b0	[ADT] Add a single-character version of the small vector split routine on StringRef. Finding and splitting on a single character is substantially faster than doing it on even a single character StringRef -- we immediately get to a very tuned memchr call this way. Even nicer, we get to this even in a debug build, shaving 18% off the runtime of TripleTest.Normalization, helping PR23676 some more. llvm-svn: 247244	2015-09-10 06:07:03 +00:00
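The single-character variant reduces each separator search to one `std::memchr` call. Here is a sketch of that idea. `splitOnChar` is a hypothetical helper built on `std::string_view`, not LLVM's overload, and it follows the same MaxSplit/KeepEmpty conventions assumed for the general split.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

void splitOnChar(std::vector<std::string_view> &Out, std::string_view S,
                 char Sep, int MaxSplit = -1, bool KeepEmpty = true) {
  while (MaxSplit-- != 0) {
    // memchr is not called on an empty view, whose data() may be null.
    const void *Hit =
        S.empty() ? nullptr : std::memchr(S.data(), Sep, S.size());
    if (!Hit)
      break;
    std::size_t Idx =
        static_cast<std::size_t>(static_cast<const char *>(Hit) - S.data());
    if (KeepEmpty || Idx > 0)
      Out.push_back(S.substr(0, Idx));
    S.remove_prefix(Idx + 1); // the separator is exactly one character
  }
  if (KeepEmpty || !S.empty())
    Out.push_back(S);
}
```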
Chandler Carruth	6ce48f45a7	Add a way to skip the Go bindings tests even when Go is configured in CMake. The Go bindings tests in an unoptimized build take over 30 seconds for me, making it the slowest test in 'check-llvm' by a factor of two. I've only rigged this up fully to the CMake build. If someone is interested in rigging it up to the autoconf build, they're welcome to do so. llvm-svn: 247243	2015-09-10 05:47:43 +00:00
Sanjoy Das	33ef65354e	[ScalarEvolution] Fix PR24757. Summary: PR24757 was caused by some incorrect math in `ScalarEvolution::HowFarToZero` -- the smallest unsigned solution for X in 2^N * A = 2^N * X is not necessarily A. Reviewers: atrick, majnemer, meheff Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D12721 llvm-svn: 247242	2015-09-10 05:27:38 +00:00
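The arithmetic behind the fix can be checked concretely. In N-bit-shifted modular arithmetic, multiplying by 2^N discards the top N bits, so X only has to agree with A in its low bits. The smallest solution is therefore A mod 2^(width - N), not A. The sketch below fixes the width at 8 bits purely for illustration (SCEV works with arbitrary widths) and compares a brute-force search with that closed form. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Smallest X in [0, 256) with (2^N * X) mod 2^8 == (2^N * A) mod 2^8.
unsigned smallestSolutionBrute(std::uint8_t A, unsigned N) {
  assert(N < 8 && "shift must stay below the bit width");
  std::uint8_t Target = static_cast<std::uint8_t>(A << N);
  for (unsigned X = 0; X < 256; ++X)
    if (static_cast<std::uint8_t>(X << N) == Target)
      return X;
  return A; // not reached: X = A is always a solution
}

// Only the low (8 - N) bits of X are constrained, so keep just those bits of A.
unsigned smallestSolutionClosedForm(std::uint8_t A, unsigned N) {
  assert(N < 8 && "shift must stay below the bit width");
  return A & ((1u << (8 - N)) - 1u);
}
```

For instance, with A = 0x13 and N = 4, both 0x13 and 3 satisfy 16 * X = 0x30 (mod 256), and the smallest solution is 3.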
Chandler Carruth	812097ec2e	[LPM] Simplify this code and fix a compile error for compilers that don't correctly implement the scoping rules of C++11 range based for loops. This kind of aliasing isn't a good idea anyways (and wasn't really intended). llvm-svn: 247241	2015-09-10 04:22:36 +00:00
Chandler Carruth	b0c5d21894	[LPM] Use a map from analysis ID to immutable passes in the legacy pass manager to avoid a slow linear scan of every immutable pass on every attempt to find an analysis pass. This speeds up 'check-llvm' on an unoptimized build for me by 15%, YMMV. It should also help (a tiny bit) other folks that are really bottlenecked on repeated runs of tiny pass pipelines across small IR files. llvm-svn: 247240	2015-09-10 02:31:42 +00:00
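Here is a rough sketch of the data-structure change: keep the ordered list of immutable passes, and add a hash map keyed by analysis ID so that lookups no longer scan the list. `AnalysisID`, `Pass`, and `ImmutablePassRegistry` are simplified, hypothetical stand-ins. Only the idea of identifying an analysis by the address of a static ID object is taken from the legacy pass manager.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

using AnalysisID = const void *; // address of a per-analysis static object
struct Pass {
  AnalysisID ID;
};

class ImmutablePassRegistry {
  std::vector<Pass *> Passes;                  // kept for ordered iteration
  std::unordered_map<AnalysisID, Pass *> ByID; // expected O(1) lookup

public:
  void add(Pass *P) {
    assert(P && "cannot register a null pass");
    Passes.push_back(P);
    ByID.emplace(P->ID, P); // the first pass registered for an ID wins
  }
  // Before: a linear scan on every query.
  Pass *findLinear(AnalysisID ID) const {
    for (Pass *P : Passes)
      if (P->ID == ID)
        return P;
    return nullptr;
  }
  // After: a single hash lookup.
  Pass *find(AnalysisID ID) const {
    auto It = ByID.find(ID);
    return It == ByID.end() ? nullptr : It->second;
  }
};
```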
Kit Barton	8bde6cb866	Enable the shrink wrapping optimization for PPC64. The changes in this patch are as follows: 1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function 2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function 3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run: Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64. Phabricator review: http://reviews.llvm.org/D11817 llvm-svn: 247237	2015-09-10 01:55:44 +00:00
Ahmed Bougacha	f0f3751dc9	[AArch64] Match FI+offset in STNP addressing mode. First, we need to teach isFrameOffsetLegal about STNP. It already knew about the STP/LDP variants, but those were probably never exercised, because it's only the load/store optimizer that generates STP/LDP, and the only user of the method is frame lowering, which runs earlier. The STP/LDP cases were wrong: they didn't take into account the fact that they return two results, not one, so the immediate offset will be the 4th operand, not the 3rd. Follow-up to r247234. llvm-svn: 247236	2015-09-10 01:54:43 +00:00
Davide Italiano	d66d78419d	[MC] Convert all the remaining tests from macho-dump to llvm-readobj. This sort-of deprecates macho-dump. It may take still a little while to garbage collect it, but at least there's no real usage of it in the tree anymore. New tests should always rely on llvm-readobj or llvm-objdump. llvm-svn: 247235	2015-09-10 01:50:00 +00:00
Ahmed Bougacha	ac10764756	[AArch64] Match base+offset in STNP addressing mode. Followup to r247231. llvm-svn: 247234	2015-09-10 01:48:29 +00:00
Mehdi Amini	0085639fea	Make EmitRecord() accept ArrayRef and raw arrays (NFC) After r247186, a vector is no longer needed as the push_front for the code is removed. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247232	2015-09-10 01:45:55 +00:00
Ahmed Bougacha	5f7023b12c	[AArch64] Support selecting STNP. We could go through the load/store optimizer and match STNP where we would have matched a nontemporal-annotated STP, but that's not reliable enough, as an opportunistic optimization. Instead, we can guarantee emitting STNP, by matching them at ISel. Since there are no single-input nontemporal stores, we have to resort to some high-bits-extracting trickery to generate an STNP from a plain store. Also, we need to support another, LDP/STP-specific addressing mode, base + signed scaled 7-bit immediate offset. For now, only match the base. Let's make it smart separately. Part of PR24086. llvm-svn: 247231	2015-09-10 01:42:28 +00:00
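The "base + signed scaled 7-bit immediate offset" addressing mode mentioned here constrains the byte offset in two ways. The offset must be a multiple of the register size, and the offset divided by that size must fit in a signed 7-bit field, [-64, 63]. The helper below is a hypothetical sketch of that check, not LLVM's `isFrameOffsetLegal` or any ISel predicate. For 64-bit registers, the accepted offsets run from -512 to 504 in steps of 8.

```cpp
#include <cassert>
#include <cstdint>

bool isLegalPairedOffset(std::int64_t ByteOffset, unsigned AccessSize) {
  assert((AccessSize == 4 || AccessSize == 8 || AccessSize == 16) &&
         "paired stores use 32-, 64- or 128-bit registers");
  std::int64_t Size = AccessSize;
  if (ByteOffset % Size != 0)
    return false; // the immediate is scaled, so it must divide evenly
  std::int64_t Scaled = ByteOffset / Size;
  return Scaled >= -64 && Scaled <= 63; // signed 7-bit range
}
```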
Matt Arsenault	e27d2bced7	AMDGPU/SI: Fix more cases of losing exec operands llvm-svn: 247230	2015-09-10 01:23:28 +00:00
Matt Arsenault	238e81b7c6	AMDGPU/SI: Fix creating v_mov_b32s without exec uses This will be caught by existing tests with a verifier check to be added in a future commit. llvm-svn: 247229	2015-09-10 01:06:06 +00:00
Hans Wennborg	ddb1cf7aeb	Revert r247216: "Fix Clang-tidy misc-use-override warnings, other minor fixes" This caused build breakages, e.g. http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu-gdb-75/builds/24926 llvm-svn: 247226	2015-09-10 00:57:26 +00:00
Ahmed Bougacha	d90c25bcb0	[CodeGen] Make x86 nontemporal store patfrags generic. NFC. To be used by other targets. llvm-svn: 247225	2015-09-10 00:53:15 +00:00
Philip Reames	a5cde832ad	[RewriteStatepointsForGC] Minor refactor to use shared implementation [NFC] llvm-svn: 247223	2015-09-10 00:44:10 +00:00
Philip Reames	0e2ea3ec9a	[RewriteStatepointsForGC] Strengthen a confusingly weak assertion [NFC] The assertion was weaker than it should be and gave the impression we're growing the number of base defining values being considered during the fixed point iteration. That's not true. The tighter form of the assert is useful documentation. llvm-svn: 247221	2015-09-10 00:32:56 +00:00
Philip Reames	5581a8a2b4	[RewriteStatepointsForGC] One last bit of naming [NFCI] llvm-svn: 247220	2015-09-10 00:27:50 +00:00
Reid Kleckner	b5753fd45f	[WinEH] Add codegen support for cleanuppad and cleanupret All of the complexity is in cleanupret, and it mostly follows the same codepaths as catchret, except it doesn't take a return value in RAX. This small example now compiles and executes successfully on win32: extern "C" int printf(const char *, ...) noexcept; struct Dtor { ~Dtor() { printf("~Dtor\n"); } }; void has_cleanup() { Dtor o; throw 42; } int main() { try { has_cleanup(); } catch (int) { printf("caught it\n"); } } Don't try to put the cleanup in the same function as the catch, or Bad Things will happen. llvm-svn: 247219	2015-09-10 00:25:23 +00:00
Philip Reames	ad09731a88	[RewriteStatepointsForGC] Further style/naming fixup [NFCI] llvm-svn: 247217	2015-09-10 00:22:49 +00:00
Hans Wennborg	b5db40bf43	Fix Clang-tidy misc-use-override warnings, other minor fixes Patch by Eugene Zelenko! Differential Revision: http://reviews.llvm.org/D12740 llvm-svn: 247216	2015-09-10 00:12:56 +00:00
Mehdi Amini	7ea419f1ad	Bitcode Writer: EmitRecordWith* takes an ArrayRef instead of a SmallVector (NFC) This reapply commit r247178 after post-commit review from D.Blaikie in a way that makes it compatible with the existing API. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247215	2015-09-10 00:05:09 +00:00
Mehdi Amini	f9dd260ee5	Add makeArrayRef() overload for ArrayRef input (no-op/identity) NFC The purpose is to allow templated wrapper to work with either ArrayRef or any convertible operation: template<typename Container> void wrapper(const Container &Arr) { impl(makeArrayRef(Arr)); } with Container being a std::vector, a SmallVector, or an ArrayRef. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247214	2015-09-10 00:05:04 +00:00
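The wrapper pattern from this commit only works if the conversion helper also accepts the view type itself, so an identity overload is needed. The sketch below uses a hypothetical minimal `View<T>` in place of `llvm::ArrayRef`, and hypothetical `makeView`, `sumImpl`, and `sum` functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal read-only view standing in for ArrayRef.
template <typename T> struct View {
  const T *Data;
  std::size_t Size;
  std::size_t size() const { return Size; }
  const T &operator[](std::size_t I) const {
    assert(I < Size && "index out of range");
    return Data[I];
  }
};

// Conversion from an owning container...
template <typename T> View<T> makeView(const std::vector<T> &V) {
  return View<T>{V.data(), V.size()};
}
// ...and the no-op identity overload for an existing view.
template <typename T> View<T> makeView(const View<T> &V) { return V; }

template <typename T> long sumImpl(View<T> Vals) {
  long Sum = 0;
  for (std::size_t I = 0; I < Vals.size(); ++I)
    Sum += Vals[I];
  return Sum;
}

// Works for a std::vector or a View, thanks to the identity overload.
template <typename Container> long sum(const Container &C) {
  return sumImpl(makeView(C));
}
```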
Philip Reames	153e484862	[RewriteStatepointsForGC] More naming cleanup [NFCI] llvm-svn: 247213	2015-09-10 00:01:53 +00:00
Philip Reames	b1d69773b9	[RewriteStatepointsForGC] Code cleanup [NFC] Factor out common code related to naming values, fix a small style issue. More to follow in separate changes. llvm-svn: 247211	2015-09-09 23:57:18 +00:00
Philip Reames	b2d2c7c606	[RewriteStatepointsForGC] Extend base pointer inference to handle insertelement This change is simply enhancing the existing inference algorithm to handle insertelement instructions by conservatively inserting a new instruction to propagate the vector of associated base pointers. In the process, I'm ripping out the peephole optimizations which mostly helped cover the fact this hadn't been done. Note that most of the newly inserted nodes will be nearly immediately removed by the post insertion optimization pass introduced in 246718. Arguably, we should be trying harder to avoid the malloc traffic here, but I'd rather get the code correct, then worry about compile time. Unlike previous extensions of the algorithm to handle more cases, I discovered the existing code was causing miscompiles in some cases. In particular, we had an implicit assumption that the peephole covered all insertelement instructions, so if we had a value directly based on an insertelement the peephole didn't cover, we proceeded as if it were a base anyways. Not good. I believe we had the same issue with shufflevector which is why I adjusted the predicate for them as well. Differential Revision: http://reviews.llvm.org/D12583 llvm-svn: 247210	2015-09-09 23:40:12 +00:00
Philip Reames	b4dfda2138	[RewriteStatepointsForGC] Make base pointer inference deterministic Previously, the base pointer algorithm wasn't deterministic. The core fixed point was (of course), but we were inserting new nodes and optimizing them in an order which was unspecified and variable. We'd somewhat hacked around this for testing by sorting by value name, but that doesn't solve the general determinism problem. Instead, we can use the order of traversal over the def/use graph to give us a single consistent ordering. Today, this is a DFS order, but the exact order doesn't matter provided it's deterministic for a given input. (Q: It is safe to rely on a deterministic order of operands, right?) Note that this only fixes the determinism within a single inference step. The inference step is currently invoked many times in a non-deterministic order. That's a future change in the sequence. :) Differential Revision: http://reviews.llvm.org/D12640 llvm-svn: 247208	2015-09-09 23:26:08 +00:00
Peter Collingbourne	04c6c402ba	LowerBitSets: Fix non-determinism bug. Visit disjoint sets in a deterministic order based on the maximum BitSetNM index, otherwise the order in which we visit them will depend on pointer comparisons. This was being exposed by MSan. llvm-svn: 247201	2015-09-09 22:30:32 +00:00
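The LowerBitSets fix replaces a pointer-dependent visiting order with one derived from the input. Here is a minimal sketch of that idea, assuming each disjoint set is represented by the indices of its members in some stable, input-derived ordering (the commit uses the index into the BitSetNM named metadata). `orderDeterministically` is a hypothetical helper. Because the sets are disjoint, their maximum indices are distinct, so sorting by them gives a total, reproducible order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

std::vector<std::vector<unsigned>>
orderDeterministically(std::vector<std::vector<unsigned>> Sets) {
  auto MaxIndex = [](const std::vector<unsigned> &S) {
    assert(!S.empty() && "disjoint sets are non-empty");
    return *std::max_element(S.begin(), S.end());
  };
  // Compare input-derived indices, never pointer values.
  std::sort(Sets.begin(), Sets.end(),
            [&](const std::vector<unsigned> &A, const std::vector<unsigned> &B) {
              return MaxIndex(A) < MaxIndex(B);
            });
  return Sets;
}
```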
Reid Kleckner	40ce82e375	[SEH] Emit 32-bit SEH tables for the new EH IR The 32-bit tables don't actually contain PC range data, so emitting them is incredibly simple. The 64-bit tables, on the other hand, use the same table for state numbering as well as label ranges. This makes things more difficult, so it will be implemented later. llvm-svn: 247192	2015-09-09 21:10:03 +00:00
Dan Gohman	c4e3274379	[WebAssembly] Update target datalayout strings. llvm-svn: 247187	2015-09-09 20:54:31 +00:00
Teresa Johnson	8df32eb93e	Change EmitRecordWithAbbrevImpl to take Optional record code. NFC. This change enables EmitRecord to pass the supplied record Code to EmitRecordWithAbbrevImpl, rather than insert it into the Vals array. It is an enabler for changing EmitRecord to take an ArrayRef<uintty> instead of a SmallVectorImpl<uintty>& Patch suggested by Duncan P. N. Exon Smith, modified by myself a bit to get correct assertion checking. llvm-svn: 247186	2015-09-09 20:53:31 +00:00
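Passing the record code as an optional parameter removes the need to push it onto the front of the value array. The sketch below assumes a hypothetical `RecordSink` that just collects the values it would write. It uses `std::optional` as a stand-in for `llvm::Optional` and is not the BitstreamWriter API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

struct RecordSink {
  std::vector<std::uint64_t> Written;

  // If a code is supplied it is written first; otherwise Vals[0] is the code.
  void emitImpl(std::optional<unsigned> Code,
                const std::vector<std::uint64_t> &Vals) {
    if (Code)
      Written.push_back(*Code);
    Written.insert(Written.end(), Vals.begin(), Vals.end());
  }
  // Plain records pass their code through instead of inserting it into Vals.
  void emitRecord(unsigned Code, const std::vector<std::uint64_t> &Vals) {
    emitImpl(Code, Vals);
  }
  // Callers whose values already start with the code pass no code.
  void emitRecordWithCodeInVals(const std::vector<std::uint64_t> &Vals) {
    assert(!Vals.empty() && "the first value is the record code");
    emitImpl(std::nullopt, Vals);
  }
};
```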
Piotr Padlewski	56f48d9943	ScalarEvolution assume hanging bugfix http://reviews.llvm.org/D12719 llvm-svn: 247184	2015-09-09 20:47:30 +00:00
Mehdi Amini	2f7683bcd2	Revert "Bitcode Writer: EmitRecordWith* takes an ArrayRef instead of a SmallVector (NFC)" This reverts commit r247178. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247182	2015-09-09 20:35:15 +00:00
David Majnemer	3926c488cf	Revert trunc(lshr (sext A), Cst) to ashr A, Cst This reverts commit r246997, it introduced a regression (PR24763). llvm-svn: 247180	2015-09-09 20:20:08 +00:00
Mehdi Amini	acd97a51fe	Bitcode Writer: EmitRecordWith* takes an ArrayRef instead of a SmallVector (NFC) From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247178	2015-09-09 20:08:39 +00:00
Renato Golin	32a92f6d16	Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4, vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding." This reverts commit r247149, as it was breaking numerous buildbots of varied architectures. llvm-svn: 247177	2015-09-09 19:44:40 +00:00
Sanjay Patel	0d6bc4f108	allow unpredictable metadata on switch statements llvm-svn: 247174	2015-09-09 18:38:30 +00:00
Matthias Braun	a4356ce0e6	Save LaneMask with livein registers With subregister liveness enabled we can detect the case where only parts of a register are live in; this is expressed as a 32-bit lanemask. The current code only keeps registers in the live-in list and therefore enumerates all subregisters affected by the lanemask. This turned out to be too conservative as the subregister may also cover additional parts of the lanemask which are not live. Expressing a given lanemask by enumerating a minimum set of subregisters is computationally expensive so the best solution is to simply change the live-in list to store the lanemasks as well. This will reduce memory usage for targets using subregister liveness and slightly increase it for other targets. Differential Revision: http://reviews.llvm.org/D12442 llvm-svn: 247171	2015-09-09 18:08:03 +00:00
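Here is a small model of a live-in list that stores a lane mask with each register, instead of expanding partial liveness into a list of subregisters. `LiveInList`, `add`, and `isLiveIn` are hypothetical names. The 32-bit mask width follows the commit. Re-adding a register ORs in the new lanes, and a query succeeds if any requested lane is live.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using LaneMask = std::uint32_t;

struct LiveIn {
  unsigned PhysReg;
  LaneMask Lanes;
};

class LiveInList {
  std::vector<LiveIn> Entries;

public:
  // By default, the whole register (all lanes) is live-in.
  void add(unsigned Reg, LaneMask Lanes = ~LaneMask(0)) {
    assert(Lanes != 0 && "a live-in needs at least one live lane");
    for (LiveIn &E : Entries)
      if (E.PhysReg == Reg) {
        E.Lanes |= Lanes; // merge with lanes recorded earlier
        return;
      }
    Entries.push_back({Reg, Lanes});
  }
  bool isLiveIn(unsigned Reg, LaneMask Lanes = ~LaneMask(0)) const {
    for (const LiveIn &E : Entries)
      if (E.PhysReg == Reg)
        return (E.Lanes & Lanes) != 0;
    return false;
  }
};
```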
Matthias Braun	d5a7fc40ed	VirtRegMap: Improve addMBBLiveIns() using SlotIndex::MBBIndexIterator; NFC Now that we have an explicit iterator over the idx2MBBMap in SlotIndices we can use the fact that segments and the idx2MBBMap are sorted by SlotIndex position, so we can advance both simultaneously instead of starting from the beginning for each segment. This complicates the code for the subregister case somewhat but should be more efficient and has the advantage that we get the final lanemask for each block immediately which will be important for a subsequent change. Removes the now unused SlotIndexes::findMBBLiveIns function. Differential Revision: http://reviews.llvm.org/D12443 llvm-svn: 247170	2015-09-09 18:07:54 +00:00
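Because both the live segments and the block start indices are sorted, a single forward-only cursor over the blocks suffices. Here is a sketch under simplifying assumptions: slot indices are plain integers, a segment is the half-open range [Start, End), and a block has the value live-in if its start index falls inside a segment. `Segment`, `BlockStart`, and `liveInBlocks` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Segment {
  unsigned Start, End; // half-open [Start, End)
};
struct BlockStart {
  unsigned Index;
  unsigned BlockID;
};

std::vector<unsigned> liveInBlocks(const std::vector<Segment> &Segs,
                                   const std::vector<BlockStart> &Blocks) {
  std::vector<unsigned> Result;
  std::size_t B = 0; // never rewound: one pass over both sequences
  for (const Segment &S : Segs) {
    assert(S.Start < S.End && "segments are non-empty");
    while (B < Blocks.size() && Blocks[B].Index < S.Start)
      ++B; // blocks that start before this segment
    while (B < Blocks.size() && Blocks[B].Index < S.End)
      Result.push_back(Blocks[B++].BlockID); // block starts inside the segment
  }
  return Result;
}
```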
Chandler Carruth	d7003090ac	[PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible with the new pass manager, and no longer relying on analysis groups. This builds essentially a ground-up new AA infrastructure stack for LLVM. The core ideas are the same that are used throughout the new pass manager: type erased polymorphism and direct composition. The design is as follows: - FunctionAAResults is a type-erasing alias analysis results aggregation interface to walk a single query across a range of results from different alias analyses. Currently this is function-specific as we always assume that aliasing queries are within a function. - AAResultBase is a CRTP utility providing stub implementations of various parts of the alias analysis result concept, notably in several cases in terms of other more general parts of the interface. This can be used to implement only a narrow part of the interface rather than the entire interface. This isn't really ideal, this logic should be hoisted into FunctionAAResults as currently it will cause a significant amount of redundant work, but it faithfully models the behavior of the prior infrastructure. - All the alias analysis passes are ported to be wrapper passes for the legacy PM and new-style analysis passes for the new PM with a shared result object. In some cases (most notably CFL), this is an extremely naive approach that we should revisit when we can specialize for the new pass manager. - BasicAA has been restructured to reflect that it is much more fundamentally a function analysis because it uses dominator trees and loop info that need to be constructed for each function. All of the references to getting alias analysis results have been updated to use the new aggregation interface. All the preservation and other pass management code has been updated accordingly. The way the FunctionAAResultsWrapperPass works is to detect the available alias analyses when run, and add them to the results object. 
This means that we should be able to continue to respect when various passes are added to the pipeline, for example adding CFL or adding TBAA passes should just cause their results to be available and to get folded into this. The exception to this rule is BasicAA which really needs to be a function pass due to using dominator trees and loop info. As a consequence, the FunctionAAResultsWrapperPass directly depends on BasicAA and always includes it in the aggregation. This has significant implications for preserving analyses. Generally, most passes shouldn't bother preserving FunctionAAResultsWrapperPass because rebuilding the results just updates the set of known AA passes. The exception to this rule is LoopPass instances, which need to preserve all the function analyses that the loop pass manager will end up needing. This means preserving both BasicAAWrapperPass and the aggregating FunctionAAResultsWrapperPass. Now, when preserving an alias analysis, you do so by directly preserving that analysis. This is only necessary for non-immutable-pass-provided alias analyses though, and there are only three of interest: BasicAA, GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is preserved when needed because it (like DominatorTree and LoopInfo) is marked as a CFG-only pass. I've expanded GlobalsAA into the preserved set everywhere we previously were preserving all of AliasAnalysis, and I've added SCEVAA in the intersection of that with where we preserve SCEV itself. One significant challenge to all of this is that the CGSCC passes were actually using the alias analysis implementations by taking advantage of a pretty amazing set of loopholes in the old pass manager's analysis management code which allowed analysis groups to slide through in many cases. Moving away from analysis groups makes this problem much more obvious. 
To fix it, I've leveraged the flexibility the design of the new PM components provides to just directly construct the relevant alias analyses for the relevant functions in the IPO passes that need them. This is a bit hacky, but should go away with the new pass manager, and is already in many ways cleaner than the prior state. Another significant challenge is that various facilities of the old alias analysis infrastructure just don't fit any more. The most significant of these is the alias analysis 'counter' pass. That pass relied on the ability to snoop on AA queries at different points in the analysis group chain. Instead, I'm planning to build printing functionality directly into the aggregation layer. I've not included that in this patch merely to keep it smaller. Note that all of this needs a nearly complete rewrite of the AA documentation. I'm planning to do that, but I'd like to make sure the new design settles, and to flesh out a bit more of what it looks like in the new pass manager first. Differential Revision: http://reviews.llvm.org/D12080 llvm-svn: 247167	2015-09-09 17:55:00 +00:00
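The design above combines two pieces: a CRTP base that supplies conservative stubs and defines some queries in terms of the derived class's own answers, and a type-erased aggregate that walks one query across every registered result. Here is a heavily simplified sketch of that combination. All names (`AAResultBaseSketch`, `AAResultsSketch`, `Loc`, and the two example analyses) are hypothetical and much smaller than LLVM's `AAResultBase` and `AAResults`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

enum class AliasResult { NoAlias, MayAlias, MustAlias };

struct Loc { // stand-in for a memory location
  const void *Ptr;
  unsigned Size;
};

// CRTP base: conservative stubs, plus queries built on the derived alias().
template <typename DerivedT> class AAResultBaseSketch {
  DerivedT &derived() { return static_cast<DerivedT &>(*this); }

public:
  AliasResult alias(const Loc &, const Loc &) { return AliasResult::MayAlias; }
  bool isNoAlias(const Loc &A, const Loc &B) {
    return derived().alias(A, B) == AliasResult::NoAlias;
  }
};

// Type-erased aggregation over any number of results.
class AAResultsSketch {
  struct Concept {
    virtual ~Concept() = default;
    virtual AliasResult alias(const Loc &A, const Loc &B) = 0;
  };
  template <typename ResultT> struct Model final : Concept {
    explicit Model(ResultT &Res) : Result(Res) {}
    AliasResult alias(const Loc &A, const Loc &B) override {
      return Result.alias(A, B);
    }
    ResultT &Result;
  };
  std::vector<std::unique_ptr<Concept>> Results;

public:
  template <typename ResultT> void addResult(ResultT &R) {
    Results.push_back(std::make_unique<Model<ResultT>>(R));
  }
  // The first definitive (non-MayAlias) answer wins.
  AliasResult alias(const Loc &A, const Loc &B) {
    for (auto &R : Results) {
      AliasResult Res = R->alias(A, B);
      if (Res != AliasResult::MayAlias)
        return Res;
    }
    return AliasResult::MayAlias;
  }
};

// One result implements alias(); the other relies entirely on the stubs.
struct SamePointerAA : AAResultBaseSketch<SamePointerAA> {
  AliasResult alias(const Loc &A, const Loc &B) {
    return A.Ptr == B.Ptr && A.Size == B.Size ? AliasResult::MustAlias
                                              : AliasResult::MayAlias;
  }
};
struct StubOnlyAA : AAResultBaseSketch<StubOnlyAA> {};
```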
Matthias Braun	364e6db7a4	MachineVerifier: Check that SlotIndex MBBIndexList is sorted. This introduces a check that the MBBIndexList is sorted as proposed in http://reviews.llvm.org/D12443 but split up into a separate commit. llvm-svn: 247166	2015-09-09 17:49:46 +00:00
Matt Arsenault	bcf29c51fa	AMDGPU: Extract full 64-bit subregister and use subregs Instead of extracting both 32-bit components from the 128-bit register. This produces fewer copies and is easier for the copy peephole optimizer to understand and see the actual uses as extracts from a reg_sequence. This avoids needing to handle subregister composing in the PeepholeOptimizer's ValueTracker for this case. llvm-svn: 247162	2015-09-09 17:03:29 +00:00

121469 Commits