llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 13:11:39 +01:00

Author	SHA1	Message	Date
Sanjoy Das	b20d278ebd	Don't IPO over functions that can be de-refined Summary: Fixes PR26774. If you're aware of the issue, feel free to skip the "Motivation" section and jump directly to "This patch". Motivation: I define "refinement" as discarding behaviors from a program that the optimizer has license to discard. So transforming: ``` void f(unsigned x) { unsigned t = 5 / x; (void)t; } ``` to ``` void f(unsigned x) { } ``` is refinement, since the behavior went from "if x == 0 then undefined else nothing" to "nothing" (the optimizer has license to discard undefined behavior). Refinement is a fundamental aspect of many mid-level optimizations done by LLVM. For instance, transforming `x == (x + 1)` to `false` also involves refinement since the expression's value went from "if x is `undef` then { `true` or `false` } else { `false` }" to "`false`" (by definition, the optimizer has license to fold `undef` to any non-`undef` value). Unfortunately, refinement implies that the optimizer cannot assume that the implementation of a function it can see has all of the behavior an unoptimized or a differently optimized version of the same function can have. This is a problem for functions with comdat linkage, where a function can be replaced by an unoptimized or a differently optimized version of the same source level function. For instance, FunctionAttrs cannot assume a comdat function is actually `readnone` even if it does not have any loads or stores in it; since there may have been loads and stores in the "original function" that were refined out in the currently visible variant, and at the link step the linker may in fact choose an implementation with a load or a store. As an example, consider a function that does two atomic loads from the same memory location, and writes to memory only if the two values are not equal. The optimizer is allowed to refine this function by first CSE'ing the two loads, and the folding the comparision to always report that the two values are equal. Such a refined variant will look like it is `readonly`. However, the unoptimized version of the function can still write to memory (since the two loads //can// result in different values), and selecting the unoptimized version at link time will retroactively invalidate transforms we may have done under the assumption that the function does not write to memory. Note: this is not just a problem with atomics or with linking differently optimized object files. See PR26774 for more realistic examples that involved neither. This patch: This change introduces a new set of linkage types, predicated as `GlobalValue::mayBeDerefined` that returns true if the linkage type allows a function to be replaced by a differently optimized variant at link time. It then changes a set of IPO passes to bail out if they see such a function. Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18634 llvm-svn: 265762	2016-04-08 00:48:30 +00:00
Nicolai Haehnle	69b2d0adeb	AMDGPU: Add a shader calling convention This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589	2016-04-06 19:40:20 +00:00
Silviu Baranga	aab80ed89c	Revert r265535 until we know how we can fix the bots llvm-svn: 265541	2016-04-06 14:06:32 +00:00
Silviu Baranga	1fce9f5629	[SCEV] Introduce a guarded backedge taken count and use it in LAA and LV Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265535	2016-04-06 13:18:26 +00:00
George Burgess IV	5a36438b8e	[CFLAA] Fix PR27213; incorrect tagging of args/globals Prior to this patch, CFLAA wouldn't tag arguments/globals properly if it didn't find any "interesting" edges on them. This means that, if all you do is store constants to a global or argument, we would never actually treat it as a global/argument. Test case: define void @foo(i32* %A, i32* %B) #0 { entry: store i32 0, i32* %A, align 4 store i32 0, i32* %B, align 4 ret void } CFLAA would say that %A can't alias %B, because neither pointer was used in an interesting way. This patch makes us note whether something is an argument, global, ... regardless of how interesting CFLAA thinks its uses are. (For the record, using a value in an interesting way means loading from it, using it in a GEP, ...) llvm-svn: 265474	2016-04-05 21:40:45 +00:00
Brendon Cahoon	ef0603b919	[DependenceAnalysis] Check if result of getConstantPart is null A seg-fault occurs due to a reference of a null pointer, which is the value returned by getConstantPart. This function returns null if the constant part is not found. The code that calls this function needs to check for the null return value. Differential Revision: http://reviews.llvm.org/D18718 llvm-svn: 265319	2016-04-04 18:13:18 +00:00
Benjamin Kramer	40fc84b1be	[TTI] Let the cost model estimate ctpop costs based on legality PPC has a vector popcount, this lets the vectorizer use the correct cost for it. Tweak X86 test to use an intrinsic that's actually scalarized (we have a somewhat efficient lowering for vector popcount using SSE, the cost model finds that now). llvm-svn: 265005	2016-03-31 10:42:40 +00:00
Matt Arsenault	753d425ead	AMDGPU: Cost model for basic integer operations This resolves bug 21148 by preventing promotion to i64 induction variables. llvm-svn: 264376	2016-03-25 01:16:40 +00:00
Matt Arsenault	bfd4cf42ec	AMDGPU: Partially implement getArithmeticInstrCost for FP ops llvm-svn: 264374	2016-03-25 01:00:32 +00:00
Matt Arsenault	1c86c8a2a7	TTI: Report 0 cost for free addrspacecasts llvm-svn: 264369	2016-03-25 00:26:29 +00:00
Matt Arsenault	ab1f6bea74	TTI: Use 0 for cost of fabs if free Ideally this would also happen for fneg, but that isn't a distinct operation in the IR. llvm-svn: 264368	2016-03-25 00:26:22 +00:00
Matt Arsenault	ffa532b58f	AMDGPU: TTI: Make insertelement free. We don't want to have a cost to scalarizing operations. llvm-svn: 264364	2016-03-25 00:14:11 +00:00
Adam Nemet	e20e31875b	[LAA] Support memchecks involving loop-invariant addresses We used to only allow SCEVAddRecExpr for pointer expressions in order to be able to compute the bounds. However this is also trivially possible for loop-invariant addresses (scUnknown) since then the bounds are the address itself. Interestingly, we used allow this for the special case when the loop-invariant address happens to also be an SCEVAddRecExpr (in an outer loop). There are a couple more loops that are vectorized in SPEC after this. My guess is that the main reason we don't see more because for example a loop-invariant load is vectorized into a splat vector with several vector-inserts. This is likely to make the vectorization unprofitable. I.e. we don't notice that a later LICM will move all of this out of the loop so the cost estimate should really be 0. llvm-svn: 264243	2016-03-24 04:28:47 +00:00
Matthias Braun	87b1f010b0	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics" This commit broke LTO builds. Reverting it to unbreak the bots while the issue is investigated. See also: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html This reverts r263158 llvm-svn: 264088	2016-03-22 20:24:34 +00:00
Nicolai Haehnle	a279ac7cfa	AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 llvm-svn: 263791	2016-03-18 16:24:31 +00:00
Nicolai Haehnle	b24081a638	AMDGPU: mark atomic instructions as sources of divergence Summary: As explained by the comment, threads will typically see different values returned by atomic instructions even if the arguments are equal. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18156 llvm-svn: 263719	2016-03-17 16:21:59 +00:00
Nicolai Haehnle	176749e5b2	AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence Summary: When multiple threads perform an atomic op with the same arguments, they will usually see different return values. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18101 llvm-svn: 263440	2016-03-14 15:37:18 +00:00
Chandler Carruth	ea13b8a791	[PM/AA] Teach the AAManager how to handle module analyses in addition to function analyses, and use it to wire up globals-aa to the new pass manager. llvm-svn: 263211	2016-03-11 09:15:11 +00:00
Artur Pilipenko	7bad97e2f6	Support arbitrary addrspace pointers in masked load/store intrinsics This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 263158	2016-03-10 20:39:22 +00:00
Chandler Carruth	161eaed043	[CG] Add a new pass manager printer pass for the old call graph and actually finish wiring up the old call graph. There were bugs in the old call graph that hadn't been caught because it wasn't being tested. It wasn't being tested because it wasn't in the pipeline system and we didn't have a printing pass to run in tests. This fixes all of that. As for why I'm still keeping the old call graph alive its so that I can port GlobalsAA to the new pass manager with out forking it to work with the lazy call graph. That's clearly the right eventual design, but it seems pragmatic to defer that until its necessary. The old call graph works just fine for GlobalsAA. llvm-svn: 263104	2016-03-10 11:24:11 +00:00
Chandler Carruth	233c4dc5ed	[LCG] Spell the printing pass pipeline name for the lazy call graph 'lcg' instead of just 'cg'. This makes it consistent with the analysis name of 'lcg'. No functionality changed. llvm-svn: 263103	2016-03-10 11:24:06 +00:00
Matthias Braun	3757169d38	InstCombine: Restrict computeKnownBits() on all Values to OptLevel > 2 As part of r251146 InstCombine was extended to call computeKnownBits on every value in the function to determine whether it happens to be constant. This increases typical compiletime by 1-3% (5% in irgen+opt time) in my measurements. On the other hand this case did not trigger once in the whole llvm-testsuite. This patch introduces the notion of ExpensiveCombines which are only enabled for OptLevel > 2. I removed the check in InstructionSimplify as that is called from various places where the OptLevel is not known but given the rarity of the situation I think a check in InstCombine is enough. Differential Revision: http://reviews.llvm.org/D16835 llvm-svn: 263047	2016-03-09 18:47:11 +00:00
Sanjoy Das	0e2a270d97	Remove trailing newline from test case; NFC llvm-svn: 262980	2016-03-09 01:51:44 +00:00
Sanjoy Das	337a8c9c46	[SCEV] Slightly generalize getRangeViaFactoring Building on the previous change, this generalizes ScalarEvolution::getRangeViaFactoring to work with {Ext(C?A:B)+k0,+,Ext(C?A:B)+k1} where Ext can be a zero extend, sign extend or truncate operation, and k0 and k1 are constants. llvm-svn: 262979	2016-03-09 01:51:02 +00:00
Sanjoy Das	29f43f7c7a	[SCEV] Slightly generalize getRangeViaFactoring This change generalizes ScalarEvolution::getRangeViaFactoring to work with {Ext(C?A:B),+,Ext(C?A:B)} where Ext can be a zero extend, sign extend or truncate operation. llvm-svn: 262978	2016-03-09 01:50:57 +00:00
Adam Nemet	1e851654db	[ScopedNoAliasAA] Make test basic.ll less confusing Summary: This testcase had me confused. It made me believe that you can use alias scopes and alias scopes list interchangeably with alias.scope and noalias. Both langref and the other testcase use scope lists so I went looking. Turns out using scope directly only happens to work by chance. When ScopedNoAliasAAResult::mayAliasInScopes traverses this as a scope list: !1 = !{!1, !0, !"some scope"} , the first entry is in fact a scope but only because the scope is happened to be defined self-referentially to make it unique globally. The remaining elements in the tuple (!0, !"some scope") are considered as scopes but AliasScopeNode::getDomain will just bail on those without any error. This change avoids this ambiguity in the test but I've also been wondering if we should issue some sort of a diagnostics. Reviewers: dexonsmith, hfinkel Subscribers: mssimpso, llvm-commits Differential Revision: http://reviews.llvm.org/D16670 llvm-svn: 262841	2016-03-07 17:49:10 +00:00
Philip Reames	f86417771d	[ValueTracking] Remove dead code from an old experiment This experiment was originally about trying to use facts implied dominating conditions to infer more precise known bits. While the compile time was found to be acceptable on several large code bases, we never found sufficiently profitable examples to justify turning on the code by default. Given this, it's time to abandon the experiment. Several folks have commented that they've found this useful for experimentation, but nothing has come of those experiments. Given how easy the patch is to apply, there's no reason to leave the code in tree. For anyone interested in further investigation in this area, I recommend finding the summary email I sent on one of the original review threads. In particular, I now believe the use-list based approach is strictly worse than the dom-tree-walking approach. llvm-svn: 262646	2016-03-03 19:44:06 +00:00
Sanjoy Das	c51e182cd8	[SCEV] Prove no-overflow via constant ranges Exploit ScalarEvolution::getRange's newly acquired smartness (since r262438) by using that to infer nsw and nuw when possible. llvm-svn: 262639	2016-03-03 18:31:29 +00:00
Sanjoy Das	7b29c5b2d5	[SCEV] Be less eager about demoting zexts to sexts After r262438 we can have provably positive NSW SCEV expressions whose zero extensions cannot be simplified (since r262438 makes SCEV better at computing constant ranges). This means demoting sexts of positive add recurrences eagerly can result in an unsimplified zero extension where we could have had a simplified sign extension. This change fixes the issue by teaching SCEV to demote sext of a positive SCEV expression to a zext only if the sext could not be simplified. llvm-svn: 262638	2016-03-03 18:31:23 +00:00
Sanjoy Das	88f19f877b	[SCEV] Make getRange smarter around selects Have ScalarEvolution::getRange re-consider cases like "{C?A:B,+,C?P:Q}" by factoring out "C" and computing RangeOf{A,+,P} union RangeOf({B,+,Q}) instead. The latter can be easier to compute precisely in cases like "{C?0:N,+,C?1:-1}" N is the backedge taken count of the loop; since in such cases the latter form simplifies to [0,N+1) union [0,N+1). llvm-svn: 262438	2016-03-02 00:57:54 +00:00
Hongbin Zheng	09e77e87cf	Another fix the testcase introduced by r261903 - Add the missing matches llvm-svn: 261971	2016-02-26 03:41:47 +00:00
Hongbin Zheng	218c423299	Use regex in testcase, do not fail windows bots llvm-svn: 261922	2016-02-25 19:16:40 +00:00
Hongbin Zheng	f0c9856bf9	Introduce RegionInfoAnalysis, which compute Region Tree in the new PassManager. NFC Differential Revision: http://reviews.llvm.org/D17571 llvm-svn: 261904	2016-02-25 17:54:25 +00:00
Hongbin Zheng	e391b241c7	Introduce DominanceFrontierAnalysis to the new PassManager to compute DominanceFrontier. NFC Differential Revision: http://reviews.llvm.org/D17570 llvm-svn: 261903	2016-02-25 17:54:15 +00:00
Hongbin Zheng	3e75d3d4e5	Introduce analysis pass to compute PostDominators in the new pass manager. NFC Differential Revision: http://reviews.llvm.org/D17537 llvm-svn: 261902	2016-02-25 17:54:07 +00:00
Hongbin Zheng	3d17e7bc47	Revert "Introduce analysis pass to compute PostDominators in the new pass manager. NFC" This reverts commit a3e5cc6a51ab5ad88d1760c63284294a4e34c018. llvm-svn: 261891	2016-02-25 16:45:53 +00:00
Hongbin Zheng	b446bc1e4f	Revert "Introduce DominanceFrontierAnalysis to the new PassManager to compute DominanceFrontier. NFC" This reverts commit 109c38b2226a87b0be73fa7a0a8c1a81df20aeb2. llvm-svn: 261890	2016-02-25 16:45:46 +00:00
Hongbin Zheng	4f423eca95	Revert "Introduce RegionInfoAnalysis, which compute Region Tree in the new PassManager. NFC" This reverts commit 8228b4d374edeb4cc0c5fddf6e1ab876918ee126. llvm-svn: 261889	2016-02-25 16:45:37 +00:00
Hongbin Zheng	4a56a70d94	Introduce RegionInfoAnalysis, which compute Region Tree in the new PassManager. NFC Differential Revision: http://reviews.llvm.org/D17571 llvm-svn: 261884	2016-02-25 16:33:26 +00:00
Hongbin Zheng	10a5a19fa0	Introduce DominanceFrontierAnalysis to the new PassManager to compute DominanceFrontier. NFC Differential Revision: http://reviews.llvm.org/D17570 llvm-svn: 261883	2016-02-25 16:33:15 +00:00
Hongbin Zheng	26e597b0c3	Introduce analysis pass to compute PostDominators in the new pass manager. NFC Differential Revision: http://reviews.llvm.org/D17537 llvm-svn: 261882	2016-02-25 16:33:06 +00:00
Chandler Carruth	c495d89da2	[PM/AA] Wire up TBAA to the new pass manager's registry and test it. llvm-svn: 261411	2016-02-20 04:04:52 +00:00
Chandler Carruth	0b3be566d6	[PM/AA] Wire up the scoped-no-alias AA to the new pass manager's registry and test it. llvm-svn: 261410	2016-02-20 04:03:06 +00:00
Chandler Carruth	6d49893aaa	[PM/AA] Wire up SCEVAA to the new pass manager's registry and test it. llvm-svn: 261409	2016-02-20 04:01:45 +00:00
Chandler Carruth	a43d617870	[PM/AA] Wire up CFLAA to the new pass manager fully, and port one of its tests over to exercise this code. This uncovered a few missing bits here and there in the analysis, but nothing interesting. llvm-svn: 261404	2016-02-20 03:52:02 +00:00
Chandler Carruth	6d0392224e	[PM/AA] Port alias analysis evaluator to the new pass manager, and use it to actually test the new pass manager AA wiring. This patch was extracted from the (somewhat too large) D12357 and rebosed on top of the slightly different design of the new pass manager AA wiring that I just landed. With this we can start testing the AA in a thorough way with the new pass manager. Some minor cleanups to the code in the pass was necessitated here, but otherwise it is a very minimal change. Differential Revision: http://reviews.llvm.org/D17372 llvm-svn: 261403	2016-02-20 03:46:03 +00:00
Matthew Simpson	4ea67eb418	[AArch64] Reduce vector insert/extract cost for Kryo Differential Revision: http://reviews.llvm.org/D17379 llvm-svn: 261237	2016-02-18 18:35:45 +00:00
Chandler Carruth	4ec346556c	[LCG] Construct an actual call graph with call-edge SCCs nested inside reference-edge SCCs. This essentially builds a more normal call graph as a subgraph of the "reference graph" that was the old model. This allows both to exist and the different use cases to use the aspect which addresses their needs. Specifically, the pass manager and other ordering constrained logic can use the reference graph to achieve conservative order of visit, while analyses reasoning about attributes and other properties derived from reachability can reason about the direct call graph. Note that this isn't necessarily complete: it doesn't model edges to declarations or indirect calls. Those can be found by scanning the instructions of the function if desirable, and in fact every user currently does this in order to handle things like calls to instrinsics. If useful, we could consider caching this information in the call graph to save the instruction scans, but currently that doesn't seem to be important. An important realization for why the representation chosen here works is that the call graph is a formal subset of the reference graph and thus both can live within the same data structure. All SCCs of the call graph are necessarily contained within an SCC of the reference graph, etc. The design is to build 'RefSCC's to model SCCs of the reference graph, and then within them more literal SCCs for the call graph. The formation of actual call edge SCCs is not done lazily, unlike reference edge 'RefSCC's. Instead, once a reference SCC is formed, it directly builds the call SCCs within it and stores them in a post-order sequence. This is used to provide a consistent platform for mutation and update of the graph. The post-order also allows for very efficient updates in common cases by bounding the number of nodes (and thus edges) considered. There is considerable common code that I'm still looking for the best way to factor out between the various DFS implementations here. So far, my attempts have made the code harder to read and understand despite reducing the duplication, which seems a poor tradeoff. I've not given up on figuring out the right way to do this, but I wanted to wait until I at least had the system working and tested to continue attempting to factor it differently. This also requires introducing several new algorithms in order to handle all of the incremental update scenarios for the more complex structure involving two edge colorings. I've tried to comment the algorithms sufficiently to make it clear how this is expected to work, but they may still need more extensive documentation. I know that there are some changes which are not strictly necessarily coupled here. The process of developing this started out with a very focused set of changes for the new structure of the graph and algorithms, but subsequent changes to bring the APIs and code into consistent and understandable patterns also ended up touching on other aspects. There was no good way to separate these out without causing massive merge conflicts. Ultimately, to a large degree this is a rewrite of most of the core algorithms in the LCG class and so I don't think it really matters much. Many thanks to the careful review by Sanjoy Das! Differential Revision: http://reviews.llvm.org/D16802 llvm-svn: 261040	2016-02-17 00:18:16 +00:00
Chandler Carruth	710380f52f	[attrs] Move the norecurse deduction to operate on the node set rather than the SCC object, and have it scan the instruction stream directly rather than relying on call records. This makes the behavior of this routine consistent between libc routines and LLVM intrinsics for libc routines. We can go and start teaching it about those being norecurse, but we should behave the same for the intrinsic and the libc routine rather than differently. I chatted with James Molloy and the inconsistency doesn't seem intentional and likely is due to intrinsic calls not being modelled in the call graph analyses. This also fixes a bug where we would deduce norecurse on optnone functions, when generally we try to handle optnone functions as-if they were replaceable and thus unanalyzable. llvm-svn: 260813	2016-02-13 08:47:51 +00:00
Matt Arsenault	612f0d286e	AMDGPU: Fix not handling new workitem intrinsics in DivergenceAnalysis llvm-svn: 260491	2016-02-11 05:32:51 +00:00

1 2 3 4 5 ...

917 Commits