llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00

Author	SHA1	Message	Date
Hongbin Zheng	4da3f9fa46	BBVectorize: Add the const modifier to the VectorizeConfig because we won't modify it. llvm-svn: 154098	2012-04-05 16:07:49 +00:00
Hongbin Zheng	7a4e40f87f	Introduce the VectorizeConfig class, with which we can control the behavior of the BBVectorizePass without using command line option. As pointed out by Hal, we can ask the TargetLoweringInfo for the architecture specific VectorizeConfig to perform vectorizing with architecture specific information. llvm-svn: 154096	2012-04-05 15:46:55 +00:00
Hongbin Zheng	8d380b332d	Add the function "vectorizeBasicBlock" which allow users vectorize a BasicBlock in other passes, e.g. we can call vectorizeBasicBlock in the loop unroll pass right after the loop is unrolled. llvm-svn: 154089	2012-04-05 08:05:16 +00:00
Jakob Stoklund Olesen	e1ae4f161c	Pass the right sign to TLI->isLegalICmpImmediate. LSR can fold three addressing modes into its ICmpZero node: ICmpZero BaseReg + Offset => ICmp BaseReg, -Offset ICmpZero -1ScaleReg + Offset => ICmp ScaleReg, Offset ICmpZero BaseReg + -1ScaleReg => ICmp BaseReg, ScaleReg The first two cases are only used if TLI->isLegalICmpImmediate() likes the offset. Make sure the right Offset sign is passed to this method in the second case. The ARM version is not symmetric. <rdar://problem/11184260> llvm-svn: 154079	2012-04-05 03:10:56 +00:00
Rafael Espindola	88a1aeb123	Always compute all the bits in ComputeMaskedBits. This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase. llvm-svn: 154011	2012-04-04 12:51:34 +00:00
Hongbin Zheng	c50b4781ab	LoopUnrollPass: Use variable "Threshold" instead of "CurrentThreshold" when reducing unroll count, otherwise the reduced unroll count is not taking the "OptimizeForSize" attribute into account. llvm-svn: 154007	2012-04-04 11:44:08 +00:00
Bill Wendling	1db4186413	Add an option to turn off the expensive GVN load PRE part of GVN. llvm-svn: 153902	2012-04-02 22:16:50 +00:00
Stepan Dyatkovskiy	0ddc03ebad	Fast fix for PR12343: http://llvm.org/bugs/show_bug.cgi?id=12343 We have not trivial way for splitting edges that are goes from indirect branch. We can do it with some tricks, but it should be additionally discussed. And it is still dangerous due to difficulty of indirect branches controlling. Fix forbids this case for unswitching. llvm-svn: 153879	2012-04-02 17:16:45 +00:00
Chandler Carruth	8b30ff4d0f	Belatedly address some code review from Chris. As a side note, I really dislike array_pod_sort... Do we really still care about any STL implementations that get this so wrong? Does libc++? llvm-svn: 153834	2012-04-01 10:41:24 +00:00
Chandler Carruth	1a2234d527	Fix a pretty scary bug I introduced into the always inliner with a single missing character. Somehow, this had gone untested. I've added tests for returns-twice logic specifically with the always-inliner that would have caught this, and fixed the bug. Thanks to Matt for the careful review and spotting this!!! =D llvm-svn: 153832	2012-04-01 10:21:05 +00:00
Chandler Carruth	ad48424b6b	Give the always-inliner its own custom filter. It shouldn't have to pay the very high overhead of the complex inline cost analysis when all it wants to do is detect three patterns which must not be inlined. Comment the code, clean it up, and leave some hints about possible performance improvements if this ever shows up on a profile. Moving this off of the (now more expensive) inline cost analysis is particularly important because we have to run this inliner even at -O0. llvm-svn: 153814	2012-03-31 13:17:18 +00:00
Chandler Carruth	61d0735f00	Remove a bunch of empty, dead, and no-op methods from all of these interfaces. These methods were used in the old inline cost system where there was a persistent cache that had to be updated, invalidated, and cleared. We're now doing more direct computations that don't require this intricate dance. Even if we resume some level of caching, it would almost certainly have a simpler and more narrow interface than this. llvm-svn: 153813	2012-03-31 12:48:08 +00:00
Chandler Carruth	8cacff57bf	Initial commit for the rewrite of the inline cost analysis to operate on a per-callsite walk of the called function's instructions, in breadth-first order over the potentially reachable set of basic blocks. This is a major shift in how inline cost analysis works to improve the accuracy and rationality of inlining decisions. A brief outline of the algorithm this moves to: - Build a simplification mapping based on the callsite arguments to the function arguments. - Push the entry block onto a worklist of potentially-live basic blocks. - Pop the first block off of the front of the worklist (for breadth-first ordering) and walk its instructions using a custom InstVisitor. - For each instruction's operands, re-map them based on the simplification mappings available for the given callsite. - Compute any simplification possible of the instruction after re-mapping, and store that back int othe simplification mapping. - Compute any bonuses, costs, or other impacts of the instruction on the cost metric. - When the terminator is reached, replace any conditional value in the terminator with any simplifications from the mapping we have, and add any successors which are not proven to be dead from these simplifications to the worklist. - Pop the next block off of the front of the worklist, and repeat. - As soon as the cost of inlining exceeds the threshold for the callsite, stop analyzing the function in order to bound cost. The primary goal of this algorithm is to perfectly handle dead code paths. We do not want any code in trivially dead code paths to impact inlining decisions. The previous metric was extremely flawed here, and would always subtract the average cost of two successors of a conditional branch when it was proven to become an unconditional branch at the callsite. There was no handling of wildly different costs between the two successors, which would cause inlining when the path actually taken was too large, and no inlining when the path actually taken was trivially simple. There was also no handling of the code path, only the immediate successors. These problems vanish completely now. See the added regression tests for the shiny new features -- we skip recursive function calls, SROA-killing instructions, and high cost complex CFG structures when dead at the callsite being analyzed. Switching to this algorithm required refactoring the inline cost interface to accept the actual threshold rather than simply returning a single cost. The resulting interface is pretty bad, and I'm planning to do lots of interface cleanup after this patch. Several other refactorings fell out of this, but I've tried to minimize them for this patch. =/ There is still more cleanup that can be done here. Please point out anything that you see in review. I've worked really hard to try to mirror at least the spirit of all of the previous heuristics in the new model. It's not clear that they are all correct any more, but I wanted to minimize the change in this single patch, it's already a bit ridiculous. One heuristic that is not yet mirrored is to allow inlining of functions with a dynamic alloca if the caller has a dynamic alloca. I will add this back, but I think the most reasonable way requires changes to the inliner itself rather than just the cost metric, and so I've deferred this for a subsequent patch. The test case is XFAIL-ed until then. As mentioned in the review mail, this seems to make Clang run about 1% to 2% faster in -O0, but makes its binary size grow by just under 4%. I've looked into the 4% growth, and it can be fixed, but requires changes to other parts of the inliner. llvm-svn: 153812	2012-03-31 12:42:41 +00:00
Benjamin Kramer	b7ee22e3f3	Internalize: Remove reference of @llvm.noinline, it was replaced with the noinline attribute a long time ago. llvm-svn: 153806	2012-03-31 11:03:47 +00:00
Hal Finkel	45ad90afac	Correctly vectorize powi. The powi intrinsic requires special handling because it always takes a single integer power regardless of the result type. As a result, we can vectorize only if the powers are equal. Fixes PR12364. llvm-svn: 153797	2012-03-31 03:38:40 +00:00
Jakob Stoklund Olesen	9571cb56c5	Don't PRE compares. CodeGenPrepare sinks compare instructions down to their uses to prevent live flags and predicate registers across basic blocks. PRE of a compare instruction prevents that, forcing the i1 compare result into a general purpose register. That is usually more expensive than the redundant compare PRE was trying to eliminate in the first place. llvm-svn: 153657	2012-03-29 17:22:39 +00:00
Benjamin Kramer	b6baea7014	GlobalOpt: If we have an inbounds GEP from a ConstantAggregateZero global that we just determined to be constant, replace all loads from it with a zero value. llvm-svn: 153576	2012-03-28 14:50:09 +00:00
Chandler Carruth	e507ddfe74	Switch to WeakVHs in the value mapper, and aggressively prune dead basic blocks in the function cloner. This removes the last case of trivially dead code that I've been seeing in the wild getting inlined, analyzed, re-inlined, optimized, only to be deleted. Nukes a FIXME from the cleanup tests. llvm-svn: 153572	2012-03-28 08:38:27 +00:00
Chad Rosier	23505e5bb6	Fix 80-column violation. llvm-svn: 153556	2012-03-28 00:35:33 +00:00
Chandler Carruth	cd6d2689a0	Make a seemingly tiny change to the inliner and fix the generated code size bloat. Unfortunately, I expect this to disable the majority of the benefit from r152737. I'm hopeful at least that it will fix PR12345. To explain this requires... quite a bit of backstory I'm afraid. TL;DR: The change in r152737 actually did The Wrong Thing for linkonce-odr functions. This change makes it do the right thing. The benefits we saw were simple luck, not any actual strategy. Benchmark numbers after a mini-blog-post so that I've written down my thoughts on why all of this works and doesn't work... To understand what's going on here, you have to understand how the "bottom-up" inliner actually works. There are two fundamental modes to the inliner: 1) Standard fixed-cost bottom-up inlining. This is the mode we usually think about. It walks from the bottom of the CFG up to the top, looking at callsites, taking information about the callsite and the called function and computing th expected cost of inlining into that callsite. If the cost is under a fixed threshold, it inlines. It's a touch more complicated than that due to all the bonuses, weights, etc. Inlining the last callsite to an internal function gets higher weighth, etc. But essentially, this is the mode of operation. 2) Deferred bottom-up inlining (a term I just made up). This is the interesting mode for this patch an r152737. Initially, this works just like mode #1, but once we have the cost of inlining into the callsite, we don't just compare it with a fixed threshold. First, we check something else. Let's give some names to the entities at this point, or we'll end up hopelessly confused. We're considering inlining a function 'A' into its callsite within a function 'B'. We want to check whether 'B' has any callers, and whether it might be inlined into those callers. If so, we also check whether inlining 'A' into 'B' would block any of the opportunities for inlining 'B' into its callers. We take the sum of the costs of inlining 'B' into its callers where that inlining would be blocked by inlining 'A' into 'B', and if that cost is less than the cost of inlining 'A' into 'B', then we skip inlining 'A' into 'B'. Now, in order for #2 to make sense, we have to have some confidence that we will actually have the opportunity to inline 'B' into its callers when cheaper, and that we'll be able to revisit the decision and inline 'A' into 'B' if that ever becomes the correct tradeoff. This often isn't true for external functions -- we can see very few of their callers, and we won't be able to re-consider inlining 'A' into 'B' if 'B' is external when we finally see more callers of 'B'. There are two cases where we believe this to be true for C/C++ code: functions local to a translation unit, and functions with an inline definition in every translation unit which uses them. These are represented as internal linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for linkonce-odr in r152737. Unfortunately, when I did that, I also introduced a subtle bug. There was an implicit assumption that the last caller of the function within the TU was the last caller of the function in the program. We want to bonus the last caller of the function in the program by a huge amount for inlining because inlining that callsite has very little cost. Unfortunately, the last caller in the TU of a linkonce-odr function is not the last caller in the program, and so we don't want to apply this bonus. If we do, we can apply it to one callsite per-TU. Because of the way deferred inlining works, when it sees this bonus applied to one callsite in the TU for 'B', it decides that inlining 'B' is of the utmost importance just so we can get that final bonus. It then proceeds to essentially force deferred inlining regardless of the actual cost tradeoff. The result? PR12345: code bloat, code bloat, code bloat. Another result is getting damn lucky on a few benchmarks, and the over-inlining exposing critically important optimizations. I would very much like a list of benchmarks that regress after this change goes in, with bitcode before and after. This will help me greatly understand what opportunities the current cost analysis is missing. Initial benchmark numbers look very good. WebKit files that exhibited the worst of PR12345 went from growing to shrinking compared to Clang with r152737 reverted. - Bootstrapped Clang is 3% smaller with this change. - Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4% faster with this change. Please let me know about any other performance impact you see. Thanks to Nico for reporting and urging me to actually fix, Richard Smith, Duncan Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues today. llvm-svn: 153506	2012-03-27 10:48:28 +00:00
Nadav Rotem	bcd74e695a	153465 was incorrect. In this code we wanted to check that the pointer operand is of pointer type (and not vector type). llvm-svn: 153468	2012-03-26 21:00:53 +00:00
Nadav Rotem	165c8a3432	PR12357: The pointer was used before it was checked. llvm-svn: 153465	2012-03-26 20:39:18 +00:00
Andrew Trick	a7e2266fa7	LSR ivchain bug fix: corner case with ConstantExpr. Fixes PR11950. llvm-svn: 153463	2012-03-26 20:28:37 +00:00
Andrew Trick	048fee970f	comment typo llvm-svn: 153462	2012-03-26 20:28:35 +00:00
Chris Lattner	5a5f3badd6	eliminate an unneeded branch, part of PR12357 llvm-svn: 153458	2012-03-26 19:13:57 +00:00
Eric Christopher	b13384d5f5	Tidy. llvm-svn: 153456	2012-03-26 19:09:40 +00:00
Eric Christopher	2778e29567	Tidy. llvm-svn: 153455	2012-03-26 19:09:38 +00:00
Andrew Trick	de4046d7dd	LSR cleanup: potential bug caught by PVS-Studio. Thanks Andrey. llvm-svn: 153451	2012-03-26 18:03:16 +00:00
Kostya Serebryany	605c594f4f	[tsan] treat vtable pointer updates in a special way (requires tbaa); fix a bug (forgot to return true after instrumenting); make sure the tsan tests are run llvm-svn: 153448	2012-03-26 17:35:03 +00:00
Craig Topper	76f7896f49	Prune some includes and forward declarations. llvm-svn: 153429	2012-03-26 06:58:25 +00:00
Chandler Carruth	fc1ee5b5d6	Teach the function cloner (and thus the inliner) to simplify PHINodes aggressively. There are lots of dire warnings about this being expensive that seem to predate switching to the TrackingVH-based value remapper that is automatically updated on RAUW. This makes it easy to not just prune single-entry PHIs, but to fully simplify PHIs, and to recursively simplify the newly inlined code to propagate PHINode simplifications. This introduces a bit of a thorny problem though. We may end up simplifying a branch condition to a constant when we fold PHINodes, and we would like to nuke any dead blocks resulting from this so that time isn't wasted continually analyzing them, but this isn't easy. Deleting basic blocks after they are fully cloned and mapped into the new function currently requires manually updating the value map. The last piece of the simplification-during-inlining puzzle will require either switching to WeakVH mappings or some other piece of refactoring. I've left a FIXME in the testcase about this. llvm-svn: 153410	2012-03-25 10:34:54 +00:00
Chandler Carruth	2a69d3eac5	Move the instruction simplification of callsite arguments in the inliner to instead rely on much more generic and powerful instruction simplification in the function cloner (and thus inliner). This teaches the pruning function cloner to use instsimplify rather than just the constant folder to fold values during cloning. This can simplify a large number of things that constant folding alone cannot begin to touch. For example, it will realize that 'or' and 'and' instructions with certain constant operands actually become constants regardless of what their other operand is. It also can thread back through the caller to perform simplifications that are only possible by looking up a few levels. In particular, GEPs and pointer testing tend to fold much more heavily with this change. This should (in some cases) have a positive impact on compile times with optimizations on because the inliner itself will simply avoid cloning a great deal of code. It already attempted to prune proven-dead code, but now it will be use the stronger simplifications to prove more code dead. llvm-svn: 153403	2012-03-25 04:03:40 +00:00
Chandler Carruth	6d9fc81b34	Add an asserting ValueHandle to the block simplification code which will fire if anything ever invalidates the assumption of a terminator instruction being unchanged throughout the routine. I've convinced myself that the current definition of simplification precludes such a transformation, so I think getting some asserts coverage that we don't violate this agreement is sufficient to make this code safe for the foreseeable future. Comments to the contrary or other suggestions are of course welcome. =] The bots are now happy with this code though, so it appears the bug here has indeed been fixed. llvm-svn: 153401	2012-03-25 03:29:25 +00:00
Chandler Carruth	2633e643e5	Don't form a WeakVH around the sentinel node in the instructions BB list. This is a bad idea. ;] I'm hopeful this is the bug that's showing up with the MSVC bots, but we'll see. It is definitely unnecessary. InstSimplify won't do anything to a terminator instruction, we don't need to even include it in the iteration range. We can also skip the now dead terminator check, although I've made it an assert to help document that this is an important invariant. I'm still a bit queasy about this because there is an implicit assumption that the terminator instruction cannot be RAUW'ed by the simplification code. While that appears to be true at the moment, I see no guarantee that would ensure it remains true in the future. I'm looking at the cleanest way to solve that... llvm-svn: 153399	2012-03-24 23:03:27 +00:00
Chandler Carruth	58c542736c	Refactor the interface to recursively simplifying instructions to be tad bit simpler by handling a common case explicitly. Also, refactor the implementation to use a worklist based walk of the recursive users, rather than trying to use value handles to detect and recover from RAUWs during the recursive descent. This fixes a very subtle bug in the previous implementation where degenerate control flow structures could cause mutually recursive instructions (PHI nodes) to collapse in just such a way that From became equal to To after some amount of recursion. At that point, we hit the inf-loop that the assert at the top attempted to guard against. This problem is defined away when not using value handles in this manner. There are lots of comments claiming that the WeakVH will protect against just this sort of error, but they're not accurate about the actual implementation of WeakVHs, which do still track RAUWs. I don't have any test case for the bug this fixes because it requires running the recursive simplification on unreachable phi nodes. I've no way to either run this or easily write an input that triggers it. It was found when using instruction simplification inside the inliner when running over the nightly test-suite. llvm-svn: 153393	2012-03-24 21:11:24 +00:00
Francois Pichet	6a1f32c9cf	Fix the MSVC build. llvm-svn: 153366	2012-03-24 01:36:37 +00:00
Andrew Trick	0ff69472e6	More IndVarSimplify cleanup. llvm-svn: 153362	2012-03-24 00:51:17 +00:00
Kostya Serebryany	5fa635b5bf	add EP_OptimizerLast extension point llvm-svn: 153353	2012-03-23 23:22:59 +00:00
Dan Gohman	ef28237798	Don't convert objc_retainAutoreleasedReturnValue to objc_retain if it is retaining the return value of an invoke that it immediately follows. llvm-svn: 153344	2012-03-23 18:09:00 +00:00
Dan Gohman	0c02608af1	It's not possible to insert code immediately after an invoke in the same basic block, and it's not safe to insert code in the successor blocks if the edges are critical edges. Splitting those edges is possible, but undesirable, especially on the unwind side. Instead, make the bottom-up code motion to consider invokes to be part of their successor blocks, rather than part of their parent blocks, so that it doesn't push code past them and onto the edges. This fixes PR12307. llvm-svn: 153343	2012-03-23 17:47:54 +00:00
Duncan Sands	fcc4791c7b	When propagating equalities, eg replacing A with B in every basic block dominated by Root, check that B is available throughout the scope. This is obviously true (famous last words?) given the current logic, but the check may be helpful if more complicated reasoning is added one day. llvm-svn: 153323	2012-03-23 08:45:52 +00:00
Duncan Sands	1ac68ded1a	Indentation. llvm-svn: 153322	2012-03-23 08:29:04 +00:00
Andrew Trick	690dfe6b2a	Remove -enable-lsr-retry in time for 3.1. llvm-svn: 153287	2012-03-22 22:42:51 +00:00
Andrew Trick	121e3e153c	Remove -enable-lsr-nested in time for 3.1. Tests cases have been removed but attached to open PR12330. llvm-svn: 153286	2012-03-22 22:42:45 +00:00
Dan Gohman	df222b2e87	Refactor the code for visiting instructions out into helper functions. llvm-svn: 153267	2012-03-22 18:24:56 +00:00
Andrew Trick	74bcc63b1b	Remove unused simplifyIVUsers llvm-svn: 153262	2012-03-22 17:47:30 +00:00
Andrew Trick	30b133e1ae	Remove -enable-iv-rewrite, which has been unsupported since 3.0. llvm-svn: 153260	2012-03-22 17:10:11 +00:00
Chris Lattner	9bb176f16d	don't use "signed", just something I noticed in patches flying by. llvm-svn: 153237	2012-03-22 03:46:58 +00:00
Kostya Serebryany	19681e071c	[asan] fix one more bug related to long double llvm-svn: 153189	2012-03-21 15:28:50 +00:00
Eric Christopher	b43efeeb1f	Zap some dead code pointed out by Chandler. llvm-svn: 153150	2012-03-20 23:28:58 +00:00

1 2 3 4 5 ...

8770 Commits