llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 21:42:54 +02:00

Author	SHA1	Message	Date
Chandler Carruth	d24c86e0dd	[SROA] Teach SROA how to handle pointers from address spaces other than the default. Based on the patch by Matt Arsenault, D1764! I switched one place to use the more direct pointer type to compute the desired address space, and I reworked the memcpy rewriting section to reflect significant refactorings that this patch helped inspire. Thanks to several of the folks who helped review and improve the patch as well. llvm-svn: 202247	2014-02-26 08:25:02 +00:00
Chandler Carruth	33730334ba	[SROA] Split the alignment computation completely for the memcpy rewriting to work independently for the slice side and the other side. This allows us to only compute the minimum of the two when we actually rewrite to a memcpy that needs to take the minimum, and preserve higher alignment for one side or the other when rewriting to loads and stores. This fix was inspired by seeing the result of some refactoring that makes addrspace handling better. llvm-svn: 202242	2014-02-26 07:29:54 +00:00
Chandler Carruth	724a260ac5	[SROA] The original refactoring inspired by the addrspace patch in D1764, which in turn set off the other refactorings to make 'getSliceAlign()' a sensible thing. There are two possible inputs to the required alignment of a memory transfer intrinsic: the alignment constraints of the source and the destination. If we are only introducing a (potentially new) offset onto one side of the transfer, we don't need to consider the alignment constraints of the other side. Use this to simplify the logic feeding into alignment computation for unsplit transfers. Also, hoist the clamp of the magical zero alignment for these intrinsics to the more customary one alignment early. This lets several other conditions melt away. No functionality changed. There is a further improvement this exposes which will change functionality, but that's arriving in a separate patch. llvm-svn: 202232	2014-02-26 05:33:36 +00:00
Chandler Carruth	b93e3941c1	[SROA] Yet another slight refactoring that simplifies an API in the rewriting logic: don't pass custom offsets for the adjusted pointer to the new alloca. We always passed NewBeginOffset here. Sometimes we spelled it BeginOffset, but only when they were in fact equal. What's worse, the API is set up so that you can't reasonably call it with anything else -- it assumes that you're passing it an offset relative to the original alloca that happens to fall within the new one. That's the whole point of NewBeginOffset, it's the clamped beginning offset. No functionality changed. llvm-svn: 202231	2014-02-26 05:12:43 +00:00
Chandler Carruth	836ce7bd11	[SROA] Simplify the computing of alignment: we only ever need the alignment of the slice being rewritten, not any arbitrary offset. Every caller is really just trying to compute the alignment for the whole slice, never for some arbitrary alignment. They are also just passing a type when they have one to see if we can skip an explicit alignment in the IR by using the type's alignment. This makes for a much simpler interface. Another refactoring inspired by the addrspace patch for SROA, although only loosely related. llvm-svn: 202230	2014-02-26 05:02:19 +00:00
Chandler Carruth	c8cbd02c0c	[SROA] Use NewOffsetBegin in the unsplit case for memset merely for consistency with memcpy rewriting, and fix a latent bug in the alignment management for memset. The alignment issue is that getAdjustedAllocaPtr is computing the relative offset into the new alloca, but the alignment isn't being set to the relative offset; it was using the absolute offset which is into the old alloca. I don't think it's possible to write a test case that actually reaches this code where the resulting alignment would be observably different, but the intent was clearly to use the relative offset within the new alloca. llvm-svn: 202229	2014-02-26 04:45:24 +00:00
Chandler Carruth	4eab6cfb07	[SROA] Use the members for New{Begin,End}Offset in the rewrite helpers rather than passing them as arguments. While I generally prefer actual arguments, in this case the readability loss is substantial. By using members we avoid repeatedly calculating the offsets, and once we're using members it is useful to ensure that those names always refer to the original-alloca-relative new offset for a rewritten slice. No functionality changed. Follow-up refactoring, all toward getting the address space patch merged. llvm-svn: 202228	2014-02-26 04:25:04 +00:00
Chandler Carruth	2894b88fb9	[SROA] Compute the New{Begin,End}Offset values once for each alloca slice being rewritten. We had the same code scattered across most of the visits. Instead, compute the new offsets and the slice size once when we start to visit a particular slice, and use the member variables from then on. This reduces quite a bit of code duplication. No functionality changed. Refactoring inspired to make it easier to apply the address space patch to SROA. llvm-svn: 202227	2014-02-26 04:20:00 +00:00
Chandler Carruth	62c5338f7a	[SROA] Fix PR18615 with some long overdue simplifications to the bounds checking in SROA. The primary change is to just rely on uge for checking that the offset is within the allocation size. This removes the explicit checks against isNegative which were terribly error prone (including the reversed logic that led to PR18615) and prevented us from supporting stack allocations larger than half the address space.... Ok, so maybe the latter isn't common but it's a silly restriction to have. Also, we used to try to support a PHI node which loaded from before the start of the allocation if any of the loaded bytes were within the allocation. This doesn't make any sense, we have never really supported loading or storing before the allocation starts. The simplified logic just doesn't care. We continue to allow loading past the end of the allocation in part to support cases where there is a PHI and some loads are larger than others and the larger ones reach past the end of the allocation. We could solve this a different and more conservative way, but I'm still somewhat paranoid about this. llvm-svn: 202224	2014-02-26 03:14:14 +00:00
Chandler Carruth	e79993509f	[reassociate] Switch two std::sort calls into std::stable_sort calls as their inputs come from std::stable_sort and they are not total orders. I'm not a huge fan of this, but the really bad std::stable_sort is right at the beginning of Reassociate. After we commit to stable-sort based consistent respect of source order, the downstream sorts shouldn't undo that unless they have a total order or they are used in an order-insensitive way. Neither appears to be true for these cases. I don't have particularly good test cases, but this jumped out by inspection when looking for output instability in this pass due to changes in the ordering of std::sort. llvm-svn: 202196	2014-02-25 21:54:50 +00:00
Chandler Carruth	5a7b0aba14	[SROA] Add an off-by-default strict inbounds check to SROA. I had SROA implemented this way a long time ago and due to the overwhelming bugs that surfaced, moved to a much more relaxed variant. Richard Smith would like to understand the magnitude of this problem and it seems fairly harmless to keep some flag-controlled logic to get the extremely strict behavior here. I'll remove it if it doesn't prove useful. llvm-svn: 202193	2014-02-25 21:24:45 +00:00
Rafael Espindola	32da4bdd4b	Make DataLayout a plain object, not a pass. Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM that don't handle passes to also use DataLayout. llvm-svn: 202168	2014-02-25 17:30:31 +00:00
Rafael Espindola	ea1d1e568d	Factor out calls to AA.getDataLayout(). llvm-svn: 202157	2014-02-25 15:52:19 +00:00
Chandler Carruth	f4a944dda1	[SROA] Use the original load name with the SROA-prefixed IRB rather than just "load". This helps avoid pointless de-duping with order-sensitive numbers as we already have unique names from the original load. It also makes the resulting IR quite a bit easier to read. llvm-svn: 202140	2014-02-25 11:21:48 +00:00
Chandler Carruth	d54b53e176	[SROA] Thread the ability to add a pointer-specific name prefix through the pointer adjustment code. This is the primary code path that creates totally new instructions in SROA and being able to lump them based on the pointer value's name for which they were created causes significantly fewer name collisions and general noise in the debug output. This is particularly significant because it is making it much harder to track down instability in the output of SROA, as name de-duplication is a totally harmless form of instability that gets in the way of seeing real problems. The new fancy naming scheme tries to dig out the root "pre-SROA" name for pointer values and associate that all the way through the pointer formation instructions. Digging out the root is important to prevent the multiple iterative rounds of SROA from just layering too much cruft on top of cruft here. We already track the layers of SROA's iteration in the alloca name prefix. We don't need to duplicate it here. Should have no functionality change, and shouldn't have any really measurable impact on NDEBUG builds, as most of the complex logic is debug-only. llvm-svn: 202139	2014-02-25 11:19:56 +00:00
Chandler Carruth	ea27d3f4fc	[SROA] Rather than copying the logic for building a name prefix into the PHI-pointer builder, just copy the builder and clobber the obvious fields. llvm-svn: 202136	2014-02-25 11:12:04 +00:00
Chandler Carruth	4ced299134	[SROA] Simplify some of the logic to dig out the old pointer value by using OldPtr more heavily. Lots of this code was written before the rewriter had an OldPtr member set up ahead of time. There are already asserts in place that should ensure this doesn't change any functionality. llvm-svn: 202135	2014-02-25 11:08:02 +00:00
Chandler Carruth	f7d0635448	[SROA] Adjust to new clang-format style. llvm-svn: 202134	2014-02-25 11:07:58 +00:00
Chandler Carruth	11e572d7b0	[SROA] Fix a glaring bug in r202091: you have to actually write the break statement, not just think it to yourself.... No idea how this worked at all, much less survived most bots, my bootstrap, and some bot bootstraps! The Polly one didn't survive, and this was filed as PR18959. I don't have a reduced test case and honestly I'm not seeing the need. What we probably need here are better asserts / debug-build behavior in SmallPtrSet so that this madness doesn't make it so far. llvm-svn: 202129	2014-02-25 09:45:27 +00:00
Alexey Samsonov	92c41baf88	Silence GCC warning llvm-svn: 202119	2014-02-25 07:56:00 +00:00
Alp Toker	f3e1a22860	Fix typos llvm-svn: 202107	2014-02-25 04:21:15 +00:00
Chandler Carruth	2dab15dfbc	[SROA] Add a debugging tool which shuffles the slices sequence prior to sorting it. This helps uncover latent reliance on the original ordering which aren't guaranteed to be preserved by std::sort (but often are), and which are based on the use-def chain orderings which also aren't (technically) guaranteed. Only available in C++11 debug builds, and behind a flag to prevent noise at the moment, but this is generally useful so figured I'd put it in the tree rather than keeping it out-of-tree. llvm-svn: 202106	2014-02-25 03:59:29 +00:00
Chandler Carruth	e33cfcb4e8	[SROA] Use a more direct way of determining whether we are processing the destination operand or source operand of a memmove. It so happens that it was impossible for SROA to try to rewrite self-memmove where the operands are identical, because either such a thing is volatile (and we don't rewrite) or it is non-volatile, and we don't even register it as a use of the alloca. However, making the 'IsDest' test rely on this subtle fact is... Very confusing for the reader. We should use the direct and readily available test of the Use* which gives us concrete information about which operand is being rewritten. No functionality changed, I hope! ;] llvm-svn: 202103	2014-02-25 03:50:14 +00:00
Chandler Carruth	2a5f3cfadc	[SROA] Fix another instability in SROA with respect to the slice ordering. The fundamental problem that we're hitting here is that the use-def chain ordering is itself not a stable thing to be relying on in the rewriting for SROA. Further, we use a non-stable sort over the slices to arrange them based on the section of the alloca they're operating on. With a debugging STL implementation (or different implementations in stage2 and stage3) this can cause stage2 != stage3. The specific aspect of this problem fixed in this commit deals with the rewriting and load-speculation around PHIs and Selects. This, like many other aspects of the use-rewriting in SROA, is really part of the "strong SSA-formation" that is done by SROA where it works very hard to canonicalize loads and stores in just the right way to satisfy the needs of mem2reg[1]. When we have a select (or a PHI) with 2 uses of the same alloca, we test that loads downstream of the select are speculatable around it twice. If only one of the operands to the select needs to be rewritten, then if we get lucky we rewrite that one first and the select is immediately speculatable. This can cause the order of operand visitation, and thus the order of slices to be rewritten, to change an alloca from promotable to non-promotable and vice versa. The fix is to defer all of the speculation until after the rewrite phase is done. Once we've rewritten everything, we can accurately test for whether speculation will work (once, instead of twice!) and the order ceases to matter. This also happens to simplify the other subtlety of speculation -- we need to not speculate anything unless the result of speculating will make the alloca fully promotable by mem2reg. I had a previous attempt at simplifying this, but it was still pretty horrible. There is actually already a really nice test case for this in basictest.ll, but on multiple STL implementations and inputs, we just got "lucky". Fortunately, the test case is very small and we can essentially build it in exactly the opposite way to get reasonable coverage in both directions even from normal STL implementations. llvm-svn: 202092	2014-02-25 00:07:09 +00:00
Rafael Espindola	6c834371d9	Make some DataLayout pointers const. No functionality change. Just reduces the noise of an upcoming patch. llvm-svn: 202087	2014-02-24 23:12:18 +00:00
Logan Chien	6cc287e13e	Include <cctype> for isdigit(). llvm-svn: 201930	2014-02-22 06:34:10 +00:00
Quentin Colombet	fc711dd23c	[CodeGenPrepare] Move CodeGenPrepare into lib/CodeGen. CodeGenPrepare makes extensive use of TargetLowering, which is part of libLLVMCodeGen. This is a layer violation which would eventually introduce a dependence on CodeGen in ScalarOpts. Move CodeGenPrepare into libLLVMCodeGen to avoid that. Follow-up of <rdar://problem/15519855> llvm-svn: 201912	2014-02-22 00:07:45 +00:00
Rafael Espindola	4803b77df5	Rename a few more DataLayout variables from TD to DL. llvm-svn: 201870	2014-02-21 18:34:28 +00:00
Rafael Espindola	1f7e9d4bed	Rename a few more DataLayout variables. llvm-svn: 201833	2014-02-21 01:53:35 +00:00
Rafael Espindola	83f8550fb2	Rename many DataLayout variables from TD to DL. I am really sorry for the noise, but the current state where some parts of the code use TD (from the old name: TargetData) and other parts use DL makes it hard to write a patch that changes where those variables come from and how they are passed along. llvm-svn: 201827	2014-02-21 00:06:31 +00:00
Tim Northover	1b102abe53	X86 CodeGenPrep: sink shufflevectors before shifts On x86, shifting a vector by a scalar is significantly cheaper than shifting a vector by another fully general vector. Unfortunately, because SelectionDAG operates on just one basic block at a time, the shufflevector instruction that reveals whether the right-hand side of a shift is really a scalar is often not visible to CodeGen when it's needed. This adds another handler to CodeGenPrepare, to sink any useful shufflevector instructions down to the basic block where they're used, predicated on a target hook (since on other architectures, doing so will often just introduce extra real work). rdar://problem/16063505 llvm-svn: 201655	2014-02-19 10:02:43 +00:00
Tim Northover	83bbdcb246	GlobalMerge: move "-global-merge" option to the pass itself. It's rather odd to have the flag enabling and disabling this pass only affect a single target. llvm-svn: 201559	2014-02-18 11:17:29 +00:00
Quentin Colombet	5700bbac29	[CodeGenPrepare][AddressingModeMatcher] Give up on type promotion if the transformation does not bring any immediate benefits and introduce an illegal operation. llvm-svn: 201439	2014-02-14 22:23:22 +00:00
Rafael Espindola	cd84fe8173	Trivial cleanup: reuse existing variable. Extracted while trying to understand http://llvm-reviews.chandlerc.com/D1764. Patch by Matt Arsenault. llvm-svn: 201425	2014-02-14 19:02:01 +00:00
Chandler Carruth	aa1d9ed9b0	[LPM] Switch LICM to actively use LCSSA in addition to preserving it. Fixes PR18753 and PR18782. This is necessary for LICM to preserve LCSSA correctly and efficiently. There is still some active discussion about whether we should be using LCSSA, but we can't just immediately stop using it and we need LICM to preserve it while we are using it. We can restore the old SSAUpdater driven code if and when there is a serious effort to remove the reliance on LCSSA from all of the loop passes. However, this also serves as a great example of why LCSSA is very nice to have. This change significantly simplifies the process of sinking instructions for LICM, and makes it quite a bit less expensive. It wouldn't even be as complex as it is except that I had to start the process of removing the big recursive LCSSA formation hammer in order to switch even this much of the re-forming code to asserting that LCSSA was preserved. I'll fully remove that next just to tidy things up until the LCSSA debate settles one way or the other. llvm-svn: 201148	2014-02-11 12:52:27 +00:00
Quentin Colombet	826c5ca154	[CodeGenPrepare] Undo changes that happened for the profitability check. The addressing mode matcher checks at some point the profitability of folding an instruction into the addressing mode. When the instruction to be folded has several uses, it checks that the instruction can be folded in each use. To do so, it creates a new matcher for each use and checks if the instruction is in the list of the matched instructions of this new matcher. The new matchers may promote some instructions and this has to be undone to keep the state of the original matcher consistent. A test case will follow. <rdar://problem/16020230> llvm-svn: 201121	2014-02-11 01:59:02 +00:00
Benjamin Kramer	4779ebf069	Make succ_iterator a real random access iterator and clean up a couple of users. llvm-svn: 201088	2014-02-10 14:17:42 +00:00
Juergen Ributzka	a44e3756e3	[Constant Hoisting] Fix insertion point for constant materialization. The bitcast instruction during constant materialization was not placed correctly in the presence of phi nodes. This commit fixes the insertion point to be in the idom instead. This fixes PR18768 llvm-svn: 201009	2014-02-08 00:20:49 +00:00
Juergen Ributzka	5435f3e6f6	[Constant Hoisting] Don't update the use list while traversing it - DOH! This fix first traverses the whole use list of the constant expression and keeps track of the instructions that need to be updated. Then perform the fixup afterwards. llvm-svn: 201008	2014-02-08 00:20:45 +00:00
Quentin Colombet	f0d12dd9ee	[CodeGenPrepare] Move away sign extensions that get in the way of addressing mode. Basically the idea is to transform code like this: %idx = add nsw i32 %a, 1 %sextidx = sext i32 %idx to i64 %gep = gep i8* %myArray, i64 %sextidx load i8* %gep Into: %sexta = sext i32 %a to i64 %idx = add nsw i64 %sexta, 1 %gep = gep i8* %myArray, i64 %idx load i8* %gep That way the computation can be folded into the addressing mode. This transformation is done as part of the addressing mode matcher. If the matching fails (not profitable, addressing mode not legal, etc.), the matcher will revert the related promotions. <rdar://problem/15519855> llvm-svn: 200947	2014-02-06 21:44:56 +00:00
Nick Lewycky	03b9ed1b7b	A memcpy out of a fresh alloca is a no-op, delete it. Patch by Patrick Walton! llvm-svn: 200907	2014-02-06 06:29:19 +00:00
Paul Robinson	189e175394	Disable most IR-level transform passes on functions marked 'optnone'. Ideally only those transform passes that run at -O0 remain enabled, in reality we get as close as we reasonably can. Passes are responsible for disabling themselves, it's not the job of the pass manager to do it for them. llvm-svn: 200892	2014-02-06 00:07:05 +00:00
Duncan P. N. Exon Smith	7024ad6965	cleanup: scc_iterator consumers should use isAtEnd. No functional change. Updated loops from: for (I = scc_begin(), E = scc_end(); I != E; ++I) to: for (I = scc_begin(); !I.isAtEnd(); ++I) for the win. llvm-svn: 200789	2014-02-04 19:19:07 +00:00
Nick Lewycky	df5396144d	Self-memcpy-elision and memcpy of constant byte to memset transforms don't care how many bytes you were trying to transfer. Sink that safety test after those transforms. Noticed by inspection. llvm-svn: 200726	2014-02-04 00:18:54 +00:00
Chandler Carruth	a93c365f31	[LPM] Apply a really big hammer to fix PR18688 by recursively reforming LCSSA when we promote to SSA registers inside of LICM. Currently, this is actually necessary. The promotion logic in LICM uses SSAUpdater which doesn't understand how to place LCSSA PHI nodes. Teaching it to do so would be a very significant undertaking. It may be worthwhile and I've left a FIXME about this in the code as well as starting a thread on llvmdev to try to figure out the right long-term solution. For now, the PR needs to be fixed. Short of using the promotion SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes, I don't see a cleaner or cheaper way of achieving this. Fortunately, LCSSA is relatively lazy and sparse -- it should only update instructions which need it. We can also skip the recursive variant when we don't promote to SSA values. llvm-svn: 200612	2014-02-01 13:35:14 +00:00
Chandler Carruth	6ba48b6c38	[LPM] Fix PR18643, another scary place where loop transforms failed to preserve loop simplify of enclosing loops. The problem here starts with LoopRotation which ends up cloning code out of the latch into the new preheader it is building. This can create a new edge from the preheader into the exit block of the loop which breaks LoopSimplify form. The code tries to fix this by splitting the critical edge between the latch and the exit block to get a new exit block that only the latch dominates. This sadly isn't sufficient. The exit block may be an exit block for multiple nested loops. When we clone an edge from the latch of the inner loop to the new preheader being built in the outer loop, we create an exiting edge from the outer loop to this exit block. Despite breaking the LoopSimplify form for the inner loop, this is fine for the outer loop. However, when we split the edge from the inner loop to the exit block, we create a new block which is in neither the inner nor outer loop as the new exit block. This is a predecessor to the old exit block, and so the split itself takes the outer loop out of LoopSimplify form. We need to split every edge entering the exit block from inside a loop nested more deeply than the exit block in order to preserve all of the loop simplify constraints. Once we try to do that, a problem with splitting critical edges surfaces. Previously, we tried a very brute-force approach to update LoopSimplify form by re-computing it for all exit blocks. We don't need to do this, and doing this much will sometimes but not always overlap with the LoopRotate bug fix. Instead, the code needs to specifically handle the cases which can start to violate LoopSimplify -- they aren't that common. We need to see if the destination of the split edge was a loop exit block in simplified form for the loop of the source of the edge. For this to be true, all the predecessors need to be in the exact same loop as the source of the edge being split. If the dest block was originally in this form, we have to split all of the edges back into this loop to recover it. The old mechanism of doing this was conservatively correct because at least one of the exiting blocks it rewrote was the DestBB and so the DestBB's predecessors were fixed. But this is a much more targeted way of doing it. Making it targeted is important, because ballooning the set of edges touched prevents LoopRotate from being able to split edges it needs to split to preserve loop simplify in a coherent way -- the critical edge splitting would sometimes find the other edges in need of splitting but not others. Many, many thanks for help from Nick reducing these test cases mightily. And helping lots with the analysis here as this one was quite tricky to track down. llvm-svn: 200393	2014-01-29 13:16:53 +00:00
Chandler Carruth	ed726e1be7	[LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered" because of the inside-out run of LoopSimplify in the LoopPassManager and the fact that LoopSimplify couldn't be "preserved" across two independent LoopPassManagers. Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI node because it thought it was rewriting (via SCEV) the incoming value to a loop invariant value. While it may well be invariant for the current loop, it may be rewritten in terms of an enclosing loop's values. This in and of itself is fine, as the LCSSA PHI node in the enclosing loop for the inner loop value we're rewriting will have its own LCSSA PHI node if used outside of the enclosing loop. With me so far? Well, the current loop and the enclosing loop may share an exiting block and exit block, and when they do they also share LCSSA PHI nodes. In this case, it's not valid to RAUW through the LCSSA PHI node. Expected crazy test included. llvm-svn: 200372	2014-01-29 04:40:19 +00:00
Reid Kleckner	c9ab4a9a3b	Update optimization passes to handle inalloca arguments Summary: I searched Transforms/ and Analysis/ for 'ByVal' and updated those call sites to check for inalloca if appropriate. I added tests for any change that would allow an optimization to fire on inalloca. Reviewers: nlewycky Differential Revision: http://llvm-reviews.chandlerc.com/D2449 llvm-svn: 200281	2014-01-28 02:38:36 +00:00
Benjamin Kramer	65df2371a8	ConstantHoisting: We can't insert instructions directly in front of a PHI node. Insert before the terminating instruction of the dominating block instead. llvm-svn: 200218	2014-01-27 13:11:43 +00:00
Chandler Carruth	3998de34a0	[LPM] Make LCSSA a utility with a FunctionPass that applies it to all the loops in a function, and teach LICM to work in the presence of LCSSA. Previously, LCSSA was a loop pass. That made passes requiring it also be loop passes and unable to depend on function analysis passes easily. It also caused outer loops to have a different "canonical" form from inner loops during analysis. Instead, we go into LCSSA form and preserve it through the loop pass manager run. Note that this has the same problem as LoopSimplify that prevents enabling its verification -- loop passes which run at the end of the loop pass manager and don't preserve these are valid, but the subsequent loop pass runs of outer loops that do preserve this pass trigger too much verification and fail because the inner loop no longer verifies. The other problem this exposed is that LICM was completely unable to handle LCSSA form. It didn't preserve it and it actually would give up on moving instructions in many cases when they were used by an LCSSA phi node. I've taught LICM to support detecting LCSSA-form PHI nodes and to hoist and sink around them. This may actually let LICM fire significantly more because we put everything into LCSSA form to rotate the loop before running LICM. =/ Now LICM should handle that fine and preserve it correctly. The down side is that LICM has to require LCSSA in order to preserve it. This is just a fact of life for LCSSA. It's entirely possible we should completely remove LCSSA from the optimizer. The test updates are essentially accommodating LCSSA phi nodes in the output of LICM, and the fact that we now completely sink every instruction in ashr-crash below the loop bodies prior to unrolling. With this change, LCSSA is computed only three times in the pass pipeline. One of them could be removed (and potentially a SCEV run and a separate LoopPassManager entirely!) if we had a LoopPass variant of InstCombine that ran InstCombine on the loop body but refused to combine away LCSSA PHI nodes. Currently, this also prevents loop unrolling from being in the same loop pass manager as rotate, LICM, and unswitch. There is one thing that I really don't like -- preserving LCSSA in LICM is quite expensive. We end up having to re-run LCSSA twice for some loops after LICM runs because LICM can undo LCSSA both in the current loop and the parent loop. I don't really see good solutions to this other than to completely move away from LCSSA and using tools like SSAUpdater instead. llvm-svn: 200067	2014-01-25 04:07:24 +00:00
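The PR18615 fix (62c5338f7a) replaces explicit isNegative checks with a single unsigned "uge" comparison against the allocation size. The trick: reinterpreting a signed offset as unsigned makes any negative value wrap to a very large number, so one unsigned ">=" test rejects both offsets before the start and offsets at or past the end. The sketch below illustrates only that idea in plain C++. The helper name isOffsetOutOfBounds is hypothetical, and the real pass applies uge to its own offset representation rather than to int64_t.

```cpp
#include <cstdint>

// Hypothetical helper, not SROA's actual code: returns true when a byte
// offset does not start inside an allocation of AllocSize bytes.
bool isOffsetOutOfBounds(int64_t Offset, uint64_t AllocSize) {
  // Reinterpret the signed offset as unsigned (well-defined, modulo 2^64).
  // A negative offset wraps to a value near 2^64, so this single unsigned
  // comparison (the "uge" check) catches both Offset < 0 and
  // Offset >= AllocSize without a separate isNegative() test.
  return static_cast<uint64_t>(Offset) >= AllocSize;
}
```

This only models where an access starts. The same commit notes that SROA still allows loads that extend past the end of the allocation, which the sketch does not attempt to capture.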
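The Reassociate commit (e79993509f) switches two std::sort calls to std::stable_sort because their comparators are not total orders: elements that compare equal may be reordered by std::sort, undoing the source order an earlier stable sort established. A minimal sketch of the difference follows. The Operand record, its Rank and Name fields, and the descending-rank order are all hypothetical illustrations, not Reassociate's actual types or comparator.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical operand record: Rank is the sort key, Name tracks source order.
struct Operand {
  unsigned Rank;
  std::string Name;
};

// Sort by descending rank. Ties are not broken, so the comparator is not a
// total order on Operand. std::stable_sort keeps tied elements in their input
// order; std::sort would be free to permute them.
std::vector<Operand> sortByRank(std::vector<Operand> Ops) {
  std::stable_sort(Ops.begin(), Ops.end(),
                   [](const Operand &A, const Operand &B) {
                     return A.Rank > B.Rank;
                   });
  return Ops;
}
```

For input {1, "a"}, {2, "b"}, {1, "c"}, {2, "d"}, the stable sort always yields b, d, a, c. That determinism is what keeps pass output from varying across STL implementations.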


5932 Commits