1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00
llvm-mirror/test/Transforms/SimpleLoopUnswitch/preserve-analyses.ll
Chandler Carruth 99ee90db82 [PM/LoopUnswitch] Introduce a new, simpler loop unswitch pass.
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
  below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
  infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
  and instead incrementally updates it.

I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.

My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.

This pass makes two very significant assumptions that enable most of these
improvements:

1) Focus on *full* unswitching -- that is, completely removing whatever
   control flow construct is being unswitched from the loop. In the case
   of trivial unswitching, this means removing the trivial (exiting)
   edge. In non-trivial unswitching, this means removing the branch or
   switch itself. This is in opposition to *partial* unswitching where
   some part of the unswitched control flow remains in the loop. Partial
   unswitching only really applies to switches and to folded branches.
   These are very similar to full unrolling and partial unrolling. The
   full form is an effective canonicalization, the partial form needs
   a complex cost model, cannot be iterated, isn't canonicalizing, and
   should be a separate pass that runs very late (much like unrolling).

2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
   dates from a time when a great deal of LLVM's loop infrastructure was
   missing, ineffective, and/or unreliable. As a consequence, a lot of
   complexity was added which we no longer need.

With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.

Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.

One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.

Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.

Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.

Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.

I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.

Differential Revision: https://reviews.llvm.org/D32409

llvm-svn: 301576
2017-04-27 18:45:20 +00:00

130 lines
5.1 KiB
LLVM

; RUN: opt -simple-loop-unswitch -verify-loop-info -verify-dom-info -disable-output < %s
; Loop unswitch should be able to unswitch these loops and
; preserve LCSSA and LoopSimplify forms.
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:32-f32:32:32-f64:32:32-v64:64:64-v128:128:128-a0:0:64"
target triple = "armv6-apple-darwin9"
@delim1 = external global i32 ; <i32*> [#uses=1]
@delim2 = external global i32 ; <i32*> [#uses=1]
define i32 @ineqn(i8* %s, i8* %p) nounwind readonly {
entry:
%0 = load i32, i32* @delim1, align 4 ; <i32> [#uses=1]
%1 = load i32, i32* @delim2, align 4 ; <i32> [#uses=1]
br label %bb8.outer
bb: ; preds = %bb8
%2 = icmp eq i8* %p_addr.0, %s ; <i1> [#uses=1]
br i1 %2, label %bb10, label %bb2
bb2: ; preds = %bb
%3 = getelementptr inbounds i8, i8* %p_addr.0, i32 1 ; <i8*> [#uses=3]
switch i32 %ineq.0.ph, label %bb8.backedge [
i32 0, label %bb3
i32 1, label %bb6
]
bb8.backedge: ; preds = %bb6, %bb5, %bb2
br label %bb8
bb3: ; preds = %bb2
%4 = icmp eq i32 %8, %0 ; <i1> [#uses=1]
br i1 %4, label %bb8.outer.loopexit, label %bb5
bb5: ; preds = %bb3
br i1 %6, label %bb6, label %bb8.backedge
bb6: ; preds = %bb5, %bb2
%5 = icmp eq i32 %8, %1 ; <i1> [#uses=1]
br i1 %5, label %bb7, label %bb8.backedge
bb7: ; preds = %bb6
%.lcssa1 = phi i8* [ %3, %bb6 ] ; <i8*> [#uses=1]
br label %bb8.outer.backedge
bb8.outer.backedge: ; preds = %bb8.outer.loopexit, %bb7
%.lcssa2 = phi i8* [ %.lcssa1, %bb7 ], [ %.lcssa, %bb8.outer.loopexit ] ; <i8*> [#uses=1]
%ineq.0.ph.be = phi i32 [ 0, %bb7 ], [ 1, %bb8.outer.loopexit ] ; <i32> [#uses=1]
br label %bb8.outer
bb8.outer.loopexit: ; preds = %bb3
%.lcssa = phi i8* [ %3, %bb3 ] ; <i8*> [#uses=1]
br label %bb8.outer.backedge
bb8.outer: ; preds = %bb8.outer.backedge, %entry
%ineq.0.ph = phi i32 [ 0, %entry ], [ %ineq.0.ph.be, %bb8.outer.backedge ] ; <i32> [#uses=3]
%p_addr.0.ph = phi i8* [ %p, %entry ], [ %.lcssa2, %bb8.outer.backedge ] ; <i8*> [#uses=1]
%6 = icmp eq i32 %ineq.0.ph, 1 ; <i1> [#uses=1]
br label %bb8
bb8: ; preds = %bb8.outer, %bb8.backedge
%p_addr.0 = phi i8* [ %p_addr.0.ph, %bb8.outer ], [ %3, %bb8.backedge ] ; <i8*> [#uses=3]
%7 = load i8, i8* %p_addr.0, align 1 ; <i8> [#uses=2]
%8 = sext i8 %7 to i32 ; <i32> [#uses=2]
%9 = icmp eq i8 %7, 0 ; <i1> [#uses=1]
br i1 %9, label %bb10, label %bb
bb10: ; preds = %bb8, %bb
%.0 = phi i32 [ %ineq.0.ph, %bb ], [ 0, %bb8 ] ; <i32> [#uses=1]
ret i32 %.0
}
; This is a simplified form of ineqn from above. It triggers some
; different cases in the loop-unswitch code.
define void @simplified_ineqn() nounwind readonly {
entry:
br label %bb8.outer
bb8.outer: ; preds = %bb6, %bb2, %entry
%x = phi i32 [ 0, %entry ], [ 0, %bb6 ], [ 1, %bb2 ] ; <i32> [#uses=1]
br i1 undef, label %return, label %bb2
bb2: ; preds = %bb
switch i32 %x, label %bb6 [
i32 0, label %bb8.outer
]
bb6: ; preds = %bb2
br i1 undef, label %bb8.outer, label %bb2
return: ; preds = %bb8, %bb
ret void
}
; This function requires special handling to preserve LCSSA form.
; PR4934
define void @pnp_check_irq() nounwind noredzone {
entry:
%conv56 = trunc i64 undef to i32 ; <i32> [#uses=1]
br label %while.cond.i
while.cond.i: ; preds = %while.cond.i.backedge, %entry
%call.i25 = call i8* @pci_get_device() nounwind noredzone ; <i8*> [#uses=2]
br i1 undef, label %if.then65, label %while.body.i
while.body.i: ; preds = %while.cond.i
br i1 undef, label %if.then31.i.i, label %while.cond.i.backedge
while.cond.i.backedge: ; preds = %if.then31.i.i, %while.body.i
br label %while.cond.i
if.then31.i.i: ; preds = %while.body.i
switch i32 %conv56, label %while.cond.i.backedge [
i32 14, label %if.then42.i.i
i32 15, label %if.then42.i.i
]
if.then42.i.i: ; preds = %if.then31.i.i, %if.then31.i.i
%call.i25.lcssa48 = phi i8* [ %call.i25, %if.then31.i.i ], [ %call.i25, %if.then31.i.i ] ; <i8*> [#uses=0]
unreachable
if.then65: ; preds = %while.cond.i
unreachable
}
declare i8* @pci_get_device() noredzone