[PM/LoopUnswitch] Introduce a new, simpler loop unswitch pass.
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
and instead incrementally updates it.
I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.
My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechanisms were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.
This pass makes two very significant assumptions that enable most of these
improvements:
1) Focus on *full* unswitching -- that is, completely removing whatever
control flow construct is being unswitched from the loop. In the case
of trivial unswitching, this means removing the trivial (exiting)
edge. In non-trivial unswitching, this means removing the branch or
switch itself. This is in opposition to *partial* unswitching where
some part of the unswitched control flow remains in the loop. Partial
unswitching only really applies to switches and to folded branches.
These are very similar to full unrolling and partial unrolling. The
full form is an effective canonicalization, the partial form needs
a complex cost model, cannot be iterated, isn't canonicalizing, and
should be a separate pass that runs very late (much like unrolling).
2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
dates from a time when a great deal of LLVM's loop infrastructure was
missing, ineffective, and/or unreliable. As a consequence, a lot of
complexity was added which we no longer need.
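To make the idea of *full* trivial unswitching from item 1 concrete, here is a minimal source-level sketch in C++. It is not the pass's implementation, which works on LLVM IR; the function names (`sum_before`, `sum_after`, `check_sum_equivalence`) are hypothetical and exist only for illustration. The loop-invariant exit test is hoisted in front of the loop and no trace of it is left inside the loop body.

```cpp
#include <cassert>
#include <vector>

// Before: the loop-invariant flag `stop` guards an exiting edge and is
// re-tested on every iteration, ahead of any side effect in the body.
int sum_before(const std::vector<int>& values, bool stop) {
  int sum = 0;
  for (int v : values) {
    if (stop)  // invariant condition on a trivial exiting edge
      break;
    sum += v;
  }
  return sum;
}

// After (hand-written, illustrative): the test runs once before the loop
// and the exiting edge is gone from the loop entirely.
int sum_after(const std::vector<int>& values, bool stop) {
  if (stop)
    return 0;
  int sum = 0;
  for (int v : values)
    sum += v;
  return sum;
}

// Small self-check: both versions agree for either value of the flag.
void check_sum_equivalence() {
  const std::vector<int> values{1, 2, 3};
  assert(sum_before(values, false) == sum_after(values, false));
  assert(sum_before(values, true) == sum_after(values, true));
}
```

Because the exiting edge is removed completely, no loop body needs to be cloned. That is what makes the trivial case cheap enough to treat as a canonicalization.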
With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.
Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.
One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.
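The whole-switch behavior described above can be sketched at the source level as follows. This is an illustration under assumptions, not the pass itself: the names `run_before`, `run_after`, and `check_switch_equivalence` are hypothetical, and the case values are arbitrary. All exiting cases move to a single switch in front of the loop instead of becoming a chain of separate branches.

```cpp
#include <cassert>

// Before: a switch on the loop-invariant `cond` sits at the top of the loop.
// Cases 13, 42, and the default leave the loop; cases 0, 1, 2 keep looping.
int run_before(int cond, int trips, int& work) {
  for (int i = 0; i < trips; ++i) {
    switch (cond) {
      case 13: return 1;  // exit 1
      case 42: return 3;  // exit 3
      case 0:
      case 1:
      case 2:
        break;            // stay in the loop
      default: return 2;  // exit 2, reached through the default successor
    }
    ++work;
  }
  return 0;
}

// After (hand-written, illustrative): one switch before the loop handles
// every exiting case at once; the loop body no longer tests `cond`.
int run_after(int cond, int trips, int& work) {
  if (trips <= 0)
    return 0;  // the loop is never entered, so the switch never runs
  switch (cond) {
    case 13: return 1;
    case 42: return 3;
    case 0:
    case 1:
    case 2:
      break;   // fall through to the loop
    default: return 2;
  }
  for (int i = 0; i < trips; ++i)
    ++work;
  return 0;
}

// Small self-check over exiting, looping, and default values of `cond`.
void check_switch_equivalence() {
  const int conds[] = {0, 1, 2, 13, 42, 7};
  for (int cond : conds) {
    int work_before = 0, work_after = 0;
    int r_before = run_before(cond, 4, work_before);
    int r_after = run_after(cond, 4, work_after);
    assert(r_before == r_after);
    assert(work_before == work_after);
  }
}
```

Keeping the hoisted test as one switch means a backend is free to lower it as a jump table, rather than depending on a later pass to rebuild a switch from individual branches.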
Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.
Finally, this pass makes far fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.
Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.
I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.
Differential Revision: https://reviews.llvm.org/D32409
llvm-svn: 301576
2017-04-27 20:45:20 +02:00
; RUN: opt -passes='loop(unswitch),verify<loops>' -S < %s | FileCheck %s
2018-12-04 15:23:37 +01:00
; RUN: opt -enable-mssa-loop-dependency=true -verify-memoryssa -passes='loop(unswitch),verify<loops>' -S < %s | FileCheck %s
declare void @some_func() noreturn
[PM/Unswitch] Fix a collection of closely related issues with trivial
switch unswitching.
The core problem was the way we handled unswitching trivial exit
edges through the default successor of a switch. For some reason
I thought the right way to do this was to add a block containing
unreachable and point the default successor at this block. In
retrospect, this has an amazing number of problems.
The first issue is the one that this pass has always worked around -- we
have to *detect* such edges and avoid unswitching them again. This
seemed pretty easy really. You just look for an edge to a block
containing unreachable. However, this pattern is woefully unsound. So
many things can break it. The amazing thing is that I found a test case
where *simple-loop-unswitch itself* breaks this! When we do
a *non-trivial* unswitch of a switch we will end up splitting this exit
edge. The result will be a default successor that is an exit and
terminates in ... a perfectly normal branch. So the first test case that
I started trying to fix is added to the nontrivial test cases. This is
a ridiculous example that did just amazing things previously. With just
unswitch, it would create 10+ copies of this stuff stamped out. But if
you combine it *just right* with a bunch of other passes (like
simplify-cfg, loop rotate, and some LICM) you can get it to do this
infinitely. Or at least, I never got it to finish. =[
This, in turn, uncovered another related issue. When we are manipulating
these switches after doing a trivial unswitch we never correctly updated
PHI nodes to reflect our edits. As soon as I started changing how these
edges were managed, it became obvious there were more issues that
I couldn't realistically leave unaddressed, so I wrote more test cases
around PHI updates here and ensured all of that works now.
And this, in turn, required some adjustment to how we collect and manage
the exit successor when it is the default successor. That showed a clear
bug where we failed to include it in our search for the outer-most loop
reached by an unswitched exit edge. This was actually already tested and
the test case didn't work. I (wrongly) thought that was due to SCEV
failing to analyze the switch. In fact, it was just a simple bug in the
code that skipped the default successor. While changing this, I handled
it correctly and have updated the test to reflect that we now get
precise SCEV analysis of trip counts for the outer loop in one of these
cases.
llvm-svn: 336646
2018-07-10 10:36:05 +02:00
declare void @sink(i32)
2018-07-07 03:12:56 +02:00
declare i1 @cond()
declare i32 @cond.i32()
; This test contains two trivial unswitch conditions in one loop.
; LoopUnswitch pass should be able to unswitch the second one
; after unswitching the first one.
define i32 @test1(i32* %var, i1 %cond1, i1 %cond2) {
; CHECK-LABEL: @test1(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 %{{.*}}, label %entry.split, label %loop_exit.split
;
; CHECK: entry.split:
; CHECK-NEXT: br i1 %{{.*}}, label %entry.split.split, label %loop_exit
;
; CHECK: entry.split.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  br i1 %cond1, label %continue, label %loop_exit ; first trivial condition
; CHECK: loop_begin:
; CHECK-NEXT: br label %continue

continue:
  %var_val = load i32, i32* %var
  br i1 %cond2, label %do_something, label %loop_exit ; second trivial condition
; CHECK: continue:
; CHECK-NEXT: load
; CHECK-NEXT: br label %do_something

do_something:
  call void @some_func() noreturn nounwind
  br label %loop_begin
; CHECK: do_something:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_begin

loop_exit:
  ret i32 0
; CHECK: loop_exit:
; CHECK-NEXT: br label %loop_exit.split
;
; CHECK: loop_exit.split:
; CHECK-NEXT: ret
}

; Test for two trivially unswitchable switches.
define i32 @test3(i32* %var, i32 %cond1, i32 %cond2) {
; CHECK-LABEL: @test3(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: switch i32 %cond1, label %entry.split [
; CHECK-NEXT: i32 0, label %loop_exit1
; CHECK-NEXT: ]
;
; CHECK: entry.split:
; CHECK-NEXT: switch i32 %cond2, label %loop_exit2 [
; CHECK-NEXT: i32 42, label %loop_exit2
; CHECK-NEXT: i32 0, label %entry.split.split
; CHECK-NEXT: ]
;
; CHECK: entry.split.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  switch i32 %cond1, label %continue [
    i32 0, label %loop_exit1
  ]
; CHECK: loop_begin:
; CHECK-NEXT: br label %continue

continue:
  %var_val = load i32, i32* %var
  switch i32 %cond2, label %loop_exit2 [
    i32 0, label %do_something
    i32 42, label %loop_exit2
  ]
; CHECK: continue:
; CHECK-NEXT: load
; CHECK-NEXT: br label %do_something

do_something:
  call void @some_func() noreturn nounwind
  br label %loop_begin
; CHECK: do_something:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_begin

loop_exit1:
  ret i32 0
; CHECK: loop_exit1:
; CHECK-NEXT: ret

loop_exit2:
  ret i32 0
; CHECK: loop_exit2:
; CHECK-NEXT: ret
;
; We shouldn't have any unreachable blocks here because the unswitched switches
; turn into branches instead.
; CHECK-NOT: unreachable
}

; Test for a trivially unswitchable switch with multiple exiting cases and
; multiple looping cases.
define i32 @test4(i32* %var, i32 %cond1, i32 %cond2) {
; CHECK-LABEL: @test4(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: switch i32 %cond2, label %loop_exit2 [
; CHECK-NEXT: i32 13, label %loop_exit1
; CHECK-NEXT: i32 42, label %loop_exit3
; CHECK-NEXT: i32 0, label %entry.split
; CHECK-NEXT: i32 1, label %entry.split
; CHECK-NEXT: i32 2, label %entry.split
; CHECK-NEXT: ]
;
; CHECK: entry.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  %var_val = load i32, i32* %var
  switch i32 %cond2, label %loop_exit2 [
    i32 0, label %loop0
    i32 1, label %loop1
    i32 13, label %loop_exit1
    i32 2, label %loop2
    i32 42, label %loop_exit3
  ]
; CHECK: loop_begin:
; CHECK-NEXT: load
; CHECK-NEXT: switch i32 %cond2, label %loop2 [
; CHECK-NEXT: i32 0, label %loop0
; CHECK-NEXT: i32 1, label %loop1
; CHECK-NEXT: ]

loop0:
  call void @some_func() noreturn nounwind
  br label %loop_latch
; CHECK: loop0:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_latch

loop1:
  call void @some_func() noreturn nounwind
  br label %loop_latch
; CHECK: loop1:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_latch

loop2:
  call void @some_func() noreturn nounwind
  br label %loop_latch
; CHECK: loop2:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_latch

loop_latch:
  br label %loop_begin
; CHECK: loop_latch:
; CHECK-NEXT: br label %loop_begin

loop_exit1:
  ret i32 0
; CHECK: loop_exit1:
; CHECK-NEXT: ret

loop_exit2:
  ret i32 0
; CHECK: loop_exit2:
; CHECK-NEXT: ret

loop_exit3:
  ret i32 0
; CHECK: loop_exit3:
; CHECK-NEXT: ret
}
2017-05-12 04:19:59 +02:00
; This test contains a trivially unswitchable branch with an LCSSA phi node in
; a loop exit block.
define i32 @test5(i1 %cond1, i32 %x, i32 %y) {
; CHECK-LABEL: @test5(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 %{{.*}}, label %entry.split, label %loop_exit
;
; CHECK: entry.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  br i1 %cond1, label %latch, label %loop_exit
; CHECK: loop_begin:
; CHECK-NEXT: br label %latch

latch:
  call void @some_func() noreturn nounwind
  br label %loop_begin
; CHECK: latch:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_begin

loop_exit:
  %result1 = phi i32 [ %x, %loop_begin ]
  %result2 = phi i32 [ %y, %loop_begin ]
  %result = add i32 %result1, %result2
  ret i32 %result
; CHECK: loop_exit:
; CHECK-NEXT: %[[R1:.*]] = phi i32 [ %x, %entry ]
; CHECK-NEXT: %[[R2:.*]] = phi i32 [ %y, %entry ]
; CHECK-NEXT: %[[R:.*]] = add i32 %[[R1]], %[[R2]]
; CHECK-NEXT: ret i32 %[[R]]
}

; This test contains a trivially unswitchable branch with a real phi node in LCSSA
; position in a shared exit block where a different path through the loop
; produces a non-invariant input to the PHI node.
define i32 @test6(i32* %var, i1 %cond1, i1 %cond2, i32 %x, i32 %y) {
; CHECK-LABEL: @test6(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 %{{.*}}, label %entry.split, label %loop_exit.split
;
; CHECK: entry.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  br i1 %cond1, label %continue, label %loop_exit
; CHECK: loop_begin:
; CHECK-NEXT: br label %continue

continue:
  %var_val = load i32, i32* %var
  br i1 %cond2, label %latch, label %loop_exit
; CHECK: continue:
; CHECK-NEXT: load
; CHECK-NEXT: br i1 %cond2, label %latch, label %loop_exit

latch:
  call void @some_func() noreturn nounwind
  br label %loop_begin
; CHECK: latch:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_begin

loop_exit:
  %result1 = phi i32 [ %x, %loop_begin ], [ %var_val, %continue ]
  %result2 = phi i32 [ %var_val, %continue ], [ %y, %loop_begin ]
  %result = add i32 %result1, %result2
  ret i32 %result
; CHECK: loop_exit:
; CHECK-NEXT: %[[R1:.*]] = phi i32 [ %var_val, %continue ]
; CHECK-NEXT: %[[R2:.*]] = phi i32 [ %var_val, %continue ]
; CHECK-NEXT: br label %loop_exit.split
;
; CHECK: loop_exit.split:
; CHECK-NEXT: %[[R1S:.*]] = phi i32 [ %x, %entry ], [ %[[R1]], %loop_exit ]
; CHECK-NEXT: %[[R2S:.*]] = phi i32 [ %y, %entry ], [ %[[R2]], %loop_exit ]
; CHECK-NEXT: %[[R:.*]] = add i32 %[[R1S]], %[[R2S]]
; CHECK-NEXT: ret i32 %[[R]]
}

; This test contains a trivially unswitchable switch with an LCSSA phi node in
; a loop exit block.
define i32 @test7(i32 %cond1, i32 %x, i32 %y) {
; CHECK-LABEL: @test7(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: switch i32 %cond1, label %entry.split [
; CHECK-NEXT: i32 0, label %loop_exit
; CHECK-NEXT: i32 1, label %loop_exit
; CHECK-NEXT: ]
;
; CHECK: entry.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  switch i32 %cond1, label %latch [
    i32 0, label %loop_exit
    i32 1, label %loop_exit
  ]
; CHECK: loop_begin:
; CHECK-NEXT: br label %latch

latch:
  call void @some_func() noreturn nounwind
  br label %loop_begin
; CHECK: latch:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_begin

loop_exit:
  %result1 = phi i32 [ %x, %loop_begin ], [ %x, %loop_begin ]
  %result2 = phi i32 [ %y, %loop_begin ], [ %y, %loop_begin ]
  %result = add i32 %result1, %result2
  ret i32 %result
; CHECK: loop_exit:
; CHECK-NEXT: %[[R1:.*]] = phi i32 [ %x, %entry ], [ %x, %entry ]
; CHECK-NEXT: %[[R2:.*]] = phi i32 [ %y, %entry ], [ %y, %entry ]
; CHECK-NEXT: %[[R:.*]] = add i32 %[[R1]], %[[R2]]
; CHECK-NEXT: ret i32 %[[R]]
}

; This test contains a trivially unswitchable switch with a real phi node in
; LCSSA position in a shared exit block where a different path through the loop
; produces a non-invariant input to the PHI node.
define i32 @test8(i32* %var, i32 %cond1, i32 %cond2, i32 %x, i32 %y) {
; CHECK-LABEL: @test8(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: switch i32 %cond1, label %entry.split [
; CHECK-NEXT: i32 0, label %loop_exit.split
; CHECK-NEXT: i32 1, label %loop_exit2
; CHECK-NEXT: i32 2, label %loop_exit.split
; CHECK-NEXT: ]
;
; CHECK: entry.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  switch i32 %cond1, label %continue [
    i32 0, label %loop_exit
    i32 1, label %loop_exit2
    i32 2, label %loop_exit
  ]
; CHECK: loop_begin:
; CHECK-NEXT: br label %continue

continue:
  %var_val = load i32, i32* %var
  switch i32 %cond2, label %latch [
    i32 0, label %loop_exit
  ]
; CHECK: continue:
; CHECK-NEXT: load
; CHECK-NEXT: switch i32 %cond2, label %latch [
; CHECK-NEXT: i32 0, label %loop_exit
; CHECK-NEXT: ]

latch:
  call void @some_func() noreturn nounwind
  br label %loop_begin
; CHECK: latch:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_begin

loop_exit:
  %result1.1 = phi i32 [ %x, %loop_begin ], [ %x, %loop_begin ], [ %var_val, %continue ]
  %result1.2 = phi i32 [ %var_val, %continue ], [ %y, %loop_begin ], [ %y, %loop_begin ]
  %result1 = add i32 %result1.1, %result1.2
  ret i32 %result1
; CHECK: loop_exit:
; CHECK-NEXT: %[[R1:.*]] = phi i32 [ %var_val, %continue ]
; CHECK-NEXT: %[[R2:.*]] = phi i32 [ %var_val, %continue ]
; CHECK-NEXT: br label %loop_exit.split
;
; CHECK: loop_exit.split:
; CHECK-NEXT: %[[R1S:.*]] = phi i32 [ %x, %entry ], [ %x, %entry ], [ %[[R1]], %loop_exit ]
; CHECK-NEXT: %[[R2S:.*]] = phi i32 [ %y, %entry ], [ %y, %entry ], [ %[[R2]], %loop_exit ]
; CHECK-NEXT: %[[R:.*]] = add i32 %[[R1S]], %[[R2S]]
; CHECK-NEXT: ret i32 %[[R]]

loop_exit2:
  %result2.1 = phi i32 [ %x, %loop_begin ]
  %result2.2 = phi i32 [ %y, %loop_begin ]
  %result2 = add i32 %result2.1, %result2.2
  ret i32 %result2
; CHECK: loop_exit2:
; CHECK-NEXT: %[[R1:.*]] = phi i32 [ %x, %entry ]
; CHECK-NEXT: %[[R2:.*]] = phi i32 [ %y, %entry ]
; CHECK-NEXT: %[[R:.*]] = add i32 %[[R1]], %[[R2]]
; CHECK-NEXT: ret i32 %[[R]]
}
[PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch
pass.
The original logic only considered direct successors of the hoisted
domtree nodes, but that isn't really enough. If there are other basic
blocks that are completely within the subtree, their successors could
just as easily be impacted by the hoisting.
The more I think about it, the more I think the correct update here is
to hoist every block on the dominance frontier which has an idom in the
chain we hoist across. However, this is subtle enough that I'd
definitely appreciate some more eyes on it.
Sadly, if this is the correct algorithm, it requires computing a (highly
localized) dominance frontier. I've done this in the simplest (i.e., least
code) way I could come up with, but that may be too naive. Suggestions are
welcome here; dominance update algorithms are not an area I've studied
much, so I don't have strong opinions.
The good news is that, with this patch, turning on simple unswitch passes
the LLVM test suite for me with asserts enabled.
Differential Revision: https://reviews.llvm.org/D32740
llvm-svn: 303843
2017-05-25 08:33:36 +02:00
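The update described in the commit message above (hoist every block on the dominance frontier whose idom lies in the chain being hoisted across) can be illustrated with a minimal sketch. This is written in Python purely for illustration and is not the code in the pass, which operates on LLVM's dominator tree. The names `dominates`, `local_dominance_frontier`, `blocks_to_hoist`, the `succs` successor map, the `idom` map, and the example CFG are all assumptions made for this sketch.

```python
def dominates(idom, a, b):
    """Return True if block `a` dominates block `b` (a block dominates itself)."""
    while b is not None:
        if b == a:
            return True
        b = idom.get(b)  # walk up the dominator tree; the entry block maps to None
    return False


def local_dominance_frontier(succs, idom, root):
    """Blocks one CFG edge away from root's dominator subtree that root does
    not strictly dominate. This is a naive computation that visits every block."""
    subtree = [b for b in succs if dominates(idom, root, b)]
    frontier = set()
    for block in subtree:
        for succ in succs[block]:
            # `root` itself can be on its own frontier (for example via a back edge).
            if succ == root or not dominates(idom, root, succ):
                frontier.add(succ)
    return frontier


def blocks_to_hoist(succs, idom, root, chain):
    """Frontier blocks whose immediate dominator is in the (hypothetical)
    chain of blocks being hoisted across."""
    return {b for b in local_dominance_frontier(succs, idom, root)
            if idom.get(b) in chain}


# Hypothetical diamond-shaped CFG: entry -> {a, b} -> merge.
EXAMPLE_SUCCS = {"entry": ["a", "b"], "a": ["merge"], "b": ["merge"], "merge": []}
EXAMPLE_IDOM = {"entry": None, "a": "entry", "b": "entry", "merge": "entry"}
```

In the example CFG, `merge` is reached from `a` but is not dominated by it, so it lies on the frontier of `a`. The sketch favors brevity over speed, in the same spirit as the "simplest way" mentioned in the commit message.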
; This test, extracted from the LLVM test suite, has an interesting dominator
; tree to update as there are edges to sibling domtree nodes within child
; domtree nodes of the unswitched node.
define void @xgets(i1 %cond1, i1* %cond2.ptr) {
; CHECK-LABEL: @xgets(
entry:
  br label %for.cond.preheader
; CHECK: entry:
; CHECK-NEXT: br label %for.cond.preheader

for.cond.preheader:
  br label %for.cond
; CHECK: for.cond.preheader:
; CHECK-NEXT: br i1 %cond1, label %for.cond.preheader.split, label %if.end17.thread.loopexit
;
; CHECK: for.cond.preheader.split:
; CHECK-NEXT: br label %for.cond

for.cond:
  br i1 %cond1, label %land.lhs.true, label %if.end17.thread.loopexit
; CHECK: for.cond:
; CHECK-NEXT: br label %land.lhs.true

land.lhs.true:
  br label %if.then20
; CHECK: land.lhs.true:
; CHECK-NEXT: br label %if.then20

if.then20:
  %cond2 = load volatile i1, i1* %cond2.ptr
  br i1 %cond2, label %if.then23, label %if.else
; CHECK: if.then20:
; CHECK-NEXT: %[[COND2:.*]] = load volatile i1, i1* %cond2.ptr
; CHECK-NEXT: br i1 %[[COND2]], label %if.then23, label %if.else

if.else:
  br label %for.cond
; CHECK: if.else:
; CHECK-NEXT: br label %for.cond

if.end17.thread.loopexit:
  br label %if.end17.thread
; CHECK: if.end17.thread.loopexit:
; CHECK-NEXT: br label %if.end17.thread

if.end17.thread:
  br label %cleanup
; CHECK: if.end17.thread:
; CHECK-NEXT: br label %cleanup

if.then23:
  br label %cleanup
; CHECK: if.then23:
; CHECK-NEXT: br label %cleanup

cleanup:
  ret void
; CHECK: cleanup:
; CHECK-NEXT: ret void
}
2018-06-20 20:57:07 +02:00
define i32 @test_partial_condition_unswitch_and(i32* %var, i1 %cond1, i1 %cond2) {
; CHECK-LABEL: @test_partial_condition_unswitch_and(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 %cond1, label %entry.split, label %loop_exit.split
;
; CHECK: entry.split:
; CHECK-NEXT: br i1 %cond2, label %entry.split.split, label %loop_exit
;
; CHECK: entry.split.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  br i1 %cond1, label %continue, label %loop_exit
; CHECK: loop_begin:
; CHECK-NEXT: br label %continue

continue:
  %var_val = load i32, i32* %var
  %var_cond = trunc i32 %var_val to i1
  %cond_and = and i1 %var_cond, %cond2
  br i1 %cond_and, label %do_something, label %loop_exit
; CHECK: continue:
; CHECK-NEXT: %[[VAR:.*]] = load i32
; CHECK-NEXT: %[[VAR_COND:.*]] = trunc i32 %[[VAR]] to i1
; CHECK-NEXT: %[[COND_AND:.*]] = and i1 %[[VAR_COND]], true
; CHECK-NEXT: br i1 %[[COND_AND]], label %do_something, label %loop_exit

do_something:
  call void @some_func() noreturn nounwind
  br label %loop_begin
; CHECK: do_something:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_begin

loop_exit:
  ret i32 0
; CHECK: loop_exit:
; CHECK-NEXT: br label %loop_exit.split
;
; CHECK: loop_exit.split:
; CHECK-NEXT: ret
}

define i32 @test_partial_condition_unswitch_or(i32* %var, i1 %cond1, i1 %cond2, i1 %cond3, i1 %cond4, i1 %cond5, i1 %cond6) {
; CHECK-LABEL: @test_partial_condition_unswitch_or(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: %[[INV_OR1:.*]] = or i1 %cond4, %cond2
; CHECK-NEXT: %[[INV_OR2:.*]] = or i1 %[[INV_OR1]], %cond3
; CHECK-NEXT: %[[INV_OR3:.*]] = or i1 %[[INV_OR2]], %cond1
; CHECK-NEXT: br i1 %[[INV_OR3]], label %loop_exit.split, label %entry.split
;
; CHECK: entry.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  %var_val = load i32, i32* %var
  %var_cond = trunc i32 %var_val to i1
  %cond_or1 = or i1 %var_cond, %cond1
  %cond_or2 = or i1 %cond2, %cond3
  %cond_or3 = or i1 %cond_or1, %cond_or2
  %cond_xor1 = xor i1 %cond5, %var_cond
  %cond_and1 = and i1 %cond6, %var_cond
  %cond_or4 = or i1 %cond_xor1, %cond_and1
  %cond_or5 = or i1 %cond_or3, %cond_or4
  %cond_or6 = or i1 %cond_or5, %cond4
  br i1 %cond_or6, label %loop_exit, label %do_something
; CHECK: loop_begin:
; CHECK-NEXT: %[[VAR:.*]] = load i32
; CHECK-NEXT: %[[VAR_COND:.*]] = trunc i32 %[[VAR]] to i1
; CHECK-NEXT: %[[COND_OR1:.*]] = or i1 %[[VAR_COND]], false
; CHECK-NEXT: %[[COND_OR2:.*]] = or i1 false, false
; CHECK-NEXT: %[[COND_OR3:.*]] = or i1 %[[COND_OR1]], %[[COND_OR2]]
; CHECK-NEXT: %[[COND_XOR:.*]] = xor i1 %cond5, %[[VAR_COND]]
; CHECK-NEXT: %[[COND_AND:.*]] = and i1 %cond6, %[[VAR_COND]]
; CHECK-NEXT: %[[COND_OR4:.*]] = or i1 %[[COND_XOR]], %[[COND_AND]]
; CHECK-NEXT: %[[COND_OR5:.*]] = or i1 %[[COND_OR3]], %[[COND_OR4]]
; CHECK-NEXT: %[[COND_OR6:.*]] = or i1 %[[COND_OR5]], false
; CHECK-NEXT: br i1 %[[COND_OR6]], label %loop_exit, label %do_something

do_something:
  call void @some_func() noreturn nounwind
  br label %loop_begin
; CHECK: do_something:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_begin

loop_exit:
  ret i32 0
; CHECK: loop_exit.split:
; CHECK-NEXT: ret
}

define i32 @test_partial_condition_unswitch_with_lcssa_phi1(i32* %var, i1 %cond, i32 %x) {
; CHECK-LABEL: @test_partial_condition_unswitch_with_lcssa_phi1(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 %cond, label %entry.split, label %loop_exit.split
;
; CHECK: entry.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  %var_val = load i32, i32* %var
  %var_cond = trunc i32 %var_val to i1
  %cond_and = and i1 %var_cond, %cond
  br i1 %cond_and, label %do_something, label %loop_exit
; CHECK: loop_begin:
; CHECK-NEXT: %[[VAR:.*]] = load i32
; CHECK-NEXT: %[[VAR_COND:.*]] = trunc i32 %[[VAR]] to i1
; CHECK-NEXT: %[[COND_AND:.*]] = and i1 %[[VAR_COND]], true
; CHECK-NEXT: br i1 %[[COND_AND]], label %do_something, label %loop_exit

do_something:
  call void @some_func() noreturn nounwind
  br label %loop_begin
; CHECK: do_something:
; CHECK-NEXT: call
; CHECK-NEXT: br label %loop_begin

loop_exit:
  %x.lcssa = phi i32 [ %x, %loop_begin ]
  ret i32 %x.lcssa
; CHECK: loop_exit:
; CHECK-NEXT: %[[LCSSA:.*]] = phi i32 [ %x, %loop_begin ]
; CHECK-NEXT: br label %loop_exit.split
;
; CHECK: loop_exit.split:
; CHECK-NEXT: %[[LCSSA_SPLIT:.*]] = phi i32 [ %x, %entry ], [ %[[LCSSA]], %loop_exit ]
; CHECK-NEXT: ret i32 %[[LCSSA_SPLIT]]
}

define i32 @test_partial_condition_unswitch_with_lcssa_phi2(i32* %var, i1 %cond, i32 %x, i32 %y) {
; CHECK-LABEL: @test_partial_condition_unswitch_with_lcssa_phi2(
entry:
  br label %loop_begin
; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 %cond, label %entry.split, label %loop_exit.split
;
; CHECK: entry.split:
; CHECK-NEXT: br label %loop_begin

loop_begin:
  %var_val = load i32, i32* %var
  %var_cond = trunc i32 %var_val to i1
  %cond_and = and i1 %var_cond, %cond
  br i1 %cond_and, label %do_something, label %loop_exit
; CHECK: loop_begin:
; CHECK-NEXT: %[[VAR:.*]] = load i32
; CHECK-NEXT: %[[VAR_COND:.*]] = trunc i32 %[[VAR]] to i1
; CHECK-NEXT: %[[COND_AND:.*]] = and i1 %[[VAR_COND]], true
; CHECK-NEXT: br i1 %[[COND_AND]], label %do_something, label %loop_exit

do_something:
  call void @some_func() noreturn nounwind
  br i1 %var_cond, label %loop_begin, label %loop_exit
; CHECK: do_something:
; CHECK-NEXT: call
; CHECK-NEXT: br i1 %[[VAR_COND]], label %loop_begin, label %loop_exit

loop_exit:
  %xy.lcssa = phi i32 [ %x, %loop_begin ], [ %y, %do_something ]
  ret i32 %xy.lcssa
; CHECK: loop_exit:
; CHECK-NEXT: %[[LCSSA:.*]] = phi i32 [ %x, %loop_begin ], [ %y, %do_something ]
; CHECK-NEXT: br label %loop_exit.split
;
; CHECK: loop_exit.split:
; CHECK-NEXT: %[[LCSSA_SPLIT:.*]] = phi i32 [ %x, %entry ], [ %[[LCSSA]], %loop_exit ]
; CHECK-NEXT: ret i32 %[[LCSSA_SPLIT]]
}
2018-07-07 03:12:56 +02:00
|
|
|
|
|
|
|
; Unswitch will not actually change the loop nest from:
|
|
|
|
; A < B < C
|
|
|
|
define void @hoist_inner_loop0() {
|
|
|
|
; CHECK-LABEL: define void @hoist_inner_loop0(
|
|
|
|
entry:
|
|
|
|
br label %a.header
|
|
|
|
; CHECK: entry:
|
|
|
|
; CHECK-NEXT: br label %a.header
|
|
|
|
|
|
|
|
a.header:
|
|
|
|
br label %b.header
|
|
|
|
; CHECK: a.header:
|
|
|
|
; CHECK-NEXT: br label %b.header
|
|
|
|
|
|
|
|
b.header:
|
|
|
|
%v1 = call i1 @cond()
|
|
|
|
br label %c.header
|
|
|
|
; CHECK: b.header:
|
|
|
|
; CHECK-NEXT: %v1 = call i1 @cond()
|
|
|
|
; CHECK-NEXT: br i1 %v1, label %[[B_LATCH_SPLIT:.*]], label %[[B_HEADER_SPLIT:.*]]
|
|
|
|
;
|
|
|
|
; CHECK: [[B_HEADER_SPLIT]]:
|
|
|
|
; CHECK-NEXT: br label %c.header
|
|
|
|
|
|
|
|
c.header:
|
|
|
|
br i1 %v1, label %b.latch, label %c.latch
|
|
|
|
; CHECK: c.header:
|
|
|
|
; CHECK-NEXT: br label %c.latch
|
|
|
|
|
|
|
|
c.latch:
|
|
|
|
%v2 = call i1 @cond()
|
|
|
|
br i1 %v2, label %c.header, label %b.latch
|
|
|
|
; CHECK: c.latch:
|
|
|
|
; CHECK-NEXT: %v2 = call i1 @cond()
|
|
|
|
; CHECK-NEXT: br i1 %v2, label %c.header, label %b.latch
|
|
|
|
|
|
|
|
b.latch:
|
|
|
|
%v3 = call i1 @cond()
|
|
|
|
br i1 %v3, label %b.header, label %a.latch
|
|
|
|
; CHECK: b.latch:
|
|
|
|
; CHECK-NEXT: br label %[[B_LATCH_SPLIT]]
|
|
|
|
;
|
|
|
|
; CHECK: [[B_LATCH_SPLIT]]:
|
|
|
|
; CHECK-NEXT: %v3 = call i1 @cond()
|
|
|
|
; CHECK-NEXT: br i1 %v3, label %b.header, label %a.latch
|
|
|
|
|
|
|
|
a.latch:
|
|
|
|
br label %a.header
|
|
|
|
; CHECK: a.latch:
|
|
|
|
; CHECK-NEXT: br label %a.header
|
|
|
|
|
|
|
|
exit:
|
|
|
|
ret void
|
|
|
|
; CHECK: exit:
|
|
|
|
; CHECK-NEXT: ret void
|
|
|
|
}
|
|
|
|
|
|
|
|
; Unswitch will transform the loop nest from:
|
|
|
|
; A < B < C
|
|
|
|
; into
|
|
|
|
; A < (B, C)
|
|
|
|
define void @hoist_inner_loop1(i32* %ptr) {
|
|
|
|
; CHECK-LABEL: define void @hoist_inner_loop1(
|
|
|
|
entry:
|
|
|
|
br label %a.header
|
|
|
|
; CHECK: entry:
|
|
|
|
; CHECK-NEXT: br label %a.header
|
|
|
|
|
|
|
|
a.header:
|
|
|
|
%x.a = load i32, i32* %ptr
|
|
|
|
br label %b.header
|
|
|
|
; CHECK: a.header:
|
|
|
|
; CHECK-NEXT: %x.a = load i32, i32* %ptr
|
|
|
|
; CHECK-NEXT: br label %b.header
|
|
|
|
|
|
|
|
b.header:
|
|
|
|
%x.b = load i32, i32* %ptr
|
|
|
|
%v1 = call i1 @cond()
|
|
|
|
br label %c.header
|
|
|
|
; CHECK: b.header:
|
|
|
|
; CHECK-NEXT: %x.b = load i32, i32* %ptr
|
|
|
|
; CHECK-NEXT: %v1 = call i1 @cond()
|
|
|
|
; CHECK-NEXT: br i1 %v1, label %b.latch, label %[[B_HEADER_SPLIT:.*]]
|
|
|
|
;
|
|
|
|
; CHECK: [[B_HEADER_SPLIT]]:
|
|
|
|
; CHECK-NEXT: %[[X_B_LCSSA:.*]] = phi i32 [ %x.b, %b.header ]
|
|
|
|
; CHECK-NEXT: br label %c.header
|
|
|
|
|
|
|
|
c.header:
|
|
|
|
br i1 %v1, label %b.latch, label %c.latch
|
|
|
|
; CHECK: c.header:
|
|
|
|
; CHECK-NEXT: br label %c.latch
|
|
|
|
|
|
|
|
c.latch:
|
|
|
|
; Use values from other loops to check LCSSA form.
|
|
|
|
store i32 %x.a, i32* %ptr
|
|
|
|
store i32 %x.b, i32* %ptr
|
|
|
|
%v2 = call i1 @cond()
|
|
|
|
br i1 %v2, label %c.header, label %a.exit.c
|
|
|
|
; CHECK: c.latch:
|
|
|
|
; CHECK-NEXT: store i32 %x.a, i32* %ptr
|
|
|
|
; CHECK-NEXT: store i32 %[[X_B_LCSSA]], i32* %ptr
|
|
|
|
; CHECK-NEXT: %v2 = call i1 @cond()
|
|
|
|
; CHECK-NEXT: br i1 %v2, label %c.header, label %a.exit.c
|
|
|
|
|
|
|
|
b.latch:
|
|
|
|
%v3 = call i1 @cond()
|
|
|
|
br i1 %v3, label %b.header, label %a.exit.b
|
|
|
|
; CHECK: b.latch:
|
|
|
|
; CHECK-NEXT: %v3 = call i1 @cond()
|
|
|
|
; CHECK-NEXT: br i1 %v3, label %b.header, label %a.exit.b
|
|
|
|
|
|
|
|
a.exit.c:
|
|
|
|
br label %a.latch
|
|
|
|
; CHECK: a.exit.c
|
|
|
|
; CHECK-NEXT: br label %a.latch
|
|
|
|
|
|
|
|
a.exit.b:
|
|
|
|
br label %a.latch
|
|
|
|
; CHECK: a.exit.b:
|
|
|
|
; CHECK-NEXT: br label %a.latch
|
|
|
|
|
|
|
|
a.latch:
|
|
|
|
br label %a.header
|
|
|
|
; CHECK: a.latch:
|
|
|
|
; CHECK-NEXT: br label %a.header
|
|
|
|
|
|
|
|
exit:
|
|
|
|
ret void
|
|
|
|
; CHECK: exit:
|
|
|
|
; CHECK-NEXT: ret void
|
|
|
|
}
|
|
|
|
|
|
|
|
; Unswitch will transform the loop nest from:
|
|
|
|
; A < B < C
|
|
|
|
; into
|
|
|
|
; (A < B), C
|
|
|
|
define void @hoist_inner_loop2(i32* %ptr) {
|
|
|
|
; CHECK-LABEL: define void @hoist_inner_loop2(
|
|
|
|
entry:
|
|
|
|
br label %a.header
|
|
|
|
; CHECK: entry:
|
|
|
|
; CHECK-NEXT: br label %a.header
|
|
|
|
|
|
|
|
a.header:
|
|
|
|
%x.a = load i32, i32* %ptr
|
|
|
|
br label %b.header
|
|
|
|
; CHECK: a.header:
|
|
|
|
; CHECK-NEXT: %x.a = load i32, i32* %ptr
|
|
|
|
; CHECK-NEXT: br label %b.header
|
|
|
|
|
|
|
|
b.header:
|
|
|
|
%x.b = load i32, i32* %ptr
|
|
|
|
%v1 = call i1 @cond()
|
|
|
|
br label %c.header
|
|
|
|
; CHECK: b.header:
|
|
|
|
; CHECK-NEXT: %x.b = load i32, i32* %ptr
|
|
|
|
; CHECK-NEXT: %v1 = call i1 @cond()
|
|
|
|
; CHECK-NEXT: br i1 %v1, label %b.latch, label %[[B_HEADER_SPLIT:.*]]
|
|
|
|
;
|
|
|
|
; CHECK: [[B_HEADER_SPLIT]]:
|
|
|
|
; CHECK-NEXT: %[[X_A_LCSSA:.*]] = phi i32 [ %x.a, %b.header ]
|
|
|
|
; CHECK-NEXT: %[[X_B_LCSSA:.*]] = phi i32 [ %x.b, %b.header ]
|
|
|
|
; CHECK-NEXT: br label %c.header
|
|
|
|
|
|
|
|
c.header:
|
|
|
|
br i1 %v1, label %b.latch, label %c.latch
|
|
|
|
; CHECK: c.header:
|
|
|
|
; CHECK-NEXT: br label %c.latch
|
|
|
|
|
|
|
|
c.latch:
|
|
|
|
; Use values from other loops to check LCSSA form.
|
|
|
|
store i32 %x.a, i32* %ptr
|
|
|
|
store i32 %x.b, i32* %ptr
|
|
|
|
%v2 = call i1 @cond()
|
|
|
|
br i1 %v2, label %c.header, label %exit
|
|
|
|
; CHECK: c.latch:
|
|
|
|
; CHECK-NEXT: store i32 %[[X_A_LCSSA]], i32* %ptr
|
|
|
|
; CHECK-NEXT: store i32 %[[X_B_LCSSA]], i32* %ptr
|
|
|
|
; CHECK-NEXT: %v2 = call i1 @cond()
|
|
|
|
; CHECK-NEXT: br i1 %v2, label %c.header, label %exit
|
|
|
|
|
|
|
|
b.latch:
|
|
|
|
%v3 = call i1 @cond()
|
|
|
|
br i1 %v3, label %b.header, label %a.latch
|
|
|
|
; CHECK: b.latch:
|
|
|
|
; CHECK-NEXT: %v3 = call i1 @cond()
|
|
|
|
; CHECK-NEXT: br i1 %v3, label %b.header, label %a.latch
|
|
|
|
|
|
|
|
a.latch:
|
|
|
|
br label %a.header
|
|
|
|
; CHECK: a.latch:
|
|
|
|
; CHECK-NEXT: br label %a.header
|
|
|
|
|
|
|
|
exit:
|
|
|
|
ret void
|
|
|
|
; CHECK: exit:
|
|
|
|
; CHECK-NEXT: ret void
|
|
|
|
}
|
|
|
|
|
|
|
|
; Same as @hoist_inner_loop2 but with a nested loop inside the hoisted loop.
|
|
|
|
; Unswitch will transform the loop nest from:
|
|
|
|
; A < B < C < D
|
|
|
|
; into
|
|
|
|
; (A < B), (C < D)
|
|
|
|
define void @hoist_inner_loop3(i32* %ptr) {
|
|
|
|
; CHECK-LABEL: define void @hoist_inner_loop3(
|
|
|
|
entry:
|
|
|
|
br label %a.header
|
|
|
|
; CHECK: entry:
|
|
|
|
; CHECK-NEXT: br label %a.header

a.header:
  %x.a = load i32, i32* %ptr
  br label %b.header
; CHECK: a.header:
; CHECK-NEXT: %x.a = load i32, i32* %ptr
; CHECK-NEXT: br label %b.header

b.header:
  %x.b = load i32, i32* %ptr
  %v1 = call i1 @cond()
  br label %c.header
; CHECK: b.header:
; CHECK-NEXT: %x.b = load i32, i32* %ptr
; CHECK-NEXT: %v1 = call i1 @cond()
; CHECK-NEXT: br i1 %v1, label %b.latch, label %[[B_HEADER_SPLIT:.*]]
;
; CHECK: [[B_HEADER_SPLIT]]:
; CHECK-NEXT: %[[X_A_LCSSA:.*]] = phi i32 [ %x.a, %b.header ]
; CHECK-NEXT: %[[X_B_LCSSA:.*]] = phi i32 [ %x.b, %b.header ]
; CHECK-NEXT: br label %c.header

c.header:
  br i1 %v1, label %b.latch, label %c.body
; CHECK: c.header:
; CHECK-NEXT: br label %c.body

c.body:
  %x.c = load i32, i32* %ptr
  br label %d.header
; CHECK: c.body:
; CHECK-NEXT: %x.c = load i32, i32* %ptr
; CHECK-NEXT: br label %d.header

d.header:
  ; Use values from other loops to check LCSSA form.
  store i32 %x.a, i32* %ptr
  store i32 %x.b, i32* %ptr
  store i32 %x.c, i32* %ptr
  %v2 = call i1 @cond()
  br i1 %v2, label %d.header, label %c.latch
; CHECK: d.header:
; CHECK-NEXT: store i32 %[[X_A_LCSSA]], i32* %ptr
; CHECK-NEXT: store i32 %[[X_B_LCSSA]], i32* %ptr
; CHECK-NEXT: store i32 %x.c, i32* %ptr
; CHECK-NEXT: %v2 = call i1 @cond()
; CHECK-NEXT: br i1 %v2, label %d.header, label %c.latch

c.latch:
  %v3 = call i1 @cond()
  br i1 %v3, label %c.header, label %exit
; CHECK: c.latch:
; CHECK-NEXT: %v3 = call i1 @cond()
; CHECK-NEXT: br i1 %v3, label %c.header, label %exit

b.latch:
  %v4 = call i1 @cond()
  br i1 %v4, label %b.header, label %a.latch
; CHECK: b.latch:
; CHECK-NEXT: %v4 = call i1 @cond()
; CHECK-NEXT: br i1 %v4, label %b.header, label %a.latch

a.latch:
  br label %a.header
; CHECK: a.latch:
; CHECK-NEXT: br label %a.header

exit:
  ret void
; CHECK: exit:
; CHECK-NEXT: ret void
}

; This test is designed to exercise checking multiple remaining exits from the
; loop being unswitched.
; Unswitch will transform the loop nest from:
; A < B < C < D
; into
; A < B < (C, D)
define void @hoist_inner_loop4() {
; CHECK-LABEL: define void @hoist_inner_loop4(
entry:
  br label %a.header
; CHECK: entry:
; CHECK-NEXT: br label %a.header

a.header:
  br label %b.header
; CHECK: a.header:
; CHECK-NEXT: br label %b.header

b.header:
  br label %c.header
; CHECK: b.header:
; CHECK-NEXT: br label %c.header

c.header:
  %v1 = call i1 @cond()
  br label %d.header
; CHECK: c.header:
; CHECK-NEXT: %v1 = call i1 @cond()
; CHECK-NEXT: br i1 %v1, label %[[C_HEADER_SPLIT:.*]], label %c.latch
;
; CHECK: [[C_HEADER_SPLIT]]:
; CHECK-NEXT: br label %d.header

d.header:
  br i1 %v1, label %d.exiting1, label %c.latch
; CHECK: d.header:
; CHECK-NEXT: br label %d.exiting1

d.exiting1:
  %v2 = call i1 @cond()
  br i1 %v2, label %d.exiting2, label %a.latch
; CHECK: d.exiting1:
; CHECK-NEXT: %v2 = call i1 @cond()
; CHECK-NEXT: br i1 %v2, label %d.exiting2, label %a.latch

d.exiting2:
  %v3 = call i1 @cond()
  br i1 %v3, label %d.exiting3, label %loopexit.d
; CHECK: d.exiting2:
; CHECK-NEXT: %v3 = call i1 @cond()
; CHECK-NEXT: br i1 %v3, label %d.exiting3, label %loopexit.d

d.exiting3:
  %v4 = call i1 @cond()
  br i1 %v4, label %d.latch, label %b.latch
; CHECK: d.exiting3:
; CHECK-NEXT: %v4 = call i1 @cond()
; CHECK-NEXT: br i1 %v4, label %d.latch, label %b.latch

d.latch:
  br label %d.header
; CHECK: d.latch:
; CHECK-NEXT: br label %d.header

c.latch:
  %v5 = call i1 @cond()
  br i1 %v5, label %c.header, label %loopexit.c
; CHECK: c.latch:
; CHECK-NEXT: %v5 = call i1 @cond()
; CHECK-NEXT: br i1 %v5, label %c.header, label %loopexit.c

b.latch:
  br label %b.header
; CHECK: b.latch:
; CHECK-NEXT: br label %b.header

a.latch:
  br label %a.header
; CHECK: a.latch:
; CHECK-NEXT: br label %a.header

loopexit.d:
  br label %exit
; CHECK: loopexit.d:
; CHECK-NEXT: br label %exit

loopexit.c:
  br label %exit
; CHECK: loopexit.c:
; CHECK-NEXT: br label %exit

exit:
  ret void
; CHECK: exit:
; CHECK-NEXT: ret void
}

; Unswitch will transform the loop nest from:
; A < B < C < D
; into
; A < ((B < C), D)
define void @hoist_inner_loop5(i32* %ptr) {
; CHECK-LABEL: define void @hoist_inner_loop5(
entry:
  br label %a.header
; CHECK: entry:
; CHECK-NEXT: br label %a.header

a.header:
  %x.a = load i32, i32* %ptr
  br label %b.header
; CHECK: a.header:
; CHECK-NEXT: %x.a = load i32, i32* %ptr
; CHECK-NEXT: br label %b.header

b.header:
  %x.b = load i32, i32* %ptr
  br label %c.header
; CHECK: b.header:
; CHECK-NEXT: %x.b = load i32, i32* %ptr
; CHECK-NEXT: br label %c.header

c.header:
  %x.c = load i32, i32* %ptr
  %v1 = call i1 @cond()
  br label %d.header
; CHECK: c.header:
; CHECK-NEXT: %x.c = load i32, i32* %ptr
; CHECK-NEXT: %v1 = call i1 @cond()
; CHECK-NEXT: br i1 %v1, label %c.latch, label %[[C_HEADER_SPLIT:.*]]
;
; CHECK: [[C_HEADER_SPLIT]]:
; CHECK-NEXT: %[[X_B_LCSSA:.*]] = phi i32 [ %x.b, %c.header ]
; CHECK-NEXT: %[[X_C_LCSSA:.*]] = phi i32 [ %x.c, %c.header ]
; CHECK-NEXT: br label %d.header

d.header:
  br i1 %v1, label %c.latch, label %d.latch
; CHECK: d.header:
; CHECK-NEXT: br label %d.latch

d.latch:
  ; Use values from other loops to check LCSSA form.
  store i32 %x.a, i32* %ptr
  store i32 %x.b, i32* %ptr
  store i32 %x.c, i32* %ptr
  %v2 = call i1 @cond()
  br i1 %v2, label %d.header, label %a.latch
; CHECK: d.latch:
; CHECK-NEXT: store i32 %x.a, i32* %ptr
; CHECK-NEXT: store i32 %[[X_B_LCSSA]], i32* %ptr
; CHECK-NEXT: store i32 %[[X_C_LCSSA]], i32* %ptr
; CHECK-NEXT: %v2 = call i1 @cond()
; CHECK-NEXT: br i1 %v2, label %d.header, label %a.latch

c.latch:
  %v3 = call i1 @cond()
  br i1 %v3, label %c.header, label %b.latch
; CHECK: c.latch:
; CHECK-NEXT: %v3 = call i1 @cond()
; CHECK-NEXT: br i1 %v3, label %c.header, label %b.latch

b.latch:
  br label %b.header
; CHECK: b.latch:
; CHECK-NEXT: br label %b.header

a.latch:
  br label %a.header
; CHECK: a.latch:
; CHECK-NEXT: br label %a.header

exit:
  ret void
; CHECK: exit:
; CHECK-NEXT: ret void
}

; Same as `@hoist_inner_loop2` but using a switch.
; Unswitch will transform the loop nest from:
; A < B < C
; into
; (A < B), C
define void @hoist_inner_loop_switch(i32* %ptr) {
; CHECK-LABEL: define void @hoist_inner_loop_switch(
entry:
  br label %a.header
; CHECK: entry:
; CHECK-NEXT: br label %a.header

a.header:
  %x.a = load i32, i32* %ptr
  br label %b.header
; CHECK: a.header:
; CHECK-NEXT: %x.a = load i32, i32* %ptr
; CHECK-NEXT: br label %b.header

b.header:
  %x.b = load i32, i32* %ptr
  %v1 = call i32 @cond.i32()
  br label %c.header
; CHECK: b.header:
; CHECK-NEXT: %x.b = load i32, i32* %ptr
; CHECK-NEXT: %v1 = call i32 @cond.i32()
; CHECK-NEXT: switch i32 %v1, label %[[B_HEADER_SPLIT:.*]] [
; CHECK-NEXT: i32 1, label %b.latch
; CHECK-NEXT: i32 2, label %b.latch
; CHECK-NEXT: i32 3, label %b.latch
; CHECK-NEXT: ]
;
; CHECK: [[B_HEADER_SPLIT]]:
; CHECK-NEXT: %[[X_A_LCSSA:.*]] = phi i32 [ %x.a, %b.header ]
; CHECK-NEXT: %[[X_B_LCSSA:.*]] = phi i32 [ %x.b, %b.header ]
; CHECK-NEXT: br label %c.header

c.header:
  switch i32 %v1, label %c.latch [
    i32 1, label %b.latch
    i32 2, label %b.latch
    i32 3, label %b.latch
  ]
; CHECK: c.header:
; CHECK-NEXT: br label %c.latch

c.latch:
  ; Use values from other loops to check LCSSA form.
  store i32 %x.a, i32* %ptr
  store i32 %x.b, i32* %ptr
  %v2 = call i1 @cond()
  br i1 %v2, label %c.header, label %exit
; CHECK: c.latch:
; CHECK-NEXT: store i32 %[[X_A_LCSSA]], i32* %ptr
; CHECK-NEXT: store i32 %[[X_B_LCSSA]], i32* %ptr
; CHECK-NEXT: %v2 = call i1 @cond()
; CHECK-NEXT: br i1 %v2, label %c.header, label %exit

b.latch:
  %v3 = call i1 @cond()
  br i1 %v3, label %b.header, label %a.latch
; CHECK: b.latch:
; CHECK-NEXT: %v3 = call i1 @cond()
; CHECK-NEXT: br i1 %v3, label %b.header, label %a.latch

a.latch:
  br label %a.header
; CHECK: a.latch:
; CHECK-NEXT: br label %a.header

exit:
  ret void
; CHECK: exit:
; CHECK-NEXT: ret void
}

[PM/Unswitch] Fix a collection of closely related issues with trivial
switch unswitching.
The core problem was the way we handled unswitching trivial exit
edges through the default successor of a switch. For some reason
I thought the right way to do this was to add a block containing
unreachable and point the default successor at this block. In
retrospect, this has an amazing number of problems.
The first issue is the one that this pass has always worked around -- we
have to *detect* such edges and avoid unswitching them again. This
seemed pretty easy really. You just look for an edge to a block
containing unreachable. However, this pattern is woefully unsound. So
many things can break it. The amazing thing is that I found a test case
where *simple-loop-unswitch itself* breaks this! When we do
a *non-trivial* unswitch of a switch we will end up splitting this exit
edge. The result will be a default successor that is an exit and
terminates in ... a perfectly normal branch. So the first test case that
I started trying to fix is added to the nontrivial test cases. This is
a ridiculous example that did just amazing things previously. With just
unswitch, it would create 10+ copies of this stuff stamped out. But if
you combine it *just right* with a bunch of other passes (like
simplify-cfg, loop rotate, and some LICM) you can get it to do this
infinitely. Or at least, I never got it to finish. =[
This, in turn, uncovered another related issue. When we are manipulating
these switches after doing a trivial unswitch we never correctly updated
PHI nodes to reflect our edits. As soon as I started changing how these
edges were managed, it became obvious there were more issues that
I couldn't realistically leave unaddressed, so I wrote more test cases
around PHI updates here and ensured all of that works now.
And this, in turn, required some adjustment to how we collect and manage
the exit successor when it is the default successor. That showed a clear
bug where we failed to include it in our search for the outer-most loop
reached by an unswitched exit edge. This was actually already tested and
the test case didn't work. I (wrongly) thought that was due to SCEV
failing to analyze the switch. In fact, it was just a simple bug in the
code that skipped the default successor. While changing this, I handled
it correctly and have updated the test to reflect that we now get
precise SCEV analysis of trip counts for the outer loop in one of these
cases.
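
As a rough illustration of the old default-successor handling described in this message, the IR below sketches the shape it left behind: the hoisted switch takes the default exit edge, while the in-loop switch keeps its cases and has its default successor pointed at a block containing only unreachable. The function name, the block names (in particular %dead.default), and the @sink declaration are hypothetical and chosen for illustration; this is not output captured from the pass.

```llvm
; Hypothetical sketch only; names are illustrative, not taken from the pass.
declare void @sink(i32)

define void @old_default_unswitch_shape(i32* %var, i32 %cond) {
entry:
  ; Hoisted copy of the switch: the default exit edge now leaves from here.
  switch i32 %cond, label %loopexit [
    i32 0, label %entry.split
    i32 1, label %entry.split
  ]

entry.split:
  br label %header

header:
  %var_val = load i32, i32* %var
  ; The in-loop switch keeps its cases, but its default successor is
  ; redirected to a block that holds nothing but unreachable.
  switch i32 %cond, label %dead.default [
    i32 0, label %latch
    i32 1, label %latch
  ]

dead.default:
  unreachable

latch:
  call void @sink(i32 %var_val)
  br label %header

loopexit:
  ret void
}
```

A later check that looks for "an edge to a block containing unreachable" only works while %dead.default keeps exactly this form. Once another transform splits that edge or changes how the block terminates, the check no longer recognizes it, which appears to be the unsoundness the message describes.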
llvm-svn: 336646
2018-07-10 10:36:05 +02:00

define void @test_unswitch_to_common_succ_with_phis(i32* %var, i32 %cond) {
; CHECK-LABEL: @test_unswitch_to_common_succ_with_phis(
entry:
  br label %header
; CHECK-NEXT: entry:
; CHECK-NEXT: switch i32 %cond, label %loopexit1 [
; CHECK-NEXT: i32 13, label %loopexit2
; CHECK-NEXT: i32 0, label %entry.split
; CHECK-NEXT: i32 1, label %entry.split
; CHECK-NEXT: ]
;
; CHECK: entry.split:
; CHECK-NEXT: br label %header

header:
  %var_val = load i32, i32* %var
  switch i32 %cond, label %loopexit1 [
    i32 0, label %latch
    i32 1, label %latch
    i32 13, label %loopexit2
  ]
; CHECK: header:
; CHECK-NEXT: load
; CHECK-NEXT: br label %latch

latch:
  ; No-op PHI node to exercise weird PHI update scenarios.
  %phi = phi i32 [ %var_val, %header ], [ %var_val, %header ]
  call void @sink(i32 %phi)
  br label %header
; CHECK: latch:
; CHECK-NEXT: %[[PHI:.*]] = phi i32 [ %var_val, %header ]
; CHECK-NEXT: call void @sink(i32 %[[PHI]])
; CHECK-NEXT: br label %header

loopexit1:
  ret void
; CHECK: loopexit1:
; CHECK-NEXT: ret

loopexit2:
  ret void
; CHECK: loopexit2:
; CHECK-NEXT: ret
}

define void @test_unswitch_to_default_common_succ_with_phis(i32* %var, i32 %cond) {
; CHECK-LABEL: @test_unswitch_to_default_common_succ_with_phis(
entry:
  br label %header
; CHECK-NEXT: entry:
; CHECK-NEXT: switch i32 %cond, label %entry.split [
; CHECK-NEXT: i32 13, label %loopexit
; CHECK-NEXT: ]
;
; CHECK: entry.split:
; CHECK-NEXT: br label %header

header:
  %var_val = load i32, i32* %var
  switch i32 %cond, label %latch [
    i32 0, label %latch
    i32 1, label %latch
    i32 13, label %loopexit
  ]
; CHECK: header:
; CHECK-NEXT: load
; CHECK-NEXT: br label %latch

latch:
  ; No-op PHI node to exercise weird PHI update scenarios.
  %phi = phi i32 [ %var_val, %header ], [ %var_val, %header ], [ %var_val, %header ]
  call void @sink(i32 %phi)
  br label %header
; CHECK: latch:
; CHECK-NEXT: %[[PHI:.*]] = phi i32 [ %var_val, %header ]
; CHECK-NEXT: call void @sink(i32 %[[PHI]])
; CHECK-NEXT: br label %header

loopexit:
  ret void
; CHECK: loopexit:
; CHECK-NEXT: ret
}