Initial commit for the rewrite of the inline cost analysis to operate
on a per-callsite walk of the called function's instructions, in
breadth-first order over the potentially reachable set of basic blocks.
This is a major shift in how inline cost analysis works to improve the
accuracy and rationality of inlining decisions. A brief outline of the
algorithm this moves to:
- Build a simplification mapping based on the callsite arguments to the
function arguments.
- Push the entry block onto a worklist of potentially-live basic blocks.
- Pop the first block off of the *front* of the worklist (for
breadth-first ordering) and walk its instructions using a custom
InstVisitor.
- For each instruction's operands, re-map them based on the
simplification mappings available for the given callsite.
- Compute any simplification possible of the instruction after
re-mapping, and store that back into the simplification mapping.
- Compute any bonuses, costs, or other impacts of the instruction on the
cost metric.
- When the terminator is reached, replace any conditional value in the
terminator with any simplifications from the mapping we have, and add
any successors which are not proven to be dead from these
simplifications to the worklist.
- Pop the next block off of the front of the worklist, and repeat.
- As soon as the cost of inlining exceeds the threshold for the
callsite, stop analyzing the function in order to bound cost.
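The steps above can be sketched on a toy IR. This is only an illustration of the worklist walk, not the LLVM implementation: the block and instruction encoding, the `analyze` function, the `OPS` table, and the flat `inst_cost=5` are all assumptions made for the example.

```python
from collections import deque

# Hypothetical toy IR, not LLVM's data structures.
# A block is (instructions, terminator).
# An instruction is (dest, op, lhs, rhs); operands are value names (str) or ints.
# A terminator is ("br", cond, if_true, if_false), ("jmp", target) or ("ret", value).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "ugt": lambda a, b: a > b,
}


def analyze(blocks, entry, const_args, threshold, inst_cost=5):
    """Walk live blocks breadth-first for one callsite.

    Returns (cost, visited); cost is None if the threshold was exceeded.
    """
    simplified = dict(const_args)  # value name -> constant known at this callsite
    worklist = deque([entry])
    queued = {entry}
    visited = []
    cost = 0

    def remap(operand):
        return simplified.get(operand, operand)

    while worklist:
        name = worklist.popleft()  # pop from the front: breadth-first order
        visited.append(name)
        insts, term = blocks[name]
        for dest, op, lhs, rhs in insts:
            a, b = remap(lhs), remap(rhs)
            if isinstance(a, int) and isinstance(b, int):
                simplified[dest] = OPS[op](a, b)  # folds to a constant: no cost
            else:
                cost += inst_cost
                if cost > threshold:
                    return None, visited  # stop early to bound analysis time
        if term[0] == "br":
            cond = remap(term[1])
            if isinstance(cond, int):  # bool is an int subclass in Python
                successors = [term[2] if cond else term[3]]
            else:
                successors = [term[2], term[3]]
        elif term[0] == "jmp":
            successors = [term[1]]
        else:
            successors = []
        for succ in successors:
            if succ not in queued:
                queued.add(succ)
                worklist.append(succ)
    return cost, visited


# Shaped like callee22 from the regression test: a cheap path and a path of 8 adds.
CALLEE22 = {
    "entry": ([("icmp", "ugt", "x", 42)], ("br", "icmp", "bb.true", "bb.false")),
    "bb.true": (
        [("x1", "add", "x", 1)]
        + [(f"x{i + 1}", "add", f"x{i}", 1) for i in range(1, 8)],
        ("ret", "x8"),
    ),
    "bb.false": ([], ("ret", "x")),
}
```

With `x` known to be 6 at the callsite, `analyze(CALLEE22, "entry", {"x": 6}, 20)` never visits `bb.true`, so its eight adds contribute nothing to the cost. With no known arguments and the same threshold, the walk gives up partway through `bb.true`.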
The primary goal of this algorithm is to perfectly handle dead code
paths. We do not want any code in trivially dead code paths to impact
inlining decisions. The previous metric was *extremely* flawed here, and
would always subtract the average cost of two successors of
a conditional branch when it was proven to become an unconditional
branch at the callsite. There was no handling of wildly different costs
between the two successors, which would cause inlining when the path
actually taken was too large, and no inlining when the path actually
taken was trivially simple. There was also no handling of the code
*path*, only the immediate successors. These problems vanish completely
now. See the added regression tests for the shiny new features -- we
skip recursive function calls, SROA-killing instructions, and high cost
complex CFG structures when dead at the callsite being analyzed.
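A small numeric illustration of why averaging the two successors misleads. The numbers and both helper functions are hypothetical; the only thing taken from the description above is that the old metric subtracted the average successor cost.

```python
def old_style_estimate(total_cost, succ_costs):
    """Assumed shape of the old adjustment: subtract the average successor cost."""
    return total_cost - sum(succ_costs) / len(succ_costs)


def per_path_cost(shared_cost, succ_costs, taken):
    """Cost when only the successor actually taken at the callsite is counted."""
    return shared_cost + succ_costs[taken]


SHARED = 10           # hypothetical cost of the code before the branch
SUCCESSORS = (5, 95)  # hypothetical costs of the two successors
TOTAL = SHARED + sum(SUCCESSORS)
```

Here the averaged estimate is 60 regardless of which way the branch folds, while the path actually taken costs either 15 or 105. Against a threshold of 50 the average rejects a callsite that only needs the cheap path; against a threshold of 75 it accepts one that needs the expensive path.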
Switching to this algorithm required refactoring the inline cost
interface to accept the actual threshold rather than simply returning
a single cost. The resulting interface is pretty bad, and I'm planning
to do lots of interface cleanup after this patch.
Several other refactorings fell out of this, but I've tried to minimize
them for this patch. =/ There is still more cleanup that can be done
here. Please point out anything that you see in review.
I've worked really hard to try to mirror at least the spirit of all of
the previous heuristics in the new model. It's not clear that they are
all correct any more, but I wanted to minimize the change in this single
patch, it's already a bit ridiculous. One heuristic that is *not* yet
mirrored is to allow inlining of functions with a dynamic alloca *if*
the caller has a dynamic alloca. I will add this back, but I think the
most reasonable way requires changes to the inliner itself rather than
just the cost metric, and so I've deferred this for a subsequent patch.
The test case is XFAIL-ed until then.
As mentioned in the review mail, this seems to make Clang run about 1%
to 2% faster in -O0, but makes its binary size grow by just under 4%.
I've looked into the 4% growth, and it can be fixed, but requires
changes to other parts of the inliner.
llvm-svn: 153812
2012-03-31 14:42:41 +02:00
; RUN: opt < %s -inline -inline-threshold=20 -S | FileCheck %s

define internal i32 @callee1(i32 %A, i32 %B) {
  %C = sdiv i32 %A, %B
  ret i32 %C
}

define i32 @caller1() {
; CHECK-LABEL: define i32 @caller1(
; CHECK-NEXT: ret i32 3

  %X = call i32 @callee1( i32 10, i32 3 )
  ret i32 %X
}

define i32 @caller2() {
; Check that we can constant-prop through instructions after inlining callee21
; to get constants in the inlined callsite to callee22.
; FIXME: Currently, the threshold is fixed at 20 because we don't perform
; *recursive* cost analysis to realize that the nested call site will definitely
; inline and be cheap. We should eventually do that and lower the threshold here
; to 1.
;
; CHECK-LABEL: @caller2(
; CHECK-NOT: call void @callee2
; CHECK: ret

  %x = call i32 @callee21(i32 42, i32 48)
  ret i32 %x
}

define i32 @callee21(i32 %x, i32 %y) {
  %sub = sub i32 %y, %x
  %result = call i32 @callee22(i32 %sub)
  ret i32 %result
}

declare i8* @getptr()

define i32 @callee22(i32 %x) {
  %icmp = icmp ugt i32 %x, 42
  br i1 %icmp, label %bb.true, label %bb.false
bb.true:
  ; This block mustn't be counted in the inline cost.
  %x1 = add i32 %x, 1
  %x2 = add i32 %x1, 1
  %x3 = add i32 %x2, 1
  %x4 = add i32 %x3, 1
  %x5 = add i32 %x4, 1
  %x6 = add i32 %x5, 1
  %x7 = add i32 %x6, 1
  %x8 = add i32 %x7, 1
  ret i32 %x8
bb.false:
  ret i32 %x
}

define i32 @caller3() {
; Check that even if the expensive path is hidden behind several basic blocks,
; it doesn't count toward the inline cost when constant-prop proves those paths
; dead.
;
; CHECK-LABEL: @caller3(
; CHECK-NOT: call
; CHECK: ret i32 6

entry:
  %x = call i32 @callee3(i32 42, i32 48)
  ret i32 %x
}

define i32 @callee3(i32 %x, i32 %y) {
  %sub = sub i32 %y, %x
  %icmp = icmp ugt i32 %sub, 42
  br i1 %icmp, label %bb.true, label %bb.false

bb.true:
  %icmp2 = icmp ult i32 %sub, 64
  br i1 %icmp2, label %bb.true.true, label %bb.true.false

bb.true.true:
  ; This block mustn't be counted in the inline cost.
  %x1 = add i32 %x, 1
  %x2 = add i32 %x1, 1
  %x3 = add i32 %x2, 1
  %x4 = add i32 %x3, 1
  %x5 = add i32 %x4, 1
  %x6 = add i32 %x5, 1
  %x7 = add i32 %x6, 1
  %x8 = add i32 %x7, 1
  br label %bb.merge

bb.true.false:
  ; This block mustn't be counted in the inline cost.
  %y1 = add i32 %y, 1
  %y2 = add i32 %y1, 1
  %y3 = add i32 %y2, 1
  %y4 = add i32 %y3, 1
  %y5 = add i32 %y4, 1
  %y6 = add i32 %y5, 1
  %y7 = add i32 %y6, 1
  %y8 = add i32 %y7, 1
  br label %bb.merge

bb.merge:
  %result = phi i32 [ %x8, %bb.true.true ], [ %y8, %bb.true.false ]
  ret i32 %result

bb.false:
  ret i32 %sub
}

declare {i8, i1} @llvm.uadd.with.overflow.i8(i8 %a, i8 %b)

define i8 @caller4(i8 %z) {
; Check that we can constant fold through intrinsics such as the
; overflow-detecting arithmetic intrinsics. These are particularly important
; as they are used heavily in standard library code and generic C++ code where
; the arguments are often constant but complete generality is required.
;
; CHECK-LABEL: @caller4(
; CHECK-NOT: call
; CHECK: ret i8 -1

entry:
  %x = call i8 @callee4(i8 254, i8 14, i8 %z)
  ret i8 %x
}

define i8 @callee4(i8 %x, i8 %y, i8 %z) {
  %uadd = call {i8, i1} @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
  %o = extractvalue {i8, i1} %uadd, 1
  br i1 %o, label %bb.true, label %bb.false

bb.true:
  ret i8 -1

bb.false:
  ; This block mustn't be counted in the inline cost.
  %z1 = add i8 %z, 1
  %z2 = add i8 %z1, 1
  %z3 = add i8 %z2, 1
  %z4 = add i8 %z3, 1
  %z5 = add i8 %z4, 1
  %z6 = add i8 %z5, 1
  %z7 = add i8 %z6, 1
  %z8 = add i8 %z7, 1
  ret i8 %z8
}

define i64 @caller5(i64 %y) {
; Check that we can round trip constants through various kinds of casts etc w/o
; losing track of the constant prop in the inline cost analysis.
;
; CHECK-LABEL: @caller5(
; CHECK-NOT: call
; CHECK: ret i64 -1

entry:
  %x = call i64 @callee5(i64 42, i64 %y)
  ret i64 %x
}

define i64 @callee5(i64 %x, i64 %y) {
  %inttoptr = inttoptr i64 %x to i8*
  %bitcast = bitcast i8* %inttoptr to i32*
  %ptrtoint = ptrtoint i32* %bitcast to i64
  %trunc = trunc i64 %ptrtoint to i32
  %zext = zext i32 %trunc to i64
  %cmp = icmp eq i64 %zext, 42
  br i1 %cmp, label %bb.true, label %bb.false

bb.true:
  ret i64 -1

bb.false:
  ; This block mustn't be counted in the inline cost.
  %y1 = add i64 %y, 1
  %y2 = add i64 %y1, 1
  %y3 = add i64 %y2, 1
  %y4 = add i64 %y3, 1
  %y5 = add i64 %y4, 1
  %y6 = add i64 %y5, 1
  %y7 = add i64 %y6, 1
  %y8 = add i64 %y7, 1
  ret i64 %y8
}

define float @caller6() {
; Check that we can constant-prop through fcmp instructions
;
; CHECK-LABEL: @caller6(
; CHECK-NOT: call
; CHECK: ret
  %x = call float @callee6(float 42.0)
  ret float %x
}

define float @callee6(float %x) {
  %icmp = fcmp ugt float %x, 42.0
  br i1 %icmp, label %bb.true, label %bb.false

bb.true:
  ; This block mustn't be counted in the inline cost.
  %x1 = fadd float %x, 1.0
  %x2 = fadd float %x1, 1.0
  %x3 = fadd float %x2, 1.0
  %x4 = fadd float %x3, 1.0
  %x5 = fadd float %x4, 1.0
  %x6 = fadd float %x5, 1.0
  %x7 = fadd float %x6, 1.0
  %x8 = fadd float %x7, 1.0
  ret float %x8

bb.false:
  ret float %x
}

define i32 @PR13412.main() {
; This is a somewhat complicated three layer subprogram that was reported to
; compute the wrong value for a branch due to assuming that an argument
; mid-inline couldn't be equal to another pointer.
;
; After inlining, the branch should point directly to the exit block, not to
; the intermediate block.
; CHECK: @PR13412.main
; CHECK: br i1 true, label %[[TRUE_DEST:.*]], label %[[FALSE_DEST:.*]]
; CHECK: [[FALSE_DEST]]:
; CHECK-NEXT: call void @PR13412.fail()
; CHECK: [[TRUE_DEST]]:
; CHECK-NEXT: ret i32 0

entry:
  %i1 = alloca i64
  store i64 0, i64* %i1
  %arraydecay = bitcast i64* %i1 to i32*
  %call = call i1 @PR13412.first(i32* %arraydecay, i32* %arraydecay)
  br i1 %call, label %cond.end, label %cond.false

cond.false:
  call void @PR13412.fail()
  br label %cond.end

cond.end:
  ret i32 0
}

define internal i1 @PR13412.first(i32* %a, i32* %b) {
entry:
  %call = call i32* @PR13412.second(i32* %a, i32* %b)
  %cmp = icmp eq i32* %call, %b
  ret i1 %cmp
}

declare void @PR13412.fail()

define internal i32* @PR13412.second(i32* %a, i32* %b) {
entry:
  %sub.ptr.lhs.cast = ptrtoint i32* %b to i64
  %sub.ptr.rhs.cast = ptrtoint i32* %a to i64
  %sub.ptr.sub = sub i64 %sub.ptr.lhs.cast, %sub.ptr.rhs.cast
  %sub.ptr.div = ashr exact i64 %sub.ptr.sub, 2
  %cmp = icmp ugt i64 %sub.ptr.div, 1
  br i1 %cmp, label %if.then, label %if.end3

if.then:
  %0 = load i32* %a
  %1 = load i32* %b
  %cmp1 = icmp eq i32 %0, %1
  br i1 %cmp1, label %return, label %if.end3

if.end3:
  br label %return

return:
  %retval.0 = phi i32* [ %b, %if.end3 ], [ %a, %if.then ]
  ret i32* %retval.0
}