1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00
llvm-mirror/lib/Transforms/IPO
Chandler Carruth cd6d2689a0 Make a seemingly tiny change to the inliner and fix the generated code
size bloat. Unfortunately, I expect this to disable the majority of the
benefit from r152737. I'm hopeful at least that it will fix PR12345. To
explain this requires... quite a bit of backstory I'm afraid.

TL;DR: The change in r152737 actually did The Wrong Thing for
linkonce-odr functions. This change makes it do the right thing. The
benefits we saw were simple luck, not any actual strategy. Benchmark
numbers after a mini-blog-post so that I've written down my thoughts on
why all of this works and doesn't work...

To understand what's going on here, you have to understand how the
"bottom-up" inliner actually works. There are two fundamental modes to
the inliner:

1) Standard fixed-cost bottom-up inlining. This is the mode we usually
   think about. It walks from the bottom of the CFG up to the top,
   looking at callsites, taking information about the callsite and the
   called function and computing th expected cost of inlining into that
   callsite. If the cost is under a fixed threshold, it inlines. It's
   a touch more complicated than that due to all the bonuses, weights,
   etc. Inlining the last callsite to an internal function gets higher
   weighth, etc. But essentially, this is the mode of operation.

2) Deferred bottom-up inlining (a term I just made up). This is the
   interesting mode for this patch an r152737. Initially, this works
   just like mode #1, but once we have the cost of inlining into the
   callsite, we don't just compare it with a fixed threshold. First, we
   check something else. Let's give some names to the entities at this
   point, or we'll end up hopelessly confused. We're considering
   inlining a function 'A' into its callsite within a function 'B'. We
   want to check whether 'B' has any callers, and whether it might be
   inlined into those callers. If so, we also check whether inlining 'A'
   into 'B' would block any of the opportunities for inlining 'B' into
   its callers. We take the sum of the costs of inlining 'B' into its
   callers where that inlining would be blocked by inlining 'A' into
   'B', and if that cost is less than the cost of inlining 'A' into 'B',
   then we skip inlining 'A' into 'B'.

Now, in order for #2 to make sense, we have to have some confidence that
we will actually have the opportunity to inline 'B' into its callers
when cheaper, *and* that we'll be able to revisit the decision and
inline 'A' into 'B' if that ever becomes the correct tradeoff. This
often isn't true for external functions -- we can see very few of their
callers, and we won't be able to re-consider inlining 'A' into 'B' if
'B' is external when we finally see more callers of 'B'. There are two
cases where we believe this to be true for C/C++ code: functions local
to a translation unit, and functions with an inline definition in every
translation unit which uses them. These are represented as internal
linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for
linkonce-odr in r152737.

Unfortunately, when I did that, I also introduced a subtle bug. There
was an implicit assumption that the last caller of the function within
the TU was the last caller of the function in the program. We want to
bonus the last caller of the function in the program by a huge amount
for inlining because inlining that callsite has very little cost.
Unfortunately, the last caller in the TU of a linkonce-odr function is
*not* the last caller in the program, and so we don't want to apply this
bonus. If we do, we can apply it to one callsite *per-TU*. Because of
the way deferred inlining works, when it sees this bonus applied to one
callsite in the TU for 'B', it decides that inlining 'B' is of the
*utmost* importance just so we can get that final bonus. It then
proceeds to essentially force deferred inlining regardless of the actual
cost tradeoff.

The result? PR12345: code bloat, code bloat, code bloat. Another result
is getting *damn* lucky on a few benchmarks, and the over-inlining
exposing critically important optimizations. I would very much like
a list of benchmarks that regress after this change goes in, with
bitcode before and after. This will help me greatly understand what
opportunities the current cost analysis is missing.

Initial benchmark numbers look very good. WebKit files that exhibited
the worst of PR12345 went from growing to shrinking compared to Clang
with r152737 reverted.

- Bootstrapped Clang is 3% smaller with this change.
- Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4%
  faster with this change.

Please let me know about any other performance impact you see. Thanks to
Nico for reporting and urging me to actually fix, Richard Smith, Duncan
Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues
today.

llvm-svn: 153506
2012-03-27 10:48:28 +00:00
..
ArgumentPromotion.cpp Update inter-procedural optimizations for atomic load/store. 2011-08-15 22:16:46 +00:00
CMakeLists.txt build/CMake: Finish removal of add_llvm_library_dependencies. 2011-11-29 19:25:30 +00:00
ConstantMerge.cpp Re-fix the issue Bill fixed in r147899 in a slightly different way, which doesn't abuse the semantics of linker_private. We don't really want to merge any string constant with a weak_odr global. 2012-01-11 22:06:46 +00:00
DeadArgumentElimination.cpp Remove all remaining uses of Value::getNameStr(). 2011-11-15 16:27:03 +00:00
ExtractGV.cpp
FunctionAttrs.cpp SCCCaptured is trivially false on entry to this loop and not modified inside it. 2012-01-05 22:21:45 +00:00
GlobalDCE.cpp
GlobalOpt.cpp Teach globalopt how to evaluate an invoke with a non-void return type. 2012-03-13 18:01:37 +00:00
InlineAlways.cpp Start removing the use of an ad-hoc 'never inline' set and instead 2012-03-16 06:10:13 +00:00
Inliner.cpp Make a seemingly tiny change to the inliner and fix the generated code 2012-03-27 10:48:28 +00:00
InlineSimple.cpp Rip out support for 'llvm.noinline'. This thing has a strange history... 2012-03-16 06:10:15 +00:00
Internalize.cpp
IPConstantPropagation.cpp
IPO.cpp C API functions must be able to see their extern "C" definitions, or it will be impossible to call them from C. 2011-08-19 01:36:54 +00:00
LLVMBuild.txt Add a basic-block autovectorization pass. 2012-02-01 03:51:43 +00:00
LoopExtractor.cpp Place the check for an exit landing pad where it will be run on both code paths through the if-then-else. 2011-09-20 22:27:16 +00:00
Makefile
MergeFunctions.cpp Update inter-procedural optimizations for atomic load/store. 2011-08-15 22:16:46 +00:00
PartialInlining.cpp
PassManagerBuilder.cpp add EP_OptimizerLast extension point 2012-03-23 23:22:59 +00:00
PruneEH.cpp [unwind removal] We no longer have 'unwind' instructions being generated, so 2012-02-06 21:16:41 +00:00
StripDeadPrototypes.cpp
StripSymbols.cpp