mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00
llvm-mirror/include/llvm
Adam Nemet 700b4d043d [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that do not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrates the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.
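
As a rough source-level picture of the loop shape described above, here is
a minimal sketch.  It is not the actual testcase; the function name `scan`
and the `on_eq_1` / `on_eq_2` callbacks are hypothetical stand-ins for
whatever the eq_1 and eq_2 blocks do.

```python
def scan(values, on_eq_1=lambda: None, on_eq_2=lambda: None):
    """Visit elements until one equals 3; return how many were visited."""
    visited = 0
    for v in values:      # loop header; reaching the end of the body is the back edge
        visited += 1
        if v == 1:        # check_1
            on_eq_1()     # eq_1 (hypothetical body)
        if v == 2:        # check_2
            on_eq_2()     # eq_2 (hypothetical body)
        if v == 3:        # check_3: the loop-exiting check
            break         # loop exit
    return visited
```

Once an element is known to equal 1, the checks against 2 and 3 cannot
succeed, which is the redundancy jump threading removes.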

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)
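
The numbers in the diagram can be reproduced with a small toy model.  This
is a sketch under assumptions read off the diagram, not BPI/BFI itself: the
loop is entered with frequency 1, each equality check is estimated at 1/2,
and the exit check at 1/16.  The function name is hypothetical.

```python
from fractions import Fraction

def loop_frequencies(entry_freq, p_eq, p_exit):
    """Steady-state frequencies for the three-check loop (toy model)."""
    # Each iteration leaves the loop with probability p_exit, so the
    # header executes entry_freq / p_exit times per loop entry.
    header = Fraction(entry_freq) / p_exit
    return {
        "check_1": header,
        "check_1->eq_1": header * p_eq,
        "check_1->check_2": header * (1 - p_eq),
        "check_3->exit": header * p_exit,
        "check_3->back": header * (1 - p_exit),
    }

freqs = loop_frequencies(1, Fraction(1, 2), Fraction(1, 16))
```

Because the header frequency is the entry frequency divided by the exit
probability, it grows without bound as the exit probability approaches 0.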

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_2 from check_3 and then from the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency, which in turn
gives the loop header the maximum frequency.
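
The two threading steps above can be replayed with a toy bookkeeping model.
This is a sketch chosen to reproduce the diagrams, not the pass's actual
update code: the helper `thread_through` is hypothetical, subtraction
saturates at 0, and an outgoing edge is assumed never to carry more than its
source block.

```python
def thread_through(blocks, edges, block, false_edge, threaded_freq):
    """Remove threaded_freq from `block` and from its false edge."""
    blocks[block] = max(0, blocks[block] - threaded_freq)
    edges[false_edge] = max(0, edges[false_edge] - threaded_freq)
    # Assumption: no outgoing edge may exceed the block's own frequency.
    for src, dst in list(edges):
        if src == block:
            edges[(src, dst)] = min(edges[(src, dst)], blocks[block])

blocks = {"check_1": 16, "check_2": 16, "check_3": 16}
edges = {
    ("check_1", "eq_1"): 8, ("check_1", "check_2"): 8,
    ("check_2", "eq_2"): 8, ("check_2", "check_3"): 8,
    ("check_3", "exit"): 1, ("check_3", "check_1"): 15,
}

# Step 1: eq_1 (8) no longer passes through check_2.
thread_through(blocks, edges, "check_2", ("check_2", "check_3"), 8)
# Step 2: eq_1 (8) and eq_2 (8) no longer pass through check_3.
thread_through(blocks, edges, "check_3", ("check_3", "check_1"), 8)
thread_through(blocks, edges, "check_3", ("check_3", "check_1"), 8)
```

After both steps check_3 and both of its outgoing edges, including the loop
exit, are left at 0, matching the last diagram.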

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what the policy is, but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.
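
The policy in 3 can be sketched as follows.  This only illustrates the
idea; the `Block` type and `write_back_weights` helper are hypothetical and
do not mirror the LLVM data structures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Block:
    name: str
    # None models a block whose branch carries no profile metadata.
    branch_weights: Optional[Tuple[int, ...]] = None

def write_back_weights(block, updated):
    """Store updated weights only if the block originally had profile data."""
    if block.branch_weights is None:
        # Estimated probabilities stay in the analysis, not in metadata.
        return False
    block.branch_weights = tuple(updated)
    return True

cold = Block("check_3")            # no weights collected
hot = Block("profiled", (10, 90))  # weights from a profile
wrote_cold = write_back_weights(cold, (0, 15))
wrote_hot = write_back_weights(hot, (30, 70))
```

With this guard the cold loop from the example never receives metadata
derived from static estimates.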

Differential Revision: https://reviews.llvm.org/D24118

llvm-svn: 280713
2016-09-06 16:08:33 +00:00
ADT Fix DensetSet::insert_as() for MSVC2015 (NFC) 2016-09-06 03:03:15 +00:00
Analysis Fix up comment from r280442, noticed by Justin. 2016-09-02 17:20:32 +00:00
AsmParser
Bitcode Constify some path in the bitcode writer (NFC) 2016-08-19 06:06:18 +00:00
CodeGen ADT: Do not inherit from std::iterator in ilist_iterator 2016-09-03 02:27:35 +00:00
Config Use posix_fallocate instead of ftruncate. 2016-07-19 20:19:56 +00:00
DebugInfo [codeview] Use the correct max CV record length of 0xFF00 2016-09-02 18:43:27 +00:00
ExecutionEngine [ORC] Clone module flags metadata into the globals module in the 2016-09-04 17:53:30 +00:00
IR DebugInfo: use strongly typed enum for debug info flags 2016-09-06 10:46:28 +00:00
IRReader
LibDriver
LineEditor Apply clang-tidy's misc-move-constructor-init throughout LLVM. 2016-05-27 14:27:24 +00:00
Linker Linker: teach the IR mover to return llvm::Error. 2016-05-27 05:21:35 +00:00
LTO [ThinLTO] Move loading of cache entry to client 2016-08-26 23:29:14 +00:00
MC (LLVM part) Implement MASM-flavor intel syntax behavior for inline MS asm block: 2016-09-02 23:15:29 +00:00
Object [COFFObjectFile] Ignore broken symbol table 2016-08-30 20:20:24 +00:00
ObjectYAML [macho2yaml] Don't write empty linkedit data 2016-08-17 21:46:04 +00:00
Option
Passes [PM] Significantly refactor the pass pipeline parsing to be easier to 2016-08-03 03:21:41 +00:00
ProfileData [Coverage] Make sorting criteria for CounterMappingRegions local. 2016-08-31 07:01:17 +00:00
Support [Support] - Fix possible crash in match() of llvm::Regex. 2016-09-02 08:44:46 +00:00
TableGen [TableGen] Autobrief-ize Record. NFC. 2016-07-14 14:53:14 +00:00
Target [Target] Remove the AvailableRegClasses vector from TargetLoweringBase. It was a private member with no code reading from it. 2016-09-05 06:43:00 +00:00
Transforms [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info 2016-09-06 16:08:33 +00:00
CMakeLists.txt
InitializePasses.h Add a counter-function insertion pass 2016-09-01 09:42:39 +00:00
LinkAllIR.h
LinkAllPasses.h Add a counter-function insertion pass 2016-09-01 09:42:39 +00:00
module.modulemap Update modulemap for Msf -> MSF rename. 2016-07-30 12:05:17 +00:00
module.modulemap.build
Pass.h Remove unused header. 2016-05-25 22:56:58 +00:00
PassAnalysisSupport.h Apply clang-tidy's modernize-loop-convert to most of lib/Transforms. 2016-06-26 12:28:59 +00:00
PassInfo.h
PassRegistry.h
PassSupport.h [LPM] Reinstate r271781 which reinstated r271652 to replace the 2016-06-04 19:57:55 +00:00