mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00
llvm-mirror/test/Transforms
Adam Nemet 700b4d043d [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that do not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrates the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)
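
The update in this first step can be sketched as follows.  This is a minimal Python model of the bookkeeping, not the pass's actual code: the dictionaries and the `thread_edge` helper are hypothetical names, clamping at zero is an assumption of the model, and the numbers are the ones from the first diagram.

```python
def thread_edge(block_freq, edge_freq, pred, bb, succ):
    """Model threading pred -> bb -> succ into a direct pred -> succ edge.

    The frequency that used to flow along pred -> bb is removed from bb
    and from the bb -> succ edge, then attached to the new pred -> succ
    edge.  Frequencies are clamped at zero (an assumption of this model).
    """
    moved = edge_freq.pop((pred, bb))
    block_freq[bb] = max(block_freq[bb] - moved, 0)
    edge_freq[(bb, succ)] = max(edge_freq[(bb, succ)] - moved, 0)
    edge_freq[(pred, succ)] = edge_freq.get((pred, succ), 0) + moved


# Estimated frequencies from the first diagram.
block_freq = {"check_1": 16, "eq_1": 8, "check_2": 16, "eq_2": 8, "check_3": 16}
edge_freq = {
    ("check_1", "eq_1"): 8,
    ("check_1", "check_2"): 8,
    ("eq_1", "check_2"): 8,
    ("check_2", "eq_2"): 8,
    ("check_2", "check_3"): 8,
    ("eq_2", "check_3"): 8,
    ("check_3", "exit"): 1,
    ("check_3", "check_1"): 15,
}

# Thread eq_1 -> check_2 to check_3.
thread_edge(block_freq, edge_freq, "eq_1", "check_2", "check_3")
print(block_freq["check_2"], edge_freq[("check_2", "check_3")])
```

In this model check_2 drops to 8 and its false edge to 0, while check_3 keeps its frequency of 16 because the flow from eq_1 still reaches it over the new edge.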

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_2 from check_3 and then from the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency, which in turn gives
the loop header maximum frequency.
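
To see why a zero exit weight inflates the header, note that a loop entered with frequency f whose exit is taken with probability p per iteration runs its header about f / p times.  A minimal sketch, assuming that simple model; `header_frequency` is a hypothetical helper, and `math.inf` stands in for whatever maximum a real analysis would saturate at.

```python
import math
from fractions import Fraction


def header_frequency(entry_freq, exit_weight, back_weight):
    """Estimate loop-header frequency from the exiting branch's weights."""
    if exit_weight == 0:
        # The loop never exits in this model, so the estimate is unbounded.
        return math.inf
    exit_prob = Fraction(exit_weight, exit_weight + back_weight)
    return entry_freq / exit_prob


print(header_frequency(1, 1, 15))  # weights 1 and 15 from the first diagram
print(header_frequency(1, 0, 15))  # exit weight rewritten to 0
```

With the original weights of 1 and 15 the estimate is 16, matching check_1 in the first diagram.  Once the exit weight becomes 0 the estimate has no finite value.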

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what the policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.
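
Option 3 can be sketched as a guard around the write-back.  This is an illustrative Python model under stated assumptions, not the patch itself: `Block`, `had_profile_weights` and `write_back_weights` are hypothetical names, and the weights of the profiled block are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Block:
    name: str
    # Branch weights attached as profile metadata.  None means the block
    # had no profile data and its probabilities are static estimates.
    weights: Optional[Tuple[int, int]] = None

    @property
    def had_profile_weights(self) -> bool:
        return self.weights is not None


def write_back_weights(block: Block, new_weights: Tuple[int, int]) -> bool:
    """Update metadata only for blocks that already carried profile weights."""
    if not block.had_profile_weights:
        return False  # estimates stay in the analysis, not in metadata
    block.weights = new_weights
    return True


cold = Block("check_3")                 # no weights collected inside the loop
hot = Block("entry", weights=(90, 10))  # hypothetical profiled block

print(write_back_weights(cold, (0, 15)), cold.weights)
print(write_back_weights(hot, (80, 20)), hot.weights)
```

The cold block keeps no metadata, so the inconsistent estimate never reaches the IR, while the profiled block is updated as before.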

Differential Revision: https://reviews.llvm.org/D24118

llvm-svn: 280713
2016-09-06 16:08:33 +00:00
ADCE
AddDiscriminators Do not assign new discriminator for all intrinsics. 2016-08-05 17:56:49 +00:00
AlignmentFromAssumptions
ArgumentPromotion
AtomicExpand
BBVectorize Revert -r278267 [ValueTracking] An improvement to IR ValueTracking on Non-negative Integers 2016-08-22 13:14:07 +00:00
BDCE
BranchFolding
CodeExtractor CodeExtractor : Add ability to preserve profile data. 2016-08-02 02:15:45 +00:00
CodeGenPrepare
ConstantHoisting This implements a more optimal algorithm for selecting a base constant in 2016-07-14 07:44:20 +00:00
ConstantMerge
ConstProp Don't remove side effecting instructions due to ConstantFoldInstruction 2016-07-22 04:54:44 +00:00
Coroutines [Coroutines] Part12: Handle alloca address-taken 2016-09-05 23:45:45 +00:00
CorrelatedValuePropagation CVP. Turn marking adds as no wrap (introduced by r278107) off by default 2016-08-18 16:08:35 +00:00
CountingFunctionInserter Add a counter-function insertion pass 2016-09-01 09:42:39 +00:00
CrossDSOCFI
DCE
DeadArgElim
DeadStoreElimination [DSE] Don't remove stores made live by a call which unwinds. 2016-08-12 01:09:53 +00:00
EarlyCSE [EarlyCSE] Optionally use MemorySSA. NFC. 2016-08-31 19:24:10 +00:00
EliminateAvailableExternally
Float2Int
ForcedFunctionAttrs
FunctionAttrs Forgot to add a test for r276008. 2016-07-20 04:13:05 +00:00
FunctionImport Don't import variadic functions 2016-08-11 22:13:57 +00:00
GCOVProfiling llvm/test/Transforms/GCOVProfiling/three-element-mdnode.ll: Use %/T instead of %T, not to emit backslashes. 2016-09-02 01:33:00 +00:00
GlobalDCE
GlobalMerge
GlobalOpt Revert "Revert "Invariant start/end intrinsics overloaded for address space"" 2016-08-13 23:31:24 +00:00
GuardWidening
GVN IntrArgMemOnly is only defined (and current AA machinery only sanely supports) pointer arguments, and these intrinsics have vector of pointer arguments. Remove ArgMemOnly until we either have the machinery, define a new attribute, or something similar 2016-08-30 19:58:48 +00:00
GVNHoist GVN-hoist: fix hoistingFromAllPaths for loops (PR29034) 2016-08-25 11:55:47 +00:00
IndVarSimplify Revert -r278269 [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative 2016-08-22 13:12:07 +00:00
InferFunctionAttrs Recommitting r275284: add support to inline __builtin_mempcpy 2016-07-29 18:23:18 +00:00
Inline Fix inliner funclet unwind memoization 2016-09-04 01:23:20 +00:00
InstCombine fix FileCheck variables for test added with r280677 2016-09-05 23:49:32 +00:00
InstMerge
InstSimplify [instsimplify] Fix incorrect folding of an ordered fcmp with a vector of all NaN. 2016-09-02 14:47:43 +00:00
Internalize
IPConstantProp
IRCE [IRCE] Create llvm::Loop instances for cloned out loops 2016-08-14 01:04:46 +00:00
JumpThreading [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info 2016-09-06 16:08:33 +00:00
LCSSA Revert "Revert r275883 and r275891. They seem to cause PR28608." 2016-07-20 01:55:27 +00:00
LICM
LoadCombine
LoadStoreVectorizer [LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148. 2016-08-30 23:53:59 +00:00
LoopDataPrefetch [PM] Port LoopDataPrefetch AArch64 tests to new pass manager 2016-08-22 12:59:58 +00:00
LoopDeletion [PM] Port Dead Loop Deletion Pass to the new PM 2016-07-14 18:28:29 +00:00
LoopDistribute [BPI] Add new LazyBPI analysis 2016-07-28 23:31:12 +00:00
LoopIdiom Target independent codesize heuristics for Loop Idiom Recognition 2016-08-11 18:28:33 +00:00
LoopInterchange
LoopLoadElim
LoopReroll [LoopReroll] Reroll loops with unordered atomic memory accesses 2016-07-19 00:23:54 +00:00
LoopRotate
LoopSimplify [LoopSimplify] Rebuild LCSSA for the inner loop after separating nested loops. 2016-08-09 22:44:56 +00:00
LoopSimplifyCFG
LoopStrengthReduce [LSR] Don't try and create post-inc expressions on non-rotated loops 2016-08-15 07:53:03 +00:00
LoopUnroll [LoopUnroll] Fix a PowerPC test broken by r277524. 2016-08-02 21:43:25 +00:00
LoopUnswitch
LoopVectorize [LV] Ensure reverse interleaved group GEPs remain uniform 2016-09-02 16:19:22 +00:00
LoopVersioning
LoopVersioningLICM [Loop Vectorizer] Fixed memory conflict checks. 2016-08-28 08:53:53 +00:00
LowerAtomic
LowerExpectIntrinsic [Profile] handle select instruction in 'expect' lowering 2016-09-02 22:03:40 +00:00
LowerGuardIntrinsic [PM] Port LowerGuardIntrinsic to the new PM. 2016-07-28 22:08:41 +00:00
LowerInvoke [PM] Port LowerInvoke to the new pass manager 2016-08-12 17:28:27 +00:00
LowerSwitch
LowerTypeTests [WebAssembly] Fix CFI index to account for padding nullptr function 2016-08-08 23:56:01 +00:00
Mem2Reg
MemCpyOpt [MemCpy] Add comments for r279769 2016-08-25 21:03:46 +00:00
MergeFunc
MetaRenamer
NameAnonFunctions
NaryReassociate [PM] Port NaryReassociate to the new PM 2016-07-21 22:28:52 +00:00
ObjCARC [Verifier] Resume instructions can only be in functions w/ a personality 2016-08-01 18:06:34 +00:00
PartiallyInlineLibCalls
PGOProfile [ThinLTO] Indirect call promotion fixes for promoted local functions 2016-08-29 22:46:56 +00:00
PhaseOrdering
PlaceSafepoints
PreISelIntrinsicLowering
PruneEH
Reassociate [Reassociate] Add test for PR28367. 2016-08-18 13:22:37 +00:00
Reg2Mem
RewriteStatepointsForGC [statepoints][experimental] Add support for live-in semantics of values in deopt bundles 2016-08-31 15:12:17 +00:00
SafeStack [safestack] Layout large allocas first to reduce fragmentation. 2016-08-02 23:21:30 +00:00
SampleProfile Fine tuning of sample profile propagation algorithm. 2016-08-12 16:22:12 +00:00
Scalarizer Scalarizer: Support scalarizing intrinsics 2016-07-25 20:02:54 +00:00
SCCP [SCCP] Don't delete side-effecting instructions 2016-08-24 18:10:21 +00:00
SeparateConstOffsetFromGEP [NVPTX] Enable the load-store vectorizer on nvptx. 2016-07-20 22:11:36 +00:00
SimplifyCFG [SimplifyCFG] Add test for sinking inline asm in if/else 2016-09-05 13:49:26 +00:00
Sink Add a testcase for r275581 2016-07-19 17:52:41 +00:00
SLPVectorizer [SLP] Avoid signed integer overflow 2016-08-23 20:48:50 +00:00
SpeculativeExecution [PM] Port SpeculativeExecution to the new PM 2016-08-01 21:48:33 +00:00
SROA [SROA] Fix crash with lifetime intrinsic partially covering alloca. 2016-08-08 01:30:53 +00:00
StraightLineStrengthReduce
StripDeadPrototypes
StripSymbols
StructurizeCFG StructurizeCFG: Fix inverting constantexpr conditions 2016-07-15 22:13:16 +00:00
TailCallElim
Util [MSSA] Fix PR28880 by fixing use optimizer's lower bound tracking behavior. 2016-08-08 04:44:53 +00:00
WholeProgramDevirt WholeProgramDevirt: generate more detailed and accurate remarks. 2016-08-11 19:09:02 +00:00