mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00
llvm-mirror/lib/Analysis
Michael Zolotukhin afd08c7313 [Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...
Summary:
...loop after the last iteration.

This is really hard to do correctly. The core problem is that we need to
model liveness through the induction PHIs from iteration to iteration in
order to get the correct results, and we need to correctly de-duplicate
the common subgraphs of instructions feeding some subset of the
induction PHIs. All of this can be driven either from a side effect at
some iteration or from the loop values used after the loop finishes.

This patch implements this by storing the forward-propagating analysis
of each instruction in a cache to recall whether it was free and whether
it has become live and thus counted toward the total unroll cost. Then,
at each sink for a value in the loop, we recursively walk back through
every value that feeds the sink, including looping back through the
iterations as needed, until we have marked the entire input graph as
live. Because we cache this, we never visit instructions more than twice
-- once when we analyze them and put them into the cache, and once when
we count their cost towards the unrolled loop. Also, because the cache
is only two bits and because we are dealing with relatively small
iteration counts, we can store all of this very densely in memory to
keep this from becoming an excessively slow analysis.
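
The caching scheme described above can be sketched as follows. This is a simplified model, not the code from the patch: instructions are plain indices, costs are made-up integers, and the names `UnrollCostCache`, `LoopInst`, `Operand`, and `markLive` are all hypothetical. It only illustrates the two-bit dense cache and the backward walk from a sink through earlier iterations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Two state bits per (iteration, instruction) pair. Names are hypothetical.
enum : uint8_t { IsFree = 1, IsCounted = 2 };

// Dense cache: four 2-bit entries are packed into every byte.
class UnrollCostCache {
public:
  UnrollCostCache(unsigned NumInsts, unsigned NumIters)
      : NumInsts(NumInsts),
        Bits((std::size_t(NumInsts) * NumIters + 3) / 4, 0) {}

  uint8_t get(unsigned Iter, unsigned Inst) const {
    std::size_t Idx = index(Iter, Inst);
    return (Bits[Idx / 4] >> ((Idx % 4) * 2)) & 3;
  }
  void set(unsigned Iter, unsigned Inst, uint8_t Flags) {
    std::size_t Idx = index(Iter, Inst);
    Bits[Idx / 4] |= uint8_t((Flags & 3) << ((Idx % 4) * 2));
  }

private:
  std::size_t index(unsigned Iter, unsigned Inst) const {
    assert(Inst < NumInsts && "instruction index out of range");
    return std::size_t(Iter) * NumInsts + Inst;
  }
  unsigned NumInsts;
  std::vector<uint8_t> Bits;
};

// An operand either comes from the same iteration or, for a header PHI,
// from the previous iteration across the backedge.
struct Operand {
  unsigned Index;
  bool AcrossBackedge;
};

struct LoopInst {
  unsigned Cost = 0;          // assumed per-instruction cost
  std::vector<Operand> Ops;   // values feeding this instruction
};

// Walks backward from a sink at (Iter, Root), marking every feeding
// instruction as counted. Returns the cost newly added by this call, so a
// running total built from the return values can only grow.
unsigned markLive(const std::vector<LoopInst> &Body, UnrollCostCache &Cache,
                  unsigned Iter, unsigned Root) {
  unsigned Added = 0;
  std::vector<std::pair<unsigned, unsigned>> Worklist;
  Worklist.push_back(std::make_pair(Iter, Root));
  while (!Worklist.empty()) {
    std::pair<unsigned, unsigned> Cur = Worklist.back();
    Worklist.pop_back();
    uint8_t State = Cache.get(Cur.first, Cur.second);
    if (State & IsCounted)
      continue; // already visited by an earlier sink
    Cache.set(Cur.first, Cur.second, IsCounted);
    if (!(State & IsFree))
      Added += Body[Cur.second].Cost; // free instructions add no cost
    for (const Operand &Op : Body[Cur.second].Ops) {
      if (!Op.AcrossBackedge)
        Worklist.push_back(std::make_pair(Cur.first, Op.Index));
      else if (Cur.first != 0) // iteration 0 has no previous iteration
        Worklist.push_back(std::make_pair(Cur.first - 1, Op.Index));
    }
  }
  return Added;
}
```

In this model a forward pass would first call `set(Iter, Inst, IsFree)` for everything it manages to simplify, and each side effect or out-of-loop use would then call `markLive` and add the result to the running unrolled cost. An instruction that no sink reaches is never charged, and a second sink that shares inputs with the first pays only for what is new.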

The code here is still pretty gross. I would appreciate suggestions
about better ways to factor or split this up; I've stared too long at
the algorithmic side to really have a good sense of what the design
should probably look like.

Also, it might seem like we should do all of this bottom-up, but I think
that is a red herring. Specifically, the simplification power is *much*
greater working top-down. We can forward propagate very effectively,
even across strange and interesting recurrences around the backedge.
Because we use data to propagate, this doesn't cause a state space
explosion. Doing this level of constant folding, etc, would be very
expensive to do bottom-up because it wouldn't be until the last moment
that you could collapse everything. The current solution is essentially
a top-down simplification with a bottom-up cost accounting which seems
to get the best of both worlds. It makes the simplification incremental
and powerful while leaving everything dead until we *know* it is needed.

Finally, a core property of this approach is its *monotonicity*. At all
times, the current UnrolledCost is a conservatively low estimate. This
ensures that we will never early-exit from the analysis due to exceeding
a threshold when, had we continued, the cost would have gone back
below the threshold. These kinds of bugs can cause random changes in
behavior that are incredibly hard to track down.

We could use a similar (but much simpler) technique within the inliner
as well to avoid considering speculated code in the inline cost.

Reviewers: chandlerc

Subscribers: sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D11758

llvm-svn: 269388
2016-05-13 01:42:39 +00:00
..
AliasAnalysis.cpp
AliasAnalysisEvaluator.cpp
AliasSetTracker.cpp
Analysis.cpp [PM] Port of the DependenceAnalysis to the new PM. 2016-05-12 22:19:39 +00:00
AssumptionCache.cpp
BasicAliasAnalysis.cpp [BasicAA] Compare GEP indices based on value (Fix PR27418) 2016-05-11 15:45:43 +00:00
BitSetUtils.cpp Re-apply r269081 and r269082 with a fix for MSVC. 2016-05-10 18:07:21 +00:00
BlockFrequencyInfo.cpp [PM] port Branch Frequency Analysis pass to new PM 2016-05-05 21:13:27 +00:00
BlockFrequencyInfoImpl.cpp fix spelling; NFC 2016-05-09 16:07:45 +00:00
BranchProbabilityInfo.cpp [PM] Port Branch Probability Analysis pass to the new pass manager. 2016-05-05 02:59:57 +00:00
CallGraph.cpp
CallGraphSCCPass.cpp Re-commit optimization bisect support (r267022) without new pass manager support. 2016-04-22 22:06:11 +00:00
CallPrinter.cpp
CaptureTracking.cpp Fold compares irrespective of whether allocation can be elided 2016-05-03 14:58:21 +00:00
CFG.cpp
CFGPrinter.cpp
CFLAliasAnalysis.cpp [CFLAA] Fix a use-of-invalid-pointer bug. 2016-05-02 18:09:19 +00:00
CGSCCPassManager.cpp
CMakeLists.txt Re-apply r269081 and r269082 with a fix for MSVC. 2016-05-10 18:07:21 +00:00
CodeMetrics.cpp
ConstantFolding.cpp [ConstantFolding, ValueTracking] Fold constants involving bitcasts of ConstantVector 2016-05-04 06:13:33 +00:00
CostModel.cpp
Delinearization.cpp
DemandedBits.cpp Port DemandedBits to the new pass manager. 2016-04-18 23:55:01 +00:00
DependenceAnalysis.cpp [PM] Port of the DependenceAnalysis to the new PM. 2016-05-12 22:19:39 +00:00
DivergenceAnalysis.cpp DivergenceAnalysis: Fix crash with no return blocks 2016-05-09 16:57:08 +00:00
DominanceFrontier.cpp
DomPrinter.cpp
EHPersonalities.cpp
GlobalsModRef.cpp
InlineCost.cpp Revert r269131 2016-05-10 23:26:04 +00:00
InstCount.cpp
InstructionSimplify.cpp [InstSimplify] use computeKnownBits on shift amount operands 2016-05-10 20:46:54 +00:00
Interval.cpp
IntervalPartition.cpp
IteratedDominanceFrontier.cpp Correct IDF calculator for ReverseIDF 2016-04-19 06:13:28 +00:00
IVUsers.cpp
LazyCallGraph.cpp
LazyValueInfo.cpp [LVI] Add an API to LazyValueInfo so that it can export ConstantRanges 2016-05-02 19:58:00 +00:00
Lint.cpp
LLVMBuild.txt Revert r269131 2016-05-10 23:26:04 +00:00
Loads.cpp NFC. Introduce Value::isPointerDereferenceable 2016-05-11 14:43:28 +00:00
LoopAccessAnalysis.cpp [LAA] Use std::min. NFC 2016-05-12 21:41:53 +00:00
LoopInfo.cpp [LoopUnroll] Unroll loops which have exit blocks to EH pads 2016-05-03 03:57:40 +00:00
LoopPass.cpp Re-commit optimization bisect support (r267022) without new pass manager support. 2016-04-22 22:06:11 +00:00
LoopPassManager.cpp PM: Check that loop passes preserve a basic set of analyses 2016-05-03 21:35:08 +00:00
LoopUnrollAnalyzer.cpp [Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the... 2016-05-13 01:42:39 +00:00
MemDepPrinter.cpp
MemDerefPrinter.cpp
MemoryBuiltins.cpp
MemoryDependenceAnalysis.cpp
MemoryLocation.cpp [TLI] Unify LibFunc signature checking. NFCI. 2016-04-27 19:04:35 +00:00
ModuleDebugInfoPrinter.cpp
ModuleSummaryAnalysis.cpp ThinLTO: fix assertion and refactor check for hidden use from inline ASM in a helper function 2016-05-06 08:25:33 +00:00
ObjCARCAliasAnalysis.cpp
ObjCARCAnalysisUtils.cpp
ObjCARCInstKind.cpp
OrderedBasicBlock.cpp
PHITransAddr.cpp
PostDominators.cpp
PtrUseVisitor.cpp
README.txt
RegionInfo.cpp
RegionPass.cpp
RegionPrinter.cpp
ScalarEvolution.cpp [SCEV] Be more aggressive around proving no-wrap 2016-05-11 17:41:26 +00:00
ScalarEvolutionAliasAnalysis.cpp
ScalarEvolutionExpander.cpp [SCEVExpander] Fix a failed cast<> assertion 2016-05-11 17:41:41 +00:00
ScalarEvolutionNormalization.cpp Remove emacs mode markers from .cpp files. NFC 2016-04-24 17:55:41 +00:00
ScopedNoAliasAA.cpp
SparsePropagation.cpp
StratifiedSets.h
TargetLibraryInfo.cpp [X86] Promote several single precision FP libcalls on Windows 2016-05-08 08:15:50 +00:00
TargetTransformInfo.cpp [TTI] Add hook for vector extract with extension 2016-04-27 15:20:21 +00:00
Trace.cpp
TypeBasedAliasAnalysis.cpp
ValueTracking.cpp [ValueTracking] Use guards to prove non-nullness of a value 2016-05-10 02:35:44 +00:00
VectorUtils.cpp Revert "[VectorUtils] Query number of sign bits to allow more truncations" 2016-05-10 12:27:23 +00:00

Analysis Opportunities:

//===---------------------------------------------------------------------===//

In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the
ScalarEvolution expression for %r is this:

  {1,+,3,+,2}<loop>

Outside the loop, this could be evaluated simply as (%n * %n); however,
ScalarEvolution currently evaluates it as

  (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n))

In addition to being much more complicated, it involves i65 arithmetic,
which is very inefficient when expanded into code.
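
The (%n * %n) claim can be checked by hand. The derivation below assumes the backedge-taken count of the loop is %n - 1, which is what the (-2 + %n) and (-1 + %n) factors in the current expression suggest; the test file itself is not reproduced here. A chain of recurrences {a,+,b,+,c} takes the value a + b*C(i,1) + c*C(i,2) on iteration i, so, writing n for %n:

```latex
\begin{aligned}
\{1,+,3,+,2\}(i)   &= 1 + 3\binom{i}{1} + 2\binom{i}{2}
                    = 1 + 3i + i(i-1) = (i+1)^2, \\
\{1,+,3,+,2\}(n-1) &= n^2, \\
-2 + 2\cdot\frac{(n-2)(n-1)}{2} + 3n
                   &= -2 + (n^2 - 3n + 2) + 3n = n^2 .
\end{aligned}
```

The last line shows that the current expression denotes the same value. Since (n-2)(n-1) is a product of consecutive integers, the division by 2 is exact; the widening to i65 presumably exists only to keep that product from overflowing before the division.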

//===---------------------------------------------------------------------===//

In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll,
ScalarEvolution is forming this expression:

((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32)))

This could be folded to

(-1 * (trunc i64 undef to i32))
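
The fold is justified by the fact that truncation from i64 to i32 is reduction modulo 2^32, which commutes with addition and multiplication. Writing x for %arg5 and t(.) for the truncation (notation introduced here for illustration only):

```latex
\begin{aligned}
t(-1 \cdot x) + t(x) &\equiv -x + x \equiv 0 \pmod{2^{32}}, \\
t(-1 \cdot x) + t(x) + \bigl(-1 \cdot t(\mathrm{undef})\bigr)
                     &\equiv -1 \cdot t(\mathrm{undef}) \pmod{2^{32}} .
\end{aligned}
```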

//===---------------------------------------------------------------------===//