Commit afd08c7313
Summary: ...loop after the last iteration. This is really hard to do correctly. The core problem is that we need to model liveness through the induction PHIs from iteration to iteration in order to get the correct results, and we need to correctly de-duplicate the common subgraphs of instructions feeding some subset of the induction PHIs. All of this can be driven either from a side effect at some iteration or from the loop values used after the loop finishes.

This patch implements this by storing the forward-propagating analysis of each instruction in a cache to recall whether it was free and whether it has become live and thus counted toward the total unroll cost. Then, at each sink for a value in the loop, we recursively walk back through every value that feeds the sink, including looping back through the iterations as needed, until we have marked the entire input graph as live. Because we cache this, we never visit instructions more than twice -- once when we analyze them and put them into the cache, and once when we count their cost towards the unrolled loop. Also, because the cache is only two bits and because we are dealing with relatively small iteration counts, we can store all of this very densely in memory to keep the analysis from becoming excessively slow.

The code here is still pretty gross. I would appreciate suggestions about better ways to factor or split this up; I've stared too long at the algorithmic side to really have a good sense of what the design should look like.

Also, it might seem like we should do all of this bottom-up, but I think that is a red herring. Specifically, the simplification power is *much* greater working top-down. We can forward propagate very effectively, even across strange and interesting recurrences around the backedge. Because we use data to propagate, this doesn't cause a state space explosion. Doing this level of constant folding, etc., would be very expensive to do bottom-up because it wouldn't be until the last moment that you could collapse everything. The current solution is essentially a top-down simplification with a bottom-up cost accounting, which seems to get the best of both worlds. It makes the simplification incremental and powerful while leaving everything dead until we *know* it is needed.

Finally, a core property of this approach is its *monotonicity*. At all times, the current UnrolledCost is a conservatively low estimate. This ensures that we will never early-exit from the analysis due to exceeding a threshold when, had we continued, the cost would have gone back below the threshold. These kinds of bugs can cause incredibly hard-to-track-down random changes to behavior.

We could use a similar (but much simpler) technique within the inliner as well to avoid considering speculated code in the inline cost.

Reviewers: chandlerc

Subscribers: sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D11758

llvm-svn: 269388
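The two-bit cache and the backward liveness walk described in the summary can be sketched compactly. The following is a minimal, self-contained C++ model under stated assumptions: the toy `Inst` type, the unit cost, and the names `InstState`, `UnrollCostEstimator`, and `addCostRecursively` are hypothetical stand-ins, not the interface this patch actually adds to LLVM. It shows the per-(iteration, instruction) state and the worklist walk from a sink back through the iterations:

```cpp
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

// Two bits per (iteration, instruction) pair: did forward simplification
// prove the instruction free, and has its cost already been counted?
struct InstState {
  uint8_t IsFree : 1;
  uint8_t IsCounted : 1;
};

// Toy stand-in for an instruction: indices of its in-loop operands.
struct Inst {
  std::vector<int> Ops;
  bool IsHeaderPhi = false; // header PHIs read the *previous* iteration
};

struct UnrollCostEstimator {
  std::vector<Inst> Body;                    // one iteration of the loop body
  std::vector<std::vector<InstState>> Cache; // [Iteration][InstIndex]
  uint64_t UnrolledCost = 0;                 // monotonically non-decreasing

  UnrollCostEstimator(std::vector<Inst> B, unsigned TripCount)
      : Body(std::move(B)),
        Cache(TripCount, std::vector<InstState>(Body.size())) {}

  // Walk backward from a sink (a side effect, or a value live after the
  // loop), marking its entire input graph live, iteration by iteration.
  void addCostRecursively(int Sink, unsigned Iteration) {
    std::vector<std::pair<int, unsigned>> Worklist = {{Sink, Iteration}};
    while (!Worklist.empty()) {
      auto [I, It] = Worklist.back();
      Worklist.pop_back();
      InstState &S = Cache[It][I];
      if (S.IsCounted)
        continue; // already marked live: each instruction counted once
      S.IsCounted = true;
      if (!S.IsFree)
        ++UnrolledCost; // instruction becomes live; unit cost in this toy
      for (int Op : Body[I].Ops) {
        unsigned OpIt = It;
        if (Body[I].IsHeaderPhi) {
          if (It == 0)
            continue; // at iteration 0 the PHI takes its preheader value
          OpIt = It - 1; // liveness flows back around the backedge
        }
        Worklist.push_back({Op, OpIt});
      }
    }
  }
};

int main() {
  // %i = phi [0, preheader], [%inc, latch]; %inc = add %i, 1
  std::vector<Inst> Body = {{{1}, /*IsHeaderPhi=*/true}, {{0}, false}};
  UnrollCostEstimator E(std::move(Body), /*TripCount=*/4);
  E.Cache[0][0].IsFree = 1; // say iteration 0's PHI folded to a constant
  E.addCostRecursively(/*Sink=*/1, /*Iteration=*/3); // %inc used after loop
  std::printf("UnrolledCost = %llu\n", (unsigned long long)E.UnrolledCost);
}
```

Past the `IsCounted` check, each (iteration, instruction) pair is processed at most once, so the walk stays linear in trip count times loop size, and `UnrolledCost` only ever grows -- the monotonicity property the summary relies on for safe early exits.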
Directory listing:
AliasAnalysis.cpp
AliasAnalysisEvaluator.cpp
AliasSetTracker.cpp
Analysis.cpp
AssumptionCache.cpp
BasicAliasAnalysis.cpp
BitSetUtils.cpp
BlockFrequencyInfo.cpp
BlockFrequencyInfoImpl.cpp
BranchProbabilityInfo.cpp
CallGraph.cpp
CallGraphSCCPass.cpp
CallPrinter.cpp
CaptureTracking.cpp
CFG.cpp
CFGPrinter.cpp
CFLAliasAnalysis.cpp
CGSCCPassManager.cpp
CMakeLists.txt
CodeMetrics.cpp
ConstantFolding.cpp
CostModel.cpp
Delinearization.cpp
DemandedBits.cpp
DependenceAnalysis.cpp
DivergenceAnalysis.cpp
DominanceFrontier.cpp
DomPrinter.cpp
EHPersonalities.cpp
GlobalsModRef.cpp
InlineCost.cpp
InstCount.cpp
InstructionSimplify.cpp
Interval.cpp
IntervalPartition.cpp
IteratedDominanceFrontier.cpp
IVUsers.cpp
LazyCallGraph.cpp
LazyValueInfo.cpp
Lint.cpp
LLVMBuild.txt
Loads.cpp
LoopAccessAnalysis.cpp
LoopInfo.cpp
LoopPass.cpp
LoopPassManager.cpp
LoopUnrollAnalyzer.cpp
MemDepPrinter.cpp
MemDerefPrinter.cpp
MemoryBuiltins.cpp
MemoryDependenceAnalysis.cpp
MemoryLocation.cpp
ModuleDebugInfoPrinter.cpp
ModuleSummaryAnalysis.cpp
ObjCARCAliasAnalysis.cpp
ObjCARCAnalysisUtils.cpp
ObjCARCInstKind.cpp
OrderedBasicBlock.cpp
PHITransAddr.cpp
PostDominators.cpp
PtrUseVisitor.cpp
README.txt
RegionInfo.cpp
RegionPass.cpp
RegionPrinter.cpp
ScalarEvolution.cpp
ScalarEvolutionAliasAnalysis.cpp
ScalarEvolutionExpander.cpp
ScalarEvolutionNormalization.cpp
ScopedNoAliasAA.cpp
SparsePropagation.cpp
StratifiedSets.h
TargetLibraryInfo.cpp
TargetTransformInfo.cpp
Trace.cpp
TypeBasedAliasAnalysis.cpp
ValueTracking.cpp
VectorUtils.cpp
Analysis Opportunities:

//===---------------------------------------------------------------------===//

In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the
ScalarEvolution expression for %r is this:

  {1,+,3,+,2}<loop>

Outside the loop, this could be evaluated simply as (%n * %n); however,
ScalarEvolution currently evaluates it as

  (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n))

In addition to being much more complicated, it involves i65 arithmetic,
which is very inefficient when expanded into code.

//===---------------------------------------------------------------------===//

In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll, ScalarEvolution is
forming this expression:

  ((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32)))

This could be folded to

  (-1 * (trunc i64 undef to i32))

//===---------------------------------------------------------------------===//
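As a check on the first opportunity above, the standard binomial evaluation rule for chains of recurrences (the rule is well known, though this worked derivation is an addition, not part of the README) confirms the (%n * %n) claim:

```latex
% Chains of recurrences evaluate via f(i) = \sum_k c_k \binom{i}{k}.
% For {1,+,3,+,2} at iteration i:
f(i) = 1\binom{i}{0} + 3\binom{i}{1} + 2\binom{i}{2}
     = 1 + 3i + i(i-1)
     = i^2 + 2i + 1 = (i+1)^2
% On the final iteration, i = %n - 1, so the exit value is %n * %n.
```

The second fold is sound because trunc distributes over negation and addition in modular arithmetic: (trunc i64 (-1 * %arg5) to i32) equals (-1 * (trunc i64 %arg5 to i32)), so the first two terms cancel and only (-1 * (trunc i64 undef to i32)) remains.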