mirror of
https://github.com/RPCS3/llvm-mirror.git
synced 2024-11-22 10:42:39 +01:00
09c0e58fa9
The current implementation for computing relative block frequencies does not handle correctly control-flow graphs containing irreducible loops. This results in suboptimally generated binaries, whose perf can be up to 5% worse than optimal. To resolve the problem, we apply a post-processing step, which iteratively updates block frequencies based on the frequencies of their predesessors. This corresponds to finding the stationary point of the Markov chain by an iterative method aka "PageRank computation". The algorithm takes at most O(|E| * IterativeBFIMaxIterations) steps but typically converges faster. It is turned on by passing option `use-iterative-bfi-inference` and applied only for functions containing profile data and irreducible loops. Tested on SPEC06/17, where it is helping to get correct profile counts for one of the binaries (403.gcc). In prod binaries, we've seen a speedup of up to 2%-5% for binaries containing functions with hot irreducible loops. Reviewed By: hoy, wenlei, davidxl Differential Revision: https://reviews.llvm.org/D103289 |
||
---|---|---|
.. | ||
models/inliner | ||
AliasAnalysis.cpp | ||
AliasAnalysisEvaluator.cpp | ||
AliasAnalysisSummary.cpp | ||
AliasAnalysisSummary.h | ||
AliasSetTracker.cpp | ||
Analysis.cpp | ||
AssumeBundleQueries.cpp | ||
AssumptionCache.cpp | ||
BasicAliasAnalysis.cpp | ||
BlockFrequencyInfo.cpp | ||
BlockFrequencyInfoImpl.cpp | ||
BranchProbabilityInfo.cpp | ||
CallGraph.cpp | ||
CallGraphSCCPass.cpp | ||
CallPrinter.cpp | ||
CaptureTracking.cpp | ||
CFG.cpp | ||
CFGPrinter.cpp | ||
CFLAndersAliasAnalysis.cpp | ||
CFLGraph.h | ||
CFLSteensAliasAnalysis.cpp | ||
CGSCCPassManager.cpp | ||
CMakeLists.txt | ||
CmpInstAnalysis.cpp | ||
CodeMetrics.cpp | ||
ConstantFolding.cpp | ||
ConstraintSystem.cpp | ||
CostModel.cpp | ||
DDG.cpp | ||
DDGPrinter.cpp | ||
Delinearization.cpp | ||
DemandedBits.cpp | ||
DependenceAnalysis.cpp | ||
DependenceGraphBuilder.cpp | ||
DevelopmentModeInlineAdvisor.cpp | ||
DivergenceAnalysis.cpp | ||
DominanceFrontier.cpp | ||
DomPrinter.cpp | ||
DomTreeUpdater.cpp | ||
EHPersonalities.cpp | ||
FunctionPropertiesAnalysis.cpp | ||
GlobalsModRef.cpp | ||
GuardUtils.cpp | ||
HeatUtils.cpp | ||
ImportedFunctionsInliningStatistics.cpp | ||
IndirectCallPromotionAnalysis.cpp | ||
InlineAdvisor.cpp | ||
InlineCost.cpp | ||
InlineSizeEstimatorAnalysis.cpp | ||
InstCount.cpp | ||
InstructionPrecedenceTracking.cpp | ||
InstructionSimplify.cpp | ||
Interval.cpp | ||
IntervalPartition.cpp | ||
IRSimilarityIdentifier.cpp | ||
IVDescriptors.cpp | ||
IVUsers.cpp | ||
LazyBlockFrequencyInfo.cpp | ||
LazyBranchProbabilityInfo.cpp | ||
LazyCallGraph.cpp | ||
LazyValueInfo.cpp | ||
LegacyDivergenceAnalysis.cpp | ||
Lint.cpp | ||
Loads.cpp | ||
LoopAccessAnalysis.cpp | ||
LoopAnalysisManager.cpp | ||
LoopCacheAnalysis.cpp | ||
LoopInfo.cpp | ||
LoopNestAnalysis.cpp | ||
LoopPass.cpp | ||
LoopUnrollAnalyzer.cpp | ||
MemDepPrinter.cpp | ||
MemDerefPrinter.cpp | ||
MemoryBuiltins.cpp | ||
MemoryDependenceAnalysis.cpp | ||
MemoryLocation.cpp | ||
MemorySSA.cpp | ||
MemorySSAUpdater.cpp | ||
MLInlineAdvisor.cpp | ||
ModuleDebugInfoPrinter.cpp | ||
ModuleSummaryAnalysis.cpp | ||
MustExecute.cpp | ||
ObjCARCAliasAnalysis.cpp | ||
ObjCARCAnalysisUtils.cpp | ||
ObjCARCInstKind.cpp | ||
OptimizationRemarkEmitter.cpp | ||
OverflowInstAnalysis.cpp | ||
PHITransAddr.cpp | ||
PhiValues.cpp | ||
PostDominators.cpp | ||
ProfileSummaryInfo.cpp | ||
PtrUseVisitor.cpp | ||
README.txt | ||
RegionInfo.cpp | ||
RegionPass.cpp | ||
RegionPrinter.cpp | ||
ReleaseModeModelRunner.cpp | ||
ReplayInlineAdvisor.cpp | ||
ScalarEvolution.cpp | ||
ScalarEvolutionAliasAnalysis.cpp | ||
ScalarEvolutionDivision.cpp | ||
ScalarEvolutionNormalization.cpp | ||
ScopedNoAliasAA.cpp | ||
StackLifetime.cpp | ||
StackSafetyAnalysis.cpp | ||
StratifiedSets.h | ||
SyncDependenceAnalysis.cpp | ||
SyntheticCountsUtils.cpp | ||
TargetLibraryInfo.cpp | ||
TargetTransformInfo.cpp | ||
TFUtils.cpp | ||
Trace.cpp | ||
TypeBasedAliasAnalysis.cpp | ||
TypeMetadataUtils.cpp | ||
ValueLattice.cpp | ||
ValueLatticeUtils.cpp | ||
ValueTracking.cpp | ||
VectorUtils.cpp | ||
VFABIDemangling.cpp |
Analysis Opportunities: //===---------------------------------------------------------------------===// In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the ScalarEvolution expression for %r is this: {1,+,3,+,2}<loop> Outside the loop, this could be evaluated simply as (%n * %n), however ScalarEvolution currently evaluates it as (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n)) In addition to being much more complicated, it involves i65 arithmetic, which is very inefficient when expanded into code. //===---------------------------------------------------------------------===// In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll, ScalarEvolution is forming this expression: ((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32))) This could be folded to (-1 * (trunc i64 undef to i32)) //===---------------------------------------------------------------------===//