mirror of
https://github.com/RPCS3/llvm-mirror.git
synced 2024-11-23 03:02:36 +01:00
31fa5fa3a2
Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. For the implementation, I'm reusing a bit of code for a similar existing optimization (direct memcpy of undef). I've also added memset support to MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be used, but it seems to make more sense to add this to GetLocation and thus make the computation cachable. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 348645 |
||
---|---|---|
.. | ||
AliasAnalysis.cpp | ||
AliasAnalysisEvaluator.cpp | ||
AliasAnalysisSummary.cpp | ||
AliasAnalysisSummary.h | ||
AliasSetTracker.cpp | ||
Analysis.cpp | ||
AssumptionCache.cpp | ||
BasicAliasAnalysis.cpp | ||
BlockFrequencyInfo.cpp | ||
BlockFrequencyInfoImpl.cpp | ||
BranchProbabilityInfo.cpp | ||
CallGraph.cpp | ||
CallGraphSCCPass.cpp | ||
CallPrinter.cpp | ||
CaptureTracking.cpp | ||
CFG.cpp | ||
CFGPrinter.cpp | ||
CFLAndersAliasAnalysis.cpp | ||
CFLGraph.h | ||
CFLSteensAliasAnalysis.cpp | ||
CGSCCPassManager.cpp | ||
CMakeLists.txt | ||
CmpInstAnalysis.cpp | ||
CodeMetrics.cpp | ||
ConstantFolding.cpp | ||
CostModel.cpp | ||
Delinearization.cpp | ||
DemandedBits.cpp | ||
DependenceAnalysis.cpp | ||
DivergenceAnalysis.cpp | ||
DominanceFrontier.cpp | ||
DomPrinter.cpp | ||
EHPersonalities.cpp | ||
GlobalsModRef.cpp | ||
GuardUtils.cpp | ||
IndirectCallPromotionAnalysis.cpp | ||
InlineCost.cpp | ||
InstCount.cpp | ||
InstructionPrecedenceTracking.cpp | ||
InstructionSimplify.cpp | ||
Interval.cpp | ||
IntervalPartition.cpp | ||
IteratedDominanceFrontier.cpp | ||
IVDescriptors.cpp | ||
IVUsers.cpp | ||
LazyBlockFrequencyInfo.cpp | ||
LazyBranchProbabilityInfo.cpp | ||
LazyCallGraph.cpp | ||
LazyValueInfo.cpp | ||
LegacyDivergenceAnalysis.cpp | ||
Lint.cpp | ||
LLVMBuild.txt | ||
Loads.cpp | ||
LoopAccessAnalysis.cpp | ||
LoopAnalysisManager.cpp | ||
LoopInfo.cpp | ||
LoopPass.cpp | ||
LoopUnrollAnalyzer.cpp | ||
MemDepPrinter.cpp | ||
MemDerefPrinter.cpp | ||
MemoryBuiltins.cpp | ||
MemoryDependenceAnalysis.cpp | ||
MemoryLocation.cpp | ||
MemorySSA.cpp | ||
MemorySSAUpdater.cpp | ||
ModuleDebugInfoPrinter.cpp | ||
ModuleSummaryAnalysis.cpp | ||
MustExecute.cpp | ||
ObjCARCAliasAnalysis.cpp | ||
ObjCARCAnalysisUtils.cpp | ||
ObjCARCInstKind.cpp | ||
OptimizationRemarkEmitter.cpp | ||
OrderedBasicBlock.cpp | ||
OrderedInstructions.cpp | ||
PHITransAddr.cpp | ||
PhiValues.cpp | ||
PostDominators.cpp | ||
ProfileSummaryInfo.cpp | ||
PtrUseVisitor.cpp | ||
README.txt | ||
RegionInfo.cpp | ||
RegionPass.cpp | ||
RegionPrinter.cpp | ||
ScalarEvolution.cpp | ||
ScalarEvolutionAliasAnalysis.cpp | ||
ScalarEvolutionExpander.cpp | ||
ScalarEvolutionNormalization.cpp | ||
ScopedNoAliasAA.cpp | ||
StackSafetyAnalysis.cpp | ||
StratifiedSets.h | ||
SyncDependenceAnalysis.cpp | ||
SyntheticCountsUtils.cpp | ||
TargetLibraryInfo.cpp | ||
TargetTransformInfo.cpp | ||
Trace.cpp | ||
TypeBasedAliasAnalysis.cpp | ||
TypeMetadataUtils.cpp | ||
ValueLattice.cpp | ||
ValueLatticeUtils.cpp | ||
ValueTracking.cpp | ||
VectorUtils.cpp |
Analysis Opportunities: //===---------------------------------------------------------------------===// In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the ScalarEvolution expression for %r is this: {1,+,3,+,2}<loop> Outside the loop, this could be evaluated simply as (%n * %n), however ScalarEvolution currently evaluates it as (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n)) In addition to being much more complicated, it involves i65 arithmetic, which is very inefficient when expanded into code. //===---------------------------------------------------------------------===// In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll, ScalarEvolution is forming this expression: ((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32))) This could be folded to (-1 * (trunc i64 undef to i32)) //===---------------------------------------------------------------------===//