[Pass Manager] remove EarlyCSE as clean-up for VectorCombine

EarlyCSE was added with D75145, but the motivating test is not regressed by removing the extra pass now. That might be because VectorCombine altered the way it processes instructions, or it might be from (re)moving VectorCombine in the pipeline. The extra round of EarlyCSE appears to cost approximately 0.26% in compile-time as discussed in D80236, so we need some evidence to justify its inclusion here, but we do not have that (yet). I suspect that between SLP and VectorCombine, we are creating patterns that InstCombine and/or codegen are not prepared for, but we will need to reduce those examples and include them as PhaseOrdering and/or test-suite benchmarks.
2025-01-31 20:51:52 +01:00 · 2020-05-24 12:20:22 -04:00 · 2020-05-24 12:20:22 -04:00 · b7b5a06ca8
commit b7b5a06ca8
parent 639aa30744
10 changed files with 3 additions and 16 deletions
--- a/lib/Passes/PassBuilder.cpp
+++ b/lib/Passes/PassBuilder.cpp
@ -1014,7 +1014,6 @@ ModulePassManager PassBuilder::buildModuleOptimizationPipeline(

  // Enhance/cleanup vector code.
  OptimizePM.addPass(VectorCombinePass());
-  OptimizePM.addPass(EarlyCSEPass());
  OptimizePM.addPass(InstCombinePass());

  // Unroll small loops to hide loop backedge latency and saturate any parallel
--- a/lib/Transforms/IPO/PassManagerBuilder.cpp
+++ b/lib/Transforms/IPO/PassManagerBuilder.cpp
@ -783,7 +783,6 @@ void PassManagerBuilder::populateModulePassManager(

  // Enhance/cleanup vector code.
  MPM.add(createVectorCombinePass());
-  MPM.add(createEarlyCSEPass());

  addExtensionsToPM(EP_Peephole, MPM);
  MPM.add(createInstructionCombiningPass());
@ -1019,7 +1018,6 @@ void PassManagerBuilder::addLTOOptimizationPasses(legacy::PassManagerBase &PM) {
  // Now that we've optimized loops (in particular loop induction variables),
  // we may have exposed more scalar opportunities. Run parts of the scalar
  // optimizer again at this point.
-  PM.add(createVectorCombinePass());
  PM.add(createInstructionCombiningPass()); // Initial cleanup
  PM.add(createCFGSimplificationPass()); // if-convert
  PM.add(createSCCPPass()); // Propagate exposed constants
@ -1027,10 +1025,10 @@ void PassManagerBuilder::addLTOOptimizationPasses(legacy::PassManagerBase &PM) {
  PM.add(createBitTrackingDCEPass());

  // More scalar chains could be vectorized due to more alias information
-  if (SLPVectorize) {
+  if (SLPVectorize)
    PM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
-    PM.add(createVectorCombinePass()); // Clean up partial vectorization.
-  }
+
+  PM.add(createVectorCombinePass()); // Clean up partial vectorization.

  // After vectorization, assume intrinsics may tell us more about pointer
  // alignments.
--- a/test/CodeGen/AMDGPU/opt-pipeline.ll
+++ b/test/CodeGen/AMDGPU/opt-pipeline.ll
@ -246,7 +246,6 @@
 ; GCN-O1-NEXT:       Simplify the CFG
 ; GCN-O1-NEXT:       Dominator Tree Construction
 ; GCN-O1-NEXT:       Optimize scalar/vector ops
-; GCN-O1-NEXT:       Early CSE
 ; GCN-O1-NEXT:       Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-NEXT:       Function Alias Analysis Results
 ; GCN-O1-NEXT:       Natural Loop Information
@ -597,7 +596,6 @@
 ; GCN-O2-NEXT:       Inject TLI Mappings
 ; GCN-O2-NEXT:       SLP Vectorizer
 ; GCN-O2-NEXT:       Optimize scalar/vector ops
-; GCN-O2-NEXT:       Early CSE
 ; GCN-O2-NEXT:       Optimization Remark Emitter
 ; GCN-O2-NEXT:       Combine redundant instructions
 ; GCN-O2-NEXT:       Canonicalize natural loops
@ -950,7 +948,6 @@
 ; GCN-O3-NEXT:       Inject TLI Mappings
 ; GCN-O3-NEXT:       SLP Vectorizer
 ; GCN-O3-NEXT:       Optimize scalar/vector ops
-; GCN-O3-NEXT:       Early CSE
 ; GCN-O3-NEXT:       Optimization Remark Emitter
 ; GCN-O3-NEXT:       Combine redundant instructions
 ; GCN-O3-NEXT:       Canonicalize natural loops
--- a/test/Other/new-pm-defaults.ll
+++ b/test/Other/new-pm-defaults.ll
@ -261,7 +261,6 @@
 ; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-O-NEXT: Running pass: VectorCombinePass
-; CHECK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopUnrollPass
 ; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
--- a/test/Other/new-pm-thinlto-defaults.ll
+++ b/test/Other/new-pm-thinlto-defaults.ll
@ -231,7 +231,6 @@
 ; CHECK-POSTLINK-O3-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-POSTLINK-Os-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: VectorCombinePass
-; CHECK-POSTLINK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: WarnMissedTransformationsPass
--- a/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@ -199,7 +199,6 @@
 ; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-O-NEXT: Running pass: VectorCombinePass
-; CHECK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopUnrollPass
 ; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
--- a/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@ -210,7 +210,6 @@
 ; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-Os-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-O-NEXT: Running pass: VectorCombinePass
-; CHECK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopUnrollPass
 ; CHECK-O-NEXT: Running pass: WarnMissedTransformationsPass
--- a/test/Other/opt-O2-pipeline.ll
+++ b/test/Other/opt-O2-pipeline.ll
@ -253,7 +253,6 @@
 ; CHECK-NEXT:       Inject TLI Mappings
 ; CHECK-NEXT:       SLP Vectorizer
 ; CHECK-NEXT:       Optimize scalar/vector ops
-; CHECK-NEXT:       Early CSE
 ; CHECK-NEXT:       Optimization Remark Emitter
 ; CHECK-NEXT:       Combine redundant instructions
 ; CHECK-NEXT:       Canonicalize natural loops
--- a/test/Other/opt-O3-pipeline.ll
+++ b/test/Other/opt-O3-pipeline.ll
@ -258,7 +258,6 @@
 ; CHECK-NEXT:       Inject TLI Mappings
 ; CHECK-NEXT:       SLP Vectorizer
 ; CHECK-NEXT:       Optimize scalar/vector ops
-; CHECK-NEXT:       Early CSE
 ; CHECK-NEXT:       Optimization Remark Emitter
 ; CHECK-NEXT:       Combine redundant instructions
 ; CHECK-NEXT:       Canonicalize natural loops
--- a/test/Other/opt-Os-pipeline.ll
+++ b/test/Other/opt-Os-pipeline.ll
@ -239,7 +239,6 @@
 ; CHECK-NEXT:       Inject TLI Mappings
 ; CHECK-NEXT:       SLP Vectorizer
 ; CHECK-NEXT:       Optimize scalar/vector ops
-; CHECK-NEXT:       Early CSE
 ; CHECK-NEXT:       Optimization Remark Emitter
 ; CHECK-NEXT:       Combine redundant instructions
 ; CHECK-NEXT:       Canonicalize natural loops