[GlobalOpt] Improve common case efficiency of static global initializer evaluation

For very, very large global initializers which can be statically evaluated, the code would create vectors of temporary Constants, modifying them in place, before committing the resulting Constant aggregate to the global's initializer value. This had effectively O(n^2) complexity in the size of the global initializer and would cause memory and non-termination issues compiling some workloads. This change performs the static initializer evaluation and creation in batches, once for each global in the evaluated IR memory. The existing code is maintained as a last resort when the initializers are more complex than simple values in a large aggregate. This should theoretically by NFC, no test as the example case is massive. The existing test cases pass with this, as well as the llvm test suite. To give an example, consider the following C++ code adapted from the clang regression tests: struct S { int n = 10; int m = 2 * n; S(int a) : n(a) {} }; template<typename T> struct U { T *r = &q; T q = 42; U *p = this; }; U<S> e; The global static constructor for 'e' will need to initialize 'r' and 'p' of the outer struct, while also initializing the inner 'q' structs 'n' and 'm' members. This batch algorithm will simply use general CommitValueTo() method to handle the complex nested S struct initialization of 'q', before processing the outermost members in a single batch. Using CommitValueTo() to handle member in the outer struct is inefficient when the struct/array is very large as we end up creating and destroy constant arrays for each initialization. For the above case, we expect the following IR to be generated: %struct.U = type { %struct.S*, %struct.S, %struct.U* } %struct.S = type { i32, i32 } @e = global %struct.U { %struct.S* gep inbounds (%struct.U, %struct.U* @e, i64 0, i32 1), %struct.S { i32 42, i32 84 }, %struct.U* @e } The %struct.S { i32 42, i32 84 } inner initializer is treated as a complex constant expression, while the other two elements of @e are "simple". Differential Revision: https://reviews.llvm.org/D42612 llvm-svn: 323933
2024-11-23 03:02:36 +01:00 · 2018-01-31 23:56:07 +00:00 · 2018-01-31 23:56:07 +00:00 · fd88771d7b
commit fd88771d7b
parent 10c80524c6
1 changed files with 126 additions and 2 deletions
--- a/lib/Transforms/IPO/GlobalOpt.cpp
+++ b/lib/Transforms/IPO/GlobalOpt.cpp
@ -2382,6 +2382,131 @@ static void CommitValueTo(Constant *Val, Constant *Addr) {
  GV->setInitializer(EvaluateStoreInto(GV->getInitializer(), Val, CE, 2));
 }

+/// Given a map of address -> value, where addresses are expected to be some form
+/// of either a global or a constant GEP, set the initializer for the address to
+/// be the value. This performs mostly the same function as CommitValueTo()
+/// and EvaluateStoreInto() but is optimized to be more efficient for the common
+/// case where the set of addresses are GEPs sharing the same underlying global,
+/// processing the GEPs in batches rather than individually.
+///
+/// To give an example, consider the following C++ code adapted from the clang
+/// regression tests:
+/// struct S {
+///  int n = 10;
+///  int m = 2 * n;
+///  S(int a) : n(a) {}
+/// };
+///
+/// template<typename T>
+/// struct U {
+///  T *r = &q;
+///  T q = 42;
+///  U *p = this;
+/// };
+///
+/// U<S> e;
+///
+/// The global static constructor for 'e' will need to initialize 'r' and 'p' of
+/// the outer struct, while also initializing the inner 'q' structs 'n' and 'm'
+/// members. This batch algorithm will simply use general CommitValueTo() method
+/// to handle the complex nested S struct initialization of 'q', before
+/// processing the outermost members in a single batch. Using CommitValueTo() to
+/// handle member in the outer struct is inefficient when the struct/array is
+/// very large as we end up creating and destroy constant arrays for each
+/// initialization.
+/// For the above case, we expect the following IR to be generated:
+///
+/// %struct.U = type { %struct.S*, %struct.S, %struct.U* }
+/// %struct.S = type { i32, i32 }
+/// @e = global %struct.U { %struct.S* gep inbounds (%struct.U, %struct.U* @e,
+///                                                  i64 0, i32 1),
+///                         %struct.S { i32 42, i32 84 }, %struct.U* @e }
+/// The %struct.S { i32 42, i32 84 } inner initializer is treated as a complex
+/// constant expression, while the other two elements of @e are "simple".
+static void BatchCommitValueTo(const DenseMap<Constant*, Constant*> &Mem) {
+  SmallVector<std::pair<GlobalVariable*, Constant*>, 32> GVs;
+  SmallVector<std::pair<ConstantExpr*, Constant*>, 32> ComplexCEs;
+  SmallVector<std::pair<ConstantExpr*, Constant*>, 32> SimpleCEs;
+  SimpleCEs.reserve(Mem.size());
+
+  for (const auto &I : Mem) {
+    if (auto *GV = dyn_cast<GlobalVariable>(I.first)) {
+      GVs.push_back(std::make_pair(GV, I.second));
+    } else {
+      ConstantExpr *GEP = cast<ConstantExpr>(I.first);
+      // We don't handle the deeply recursive case using the batch method.
+      if (GEP->getNumOperands() > 3)
+        ComplexCEs.push_back(std::make_pair(GEP, I.second));
+      else
+        SimpleCEs.push_back(std::make_pair(GEP, I.second));
+    }
+  }
+
+  // The algorithm below doesn't handle cases like nested structs, so use the
+  // slower fully general method if we have to.
+  for (auto ComplexCE : ComplexCEs)
+    CommitValueTo(ComplexCE.second, ComplexCE.first);
+
+  for (auto GVPair : GVs) {
+    assert(GVPair.first->hasInitializer());
+    GVPair.first->setInitializer(GVPair.second);
+  }
+
+  if (SimpleCEs.empty())
+    return;
+
+  // We cache a single global's initializer elements in the case where the
+  // subsequent address/val pair uses the same one. This avoids throwing away and
+  // rebuilding the constant struct/vector/array just because one element is
+  // modified at a time.
+  SmallVector<Constant *, 32> Elts;
+  Elts.reserve(SimpleCEs.size());
+  GlobalVariable *CurrentGV = nullptr;
+
+  auto commitAndSetupCache = [&](GlobalVariable *GV, bool Update) {
+    Constant *Init = GV->getInitializer();
+    Type *Ty = Init->getType();
+    if (Update) {
+      if (CurrentGV) {
+        assert(CurrentGV && "Expected a GV to commit to!");
+        Type *CurrentInitTy = CurrentGV->getInitializer()->getType();
+        // We have a valid cache that needs to be committed.
+        if (StructType *STy = dyn_cast<StructType>(CurrentInitTy))
+          CurrentGV->setInitializer(ConstantStruct::get(STy, Elts));
+        else if (ArrayType *ArrTy = dyn_cast<ArrayType>(CurrentInitTy))
+          CurrentGV->setInitializer(ConstantArray::get(ArrTy, Elts));
+        else
+          CurrentGV->setInitializer(ConstantVector::get(Elts));
+      }
+      if (CurrentGV == GV)
+        return;
+      // Need to clear and set up cache for new initializer.
+      CurrentGV = GV;
+      Elts.clear();
+      unsigned NumElts;
+      if (auto *STy = dyn_cast<StructType>(Ty))
+        NumElts = STy->getNumElements();
+      else
+        NumElts = cast<SequentialType>(Ty)->getNumElements();
+      for (unsigned i = 0, e = NumElts; i != e; ++i)
+        Elts.push_back(Init->getAggregateElement(i));
+    }
+  };
+
+  for (auto CEPair : SimpleCEs) {
+    ConstantExpr *GEP = CEPair.first;
+    Constant *Val = CEPair.second;
+
+    GlobalVariable *GV = cast<GlobalVariable>(GEP->getOperand(0));
+    commitAndSetupCache(GV, GV != CurrentGV);
+    ConstantInt *CI = cast<ConstantInt>(GEP->getOperand(2));
+    Elts[CI->getZExtValue()] = Val;
+  }
+  // The last initializer in the list needs to be committed, others
+  // will be committed on a new initializer being processed.
+  commitAndSetupCache(CurrentGV, true);
+}
+
 /// Evaluate static constructors in the function, if we can.  Return true if we
 /// can, false otherwise.
 static bool EvaluateStaticConstructor(Function *F, const DataLayout &DL,
@ -2399,8 +2524,7 @@ static bool EvaluateStaticConstructor(Function *F, const DataLayout &DL,
    DEBUG(dbgs() << "FULLY EVALUATED GLOBAL CTOR FUNCTION '"
          << F->getName() << "' to " << Eval.getMutatedMemory().size()
          << " stores.\n");
-    for (const auto &I : Eval.getMutatedMemory())
-      CommitValueTo(I.second, I.first);
+    BatchCommitValueTo(Eval.getMutatedMemory());
    for (GlobalVariable *GV : Eval.getInvariants())
      GV->setConstant(true);
  }