diff --git a/docs/LangRef.rst b/docs/LangRef.rst index 1110f8127d9..5600d2e4c98 100644 --- a/docs/LangRef.rst +++ b/docs/LangRef.rst @@ -5066,6 +5066,72 @@ For example, in the code below, the call instruction may only target the ... !0 = !{i64 (i64, i64)* @add, i64 (i64, i64)* @sub} +'``callback``' Metadata +^^^^^^^^^^^^^^^^^^^^^^ + +``callback`` metadata may be attached to a function declaration, or definition. +(Call sites are excluded only due to the lack of a use case.) For ease of +exposition, we'll refer to the function annotated w/ metadata as a broker +function. The metadata describes how the arguments of a call to the broker are +in turn passed to the callback function specified by the metadata. Thus, the +``callback`` metadata provides a partial description of a call site inside the +broker function with regards to the arguments of a call to the broker. The only +semantic restriction on the broker function itself is that it is not allowed to +inspect or modify arguments referenced in the ``callback`` metadata as +pass-through to the callback function. + +The broker is not required to actually invoke the callback function at runtime. +However, the assumptions about not inspecting or modifying arguments that would +be passed to the specified callback function still hold, even if the callback +function is not dynamically invoked. The broker is allowed to invoke the +callback function more than once per invocation of the broker. The broker is +also allowed to invoke (directly or indirectly) the function passed as a +callback through another use. Finally, the broker is also allowed to relay the +callback callee invocation to a different thread. + +The metadata is structured as follows: At the outer level, ``callback`` +metadata is a list of ``callback`` encodings. Each encoding starts with a +constant ``i64`` which describes the argument position of the callback function +in the call to the broker. The following elements, except the last, describe +what arguments are passed to the callback function. Each element is again an +``i64`` constant identifying the argument of the broker that is passed through, +or ``i64 -1`` to indicate an unknown or inspected argument. The order in which +they are listed has to be the same in which they are passed to the callback +callee. The last element of the encoding is a boolean which specifies how +variadic arguments of the broker are handled. If it is true, all variadic +arguments of the broker are passed through to the callback function *after* the +arguments encoded explicitly before. + +In the code below, the ``pthread_create`` function is marked as a broker +through the ``!callback !1`` metadata. In the example, there is only one +callback encoding, namely ``!2``, associated with the broker. This encoding +identifies the callback function as the second argument of the broker (``i64 +2``) and the sole argument of the callback function as the third one of the +broker function (``i64 3``). + +.. code-block:: llvm + + declare !callback !1 dso_local i32 @pthread_create(i64*, %union.pthread_attr_t*, i8* (i8*)*, i8*) + + ... + !2 = !{i64 2, i64 3, i1 false} + !1 = !{!2} + +Another example is shown below. The callback callee is the second argument of +the ``__kmpc_fork_call`` function (``i64 2``). The callee is given two unknown +values (each identified by a ``i64 -1``) and afterwards all +variadic arguments that are passed to the ``__kmpc_fork_call`` call (due to the +final ``i1 true``). + +.. code-block:: llvm + + declare !callback !0 dso_local void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) + + ... + !1 = !{i64 2, i64 -1, i64 -1, i1 true} + !0 = !{!1} + + '``unpredictable``' Metadata ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/include/llvm/IR/CallSite.h b/include/llvm/IR/CallSite.h index a3e78049f4b..ac48fe849ef 100644 --- a/include/llvm/IR/CallSite.h +++ b/include/llvm/IR/CallSite.h @@ -683,6 +683,182 @@ private: User::op_iterator getCallee() const; }; +/// AbstractCallSite +/// +/// An abstract call site is a wrapper that allows to treat direct, +/// indirect, and callback calls the same. If an abstract call site +/// represents a direct or indirect call site it behaves like a stripped +/// down version of a normal call site object. The abstract call site can +/// also represent a callback call, thus the fact that the initially +/// called function (=broker) may invoke a third one (=callback callee). +/// In this case, the abstract call site hides the middle man, hence the +/// broker function. The result is a representation of the callback call, +/// inside the broker, but in the context of the original call to the broker. +/// +/// There are up to three functions involved when we talk about callback call +/// sites. The caller (1), which invokes the broker function. The broker +/// function (2), that will invoke the callee zero or more times. And finally +/// the callee (3), which is the target of the callback call. +/// +/// The abstract call site will handle the mapping from parameters to arguments +/// depending on the semantic of the broker function. However, it is important +/// to note that the mapping is often partial. Thus, some arguments of the +/// call/invoke instruction are mapped to parameters of the callee while others +/// are not. +class AbstractCallSite { +public: + + /// The encoding of a callback with regards to the underlying instruction. + struct CallbackInfo { + + /// For direct/indirect calls the parameter encoding is empty. If it is not, + /// the abstract call site represents a callback. In that case, the first + /// element of the encoding vector represents which argument of the call + /// site CS is the callback callee. The remaining elements map parameters + /// (identified by their position) to the arguments that will be passed + /// through (also identified by position but in the call site instruction). + /// + /// NOTE that we use LLVM argument numbers (starting at 0) and not + /// clang/soruce argument numbers (starting at 1). The -1 entries represent + /// unknown values that are passed to the callee. + using ParameterEncodingTy = SmallVector; + ParameterEncodingTy ParameterEncoding; + + }; + +private: + + /// The underlying call site: + /// caller -> callee, if this is a direct or indirect call site + /// caller -> broker function, if this is a callback call site + CallSite CS; + + /// The encoding of a callback with regards to the underlying instruction. + CallbackInfo CI; + +public: + /// Sole constructor for abstract call sites (ACS). + /// + /// An abstract call site can only be constructed through a llvm::Use because + /// each operand (=use) of an instruction could potentially be a different + /// abstract call site. Furthermore, even if the value of the llvm::Use is the + /// same, and the user is as well, the abstract call sites might not be. + /// + /// If a use is not associated with an abstract call site the constructed ACS + /// will evaluate to false if converted to a boolean. + /// + /// If the use is the callee use of a call or invoke instruction, the + /// constructed abstract call site will behave as a llvm::CallSite would. + /// + /// If the use is not a callee use of a call or invoke instruction, the + /// callback metadata is used to determine the argument <-> parameter mapping + /// as well as the callee of the abstract call site. + AbstractCallSite(const Use *U); + + /// Conversion operator to conveniently check for a valid/initialized ACS. + explicit operator bool() const { return (bool)CS; } + + /// Return the underlying instruction. + Instruction *getInstruction() const { return CS.getInstruction(); } + + /// Return the call site abstraction for the underlying instruction. + CallSite getCallSite() const { return CS; } + + /// Return true if this ACS represents a direct call. + bool isDirectCall() const { + return !isCallbackCall() && !CS.isIndirectCall(); + } + + /// Return true if this ACS represents an indirect call. + bool isIndirectCall() const { + return !isCallbackCall() && CS.isIndirectCall(); + } + + /// Return true if this ACS represents a callback call. + bool isCallbackCall() const { + // For a callback call site the callee is ALWAYS stored first in the + // transitive values vector. Thus, a non-empty vector indicates a callback. + return !CI.ParameterEncoding.empty(); + } + + /// Return true if @p UI is the use that defines the callee of this ACS. + bool isCallee(Value::const_user_iterator UI) const { + return isCallee(&UI.getUse()); + } + + /// Return true if @p U is the use that defines the callee of this ACS. + bool isCallee(const Use *U) const { + if (isDirectCall()) + return CS.isCallee(U); + + assert(!CI.ParameterEncoding.empty() && + "Callback without parameter encoding!"); + + return (int)CS.getArgumentNo(U) == CI.ParameterEncoding[0]; + } + + /// Return the number of parameters of the callee. + unsigned getNumArgOperands() const { + if (isDirectCall()) + return CS.getNumArgOperands(); + // Subtract 1 for the callee encoding. + return CI.ParameterEncoding.size() - 1; + } + + /// Return the operand index of the underlying instruction associated with @p + /// Arg. + int getCallArgOperandNo(Argument &Arg) const { + return getCallArgOperandNo(Arg.getArgNo()); + } + + /// Return the operand index of the underlying instruction associated with + /// the function parameter number @p ArgNo or -1 if there is none. + int getCallArgOperandNo(unsigned ArgNo) const { + if (isDirectCall()) + return ArgNo; + // Add 1 for the callee encoding. + return CI.ParameterEncoding[ArgNo + 1]; + } + + /// Return the operand of the underlying instruction associated with @p Arg. + Value *getCallArgOperand(Argument &Arg) const { + return getCallArgOperand(Arg.getArgNo()); + } + + /// Return the operand of the underlying instruction associated with the + /// function parameter number @p ArgNo or nullptr if there is none. + Value *getCallArgOperand(unsigned ArgNo) const { + if (isDirectCall()) + return CS.getArgOperand(ArgNo); + // Add 1 for the callee encoding. + return CI.ParameterEncoding[ArgNo + 1] >= 0 + ? CS.getArgOperand(CI.ParameterEncoding[ArgNo + 1]) + : nullptr; + } + + /// Return the operand index of the underlying instruction associated with the + /// callee of this ACS. Only valid for callback calls! + int getCallArgOperandNoForCallee() const { + assert(isCallbackCall()); + assert(CI.ParameterEncoding.size() && CI.ParameterEncoding[0] > 0); + return CI.ParameterEncoding[0]; + } + + /// Return the pointer to function that is being called. + Value *getCalledValue() const { + if (isDirectCall()) + return CS.getCalledValue(); + return CS.getArgOperand(getCallArgOperandNoForCallee()); + } + + /// Return the function being called if this is a direct call, otherwise + /// return null (if it's an indirect call). + Function *getCalledFunction() const { + Value *V = getCalledValue(); + return V ? dyn_cast(V->stripPointerCasts()) : nullptr; + } +}; + template <> struct DenseMapInfo { using BaseInfo = DenseMapInfo; diff --git a/include/llvm/IR/LLVMContext.h b/include/llvm/IR/LLVMContext.h index bd7097b39a3..92f772b06d8 100644 --- a/include/llvm/IR/LLVMContext.h +++ b/include/llvm/IR/LLVMContext.h @@ -103,6 +103,7 @@ public: MD_callees = 23, // "callees" MD_irr_loop = 24, // "irr_loop" MD_access_group = 25, // "llvm.access.group" + MD_callback = 26, // "callback" }; /// Known operand bundle tag IDs, which always have the same value. All diff --git a/include/llvm/IR/MDBuilder.h b/include/llvm/IR/MDBuilder.h index 174616c7ab1..2b4a8ccd343 100644 --- a/include/llvm/IR/MDBuilder.h +++ b/include/llvm/IR/MDBuilder.h @@ -94,6 +94,17 @@ public: /// calls. MDNode *createCallees(ArrayRef Callees); + //===------------------------------------------------------------------===// + // Callback metadata. + //===------------------------------------------------------------------===// + + /// Return metadata describing a callback (see llvm::AbstractCallSite). + MDNode *createCallbackEncoding(unsigned CalleeArgNo, ArrayRef Arguments, + bool VarArgsArePassed); + + /// Merge the new callback encoding \p NewCB into \p ExistingCallbacks. + MDNode *mergeCallbackEncodings(MDNode *ExistingCallbacks, MDNode *NewCB); + //===------------------------------------------------------------------===// // AA metadata. //===------------------------------------------------------------------===// diff --git a/lib/IR/AbstractCallSite.cpp b/lib/IR/AbstractCallSite.cpp new file mode 100644 index 00000000000..6ad2df85169 --- /dev/null +++ b/lib/IR/AbstractCallSite.cpp @@ -0,0 +1,135 @@ +//===-- AbstractCallSite.cpp - Implementation of abstract call sites ------===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===----------------------------------------------------------------------===// +// +// This file implements abstract call sites which unify the interface for +// direct, indirect, and callback call sites. +// +// For more information see: +// https://llvm.org/devmtg/2018-10/talk-abstracts.html#talk20 +// +//===----------------------------------------------------------------------===// + +#include "llvm/ADT/Statistic.h" +#include "llvm/ADT/StringSwitch.h" +#include "llvm/IR/CallSite.h" +#include "llvm/Support/Debug.h" + +using namespace llvm; + +#define DEBUG_TYPE "abstract-call-sites" + +STATISTIC(NumCallbackCallSites, "Number of callback call sites created"); +STATISTIC(NumDirectAbstractCallSites, + "Number of direct abstract call sites created"); +STATISTIC(NumInvalidAbstractCallSitesUnknownUse, + "Number of invalid abstract call sites created (unknown use)"); +STATISTIC(NumInvalidAbstractCallSitesUnknownCallee, + "Number of invalid abstract call sites created (unknown callee)"); +STATISTIC(NumInvalidAbstractCallSitesNoCallback, + "Number of invalid abstract call sites created (no callback)"); + +/// Create an abstract call site from a use. +AbstractCallSite::AbstractCallSite(const Use *U) : CS(U->getUser()) { + + // First handle unknown users. + if (!CS) { + + // If the use is actually in a constant cast expression which itself + // has only one use, we look through the constant cast expression. + // This happens by updating the use @p U to the use of the constant + // cast expression and afterwards re-initializing CS accordingly. + if (ConstantExpr *CE = dyn_cast(U->getUser())) + if (CE->getNumUses() == 1 && CE->isCast()) { + U = &*CE->use_begin(); + CS = CallSite(U->getUser()); + } + + if (!CS) { + NumInvalidAbstractCallSitesUnknownUse++; + return; + } + } + + // Then handle direct or indirect calls. Thus, if U is the callee of the + // call site CS it is not a callback and we are done. + if (CS.isCallee(U)) { + NumDirectAbstractCallSites++; + return; + } + + // If we cannot identify the broker function we cannot create a callback and + // invalidate the abstract call site. + Function *Callee = CS.getCalledFunction(); + if (!Callee) { + NumInvalidAbstractCallSitesUnknownCallee++; + CS = CallSite(); + return; + } + + MDNode *CallbackMD = Callee->getMetadata(LLVMContext::MD_callback); + if (!CallbackMD) { + NumInvalidAbstractCallSitesNoCallback++; + CS = CallSite(); + return; + } + + unsigned UseIdx = CS.getArgumentNo(U); + MDNode *CallbackEncMD = nullptr; + for (const MDOperand &Op : CallbackMD->operands()) { + MDNode *OpMD = cast(Op.get()); + auto *CBCalleeIdxAsCM = cast(OpMD->getOperand(0)); + uint64_t CBCalleeIdx = + cast(CBCalleeIdxAsCM->getValue())->getZExtValue(); + if (CBCalleeIdx != UseIdx) + continue; + CallbackEncMD = OpMD; + break; + } + + if (!CallbackEncMD) { + NumInvalidAbstractCallSitesNoCallback++; + CS = CallSite(); + return; + } + + NumCallbackCallSites++; + + assert(CallbackEncMD->getNumOperands() >= 2 && "Incomplete !callback metadata"); + + unsigned NumCallOperands = CS.getNumArgOperands(); + // Skip the var-arg flag at the end when reading the metadata. + for (unsigned u = 0, e = CallbackEncMD->getNumOperands() - 1; u < e; u++) { + Metadata *OpAsM = CallbackEncMD->getOperand(u).get(); + auto *OpAsCM = cast(OpAsM); + assert(OpAsCM->getType()->isIntegerTy(64) && + "Malformed !callback metadata"); + + int64_t Idx = cast(OpAsCM->getValue())->getSExtValue(); + assert(-1 <= Idx && Idx <= NumCallOperands && + "Out-of-bounds !callback metadata index"); + + CI.ParameterEncoding.push_back(Idx); + } + + if (!Callee->isVarArg()) + return; + + Metadata *VarArgFlagAsM = + CallbackEncMD->getOperand(CallbackEncMD->getNumOperands() - 1).get(); + auto *VarArgFlagAsCM = cast(VarArgFlagAsM); + assert(VarArgFlagAsCM->getType()->isIntegerTy(1) && + "Malformed !callback metadata var-arg flag"); + + if (VarArgFlagAsCM->getValue()->isNullValue()) + return; + + // Add all variadic arguments at the end. + for (unsigned u = Callee->arg_size(); u < NumCallOperands; u++) + CI.ParameterEncoding.push_back(u); +} diff --git a/lib/IR/CMakeLists.txt b/lib/IR/CMakeLists.txt index 2586f987289..e52da6182c4 100644 --- a/lib/IR/CMakeLists.txt +++ b/lib/IR/CMakeLists.txt @@ -3,6 +3,7 @@ tablegen(LLVM AttributesCompatFunc.inc -gen-attrs) add_public_tablegen_target(AttributeCompatFuncTableGen) add_llvm_library(LLVMCore + AbstractCallSite.cpp AsmWriter.cpp Attributes.cpp AutoUpgrade.cpp diff --git a/lib/IR/LLVMContext.cpp b/lib/IR/LLVMContext.cpp index 944d8265151..5c4f0de33b1 100644 --- a/lib/IR/LLVMContext.cpp +++ b/lib/IR/LLVMContext.cpp @@ -62,6 +62,7 @@ LLVMContext::LLVMContext() : pImpl(new LLVMContextImpl(*this)) { {MD_callees, "callees"}, {MD_irr_loop, "irr_loop"}, {MD_access_group, "llvm.access.group"}, + {MD_callback, "callback"}, }; for (auto &MDKind : MDKinds) { diff --git a/lib/IR/MDBuilder.cpp b/lib/IR/MDBuilder.cpp index 3fa541f1b53..3b5f2fec971 100644 --- a/lib/IR/MDBuilder.cpp +++ b/lib/IR/MDBuilder.cpp @@ -107,6 +107,50 @@ MDNode *MDBuilder::createCallees(ArrayRef Callees) { return MDNode::get(Context, Ops); } +MDNode *MDBuilder::createCallbackEncoding(unsigned CalleeArgNo, + ArrayRef Arguments, + bool VarArgArePassed) { + SmallVector Ops; + + Type *Int64 = Type::getInt64Ty(Context); + Ops.push_back(createConstant(ConstantInt::get(Int64, CalleeArgNo))); + + for (int ArgNo : Arguments) + Ops.push_back(createConstant(ConstantInt::get(Int64, ArgNo, true))); + + Type *Int1 = Type::getInt1Ty(Context); + Ops.push_back(createConstant(ConstantInt::get(Int1, VarArgArePassed))); + + return MDNode::get(Context, Ops); +} + +MDNode *MDBuilder::mergeCallbackEncodings(MDNode *ExistingCallbacks, + MDNode *NewCB) { + if (!ExistingCallbacks) + return MDNode::get(Context, {NewCB}); + + auto *NewCBCalleeIdxAsCM = cast(NewCB->getOperand(0)); + uint64_t NewCBCalleeIdx = + cast(NewCBCalleeIdxAsCM->getValue())->getZExtValue(); + + SmallVector Ops; + unsigned NumExistingOps = ExistingCallbacks->getNumOperands(); + Ops.resize(NumExistingOps + 1); + + for (unsigned u = 0; u < NumExistingOps; u++) { + Ops[u] = ExistingCallbacks->getOperand(u); + + auto *OldCBCalleeIdxAsCM = cast(Ops[u]); + uint64_t OldCBCalleeIdx = + cast(OldCBCalleeIdxAsCM->getValue())->getZExtValue(); + assert(NewCBCalleeIdx != OldCBCalleeIdx && + "Cannot map a callback callee index twice!"); + } + + Ops[NumExistingOps] = NewCB; + return MDNode::get(Context, Ops); +} + MDNode *MDBuilder::createAnonymousAARoot(StringRef Name, MDNode *Extra) { // To ensure uniqueness the root node is self-referential. auto Dummy = MDNode::getTemporary(Context, None); diff --git a/test/ThinLTO/X86/lazyload_metadata.ll b/test/ThinLTO/X86/lazyload_metadata.ll index 5fbf7a98bb9..21ee726e148 100644 --- a/test/ThinLTO/X86/lazyload_metadata.ll +++ b/test/ThinLTO/X86/lazyload_metadata.ll @@ -10,13 +10,13 @@ ; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \ ; RUN: -o /dev/null -stats \ ; RUN: 2>&1 | FileCheck %s -check-prefix=LAZY -; LAZY: 57 bitcode-reader - Number of Metadata records loaded +; LAZY: 59 bitcode-reader - Number of Metadata records loaded ; LAZY: 2 bitcode-reader - Number of MDStrings loaded ; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \ ; RUN: -o /dev/null -disable-ondemand-mds-loading -stats \ ; RUN: 2>&1 | FileCheck %s -check-prefix=NOTLAZY -; NOTLAZY: 66 bitcode-reader - Number of Metadata records loaded +; NOTLAZY: 68 bitcode-reader - Number of Metadata records loaded ; NOTLAZY: 7 bitcode-reader - Number of MDStrings loaded