diff --git a/docs/LangRef.rst b/docs/LangRef.rst index 86b5a15f254..d2bc6a05831 100644 --- a/docs/LangRef.rst +++ b/docs/LangRef.rst @@ -366,6 +366,18 @@ added in the future: accessed runtime components pinned to specific hardware registers. At the moment only X86 supports this convention (both 32 and 64 bit). +"``webkit_jscc``" - WebKit's JavaScript calling convention + This calling convention has been implemented for `WebKit FTL JIT + `_. It passes arguments on the + stack right to left (as cdecl does), and returns a value in the + platform's customary return register. +"``anyregcc``" - Dynamic calling convention for code patching + This is a special convention that supports patching an arbitrary code + sequence in place of a call site. This convention forces the call + arguments into registers but allows them to be dynamically + allocated. This can currently only be used with calls to + llvm.experimental.patchpoint because only this intrinsic records + the location of its arguments in a side table. See :doc:`StackMaps`. "``cc ``" - Numbered convention Any calling convention may be specified by number, allowing target-specific calling conventions to be used. Target specific @@ -8912,3 +8924,10 @@ Semantics: This intrinsic does nothing, and it's removed by optimizers and ignored by codegen. + +Stack Map Intrinsics +-------------------- + +LLVM provides experimental intrinsics to support runtime patching +mechanisms commonly desired in dynamic language JITs. These intrinsics +are described in :doc:`StackMaps`. diff --git a/docs/StackMaps.rst b/docs/StackMaps.rst new file mode 100644 index 00000000000..0dac62b595d --- /dev/null +++ b/docs/StackMaps.rst @@ -0,0 +1,480 @@ +=================================== +Stack maps and patch points in LLVM +=================================== + +.. 
contents:: + :local: + :depth: 2 + +Definitions +=========== + +In this document we refer to the "runtime" collectively as all +components that serve as the LLVM client, including the LLVM IR +generator, object code consumer, and code patcher. + +A stack map records the location of ``live values`` at a particular +instruction address. These ``live values`` do not refer to all the +LLVM values live across the stack map. Instead, they are only the +values that the runtime requires to be live at this point. For +example, they may be the values the runtime will need to resume +program execution at that point independent of the compiled function +containing the stack map. + +LLVM emits stack map data into the object code within a designated +:ref:`stackmap-section`. This stack map data contains a record for +each stack map. The record stores the stack map's instruction address +and contains an entry for each mapped value. Each entry encodes a +value's location as a register, stack offset, or constant. + +A patch point is an instruction address at which space is reserved for +patching a new instruction sequence at run time. Patch points look +much like calls to LLVM. They take arguments that follow a calling +convention and may return a value. They also imply stack map +generation, which allows the runtime to locate the patchpoint and +find the location of ``live values`` at that point. + +Motivation +========== + +This functionality is currently experimental but is potentially useful +in a variety of settings, the most obvious being a runtime (JIT) +compiler. Example applications of the patchpoint intrinsics are +implementing an inline call cache for polymorphic method dispatch or +optimizing the retrieval of properties in dynamically typed languages +such as JavaScript.
+ +The intrinsics documented here are currently used by the JavaScript +compiler within the open source WebKit project, see the `FTL JIT +`_, but they are designed to be +used whenever stack maps or code patching are needed. Because the +intrinsics have experimental status, compatibility across LLVM +releases is not guaranteed. + +The stack map functionality described in this document is separate +from the functionality described in +:ref:`stack-map`. `GCFunctionMetadata` provides the location of +pointers into a collected heap captured by the `GCRoot` intrinsic, +which can also be considered a "stack map". Unlike the stack maps +defined above, the `GCFunctionMetadata` stack map interface does not +provide a way to associate live register values of arbitrary type with +an instruction address, nor does it specify a format for the resulting +stack map. The stack maps described here could potentially provide +richer information to a garbage collecting runtime, but that usage +will not be discussed in this document. + +Intrinsics +========== + +The following two kinds of intrinsics can be used to implement stack +maps and patch points: ``llvm.experimental.stackmap`` and +``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a +stack map record, and they both allow some form of code patching. They +can be used independently (i.e. ``llvm.experimental.patchpoint`` +implicitly generates a stack map without the need for an additional +call to ``llvm.experimental.stackmap``). The choice of which to use +depends on whether it is necessary to reserve space for code patching +and whether any of the intrinsic arguments should be lowered according +to calling conventions. ``llvm.experimental.stackmap`` does not +reserve any space, nor does it expect any call arguments. If the +runtime patches code at the stack map's address, it will destructively +overwrite the program text. 
This is unlike +``llvm.experimental.patchpoint``, which reserves space for in-place +patching without overwriting surrounding code. The +``llvm.experimental.patchpoint`` intrinsic also lowers a specified +number of arguments according to its calling convention. This allows +patched code to make in-place function calls without marshaling. + +Each instance of one of these intrinsics generates a stack map record +in the :ref:`stackmap-section`. The record includes an ID, allowing +the runtime to uniquely identify the stack map, and the offset within +the code from the beginning of the enclosing function. + +'``llvm.experimental.stackmap``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void + @llvm.experimental.stackmap(i64 , i32 , ...) + +Overview: +""""""""" + +The '``llvm.experimental.stackmap``' intrinsic records the location of +specified values in the stack map without generating any code. + +Operands: +""""""""" + +The first operand is an ID to be encoded within the stack map. The +second operand is the number of shadow bytes following the +intrinsic. The variable number of operands that follow are the ``live +values`` for which locations will be recorded in the stack map. + +To use this intrinsic as a bare-bones stack map, with no code patching +support, the number of shadow bytes can be set to zero. + +Semantics: +"""""""""" + +The stack map intrinsic generates no code in place, unless nops are +needed to cover its shadow (see below). However, its offset from +function entry is stored in the stack map. This is the relative +instruction address immediately following the instructions that +precede the stack map. + +The stack map ID allows a runtime to locate the desired stack map +record. LLVM passes this ID through directly to the stack map +record without checking uniqueness. 
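+
+As a minimal sketch of the operands described above, the following
+function records the locations of two values with a bare-bones stack
+map that requests no shadow bytes. The function name ``@f``, the ID
+``12``, and the values ``%a`` and ``%b`` are hypothetical and chosen
+only for illustration:
+
+.. code-block:: llvm
+
+  declare void @llvm.experimental.stackmap(i64, i32, ...)
+
+  define void @f(i64 %a, i64 %b) {
+  entry:
+    ; ID 12, zero shadow bytes, then the live values to record.
+    call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 12, i32 0,
+                                                            i64 %a, i64 %b)
+    ret void
+  }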
+ +LLVM guarantees a shadow of instructions following the stack map's +instruction offset during which neither the end of the basic block nor +another call to ``llvm.experimental.stackmap`` or +``llvm.experimental.patchpoint`` may occur. This allows the runtime to +patch the code at this point in response to an event triggered from +outside the code. The code for instructions following the stack map +may be emitted in the stack map's shadow, and these instructions may +be overwritten by destructive patching. Without shadow bytes, this +destructive patching could overwrite program text or data outside the +current function. We disallow overlapping stack map shadows so that +the runtime does not need to consider this corner case. + +For example, a stack map with 8 byte shadow: + +.. code-block:: llvm + + call void @runtime() + call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8, + i64* %ptr) + %val = load i64* %ptr + %add = add i64 %val, 3 + ret i64 %add + +May require one byte of nop-padding: + +.. code-block:: none + + 0x00 callq _runtime + 0x05 nop <--- stack map address + 0x06 movq (%rdi), %rax + 0x07 addq $3, %rax + 0x0a popq %rdx + 0x0b ret <---- end of 8-byte shadow + +Now, if the runtime needs to invalidate the compiled code, it may +patch 8 bytes of code at the stack map's address as follows: + +.. code-block:: none + + 0x00 callq _runtime + 0x05 movl $0xffff, %rax <--- patched code at stack map address + 0x0a callq *%rax <---- end of 8-byte shadow + +This way, after the normal call to the runtime returns, the code will +execute a patched call to a special entry point that can rebuild a +stack frame from the values located by the stack map. + +'``llvm.experimental.patchpoint.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void + @llvm.experimental.patchpoint.void(i64 , i32 , + i8* , i32 , ...) + declare i64 + @llvm.experimental.patchpoint.i64(i64 , i32 , + i8* , i32 , ...) 
+ +Overview: +""""""""" + +The '``llvm.experimental.patchpoint.*``' intrinsics create a function +call to the specified ```` and record the location of specified +values in the stack map. + +Operands: +""""""""" + +The first operand is an ID, the second operand is the number of bytes +reserved for the patchable region, the third operand is the target +address of a function (optionally null), and the fourth operand +specifies how many of the following variable operands are considered +function call arguments. The remaining variable number of operands are +the ``live values`` for which locations will be recorded in the stack +map. + +Semantics: +"""""""""" + +The patch point intrinsic generates a stack map. It also emits a +function call to the address specified by ```` if the address +is not a constant null. The function call and its arguments are +lowered according to the calling convention specified at the +intrinsic's callsite. Variants of the intrinsic with non-void return +type also return a value according to calling convention. + +Requesting zero patch point arguments is valid. In this case, all +variable operands are handled just like +``llvm.experimental.stackmap.*``. The difference is that space will +still be reserved for patching, a call will be emitted, and a return +value is allowed. + +The locations of the arguments are not normally recorded in the stack +map because they are already fixed by the calling convention. The +remaining ``live values`` will have their location recorded, which +could be a register, stack location, or constant. A special calling +convention has been introduced for use with stack maps, anyregcc, +which forces the arguments to be loaded into registers but allows +those registers to be dynamically allocated. These argument registers +will have their register locations recorded in the stack map in +addition to the remaining ``live values``. + +The patch point also emits nops to cover at least ```` of +instruction encoding space.
Hence, the client must ensure that +```` is enough to encode a call to the target address on the +supported targets. If the call target is constant null, then there is +no minimum requirement. A zero-byte null target patchpoint is +valid. + +The runtime may patch the code emitted for the patch point, including +the call sequence and nops. However, the runtime may not assume +anything about the code LLVM emits within the reserved space. Partial +patching is not allowed. The runtime must patch all reserved bytes, +padding with nops if necessary. + +This example shows a patch point reserving 15 bytes, with one argument +in $rdi, and a return value in $rax per native calling convention: + +.. code-block:: llvm + + %target = inttoptr i64 -281474976710654 to i8* + %val = call i64 (i64, i32, ...)* + @llvm.experimental.patchpoint.i64(i64 78, i32 15, + i8* %target, i32 1, i64* %ptr) + %add = add i64 %val, 3 + ret i64 %add + +May generate: + +.. code-block:: none + + 0x00 movabsq $0xffff000000000002, %r11 <--- patch point address + 0x0a callq *%r11 + 0x0d nop + 0x0e nop <--- end of reserved 15-bytes + 0x0f addq $0x3, %rax + 0x10 movl %rax, 8(%rsp) + +Note that no stack map locations will be recorded. If the patched code +sequence does not need arguments fixed to specific calling convention +registers, then the ``anyregcc`` convention may be used: + +.. code-block:: none + + %val = call anyregcc @llvm.experimental.patchpoint(i64 78, i32 15, + i8* %target, i32 1, + i64* %ptr) + +The stack map now indicates the location of the %ptr argument and +return value: + +.. code-block:: none + + Stack Map: ID=78, Loc0=%r9 Loc1=%r8 + +The patch code sequence may now use the argument that happened to be +allocated in %r8 and return a value allocated in %r9: + +.. code-block:: none + + 0x00 movslq 4(%r8), %r9 <--- patched code at patch point address + 0x03 nop + ... + 0x0e nop <--- end of reserved 15-bytes + 0x0f addq $0x3, %r9 + 0x10 movl %r9, 8(%rsp) + +.. 
_stackmap-format: + +Stack Map Format +================ + +The existence of a stack map or patch point intrinsic within an LLVM +Module forces code emission to create a :ref:`stackmap-section`. The +format of this section follows: + +.. code-block:: none + + uint32 : Reserved (header) + uint32 : NumConstants + Constants[NumConstants] { + uint64 : LargeConstant + } + uint32 : NumRecords + StkMapRecord[NumRecords] { + uint64 : PatchPoint ID + uint32 : Instruction Offset + uint16 : Reserved (record flags) + uint16 : NumLocations + Location[NumLocations] { + uint8 : Register | Direct | Indirect | Constant | ConstantIndex + uint8 : Reserved (location flags) + uint16 : Dwarf RegNum + int32 : Offset or SmallConstant + } + uint16 : NumLiveOuts + LiveOuts[NumLiveOuts] { + uint16 : Dwarf RegNum + uint8 : Reserved + uint8 : Size in Bytes + } + } + +The first byte of each location encodes a type that indicates how to +interpret the ``RegNum`` and ``Offset`` fields as follows: + +======== ========== =================== =========================== +Encoding Type Value Description +-------- ---------- ------------------- --------------------------- +0x1 Register Reg Value in a register +0x2 Direct Reg + Offset Frame index value +0x3 Indirect [Reg + Offset] Spilled value +0x4 Constant Offset Small constant +0x5 ConstIndex Constants[Offset] Large constant +======== ========== =================== =========================== + +In the common case, a value is available in a register, and the +``Offset`` field will be zero. Values spilled to the stack are encoded +as ``Indirect`` locations. The runtime must load those values from a +stack address, typically in the form ``[BP + Offset]``. If an +``alloca`` value is passed directly to a stack map intrinsic, then +LLVM may fold the frame index into the stack map as an optimization to +avoid allocating a register or stack slot. These frame indices will be +encoded as ``Direct`` locations in the form ``BP + Offset``.
LLVM may +also optimize constants by emitting them directly in the stack map, +either in the ``Offset`` of a ``Constant`` location or in the constant +pool, referred to by ``ConstantIndex`` locations. + +At each callsite, a "liveout" register list is also recorded. These +are the registers that are live across the stackmap and therefore must +be saved by the runtime. This is an important optimization when the +patchpoint intrinsic is used with a calling convention that by default +preserves most registers as callee-save. + +Each entry in the liveout register list contains a DWARF register +number and size in bytes. The stackmap format deliberately omits +specific subregister information. Instead the runtime must interpret +this information conservatively. For example, if the stackmap reports +one byte at ``%rax``, then the value may be in either ``%al`` or +``%ah``. It doesn't matter in practice, because the runtime will +simply save ``%rax``. However, if the stackmap reports 16 bytes at +``%ymm0``, then the runtime can safely optimize by saving only +``%xmm0``. + +The stack map format is a contract between an LLVM SVN revision and +the runtime. It is currently experimental and may change in the short +term, but minimizing the need to update the runtime is +important. Consequently, the stack map design is motivated by +simplicity and extensibility. Compactness of the representation is +secondary because the runtime is expected to parse the data +immediately after compiling a module and encode the information in its +own format. Since the runtime controls the allocation of sections, it +can reuse the same stack map space for multiple modules. + +.. _stackmap-section: + +Stack Map Section +^^^^^^^^^^^^^^^^^ + +A JIT compiler can easily access this section by providing its own +memory manager via the LLVM C API +``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory +manager, the JIT provides a callback: +``LLVMMemoryManagerAllocateDataSectionCallback()``. 
When LLVM creates +this section, it invokes the callback and passes the section name. The +JIT can record the in-memory address of the section at this time and +later parse it to recover the stack map data. + +On Darwin, the stack map section name is "__llvm_stackmaps". The +segment name is "__LLVM_STACKMAPS". + +Stack Map Usage +=============== + +The stack map support described in this document can be used to +precisely determine the location of values at a specific position in +the code. LLVM does not maintain any mapping between those values and +any higher-level entity. The runtime must be able to interpret the +stack map record given only the ID, offset, and the order of the +locations, which LLVM preserves. + +Note that this is quite different from the goal of debug information, +which is a best-effort attempt to track the location of named +variables at every instruction. + +An important motivation for this design is to allow a runtime to +commandeer a stack frame when execution reaches an instruction address +associated with a stack map. The runtime must be able to rebuild a +stack frame and resume program execution using the information +provided by the stack map. For example, execution may resume in an +interpreter or a recompiled version of the same function. + +This usage restricts LLVM optimization. Clearly, LLVM must not move +stores across a stack map. However, loads must also be handled +conservatively. If the load may trigger an exception, hoisting it +above a stack map could be invalid. For example, the runtime may +determine that a load is safe to execute without a type check given +the current state of the type system. If the type system changes while +some activation of the load's function exists on the stack, the load +becomes unsafe. 
The runtime can prevent subsequent execution of that +load by immediately patching any stack map location that lies between +the current call site and the load (typically, the runtime would +simply patch all stack map locations to invalidate the function). If +the compiler had hoisted the load above the stack map, then the +program could crash before the runtime could take back control. + +To enforce these semantics, stackmap and patchpoint intrinsics are +considered to potentially read and write all memory. This may limit +optimization more than some clients desire. To address this problem, +metadata could be added to the intrinsic call to express aliasing, +thereby allowing optimizations to hoist certain loads above stack +maps. + +Direct Stack Map Entries +^^^^^^^^^^^^^^^^^^^^^^^^ + +As shown in :ref:`stackmap-section`, a Direct stack map location +records the address of a frame index. This address is itself the value +that the runtime requested. This differs from Indirect locations, +which refer to stack locations from which the requested values must +be loaded. Direct locations can communicate the address of an alloca, +while Indirect locations handle register spills. + +For example: + +.. code-block:: none + + entry: + %a = alloca i64... + llvm.experimental.stackmap(i64 , i32 , i64* %a) + +The runtime can determine this alloca's relative location on the +stack immediately after compilation, or at any time thereafter. This +differs from Register and Indirect locations, because the runtime can +only read the values in those locations when execution reaches the +instruction address of the stack map. + +This functionality requires LLVM to treat entry-block allocas +specially when they are directly consumed by an intrinsic. (This is +the same requirement imposed by the llvm.gcroot intrinsic.) LLVM +transformations must not substitute the alloca with any intervening +value.
This can be verified by the runtime simply by checking that the +stack map's location is a Direct location type. diff --git a/docs/index.rst b/docs/index.rst index 62766f10342..d040632dc69 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -234,6 +234,7 @@ For API clients and LLVM developers. TableGen/LangRef HowToUseAttributes NVPTXUsage + StackMaps :doc:`WritingAnLLVMPass` Information on how to write LLVM transformations and analyses. @@ -308,6 +309,9 @@ For API clients and LLVM developers. :doc:`NVPTXUsage` This document describes using the NVPTX back-end to compile GPU kernels. +:doc:`StackMaps` + LLVM support for mapping instruction addresses to the location of + values and allowing code to be patched. Development Process Documentation =================================