mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00

[COFF, ARM64] Implement Intrinsic.sponentry for AArch64

Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp on the AArch64 Windows platform.

Patch by: Yin Ma (yinma@codeaurora.org)

Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53996

llvm-svn: 345909
Mandeep Singh Grang 2018-11-01 23:22:25 +00:00
parent 1557b354d7
commit f318e7e734
10 changed files with 190 additions and 30 deletions

View File

@ -2926,7 +2926,7 @@ Simple Constants
hexadecimal notation (see below). The assembler requires the exact
decimal value of a floating-point constant. For example, the
assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating
decimal in binary. Floating-point constants must have a
:ref:`floating-point <t_floating>` type.
**Null pointer constants**
The identifier '``null``' is recognized as a null pointer constant
@ -3331,7 +3331,7 @@ The following is the syntax for constant expressions:
value won't fit in the integer type, the result is a
:ref:`poison value <poisonvalues>`.
``uitofp (CST to TYPE)``
Convert an unsigned integer constant to the corresponding
floating-point constant. TYPE must be a scalar or vector floating-point
type. CST must be of scalar or vector integer type. Both CST and TYPE must
be scalars, or vectors of the same number of elements.
@ -5434,7 +5434,7 @@ Irreducible loop header weights are typically based on profile data.
'``invariant.group``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The experimental ``invariant.group`` metadata may be attached to
``load``/``store`` instructions referencing a single metadata with no entries.
The existence of the ``invariant.group`` metadata on the instruction tells
the optimizer that every ``load`` and ``store`` to the same pointer operand
@ -6875,7 +6875,7 @@ Arguments:
""""""""""
The two arguments to the '``fadd``' instruction must be
:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
floating-point values. Both arguments must have identical types.
Semantics:
@ -6883,7 +6883,7 @@ Semantics:
The value produced is the floating-point sum of the two operands.
This instruction is assumed to execute in the default :ref:`floating-point
environment <floatenv>`.
This instruction can also take any number of :ref:`fast-math
flags <fastmath>`, which are optimization hints to enable otherwise
unsafe floating-point optimizations:
@ -6972,7 +6972,7 @@ Arguments:
""""""""""
The two arguments to the '``fsub``' instruction must be
:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
floating-point values. Both arguments must have identical types.
Semantics:
@ -6980,7 +6980,7 @@ Semantics:
The value produced is the floating-point difference of the two operands.
This instruction is assumed to execute in the default :ref:`floating-point
environment <floatenv>`.
This instruction can also take any number of :ref:`fast-math
flags <fastmath>`, which are optimization hints to enable otherwise
unsafe floating-point optimizations:
@ -7067,7 +7067,7 @@ Arguments:
""""""""""
The two arguments to the '``fmul``' instruction must be
:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
floating-point values. Both arguments must have identical types.
Semantics:
@ -7075,7 +7075,7 @@ Semantics:
The value produced is the floating-point product of the two operands.
This instruction is assumed to execute in the default :ref:`floating-point
environment <floatenv>`.
This instruction can also take any number of :ref:`fast-math
flags <fastmath>`, which are optimization hints to enable otherwise
unsafe floating-point optimizations:
@ -7201,7 +7201,7 @@ Arguments:
""""""""""
The two arguments to the '``fdiv``' instruction must be
:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
floating-point values. Both arguments must have identical types.
Semantics:
@ -7209,7 +7209,7 @@ Semantics:
The value produced is the floating-point quotient of the two operands.
This instruction is assumed to execute in the default :ref:`floating-point
environment <floatenv>`.
This instruction can also take any number of :ref:`fast-math
flags <fastmath>`, which are optimization hints to enable otherwise
unsafe floating-point optimizations:
@ -7344,7 +7344,7 @@ Arguments:
""""""""""
The two arguments to the '``frem``' instruction must be
:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>` of
floating-point values. Both arguments must have identical types.
Semantics:
@ -7352,10 +7352,10 @@ Semantics:
The value produced is the floating-point remainder of the two operands.
This is the same output as a libm '``fmod``' function, but without any
possibility of setting ``errno``. The remainder has the same sign as the
dividend.
This instruction is assumed to execute in the default :ref:`floating-point
environment <floatenv>`.
This instruction can also take any number of :ref:`fast-math
flags <fastmath>`, which are optimization hints to enable otherwise
unsafe floating-point optimizations:
@ -8809,7 +8809,7 @@ Semantics:
The '``fptrunc``' instruction casts a ``value`` from a larger
:ref:`floating-point <t_floating>` type to a smaller :ref:`floating-point
<t_floating>` type.
This instruction is assumed to execute in the default :ref:`floating-point
environment <floatenv>`.
@ -10330,6 +10330,27 @@ of the obvious source-language caller.
This intrinsic is only implemented for x86 and aarch64.
'``llvm.sponentry``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
"""""""
::
declare i8* @llvm.sponentry()
Overview:
"""""""""
The '``llvm.sponentry``' intrinsic returns the value the stack pointer had
on entry to the function containing the call to this intrinsic.
Semantics:
""""""""""
Note that this intrinsic is currently only verified on AArch64.
'``llvm.frameaddress``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -12115,11 +12136,11 @@ Overview:
The '``llvm.fshl``' family of intrinsic functions performs a funnel shift left:
the first two values are concatenated as { %a : %b } (%a is the most significant
bits of the wide value), the combined value is shifted left, and the most
significant bits are extracted to produce a result that is the same size as the
original arguments. If the first 2 arguments are identical, this is equivalent
to a rotate left operation. For vector types, the operation occurs for each
element of the vector. The shift argument is treated as an unsigned amount
modulo the element size of the arguments.
Arguments:
@ -12161,11 +12182,11 @@ Overview:
The '``llvm.fshr``' family of intrinsic functions performs a funnel shift right:
the first two values are concatenated as { %a : %b } (%a is the most significant
bits of the wide value), the combined value is shifted right, and the least
significant bits are extracted to produce a result that is the same size as the
original arguments. If the first 2 arguments are identical, this is equivalent
to a rotate right operation. For vector types, the operation occurs for each
element of the vector. The shift argument is treated as an unsigned amount
modulo the element size of the arguments.
Arguments:
@ -13446,7 +13467,7 @@ The '``llvm.masked.expandload``' intrinsic is designed for reading multiple scal
%Tmp = call <8 x double> @llvm.masked.expandload.v8f64(double* %Bptr, <8 x i1> %Mask, <8 x double> undef)
; Store the result in A
call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> %Tmp, <8 x double>* %Aptr, i32 8, <8 x i1> %Mask)
; %Bptr should be increased on each iteration according to the number of '1' elements in the Mask.
%MaskI = bitcast <8 x i1> %Mask to i8
%MaskIPopcnt = call i8 @llvm.ctpop.i8(i8 %MaskI)
@ -13503,7 +13524,7 @@ The '``llvm.masked.compressstore``' intrinsic is designed for compressing data i
%Tmp = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* %Aptr, i32 8, <8 x i1> %Mask, <8 x double> undef)
; Store all selected elements consecutively in array B
call void @llvm.masked.compressstore.v8f64(<8 x double> %Tmp, double* %Bptr, <8 x i1> %Mask)
; %Bptr should be increased on each iteration according to the number of '1' elements in the Mask.
%MaskI = bitcast <8 x i1> %Mask to i8
%MaskIPopcnt = call i8 @llvm.ctpop.i8(i8 %MaskI)
@ -14136,7 +14157,7 @@ Overview:
The '``llvm.experimental.constrained.powi``' intrinsic returns the first operand
raised to the (positive or negative) power specified by the second operand. The
order of evaluation of multiplications is not defined. When a vector of
floating-point type is used, the second argument remains a scalar integer value.
@ -14462,7 +14483,7 @@ Overview:
"""""""""
The '``llvm.experimental.constrained.nearbyint``' intrinsic returns the first
operand rounded to the nearest integer. It will not raise an inexact
floating-point exception if the operand is not an integer.

View File

@ -70,7 +70,7 @@ namespace ISD {
/// of the frame or return address to return. An index of zero corresponds
/// to the current function's frame or return address, an index of one to
/// the parent's frame or return address, and so on.
FRAMEADDR, RETURNADDR, ADDROFRETURNADDR,
FRAMEADDR, RETURNADDR, ADDROFRETURNADDR, SPONENTRY,
/// LOCAL_RECOVER - Represents the llvm.localrecover intrinsic.
/// Materializes the offset from the local object pointer of another

View File

@ -320,6 +320,7 @@ def int_gcwrite : Intrinsic<[],
def int_returnaddress : Intrinsic<[llvm_ptr_ty], [llvm_i32_ty], [IntrNoMem]>;
def int_addressofreturnaddress : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;
def int_frameaddress : Intrinsic<[llvm_ptr_ty], [llvm_i32_ty], [IntrNoMem]>;
def int_sponentry : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;
def int_read_register : Intrinsic<[llvm_anyint_ty], [llvm_metadata_ty],
[IntrReadMem], "llvm.read_register">;
def int_write_register : Intrinsic<[], [llvm_metadata_ty, llvm_anyint_ty],

View File

@ -1059,6 +1059,7 @@ void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {
case ISD::FRAMEADDR:
case ISD::RETURNADDR:
case ISD::ADDROFRETURNADDR:
case ISD::SPONENTRY:
// These operations lie about being legal: when they claim to be legal,
// they should actually be custom-lowered.
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));

View File

@ -5050,6 +5050,10 @@ SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) {
setValue(&I, DAG.getNode(ISD::ADDROFRETURNADDR, sdl,
TLI.getPointerTy(DAG.getDataLayout())));
return nullptr;
case Intrinsic::sponentry:
setValue(&I, DAG.getNode(ISD::SPONENTRY, sdl,
TLI.getPointerTy(DAG.getDataLayout())));
return nullptr;
case Intrinsic::frameaddress:
setValue(&I, DAG.getNode(ISD::FRAMEADDR, sdl,
TLI.getPointerTy(DAG.getDataLayout()),

View File

@ -124,6 +124,7 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::RETURNADDR: return "RETURNADDR";
case ISD::ADDROFRETURNADDR: return "ADDROFRETURNADDR";
case ISD::FRAMEADDR: return "FRAMEADDR";
case ISD::SPONENTRY: return "SPONENTRY";
case ISD::LOCAL_RECOVER: return "LOCAL_RECOVER";
case ISD::READ_REGISTER: return "READ_REGISTER";
case ISD::WRITE_REGISTER: return "WRITE_REGISTER";

View File

@ -3450,6 +3450,21 @@ bool AArch64FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
updateValueMap(II, SrcReg);
return true;
}
case Intrinsic::sponentry: {
MachineFrameInfo &MFI = FuncInfo.MF->getFrameInfo();
// SP = FP + Fixed Object + 16
int FI = MFI.CreateFixedObject(4, 0, false);
unsigned ResultReg = createResultReg(&AArch64::GPR64spRegClass);
BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
TII.get(AArch64::ADDXri), ResultReg)
.addFrameIndex(FI)
.addImm(0)
.addImm(0);
updateValueMap(II, ResultReg);
return true;
}
case Intrinsic::memcpy:
case Intrinsic::memmove: {
const auto *MTI = cast<MemTransferInst>(II);

View File

@ -2863,6 +2863,8 @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
return LowerFP_EXTEND(Op, DAG);
case ISD::FRAMEADDR:
return LowerFRAMEADDR(Op, DAG);
case ISD::SPONENTRY:
return LowerSPONENTRY(Op, DAG);
case ISD::RETURNADDR:
return LowerRETURNADDR(Op, DAG);
case ISD::ADDROFRETURNADDR:
@ -5173,6 +5175,16 @@ SDValue AArch64TargetLowering::LowerFRAMEADDR(SDValue Op,
return FrameAddr;
}
SDValue AArch64TargetLowering::LowerSPONENTRY(SDValue Op,
SelectionDAG &DAG) const {
MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
EVT VT = getPointerTy(DAG.getDataLayout());
SDLoc DL(Op);
int FI = MFI.CreateFixedObject(4, 0, false);
return DAG.getFrameIndex(FI, VT);
}
// FIXME? Maybe this could be a TableGen attribute on some registers and
// this table could be generated automatically from RegInfo.
unsigned AArch64TargetLowering::getRegisterByName(const char* RegName, EVT VT,

View File

@ -617,6 +617,7 @@ private:
SDValue LowerVACOPY(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVAARG(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFRAMEADDR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSPONENTRY(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFLT_ROUNDS_(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;

View File

@ -0,0 +1,104 @@
; RUN: llc -mtriple=aarch64-windows-msvc -disable-fp-elim %s -o - | FileCheck %s
; RUN: llc -mtriple=aarch64-windows-msvc -fast-isel -disable-fp-elim %s -o - | FileCheck %s
; RUN: llc -mtriple=aarch64-windows-msvc %s -o - | FileCheck %s --check-prefix=NOFP
; RUN: llc -mtriple=aarch64-windows-msvc -fast-isel %s -o - | FileCheck %s --check-prefix=NOFP
@env2 = common dso_local global [24 x i64]* null, align 8
define dso_local void @bar() {
%1 = call i8* @llvm.sponentry()
%2 = load [24 x i64]*, [24 x i64]** @env2, align 8
%3 = getelementptr inbounds [24 x i64], [24 x i64]* %2, i32 0, i32 0
%4 = bitcast i64* %3 to i8*
%5 = call i32 @_setjmpex(i8* %4, i8* %1) #2
ret void
}
; CHECK: bar:
; CHECK: mov x29, sp
; CHECK: add x1, x29, #16
; CHECK: bl _setjmpex
; NOFP: str x30, [sp, #-16]!
; NOFP: add x1, sp, #16
define dso_local void @foo([24 x i64]*) {
%2 = alloca [24 x i64]*, align 8
%3 = alloca i32, align 4
%4 = alloca [100 x i32], align 4
store [24 x i64]* %0, [24 x i64]** %2, align 8
%5 = call i8* @llvm.sponentry()
%6 = load [24 x i64]*, [24 x i64]** %2, align 8
%7 = getelementptr inbounds [24 x i64], [24 x i64]* %6, i32 0, i32 0
%8 = bitcast i64* %7 to i8*
%9 = call i32 @_setjmpex(i8* %8, i8* %5)
store i32 %9, i32* %3, align 4
ret void
}
; CHECK: foo:
; CHECK: sub sp, sp, #448
; CHECK: add x29, sp, #432
; CHECK: add x1, x29, #16
; CHECK: bl _setjmpex
; NOFP: sub sp, sp, #432
; NOFP: add x1, sp, #432
define dso_local void @var_args(i8*, ...) {
%2 = alloca i8*, align 8
%3 = alloca i8*, align 8
store i8* %0, i8** %2, align 8
%4 = bitcast i8** %3 to i8*
call void @llvm.va_start(i8* %4)
%5 = load i8*, i8** %3, align 8
%6 = getelementptr inbounds i8, i8* %5, i64 8
store i8* %6, i8** %3, align 8
%7 = bitcast i8* %5 to i32*
%8 = load i32, i32* %7, align 8
%9 = bitcast i8** %3 to i8*
call void @llvm.va_end(i8* %9)
%10 = call i8* @llvm.sponentry()
%11 = load [24 x i64]*, [24 x i64]** @env2, align 8
%12 = getelementptr inbounds [24 x i64], [24 x i64]* %11, i32 0, i32 0
%13 = bitcast i64* %12 to i8*
%14 = call i32 @_setjmpex(i8* %13, i8* %10) #3
ret void
}
; CHECK: var_args:
; CHECK: sub sp, sp, #96
; CHECK: add x29, sp, #16
; CHECK: add x1, x29, #80
; CHECK: bl _setjmpex
; NOFP: sub sp, sp, #96
; NOFP: add x1, sp, #96
define dso_local void @manyargs(i64 %x1, i64 %x2, i64 %x3, i64 %x4, i64 %x5, i64 %x6, i64 %x7, i64 %x8, i64 %x9, i64 %x10) {
%1 = call i8* @llvm.sponentry()
%2 = load [24 x i64]*, [24 x i64]** @env2, align 8
%3 = getelementptr inbounds [24 x i64], [24 x i64]* %2, i32 0, i32 0
%4 = bitcast i64* %3 to i8*
%5 = call i32 @_setjmpex(i8* %4, i8* %1) #2
ret void
}
; CHECK: manyargs:
; CHECK: stp x29, x30, [sp, #-16]!
; CHECK: add x1, x29, #16
; NOFP: str x30, [sp, #-16]!
; NOFP: add x1, sp, #16
; Function Attrs: nounwind readnone
declare i8* @llvm.sponentry()
; Function Attrs: returns_twice
declare dso_local i32 @_setjmpex(i8*, i8*)
; Function Attrs: nounwind
declare void @llvm.va_start(i8*) #1
; Function Attrs: nounwind
declare void @llvm.va_end(i8*) #1