mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

91 Commits

Author SHA1 Message Date
NAKAMURA Takumi
5ce9110622 Untabify.
llvm-svn: 251769
2015-11-02 01:38:12 +00:00
Jonas Paulsson
6bdd2eb055 [SystemZ] Use load-and-test for fp compare with 0 if vector support is present.
Since the LTxBRCompare instructions can't be used with vector registers, a
normal load-and-test instruction (with a modelled def operand) is used instead.
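
For illustration, this is the kind of IR compare the change targets (a
minimal sketch, not taken from the patch; assumes a CPU with vector support):

define i1 @fcmp_zero(double %val) {
  %cmp = fcmp oeq double %val, 0.0
  ret i1 %cmp
}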

Reviewed by Ulrich Weigand.

llvm-svn: 249664
2015-10-08 07:40:16 +00:00
NAKAMURA Takumi
dd4e00ed1e Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
Ulrich Weigand
6643dc8666 [SystemZ] Support large LLVM IR struct return values
Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when
attempting to compile a routine with a return type of
  { <4 x float>, <4 x float>, <4 x float>, <4 x float> }
on a system without vector instruction support.

This is because after legalizing the vector type, we get a return value
consisting of 16 floats, which cannot all be returned in registers.
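
A minimal IR reproducer of this shape (illustrative sketch only) is a
function whose entire return value is that struct:

define { <4 x float>, <4 x float>, <4 x float>, <4 x float> } @ret_large() {
  ret { <4 x float>, <4 x float>, <4 x float>, <4 x float> } zeroinitializer
}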

Usually, what should happen in this case is that the target's CanLowerReturn
routine rejects the return type, in which case SelectionDAG falls back to
implementing a structure return in memory via implicit reference.

However, the SystemZ target never actually implemented any CanLowerReturn
routine, and thus would accept any struct return type.

This patch fixes the crash by implementing CanLowerReturn.  As a side effect,
this also handles fp128 return values, fixing a todo that was noted in
SystemZCallingConv.td.

llvm-svn: 244889
2015-08-13 13:37:06 +00:00
Mehdi Amini
d5d8989892 Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user
Some documentation for this function would be nice, by the way.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241807
2015-07-09 15:12:23 +00:00
Mehdi Amini
761f22ac31 Make isLegalAddressingMode() take DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to having a single
DataLayout during compilation by always using the one owned by the module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11040

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
2015-07-09 02:09:40 +00:00
Mehdi Amini
547576cef1 Make TargetLowering::getShiftAmountTy() take DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to having a single
DataLayout during compilation by always using the one owned by the module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11037

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241776
2015-07-09 02:09:20 +00:00
Mehdi Amini
abf873c623 Make TargetLowering::getPointerTy() take DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to having a single
DataLayout during compilation by always using the one owned by the module.

Reviewers: echristo

Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11028

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
2015-07-09 02:09:04 +00:00
Benjamin Kramer
89b0e53c15 [TargetLowering] StringRefize asm constraint getters.
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger, which has error checking. For valid
constraints there should be no difference.

llvm-svn: 241411
2015-07-05 19:29:18 +00:00
Matt Arsenault
b0334192af Add address space argument to isLegalAddressingMode
This is important because of different addressing modes
depending on the address space for GPU targets.
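
Illustrative sketch: the same addressing pattern in a non-default address
space, which a GPU target may want to cost differently (the specific address
space number here is arbitrary):

define float @load_offset(float addrspace(3) *%base, i64 %idx) {
  %ptr = getelementptr float, float addrspace(3) *%base, i64 %idx
  %val = load float, float addrspace(3) *%ptr, align 4
  ret float %val
}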

This only adds the argument, and does not update
any of the uses to provide the correct address space.

llvm-svn: 238723
2015-06-01 05:31:59 +00:00
Matthias Braun
3b3ecc12b2 Change getTargetNodeName() to produce compiler warnings for missing cases, fix them
llvm-svn: 236775
2015-05-07 21:33:59 +00:00
Ulrich Weigand
cd1cd8d125 [SystemZ] Add vector intrinsics
This adds intrinsics to allow access to all of the z13 vector instructions.
Note that instructions whose semantics can be described by standard LLVM IR
do not get any intrinsics.

For each instruction whose semantics *cannot* (fully) be described, we
define an LLVM IR target-specific intrinsic that directly maps to this
instruction.

For instructions that also set the condition code, the LLVM IR intrinsic
returns the post-instruction CC value as a second result.  Instruction
selection will attempt to detect code that compares that CC value against
constants and use the condition code directly instead.
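
Sketched at the IR level (the intrinsic name and signature here are
illustrative, not quoted from the patch), the pattern looks like:

declare { <16 x i8>, i32 } @llvm.s390.vpkshs(<8 x i16>, <8 x i16>)

define <16 x i8> @pack_and_check(<8 x i16> %a, <8 x i16> %b, i32 *%flag) {
  %res = call { <16 x i8>, i32 } @llvm.s390.vpkshs(<8 x i16> %a, <8 x i16> %b)
  %cc = extractvalue { <16 x i8>, i32 } %res, 1
  %sat = icmp eq i32 %cc, 1
  %ext = zext i1 %sat to i32
  store i32 %ext, i32 *%flag, align 4
  %val = extractvalue { <16 x i8>, i32 } %res, 0
  ret <16 x i8> %val
}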

Based on a patch by Richard Sandiford.

llvm-svn: 236527
2015-05-05 19:31:09 +00:00
Ulrich Weigand
096deccae0 [SystemZ] Handle sub-128 vectors
The ABI allows sub-128 vectors to be passed and returned in registers,
with the vector occupying the upper part of a register.  We therefore
want to legalize those types by widening the vector rather than promoting
the elements.
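
For illustration (hypothetical sketch), an operation on a sub-128 vector such
as <4 x i16> is legalized by widening to <8 x i16>, with the value occupying
the upper part of a vector register when passed or returned:

define <4 x i16> @add_sub128(<4 x i16> %a, <4 x i16> %b) {
  %sum = add <4 x i16> %a, %b
  ret <4 x i16> %sum
}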

The patch includes some simple tests for sub-128 vectors and also tests
that we can recognize various pack sequences, some of which use sub-128
vectors as temporary results.  One of these forms is based on the pack
sequences generated by llvmpipe when no intrinsics are used.

Signed unpacks are recognized as BUILD_VECTORs whose elements are
individually sign-extended.  Unsigned unpacks can have the equivalent
form with zero extension, but they also occur as shuffles in which some
elements are zero.

Based on a patch by Richard Sandiford.

llvm-svn: 236525
2015-05-05 19:29:21 +00:00
Ulrich Weigand
119ff7dab3 [SystemZ] Add CodeGen support for v4f32
The architecture doesn't really have any native v4f32 operations except
v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32
elements being used.  Even so, using vector registers for <4 x float>
and scalarising individual operations is much better than generating
completely scalar code, since there's much less register pressure.
It's also more efficient to do v4f32 comparisons by extending to 2
v2f64s, comparing those, then packing the result.
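
A sketch of a compare that gets this treatment (illustrative only):

define <4 x i32> @cmp_v4f32(<4 x float> %a, <4 x float> %b) {
  %cmp = fcmp ogt <4 x float> %a, %b
  %res = sext <4 x i1> %cmp to <4 x i32>
  ret <4 x i32> %res
}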

This particularly helps with llvmpipe.

Based on a patch by Richard Sandiford.

llvm-svn: 236523
2015-05-05 19:27:45 +00:00
Ulrich Weigand
2413036cdd [SystemZ] Add CodeGen support for v2f64
This adds ABI and CodeGen support for the v2f64 type, which is natively
supported by z13 instructions.

Based on a patch by Richard Sandiford.

llvm-svn: 236522
2015-05-05 19:26:48 +00:00
Ulrich Weigand
938ff50eda [SystemZ] Add CodeGen support for integer vector types
This is the first of a series of patches to add CodeGen support exploiting
the instructions of the z13 vector facility.  This patch adds support
for the native integer vector types (v16i8, v8i16, v4i32, v2i64).

When the vector facility is present, we default to the new vector ABI.
This is characterized by two major differences:
- Vector types are passed/returned in vector registers
  (except for unnamed arguments of a variable-argument list function).
- Vector types are at most 8-byte aligned.

The reason for the choice of 8-byte vector alignment is that the hardware
is able to efficiently load vectors at 8-byte alignment, and the ABI only
guarantees 8-byte alignment of the stack pointer, so requiring any higher
alignment for vectors would require dynamic stack re-alignment code.
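
Illustrative sketch: under the vector ABI an 8-byte-aligned vector access like
the following is still expected to be handled with plain vector loads and
stores, without any realignment code:

define <2 x i64> @load_vec(<2 x i64> *%ptr) {
  %val = load <2 x i64>, <2 x i64> *%ptr, align 8
  ret <2 x i64> %val
}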

However, for compatibility with old code that may use vector types, when
*not* using the vector facility, the old alignment rules (vector types
are naturally aligned) remain in use.

These alignment rules are not only implemented at the C language level
(implemented in clang), but also at the LLVM IR level.  This is done
by selecting a different DataLayout string depending on whether the
vector ABI is in effect or not.

Based on a patch by Richard Sandiford.

llvm-svn: 236521
2015-05-05 19:25:42 +00:00
Ulrich Weigand
2031d6050e [SystemZ] Support transactional execution on zEC12
The zEC12 provides the transactional-execution facility.  This is exposed
to users via a set of builtin routines on other compilers.  This patch
adds LLVM support to enable those builtins. In particular, the patch:

- adds the transactional-execution and processor-assist facilities
- adds MC support for all instructions provided by those facilities
- adds LLVM intrinsics for those instructions and hooks them up for CodeGen
- adds CodeGen support to optimize CC return value checking

Since this is first use of target-specific intrinsics on the platform,
the patch creates the include/llvm/IR/IntrinsicsSystemZ.td file and
hooks it up in Intrinsics.td.  I've also changed Triple::getArchTypePrefix
to return "s390" instead of "systemz", since the naming convention for
GCC intrinsics uses "s390" on the platform, and it seemed more
straightforward to use the same convention for LLVM IR intrinsics.

An associated clang patch makes the intrinsics (and command line switches)
available at the source-language level.

For reference, the transactional-execution instructions are documented
in the z/Architecture Principles of Operation for the zEC12:
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/download/DZ9ZR009.pdf
The associated builtins are documented in the GCC manual:
http://gcc.gnu.org/onlinedocs/gcc/S_002f390-System-z-Built-in-Functions.html
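
As a rough IR-level usage sketch (the retry/fallback structure is illustrative;
the intrinsic declarations match the test case added below):

declare i32 @llvm.s390.tbegin.nofloat(i8 *, i32)
declare i32 @llvm.s390.tend()

define void @txn_sketch(i32 *%ptr) {
entry:
  %cc = call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65292)
  %started = icmp eq i32 %cc, 0
  br i1 %started, label %txn, label %fallback

txn:
  store i32 1, i32 *%ptr, align 4
  call i32 @llvm.s390.tend()
  ret void

fallback:
  store i32 1, i32 *%ptr, align 4
  ret void
}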


Index: llvm-head/lib/Target/SystemZ/SystemZOperators.td
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZOperators.td
+++ llvm-head/lib/Target/SystemZ/SystemZOperators.td
@@ -79,6 +79,9 @@ def SDT_ZI32Intrinsic       : SDTypeProf
 def SDT_ZPrefetch           : SDTypeProfile<0, 2,
                                             [SDTCisVT<0, i32>,
                                              SDTCisPtrTy<1>]>;
+def SDT_ZTBegin             : SDTypeProfile<0, 2,
+                                            [SDTCisPtrTy<0>,
+                                             SDTCisVT<1, i32>]>;
 
 //===----------------------------------------------------------------------===//
 // Node definitions
@@ -180,6 +183,15 @@ def z_prefetch          : SDNode<"System
                                  [SDNPHasChain, SDNPMayLoad, SDNPMayStore,
                                   SDNPMemOperand]>;
 
+def z_tbegin            : SDNode<"SystemZISD::TBEGIN", SDT_ZTBegin,
+                                 [SDNPHasChain, SDNPOutGlue, SDNPMayStore,
+                                  SDNPSideEffect]>;
+def z_tbegin_nofloat    : SDNode<"SystemZISD::TBEGIN_NOFLOAT", SDT_ZTBegin,
+                                 [SDNPHasChain, SDNPOutGlue, SDNPMayStore,
+                                  SDNPSideEffect]>;
+def z_tend              : SDNode<"SystemZISD::TEND", SDTNone,
+                                 [SDNPHasChain, SDNPOutGlue, SDNPSideEffect]>;
+
 //===----------------------------------------------------------------------===//
 // Pattern fragments
 //===----------------------------------------------------------------------===//
Index: llvm-head/lib/Target/SystemZ/SystemZInstrFormats.td
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZInstrFormats.td
+++ llvm-head/lib/Target/SystemZ/SystemZInstrFormats.td
@@ -473,6 +473,17 @@ class InstSS<bits<8> op, dag outs, dag i
   let Inst{15-0}  = BD2;
 }
 
+class InstS<bits<16> op, dag outs, dag ins, string asmstr, list<dag> pattern>
+  : InstSystemZ<4, outs, ins, asmstr, pattern> {
+  field bits<32> Inst;
+  field bits<32> SoftFail = 0;
+
+  bits<16> BD2;
+
+  let Inst{31-16} = op;
+  let Inst{15-0}  = BD2;
+}
+
 //===----------------------------------------------------------------------===//
 // Instruction definitions with semantics
 //===----------------------------------------------------------------------===//
Index: llvm-head/lib/Target/SystemZ/SystemZInstrInfo.td
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZInstrInfo.td
+++ llvm-head/lib/Target/SystemZ/SystemZInstrInfo.td
@@ -1362,6 +1362,60 @@ let Defs = [CC] in {
 }
 
 //===----------------------------------------------------------------------===//
+// Transactional execution
+//===----------------------------------------------------------------------===//
+
+let Predicates = [FeatureTransactionalExecution] in {
+  // Transaction Begin
+  let hasSideEffects = 1, mayStore = 1,
+      usesCustomInserter = 1, Defs = [CC] in {
+    def TBEGIN : InstSIL<0xE560,
+                         (outs), (ins bdaddr12only:$BD1, imm32zx16:$I2),
+                         "tbegin\t$BD1, $I2",
+                         [(z_tbegin bdaddr12only:$BD1, imm32zx16:$I2)]>;
+    def TBEGIN_nofloat : Pseudo<(outs), (ins bdaddr12only:$BD1, imm32zx16:$I2),
+                                [(z_tbegin_nofloat bdaddr12only:$BD1,
+                                                   imm32zx16:$I2)]>;
+    def TBEGINC : InstSIL<0xE561,
+                          (outs), (ins bdaddr12only:$BD1, imm32zx16:$I2),
+                          "tbeginc\t$BD1, $I2",
+                          [(int_s390_tbeginc bdaddr12only:$BD1,
+                                             imm32zx16:$I2)]>;
+  }
+
+  // Transaction End
+  let hasSideEffects = 1, Defs = [CC], BD2 = 0 in
+    def TEND : InstS<0xB2F8, (outs), (ins), "tend", [(z_tend)]>;
+
+  // Transaction Abort
+  let hasSideEffects = 1, isTerminator = 1, isBarrier = 1 in
+    def TABORT : InstS<0xB2FC, (outs), (ins bdaddr12only:$BD2),
+                       "tabort\t$BD2",
+                       [(int_s390_tabort bdaddr12only:$BD2)]>;
+
+  // Nontransactional Store
+  let hasSideEffects = 1 in
+    def NTSTG : StoreRXY<"ntstg", 0xE325, int_s390_ntstg, GR64, 8>;
+
+  // Extract Transaction Nesting Depth
+  let hasSideEffects = 1 in
+    def ETND : InherentRRE<"etnd", 0xB2EC, GR32, (int_s390_etnd)>;
+}
+
+//===----------------------------------------------------------------------===//
+// Processor assist
+//===----------------------------------------------------------------------===//
+
+let Predicates = [FeatureProcessorAssist] in {
+  let hasSideEffects = 1, R4 = 0 in
+    def PPA : InstRRF<0xB2E8, (outs), (ins GR64:$R1, GR64:$R2, imm32zx4:$R3),
+                      "ppa\t$R1, $R2, $R3", []>;
+  def : Pat<(int_s390_ppa_txassist GR32:$src),
+            (PPA (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GR32:$src, subreg_l32),
+                 0, 1)>;
+}
+
+//===----------------------------------------------------------------------===//
 // Miscellaneous Instructions.
 //===----------------------------------------------------------------------===//
 
Index: llvm-head/lib/Target/SystemZ/SystemZProcessors.td
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZProcessors.td
+++ llvm-head/lib/Target/SystemZ/SystemZProcessors.td
@@ -60,6 +60,16 @@ def FeatureMiscellaneousExtensions : Sys
   "Assume that the miscellaneous-extensions facility is installed"
 >;
 
+def FeatureTransactionalExecution : SystemZFeature<
+  "transactional-execution", "TransactionalExecution",
+  "Assume that the transactional-execution facility is installed"
+>;
+
+def FeatureProcessorAssist : SystemZFeature<
+  "processor-assist", "ProcessorAssist",
+  "Assume that the processor-assist facility is installed"
+>;
+
 def : Processor<"generic", NoItineraries, []>;
 def : Processor<"z10", NoItineraries, []>;
 def : Processor<"z196", NoItineraries,
@@ -70,4 +80,5 @@ def : Processor<"zEC12", NoItineraries,
                 [FeatureDistinctOps, FeatureLoadStoreOnCond, FeatureHighWord,
                  FeatureFPExtension, FeaturePopulationCount,
                  FeatureFastSerialization, FeatureInterlockedAccess1,
-                 FeatureMiscellaneousExtensions]>;
+                 FeatureMiscellaneousExtensions,
+                 FeatureTransactionalExecution, FeatureProcessorAssist]>;
Index: llvm-head/lib/Target/SystemZ/SystemZSubtarget.cpp
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZSubtarget.cpp
+++ llvm-head/lib/Target/SystemZ/SystemZSubtarget.cpp
@@ -40,6 +40,7 @@ SystemZSubtarget::SystemZSubtarget(const
       HasLoadStoreOnCond(false), HasHighWord(false), HasFPExtension(false),
       HasPopulationCount(false), HasFastSerialization(false),
       HasInterlockedAccess1(false), HasMiscellaneousExtensions(false),
+      HasTransactionalExecution(false), HasProcessorAssist(false),
       TargetTriple(TT), InstrInfo(initializeSubtargetDependencies(CPU, FS)),
       TLInfo(TM, *this), TSInfo(*TM.getDataLayout()), FrameLowering() {}
 
Index: llvm-head/lib/Target/SystemZ/SystemZSubtarget.h
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZSubtarget.h
+++ llvm-head/lib/Target/SystemZ/SystemZSubtarget.h
@@ -42,6 +42,8 @@ protected:
   bool HasFastSerialization;
   bool HasInterlockedAccess1;
   bool HasMiscellaneousExtensions;
+  bool HasTransactionalExecution;
+  bool HasProcessorAssist;
 
 private:
   Triple TargetTriple;
@@ -102,6 +104,12 @@ public:
     return HasMiscellaneousExtensions;
   }
 
+  // Return true if the target has the transactional-execution facility.
+  bool hasTransactionalExecution() const { return HasTransactionalExecution; }
+
+  // Return true if the target has the processor-assist facility.
+  bool hasProcessorAssist() const { return HasProcessorAssist; }
+
   // Return true if GV can be accessed using LARL for reloc model RM
   // and code model CM.
   bool isPC32DBLSymbol(const GlobalValue *GV, Reloc::Model RM,
Index: llvm-head/lib/Support/Triple.cpp
===================================================================
--- llvm-head.orig/lib/Support/Triple.cpp
+++ llvm-head/lib/Support/Triple.cpp
@@ -92,7 +92,7 @@ const char *Triple::getArchTypePrefix(Ar
   case sparcv9:
   case sparc:       return "sparc";
 
-  case systemz:     return "systemz";
+  case systemz:     return "s390";
 
   case x86:
   case x86_64:      return "x86";
Index: llvm-head/include/llvm/IR/Intrinsics.td
===================================================================
--- llvm-head.orig/include/llvm/IR/Intrinsics.td
+++ llvm-head/include/llvm/IR/Intrinsics.td
@@ -634,3 +634,4 @@ include "llvm/IR/IntrinsicsNVVM.td"
 include "llvm/IR/IntrinsicsMips.td"
 include "llvm/IR/IntrinsicsR600.td"
 include "llvm/IR/IntrinsicsBPF.td"
+include "llvm/IR/IntrinsicsSystemZ.td"
Index: llvm-head/include/llvm/IR/IntrinsicsSystemZ.td
===================================================================
--- /dev/null
+++ llvm-head/include/llvm/IR/IntrinsicsSystemZ.td
@@ -0,0 +1,46 @@
+//===- IntrinsicsSystemZ.td - Defines SystemZ intrinsics ---*- tablegen -*-===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines all of the SystemZ-specific intrinsics.
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+//
+// Transactional-execution intrinsics
+//
+//===----------------------------------------------------------------------===//
+
+let TargetPrefix = "s390" in {
+  def int_s390_tbegin : Intrinsic<[llvm_i32_ty], [llvm_ptr_ty, llvm_i32_ty],
+                                  [IntrNoDuplicate]>;
+
+  def int_s390_tbegin_nofloat : Intrinsic<[llvm_i32_ty],
+                                          [llvm_ptr_ty, llvm_i32_ty],
+                                          [IntrNoDuplicate]>;
+
+  def int_s390_tbeginc : Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty],
+                                   [IntrNoDuplicate]>;
+
+  def int_s390_tabort : Intrinsic<[], [llvm_i64_ty],
+                                  [IntrNoReturn, Throws]>;
+
+  def int_s390_tend : GCCBuiltin<"__builtin_tend">,
+                      Intrinsic<[llvm_i32_ty], []>;
+
+  def int_s390_etnd : GCCBuiltin<"__builtin_tx_nesting_depth">,
+                      Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>;
+
+  def int_s390_ntstg : Intrinsic<[], [llvm_i64_ty, llvm_ptr64_ty],
+                                 [IntrReadWriteArgMem]>;
+
+  def int_s390_ppa_txassist : GCCBuiltin<"__builtin_tx_assist">,
+                              Intrinsic<[], [llvm_i32_ty]>;
+}
+
Index: llvm-head/lib/Target/SystemZ/SystemZ.h
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZ.h
+++ llvm-head/lib/Target/SystemZ/SystemZ.h
@@ -68,6 +68,18 @@ const unsigned CCMASK_TM_MSB_0       = C
 const unsigned CCMASK_TM_MSB_1       = CCMASK_2 | CCMASK_3;
 const unsigned CCMASK_TM             = CCMASK_ANY;
 
+// Condition-code mask assignments for TRANSACTION_BEGIN.
+const unsigned CCMASK_TBEGIN_STARTED       = CCMASK_0;
+const unsigned CCMASK_TBEGIN_INDETERMINATE = CCMASK_1;
+const unsigned CCMASK_TBEGIN_TRANSIENT     = CCMASK_2;
+const unsigned CCMASK_TBEGIN_PERSISTENT    = CCMASK_3;
+const unsigned CCMASK_TBEGIN               = CCMASK_ANY;
+
+// Condition-code mask assignments for TRANSACTION_END.
+const unsigned CCMASK_TEND_TX   = CCMASK_0;
+const unsigned CCMASK_TEND_NOTX = CCMASK_2;
+const unsigned CCMASK_TEND      = CCMASK_TEND_TX | CCMASK_TEND_NOTX;
+
 // The position of the low CC bit in an IPM result.
 const unsigned IPM_CC = 28;
 
Index: llvm-head/lib/Target/SystemZ/SystemZISelLowering.h
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZISelLowering.h
+++ llvm-head/lib/Target/SystemZ/SystemZISelLowering.h
@@ -146,6 +146,15 @@ enum {
   // Perform a serialization operation.  (BCR 15,0 or BCR 14,0.)
   SERIALIZE,
 
+  // Transaction begin.  The first operand is the chain, the second
+  // the TDB pointer, and the third the immediate control field.
+  // Returns chain and glue.
+  TBEGIN,
+  TBEGIN_NOFLOAT,
+
+  // Transaction end.  Just the chain operand.  Returns chain and glue.
+  TEND,
+
   // Wrappers around the inner loop of an 8- or 16-bit ATOMIC_SWAP or
   // ATOMIC_LOAD_<op>.
   //
@@ -318,6 +327,7 @@ private:
   SDValue lowerSTACKSAVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerSTACKRESTORE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerPREFETCH(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerINTRINSIC_W_CHAIN(SDValue Op, SelectionDAG &DAG) const;
 
   // If the last instruction before MBBI in MBB was some form of COMPARE,
   // try to replace it with a COMPARE AND BRANCH just before MBBI.
@@ -355,6 +365,10 @@ private:
   MachineBasicBlock *emitStringWrapper(MachineInstr *MI,
                                        MachineBasicBlock *BB,
                                        unsigned Opcode) const;
+  MachineBasicBlock *emitTransactionBegin(MachineInstr *MI,
+                                          MachineBasicBlock *MBB,
+                                          unsigned Opcode,
+                                          bool NoFloat) const;
 };
 } // end namespace llvm
 
Index: llvm-head/lib/Target/SystemZ/SystemZISelLowering.cpp
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ llvm-head/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -20,6 +20,7 @@
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
+#include "llvm/IR/Intrinsics.h"
 #include <cctype>
 
 using namespace llvm;
@@ -304,6 +305,9 @@ SystemZTargetLowering::SystemZTargetLowe
   // Codes for which we want to perform some z-specific combinations.
   setTargetDAGCombine(ISD::SIGN_EXTEND);
 
+  // Handle intrinsics.
+  setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::Other, Custom);
+
   // We want to use MVC in preference to even a single load/store pair.
   MaxStoresPerMemcpy = 0;
   MaxStoresPerMemcpyOptSize = 0;
@@ -1031,6 +1035,53 @@ prepareVolatileOrAtomicLoad(SDValue Chai
   return DAG.getNode(SystemZISD::SERIALIZE, DL, MVT::Other, Chain);
 }
 
+// Return true if Op is an intrinsic node with chain that returns the CC value
+// as its only (other) argument.  Provide the associated SystemZISD opcode and
+// the mask of valid CC values if so.
+static bool isIntrinsicWithCCAndChain(SDValue Op, unsigned &Opcode,
+                                      unsigned &CCValid) {
+  unsigned Id = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
+  switch (Id) {
+  case Intrinsic::s390_tbegin:
+    Opcode = SystemZISD::TBEGIN;
+    CCValid = SystemZ::CCMASK_TBEGIN;
+    return true;
+
+  case Intrinsic::s390_tbegin_nofloat:
+    Opcode = SystemZISD::TBEGIN_NOFLOAT;
+    CCValid = SystemZ::CCMASK_TBEGIN;
+    return true;
+
+  case Intrinsic::s390_tend:
+    Opcode = SystemZISD::TEND;
+    CCValid = SystemZ::CCMASK_TEND;
+    return true;
+
+  default:
+    return false;
+  }
+}
+
+// Emit an intrinsic with chain with a glued value instead of its CC result.
+static SDValue emitIntrinsicWithChainAndGlue(SelectionDAG &DAG, SDValue Op,
+                                             unsigned Opcode) {
+  // Copy all operands except the intrinsic ID.
+  unsigned NumOps = Op.getNumOperands();
+  SmallVector<SDValue, 6> Ops;
+  Ops.reserve(NumOps - 1);
+  Ops.push_back(Op.getOperand(0));
+  for (unsigned I = 2; I < NumOps; ++I)
+    Ops.push_back(Op.getOperand(I));
+
+  assert(Op->getNumValues() == 2 && "Expected only CC result and chain");
+  SDVTList RawVTs = DAG.getVTList(MVT::Other, MVT::Glue);
+  SDValue Intr = DAG.getNode(Opcode, SDLoc(Op), RawVTs, Ops);
+  SDValue OldChain = SDValue(Op.getNode(), 1);
+  SDValue NewChain = SDValue(Intr.getNode(), 0);
+  DAG.ReplaceAllUsesOfValueWith(OldChain, NewChain);
+  return Intr;
+}
+
 // CC is a comparison that will be implemented using an integer or
 // floating-point comparison.  Return the condition code mask for
 // a branch on true.  In the integer case, CCMASK_CMP_UO is set for
@@ -1588,9 +1639,53 @@ static void adjustForTestUnderMask(Selec
   C.CCMask = NewCCMask;
 }
 
+// Return a Comparison that tests the condition-code result of intrinsic
+// node Call against constant integer CC using comparison code Cond.
+// Opcode is the opcode of the SystemZISD operation for the intrinsic
+// and CCValid is the set of possible condition-code results.
+static Comparison getIntrinsicCmp(SelectionDAG &DAG, unsigned Opcode,
+                                  SDValue Call, unsigned CCValid, uint64_t CC,
+                                  ISD::CondCode Cond) {
+  Comparison C(Call, SDValue());
+  C.Opcode = Opcode;
+  C.CCValid = CCValid;
+  if (Cond == ISD::SETEQ)
+    // bit 3 for CC==0, bit 0 for CC==3, always false for CC>3.
+    C.CCMask = CC < 4 ? 1 << (3 - CC) : 0;
+  else if (Cond == ISD::SETNE)
+    // ...and the inverse of that.
+    C.CCMask = CC < 4 ? ~(1 << (3 - CC)) : -1;
+  else if (Cond == ISD::SETLT || Cond == ISD::SETULT)
+    // bits above bit 3 for CC==0 (always false), bits above bit 0 for CC==3,
+    // always true for CC>3.
+    C.CCMask = CC < 4 ? -1 << (4 - CC) : -1;
+  else if (Cond == ISD::SETGE || Cond == ISD::SETUGE)
+    // ...and the inverse of that.
+    C.CCMask = CC < 4 ? ~(-1 << (4 - CC)) : 0;
+  else if (Cond == ISD::SETLE || Cond == ISD::SETULE)
+    // bit 3 and above for CC==0, bit 0 and above for CC==3 (always true),
+    // always true for CC>3.
+    C.CCMask = CC < 4 ? -1 << (3 - CC) : -1;
+  else if (Cond == ISD::SETGT || Cond == ISD::SETUGT)
+    // ...and the inverse of that.
+    C.CCMask = CC < 4 ? ~(-1 << (3 - CC)) : 0;
+  else
+    llvm_unreachable("Unexpected integer comparison type");
+  C.CCMask &= CCValid;
+  return C;
+}
+
 // Decide how to implement a comparison of type Cond between CmpOp0 with CmpOp1.
 static Comparison getCmp(SelectionDAG &DAG, SDValue CmpOp0, SDValue CmpOp1,
                          ISD::CondCode Cond) {
+  if (CmpOp1.getOpcode() == ISD::Constant) {
+    uint64_t Constant = cast<ConstantSDNode>(CmpOp1)->getZExtValue();
+    unsigned Opcode, CCValid;
+    if (CmpOp0.getOpcode() == ISD::INTRINSIC_W_CHAIN &&
+        CmpOp0.getResNo() == 0 && CmpOp0->hasNUsesOfValue(1, 0) &&
+        isIntrinsicWithCCAndChain(CmpOp0, Opcode, CCValid))
+      return getIntrinsicCmp(DAG, Opcode, CmpOp0, CCValid, Constant, Cond);
+  }
   Comparison C(CmpOp0, CmpOp1);
   C.CCMask = CCMaskForCondCode(Cond);
   if (C.Op0.getValueType().isFloatingPoint()) {
@@ -1632,6 +1727,17 @@ static Comparison getCmp(SelectionDAG &D
 
 // Emit the comparison instruction described by C.
 static SDValue emitCmp(SelectionDAG &DAG, SDLoc DL, Comparison &C) {
+  if (!C.Op1.getNode()) {
+    SDValue Op;
+    switch (C.Op0.getOpcode()) {
+    case ISD::INTRINSIC_W_CHAIN:
+      Op = emitIntrinsicWithChainAndGlue(DAG, C.Op0, C.Opcode);
+      break;
+    default:
+      llvm_unreachable("Invalid comparison operands");
+    }
+    return SDValue(Op.getNode(), Op->getNumValues() - 1);
+  }
   if (C.Opcode == SystemZISD::ICMP)
     return DAG.getNode(SystemZISD::ICMP, DL, MVT::Glue, C.Op0, C.Op1,
                        DAG.getConstant(C.ICmpType, MVT::i32));
@@ -1713,7 +1819,6 @@ SDValue SystemZTargetLowering::lowerSETC
 }
 
 SDValue SystemZTargetLowering::lowerBR_CC(SDValue Op, SelectionDAG &DAG) const {
-  SDValue Chain    = Op.getOperand(0);
   ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();
   SDValue CmpOp0   = Op.getOperand(2);
   SDValue CmpOp1   = Op.getOperand(3);
@@ -1723,7 +1828,7 @@ SDValue SystemZTargetLowering::lowerBR_C
   Comparison C(getCmp(DAG, CmpOp0, CmpOp1, CC));
   SDValue Glue = emitCmp(DAG, DL, C);
   return DAG.getNode(SystemZISD::BR_CCMASK, DL, Op.getValueType(),
-                     Chain, DAG.getConstant(C.CCValid, MVT::i32),
+                     Op.getOperand(0), DAG.getConstant(C.CCValid, MVT::i32),
                      DAG.getConstant(C.CCMask, MVT::i32), Dest, Glue);
 }
 
@@ -2561,6 +2666,30 @@ SDValue SystemZTargetLowering::lowerPREF
                                  Node->getMemoryVT(), Node->getMemOperand());
 }
 
+// Return an i32 that contains the value of CC immediately after After,
+// whose final operand must be MVT::Glue.
+static SDValue getCCResult(SelectionDAG &DAG, SDNode *After) {
+  SDValue Glue = SDValue(After, After->getNumValues() - 1);
+  SDValue IPM = DAG.getNode(SystemZISD::IPM, SDLoc(After), MVT::i32, Glue);
+  return DAG.getNode(ISD::SRL, SDLoc(After), MVT::i32, IPM,
+                     DAG.getConstant(SystemZ::IPM_CC, MVT::i32));
+}
+
+SDValue
+SystemZTargetLowering::lowerINTRINSIC_W_CHAIN(SDValue Op,
+                                              SelectionDAG &DAG) const {
+  unsigned Opcode, CCValid;
+  if (isIntrinsicWithCCAndChain(Op, Opcode, CCValid)) {
+    assert(Op->getNumValues() == 2 && "Expected only CC result and chain");
+    SDValue Glued = emitIntrinsicWithChainAndGlue(DAG, Op, Opcode);
+    SDValue CC = getCCResult(DAG, Glued.getNode());
+    DAG.ReplaceAllUsesOfValueWith(SDValue(Op.getNode(), 0), CC);
+    return SDValue();
+  }
+
+  return SDValue();
+}
+
 SDValue SystemZTargetLowering::LowerOperation(SDValue Op,
                                               SelectionDAG &DAG) const {
   switch (Op.getOpcode()) {
@@ -2634,6 +2763,8 @@ SDValue SystemZTargetLowering::LowerOper
     return lowerSTACKRESTORE(Op, DAG);
   case ISD::PREFETCH:
     return lowerPREFETCH(Op, DAG);
+  case ISD::INTRINSIC_W_CHAIN:
+    return lowerINTRINSIC_W_CHAIN(Op, DAG);
   default:
     llvm_unreachable("Unexpected node to lower");
   }
@@ -2674,6 +2805,9 @@ const char *SystemZTargetLowering::getTa
     OPCODE(SEARCH_STRING);
     OPCODE(IPM);
     OPCODE(SERIALIZE);
+    OPCODE(TBEGIN);
+    OPCODE(TBEGIN_NOFLOAT);
+    OPCODE(TEND);
     OPCODE(ATOMIC_SWAPW);
     OPCODE(ATOMIC_LOADW_ADD);
     OPCODE(ATOMIC_LOADW_SUB);
@@ -3501,6 +3635,50 @@ SystemZTargetLowering::emitStringWrapper
   return DoneMBB;
 }
 
+// Update TBEGIN instruction with final opcode and register clobbers.
+MachineBasicBlock *
+SystemZTargetLowering::emitTransactionBegin(MachineInstr *MI,
+                                            MachineBasicBlock *MBB,
+                                            unsigned Opcode,
+                                            bool NoFloat) const {
+  MachineFunction &MF = *MBB->getParent();
+  const TargetFrameLowering *TFI = Subtarget.getFrameLowering();
+  const SystemZInstrInfo *TII = Subtarget.getInstrInfo();
+
+  // Update opcode.
+  MI->setDesc(TII->get(Opcode));
+
+  // We cannot handle a TBEGIN that clobbers the stack or frame pointer.
+  // Make sure to add the corresponding GRSM bits if they are missing.
+  uint64_t Control = MI->getOperand(2).getImm();
+  static const unsigned GPRControlBit[16] = {
+    0x8000, 0x8000, 0x4000, 0x4000, 0x2000, 0x2000, 0x1000, 0x1000,
+    0x0800, 0x0800, 0x0400, 0x0400, 0x0200, 0x0200, 0x0100, 0x0100
+  };
+  Control |= GPRControlBit[15];
+  if (TFI->hasFP(MF))
+    Control |= GPRControlBit[11];
+  MI->getOperand(2).setImm(Control);
+
+  // Add GPR clobbers.
+  for (int I = 0; I < 16; I++) {
+    if ((Control & GPRControlBit[I]) == 0) {
+      unsigned Reg = SystemZMC::GR64Regs[I];
+      MI->addOperand(MachineOperand::CreateReg(Reg, true, true));
+    }
+  }
+
+  // Add FPR clobbers.
+  if (!NoFloat && (Control & 4) != 0) {
+    for (int I = 0; I < 16; I++) {
+      unsigned Reg = SystemZMC::FP64Regs[I];
+      MI->addOperand(MachineOperand::CreateReg(Reg, true, true));
+    }
+  }
+
+  return MBB;
+}
+
 MachineBasicBlock *SystemZTargetLowering::
 EmitInstrWithCustomInserter(MachineInstr *MI, MachineBasicBlock *MBB) const {
   switch (MI->getOpcode()) {
@@ -3742,6 +3920,12 @@ EmitInstrWithCustomInserter(MachineInstr
     return emitStringWrapper(MI, MBB, SystemZ::MVST);
   case SystemZ::SRSTLoop:
     return emitStringWrapper(MI, MBB, SystemZ::SRST);
+  case SystemZ::TBEGIN:
+    return emitTransactionBegin(MI, MBB, SystemZ::TBEGIN, false);
+  case SystemZ::TBEGIN_nofloat:
+    return emitTransactionBegin(MI, MBB, SystemZ::TBEGIN, true);
+  case SystemZ::TBEGINC:
+    return emitTransactionBegin(MI, MBB, SystemZ::TBEGINC, true);
   default:
     llvm_unreachable("Unexpected instr type to insert");
   }
Index: llvm-head/test/CodeGen/SystemZ/htm-intrinsics.ll
===================================================================
--- /dev/null
+++ llvm-head/test/CodeGen/SystemZ/htm-intrinsics.ll
@@ -0,0 +1,352 @@
+; Test transactional-execution intrinsics.
+;
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=zEC12 | FileCheck %s
+
+declare i32 @llvm.s390.tbegin(i8 *, i32)
+declare i32 @llvm.s390.tbegin.nofloat(i8 *, i32)
+declare void @llvm.s390.tbeginc(i8 *, i32)
+declare i32 @llvm.s390.tend()
+declare void @llvm.s390.tabort(i64)
+declare void @llvm.s390.ntstg(i64, i64 *)
+declare i32 @llvm.s390.etnd()
+declare void @llvm.s390.ppa.txassist(i32)
+
+; TBEGIN.
+define void @test_tbegin() {
+; CHECK-LABEL: test_tbegin:
+; CHECK-NOT: stmg
+; CHECK: std %f8,
+; CHECK: std %f9,
+; CHECK: std %f10,
+; CHECK: std %f11,
+; CHECK: std %f12,
+; CHECK: std %f13,
+; CHECK: std %f14,
+; CHECK: std %f15,
+; CHECK: tbegin 0, 65292
+; CHECK: ld %f8,
+; CHECK: ld %f9,
+; CHECK: ld %f10,
+; CHECK: ld %f11,
+; CHECK: ld %f12,
+; CHECK: ld %f13,
+; CHECK: ld %f14,
+; CHECK: ld %f15,
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin(i8 *null, i32 65292)
+  ret void
+}
+
+; TBEGIN (nofloat).
+define void @test_tbegin_nofloat1() {
+; CHECK-LABEL: test_tbegin_nofloat1:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65292
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65292)
+  ret void
+}
+
+; TBEGIN (nofloat) with integer CC return value.
+define i32 @test_tbegin_nofloat2() {
+; CHECK-LABEL: test_tbegin_nofloat2:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65292
+; CHECK: ipm %r2
+; CHECK: srl %r2, 28
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65292)
+  ret i32 %res
+}
+
+; TBEGIN (nofloat) with implicit CC check.
+define void @test_tbegin_nofloat3(i32 *%ptr) {
+; CHECK-LABEL: test_tbegin_nofloat3:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65292
+; CHECK: jnh  {{\.L*}}
+; CHECK: mvhi 0(%r2), 0
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65292)
+  %cmp = icmp eq i32 %res, 2
+  br i1 %cmp, label %if.then, label %if.end
+
+if.then:                                          ; preds = %entry
+  store i32 0, i32* %ptr, align 4
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret void
+}
+
+; TBEGIN (nofloat) with dual CC use.
+define i32 @test_tbegin_nofloat4(i32 %pad, i32 *%ptr) {
+; CHECK-LABEL: test_tbegin_nofloat4:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65292
+; CHECK: ipm %r2
+; CHECK: srl %r2, 28
+; CHECK: cijlh %r2, 2,  {{\.L*}}
+; CHECK: mvhi 0(%r3), 0
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65292)
+  %cmp = icmp eq i32 %res, 2
+  br i1 %cmp, label %if.then, label %if.end
+
+if.then:                                          ; preds = %entry
+  store i32 0, i32* %ptr, align 4
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret i32 %res
+}
+
+; TBEGIN (nofloat) with register.
+define void @test_tbegin_nofloat5(i8 *%ptr) {
+; CHECK-LABEL: test_tbegin_nofloat5:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0(%r2), 65292
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *%ptr, i32 65292)
+  ret void
+}
+
+; TBEGIN (nofloat) with GRSM 0x0f00.
+define void @test_tbegin_nofloat6() {
+; CHECK-LABEL: test_tbegin_nofloat6:
+; CHECK: stmg %r6, %r15,
+; CHECK-NOT: std
+; CHECK: tbegin 0, 3840
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 3840)
+  ret void
+}
+
+; TBEGIN (nofloat) with GRSM 0xf100.
+define void @test_tbegin_nofloat7() {
+; CHECK-LABEL: test_tbegin_nofloat7:
+; CHECK: stmg %r8, %r15,
+; CHECK-NOT: std
+; CHECK: tbegin 0, 61696
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 61696)
+  ret void
+}
+
+; TBEGIN (nofloat) with GRSM 0xfe00 -- stack pointer added automatically.
+define void @test_tbegin_nofloat8() {
+; CHECK-LABEL: test_tbegin_nofloat8:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65280
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65024)
+  ret void
+}
+
+; TBEGIN (nofloat) with GRSM 0xfb00 -- no frame pointer needed.
+define void @test_tbegin_nofloat9() {
+; CHECK-LABEL: test_tbegin_nofloat9:
+; CHECK: stmg %r10, %r15,
+; CHECK-NOT: std
+; CHECK: tbegin 0, 64256
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 64256)
+  ret void
+}
+
+; TBEGIN (nofloat) with GRSM 0xfb00 -- frame pointer added automatically.
+define void @test_tbegin_nofloat10(i64 %n) {
+; CHECK-LABEL: test_tbegin_nofloat10:
+; CHECK: stmg %r11, %r15,
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65280
+; CHECK: br %r14
+  %buf = alloca i8, i64 %n
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 64256)
+  ret void
+}
+
+; TBEGINC.
+define void @test_tbeginc() {
+; CHECK-LABEL: test_tbeginc:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbeginc 0, 65288
+; CHECK: br %r14
+  call void @llvm.s390.tbeginc(i8 *null, i32 65288)
+  ret void
+}
+
+; TEND with integer CC return value.
+define i32 @test_tend1() {
+; CHECK-LABEL: test_tend1:
+; CHECK: tend
+; CHECK: ipm %r2
+; CHECK: srl %r2, 28
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tend()
+  ret i32 %res
+}
+
+; TEND with implicit CC check.
+define void @test_tend3(i32 *%ptr) {
+; CHECK-LABEL: test_tend3:
+; CHECK: tend
+; CHECK: je  {{\.L*}}
+; CHECK: mvhi 0(%r2), 0
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tend()
+  %cmp = icmp eq i32 %res, 2
+  br i1 %cmp, label %if.then, label %if.end
+
+if.then:                                          ; preds = %entry
+  store i32 0, i32* %ptr, align 4
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret void
+}
+
+; TEND with dual CC use.
+define i32 @test_tend2(i32 %pad, i32 *%ptr) {
+; CHECK-LABEL: test_tend2:
+; CHECK: tend
+; CHECK: ipm %r2
+; CHECK: srl %r2, 28
+; CHECK: cijlh %r2, 2,  {{\.L*}}
+; CHECK: mvhi 0(%r3), 0
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tend()
+  %cmp = icmp eq i32 %res, 2
+  br i1 %cmp, label %if.then, label %if.end
+
+if.then:                                          ; preds = %entry
+  store i32 0, i32* %ptr, align 4
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret i32 %res
+}
+
+; TABORT with register only.
+define void @test_tabort1(i64 %val) {
+; CHECK-LABEL: test_tabort1:
+; CHECK: tabort 0(%r2)
+; CHECK: br %r14
+  call void @llvm.s390.tabort(i64 %val)
+  ret void
+}
+
+; TABORT with immediate only.
+define void @test_tabort2(i64 %val) {
+; CHECK-LABEL: test_tabort2:
+; CHECK: tabort 1234
+; CHECK: br %r14
+  call void @llvm.s390.tabort(i64 1234)
+  ret void
+}
+
+; TABORT with register + immediate.
+define void @test_tabort3(i64 %val) {
+; CHECK-LABEL: test_tabort3:
+; CHECK: tabort 1234(%r2)
+; CHECK: br %r14
+  %sum = add i64 %val, 1234
+  call void @llvm.s390.tabort(i64 %sum)
+  ret void
+}
+
+; TABORT with out-of-range immediate.
+define void @test_tabort4(i64 %val) {
+; CHECK-LABEL: test_tabort4:
+; CHECK: tabort 0({{%r[1-5]}})
+; CHECK: br %r14
+  call void @llvm.s390.tabort(i64 4096)
+  ret void
+}
+
+; NTSTG with base pointer only.
+define void @test_ntstg1(i64 *%ptr, i64 %val) {
+; CHECK-LABEL: test_ntstg1:
+; CHECK: ntstg %r3, 0(%r2)
+; CHECK: br %r14
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; NTSTG with base and index.
+; Check that NTSTG allows an index.
+define void @test_ntstg2(i64 *%base, i64 %index, i64 %val) {
+; CHECK-LABEL: test_ntstg2:
+; CHECK: sllg [[REG:%r[1-5]]], %r3, 3
+; CHECK: ntstg %r4, 0([[REG]],%r2)
+; CHECK: br %r14
+  %ptr = getelementptr i64, i64 *%base, i64 %index
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; NTSTG with the highest in-range displacement.
+define void @test_ntstg3(i64 *%base, i64 %val) {
+; CHECK-LABEL: test_ntstg3:
+; CHECK: ntstg %r3, 524280(%r2)
+; CHECK: br %r14
+  %ptr = getelementptr i64, i64 *%base, i64 65535
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; NTSTG with an out-of-range positive displacement.
+define void @test_ntstg4(i64 *%base, i64 %val) {
+; CHECK-LABEL: test_ntstg4:
+; CHECK: ntstg %r3, 0({{%r[1-5]}})
+; CHECK: br %r14
+  %ptr = getelementptr i64, i64 *%base, i64 65536
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; NTSTG with the lowest in-range displacement.
+define void @test_ntstg5(i64 *%base, i64 %val) {
+; CHECK-LABEL: test_ntstg5:
+; CHECK: ntstg %r3, -524288(%r2)
+; CHECK: br %r14
+  %ptr = getelementptr i64, i64 *%base, i64 -65536
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; NTSTG with an out-of-range negative displacement.
+define void @test_ntstg6(i64 *%base, i64 %val) {
+; CHECK-LABEL: test_ntstg6:
+; CHECK: ntstg %r3, 0({{%r[1-5]}})
+; CHECK: br %r14
+  %ptr = getelementptr i64, i64 *%base, i64 -65537
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; ETND.
+define i32 @test_etnd() {
+; CHECK-LABEL: test_etnd:
+; CHECK: etnd %r2
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.etnd()
+  ret i32 %res
+}
+
+; PPA (Transaction-Abort Assist)
+define void @test_ppa_txassist(i32 %val) {
+; CHECK-LABEL: test_ppa_txassist:
+; CHECK: ppa %r2, 0, 1
+; CHECK: br %r14
+  call void @llvm.s390.ppa.txassist(i32 %val)
+  ret void
+}
+
Index: llvm-head/test/MC/SystemZ/insn-bad-zEC12.s
===================================================================
--- llvm-head.orig/test/MC/SystemZ/insn-bad-zEC12.s
+++ llvm-head/test/MC/SystemZ/insn-bad-zEC12.s
@@ -3,6 +3,22 @@
 # RUN: FileCheck < %t %s
 
 #CHECK: error: invalid operand
+#CHECK: ntstg	%r0, -524289
+#CHECK: error: invalid operand
+#CHECK: ntstg	%r0, 524288
+
+	ntstg	%r0, -524289
+	ntstg	%r0, 524288
+
+#CHECK: error: invalid operand
+#CHECK: ppa	%r0, %r0, -1
+#CHECK: error: invalid operand
+#CHECK: ppa	%r0, %r0, 16
+
+	ppa	%r0, %r0, -1
+	ppa	%r0, %r0, 16
+
+#CHECK: error: invalid operand
 #CHECK: risbgn	%r0,%r0,0,0,-1
 #CHECK: error: invalid operand
 #CHECK: risbgn	%r0,%r0,0,0,64
@@ -22,3 +38,47 @@
 	risbgn	%r0,%r0,-1,0,0
 	risbgn	%r0,%r0,256,0,0
 
+#CHECK: error: invalid operand
+#CHECK: tabort	-1
+#CHECK: error: invalid operand
+#CHECK: tabort	4096
+#CHECK: error: invalid use of indexed addressing
+#CHECK: tabort	0(%r1,%r2)
+
+	tabort	-1
+	tabort	4096
+	tabort	0(%r1,%r2)
+
+#CHECK: error: invalid operand
+#CHECK: tbegin	-1, 0
+#CHECK: error: invalid operand
+#CHECK: tbegin	4096, 0
+#CHECK: error: invalid use of indexed addressing
+#CHECK: tbegin	0(%r1,%r2), 0
+#CHECK: error: invalid operand
+#CHECK: tbegin	0, -1
+#CHECK: error: invalid operand
+#CHECK: tbegin	0, 65536
+
+	tbegin	-1, 0
+	tbegin	4096, 0
+	tbegin	0(%r1,%r2), 0
+	tbegin	0, -1
+	tbegin	0, 65536
+
+#CHECK: error: invalid operand
+#CHECK: tbeginc	-1, 0
+#CHECK: error: invalid operand
+#CHECK: tbeginc	4096, 0
+#CHECK: error: invalid use of indexed addressing
+#CHECK: tbeginc	0(%r1,%r2), 0
+#CHECK: error: invalid operand
+#CHECK: tbeginc	0, -1
+#CHECK: error: invalid operand
+#CHECK: tbeginc	0, 65536
+
+	tbeginc	-1, 0
+	tbeginc	4096, 0
+	tbeginc	0(%r1,%r2), 0
+	tbeginc	0, -1
+	tbeginc	0, 65536
Index: llvm-head/test/MC/SystemZ/insn-good-zEC12.s
===================================================================
--- llvm-head.orig/test/MC/SystemZ/insn-good-zEC12.s
+++ llvm-head/test/MC/SystemZ/insn-good-zEC12.s
@@ -1,6 +1,48 @@
 # For zEC12 and above.
 # RUN: llvm-mc -triple s390x-linux-gnu -mcpu=zEC12 -show-encoding %s | FileCheck %s
 
+#CHECK: etnd	%r0                     # encoding: [0xb2,0xec,0x00,0x00]
+#CHECK: etnd	%r15                    # encoding: [0xb2,0xec,0x00,0xf0]
+#CHECK: etnd	%r7                     # encoding: [0xb2,0xec,0x00,0x70]
+
+	etnd	%r0
+	etnd	%r15
+	etnd	%r7
+
+#CHECK: ntstg	%r0, -524288            # encoding: [0xe3,0x00,0x00,0x00,0x80,0x25]
+#CHECK: ntstg	%r0, -1                 # encoding: [0xe3,0x00,0x0f,0xff,0xff,0x25]
+#CHECK: ntstg	%r0, 0                  # encoding: [0xe3,0x00,0x00,0x00,0x00,0x25]
+#CHECK: ntstg	%r0, 1                  # encoding: [0xe3,0x00,0x00,0x01,0x00,0x25]
+#CHECK: ntstg	%r0, 524287             # encoding: [0xe3,0x00,0x0f,0xff,0x7f,0x25]
+#CHECK: ntstg	%r0, 0(%r1)             # encoding: [0xe3,0x00,0x10,0x00,0x00,0x25]
+#CHECK: ntstg	%r0, 0(%r15)            # encoding: [0xe3,0x00,0xf0,0x00,0x00,0x25]
+#CHECK: ntstg	%r0, 524287(%r1,%r15)   # encoding: [0xe3,0x01,0xff,0xff,0x7f,0x25]
+#CHECK: ntstg	%r0, 524287(%r15,%r1)   # encoding: [0xe3,0x0f,0x1f,0xff,0x7f,0x25]
+#CHECK: ntstg	%r15, 0                 # encoding: [0xe3,0xf0,0x00,0x00,0x00,0x25]
+
+	ntstg	%r0, -524288
+	ntstg	%r0, -1
+	ntstg	%r0, 0
+	ntstg	%r0, 1
+	ntstg	%r0, 524287
+	ntstg	%r0, 0(%r1)
+	ntstg	%r0, 0(%r15)
+	ntstg	%r0, 524287(%r1,%r15)
+	ntstg	%r0, 524287(%r15,%r1)
+	ntstg	%r15, 0
+
+#CHECK: ppa	%r0, %r0, 0             # encoding: [0xb2,0xe8,0x00,0x00]
+#CHECK: ppa	%r0, %r0, 15            # encoding: [0xb2,0xe8,0xf0,0x00]
+#CHECK: ppa	%r0, %r15, 0            # encoding: [0xb2,0xe8,0x00,0x0f]
+#CHECK: ppa	%r4, %r6, 7             # encoding: [0xb2,0xe8,0x70,0x46]
+#CHECK: ppa	%r15, %r0, 0            # encoding: [0xb2,0xe8,0x00,0xf0]
+
+	ppa	%r0, %r0, 0
+	ppa	%r0, %r0, 15
+	ppa	%r0, %r15, 0
+	ppa	%r4, %r6, 7
+	ppa	%r15, %r0, 0
+
 #CHECK: risbgn	%r0, %r0, 0, 0, 0       # encoding: [0xec,0x00,0x00,0x00,0x00,0x59]
 #CHECK: risbgn	%r0, %r0, 0, 0, 63      # encoding: [0xec,0x00,0x00,0x00,0x3f,0x59]
 #CHECK: risbgn	%r0, %r0, 0, 255, 0     # encoding: [0xec,0x00,0x00,0xff,0x00,0x59]
@@ -17,3 +59,68 @@
 	risbgn	%r15,%r0,0,0,0
 	risbgn	%r4,%r5,6,7,8
 
+#CHECK: tabort	0                       # encoding: [0xb2,0xfc,0x00,0x00]
+#CHECK: tabort	0(%r1)                  # encoding: [0xb2,0xfc,0x10,0x00]
+#CHECK: tabort	0(%r15)                 # encoding: [0xb2,0xfc,0xf0,0x00]
+#CHECK: tabort	4095                    # encoding: [0xb2,0xfc,0x0f,0xff]
+#CHECK: tabort	4095(%r1)               # encoding: [0xb2,0xfc,0x1f,0xff]
+#CHECK: tabort	4095(%r15)              # encoding: [0xb2,0xfc,0xff,0xff]
+
+	tabort	0
+	tabort	0(%r1)
+	tabort	0(%r15)
+	tabort	4095
+	tabort	4095(%r1)
+	tabort	4095(%r15)
+
+#CHECK: tbegin	0, 0                    # encoding: [0xe5,0x60,0x00,0x00,0x00,0x00]
+#CHECK: tbegin	4095, 0                 # encoding: [0xe5,0x60,0x0f,0xff,0x00,0x00]
+#CHECK: tbegin	0, 0                    # encoding: [0xe5,0x60,0x00,0x00,0x00,0x00]
+#CHECK: tbegin	0, 1                    # encoding: [0xe5,0x60,0x00,0x00,0x00,0x01]
+#CHECK: tbegin	0, 32767                # encoding: [0xe5,0x60,0x00,0x00,0x7f,0xff]
+#CHECK: tbegin	0, 32768                # encoding: [0xe5,0x60,0x00,0x00,0x80,0x00]
+#CHECK: tbegin	0, 65535                # encoding: [0xe5,0x60,0x00,0x00,0xff,0xff]
+#CHECK: tbegin	0(%r1), 42              # encoding: [0xe5,0x60,0x10,0x00,0x00,0x2a]
+#CHECK: tbegin	0(%r15), 42             # encoding: [0xe5,0x60,0xf0,0x00,0x00,0x2a]
+#CHECK: tbegin	4095(%r1), 42           # encoding: [0xe5,0x60,0x1f,0xff,0x00,0x2a]
+#CHECK: tbegin	4095(%r15), 42          # encoding: [0xe5,0x60,0xff,0xff,0x00,0x2a]
+
+	tbegin	0, 0
+	tbegin	4095, 0
+	tbegin	0, 0
+	tbegin	0, 1
+	tbegin	0, 32767
+	tbegin	0, 32768
+	tbegin	0, 65535
+	tbegin	0(%r1), 42
+	tbegin	0(%r15), 42
+	tbegin	4095(%r1), 42
+	tbegin	4095(%r15), 42
+
+#CHECK: tbeginc	0, 0                    # encoding: [0xe5,0x61,0x00,0x00,0x00,0x00]
+#CHECK: tbeginc	4095, 0                 # encoding: [0xe5,0x61,0x0f,0xff,0x00,0x00]
+#CHECK: tbeginc	0, 0                    # encoding: [0xe5,0x61,0x00,0x00,0x00,0x00]
+#CHECK: tbeginc	0, 1                    # encoding: [0xe5,0x61,0x00,0x00,0x00,0x01]
+#CHECK: tbeginc	0, 32767                # encoding: [0xe5,0x61,0x00,0x00,0x7f,0xff]
+#CHECK: tbeginc	0, 32768                # encoding: [0xe5,0x61,0x00,0x00,0x80,0x00]
+#CHECK: tbeginc	0, 65535                # encoding: [0xe5,0x61,0x00,0x00,0xff,0xff]
+#CHECK: tbeginc	0(%r1), 42              # encoding: [0xe5,0x61,0x10,0x00,0x00,0x2a]
+#CHECK: tbeginc	0(%r15), 42             # encoding: [0xe5,0x61,0xf0,0x00,0x00,0x2a]
+#CHECK: tbeginc	4095(%r1), 42           # encoding: [0xe5,0x61,0x1f,0xff,0x00,0x2a]
+#CHECK: tbeginc	4095(%r15), 42          # encoding: [0xe5,0x61,0xff,0xff,0x00,0x2a]
+
+	tbeginc	0, 0
+	tbeginc	4095, 0
+	tbeginc	0, 0
+	tbeginc	0, 1
+	tbeginc	0, 32767
+	tbeginc	0, 32768
+	tbeginc	0, 65535
+	tbeginc	0(%r1), 42
+	tbeginc	0(%r15), 42
+	tbeginc	4095(%r1), 42
+	tbeginc	4095(%r15), 42
+
+#CHECK: tend                            # encoding: [0xb2,0xf8,0x00,0x00]
+
+	tend
Index: llvm-head/test/MC/SystemZ/insn-bad-z196.s
===================================================================
--- llvm-head.orig/test/MC/SystemZ/insn-bad-z196.s
+++ llvm-head/test/MC/SystemZ/insn-bad-z196.s
@@ -244,6 +244,11 @@
 	cxlgbr	%f0, 16, %r0, 0
 	cxlgbr	%f2, 0, %r0, 0
 
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: etnd	%r7
+
+	etnd	%r7
+
 #CHECK: error: invalid operand
 #CHECK: fidbra	%f0, 0, %f0, -1
 #CHECK: error: invalid operand
@@ -546,6 +551,16 @@
 	locr	%r0,%r0,-1
 	locr	%r0,%r0,16
 
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: ntstg	%r0, 524287(%r1,%r15)
+
+	ntstg	%r0, 524287(%r1,%r15)
+
+#CHECK: error: {{(instruction requires: processor-assist)?}}
+#CHECK: ppa	%r4, %r6, 7
+
+	ppa	%r4, %r6, 7
+
 #CHECK: error: {{(instruction requires: miscellaneous-extensions)?}}
 #CHECK: risbgn	%r1, %r2, 0, 0, 0
 
@@ -690,3 +705,24 @@
 	stocg	%r0,-524289,1
 	stocg	%r0,524288,1
 	stocg	%r0,0(%r1,%r2),1
+
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: tabort	4095(%r1)
+
+	tabort	4095(%r1)
+
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: tbegin	4095(%r1), 42
+
+	tbegin	4095(%r1), 42
+
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: tbeginc	4095(%r1), 42
+
+	tbeginc	4095(%r1), 42
+
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: tend
+
+	tend
+
Index: llvm-head/test/MC/Disassembler/SystemZ/insns.txt
===================================================================
--- llvm-head.orig/test/MC/Disassembler/SystemZ/insns.txt
+++ llvm-head/test/MC/Disassembler/SystemZ/insns.txt
@@ -2503,6 +2503,15 @@
 # CHECK: ear %r15, %a15
 0xb2 0x4f 0x00 0xff
 
+# CHECK: etnd %r0
+0xb2 0xec 0x00 0x00
+
+# CHECK: etnd %r15
+0xb2 0xec 0x00 0xf0
+
+# CHECK: etnd %r7
+0xb2 0xec 0x00 0x70
+
 # CHECK: fidbr %f0, 0, %f0
 0xb3 0x5f 0x00 0x00
 
@@ -6034,6 +6043,36 @@
 # CHECK: ny %r15, 0
 0xe3 0xf0 0x00 0x00 0x00 0x54
 
+# CHECK: ntstg %r0, -524288
+0xe3 0x00 0x00 0x00 0x80 0x25
+
+# CHECK: ntstg %r0, -1
+0xe3 0x00 0x0f 0xff 0xff 0x25
+
+# CHECK: ntstg %r0, 0
+0xe3 0x00 0x00 0x00 0x00 0x25
+
+# CHECK: ntstg %r0, 1
+0xe3 0x00 0x00 0x01 0x00 0x25
+
+# CHECK: ntstg %r0, 524287
+0xe3 0x00 0x0f 0xff 0x7f 0x25
+
+# CHECK: ntstg %r0, 0(%r1)
+0xe3 0x00 0x10 0x00 0x00 0x25
+
+# CHECK: ntstg %r0, 0(%r15)
+0xe3 0x00 0xf0 0x00 0x00 0x25
+
+# CHECK: ntstg %r0, 524287(%r1,%r15)
+0xe3 0x01 0xff 0xff 0x7f 0x25
+
+# CHECK: ntstg %r0, 524287(%r15,%r1)
+0xe3 0x0f 0x1f 0xff 0x7f 0x25
+
+# CHECK: ntstg %r15, 0
+0xe3 0xf0 0x00 0x00 0x00 0x25
+
 # CHECK: oc 0(1), 0
 0xd6 0x00 0x00 0x00 0x00 0x00
 
@@ -6346,6 +6385,21 @@
 # CHECK: popcnt %r7, %r8
 0xb9 0xe1 0x00 0x78
 
+# CHECK: ppa %r0, %r0, 0
+0xb2 0xe8 0x00 0x00
+
+# CHECK: ppa %r0, %r0, 15
+0xb2 0xe8 0xf0 0x00
+
+# CHECK: ppa %r0, %r15, 0
+0xb2 0xe8 0x00 0x0f
+
+# CHECK: ppa %r4, %r6, 7
+0xb2 0xe8 0x70 0x46
+
+# CHECK: ppa %r15, %r0, 0
+0xb2 0xe8 0x00 0xf0
+
 # CHECK: risbg %r0, %r0, 0, 0, 0
 0xec 0x00 0x00 0x00 0x00 0x55
 
@@ -8062,6 +8116,93 @@
 # CHECK: sy %r15, 0
 0xe3 0xf0 0x00 0x00 0x00 0x5b
 
+# CHECK: tabort 0
+0xb2 0xfc 0x00 0x00
+
+# CHECK: tabort 0(%r1)
+0xb2 0xfc 0x10 0x00
+
+# CHECK: tabort 0(%r15)
+0xb2 0xfc 0xf0 0x00
+
+# CHECK: tabort 4095
+0xb2 0xfc 0x0f 0xff
+
+# CHECK: tabort 4095(%r1)
+0xb2 0xfc 0x1f 0xff
+
+# CHECK: tabort 4095(%r15)
+0xb2 0xfc 0xff 0xff
+
+# CHECK: tbegin 0, 0
+0xe5 0x60 0x00 0x00 0x00 0x00
+
+# CHECK: tbegin 4095, 0
+0xe5 0x60 0x0f 0xff 0x00 0x00
+
+# CHECK: tbegin 0, 0
+0xe5 0x60 0x00 0x00 0x00 0x00
+
+# CHECK: tbegin 0, 1
+0xe5 0x60 0x00 0x00 0x00 0x01
+
+# CHECK: tbegin 0, 32767
+0xe5 0x60 0x00 0x00 0x7f 0xff
+
+# CHECK: tbegin 0, 32768
+0xe5 0x60 0x00 0x00 0x80 0x00
+
+# CHECK: tbegin 0, 65535
+0xe5 0x60 0x00 0x00 0xff 0xff
+
+# CHECK: tbegin 0(%r1), 42
+0xe5 0x60 0x10 0x00 0x00 0x2a
+
+# CHECK: tbegin 0(%r15), 42
+0xe5 0x60 0xf0 0x00 0x00 0x2a
+
+# CHECK: tbegin 4095(%r1), 42
+0xe5 0x60 0x1f 0xff 0x00 0x2a
+
+# CHECK: tbegin 4095(%r15), 42
+0xe5 0x60 0xff 0xff 0x00 0x2a
+
+# CHECK: tbeginc 0, 0
+0xe5 0x61 0x00 0x00 0x00 0x00
+
+# CHECK: tbeginc 4095, 0
+0xe5 0x61 0x0f 0xff 0x00 0x00
+
+# CHECK: tbeginc 0, 0
+0xe5 0x61 0x00 0x00 0x00 0x00
+
+# CHECK: tbeginc 0, 1
+0xe5 0x61 0x00 0x00 0x00 0x01
+
+# CHECK: tbeginc 0, 32767
+0xe5 0x61 0x00 0x00 0x7f 0xff
+
+# CHECK: tbeginc 0, 32768
+0xe5 0x61 0x00 0x00 0x80 0x00
+
+# CHECK: tbeginc 0, 65535
+0xe5 0x61 0x00 0x00 0xff 0xff
+
+# CHECK: tbeginc 0(%r1), 42
+0xe5 0x61 0x10 0x00 0x00 0x2a
+
+# CHECK: tbeginc 0(%r15), 42
+0xe5 0x61 0xf0 0x00 0x00 0x2a
+
+# CHECK: tbeginc 4095(%r1), 42
+0xe5 0x61 0x1f 0xff 0x00 0x2a
+
+# CHECK: tbeginc 4095(%r15), 42
+0xe5 0x61 0xff 0xff 0x00 0x2a
+
+# CHECK: tend
+0xb2 0xf8 0x00 0x00
+
 # CHECK: tm 0, 0
 0x91 0x00 0x00 0x00
 

llvm-svn: 233803
2015-04-01 12:51:43 +00:00
Ulrich Weigand
6191b44b78 [SystemZ] Use POPCNT instruction on z196
We already exploit a number of instructions specific to z196,
but not yet POPCNT.  Add support for the population-count
facility, MC support for the POPCNT instruction, CodeGen
support for using POPCNT, and implement the getPopcntSupport
TargetTransformInfo hook.
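
At the IR level the entry point is the generic population-count intrinsic,
for example (illustrative sketch):

declare i64 @llvm.ctpop.i64(i64)

define i64 @count_bits(i64 %x) {
  %count = call i64 @llvm.ctpop.i64(i64 %x)
  ret i64 %count
}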

llvm-svn: 233689
2015-03-31 12:56:33 +00:00
Ulrich Weigand
12ba30ceef [SystemZ] Provide basic TargetTransformInfo implementation
This hooks up the TargetTransformInfo machinery for SystemZ,
and provides an implementation of getIntImmCost.

In addition, the patch adds the isLegalICmpImmediate and
isLegalAddImmediate TargetLowering overrides, and updates
a couple of test cases where we now generate slightly
better code.
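
For example (sketch; assumes the legal compare-immediate range covers signed
32-bit values), isLegalICmpImmediate tells the optimizers that a compare like
this one needs no separately materialized constant:

define i1 @cmp_imm(i64 %x) {
  %cmp = icmp slt i64 %x, 1000000
  ret i1 %cmp
}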

llvm-svn: 233688
2015-03-31 12:52:27 +00:00
Daniel Sanders
f121870469 [systemz] Distinguish the 'Q', 'R', 'S', and 'T' inline assembly memory constraints.
Summary:
But still handle them the same way since I don't know how they differ on
this target.

No functional change intended.

Reviewers: uweigand

Reviewed By: uweigand

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8251

llvm-svn: 232495
2015-03-17 16:16:14 +00:00
Daniel Sanders
6dc30f40bf Make each target map all inline assembly memory constraints to InlineAsm::Constraint_m. NFC.
Summary:
This is instead of doing this in target independent code and is the last
non-functional change before targets begin to distinguish between
different memory constraints when selecting code for the ISD::INLINEASM
node.

Next, each target will individually move away from the idea that all
memory constraints behave like 'm'.

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8173

llvm-svn: 232373
2015-03-16 13:13:41 +00:00
Eric Christopher
454cbc40f6 getRegForInlineAsmConstraint wants to use TargetRegisterInfo for
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.

llvm-svn: 230699
2015-02-26 22:38:43 +00:00
Ulrich Weigand
d35c322d54 [SystemZ] Support all TLS access models - CodeGen part
The current SystemZ back-end only supports the local-exec TLS access model.
This patch adds all required CodeGen support for the other TLS models, which
means in particular:

- Expand initial-exec TLS accesses by loading TLS offsets from the GOT
  using @indntpoff relocations.

- Expand general-dynamic and local-dynamic accesses by generating the
  appropriate calls to __tls_get_offset.  Note that this routine has
  a non-standard ABI and requires loading the GOT pointer into %r12,
  so the patch also adds support for the GLOBAL_OFFSET_TABLE ISD node.

- Add a new platform-specific optimization pass to remove redundant
  __tls_get_offset calls in the local-dynamic model (modeled after
  the corresponding X86 pass).

- Add test cases verifying all access models and optimizations.

llvm-svn: 229654
2015-02-18 09:13:27 +00:00
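A hedged sketch of the general-dynamic case described above (not from the commit itself; the IR uses the typed-pointer syntax of the era, and only the __tls_get_offset call is checked because the thread-pointer extraction and register choices can vary).

; RUN: llc < %s -mtriple=s390x-linux-gnu -relocation-model=pic | FileCheck %s

@x = thread_local global i32 0

; Under PIC a preemptible thread-local variable uses the general-dynamic
; model: the TLS offset comes from a call to __tls_get_offset (with the
; GOT pointer in %r12) and is added to the thread pointer from %a0/%a1.
define i32* @foo() {
; CHECK: brasl %r14, __tls_get_offset
  ret i32* @x
}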
Eric Christopher
c020244686 Reuse a bunch of cached subtargets and remove getSubtarget calls
without a Function argument.

llvm-svn: 227647
2015-01-31 00:06:45 +00:00
Benjamin Kramer
da144ed5a2 Canonicalize header guards into a common format.
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)

Changes made by clang-tidy with minor tweaks.

llvm-svn: 215558
2014-08-13 16:26:38 +00:00
Matt Arsenault
76c7b7a591 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.

llvm-svn: 214055
2014-07-27 17:46:40 +00:00
Eric Christopher
d403bea5ee Move the subtarget dependent features from SystemZTargetMachine
down to the subtarget. Add an initialization routine to assist.

llvm-svn: 212124
2014-07-01 20:19:02 +00:00
Eric Christopher
666067451e Remove the caching of the target machine from SystemZTargetLowering.
Update all callers and uses accordingly.

llvm-svn: 211880
2014-06-27 07:38:01 +00:00
Richard Sandiford
54dfe262df [SystemZ] Move sign_extend optimization to PerformDAGCombine
The target was marking SIGN_EXTEND as Custom because it wanted to optimize
certain sign-extended shifts.  In all other respects the extension is Legal,
so it'd be better to do the optimization in PerformDAGCombine instead.

No functional change intended.

llvm-svn: 203234
2014-03-07 11:34:35 +00:00
Richard Sandiford
7c028783bc [SystemZ] Remove "virtual" from override methods
Also fix a couple of cases where "override" was missing.  No behavioural
change intended.

llvm-svn: 203110
2014-03-06 12:03:36 +00:00
Richard Sandiford
95792933ce [SystemZ] Update namespace formatting to match current guidelines
No functional change intended.

llvm-svn: 203103
2014-03-06 10:38:30 +00:00
Craig Topper
b0056a4ca7 Switch all uses of LLVM_OVERRIDE to just use 'override' directly.
llvm-svn: 202621
2014-03-02 09:09:27 +00:00
Matt Arsenault
7b69102edb Add address space argument to allowsUnalignedMemoryAccess.
On R600, some address spaces have more strict alignment
requirements than others.

llvm-svn: 200887
2014-02-05 23:15:53 +00:00
Richard Sandiford
add53b9fe0 [SystemZ] Optimize (sext (ashr (shl ...), ...))
...into (ashr (shl (anyext X), ...), ...), which requires one fewer
instruction.  The (anyext X) can sometimes be simplified too.

I didn't do this in DAGCombiner because widening shifts isn't a win
on all targets.

llvm-svn: 199114
2014-01-13 15:17:53 +00:00
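A hedged example of the pattern in question (not from the commit's tests; the shift amount 23 is chosen so the pair is not just a byte or halfword sign extension, and the CHECK registers and amounts are assumptions).

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

; (sext (ashr (shl %a, 23), 23)) should become a single 64-bit shift pair
; on the any-extended value, i.e. SLLG/SRAG by 23 + 32 = 55, instead of
; 32-bit shifts followed by a separate sign extension.
define i64 @f1(i32 %a) {
; CHECK: sllg [[REG:%r[0-9]+]], %r2, 55
; CHECK: srag %r2, [[REG]], 55
  %shl = shl i32 %a, 23
  %ashr = ashr i32 %shl, 23
  %ext = sext i32 %ashr to i64
  ret i64 %ext
}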
Richard Sandiford
99ae48f5bb [SystemZ] Use interlocked-access 1 instructions for CodeGen
...namely LOAD AND ADD, LOAD AND AND, LOAD AND OR and LOAD AND EXCLUSIVE OR.
LOAD AND ADD LOGICAL isn't really separately useful for LLVM.

I'll look at reusing the CC results in the new year.

llvm-svn: 197985
2013-12-24 15:18:04 +00:00
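As a sketch of what this enables (illustrative rather than the commit's own test), an atomic fetch-and-add on z196 should now select LAA instead of a compare-and-swap loop.

; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z196 | FileCheck %s

; LOAD AND ADD returns the old memory value in the first register operand
; and adds the third operand to storage, so atomicrmw add maps onto it
; directly.
define i32 @f1(i32 *%src, i32 %b) {
; CHECK: laa %r2, %r3, 0(%r2)
  %res = atomicrmw add i32 *%src, i32 %b seq_cst
  ret i32 %res
}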
Richard Sandiford
50a6f85c7a [SystemZ] Extend integer absolute selection
This patch makes more use of LPGFR and LNGFR.  It builds on top of
the LTGFR selection from r197234.  Most of the tests are motivated
by what InstCombine would produce.

llvm-svn: 197236
2013-12-13 15:35:00 +00:00
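A hedged sketch of the kind of InstCombine-style absolute-value idiom this targets (function name and CHECK line are assumptions; the exact pattern that triggers LPGFR selection may differ in detail).

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

; abs of a sign-extended i32 should select LPGFR (load positive, 64 <- 32)
; rather than a compare, negate and conditional sequence.
define i64 @f1(i32 %val) {
; CHECK: lpgfr %r2, %r2
  %ext = sext i32 %val to i64
  %cmp = icmp slt i64 %ext, 0
  %neg = sub i64 0, %ext
  %res = select i1 %cmp, i64 %neg, i64 %ext
  ret i64 %res
}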
Richard Sandiford
1b5b817c73 Add TargetLowering::prepareVolatileOrAtomicLoad
One unusual feature of the z architecture is that the result of a
previous load can be reused indefinitely for subsequent loads, even if
a cache-coherent store to that location is performed by another CPU.
A special serializing instruction must be used if you want to force
a load to be reattempted.

Since volatile loads are not supposed to be omitted in this way,
we should insert a serializing instruction before each such load.
The same goes for atomic loads.

The patch implements this at the IR->DAG boundary, in a similar way
to atomic fences.  It is a no-op for targets other than SystemZ.

llvm-svn: 196906
2013-12-10 10:49:34 +00:00
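An illustrative test of the resulting behaviour (assumed, not from the commit): the serializing instruction prints as BCR 14,0 or BCR 15,0 depending on whether the fast-BCR-serialization facility is available.

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

; A volatile load is preceded by a serializing instruction so the CPU
; really refetches the value instead of reusing an earlier load result.
define i32 @f1(i32 *%src) {
; CHECK: bcr 1{{[45]}}, %r0
; CHECK: l %r2, 0(%r2)
  %val = load volatile i32 *%src
  ret i32 %val
}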
Richard Sandiford
bc88711db8 Add TargetLowering::prepareVolatileOrAtomicLoad
One unusual feature of the z architecture is that the result of a
previous load can be reused indefinitely for subsequent loads, even if
a cache-coherent store to that location is performed by another CPU.
A special serializing instruction must be used if you want to force
a load to be reattempted.

Since volatile loads are not supposed to be omitted in this way,
we should insert a serializing instruction before each such load.
The same goes for atomic loads.

The patch implements this at the IR->DAG boundary, in a similar way
to atomic fences.  It is a no-op for targets other than SystemZ.

llvm-svn: 196905
2013-12-10 10:36:34 +00:00
Richard Sandiford
593286d79d [SystemZ] Handle vectors in getSetCCResultType
I don't have a standalone testcase for this, but it should allow r193676
to be reapplied.

llvm-svn: 194148
2013-11-06 12:16:02 +00:00
Richard Sandiford
15044afbed [SystemZ] Improve handling of SETCC
We previously used the default expansion to SELECT_CC, which in turn would
expand to "LHI; BRC; LHI".  In most cases it's better to use an IPM-based
sequence instead.

llvm-svn: 192784
2013-10-16 11:10:55 +00:00
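A hedged example (not from the commit's tests) of a boolean materialization that should now go through IPM; the extraction sequence after IPM depends on the condition, so only the IPM itself is checked.

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

; Turning an i1 comparison result into 0/1 should insert the program mask
; (which contains CC) with IPM and then shift/mask out the answer, rather
; than expanding to LHI; BRC; LHI.
define i32 @f1(i32 %a, i32 %b) {
; CHECK: ipm {{%r[0-9]+}}
  %cond = icmp eq i32 %a, %b
  %res = zext i1 %cond to i32
  ret i32 %res
}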
Richard Sandiford
cae9d29151 [SystemZ] Improve handling of PC-relative addresses
The backend previously folded offsets into PC-relative addresses
wherever possible.  That's the right thing to do when the address
can be used directly in a PC-relative memory reference (using things
like LRL).  But if we have a register-based memory reference and need
to load the PC-relative address separately, it's better to use an anchor
point that could be shared with other accesses to the same area of the
variable.

Fixes a FIXME.

llvm-svn: 191524
2013-09-27 15:14:04 +00:00
Richard Sandiford
bfcf129b8e [SystemZ] Add TM and TMY
The main complication here is that TM and TMY (the memory forms) set
CC differently from the register forms.  When the tested bits contain
some 0s and some 1s, the register forms set CC to 1 or 2 based on the
value of the uppermost bit.  The memory forms instead set CC to 1
regardless of the uppermost bit.

Until now, I've tried to make it so that a branch never tests for an
impossible CC value.  E.g. NR only sets CC to 0 or 1, so branches on the
result will only test for 0 or 1.  Originally I'd tried to do the same
thing for TM and TMY by using custom matching code in ISelDAGToDAG.
That ended up being very ugly though, and would have meant duplicating
some of the chain checks that the common isel code does.

I've therefore gone for the simpler alternative of adding an extra
operand to the TM DAG opcode to say whether a memory form would be OK.
This means that the inverse of a "TM;JE" is "TM;JNE" rather than the
more precise "TM;JNLE", just like the inverse of "TMLL;JE" is "TMLL;JNE".
I suppose that's arguably less confusing though...

llvm-svn: 190400
2013-09-10 10:20:32 +00:00
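As an assumed illustration of the memory form, a single-bit test of a byte in storage should now use TM directly; only the TM is checked, since the branch polarity follows the rules described above.

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

; The loaded byte is only used by the AND/compare, so the test can be done
; on the memory operand itself with TM instead of a register-form test.
define void @f1(i8 *%src) {
; CHECK: tm 0(%r2), 1
entry:
  %byte = load i8 *%src
  %and = and i8 %byte, 1
  %cmp = icmp eq i8 %and, 0
  br i1 %cmp, label %exit, label %store
store:
  store i8 0, i8 *%src
  br label %exit
exit:
  ret void
}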
Richard Sandiford
8d6edc5218 [SystemZ] Tweak integer comparison code
The architecture has many comparison instructions, including some that
extend one of the operands.  The signed comparison instructions use sign
extensions and the unsigned comparison instructions use zero extensions.
In cases where we had a free choice between signed or unsigned comparisons,
we were trying to decide at lowering time which would best fit the available
instructions, taking things like extension type into account.  The code
to do that was getting increasingly hairy and was also making some bad
decisions.  E.g. when comparing the result of two LLCs, it is better to use
CR rather than CLR, since CR can be fused with a branch while CLR can't.

This patch removes the lowering code and instead adds an operand to
integer comparisons to say whether signed comparison is required,
whether unsigned comparison is required, or whether either is OK.
We can then leave the choice of instruction up to the normal isel code.

llvm-svn: 190138
2013-09-06 11:51:39 +00:00
Richard Sandiford
399318ba38 [SystemZ] Add NC, OC and XC
For now these are just used to handle scalar ANDs, ORs and XORs in which
all operands are memory.

llvm-svn: 190041
2013-09-05 10:36:45 +00:00
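A hedged sketch of the all-memory case this covers (illustrative; the displacement and length operands in the CHECK line are assumptions).

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

; An XOR whose operands are both loads and whose result is stored back to
; one of the same locations can be done entirely in storage with XC.
define void @f1(i8 *%ptr1) {
; CHECK: xc 1(1,%r2), 0(%r2)
  %ptr2 = getelementptr i8 *%ptr1, i64 1
  %a = load i8 *%ptr1
  %b = load i8 *%ptr2
  %xor = xor i8 %a, %b
  store i8 %xor, i8 *%ptr2
  ret void
}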
Richard Sandiford
9fc2e5cdff [SystemZ] Add support for TMHH, TMHL, TMLH and TMLL
For now just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.

llvm-svn: 189469
2013-08-28 10:31:43 +00:00
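A hedged example of the simple register-operand case handled here (not the commit's own test; the function name and CHECK line are assumptions).

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

; (and %a, 1) == 0 tests a single low bit, which TMLL can do directly, and
; the resulting CC feeds the branch without a separate compare.
define void @f1(i32 %a, i32 *%dest) {
; CHECK: tmll %r2, 1
entry:
  %and = and i32 %a, 1
  %cmp = icmp eq i32 %and, 0
  br i1 %cmp, label %exit, label %store
store:
  store i32 0, i32 *%dest
  br label %exit
exit:
  ret void
}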
Richard Sandiford
b585b1e48e [SystemZ] Extend memcpy and memset support to all constant lengths
Lengths up to a certain threshold (currently 6 * 256) use a series of MVCs.
Lengths above that threshold use a loop to handle X*256 bytes followed
by a single MVC to handle the excess (if any).  This loop will also be
needed in future when support for variable lengths is added.

Because the same tablegen classes are used to define MVC and CLC,
the patch also has the side-effect of defining a pseudo loop instruction
for CLC.  That instruction isn't used yet (and wouldn't be handled correctly
if it were).  I'm planning to use it soon though.

llvm-svn: 189331
2013-08-27 09:54:29 +00:00
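An illustrative example of the constant-length splitting described above (not from the commit; 301 bytes sits below the 6 * 256 threshold, and the exact operand printing in the CHECK lines is an assumption).

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

declare void @llvm.memcpy.p0i8.p0i8.i64(i8 *nocapture, i8 *nocapture, i64, i32, i1)

; A 301-byte constant-length memcpy becomes one full 256-byte MVC plus a
; 45-byte MVC for the remainder, rather than a loop.
define void @f1(i8 *%dest, i8 *%src) {
; CHECK: mvc 0(256,%r2), 0(%r3)
; CHECK: mvc 256(45,%r2), 256(%r3)
  call void @llvm.memcpy.p0i8.p0i8.i64(i8 *%dest, i8 *%src, i64 301, i32 1, i1 false)
  ret void
}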
Richard Sandiford
9867b44c59 [SystemZ] Add basic prefetch support
Just the instructions and intrinsics for now.

llvm-svn: 189100
2013-08-23 11:36:42 +00:00
Richard Sandiford
1dc05c13d2 [SystemZ] Define remaining *MUL_LOHI patterns
The initial port used MLG(R) for i64 UMUL_LOHI but left the other three
combinations as not-legal-or-custom.  Although 32x32->{32,32}
multiplications exist, they're not as quick as doing a normal 64-bit
multiplication, so it didn't seem like i32 SMUL_LOHI and UMUL_LOHI
would be useful.  There's also no direct instruction for i64 SMUL_LOHI,
so it needs to be implemented in terms of UMUL_LOHI.

However, not defining these patterns means that we don't convert
division by a constant into multiplication, so this patch fills
in the other cases.  The new i64 SMUL_LOHI sequence is simpler
than the one that we used previously for 64x64->128 multiplication,
so int-mul-08.ll now tests the full sequence.

llvm-svn: 188898
2013-08-21 09:34:56 +00:00
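A hedged example of the payoff (not from the commit's tests): signed 64-bit division by a constant should now become a multiply-high sequence built on MLGR rather than a DSGR divide; the correction shifts around it are left unchecked.

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

; i64 SMUL_LOHI is implemented in terms of UMUL_LOHI, so the magic-number
; multiplication for a constant divisor uses MLGR plus fix-up code.
define i64 @f1(i64 %a) {
; CHECK-NOT: dsgr
; CHECK: mlgr
  %res = sdiv i64 %a, 502
  ret i64 %res
}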
Richard Sandiford
841d24aa5a [SystemZ] Add support for sibling calls
This first cut is pretty conservative.  The final argument register (R6)
is call-saved, so we would need to make sure that the R6 argument to a
sibling call is the same as the R6 argument to the calling function,
which seems worth keeping as a separate patch.

Saying that integer truncations are free means that we no longer
use the extending instructions LGF and LLGF for spills in int-conv-09.ll
and int-conv-10.ll.  Instead we treat the registers as 64 bits wide and
truncate them to 32-bits where necessary.  I think it's unlikely we'd
use LGF and LLGF for spills in other situations for the same reason,
so I'm removing the tests rather than replacing them.  The associated
code is generic and applies to many more instructions than just
LGF and LLGF, so there is no corresponding code removal.

llvm-svn: 188669
2013-08-19 12:42:31 +00:00
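As an assumed illustration (not the commit's own test), a simple tail call whose arguments are all in registers should now be emitted as a sibling call, i.e. a plain JG to the callee instead of BRASL followed by a return.

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

declare void @callee(i32, i32)

; Nothing lives across the call and no stack arguments are needed, so the
; callee can return directly to this function's caller.
define void @f1(i32 %a, i32 %b) {
; CHECK: jg callee
  tail call void @callee(i32 %a, i32 %b)
  ret void
}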
Richard Sandiford
06a13f49c8 [SystemZ] Use SRST to implement strlen and strnlen
It would also make sense to use it for memchr; I'm working on that now.

llvm-svn: 188547
2013-08-16 11:41:43 +00:00
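A hedged sketch of the resulting code (not the commit's own test): a call to strlen should turn into an SRST loop that searches for the terminating zero byte; only the key instructions are checked.

; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

declare i64 @strlen(i8 *)

; SRST searches for the character held in %r0 and sets CC 3 when it stops
; early after a CPU-determined number of bytes, hence the JO back-edge.
define i64 @f1(i8 *%src) {
; CHECK: srst {{%r[0-9]+}}, {{%r[0-9]+}}
; CHECK: jo
  %res = call i64 @strlen(i8 *%src)
  ret i64 %res
}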