[Intrinsic] Add fixed point division intrinsics.

Summary:
This patch adds intrinsics and ISelDAG nodes for signed and unsigned
fixed-point division:

  llvm.sdiv.fix.*
  llvm.udiv.fix.*

These intrinsics perform scaled division on two integers or vectors of
integers. They are required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.

Patch by: ebevhan

Reviewers: bjope, leonardchan, efriedma, craig.topper

Reviewed By: craig.topper

Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70007
2024-11-26 04:32:44 +01:00 · 2020-01-08 15:05:03 +01:00 · 2020-01-08 15:05:03 +01:00 · 21be0de34d
commit 21be0de34d
parent e9f8f15265
17 changed files with 1524 additions and 38 deletions
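
Note: the scaled division that the new intrinsics perform can be modelled on
the host with ordinary integer arithmetic. The sketch below is not part of the
patch; `sdiv_fix_model` and `udiv_fix_model` are hypothetical names chosen for
illustration. It assumes 32-bit operands, widens to 64 bits so the dividend can
be shifted up by the scale, and rounds signed results towards negative
infinity, which is what the generic `expandFixedPointDiv` expansion in this
patch does. The LangRef text leaves the rounding direction unspecified, so a
target may legitimately round differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of llvm.sdiv.fix on i32 operands (not LLVM code).
// Preconditions mirror the documented undefined behavior: b != 0 and the
// quotient must fit in 32 bits. The scale is assumed to be below the width.
int32_t sdiv_fix_model(int32_t a, int32_t b, unsigned scale) {
  assert(b != 0 && scale < 32);
  // Upscale the dividend in a wider type; multiply rather than shift so that
  // negative values are well defined in every C++ standard.
  int64_t num = static_cast<int64_t>(a) * (int64_t{1} << scale);
  int64_t den = b;
  int64_t quot = num / den; // C++ division truncates towards zero
  int64_t rem = num % den;
  // If the exact quotient is negative and inexact, step down by one so the
  // result is rounded towards negative infinity.
  if (rem != 0 && ((num < 0) != (den < 0)))
    --quot;
  return static_cast<int32_t>(quot);
}

// Hypothetical model of llvm.udiv.fix on i32 operands (not LLVM code).
// Unsigned division truncates, which is already rounding down.
uint32_t udiv_fix_model(uint32_t a, uint32_t b, unsigned scale) {
  assert(b != 0 && scale <= 32);
  uint64_t num = static_cast<uint64_t>(a) << scale;
  return static_cast<uint32_t>(num / b);
}
```

With a scale of 1, the operands 6 and 4 stand for 3.0 and 2.0, and the model
returns 3, which stands for 1.5. This matches the corresponding LangRef
example added in the diff below.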
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@ -13675,16 +13675,17 @@ Fixed Point Arithmetic Intrinsics

 A fixed point number represents a real data type for a number that has a fixed
 number of digits after a radix point (equivalent to the decimal point '.').
-The number of digits after the radix point is referred as the ``scale``. These
+The number of digits after the radix point is referred to as the `scale`. These
 are useful for representing fractional values to a specific precision. The
 following intrinsics perform fixed point arithmetic operations on 2 operands
 of the same scale, specified as the third argument.

-The `llvm.*mul.fix` family of intrinsic functions represents a multiplication
+The ``llvm.*mul.fix`` family of intrinsic functions represents a multiplication
 of fixed point numbers through scaled integers. Therefore, fixed point
-multplication can be represented as
+multiplication can be represented as
+
+.. code-block:: llvm

-::
        %result = call i4 @llvm.smul.fix.i4(i4 %a, i4 %b, i32 %scale)

        ; Expands to
@ -13695,6 +13696,22 @@ multplication can be represented as
        %r = ashr i8 %mul, i8 %scale2  ; this is for a target rounding down towards negative infinity
        %result = trunc i8 %r to i4

+The ``llvm.*div.fix`` family of intrinsic functions represents a division of
+fixed point numbers through scaled integers. Fixed point division can be
+represented as:
+
+.. code-block:: llvm
+
+        %result = call i4 @llvm.sdiv.fix.i4(i4 %a, i4 %b, i32 %scale)
+
+        ; Expands to
+        %a2 = sext i4 %a to i8
+        %b2 = sext i4 %b to i8
+        %scale2 = trunc i32 %scale to i8
+        %a3 = shl i8 %a2, %scale2
+        %r = sdiv i8 %a3, %b2 ; this is for a target rounding towards zero
+        %result = trunc i8 %r to i4
+
 For each of these functions, if the result cannot be represented exactly with
 the provided scale, the result is rounded. Rounding is unspecified since
 preferred rounding may vary for different targets. Rounding is specified
@ -13963,6 +13980,126 @@ Examples
      %res = call i4 @llvm.umul.fix.sat.i4(i4 2, i4 4, i32 1)  ; %res = 4 (1 x 2 = 2)


+'``llvm.sdiv.fix.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.sdiv.fix``
+on any integer bit width or vectors of integers.
+
+::
+
+      declare i16 @llvm.sdiv.fix.i16(i16 %a, i16 %b, i32 %scale)
+      declare i32 @llvm.sdiv.fix.i32(i32 %a, i32 %b, i32 %scale)
+      declare i64 @llvm.sdiv.fix.i64(i64 %a, i64 %b, i32 %scale)
+      declare <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
+
+Overview
+"""""""""
+
+The '``llvm.sdiv.fix``' family of intrinsic functions performs signed
+fixed point division on 2 arguments of the same scale.
+
+Arguments
+""""""""""
+
+The arguments (%a and %b) and the result may be of integer types of any bit
+width, but they must have the same bit width. The arguments may also work with
+int vectors of the same length and int size. ``%a`` and ``%b`` are the two
+values that will undergo signed fixed point division. The argument
+``%scale`` represents the scale of both operands, and must be a constant
+integer.
+
+Semantics:
+""""""""""
+
+This operation performs fixed point division on the 2 arguments of a
+specified scale. The result will also be returned in the same scale specified
+in the third argument.
+
+If the result value cannot be precisely represented in the given scale, the
+value is rounded up or down to the closest representable value. The rounding
+direction is unspecified.
+
+It is undefined behavior if the result value does not fit within the range of
+the fixed point type, or if the second argument is zero.
+
+
+Examples
+"""""""""
+
+.. code-block:: llvm
+
+      %res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 2, i32 0)  ; %res = 3 (6 / 2 = 3)
+      %res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 4, i32 1)  ; %res = 3 (3 / 2 = 1.5)
+      %res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 / -1 = -1.5)
+
+      ; The result in the following could be rounded up to 1 or down to 0.5
+      %res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 4, i32 1)  ; %res = 2 (or 1) (1.5 / 2 = 0.75)
+
+
+'``llvm.udiv.fix.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.udiv.fix``
+on any integer bit width or vectors of integers.
+
+::
+
+      declare i16 @llvm.udiv.fix.i16(i16 %a, i16 %b, i32 %scale)
+      declare i32 @llvm.udiv.fix.i32(i32 %a, i32 %b, i32 %scale)
+      declare i64 @llvm.udiv.fix.i64(i64 %a, i64 %b, i32 %scale)
+      declare <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
+
+Overview
+"""""""""
+
+The '``llvm.udiv.fix``' family of intrinsic functions performs unsigned
+fixed point division on 2 arguments of the same scale.
+
+Arguments
+""""""""""
+
+The arguments (%a and %b) and the result may be of integer types of any bit
+width, but they must have the same bit width. The arguments may also work with
+int vectors of the same length and int size. ``%a`` and ``%b`` are the two
+values that will undergo unsigned fixed point division. The argument
+``%scale`` represents the scale of both operands, and must be a constant
+integer.
+
+Semantics:
+""""""""""
+
+This operation performs fixed point division on the 2 arguments of a
+specified scale. The result will also be returned in the same scale specified
+in the third argument.
+
+If the result value cannot be precisely represented in the given scale, the
+value is rounded up or down to the closest representable value. The rounding
+direction is unspecified.
+
+It is undefined behavior if the result value does not fit within the range of
+the fixed point type, or if the second argument is zero.
+
+
+Examples
+"""""""""
+
+.. code-block:: llvm
+
+      %res = call i4 @llvm.udiv.fix.i4(i4 6, i4 2, i32 0)  ; %res = 3 (6 / 2 = 3)
+      %res = call i4 @llvm.udiv.fix.i4(i4 6, i4 4, i32 1)  ; %res = 3 (3 / 2 = 1.5)
+      %res = call i4 @llvm.udiv.fix.i4(i4 1, i4 -8, i32 4) ; %res = 2 (0.0625 / 0.5 = 0.125)
+
+      ; The result in the following could be rounded up to 1 or down to 0.5
+      %res = call i4 @llvm.udiv.fix.i4(i4 3, i4 4, i32 1)  ; %res = 2 (or 1) (1.5 / 2 = 0.75)
+
+
 Specialised Arithmetic Intrinsics
 ---------------------------------

--- a/include/llvm/CodeGen/ISDOpcodes.h
+++ b/include/llvm/CodeGen/ISDOpcodes.h
@ -285,6 +285,12 @@ namespace ISD {
    /// bits of the first 2 operands.
    SMULFIXSAT, UMULFIXSAT,

+    /// RESULT = [US]DIVFIX(LHS, RHS, SCALE) - Perform fixed point division on
+    /// 2 integers with the same width and scale. SCALE represents the scale
+    /// of both operands as fixed point numbers. This SCALE parameter must be a
+    /// constant integer.
+    SDIVFIX, UDIVFIX,
+
    /// Simple binary floating point operators.
    FADD, FSUB, FMUL, FDIV, FREM,

--- a/include/llvm/CodeGen/TargetLowering.h
+++ b/include/llvm/CodeGen/TargetLowering.h
@ -935,6 +935,8 @@ public:
    case ISD::SMULFIXSAT:
    case ISD::UMULFIX:
    case ISD::UMULFIXSAT:
+    case ISD::SDIVFIX:
+    case ISD::UDIVFIX:
      Supported = isSupportedFixedPointOperation(Op, VT, Scale);
      break;
    }
@ -4184,6 +4186,14 @@ public:
  /// method accepts integers as its arguments.
  SDValue expandFixedPointMul(SDNode *Node, SelectionDAG &DAG) const;

+  /// Method for building the DAG expansion of ISD::[US]DIVFIX. This
+  /// method accepts integers as its arguments.
+  /// Note: This method may fail if the division could not be performed
+  /// within the type. Clients must retry with a wider type if this happens.
+  SDValue expandFixedPointDiv(unsigned Opcode, const SDLoc &dl,
+                              SDValue LHS, SDValue RHS,
+                              unsigned Scale, SelectionDAG &DAG) const;
+
  /// Method for building the DAG expansion of ISD::U(ADD|SUB)O. Expansion
  /// always suceeds and populates the Result and Overflow arguments.
  void expandUADDSUBO(SDNode *Node, SDValue &Result, SDValue &Overflow,
--- a/include/llvm/IR/Intrinsics.td
+++ b/include/llvm/IR/Intrinsics.td
@ -930,6 +930,14 @@ def int_umul_fix : Intrinsic<[llvm_anyint_ty],
                             [LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
                             [IntrNoMem, IntrSpeculatable, IntrWillReturn, Commutative, ImmArg<2>]>;

+def int_sdiv_fix : Intrinsic<[llvm_anyint_ty],
+                             [LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
+                             [IntrNoMem, ImmArg<2>]>;
+
+def int_udiv_fix : Intrinsic<[llvm_anyint_ty],
+                             [LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
+                             [IntrNoMem, ImmArg<2>]>;
+
 //===------------------- Fixed Point Saturation Arithmetic Intrinsics ----------------===//
 //
 def int_smul_fix_sat : Intrinsic<[llvm_anyint_ty],
--- a/include/llvm/Target/TargetSelectionDAG.td
+++ b/include/llvm/Target/TargetSelectionDAG.td
@ -124,7 +124,7 @@ def SDTIntSatNoShOp : SDTypeProfile<1, 2, [   // ssat with no shift
 def SDTIntBinHiLoOp : SDTypeProfile<2, 2, [ // mulhi, mullo, sdivrem, udivrem
  SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>,SDTCisInt<0>
 ]>;
-def SDTIntScaledBinOp : SDTypeProfile<1, 3, [  // smulfix, umulfix
+def SDTIntScaledBinOp : SDTypeProfile<1, 3, [  // smulfix, sdivfix, etc
  SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisInt<0>, SDTCisInt<3>
 ]>;

@ -400,6 +400,8 @@ def smulfix    : SDNode<"ISD::SMULFIX"   , SDTIntScaledBinOp, [SDNPCommutative]>
 def smulfixsat : SDNode<"ISD::SMULFIXSAT", SDTIntScaledBinOp, [SDNPCommutative]>;
 def umulfix    : SDNode<"ISD::UMULFIX"   , SDTIntScaledBinOp, [SDNPCommutative]>;
 def umulfixsat : SDNode<"ISD::UMULFIXSAT", SDTIntScaledBinOp, [SDNPCommutative]>;
+def sdivfix    : SDNode<"ISD::SDIVFIX"   , SDTIntScaledBinOp>;
+def udivfix    : SDNode<"ISD::UDIVFIX"   , SDTIntScaledBinOp>;

 def sext_inreg : SDNode<"ISD::SIGN_EXTEND_INREG", SDTExtInreg>;
 def sext_invec : SDNode<"ISD::SIGN_EXTEND_VECTOR_INREG", SDTExtInvec>;
--- a/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@ -1129,7 +1129,9 @@ void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {
  case ISD::SMULFIX:
  case ISD::SMULFIXSAT:
  case ISD::UMULFIX:
-  case ISD::UMULFIXSAT: {
+  case ISD::UMULFIXSAT:
+  case ISD::SDIVFIX:
+  case ISD::UDIVFIX: {
    unsigned Scale = Node->getConstantOperandVal(2);
    Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
                                              Node->getValueType(0), Scale);
@ -3417,6 +3419,24 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
  case ISD::UMULFIXSAT:
    Results.push_back(TLI.expandFixedPointMul(Node, DAG));
    break;
+  case ISD::SDIVFIX:
+  case ISD::UDIVFIX:
+    if (SDValue V = TLI.expandFixedPointDiv(Node->getOpcode(), SDLoc(Node),
+                                            Node->getOperand(0),
+                                            Node->getOperand(1),
+                                            Node->getConstantOperandVal(2),
+                                            DAG)) {
+      Results.push_back(V);
+      break;
+    }
+    // FIXME: We might want to retry here with a wider type if we fail, if that
+    // type is legal.
+    // FIXME: Technically, so long as we only have sdivfixes where BW+Scale is
+    // <= 128 (which is the case for all of the default Embedded-C types),
+    // we will only get here with types and scales that we could always expand
+    // if we were allowed to generate libcalls to division functions of illegal
+    // type. But we cannot do that.
+    llvm_unreachable("Cannot expand DIVFIX!");
  case ISD::ADDCARRY:
  case ISD::SUBCARRY: {
    SDValue LHS = Node->getOperand(0);
--- a/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@ -160,6 +160,9 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
  case ISD::UMULFIX:
  case ISD::UMULFIXSAT:  Res = PromoteIntRes_MULFIX(N); break;

+  case ISD::SDIVFIX:
+  case ISD::UDIVFIX:     Res = PromoteIntRes_DIVFIX(N); break;
+
  case ISD::ABS:         Res = PromoteIntRes_ABS(N); break;

  case ISD::ATOMIC_LOAD:
@ -778,6 +781,71 @@ SDValue DAGTypeLegalizer::PromoteIntRes_MULFIX(SDNode *N) {
                     N->getOperand(2));
 }

+static SDValue earlyExpandDIVFIX(SDNode *N, SDValue LHS, SDValue RHS,
+                                    unsigned Scale, const TargetLowering &TLI,
+                                    SelectionDAG &DAG) {
+  EVT VT = LHS.getValueType();
+  bool Signed = N->getOpcode() == ISD::SDIVFIX;
+
+  SDLoc dl(N);
+  // See if we can perform the division in this type without widening.
+  if (SDValue V = TLI.expandFixedPointDiv(N->getOpcode(), dl, LHS, RHS, Scale,
+                                          DAG))
+    return V;
+
+  // If that didn't work, double the type width and try again. That must work,
+  // or something is wrong.
+  EVT WideVT = EVT::getIntegerVT(*DAG.getContext(),
+                                 VT.getScalarSizeInBits() * 2);
+  if (Signed) {
+    LHS = DAG.getSExtOrTrunc(LHS, dl, WideVT);
+    RHS = DAG.getSExtOrTrunc(RHS, dl, WideVT);
+  } else {
+    LHS = DAG.getZExtOrTrunc(LHS, dl, WideVT);
+    RHS = DAG.getZExtOrTrunc(RHS, dl, WideVT);
+  }
+
+  // TODO: Saturation.
+
+  SDValue Res = TLI.expandFixedPointDiv(N->getOpcode(), dl, LHS, RHS, Scale,
+                                        DAG);
+  assert(Res && "Expanding DIVFIX with wide type failed?");
+  return DAG.getZExtOrTrunc(Res, dl, VT);
+}
+
+SDValue DAGTypeLegalizer::PromoteIntRes_DIVFIX(SDNode *N) {
+  SDLoc dl(N);
+  SDValue Op1Promoted, Op2Promoted;
+  bool Signed = N->getOpcode() == ISD::SDIVFIX;
+  if (Signed) {
+    Op1Promoted = SExtPromotedInteger(N->getOperand(0));
+    Op2Promoted = SExtPromotedInteger(N->getOperand(1));
+  } else {
+    Op1Promoted = ZExtPromotedInteger(N->getOperand(0));
+    Op2Promoted = ZExtPromotedInteger(N->getOperand(1));
+  }
+  EVT PromotedType = Op1Promoted.getValueType();
+  unsigned Scale = N->getConstantOperandVal(2);
+
+  SDValue Res;
+  // If the type is already legal and the operation is legal in that type, we
+  // should not early expand.
+  if (TLI.isTypeLegal(PromotedType)) {
+    TargetLowering::LegalizeAction Action =
+        TLI.getFixedPointOperationAction(N->getOpcode(), PromotedType, Scale);
+    if (Action == TargetLowering::Legal || Action == TargetLowering::Custom)
+      Res = DAG.getNode(N->getOpcode(), dl, PromotedType, Op1Promoted,
+                        Op2Promoted, N->getOperand(2));
+  }
+
+  if (!Res)
+    Res = earlyExpandDIVFIX(N, Op1Promoted, Op2Promoted, Scale, TLI, DAG);
+
+  // TODO: Saturation.
+
+  return Res;
+}
+
 SDValue DAGTypeLegalizer::PromoteIntRes_SADDSUBO(SDNode *N, unsigned ResNo) {
  if (ResNo == 1)
    return PromoteIntRes_Overflow(N);
@ -1237,7 +1305,9 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
  case ISD::SMULFIX:
  case ISD::SMULFIXSAT:
  case ISD::UMULFIX:
-  case ISD::UMULFIXSAT: Res = PromoteIntOp_MULFIX(N); break;
+  case ISD::UMULFIXSAT:
+  case ISD::SDIVFIX:
+  case ISD::UDIVFIX: Res = PromoteIntOp_FIX(N); break;

  case ISD::FPOWI: Res = PromoteIntOp_FPOWI(N); break;

@ -1623,7 +1693,7 @@ SDValue DAGTypeLegalizer::PromoteIntOp_ADDSUBCARRY(SDNode *N, unsigned OpNo) {
  return SDValue(DAG.UpdateNodeOperands(N, LHS, RHS, Carry), 0);
 }

-SDValue DAGTypeLegalizer::PromoteIntOp_MULFIX(SDNode *N) {
+SDValue DAGTypeLegalizer::PromoteIntOp_FIX(SDNode *N) {
  SDValue Op2 = ZExtPromotedInteger(N->getOperand(2));
  return SDValue(
      DAG.UpdateNodeOperands(N, N->getOperand(0), N->getOperand(1), Op2), 0);
@ -1837,6 +1907,9 @@ void DAGTypeLegalizer::ExpandIntegerResult(SDNode *N, unsigned ResNo) {
  case ISD::UMULFIX:
  case ISD::UMULFIXSAT: ExpandIntRes_MULFIX(N, Lo, Hi); break;

+  case ISD::SDIVFIX:
+  case ISD::UDIVFIX: ExpandIntRes_DIVFIX(N, Lo, Hi); break;
+
  case ISD::VECREDUCE_ADD:
  case ISD::VECREDUCE_MUL:
  case ISD::VECREDUCE_AND:
@ -3151,6 +3224,13 @@ void DAGTypeLegalizer::ExpandIntRes_MULFIX(SDNode *N, SDValue &Lo,
  Lo = DAG.getSelect(dl, NVT, SatMin, NVTZero, Lo);
 }

+void DAGTypeLegalizer::ExpandIntRes_DIVFIX(SDNode *N, SDValue &Lo,
+                                           SDValue &Hi) {
+  SDValue Res = earlyExpandDIVFIX(N, N->getOperand(0), N->getOperand(1),
+                                  N->getConstantOperandVal(2), TLI, DAG);
+  SplitInteger(Res, Lo, Hi);
+}
+
 void DAGTypeLegalizer::ExpandIntRes_SADDSUBO(SDNode *Node,
                                             SDValue &Lo, SDValue &Hi) {
  SDValue LHS = Node->getOperand(0);
--- a/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@ -329,6 +329,7 @@ private:
  SDValue PromoteIntRes_XMULO(SDNode *N, unsigned ResNo);
  SDValue PromoteIntRes_ADDSUBSAT(SDNode *N);
  SDValue PromoteIntRes_MULFIX(SDNode *N);
+  SDValue PromoteIntRes_DIVFIX(SDNode *N);
  SDValue PromoteIntRes_FLT_ROUNDS(SDNode *N);
  SDValue PromoteIntRes_VECREDUCE(SDNode *N);
  SDValue PromoteIntRes_ABS(SDNode *N);
@ -367,7 +368,7 @@ private:
  SDValue PromoteIntOp_ADDSUBCARRY(SDNode *N, unsigned OpNo);
  SDValue PromoteIntOp_FRAMERETURNADDR(SDNode *N);
  SDValue PromoteIntOp_PREFETCH(SDNode *N, unsigned OpNo);
-  SDValue PromoteIntOp_MULFIX(SDNode *N);
+  SDValue PromoteIntOp_FIX(SDNode *N);
  SDValue PromoteIntOp_FPOWI(SDNode *N);
  SDValue PromoteIntOp_VECREDUCE(SDNode *N);

@ -428,6 +429,7 @@ private:
  void ExpandIntRes_XMULO             (SDNode *N, SDValue &Lo, SDValue &Hi);
  void ExpandIntRes_ADDSUBSAT         (SDNode *N, SDValue &Lo, SDValue &Hi);
  void ExpandIntRes_MULFIX            (SDNode *N, SDValue &Lo, SDValue &Hi);
+  void ExpandIntRes_DIVFIX            (SDNode *N, SDValue &Lo, SDValue &Hi);

  void ExpandIntRes_ATOMIC_LOAD       (SDNode *N, SDValue &Lo, SDValue &Hi);
  void ExpandIntRes_VECREDUCE         (SDNode *N, SDValue &Lo, SDValue &Hi);
@ -689,7 +691,7 @@ private:
  SDValue ScalarizeVecRes_UNDEF(SDNode *N);
  SDValue ScalarizeVecRes_VECTOR_SHUFFLE(SDNode *N);

-  SDValue ScalarizeVecRes_MULFIX(SDNode *N);
+  SDValue ScalarizeVecRes_FIX(SDNode *N);

  // Vector Operand Scalarization: <1 x ty> -> ty.
  bool ScalarizeVectorOperand(SDNode *N, unsigned OpNo);
@ -731,7 +733,7 @@ private:
  void SplitVecRes_OverflowOp(SDNode *N, unsigned ResNo,
                              SDValue &Lo, SDValue &Hi);

-  void SplitVecRes_MULFIX(SDNode *N, SDValue &Lo, SDValue &Hi);
+  void SplitVecRes_FIX(SDNode *N, SDValue &Lo, SDValue &Hi);

  void SplitVecRes_BITCAST(SDNode *N, SDValue &Lo, SDValue &Hi);
  void SplitVecRes_BUILD_VECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);
--- a/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@ -146,6 +146,7 @@ class VectorLegalizer {
  SDValue ExpandMULO(SDValue Op);
  SDValue ExpandAddSubSat(SDValue Op);
  SDValue ExpandFixedPointMul(SDValue Op);
+  SDValue ExpandFixedPointDiv(SDValue Op);
  SDValue ExpandStrictFPOp(SDValue Op);

  SDValue UnrollStrictFPOp(SDValue Op);
@ -442,7 +443,9 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
  case ISD::SMULFIX:
  case ISD::SMULFIXSAT:
  case ISD::UMULFIX:
-  case ISD::UMULFIXSAT: {
+  case ISD::UMULFIXSAT:
+  case ISD::SDIVFIX:
+  case ISD::UDIVFIX: {
    unsigned Scale = Node->getConstantOperandVal(2);
    Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
                                              Node->getValueType(0), Scale);
@ -849,6 +852,9 @@ SDValue VectorLegalizer::Expand(SDValue Op) {
    // targets? This should probably be investigated. And if we still prefer to
    // unroll an explanation could be helpful.
    return DAG.UnrollVectorOp(Op.getNode());
+  case ISD::SDIVFIX:
+  case ISD::UDIVFIX:
+    return ExpandFixedPointDiv(Op);
 #define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)                   \
  case ISD::STRICT_##DAGN:
 #include "llvm/IR/ConstrainedOps.def"
@ -1392,6 +1398,14 @@ SDValue VectorLegalizer::ExpandFixedPointMul(SDValue Op) {
  return DAG.UnrollVectorOp(Op.getNode());
 }

+SDValue VectorLegalizer::ExpandFixedPointDiv(SDValue Op) {
+  SDNode *N = Op.getNode();
+  if (SDValue Expanded = TLI.expandFixedPointDiv(N->getOpcode(), SDLoc(N),
+          N->getOperand(0), N->getOperand(1), N->getConstantOperandVal(2), DAG))
+    return Expanded;
+  return DAG.UnrollVectorOp(N);
+}
+
 SDValue VectorLegalizer::ExpandStrictFPOp(SDValue Op) {
  if (Op.getOpcode() == ISD::STRICT_UINT_TO_FP)
    return ExpandUINT_TO_FLOAT(Op);
--- a/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@ -165,7 +165,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) {
  case ISD::SMULFIXSAT:
  case ISD::UMULFIX:
  case ISD::UMULFIXSAT:
-    R = ScalarizeVecRes_MULFIX(N);
+  case ISD::SDIVFIX:
+  case ISD::UDIVFIX:
+    R = ScalarizeVecRes_FIX(N);
    break;
  }

@ -189,7 +191,7 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_TernaryOp(SDNode *N) {
                     Op0.getValueType(), Op0, Op1, Op2);
 }

-SDValue DAGTypeLegalizer::ScalarizeVecRes_MULFIX(SDNode *N) {
+SDValue DAGTypeLegalizer::ScalarizeVecRes_FIX(SDNode *N) {
  SDValue Op0 = GetScalarizedVector(N->getOperand(0));
  SDValue Op1 = GetScalarizedVector(N->getOperand(1));
  SDValue Op2 = N->getOperand(2);
@ -958,7 +960,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
  case ISD::SMULFIXSAT:
  case ISD::UMULFIX:
  case ISD::UMULFIXSAT:
-    SplitVecRes_MULFIX(N, Lo, Hi);
+  case ISD::SDIVFIX:
+  case ISD::UDIVFIX:
+    SplitVecRes_FIX(N, Lo, Hi);
    break;
  }

@ -997,7 +1001,7 @@ void DAGTypeLegalizer::SplitVecRes_TernaryOp(SDNode *N, SDValue &Lo,
                   Op0Hi, Op1Hi, Op2Hi);
 }

-void DAGTypeLegalizer::SplitVecRes_MULFIX(SDNode *N, SDValue &Lo, SDValue &Hi) {
+void DAGTypeLegalizer::SplitVecRes_FIX(SDNode *N, SDValue &Lo, SDValue &Hi) {
  SDValue LHSLo, LHSHi;
  GetSplitVector(N->getOperand(0), LHSLo, LHSHi);
  SDValue RHSLo, RHSHi;
--- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@ -5441,6 +5441,60 @@ static SDValue ExpandPowI(const SDLoc &DL, SDValue LHS, SDValue RHS,
  return DAG.getNode(ISD::FPOWI, DL, LHS.getValueType(), LHS, RHS);
 }

+static SDValue expandDivFix(unsigned Opcode, const SDLoc &DL,
+                            SDValue LHS, SDValue RHS, SDValue Scale,
+                            SelectionDAG &DAG, const TargetLowering &TLI) {
+  EVT VT = LHS.getValueType();
+  bool Signed = Opcode == ISD::SDIVFIX;
+  LLVMContext &Ctx = *DAG.getContext();
+
+  // If the type is legal but the operation isn't, this node might survive all
+  // the way to operation legalization. If we end up there and we do not have
+  // the ability to widen the type (if VT*2 is not legal), we cannot expand the
+  // node.
+
+  // Coax the legalizer into expanding the node during type legalization instead
+  // by bumping the size by one bit. This will force it to Promote, enabling the
+  // early expansion and avoiding the need to expand later.
+
+  // We don't have to do this if Scale is 0; that can always be expanded.
+
+  // FIXME: We wouldn't have to do this (or any of the early
+  // expansion/promotion) if it was possible to expand a libcall of an
+  // illegal type during operation legalization. But it's not, so things
+  // get a bit hacky.
+  unsigned ScaleInt = cast<ConstantSDNode>(Scale)->getZExtValue();
+  if (ScaleInt > 0 &&
+      (TLI.isTypeLegal(VT) ||
+       (VT.isVector() && TLI.isTypeLegal(VT.getVectorElementType())))) {
+    TargetLowering::LegalizeAction Action = TLI.getFixedPointOperationAction(
+        Opcode, VT, ScaleInt);
+    if (Action != TargetLowering::Legal && Action != TargetLowering::Custom) {
+      EVT PromVT;
+      if (VT.isScalarInteger())
+        PromVT = EVT::getIntegerVT(Ctx, VT.getSizeInBits() + 1);
+      else if (VT.isVector()) {
+        PromVT = VT.getVectorElementType();
+        PromVT = EVT::getIntegerVT(Ctx, PromVT.getSizeInBits() + 1);
+        PromVT = EVT::getVectorVT(Ctx, PromVT, VT.getVectorElementCount());
+      } else
+        llvm_unreachable("Wrong VT for DIVFIX?");
+      if (Signed) {
+        LHS = DAG.getSExtOrTrunc(LHS, DL, PromVT);
+        RHS = DAG.getSExtOrTrunc(RHS, DL, PromVT);
+      } else {
+        LHS = DAG.getZExtOrTrunc(LHS, DL, PromVT);
+        RHS = DAG.getZExtOrTrunc(RHS, DL, PromVT);
+      }
+      // TODO: Saturation.
+      SDValue Res = DAG.getNode(Opcode, DL, PromVT, LHS, RHS, Scale);
+      return DAG.getZExtOrTrunc(Res, DL, VT);
+    }
+  }
+
+  return DAG.getNode(Opcode, DL, VT, LHS, RHS, Scale);
+}
+
 // getUnderlyingArgRegs - Find underlying registers used for a truncated,
 // bitcasted, or split argument. Returns a list of <Register, size in bits>
 static void
@ -5705,6 +5759,14 @@ static unsigned FixedPointIntrinsicToOpcode(unsigned Intrinsic) {
    return ISD::SMULFIX;
  case Intrinsic::umul_fix:
    return ISD::UMULFIX;
+  case Intrinsic::smul_fix_sat:
+    return ISD::SMULFIXSAT;
+  case Intrinsic::umul_fix_sat:
+    return ISD::UMULFIXSAT;
+  case Intrinsic::sdiv_fix:
+    return ISD::SDIVFIX;
+  case Intrinsic::udiv_fix:
+    return ISD::UDIVFIX;
  default:
    llvm_unreachable("Unhandled fixed point intrinsic");
  }
@ -6360,7 +6422,9 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
    return;
  }
  case Intrinsic::smul_fix:
-  case Intrinsic::umul_fix: {
+  case Intrinsic::umul_fix:
+  case Intrinsic::smul_fix_sat:
+  case Intrinsic::umul_fix_sat: {
    SDValue Op1 = getValue(I.getArgOperand(0));
    SDValue Op2 = getValue(I.getArgOperand(1));
    SDValue Op3 = getValue(I.getArgOperand(2));
@ -6368,20 +6432,13 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
                             Op1.getValueType(), Op1, Op2, Op3));
    return;
  }
-  case Intrinsic::smul_fix_sat: {
+  case Intrinsic::sdiv_fix:
+  case Intrinsic::udiv_fix: {
    SDValue Op1 = getValue(I.getArgOperand(0));
    SDValue Op2 = getValue(I.getArgOperand(1));
    SDValue Op3 = getValue(I.getArgOperand(2));
-    setValue(&I, DAG.getNode(ISD::SMULFIXSAT, sdl, Op1.getValueType(), Op1, Op2,
-                             Op3));
-    return;
-  }
-  case Intrinsic::umul_fix_sat: {
-    SDValue Op1 = getValue(I.getArgOperand(0));
-    SDValue Op2 = getValue(I.getArgOperand(1));
-    SDValue Op3 = getValue(I.getArgOperand(2));
-    setValue(&I, DAG.getNode(ISD::UMULFIXSAT, sdl, Op1.getValueType(), Op1, Op2,
-                             Op3));
+    setValue(&I, expandDivFix(FixedPointIntrinsicToOpcode(Intrinsic), sdl,
+                              Op1, Op2, Op3, DAG, TLI));
    return;
  }
  case Intrinsic::stacksave: {
--- a/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@ -312,6 +312,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
  case ISD::UMULFIX:                    return "umulfix";
  case ISD::UMULFIXSAT:                 return "umulfixsat";

+  case ISD::SDIVFIX:                    return "sdivfix";
+  case ISD::UDIVFIX:                    return "udivfix";
+
  // Conversion operators.
  case ISD::SIGN_EXTEND:                return "sign_extend";
  case ISD::ZERO_EXTEND:                return "zero_extend";
--- a/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@ -7293,6 +7293,86 @@ TargetLowering::expandFixedPointMul(SDNode *Node, SelectionDAG &DAG) const {
  return Result;
 }

+SDValue
+TargetLowering::expandFixedPointDiv(unsigned Opcode, const SDLoc &dl,
+                                    SDValue LHS, SDValue RHS,
+                                    unsigned Scale, SelectionDAG &DAG) const {
+  assert((Opcode == ISD::SDIVFIX ||
+          Opcode == ISD::UDIVFIX) &&
+         "Expected a fixed point division opcode");
+
+  EVT VT = LHS.getValueType();
+  bool Signed = Opcode == ISD::SDIVFIX;
+  EVT BoolVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
+
+  // If there is enough room in the type to upscale the LHS or downscale the
+  // RHS before the division, we can perform it in this type without having to
+  // resize. For signed operations, the LHS headroom is the number of
+  // redundant sign bits, and for unsigned ones it is the number of zeroes.
+  // The headroom for the RHS is the number of trailing zeroes.
+  unsigned LHSLead = Signed ? DAG.ComputeNumSignBits(LHS) - 1
+                            : DAG.computeKnownBits(LHS).countMinLeadingZeros();
+  unsigned RHSTrail = DAG.computeKnownBits(RHS).countMinTrailingZeros();
+
+  if (LHSLead + RHSTrail < Scale)
+    return SDValue();
+
+  unsigned LHSShift = std::min(LHSLead, Scale);
+  unsigned RHSShift = Scale - LHSShift;
+
+  // At this point, we know that if we shift the LHS up by LHSShift and the
+  // RHS down by RHSShift, we can emit a regular division with a final scaling
+  // factor of Scale.
+
+  EVT ShiftTy = getShiftAmountTy(VT, DAG.getDataLayout());
+  if (LHSShift)
+    LHS = DAG.getNode(ISD::SHL, dl, VT, LHS,
+                      DAG.getConstant(LHSShift, dl, ShiftTy));
+  if (RHSShift)
+    RHS = DAG.getNode(Signed ? ISD::SRA : ISD::SRL, dl, VT, RHS,
+                      DAG.getConstant(RHSShift, dl, ShiftTy));
+
+  SDValue Quot;
+  if (Signed) {
+    // For signed operations, if the resulting quotient is negative and the
+    // remainder is nonzero, subtract 1 from the quotient to round towards
+    // negative infinity.
+    SDValue Rem;
+    // FIXME: Ideally we would always produce an SDIVREM here, but if the
+    // type isn't legal, SDIVREM cannot be expanded. There is no reason why
+    // we couldn't just form a libcall, but the type legalizer doesn't do it.
+    if (isTypeLegal(VT) &&
+        isOperationLegalOrCustom(ISD::SDIVREM, VT)) {
+      Quot = DAG.getNode(ISD::SDIVREM, dl,
+                         DAG.getVTList(VT, VT),
+                         LHS, RHS);
+      Rem = Quot.getValue(1);
+      Quot = Quot.getValue(0);
+    } else {
+      Quot = DAG.getNode(ISD::SDIV, dl, VT,
+                         LHS, RHS);
+      Rem = DAG.getNode(ISD::SREM, dl, VT,
+                        LHS, RHS);
+    }
+    SDValue Zero = DAG.getConstant(0, dl, VT);
+    SDValue RemNonZero = DAG.getSetCC(dl, BoolVT, Rem, Zero, ISD::SETNE);
+    SDValue LHSNeg = DAG.getSetCC(dl, BoolVT, LHS, Zero, ISD::SETLT);
+    SDValue RHSNeg = DAG.getSetCC(dl, BoolVT, RHS, Zero, ISD::SETLT);
+    SDValue QuotNeg = DAG.getNode(ISD::XOR, dl, BoolVT, LHSNeg, RHSNeg);
+    SDValue Sub1 = DAG.getNode(ISD::SUB, dl, VT, Quot,
+                               DAG.getConstant(1, dl, VT));
+    Quot = DAG.getSelect(dl, VT,
+                         DAG.getNode(ISD::AND, dl, BoolVT, RemNonZero, QuotNeg),
+                         Sub1, Quot);
+  } else
+    Quot = DAG.getNode(ISD::UDIV, dl, VT,
+                       LHS, RHS);
+
+  // TODO: Saturation.
+
+  return Quot;
+}
+
 void TargetLowering::expandUADDSUBO(
    SDNode *Node, SDValue &Result, SDValue &Overflow, SelectionDAG &DAG) const {
  SDLoc dl(Node);
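
Review note: the signed rounding fix-up in the expansion above can be illustrated on plain integers. The following is only a minimal sketch, not the patch's code: `sdivFixFloor` is a hypothetical helper name, and it widens the dividend to 32 bits instead of computing shift amounts on the DAG as the expansion does. It assumes 16-bit operands, a nonzero divisor, and no saturation (the patch leaves saturation as a TODO).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): divide two signed 16-bit fixed-point
// values that share `scale` fractional bits, rounding towards negative
// infinity. Assumes rhs != 0 and scale < 16; overflow is not saturated.
int32_t sdivFixFloor(int16_t lhs, int16_t rhs, unsigned scale) {
  assert(rhs != 0 && scale < 16);
  // Widen and pre-scale the dividend so the quotient keeps `scale`
  // fractional bits.
  int32_t wideLhs = static_cast<int32_t>(lhs) * (int32_t{1} << scale);
  int32_t wideRhs = rhs;
  int32_t quot = wideLhs / wideRhs; // C++ division truncates towards zero.
  int32_t rem = wideLhs % wideRhs;
  // The exact quotient is negative when the operand signs differ. This is
  // checked on the operands because the truncated quotient may be zero.
  bool quotNeg = (wideLhs < 0) != (wideRhs < 0);
  if (rem != 0 && quotNeg)
    --quot; // Truncation rounded up; step down to the floor.
  return quot;
}
```

For example, with scale 7, dividing 1.0 (128) by 2.0 (256) yields 64, which encodes 0.5; with scale 0, dividing -1 by 2 yields -1 rather than the truncated 0.
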
--- a/lib/CodeGen/TargetLoweringBase.cpp
+++ b/lib/CodeGen/TargetLoweringBase.cpp
@ -663,6 +663,8 @@ void TargetLoweringBase::initActions() {
    setOperationAction(ISD::SMULFIXSAT, VT, Expand);
    setOperationAction(ISD::UMULFIX, VT, Expand);
    setOperationAction(ISD::UMULFIXSAT, VT, Expand);
+    setOperationAction(ISD::SDIVFIX, VT, Expand);
+    setOperationAction(ISD::UDIVFIX, VT, Expand);

    // Overflow operations default to expand
    setOperationAction(ISD::SADDO, VT, Expand);
--- a/lib/IR/Verifier.cpp
+++ b/lib/IR/Verifier.cpp
@ -4677,28 +4677,32 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
  case Intrinsic::smul_fix:
  case Intrinsic::smul_fix_sat:
  case Intrinsic::umul_fix:
-  case Intrinsic::umul_fix_sat: {
+  case Intrinsic::umul_fix_sat:
+  case Intrinsic::sdiv_fix:
+  case Intrinsic::udiv_fix: {
    Value *Op1 = Call.getArgOperand(0);
    Value *Op2 = Call.getArgOperand(1);
    Assert(Op1->getType()->isIntOrIntVectorTy(),
-           "first operand of [us]mul_fix[_sat] must be an int type or vector "
-           "of ints");
+           "first operand of [us][mul|div]_fix[_sat] must be an int type or "
+           "vector of ints");
    Assert(Op2->getType()->isIntOrIntVectorTy(),
-           "second operand of [us]mul_fix_[sat] must be an int type or vector "
-           "of ints");
+           "second operand of [us][mul|div]_fix[_sat] must be an int type or "
+           "vector of ints");

    auto *Op3 = cast<ConstantInt>(Call.getArgOperand(2));
    Assert(Op3->getType()->getBitWidth() <= 32,
-           "third argument of [us]mul_fix[_sat] must fit within 32 bits");
+           "third argument of [us][mul|div]_fix[_sat] must fit within 32 bits");

-    if (ID == Intrinsic::smul_fix || ID == Intrinsic::smul_fix_sat) {
+    if (ID == Intrinsic::smul_fix || ID == Intrinsic::smul_fix_sat ||
+        ID == Intrinsic::sdiv_fix) {
      Assert(
          Op3->getZExtValue() < Op1->getType()->getScalarSizeInBits(),
-          "the scale of smul_fix[_sat] must be less than the width of the operands");
+          "the scale of s[mul|div]_fix[_sat] must be less than the width of "
+          "the operands");
    } else {
      Assert(Op3->getZExtValue() <= Op1->getType()->getScalarSizeInBits(),
-             "the scale of umul_fix[_sat] must be less than or equal to the width of "
-             "the operands");
+             "the scale of u[mul|div]_fix[_sat] must be less than or equal "
+             "to the width of the operands");
    }
    break;
  }
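
Review note: the scale constraints enforced by the Verifier hunk above reduce to a one-line predicate. This is a sketch only; `isValidFixScale` is a hypothetical name for illustration and is not part of the Verifier.

```cpp
#include <cassert>

// Hypothetical predicate (not an LLVM API) mirroring the assertions above:
// signed variants require scale < width, unsigned variants allow
// scale <= width, where width is the scalar bit width of the operands.
bool isValidFixScale(bool isSigned, unsigned scale, unsigned bitWidth) {
  return isSigned ? scale < bitWidth : scale <= bitWidth;
}
```

Under this rule, `llvm.sdiv.fix.i15` with scale 14 is accepted, while a signed i16 call with scale 16 is rejected and an unsigned i16 call with scale 16 is accepted.
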
--- a/test/CodeGen/X86/sdiv_fix.ll
+++ b/test/CodeGen/X86/sdiv_fix.ll
@ -0,0 +1,713 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
+; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86
+
+declare  i4  @llvm.sdiv.fix.i4   (i4,  i4,  i32)
+declare  i15 @llvm.sdiv.fix.i15  (i15, i15, i32)
+declare  i16 @llvm.sdiv.fix.i16  (i16, i16, i32)
+declare  i18 @llvm.sdiv.fix.i18  (i18, i18, i32)
+declare  i64 @llvm.sdiv.fix.i64  (i64, i64, i32)
+declare  <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32>, <4 x i32>, i32)
+
+define i16 @func(i16 %x, i16 %y) nounwind {
+; X64-LABEL: func:
+; X64:       # %bb.0:
+; X64-NEXT:    movswl %si, %esi
+; X64-NEXT:    movswl %di, %ecx
+; X64-NEXT:    shll $7, %ecx
+; X64-NEXT:    movl %ecx, %eax
+; X64-NEXT:    cltd
+; X64-NEXT:    idivl %esi
+; X64-NEXT:    # kill: def $eax killed $eax def $rax
+; X64-NEXT:    leal -1(%rax), %edi
+; X64-NEXT:    testl %esi, %esi
+; X64-NEXT:    sets %sil
+; X64-NEXT:    testl %ecx, %ecx
+; X64-NEXT:    sets %cl
+; X64-NEXT:    xorb %sil, %cl
+; X64-NEXT:    testl %edx, %edx
+; X64-NEXT:    setne %dl
+; X64-NEXT:    testb %cl, %dl
+; X64-NEXT:    cmovnel %edi, %eax
+; X64-NEXT:    # kill: def $ax killed $ax killed $rax
+; X64-NEXT:    retq
+;
+; X86-LABEL: func:
+; X86:       # %bb.0:
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    movswl {{[0-9]+}}(%esp), %esi
+; X86-NEXT:    movswl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    shll $7, %ecx
+; X86-NEXT:    movl %ecx, %eax
+; X86-NEXT:    cltd
+; X86-NEXT:    idivl %esi
+; X86-NEXT:    leal -1(%eax), %edi
+; X86-NEXT:    testl %esi, %esi
+; X86-NEXT:    sets %bl
+; X86-NEXT:    testl %ecx, %ecx
+; X86-NEXT:    sets %cl
+; X86-NEXT:    xorb %bl, %cl
+; X86-NEXT:    testl %edx, %edx
+; X86-NEXT:    setne %dl
+; X86-NEXT:    testb %cl, %dl
+; X86-NEXT:    cmovnel %edi, %eax
+; X86-NEXT:    # kill: def $ax killed $ax killed $eax
+; X86-NEXT:    popl %esi
+; X86-NEXT:    popl %edi
+; X86-NEXT:    popl %ebx
+; X86-NEXT:    retl
+  %tmp = call i16 @llvm.sdiv.fix.i16(i16 %x, i16 %y, i32 7)
+  ret i16 %tmp
+}
+
+define i16 @func2(i8 %x, i8 %y) nounwind {
+; X64-LABEL: func2:
+; X64:       # %bb.0:
+; X64-NEXT:    movsbl %dil, %eax
+; X64-NEXT:    movsbl %sil, %ecx
+; X64-NEXT:    movswl %cx, %esi
+; X64-NEXT:    movswl %ax, %ecx
+; X64-NEXT:    shll $14, %ecx
+; X64-NEXT:    movl %ecx, %eax
+; X64-NEXT:    cltd
+; X64-NEXT:    idivl %esi
+; X64-NEXT:    # kill: def $eax killed $eax def $rax
+; X64-NEXT:    leal -1(%rax), %edi
+; X64-NEXT:    testl %esi, %esi
+; X64-NEXT:    sets %sil
+; X64-NEXT:    testl %ecx, %ecx
+; X64-NEXT:    sets %cl
+; X64-NEXT:    xorb %sil, %cl
+; X64-NEXT:    testl %edx, %edx
+; X64-NEXT:    setne %dl
+; X64-NEXT:    testb %cl, %dl
+; X64-NEXT:    cmovel %eax, %edi
+; X64-NEXT:    addl %edi, %edi
+; X64-NEXT:    movswl %di, %eax
+; X64-NEXT:    shrl %eax
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
+; X64-NEXT:    retq
+;
+; X86-LABEL: func2:
+; X86:       # %bb.0:
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    movsbl {{[0-9]+}}(%esp), %esi
+; X86-NEXT:    movsbl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    shll $14, %ecx
+; X86-NEXT:    movl %ecx, %eax
+; X86-NEXT:    cltd
+; X86-NEXT:    idivl %esi
+; X86-NEXT:    leal -1(%eax), %edi
+; X86-NEXT:    testl %esi, %esi
+; X86-NEXT:    sets %bl
+; X86-NEXT:    testl %ecx, %ecx
+; X86-NEXT:    sets %cl
+; X86-NEXT:    xorb %bl, %cl
+; X86-NEXT:    testl %edx, %edx
+; X86-NEXT:    setne %dl
+; X86-NEXT:    testb %cl, %dl
+; X86-NEXT:    cmovel %eax, %edi
+; X86-NEXT:    addl %edi, %edi
+; X86-NEXT:    movswl %di, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    # kill: def $ax killed $ax killed $eax
+; X86-NEXT:    popl %esi
+; X86-NEXT:    popl %edi
+; X86-NEXT:    popl %ebx
+; X86-NEXT:    retl
+  %x2 = sext i8 %x to i15
+  %y2 = sext i8 %y to i15
+  %tmp = call i15 @llvm.sdiv.fix.i15(i15 %x2, i15 %y2, i32 14)
+  %tmp2 = sext i15 %tmp to i16
+  ret i16 %tmp2
+}
+
+define i16 @func3(i15 %x, i8 %y) nounwind {
+; X64-LABEL: func3:
+; X64:       # %bb.0:
+; X64-NEXT:    shll $8, %esi
+; X64-NEXT:    movswl %si, %ecx
+; X64-NEXT:    addl %edi, %edi
+; X64-NEXT:    shrl $4, %ecx
+; X64-NEXT:    movl %edi, %eax
+; X64-NEXT:    cwtd
+; X64-NEXT:    idivw %cx
+; X64-NEXT:    # kill: def $ax killed $ax def $rax
+; X64-NEXT:    leal -1(%rax), %esi
+; X64-NEXT:    testw %di, %di
+; X64-NEXT:    sets %dil
+; X64-NEXT:    testw %cx, %cx
+; X64-NEXT:    sets %cl
+; X64-NEXT:    xorb %dil, %cl
+; X64-NEXT:    testw %dx, %dx
+; X64-NEXT:    setne %dl
+; X64-NEXT:    testb %cl, %dl
+; X64-NEXT:    cmovel %eax, %esi
+; X64-NEXT:    addl %esi, %esi
+; X64-NEXT:    movswl %si, %eax
+; X64-NEXT:    shrl %eax
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
+; X64-NEXT:    retq
+;
+; X86-LABEL: func3:
+; X86:       # %bb.0:
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    shll $8, %eax
+; X86-NEXT:    movswl %ax, %esi
+; X86-NEXT:    addl %ecx, %ecx
+; X86-NEXT:    shrl $4, %esi
+; X86-NEXT:    movl %ecx, %eax
+; X86-NEXT:    cwtd
+; X86-NEXT:    idivw %si
+; X86-NEXT:    # kill: def $ax killed $ax def $eax
+; X86-NEXT:    leal -1(%eax), %edi
+; X86-NEXT:    testw %cx, %cx
+; X86-NEXT:    sets %cl
+; X86-NEXT:    testw %si, %si
+; X86-NEXT:    sets %ch
+; X86-NEXT:    xorb %cl, %ch
+; X86-NEXT:    testw %dx, %dx
+; X86-NEXT:    setne %cl
+; X86-NEXT:    testb %ch, %cl
+; X86-NEXT:    cmovel %eax, %edi
+; X86-NEXT:    addl %edi, %edi
+; X86-NEXT:    movswl %di, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    # kill: def $ax killed $ax killed $eax
+; X86-NEXT:    popl %esi
+; X86-NEXT:    popl %edi
+; X86-NEXT:    retl
+  %y2 = sext i8 %y to i15
+  %y3 = shl i15 %y2, 7
+  %tmp = call i15 @llvm.sdiv.fix.i15(i15 %x, i15 %y3, i32 4)
+  %tmp2 = sext i15 %tmp to i16
+  ret i16 %tmp2
+}
+
+define i4 @func4(i4 %x, i4 %y) nounwind {
+; X64-LABEL: func4:
+; X64:       # %bb.0:
+; X64-NEXT:    pushq %rbx
+; X64-NEXT:    shlb $4, %sil
+; X64-NEXT:    sarb $4, %sil
+; X64-NEXT:    shlb $4, %dil
+; X64-NEXT:    sarb $4, %dil
+; X64-NEXT:    shlb $2, %dil
+; X64-NEXT:    movsbl %dil, %ecx
+; X64-NEXT:    movl %ecx, %eax
+; X64-NEXT:    idivb %sil
+; X64-NEXT:    movsbl %ah, %ebx
+; X64-NEXT:    movzbl %al, %edi
+; X64-NEXT:    leal -1(%rdi), %eax
+; X64-NEXT:    movzbl %al, %eax
+; X64-NEXT:    testb %sil, %sil
+; X64-NEXT:    sets %dl
+; X64-NEXT:    testb %cl, %cl
+; X64-NEXT:    sets %cl
+; X64-NEXT:    xorb %dl, %cl
+; X64-NEXT:    testb %bl, %bl
+; X64-NEXT:    setne %dl
+; X64-NEXT:    testb %cl, %dl
+; X64-NEXT:    cmovel %edi, %eax
+; X64-NEXT:    # kill: def $al killed $al killed $eax
+; X64-NEXT:    popq %rbx
+; X64-NEXT:    retq
+;
+; X86-LABEL: func4:
+; X86:       # %bb.0:
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    movb {{[0-9]+}}(%esp), %dl
+; X86-NEXT:    shlb $4, %dl
+; X86-NEXT:    sarb $4, %dl
+; X86-NEXT:    movb {{[0-9]+}}(%esp), %dh
+; X86-NEXT:    shlb $4, %dh
+; X86-NEXT:    sarb $4, %dh
+; X86-NEXT:    shlb $2, %dh
+; X86-NEXT:    movsbl %dh, %eax
+; X86-NEXT:    idivb %dl
+; X86-NEXT:    movsbl %ah, %ecx
+; X86-NEXT:    movzbl %al, %esi
+; X86-NEXT:    decb %al
+; X86-NEXT:    movzbl %al, %eax
+; X86-NEXT:    testb %dl, %dl
+; X86-NEXT:    sets %dl
+; X86-NEXT:    testb %dh, %dh
+; X86-NEXT:    sets %dh
+; X86-NEXT:    xorb %dl, %dh
+; X86-NEXT:    testb %cl, %cl
+; X86-NEXT:    setne %cl
+; X86-NEXT:    testb %dh, %cl
+; X86-NEXT:    cmovel %esi, %eax
+; X86-NEXT:    # kill: def $al killed $al killed $eax
+; X86-NEXT:    popl %esi
+; X86-NEXT:    retl
+  %tmp = call i4 @llvm.sdiv.fix.i4(i4 %x, i4 %y, i32 2)
+  ret i4 %tmp
+}
+
+define i64 @func5(i64 %x, i64 %y) nounwind {
+; X64-LABEL: func5:
+; X64:       # %bb.0:
+; X64-NEXT:    pushq %rbp
+; X64-NEXT:    pushq %r15
+; X64-NEXT:    pushq %r14
+; X64-NEXT:    pushq %r13
+; X64-NEXT:    pushq %r12
+; X64-NEXT:    pushq %rbx
+; X64-NEXT:    subq $24, %rsp
+; X64-NEXT:    movq %rsi, %r14
+; X64-NEXT:    movq %rdi, %r15
+; X64-NEXT:    movq %rdi, %rax
+; X64-NEXT:    shrq $33, %rax
+; X64-NEXT:    movq %rdi, %rbx
+; X64-NEXT:    sarq $63, %rbx
+; X64-NEXT:    shlq $31, %rbx
+; X64-NEXT:    orq %rax, %rbx
+; X64-NEXT:    sets {{[-0-9]+}}(%r{{[sb]}}p) # 1-byte Folded Spill
+; X64-NEXT:    shlq $31, %r15
+; X64-NEXT:    movq %rsi, %r12
+; X64-NEXT:    sarq $63, %r12
+; X64-NEXT:    movq %r15, %rdi
+; X64-NEXT:    movq %rbx, %rsi
+; X64-NEXT:    movq %r14, %rdx
+; X64-NEXT:    movq %r12, %rcx
+; X64-NEXT:    callq __divti3
+; X64-NEXT:    movq %rax, %r13
+; X64-NEXT:    decq %rax
+; X64-NEXT:    movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
+; X64-NEXT:    testq %r12, %r12
+; X64-NEXT:    sets %bpl
+; X64-NEXT:    xorb {{[-0-9]+}}(%r{{[sb]}}p), %bpl # 1-byte Folded Reload
+; X64-NEXT:    movq %r15, %rdi
+; X64-NEXT:    movq %rbx, %rsi
+; X64-NEXT:    movq %r14, %rdx
+; X64-NEXT:    movq %r12, %rcx
+; X64-NEXT:    callq __modti3
+; X64-NEXT:    orq %rax, %rdx
+; X64-NEXT:    setne %al
+; X64-NEXT:    testb %bpl, %al
+; X64-NEXT:    cmovneq {{[-0-9]+}}(%r{{[sb]}}p), %r13 # 8-byte Folded Reload
+; X64-NEXT:    movq %r13, %rax
+; X64-NEXT:    addq $24, %rsp
+; X64-NEXT:    popq %rbx
+; X64-NEXT:    popq %r12
+; X64-NEXT:    popq %r13
+; X64-NEXT:    popq %r14
+; X64-NEXT:    popq %r15
+; X64-NEXT:    popq %rbp
+; X64-NEXT:    retq
+;
+; X86-LABEL: func5:
+; X86:       # %bb.0:
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    andl $-8, %esp
+; X86-NEXT:    subl $72, %esp
+; X86-NEXT:    movl 8(%ebp), %ecx
+; X86-NEXT:    movl 12(%ebp), %edx
+; X86-NEXT:    movl 20(%ebp), %ebx
+; X86-NEXT:    sarl $31, %ebx
+; X86-NEXT:    movl %edx, %eax
+; X86-NEXT:    shldl $31, %ecx, %eax
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    shll $31, %ecx
+; X86-NEXT:    movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl %edx, %esi
+; X86-NEXT:    sarl $31, %esi
+; X86-NEXT:    movl %esi, %edi
+; X86-NEXT:    shldl $31, %edx, %esi
+; X86-NEXT:    leal {{[0-9]+}}(%esp), %edx
+; X86-NEXT:    rorl %edi
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl 20(%ebp)
+; X86-NEXT:    pushl 16(%ebp)
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    pushl %ecx
+; X86-NEXT:    pushl %edx
+; X86-NEXT:    calll __divti3
+; X86-NEXT:    addl $32, %esp
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    subl $1, %ecx
+; X86-NEXT:    movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    sbbl $0, %eax
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    testl %ebx, %ebx
+; X86-NEXT:    sets %al
+; X86-NEXT:    testl %edi, %edi
+; X86-NEXT:    sets %cl
+; X86-NEXT:    xorb %al, %cl
+; X86-NEXT:    movb %cl, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
+; X86-NEXT:    leal {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl 20(%ebp)
+; X86-NEXT:    pushl 16(%ebp)
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    pushl {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
+; X86-NEXT:    pushl {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    calll __modti3
+; X86-NEXT:    addl $32, %esp
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    orl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    orl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    orl %eax, %ecx
+; X86-NEXT:    setne %al
+; X86-NEXT:    testb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Reload
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
+; X86-NEXT:    cmovel {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
+; X86-NEXT:    cmovel {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
+; X86-NEXT:    leal -12(%ebp), %esp
+; X86-NEXT:    popl %esi
+; X86-NEXT:    popl %edi
+; X86-NEXT:    popl %ebx
+; X86-NEXT:    popl %ebp
+; X86-NEXT:    retl
+  %tmp = call i64 @llvm.sdiv.fix.i64(i64 %x, i64 %y, i32 31)
+  ret i64 %tmp
+}
+
+define i18 @func6(i16 %x, i16 %y) nounwind {
+; X64-LABEL: func6:
+; X64:       # %bb.0:
+; X64-NEXT:    movswl %di, %ecx
+; X64-NEXT:    movswl %si, %esi
+; X64-NEXT:    shll $7, %ecx
+; X64-NEXT:    movl %ecx, %eax
+; X64-NEXT:    cltd
+; X64-NEXT:    idivl %esi
+; X64-NEXT:    # kill: def $eax killed $eax def $rax
+; X64-NEXT:    leal -1(%rax), %edi
+; X64-NEXT:    testl %esi, %esi
+; X64-NEXT:    sets %sil
+; X64-NEXT:    testl %ecx, %ecx
+; X64-NEXT:    sets %cl
+; X64-NEXT:    xorb %sil, %cl
+; X64-NEXT:    testl %edx, %edx
+; X64-NEXT:    setne %dl
+; X64-NEXT:    testb %cl, %dl
+; X64-NEXT:    cmovnel %edi, %eax
+; X64-NEXT:    # kill: def $eax killed $eax killed $rax
+; X64-NEXT:    retq
+;
+; X86-LABEL: func6:
+; X86:       # %bb.0:
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    movswl {{[0-9]+}}(%esp), %esi
+; X86-NEXT:    movswl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    shll $7, %ecx
+; X86-NEXT:    movl %ecx, %eax
+; X86-NEXT:    cltd
+; X86-NEXT:    idivl %esi
+; X86-NEXT:    leal -1(%eax), %edi
+; X86-NEXT:    testl %esi, %esi
+; X86-NEXT:    sets %bl
+; X86-NEXT:    testl %ecx, %ecx
+; X86-NEXT:    sets %cl
+; X86-NEXT:    xorb %bl, %cl
+; X86-NEXT:    testl %edx, %edx
+; X86-NEXT:    setne %dl
+; X86-NEXT:    testb %cl, %dl
+; X86-NEXT:    cmovnel %edi, %eax
+; X86-NEXT:    popl %esi
+; X86-NEXT:    popl %edi
+; X86-NEXT:    popl %ebx
+; X86-NEXT:    retl
+  %x2 = sext i16 %x to i18
+  %y2 = sext i16 %y to i18
+  %tmp = call i18 @llvm.sdiv.fix.i18(i18 %x2, i18 %y2, i32 7)
+  ret i18 %tmp
+}
+
+define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {
+; X64-LABEL: vec:
+; X64:       # %bb.0:
+; X64-NEXT:    pxor %xmm2, %xmm2
+; X64-NEXT:    pcmpgtd %xmm1, %xmm2
+; X64-NEXT:    pshufd {{.*#+}} xmm3 = xmm1[2,3,0,1]
+; X64-NEXT:    movdqa %xmm1, %xmm4
+; X64-NEXT:    punpckldq {{.*#+}} xmm4 = xmm4[0],xmm2[0],xmm4[1],xmm2[1]
+; X64-NEXT:    movq %xmm4, %rcx
+; X64-NEXT:    pxor %xmm2, %xmm2
+; X64-NEXT:    pcmpgtd %xmm0, %xmm2
+; X64-NEXT:    pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
+; X64-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
+; X64-NEXT:    psllq $31, %xmm0
+; X64-NEXT:    movq %xmm0, %rax
+; X64-NEXT:    cqto
+; X64-NEXT:    idivq %rcx
+; X64-NEXT:    movq %rax, %r8
+; X64-NEXT:    movq %rdx, %r11
+; X64-NEXT:    pshufd {{.*#+}} xmm2 = xmm4[2,3,0,1]
+; X64-NEXT:    movq %xmm2, %rcx
+; X64-NEXT:    pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]
+; X64-NEXT:    movq %xmm2, %rax
+; X64-NEXT:    cqto
+; X64-NEXT:    idivq %rcx
+; X64-NEXT:    movq %rax, %r10
+; X64-NEXT:    movq %rdx, %rcx
+; X64-NEXT:    pxor %xmm2, %xmm2
+; X64-NEXT:    pcmpgtd %xmm3, %xmm2
+; X64-NEXT:    punpckldq {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]
+; X64-NEXT:    movq %xmm3, %rdi
+; X64-NEXT:    pxor %xmm2, %xmm2
+; X64-NEXT:    pcmpgtd %xmm1, %xmm2
+; X64-NEXT:    punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
+; X64-NEXT:    psllq $31, %xmm1
+; X64-NEXT:    movq %xmm1, %rax
+; X64-NEXT:    cqto
+; X64-NEXT:    idivq %rdi
+; X64-NEXT:    movq %rax, %r9
+; X64-NEXT:    movq %rdx, %rdi
+; X64-NEXT:    pshufd {{.*#+}} xmm2 = xmm3[2,3,0,1]
+; X64-NEXT:    movq %xmm2, %rsi
+; X64-NEXT:    pshufd {{.*#+}} xmm2 = xmm1[2,3,0,1]
+; X64-NEXT:    movq %xmm2, %rax
+; X64-NEXT:    cqto
+; X64-NEXT:    idivq %rsi
+; X64-NEXT:    movq %r11, %xmm2
+; X64-NEXT:    movq %rcx, %xmm5
+; X64-NEXT:    pxor %xmm6, %xmm6
+; X64-NEXT:    punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm5[0]
+; X64-NEXT:    pcmpeqd %xmm6, %xmm2
+; X64-NEXT:    pshufd {{.*#+}} xmm5 = xmm2[1,0,3,2]
+; X64-NEXT:    pand %xmm2, %xmm5
+; X64-NEXT:    pxor %xmm2, %xmm2
+; X64-NEXT:    pcmpgtd %xmm4, %xmm2
+; X64-NEXT:    pxor %xmm4, %xmm4
+; X64-NEXT:    pcmpgtd %xmm0, %xmm4
+; X64-NEXT:    movq %r8, %xmm0
+; X64-NEXT:    pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
+; X64-NEXT:    pshufd {{.*#+}} xmm4 = xmm4[1,1,3,3]
+; X64-NEXT:    pxor %xmm2, %xmm4
+; X64-NEXT:    movq %r10, %xmm2
+; X64-NEXT:    pandn %xmm4, %xmm5
+; X64-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
+; X64-NEXT:    movdqa %xmm5, %xmm2
+; X64-NEXT:    pandn %xmm0, %xmm2
+; X64-NEXT:    pcmpeqd %xmm4, %xmm4
+; X64-NEXT:    paddq %xmm4, %xmm0
+; X64-NEXT:    pand %xmm5, %xmm0
+; X64-NEXT:    por %xmm2, %xmm0
+; X64-NEXT:    movq %rdi, %xmm2
+; X64-NEXT:    movq %rdx, %xmm5
+; X64-NEXT:    punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm5[0]
+; X64-NEXT:    pcmpeqd %xmm6, %xmm2
+; X64-NEXT:    pshufd {{.*#+}} xmm5 = xmm2[1,0,3,2]
+; X64-NEXT:    pand %xmm2, %xmm5
+; X64-NEXT:    pxor %xmm2, %xmm2
+; X64-NEXT:    pcmpgtd %xmm3, %xmm2
+; X64-NEXT:    pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
+; X64-NEXT:    pcmpgtd %xmm1, %xmm6
+; X64-NEXT:    pshufd {{.*#+}} xmm1 = xmm6[1,1,3,3]
+; X64-NEXT:    pxor %xmm2, %xmm1
+; X64-NEXT:    pandn %xmm1, %xmm5
+; X64-NEXT:    movq %r9, %xmm1
+; X64-NEXT:    movq %rax, %xmm2
+; X64-NEXT:    punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
+; X64-NEXT:    movdqa %xmm5, %xmm2
+; X64-NEXT:    pandn %xmm1, %xmm2
+; X64-NEXT:    paddq %xmm4, %xmm1
+; X64-NEXT:    pand %xmm5, %xmm1
+; X64-NEXT:    por %xmm2, %xmm1
+; X64-NEXT:    shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
+; X64-NEXT:    retq
+;
+; X86-LABEL: vec:
+; X86:       # %bb.0:
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    subl $64, %esp
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebp
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebx
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movl %ecx, %edx
+; X86-NEXT:    sarl $31, %edx
+; X86-NEXT:    movl %edi, %esi
+; X86-NEXT:    shll $31, %esi
+; X86-NEXT:    movl %ebx, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    andl $-2147483648, %ebx # imm = 0x80000000
+; X86-NEXT:    orl %eax, %ebx
+; X86-NEXT:    sets {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
+; X86-NEXT:    movl %ebp, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    andl $-2147483648, %ebp # imm = 0x80000000
+; X86-NEXT:    orl %eax, %ebp
+; X86-NEXT:    movl %ebp, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    sets {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebp
+; X86-NEXT:    andl $-2147483648, %ebp # imm = 0x80000000
+; X86-NEXT:    orl %eax, %ebp
+; X86-NEXT:    movl %ebp, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    sets (%esp) # 1-byte Folded Spill
+; X86-NEXT:    movl %edi, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    andl $-2147483648, %edi # imm = 0x80000000
+; X86-NEXT:    orl %eax, %edi
+; X86-NEXT:    sets {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
+; X86-NEXT:    pushl %edx
+; X86-NEXT:    movl %edx, %ebp
+; X86-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    pushl %ecx
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    calll __moddi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    pushl {{[0-9]+}}(%esp)
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    calll __divdi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    shll $31, %ecx
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-NEXT:    movl %edx, %eax
+; X86-NEXT:    sarl $31, %eax
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    movl %eax, %esi
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    pushl %edx
+; X86-NEXT:    movl %edx, %ebp
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %ecx
+; X86-NEXT:    movl %ecx, %edi
+; X86-NEXT:    calll __moddi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    calll __divdi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    shll $31, %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movl %ecx, %edx
+; X86-NEXT:    sarl $31, %edx
+; X86-NEXT:    pushl %edx
+; X86-NEXT:    movl %edx, %ebp
+; X86-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    pushl %ecx
+; X86-NEXT:    movl %ecx, %edi
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %ebx # 4-byte Reload
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    movl %eax, %esi
+; X86-NEXT:    calll __moddi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    calll __divdi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    shll $31, %eax
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movl %ecx, %ebp
+; X86-NEXT:    sarl $31, %ebp
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    pushl %ecx
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    calll __moddi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl %edx, %edi
+; X86-NEXT:    testl %ebp, %ebp
+; X86-NEXT:    sets %bl
+; X86-NEXT:    xorb (%esp), %bl # 1-byte Folded Reload
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    pushl {{[0-9]+}}(%esp)
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    pushl {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
+; X86-NEXT:    calll __divdi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    orl {{[-0-9]+}}(%e{{[sb]}}p), %edi # 4-byte Folded Reload
+; X86-NEXT:    setne %cl
+; X86-NEXT:    testb %bl, %cl
+; X86-NEXT:    leal -1(%eax), %ecx
+; X86-NEXT:    cmovel %eax, %ecx
+; X86-NEXT:    cmpl $0, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
+; X86-NEXT:    sets %al
+; X86-NEXT:    xorb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Folded Reload
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
+; X86-NEXT:    orl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
+; X86-NEXT:    setne %dl
+; X86-NEXT:    testb %al, %dl
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
+; X86-NEXT:    leal -1(%eax), %edi
+; X86-NEXT:    cmovel %eax, %edi
+; X86-NEXT:    cmpl $0, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
+; X86-NEXT:    sets %dl
+; X86-NEXT:    xorb {{[-0-9]+}}(%e{{[sb]}}p), %dl # 1-byte Folded Reload
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
+; X86-NEXT:    orl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
+; X86-NEXT:    setne %dh
+; X86-NEXT:    testb %dl, %dh
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
+; X86-NEXT:    leal -1(%eax), %edx
+; X86-NEXT:    cmovel %eax, %edx
+; X86-NEXT:    cmpl $0, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
+; X86-NEXT:    sets %bl
+; X86-NEXT:    xorb {{[-0-9]+}}(%e{{[sb]}}p), %bl # 1-byte Folded Reload
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
+; X86-NEXT:    orl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
+; X86-NEXT:    setne %bh
+; X86-NEXT:    testb %bl, %bh
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
+; X86-NEXT:    leal -1(%eax), %esi
+; X86-NEXT:    cmovel %eax, %esi
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    movl %esi, 12(%eax)
+; X86-NEXT:    movl %edx, 8(%eax)
+; X86-NEXT:    movl %edi, 4(%eax)
+; X86-NEXT:    movl %ecx, (%eax)
+; X86-NEXT:    addl $64, %esp
+; X86-NEXT:    popl %esi
+; X86-NEXT:    popl %edi
+; X86-NEXT:    popl %ebx
+; X86-NEXT:    popl %ebp
+; X86-NEXT:    retl $4
+  %tmp = call <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32> %x, <4 x i32> %y, i32 31)
+  ret <4 x i32> %tmp
+}
--- a/test/CodeGen/X86/udiv_fix.ll
+++ b/test/CodeGen/X86/udiv_fix.ll
@ -0,0 +1,344 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
+; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86
+
+declare  i4  @llvm.udiv.fix.i4   (i4,  i4,  i32)
+declare  i15 @llvm.udiv.fix.i15  (i15, i15, i32)
+declare  i16 @llvm.udiv.fix.i16  (i16, i16, i32)
+declare  i18 @llvm.udiv.fix.i18  (i18, i18, i32)
+declare  i64 @llvm.udiv.fix.i64  (i64, i64, i32)
+declare  <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32>, <4 x i32>, i32)
+
+define i16 @func(i16 %x, i16 %y) nounwind {
+; X64-LABEL: func:
+; X64:       # %bb.0:
+; X64-NEXT:    movzwl %si, %ecx
+; X64-NEXT:    movzwl %di, %eax
+; X64-NEXT:    shll $7, %eax
+; X64-NEXT:    xorl %edx, %edx
+; X64-NEXT:    divl %ecx
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
+; X64-NEXT:    retq
+;
+; X86-LABEL: func:
+; X86:       # %bb.0:
+; X86-NEXT:    movzwl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    shll $7, %eax
+; X86-NEXT:    xorl %edx, %edx
+; X86-NEXT:    divl %ecx
+; X86-NEXT:    # kill: def $ax killed $ax killed $eax
+; X86-NEXT:    retl
+  %tmp = call i16 @llvm.udiv.fix.i16(i16 %x, i16 %y, i32 7)
+  ret i16 %tmp
+}
+
+define i16 @func2(i8 %x, i8 %y) nounwind {
+; X64-LABEL: func2:
+; X64:       # %bb.0:
+; X64-NEXT:    movsbl %dil, %eax
+; X64-NEXT:    andl $32767, %eax # imm = 0x7FFF
+; X64-NEXT:    movsbl %sil, %ecx
+; X64-NEXT:    andl $32767, %ecx # imm = 0x7FFF
+; X64-NEXT:    shll $14, %eax
+; X64-NEXT:    xorl %edx, %edx
+; X64-NEXT:    divl %ecx
+; X64-NEXT:    addl %eax, %eax
+; X64-NEXT:    cwtl
+; X64-NEXT:    shrl %eax
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
+; X64-NEXT:    retq
+;
+; X86-LABEL: func2:
+; X86:       # %bb.0:
+; X86-NEXT:    movsbl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    andl $32767, %ecx # imm = 0x7FFF
+; X86-NEXT:    movsbl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    andl $32767, %eax # imm = 0x7FFF
+; X86-NEXT:    shll $14, %eax
+; X86-NEXT:    xorl %edx, %edx
+; X86-NEXT:    divl %ecx
+; X86-NEXT:    addl %eax, %eax
+; X86-NEXT:    cwtl
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    # kill: def $ax killed $ax killed $eax
+; X86-NEXT:    retl
+  %x2 = sext i8 %x to i15
+  %y2 = sext i8 %y to i15
+  %tmp = call i15 @llvm.udiv.fix.i15(i15 %x2, i15 %y2, i32 14)
+  %tmp2 = sext i15 %tmp to i16
+  ret i16 %tmp2
+}
+
+define i16 @func3(i15 %x, i8 %y) nounwind {
+; X64-LABEL: func3:
+; X64:       # %bb.0:
+; X64-NEXT:    # kill: def $edi killed $edi def $rdi
+; X64-NEXT:    leal (%rdi,%rdi), %eax
+; X64-NEXT:    movzbl %sil, %ecx
+; X64-NEXT:    shll $4, %ecx
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
+; X64-NEXT:    xorl %edx, %edx
+; X64-NEXT:    divw %cx
+; X64-NEXT:    # kill: def $ax killed $ax def $eax
+; X64-NEXT:    addl %eax, %eax
+; X64-NEXT:    cwtl
+; X64-NEXT:    shrl %eax
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
+; X64-NEXT:    retq
+;
+; X86-LABEL: func3:
+; X86:       # %bb.0:
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    addl %eax, %eax
+; X86-NEXT:    movzbl %cl, %ecx
+; X86-NEXT:    shll $4, %ecx
+; X86-NEXT:    # kill: def $ax killed $ax killed $eax
+; X86-NEXT:    xorl %edx, %edx
+; X86-NEXT:    divw %cx
+; X86-NEXT:    # kill: def $ax killed $ax def $eax
+; X86-NEXT:    addl %eax, %eax
+; X86-NEXT:    cwtl
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    # kill: def $ax killed $ax killed $eax
+; X86-NEXT:    retl
+  %y2 = sext i8 %y to i15
+  %y3 = shl i15 %y2, 7
+  %tmp = call i15 @llvm.udiv.fix.i15(i15 %x, i15 %y3, i32 4)
+  %tmp2 = sext i15 %tmp to i16
+  ret i16 %tmp2
+}
+
+define i4 @func4(i4 %x, i4 %y) nounwind {
+; X64-LABEL: func4:
+; X64:       # %bb.0:
+; X64-NEXT:    andb $15, %sil
+; X64-NEXT:    andb $15, %dil
+; X64-NEXT:    shlb $2, %dil
+; X64-NEXT:    movzbl %dil, %eax
+; X64-NEXT:    divb %sil
+; X64-NEXT:    retq
+;
+; X86-LABEL: func4:
+; X86:       # %bb.0:
+; X86-NEXT:    movb {{[0-9]+}}(%esp), %cl
+; X86-NEXT:    andb $15, %cl
+; X86-NEXT:    movb {{[0-9]+}}(%esp), %al
+; X86-NEXT:    andb $15, %al
+; X86-NEXT:    shlb $2, %al
+; X86-NEXT:    movzbl %al, %eax
+; X86-NEXT:    divb %cl
+; X86-NEXT:    retl
+  %tmp = call i4 @llvm.udiv.fix.i4(i4 %x, i4 %y, i32 2)
+  ret i4 %tmp
+}
+
+define i64 @func5(i64 %x, i64 %y) nounwind {
+; X64-LABEL: func5:
+; X64:       # %bb.0:
+; X64-NEXT:    pushq %rax
+; X64-NEXT:    movq %rsi, %rdx
+; X64-NEXT:    movq %rdi, %rsi
+; X64-NEXT:    shlq $31, %rdi
+; X64-NEXT:    shrq $33, %rsi
+; X64-NEXT:    xorl %ecx, %ecx
+; X64-NEXT:    callq __udivti3
+; X64-NEXT:    popq %rcx
+; X64-NEXT:    retq
+;
+; X86-LABEL: func5:
+; X86:       # %bb.0:
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    andl $-8, %esp
+; X86-NEXT:    subl $24, %esp
+; X86-NEXT:    movl 8(%ebp), %eax
+; X86-NEXT:    movl 12(%ebp), %ecx
+; X86-NEXT:    movl %ecx, %edx
+; X86-NEXT:    shrl %edx
+; X86-NEXT:    shldl $31, %eax, %ecx
+; X86-NEXT:    shll $31, %eax
+; X86-NEXT:    movl %esp, %esi
+; X86-NEXT:    pushl $0
+; X86-NEXT:    pushl $0
+; X86-NEXT:    pushl 20(%ebp)
+; X86-NEXT:    pushl 16(%ebp)
+; X86-NEXT:    pushl $0
+; X86-NEXT:    pushl %edx
+; X86-NEXT:    pushl %ecx
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    calll __udivti3
+; X86-NEXT:    addl $32, %esp
+; X86-NEXT:    movl (%esp), %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-NEXT:    leal -4(%ebp), %esp
+; X86-NEXT:    popl %esi
+; X86-NEXT:    popl %ebp
+; X86-NEXT:    retl
+  %tmp = call i64 @llvm.udiv.fix.i64(i64 %x, i64 %y, i32 31)
+  ret i64 %tmp
+}
+
+define i18 @func6(i16 %x, i16 %y) nounwind {
+; X64-LABEL: func6:
+; X64:       # %bb.0:
+; X64-NEXT:    movswl %di, %eax
+; X64-NEXT:    andl $262143, %eax # imm = 0x3FFFF
+; X64-NEXT:    movswl %si, %ecx
+; X64-NEXT:    andl $262143, %ecx # imm = 0x3FFFF
+; X64-NEXT:    shll $7, %eax
+; X64-NEXT:    xorl %edx, %edx
+; X64-NEXT:    divl %ecx
+; X64-NEXT:    retq
+;
+; X86-LABEL: func6:
+; X86:       # %bb.0:
+; X86-NEXT:    movswl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    andl $262143, %ecx # imm = 0x3FFFF
+; X86-NEXT:    movswl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    andl $262143, %eax # imm = 0x3FFFF
+; X86-NEXT:    shll $7, %eax
+; X86-NEXT:    xorl %edx, %edx
+; X86-NEXT:    divl %ecx
+; X86-NEXT:    retl
+  %x2 = sext i16 %x to i18
+  %y2 = sext i16 %y to i18
+  %tmp = call i18 @llvm.udiv.fix.i18(i18 %x2, i18 %y2, i32 7)
+  ret i18 %tmp
+}
+
+define i16 @func7(i16 %x, i16 %y) nounwind {
+; X64-LABEL: func7:
+; X64:       # %bb.0:
+; X64-NEXT:    movl %edi, %eax
+; X64-NEXT:    shll $16, %eax
+; X64-NEXT:    movzwl %si, %ecx
+; X64-NEXT:    xorl %edx, %edx
+; X64-NEXT:    divl %ecx
+; X64-NEXT:    # kill: def $ax killed $ax killed $eax
+; X64-NEXT:    retq
+;
+; X86-LABEL: func7:
+; X86:       # %bb.0:
+; X86-NEXT:    movzwl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    shll $16, %eax
+; X86-NEXT:    xorl %edx, %edx
+; X86-NEXT:    divl %ecx
+; X86-NEXT:    # kill: def $ax killed $ax killed $eax
+; X86-NEXT:    retl
+  %tmp = call i16 @llvm.udiv.fix.i16(i16 %x, i16 %y, i32 16)
+  ret i16 %tmp
+}
+
+define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {
+; X64-LABEL: vec:
+; X64:       # %bb.0:
+; X64-NEXT:    pxor %xmm2, %xmm2
+; X64-NEXT:    movdqa %xmm1, %xmm4
+; X64-NEXT:    punpckhdq {{.*#+}} xmm4 = xmm4[2],xmm2[2],xmm4[3],xmm2[3]
+; X64-NEXT:    movq %xmm4, %rcx
+; X64-NEXT:    movdqa %xmm0, %xmm5
+; X64-NEXT:    punpckhdq {{.*#+}} xmm5 = xmm5[2],xmm2[2],xmm5[3],xmm2[3]
+; X64-NEXT:    psllq $31, %xmm5
+; X64-NEXT:    movq %xmm5, %rax
+; X64-NEXT:    xorl %edx, %edx
+; X64-NEXT:    divq %rcx
+; X64-NEXT:    movq %rax, %xmm3
+; X64-NEXT:    pshufd {{.*#+}} xmm4 = xmm4[2,3,0,1]
+; X64-NEXT:    movq %xmm4, %rcx
+; X64-NEXT:    pshufd {{.*#+}} xmm4 = xmm5[2,3,0,1]
+; X64-NEXT:    movq %xmm4, %rax
+; X64-NEXT:    xorl %edx, %edx
+; X64-NEXT:    divq %rcx
+; X64-NEXT:    movq %rax, %xmm4
+; X64-NEXT:    punpcklqdq {{.*#+}} xmm3 = xmm3[0],xmm4[0]
+; X64-NEXT:    punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
+; X64-NEXT:    movq %xmm1, %rcx
+; X64-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
+; X64-NEXT:    psllq $31, %xmm0
+; X64-NEXT:    movq %xmm0, %rax
+; X64-NEXT:    xorl %edx, %edx
+; X64-NEXT:    divq %rcx
+; X64-NEXT:    movq %rax, %xmm2
+; X64-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
+; X64-NEXT:    movq %xmm1, %rcx
+; X64-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
+; X64-NEXT:    movq %xmm0, %rax
+; X64-NEXT:    xorl %edx, %edx
+; X64-NEXT:    divq %rcx
+; X64-NEXT:    movq %rax, %xmm0
+; X64-NEXT:    punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm0[0]
+; X64-NEXT:    shufps {{.*#+}} xmm2 = xmm2[0,2],xmm3[0,2]
+; X64-NEXT:    movaps %xmm2, %xmm0
+; X64-NEXT:    retq
+;
+; X86-LABEL: vec:
+; X86:       # %bb.0:
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    pushl %esi
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebp
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebx
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    movl %eax, %ecx
+; X86-NEXT:    shrl %ecx
+; X86-NEXT:    shll $31, %eax
+; X86-NEXT:    pushl $0
+; X86-NEXT:    pushl {{[0-9]+}}(%esp)
+; X86-NEXT:    pushl %ecx
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    calll __udivdi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, (%esp) # 4-byte Spill
+; X86-NEXT:    movl %ebx, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    shll $31, %ebx
+; X86-NEXT:    pushl $0
+; X86-NEXT:    pushl {{[0-9]+}}(%esp)
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    pushl %ebx
+; X86-NEXT:    calll __udivdi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, %ebx
+; X86-NEXT:    movl %ebp, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    shll $31, %ebp
+; X86-NEXT:    pushl $0
+; X86-NEXT:    pushl {{[0-9]+}}(%esp)
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    calll __udivdi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, %ebp
+; X86-NEXT:    movl %edi, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    shll $31, %edi
+; X86-NEXT:    pushl $0
+; X86-NEXT:    pushl {{[0-9]+}}(%esp)
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    pushl %edi
+; X86-NEXT:    calll __udivdi3
+; X86-NEXT:    addl $16, %esp
+; X86-NEXT:    movl %eax, 12(%esi)
+; X86-NEXT:    movl %ebp, 8(%esi)
+; X86-NEXT:    movl %ebx, 4(%esi)
+; X86-NEXT:    movl (%esp), %eax # 4-byte Reload
+; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %esi, %eax
+; X86-NEXT:    addl $4, %esp
+; X86-NEXT:    popl %esi
+; X86-NEXT:    popl %edi
+; X86-NEXT:    popl %ebx
+; X86-NEXT:    popl %ebp
+; X86-NEXT:    retl $4
+  %tmp = call <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32> %x, <4 x i32> %y, i32 31)
+  ret <4 x i32> %tmp
+}