[globalisel][docs] Rework GMIR documentation and add an early GenericOpcode reference

Summary: Rework the GMIR documentation to focus more on the end user than the implementation and tie it in to the MIR document. There was also some out-of-date information which has been removed.

The quality of the GenericOpcode reference is highly variable and drops sharply as I worked through them all but we've got to start somewhere :-). It would be great if others could expand on this too as there is an awful lot to get through.

Also fix a typo in the definition of G_FLOG. Previously, the comments said we had two base-2's (G_FLOG and G_FLOG2).

Reviewers: aemerson, volkan, rovka, arsenm
Reviewed By: rovka
Subscribers: wdng, arphaman, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69545
2025-01-31 20:51:52 +01:00 · 2019-11-05 15:10:00 -08:00 · 2019-11-05 15:10:00 -08:00 · 6cdb5ec3bc
commit 6cdb5ec3bc
parent c9c0d59dd4
7 changed files with 800 additions and 73 deletions
--- a/docs/GlobalISel/GMIR.rst
+++ b/docs/GlobalISel/GMIR.rst
@ -3,38 +3,35 @@
 Generic Machine IR
 ==================

-Machine IR operates on physical registers, register classes, and (mostly)
-target-specific instructions.
-
-To bridge the gap with LLVM IR, GlobalISel introduces "generic" extensions to
-Machine IR:
-
 .. contents::
   :local:

-``NOTE``:
-The generic MIR (GMIR) representation still contains references to IR
-constructs (such as ``GlobalValue``).  Removing those should let us write more
-accurate tests, or delete IR after building the initial MIR.  However, it is
-not part of the GlobalISel effort.
+Generic MIR (gMIR) is an intermediate representation that shares the same data
+structures as :doc:`MachineIR (MIR) <../MIRLangRef>` but has more relaxed
+constraints. As the compilation pipeline proceeds, these constraints are
+gradually tightened until gMIR has become MIR.
+
+The rest of this document will assume that you are familiar with the concepts
+in :doc:`MachineIR (MIR) <../MIRLangRef>` and will highlight the differences
+between MIR and gMIR.

 .. _gmir-instructions:

-Generic Instructions
--------------------
+Generic Machine Instructions
+----------------------------

-The main addition is support for pre-isel generic machine instructions (e.g.,
-``G_ADD``).  Like other target-independent instructions (e.g., ``COPY`` or
-``PHI``), these are available on all targets.
+.. note::

-``TODO``:
-While we're progressively adding instructions, one kind in particular exposes
-interesting problems: compares and how to represent condition codes.
-Some targets (x86, ARM) have generic comparisons setting multiple flags,
-which are then used by predicated variants.
-Others (IR) specify the predicate in the comparison and users just get a single
-bit.  SelectionDAG uses SETCC/CONDBR vs BR_CC (and similar for select) to
-represent this.
+  This section expands on :ref:`mir-instructions` from the MIR Language
+  Reference.
+
+Whereas MIR deals largely in Target Instructions and only has a small set of
+target independent opcodes such as ``COPY``, ``PHI``, and ``REG_SEQUENCE``,
+gMIR defines a rich collection of ``Generic Opcodes`` which are target
+independent and describe operations which are typically supported by targets.
+One example is ``G_ADD`` which is the generic opcode for an integer addition.
+More information on each of the generic opcodes can be found at
+:doc:`GenericOpcode`.

 The ``MachineIRBuilder`` class wraps the ``MachineInstrBuilder`` and provides
 a convenient way to create these generic instructions.
@ -44,50 +41,109 @@ a convenient way to create these generic instructions.
 Generic Virtual Registers
 -------------------------

-Generic instructions operate on a new kind of register: "generic" virtual
-registers.  As opposed to non-generic vregs, they are not assigned a Register
-Class.  Instead, generic vregs have a :ref:`gmir-llt`, and can be assigned
-a :ref:`gmir-regbank`.
+.. note::

-``MachineRegisterInfo`` tracks the same information that it does for
-non-generic vregs (e.g., use-def chains).  Additionally, it also tracks the
-:ref:`gmir-llt` of the register, and, instead of the ``TargetRegisterClass``,
-its :ref:`gmir-regbank`, if any.
+  This section expands on :ref:`mir-registers` from the MIR Language
+  Reference.

-For simplicity, most generic instructions only accept generic vregs:
+Generic virtual registers are like virtual registers but they are not assigned a
+Register Class constraint. Instead, generic virtual registers have less strict
+constraints starting with a :ref:`gmir-llt` and then further constrained to a
+:ref:`gmir-regbank`. Eventually they will be constrained to a register class
+at which point they become normal virtual registers.

-* instead of immediates, they use a gvreg defined by an instruction
-  materializing the immediate value (see :ref:`irtranslator-constants`).
-* instead of physical register, they use a gvreg defined by a ``COPY``.
+Generic virtual registers can be used with all the virtual register APIs
+provided by ``MachineRegisterInfo``. In particular, the def-use chain APIs can
+be used without needing to distinguish them from non-generic virtual registers.

-``NOTE``:
-We started with an alternative representation, where MRI tracks a size for
-each gvreg, and instructions have lists of types.
-That had two flaws: the type and size are redundant, and there was no generic
-way of getting a given operand's type (as there was no 1:1 mapping between
-instruction types and operands).
-We considered putting the type in some variant of MCInstrDesc instead:
-See `PR26576 <http://llvm.org/PR26576>`_: [GlobalISel] Generic MachineInstrs
-need a type but this increases the memory footprint of the related objects
+For simplicity, most generic instructions only accept virtual registers (both
+generic and non-generic). There are some exceptions to this but in general:
+
+* instead of immediates, they use a generic virtual register defined by an
+  instruction that materializes the immediate value (see
+  :ref:`irtranslator-constants`). Typically this is a G_CONSTANT or a
+  G_FCONSTANT. One example of an exception to this rule is G_SEXT_INREG where
+  having an immediate is mandatory.
+* instead of physical registers, they use a generic virtual register that is
+  either defined by a ``COPY`` from the physical register or used by a ``COPY``
+  that defines the physical register.
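+
+As a minimal sketch of both rules, assuming an AArch64-style physical register
+``$w0`` (chosen purely for illustration), adding the constant 42 to an incoming
+value might look something like this:
+
+.. code-block:: none
+
+  %0:_(s32) = COPY $w0
+  %1:_(s32) = G_CONSTANT i32 42
+  %2:_(s32) = G_ADD %0, %1
+  $w0 = COPY %2(s32)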
+
+.. admonition:: Historical Note
+
+  We started with an alternative representation, where MRI tracks a size for
+  each generic virtual register, and instructions have lists of types.
+  That had two flaws: the type and size are redundant, and there was no generic
+  way of getting a given operand's type (as there was no 1:1 mapping between
+  instruction types and operands).
+  We considered putting the type in some variant of MCInstrDesc instead (see
+  `PR26576 <http://llvm.org/PR26576>`_: [GlobalISel] Generic MachineInstrs
+  need a type), but this increases the memory footprint of the related objects.

 .. _gmir-regbank:

 Register Bank
 -------------

-A Register Bank is a set of register classes defined by the target.
-A bank has a size, which is the maximum store size of all covered classes.
+A Register Bank is a set of register classes defined by the target. This
+definition is rather loose so let's talk about what they can achieve.

-In general, cross-class copies inside a bank are expected to be cheaper than
-copies across banks.  They are also coalesceable by the register coalescer,
-whereas cross-bank copies are not.
+Suppose we have a processor that has two register files, A and B. These are
+equal in every way and support the same instructions for the same cost. They're
+just physically stored apart and each instruction can only access registers from
+A or from B but never a mix of the two. If we want to perform an operation
+on data that's split between the two register files, we must first copy all
+the data into a single register file.

-Also, equivalent operations can be performed on different banks using different
-instructions.
+Given a processor like this, we would benefit from clustering related data
+together into one register file so that we minimize the cost of copying data
+back and forth to satisfy the (possibly conflicting) requirements of all the
+instructions. Register Banks are a means to constrain the register allocator to
+use a particular register file for a virtual register.

-For example, X86 can be seen as having 3 main banks: general-purpose, x87, and
-vector (which could be further split into a bank per domain for single vs
-double precision instructions).
+In practice, register files A and B are rarely equal. They can typically store
+the same data but there are usually some restrictions on what operations you
+can do on each register file. A fairly common pattern is for one of them to be
+accessible to integer operations and the other accessible to floating point
+operations. To accommodate this, let's rename A and B to GPR (general purpose
+registers) and FPR (floating point registers).
+
+We now have some additional constraints that limit us. An operation like G_FMUL
+has to happen in FPR and G_ADD has to happen in GPR. However, even though this
+prescribes a lot of the assignments we still have some freedom. A G_LOAD can
+happen in both GPR and FPR, and which we want depends on who is going to consume
+the loaded data. Similarly, G_FNEG can happen in both GPR and FPR. If we assign
+it to FPR, then we'll use floating point negation. However, if we assign it to
+GPR then we can equivalently G_XOR the sign bit with 1 to invert it.
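+
+As an illustrative sketch (the ``gpr`` and ``fpr`` bank names and the ``s32``
+type are assumptions made for this example), the two options for a G_FNEG
+might look like this once register banks have been assigned:
+
+.. code-block:: none
+
+  ; Option 1: stay in FPR and use floating point negation.
+  %1:fpr(s32) = G_FNEG %0:fpr(s32)
+
+  ; Option 2: work in GPR and flip the sign bit (bit 31 of an s32).
+  %2:gpr(s32) = G_CONSTANT i32 -2147483648
+  %1:gpr(s32) = G_XOR %0:gpr(s32), %2:gpr(s32)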
+
+In summary, Register Banks are a means of disambiguating between seemingly
+equivalent choices based on some analysis of the differences when each choice
+is applied in a given context.
+
+To give some concrete examples:
+
+AArch64
+
+  AArch64 has three main banks: GPR for integer operations, FPR for floating
+  point and also for the NEON vector instruction set, and CCR, which
+  describes the condition code register used for predication.
+
+MIPS
+
+  MIPS has five main banks of which many programs only really use one or two.
+  GPR is the general purpose bank for integer operations. FGR or CP1 is for
+  the floating point operations as well as the MSA vector instructions and a
+  few other application specific extensions. CP0 is for system registers and
+  few programs will use it. CP2 and CP3 are for any application specific
+  coprocessors that may be present in the chip. Arguably, there is also a sixth
+  for the LO and HI registers but these are only used for the result of a few
+  operations and it's of questionable value to model distinctly from GPR.
+
+X86
+
+  X86 can be seen as having 3 main banks: general-purpose, x87, and
+  vector (which could be further split into a bank per domain for single vs
+  double precision instructions). It also looks like there are arguably a few
+  more potential banks such as one for the AVX512 Mask Registers.

 Register banks are described by a target-provided API,
 :ref:`RegisterBankInfo <api-registerbankinfo>`.
@ -108,7 +164,6 @@ as size and number of vector lanes:
 * ``sN`` for scalars
 * ``pN`` for pointers
 * ``<N x sM>`` for vectors
-* ``unsized`` for labels, etc..

 ``LLT`` is intended to replace the usage of ``EVT`` in SelectionDAG.

@ -122,14 +177,13 @@ Here are some LLT examples and their ``EVT`` and ``Type`` equivalents:
   ``s32``        ``i32``    ``i32``
   ``s32``        ``f32``    ``float``
   ``s17``        ``i17``    ``i17``
-   ``s16``        N/A        ``{i8, i8}``
-   ``s32``        N/A        ``[4 x i8]``
+   ``s16``        N/A        ``{i8, i8}`` [#abi-dependent]_
+   ``s32``        N/A        ``[4 x i8]`` [#abi-dependent]_
   ``p0``         ``iPTR``   ``i8*``, ``i32*``, ``%opaque*``
   ``p2``         ``iPTR``   ``i8 addrspace(2)*``
   ``<4 x s32>``  ``v4f32``  ``<4 x float>``
   ``s64``        ``v1f64``  ``<1 x double>``
   ``<3 x s32>``  ``v3i32``  ``<3 x i32>``
-   ``unsized``    ``Other``  ``label``
   =============  =========  ======================================


@ -143,16 +197,23 @@ to SelectionDAG where address space is an attribute on operations.
 This representation better supports pointers having different sizes depending
 on their addressspace.

-``NOTE``:
-Currently, LLT requires at least 2 elements in vectors, but some targets have
-the concept of a '1-element vector'.  Representing them as their underlying
-scalar type is a nice simplification.
+.. note::

-``TODO``:
-Currently, non-generic virtual registers, defined by non-pre-isel-generic
-instructions, cannot have a type, and thus cannot be used by a pre-isel generic
-instruction.  Instead, they are given a type using a COPY.  We could relax that
-and allow types on all vregs: this would reduce the number of MI required when
-emitting target-specific MIR early in the pipeline.  This should purely be
-a compile-time optimization.
+  .. caution::

+    Is this still true? I thought we'd removed the 1-element vector concept.
+    Hypothetically, it could be distinct from a scalar but I think we failed to
+    find a real occurrence.
+
+  Currently, LLT requires at least 2 elements in vectors, but some targets have
+  the concept of a '1-element vector'.  Representing them as their underlying
+  scalar type is a nice simplification.
+
+.. rubric:: Footnotes
+
+.. [#abi-dependent] This mapping is ABI dependent. Here we've assumed no additional padding is required.
+
+Generic Opcode Reference
+------------------------
+
+The Generic Opcodes that are available are described at :doc:`GenericOpcode`.
--- a/docs/GlobalISel/GenericOpcode.rst
+++ b/docs/GlobalISel/GenericOpcode.rst
@ -0,0 +1,658 @@
+
+.. _gmir-opcodes:
+
+Generic Opcodes
+===============
+
+.. contents::
+   :local:
+
+.. note::
+
+  This documentation does not yet fully account for vectors. Many of the
+  scalar/integer/floating-point operations can also take vectors.
+
+Constants
+---------
+
+G_IMPLICIT_DEF
+^^^^^^^^^^^^^^
+
+An undefined value.
+
+.. code-block:: none
+
+  %0:_(s32) = G_IMPLICIT_DEF
+
+G_CONSTANT
+^^^^^^^^^^
+
+An integer constant.
+
+.. code-block:: none
+
+  %0:_(s32) = G_CONSTANT i32 1
+
+G_FCONSTANT
+^^^^^^^^^^^
+
+A floating point constant.
+
+.. code-block:: none
+
+  %0:_(s32) = G_FCONSTANT float 1.0
+
+G_FRAME_INDEX
+^^^^^^^^^^^^^
+
+The address of an object in the stack frame.
+
+.. code-block:: none
+
+  %1:_(p0) = G_FRAME_INDEX %stack.0.ptr0
+
+G_GLOBAL_VALUE
+^^^^^^^^^^^^^^
+
+The address of a global value.
+
+.. code-block:: none
+
+  %0:_(p0) = G_GLOBAL_VALUE @var_local
+
+G_BLOCK_ADDR
+^^^^^^^^^^^^
+
+The address of a basic block.
+
+.. code-block:: none
+
+  %0:_(p0) = G_BLOCK_ADDR blockaddress(@test_blockaddress, %ir-block.block)
+
+Integer Extension and Truncation
+--------------------------------
+
+G_ANYEXT
+^^^^^^^^
+
+Extend the underlying scalar type of an operation, leaving the high bits
+unspecified.
+
+.. code-block:: none
+
+  %1:_(s32) = G_ANYEXT %0:_(s16)
+
+G_SEXT
+^^^^^^
+
+Sign extend the underlying scalar type of an operation, copying the sign bit
+into the newly-created space.
+
+.. code-block:: none
+
+  %1:_(s32) = G_SEXT %0:_(s16)
+
+G_SEXT_INREG
+^^^^^^^^^^^^
+
+Sign extend a value from an arbitrary bit position, copying the sign bit
+into all bits above it. This is equivalent to a shl + ashr pair with an
+appropriate shift amount. $sz is an immediate (MachineOperand::isImm()
+returns true) to allow targets to have some bitwidths legal and others
+lowered. This opcode is particularly useful if the target has sign-extension
+instructions that are cheaper than the constituent shifts as the optimizer is
+able to make decisions on whether it's better to hang on to the G_SEXT_INREG
+or to lower it and optimize the individual shifts.
+
+.. code-block:: none
+
+  %1:_(s32) = G_SEXT_INREG %0:_(s32), 16
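+
+As a rough sketch of the shl + ashr equivalence mentioned above (the register
+numbering is arbitrary), the example could instead be lowered to a pair of
+shifts by ``32 - 16 = 16`` bits:
+
+.. code-block:: none
+
+  %2:_(s32) = G_CONSTANT i32 16
+  %3:_(s32) = G_SHL %0:_(s32), %2:_(s32)
+  %1:_(s32) = G_ASHR %3:_(s32), %2:_(s32)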
+
+G_ZEXT
+^^^^^^
+
+Zero extend the underlying scalar type of an operation, putting zero bits
+into the newly-created space.
+
+.. code-block:: none
+
+  %1:_(s32) = G_ZEXT %0:_(s16)
+
+G_TRUNC
+^^^^^^^
+
+Truncate the underlying scalar type of an operation. This is equivalent to
+G_EXTRACT for scalar types, but acts elementwise on vectors.
+
+.. code-block:: none
+
+  %1:_(s16) = G_TRUNC %0:_(s32)
+
+Type Conversions
+----------------
+
+G_INTTOPTR
+^^^^^^^^^^
+
+Convert an integer to a pointer.
+
+.. code-block:: none
+
+  %1:_(p0) = G_INTTOPTR %0:_(s32)
+
+G_PTRTOINT
+^^^^^^^^^^
+
+Convert a pointer to an integer.
+
+.. code-block:: none
+
+  %1:_(s32) = G_PTRTOINT %0:_(p0)
+
+G_BITCAST
+^^^^^^^^^
+
+Reinterpret a value as a new type. This is usually done without changing any
+bits but this is not always the case due to a subtlety in the definition of the
+:ref:`LLVM-IR Bitcast Instruction <i_bitcast>`.
+
+.. code-block:: none
+
+  %1:_(s64) = G_BITCAST %0:_(<2 x s32>)
+
+G_ADDRSPACE_CAST
+^^^^^^^^^^^^^^^^
+
+Convert a pointer in one address space to a pointer in another address space.
+
+.. code-block:: none
+
+  %1:_(p1) = G_ADDRSPACE_CAST %0:_(p0)
+
+.. caution::
+
+  :ref:`i_addrspacecast` doesn't mention what happens if the cast is simply
+  invalid (i.e. if the address spaces are disjoint).
+
+Scalar Operations
+-----------------
+
+G_EXTRACT
+^^^^^^^^^
+
+Extract a register of the specified size, starting from the block given by
+index. This will almost certainly be mapped to sub-register COPYs after
+register banks have been selected.
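+
+For instance, assuming the index is a bit offset (as it is for G_INSERT), a
+sketch that extracts the high half of a 64-bit value might look like:
+
+.. code-block:: none
+
+  %1:_(s32) = G_EXTRACT %0:_(s64), 32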
+
+G_INSERT
+^^^^^^^^
+
+Insert a smaller register into a larger one at the specified bit-index.
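+
+A sketch of what this might look like, replacing the low 32 bits of a 64-bit
+value (the types and registers are chosen only for illustration):
+
+.. code-block:: none
+
+  %2:_(s64) = G_INSERT %0:_(s64), %1:_(s32), 0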
+
+G_MERGE_VALUES
+^^^^^^^^^^^^^^
+
+Concatenate multiple registers of the same size into a wider register.
+The input operands are always ordered from lowest bits to highest:
+
+.. code-block:: none
+
+  %0:_(s32) = G_MERGE_VALUES %bits_0_7:_(s8), %bits_8_15:_(s8),
+                             %bits_16_23:_(s8), %bits_24_31:_(s8)
+
+G_UNMERGE_VALUES
+^^^^^^^^^^^^^^^^
+
+Extract multiple registers of the specified size, starting from blocks given by
+indexes. This will almost certainly be mapped to sub-register COPYs after
+register banks have been selected.
+The output operands are always ordered from lowest bits to highest:
+
+.. code-block:: none
+
+  %bits_0_7:_(s8), %bits_8_15:_(s8),
+      %bits_16_23:_(s8), %bits_24_31:_(s8) = G_UNMERGE_VALUES %0:_(s32)
+
+G_BSWAP
+^^^^^^^
+
+Reverse the order of the bytes in a scalar
+
+.. code-block:: none
+
+  %1:_(s32) = G_BSWAP %0:_(s32)
+
+G_BITREVERSE
+^^^^^^^^^^^^
+
+Reverse the order of the bits in a scalar
+
+.. code-block:: none
+
+  %1:_(s32) = G_BITREVERSE %0:_(s32)
+
+Integer Operations
+-------------------
+
+G_ADD, G_SUB, G_MUL, G_AND, G_OR, G_XOR, G_SDIV, G_UDIV, G_SREM, G_UREM
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These each perform their respective integer arithmetic on a scalar.
+
+.. code-block:: none
+
+  %2:_(s32) = G_ADD %0:_(s32), %1:_(s32)
+
+G_SHL, G_LSHR, G_ASHR
+^^^^^^^^^^^^^^^^^^^^^
+
+Shift the bits of a scalar left or right inserting zeros (sign-bit for G_ASHR).
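+
+For illustration (the operand types are an arbitrary choice):
+
+.. code-block:: none
+
+  %2:_(s32) = G_SHL %0:_(s32), %1:_(s32)
+  %3:_(s32) = G_ASHR %0:_(s32), %1:_(s32)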
+
+G_ICMP
+^^^^^^
+
+Perform integer comparison producing non-zero (true) or zero (false). It's
+target specific whether a true value is 1, ~0U, or some other non-zero value.
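+
+A minimal sketch, assuming an equality comparison of two ``s32`` values that
+produces an ``s1`` result:
+
+.. code-block:: none
+
+  %2:_(s1) = G_ICMP intpred(eq), %0:_(s32), %1:_(s32)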
+
+G_SELECT
+^^^^^^^^
+
+Select between two values depending on a zero/non-zero value.
+
+.. code-block:: none
+
+  %5:_(s32) = G_SELECT %4(s1), %6, %2
+
+G_PTR_ADD
+^^^^^^^^^
+
+Add an offset to a pointer measured in addressable units. Addressable units are
+typically bytes but this can vary between targets.
+
+.. code-block:: none
+
+  %2:_(p0) = G_PTR_ADD %0:_(p0), %1:_(s64)
+
+G_PTR_MASK
+^^^^^^^^^^
+
+Zero the least significant N bits of a pointer.
+
+.. code-block:: none
+
+  %1:_(p0) = G_PTR_MASK %0, 3
+
+G_SMIN, G_SMAX, G_UMIN, G_UMAX
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Take the minimum/maximum of two values.
+
+.. code-block:: none
+
+  %5:_(s32) = G_SMIN %6, %2
+
+G_UADDO, G_SADDO, G_USUBO, G_SSUBO, G_SMULO, G_UMULO
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Perform the requested arithmetic and produce a carry output in addition to the
+normal result.
+
+.. code-block:: none
+
+  %3:_(s32), %4:_(s1) = G_UADDO %0, %1
+
+G_UADDE, G_SADDE, G_USUBE, G_SSUBE
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Perform the requested arithmetic and consume a carry input in addition to the
+normal input. Also produce a carry output in addition to the normal result.
+
+.. code-block:: none
+
+  %4:_(s32), %5:_(s1) = G_UADDE %0, %1, %3:_(s1)
+
+G_UMULH, G_SMULH
+^^^^^^^^^^^^^^^^
+
+Multiply two numbers at twice the incoming bit width (unsigned or signed
+respectively) and return the high half of the result.
+
+.. code-block:: none
+
+  %2:_(s32) = G_UMULH %0:_(s32), %1:_(s32)
+
+G_CTLZ, G_CTTZ, G_CTPOP
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Count leading zeros, trailing zeros, or number of set bits
+
+G_CTLZ_ZERO_UNDEF, G_CTTZ_ZERO_UNDEF
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Count leading zeros or trailing zeros. If the value is zero then the result is
+undefined.
+
+Floating Point Operations
+-------------------------
+
+G_FCMP
+^^^^^^
+
+Perform floating point comparison producing non-zero (true) or zero
+(false). It's target specific whether a true value is 1, ~0U, or some other
+non-zero value.
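+
+A minimal sketch, assuming an ordered less-than comparison of two ``s32``
+(single precision) values that produces an ``s1`` result:
+
+.. code-block:: none
+
+  %2:_(s1) = G_FCMP floatpred(olt), %0:_(s32), %1:_(s32)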
+
+G_FNEG
+^^^^^^
+
+Floating point negation
+
+G_FPEXT
+^^^^^^^
+
+Convert a floating point value to a larger type
+
+G_FPTRUNC
+^^^^^^^^^
+
+Convert a floating point value to a narrower type
+
+G_FPTOSI, G_FPTOUI, G_SITOFP, G_UITOFP
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Convert between integer and floating point
+
+G_FABS
+^^^^^^
+
+Take the absolute value of a floating point value
+
+G_FCOPYSIGN
+^^^^^^^^^^^
+
+Copy the value of the first operand, replacing the sign bit with that of the
+second operand.
+
+G_FCANONICALIZE
+^^^^^^^^^^^^^^^
+
+See :ref:`i_intr_llvm_canonicalize`
+
+G_FMINNUM
+^^^^^^^^^
+
+Perform floating-point minimum on two values.
+
+In the case where a single input is a NaN (either signaling or quiet),
+the non-NaN input is returned.
+
+The return value of (FMINNUM 0.0, -0.0) could be either 0.0 or -0.0.
+
+G_FMAXNUM
+^^^^^^^^^
+
+Perform floating-point maximum on two values.
+
+In the case where a single input is a NaN (either signaling or quiet),
+the non-NaN input is returned.
+
+The return value of (FMAXNUM 0.0, -0.0) could be either 0.0 or -0.0.
+
+G_FMINNUM_IEEE
+^^^^^^^^^^^^^^
+
+Perform floating-point minimum on two values, following the IEEE-754 2008
+definition. This differs from FMINNUM in the handling of signaling NaNs. If one
+input is a signaling NaN, returns a quiet NaN.
+
+G_FMAXNUM_IEEE
+^^^^^^^^^^^^^^
+
+Perform floating-point maximum on two values, following the IEEE-754 2008
+definition. This differs from FMAXNUM in the handling of signaling NaNs. If one
+input is a signaling NaN, returns a quiet NaN.
+
+G_FMINIMUM
+^^^^^^^^^^
+
+NaN-propagating minimum that also treats -0.0 as less than 0.0. While
+FMINNUM_IEEE follows IEEE 754-2008 semantics, FMINIMUM follows IEEE 754-2018
+draft semantics.
+
+G_FMAXIMUM
+^^^^^^^^^^
+
+NaN-propagating maximum that also treats -0.0 as less than 0.0. While
+FMAXNUM_IEEE follows IEEE 754-2008 semantics, FMAXIMUM follows IEEE 754-2018
+draft semantics.
+
+G_FADD, G_FSUB, G_FMUL, G_FDIV, G_FREM
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Perform the specified floating point arithmetic.
+
+G_FMA
+^^^^^
+
+Perform a fused multiply-add (i.e. without the intermediate rounding step).
+
+G_FMAD
+^^^^^^
+
+Perform a non-fused multiply-add (i.e. with the intermediate rounding step).
+
+G_FPOW
+^^^^^^
+
+Raise the first operand to the power of the second.
+
+G_FEXP, G_FEXP2
+^^^^^^^^^^^^^^^
+
+Calculate the base-e or base-2 exponential of a value
+
+G_FLOG, G_FLOG2, G_FLOG10
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Calculate the base-e, base-2, or base-10 logarithm of a value respectively.
+
+G_FCEIL, G_FCOS, G_FSIN, G_FSQRT, G_FFLOOR, G_FRINT, G_FNEARBYINT
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These correspond to the standard C functions of the same name.
+
+G_INTRINSIC_TRUNC
+^^^^^^^^^^^^^^^^^
+
+Returns the operand rounded to the nearest integer not larger in magnitude than the operand.
+
+G_INTRINSIC_ROUND
+^^^^^^^^^^^^^^^^^
+
+Returns the operand rounded to the nearest integer.
+
+Vector Specific Operations
+--------------------------
+
+G_CONCAT_VECTORS
+^^^^^^^^^^^^^^^^
+
+Concatenate two vectors to form a longer vector.
+
+G_BUILD_VECTOR, G_BUILD_VECTOR_TRUNC
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Create a vector from multiple scalar registers. No implicit
+conversion is performed (i.e. the result element type must be the
+same as all source operands)
+
+The _TRUNC version truncates the larger operand types to fit the
+destination vector elt type.
+
+G_INSERT_VECTOR_ELT
+^^^^^^^^^^^^^^^^^^^
+
+Insert an element into a vector
+
+G_EXTRACT_VECTOR_ELT
+^^^^^^^^^^^^^^^^^^^^
+
+Extract an element from a vector
+
+G_SHUFFLE_VECTOR
+^^^^^^^^^^^^^^^^
+
+Concatenate two vectors and shuffle the elements according to the mask operand.
+The mask operand should be an IR Constant which exactly matches the
+corresponding mask for the IR shufflevector instruction.
+
+Memory Operations
+-----------------
+
+G_LOAD, G_SEXTLOAD, G_ZEXTLOAD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Generic load. Expects a MachineMemOperand in addition to explicit
+operands. If the result size is larger than the memory size, the
+high bits are undefined, sign-extended, or zero-extended respectively.
+
+Only G_LOAD is valid if the result is a vector type. If the result is larger
+than the memory size, the high elements are undefined (i.e. this is not a
+per-element, vector anyextload)
+
+G_INDEXED_LOAD
+^^^^^^^^^^^^^^
+
+Generic indexed load. Combines a GEP with a load. $newaddr is set to $base + $offset.
+If $am is 0 (post-indexed), then the value is loaded from $base; if $am is 1 (pre-indexed)
+then the value is loaded from $newaddr.
+
+G_INDEXED_SEXTLOAD
+^^^^^^^^^^^^^^^^^^
+
+Same as G_INDEXED_LOAD except that the load performed is sign-extending, as with G_SEXTLOAD.
+
+G_INDEXED_ZEXTLOAD
+^^^^^^^^^^^^^^^^^^
+
+Same as G_INDEXED_LOAD except that the load performed is zero-extending, as with G_ZEXTLOAD.
+
+G_STORE
+^^^^^^^
+
+Generic store. Expects a MachineMemOperand in addition to explicit operands.
+
+G_INDEXED_STORE
+^^^^^^^^^^^^^^^
+
+Combines a store with a GEP. See description of G_INDEXED_LOAD for indexing behaviour.
+
+G_ATOMIC_CMPXCHG_WITH_SUCCESS
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Generic atomic cmpxchg with internal success check. Expects a
+MachineMemOperand in addition to explicit operands.
+
+G_ATOMIC_CMPXCHG
+^^^^^^^^^^^^^^^^
+
+Generic atomic cmpxchg. Expects a MachineMemOperand in addition to explicit
+operands.
+
+G_ATOMICRMW_XCHG, G_ATOMICRMW_ADD, G_ATOMICRMW_SUB, G_ATOMICRMW_AND, G_ATOMICRMW_NAND, G_ATOMICRMW_OR, G_ATOMICRMW_XOR, G_ATOMICRMW_MAX, G_ATOMICRMW_MIN, G_ATOMICRMW_UMAX, G_ATOMICRMW_UMIN, G_ATOMICRMW_FADD, G_ATOMICRMW_FSUB
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Generic atomicrmw. Expects a MachineMemOperand in addition to explicit
+operands.
+
+G_FENCE
+^^^^^^^
+
+.. caution::
+
+  I couldn't find any documentation on this at the time of writing.
+
+Control Flow
+------------
+
+G_PHI
+^^^^^
+
+Implement the φ node in the SSA graph representing the function.
+
+.. code-block:: none
+
+  %1(s8) = G_PHI %7(s8), %bb.0, %3(s8), %bb.1
+
+G_BR
+^^^^
+
+Unconditional branch
+
+G_BRCOND
+^^^^^^^^
+
+Conditional branch
+
+G_BRINDIRECT
+^^^^^^^^^^^^
+
+Indirect branch
+
+G_BRJT
+^^^^^^
+
+Indirect branch to jump table entry
+
+G_JUMP_TABLE
+^^^^^^^^^^^^
+
+.. caution::
+
+  I found no documentation for this instruction at the time of writing.
+
+G_INTRINSIC, G_INTRINSIC_W_SIDE_EFFECTS
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Call an intrinsic
+
+The _W_SIDE_EFFECTS version is considered to have unknown side-effects and
+as such cannot be reordered across other side-effecting instructions.
+
+.. note::
+
+  Unlike SelectionDAG, there is no _VOID variant. Both of these are permitted
+  to have zero, one, or multiple results.
+
+Variadic Arguments
+------------------
+
+G_VASTART
+^^^^^^^^^
+
+.. caution::
+
+  I found no documentation for this instruction at the time of writing.
+
+G_VAARG
+^^^^^^^
+
+.. caution::
+
+  I found no documentation for this instruction at the time of writing.
+
+Other Operations
+----------------
+
+G_DYN_STACKALLOC
+^^^^^^^^^^^^^^^^
+
+Dynamically realign the stack pointer to the specified alignment
+
+.. code-block:: none
+
+  %8:_(p0) = G_DYN_STACKALLOC %7(s64), 32
+
+.. caution::
+
+  What does it mean for the immediate to be 0? It happens in the tests.
--- a/docs/GlobalISel/index.rst
+++ b/docs/GlobalISel/index.rst
@ -50,6 +50,7 @@ the following sections.
  :maxdepth: 1

  GMIR
+  GenericOpcode
  Pipeline
  Porting
  Resources
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@ -13954,6 +13954,8 @@ Examples
 Specialised Arithmetic Intrinsics
 ---------------------------------

+.. _i_intr_llvm_canonicalize:
+
 '``llvm.canonicalize.*``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

--- a/docs/MIRLangRef.rst
+++ b/docs/MIRLangRef.rst
@ -345,6 +345,8 @@ specified in brackets after the block's definition:

 ``Alignment`` is specified in bytes, and must be a power of two.

+.. _mir-instructions:
+
 Machine Instructions
 --------------------

@ -407,6 +409,8 @@ The syntax for bundled instructions is the following:
 The first instruction is often a bundle header. The instructions between ``{``
 and ``}`` are bundled with the first instruction.

+.. _mir-registers:
+
 Registers
 ---------

--- a/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
+++ b/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
@ -406,8 +406,9 @@ public:

  /// Build and insert \p Res = G_PTR_ADD \p Op0, \p Op1
  ///
-  /// G_PTR_ADD adds \p Op1 bytes to the pointer specified by \p Op0,
-  /// storing the resulting pointer in \p Res.
+  /// G_PTR_ADD adds \p Op1 addressable units to the pointer specified by \p Op0,
+  /// storing the resulting pointer in \p Res. Addressable units are typically
+  /// bytes but this can vary between targets.
  ///
  /// \pre setBasicBlock or setMI must have been called.
  /// \pre \p Res and \p Op0 must be generic virtual registers with pointer
--- a/include/llvm/Target/GenericOpcodes.td
+++ b/include/llvm/Target/GenericOpcodes.td
@ -670,7 +670,7 @@ def G_FEXP2 : GenericInstruction {
  let hasSideEffects = 0;
 }

-// Floating point base-2 logarithm of a value.
+// Floating point base-e logarithm of a value.
 def G_FLOG : GenericInstruction {
  let OutOperandList = (outs type0:$dst);
  let InOperandList = (ins type0:$src1);