diff --git a/docs/AMDGPUUsage.rst b/docs/AMDGPUUsage.rst index f4436045571..5aecdfc3c22 100644 --- a/docs/AMDGPUUsage.rst +++ b/docs/AMDGPUUsage.rst @@ -78,141 +78,170 @@ names from both the *Processor* and *Alternative Processor* can be used. .. table:: AMDGPU Processors :name: amdgpu-processor-table - =========== =============== ============ ===== ========== ======= ====================== - Processor Alternative Target dGPU/ Target ROCm Example - Processor Triple APU Features Support Products + =========== =============== ============ ===== ================= ======= ====================== + Processor Alternative Target dGPU/ Target ROCm Example + Processor Triple APU Features Support Products Architecture Supported [Default] - =========== =============== ============ ===== ========== ======= ====================== + =========== =============== ============ ===== ================= ======= ====================== **Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_ - ---------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------- ``r600`` ``r600`` dGPU ``r630`` ``r600`` dGPU ``rs880`` ``r600`` dGPU ``rv670`` ``r600`` dGPU **Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_ - ---------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------- ``rv710`` ``r600`` dGPU ``rv730`` ``r600`` dGPU ``rv770`` ``r600`` dGPU **Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_ - ---------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------- ``cedar`` ``r600`` dGPU ``cypress`` ``r600`` dGPU ``juniper`` ``r600`` dGPU ``redwood`` ``r600`` dGPU ``sumo`` ``r600`` dGPU **Radeon HD 6000 Series (Northern 
Islands)** [AMD-RADEON-HD-6000]_ - ---------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------- ``barts`` ``r600`` dGPU ``caicos`` ``r600`` dGPU ``cayman`` ``r600`` dGPU ``turks`` ``r600`` dGPU **GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_ - ---------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------- ``gfx600`` - ``tahiti`` ``amdgcn`` dGPU ``gfx601`` - ``hainan`` ``amdgcn`` dGPU - ``oland`` - ``pitcairn`` - ``verde`` **GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_ - ---------------------------------------------------------------------------------------- - ``gfx700`` - ``kaveri`` ``amdgcn`` APU - A6-7000 - - A6 Pro-7050B - - A8-7100 - - A8 Pro-7150B - - A10-7300 - - A10 Pro-7350B - - FX-7500 - - A8-7200P - - A10-7400P - - FX-7600P - ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU ROCm - FirePro W8100 - - FirePro W9100 - - FirePro S9150 - - FirePro S9170 - ``gfx702`` ``amdgcn`` dGPU ROCm - Radeon R9 290 - - Radeon R9 290x - - Radeon R390 - - Radeon R390x - ``gfx703`` - ``kabini`` ``amdgcn`` APU - E1-2100 - - ``mullins`` - E1-2200 - - E1-2500 - - E2-3000 - - E2-3800 - - A4-5000 - - A4-5100 - - A6-5200 - - A4 Pro-3340B - ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Radeon HD 7790 - - Radeon HD 8770 - - R7 260 - - R7 260X + ----------------------------------------------------------------------------------------------- + ``gfx700`` - ``kaveri`` ``amdgcn`` APU - A6-7000 + - A6 Pro-7050B + - A8-7100 + - A8 Pro-7150B + - A10-7300 + - A10 Pro-7350B + - FX-7500 + - A8-7200P + - A10-7400P + - FX-7600P + ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU ROCm - FirePro W8100 + - FirePro W9100 + - FirePro S9150 + - FirePro S9170 + ``gfx702`` ``amdgcn`` dGPU ROCm - Radeon R9 290 + - Radeon R9 290x + - Radeon R390 + - Radeon R390x + ``gfx703`` - 
``kabini`` ``amdgcn`` APU - E1-2100 + - ``mullins`` - E1-2200 + - E1-2500 + - E2-3000 + - E2-3800 + - A4-5000 + - A4-5100 + - A6-5200 + - A4 Pro-3340B + ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Radeon HD 7790 + - Radeon HD 8770 + - R7 260 + - R7 260X **GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_ - ---------------------------------------------------------------------------------------- - ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - A6-8500P - [on] - Pro A6-8500B - - A8-8600P - - Pro A8-8600B - - FX-8800P - - Pro A12-8800B - \ ``amdgcn`` APU - xnack ROCm - A10-8700P - [on] - Pro A10-8700B - - A10-8780P - \ ``amdgcn`` APU - xnack - A10-9600P - [on] - A10-9630P - - A12-9700P - - A12-9730P - - FX-9800P - - FX-9830P - \ ``amdgcn`` APU - xnack - E2-9010 - [on] - A6-9210 - - A9-9410 - ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150 - - ``tonga`` [off] - FirePro S7100 - - FirePro W7100 - - Radeon R285 - - Radeon R9 380 - - Radeon R9 385 - - Mobile FirePro - M7170 - ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - xnack ROCm - Radeon R9 Nano - [off] - Radeon R9 Fury - - Radeon R9 FuryX - - Radeon Pro Duo - - FirePro S9300x2 - - Radeon Instinct MI8 - \ - ``polaris10`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 470 - [off] - Radeon RX 480 - - Radeon Instinct MI6 - \ - ``polaris11`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 460 + ----------------------------------------------------------------------------------------------- + ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - A6-8500P + [on] - Pro A6-8500B + - A8-8600P + - Pro A8-8600B + - FX-8800P + - Pro A12-8800B + \ ``amdgcn`` APU - xnack ROCm - A10-8700P + [on] - Pro A10-8700B + - A10-8780P + \ ``amdgcn`` APU - xnack - A10-9600P + [on] - A10-9630P + - A12-9700P + - A12-9730P + - FX-9800P + - FX-9830P + \ ``amdgcn`` APU - xnack - E2-9010 + [on] - A6-9210 + - A9-9410 + ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150 + - ``tonga`` [off] - FirePro S7100 + - FirePro W7100 + - Radeon 
R285 + - Radeon R9 380 + - Radeon R9 385 + - Mobile FirePro + M7170 + ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - xnack ROCm - Radeon R9 Nano + [off] - Radeon R9 Fury + - Radeon R9 FuryX + - Radeon Pro Duo + - FirePro S9300x2 + - Radeon Instinct MI8 + \ - ``polaris10`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 470 + [off] - Radeon RX 480 + - Radeon Instinct MI6 + \ - ``polaris11`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 460 [off] ``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack [on] **GCN GFX9** [AMD-GCN-GFX9]_ - ---------------------------------------------------------------------------------------- - ``gfx900`` ``amdgcn`` dGPU - xnack ROCm - Radeon Vega - [off] Frontier Edition - - Radeon RX Vega 56 - - Radeon RX Vega 64 - - Radeon RX Vega 64 - Liquid - - Radeon Instinct MI25 - ``gfx902`` ``amdgcn`` APU - xnack - Ryzen 3 2200G - [on] - Ryzen 5 2400G - ``gfx904`` ``amdgcn`` dGPU - xnack *TBA* + ----------------------------------------------------------------------------------------------- + ``gfx900`` ``amdgcn`` dGPU - xnack ROCm - Radeon Vega + [off] Frontier Edition + - Radeon RX Vega 56 + - Radeon RX Vega 64 + - Radeon RX Vega 64 + Liquid + - Radeon Instinct MI25 + ``gfx902`` ``amdgcn`` APU - xnack - Ryzen 3 2200G + [on] - Ryzen 5 2400G + ``gfx904`` ``amdgcn`` dGPU - xnack *TBA* [off] - .. TODO - Add product - names. - ``gfx906`` ``amdgcn`` dGPU - xnack - Radeon Instinct MI50 - [off] - Radeon Instinct MI60 - ``gfx909`` ``amdgcn`` APU - xnack *TBA* (Raven Ridge 2) + .. TODO + Add product + names. + ``gfx906`` ``amdgcn`` dGPU - xnack - Radeon Instinct MI50 + [off] - Radeon Instinct MI60 + ``gfx909`` ``amdgcn`` APU - xnack *TBA* (Raven Ridge 2) [on] - .. TODO - Add product - names. - =========== =============== ============ ===== ========== ======= ====================== + .. TODO + Add product + names. 
+ **GCN GFX10** [AMD-GCN-GFX10]_ + ----------------------------------------------------------------------------------------------- + ``gfx1010`` ``amdgcn`` dGPU - xnack *TBA* + [off] + - wavefrontsize64 + [off] + - cumode + [off] + .. TODO + Add product + names. + ``gfx1011`` ``amdgcn`` dGPU - xnack *TBA* + [off] + - wavefrontsize64 + [off] + - cumode + [off] + .. TODO + Add product + names. + ``gfx1012`` ``amdgcn`` dGPU - xnack *TBA* + [off] + - wavefrontsize64 + [off] + - cumode + [off] + .. TODO + Add product + names. + =========== =============== ============ ===== ================= ======= ====================== .. _amdgpu-target-features: @@ -243,26 +272,38 @@ For example: .. table:: AMDGPU Target Features :name: amdgpu-target-feature-table - =============== ================================================== - Target Feature Description - =============== ================================================== - -m[no-]xnack Enable/disable generating code that has - memory clauses that are compatible with - having XNACK replay enabled. + ====================== ================================================== + Target Feature Description + ====================== ================================================== + -m[no-]xnack Enable/disable generating code that has + memory clauses that are compatible with + having XNACK replay enabled. - This is used for demand paging and page - migration. If XNACK replay is enabled in - the device, then if a page fault occurs - the code may execute incorrectly if the - ``xnack`` feature is not enabled. Executing - code that has the feature enabled on a - device that does not have XNACK replay - enabled will execute correctly, but may - be less performant than code with the - feature disabled. - -m[no-]sram-ecc Enable/disable generating code that assumes SRAM - ECC is enabled/disabled. - =============== ================================================== + This is used for demand paging and page + migration. 
If XNACK replay is enabled in + the device, then if a page fault occurs + the code may execute incorrectly if the + ``xnack`` feature is not enabled. Executing + code that has the feature enabled on a + device that does not have XNACK replay + enabled will execute correctly, but may + be less performant than code with the + feature disabled. + + -m[no-]sram-ecc Enable/disable generating code that assumes SRAM + ECC is enabled/disabled. + + -m[no-]wavefrontsize64 Control the default wavefront size used when + generating code for kernels. When disabled, + native wavefront size 32 is used; when enabled, + wavefront size 64 is used. + + -m[no-]cumode Control the default wavefront execution mode used + when generating code for kernels. When disabled, + native WGP wavefront execution mode is used; + when enabled, CU wavefront execution mode is used + (see :ref:`amdgpu-amdhsa-memory-model`). + ====================== ================================================== .. _amdgpu-address-spaces: @@ -635,6 +676,10 @@ The AMDGPU backend uses the following ELF header: ``EF_AMDGPU_MACH_AMDGCN_GFX906`` 0x02f ``gfx906`` *reserved* 0x030 Reserved. ``EF_AMDGPU_MACH_AMDGCN_GFX909`` 0x031 ``gfx909`` + *reserved* 0x032 Reserved. + ``EF_AMDGPU_MACH_AMDGCN_GFX1010`` 0x033 ``gfx1010`` + ``EF_AMDGPU_MACH_AMDGCN_GFX1011`` 0x034 ``gfx1011`` + ``EF_AMDGPU_MACH_AMDGCN_GFX1012`` 0x035 ``gfx1012`` ================================= ========== ============================= Sections @@ -1492,12 +1537,12 @@ non-AMD key names should be prefixed by "*vendor-name*.". "NumSGPRs" integer Required Number of scalar registers used by a wavefront for - GFX6-GFX9. This + GFX6-GFX10. This includes the special SGPRs for VCC, Flat - Scratch (GFX7-GFX9) + Scratch (GFX7-GFX10) and XNACK (for - GFX8-GFX9). It does + GFX8-GFX10). It does not include the 16 SGPR added if a trap handler is @@ -1508,7 +1553,7 @@ non-AMD key names should be prefixed by "*vendor-name*.".
"NumVGPRs" integer Required Number of vector registers used by each work-item for - GFX6-GFX9 + GFX6-GFX10 "MaxFlatWorkGroupSize" integer Required Maximum flat work-group size supported by the @@ -2060,10 +2105,10 @@ the scratch buffer descriptor and per wavefront scratch offset, by the scratch instructions, or by flat instructions. If each lane of a wavefront accesses the same private address, the interleaving results in adjacent dwords being accessed and hence requires fewer cache lines to be fetched. Multi-dword access is not -supported except by flat and scratch instructions in GFX9. +supported except by flat and scratch instructions in GFX9-GFX10. The generic address space uses the hardware flat address support available in -GFX7-GFX9. This uses two fixed ranges of virtual addresses (the private and +GFX7-GFX10. This uses two fixed ranges of virtual addresses (the private and local appertures), that are outside the range of addressible global memory, to map from a flat address to a private or local address. @@ -2078,7 +2123,7 @@ To convert between a segment address and a flat address the base address of the appertures address can be used. For GFX7-GFX8 these are available in the :ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For -GFX9 the appature base addresses are directly available as inline constant +GFX9-GFX10 the aperture base addresses are directly available as inline constant registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``. In 64 bit address mode the apperture sizes are 2^32 bytes and the base is aligned to 2^32 which makes it easier to convert from flat to segment or segment to flat. @@ -2120,14 +2165,14 @@ A kernel descriptor consists of the information needed by CP to initiate the execution of a kernel, including the entry point address of the machine code that implements the kernel.
-Kernel Descriptor for GFX6-GFX9 -+++++++++++++++++++++++++++++++ +Kernel Descriptor for GFX6-GFX10 +++++++++++++++++++++++++++++++++ CP microcode requires the Kernel descriptor to be allocated on 64 byte alignment. - .. table:: Kernel Descriptor for GFX6-GFX9 - :name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table + .. table:: Kernel Descriptor for GFX6-GFX10 + :name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table ======= ======= =============================== ============================ Bits Size Field Name Description @@ -2157,22 +2202,32 @@ alignment. entry point instruction which must be 256 byte aligned. - 383:192 24 Reserved, must be 0. + 351:192 20 Reserved, must be 0. bytes + 383:352 4 bytes COMPUTE_PGM_RSRC3 GFX6-GFX9 + Reserved, must be 0. + GFX10 + Compute Shader (CS) + program settings used by + CP to set up + ``COMPUTE_PGM_RSRC3`` + configuration + register. See + :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-table`. 415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS) program settings used by CP to set up ``COMPUTE_PGM_RSRC1`` configuration register. See - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. 447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS) program settings used by CP to set up ``COMPUTE_PGM_RSRC2`` configuration register. See - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. 448 1 bit ENABLE_SGPR_PRIVATE_SEGMENT Enable the setup of the _BUFFER SGPR user data registers (see @@ -2192,15 +2247,24 @@ alignment. 453 1 bit ENABLE_SGPR_FLAT_SCRATCH_INIT *see above* 454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT *see above* _SIZE - 455 1 bit Reserved, must be 0. - 511:456 8 bytes Reserved, must be 0. + 457:455 3 bits Reserved, must be 0. + 458 1 bit ENABLE_WAVEFRONT_SIZE32 GFX6-GFX9 + Reserved, must be 0. + GFX10 + - If 0 execute in + wavefront size 64 mode. + - If 1 execute in + native wavefront size + 32 mode.
+ 463:459 5 bits Reserved, must be 0. + 511:464 6 bytes Reserved, must be 0. 512 **Total size 64 bytes.** ======= ==================================================================== .. - .. table:: compute_pgm_rsrc1 for GFX6-GFX9 - :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table + .. table:: compute_pgm_rsrc1 for GFX6-GFX10 + :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table ======= ======= =============================== =========================================================================== Bits Size Field Name Description @@ -2213,6 +2277,12 @@ alignment. GFX6-GFX9 - vgprs_used 0..256 - max(0, ceil(vgprs_used / 4) - 1) + GFX10 (wavefront size 64) + - max_vgpr 1..256 + - max(0, ceil(vgprs_used / 4) - 1) + GFX10 (wavefront size 32) + - max_vgpr 1..256 + - max(0, ceil(vgprs_used / 8) - 1) Where vgprs_used is defined as the highest VGPR number @@ -2244,6 +2314,10 @@ alignment. GFX9 - sgprs_used 0..112 - 2 * max(0, ceil(sgprs_used / 16) - 1) + GFX10 + Reserved, must be 0. + (128 SGPRs always + allocated.) Where sgprs_used is defined as the highest @@ -2407,7 +2481,7 @@ alignment. ``COMPUTE_PGM_RSRC1.CDBG_USER``. 26 1 bit FP16_OVFL GFX6-GFX8 Reserved, must be 0. - GFX9 + GFX9-GFX10 Wavefront starts execution with specified fp16 overflow mode. @@ -2423,14 +2497,60 @@ alignment. Used by CP to set up ``COMPUTE_PGM_RSRC1.FP16_OVFL``. - 31:27 5 bits Reserved, must be 0. + 28:27 2 bits Reserved, must be 0. + 29 1 bit WGP_MODE GFX6-GFX9 + Reserved, must be 0. + GFX10 + - If 0 execute work-groups in + CU wavefront execution mode. + - If 1 execute work-groups + in WGP wavefront execution mode. + + See :ref:`amdgpu-amdhsa-memory-model`. + + Used by CP to set up + ``COMPUTE_PGM_RSRC1.WGP_MODE``. + 30 1 bit MEM_ORDERED GFX6-GFX9 + Reserved, must be 0. + GFX10 + Controls the behavior of the + waitcnt's vmcnt and vscnt + counters.
+ + - If 0 vmcnt reports completion + of load and atomic with return + out of order with sample + instructions, and the vscnt + reports the completion of + store and atomic without + return in order. + - If 1 vmcnt reports completion + of load, atomic with return + and sample instructions in + order, and the vscnt reports + the completion of store and + atomic without return in order. + + Used by CP to set up + ``COMPUTE_PGM_RSRC1.MEM_ORDERED``. + 31 1 bit FWD_PROGRESS GFX6-GFX9 + Reserved, must be 0. + GFX10 + - If 0 execute SIMD wavefronts + using oldest first policy. + - If 1 execute SIMD wavefronts to + ensure wavefronts will make some + forward progress. + + Used by CP to set up + ``COMPUTE_PGM_RSRC1.FWD_PROGRESS``. 32 **Total size 4 bytes** ======= =================================================================================================================== .. - .. table:: compute_pgm_rsrc2 for GFX6-GFX9 - :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table + .. table:: compute_pgm_rsrc2 for GFX6-GFX10 + :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table ======= ======= =============================== =========================================================================== Bits Size Field Name Description @@ -2549,7 +2669,7 @@ alignment. GFX6: roundup(lds-size / (64 * 4)) - GFX7-GFX9: + GFX7-GFX10: roundup(lds-size / (128 * 4)) 24 1 bit ENABLE_EXCEPTION_IEEE_754_FP Wavefront starts execution @@ -2580,6 +2700,21 @@ alignment. 32 **Total size 4 bytes.** ======= =================================================================================================================== +.. + + ..
table:: compute_pgm_rsrc3 for GFX10 + :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-table + + ======= ======= =============================== =========================================================================== + Bits Size Field Name Description + ======= ======= =============================== =========================================================================== + 3:0 4 bits SHARED_VGPR_COUNT Number of shared VGPRs for wavefront size 64. Granularity 8. Value 0-120. + compute_pgm_rsrc1.vgprs + shared_vgpr_cnt cannot exceed 64. + 31:4 28 Reserved, must be 0. + bits + 32 **Total size 4 bytes.** + ======= =================================================================================================================== + .. .. table:: Floating Point Rounding Mode Enumeration Values @@ -2749,7 +2884,7 @@ SGPR register initial state is defined in it once avoids loading it at the beginning of every wavefront. - GFX9 + GFX9-GFX10 This is the 64 bit base address of the per SPI scratch backing @@ -2787,7 +2922,7 @@ SGPR register initial state is defined in GFX7-GFX8 since it is the same value as the second SGPR of Flat Scratch Init. However, it - may be needed for GFX9 which + may be needed for GFX9-GFX10 which changes the meaning of the Flat Scratch Init value. then Grid Work-Group Count X 1 32 bit count of the number of @@ -2889,8 +3024,8 @@ Flat Scratch register pair are adjacent SGRRs so they can be moved as a 64 bit value to the hardware required SGPRn-3 and SGPRn-4 respectively. The global segment can be accessed either using buffer instructions (GFX6 which -has V# 64 bit address support), flat instructions (GFX7-GFX9), or global -instructions (GFX9). +has V# 64 bit address support), flat instructions (GFX7-GFX10), or global +instructions (GFX9-GFX10). If buffer operations are used then the compiler can generate a V# with the following properties: @@ -2918,7 +3053,7 @@ GFX6-GFX8 available in dispatch packet. 
For M0, it is also possible to use maximum possible value of LDS for given target (0x7FFF for GFX6 and 0xFFFF for GFX7-GFX8). -GFX9 +GFX9-GFX10 The M0 register is not used for range checking LDS accesses and so does not need to be initialized in the prolog. @@ -2951,7 +3086,7 @@ GFX7-GFX8 wavefront. The prolog must move it to FLAT_SCRATCH_LO for use as FLAT SCRATCH SIZE. -GFX9 +GFX9-GFX10 The Flat Scratch Init is the 64 bit address of the base of scratch backing memory being managed by SPI for the queue executing the kernel dispatch. The prolog must add the value of Scratch Wavefront Offset and moved to the FLAT_SCRATCH @@ -2972,7 +3107,7 @@ The AMDGPU backend supports the memory synchronization scopes specified in :ref:`amdgpu-memory-scopes`. The code sequences used to implement the memory model are defined in table -:ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`. +:ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx10-table`. The sequences specify the order of instructions that a single thread must execute. The ``s_waitcnt`` and ``buffer_wbinvl1_vol`` are defined with respect @@ -3010,7 +3145,8 @@ termed vector memory operations. For GFX6-GFX9: -* Each agent has multiple compute units (CU). +* Each agent has multiple shader arrays (SA). +* Each SA has multiple compute units (CU). * Each CU has multiple SIMDs that execute wavefronts. * The wavefronts for a single work-group are executed in the same CU but may be executed by different SIMDs. @@ -3056,8 +3192,79 @@ For GFX6-GFX9: * The L2 cache can be kept coherent with other agents on some targets, or ranges of virtual addresses can be set up to bypass it to ensure system coherence. +For GFX10: + +* Each agent has multiple shader arrays (SA). +* Each SA has multiple work-group processors (WGP). +* Each WGP has multiple compute units (CU). +* Each CU has multiple SIMDs that execute wavefronts. +* The wavefronts for a single work-group are executed in the same + WGP. 
In CU wavefront execution mode the wavefronts may be executed by + different SIMDs in the same CU. In WGP wavefront execution mode the + wavefronts may be executed by different SIMDs in different CUs in the same + WGP. +* Each WGP has a single LDS memory shared by the wavefronts of the work-groups + executing on it. +* All LDS operations of a WGP are performed as wavefront wide operations in a + global order and involve no caching. Completion is reported to a wavefront in + execution order. +* The LDS memory has multiple request queues shared by the SIMDs of a + WGP. Therefore, the LDS operations performed by different wavefronts of a work-group + can be reordered relative to each other, which can result in reordering the + visibility of vector memory operations with respect to LDS operations of other + wavefronts in the same work-group. A ``s_waitcnt lgkmcnt(0)`` is required to + ensure synchronization between LDS operations and vector memory operations + between wavefronts of a work-group, but not between operations performed by the + same wavefront. +* The vector memory operations are performed as wavefront wide operations. + Completion of load/store/sample operations are reported to a wavefront in + execution order of other load/store/sample operations performed by that + wavefront. +* The vector memory operations access a vector L0 cache. There is a single L0 + cache per CU. Each SIMD of a CU accesses the same L0 cache. + Therefore, no special action is required for coherence between the lanes of a + single wavefront. However, a ``BUFFER_GL0_INV`` is required for coherence + between wavefronts executing in the same work-group as they may be executing on + SIMDs of different CUs that access different L0s. A ``BUFFER_GL0_INV`` is also + required for coherence between wavefronts executing in different work-groups as + they may be executing on different WGPs. +* The scalar memory operations access a scalar L0 cache shared by all wavefronts + on a WGP. 
The scalar and vector L0 caches are not coherent. However, scalar + operations are used in a restricted way so do not impact the memory model. See + :ref:`amdgpu-amdhsa-memory-spaces`. +* The vector and scalar memory L0 caches use an L1 cache shared by all WGPs on + the same SA. Therefore, no special action is required for coherence between + the wavefronts of a single work-group. However, a ``BUFFER_GL1_INV`` is + required for coherence between wavefronts executing in different work-groups as + they may be executing on different SAs that access different L1s. +* The L1 caches have independent quadrants to service disjoint ranges of virtual + addresses. +* Each L0 cache has a separate request queue per L1 quadrant. Therefore, the + vector and scalar memory operations performed by different wavefronts, whether + executing in the same or different work-groups (which may be executing on + different CUs accessing different L0s), can be reordered relative to each + other. A ``s_waitcnt vmcnt(0) & vscnt(0)`` is required to ensure synchronization + between vector memory operations of different wavefronts. It ensures a previous + vector memory operation has completed before executing a subsequent vector + memory or LDS operation and so can be used to meet the requirements of acquire, + release and sequential consistency. +* The L1 caches use an L2 cache shared by all SAs on the same agent. +* The L2 cache has independent channels to service disjoint ranges of virtual + addresses. +* Each L1 quadrant of a single SA accesses a different L2 channel. Each L1 + quadrant has a separate request queue per L2 channel. Therefore, the vector + and scalar memory operations performed by wavefronts executing in different + work-groups (which may be executing on different SAs) of an agent can be + reordered relative to each other. A ``s_waitcnt vmcnt(0) & vscnt(0)`` is + required to ensure synchronization between vector memory operations of + different SAs. 
It ensures a previous vector memory operation has completed + before executing a subsequent vector memory operation and so can be used to meet the + requirements of acquire, release and sequential consistency. +* The L2 cache can be kept coherent with other agents on some targets, or ranges + of virtual addresses can be set up to bypass it to ensure system coherence. + Private address space uses ``buffer_load/store`` using the scratch V# (GFX6-GFX8), -or ``scratch_load/store`` (GFX9). Since only a single thread is accessing the +or ``scratch_load/store`` (GFX9-GFX10). Since only a single thread is accessing the memory, atomic memory orderings are not meaningful and all accesses are treated as non-atomic. @@ -3100,285 +3307,428 @@ future wavefront that uses the same scratch area, or a function call that create frame at the same address, respectively. There is no need for a ``s_dcache_inv`` as all scalar writes are write-before-read in the same thread. -Scratch backing memory (which is used for the private address space) +For GFX6-GFX9, scratch backing memory (which is used for the private address space) is accessed with MTYPE NC_NV (non-coherenent non-volatile). Since the private address space is only accessed by a single thread, and is always write-before-read, there is never a need to invalidate these entries from the L1 cache. Hence all cache invalidates are done as ``*_vol`` to only invalidate the volatile cache lines. +For GFX10, scratch backing memory (which is used for the private address space) +is accessed with MTYPE NC (non-coherent). Since the private address space is +only accessed by a single thread, and is always write-before-read, there is +never a need to invalidate these entries from the L0 or L1 caches. + +For GFX10, wavefronts are executed in native mode with in-order reporting of loads +and sample instructions.
In this mode vmcnt reports completion of load, atomic +with return and sample instructions in order, and the vscnt reports the +completion of store and atomic without return in order. See ``MEM_ORDERED`` field +in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + +In GFX10, wavefronts can be executed in WGP or CU wavefront execution mode: + +* In WGP wavefront execution mode the wavefronts of a work-group are executed + on the SIMDs of both CUs of the WGP. Therefore, explicit management of the per + CU L0 caches is required for work-group synchronization. Also accesses to L1 at + work-group scope need to be explicitly ordered as the accesses from different + CUs are not ordered. +* In CU wavefront execution mode the wavefronts of a work-group are executed on + the SIMDs of a single CU of the WGP. Therefore, all global memory accesses by + the work-group access the same L0 which in turn ensures L1 accesses are + ordered and so do not require explicit management of the caches for + work-group synchronization. + +See ``WGP_MODE`` field in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table` +and :ref:`amdgpu-target-features`. + On dGPU the kernarg backing memory is accessed as UC (uncached) to avoid needing -to invalidate the L2 cache. This also causes it to be treated as +to invalidate the L2 cache. For GFX6-GFX9, this also causes it to be treated as non-volatile and so is not invalidated by ``*_vol``. On APU it is accessed as CC -(cache coherent) and so the L2 cache will coherent with the CPU and other +(cache coherent) and so the L2 cache will be coherent with the CPU and other agents. - .. table:: AMDHSA Memory Model Code Sequences GFX6-GFX9 - :name: amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table + ..
table:: AMDHSA Memory Model Code Sequences GFX6-GFX10 + :name: amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx10-table - ============ ============ ============== ========== =============================== - LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code - Ordering Sync Scope Address + ============ ============ ============== ========== =============================== ================================== + LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code AMDGPU Machine Code + Ordering Sync Scope Address GFX6-9 GFX10 Space - ============ ============ ============== ========== =============================== + ============ ============ ============== ========== =============================== ================================== **Non-Atomic** - ----------------------------------------------------------------------------------- - load *none* *none* - global - !volatile & !nontemporal + ---------------------------------------------------------------------------------------------------------------------- + load *none* *none* - global - !volatile & !nontemporal - !volatile & !nontemporal - generic - - private 1. buffer/global/flat_load + - private 1. buffer/global/flat_load 1. buffer/global/flat_load - constant - - volatile & !nontemporal + - volatile & !nontemporal - volatile & !nontemporal - 1. buffer/global/flat_load - glc=1 + 1. buffer/global/flat_load 1. buffer/global/flat_load + glc=1 glc=1 dlc=1 - - nontemporal + - nontemporal - nontemporal - 1. buffer/global/flat_load - glc=1 slc=1 + 1. buffer/global/flat_load 1. buffer/global/flat_load + glc=1 slc=1 slc=1 - load *none* *none* - local 1. ds_load - store *none* *none* - global - !nontemporal + load *none* *none* - local 1. ds_load 1. ds_load + store *none* *none* - global - !nontemporal - !nontemporal - generic - - private 1. buffer/global/flat_store + - private 1. buffer/global/flat_store 1. buffer/global/flat_store - constant - - nontemporal + - nontemporal - nontemporal - 1. 
buffer/global/flat_stote - glc=1 slc=1 + 1. buffer/global/flat_store 1. buffer/global/flat_store + glc=1 slc=1 slc=1 - store *none* *none* - local 1. ds_store + store *none* *none* - local 1. ds_store 1. ds_store **Unordered Atomic** - ----------------------------------------------------------------------------------- - load atomic unordered *any* *any* *Same as non-atomic*. - store atomic unordered *any* *any* *Same as non-atomic*. - atomicrmw unordered *any* *any* *Same as monotonic - atomic*. + ---------------------------------------------------------------------------------------------------------------------- + load atomic unordered *any* *any* *Same as non-atomic*. *Same as non-atomic*. + store atomic unordered *any* *any* *Same as non-atomic*. *Same as non-atomic*. + atomicrmw unordered *any* *any* *Same as monotonic *Same as monotonic + atomic*. atomic*. **Monotonic Atomic** - ----------------------------------------------------------------------------------- - load atomic monotonic - singlethread - global 1. buffer/global/flat_load + ---------------------------------------------------------------------------------------------------------------------- + load atomic monotonic - singlethread - global 1. buffer/global/flat_load 1. buffer/global/flat_load - wavefront - generic - - workgroup - load atomic monotonic - singlethread - local 1. ds_load + load atomic monotonic - workgroup - global 1. buffer/global/flat_load 1. buffer/global/flat_load + - generic glc=1 + + - If CU wavefront execution mode, omit glc=1. + + load atomic monotonic - singlethread - local 1. ds_load 1. ds_load - wavefront - workgroup - load atomic monotonic - agent - global 1. buffer/global/flat_load - - system - generic glc=1 - store atomic monotonic - singlethread - global 1. buffer/global/flat_store + load atomic monotonic - agent - global 1. buffer/global/flat_load 1. buffer/global/flat_load + - system - generic glc=1 glc=1 dlc=1 + store atomic monotonic - singlethread - global 1.
buffer/global/flat_store 1. buffer/global/flat_store - wavefront - generic - workgroup - agent - system - store atomic monotonic - singlethread - local 1. ds_store + store atomic monotonic - singlethread - local 1. ds_store 1. ds_store - wavefront - workgroup - atomicrmw monotonic - singlethread - global 1. buffer/global/flat_atomic + atomicrmw monotonic - singlethread - global 1. buffer/global/flat_atomic 1. buffer/global/flat_atomic - wavefront - generic - workgroup - agent - system - atomicrmw monotonic - singlethread - local 1. ds_atomic + atomicrmw monotonic - singlethread - local 1. ds_atomic 1. ds_atomic - wavefront - workgroup **Acquire Atomic** - ----------------------------------------------------------------------------------- - load atomic acquire - singlethread - global 1. buffer/global/ds/flat_load + ---------------------------------------------------------------------------------------------------------------------- + load atomic acquire - singlethread - global 1. buffer/global/ds/flat_load 1. buffer/global/ds/flat_load - wavefront - local - generic - load atomic acquire - workgroup - global 1. buffer/global/flat_load - load atomic acquire - workgroup - local 1. ds_load - 2. s_waitcnt lgkmcnt(0) + load atomic acquire - workgroup - global 1. buffer/global/flat_load 1. buffer/global_load glc=1 - - If OpenCL, omit. - - Must happen before - any following - global/generic - load/load - atomic/store/store - atomic/atomicrmw. - - Ensures any - following global - data read is no - older than the load - atomic value being - acquired. - load atomic acquire - workgroup - generic 1. flat_load - 2. s_waitcnt lgkmcnt(0) + - If CU wavefront execution mode, omit glc=1. - - If OpenCL, omit. - - Must happen before - any following - global/generic - load/load - atomic/store/store - atomic/atomicrmw. - - Ensures any - following global - data read is no - older than the load - atomic value being - acquired. - load atomic acquire - agent - global 1. 
buffer/global/flat_load - - system glc=1 - 2. s_waitcnt vmcnt(0) + 2. s_waitcnt vmcnt(0) - - Must happen before - following - buffer_wbinvl1_vol. - - Ensures the load - has completed - before invalidating - the cache. + - If CU wavefront execution mode, omit. + - Must happen before + the following buffer_gl0_inv + and before any following + global/generic + load/load + atomic/store/store + atomic/atomicrmw. - 3. buffer_wbinvl1_vol + 3. buffer_gl0_inv - - Must happen before - any following - global/generic - load/load - atomic/atomicrmw. - - Ensures that - following - loads will not see - stale global data. + - If CU wavefront execution mode, omit. + - Ensures that + following + loads will not see + stale data. - load atomic acquire - agent - generic 1. flat_load glc=1 - - system 2. s_waitcnt vmcnt(0) & - lgkmcnt(0) + load atomic acquire - workgroup - local 1. ds_load 1. ds_load + 2. s_waitcnt lgkmcnt(0) 2. s_waitcnt lgkmcnt(0) - - If OpenCL omit - lgkmcnt(0). - - Must happen before - following - buffer_wbinvl1_vol. - - Ensures the flat_load - has completed - before invalidating - the cache. + - If OpenCL, omit. - If OpenCL, omit. + - Must happen before - Must happen before + any following the following buffer_gl0_inv + global/generic and before any following + load/load global/generic load/load + atomic/store/store atomic/store/store + atomic/atomicrmw. atomic/atomicrmw. + - Ensures any - Ensures any + following global following global + data read is no data read is no + older than the load older than the load + atomic value being atomic value being + acquired. acquired. - 3. buffer_wbinvl1_vol + 3. buffer_gl0_inv - - Must happen before - any following - global/generic - load/load - atomic/atomicrmw. - - Ensures that - following loads - will not see stale - global data. + - If CU wavefront execution mode, omit. + - If OpenCL, omit. + - Ensures that + following + loads will not see + stale data. - atomicrmw acquire - singlethread - global 1.
buffer/global/ds/flat_atomic + load atomic acquire - workgroup - generic 1. flat_load 1. flat_load glc=1 + + - If CU wavefront execution mode, omit glc=1. + + 2. s_waitcnt lgkmcnt(0) 2. s_waitcnt lgkmcnt(0) & + vmcnt(0) + + - If CU wavefront execution mode, omit vmcnt. + - If OpenCL, omit. - If OpenCL, omit + lgkmcnt(0). + - Must happen before - Must happen before + any following the following + global/generic buffer_gl0_inv and any + load/load following global/generic + atomic/store/store load/load + atomic/atomicrmw. atomic/store/store + atomic/atomicrmw. + - Ensures any - Ensures any + following global following global + data read is no data read is no + older than the load older than the load + atomic value being atomic value being + acquired. acquired. + + 3. buffer_gl0_inv + + - If CU wavefront execution mode, omit. + - Ensures that + following + loads will not see + stale data. + + load atomic acquire - agent - global 1. buffer/global/flat_load 1. buffer/global_load + - system glc=1 glc=1 dlc=1 + 2. s_waitcnt vmcnt(0) 2. s_waitcnt vmcnt(0) + + - Must happen before - Must happen before + following following + buffer_wbinvl1_vol. buffer_gl*_inv. + - Ensures the load - Ensures the load + has completed has completed + before invalidating before invalidating + the cache. the caches. + + 3. buffer_wbinvl1_vol 3. buffer_gl0_inv; + buffer_gl1_inv + + - Must happen before - Must happen before + any following any following + global/generic global/generic + load/load load/load + atomic/atomicrmw. atomic/atomicrmw. + - Ensures that - Ensures that + following following + loads will not see loads will not see + stale global data. stale global data. + + load atomic acquire - agent - generic 1. flat_load glc=1 1. flat_load glc=1 dlc=1 + - system 2. s_waitcnt vmcnt(0) & 2. s_waitcnt vmcnt(0) & + lgkmcnt(0) lgkmcnt(0) + + - If OpenCL omit - If OpenCL omit + lgkmcnt(0). lgkmcnt(0). + - Must happen before - Must happen before + following following + buffer_wbinvl1_vol. 
buffer_gl*_inv. + - Ensures the flat_load - Ensures the flat_load + has completed has completed + before invalidating before invalidating + the cache. the caches. + + 3. buffer_wbinvl1_vol 3. buffer_gl0_inv; + buffer_gl1_inv + + - Must happen before - Must happen before + any following any following + global/generic global/generic + load/load load/load + atomic/atomicrmw. atomic/atomicrmw. + - Ensures that - Ensures that + following loads following loads + will not see stale will not see stale + global data. global data. + + atomicrmw acquire - singlethread - global 1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic - wavefront - local - generic - atomicrmw acquire - workgroup - global 1. buffer/global/flat_atomic - atomicrmw acquire - workgroup - local 1. ds_atomic - 2. waitcnt lgkmcnt(0) + atomicrmw acquire - workgroup - global 1. buffer/global/flat_atomic 1. buffer/global_atomic + 2. s_waitcnt vm/vscnt(0) - - If OpenCL, omit. - - Must happen before - any following - global/generic + - If CU wavefront execution mode, omit. + - Use vmcnt if atomic with + return and vscnt if atomic + with no-return. + - Must happen before + the following buffer_gl0_inv + and before any following + global/generic + load/load + atomic/store/store + atomic/atomicrmw. + + 3. buffer_gl0_inv + + - If CU wavefront execution mode, omit. + - Ensures that + following + loads will not see + stale data. + + atomicrmw acquire - workgroup - local 1. ds_atomic 1. ds_atomic + 2. waitcnt lgkmcnt(0) 2. waitcnt lgkmcnt(0) + + - If OpenCL, omit. - If OpenCL, omit. + - Must happen before - Must happen before + any following the following + global/generic buffer_gl0_inv. load/load atomic/store/store atomic/atomicrmw. - - Ensures any - following global - data read is no - older than the - atomicrmw value - being acquired.
+ - Ensures any - Ensures any + following global following global + data read is no data read is no + older than the older than the + atomicrmw value atomicrmw value + being acquired. being acquired. - atomicrmw acquire - workgroup - generic 1. flat_atomic - 2. waitcnt lgkmcnt(0) + 3. buffer_gl0_inv - - If OpenCL, omit. - - Must happen before - any following - global/generic + - If OpenCL, omit. + - Ensures that + following + loads will not see + stale data. + + atomicrmw acquire - workgroup - generic 1. flat_atomic 1. flat_atomic + 2. waitcnt lgkmcnt(0) 2. waitcnt lgkmcnt(0) & + vm/vscnt(0) + + - If CU wavefront execution mode, omit vm/vscnt. + - If OpenCL, omit. - If OpenCL, omit + waitcnt lgkmcnt(0). + - Use vmcnt if atomic with + return and vscnt if atomic + with no-return. + - Must happen before - Must happen before + any following the following + global/generic buffer_gl0_inv. load/load atomic/store/store atomic/atomicrmw. - - Ensures any - following global - data read is no - older than the - atomicrmw value - being acquired. + - Ensures any - Ensures any + following global following global + data read is no data read is no + older than the older than the + atomicrmw value atomicrmw value + being acquired. being acquired. - atomicrmw acquire - agent - global 1. buffer/global/flat_atomic - - system 2. s_waitcnt vmcnt(0) + 3. buffer_gl0_inv - - Must happen before - following - buffer_wbinvl1_vol. - - Ensures the - atomicrmw has - completed before - invalidating the - cache. + - If CU wavefront execution mode, omit. + - Ensures that + following + loads will not see + stale data. - 3. buffer_wbinvl1_vol + atomicrmw acquire - agent - global 1. buffer/global/flat_atomic 1. buffer/global_atomic + - system 2. s_waitcnt vmcnt(0) 2. s_waitcnt vm/vscnt(0) - - Must happen before - any following - global/generic - load/load - atomic/atomicrmw. - - Ensures that - following loads - will not see stale - global data.
+ - Use vmcnt if atomic with + return and vscnt if atomic + with no-return. + - Must happen before - Must happen before + following following + buffer_wbinvl1_vol. buffer_gl*_inv. + - Ensures the - Ensures the + atomicrmw has atomicrmw has + completed before completed before + invalidating the invalidating the + cache. caches. - atomicrmw acquire - agent - generic 1. flat_atomic - - system 2. s_waitcnt vmcnt(0) & - lgkmcnt(0) + 3. buffer_wbinvl1_vol 3. buffer_gl0_inv; + buffer_gl1_inv - - If OpenCL, omit - lgkmcnt(0). - - Must happen before - following - buffer_wbinvl1_vol. - - Ensures the - atomicrmw has - completed before - invalidating the - cache. + - Must happen before - Must happen before + any following any following + global/generic global/generic + load/load load/load + atomic/atomicrmw. atomic/atomicrmw. + - Ensures that - Ensures that + following loads following loads + will not see stale will not see stale + global data. global data. - 3. buffer_wbinvl1_vol + atomicrmw acquire - agent - generic 1. flat_atomic 1. flat_atomic + - system 2. s_waitcnt vmcnt(0) & 2. s_waitcnt vm/vscnt(0) & + lgkmcnt(0) lgkmcnt(0) - - Must happen before - any following - global/generic - load/load - atomic/atomicrmw. - - Ensures that - following loads - will not see stale - global data. + - If OpenCL, omit - If OpenCL, omit + lgkmcnt(0). lgkmcnt(0). + - Use vmcnt if atomic with + return and vscnt if atomic + with no-return. + - Must happen before - Must happen before + following following + buffer_wbinvl1_vol. buffer_gl*_inv. + - Ensures the - Ensures the + atomicrmw has atomicrmw has + completed before completed before + invalidating the invalidating the + cache. caches. - fence acquire - singlethread *none* *none* + 3. buffer_wbinvl1_vol 3. buffer_gl0_inv; + buffer_gl1_inv + + - Must happen before - Must happen before + any following any following + global/generic global/generic + load/load load/load + atomic/atomicrmw. atomic/atomicrmw.
+ - Ensures that - Ensures that + following loads following loads + will not see stale will not see stale + global data. global data. + + fence acquire - singlethread *none* *none* *none* - wavefront - fence acquire - workgroup *none* 1. s_waitcnt lgkmcnt(0) + fence acquire - workgroup *none* 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) & + vmcnt(0) & vscnt(0) - - If OpenCL and - address space is - not generic, omit. - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - If CU wavefront execution mode, omit vmcnt and + vscnt. + - If OpenCL and - If OpenCL and + address space is address space is + not generic, omit. not generic, omit + lgkmcnt(0). + - If OpenCL and + address space is + local, omit + vmcnt(0) and vscnt(0). + - However, since LLVM - However, since LLVM + currently has no currently has no + address space on address space on + the fence need to the fence need to + conservatively conservatively + always generate. If always generate. If + fence had an fence had an + address space then address space then + set to address set to address + space of OpenCL space of OpenCL + fence flag, or to fence flag, or to + generic if both generic if both + local and global local and global + flags are flags are + specified. specified. - Must happen after any preceding local/generic load @@ -3402,22 +3752,95 @@ agents. older than the value read by the fence-paired-atomic. + - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. 
+ - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load + atomic/ + atomicrmw-with-return-value + with an equal or + wider sync scope + and memory ordering + stronger than + unordered (this is + termed the + fence-paired-atomic). + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + atomicrmw-no-return-value + with an equal or + wider sync scope + and memory ordering + stronger than + unordered (this is + termed the + fence-paired-atomic). + - s_waitcnt lgkmcnt(0) + must happen after + any preceding + local/generic load + atomic/atomicrmw + with an equal or + wider sync scope + and memory ordering + stronger than + unordered (this is + termed the + fence-paired-atomic). + - Must happen before + the following + buffer_gl0_inv. + - Ensures that the + fence-paired atomic + has completed + before invalidating + the + cache. Therefore + any following + locations read must + be no older than + the value read by + the + fence-paired-atomic. - fence acquire - agent *none* 1. s_waitcnt lgkmcnt(0) & - - system vmcnt(0) + 3. buffer_gl0_inv - - If OpenCL and - address space is - not generic, omit - lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - If CU wavefront execution mode, omit. + - Ensures that + following + loads will not see + stale data. + + fence acquire - agent *none* 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) & + - system vmcnt(0) vmcnt(0) & vscnt(0) + + - If OpenCL and - If OpenCL and + address space is address space is + not generic, omit not generic, omit + lgkmcnt(0). lgkmcnt(0). + - If OpenCL and + address space is + local, omit + vmcnt(0) and vscnt(0). 
+ - However, since LLVM - However, since LLVM + currently has no currently has no + address space on address space on + the fence need to the fence need to + conservatively conservatively + always generate always generate + (see comment for (see comment for + previous fence). previous fence). - Could be split into separate s_waitcnt vmcnt(0) and @@ -3466,863 +3889,1555 @@ agents. the value read by the fence-paired-atomic. + - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load + atomic/ + atomicrmw-with-return-value + with an equal or + wider sync scope + and memory ordering + stronger than + unordered (this is + termed the + fence-paired-atomic). + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + atomicrmw-no-return-value + with an equal or + wider sync scope + and memory ordering + stronger than + unordered (this is + termed the + fence-paired-atomic). + - s_waitcnt lgkmcnt(0) + must happen after + any preceding + local/generic load + atomic/atomicrmw + with an equal or + wider sync scope + and memory ordering + stronger than + unordered (this is + termed the + fence-paired-atomic). + - Must happen before + the following + buffer_gl*_inv. + - Ensures that the + fence-paired atomic + has completed + before invalidating + the + caches. Therefore + any following + locations read must + be no older than + the value read by + the + fence-paired-atomic. - 2. buffer_wbinvl1_vol + 2. buffer_wbinvl1_vol 2. buffer_gl0_inv; + buffer_gl1_inv - - Must happen before any - following global/generic - load/load - atomic/store/store - atomic/atomicrmw. - - Ensures that - following loads - will not see stale - global data. 
+ - Must happen before any - Must happen before any + following global/generic following global/generic + load/load load/load + atomic/store/store atomic/store/store + atomic/atomicrmw. atomic/atomicrmw. + - Ensures that - Ensures that + following loads following loads + will not see stale will not see stale + global data. global data. **Release Atomic** - ----------------------------------------------------------------------------------- - store atomic release - singlethread - global 1. buffer/global/ds/flat_store + ---------------------------------------------------------------------------------------------------------------------- + store atomic release - singlethread - global 1. buffer/global/ds/flat_store 1. buffer/global/ds/flat_store - wavefront - local - generic - store atomic release - workgroup - global 1. s_waitcnt lgkmcnt(0) + store atomic release - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) & + vmcnt(0) & vscnt(0) - - If OpenCL, omit. + - If CU wavefront execution mode, omit vmcnt and + vscnt. + - If OpenCL, omit. - If OpenCL, omit + lgkmcnt(0). - Must happen after any preceding local/generic load/store/load atomic/store atomic/atomicrmw. - - Must happen before - the following - store. - - Ensures that all - memory operations - to local have - completed before - performing the - store that is being - released. + - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store + atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) + must happen after + any preceding + local/generic + load/store/load + atomic/store + atomic/atomicrmw. 
+ - Must happen before - Must happen before + the following the following + store. store. + - Ensures that all - Ensures that all + memory operations memory operations + to local have have + completed before completed before + performing the performing the + store that is being store that is being + released. released. - 2. buffer/global/flat_store - store atomic release - workgroup - local 1. ds_store - store atomic release - workgroup - generic 1. s_waitcnt lgkmcnt(0) + 2. buffer/global/flat_store 2. buffer/global_store + store atomic release - workgroup - local 1. waitcnt vmcnt(0) & vscnt(0) - - If OpenCL, omit. + - If CU wavefront execution mode, omit. + - If OpenCL, omit. + - Could be split into + separate s_waitcnt + vmcnt(0) and s_waitcnt + vscnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. + - Must happen before + the following + store. + - Ensures that all + global memory + operations have + completed before + performing the + store that is being + released. + + 1. ds_store 2. ds_store + store atomic release - workgroup - generic 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) & + vmcnt(0) & vscnt(0) + + - If CU wavefront execution mode, omit vmcnt and + vscnt. + - If OpenCL, omit. - If OpenCL, omit + lgkmcnt(0). - Must happen after any preceding local/generic load/store/load atomic/store atomic/atomicrmw. - - Must happen before - the following - store. - - Ensures that all - memory operations - to local have - completed before - performing the - store that is being - released. + - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. 
+ - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store + atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) + must happen after + any preceding + local/generic load/store/load + atomic/store atomic/atomicrmw. + - Must happen before - Must happen before + the following the following + store. store. + - Ensures that all - Ensures that all + memory operations memory operations + to local have have + completed before completed before + performing the performing the + store that is being store that is being + released. released. - 2. flat_store - store atomic release - agent - global 1. s_waitcnt lgkmcnt(0) & - - system - generic vmcnt(0) + 2. flat_store 2. flat_store + store atomic release - agent - global 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) & + - system - generic vmcnt(0) vmcnt(0) & vscnt(0) - - If OpenCL, omit - lgkmcnt(0). - - Could be split into - separate s_waitcnt - vmcnt(0) and - s_waitcnt - lgkmcnt(0) to allow - them to be - independently moved - according to the - following rules. - - s_waitcnt vmcnt(0) - must happen after - any preceding - global/generic - load/store/load - atomic/store - atomic/atomicrmw. - - s_waitcnt lgkmcnt(0) - must happen after - any preceding - local/generic - load/store/load - atomic/store - atomic/atomicrmw. - - Must happen before - the following - store. - - Ensures that all - memory operations - to memory have - completed before - performing the - store that is being - released. + - If OpenCL, omit - If OpenCL, omit + lgkmcnt(0). lgkmcnt(0). 
+ - Could be split into - Could be split into + separate s_waitcnt separate s_waitcnt + vmcnt(0) and vmcnt(0), s_waitcnt vscnt(0) + s_waitcnt and s_waitcnt + lgkmcnt(0) to allow lgkmcnt(0) to allow + them to be them to be + independently moved independently moved + according to the according to the + following rules. following rules. + - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0) + must happen after must happen after + any preceding any preceding + global/generic global/generic + load/store/load load/load + atomic/store atomic/ + atomic/atomicrmw. atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0) + must happen after must happen after + any preceding any preceding + local/generic local/generic + load/store/load load/store/load + atomic/store atomic/store + atomic/atomicrmw. atomic/atomicrmw. + - Must happen before - Must happen before + the following the following + store. store. + - Ensures that all - Ensures that all + memory operations memory operations + to memory have to memory have + completed before completed before + performing the performing the + store that is being store that is being + released. released. - 2. buffer/global/ds/flat_store - atomicrmw release - singlethread - global 1. buffer/global/ds/flat_atomic + 2. buffer/global/ds/flat_store 2. buffer/global/ds/flat_store + atomicrmw release - singlethread - global 1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic - wavefront - local - generic - atomicrmw release - workgroup - global 1. s_waitcnt lgkmcnt(0) + atomicrmw release - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) & + vmcnt(0) & vscnt(0) + - If CU wavefront execution mode, omit vmcnt and + vscnt. - If OpenCL, omit. + - Must happen after any preceding local/generic load/store/load atomic/store atomic/atomicrmw. 
- - Must happen before - the following - atomicrmw. - - Ensures that all - memory operations - to local have - completed before - performing the - atomicrmw that is - being released. + - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store + atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) + must happen after + any preceding + local/generic + load/store/load + atomic/store + atomic/atomicrmw. + - Must happen before - Must happen before + the following the following + atomicrmw. atomicrmw. + - Ensures that all - Ensures that all + memory operations memory operations + to local have have + completed before completed before + performing the performing the + atomicrmw that is atomicrmw that is + being released. being released. - 2. buffer/global/flat_atomic - atomicrmw release - workgroup - local 1. ds_atomic - atomicrmw release - workgroup - generic 1. s_waitcnt lgkmcnt(0) + 2. buffer/global/flat_atomic 2. buffer/global_atomic + atomicrmw release - workgroup - local 1. waitcnt vmcnt(0) & vscnt(0) - - If OpenCL, omit. + - If CU wavefront execution mode, omit. + - If OpenCL, omit. + - Could be split into + separate s_waitcnt + vmcnt(0) and s_waitcnt + vscnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. + - Must happen before + the following + store. 
+ - Ensures that all + global memory + operations have + completed before + performing the + store that is being + released. + + 1. ds_atomic 2. ds_atomic + atomicrmw release - workgroup - generic 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) & + vmcnt(0) & vscnt(0) + + - If CU wavefront execution mode, omit vmcnt and + vscnt. + - If OpenCL, omit. - If OpenCL, omit + waitcnt lgkmcnt(0). - Must happen after any preceding local/generic load/store/load atomic/store atomic/atomicrmw. - - Must happen before - the following - atomicrmw. - - Ensures that all - memory operations - to local have - completed before - performing the - atomicrmw that is - being released. + - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store + atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) + must happen after + any preceding + local/generic load/store/load + atomic/store atomic/atomicrmw. + - Must happen before - Must happen before + the following the following + atomicrmw. atomicrmw. + - Ensures that all - Ensures that all + memory operations memory operations + to local have have + completed before completed before + performing the performing the + atomicrmw that is atomicrmw that is + being released. being released. - 2. flat_atomic - atomicrmw release - agent - global 1. s_waitcnt lgkmcnt(0) & - - system - generic vmcnt(0) + 2. flat_atomic 2. flat_atomic + atomicrmw release - agent - global 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) & + - system - generic vmcnt(0) vmcnt(0) & vscnt(0) - - If OpenCL, omit - lgkmcnt(0).
- - Could be split into - separate s_waitcnt - vmcnt(0) and - s_waitcnt - lgkmcnt(0) to allow - them to be - independently moved - according to the - following rules. - - s_waitcnt vmcnt(0) - must happen after - any preceding - global/generic - load/store/load - atomic/store + - If OpenCL, omit - If OpenCL, omit + lgkmcnt(0). lgkmcnt(0). + - Could be split into - Could be split into + separate s_waitcnt separate s_waitcnt + vmcnt(0) and vmcnt(0), s_waitcnt + s_waitcnt vscnt(0) and s_waitcnt + lgkmcnt(0) to allow lgkmcnt(0) to allow + them to be them to be + independently moved independently moved + according to the according to the + following rules. following rules. + - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0) + must happen after must happen after + any preceding any preceding + global/generic global/generic + load/store/load load/load atomic/ + atomic/store atomicrmw-with-return-value. atomic/atomicrmw. - - s_waitcnt lgkmcnt(0) - must happen after - any preceding - local/generic - load/store/load - atomic/store - atomic/atomicrmw. - - Must happen before - the following - atomicrmw. - - Ensures that all - memory operations - to global and local - have completed - before performing - the atomicrmw that - is being released. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0) + must happen after must happen after + any preceding any preceding + local/generic local/generic + load/store/load load/store/load + atomic/store atomic/store + atomic/atomicrmw. atomic/atomicrmw. + - Must happen before - Must happen before + the following the following + atomicrmw. atomicrmw. + - Ensures that all - Ensures that all + memory operations memory operations + to global and local to global and local + have completed have completed + before performing before performing + the atomicrmw that the atomicrmw that + is being released. is being released. - 2. 
buffer/global/ds/flat_atomic - fence release - singlethread *none* *none* + 2. buffer/global/ds/flat_atomic 2. buffer/global/ds/flat_atomic + fence release - singlethread *none* *none* *none* - wavefront - fence release - workgroup *none* 1. s_waitcnt lgkmcnt(0) + fence release - workgroup *none* 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) & + vmcnt(0) & vscnt(0) - - If OpenCL and - address space is - not generic, omit. - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. + - If CU wavefront execution mode, omit vmcnt and + vscnt. + - If OpenCL and - If OpenCL and + address space is address space is + not generic, omit. not generic, omit + lgkmcnt(0). + - If OpenCL and + address space is + local, omit + vmcnt(0) and vscnt(0). + - However, since LLVM - However, since LLVM + currently has no currently has no + address space on address space on + the fence need to the fence need to + conservatively conservatively + always generate. If always generate. If + fence had an fence had an + address space then address space then + set to address set to address + space of OpenCL space of OpenCL + fence flag, or to fence flag, or to + generic if both generic if both + local and global local and global + flags are flags are + specified. specified. - Must happen after any preceding local/generic load/load atomic/store/store atomic/atomicrmw. - - Must happen before - any following store - atomic/atomicrmw - with an equal or - wider sync scope - and memory ordering - stronger than - unordered (this is - termed the - fence-paired-atomic). - - Ensures that all - memory operations - to local have - completed before - performing the - following - fence-paired-atomic. 
+ - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic + load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) + must happen after + any preceding + local/generic + load/store/load + atomic/store atomic/ + atomicrmw. + - Must happen before - Must happen before + any following store any following store + atomic/atomicrmw atomic/atomicrmw + with an equal or with an equal or + wider sync scope wider sync scope + and memory ordering and memory ordering + stronger than stronger than + unordered (this is unordered (this is + termed the termed the + fence-paired-atomic). fence-paired-atomic). + - Ensures that all - Ensures that all + memory operations memory operations + to local have have + completed before completed before + performing the performing the + following following + fence-paired-atomic. fence-paired-atomic. - fence release - agent *none* 1. s_waitcnt lgkmcnt(0) & - - system vmcnt(0) + fence release - agent *none* 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) & + - system vmcnt(0) vmcnt(0) & vscnt(0) - - If OpenCL and - address space is - not generic, omit - lgkmcnt(0). - - If OpenCL and - address space is - local, omit - vmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate. If - fence had an - address space then - set to address - space of OpenCL - fence flag, or to - generic if both - local and global - flags are - specified. - - Could be split into - separate s_waitcnt - vmcnt(0) and - s_waitcnt - lgkmcnt(0) to allow - them to be - independently moved - according to the - following rules. 
- - s_waitcnt vmcnt(0) - must happen after - any preceding - global/generic - load/store/load - atomic/store + - If OpenCL and - If OpenCL and + address space is address space is + not generic, omit not generic, omit + lgkmcnt(0). lgkmcnt(0). + - If OpenCL and - If OpenCL and + address space is address space is + local, omit local, omit + vmcnt(0). vmcnt(0) and vscnt(0). + - However, since LLVM - However, since LLVM + currently has no currently has no + address space on address space on + the fence need to the fence need to + conservatively conservatively + always generate. If always generate. If + fence had an fence had an + address space then address space then + set to address set to address + space of OpenCL space of OpenCL + fence flag, or to fence flag, or to + generic if both generic if both + local and global local and global + flags are flags are + specified. specified. + - Could be split into - Could be split into + separate s_waitcnt separate s_waitcnt + vmcnt(0) and vmcnt(0), s_waitcnt + s_waitcnt vscnt(0) and s_waitcnt + lgkmcnt(0) to allow lgkmcnt(0) to allow + them to be them to be + independently moved independently moved + according to the according to the + following rules. following rules. + - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0) + must happen after must happen after + any preceding any preceding + global/generic global/generic + load/store/load load/load atomic/ + atomic/store atomicrmw-with-return-value. atomic/atomicrmw. - - s_waitcnt lgkmcnt(0) - must happen after - any preceding - local/generic - load/store/load - atomic/store - atomic/atomicrmw. - - Must happen before - any following store - atomic/atomicrmw - with an equal or - wider sync scope - and memory ordering - stronger than - unordered (this is - termed the - fence-paired-atomic). - - Ensures that all - memory operations - have - completed before - performing the - following - fence-paired-atomic. 
+ - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0) + must happen after must happen after + any preceding any preceding + local/generic local/generic + load/store/load load/store/load + atomic/store atomic/store + atomic/atomicrmw. atomic/atomicrmw. + - Must happen before - Must happen before + any following store any following store + atomic/atomicrmw atomic/atomicrmw + with an equal or with an equal or + wider sync scope wider sync scope + and memory ordering and memory ordering + stronger than stronger than + unordered (this is unordered (this is + termed the termed the + fence-paired-atomic). fence-paired-atomic). + - Ensures that all - Ensures that all + memory operations memory operations + have have + completed before completed before + performing the performing the + following following + fence-paired-atomic. fence-paired-atomic. **Acquire-Release Atomic** - ----------------------------------------------------------------------------------- - atomicrmw acq_rel - singlethread - global 1. buffer/global/ds/flat_atomic + ---------------------------------------------------------------------------------------------------------------------- + atomicrmw acq_rel - singlethread - global 1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic - wavefront - local - generic - atomicrmw acq_rel - workgroup - global 1. s_waitcnt lgkmcnt(0) + atomicrmw acq_rel - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) & + vmcnt(0) & vscnt(0) - - If OpenCL, omit. + - If CU wavefront execution mode, omit vmcnt and + vscnt. + - If OpenCL, omit. - If OpenCL, omit + s_waitcnt lgkmcnt(0). + - Must happen after - Must happen after + any preceding any preceding + local/generic local/generic + load/store/load load/store/load + atomic/store atomic/store + atomic/atomicrmw. atomic/atomicrmw. 
+ - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store + atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) + must happen after + any preceding + local/generic load/store/load + atomic/store atomic/atomicrmw. + - Must happen before - Must happen before + the following the following + atomicrmw. atomicrmw. + - Ensures that all - Ensures that all + memory operations memory operations + to local have have + completed before completed before + performing the performing the + atomicrmw that is atomicrmw that is + being released. being released. + + 2. buffer/global/flat_atomic 2. buffer/global_atomic + 3. s_waitcnt vm/vscnt(0) + + - If CU wavefront execution mode, omit vm/vscnt. + - Use vmcnt if atomic with + return and vscnt if atomic + with no-return. + waitcnt lgkmcnt(0). + - Must happen before + the following + buffer_gl0_inv. + - Ensures any + following global + data read is no + older than the + atomicrmw value + being acquired. + + 4. buffer_gl0_inv + + - If CU wavefront execution mode, omit. + - Ensures that + following + loads will not see + stale data. + + atomicrmw acq_rel - workgroup - local 1. waitcnt vmcnt(0) & vscnt(0) + + - If CU wavefront execution mode, omit. + - If OpenCL, omit. + - Could be split into + separate s_waitcnt + vmcnt(0) and s_waitcnt + vscnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. 
+ - Must happen before + the following + store. + - Ensures that all + global memory + operations have + completed before + performing the + store that is being + released. + + 1. ds_atomic 2. ds_atomic + 2. s_waitcnt lgkmcnt(0) 3. s_waitcnt lgkmcnt(0) + + - If OpenCL, omit. - If OpenCL, omit. + - Must happen before - Must happen before + any following the following + global/generic buffer_gl0_inv. + load/load + atomic/store/store + atomic/atomicrmw. + - Ensures any - Ensures any + following global following global + data read is no data read is no + older than the load older than the load + atomic value being atomic value being + acquired. acquired. + + 4. buffer_gl0_inv + + - If CU wavefront execution mode, omit. + - If OpenCL, omit. + - Ensures that + following + loads will not see + stale data. + + atomicrmw acq_rel - workgroup - generic 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) & + vmcnt(0) & vscnt(0) + + - If CU wavefront execution mode, omit vmcnt and + vscnt. + - If OpenCL, omit. - If OpenCL, omit + waitcnt lgkmcnt(0). - Must happen after any preceding local/generic load/store/load atomic/store atomic/atomicrmw. - - Must happen before - the following - atomicrmw. - - Ensures that all - memory operations - to local have - completed before - performing the - atomicrmw that is - being released. + - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store + atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) + must happen after + any preceding + local/generic load/store/load + atomic/store atomic/atomicrmw. + - Must happen before - Must happen before + the following the following + atomicrmw. atomicrmw.
+ - Ensures that all - Ensures that all + memory operations memory operations + to local have have + completed before completed before + performing the performing the + atomicrmw that is atomicrmw that is + being released. being released. - 2. buffer/global/flat_atomic - atomicrmw acq_rel - workgroup - local 1. ds_atomic - 2. s_waitcnt lgkmcnt(0) + 2. flat_atomic 2. flat_atomic + 3. s_waitcnt lgkmcnt(0) 3. s_waitcnt lgkmcnt(0) & + vm/vscnt(0) - - If OpenCL, omit. - - Must happen before - any following - global/generic + - If CU wavefront execution mode, omit vm/vscnt. + - If OpenCL, omit. - If OpenCL, omit + waitcnt lgkmcnt(0). + - Must happen before - Must happen before + any following the following + global/generic buffer_gl0_inv. load/load atomic/store/store atomic/atomicrmw. - - Ensures any - following global - data read is no - older than the load - atomic value being - acquired. + - Ensures any - Ensures any + following global following global + data read is no data read is no + older than the load older than the load + atomic value being atomic value being + acquired. acquired. - atomicrmw acq_rel - workgroup - generic 1. s_waitcnt lgkmcnt(0) + 3. buffer_gl0_inv - - If OpenCL, omit. - - Must happen after - any preceding - local/generic - load/store/load - atomic/store + - If CU wavefront execution mode, omit. + - Ensures that + following + loads will not see + stale data. + + atomicrmw acq_rel - agent - global 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) & + - system vmcnt(0) vmcnt(0) & vscnt(0) + + - If OpenCL, omit - If OpenCL, omit + lgkmcnt(0). lgkmcnt(0). + - Could be split into - Could be split into + separate s_waitcnt separate s_waitcnt + vmcnt(0) and vmcnt(0), s_waitcnt + s_waitcnt vscnt(0) and s_waitcnt + lgkmcnt(0) to allow lgkmcnt(0) to allow + them to be them to be + independently moved independently moved + according to the according to the + following rules. following rules. 
+ - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0) + must happen after must happen after + any preceding any preceding + global/generic global/generic + load/store/load load/load atomic/ + atomic/store atomicrmw-with-return-value. atomic/atomicrmw. - - Must happen before - the following - atomicrmw. - - Ensures that all - memory operations - to local have - completed before - performing the - atomicrmw that is - being released. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0) + must happen after must happen after + any preceding any preceding + local/generic local/generic + load/store/load load/store/load + atomic/store atomic/store + atomic/atomicrmw. atomic/atomicrmw. + - Must happen before - Must happen before + the following the following + atomicrmw. atomicrmw. + - Ensures that all - Ensures that all + memory operations memory operations + to global have to global have + completed before completed before + performing the performing the + atomicrmw that is atomicrmw that is + being released. being released. - 2. flat_atomic - 3. s_waitcnt lgkmcnt(0) + 2. buffer/global/flat_atomic 2. buffer/global_atomic + 3. s_waitcnt vmcnt(0) 3. s_waitcnt vm/vscnt(0) - - If OpenCL, omit. - - Must happen before - any following - global/generic - load/load - atomic/store/store + - Use vmcnt if atomic with + return and vscnt if atomic + with no-return. + waitcnt lgkmcnt(0). + - Must happen before - Must happen before + following following + buffer_wbinvl1_vol. buffer_gl*_inv. + - Ensures the - Ensures the + atomicrmw has atomicrmw has + completed before completed before + invalidating the invalidating the + cache. caches. + + 4. buffer_wbinvl1_vol 4. buffer_gl0_inv; + buffer_gl1_inv + + - Must happen before - Must happen before + any following any following + global/generic global/generic + load/load load/load + atomic/atomicrmw. atomic/atomicrmw. 
+ - Ensures that - Ensures that + following loads following loads + will not see stale will not see stale + global data. global data. + + atomicrmw acq_rel - agent - generic 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) & + - system vmcnt(0) vmcnt(0) & vscnt(0) + + - If OpenCL, omit - If OpenCL, omit + lgkmcnt(0). lgkmcnt(0). + - Could be split into - Could be split into + separate s_waitcnt separate s_waitcnt + vmcnt(0) and vmcnt(0), s_waitcnt + s_waitcnt vscnt(0) and s_waitcnt + lgkmcnt(0) to allow lgkmcnt(0) to allow + them to be them to be + independently moved independently moved + according to the according to the + following rules. following rules. + - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0) + must happen after must happen after + any preceding any preceding + global/generic global/generic + load/store/load load/load atomic/ + atomic/store atomicrmw-with-return-value. atomic/atomicrmw. - - Ensures any - following global - data read is no - older than the load - atomic value being - acquired. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0) + must happen after must happen after + any preceding any preceding + local/generic local/generic + load/store/load load/store/load + atomic/store atomic/store + atomic/atomicrmw. atomic/atomicrmw. + - Must happen before - Must happen before + the following the following + atomicrmw. atomicrmw. + - Ensures that all - Ensures that all + memory operations memory operations + to global have have + completed before completed before + performing the performing the + atomicrmw that is atomicrmw that is + being released. being released. - atomicrmw acq_rel - agent - global 1. s_waitcnt lgkmcnt(0) & - - system vmcnt(0) + 2. flat_atomic 2. flat_atomic + 3. s_waitcnt vmcnt(0) & 3. s_waitcnt vm/vscnt(0) & + lgkmcnt(0) lgkmcnt(0) - - If OpenCL, omit - lgkmcnt(0).
- - Could be split into - separate s_waitcnt - vmcnt(0) and - s_waitcnt - lgkmcnt(0) to allow - them to be - independently moved - according to the - following rules. - - s_waitcnt vmcnt(0) - must happen after - any preceding - global/generic - load/store/load - atomic/store - atomic/atomicrmw. - - s_waitcnt lgkmcnt(0) - must happen after - any preceding - local/generic - load/store/load - atomic/store - atomic/atomicrmw. - - Must happen before - the following - atomicrmw. - - Ensures that all - memory operations - to global have - completed before - performing the - atomicrmw that is - being released. + - If OpenCL, omit - If OpenCL, omit + lgkmcnt(0). lgkmcnt(0). + - Use vmcnt if atomic with + return and vscnt if atomic + with no-return. + - Must happen before - Must happen before + following following + buffer_wbinvl1_vol. buffer_gl*_inv. + - Ensures the - Ensures the + atomicrmw has atomicrmw has + completed before completed before + invalidating the invalidating the + cache. caches. - 2. buffer/global/flat_atomic - 3. s_waitcnt vmcnt(0) + 4. buffer_wbinvl1_vol 4. buffer_gl0_inv; + buffer_gl1_inv - - Must happen before - following - buffer_wbinvl1_vol. - - Ensures the - atomicrmw has - completed before - invalidating the - cache. + - Must happen before - Must happen before + any following any following + global/generic global/generic + load/load load/load + atomic/atomicrmw. atomic/atomicrmw. + - Ensures that - Ensures that + following loads following loads + will not see stale will not see stale + global data. global data. - 4. buffer_wbinvl1_vol - - - Must happen before - any following - global/generic - load/load - atomic/atomicrmw. - - Ensures that - following loads - will not see stale - global data. - - atomicrmw acq_rel - agent - generic 1. s_waitcnt lgkmcnt(0) & - - system vmcnt(0) - - - If OpenCL, omit - lgkmcnt(0). 
- - Could be split into - separate s_waitcnt - vmcnt(0) and - s_waitcnt - lgkmcnt(0) to allow - them to be - independently moved - according to the - following rules. - - s_waitcnt vmcnt(0) - must happen after - any preceding - global/generic - load/store/load - atomic/store - atomic/atomicrmw. - - s_waitcnt lgkmcnt(0) - must happen after - any preceding - local/generic - load/store/load - atomic/store - atomic/atomicrmw. - - Must happen before - the following - atomicrmw. - - Ensures that all - memory operations - to global have - completed before - performing the - atomicrmw that is - being released. - - 2. flat_atomic - 3. s_waitcnt vmcnt(0) & - lgkmcnt(0) - - - If OpenCL, omit - lgkmcnt(0). - - Must happen before - following - buffer_wbinvl1_vol. - - Ensures the - atomicrmw has - completed before - invalidating the - cache. - - 4. buffer_wbinvl1_vol - - - Must happen before - any following - global/generic - load/load - atomic/atomicrmw. - - Ensures that - following loads - will not see stale - global data. - - fence acq_rel - singlethread *none* *none* + fence acq_rel - singlethread *none* *none* *none* - wavefront - fence acq_rel - workgroup *none* 1. s_waitcnt lgkmcnt(0) + fence acq_rel - workgroup *none* 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) & + vmcnt(0) & vscnt(0) - - If OpenCL and - address space is - not generic, omit. - - However, - since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). + - If CU wavefront execution mode, omit vmcnt and + vscnt. + - If OpenCL and - If OpenCL and + address space is address space is + not generic, omit. not generic, omit + lgkmcnt(0). + - If OpenCL and + address space is + local, omit + vmcnt(0) and vscnt(0). 
+ - However, - However, + since LLVM since LLVM + currently has no currently has no + address space on address space on + the fence need to the fence need to + conservatively conservatively + always generate always generate + (see comment for (see comment for + previous fence). previous fence). - Must happen after any preceding local/generic load/load atomic/store/store atomic/atomicrmw. - - Must happen before - any following - global/generic - load/load - atomic/store/store - atomic/atomicrmw. - - Ensures that all - memory operations - to local have - completed before - performing any - following global - memory operations. - - Ensures that the - preceding - local/generic load - atomic/atomicrmw - with an equal or - wider sync scope - and memory ordering - stronger than - unordered (this is - termed the - acquire-fence-paired-atomic - ) has completed - before following - global memory - operations. This - satisfies the - requirements of - acquire. - - Ensures that all - previous memory - operations have - completed before a - following - local/generic store - atomic/atomicrmw - with an equal or - wider sync scope - and memory ordering - stronger than - unordered (this is - termed the - release-fence-paired-atomic - ). This satisfies the - requirements of - release. + - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. + - s_waitcnt vmcnt(0) + must happen after + any preceding + global/generic + load/load + atomic/ + atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) + must happen after + any preceding + local/generic + load/store/load + atomic/store atomic/ + atomicrmw. 
+ - Must happen before - Must happen before + any following any following + global/generic global/generic + load/load load/load + atomic/store/store atomic/store/store + atomic/atomicrmw. atomic/atomicrmw. + - Ensures that all - Ensures that all + memory operations memory operations + to local have have + completed before completed before + performing any performing any + following global following global + memory operations. memory operations. + - Ensures that the - Ensures that the + preceding preceding + local/generic load local/generic load + atomic/atomicrmw atomic/atomicrmw + with an equal or with an equal or + wider sync scope wider sync scope + and memory ordering and memory ordering + stronger than stronger than + unordered (this is unordered (this is + termed the termed the + acquire-fence-paired-atomic acquire-fence-paired-atomic + ) has completed ) has completed + before following before following + global memory global memory + operations. This operations. This + satisfies the satisfies the + requirements of requirements of + acquire. acquire. + - Ensures that all - Ensures that all + previous memory previous memory + operations have operations have + completed before a completed before a + following following + local/generic store local/generic store + atomic/atomicrmw atomic/atomicrmw + with an equal or with an equal or + wider sync scope wider sync scope + and memory ordering and memory ordering + stronger than stronger than + unordered (this is unordered (this is + termed the termed the + release-fence-paired-atomic release-fence-paired-atomic + ). This satisfies the ). This satisfies the + requirements of requirements of + release. release. + - Must happen before + the following + buffer_gl0_inv. + - Ensures that the + acquire-fence-paired + atomic has completed + before invalidating + the + cache. Therefore + any following + locations read must + be no older than + the value read by + the + acquire-fence-paired-atomic. 
- fence acq_rel - agent *none* 1. s_waitcnt lgkmcnt(0) & - - system vmcnt(0) + 3. buffer_gl0_inv - - If OpenCL and - address space is - not generic, omit - lgkmcnt(0). - - However, since LLVM - currently has no - address space on - the fence need to - conservatively - always generate - (see comment for - previous fence). - - Could be split into - separate s_waitcnt - vmcnt(0) and - s_waitcnt - lgkmcnt(0) to allow - them to be - independently moved - according to the - following rules. - - s_waitcnt vmcnt(0) - must happen after - any preceding - global/generic - load/store/load - atomic/store - atomic/atomicrmw. - - s_waitcnt lgkmcnt(0) - must happen after - any preceding - local/generic - load/store/load - atomic/store - atomic/atomicrmw. - - Must happen before - the following - buffer_wbinvl1_vol. - - Ensures that the - preceding - global/local/generic - load - atomic/atomicrmw - with an equal or - wider sync scope - and memory ordering - stronger than - unordered (this is - termed the - acquire-fence-paired-atomic - ) has completed - before invalidating - the cache. This - satisfies the - requirements of - acquire. - - Ensures that all - previous memory - operations have - completed before a - following - global/local/generic - store - atomic/atomicrmw - with an equal or - wider sync scope - and memory ordering - stronger than - unordered (this is - termed the - release-fence-paired-atomic - ). This satisfies the - requirements of - release. + - If CU wavefront execution mode, omit. + - Ensures that + following + loads will not see + stale data. - 2. buffer_wbinvl1_vol + fence acq_rel - agent *none* 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) & + - system vmcnt(0) vmcnt(0) & vscnt(0) - - Must happen before - any following - global/generic - load/load - atomic/store/store - atomic/atomicrmw. - - Ensures that - following loads - will not see stale - global data. This - satisfies the - requirements of - acquire. 
+ - If OpenCL and - If OpenCL and + address space is address space is + not generic, omit not generic, omit + lgkmcnt(0). lgkmcnt(0). + - If OpenCL and + address space is + local, omit + vmcnt(0) and vscnt(0). + - However, since LLVM - However, since LLVM + currently has no currently has no + address space on address space on + the fence need to the fence need to + conservatively conservatively + always generate always generate + (see comment for (see comment for + previous fence). previous fence). + - Could be split into - Could be split into + separate s_waitcnt separate s_waitcnt + vmcnt(0) and vmcnt(0), s_waitcnt + s_waitcnt vscnt(0) and s_waitcnt + lgkmcnt(0) to allow lgkmcnt(0) to allow + them to be them to be + independently moved independently moved + according to the according to the + following rules. following rules. + - s_waitcnt vmcnt(0) - s_waitcnt vmcnt(0) + must happen after must happen after + any preceding any preceding + global/generic global/generic + load/store/load load/load + atomic/store atomic/ + atomic/atomicrmw. atomicrmw-with-return-value. + - s_waitcnt vscnt(0) + must happen after + any preceding + global/generic + store/store atomic/ + atomicrmw-no-return-value. + - s_waitcnt lgkmcnt(0) - s_waitcnt lgkmcnt(0) + must happen after must happen after + any preceding any preceding + local/generic local/generic + load/store/load load/store/load + atomic/store atomic/store + atomic/atomicrmw. atomic/atomicrmw. + - Must happen before - Must happen before + the following the following + buffer_wbinvl1_vol. buffer_gl*_inv. 
+ - Ensures that the - Ensures that the + preceding preceding + global/local/generic global/local/generic + load load + atomic/atomicrmw atomic/atomicrmw + with an equal or with an equal or + wider sync scope wider sync scope + and memory ordering and memory ordering + stronger than stronger than + unordered (this is unordered (this is + termed the termed the + acquire-fence-paired-atomic acquire-fence-paired-atomic + ) has completed ) has completed + before invalidating before invalidating + the cache. This the caches. This + satisfies the satisfies the + requirements of requirements of + acquire. acquire. + - Ensures that all - Ensures that all + previous memory previous memory + operations have operations have + completed before a completed before a + following following + global/local/generic global/local/generic + store store + atomic/atomicrmw atomic/atomicrmw + with an equal or with an equal or + wider sync scope wider sync scope + and memory ordering and memory ordering + stronger than stronger than + unordered (this is unordered (this is + termed the termed the + release-fence-paired-atomic release-fence-paired-atomic + ). This satisfies the ). This satisfies the + requirements of requirements of + release. release. + + 2. buffer_wbinvl1_vol 2. buffer_gl0_inv; + buffer_gl1_inv + + - Must happen before - Must happen before + any following any following + global/generic global/generic + load/load load/load + atomic/store/store atomic/store/store + atomic/atomicrmw. atomic/atomicrmw. + - Ensures that - Ensures that + following loads following loads + will not see stale will not see stale + global data. This global data. This + satisfies the satisfies the + requirements of requirements of + acquire. acquire. 
**Sequential Consistent Atomic** - ----------------------------------------------------------------------------------- - load atomic seq_cst - singlethread - global *Same as corresponding - - wavefront - local load atomic acquire, - - generic except must generated - all instructions even - for OpenCL.* - load atomic seq_cst - workgroup - global 1. s_waitcnt lgkmcnt(0) - - generic - - Must - happen after - preceding - global/generic load - atomic/store - atomic/atomicrmw - with memory - ordering of seq_cst - and with equal or - wider sync scope. - (Note that seq_cst - fences have their - own s_waitcnt - lgkmcnt(0) and so do - not need to be - considered.) - - Ensures any - preceding - sequential - consistent local - memory instructions - have completed - before executing - this sequentially - consistent - instruction. This - prevents reordering - a seq_cst store - followed by a - seq_cst load. (Note - that seq_cst is - stronger than - acquire/release as - the reordering of - load acquire - followed by a store - release is - prevented by the - waitcnt of - the release, but - there is nothing - preventing a store - release followed by - load acquire from - competing out of - order.) + ---------------------------------------------------------------------------------------------------------------------- + load atomic seq_cst - singlethread - global *Same as corresponding *Same as corresponding + - wavefront - local load atomic acquire, load atomic acquire, + - generic except must generate except must generate + all instructions even all instructions even + for OpenCL.* for OpenCL.* + load atomic seq_cst - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) & + - generic vmcnt(0) & vscnt(0) - 2. *Following - instructions same as - corresponding load - atomic acquire, - except must generated - all instructions even - for OpenCL.* + - If CU wavefront execution mode, omit vmcnt and + vscnt.
+ - Could be split into + separate s_waitcnt + vmcnt(0), s_waitcnt + vscnt(0) and s_waitcnt + lgkmcnt(0) to allow + them to be + independently moved + according to the + following rules. + - Must - waitcnt lgkmcnt(0) must + happen after happen after + preceding preceding + global/generic load local load + atomic/store atomic/store + atomic/atomicrmw atomic/atomicrmw + with memory with memory + ordering of seq_cst ordering of seq_cst + and with equal or and with equal or + wider sync scope. wider sync scope. + (Note that seq_cst (Note that seq_cst + fences have their fences have their + own s_waitcnt own s_waitcnt + lgkmcnt(0) and so do lgkmcnt(0) and so do + not need to be not need to be + considered.) considered.) + - waitcnt vmcnt(0) + Must happen after + preceding + global/generic load + atomic/ + atomicrmw-with-return-value + with memory + ordering of seq_cst + and with equal or + wider sync scope. + (Note that seq_cst + fences have their + own s_waitcnt + vmcnt(0) and so do + not need to be + considered.) + - waitcnt vscnt(0) + Must happen after + preceding + global/generic store + atomic/ + atomicrmw-no-return-value + with memory + ordering of seq_cst + and with equal or + wider sync scope. + (Note that seq_cst + fences have their + own s_waitcnt + vscnt(0) and so do + not need to be + considered.) + - Ensures any - Ensures any + preceding preceding + sequential sequential + consistent local consistent global/local + memory instructions memory instructions + have completed have completed + before executing before executing + this sequentially this sequentially + consistent consistent + instruction. This instruction. This + prevents reordering prevents reordering + a seq_cst store a seq_cst store + followed by a followed by a + seq_cst load. (Note seq_cst load. 
(Note + that seq_cst is that seq_cst is + stronger than stronger than + acquire/release as acquire/release as + the reordering of the reordering of + load acquire load acquire + followed by a store followed by a store + release is release is + prevented by the prevented by the + waitcnt of waitcnt of + the release, but the release, but + there is nothing there is nothing + preventing a store preventing a store + release followed by release followed by + load acquire from load acquire from + competing out of competing out of + order.) order.) + + 2. *Following 2. *Following + instructions same as instructions same as + corresponding load corresponding load + atomic acquire, atomic acquire, + except must generated except must generated + all instructions even all instructions even + for OpenCL.* for OpenCL.* load atomic seq_cst - workgroup - local *Same as corresponding load atomic acquire, except must generated all instructions even for OpenCL.* - load atomic seq_cst - agent - global 1. s_waitcnt lgkmcnt(0) & - - system - generic vmcnt(0) - - Could be split into - separate s_waitcnt - vmcnt(0) - and s_waitcnt - lgkmcnt(0) to allow - them to be - independently moved - according to the - following rules. - - waitcnt lgkmcnt(0) - must happen after - preceding - global/generic load - atomic/store - atomic/atomicrmw - with memory - ordering of seq_cst - and with equal or - wider sync scope. - (Note that seq_cst - fences have their - own s_waitcnt - lgkmcnt(0) and so do - not need to be - considered.) - - waitcnt vmcnt(0) - must happen after - preceding - global/generic load - atomic/store - atomic/atomicrmw - with memory - ordering of seq_cst - and with equal or - wider sync scope. - (Note that seq_cst - fences have their - own s_waitcnt - vmcnt(0) and so do - not need to be - considered.) - - Ensures any - preceding - sequential - consistent global - memory instructions - have completed - before executing - this sequentially - consistent - instruction. 
This - prevents reordering - a seq_cst store - followed by a - seq_cst load. (Note - that seq_cst is - stronger than - acquire/release as - the reordering of - load acquire - followed by a store - release is - prevented by the - waitcnt of - the release, but - there is nothing - preventing a store - release followed by - load acquire from - competing out of - order.) + 1. s_waitcnt vmcnt(0) & vscnt(0) - 2. *Following - instructions same as - corresponding load - atomic acquire, - except must generated - all instructions even - for OpenCL.* - store atomic seq_cst - singlethread - global *Same as corresponding - - wavefront - local store atomic release, - - workgroup - generic except must generated - all instructions even - for OpenCL.* - store atomic seq_cst - agent - global *Same as corresponding - - system - generic store atomic release, - except must generated - all instructions even - for OpenCL.* - atomicrmw seq_cst - singlethread - global *Same as corresponding - - wavefront - local atomicrmw acq_rel, - - workgroup - generic except must generated - all instructions even - for OpenCL.* - atomicrmw seq_cst - agent - global *Same as corresponding - - system - generic atomicrmw acq_rel, - except must generated - all instructions even - for OpenCL.* - fence seq_cst - singlethread *none* *Same as corresponding - - wavefront fence acq_rel, - - workgroup except must generated - - agent all instructions even - - system for OpenCL.* - ============ ============ ============== ========== =============================== + - If CU wavefront execution mode, omit. + - Could be split into + separate s_waitcnt + vmcnt(0) and s_waitcnt + vscnt(0) to allow + them to be + independently moved + according to the + following rules. + - waitcnt vmcnt(0) + Must happen after + preceding + global/generic load + atomic/ + atomicrmw-with-return-value + with memory + ordering of seq_cst + and with equal or + wider sync scope. 
+ (Note that seq_cst + fences have their + own s_waitcnt + vmcnt(0) and so do + not need to be + considered.) + - waitcnt vscnt(0) + Must happen after + preceding + global/generic store + atomic/ + atomicrmw-no-return-value + with memory + ordering of seq_cst + and with equal or + wider sync scope. + (Note that seq_cst + fences have their + own s_waitcnt + vscnt(0) and so do + not need to be + considered.) + - Ensures any + preceding + sequential + consistent global + memory instructions + have completed + before executing + this sequentially + consistent + instruction. This + prevents reordering + a seq_cst store + followed by a + seq_cst load. (Note + that seq_cst is + stronger than + acquire/release as + the reordering of + load acquire + followed by a store + release is + prevented by the + waitcnt of + the release, but + there is nothing + preventing a store + release followed by + load acquire from + competing out of + order.) + + 2. *Following + instructions same as + corresponding load + atomic acquire, + except must generated + all instructions even + for OpenCL.* + + load atomic seq_cst - agent - global 1. s_waitcnt lgkmcnt(0) & 1. s_waitcnt lgkmcnt(0) & + - system - generic vmcnt(0) vmcnt(0) & vscnt(0) + + - Could be split into - Could be split into + separate s_waitcnt separate s_waitcnt + vmcnt(0) vmcnt(0), s_waitcnt + and s_waitcnt vscnt(0) and s_waitcnt + lgkmcnt(0) to allow lgkmcnt(0) to allow + them to be them to be + independently moved independently moved + according to the according to the + following rules. following rules. + - waitcnt lgkmcnt(0) - waitcnt lgkmcnt(0) + must happen after must happen after + preceding preceding + global/generic load local load + atomic/store atomic/store + atomic/atomicrmw atomic/atomicrmw + with memory with memory + ordering of seq_cst ordering of seq_cst + and with equal or and with equal or + wider sync scope. wider sync scope. 
+ (Note that seq_cst (Note that seq_cst + fences have their fences have their + own s_waitcnt own s_waitcnt + lgkmcnt(0) and so do lgkmcnt(0) and so do + not need to be not need to be + considered.) considered.) + - waitcnt vmcnt(0) - waitcnt vmcnt(0) + must happen after must happen after + preceding preceding + global/generic load global/generic load + atomic/store atomic/ + atomic/atomicrmw atomicrmw-with-return-value + with memory with memory + ordering of seq_cst ordering of seq_cst + and with equal or and with equal or + wider sync scope. wider sync scope. + (Note that seq_cst (Note that seq_cst + fences have their fences have their + own s_waitcnt own s_waitcnt + vmcnt(0) and so do vmcnt(0) and so do + not need to be not need to be + considered.) considered.) + - waitcnt vscnt(0) + Must happen after + preceding + global/generic store + atomic/ + atomicrmw-no-return-value + with memory + ordering of seq_cst + and with equal or + wider sync scope. + (Note that seq_cst + fences have their + own s_waitcnt + vscnt(0) and so do + not need to be + considered.) + - Ensures any - Ensures any + preceding preceding + sequential sequential + consistent global consistent global + memory instructions memory instructions + have completed have completed + before executing before executing + this sequentially this sequentially + consistent consistent + instruction. This instruction. This + prevents reordering prevents reordering + a seq_cst store a seq_cst store + followed by a followed by a + seq_cst load. (Note seq_cst load. 
(Note + that seq_cst is that seq_cst is + stronger than stronger than + acquire/release as acquire/release as + the reordering of the reordering of + load acquire load acquire + followed by a store followed by a store + release is release is + prevented by the prevented by the + waitcnt of waitcnt of + the release, but the release, but + there is nothing there is nothing + preventing a store preventing a store + release followed by release followed by + load acquire from load acquire from + competing out of competing out of + order.) order.) + + 2. *Following 2. *Following + instructions same as instructions same as + corresponding load corresponding load + atomic acquire, atomic acquire, + except must generated except must generated + all instructions even all instructions even + for OpenCL.* for OpenCL.* + store atomic seq_cst - singlethread - global *Same as corresponding *Same as corresponding + - wavefront - local store atomic release, store atomic release, + - workgroup - generic except must generated except must generated + all instructions even all instructions even + for OpenCL.* for OpenCL.* + store atomic seq_cst - agent - global *Same as corresponding *Same as corresponding + - system - generic store atomic release, store atomic release, + except must generated except must generated + all instructions even all instructions even + for OpenCL.* for OpenCL.* + atomicrmw seq_cst - singlethread - global *Same as corresponding *Same as corresponding + - wavefront - local atomicrmw acq_rel, atomicrmw acq_rel, + - workgroup - generic except must generated except must generated + all instructions even all instructions even + for OpenCL.* for OpenCL.* + atomicrmw seq_cst - agent - global *Same as corresponding *Same as corresponding + - system - generic atomicrmw acq_rel, atomicrmw acq_rel, + except must generated except must generated + all instructions even all instructions even + for OpenCL.* for OpenCL.* + fence seq_cst - singlethread *none* *Same as 
corresponding *Same as corresponding + - wavefront fence acq_rel, fence acq_rel, + - workgroup except must generated except must generated + - agent all instructions even all instructions even + - system for OpenCL.* for OpenCL.* + ============ ============ ============== ========== =============================== ==================================
 
 The memory order also adds the single thread optimization constraints defined
 in table
-:ref:`amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-gfx6-gfx9-table`.
+:ref:`amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-gfx6-gfx10-table`.
 
-  .. table:: AMDHSA Memory Model Single Thread Optimization Constraints GFX6-GFX9
-     :name: amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-gfx6-gfx9-table
+  .. table:: AMDHSA Memory Model Single Thread Optimization Constraints GFX6-GFX10
+     :name: amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-gfx6-gfx10-table
 
     ============ ==============================================================
     LLVM Memory  Optimization Constraints
 
@@ -4597,7 +5712,7 @@
 Assembler
 ---------
 
 AMDGPU backend has LLVM-MC based assembler which is currently in development.
-It supports AMDGCN GFX6-GFX9.
+It supports AMDGCN GFX6-GFX10.
 
 This section describes general syntax for instructions and operands.
 
@@ -4615,6 +5730,9 @@ Instructions
 
    AMDGPUInstructionSyntax
    AMDGPUInstructionNotation
 
+.. TODO
+   AMDGPUAsmGFX10
+
 An instruction has the following :doc:`syntax`:
 
 ``<``\ *opcode*\ ``> <``\ *operand0*\ ``>, <``\ *operand1*\ ``>,... <``\ *modifier0*\ ``> <``\ *modifier1*\ ``>...``
 
@@ -4632,7 +5750,8 @@
 Note that features under development are not included in this description.
 
 For more information about instructions, their semantics and supported
 combinations of operands, refer to one of instruction set architecture manuals
-[AMD-GCN-GFX6]_, [AMD-GCN-GFX7]_, [AMD-GCN-GFX8]_ and [AMD-GCN-GFX9]_.
+[AMD-GCN-GFX6]_, [AMD-GCN-GFX7]_, [AMD-GCN-GFX8]_, [AMD-GCN-GFX9]_ and
+[AMD-GCN-GFX10]_.
 
 Operands
 ~~~~~~~~
@@ -4929,16 +6048,24 @@
 The list must be terminated by the *.end_amd_kernel_code_t* directive. For any
 amd_kernel_code_t values that are unspecified a default value will be used.
 The default value for all keys is 0, with the following exceptions:
 
-- *kernel_code_version_major* defaults to 1.
-- *machine_kind* defaults to 1.
-- *machine_version_major*, *machine_version_minor*, and
-  *machine_version_stepping* are derived from the value of the -mcpu option
+- *amd_code_version_major* defaults to 1.
+- *amd_kernel_code_version_minor* defaults to 2.
+- *amd_machine_kind* defaults to 1.
+- *amd_machine_version_major*, *amd_machine_version_minor*, and
+  *amd_machine_version_stepping* are derived from the value of the -mcpu option
   that is passed to the assembler.
 - *kernel_code_entry_byte_offset* defaults to 256.
-- *wavefront_size* defaults to 6.
+- *wavefront_size* defaults to 6 for all targets before GFX10. For GFX10
+  onwards defaults to 6 if target feature ``wavefrontsize64`` is enabled,
+  otherwise 5. Note that wavefront size is specified as a power of two, so a
+  value of **n** means a size of 2^ **n**.
+- *call_convention* defaults to -1.
 - *kernarg_segment_alignment*, *group_segment_alignment*, and
   *private_segment_alignment* default to 4. Note that alignments are specified
   as a power of 2, so a value of **n** means an alignment of 2^ **n**.
+- *enable_wgp_mode* defaults to 1 if target feature ``cumode`` is disabled for
+  GFX10 onwards.
+- *enable_mem_ordered* defaults to 1 for GFX10 onwards.
 
 The *.amd_kernel_code_t* directive must be placed immediately after the
 function label and before any instructions.
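The power-of-two encoding described above (a stored value of **n** denotes an actual size or alignment of 2^\ **n**) can be sketched as follows. This is an editor's illustration only; the helper name ``decode_pow2_field`` is hypothetical and not part of the assembler, and only the n-to-2^n rule comes from the text:

```python
# Sketch of the power-of-two encoding used by amd_kernel_code_t fields
# such as wavefront_size and the *_alignment keys: a stored value n
# denotes an actual size/alignment of 2**n.

def decode_pow2_field(n: int) -> int:
    """Decode a power-of-two encoded amd_kernel_code_t field value."""
    return 1 << n

# wavefront_size: 6 encodes 64 lanes (wave64); 5 encodes 32 lanes
# (the GFX10 default when ``wavefrontsize64`` is disabled).
print(decode_pow2_field(6))  # 64
print(decode_pow2_field(5))  # 32

# The *_alignment keys default to 4, i.e. a 16-byte alignment.
print(decode_pow2_field(4))  # 16
```
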
@@ -4976,9 +6103,9 @@ Here is an example of a minimal assembly source file, defining one HSA kernel: compute_pgm_rsrc1_vgprs = 0 compute_pgm_rsrc1_sgprs = 0 compute_pgm_rsrc2_user_sgpr = 2 - kernarg_segment_byte_size = 8 - wavefront_sgpr_count = 2 - workitem_vgpr_count = 3 + compute_pgm_rsrc1_wgp_mode = 0 + compute_pgm_rsrc1_mem_ordered = 0 + compute_pgm_rsrc1_fwd_progress = 1 .end_amd_kernel_code_t s_load_dwordx2 s[0:1], s[0:1] 0x0 @@ -5095,95 +6222,107 @@ terminated by an ``.end_amdhsa_kernel`` directive. .. table:: AMDHSA Kernel Assembler Directives :name: amdhsa-kernel-directives-table - ======================================================== ================ ============ =================== - Directive Default Supported On Description - ======================================================== ================ ============ =================== - ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX9 Controls GROUP_SEGMENT_FIXED_SIZE in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`. - ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX9 Controls PRIVATE_SEGMENT_FIXED_SIZE in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`. - ``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX9 Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`. - ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX9 Controls ENABLE_SGPR_DISPATCH_PTR in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`. - ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX9 Controls ENABLE_SGPR_QUEUE_PTR in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`. - ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX9 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`. - ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX9 Controls ENABLE_SGPR_DISPATCH_ID in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`. 
- ``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX9 Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`. - ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX9 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table`. - ``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX9 Controls ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_ID_X in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_ID_Y in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_ID_Z in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_INFO in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX9 Controls ENABLE_VGPR_WORKITEM_ID in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - Possible values are defined in - :ref:`amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table`. - ``.amdhsa_next_free_vgpr`` Required GFX6-GFX9 Maximum VGPR number explicitly referenced, plus one. - Used to calculate GRANULATED_WORKITEM_VGPR_COUNT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. - ``.amdhsa_next_free_sgpr`` Required GFX6-GFX9 Maximum SGPR number explicitly referenced, plus one. - Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. - ``.amdhsa_reserve_vcc`` 1 GFX6-GFX9 Whether the kernel may use the special VCC SGPR. - Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. 
- ``.amdhsa_reserve_flat_scratch`` 1 GFX7-GFX9 Whether the kernel may use flat instructions to access - scratch memory. Used to calculate - GRANULATED_WAVEFRONT_SGPR_COUNT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. - ``.amdhsa_reserve_xnack_mask`` Target GFX8-GFX9 Whether the kernel may trigger XNACK replay. - Feature Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in - Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. + ======================================================== =================== ============ =================== + Directive Default Supported On Description + ======================================================== =================== ============ =================== + ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX10 Controls GROUP_SEGMENT_FIXED_SIZE in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. + ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX10 Controls PRIVATE_SEGMENT_FIXED_SIZE in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. + ``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. + ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_DISPATCH_PTR in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. + ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_QUEUE_PTR in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. + ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. + ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX10 Controls ENABLE_SGPR_DISPATCH_ID in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. + ``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX10 Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. 
+ ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. + ``.amdhsa_wavefront_size32`` Target GFX10 Controls ENABLE_WAVEFRONT_SIZE32 in + Feature :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. + Specific + (-wavefrontsize64) + ``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_ID_X in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_ID_Y in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_ID_Z in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_INFO in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX10 Controls ENABLE_VGPR_WORKITEM_ID in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + Possible values are defined in + :ref:`amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table`. + ``.amdhsa_next_free_vgpr`` Required GFX6-GFX10 Maximum VGPR number explicitly referenced, plus one. + Used to calculate GRANULATED_WORKITEM_VGPR_COUNT in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_next_free_sgpr`` Required GFX6-GFX10 Maximum SGPR number explicitly referenced, plus one. + Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_reserve_vcc`` 1 GFX6-GFX10 Whether the kernel may use the special VCC SGPR. 
+ Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_reserve_flat_scratch`` 1 GFX7-GFX10 Whether the kernel may use flat instructions to access + scratch memory. Used to calculate + GRANULATED_WAVEFRONT_SGPR_COUNT in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_reserve_xnack_mask`` Target GFX8-GFX10 Whether the kernel may trigger XNACK replay. + Feature Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in + Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. (+xnack) - ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX9 Controls FLOAT_ROUND_MODE_32 in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. - Possible values are defined in - :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`. - ``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX9 Controls FLOAT_ROUND_MODE_16_64 in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. - Possible values are defined in - :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`. - ``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX9 Controls FLOAT_DENORM_MODE_32 in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. - Possible values are defined in - :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`. - ``.amdhsa_float_denorm_mode_16_64`` 3 GFX6-GFX9 Controls FLOAT_DENORM_MODE_16_64 in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. - Possible values are defined in - :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`. - ``.amdhsa_dx10_clamp`` 1 GFX6-GFX9 Controls ENABLE_DX10_CLAMP in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. - ``.amdhsa_ieee_mode`` 1 GFX6-GFX9 Controls ENABLE_IEEE_MODE in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. - ``.amdhsa_fp16_overflow`` 0 GFX9 Controls FP16_OVFL in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. 
- ``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_exception_fp_ieee_div_zero`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_exception_fp_ieee_overflow`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_exception_fp_ieee_underflow`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_exception_fp_ieee_inexact`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_INEXACT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - ======================================================== ================ ============ =================== + ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_32 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + Possible values are defined in + :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`. + ``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_16_64 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + Possible values are defined in + :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`. + ``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX10 Controls FLOAT_DENORM_MODE_32 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + Possible values are defined in + :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`. 
+ ``.amdhsa_float_denorm_mode_16_64`` 3 GFX6-GFX10 Controls FLOAT_DENORM_MODE_16_64 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + Possible values are defined in + :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`. + ``.amdhsa_dx10_clamp`` 1 GFX6-GFX10 Controls ENABLE_DX10_CLAMP in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_ieee_mode`` 1 GFX6-GFX10 Controls ENABLE_IEEE_MODE in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_fp16_overflow`` 0 GFX9-GFX10 Controls FP16_OVFL in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_workgroup_processor_mode`` Target GFX10 Controls ENABLE_WGP_MODE in + Feature :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table`. + Specific + (-cumode) + ``.amdhsa_memory_ordered`` 1 GFX10 Controls MEM_ORDERED in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_forward_progress`` 0 GFX10 Controls FWD_PROGRESS in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_exception_fp_ieee_div_zero`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_exception_fp_ieee_overflow`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_exception_fp_ieee_underflow`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. 
+ ``.amdhsa_exception_fp_ieee_inexact`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_INEXACT in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ======================================================== =================== ============ ===================
 
 .amdgpu_metadata
 ++++++++++++++++
 
@@ -5334,6 +6473,9 @@ Additional Documentation
 .. [AMD-GCN-GFX7] `AMD Sea Islands Series ISA `_
 .. [AMD-GCN-GFX8] `AMD GCN3 Instruction Set Architecture `__
 .. [AMD-GCN-GFX9] `AMD "Vega" Instruction Set Architecture `__
+.. [AMD-GCN-GFX10] AMD "Navi" Instruction Set Architecture *TBA*
+.. TODO
+   ttye Add link when made public.
 .. [AMD-ROCm] `ROCm: Open Platform for Development, Discovery and Education Around GPU Computing `__
 .. [AMD-ROCm-github] `ROCm github `__
 .. [HSA] `Heterogeneous System Architecture (HSA) Foundation `__