[AMDGPU] Update code object description

- Use ELF header flags to identify processor.
- Remove isa note record.
- Add target feature section.
- Make metadata for NumVGPRs, NumSGPRs and MaxFlatWorkGroupSize required.
- Add FixedWorkGroupSize to CodeProps metadata.
- Add ReqdWorkGroupSize* to kernel descriptor and move MaxFlatWorkGroupSize to be adjacent.
- Move IsXNACKEnabled in the kernel descriptor to be at the end of the unused flags.
- Remove IsDynamicCallStack from the metadata and kernel descriptor.
- Remove legacy debugger metadata.
- Remove old XNACK enabled processor names.

Differential Revision: https://reviews.llvm.org/D39828

llvm-svn: 317855
2024-11-26 04:32:44 +01:00 · 2017-11-10 01:00:54 +00:00 · 2017-11-10 01:00:54 +00:00 · 34f9af3e72
commit 34f9af3e72
parent befcc521fc
1 changed files with 360 additions and 287 deletions
--- a/docs/AMDGPUUsage.rst
+++ b/docs/AMDGPUUsage.rst
@@ -23,50 +23,55 @@ Target Triples
 Use the ``clang -target <Architecture>-<Vendor>-<OS>-<Environment>`` option to
 specify the target triple:

-  .. table:: AMDGPU Target Triples
-     :name: amdgpu-target-triples-table
+  .. table:: AMDGPU Architectures
+     :name: amdgpu-architecture-table

-     ============ ======== ========= ===========
-     Architecture Vendor   OS        Environment
-     ============ ======== ========= ===========
-     r600         amd      <empty>   <empty>
-     amdgcn       amd      <empty>   <empty>
-     amdgcn       amd      amdhsa    <empty>
-     amdgcn       amd      amdhsa    opencl
-     amdgcn       amd      amdhsa    amdgizcl
-     amdgcn       amd      amdhsa    amdgiz
-     amdgcn       amd      amdhsa    hcc
-     ============ ======== ========= ===========
+     ============ ==============================================================
+     Architecture Description
+     ============ ==============================================================
+     ``r600``     AMD GPUs HD2XXX-HD6XXX for graphics and compute shaders.
+     ``amdgcn``   AMD GPUs GCN GFX6 onwards for graphics and compute shaders.
+     ============ ==============================================================

-``r600-amd--``
-  Supports AMD GPUs HD2XXX-HD6XXX for graphics and compute shaders executed on
-  the MESA runtime.
+  .. table:: AMDGPU Vendors
+     :name: amdgpu-vendor-table

-``amdgcn-amd--``
-  Supports AMD GPUs GCN GFX6 onwards for graphics and compute shaders executed on
-  the MESA runtime.
+     ============ ==============================================================
+     Vendor       Description
+     ============ ==============================================================
+     ``amd``      Can be used for all AMD GPU usage.
+     ``mesa3d``   Can be used if the OS is ``mesa3d``.
+     ============ ==============================================================

-``amdgcn-amd-amdhsa-``
-  Supports AMD GCN GPUs GFX6 onwards for compute kernels executed on HSA [HSA]_
-  compatible runtimes such as AMD's ROCm [AMD-ROCm]_.
+  .. table:: AMDGPU Operating Systems
+     :name: amdgpu-os-table

-``amdgcn-amd-amdhsa-opencl``
-  Supports AMD GCN GPUs GFX6 onwards for OpenCL compute kernels executed on HSA
-  [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. See
-  :ref:`amdgpu-opencl`.
+     ============== ============================================================
+     OS             Description
+     ============== ============================================================
+     *<empty>*      Defaults to the *unknown* OS.
+     ``amdhsa``     Compute kernels executed on HSA [HSA]_ compatible runtimes
+                    such as AMD's ROCm [AMD-ROCm]_.
+     ``amdpal``     Graphics shaders and compute kernels executed on AMD PAL
+                    runtime.
+     ``mesa3d``     Graphics shaders and compute kernels executed on Mesa 3D
+                    runtime.
+     ============== ============================================================

-``amdgcn-amd-amdhsa-amdgizcl``
-  Same as ``amdgcn-amd-amdhsa-opencl`` except a different address space mapping
-  is used (see :ref:`amdgpu-address-spaces`).
+  .. table:: AMDGPU Environments
+     :name: amdgpu-environment-table

-``amdgcn-amd-amdhsa-amdgiz``
-  Same as ``amdgcn-amd-amdhsa-`` except a different address space mapping is
+     ============ ==============================================================
+     Environment  Description
+     ============ ==============================================================
+     *<empty>*    Defaults to ``opencl``.
+     ``opencl``   OpenCL compute kernel (see :ref:`amdgpu-opencl`).
+     ``amdgizcl`` Same as ``opencl`` except a different address space mapping is
                  used (see :ref:`amdgpu-address-spaces`).
-
-``amdgcn-amd-amdhsa-hcc``
-  Supports AMD GCN GPUs GFX6 onwards for AMD HC language compute kernels
-  executed on HSA [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. See
-  :ref:`amdgpu-hcc`.
+     ``amdgiz``   Same as ``opencl`` except a different address space mapping is
+                  used (see :ref:`amdgpu-address-spaces`).
+     ``hcc``      AMD HC language compute kernel (see :ref:`amdgpu-hcc`).
+     ============ ==============================================================
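
The four tables above list the values allowed in each component of the target triple. As a minimal sketch, assuming a hypothetical helper ``make_triple`` (it is not part of clang or LLVM), the string passed to ``clang -target`` could be assembled and checked like this:

```python
# Component values copied from the architecture, vendor, OS and
# environment tables above. An empty string stands for *<empty>*.
ARCHITECTURES = {"r600", "amdgcn"}
VENDORS = {"amd", "mesa3d"}
OSES = {"", "amdhsa", "amdpal", "mesa3d"}
ENVIRONMENTS = {"", "opencl", "amdgizcl", "amdgiz", "hcc"}


def make_triple(arch, vendor="amd", os="", env=""):
    """Hypothetical helper: build <arch>-<vendor>-<os>-<env>."""
    if arch not in ARCHITECTURES:
        raise ValueError(f"unknown architecture: {arch!r}")
    if vendor not in VENDORS:
        raise ValueError(f"unknown vendor: {vendor!r}")
    if os not in OSES:
        raise ValueError(f"unknown OS: {os!r}")
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    # The vendor table says mesa3d can be used if the OS is mesa3d.
    if vendor == "mesa3d" and os != "mesa3d":
        raise ValueError("vendor mesa3d requires OS mesa3d")
    return "-".join((arch, vendor, os, env))


print(make_triple("amdgcn", os="amdhsa", env="opencl"))
print(make_triple("r600"))
```

Empty components are kept as empty fields, so ``make_triple("r600")`` yields ``r600-amd--``, the spelling used by the removed per-triple descriptions.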

 .. _amdgpu-processors:

@@ -77,51 +82,51 @@ Use the ``clang -mcpu <Processor>`` option to specify the AMD GPU processor. The
 names from both the *Processor* and *Alternative Processor* can be used.

  .. table:: AMDGPU Processors
-     :name: amdgpu-processors-table
+     :name: amdgpu-processor-table

-     ========== =========== ============ ===== ======= ==================
-     Processor  Alternative Target       dGPU/ Runtime Example
+     =========== =============== ============ ===== ======= ==================
+     Processor   Alternative     Target       dGPU/ ROCm    Example
                 Processor       Triple       APU   Support Products
                                 Architecture
-     ========== =========== ============ ===== ======= ==================
+     =========== =============== ============ ===== ======= ==================
     **Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_
-     --------------------------------------------------------------------
-     r600                   r600         dGPU
-     r630                   r600         dGPU
-     rs880                  r600         dGPU
-     rv670                  r600         dGPU
+     -------------------------------------------------------------------------
+     ``r600``                    ``r600``     dGPU
+     ``r630``                    ``r600``     dGPU
+     ``rs880``                   ``r600``     dGPU
+     ``rv670``                   ``r600``     dGPU
     **Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_
-     --------------------------------------------------------------------
-     rv710                  r600         dGPU
-     rv730                  r600         dGPU
-     rv770                  r600         dGPU
+     -------------------------------------------------------------------------
+     ``rv710``                   ``r600``     dGPU
+     ``rv730``                   ``r600``     dGPU
+     ``rv770``                   ``r600``     dGPU
     **Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_
-     --------------------------------------------------------------------
-     cedar                  r600         dGPU
-     redwood                r600         dGPU
-     sumo                   r600         dGPU
-     juniper                r600         dGPU
-     cypress                r600         dGPU
+     -------------------------------------------------------------------------
+     ``cedar``                   ``r600``     dGPU
+     ``redwood``                 ``r600``     dGPU
+     ``sumo``                    ``r600``     dGPU
+     ``juniper``                 ``r600``     dGPU
+     ``cypress``                 ``r600``     dGPU
     **Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_
-     --------------------------------------------------------------------
-     barts                  r600         dGPU
-     turks                  r600         dGPU
-     caicos                 r600         dGPU
-     cayman                 r600         dGPU
+     -------------------------------------------------------------------------
+     ``barts``                   ``r600``     dGPU
+     ``turks``                   ``r600``     dGPU
+     ``caicos``                  ``r600``     dGPU
+     ``cayman``                  ``r600``     dGPU
     **GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_
-     --------------------------------------------------------------------
-     gfx600     - tahiti    amdgcn       dGPU
-     gfx601     - pitcairn  amdgcn       dGPU
-                - verde
-                - oland
-                - hainan
+     -------------------------------------------------------------------------
+     ``gfx600``  - ``tahiti``    ``amdgcn``   dGPU
+     ``gfx601``  - ``pitcairn``  ``amdgcn``   dGPU
+                 - ``verde``
+                 - ``oland``
+                 - ``hainan``
     **GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_
-     --------------------------------------------------------------------
-     gfx700     - bonaire   amdgcn       dGPU          - Radeon HD 7790
+     -------------------------------------------------------------------------
+     ``gfx700``  - ``bonaire``   ``amdgcn``   dGPU          - Radeon HD 7790
                                                            - Radeon HD 8770
                                                            - R7 260
                                                            - R7 260X
-     \          - kaveri    amdgcn       APU           - A6-7000
+     \           - ``kaveri``    ``amdgcn``   APU           - A6-7000
                                                            - A6 Pro-7050B
                                                            - A8-7100
                                                            - A8 Pro-7150B
@@ -131,16 +136,16 @@ names from both the *Processor* and *Alternative Processor* can be used.
                                                            - A8-7200P
                                                            - A10-7400P
                                                            - FX-7600P
-     gfx701     - hawaii    amdgcn       dGPU  ROCm    - FirePro W8100
+     ``gfx701``  - ``hawaii``    ``amdgcn``   dGPU  ROCm    - FirePro W8100
                                                            - FirePro W9100
                                                            - FirePro S9150
                                                            - FirePro S9170
-     gfx702                              dGPU  ROCm    - Radeon R9 290
+     ``gfx702``                  ``amdgcn``   dGPU  ROCm    - Radeon R9 290
                                                            - Radeon R9 290x
                                                            - Radeon R390
                                                            - Radeon R390x
-     gfx703     - kabini    amdgcn       APU           - E1-2100
-                - mullins                              - E1-2200
+     ``gfx703``  - ``kabini``    ``amdgcn``   APU           - E1-2100
+                 - ``mullins``                              - E1-2200
                                                            - E1-2500
                                                            - E2-3000
                                                            - E2-3800
@@ -149,8 +154,8 @@ names from both the *Processor* and *Alternative Processor* can be used.
                                                            - A6-5200
                                                            - A4 Pro-3340B
     **GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_
-     --------------------------------------------------------------------
-     gfx800     - iceland   amdgcn       dGPU          - FirePro S7150
+     -------------------------------------------------------------------------
+     ``gfx800``  - ``iceland``   ``amdgcn``   dGPU          - FirePro S7150
                                                            - FirePro S7100
                                                            - FirePro W7100
                                                            - Radeon R285
@@ -158,58 +163,98 @@ names from both the *Processor* and *Alternative Processor* can be used.
                                                            - Radeon R9 385
                                                            - Mobile FirePro
                                                              M7170
-     gfx801     - carrizo   amdgcn       APU           - A6-8500P
+     ``gfx801``  - ``carrizo``   ``amdgcn``   APU           - A6-8500P
                                                            - Pro A6-8500B
                                                            - A8-8600P
                                                            - Pro A8-8600B
                                                            - FX-8800P
                                                            - Pro A12-8800B
-     \                      amdgcn       APU   ROCm    - A10-8700P
+     \                           ``amdgcn``   APU   ROCm    - A10-8700P
                                                            - Pro A10-8700B
                                                            - A10-8780P
-     \                      amdgcn       APU           - A10-9600P
+     \                           ``amdgcn``   APU           - A10-9600P
                                                            - A10-9630P
                                                            - A12-9700P
                                                            - A12-9730P
                                                            - FX-9800P
                                                            - FX-9830P
-     \                      amdgcn       APU           - E2-9010
+     \                           ``amdgcn``   APU           - E2-9010
                                                            - A6-9210
                                                            - A9-9410
-     gfx802     - tonga     amdgcn       dGPU  ROCm    Same as gfx800
-     gfx803     - fiji      amdgcn       dGPU  ROCm    - Radeon R9 Nano
+     ``gfx802``  - ``tonga``     ``amdgcn``   dGPU  ROCm    Same as gfx800
+     ``gfx803``  - ``fiji``      ``amdgcn``   dGPU  ROCm    - Radeon R9 Nano
                                                            - Radeon R9 Fury
                                                            - Radeon R9 FuryX
                                                            - Radeon Pro Duo
                                                            - FirePro S9300x2
                                                            - Radeon Instinct MI8
-     \          - polaris10 amdgcn       dGPU  ROCm    - Radeon RX 470
+     \           - ``polaris10`` ``amdgcn``   dGPU  ROCm    - Radeon RX 470
                                                            - Radeon RX 480
                                                            - Radeon Instinct MI6
-     \          - polaris11 amdgcn       dGPU  ROCm    - Radeon RX 460
-     gfx804                 amdgcn       dGPU          Same as gfx803
-     gfx810     - stoney    amdgcn       APU
+     \           - ``polaris11`` ``amdgcn``   dGPU  ROCm    - Radeon RX 460
+     ``gfx810``  - ``stoney``    ``amdgcn``   APU
     **GCN GFX9** [AMD-GCN-GFX9]_
-     --------------------------------------------------------------------
-     gfx900                 amdgcn       dGPU          - Radeon Vega
+     -------------------------------------------------------------------------
+     ``gfx900``                  ``amdgcn``   dGPU  ROCm    - Radeon Vega
                                                              Frontier Edition
                                                            - Radeon RX Vega 56
                                                            - Radeon RX Vega 64
                                                            - Radeon RX Vega 64
                                                              Liquid
                                                            - Radeon Instinct MI25
-     gfx901                 amdgcn       dGPU  ROCm    Same as gfx900
-                                                       except XNACK is
-                                                       enabled
-     gfx902                 amdgcn       APU           *TBA*
+     ``gfx902``                  ``amdgcn``   APU           *TBA*

                                                            .. TODO
                                                               Add product
                                                               names.
-     gfx903                 amdgcn       APU           Same as gfx902
-                                                       except XNACK is
-                                                       enabled
-     ========== =========== ============ ===== ======= ==================
+     =========== =============== ============ ===== ======= ==================
+
+.. _amdgpu-target-features:
+
+Target Features
+---------------
+
+Target features control how code is generated to support certain
+features. Not all target features are supported by all processors. The
+runtime must ensure that the features supported by the device used to
+execute the code match the features enabled when generating the
+code. A mismatch of features may result in incorrect execution, or a
+reduction in performance.
+
+Use the ``clang -m[no-]<TargetFeature>`` option to specify the AMD GPU
+target features.
+
+For example:
+
+``-mxnack``
+  Enable the *XNACK* feature.
+``-mno-xnack``
+  Disable the *XNACK* feature.
+
+  .. table:: AMDGPU Target Features
+     :name: amdgpu-target-feature-table
+
+     ============== ======== ==================================================
+     Target Feature Default  Description
+     ============== ======== ==================================================
+     -m[no-]xnack   disabled Enable/disable generating code that has
+                             memory clauses that are compatible with
+                             having XNACK replay enabled.
+
+                             This is used for demand paging and page
+                             migration. If XNACK replay is enabled in
+                             the device, then if a page fault occurs
+                             the code may execute incorrectly if the
+                             XNACK feature is not enabled. Executing
+                             code that has the feature enabled on a
+                             device that does not have XNACK replay
+                             enabled will execute correctly, but may
+                             be less performant than code with the
+                             feature disabled.
+
+                             This feature is supported by the
+                             ``amdgcn`` architecture for GFX7-GFX9.
+     ============== ======== ==================================================
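
The ``-m[no-]<TargetFeature>`` spelling and the XNACK compatibility rules in the table can be sketched as follows. This is an illustration only: ``feature_flag`` and ``xnack_outcome`` are hypothetical names, and the returned strings merely paraphrase the table.

```python
def feature_flag(name, enabled):
    """Hypothetical helper: format a feature as -m<name> or -mno-<name>."""
    return ("-m" if enabled else "-mno-") + name


def xnack_outcome(code_has_xnack, device_xnack_replay):
    """Paraphrase the XNACK row of the target feature table."""
    if device_xnack_replay and not code_has_xnack:
        # Code was generated without the feature, but the device replays.
        return "may execute incorrectly if a page fault occurs"
    if code_has_xnack and not device_xnack_replay:
        # Code is conservative; the device does not need it.
        return "executes correctly, but may be less performant"
    return "features match"


print(feature_flag("xnack", True))
print(feature_flag("xnack", False))
print(xnack_outcome(code_has_xnack=False, device_xnack_replay=True))
```

A runtime would make the equivalent comparison against the device it is about to dispatch to. How it learns the device setting is outside the scope of this sketch.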

 .. _amdgpu-address-spaces:

@@ -268,14 +313,14 @@ The memory model supported is based on the HSA memory model [HSA]_ which is
 based in turn on HRF-indirect with scope inclusion [HRF]_. The happens-before
 relation is transitive over the synchronizes-with relation independent of scope,
 and synchronizes-with allows the memory scope instances to be inclusive (see
-table :ref:`amdgpu-amdhsa-llvm-sync-scopes-amdhsa-table`).
+table :ref:`amdgpu-amdhsa-llvm-sync-scopes-table`).

 This is different to the OpenCL [OpenCL]_ memory model which does not have scope
 inclusion and requires the memory scopes to exactly match. However, this
 is conservatively correct for OpenCL.

-  .. table:: AMDHSA LLVM Sync Scopes for AMDHSA
-     :name: amdgpu-amdhsa-llvm-sync-scopes-amdhsa-table
+  .. table:: AMDHSA LLVM Sync Scopes
+     :name: amdgpu-amdhsa-llvm-sync-scopes-table

     ================ ==========================================================
     LLVM Sync Scope  Description
@@ -364,16 +409,18 @@ The AMDGPU backend uses the following ELF header:
     ========================== ===============================
     ``e_ident[EI_CLASS]``      ``ELFCLASS64``
     ``e_ident[EI_DATA]``       ``ELFDATA2LSB``
-     ``e_ident[EI_OSABI]``      ``ELFOSABI_AMDGPU_HSA``,
-                                ``ELFOSABI_AMDGPU_PAL`` or
-                                ``ELFOSABI_AMDGPU_MESA3D``
-     ``e_ident[EI_ABIVERSION]`` ``ELFABIVERSION_AMDGPU_HSA``,
-                                ``ELFABIVERSION_AMDGPU_PAL`` or
-                                ``ELFABIVERSION_AMDGPU_MESA3D``
-     ``e_type``                 ``ET_REL`` or ``ET_DYN``
+     ``e_ident[EI_OSABI]``      - ``ELFOSABI_NONE``
+                                - ``ELFOSABI_AMDGPU_HSA``
+                                - ``ELFOSABI_AMDGPU_PAL``
+                                - ``ELFOSABI_AMDGPU_MESA3D``
+     ``e_ident[EI_ABIVERSION]`` - ``ELFABIVERSION_AMDGPU_HSA``
+                                - ``ELFABIVERSION_AMDGPU_PAL``
+                                - ``ELFABIVERSION_AMDGPU_MESA3D``
+     ``e_type``                 - ``ET_REL``
+                                - ``ET_DYN``
     ``e_machine``              ``EM_AMDGPU``
     ``e_entry``                0
-     ``e_flags``                0
+     ``e_flags``                See :ref:`amdgpu-elf-header-e_flags-table`
     ========================== ===============================

 ..
@@ -385,6 +432,7 @@ The AMDGPU backend uses the following ELF header:
     Name                            Value
     =============================== =====
     ``EM_AMDGPU``                   224
+     ``ELFOSABI_NONE``               0
     ``ELFOSABI_AMDGPU_HSA``         64
     ``ELFOSABI_AMDGPU_PAL``         65
     ``ELFOSABI_AMDGPU_MESA3D``      66
@@ -394,23 +442,27 @@ The AMDGPU backend uses the following ELF header:
     =============================== =====

 ``e_ident[EI_CLASS]``
-  The ELF class is always ``ELFCLASS64``. The AMDGPU backend only supports 64
+  The ELF class is:
+
+  * ``ELFCLASS32`` for ``r600`` architecture.
+
+  * ``ELFCLASS64`` for ``amdgcn`` architecture which only supports 64
    bit applications.

 ``e_ident[EI_DATA]``
-  All AMDGPU targets use ELFDATA2LSB for little-endian byte ordering.
+  All AMDGPU targets use ``ELFDATA2LSB`` for little-endian byte ordering.

 ``e_ident[EI_OSABI]``
-  One of the following AMD GPU architecture specific OS ABIs:
+  One of the following AMD GPU architecture specific OS ABIs
+  (see :ref:`amdgpu-os-table`):

-  * ``ELFOSABI_AMDGPU_HSA`` is used to specify that the code object conforms to
-    the AMD HSA runtime ABI [HSA]_.
+  * ``ELFOSABI_NONE`` for *unknown* OS.

-  * ``ELFOSABI_AMDGPU_PAL`` is used to specify that the code object conforms to
-    the AMD PAL runtime ABI.
+  * ``ELFOSABI_AMDGPU_HSA`` for ``amdhsa`` OS.

-  * ``ELFOSABI_AMDGPU_MESA3D`` is used to specify that the code object conforms
-    to the AMD MESA runtime ABI.
+  * ``ELFOSABI_AMDGPU_PAL`` for ``amdpal`` OS.
+
+  * ``ELFOSABI_AMDGPU_MESA3D`` for ``mesa3d`` OS.

 ``e_ident[EI_ABIVERSION]``
  The ABI version of the AMD GPU architecture specific OS ABI to which the code
@@ -423,7 +475,7 @@ The AMDGPU backend uses the following ELF header:
    runtime ABI.

  * ``ELFABIVERSION_AMDGPU_MESA3D`` is used to specify the version of AMD MESA
-    runtime ABI.
+    3D runtime ABI.

 ``e_type``
  Can be one of the following values:
@@ -439,17 +491,74 @@ The AMDGPU backend uses the following ELF header:
  The AMD HSA runtime loader requires an ``ET_DYN`` code object.

 ``e_machine``
-  The value ``EM_AMDGPU`` is used for the machine for all members of the AMD GPU
-  architecture family. The specific member is specified in the
-  ``NT_AMD_AMDGPU_ISA`` entry in the ``.note`` section (see
-  :ref:`amdgpu-note-records`).
+  The value ``EM_AMDGPU`` is used for the machine for all processors supported
+  by the ``r600`` and ``amdgcn`` architectures (see
+  :ref:`amdgpu-processor-table`). The specific processor is specified in the
+  ``EF_AMDGPU_MACH`` bit field of the ``e_flags`` (see
+  :ref:`amdgpu-elf-header-e_flags-table`).

 ``e_entry``
  The entry point is 0 as the entry points for individual kernels must be
  selected in order to invoke them through AQL packets.

 ``e_flags``
-  The value is 0 as no flags are used.
+  The AMDGPU backend uses the following ELF header flags:
+
+  .. table:: AMDGPU ELF Header ``e_flags``
+     :name: amdgpu-elf-header-e_flags-table
+
+     ================================= ========== =============================
+     Name                              Value      Description
+     ================================= ========== =============================
+     **AMDGPU Processor Flag**                    See :ref:`amdgpu-processor-table`.
+     -------------------------------------------- -----------------------------
+     ``EF_AMDGPU_MACH``                0x000000ff AMDGPU processor selection
+                                                  mask for
+                                                  ``EF_AMDGPU_MACH_xxx`` values
+                                                  defined in
+                                                  :ref:`amdgpu-ef-amdgpu-mach-table`.
+     ================================= ========== =============================
+
+  .. table:: AMDGPU ``EF_AMDGPU_MACH`` Values
+     :name: amdgpu-ef-amdgpu-mach-table
+
+     ================================= ========== =============================
+     Name                              Value      Description (see
+                                                  :ref:`amdgpu-processor-table`)
+     ================================= ========== =============================
+     ``EF_AMDGPU_MACH_NONE``           0          *not specified*
+     ``EF_AMDGPU_MACH_R600_R600``      1          ``r600``
+     ``EF_AMDGPU_MACH_R600_R630``      2          ``r630``
+     ``EF_AMDGPU_MACH_R600_RS880``     3          ``rs880``
+     ``EF_AMDGPU_MACH_R600_RV670``     4          ``rv670``
+     ``EF_AMDGPU_MACH_R600_RV710``     5          ``rv710``
+     ``EF_AMDGPU_MACH_R600_RV730``     6          ``rv730``
+     ``EF_AMDGPU_MACH_R600_RV770``     7          ``rv770``
+     ``EF_AMDGPU_MACH_R600_CEDAR``     8          ``cedar``
+     ``EF_AMDGPU_MACH_R600_REDWOOD``   9          ``redwood``
+     ``EF_AMDGPU_MACH_R600_SUMO``      10         ``sumo``
+     ``EF_AMDGPU_MACH_R600_JUNIPER``   11         ``juniper``
+     ``EF_AMDGPU_MACH_R600_CYPRESS``   12         ``cypress``
+     ``EF_AMDGPU_MACH_R600_BARTS``     13         ``barts``
+     ``EF_AMDGPU_MACH_R600_TURKS``     14         ``turks``
+     ``EF_AMDGPU_MACH_R600_CAICOS``    15         ``caicos``
+     ``EF_AMDGPU_MACH_R600_CAYMAN``    16         ``cayman``
+     *reserved*                        17-31      Reserved for ``r600``
+                                                  architecture processors.
+     ``EF_AMDGPU_MACH_AMDGCN_GFX600``  32         ``gfx600``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX601``  33         ``gfx601``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX700``  34         ``gfx700``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX701``  35         ``gfx701``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX702``  36         ``gfx702``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX703``  37         ``gfx703``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX800``  38         ``gfx800``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX801``  39         ``gfx801``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX802``  40         ``gfx802``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX803``  41         ``gfx803``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX810``  42         ``gfx810``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX900``  43         ``gfx900``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX902``  44         ``gfx902``
+     ================================= ========== =============================
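
With the processor now encoded in ``e_flags``, a consumer can recover it by masking with ``EF_AMDGPU_MACH`` and looking the result up in the table above. The following is a minimal sketch: the numeric values are copied from the tables in this section, while ``decode_processor`` and the dictionary names are hypothetical and not part of any LLVM API.

```python
# Processor selection mask within e_flags (from the e_flags table).
EF_AMDGPU_MACH = 0x000000FF

# ELFOSABI_* value -> target triple OS, as listed for e_ident[EI_OSABI].
OSABI_TO_OS = {0: "unknown", 64: "amdhsa", 65: "amdpal", 66: "mesa3d"}

# Processor names in EF_AMDGPU_MACH_* order.
R600_PROCESSORS = [
    "r600", "r630", "rs880", "rv670", "rv710", "rv730", "rv770", "cedar",
    "redwood", "sumo", "juniper", "cypress", "barts", "turks", "caicos",
    "cayman",
]
AMDGCN_PROCESSORS = [
    "gfx600", "gfx601", "gfx700", "gfx701", "gfx702", "gfx703", "gfx800",
    "gfx801", "gfx802", "gfx803", "gfx810", "gfx900", "gfx902",
]

# EF_AMDGPU_MACH_* value -> processor name; 0 is EF_AMDGPU_MACH_NONE.
MACH_TO_PROCESSOR = {0: None}
MACH_TO_PROCESSOR.update(enumerate(R600_PROCESSORS, start=1))     # 1..16
MACH_TO_PROCESSOR.update(enumerate(AMDGCN_PROCESSORS, start=32))  # 32..44


def decode_processor(e_flags):
    """Hypothetical helper: return (architecture, processor) for e_flags."""
    mach = e_flags & EF_AMDGPU_MACH
    if mach not in MACH_TO_PROCESSOR:
        raise ValueError(f"reserved or unknown EF_AMDGPU_MACH value: {mach}")
    if mach == 0:
        return (None, None)  # processor not specified
    arch = "r600" if mach < 32 else "amdgcn"
    return (arch, MACH_TO_PROCESSOR[mach])


print(decode_processor(43))  # ('amdgcn', 'gfx900')
print(OSABI_TO_OS[64])       # amdhsa
```

Bits outside the mask are ignored by the lookup, so the sketch keeps working if other ``e_flags`` bits are assigned later. Values 17-31 are rejected because the table reserves them.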

 Sections
 --------
@@ -515,11 +624,11 @@ if needed.
 Note Records
 ------------

-As required by ``ELFCLASS64``, minimal zero byte padding must be generated after
-the ``name`` field to ensure the ``desc`` field is 4 byte aligned. In addition,
-minimal zero byte padding must be generated to ensure the ``desc`` field size is
-a multiple of 4 bytes. The ``sh_addralign`` field of the ``.note`` section must
-be at least 4 to indicate at least 8 byte alignment.
+As required by ``ELFCLASS32`` and ``ELFCLASS64``, minimal zero byte padding must
+be generated after the ``name`` field to ensure the ``desc`` field is 4 byte
+aligned. In addition, minimal zero byte padding must be generated to ensure the
+``desc`` field size is a multiple of 4 bytes. The ``sh_addralign`` field of the
+``.note`` section must be at least 4 to indicate at least 4 byte alignment.
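
The padding rule can be illustrated with a small sketch that lays out one note record. It assumes the generic ELF note layout (little-endian ``namesz``, ``descsz`` and ``type`` words followed by the padded ``name`` and ``desc``), with the size words recording the unpadded lengths. ``build_note`` and ``pad4`` are hypothetical helpers, and the metadata string is a placeholder.

```python
import struct

NT_AMD_AMDGPU_HSA_METADATA = 10  # from the note record type table


def pad4(data):
    """Append the minimal number of zero bytes to reach a multiple of 4."""
    return data + b"\0" * (-len(data) % 4)


def build_note(name, note_type, desc):
    """Hypothetical helper: encode one ELF note record (little-endian)."""
    name_bytes = name.encode("ascii") + b"\0"  # name is null terminated
    header = struct.pack("<III", len(name_bytes), len(desc), note_type)
    # Padding after name keeps desc 4 byte aligned; padding after desc
    # keeps the record size a multiple of 4 bytes.
    return header + pad4(name_bytes) + pad4(desc)


metadata = b"hello\0"  # placeholder for the null terminated metadata string
note = build_note("AMD", NT_AMD_AMDGPU_HSA_METADATA, metadata)
print(len(note))                       # 24
print(struct.unpack("<III", note[:12]))  # (4, 6, 10)
```

Here ``"AMD"`` plus its terminator is already 4 bytes, so only the 6 byte ``desc`` needs 2 bytes of padding.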

 The AMDGPU backend code object uses the following ELF note records in the
 ``.note`` section. The *Description* column specifies the layout of the note
@@ -537,7 +646,6 @@ Additional note records can be present.
     Name  Type                           Description
     ===== ============================== ======================================
     "AMD" ``NT_AMD_AMDGPU_HSA_METADATA`` <metadata null terminated string>
-     "AMD" ``NT_AMD_AMDGPU_ISA``          <isa name null terminated string>
     ===== ============================== ======================================

 ..
@@ -550,49 +658,9 @@ Additional note records can be present.
     ============================== =====
     *reserved*                       0-9
     ``NT_AMD_AMDGPU_HSA_METADATA``    10
-     ``NT_AMD_AMDGPU_ISA``             11
+     *reserved*                        11
     ============================== =====

-``NT_AMD_AMDGPU_ISA``
-  Specifies the instruction set architecture used by the machine code contained
-  in the code object.
-
-  This note record is required for code objects containing machine code for
-  processors matching the ``amdgcn`` architecture in table
-  :ref:`amdgpu-processors`.
-
-  The null terminated string has the following syntax:
-
-    *architecture*\ ``-``\ *vendor*\ ``-``\ *os*\ ``-``\ *environment*\ ``-``\ *processor*
-
-  where:
-
-    *architecture*
-      The architecture from table :ref:`amdgpu-target-triples-table`.
-
-      This is always ``amdgcn`` when the target triple OS is ``amdhsa`` (see
-      :ref:`amdgpu-target-triples`).
-
-    *vendor*
-      The vendor from table :ref:`amdgpu-target-triples-table`.
-
-      For the AMDGPU backend this is always ``amd``.
-
-    *os*
-      The OS from table :ref:`amdgpu-target-triples-table`.
-
-    *environment*
-      An environment from table :ref:`amdgpu-target-triples-table`, or blank if
-      the environment has no affect on the execution of the code object.
-
-      For the AMDGPU backend this is currently always blank.
-    *processor*
-      The processor from table :ref:`amdgpu-processors-table`.
-
-  For example:
-
-    ``amdgcn-amd-amdhsa--gfx901``
-
 ``NT_AMD_AMDGPU_HSA_METADATA``
  Specifies extensible metadata associated with the code objects executed on HSA
  [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. It is required when
@@ -904,10 +972,6 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                the kernel code. See
                                                :ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table`
                                                for the mapping definition.
-     "DebugProps"      mapping                  Mapping of properties related to
-                                                the kernel debugging. See
-                                                :ref:`amdgpu-amdhsa-code-object-kernel-debug-properties-metadata-mapping-table`
-                                                for the mapping definition.
     ================= ============== ========= ================================

 ..
@ -1175,9 +1239,9 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                           private address space
                                                           memory required for a
                                                           work-item in
-                                                           bytes. If
-                                                           IsDynamicCallstack
-                                                           is 1 then additional
+                                                           bytes. If the kernel
+                                                           uses a dynamic call
+                                                           stack then additional
                                                           space must be added
                                                           to this value for the
                                                           call stack.
@ -1188,7 +1252,7 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                           be a power of 2.
     "WavefrontSize"              integer        Required  Wavefront size. Must
                                                           be a power of 2.
-     "NumSGPRs"                   integer                  Number of scalar
+     "NumSGPRs"                   integer        Required  Number of scalar
                                                           registers used by a
                                                           wavefront for
                                                           GFX6-GFX9. This
@ -1204,44 +1268,37 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                           rounded up to the
                                                           allocation
                                                           granularity.
-     "NumVGPRs"                   integer                  Number of vector
+     "NumVGPRs"                   integer        Required  Number of vector
                                                           registers used by
                                                           each work-item for
                                                           GFX6-GFX9
-     "MaxFlatWorkGroupSize"       integer                  Maximum flat
+     "MaxFlatWorkGroupSize"       integer        Required  Maximum flat
                                                           work-group size
                                                           supported by the
                                                           kernel in work-items.
-     "IsDynamicCallStack"         boolean                  Indicates if the
-                                                           generated machine
-                                                           code is using a
-                                                           dynamically sized
-                                                           call stack.
+                                                           Must be >=1 and
+                                                           consistent with any
+                                                           non-0 values in
+                                                           FixedWorkGroupSize.
+     "FixedWorkGroupSize"         sequence of              Corresponds to the
+                                  3 integers               dispatch work-group
+                                                           size X, Y, Z. If
+                                                           omitted, defaults to
+                                                           0, 0, 0. If an
+                                                           element is non-0 then
+                                                           the kernel must only
+                                                           be launched with that
+                                                           size in the matching
+                                                           dimension.
     "IsXNACKEnabled"             boolean                  Indicates if the
                                                           generated machine
                                                           code is capable of
-                                                           supporting XNACK.
+                                                           supporting XNACK. See
+                                                           :ref:`amdgpu-target-features`.
     ============================ ============== ========= =====================

 ..

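Reviewer sketch: the CodeProps rules touched by this hunk (three keys become required, ``MaxFlatWorkGroupSize`` must be >=1, and it must be consistent with any non-0 ``FixedWorkGroupSize`` element) could be checked along the following lines. The function name ``check_code_props`` is hypothetical, and reading "consistent" as "the product of the non-0 sizes does not exceed ``MaxFlatWorkGroupSize``" is an assumption the patch does not spell out.

```python
# Illustrative only: the function name and the reading of "consistent"
# (product of the non-0 sizes must fit in MaxFlatWorkGroupSize) are assumptions.
REQUIRED_KEYS = ("NumSGPRs", "NumVGPRs", "MaxFlatWorkGroupSize")


def check_code_props(props: dict) -> list:
    """Return a list of problems found in a CodeProps mapping."""
    problems = [
        f"missing required key {key}" for key in REQUIRED_KEYS if key not in props
    ]
    max_flat = props.get("MaxFlatWorkGroupSize")
    if max_flat is not None and max_flat < 1:
        problems.append("MaxFlatWorkGroupSize must be >= 1")
    # The table documents 0, 0, 0 as the default when the key is omitted.
    fixed = props.get("FixedWorkGroupSize", [0, 0, 0])
    if len(fixed) != 3:
        problems.append("FixedWorkGroupSize must have 3 elements")
        return problems
    product = 1
    for size in fixed:
        if size:  # a 0 element places no constraint on that dimension
            product *= size
    if max_flat is not None and product > max_flat:
        problems.append("FixedWorkGroupSize exceeds MaxFlatWorkGroupSize")
    return problems
```

A mapping with the three required keys and no ``FixedWorkGroupSize`` yields an empty list under these assumptions.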
-  .. table:: AMDHSA Code Object Kernel Debug Properties Metadata Mapping
-     :name: amdgpu-amdhsa-code-object-kernel-debug-properties-metadata-mapping-table
-
-     =================================== ============== ========= ==============
-     String Key                          Value Type     Required? Description
-     =================================== ============== ========= ==============
-     "DebuggerABIVersion"                sequence of
-                                         2 integers
-     "ReservedNumVGPRs"                  integer
-     "ReservedFirstVGPR"                 integer
-     "PrivateSegmentBufferSGPR"          integer
-     "WavefrontPrivateSegmentOffsetSGPR" integer
-     =================================== ============== ========= ==============
-
-.. TODO
-   Plan to remove the debug properties metadata.
-
 Kernel Dispatch
 ~~~~~~~~~~~~~~~

@ -1451,17 +1508,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                     then additional space must
                                                     be added to this value for
                                                     the call stack.
-     95:64   4 bytes MaxFlatWorkGroupSize            Maximum flat work-group
-                                                     size supported by the
-                                                     kernel in work-items.
-     96      1 bit   IsDynamicCallStack              Indicates if the generated
-                                                     machine code is using a
-                                                     dynamically sized call
-                                                     stack.
-     97      1 bit   IsXNACKEnabled                  Indicates if the generated
-                                                     machine code is capable of
-                                                     suppoting XNACK.
-     127:98  30 bits                                 Reserved, must be 0.
+     127:64  8 bytes                                 Reserved, must be 0.
     191:128 8 bytes KernelCodeEntryByteOffset       Byte offset (possibly
                                                     negative) from base
                                                     address of kernel
@ -1469,7 +1516,30 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                     entry point instruction
                                                     which must be 256 byte
                                                     aligned.
-     383:192 24                                      Reserved, must be 0.
+     223:192 4 bytes MaxFlatWorkGroupSize            Maximum flat work-group
+                                                     size supported by the
+                                                     kernel in work-items. If
+                                                     an exact work-group size
+                                                     is required then this must
+                                                     be omitted or 0 and
+                                                     ReqdWorkGroupSize* must
+                                                     be set to non-0.
+     239:224 2 bytes ReqdWorkGroupSizeX              If present and non-0 then
+                                                     the kernel
+                                                     must be executed with the
+                                                     specified work-group size
+                                                     for X.
+     255:240 2 bytes ReqdWorkGroupSizeY              If present and non-0 then
+                                                     the kernel
+                                                     must be executed with the
+                                                     specified work-group size
+                                                     for Y.
+     271:256 2 bytes ReqdWorkGroupSizeZ              If present and non-0 then
+                                                     the kernel
+                                                     must be executed with the
+                                                     specified work-group size
+                                                     for Z.
+     383:272 14                                      Reserved, must be 0.
             bytes
     415:384 4 bytes ComputePgmRsrc1                 Compute Shader (CS)
                                                     program settings used by
@ -1509,7 +1579,10 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                     should always be 0.
     457     1 bit   EnableSGPRGridWorkgroupCountZ   Not implemented in CP and
                                                     should always be 0.
-     463:458 6 bits                                  Reserved, must be 0.
+     462:458 5 bits                                  Reserved, must be 0.
+     463     1 bit   IsXNACKEnabled                  Indicates if the generated
+                                                     machine code is capable of
+                                                     supporting XNACK.
     511:464 6                                       Reserved, must be 0.
             bytes
     512     **Total size 64 bytes.**
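Reviewer sketch: one way the fields moved or added by this hunk could be read back from a raw 64-byte kernel descriptor, using the bit ranges listed in the table. It assumes bit 0 is the least significant bit of byte 0 (little-endian numbering), which the table does not state. The names ``FIELDS``, ``field``, ``decode`` and ``example_descriptor`` are hypothetical and not part of any LLVM or ROCm API.

```python
# Illustrative only: bit ranges are copied from the table in this patch;
# the little-endian bit numbering and all names here are assumptions.
FIELDS = {
    "MaxFlatWorkGroupSize": (223, 192),
    "ReqdWorkGroupSizeX": (239, 224),
    "ReqdWorkGroupSizeY": (255, 240),
    "ReqdWorkGroupSizeZ": (271, 256),
    "IsXNACKEnabled": (463, 463),
}


def field(descriptor: bytes, hi: int, lo: int) -> int:
    """Return bits hi:lo (inclusive) of a 64-byte kernel descriptor."""
    if len(descriptor) != 64:
        raise ValueError("kernel descriptor must be exactly 64 bytes")
    value = int.from_bytes(descriptor, "little")
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def decode(descriptor: bytes) -> dict:
    """Decode the fields moved or added by this patch."""
    return {name: field(descriptor, hi, lo) for name, (hi, lo) in FIELDS.items()}


def example_descriptor() -> bytes:
    """Build a descriptor that requires a 64x1x1 work-group with XNACK set.

    MaxFlatWorkGroupSize is left at 0, as the table asks when an exact
    work-group size is required.
    """
    value = (64 << 224) | (1 << 240) | (1 << 256) | (1 << 463)
    return value.to_bytes(64, "little")
```

Calling ``decode(example_descriptor())`` returns the required size 64, 1, 1, a ``MaxFlatWorkGroupSize`` of 0 and ``IsXNACKEnabled`` of 1 under the stated numbering assumption.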
@ -1528,7 +1601,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                     granularity is device
                                                     specific:

-                                                     GFX6-9
+                                                     GFX6-GFX9
                                                       - max_vgpr 1..256
                                                       - roundup((max_vgpg + 1)
                                                         / 4) - 1
@ -1540,7 +1613,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                     granularity is device
                                                     specific:

-                                                     GFX6-8
+                                                     GFX6-GFX8
                                                       - max_sgpr 1..112
                                                       - roundup((max_sgpg + 1)
                                                         / 8) - 1
@ -1679,7 +1752,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                     CP is responsible for
                                                     filling in
                                                     ``COMPUTE_PGM_RSRC1.CDBG_USER``.
-     26      1 bit   FP16_OVFL                       GFX6-8
+     26      1 bit   FP16_OVFL                       GFX6-GFX8
                                                       Reserved, must be 0.
                                                     GFX9
                                                       Wavefront starts execution
@ -2166,7 +2239,7 @@ Flat Scratch register pair are adjacent SGRRs so they can be moved as a 64 bit
 value to the hardware required SGPRn-3 and SGPRn-4 respectively.

 The global segment can be accessed either using buffer instructions (GFX6 which
-has V# 64 bit address support), flat instructions (GFX7-9), or global
+has V# 64 bit address support), flat instructions (GFX7-GFX9), or global
 instructions (GFX9).

 If buffer operations are used then the compiler can generate a V# with the
@ -2212,7 +2285,7 @@ Offset SGPR registers (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`):
 GFX6
  Flat scratch is not supported.

-GFX7-8
+GFX7-GFX8
  1. The low word of Flat Scratch Init is 32 bit byte offset from
     ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
     being managed by SPI for the queue executing the kernel dispatch. This is
@ -2305,7 +2378,7 @@ For GFX6-GFX9:
  same wavefront.
 * The vector memory operations are performed as wavefront wide operations and
  completion is reported to a wavefront in execution order. The exception is
-  that for GFX7-9 ``flat_load/store/atomic`` instructions can report out of
+  that for GFX7-GFX9 ``flat_load/store/atomic`` instructions can report out of
  vector memory order if they access LDS memory, and out of LDS operation order
  if they access global memory.
 * The vector memory operations access a single vector L1 cache shared by all
@ -2332,7 +2405,7 @@ For GFX6-GFX9:
 * The L2 cache can be kept coherent with other agents on some targets, or ranges
  of virtual addresses can be set up to bypass it to ensure system coherence.

-Private address space uses ``buffer_load/store`` using the scratch V# (GFX6-8),
+Private address space uses ``buffer_load/store`` using the scratch V# (GFX6-GFX8),
 or ``scratch_load/store`` (GFX9). Since only a single thread is accessing the
 memory, atomic memory orderings are not meaningful and all accesses are treated
 as non-atomic.