mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00

[AMDGPU] Update code object description

- Use ELF header flags to identify processor.
 - Remove isa note record.
 - Add target feature section.
 - Make metadata for NumVGPRs, NumSGPRs and MaxFlatWorkGroupSize required.
 - Add FixedWorkGroupSize to CodeProps metadata.
 - Add ReqdWorkGroupSize* to kernel descriptor and move MaxFlatWorkGroupSize to be adjacent.
 - Move IsXNACKEnabled in the kernel descriptor to be at the end of the unused flags.
 - Remove IsDynamicCallStack from the metadata and kernel descriptor.
 - Remove legacy debugger metadata.
 - Remove old XNACK enabled processor names.

Differential Revision: https://reviews.llvm.org/D39828

llvm-svn: 317855
This commit is contained in:
Tony Tye 2017-11-10 01:00:54 +00:00
parent befcc521fc
commit 34f9af3e72


@ -23,50 +23,55 @@ Target Triples
Use the ``clang -target <Architecture>-<Vendor>-<OS>-<Environment>`` option to
specify the target triple:
.. table:: AMDGPU Target Triples
:name: amdgpu-target-triples-table
.. table:: AMDGPU Architectures
:name: amdgpu-architecture-table
============ ======== ========= ===========
Architecture Vendor OS Environment
============ ======== ========= ===========
r600 amd <empty> <empty>
amdgcn amd <empty> <empty>
amdgcn amd amdhsa <empty>
amdgcn amd amdhsa opencl
amdgcn amd amdhsa amdgizcl
amdgcn amd amdhsa amdgiz
amdgcn amd amdhsa hcc
============ ======== ========= ===========
============ ==============================================================
Architecture Description
============ ==============================================================
``r600`` AMD GPUs HD2XXX-HD6XXX for graphics and compute shaders.
``amdgcn`` AMD GPUs GCN GFX6 onwards for graphics and compute shaders.
============ ==============================================================
``r600-amd--``
Supports AMD GPUs HD2XXX-HD6XXX for graphics and compute shaders executed on
the MESA runtime.
.. table:: AMDGPU Vendors
:name: amdgpu-vendor-table
``amdgcn-amd--``
Supports AMD GPUs GCN GFX6 onwards for graphics and compute shaders executed on
the MESA runtime.
============ ==============================================================
Vendor Description
============ ==============================================================
``amd`` Can be used for all AMD GPU usage.
``mesa3d`` Can be used if the OS is ``mesa3d``.
============ ==============================================================
``amdgcn-amd-amdhsa-``
Supports AMD GCN GPUs GFX6 onwards for compute kernels executed on HSA [HSA]_
compatible runtimes such as AMD's ROCm [AMD-ROCm]_.
.. table:: AMDGPU Operating Systems
:name: amdgpu-os-table
``amdgcn-amd-amdhsa-opencl``
Supports AMD GCN GPUs GFX6 onwards for OpenCL compute kernels executed on HSA
[HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. See
:ref:`amdgpu-opencl`.
============== ============================================================
OS Description
============== ============================================================
*<empty>* Defaults to the *unknown* OS.
``amdhsa`` Compute kernels executed on HSA [HSA]_ compatible runtimes
such as AMD's ROCm [AMD-ROCm]_.
``amdpal`` Graphics shaders and compute kernels executed on AMD PAL
runtime.
``mesa3d`` Graphics shaders and compute kernels executed on Mesa 3D
runtime.
============== ============================================================
``amdgcn-amd-amdhsa-amdgizcl``
Same as ``amdgcn-amd-amdhsa-opencl`` except a different address space mapping
is used (see :ref:`amdgpu-address-spaces`).
.. table:: AMDGPU Environments
:name: amdgpu-environment-table
``amdgcn-amd-amdhsa-amdgiz``
Same as ``amdgcn-amd-amdhsa-`` except a different address space mapping is
============ ==============================================================
Environment Description
============ ==============================================================
*<empty>* Defaults to ``opencl``.
``opencl`` OpenCL compute kernel (see :ref:`amdgpu-opencl`).
``amdgizcl`` Same as ``opencl`` except a different address space mapping is
used (see :ref:`amdgpu-address-spaces`).
``amdgcn-amd-amdhsa-hcc``
Supports AMD GCN GPUs GFX6 onwards for AMD HC language compute kernels
executed on HSA [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. See
:ref:`amdgpu-hcc`.
``amdgiz`` Same as ``opencl`` except a different address space mapping is
used (see :ref:`amdgpu-address-spaces`).
``hcc`` AMD HC language compute kernel (see :ref:`amdgpu-hcc`).
============ ==============================================================
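As an illustration of how the four tables above combine, the following is a minimal sketch that splits a target triple string into its components and applies the documented defaults (an empty OS is treated as *unknown*, an empty environment as ``opencl``). The ``Triple`` type and ``parse_triple`` helper are hypothetical names introduced for this example only; they are not part of clang or LLVM.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for the four triple components."""
    architecture: str
    vendor: str
    os: str
    environment: str


def parse_triple(text: str) -> Triple:
    """Split '<Architecture>-<Vendor>-<OS>-<Environment>' into its parts."""
    parts = text.split("-")
    if len(parts) > 4:
        raise ValueError(f"too many components in triple: {text!r}")
    # Missing trailing components are treated the same as empty ones.
    parts += [""] * (4 - len(parts))
    arch, vendor, os_name, env = parts
    # Defaults as stated in the OS and environment tables.
    return Triple(arch, vendor, os_name or "unknown", env or "opencl")
```

For example, ``parse_triple("r600-amd--")`` yields an *unknown* OS with the ``opencl`` environment, while ``parse_triple("amdgcn-amd-amdhsa-hcc")`` keeps all four components as written.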
.. _amdgpu-processors:
@ -77,51 +82,51 @@ Use the ``clang -mcpu <Processor>`` option to specify the AMD GPU processor. The
names from both the *Processor* and *Alternative Processor* can be used.
.. table:: AMDGPU Processors
:name: amdgpu-processors-table
:name: amdgpu-processor-table
========== =========== ============ ===== ======= ==================
Processor Alternative Target dGPU/ Runtime Example
=========== =============== ============ ===== ======= ==================
Processor Alternative Target dGPU/ ROCm Example
Processor Triple APU Support Products
Architecture
========== =========== ============ ===== ======= ==================
=========== =============== ============ ===== ======= ==================
**Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_
--------------------------------------------------------------------
r600 r600 dGPU
r630 r600 dGPU
rs880 r600 dGPU
rv670 r600 dGPU
-------------------------------------------------------------------------
``r600`` ``r600`` dGPU
``r630`` ``r600`` dGPU
``rs880`` ``r600`` dGPU
``rv670`` ``r600`` dGPU
**Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_
--------------------------------------------------------------------
rv710 r600 dGPU
rv730 r600 dGPU
rv770 r600 dGPU
-------------------------------------------------------------------------
``rv710`` ``r600`` dGPU
``rv730`` ``r600`` dGPU
``rv770`` ``r600`` dGPU
**Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_
--------------------------------------------------------------------
cedar r600 dGPU
redwood r600 dGPU
sumo r600 dGPU
juniper r600 dGPU
cypress r600 dGPU
-------------------------------------------------------------------------
``cedar`` ``r600`` dGPU
``redwood`` ``r600`` dGPU
``sumo`` ``r600`` dGPU
``juniper`` ``r600`` dGPU
``cypress`` ``r600`` dGPU
**Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_
--------------------------------------------------------------------
barts r600 dGPU
turks r600 dGPU
caicos r600 dGPU
cayman r600 dGPU
-------------------------------------------------------------------------
``barts`` ``r600`` dGPU
``turks`` ``r600`` dGPU
``caicos`` ``r600`` dGPU
``cayman`` ``r600`` dGPU
**GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_
--------------------------------------------------------------------
gfx600 - tahiti amdgcn dGPU
gfx601 - pitcairn amdgcn dGPU
- verde
- oland
- hainan
-------------------------------------------------------------------------
``gfx600`` - ``tahiti`` ``amdgcn`` dGPU
``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU
- ``verde``
- ``oland``
- ``hainan``
**GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_
--------------------------------------------------------------------
gfx700 - bonaire amdgcn dGPU - Radeon HD 7790
-------------------------------------------------------------------------
``gfx700`` - ``bonaire`` ``amdgcn`` dGPU - Radeon HD 7790
- Radeon HD 8770
- R7 260
- R7 260X
\ - kaveri amdgcn APU - A6-7000
\ - ``kaveri`` ``amdgcn`` APU - A6-7000
- A6 Pro-7050B
- A8-7100
- A8 Pro-7150B
@ -131,16 +136,16 @@ names from both the *Processor* and *Alternative Processor* can be used.
- A8-7200P
- A10-7400P
- FX-7600P
gfx701 - hawaii amdgcn dGPU ROCm - FirePro W8100
``gfx701`` - ``hawaii`` ``amdgcn`` dGPU ROCm - FirePro W8100
- FirePro W9100
- FirePro S9150
- FirePro S9170
gfx702 dGPU ROCm - Radeon R9 290
``gfx702`` ``amdgcn`` dGPU ROCm - Radeon R9 290
- Radeon R9 290x
- Radeon R390
- Radeon R390x
gfx703 - kabini amdgcn APU - E1-2100
- mullins - E1-2200
``gfx703`` - ``kabini`` ``amdgcn`` APU - E1-2100
- ``mullins`` - E1-2200
- E1-2500
- E2-3000
- E2-3800
@ -149,8 +154,8 @@ names from both the *Processor* and *Alternative Processor* can be used.
- A6-5200
- A4 Pro-3340B
**GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_
--------------------------------------------------------------------
gfx800 - iceland amdgcn dGPU - FirePro S7150
-------------------------------------------------------------------------
``gfx800`` - ``iceland`` ``amdgcn`` dGPU - FirePro S7150
- FirePro S7100
- FirePro W7100
- Radeon R285
@ -158,58 +163,98 @@ names from both the *Processor* and *Alternative Processor* can be used.
- Radeon R9 385
- Mobile FirePro
M7170
gfx801 - carrizo amdgcn APU - A6-8500P
``gfx801`` - ``carrizo`` ``amdgcn`` APU - A6-8500P
- Pro A6-8500B
- A8-8600P
- Pro A8-8600B
- FX-8800P
- Pro A12-8800B
\ amdgcn APU ROCm - A10-8700P
\ ``amdgcn`` APU ROCm - A10-8700P
- Pro A10-8700B
- A10-8780P
\ amdgcn APU - A10-9600P
\ ``amdgcn`` APU - A10-9600P
- A10-9630P
- A12-9700P
- A12-9730P
- FX-9800P
- FX-9830P
\ amdgcn APU - E2-9010
\ ``amdgcn`` APU - E2-9010
- A6-9210
- A9-9410
gfx802 - tonga amdgcn dGPU ROCm Same as gfx800
gfx803 - fiji amdgcn dGPU ROCm - Radeon R9 Nano
``gfx802`` - ``tonga`` ``amdgcn`` dGPU ROCm Same as gfx800
``gfx803`` - ``fiji`` ``amdgcn`` dGPU ROCm - Radeon R9 Nano
- Radeon R9 Fury
- Radeon R9 FuryX
- Radeon Pro Duo
- FirePro S9300x2
- Radeon Instinct MI8
\ - polaris10 amdgcn dGPU ROCm - Radeon RX 470
\ - ``polaris10`` ``amdgcn`` dGPU ROCm - Radeon RX 470
- Radeon RX 480
- Radeon Instinct MI6
\ - polaris11 amdgcn dGPU ROCm - Radeon RX 460
gfx804 amdgcn dGPU Same as gfx803
gfx810 - stoney amdgcn APU
\ - ``polaris11`` ``amdgcn`` dGPU ROCm - Radeon RX 460
``gfx810`` - ``stoney`` ``amdgcn`` APU
**GCN GFX9** [AMD-GCN-GFX9]_
--------------------------------------------------------------------
gfx900 amdgcn dGPU - Radeon Vega
-------------------------------------------------------------------------
``gfx900`` ``amdgcn`` dGPU ROCm - Radeon Vega
Frontier Edition
- Radeon RX Vega 56
- Radeon RX Vega 64
- Radeon RX Vega 64
Liquid
- Radeon Instinct MI25
gfx901 amdgcn dGPU ROCm Same as gfx900
except XNACK is
enabled
gfx902 amdgcn APU *TBA*
``gfx902`` ``amdgcn`` APU *TBA*
.. TODO
Add product
names.
gfx903 amdgcn APU Same as gfx902
except XNACK is
enabled
========== =========== ============ ===== ======= ==================
=========== =============== ============ ===== ======= ==================
.. _amdgpu-target-features:
Target Features
---------------
Target features control how code is generated to support certain
features. Not all target features are supported by all processors. The
runtime must ensure that the features supported by the device used to
execute the code match the features enabled when generating the
code. A mismatch of features may result in incorrect execution, or a
reduction in performance.
Use the ``clang -m[no-]<TargetFeature>`` option to specify the AMD GPU
target features.
For example:
``-mxnack``
Enable the *XNACK* feature.
``-mno-xnack``
Disable the *XNACK* feature.
.. table:: AMDGPU Target Features
:name: amdgpu-target-feature-table
============== ======== ==================================================
Target Feature Default Description
============== ======== ==================================================
-m[no-]xnack disabled Enable/disable generating code that has
memory clauses that are compatible with
having XNACK replay enabled.
This is used for demand paging and page
migration. If XNACK replay is enabled in
the device, then if a page fault occurs
the code may execute incorrectly if the
XNACK feature is not enabled. Executing
code that has the feature enabled on a
device that does not have XNACK replay
enabled will execute correctly, but may
be less performant than code with the
feature disabled.
This feature is supported by the
``amdgcn`` architecture for GFX7-GFX9.
============== ======== ==================================================
.. _amdgpu-address-spaces:
@ -268,14 +313,14 @@ The memory model supported is based on the HSA memory model [HSA]_ which is
based in turn on HRF-indirect with scope inclusion [HRF]_. The happens-before
relation is transitive over the synchronizes-with relation independent of scope,
and synchronizes-with allows the memory scope instances to be inclusive (see
table :ref:`amdgpu-amdhsa-llvm-sync-scopes-amdhsa-table`).
table :ref:`amdgpu-amdhsa-llvm-sync-scopes-table`).
This is different to the OpenCL [OpenCL]_ memory model which does not have scope
inclusion and requires the memory scopes to exactly match. However, this
is conservatively correct for OpenCL.
.. table:: AMDHSA LLVM Sync Scopes for AMDHSA
:name: amdgpu-amdhsa-llvm-sync-scopes-amdhsa-table
.. table:: AMDHSA LLVM Sync Scopes
:name: amdgpu-amdhsa-llvm-sync-scopes-table
================ ==========================================================
LLVM Sync Scope Description
@ -364,16 +409,18 @@ The AMDGPU backend uses the following ELF header:
========================== ===============================
``e_ident[EI_CLASS]`` ``ELFCLASS64``
``e_ident[EI_DATA]`` ``ELFDATA2LSB``
``e_ident[EI_OSABI]`` ``ELFOSABI_AMDGPU_HSA``,
``ELFOSABI_AMDGPU_PAL`` or
``ELFOSABI_AMDGPU_MESA3D``
``e_ident[EI_ABIVERSION]`` ``ELFABIVERSION_AMDGPU_HSA``,
``ELFABIVERSION_AMDGPU_PAL`` or
``ELFABIVERSION_AMDGPU_MESA3D``
``e_type`` ``ET_REL`` or ``ET_DYN``
``e_ident[EI_OSABI]`` - ``ELFOSABI_NONE``
- ``ELFOSABI_AMDGPU_HSA``
- ``ELFOSABI_AMDGPU_PAL``
- ``ELFOSABI_AMDGPU_MESA3D``
``e_ident[EI_ABIVERSION]`` - ``ELFABIVERSION_AMDGPU_HSA``
- ``ELFABIVERSION_AMDGPU_PAL``
- ``ELFABIVERSION_AMDGPU_MESA3D``
``e_type`` - ``ET_REL``
- ``ET_DYN``
``e_machine`` ``EM_AMDGPU``
``e_entry`` 0
``e_flags`` 0
``e_flags`` See :ref:`amdgpu-elf-header-e_flags-table`
========================== ===============================
..
@ -385,6 +432,7 @@ The AMDGPU backend uses the following ELF header:
Name Value
=============================== =====
``EM_AMDGPU`` 224
``ELFOSABI_NONE`` 0
``ELFOSABI_AMDGPU_HSA`` 64
``ELFOSABI_AMDGPU_PAL`` 65
``ELFOSABI_AMDGPU_MESA3D`` 66
@ -394,23 +442,27 @@ The AMDGPU backend uses the following ELF header:
=============================== =====
``e_ident[EI_CLASS]``
The ELF class is always ``ELFCLASS64``. The AMDGPU backend only supports 64
The ELF class is:
* ``ELFCLASS32`` for ``r600`` architecture.
* ``ELFCLASS64`` for ``amdgcn`` architecture which only supports 64
bit applications.
``e_ident[EI_DATA]``
All AMDGPU targets use ELFDATA2LSB for little-endian byte ordering.
All AMDGPU targets use ``ELFDATA2LSB`` for little-endian byte ordering.
``e_ident[EI_OSABI]``
One of the following AMD GPU architecture specific OS ABIs:
One of the following AMD GPU architecture specific OS ABIs
(see :ref:`amdgpu-os-table`):
* ``ELFOSABI_AMDGPU_HSA`` is used to specify that the code object conforms to
the AMD HSA runtime ABI [HSA]_.
* ``ELFOSABI_NONE`` for *unknown* OS.
* ``ELFOSABI_AMDGPU_PAL`` is used to specify that the code object conforms to
the AMD PAL runtime ABI.
* ``ELFOSABI_AMDGPU_HSA`` for ``amdhsa`` OS.
* ``ELFOSABI_AMDGPU_MESA3D`` is used to specify that the code object conforms
to the AMD MESA runtime ABI.
* ``ELFOSABI_AMDGPU_PAL`` for ``amdpal`` OS.
* ``ELFOSABI_AMDGPU_MESA3D`` for ``mesa3d`` OS.
``e_ident[EI_ABIVERSION]``
The ABI version of the AMD GPU architecture specific OS ABI to which the code
@ -423,7 +475,7 @@ The AMDGPU backend uses the following ELF header:
runtime ABI.
* ``ELFABIVERSION_AMDGPU_MESA3D`` is used to specify the version of AMD MESA
runtime ABI.
3D runtime ABI.
``e_type``
Can be one of the following values:
@ -439,17 +491,74 @@ The AMDGPU backend uses the following ELF header:
The AMD HSA runtime loader requires a ``ET_DYN`` code object.
``e_machine``
The value ``EM_AMDGPU`` is used for the machine for all members of the AMD GPU
architecture family. The specific member is specified in the
``NT_AMD_AMDGPU_ISA`` entry in the ``.note`` section (see
:ref:`amdgpu-note-records`).
The value ``EM_AMDGPU`` is used for the machine for all processors supported
by the ``r600`` and ``amdgcn`` architectures (see
:ref:`amdgpu-processor-table`). The specific processor is specified in the
``EF_AMDGPU_MACH`` bit field of the ``e_flags`` (see
:ref:`amdgpu-elf-header-e_flags-table`).
``e_entry``
The entry point is 0 as the entry points for individual kernels must be
selected in order to invoke them through AQL packets.
``e_flags``
The value is 0 as no flags are used.
The AMDGPU backend uses the following ELF header flags:
.. table:: AMDGPU ELF Header ``e_flags``
:name: amdgpu-elf-header-e_flags-table
================================= ========== =============================
Name Value Description
================================= ========== =============================
**AMDGPU Processor Flag** See :ref:`amdgpu-processor-table`.
-------------------------------------------- -----------------------------
``EF_AMDGPU_MACH`` 0x000000ff AMDGPU processor selection
mask for
``EF_AMDGPU_MACH_xxx`` values
defined in
:ref:`amdgpu-ef-amdgpu-mach-table`.
================================= ========== =============================
.. table:: AMDGPU ``EF_AMDGPU_MACH`` Values
:name: amdgpu-ef-amdgpu-mach-table
================================= ========== =============================
Name Value Description (see
:ref:`amdgpu-processor-table`)
================================= ========== =============================
``EF_AMDGPU_MACH_NONE`` 0 *not specified*
``EF_AMDGPU_MACH_R600_R600`` 1 ``r600``
``EF_AMDGPU_MACH_R600_R630`` 2 ``r630``
``EF_AMDGPU_MACH_R600_RS880`` 3 ``rs880``
``EF_AMDGPU_MACH_R600_RV670`` 4 ``rv670``
``EF_AMDGPU_MACH_R600_RV710`` 5 ``rv710``
``EF_AMDGPU_MACH_R600_RV730`` 6 ``rv730``
``EF_AMDGPU_MACH_R600_RV770`` 7 ``rv770``
``EF_AMDGPU_MACH_R600_CEDAR`` 8 ``cedar``
``EF_AMDGPU_MACH_R600_REDWOOD`` 9 ``redwood``
``EF_AMDGPU_MACH_R600_SUMO`` 10 ``sumo``
``EF_AMDGPU_MACH_R600_JUNIPER`` 11 ``juniper``
``EF_AMDGPU_MACH_R600_CYPRESS`` 12 ``cypress``
``EF_AMDGPU_MACH_R600_BARTS`` 13 ``barts``
``EF_AMDGPU_MACH_R600_TURKS`` 14 ``turks``
``EF_AMDGPU_MACH_R600_CAICOS`` 15 ``caicos``
``EF_AMDGPU_MACH_R600_CAYMAN`` 16 ``cayman``
*reserved* 17-31 Reserved for ``r600``
architecture processors.
``EF_AMDGPU_MACH_AMDGCN_GFX600`` 32 ``gfx600``
``EF_AMDGPU_MACH_AMDGCN_GFX601`` 33 ``gfx601``
``EF_AMDGPU_MACH_AMDGCN_GFX700`` 34 ``gfx700``
``EF_AMDGPU_MACH_AMDGCN_GFX701`` 35 ``gfx701``
``EF_AMDGPU_MACH_AMDGCN_GFX702`` 36 ``gfx702``
``EF_AMDGPU_MACH_AMDGCN_GFX703`` 37 ``gfx703``
``EF_AMDGPU_MACH_AMDGCN_GFX800`` 38 ``gfx800``
``EF_AMDGPU_MACH_AMDGCN_GFX801`` 39 ``gfx801``
``EF_AMDGPU_MACH_AMDGCN_GFX802`` 40 ``gfx802``
``EF_AMDGPU_MACH_AMDGCN_GFX803`` 41 ``gfx803``
``EF_AMDGPU_MACH_AMDGCN_GFX810`` 42 ``gfx810``
``EF_AMDGPU_MACH_AMDGCN_GFX900`` 43 ``gfx900``
``EF_AMDGPU_MACH_AMDGCN_GFX902`` 44 ``gfx902``
================================= ========== =============================
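The header fields described above can be read back with a few lines of code. The sketch below is illustrative only: the constant values are copied from the tables in this section, while the byte offsets (``EI_CLASS`` at 4, ``EI_OSABI`` at 7, ``e_machine`` at 18, ``e_flags`` at 36 for ``ELFCLASS32`` and 48 for ``ELFCLASS64``) come from the generic ELF header layout rather than from this document. ``describe_header`` is a hypothetical helper name.

```python
import struct

EM_AMDGPU = 224
EF_AMDGPU_MACH = 0x000000FF  # processor selection mask within e_flags

OSABI_NAMES = {
    0: "ELFOSABI_NONE",
    64: "ELFOSABI_AMDGPU_HSA",
    65: "ELFOSABI_AMDGPU_PAL",
    66: "ELFOSABI_AMDGPU_MESA3D",
}

# EF_AMDGPU_MACH values 1..16 (r600 architecture) and 32..44 (amdgcn).
R600_PROCESSORS = [
    "r600", "r630", "rs880", "rv670", "rv710", "rv730", "rv770", "cedar",
    "redwood", "sumo", "juniper", "cypress", "barts", "turks", "caicos",
    "cayman",
]
AMDGCN_PROCESSORS = [
    "gfx600", "gfx601", "gfx700", "gfx701", "gfx702", "gfx703", "gfx800",
    "gfx801", "gfx802", "gfx803", "gfx810", "gfx900", "gfx902",
]
MACH_NAMES = {i + 1: name for i, name in enumerate(R600_PROCESSORS)}
MACH_NAMES.update({i + 32: name for i, name in enumerate(AMDGCN_PROCESSORS)})

# Offset of e_flags, keyed by EI_CLASS (1 = ELFCLASS32, 2 = ELFCLASS64).
E_FLAGS_OFFSET = {1: 36, 2: 48}


def describe_header(header: bytes) -> dict:
    """Return the OS ABI name and processor name encoded in an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_osabi = header[4], header[7]
    if ei_class not in E_FLAGS_OFFSET:
        raise ValueError(f"unexpected EI_CLASS {ei_class}")
    # All AMDGPU targets are little-endian (ELFDATA2LSB), hence "<".
    (e_machine,) = struct.unpack_from("<H", header, 18)
    if e_machine != EM_AMDGPU:
        raise ValueError(f"e_machine {e_machine} is not EM_AMDGPU")
    (e_flags,) = struct.unpack_from("<I", header, E_FLAGS_OFFSET[ei_class])
    mach = e_flags & EF_AMDGPU_MACH
    return {
        "osabi": OSABI_NAMES.get(ei_osabi, f"unknown ({ei_osabi})"),
        # None covers EF_AMDGPU_MACH_NONE and reserved values.
        "processor": MACH_NAMES.get(mach),
    }
```

Passing the first 64 bytes of a code object is sufficient for either ELF class. A ``processor`` of ``None`` corresponds to ``EF_AMDGPU_MACH_NONE`` or to a value not listed in the table.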
Sections
--------
@ -515,11 +624,11 @@ if needed.
Note Records
------------
As required by ``ELFCLASS64``, minimal zero byte padding must be generated after
the ``name`` field to ensure the ``desc`` field is 4 byte aligned. In addition,
minimal zero byte padding must be generated to ensure the ``desc`` field size is
a multiple of 4 bytes. The ``sh_addralign`` field of the ``.note`` section must
be at least 4 to indicate at least 8 byte alignment.
As required by ``ELFCLASS32`` and ``ELFCLASS64``, minimal zero byte padding must
be generated after the ``name`` field to ensure the ``desc`` field is 4 byte
aligned. In addition, minimal zero byte padding must be generated to ensure the
``desc`` field size is a multiple of 4 bytes. The ``sh_addralign`` field of the
``.note`` section must be at least 4 to indicate at least 8 byte alignment.
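The padding rules above can be sketched as follows. This is a minimal illustration, assuming the generic ELF note layout of three little-endian 32-bit words (``namesz``, ``descsz``, ``type``) followed by the ``name`` and ``desc`` fields; that layout comes from the ELF specification, not from this document. ``pad4`` and ``build_note`` are hypothetical helper names.

```python
import struct

NT_AMD_AMDGPU_HSA_METADATA = 10


def pad4(data: bytes) -> bytes:
    """Append the minimal number of zero bytes to reach a multiple of 4."""
    return data + b"\x00" * (-len(data) % 4)


def build_note(name: bytes, note_type: int, desc: bytes) -> bytes:
    """Encode one note record with 4 byte aligned name and desc fields."""
    name_z = name + b"\x00"  # namesz counts the terminating NUL
    # The size fields record the unpadded lengths.
    header = struct.pack("<III", len(name_z), len(desc), note_type)
    return header + pad4(name_z) + pad4(desc)
```

For instance, ``build_note(b"AMD", NT_AMD_AMDGPU_HSA_METADATA, metadata)`` produces a record in which ``desc`` starts on a 4 byte boundary and the record length is a multiple of 4, where ``metadata`` is the null terminated metadata string.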
The AMDGPU backend code object uses the following ELF note records in the
``.note`` section. The *Description* column specifies the layout of the note
@ -537,7 +646,6 @@ Additional note records can be present.
Name Type Description
===== ============================== ======================================
"AMD" ``NT_AMD_AMDGPU_HSA_METADATA`` <metadata null terminated string>
"AMD" ``NT_AMD_AMDGPU_ISA`` <isa name null terminated string>
===== ============================== ======================================
..
@ -550,49 +658,9 @@ Additional note records can be present.
============================== =====
*reserved* 0-9
``NT_AMD_AMDGPU_HSA_METADATA`` 10
``NT_AMD_AMDGPU_ISA`` 11
*reserved* 11
============================== =====
``NT_AMD_AMDGPU_ISA``
Specifies the instruction set architecture used by the machine code contained
in the code object.
This note record is required for code objects containing machine code for
processors matching the ``amdgcn`` architecture in table
:ref:`amdgpu-processors`.
The null terminated string has the following syntax:
*architecture*\ ``-``\ *vendor*\ ``-``\ *os*\ ``-``\ *environment*\ ``-``\ *processor*
where:
*architecture*
The architecture from table :ref:`amdgpu-target-triples-table`.
This is always ``amdgcn`` when the target triple OS is ``amdhsa`` (see
:ref:`amdgpu-target-triples`).
*vendor*
The vendor from table :ref:`amdgpu-target-triples-table`.
For the AMDGPU backend this is always ``amd``.
*os*
The OS from table :ref:`amdgpu-target-triples-table`.
*environment*
An environment from table :ref:`amdgpu-target-triples-table`, or blank if
the environment has no affect on the execution of the code object.
For the AMDGPU backend this is currently always blank.
*processor*
The processor from table :ref:`amdgpu-processors-table`.
For example:
``amdgcn-amd-amdhsa--gfx901``
``NT_AMD_AMDGPU_HSA_METADATA``
Specifies extensible metadata associated with the code objects executed on HSA
[HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. It is required when
@ -904,10 +972,6 @@ non-AMD key names should be prefixed by "*vendor-name*.".
the kernel code. See
:ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table`
for the mapping definition.
"DebugProps" mapping Mapping of properties related to
the kernel debugging. See
:ref:`amdgpu-amdhsa-code-object-kernel-debug-properties-metadata-mapping-table`
for the mapping definition.
================= ============== ========= ================================
..
@ -1175,9 +1239,9 @@ non-AMD key names should be prefixed by "*vendor-name*.".
private address space
memory required for a
work-item in
bytes. If
IsDynamicCallstack
is 1 then additional
bytes. If the kernel
uses a dynamic call
stack then additional
space must be added
to this value for the
call stack.
@ -1188,7 +1252,7 @@ non-AMD key names should be prefixed by "*vendor-name*.".
be a power of 2.
"WavefrontSize" integer Required Wavefront size. Must
be a power of 2.
"NumSGPRs" integer Number of scalar
"NumSGPRs" integer Required Number of scalar
registers used by a
wavefront for
GFX6-GFX9. This
@ -1204,44 +1268,37 @@ non-AMD key names should be prefixed by "*vendor-name*.".
rounded up to the
allocation
granularity.
"NumVGPRs" integer Number of vector
"NumVGPRs" integer Required Number of vector
registers used by
each work-item for
GFX6-GFX9
"MaxFlatWorkGroupSize" integer Maximum flat
"MaxFlatWorkGroupSize" integer Required Maximum flat
work-group size
supported by the
kernel in work-items.
"IsDynamicCallStack" boolean Indicates if the
generated machine
code is using a
dynamically sized
call stack.
Must be >=1 and
consistent with any
non-0 values in
FixedWorkGroupSize.
"FixedWorkGroupSize" sequence of Corresponds to the
3 integers dispatch work-group
size X, Y, Z. If
omitted, defaults to
0, 0, 0. If an
element is non-0 then
the kernel must only
be launched with a
matching corresponding
work-group size.
"IsXNACKEnabled" boolean Indicates if the
generated machine
code is capable of
supporting XNACK.
supporting XNACK. See
:ref:`amdgpu-target-features`.
============================ ============== ========= =====================
..
.. table:: AMDHSA Code Object Kernel Debug Properties Metadata Mapping
:name: amdgpu-amdhsa-code-object-kernel-debug-properties-metadata-mapping-table
=================================== ============== ========= ==============
String Key Value Type Required? Description
=================================== ============== ========= ==============
"DebuggerABIVersion" sequence of
2 integers
"ReservedNumVGPRs" integer
"ReservedFirstVGPR" integer
"PrivateSegmentBufferSGPR" integer
"WavefrontPrivateSegmentOffsetSGPR" integer
=================================== ============== ========= ==============
.. TODO
Plan to remove the debug properties metadata.
Kernel Dispatch
~~~~~~~~~~~~~~~
@ -1451,17 +1508,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
then additional space must
be added to this value for
the call stack.
95:64 4 bytes MaxFlatWorkGroupSize Maximum flat work-group
size supported by the
kernel in work-items.
96 1 bit IsDynamicCallStack Indicates if the generated
machine code is using a
dynamically sized call
stack.
97 1 bit IsXNACKEnabled Indicates if the generated
machine code is capable of
suppoting XNACK.
127:98 30 bits Reserved, must be 0.
127:64 8 bytes Reserved, must be 0.
191:128 8 bytes KernelCodeEntryByteOffset Byte offset (possibly
negative) from base
address of kernel
@ -1469,7 +1516,30 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
entry point instruction
which must be 256 byte
aligned.
383:192 24 Reserved, must be 0.
223:192 4 bytes MaxFlatWorkGroupSize Maximum flat work-group
size supported by the
kernel in work-items. If
an exact work-group size
is required then must be
omitted or 0 and
ReqdWorkGroupSize* must
be set to non-0.
239:224 2 bytes ReqdWorkGroupSizeX If present and non-0 then
the kernel
must be executed with the
specified work-group size
for X.
255:240 2 bytes ReqdWorkGroupSizeY If present and non-0 then
the kernel
must be executed with the
specified work-group size
for Y.
271:256 2 bytes ReqdWorkGroupSizeZ If present and non-0 then
the kernel
must be executed with the
specified work-group size
for Z.
383:272 14 Reserved, must be 0.
bytes
415:384 4 bytes ComputePgmRsrc1 Compute Shader (CS)
program settings used by
@ -1509,7 +1579,10 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
should always be 0.
457 1 bit EnableSGPRGridWorkgroupCountZ Not implemented in CP and
should always be 0.
463:458 6 bits Reserved, must be 0.
462:458 5 bits Reserved, must be 0.
463 1 bit IsXNACKEnabled Indicates if the generated
machine code is capable of
supporting XNACK.
511:464 6 Reserved, must be 0.
bytes
512 **Total size 64 bytes.**
@ -1528,7 +1601,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
granularity is device
specific:
GFX6-9
GFX6-GFX9
- max_vgpr 1..256
- roundup((max_vgpr + 1)
/ 4) - 1
@ -1540,7 +1613,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
granularity is device
specific:
GFX6-8
GFX6-GFX8
- max_sgpr 1..112
- roundup((max_sgpr + 1)
/ 8) - 1
@ -1679,7 +1752,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
CP is responsible for
filling in
``COMPUTE_PGM_RSRC1.CDBG_USER``.
26 1 bit FP16_OVFL GFX6-8
26 1 bit FP16_OVFL GFX6-GFX8
Reserved, must be 0.
GFX9
Wavefront starts execution
@ -2166,7 +2239,7 @@ Flat Scratch register pair are adjacent SGRRs so they can be moved as a 64 bit
value to the hardware required SGPRn-3 and SGPRn-4 respectively.
The global segment can be accessed either using buffer instructions (GFX6 which
has V# 64 bit address support), flat instructions (GFX7-9), or global
has V# 64 bit address support), flat instructions (GFX7-GFX9), or global
instructions (GFX9).
If buffer operations are used then the compiler can generate a V# with the
@ -2212,7 +2285,7 @@ Offset SGPR registers (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`):
GFX6
Flat scratch is not supported.
GFX7-8
GFX7-GFX8
1. The low word of Flat Scratch Init is 32 bit byte offset from
``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
being managed by SPI for the queue executing the kernel dispatch. This is
@ -2305,7 +2378,7 @@ For GFX6-GFX9:
same wavefront.
* The vector memory operations are performed as wavefront wide operations and
completion is reported to a wavefront in execution order. The exception is
that for GFX7-9 ``flat_load/store/atomic`` instructions can report out of
that for GFX7-GFX9 ``flat_load/store/atomic`` instructions can report out of
vector memory order if they access LDS memory, and out of LDS operation order
if they access global memory.
* The vector memory operations access a single vector L1 cache shared by all
@ -2332,7 +2405,7 @@ For GFX6-GFX9:
* The L2 cache can be kept coherent with other agents on some targets, or ranges
of virtual addresses can be set up to bypass it to ensure system coherence.
Private address space uses ``buffer_load/store`` using the scratch V# (GFX6-8),
Private address space uses ``buffer_load/store`` using the scratch V# (GFX6-GFX8),
or ``scratch_load/store`` (GFX9). Since only a single thread is accessing the
memory, atomic memory orderings are not meaningful and all accesses are treated
as non-atomic.