==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, from the R600
family up to the current Volcanic Islands (GCN Gen 3) family.


Assembler
=========

The assembler is currently considered experimental.

For syntax examples look in test/MC/AMDGPU.

The sections below list some of the currently supported features (modulo
bugs).  They all apply to the Southern Islands ISA.  Sea Islands and Volcanic
Islands are also supported, but may be missing some instructions and have more
bugs.

DS Instructions
---------------
All DS instructions are supported.

FLAT Instructions
------------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction sets.  All FLAT instructions are supported for these architectures.

MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------
Only the s_load_dword* SMRD instructions are supported.
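
A few representative forms are sketched below.  The register numbers and the
immediate offset are arbitrary choices for illustration:

.. code-block:: nasm

   // Load one, two, or four dwords from the address held in s[2:3] plus an
   // immediate offset.
   s_load_dword s1, s[2:3], 1
   s_load_dwordx2 s[4:5], s[2:3], 1
   s_load_dwordx4 s[4:7], s[2:3], 1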

SOP1 Instructions
-----------------
All SOP1 instructions are supported.

SOP2 Instructions
-----------------
All SOP2 instructions are supported.

SOPC Instructions
-----------------
All SOPC instructions are supported.

SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only.  No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.
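
For instance, lines such as the following should assemble.  This is a sketch
only: the operand values are arbitrary, and the assembler does not
range-check them:

.. code-block:: nasm

   // No operand.
   s_barrier

   // A single integer operand.
   s_nop 1
   s_sleep 10
   s_sethalt 4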

s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

   // Wait for all counters to be 0
   s_waitcnt 0

   // Equivalent to s_waitcnt 0.  Counter names can also be delimited by
   // '&' or ','.
   s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

   // Wait for vmcnt counter to be 1.
   s_waitcnt vmcnt(1)
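
The delimited forms mentioned in the comment above would look like this.  It
is a sketch that assumes the same delimiter is used between every pair of
counters:

.. code-block:: nasm

   // Both lines are expected to be equivalent to s_waitcnt 0.
   s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0)
   s_waitcnt vmcnt(0), expcnt(0), lgkmcnt(0)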

VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler automatically detects which encoding size to use for VOP1, VOP2,
and VOPC instructions based on the operands.  To force a specific encoding
size, add an _e32 (for 32-bit encoding) or _e64 (for 64-bit encoding) suffix
to the instruction.  Most, but not all, instructions support an explicit
suffix.  These are all valid assembly strings:

.. code-block:: nasm

   v_mul_i32_i24 v1, v2, v3
   v_mul_i32_i24_e32 v1, v2, v3
   v_mul_i32_i24_e64 v1, v2, v3
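
As an illustration of operand-based selection, consider a second source
operand that is a scalar register.  This sketch assumes that the 32-bit VOP2
encoding only accepts a vector register in that position, so the assembler is
expected to fall back to the 64-bit encoding.  The register numbers are
arbitrary:

.. code-block:: nasm

   // Encoding size left to the assembler; the 64-bit form is expected here.
   v_mul_i32_i24 v1, v2, s3

   // The same instruction with the 64-bit encoding requested explicitly.
   v_mul_i32_i24_e64 v1, v2, s3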

Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler.  This value will be stored
in an entry of the .note section.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings.  *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

ISA version, *vendor*, and *arch* will all be stored in a single entry of the
.note section.
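
With explicit arguments, the directive could be written as follows.  The
version numbers 7,0,0 are only an illustrative choice; use the values that
match the target being assembled for:

.. code-block:: nasm

   .hsa_code_object_isa 7,0,0,"AMD","AMDGPU"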

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive.  Any
amd_kernel_code_t value that is left unspecified takes a default value.  The
default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4.  Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.
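
As a worked example of the power-of-two encoding, the fragment below requests
a 64-byte kernarg segment alignment and leaves the other two alignments at
their default of 4, that is, 16 bytes.  The value 6 is an arbitrary choice for
illustration:

.. code-block:: nasm

   // kernarg_segment_alignment = 6 means 2^6 = 64-byte alignment.
   .amd_kernel_code_t
      kernarg_segment_alignment = 6
   .end_amd_kernel_code_t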

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s.  For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: nasm

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .text

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      s_load_dwordx2 s[0:1], s[0:1] 0x0
      v_mov_b32 v0, 3.14159
      s_waitcnt lgkmcnt(0)
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      flat_store_dword v0, v[1:2]
      s_endpgm