mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 12:41:49 +01:00

[NFC][AMDGPU] Reorganize description of scratch handling

Differential Revision: https://reviews.llvm.org/D93440
Tony 2020-12-17 02:45:47 +00:00
parent 328670a855
commit 062d45d53c


@ -107,21 +107,21 @@ specific information.
.. table:: AMDGPU Processors
:name: amdgpu-processor-table
=========== =============== ============ ===== ================= =========== =============== ======================
Processor Alternative Target dGPU/ Target Target OS Support Example
Processor Triple APU Features Properties *(see* Products
Architecture Supported `amdgpu-os`_
*and
corresponding
runtime release
notes for
current
information and
level of
support)*
=========== =============== ============ ===== ================= =========== =============== ======================
=========== =============== ============ ===== ================= =============== =============== ======================
Processor Alternative Target dGPU/ Target Target OS Support Example
Processor Triple APU Features Properties *(see* Products
Architecture Supported `amdgpu-os`_
*and
corresponding
runtime release
notes for
current
information and
level of
support)*
=========== =============== ============ ===== ================= =============== =============== ======================
**Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_
-------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
``r600`` ``r600`` dGPU - Does not
support
generic
@ -143,7 +143,7 @@ specific information.
address
space
**Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_
-------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
``rv710`` ``r600`` dGPU - Does not
support
generic
@ -160,7 +160,7 @@ specific information.
address
space
**Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_
-------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
``cedar`` ``r600`` dGPU - Does not
support
generic
@ -187,7 +187,7 @@ specific information.
address
space
**Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_
-------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
``barts`` ``r600`` dGPU - Does not
support
generic
@ -209,208 +209,208 @@ specific information.
address
space
**GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_
-------------------------------------------------------------------------------------------------------------------
``gfx600`` - ``tahiti`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
-----------------------------------------------------------------------------------------------------------------------
``gfx600`` - ``tahiti`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
support
generic
address
space
``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
- ``verde`` support
generic
address
space
``gfx602`` - ``hainan`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
``gfx602`` - ``hainan`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
- ``oland`` support
generic
address
space
**GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_
-------------------------------------------------------------------------------------------------------------------
``gfx700`` - ``kaveri`` ``amdgcn`` APU - *rocm-amdhsa* - A6-7000
- *pal-amdhsa* - A6 Pro-7050B
- *pal-amdpal* - A8-7100
- A8 Pro-7150B
- A10-7300
- A10 Pro-7350B
- FX-7500
- A8-7200P
- A10-7400P
- FX-7600P
``gfx701`` - ``hawaii`` ``amdgcn`` dGPU - *rocm-amdhsa* - FirePro W8100
- *pal-amdhsa* - FirePro W9100
- *pal-amdpal* - FirePro S9150
- FirePro S9170
``gfx702`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 290
- *pal-amdhsa* - Radeon R9 290x
- *pal-amdpal* - Radeon R390
- Radeon R390x
``gfx703`` - ``kabini`` ``amdgcn`` APU - *pal-amdhsa* - E1-2100
- ``mullins`` - *pal-amdpal* - E1-2200
- E1-2500
- E2-3000
- E2-3800
- A4-5000
- A4-5100
- A6-5200
- A4 Pro-3340B
``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - *pal-amdhsa* - Radeon HD 7790
- *pal-amdpal* - Radeon HD 8770
- R7 260
- R7 260X
``gfx705`` ``amdgcn`` APU - *pal-amdhsa* *TBA*
- *pal-amdpal*
.. TODO::
-----------------------------------------------------------------------------------------------------------------------
``gfx700`` - ``kaveri`` ``amdgcn`` APU - Offset - *rocm-amdhsa* - A6-7000
flat - *pal-amdhsa* - A6 Pro-7050B
scratch - *pal-amdpal* - A8-7100
- A8 Pro-7150B
- A10-7300
- A10 Pro-7350B
- FX-7500
- A8-7200P
- A10-7400P
- FX-7600P
``gfx701`` - ``hawaii`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - FirePro W8100
flat - *pal-amdhsa* - FirePro W9100
scratch - *pal-amdpal* - FirePro S9150
- FirePro S9170
``gfx702`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon R9 290
flat - *pal-amdhsa* - Radeon R9 290x
scratch - *pal-amdpal* - Radeon R390
- Radeon R390x
``gfx703`` - ``kabini`` ``amdgcn`` APU - Offset - *pal-amdhsa* - E1-2100
- ``mullins`` flat - *pal-amdpal* - E1-2200
scratch - E1-2500
- E2-3000
- E2-3800
- A4-5000
- A4-5100
- A6-5200
- A4 Pro-3340B
``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Offset - *pal-amdhsa* - Radeon HD 7790
flat - *pal-amdpal* - Radeon HD 8770
scratch - R7 260
- R7 260X
``gfx705`` ``amdgcn`` APU - Offset - *pal-amdhsa* *TBA*
flat - *pal-amdpal*
scratch .. TODO::
Add product
names.
Add product
names.
**GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_
-------------------------------------------------------------------------------------------------------------------
``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - *rocm-amdhsa* - A6-8500P
- *pal-amdhsa* - Pro A6-8500B
- *pal-amdpal* - A8-8600P
- Pro A8-8600B
- FX-8800P
- Pro A12-8800B
- A10-8700P
- Pro A10-8700B
- A10-8780P
- A10-9600P
- A10-9630P
- A12-9700P
- A12-9730P
- FX-9800P
- FX-9830P
- E2-9010
- A6-9210
- A9-9410
``gfx802`` - ``iceland`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 285
- ``tonga`` - *pal-amdhsa* - Radeon R9 380
- *pal-amdpal* - Radeon R9 385
``gfx803`` - ``fiji`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 Nano
- *pal-amdhsa* - Radeon R9 Fury
- *pal-amdpal* - Radeon R9 FuryX
- Radeon Pro Duo
- FirePro S9300x2
- Radeon Instinct MI8
\ - ``polaris10`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon RX 470
- *pal-amdhsa* - Radeon RX 480
- *pal-amdpal* - Radeon Instinct MI6
\ - ``polaris11`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon RX 460
- *pal-amdhsa*
- *pal-amdpal*
``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - *rocm-amdhsa* - FirePro S7150
- *pal-amdhsa* - FirePro S7100
- *pal-amdpal* - FirePro W7100
- Mobile FirePro
M7170
``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack - *rocm-amdhsa* *TBA*
- *pal-amdhsa*
- *pal-amdpal* .. TODO::
-----------------------------------------------------------------------------------------------------------------------
``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - Offset - *rocm-amdhsa* - A6-8500P
flat - *pal-amdhsa* - Pro A6-8500B
scratch - *pal-amdpal* - A8-8600P
- Pro A8-8600B
- FX-8800P
- Pro A12-8800B
- A10-8700P
- Pro A10-8700B
- A10-8780P
- A10-9600P
- A10-9630P
- A12-9700P
- A12-9730P
- FX-9800P
- FX-9830P
- E2-9010
- A6-9210
- A9-9410
``gfx802`` - ``iceland`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon R9 285
- ``tonga`` flat - *pal-amdhsa* - Radeon R9 380
scratch - *pal-amdpal* - Radeon R9 385
``gfx803`` - ``fiji`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 Nano
- *pal-amdhsa* - Radeon R9 Fury
- *pal-amdpal* - Radeon R9 FuryX
- Radeon Pro Duo
- FirePro S9300x2
- Radeon Instinct MI8
\ - ``polaris10`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon RX 470
flat - *pal-amdhsa* - Radeon RX 480
scratch - *pal-amdpal* - Radeon Instinct MI6
\ - ``polaris11`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon RX 460
flat - *pal-amdhsa*
scratch - *pal-amdpal*
``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - FirePro S7150
flat - *pal-amdhsa* - FirePro S7100
scratch - *pal-amdpal* - FirePro W7100
- Mobile FirePro
M7170
``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack - Offset - *rocm-amdhsa* *TBA*
flat - *pal-amdhsa*
scratch - *pal-amdpal* .. TODO::
Add product
names.
Add product
names.
**GCN GFX9 (Vega)** [AMD-GCN-GFX9]_
-------------------------------------------------------------------------------------------------------------------
``gfx900`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* - Radeon Vega
- *pal-amdhsa* Frontier Edition
- *pal-amdpal* - Radeon RX Vega 56
- Radeon RX Vega 64
- Radeon RX Vega 64
Liquid
- Radeon Instinct MI25
``gfx902`` ``amdgcn`` APU - xnack - *rocm-amdhsa* - Ryzen 3 2200G
- *pal-amdhsa* - Ryzen 5 2400G
- *pal-amdpal*
``gfx904`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* *TBA*
- *pal-amdhsa*
- *pal-amdpal* .. TODO::
-----------------------------------------------------------------------------------------------------------------------
``gfx900`` ``amdgcn`` dGPU - xnack - Absolute - *rocm-amdhsa* - Radeon Vega
flat - *pal-amdhsa* Frontier Edition
scratch - *pal-amdpal* - Radeon RX Vega 56
- Radeon RX Vega 64
- Radeon RX Vega 64
Liquid
- Radeon Instinct MI25
``gfx902`` ``amdgcn`` APU - xnack - Absolute - *rocm-amdhsa* - Ryzen 3 2200G
flat - *pal-amdhsa* - Ryzen 5 2400G
scratch - *pal-amdpal*
``gfx904`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* *TBA*
- *pal-amdhsa*
- *pal-amdpal* .. TODO::
Add product
names.
Add product
names.
``gfx906`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* - Radeon Instinct MI50
- xnack - *pal-amdhsa* - Radeon Instinct MI60
- *pal-amdpal* - Radeon VII
- Radeon Pro VII
``gfx908`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* *TBA*
- xnack
.. TODO::
``gfx906`` ``amdgcn`` dGPU - sramecc - Absolute - *rocm-amdhsa* - Radeon Instinct MI50
- xnack flat - *pal-amdhsa* - Radeon Instinct MI60
scratch - *pal-amdpal* - Radeon VII
- Radeon Pro VII
``gfx908`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* *TBA*
- xnack - Absolute
flat .. TODO::
scratch
Add product
names.
Add product
names.
``gfx909`` ``amdgcn`` APU - xnack - Absolute - *pal-amdpal* *TBA*
flat
scratch .. TODO::
``gfx909`` ``amdgcn`` APU - xnack - *pal-amdpal* *TBA*
Add product
names.
.. TODO::
Add product
names.
``gfx90c`` ``amdgcn`` APU - xnack - *pal-amdpal* - Ryzen 7 4700G
- Ryzen 7 4700GE
- Ryzen 5 4600G
- Ryzen 5 4600GE
- Ryzen 3 4300G
- Ryzen 3 4300GE
- Ryzen Pro 4000G
- Ryzen 7 Pro 4700G
- Ryzen 7 Pro 4750GE
- Ryzen 5 Pro 4650G
- Ryzen 5 Pro 4650GE
- Ryzen 3 Pro 4350G
- Ryzen 3 Pro 4350GE
``gfx90c`` ``amdgcn`` APU - xnack - Absolute - *pal-amdpal* - Ryzen 7 4700G
flat - Ryzen 7 4700GE
scratch - Ryzen 5 4600G
- Ryzen 5 4600GE
- Ryzen 3 4300G
- Ryzen 3 4300GE
- Ryzen Pro 4000G
- Ryzen 7 Pro 4700G
- Ryzen 7 Pro 4750GE
- Ryzen 5 Pro 4650G
- Ryzen 5 Pro 4650GE
- Ryzen 3 Pro 4350G
- Ryzen 3 Pro 4350GE
**GCN GFX10 (RDNA 1)** [AMD-GCN-GFX10-RDNA1]_
-------------------------------------------------------------------------------------------------------------------
``gfx1010`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* - Radeon RX 5700
- wavefrontsize64 - *pal-amdhsa* - Radeon RX 5700 XT
- xnack - *pal-amdpal* - Radeon Pro 5600 XT
- Radeon Pro 5600M
``gfx1011`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
- wavefrontsize64 - *pal-amdhsa*
- xnack - *pal-amdpal*
.. TODO::
-----------------------------------------------------------------------------------------------------------------------
``gfx1010`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* - Radeon RX 5700
- wavefrontsize64 flat - *pal-amdhsa* - Radeon RX 5700 XT
- xnack scratch - *pal-amdpal* - Radeon Pro 5600 XT
- Radeon Pro 5600M
``gfx1011`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
- wavefrontsize64 - Absolute - *pal-amdhsa*
- xnack flat - *pal-amdpal*
scratch .. TODO::
Add product
names.
Add product
names.
``gfx1012`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* - Radeon RX 5500
- wavefrontsize64 - *pal-amdhsa* - Radeon RX 5500 XT
- xnack - *pal-amdpal*
``gfx1012`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* - Radeon RX 5500
- wavefrontsize64 flat - *pal-amdhsa* - Radeon RX 5500 XT
- xnack scratch - *pal-amdpal*
**GCN GFX10 (RDNA 2)** [AMD-GCN-GFX10-RDNA2]_
-------------------------------------------------------------------------------------------------------------------
``gfx1030`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
- wavefrontsize64 - *pal-amdhsa*
- *pal-amdpal* .. TODO::
-----------------------------------------------------------------------------------------------------------------------
``gfx1030`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA*
- wavefrontsize64 flat - *pal-amdhsa*
scratch - *pal-amdpal* .. TODO::
Add product
names.
Add product
names.
``gfx1031`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
- wavefrontsize64 - *pal-amdhsa*
- *pal-amdpal* .. TODO::
``gfx1031`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA*
- wavefrontsize64 flat - *pal-amdhsa*
scratch - *pal-amdpal* .. TODO::
Add product
names.
Add product
names.
``gfx1032`` ``amdgcn`` dGPU - cumode - *pal-amdhsa* *TBA*
- wavefrontsize64 - *pal-amdhsa*
- *pal-amdpal* .. TODO::
``gfx1032`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA*
- wavefrontsize64 flat - *pal-amdhsa*
scratch - *pal-amdpal* .. TODO::
Add product
names.
Add product
names.
``gfx1033`` ``amdgcn`` APU - cumode - *pal-amdpal* *TBA*
- wavefrontsize64
.. TODO::
``gfx1033`` ``amdgcn`` APU - cumode - Absolute - *pal-amdpal* *TBA*
- wavefrontsize64 flat
scratch .. TODO::
Add product
names.
Add product
names.
=========== =============== ============ ===== ================= =========== =============== ======================
=========== =============== ============ ===== ================= =============== =============== ======================
.. _amdgpu-target-features:
@ -4162,18 +4162,9 @@ SGPR register initial state is defined in
(kernel descriptor enable of
field) SGPRs
========== ========================== ====== ==============================
First Private Segment Buffer 4 This is 4 SGPRs:
(enable_sgpr_private
First Private Segment Buffer 4 See
(enable_sgpr_private :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`.
_segment_buffer)
V# that can be used,
together with Scratch
Wavefront Offset as an
offset, to access the
private memory space using a
segment address.
CP uses the value provided
by the runtime.
then Dispatch Ptr 2 64-bit address of AQL dispatch
(enable_sgpr_dispatch_ptr) packet for kernel dispatch
actually executing.
@ -4193,87 +4184,8 @@ SGPR register initial state is defined in
then Dispatch Id 2 64-bit Dispatch ID of the
(enable_sgpr_dispatch_id) dispatch packet being
executed.
then Flat Scratch Init 2 This is 2 SGPRs:
(enable_sgpr_flat_scratch
_init) GFX6
Not supported.
GFX7-GFX8
The first SGPR is a 32-bit
byte offset from
``SH_HIDDEN_PRIVATE_BASE_VIMID``
to per SPI base of memory
for scratch for the queue
executing the kernel
dispatch. CP obtains this
from the runtime. (The
Scratch Segment Buffer base
address is
``SH_HIDDEN_PRIVATE_BASE_VIMID``
plus this offset.) The value
of Scratch Wavefront Offset must
be added to this offset by
the kernel machine code,
right shifted by 8, and
moved to the FLAT_SCRATCH_HI
SGPR register.
FLAT_SCRATCH_HI corresponds
to SGPRn-4 on GFX7, and
SGPRn-6 on GFX8 (where SGPRn
is the highest numbered SGPR
allocated to the wavefront).
FLAT_SCRATCH_HI is
multiplied by 256 (as it is
in units of 256 bytes) and
added to
``SH_HIDDEN_PRIVATE_BASE_VIMID``
to calculate the per wavefront
FLAT SCRATCH BASE in flat
memory instructions that
access the scratch
aperture.
The second SGPR is 32-bit
byte size of a single
work-item's scratch memory
usage. CP obtains this from
the runtime, and it is
always a multiple of DWORD.
CP checks that the value in
the kernel dispatch packet
Private Segment Byte Size is
not larger and requests the
runtime to increase the
queue's scratch size if
necessary. The kernel code
must move it to
FLAT_SCRATCH_LO which is
SGPRn-3 on GFX7 and SGPRn-5
on GFX8. FLAT_SCRATCH_LO is
used as the FLAT SCRATCH
SIZE in flat memory
instructions. Having CP load
it once avoids loading it at
the beginning of every
wavefront.
GFX9-GFX10
This is the
64-bit base address of the
per SPI scratch backing
memory managed by SPI for
the queue executing the
kernel dispatch. CP obtains
this from the runtime (and
divides it if there are
multiple Shader Arrays each
with its own SPI). The value
of Scratch Wavefront Offset must
be added by the kernel
machine code and the result
moved to the FLAT_SCRATCH
SGPR which is SGPRn-6 and
SGPRn-5. It is used as the
FLAT SCRATCH BASE in flat
memory instructions.
then Flat Scratch Init 2 See
:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
then Private Segment Size 1 The 32-bit byte size of a
(enable_sgpr_private single
work-item's
@ -4338,19 +4250,10 @@ SGPR register initial state is defined in
then Work-Group Info 1 {first_wavefront, 14'b0000,
(enable_sgpr_workgroup ordered_append_term[10:0],
_info) threadgroup_size_in_wavefronts[5:0]}
then Scratch Wavefront Offset 1 This is 1 SGPR:
(enable_sgpr_private
_segment_wavefront_offset)
32-bit byte offset from base
of scratch base of queue
executing the kernel
dispatch. Must be used as an
offset with Private segment
address when using Scratch
Segment Buffer. It must be
used to set up FLAT SCRATCH
for flat addressing (see
:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`).
then Scratch Wavefront Offset 1 See
(enable_sgpr_private :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`
_segment_wavefront_offset) and
:ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`.
========== ========================== ====== ==============================
The order of the VGPR registers is defined, but the compiler can specify which
@ -4390,12 +4293,11 @@ The setting of registers is done by GPU CP/ADC/SPI hardware as follows:
combination including none.
3. Scratch Wavefront Offset is set by SPI on a per wavefront basis which is why
its value cannot be included with the flat scratch init value which is per
queue.
queue (see :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`).
4. The VGPRs are set by SPI which only supports specifying either (X), (X, Y)
or (X, Y, Z).
See :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch` for Flat Scratch register
pair initialization.
5. Flat Scratch register pair initialization is described in
:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
The global segment can be accessed either using buffer instructions (GFX6 which
has V# 64-bit address support), flat instructions (GFX7-GFX10), or global
@ -4474,48 +4376,98 @@ pointer are replaced with immediate ``0`` offsets.
Flat Scratch
++++++++++++
GFX6
Flat scratch is not supported.
There are different methods used for initializing flat scratch:
* If the *Target Properties* column of :ref:`amdgpu-processor-table`
specifies *Does not support generic address space*:
Flat scratch is not supported and there is no flat scratch register pair.
* If the *Target Properties* column of :ref:`amdgpu-processor-table`
specifies *Offset flat scratch*:
If the kernel or any function it calls may use flat operations to access
scratch memory, the prolog code must set up the FLAT_SCRATCH register pair
(FLAT_SCRATCH_LO/FLAT_SCRATCH_HI). Initialization uses Flat Scratch Init and
Scratch Wavefront Offset SGPR registers (see
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`):
1. The low word of Flat Scratch Init is the 32-bit byte offset from
``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
being managed by SPI for the queue executing the kernel dispatch. This is
the same value used in the Scratch Segment Buffer V# base address.
CP obtains this from the runtime. (The Scratch Segment Buffer base address
is ``SH_HIDDEN_PRIVATE_BASE_VIMID`` plus this offset.)
The prolog must add the value of Scratch Wavefront Offset to get the
wavefront's byte scratch backing memory offset from
``SH_HIDDEN_PRIVATE_BASE_VIMID``.
The Scratch Wavefront Offset must also be used as an offset with Private
segment address when using the Scratch Segment Buffer.
Since FLAT_SCRATCH_HI is in units of 256 bytes, the offset must be right
shifted by 8 before moving into FLAT_SCRATCH_HI.
FLAT_SCRATCH_HI corresponds to SGPRn-4 on GFX7, and SGPRn-6 on GFX8 (where
SGPRn is the highest numbered SGPR allocated to the wavefront).
FLAT_SCRATCH_HI is multiplied by 256 (as it is in units of 256 bytes) and
added to ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to calculate the per wavefront
FLAT SCRATCH BASE in flat memory instructions that access the scratch
aperture.
2. The second word of Flat Scratch Init is the 32-bit byte size of a single
work-item's scratch memory usage.
CP obtains this from the runtime, and it is always a multiple of DWORD. CP
checks that the value in the kernel dispatch packet Private Segment Byte
Size is not larger and requests the runtime to increase the queue's scratch
size if necessary.
CP directly loads from the kernel dispatch packet Private Segment Byte Size
field and rounds up to a multiple of DWORD. Having CP load it once avoids
loading it at the beginning of every wavefront.
The kernel prolog code must move it to FLAT_SCRATCH_LO which is SGPRn-3 on
GFX7 and SGPRn-5 on GFX8. FLAT_SCRATCH_LO is used as the FLAT SCRATCH SIZE
in flat memory instructions.
* If the *Target Properties* column of :ref:`amdgpu-processor-table`
specifies *Absolute flat scratch*:
GFX7-GFX10
If the kernel or any function it calls may use flat operations to access
scratch memory, the prolog code must set up the FLAT_SCRATCH register pair
(FLAT_SCRATCH_LO/FLAT_SCRATCH_HI which are in SGPRn-4/SGPRn-3). Initialization
uses Flat Scratch Init and Scratch Wavefront Offset SGPR registers (see
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`):
GFX7-GFX8
The Flat Scratch Init is the 64-bit address of the base of scratch backing
memory being managed by SPI for the queue executing the kernel dispatch.
1. The low word of Flat Scratch Init is 32-bit byte offset from
``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
being managed by SPI for the queue executing the kernel dispatch. This is
the same value used in the Scratch Segment Buffer V# base address. The
prolog must add the value of Scratch Wavefront Offset to get the
wavefront's byte scratch backing memory offset from
``SH_HIDDEN_PRIVATE_BASE_VIMID``. Since FLAT_SCRATCH_LO is in units of 256
bytes, the offset must be right shifted by 8 before moving into
FLAT_SCRATCH_LO.
2. The second word of Flat Scratch Init is 32-bit byte size of a single
work-items scratch memory usage. This is directly loaded from the kernel
dispatch packet Private Segment Byte Size and rounded up to a multiple of
DWORD. Having CP load it once avoids loading it at the beginning of every
wavefront. The prolog must move it to FLAT_SCRATCH_LO for use as FLAT
SCRATCH SIZE.
CP obtains this from the runtime.
GFX9-GFX10
The Flat Scratch Init is the 64-bit address of the base of scratch backing
memory being managed by SPI for the queue executing the kernel dispatch. The
prolog must add the value of the wave's Scratch Wavefront Offset and moved
as a 64-bit value to the FLAT_SCRATCH pair for use as the flat scratch base
in flat memory instructions.
The kernel prolog must add the value of the wave's Scratch Wavefront Offset
and move the result as a 64-bit value to the FLAT_SCRATCH SGPR register pair
which is SGPRn-6 and SGPRn-5. It is used as the FLAT SCRATCH BASE in flat
memory instructions.
The Scratch Wavefront Offset must also be used as an offset with Private
segment address when using the Scratch Segment Buffer (see
:ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`).
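The two initialization methods above can be sketched as plain integer arithmetic. This is an illustrative Python model only, not compiler or hardware code. The function and parameter names are hypothetical, the 32-bit and 64-bit wraparound is an assumption, and placing the low half of the 64-bit base in FLAT_SCRATCH_LO is assumed rather than stated in the text.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def offset_flat_scratch(init_lo, init_hi, scratch_wavefront_offset):
    """Model of the *Offset flat scratch* prolog (hypothetical helper).

    init_lo: low word of Flat Scratch Init, the byte offset of the
             queue's scratch backing memory from SH_HIDDEN_PRIVATE_BASE_VIMID.
    init_hi: second word of Flat Scratch Init, the per-work-item
             scratch byte size.
    Returns (FLAT_SCRATCH_LO, FLAT_SCRATCH_HI).
    """
    # Step 1: add the wavefront offset to get this wavefront's byte offset
    # (wraparound to 32 bits is an assumption of this sketch).
    wave_byte_offset = (init_lo + scratch_wavefront_offset) & MASK32
    # FLAT_SCRATCH_HI is in units of 256 bytes, so shift right by 8.
    flat_scratch_hi = wave_byte_offset >> 8
    # Step 2: the per-work-item size is moved unchanged to FLAT_SCRATCH_LO.
    flat_scratch_lo = init_hi & MASK32
    return flat_scratch_lo, flat_scratch_hi


def absolute_flat_scratch(flat_scratch_init, scratch_wavefront_offset):
    """Model of the *Absolute flat scratch* prolog (hypothetical helper).

    flat_scratch_init: 64-bit base address of the queue's scratch memory.
    Returns (FLAT_SCRATCH_LO, FLAT_SCRATCH_HI), assuming the low 32 bits
    of the 64-bit result live in FLAT_SCRATCH_LO.
    """
    base = (flat_scratch_init + scratch_wavefront_offset) & MASK64
    return base & MASK32, base >> 32
```

For example, with a queue offset of ``0x10000``, a wavefront offset of ``0x2300``, and a 64-byte work-item size, ``offset_flat_scratch(0x10000, 64, 0x2300)`` yields ``(64, 0x123)``: the combined byte offset ``0x12300`` expressed in 256-byte units, plus the unchanged size.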
.. _amdgpu-amdhsa-kernel-prolog-private-segment-buffer:
Private Segment Buffer
++++++++++++++++++++++
A set of four SGPRs beginning at a four-aligned SGPR index are always selected
to serve as the scratch V# for the kernel as follows:
The Private Segment Buffer SGPR register is used to initialize 4 SGPRs
that are used as a V# to access scratch. CP uses the value provided by the
runtime. It is used, together with Scratch Wavefront Offset as an offset, to
access the private memory space using a segment address. See
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`.
The scratch V# is a four-aligned SGPR and always selected for the kernel as
follows:
- If it is known during instruction selection that there is stack usage,
SGPR0-3 is reserved for use as the scratch V#. Stack usage is assumed if