mirror of https://github.com/RPCS3/llvm-mirror.git
synced 2025-01-31 12:41:49 +01:00

[NFC][AMDGPU] Reorganize description of scratch handling

Differential Revision: https://reviews.llvm.org/D93440

parent 328670a855
commit 062d45d53c
@ -107,21 +107,21 @@ specific information.
|
||||
.. table:: AMDGPU Processors
|
||||
:name: amdgpu-processor-table
|
||||
|
||||
=========== =============== ============ ===== ================= =========== =============== ======================
|
||||
Processor Alternative Target dGPU/ Target Target OS Support Example
|
||||
Processor Triple APU Features Properties *(see* Products
|
||||
Architecture Supported `amdgpu-os`_
|
||||
*and
|
||||
corresponding
|
||||
runtime release
|
||||
notes for
|
||||
current
|
||||
information and
|
||||
level of
|
||||
support)*
|
||||
=========== =============== ============ ===== ================= =========== =============== ======================
|
||||
=========== =============== ============ ===== ================= =============== =============== ======================
|
||||
Processor Alternative Target dGPU/ Target Target OS Support Example
|
||||
Processor Triple APU Features Properties *(see* Products
|
||||
Architecture Supported `amdgpu-os`_
|
||||
*and
|
||||
corresponding
|
||||
runtime release
|
||||
notes for
|
||||
current
|
||||
information and
|
||||
level of
|
||||
support)*
|
||||
=========== =============== ============ ===== ================= =============== =============== ======================
|
||||
**Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_
|
||||
-------------------------------------------------------------------------------------------------------------------
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
``r600`` ``r600`` dGPU - Does not
|
||||
support
|
||||
generic
|
||||
@ -143,7 +143,7 @@ specific information.
|
||||
address
|
||||
space
|
||||
**Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_
|
||||
-------------------------------------------------------------------------------------------------------------------
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
``rv710`` ``r600`` dGPU - Does not
|
||||
support
|
||||
generic
|
||||
@ -160,7 +160,7 @@ specific information.
|
||||
address
|
||||
space
|
||||
**Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_
|
||||
-------------------------------------------------------------------------------------------------------------------
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
``cedar`` ``r600`` dGPU - Does not
|
||||
support
|
||||
generic
|
||||
@ -187,7 +187,7 @@ specific information.
|
||||
address
|
||||
space
|
||||
**Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_
|
||||
-------------------------------------------------------------------------------------------------------------------
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
``barts`` ``r600`` dGPU - Does not
|
||||
support
|
||||
generic
|
||||
@ -209,208 +209,208 @@ specific information.
|
||||
address
|
||||
space
|
||||
**GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_
|
||||
-------------------------------------------------------------------------------------------------------------------
|
||||
``gfx600`` - ``tahiti`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
``gfx600`` - ``tahiti`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
|
||||
support
|
||||
generic
|
||||
address
|
||||
space
|
||||
``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
|
||||
``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
|
||||
- ``verde`` support
|
||||
generic
|
||||
address
|
||||
space
|
||||
``gfx602`` - ``hainan`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
|
||||
``gfx602`` - ``hainan`` ``amdgcn`` dGPU - Does not - *pal-amdpal*
|
||||
- ``oland`` support
|
||||
generic
|
||||
address
|
||||
space
|
||||
**GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_
|
||||
-------------------------------------------------------------------------------------------------------------------
|
||||
``gfx700`` - ``kaveri`` ``amdgcn`` APU - *rocm-amdhsa* - A6-7000
|
||||
- *pal-amdhsa* - A6 Pro-7050B
|
||||
- *pal-amdpal* - A8-7100
|
||||
- A8 Pro-7150B
|
||||
- A10-7300
|
||||
- A10 Pro-7350B
|
||||
- FX-7500
|
||||
- A8-7200P
|
||||
- A10-7400P
|
||||
- FX-7600P
|
||||
``gfx701`` - ``hawaii`` ``amdgcn`` dGPU - *rocm-amdhsa* - FirePro W8100
|
||||
- *pal-amdhsa* - FirePro W9100
|
||||
- *pal-amdpal* - FirePro S9150
|
||||
- FirePro S9170
|
||||
``gfx702`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 290
|
||||
- *pal-amdhsa* - Radeon R9 290x
|
||||
- *pal-amdpal* - Radeon R390
|
||||
- Radeon R390x
|
||||
``gfx703`` - ``kabini`` ``amdgcn`` APU - *pal-amdhsa* - E1-2100
|
||||
- ``mullins`` - *pal-amdpal* - E1-2200
|
||||
- E1-2500
|
||||
- E2-3000
|
||||
- E2-3800
|
||||
- A4-5000
|
||||
- A4-5100
|
||||
- A6-5200
|
||||
- A4 Pro-3340B
|
||||
``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - *pal-amdhsa* - Radeon HD 7790
|
||||
- *pal-amdpal* - Radeon HD 8770
|
||||
- R7 260
|
||||
- R7 260X
|
||||
``gfx705`` ``amdgcn`` APU - *pal-amdhsa* *TBA*
|
||||
- *pal-amdpal*
|
||||
.. TODO::
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
``gfx700`` - ``kaveri`` ``amdgcn`` APU - Offset - *rocm-amdhsa* - A6-7000
|
||||
flat - *pal-amdhsa* - A6 Pro-7050B
|
||||
scratch - *pal-amdpal* - A8-7100
|
||||
- A8 Pro-7150B
|
||||
- A10-7300
|
||||
- A10 Pro-7350B
|
||||
- FX-7500
|
||||
- A8-7200P
|
||||
- A10-7400P
|
||||
- FX-7600P
|
||||
``gfx701`` - ``hawaii`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - FirePro W8100
|
||||
flat - *pal-amdhsa* - FirePro W9100
|
||||
scratch - *pal-amdpal* - FirePro S9150
|
||||
- FirePro S9170
|
||||
``gfx702`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon R9 290
|
||||
flat - *pal-amdhsa* - Radeon R9 290x
|
||||
scratch - *pal-amdpal* - Radeon R390
|
||||
- Radeon R390x
|
||||
``gfx703`` - ``kabini`` ``amdgcn`` APU - Offset - *pal-amdhsa* - E1-2100
|
||||
- ``mullins`` flat - *pal-amdpal* - E1-2200
|
||||
scratch - E1-2500
|
||||
- E2-3000
|
||||
- E2-3800
|
||||
- A4-5000
|
||||
- A4-5100
|
||||
- A6-5200
|
||||
- A4 Pro-3340B
|
||||
``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Offset - *pal-amdhsa* - Radeon HD 7790
|
||||
flat - *pal-amdpal* - Radeon HD 8770
|
||||
scratch - R7 260
|
||||
- R7 260X
|
||||
``gfx705`` ``amdgcn`` APU - Offset - *pal-amdhsa* *TBA*
|
||||
flat - *pal-amdpal*
|
||||
scratch .. TODO::
|
||||
|
||||
Add product
|
||||
names.
|
||||
Add product
|
||||
names.
|
||||
|
||||
**GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_
|
||||
-------------------------------------------------------------------------------------------------------------------
|
||||
``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - *rocm-amdhsa* - A6-8500P
|
||||
- *pal-amdhsa* - Pro A6-8500B
|
||||
- *pal-amdpal* - A8-8600P
|
||||
- Pro A8-8600B
|
||||
- FX-8800P
|
||||
- Pro A12-8800B
|
||||
- A10-8700P
|
||||
- Pro A10-8700B
|
||||
- A10-8780P
|
||||
- A10-9600P
|
||||
- A10-9630P
|
||||
- A12-9700P
|
||||
- A12-9730P
|
||||
- FX-9800P
|
||||
- FX-9830P
|
||||
- E2-9010
|
||||
- A6-9210
|
||||
- A9-9410
|
||||
``gfx802`` - ``iceland`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 285
|
||||
- ``tonga`` - *pal-amdhsa* - Radeon R9 380
|
||||
- *pal-amdpal* - Radeon R9 385
|
||||
``gfx803`` - ``fiji`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 Nano
|
||||
- *pal-amdhsa* - Radeon R9 Fury
|
||||
- *pal-amdpal* - Radeon R9 FuryX
|
||||
- Radeon Pro Duo
|
||||
- FirePro S9300x2
|
||||
- Radeon Instinct MI8
|
||||
\ - ``polaris10`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon RX 470
|
||||
- *pal-amdhsa* - Radeon RX 480
|
||||
- *pal-amdpal* - Radeon Instinct MI6
|
||||
\ - ``polaris11`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon RX 460
|
||||
- *pal-amdhsa*
|
||||
- *pal-amdpal*
|
||||
``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - *rocm-amdhsa* - FirePro S7150
|
||||
- *pal-amdhsa* - FirePro S7100
|
||||
- *pal-amdpal* - FirePro W7100
|
||||
- Mobile FirePro
|
||||
M7170
|
||||
``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack - *rocm-amdhsa* *TBA*
|
||||
- *pal-amdhsa*
|
||||
- *pal-amdpal* .. TODO::
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - Offset - *rocm-amdhsa* - A6-8500P
|
||||
flat - *pal-amdhsa* - Pro A6-8500B
|
||||
scratch - *pal-amdpal* - A8-8600P
|
||||
- Pro A8-8600B
|
||||
- FX-8800P
|
||||
- Pro A12-8800B
|
||||
- A10-8700P
|
||||
- Pro A10-8700B
|
||||
- A10-8780P
|
||||
- A10-9600P
|
||||
- A10-9630P
|
||||
- A12-9700P
|
||||
- A12-9730P
|
||||
- FX-9800P
|
||||
- FX-9830P
|
||||
- E2-9010
|
||||
- A6-9210
|
||||
- A9-9410
|
||||
``gfx802`` - ``iceland`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon R9 285
|
||||
- ``tonga`` flat - *pal-amdhsa* - Radeon R9 380
|
||||
scratch - *pal-amdpal* - Radeon R9 385
|
||||
``gfx803`` - ``fiji`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 Nano
|
||||
- *pal-amdhsa* - Radeon R9 Fury
|
||||
- *pal-amdpal* - Radeon R9 FuryX
|
||||
- Radeon Pro Duo
|
||||
- FirePro S9300x2
|
||||
- Radeon Instinct MI8
|
||||
\ - ``polaris10`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon RX 470
|
||||
flat - *pal-amdhsa* - Radeon RX 480
|
||||
scratch - *pal-amdpal* - Radeon Instinct MI6
|
||||
\ - ``polaris11`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon RX 460
|
||||
flat - *pal-amdhsa*
|
||||
scratch - *pal-amdpal*
|
||||
``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - FirePro S7150
|
||||
flat - *pal-amdhsa* - FirePro S7100
|
||||
scratch - *pal-amdpal* - FirePro W7100
|
||||
- Mobile FirePro
|
||||
M7170
|
||||
``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack - Offset - *rocm-amdhsa* *TBA*
|
||||
flat - *pal-amdhsa*
|
||||
scratch - *pal-amdpal* .. TODO::
|
||||
|
||||
Add product
|
||||
names.
|
||||
Add product
|
||||
names.
|
||||
|
||||
**GCN GFX9 (Vega)** [AMD-GCN-GFX9]_
|
||||
-------------------------------------------------------------------------------------------------------------------
|
||||
``gfx900`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* - Radeon Vega
|
||||
- *pal-amdhsa* Frontier Edition
|
||||
- *pal-amdpal* - Radeon RX Vega 56
|
||||
- Radeon RX Vega 64
|
||||
- Radeon RX Vega 64
|
||||
Liquid
|
||||
- Radeon Instinct MI25
|
||||
``gfx902`` ``amdgcn`` APU - xnack - *rocm-amdhsa* - Ryzen 3 2200G
|
||||
- *pal-amdhsa* - Ryzen 5 2400G
|
||||
- *pal-amdpal*
|
||||
``gfx904`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* *TBA*
|
||||
- *pal-amdhsa*
|
||||
- *pal-amdpal* .. TODO::
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
``gfx900`` ``amdgcn`` dGPU - xnack - Absolute - *rocm-amdhsa* - Radeon Vega
|
||||
flat - *pal-amdhsa* Frontier Edition
|
||||
scratch - *pal-amdpal* - Radeon RX Vega 56
|
||||
- Radeon RX Vega 64
|
||||
- Radeon RX Vega 64
|
||||
Liquid
|
||||
- Radeon Instinct MI25
|
||||
``gfx902`` ``amdgcn`` APU - xnack - Absolute - *rocm-amdhsa* - Ryzen 3 2200G
|
||||
flat - *pal-amdhsa* - Ryzen 5 2400G
|
||||
scratch - *pal-amdpal*
|
||||
``gfx904`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* *TBA*
|
||||
- *pal-amdhsa*
|
||||
- *pal-amdpal* .. TODO::
|
||||
|
||||
Add product
|
||||
names.
|
||||
Add product
|
||||
names.
|
||||
|
||||
``gfx906`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* - Radeon Instinct MI50
|
||||
- xnack - *pal-amdhsa* - Radeon Instinct MI60
|
||||
- *pal-amdpal* - Radeon VII
|
||||
- Radeon Pro VII
|
||||
``gfx908`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* *TBA*
|
||||
- xnack
|
||||
.. TODO::
|
||||
``gfx906`` ``amdgcn`` dGPU - sramecc - Absolute - *rocm-amdhsa* - Radeon Instinct MI50
|
||||
- xnack flat - *pal-amdhsa* - Radeon Instinct MI60
|
||||
scratch - *pal-amdpal* - Radeon VII
|
||||
- Radeon Pro VII
|
||||
``gfx908`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* *TBA*
|
||||
- xnack - Absolute
|
||||
flat .. TODO::
|
||||
scratch
|
||||
Add product
|
||||
names.
|
||||
|
||||
Add product
|
||||
names.
|
||||
``gfx909`` ``amdgcn`` APU - xnack - Absolute - *pal-amdpal* *TBA*
|
||||
flat
|
||||
scratch .. TODO::
|
||||
|
||||
``gfx909`` ``amdgcn`` APU - xnack - *pal-amdpal* *TBA*
|
||||
Add product
|
||||
names.
|
||||
|
||||
.. TODO::
|
||||
|
||||
Add product
|
||||
names.
|
||||
|
||||
``gfx90c`` ``amdgcn`` APU - xnack - *pal-amdpal* - Ryzen 7 4700G
|
||||
- Ryzen 7 4700GE
|
||||
- Ryzen 5 4600G
|
||||
- Ryzen 5 4600GE
|
||||
- Ryzen 3 4300G
|
||||
- Ryzen 3 4300GE
|
||||
- Ryzen Pro 4000G
|
||||
- Ryzen 7 Pro 4700G
|
||||
- Ryzen 7 Pro 4750GE
|
||||
- Ryzen 5 Pro 4650G
|
||||
- Ryzen 5 Pro 4650GE
|
||||
- Ryzen 3 Pro 4350G
|
||||
- Ryzen 3 Pro 4350GE
|
||||
``gfx90c`` ``amdgcn`` APU - xnack - Absolute - *pal-amdpal* - Ryzen 7 4700G
|
||||
flat - Ryzen 7 4700GE
|
||||
scratch - Ryzen 5 4600G
|
||||
- Ryzen 5 4600GE
|
||||
- Ryzen 3 4300G
|
||||
- Ryzen 3 4300GE
|
||||
- Ryzen Pro 4000G
|
||||
- Ryzen 7 Pro 4700G
|
||||
- Ryzen 7 Pro 4750GE
|
||||
- Ryzen 5 Pro 4650G
|
||||
- Ryzen 5 Pro 4650GE
|
||||
- Ryzen 3 Pro 4350G
|
||||
- Ryzen 3 Pro 4350GE
|
||||
|
||||
**GCN GFX10 (RDNA 1)** [AMD-GCN-GFX10-RDNA1]_
|
||||
-------------------------------------------------------------------------------------------------------------------
|
||||
``gfx1010`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* - Radeon RX 5700
|
||||
- wavefrontsize64 - *pal-amdhsa* - Radeon RX 5700 XT
|
||||
- xnack - *pal-amdpal* - Radeon Pro 5600 XT
|
||||
- Radeon Pro 5600M
|
||||
``gfx1011`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
|
||||
- wavefrontsize64 - *pal-amdhsa*
|
||||
- xnack - *pal-amdpal*
|
||||
.. TODO::
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
``gfx1010`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* - Radeon RX 5700
|
||||
- wavefrontsize64 flat - *pal-amdhsa* - Radeon RX 5700 XT
|
||||
- xnack scratch - *pal-amdpal* - Radeon Pro 5600 XT
|
||||
- Radeon Pro 5600M
|
||||
``gfx1011`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
|
||||
- wavefrontsize64 - Absolute - *pal-amdhsa*
|
||||
- xnack flat - *pal-amdpal*
|
||||
scratch .. TODO::
|
||||
|
||||
Add product
|
||||
names.
|
||||
Add product
|
||||
names.
|
||||
|
||||
``gfx1012`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* - Radeon RX 5500
|
||||
- wavefrontsize64 - *pal-amdhsa* - Radeon RX 5500 XT
|
||||
- xnack - *pal-amdpal*
|
||||
``gfx1012`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* - Radeon RX 5500
|
||||
- wavefrontsize64 flat - *pal-amdhsa* - Radeon RX 5500 XT
|
||||
- xnack scratch - *pal-amdpal*
|
||||
**GCN GFX10 (RDNA 2)** [AMD-GCN-GFX10-RDNA2]_
|
||||
-------------------------------------------------------------------------------------------------------------------
|
||||
``gfx1030`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
|
||||
- wavefrontsize64 - *pal-amdhsa*
|
||||
- *pal-amdpal* .. TODO::
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
``gfx1030`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA*
|
||||
- wavefrontsize64 flat - *pal-amdhsa*
|
||||
scratch - *pal-amdpal* .. TODO::
|
||||
|
||||
Add product
|
||||
names.
|
||||
Add product
|
||||
names.
|
||||
|
||||
``gfx1031`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA*
|
||||
- wavefrontsize64 - *pal-amdhsa*
|
||||
- *pal-amdpal* .. TODO::
|
||||
``gfx1031`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA*
|
||||
- wavefrontsize64 flat - *pal-amdhsa*
|
||||
scratch - *pal-amdpal* .. TODO::
|
||||
|
||||
Add product
|
||||
names.
|
||||
Add product
|
||||
names.
|
||||
|
||||
``gfx1032`` ``amdgcn`` dGPU - cumode - *pal-amdhsa* *TBA*
|
||||
- wavefrontsize64 - *pal-amdhsa*
|
||||
- *pal-amdpal* .. TODO::
|
||||
``gfx1032`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA*
|
||||
- wavefrontsize64 flat - *pal-amdhsa*
|
||||
scratch - *pal-amdpal* .. TODO::
|
||||
|
||||
Add product
|
||||
names.
|
||||
Add product
|
||||
names.
|
||||
|
||||
``gfx1033`` ``amdgcn`` APU - cumode - *pal-amdpal* *TBA*
|
||||
- wavefrontsize64
|
||||
.. TODO::
|
||||
``gfx1033`` ``amdgcn`` APU - cumode - Absolute - *pal-amdpal* *TBA*
|
||||
- wavefrontsize64 flat
|
||||
scratch .. TODO::
|
||||
|
||||
Add product
|
||||
names.
|
||||
Add product
|
||||
names.
|
||||
|
||||
=========== =============== ============ ===== ================= =========== =============== ======================
|
||||
=========== =============== ============ ===== ================= =============== =============== ======================
|
||||
|
||||
.. _amdgpu-target-features:
|
||||
|
||||
@ -4162,18 +4162,9 @@ SGPR register initial state is defined in
           (kernel descriptor enable  of
           field)                     SGPRs
========== ========================== ====== ==============================
First      Private Segment Buffer     4      See
           (enable_sgpr_private              :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`.
           _segment_buffer)
then       Dispatch Ptr               2      64-bit address of AQL dispatch
           (enable_sgpr_dispatch_ptr)        packet for kernel dispatch
                                             actually executing.
@ -4193,87 +4184,8 @@ SGPR register initial state is defined in
then       Dispatch Id                2      64-bit Dispatch ID of the
           (enable_sgpr_dispatch_id)         dispatch packet being
                                             executed.
then       Flat Scratch Init          2      See
           (enable_sgpr_flat_scratch         :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
           _init)
then       Private Segment Size       1      The 32-bit byte size of a
           (enable_sgpr_private              single
                                             work-item's
@ -4338,19 +4250,10 @@ SGPR register initial state is defined in
then       Work-Group Info            1      {first_wavefront, 14'b0000,
           (enable_sgpr_workgroup            ordered_append_term[10:0],
           _info)                            threadgroup_size_in_wavefronts[5:0]}
then       Scratch Wavefront Offset   1      See
           (enable_sgpr_private              :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
           _segment_wavefront_offset)        and
                                             :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`.
========== ========================== ====== ==============================

The order of the VGPR registers is defined, but the compiler can specify which
@ -4390,12 +4293,11 @@ The setting of registers is done by GPU CP/ADC/SPI hardware as follows:
   combination including none.
3. Scratch Wavefront Offset is set by SPI on a per-wavefront basis, which is why
   its value cannot be included with the flat scratch init value, which is per
   queue (see :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`).
4. The VGPRs are set by SPI, which only supports specifying either (X), (X, Y)
   or (X, Y, Z).

See :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch` for Flat Scratch register
pair initialization.

The global segment can be accessed either using buffer instructions (GFX6 which
has V# 64-bit address support), flat instructions (GFX7-GFX10), or global
@ -4474,48 +4376,98 @@ pointer are replaced with immediate ``0`` offsets.
Flat Scratch
++++++++++++

The method used to initialize flat scratch depends on the target:

* If the *Target Properties* column of :ref:`amdgpu-processor-table`
  specifies *Does not support generic address space*:

  Flat scratch is not supported and there is no flat scratch register pair.

* If the *Target Properties* column of :ref:`amdgpu-processor-table`
  specifies *Offset flat scratch*:

  If the kernel or any function it calls may use flat operations to access
  scratch memory, the prolog code must set up the FLAT_SCRATCH register pair
  (FLAT_SCRATCH_LO/FLAT_SCRATCH_HI). Initialization uses Flat Scratch Init and
  Scratch Wavefront Offset SGPR registers (see
  :ref:`amdgpu-amdhsa-initial-kernel-execution-state`):

  1. The low word of Flat Scratch Init is the 32-bit byte offset from
     ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
     being managed by SPI for the queue executing the kernel dispatch. This is
     the same value used in the Scratch Segment Buffer V# base address.

     CP obtains this from the runtime. (The Scratch Segment Buffer base address
     is ``SH_HIDDEN_PRIVATE_BASE_VIMID`` plus this offset.)

     The prolog must add the value of Scratch Wavefront Offset to get the
     wavefront's byte scratch backing memory offset from
     ``SH_HIDDEN_PRIVATE_BASE_VIMID``.

     The Scratch Wavefront Offset must also be used as an offset with Private
     segment address when using the Scratch Segment Buffer.

     Since FLAT_SCRATCH_HI is in units of 256 bytes, the offset must be right
     shifted by 8 before moving into FLAT_SCRATCH_HI.

     FLAT_SCRATCH_HI corresponds to SGPRn-4 on GFX7, and SGPRn-6 on GFX8 (where
     SGPRn is the highest numbered SGPR allocated to the wavefront).
     FLAT_SCRATCH_HI is multiplied by 256 (as it is in units of 256 bytes) and
     added to ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to calculate the per-wavefront
     FLAT SCRATCH BASE in flat memory instructions that access the scratch
     aperture.
  2. The second word of Flat Scratch Init is the 32-bit byte size of a single
     work-item's scratch memory usage.

     CP obtains this from the runtime, and it is always a multiple of DWORD. CP
     checks that the value in the kernel dispatch packet Private Segment Byte
     Size is not larger and requests the runtime to increase the queue's scratch
     size if necessary.

     CP directly loads from the kernel dispatch packet Private Segment Byte Size
     field and rounds up to a multiple of DWORD. Having CP load it once avoids
     loading it at the beginning of every wavefront.

     The kernel prolog code must move it to FLAT_SCRATCH_LO, which is SGPRn-3 on
     GFX7 and SGPRn-5 on GFX8. FLAT_SCRATCH_LO is used as the FLAT SCRATCH SIZE
     in flat memory instructions.

* If the *Target Properties* column of :ref:`amdgpu-processor-table`
  specifies *Absolute flat scratch*:

  If the kernel or any function it calls may use flat operations to access
  scratch memory, the prolog code must set up the FLAT_SCRATCH register pair
  (FLAT_SCRATCH_LO/FLAT_SCRATCH_HI). Initialization uses Flat Scratch Init and
  Scratch Wavefront Offset SGPR registers (see
  :ref:`amdgpu-amdhsa-initial-kernel-execution-state`):

  The Flat Scratch Init is the 64-bit address of the base of scratch backing
  memory being managed by SPI for the queue executing the kernel dispatch.

  CP obtains this from the runtime (and divides it if there are multiple Shader
  Arrays, each with its own SPI).

  The kernel prolog must add the value of the wave's Scratch Wavefront Offset
  and move the result as a 64-bit value to the FLAT_SCRATCH SGPR register pair,
  which is SGPRn-6 and SGPRn-5. It is used as the FLAT SCRATCH BASE in flat
  memory instructions.

  The Scratch Wavefront Offset must also be used as an offset with Private
  segment address when using the Scratch Segment Buffer (see
  :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`).
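
The register arithmetic for the offset and absolute methods can be modelled in a few lines. The Python below is a minimal sketch, not code from LLVM or from AMD documentation. SGPR contents are represented as plain integers, and every function and parameter name is hypothetical. The additions are assumed to wrap at the width of the registers involved: 32 bits for offset flat scratch and 64 bits for absolute flat scratch.

```python
# Hypothetical model of the flat scratch prolog arithmetic described above.
# Not LLVM or driver code: SGPR contents are plain Python ints, and wrapping
# at the register width is an assumption.

MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def offset_flat_scratch(init_lo, init_hi, wave_offset):
    """Offset flat scratch: return (FLAT_SCRATCH_HI, FLAT_SCRATCH_LO).

    init_lo     -- low word of Flat Scratch Init: byte offset of the queue's
                   scratch backing memory from SH_HIDDEN_PRIVATE_BASE_VIMID.
    init_hi     -- second word of Flat Scratch Init: per-work-item scratch
                   size in bytes.
    wave_offset -- Scratch Wavefront Offset in bytes.
    """
    # Wavefront's byte offset from SH_HIDDEN_PRIVATE_BASE_VIMID
    # (assumed to wrap like a 32-bit SGPR add).
    offset = (init_lo + wave_offset) & MASK32
    # FLAT_SCRATCH_HI is in units of 256 bytes, so shift right by 8;
    # any low 8 bits of the offset are discarded.
    flat_scratch_hi = offset >> 8
    # FLAT_SCRATCH_LO receives the per-work-item size unchanged.
    flat_scratch_lo = init_hi & MASK32
    return flat_scratch_hi, flat_scratch_lo


def offset_flat_scratch_base(private_base, flat_scratch_hi):
    """Per-wavefront FLAT SCRATCH BASE seen by flat instructions.

    private_base stands in for the SH_HIDDEN_PRIVATE_BASE_VIMID value.
    """
    return private_base + flat_scratch_hi * 256


def offset_flat_scratch_sgprs(gfx, sgpr_n):
    """SGPR numbers of (FLAT_SCRATCH_HI, FLAT_SCRATCH_LO) for offset targets.

    sgpr_n is the highest numbered SGPR allocated to the wavefront.
    """
    if gfx == 7:
        return sgpr_n - 4, sgpr_n - 3
    if gfx == 8:
        return sgpr_n - 6, sgpr_n - 5
    raise ValueError("offset flat scratch applies to GFX7 and GFX8 only")


def absolute_flat_scratch(init, wave_offset):
    """Absolute flat scratch: 64-bit FLAT_SCRATCH value (base plus offset)."""
    return (init + wave_offset) & MASK64
```

Consider a Flat Scratch Init low word of 0x1000 and a Scratch Wavefront Offset of 0x300. In that case `offset_flat_scratch(0x1000, 64, 0x300)` returns `(0x13, 64)`, so flat instructions see a scratch base of ``SH_HIDDEN_PRIVATE_BASE_VIMID`` + 0x1300. The absolute method, by contrast, needs only a single 64-bit add.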

.. _amdgpu-amdhsa-kernel-prolog-private-segment-buffer:

Private Segment Buffer
++++++++++++++++++++++

The Private Segment Buffer SGPR register is used to initialize 4 SGPRs that
are used as a V# to access scratch. CP uses the value provided by the runtime.
It is used, together with Scratch Wavefront Offset as an offset, to access the
private memory space using a segment address. See
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`.

The scratch V# is a set of four SGPRs beginning at a four-aligned SGPR index,
and is always selected for the kernel as follows:

- If it is known during instruction selection that there is stack usage,
  SGPR0-3 is reserved for use as the scratch V#. Stack usage is assumed if
|