mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 04:02:41 +01:00

[AMDGPU] Update gfx90a memory model support

Update the AMDGPU gfx90a memory model so that coarse-grained memory allocations
are made consistent when a fine-grained system-scope atomic acquire or release
is performed.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D105137
Tony Tye 2021-05-07 20:55:23 +00:00
parent 16bf4a4717
commit 47ca9dba51
7 changed files with 625 additions and 51 deletions


@ -6093,10 +6093,10 @@ For GFX90A:
ensures a previous vector memory operation has completed before executing a ensures a previous vector memory operation has completed before executing a
subsequent vector memory or LDS operation and so can be used to meet the subsequent vector memory or LDS operation and so can be used to meet the
requirements of acquire and release. requirements of acquire and release.
* The L2 cache of one agent can be kept coherent with other agents by using * The L2 cache of one agent can be kept coherent with other agents by:
the MTYPE CC (cache-coherent) with the PTE C-bit for memory local to the L2, using the MTYPE RW (read-write) or MTYPE CC (cache-coherent) with the PTE
and MTYPE UC (uncached) with the PTE C-bit set for memory not local to the C-bit for memory local to the L2; and using the MTYPE NC (non-coherent) with
L2. the PTE C-bit set or MTYPE UC (uncached) for memory not local to the L2.
* Any local memory cache lines will be automatically invalidated by writes * Any local memory cache lines will be automatically invalidated by writes
from CUs associated with other L2 caches, or writes from the CPU, due to from CUs associated with other L2 caches, or writes from the CPU, due to
@ -6108,13 +6108,21 @@ For GFX90A:
the CPU cache due to the L2 probe filter and the PTE C-bit being set. the CPU cache due to the L2 probe filter and the PTE C-bit being set.
* Since all work-groups on the same agent share the same L2, no L2 * Since all work-groups on the same agent share the same L2, no L2
invalidation or writeback is required for coherence. invalidation or writeback is required for coherence.
* Since local memory reads and writes of work-groups in different agents * To ensure coherence of local and remote memory writes of work-groups in
access memory using MTYPE CC, no L2 invalidate or writeback is required different agents a ``buffer_wbl2`` is required. It will writeback dirty L2
for coherence. MTYPE CC causes write through to DRAM and local reads to be cache lines of MTYPE RW (used for local coarse grain memory) and MTYPE NC
invalidated by remote writes with the PTE C-bit. (used for remote coarse grain memory). Note that MTYPE CC (used for local
* Since remote memory reads and writes of work-groups in different agents fine grain memory) causes write through to DRAM, and MTYPE UC (used for
access memory using MTYPE UC, no L2 invalidate or writeback is required remote fine grain memory) bypasses the L2, so both will never result in
for coherence. MTYPE UC causes direct accesses to DRAM. dirty L2 cache lines.
* To ensure coherence of local and remote memory reads of work-groups in
different agents a ``buffer_invl2`` is required. It will invalidate L2
cache lines with MTYPE NC (used for remote coarse grain memory). Note that
MTYPE CC (used for local fine grain memory) and MTYPE RW (used for local
coarse grain memory) cause local reads to be invalidated by remote writes with
the PTE C-bit so these cache lines are not invalidated. Note that
MTYPE UC (used for remote fine grain memory) bypasses the L2, so will
never result in L2 cache lines that need to be invalidated.
* PCIe access from the GPU to the CPU memory is kept coherent by using the * PCIe access from the GPU to the CPU memory is kept coherent by using the
MTYPE UC (uncached) which bypasses the L2. MTYPE UC (uncached) which bypasses the L2.
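
The MTYPE rules in the revised bullets can be summarized as a small lookup. The following is only a sketch of the documented behavior, not code from the patch; the `MType` enum and both function names are hypothetical and exist purely for illustration.

```cpp
#include <cassert>

// Hypothetical model of the four GFX90A memory types named in the bullets.
enum class MType {
  RW, // local coarse grain memory
  NC, // remote coarse grain memory
  CC, // local fine grain memory
  UC  // remote fine grain memory
};

// buffer_wbl2 writes back dirty L2 lines. Only RW and NC can hold dirty
// lines: CC writes through to DRAM and UC bypasses the L2.
constexpr bool needsL2Writeback(MType M) {
  return M == MType::RW || M == MType::NC;
}

// buffer_invl2 invalidates L2 lines. Only NC lines can be stale: RW and CC
// lines are invalidated by remote writes (PTE C-bit probes), and UC
// bypasses the L2 entirely.
constexpr bool needsL2Invalidate(MType M) { return M == MType::NC; }

// Compile-time checks that mirror the documentation text.
static_assert(needsL2Writeback(MType::RW) && !needsL2Invalidate(MType::RW), "");
static_assert(needsL2Writeback(MType::NC) && needsL2Invalidate(MType::NC), "");
static_assert(!needsL2Writeback(MType::CC) && !needsL2Invalidate(MType::CC), "");
static_assert(!needsL2Writeback(MType::UC) && !needsL2Invalidate(MType::UC), "");
```

Read this way, coarse grain memory is the only kind affected by the new L2 instructions, which matches the stated purpose of the commit.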
@ -6384,14 +6392,15 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
2. s_waitcnt vmcnt(0) 2. s_waitcnt vmcnt(0)
- Must happen before - Must happen before
following following buffer_invl2 and
buffer_wbinvl1_vol. buffer_wbinvl1_vol.
- Ensures the load - Ensures the load
has completed has completed
before invalidating before invalidating
the cache. the cache.
3. buffer_wbinvl1_vol 3. buffer_invl2;
buffer_wbinvl1_vol
- Must happen before - Must happen before
any following any following
@ -6401,7 +6410,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
- Ensures that - Ensures that
following following
loads will not see loads will not see
stale L1 global data. stale L1 global data,
nor see stale L2 MTYPE
NC global data.
MTYPE RW and CC memory will MTYPE RW and CC memory will
never be stale in L2 due to never be stale in L2 due to
the memory probes. the memory probes.
@ -6444,13 +6455,15 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
lgkmcnt(0). lgkmcnt(0).
- Must happen before - Must happen before
following following
buffer_invl2 and
buffer_wbinvl1_vol. buffer_wbinvl1_vol.
- Ensures the flat_load - Ensures the flat_load
has completed has completed
before invalidating before invalidating
the caches. the caches.
3. buffer_wbinvl1_vol 3. buffer_invl2;
buffer_wbinvl1_vol
- Must happen before - Must happen before
any following any following
@ -6459,8 +6472,10 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
atomic/atomicrmw. atomic/atomicrmw.
- Ensures that - Ensures that
following following
L1 loads will not see loads will not see
stale global data. stale L1 global data,
nor see stale L2 MTYPE
NC global data.
MTYPE RW and CC memory will MTYPE RW and CC memory will
never be stale in L2 due to never be stale in L2 due to
the memory probes. the memory probes.
@ -6579,7 +6594,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
2. s_waitcnt vmcnt(0) 2. s_waitcnt vmcnt(0)
- Must happen before - Must happen before
following following buffer_invl2 and
buffer_wbinvl1_vol. buffer_wbinvl1_vol.
- Ensures the - Ensures the
atomicrmw has atomicrmw has
@ -6587,7 +6602,8 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
invalidating the invalidating the
caches. caches.
3. buffer_wbinvl1_vol 3. buffer_invl2;
buffer_wbinvl1_vol
- Must happen before - Must happen before
any following any following
@ -6597,8 +6613,10 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
- Ensures that - Ensures that
following following
loads will not see loads will not see
stale L1 global data. stale L1 global data,
MTYPE RW and CC L2 memory nor see stale L2 MTYPE
NC global data.
MTYPE RW and CC memory will
never be stale in L2 due to never be stale in L2 due to
the memory probes. the memory probes.
@ -6641,6 +6659,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
lgkmcnt(0). lgkmcnt(0).
- Must happen before - Must happen before
following following
buffer_invl2 and
buffer_wbinvl1_vol. buffer_wbinvl1_vol.
- Ensures the - Ensures the
atomicrmw has atomicrmw has
@ -6648,7 +6667,8 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
invalidating the invalidating the
caches. caches.
3. buffer_wbinvl1_vol 3. buffer_invl2;
buffer_wbinvl1_vol
- Must happen before - Must happen before
any following any following
@ -6658,7 +6678,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
- Ensures that - Ensures that
following following
loads will not see loads will not see
stale L1 global data. stale L1 global data,
nor see stale L2 MTYPE
NC global data.
MTYPE RW and CC memory will MTYPE RW and CC memory will
never be stale in L2 due to never be stale in L2 due to
the memory probes. the memory probes.
@ -6734,7 +6756,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
value read by the value read by the
fence-paired-atomic. fence-paired-atomic.
3. buffer_wbinvl1_vol 2. buffer_wbinvl1_vol
- If not TgSplit execution - If not TgSplit execution
mode, omit. mode, omit.
@ -6872,7 +6894,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
termed the termed the
fence-paired-atomic). fence-paired-atomic).
- Must happen before - Must happen before
the following the following buffer_invl2 and
buffer_wbinvl1_vol. buffer_wbinvl1_vol.
- Ensures that the - Ensures that the
fence-paired atomic fence-paired atomic
@ -6887,7 +6909,8 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
the the
fence-paired-atomic. fence-paired-atomic.
2. buffer_wbinvl1_vol 2. buffer_invl2;
buffer_wbinvl1_vol
- Must happen before any - Must happen before any
following global/generic following global/generic
@ -6897,7 +6920,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
- Ensures that - Ensures that
following following
loads will not see loads will not see
stale L1 global data. stale L1 global data,
nor see stale L2 MTYPE
NC global data.
MTYPE RW and CC memory will MTYPE RW and CC memory will
never be stale in L2 due to never be stale in L2 due to
the memory probes. the memory probes.
@ -6991,8 +7016,18 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
released. released.
2. buffer/global/flat_store 2. buffer/global/flat_store
store atomic release - system - global 1. s_waitcnt lgkmcnt(0) & store atomic release - system - global 1. buffer_wbl2
- generic vmcnt(0) - generic
- Must happen before
following s_waitcnt.
- Performs L2 writeback to
ensure previous
global/generic
store/atomicrmw are
visible at system scope.
2. s_waitcnt lgkmcnt(0) &
vmcnt(0)
- If TgSplit execution mode, - If TgSplit execution mode,
omit lgkmcnt(0). omit lgkmcnt(0).
@ -7035,7 +7070,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
store that is being store that is being
released. released.
2. buffer/global/flat_store 3. buffer/global/flat_store
atomicrmw release - singlethread - global 1. buffer/global/flat_atomic atomicrmw release - singlethread - global 1. buffer/global/flat_atomic
- wavefront - generic - wavefront - generic
atomicrmw release - singlethread - local *If TgSplit execution mode, atomicrmw release - singlethread - local *If TgSplit execution mode,
@ -7123,8 +7158,18 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
is being released. is being released.
2. buffer/global/flat_atomic 2. buffer/global/flat_atomic
atomicrmw release - system - global 1. s_waitcnt lgkmcnt(0) & atomicrmw release - system - global 1. buffer_wbl2
- generic vmcnt(0) - generic
- Must happen before
following s_waitcnt.
- Performs L2 writeback to
ensure previous
global/generic
store/atomicrmw are
visible at system scope.
2. s_waitcnt lgkmcnt(0) &
vmcnt(0)
- If TgSplit execution mode, - If TgSplit execution mode,
omit lgkmcnt(0). omit lgkmcnt(0).
@ -7165,7 +7210,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
store that is being store that is being
released. released.
2. buffer/global/flat_atomic 3. buffer/global/flat_atomic
fence release - singlethread *none* *none* fence release - singlethread *none* *none*
- wavefront - wavefront
fence release - workgroup *none* 1. s_waitcnt lgkm/vmcnt(0) fence release - workgroup *none* 1. s_waitcnt lgkm/vmcnt(0)
@ -7298,7 +7343,20 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
following following
fence-paired-atomic. fence-paired-atomic.
fence release - system *none* 1. s_waitcnt lgkmcnt(0) & fence release - system *none* 1. buffer_wbl2
- If OpenCL and
address space is
local, omit.
- Must happen before
following s_waitcnt.
- Performs L2 writeback to
ensure previous
global/generic
store/atomicrmw are
visible at system scope.
2. s_waitcnt lgkmcnt(0) &
vmcnt(0) vmcnt(0)
- If TgSplit execution mode, - If TgSplit execution mode,
@ -7588,7 +7646,17 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
will not see stale will not see stale
global data. global data.
atomicrmw acq_rel - system - global 1. s_waitcnt lgkmcnt(0) & atomicrmw acq_rel - system - global 1. buffer_wbl2
- Must happen before
following s_waitcnt.
- Performs L2 writeback to
ensure previous
global/generic
store/atomicrmw are
visible at system scope.
2. s_waitcnt lgkmcnt(0) &
vmcnt(0) vmcnt(0)
- If TgSplit execution mode, - If TgSplit execution mode,
@ -7629,11 +7697,11 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
atomicrmw that is atomicrmw that is
being released. being released.
2. buffer/global_atomic 3. buffer/global_atomic
3. s_waitcnt vmcnt(0) 4. s_waitcnt vmcnt(0)
- Must happen before - Must happen before
following following buffer_invl2 and
buffer_wbinvl1_vol. buffer_wbinvl1_vol.
- Ensures the - Ensures the
atomicrmw has atomicrmw has
@ -7641,7 +7709,8 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
invalidating the invalidating the
caches. caches.
4. buffer_wbinvl1_vol 5. buffer_invl2;
buffer_wbinvl1_vol
- Must happen before - Must happen before
any following any following
@ -7651,7 +7720,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
- Ensures that - Ensures that
following following
loads will not see loads will not see
stale L1 global data. stale L1 global data,
nor see stale L2 MTYPE
NC global data.
MTYPE RW and CC memory will MTYPE RW and CC memory will
never be stale in L2 due to never be stale in L2 due to
the memory probes. the memory probes.
@ -7726,7 +7797,17 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
will not see stale will not see stale
global data. global data.
atomicrmw acq_rel - system - generic 1. s_waitcnt lgkmcnt(0) & atomicrmw acq_rel - system - generic 1. buffer_wbl2
- Must happen before
following s_waitcnt.
- Performs L2 writeback to
ensure previous
global/generic
store/atomicrmw are
visible at system scope.
2. s_waitcnt lgkmcnt(0) &
vmcnt(0) vmcnt(0)
- If TgSplit execution mode, - If TgSplit execution mode,
@ -7767,8 +7848,8 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
atomicrmw that is atomicrmw that is
being released. being released.
2. flat_atomic 3. flat_atomic
3. s_waitcnt vmcnt(0) & 4. s_waitcnt vmcnt(0) &
lgkmcnt(0) lgkmcnt(0)
- If TgSplit execution mode, - If TgSplit execution mode,
@ -7776,7 +7857,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
- If OpenCL, omit - If OpenCL, omit
lgkmcnt(0). lgkmcnt(0).
- Must happen before - Must happen before
following following buffer_invl2 and
buffer_wbinvl1_vol. buffer_wbinvl1_vol.
- Ensures the - Ensures the
atomicrmw has atomicrmw has
@ -7784,7 +7865,8 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
invalidating the invalidating the
caches. caches.
4. buffer_wbinvl1_vol 5. buffer_invl2;
buffer_wbinvl1_vol
- Must happen before - Must happen before
any following any following
@ -7794,7 +7876,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
- Ensures that - Ensures that
following following
loads will not see loads will not see
stale L1 global data. stale L1 global data,
nor see stale L2 MTYPE
NC global data.
MTYPE RW and CC memory will MTYPE RW and CC memory will
never be stale in L2 due to never be stale in L2 due to
the memory probes. the memory probes.
@ -7902,7 +7986,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
the the
acquire-fence-paired-atomic. acquire-fence-paired-atomic.
3. buffer_wbinvl1_vol 2. buffer_wbinvl1_vol
- If not TgSplit execution - If not TgSplit execution
mode, omit. mode, omit.
@ -8007,7 +8091,20 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
requirements of requirements of
acquire. acquire.
fence acq_rel - system *none* 1. s_waitcnt lgkmcnt(0) & fence acq_rel - system *none* 1. buffer_wbl2
- If OpenCL and
address space is
local, omit.
- Must happen before
following s_waitcnt.
- Performs L2 writeback to
ensure previous
global/generic
store/atomicrmw are
visible at system scope.
2. s_waitcnt lgkmcnt(0) &
vmcnt(0) vmcnt(0)
- If TgSplit execution mode, - If TgSplit execution mode,
@ -8048,7 +8145,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
atomic/store atomic/store
atomic/atomicrmw. atomic/atomicrmw.
- Must happen before - Must happen before
the following the following buffer_invl2 and
buffer_wbinvl1_vol. buffer_wbinvl1_vol.
- Ensures that the - Ensures that the
preceding preceding
@ -8087,7 +8184,8 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
requirements of requirements of
release. release.
2. buffer_wbinvl1_vol 3. buffer_invl2;
buffer_wbinvl1_vol
- Must happen before - Must happen before
any following any following
@ -8098,7 +8196,9 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx90a-table`.
- Ensures that - Ensures that
following following
loads will not see loads will not see
stale L1 global data. stale L1 global data,
nor see stale L2 MTYPE
NC global data.
MTYPE RW and CC memory will MTYPE RW and CC memory will
never be stale in L2 due to never be stale in L2 due to
the memory probes. the memory probes.
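
The "must happen before" notes in the updated table rows amount to ordering constraints on the emitted instructions. As a rough illustration, the sketch below checks those constraints on a list of mnemonic strings. It is an assumption-laden model, not part of the patch: the helper names `findFrom`, `releaseOrderOk`, and `acquireOrderOk` are hypothetical, and instructions are matched by string prefix only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Index of the first instruction at or after Start that begins with Prefix,
// or -1 if there is none.
int findFrom(const std::vector<std::string> &Seq, const std::string &Prefix,
             std::size_t Start = 0) {
  for (std::size_t I = Start; I < Seq.size(); ++I)
    if (Seq[I].compare(0, Prefix.size(), Prefix) == 0)
      return static_cast<int>(I);
  return -1;
}

// System-scope release: buffer_wbl2 must be followed by an s_waitcnt so the
// writeback has completed before the releasing store or atomic.
bool releaseOrderOk(const std::vector<std::string> &Seq) {
  int W = findFrom(Seq, "buffer_wbl2");
  return W >= 0 && findFrom(Seq, "s_waitcnt", W + 1) >= 0;
}

// System-scope acquire: an s_waitcnt must precede buffer_invl2, and
// buffer_wbinvl1_vol must follow buffer_invl2.
bool acquireOrderOk(const std::vector<std::string> &Seq) {
  int Inv = findFrom(Seq, "buffer_invl2");
  if (Inv < 0)
    return false;
  int Wait = findFrom(Seq, "s_waitcnt");
  if (Wait < 0 || Wait > Inv)
    return false;
  return findFrom(Seq, "buffer_wbinvl1_vol", Inv + 1) >= 0;
}
```

For example, the sequence `buffer_wbl2`, `s_waitcnt vmcnt(0) lgkmcnt(0)`, `buffer_invl2`, `buffer_wbinvl1_vol` (the shape of the system acq_rel fence row) satisfies both checks.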


@ -452,6 +452,12 @@ public:
SIAtomicScope Scope, SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace, SIAtomicAddrSpace AddrSpace,
Position Pos) const override; Position Pos) const override;
bool insertRelease(MachineBasicBlock::iterator &MI,
SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace,
bool IsCrossAddrSpaceOrdering,
Position Pos) const override;
}; };
class SIGfx10CacheControl : public SIGfx7CacheControl { class SIGfx10CacheControl : public SIGfx7CacheControl {
@ -1265,9 +1271,26 @@ bool SIGfx90ACacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
bool Changed = false; bool Changed = false;
MachineBasicBlock &MBB = *MI->getParent();
DebugLoc DL = MI->getDebugLoc();
if (Pos == Position::AFTER)
++MI;
if ((AddrSpace & SIAtomicAddrSpace::GLOBAL) != SIAtomicAddrSpace::NONE) { if ((AddrSpace & SIAtomicAddrSpace::GLOBAL) != SIAtomicAddrSpace::NONE) {
switch (Scope) { switch (Scope) {
case SIAtomicScope::SYSTEM: case SIAtomicScope::SYSTEM:
// Ensures that following loads will not see stale remote VMEM data or
// stale local VMEM data with MTYPE NC. Local VMEM data with MTYPE RW and
// CC will never be stale due to the local memory probes.
BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_INVL2));
// Inserting a "S_WAITCNT vmcnt(0)" after is not required because the
// hardware does not reorder memory operations by the same wave with
// respect to a preceding "BUFFER_INVL2". The invalidate is guaranteed to
// remove any cache lines of earlier writes by the same wave and ensures
// later reads by the same wave will refetch the cache lines.
Changed = true;
break;
case SIAtomicScope::AGENT: case SIAtomicScope::AGENT:
// Same as GFX7. // Same as GFX7.
break; break;
@ -1297,11 +1320,62 @@ bool SIGfx90ACacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
/// Other address spaces do not have a cache. /// Other address spaces do not have a cache.
if (Pos == Position::AFTER)
--MI;
Changed |= SIGfx7CacheControl::insertAcquire(MI, Scope, AddrSpace, Pos); Changed |= SIGfx7CacheControl::insertAcquire(MI, Scope, AddrSpace, Pos);
return Changed; return Changed;
} }
bool SIGfx90ACacheControl::insertRelease(MachineBasicBlock::iterator &MI,
SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace,
bool IsCrossAddrSpaceOrdering,
Position Pos) const {
bool Changed = false;
MachineBasicBlock &MBB = *MI->getParent();
DebugLoc DL = MI->getDebugLoc();
if (Pos == Position::AFTER)
++MI;
if ((AddrSpace & SIAtomicAddrSpace::GLOBAL) != SIAtomicAddrSpace::NONE) {
switch (Scope) {
case SIAtomicScope::SYSTEM:
// Inserting a "S_WAITCNT vmcnt(0)" before is not required because the
// hardware does not reorder memory operations by the same wave with
// respect to a following "BUFFER_WBL2". The "BUFFER_WBL2" is guaranteed
// to initiate writeback of any dirty cache lines of earlier writes by the
// same wave. A "S_WAITCNT vmcnt(0)" is needed after to ensure the
// writeback has completed.
BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_WBL2));
// Followed by same as GFX7, which will ensure the necessary "S_WAITCNT
// vmcnt(0)" needed by the "BUFFER_WBL2".
Changed = true;
break;
case SIAtomicScope::AGENT:
case SIAtomicScope::WORKGROUP:
case SIAtomicScope::WAVEFRONT:
case SIAtomicScope::SINGLETHREAD:
// Same as GFX7.
break;
default:
llvm_unreachable("Unsupported synchronization scope");
}
}
if (Pos == Position::AFTER)
--MI;
Changed |=
SIGfx7CacheControl::insertRelease(MI, Scope, AddrSpace,
IsCrossAddrSpaceOrdering, Pos);
return Changed;
}
bool SIGfx10CacheControl::enableLoadCacheBypass( bool SIGfx10CacheControl::enableLoadCacheBypass(
const MachineBasicBlock::iterator &MI, const MachineBasicBlock::iterator &MI,
SIAtomicScope Scope, SIAtomicScope Scope,
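
To see the net effect of the two overrides together, the following standalone sketch builds the instruction order for an acq_rel global atomic. It does not use any LLVM API. The `Scope` enum and `acqRelGlobalAtomic` are hypothetical names, the model assumes non-TgSplit mode, and the agent-scope branch simply reflects the "Same as GFX7" comment by adding no L2 instruction.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical scope enum for this sketch; it is not SIAtomicScope.
enum class Scope { Agent, System };

// Returns the modeled instruction order for an acq_rel global atomic.
std::vector<std::string> acqRelGlobalAtomic(Scope S,
                                            const std::string &AtomicOp) {
  std::vector<std::string> Seq;
  if (S == Scope::System)
    Seq.push_back("buffer_wbl2"); // release: start writeback of dirty L2 lines
  // Waits for earlier memory operations and, at system scope, the writeback.
  Seq.push_back("s_waitcnt vmcnt(0) lgkmcnt(0)");
  Seq.push_back(AtomicOp);
  // The atomic must complete before any cache is invalidated.
  Seq.push_back("s_waitcnt vmcnt(0)");
  if (S == Scope::System)
    Seq.push_back("buffer_invl2"); // acquire: drop stale MTYPE NC L2 lines
  Seq.push_back("buffer_wbinvl1_vol"); // acquire: drop stale L1 lines
  return Seq;
}
```

At system scope this yields the same six-instruction shape that the updated GFX90A test expectations check for around `global_atomic_cmpswap`.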


@ -424,9 +424,11 @@ define amdgpu_kernel void @global_atomic_fadd_f64_noret_pat(double addrspace(1)*
; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1 ; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX90A-NEXT: v_mov_b32_e32 v4, 0 ; GFX90A-NEXT: v_mov_b32_e32 v4, 0
; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0 ; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0
; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: global_atomic_cmpswap_x2 v[0:1], v4, v[0:3], s[0:1] glc ; GFX90A-NEXT: global_atomic_cmpswap_x2 v[0:1], v4, v[0:3], s[0:1] glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: buffer_wbinvl1_vol ; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3] ; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]
; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3] ; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
@ -470,9 +472,11 @@ define amdgpu_kernel void @global_atomic_fadd_f64_noret_pat_system(double addrsp
; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1 ; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX90A-NEXT: v_mov_b32_e32 v4, 0 ; GFX90A-NEXT: v_mov_b32_e32 v4, 0
; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0 ; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0
; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: global_atomic_cmpswap_x2 v[0:1], v4, v[0:3], s[0:1] glc ; GFX90A-NEXT: global_atomic_cmpswap_x2 v[0:1], v4, v[0:3], s[0:1] glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: buffer_wbinvl1_vol ; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3] ; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]
; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3] ; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
@ -526,9 +530,11 @@ define double @global_atomic_fadd_f64_rtn_pat(double addrspace(1)* %ptr, double
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1] ; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0 ; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0
; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: global_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5], off glc ; GFX90A-NEXT: global_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5], off glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: buffer_wbinvl1_vol ; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5] ; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5] ; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
@ -571,9 +577,11 @@ define double @global_atomic_fadd_f64_rtn_pat_system(double addrspace(1)* %ptr,
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1] ; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0 ; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0
; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: global_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5], off glc ; GFX90A-NEXT: global_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5], off glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: buffer_wbinvl1_vol ; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5] ; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5] ; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
@ -655,9 +663,11 @@ define amdgpu_kernel void @flat_atomic_fadd_f64_noret_pat(double* %ptr) #1 {
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0 ; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0
; GFX90A-NEXT: v_pk_mov_b32 v[4:5], s[0:1], s[0:1] op_sel:[0,1] ; GFX90A-NEXT: v_pk_mov_b32 v[4:5], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[0:1], v[4:5], v[0:3] glc ; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[0:1], v[4:5], v[0:3] glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: buffer_wbinvl1_vol ; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3] ; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]
; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3] ; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
@ -702,9 +712,11 @@ define amdgpu_kernel void @flat_atomic_fadd_f64_noret_pat_system(double* %ptr) #
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0 ; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0
; GFX90A-NEXT: v_pk_mov_b32 v[4:5], s[0:1], s[0:1] op_sel:[0,1] ; GFX90A-NEXT: v_pk_mov_b32 v[4:5], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[0:1], v[4:5], v[0:3] glc ; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[0:1], v[4:5], v[0:3] glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: buffer_wbinvl1_vol ; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: s_waitcnt lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3] ; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]
@ -730,9 +742,11 @@ define double @flat_atomic_fadd_f64_rtn_pat(double* %ptr) #1 {
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1] ; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0 ; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0
; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5] glc ; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5] glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: buffer_wbinvl1_vol ; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5] ; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5] ; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
@ -775,9 +789,11 @@ define double @flat_atomic_fadd_f64_rtn_pat_system(double* %ptr) #1 {
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1] ; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0 ; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0
; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5] glc ; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5] glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: buffer_wbinvl1_vol ; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: s_waitcnt lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5] ; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]


@ -70,9 +70,11 @@ define amdgpu_kernel void @global_atomic_fadd_ret_f32(float addrspace(1)* %ptr)
; GFX90A-NEXT: v_mov_b32_e32 v1, v0 ; GFX90A-NEXT: v_mov_b32_e32 v1, v0
; GFX90A-NEXT: v_mov_b32_e32 v2, 0 ; GFX90A-NEXT: v_mov_b32_e32 v2, 0
; GFX90A-NEXT: v_add_f32_e32 v0, 4.0, v1 ; GFX90A-NEXT: v_add_f32_e32 v0, 4.0, v1
; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc ; GFX90A-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: buffer_wbinvl1_vol ; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1 ; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3] ; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
@ -527,9 +529,11 @@ define amdgpu_kernel void @global_atomic_fadd_ret_f32_system(float addrspace(1)*
; GFX90A-NEXT: v_mov_b32_e32 v1, v0 ; GFX90A-NEXT: v_mov_b32_e32 v1, v0
; GFX90A-NEXT: v_mov_b32_e32 v2, 0 ; GFX90A-NEXT: v_mov_b32_e32 v2, 0
; GFX90A-NEXT: v_add_f32_e32 v0, 4.0, v1 ; GFX90A-NEXT: v_add_f32_e32 v0, 4.0, v1
; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc ; GFX90A-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc
; GFX90A-NEXT: s_waitcnt vmcnt(0) ; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: buffer_wbinvl1_vol ; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1 ; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3] ; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]


@ -1275,13 +1275,17 @@ define amdgpu_kernel void @system_acquire_fence() {
; ;
; GFX90A-NOTTGSPLIT-LABEL: system_acquire_fence: ; GFX90A-NOTTGSPLIT-LABEL: system_acquire_fence:
; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry ; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol ; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-NOTTGSPLIT-NEXT: s_endpgm ; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
; ;
; GFX90A-TGSPLIT-LABEL: system_acquire_fence: ; GFX90A-TGSPLIT-LABEL: system_acquire_fence:
; GFX90A-TGSPLIT: ; %bb.0: ; %entry ; GFX90A-TGSPLIT: ; %bb.0: ; %entry
; GFX90A-TGSPLIT-NEXT: buffer_wbl2
; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0) ; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-TGSPLIT-NEXT: buffer_invl2
; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol ; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
; GFX90A-TGSPLIT-NEXT: s_endpgm ; GFX90A-TGSPLIT-NEXT: s_endpgm
entry: entry:
@@ -1319,11 +1323,13 @@ define amdgpu_kernel void @system_release_fence() {
 ;
 ; GFX90A-NOTTGSPLIT-LABEL: system_release_fence:
 ; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
 ; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
 ;
 ; GFX90A-TGSPLIT-LABEL: system_release_fence:
 ; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
 ; GFX90A-TGSPLIT-NEXT: s_endpgm
 entry:
@@ -1367,13 +1373,17 @@ define amdgpu_kernel void @system_acq_rel_fence() {
 ;
 ; GFX90A-NOTTGSPLIT-LABEL: system_acq_rel_fence:
 ; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
 ; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
 ; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
 ;
 ; GFX90A-TGSPLIT-LABEL: system_acq_rel_fence:
 ; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
 ; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
 ; GFX90A-TGSPLIT-NEXT: s_endpgm
 entry:
@@ -1417,13 +1427,17 @@ define amdgpu_kernel void @system_seq_cst_fence() {
 ;
 ; GFX90A-NOTTGSPLIT-LABEL: system_seq_cst_fence:
 ; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
 ; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
 ; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
 ;
 ; GFX90A-TGSPLIT-LABEL: system_seq_cst_fence:
 ; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
 ; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
 ; GFX90A-TGSPLIT-NEXT: s_endpgm
 entry:
@@ -1467,13 +1481,17 @@ define amdgpu_kernel void @system_one_as_acquire_fence() {
 ;
 ; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
 ; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
 ; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
 ; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
 ;
 ; GFX90A-TGSPLIT-LABEL: system_one_as_acquire_fence:
 ; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
 ; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
 ; GFX90A-TGSPLIT-NEXT: s_endpgm
 entry:
@@ -1511,11 +1529,13 @@ define amdgpu_kernel void @system_one_as_release_fence() {
 ;
 ; GFX90A-NOTTGSPLIT-LABEL: system_one_as_release_fence:
 ; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
 ; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
 ;
 ; GFX90A-TGSPLIT-LABEL: system_one_as_release_fence:
 ; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
 ; GFX90A-TGSPLIT-NEXT: s_endpgm
 entry:
@@ -1559,13 +1579,17 @@ define amdgpu_kernel void @system_one_as_acq_rel_fence() {
 ;
 ; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
 ; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
 ; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
 ; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
 ;
 ; GFX90A-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
 ; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
 ; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
 ; GFX90A-TGSPLIT-NEXT: s_endpgm
 entry:
@@ -1609,13 +1633,17 @@ define amdgpu_kernel void @system_one_as_seq_cst_fence() {
 ;
 ; GFX90A-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
 ; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
 ; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
 ; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
 ;
 ; GFX90A-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
 ; GFX90A-TGSPLIT: ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT: buffer_wbl2
 ; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
+; GFX90A-TGSPLIT-NEXT: buffer_invl2
 ; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
 ; GFX90A-TGSPLIT-NEXT: s_endpgm
 entry:

File diff suppressed because it is too large