1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 12:43:36 +01:00
Commit Graph

572 Commits

Author SHA1 Message Date
Michael Liao
b2d24acf01 [amdgpu] Add 64-bit PC support when expanding unconditional branches.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106445
2021-07-26 14:50:30 -04:00
Carl Ritson
8b2d2affc5 [AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256 this saves VGPRs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103800
2021-07-22 10:42:15 +09:00
Stanislav Mekhanoshin
b044663832 [AMDGPU] Mark relevant rematerializable VOP2 instructions
Differential Revision: https://reviews.llvm.org/D106023
2021-07-21 14:24:59 -07:00
Stanislav Mekhanoshin
c90b3afab5 [AMDGPU] Mark all relevant VOP1 instructions rematerializable
Differential Revision: https://reviews.llvm.org/D105919
2021-07-21 14:05:32 -07:00
Sebastian Neubauer
b992832f74 [AMDGPU] Use isMetaInstruction for instruction size
Meta instructions have a size of 0. Use isMetaInstruction instead of
listing them explicitly.

Differential Revision: https://reviews.llvm.org/D106043
2021-07-15 12:23:11 +02:00
Stanislav Mekhanoshin
1dae83cabb [AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization
Any def of EXEC prevents rematerialization of any VOP instruction
because of the physreg use. Create a callback to check if the
physreg use can be ingored to allow rematerialization.

Differential Revision: https://reviews.llvm.org/D105836
2021-07-14 13:03:58 -07:00
Sebastian Neubauer
9f94b4e5f6 [AMDGPU] Mark waterfall loops as SI_WATERFALL_LOOP
This way, they can be detected later, e.g. by the
SIOptimizeVGPRLiveRange pass.

Differential Revision: https://reviews.llvm.org/D105467
2021-07-13 12:15:08 +02:00
Stanislav Mekhanoshin
327d625bef [AMDGPU] Make some VOP1 instructions rematerializable
This is a pilot change to verify the logic. The rest will be
done in a same way, at least the rest of VOP1.

Differential Revision: https://reviews.llvm.org/D105742
2021-07-12 23:43:45 -07:00
Stanislav Mekhanoshin
bb5a67dc29 [AMDGPU] Fix immediate sign during V_MOV_B64_PSEUDO expansion
Creating a V_MOV_B32 with zero extended immediate source
prevented conversion to V_BFREV_B32.

Differential Revision: https://reviews.llvm.org/D105235
2021-07-01 09:00:29 -07:00
Stanislav Mekhanoshin
4b0ec23e84 [AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants
This is to allow 64 bit constant rematerialization. If a constant
is split into two separate moves initializing sub0 and sub1 like
now RA cannot rematerizalize a 64 bit register.

This gives 10-20% uplift in a set of huge apps heavily using double
precession math.

Fixes: SWDEV-292645

Differential Revision: https://reviews.llvm.org/D104874
2021-06-30 11:45:38 -07:00
Piotr Sobczak
8db71b419f [AMDGPU] Fix 224-bit spills
Related to D104622.

Differential Revision: https://reviews.llvm.org/D105109
2021-06-29 17:52:16 +02:00
Carl Ritson
9a5c628361 [AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc.
Link 224-bit types with v7i32/v7f32.
Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D104622
2021-06-24 12:41:22 +09:00
Carl Ritson
f3236bf71f [AMDGPU] Add v5f32/VReg_160 support for MIMG instructions
Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are
required for a MIMG address operand.

Maintain _V8 instruction variants of pseudo instructions allowing
assembly prior to GFX10 to work as-is.  Currently the validator
can tell for GFX10 what the correct size is, so will disallow
oversize address registers.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103672
2021-06-08 11:11:40 +09:00
Stanislav Mekhanoshin
82e56dec58 [AMDGPU] All GWS instructions need aligned VGPR on gfx90a
Fixes: SWDEV-288006

Differential Revision: https://reviews.llvm.org/D103197
2021-06-01 17:08:03 -07:00
Brendon Cahoon
d300f6cd3b [AMDGPU] Update SCC defs to VCC when uses are changed to VCC
The FixSGPRCopies pass converts instructions to VALU when
removing illegal VGPR to SGPR copies. Instructions that use SCC
are changed to use VCC instead. When that happens, the pass must
also change instructions that define SCC to define VCC.

The pass was not changing the SCC definition when an ADDC is
converted due to a input that is a VGPR to SGPR copy. But, the
initial ADD insruction, which define SCC, is not converted.
This causes a compilation failure due to a use of an undefined
physical register.

This patch adds code that inserts the SCC definition in the
MoveToVALU worklist when a SCC use is converted to a VCC use.

Differential Revision: https://reviews.llvm.org/D102111
2021-05-14 18:05:05 -04:00
Matt Arsenault
15058e16a1 AMDGPU: Fix assert when rewriting saddr d16 loads
moveOperands does not handle moving tied operands since it would
generally have to fixup the tied operand references. Avoid the assert
by untying and retying after the modification. These in place
modifications really aren't managable.
2021-05-14 13:24:19 -04:00
Jay Foad
85f4b8dffe [AMDGPU] getMemOperandsWithOffset: add vaddr operand for stack access BUF instructions
A consequence is that checkInstOffsetsDoNotOverlap can now distinguish
sp+offset from fp+offset, so it knows that it shouldn't try to work out
whether the accesses overlap just by comparing the offsets. For example
in these two instructions:

MIR:
BUFFER_STORE_DWORD_OFFSET %0:vgpr_32(s32), $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into stack + 4, addrspace 5)
%4:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0.alloca, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from `i8 addrspace(5)* undef`, addrspace 5)

ISA:
buffer_store_dword v0, off, s[0:3], s32 offset:4
buffer_load_dword v0, off, s[0:3], s34

Differential Revision: https://reviews.llvm.org/D73957
2021-05-14 10:10:43 +01:00
David Stuttard
0a28768900 [AMDGPU][AsmParser/Disassembler] Correct A16 and G16 handling
A16 support for image instructions assembly/disassembly (gfx10) was missing

Also refactor MIMG op addr size calcs to common function

We'd got 3 places where the same operation was being done.

One test is now marked XFAIL until a related codegen patch is in place

Differential Revision: https://reviews.llvm.org/D102231

Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb
2021-05-14 09:25:44 +01:00
Sebastian Neubauer
154e1ab9f4 [AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.

This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.

Differential Revision: https://reviews.llvm.org/D101292
2021-05-07 14:51:32 +02:00
Stanislav Mekhanoshin
a8a85ab673 [AMDGPU] Change FLAT Scratch SADDR to VADDR form in moveToVALU
Extend the legalization of global SADDR loads and stores
with changing to VADDR to the FLAT scratch instructions.

Differential Revision: https://reviews.llvm.org/D101408
2021-05-03 10:57:14 -07:00
Stanislav Mekhanoshin
ba07f46415 [AMDGPU] Change FLAT SADDR to VADDR form in moveToVALU
Instead of legalizing saddr operand with a readfirstlane
when address is moved from SGPR to VGPR we can just
change the opcode.

Differential Revision: https://reviews.llvm.org/D101405
2021-05-03 10:36:26 -07:00
David Stuttard
dbb5e2911e [AMDGPU] Tidy up some simple expressions for clarity NFC
Slight refactor for clarity.

Change-Id: Ib25e7f4582c67a7c57f066cfd5382c1405d7d4c5

Differential Revision: https://reviews.llvm.org/D101610
2021-04-30 11:13:54 +01:00
Jay Foad
55e098ac95 [AMDGPU] Allow multiple uses of the same literal
In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.

AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.

Differential Revision: https://reviews.llvm.org/D100770
2021-04-20 16:44:01 +01:00
Sebastian Neubauer
70e58c35e1 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
Stanislav Mekhanoshin
17050632cf [AMDGPU] Fix copyPhysReg to not produce unalined vgpr access
RA can insert something like a sub1_sub2 COPY of a wide VGPR
tuple which results in the unaligned acces with v_pk_mov_b32
after the copy is expanded. This is regression after D97316.

Differential Revision: https://reviews.llvm.org/D98549
2021-03-15 14:14:30 -07:00
Stanislav Mekhanoshin
196e7f3138 [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Jay Foad
d51dcd4c3d [AMDGPU] Fix isReallyTriviallyReMaterializable for V_MOV_*
D57708 changed SIInstrInfo::isReallyTriviallyReMaterializable to reject
V_MOVs with extra implicit operands, but it accidentally rejected all
V_MOVs because of their implicit use of exec. Fix it but avoid adding a
moderately expensive call to MI.getDesc().getNumImplicitUses().

In real graphics shaders this changes quite a few vgpr copies into move-
immediates, which is good for avoiding stalls on GFX10.

Differential Revision: https://reviews.llvm.org/D98347
2021-03-10 16:18:12 +00:00
Ruiling Song
19ee89a560 [AMDGPU] Remove SI_MASK_BRANCH
This is already deprecated, so remove code working on this.
Also update the tests by using S_CBRANCH_EXECZ instead of SI_MASK_BRANCH.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D97545
2021-03-09 09:13:23 +08:00
Piotr Sobczak
02c8fb7a0c [AMDGPU] Introduce Strict WQM mode
* Add amdgcn_strict_wqm intrinsic.
* Add a corresponding STRICT_WQM machine instruction.
* The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active.
* The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96258
2021-03-03 14:19:16 +01:00
Piotr Sobczak
97e89dc154 [AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm
 * Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency as the "strict"
prefix will become an important, distinguishing factor
between amdgcn_wqm and amdgcn_strictwqm in the future.

The "strict" prefix indicates that inactive lanes do not
take part in control flow, specifically an inactive lane
enabled by a strict mode will always be enabled irrespective
of control flow decisions.

The amdgcn_wwm will be removed, but doing so in two steps
gives users time to switch to the new name at their own pace.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96257
2021-03-03 09:33:57 +01:00
Jay Foad
ba178527e9 [AMDGPU] Better codegen for i64 bitreverse
Differential Revision: https://reviews.llvm.org/D97547
2021-02-26 15:51:36 +00:00
Matt Arsenault
f1ba6f4d9b AMDGPU: Add even aligned VGPR/AGPR register classes
gfx90a operations require even aligned registers, but this was
previously achieved by reserving registers inside the full class.

Ideally this would be captured in the static instruction definitions
for the operands, and we would have different instructions per
subtarget. The hackiest part of this is we need to manually reassign
AGPR register classes after instruction selection (we get away without
this for VGPRs since those types are actually registered for legal
types).
2021-02-24 14:49:37 -05:00
Stanislav Mekhanoshin
f1c6dbc4d5 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Piotr Sobczak
55a4cf137d [AMDGPU] Add implicit vcc_lo on S_CBRANCH_VCCNZ in wave32
* Update skip-if-dead.ll with tests for wave32.
* Fix the crash in verifier in one newly enabled test by adding
  missing fixImplicitOperands in branch insertion code.

```
*** Bad machine code: Using an undefined physical register ***
- function:    test_kill_divergent_loop
- basic block: %bb.2 bb (0xad96308)
- instruction: S_CBRANCH_VCCNZ %bb.1, implicit $vcc_lo
- operand 1:   implicit $vcc_lo
LLVM ERROR: Found 1 machine code errors.
```

* Simplify "cbranch_kill" to not use interp instructions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96793
2021-02-17 15:14:57 +01:00
Carl Ritson
fb4b457dbd [AMDGPU] Move kill lowering to WQM pass and add live mask tracking
Move implementation of kill intrinsics to WQM pass. Add live lane
tracking by updating a stored exec mask when lanes are killed.
Use live lane tracking to enable early termination of shader
at any point in control flow.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D94746
2021-02-11 20:31:29 +09:00
Jay Foad
ee839f7374 [AMDGPU] Use named unified buffer format constant. NFC. 2021-02-08 17:34:36 +00:00
Carl Ritson
88bf797971 [AMDGPU] Mark V_SET_INACTIVE as defining SCC
V_SET_INACTIVE is implemented with S_NOT which clobbers SCC.
Mark sure it is marked appropriately.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D95509
2021-01-29 09:46:41 +09:00
dfukalov
f3ae5b9b8c [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
Joe Nash
521d6a1785 [AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only  the mir.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94341

Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
2021-01-12 18:33:18 -05:00
Sebastian Neubauer
94e83a8359 [AMDGPU] Fix failing assert with scratch ST mode
In ST mode, flat scratch instructions have neither an sgpr nor a vgpr
for the address. This lead to an assertion when inserting hard clauses.

Differential Revision: https://reviews.llvm.org/D94406
2021-01-12 09:54:02 +01:00
dfukalov
d069b95364 [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
dfukalov
b7b67e3e9a [NFC] Reduce include files dependency and AA header cleanup (part 2).
Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852
2020-12-17 14:04:48 +03:00
Sebastian Neubauer
4b0b1e7b26 [AMDGPU] Unify flat offset logic
Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into
AMDGPUBaseInfo.

Differential Revision: https://reviews.llvm.org/D93287
2020-12-15 14:59:59 +01:00
Austin Kerbow
c65fb146ba [AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing
It is possible for copies or spills to be inserted in the middle of indirect
addressing sequences which use VGPR indexing. Spills to accvgprs could be
effected by the indexing mode.

Add new pseudo instructions that are expanded after register allocation to avoid
the problematic spill or copy placement.

Differential Revision: https://reviews.llvm.org/D91048
2020-12-08 12:24:12 -08:00
Matt Arsenault
b3aa28b9d4 AMDGPU: Factor out large flat offset splitting 2020-11-13 11:22:13 -05:00
Stanislav Mekhanoshin
4dbdbbe753 [AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
2020-11-12 15:23:37 -08:00
Stanislav Mekhanoshin
f12ec97f64 [AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
2020-11-12 13:38:56 -08:00
Jay Foad
8325244191 [AMDGPU] Make use of SIInstrInfo::isEXP. NFC. 2020-11-11 17:01:20 +00:00
Stanislav Mekhanoshin
682b617208 [AMDGPU] Omit buffer resource with flat scratch.
Differential Revision: https://reviews.llvm.org/D90979
2020-11-09 08:05:20 -08:00
Sebastian Neubauer
7e4be9501b [AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.

Differential Revision: https://reviews.llvm.org/D88540
2020-11-09 16:51:44 +01:00