mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00
Commit Graph

5441 Commits

Author SHA1 Message Date
Sebastian Neubauer
7ea56efeb5 [AMDGPU] Set rsrc1 flags for graphics shaders
Before they were only set for compute kernels and compute shaders but
not for other shaders.

Differential Revision: https://reviews.llvm.org/D89399
2020-11-04 12:25:41 +01:00
Sebastian Neubauer
537305eda7 [AMDGPU] Fix ieee mode default value
Previously, the default value for ieee mode was
- on for compute kernels and compute shaders,
- off for all shaders except compute shaders.

This commit changes the default to be
- on for compute kernels,
- off for shaders.

This aligns the default value with the settings that are actually in
use.  To my knowledge, all users of shader calling conventions (mesa and
llpc) disable the ieee mode by default.

Differential Revision: https://reviews.llvm.org/D89388
2020-11-04 12:25:38 +01:00
Tim Renouf
83e3834a8d [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf
2a63696860 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Jay Foad
d6e4e20e6b [AMDGPU] Fix ds_read2/write2 with unaligned offsets
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.

Differential Revision: https://reviews.llvm.org/D90607
2020-11-03 15:16:10 +00:00
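
The scaled-offset condition described in this commit can be illustrated with a minimal sketch. It is not the LLVM selection code: the function name `encode_read2_offsets` is hypothetical, and the unsigned 8-bit width of each offset field is an assumption stated in the comments.

```python
def encode_read2_offsets(byte_offset0, byte_offset1, element_size):
    """Return the two scaled offsets, or None if the pair cannot be encoded.

    Hypothetical helper for illustration only; element_size is the scale
    factor in bytes (for example 4 for a 32-bit access).
    """
    encoded = []
    for byte_offset in (byte_offset0, byte_offset1):
        # The offset field counts elements, not bytes, so a byte offset that
        # is not a multiple of the scale factor has no valid encoding.
        if byte_offset % element_size != 0:
            return None
        scaled = byte_offset // element_size
        # Assumption: each offset field is an unsigned 8-bit value.
        if not 0 <= scaled <= 255:
            return None
        encoded.append(scaled)
    return tuple(encoded)


print(encode_read2_offsets(8, 12, 4))  # both offsets are multiples of 4
print(encode_read2_offsets(6, 12, 4))  # 6 is not a multiple of 4
```

A selector that skips the divisibility test would silently round the unaligned offset down, which is the kind of wrong selection the commit message describes.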
Petar Avramovic
cf35e6aad4 AMDGPU/GlobalISel: Use same builder/observer in post-legalizer-combiner
Change match/apply functions into methods of a new target-specific combiner
helper class. Use a reference to the MachineIRBuilder from the helper instead of
constructing a new MachineIRBuilder each time a new instruction needs to be made.
This allows correct tracking of newly created instructions.

Differential Revision: https://reviews.llvm.org/D90623
2020-11-03 09:24:50 +01:00
Stanislav Mekhanoshin
d05e59d827 [AMDGPU] Improve FLAT scratch detection
We were using too broad a check for isFLATScratch(), which also
includes FLAT global.

Differential Revision: https://reviews.llvm.org/D90505
2020-11-02 11:37:33 -08:00
Matt Arsenault
5f462fb314 AMDGPU: Reorder checks 2020-11-02 10:21:48 -05:00
Jay Foad
330e9dc042 Revert "Fix ds_read2/write2 unaligned offsets"
This reverts commit 2e7e898c8f0b38dc11fbce2553fc715067aaf42f.

It was committed by mistake.
2020-11-02 14:01:33 +00:00
Jay Foad
e2e08ec61b Fix ds_read2/write2 unaligned offsets 2020-11-02 13:57:13 +00:00
Christudasan Devadasan
f694eea8d3 [AMDGPU] Some refactoring after D90404. NFC. 2020-11-01 13:18:53 +05:30
Christudasan Devadasan
45dd9c1c1d [AMDGPU] Add alignment check for v3 to v4 load type promotion
It should be enabled only when the load alignment is at least 8 bytes.

Fixes: SWDEV-256824

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D90404
2020-11-01 12:05:34 +05:30
Matt Arsenault
f52c63425c AMDGPU: Fix missing writelane cases to skip with exec=0 2020-10-30 11:15:11 -04:00
alex-t
6cb436b706 [AMDGPU] SILowerControlFlow::removeMBBifRedundant. Refactoring plus fix for the null MBB pointer in MF->splice
Detailed description: This change addresses the refactoring advised by foad. It also contains the fix for the case when getNextNode is null because the successor block is the last in the MachineFunction.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D90314
2020-10-30 14:46:08 +03:00
Jay Foad
aaa3ac4c2d [AMDGPU] Fix double space in disassembly of ds_gws_sema_* with gds
By setting up the AsmStrings correctly we can remove some special cases
from AMDGPUInstPrinter::printOffset.

Differential Revision: https://reviews.llvm.org/D90307
2020-10-29 17:31:59 +00:00
Jay Foad
828db82d6d [AMDGPU] Use pseudo instructions for readlane/writelane
This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2".

All the codegen changes are caused by the post-RA scheduler no longer
treating readlane/writelane as scheduling barriers due to having
unmodelled side effects. (The pseudos are hasSideEffects = 0, but the
real instructions are hasSideEffects = ? which TableGen conservatively
treats as 1.)

Differential Revision: https://reviews.llvm.org/D90401
2020-10-29 16:00:53 +00:00
Jay Foad
c9407d40f8 [AMDGPU] Remove gds operand from ds_gws_* MachineInstrs
The operand value was always 1 (except in some bad MIR tests) so it was
redundant.

Differential Revision: https://reviews.llvm.org/D90378
2020-10-29 15:04:23 +00:00
Jay Foad
27bb902eb9 [AMDGPU] Fix double space in disassembly of s_set_gpr_idx_mode
Differential Revision: https://reviews.llvm.org/D90374
2020-10-29 14:54:33 +00:00
Jay Foad
cf260c4b89 [AMDGPU] Fix double space in disassembly of some DPP instructions
Differential Revision: https://reviews.llvm.org/D90373
2020-10-29 14:54:33 +00:00
Jay Foad
fc7508e993 [AMDGPU] Simplify insertNoops functions. NFC. 2020-10-29 10:55:20 +00:00
Austin Kerbow
aefa9d35ee [AMDGPU] Add Reset function to GCNHazardRecognizer
Reset the tracked emitted instructions when starting scheduling on a new
region.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90347
2020-10-28 16:32:32 -07:00
Jay Foad
36faa379a8 [AMDGPU] Allow some modifiers on VOP3B instructions
V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src
modifier, but they can still use NEG and the usual output modifiers.

This partially reverts 3b99f12a4e6f "AMDGPU: Remove modifiers from v_div_scale_*".

Differential Revision: https://reviews.llvm.org/D90296
2020-10-28 21:54:14 +00:00
Jay Foad
8c257f3f1e [AMDGPU] Fix double space in disassembly of SDWA instructions with vcc
Differential Revision: https://reviews.llvm.org/D90317
2020-10-28 21:39:39 +00:00
Austin Kerbow
deb4fbaa0b [AMDGPU] Fix inserting combined s_nop in bundles
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90334
2020-10-28 14:34:04 -07:00
Paul C. Anagnostopoulos
961a515ed0 [TableGen] [AMDGPU] Add !sub operator for subtraction
Use it in the AMDGPU target to eliminate !add(value1, !mul(value2, -1))

Differential Revision: https://reviews.llvm.org/D90107
2020-10-28 12:27:53 -04:00
Jay Foad
4cf43bb623 [AMDGPU] Omit needless string concatenations. NFC. 2020-10-28 12:56:52 +00:00
Carl Ritson
30a3048ed2 [AMDGPU] Fix insert of SIPreAllocateWWMRegs in FastRegAlloc
SIPreAllocateWWMRegs was being inserted after RegisterCoalescer,
but this pass does not exist during FastAlloc, so the pre-allocation
pass was never run.
Insert pre-allocation after TwoAddressInstructionPass instead.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90236
2020-10-28 12:15:15 +09:00
Stanislav Mekhanoshin
2bb7f5cddb [AMDGPU] Change predicate for fma/fmac legacy
I do not exactly like the use of a negative predicate to
enable instruction support. Replace HasNoMadMacF32Insts
with HasFmaLegacy32.

Differential Revision: https://reviews.llvm.org/D90250
2020-10-27 12:03:52 -07:00
Michael Liao
18c2025616 [amdgpu] Add the late codegen preparation pass.
Summary:
- Teach that pass to widen naturally aligned but not DWORD aligned
  sub-DWORD loads.

Reviewers: rampitec, arsenm

Subscribers:

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80364
2020-10-27 14:07:59 -04:00
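
One way to picture the widening of a naturally aligned sub-DWORD load is the sketch below. It is only an illustration on a byte buffer, not the pass itself: the function name `widened_load` and the little-endian layout are assumptions.

```python
import struct


def widened_load(memory, address, size):
    """Load `size` bytes (1 or 2) at `address` through an aligned 4-byte load.

    Hypothetical helper for illustration; `memory` is a bytes-like object.
    """
    # Naturally aligned means the address is a multiple of the access size,
    # so the value can never straddle a 4-byte (DWORD) boundary.
    if size not in (1, 2) or address % size != 0:
        raise ValueError("expected a naturally aligned 1- or 2-byte load")
    base = address & ~3  # round down to the enclosing DWORD
    (dword,) = struct.unpack_from("<I", memory, base)  # little-endian load
    shift = (address - base) * 8  # bit position of the value in the DWORD
    mask = (1 << (size * 8)) - 1
    return (dword >> shift) & mask


memory = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
# A 2-byte load at address 6 is naturally aligned but not DWORD aligned.
assert widened_load(memory, 6, 2) == int.from_bytes(memory[6:8], "little")
```

The wide load followed by a shift and mask returns the same value as the narrow load, which is what makes the rewrite safe under the natural-alignment precondition.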
Michael Liao
ebdef472f9 [amdgpu] Enable use of AA during codegen.
- Add an internal option `-amdgpu-use-aa-in-codegen` to enable or
  disable this feature. By default, it's enabled.

Differential Revision: https://reviews.llvm.org/D89320
2020-10-27 09:46:23 -04:00
Jay Foad
1e89a40c5d [AMDGPU] Use DPP instead of Ext in a couple of class names. NFC. 2020-10-27 10:22:30 +00:00
Carl Ritson
d7606fe865 [AMDGPU] Move WQM Pass after MI Scheduler
Exec mask manipulation inserted by SIWholeQuadMode acts as barriers to
instruction scheduling.  Move the entire pass after the machine
instruction scheduler and make changes so the pass is correct for
non-SSA operation.  These changes should leave the pass still
usable pre-scheduler, although tests have been updated to reflect
post-scheduler results.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D88081
2020-10-27 10:25:53 +09:00
Stanislav Mekhanoshin
c9969d6a73 Fixed release build after D89170 2020-10-26 16:00:57 -07:00
Stanislav Mekhanoshin
3016803f3e [AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.

At the very least, the following is missing:

- GlobalISel;
- Some optimizations in frame elimination in between vector
  and scalar ALU;
- It shall finally allow to always materialize frame index
  as an SGPR, but that is not implemented and frame elimination
  cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
  is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
  expression to recover the unswizzled scratch address;

Differential Revision: https://reviews.llvm.org/D89170
2020-10-26 14:40:42 -07:00
Stanislav Mekhanoshin
3dc15fa04e [AMDGPU] Fix VC warning about signed/unsigned comparison. NFC.
This is the warning reported in https://reviews.llvm.org/D89599
2020-10-26 11:55:57 -07:00
Benjamin Kramer
6892b335d9 [AMDGPU] Avoid unused variable warning in Release builds. NFC.
SIRegisterInfo.cpp:480:19: error: unused variable 'SOffset'
2020-10-26 18:11:57 +01:00
Jay Foad
7ac5d3d4c1 [AMDGPU] Make more use of printNamedBit in AMDGPUInstPrinter. NFC. 2020-10-26 14:03:35 +00:00
Sebastian Neubauer
d177eafb58 [AMDGPU] Emit new pal metadata by default
If no pal metadata is given, default to the msgpack format instead of
the legacy metadata. This makes tests more readable.

Differential Revision: https://reviews.llvm.org/D90035
2020-10-26 10:16:17 +01:00
Christudasan Devadasan
53166b8424 [AMDGPU] Avoid offset register in MUBUF for direct stack object accesses
We use an absolute address for stack objects, so it is
necessary to have a constant 0 in the soffset field.

Fixes: SWDEV-228562

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89234
2020-10-26 11:08:37 +05:30
dfukalov
efaecfc60e [AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions.
1. Throughput and code size cost estimates were separated and updated.
2. Updated fdiv cost estimation for different cases.
3. Added scalarization processing for types that are treated as !isSimple() to
improve code size estimation in getArithmeticInstrCost() and
getArithmeticInstrCost(). The code was borrowed from the TCK_RecipThroughput path
of the base implementation.

The next step is to unify the scalarization part in the base class, which currently
works for the TCK_RecipThroughput path only.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89973
2020-10-24 19:53:08 +03:00
Stanislav Mekhanoshin
270ab79b5a [AMDGPU] Fixed isLegalRegOperand() with physregs
This does not change anything at the moment, but needed for
D89170. In that change I am probing a physical SGPR to see if
it is legal. RC is SReg_32, but DRC for scratch instructions
is SReg_32_XEXEC_HI and test fails.

It is sufficient just to check whether DRC contains the register
here in the case of a physreg. Physregs also do not use subregs,
so the subreg handling below is irrelevant for these.

Differential Revision: https://reviews.llvm.org/D90064
2020-10-23 11:33:34 -07:00
vpykhtin
9d887381ef [AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse.
I was wrong in thinking that MRI.use_instructions returns unique instructions, and I misled Jay in his previous patch D64393.

The first loop counted more instructions than there really were, and the second loop went beyond the basic block with that counter.

I used Jay's previous code that relied on MRI.use_operands to constrain the number of instructions to check.
modifiesRegister is inlined to reduce the number of passes over instruction operands, and an assert on the BB end boundary was added.

Differential Revision: https://reviews.llvm.org/D89386
2020-10-23 19:17:48 +03:00
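
The failure mode described in this commit, a use count inflated by duplicates driving a forward scan past the end of the block, can be reproduced in a small model. This is a sketch under simplifying assumptions, not the LLVM code: instructions are plain strings, and `instructions_scanned` is a hypothetical helper.

```python
def instructions_scanned(block, def_index, user_names, expected_users):
    """Walk forward from the definition until `expected_users` users are seen.

    Returns how many instructions were visited, or raises IndexError if the
    walk would run past the end of the block.
    """
    seen = 0
    index = def_index + 1
    while seen < expected_users:
        if index >= len(block):
            raise IndexError("scan ran past the end of the basic block")
        if block[index] in user_names:
            seen += 1
        index += 1
    return index - def_index - 1


block = ["def", "add", "mul"]
uses = ["add", "add", "mul"]  # "add" reads the register twice

# Counting unique users stops the scan inside the block.
print(instructions_scanned(block, 0, set(uses), len(set(uses))))

# Counting one user per list entry overshoots: three users are expected,
# but only two instructions follow the definition.
try:
    instructions_scanned(block, 0, set(uses), len(uses))
except IndexError as error:
    print(error)
```

Deduplicating the users, or bounding the scan by the block end as the added assert does, keeps the walk inside the block.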
Jay Foad
9321aed101 [AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy
This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.

Differential Revision: https://reviews.llvm.org/D90028
2020-10-23 16:16:13 +01:00
Matt Arsenault
46f491ab64 AMDGPU: Don't query for TII in TII 2020-10-23 10:34:24 -04:00
Matt Arsenault
ae7605c97d AMDGPU: Increase branch size estimate with offset bug
This will be relaxed to insert a nop if the offset hits the bad value,
so overestimate branch instruction sizes.
2020-10-23 10:34:24 -04:00
Jay Foad
4d988f5d60 [AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy
Differential Revision: https://reviews.llvm.org/D88955
2020-10-23 09:31:00 +01:00
Tim Corringham
ec36700577 [AMDGPU] Add amdgpu specific loop threshold metadata
Add new loop metadata amdgpu.loop.unroll.threshold to allow the initial AMDGPU
specific unroll threshold value to be specified on a loop by loop basis.

The intention is to be able to allow more nuanced hints, e.g. specifying a
low threshold value to indicate that a loop may be unrolled if cheap enough
rather than using the all or nothing llvm.loop.unroll.disable metadata.

Differential Revision: https://reviews.llvm.org/D84779
2020-10-22 17:21:32 +01:00
Piotr Sobczak
e82875aa15 [AMDGPU] Fix expansion of i16 MULH
This commit marks i16 MULH as expand in AMDGPU backend,
which is necessary after the refactoring in D80485.

Differential Revision: https://reviews.llvm.org/D89965
2020-10-22 17:05:06 +02:00
Matt Arsenault
bc2bb11de8 AMDGPU: Fix not always reserving VGPRs used for SGPR spilling
The VGPRs used for SGPR spills need to be reserved, even if we aren't
speculatively reserving one.

This was broken by 117e5609e98b43f925c678b72f816ad3a1c3eee7.
2020-10-22 10:19:19 -04:00
Matt Arsenault
b6cd2c3015 AMDGPU: Implement getNoPreservedMask
We don't support funclets for exception handling and I hit this when
manually reducing MIR.
2020-10-22 10:17:31 -04:00