1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

18 Commits

Author SHA1 Message Date
Carl Ritson
9a5c628361 [AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc.
Link 224-bit types with v7i32/v7f32.
Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D104622
2021-06-24 12:41:22 +09:00
Baptiste Saleil
c4a674e21e [AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts
the compilation time and there is no case for which we see any improvement in
performance. This patch removes this pass and its associated test cases from
the tree.

Differential Revision: https://reviews.llvm.org/D101313

Change-Id: I0599169a7609c19a887f8d847a71e664030cc141
2021-04-26 17:21:49 -04:00
Petar Avramovic
4315e26388 [AMDGPU] Extend gfx10 test coverage. NFC.
Differential Revision: https://reviews.llvm.org/D99267
2021-03-29 11:13:55 +02:00
Matt Arsenault
f1ba4465de AMDGPU: Use kill instruction to hint soft clause live ranges
Previously we would use a bundle to hint the register allocator to not
overwrite the pointers in a sequence of loads to avoid breaking soft
clauses. This bundling was based on a fuzzy register pressure
heuristic, so we could not guarantee using more registers than are
really available. This would result in register allocator failing on
unsatisfiable bundles. Use a kill to artificially extend the live
ranges, so we can always succeed at register allocation even if it
means extra spills in the worst case.

This seems to capture most of the benefit of the bundle while avoiding
most of the risk presented by the bundle. However the lit tests do
show a handful of regressions. In some cases with sequences of
volatile loads, unused load components end up getting reallocated to
the next load which forces a wait between. There are also a few small
scheduling regressions where a hazard used to be avoided, and one
spill torture test which for some reason nearly doubles the stack
usage. There is also a bit of noise from leftover kills (it may make
sense for post-RA pseudos to strip all of these out).
2021-02-26 18:26:40 -05:00
Austin Kerbow
d1f23a1772 [AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.

The possible settings are:

- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On

GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".

The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.

Differential Revision: https://reviews.llvm.org/D85882
2021-01-26 11:25:51 -08:00
Matt Arsenault
c22abcde1d AMDGPU: Select global saddr mode from SGPR pointer
Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer
instruction to materialize the 0 vs. the 64-bit copy.
2020-11-16 11:51:06 -05:00
Sebastian Neubauer
0b5f33b84b Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57d8ad3ec3bcc2cff2abab7b1b933af6.

According to michel.daenzer,
> This completely broke the Mesa radeonsi driver on Navi 14. Xorg +
> xterm come up with major corruption & psychedelic colours.
2020-09-23 17:16:39 +02:00
Sebastian Neubauer
26c9591459 [AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the
caller or the callee can insert a waitcnt to ensure that all reads are
finished.
Calls need some time to be executed, so if the callee inserts the
waitcnt, filling the instruction buffer and waiting for memory will be
interleaved, hiding some latency. This comes at the cost of having a
waitcnt inside functions that may not be needed as no memory operations
are outstanding.

For function calls, this is already implemented. The same principal
applies to returns: If the caller inserts a waitcnt after the call, the
callee does not have to wait and the return and memory operation can be
run in parallel.

This commit implements waiting in the caller after returning from a
function call.

Differential Revision: https://reviews.llvm.org/D87674
2020-09-23 12:17:59 +02:00
Matt Arsenault
38b9cda6d0 AMDGPU: Match global saddr addressing mode
The previous implementation was incorrect, and based off incorrect
instruction definitions. Unfortunately we can't match natural
addressing in a lot of cases due to the shift/scale applied in
getelementptrs. This relies on reducing the 64-bit shift to 32-bits.
2020-08-17 15:28:14 -04:00
Jay Foad
3f23d4b8c3 [MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.

Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.

The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.

All this also applies to the BotHeightReduce heuristic in the bottom
zone.

Differential Revision: https://reviews.llvm.org/D72392
2020-07-17 11:02:13 +01:00
Jay Foad
7622b45921 [AMDGPU] Remove dubious logic in bidirectional list scheduler
Summary:
pickNodeBidirectional tried to compare the best top candidate and the
best bottom candidate by examining TopCand.Reason and BotCand.Reason.
This is unsound because, after calling pickNodeFromQueue, Cand.Reason
does not reflect the most important reason why Cand was chosen. Rather
it reflects the most recent reason why it beat some other potential
candidate, which could have been for some low priority tie breaker
reason.

I have seen this cause problems where TopCand is a good candidate, but
because TopCand.Reason is ORDER (which is very low priority) it is
repeatedly ignored in favour of a mediocre BotCand. This is not how
bidirectional scheduling is supposed to work.

To fix this I changed the code to always compare TopCand and BotCand
directly, like the generic implementation of pickNodeBidirectional does.
This removes some uncommented AMDGPU-specific logic; if this logic turns
out to be important then perhaps it could be moved into an override of
tryCandidate instead.

Graphics shader benchmarking on gfx10 shows a lot more positive than
negative effects from this change.

Reviewers: arsenm, tstellar, rampitec, kzhuravl, vpykhtin, dstuttard, tpr, atrick, MatzeB

Subscribers: jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68338
2020-02-28 21:35:34 +00:00
Sebastian Neubauer
92565aa596 [SelectionDAG] Optimize build_vector of truncates and shifts
Add a simplification to fuse a manual vector extract with shifts and
truncate into a bitcast.

Unpacking and packing values into vectors is only optimized with
extractelement instructions, not when manually unpacked using shifts
and truncates.
This patch simplifies shifts and truncates into a bitcast if possible.

Simplify (build_vec (trunc $1)
                    (trunc (srl $1 width))
                    (trunc (srl $1 (2 * width))) ...)
to (bitcast $1)

Differential Revision: https://reviews.llvm.org/D73892
2020-02-10 15:04:07 +01:00
Stanislav Mekhanoshin
65193fd6df [AMDGPU] Allow narrowing muti-dword loads
Currently BE allows only a little load narrowing because
of the fear it will produce sub-dword ext loads. However,
we can always allow narrowing if we are shrinking one
multi-dword load to another multi-dword load.

In particular we were unable to reduce s_load_dwordx8 into
s_load_dwordx4 if identity shuffle was used to extract
low 4 dwords.

Differential Revision: https://reviews.llvm.org/D73133
2020-01-24 11:03:41 -08:00
Stanislav Mekhanoshin
973468e2e7 Allow combining of extract_subvector to extract element
Differential Revision: https://reviews.llvm.org/D73132
2020-01-24 10:50:26 -08:00
Jay Foad
d10f996d89 [AMDGPU] Fix typo in SIInstrInfo::memOpsHaveSameBasePtr
Summary:
The typo has been present since memOpsHaveSameBasePtr was introduced in
r313208.

It caused SIInstrInfo::shouldClusterMemOps to cluster more mem ops than
it was supposed to.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71616
2019-12-17 18:54:27 +00:00
Matt Arsenault
a09c49dee2 AMDGPU: Decompose all values to 32-bit pieces for calling conventions
This is the more natural lowering, and presents more opportunities to
reduce 64-bit ops to 32-bit.

This should also help avoid issues graphics shaders have had with
64-bit values, and simplify argument lowering in globalisel.

llvm-svn: 366578
2019-07-19 13:57:44 +00:00
Matt Arsenault
0b429caefd AMDGPU: Custom lower vector_shuffle for v4i16/v4f16
Ordinarily it is lowered as a build_vector of each extract_vector_elt,
which in turn get lowered to bitcasts and bit shifts. Very little
understand the lowered extract pattern, resulting in much worse
code. We treat concat_vectors of v2i16 as legal, so prefer that.

llvm-svn: 364959
2019-07-02 19:15:45 +00:00
Matt Arsenault
25324b5081 AMDGPU: Add baseline test for packed shufflevector
llvm-svn: 364691
2019-06-28 23:43:40 +00:00