1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

13 Commits

Author SHA1 Message Date
Piotr Sobczak
b2cdcb043a [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Re-commit D81925 with a bugfix D82370.

Differential Revision: https://reviews.llvm.org/D81925
Differential Revision: https://reviews.llvm.org/D82370
2020-06-25 10:38:23 +02:00
Matt Arsenault
74c3954b0a Revert "[AMDGPU] Enable compare operations to be selected by divergence"
This reverts commit 521ac0b5cea02f629d035f807460affbb65ae7ad.

Reported to break thousands of piglit tests.
2020-06-24 11:21:30 -04:00
alex-t
c89646a3e3 [AMDGPU] Enable compare operations to be selected by divergence
Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.

Reviewers: rampitec, arsenm

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82194
2020-06-24 11:50:40 +03:00
Piotr Sobczak
b9c5311c77 Revert "[AMDGPU] Select s_cselect"
This caused some failures detected by the buildbot with
expensive checks enabled.

This reverts commit 4067de569f119a81419fbf2e79d5f3307dfdda5b.
2020-06-19 16:41:04 +02:00
Piotr Sobczak
3a8e847d39 [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81925
2020-06-19 16:17:46 +02:00
Stanislav Mekhanoshin
6d06d957a7 [AMDGPU] Tune threshold for cmp/select vector lowering
It was set in total vector size while the idea was to limit
a number of instructions. Now it started to work with doubles
and thresholds needs to be updated.

Differential Revision: https://reviews.llvm.org/D80322
2020-05-21 08:59:35 -07:00
Stanislav Mekhanoshin
84f3733d30 [AMDGPU] Always expand ext/insertelement with divergent idx
Even though series of cmd/cndmask can produce quite a lot of
code that is still better than a loop. In case of doubles we
would even produce two loops.

Differential Revision: https://reviews.llvm.org/D80032
2020-05-20 15:51:29 -07:00
Stanislav Mekhanoshin
a8a78bdad8 [AMDGPU] Make v16f64/v16i64 legal
This allows indirect VGPR addressing to work.

Differential Revision: https://reviews.llvm.org/D79960
2020-05-14 14:46:55 -07:00
Stanislav Mekhanoshin
475fa9072b [AMDGPU] Optimized indirect multi-VGPR addressing
SelectMOVRELOffset prevents peeling of a constant from an index
if final base could be negative. isBaseWithConstantOffset() succeeds
if a value is an "add" or "or" operator. In case of "or" it shall
be an add-like "or" which never changes a sign of the sum given a
non-negative offset. I.e. we can safely allow peeling if operator is
an "or".

Differential Revision: https://reviews.llvm.org/D79898
2020-05-13 14:53:16 -07:00
Stanislav Mekhanoshin
0437d2eefe [AMDGPU] Make v4i64/v4f64/v8i64/v8f64 legal
We can produce such vectors in the Promote Alloca pass,
but we are unable to use movrel to operate it and lower
via scratch. Making it legal makes SI_INDIRECT patterns
work.

There is more work to do in subsequent changes:

1. We initialize m0 twice to access each dword. It shall
be possible to only do it once and increment base register
number instead.
2. We also need v16i64/v16f64 but these first need to be
added to tablegen.

Differential Revision: https://reviews.llvm.org/D79808
2020-05-12 16:05:12 -07:00
Matt Arsenault
430af4239a AMDGPU: Change boolean content type to 0 or 1
The usage of target boolean checks is overly inflexible, since sext
and zext of a compare are equally cheap. The choice is arbitrary, but
using 0/1 to some degree is the choice of lower resistance since
that's what most targets use. This enables a few combines that don't
bother to support ZeroOrNegativeOneBooleanContent.
2019-11-15 13:43:47 +05:30
Stanislav Mekhanoshin
c3e6f1db48 [AMDGPU] combine extractelement into several selects
An extractelement with non-constant index will be lowered either to
scratch or movrel loop in most cases. This patch converts such
instruction into a set of selects if vector size is not too big.

Differential Revision: https://reviews.llvm.org/D54351

llvm-svn: 346800
2018-11-13 21:18:21 +00:00
Stanislav Mekhanoshin
e469694231 Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handling
Legalizer used to request an ext load from i8 to i1 when promoting
vector element type to i8. Fixed.

Differential Revision: https://reviews.llvm.org/D54440

llvm-svn: 346795
2018-11-13 20:26:27 +00:00