Fix 64-bit copy to SCC by restricting the pattern resulting
in such a copy to subtargets supporting 64-bit scalar compare,
and mapping the copy to S_CMP_LG_U64.
Before introducing the S_CSELECT pattern with explicit SCC
(0045786f146e78afee49eee053dc29ebc842fee1), there was no need
for handling 64-bit copy to SCC ($scc = COPY sreg_64).
The proposed handling to read only the low bits was however
based on a false premise that it is only one bit that matters,
while in fact the copy source might be a vector of booleans and
all bits need to be considered.
The practical problem of mapping the 64-bit copy to SCC is that
the natural instruction to use (S_CMP_LG_U64) is not available
on old hardware. Fix it by restricting the problematic pattern
to subtargets supporting the instruction (hasScalarCompareEq64).
Differential Revision: https://reviews.llvm.org/D85207
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.
Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.
The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.
All this also applies to the BotHeightReduce heuristic in the bottom
zone.
Differential Revision: https://reviews.llvm.org/D72392
Summary:
Add patterns to select s_cselect in the isel.
Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits
Tags: #llvm
Re-commit D81925 with a bugfix D82370.
Differential Revision: https://reviews.llvm.org/D81925
Differential Revision: https://reviews.llvm.org/D82370
Summary:
Add patterns to select s_cselect in the isel.
Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81925
Summary: This change enables all kind of carry out ISD opcodes to be selected according to the node divergence.
Reviewers: rampitec, arsenm, vpykhtin
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78091
Summary:
Currently we custom select add/sub with carry out to scalar form relying on later replacing them to vector form if necessary.
This change enables custom selection code to take the divergence of adde/addc SDNodes into account and select the appropriate form in one step.
Reviewers: arsenm, vpykhtin, rampitec
Reviewed By: arsenm, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa
Differential Revision: https://reviews.llvm.org/D76371
Summary:
There's a lot of test case churn but the overall effect is to increase
the number of back-to-back v_sub,v_subbrev pairs, which can execute with
no delay even on gfx10.
Reviewers: arsenm, rampitec, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75999
We probably want this, and I've meant to turn this on for a long
time. SC actually emits a special case to early-out for a 1
denominator, which perhaps should also be considered.
Summary:
This fixes some tests that did not specify -mcpu. Doing that disables
all subtarget features, which gives behavior that (a) does not
necessarily correspond to any actual target, and (b) can change as we
add new subtarget features.
Also added gfx1010 to memtime test.
Differential Revision: https://reviews.llvm.org/D74594
Change-Id: I8c0fe4fa03e9a93ef8bb722cd42d22e064526309
I didn't realize we were already expanding 24/32-bit division here
already. Use the available IntegerDivision utilities. This uses loops,
so produces significantly smaller code than the inline DAG expansion.
This now requires width reductions of 64-bit divisions before
introducing the expanded loops.
This helps work around missing legalization in GlobalISel for
division, which are the only remaining core instructions that didn't
work at all.
I think this is plausibly a better implementation than exists in the
DAG, although turning it on by default misses out on the constant
value optimizations and also needs benchmarking.