llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

Author	SHA1	Message	Date
Piotr Sobczak	cb35142c34	Fix 64-bit copy to SCC Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64. Before introducing the S_CSELECT pattern with explicit SCC (0045786f146e78afee49eee053dc29ebc842fee1), there was no need for handling 64-bit copy to SCC ($scc = COPY sreg_64). The proposed handling, which read only the low bits, was however based on the false premise that only one bit matters, while in fact the copy source might be a vector of booleans and all bits need to be considered. The practical problem of mapping the 64-bit copy to SCC is that the natural instruction to use (S_CMP_LG_U64) is not available on old hardware. Fix it by restricting the problematic pattern to subtargets supporting the instruction (hasScalarCompareEq64). Differential Revision: https://reviews.llvm.org/D85207	2020-08-09 20:50:30 +02:00
Jay Foad	3f23d4b8c3	[MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics tryLatency compares two sched candidates. For the top zone it prefers the one with lesser depth, but only if that depth is greater than the total latency of the instructions we've already scheduled -- otherwise its latency would be hidden and there would be no stall. Unfortunately it only tests the depth of one of the candidates. This can lead to situations where the TopDepthReduce heuristic does not kick in, but a lower priority heuristic chooses the other candidate, whose depth is greater than the already scheduled latency, which causes a stall. The fix is to apply the heuristic if the depth of either candidate is greater than the already scheduled latency. All this also applies to the BotHeightReduce heuristic in the bottom zone. Differential Revision: https://reviews.llvm.org/D72392	2020-07-17 11:02:13 +01:00
Piotr Sobczak	b2cdcb043a	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Re-commit D81925 with a bugfix D82370. Differential Revision: https://reviews.llvm.org/D81925 Differential Revision: https://reviews.llvm.org/D82370	2020-06-25 10:38:23 +02:00
Matt Arsenault	74c3954b0a	Revert "[AMDGPU] Enable compare operations to be selected by divergence" This reverts commit 521ac0b5cea02f629d035f807460affbb65ae7ad. Reported to break thousands of piglit tests.	2020-06-24 11:21:30 -04:00
alex-t	c89646a3e3	[AMDGPU] Enable compare operations to be selected by divergence Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82194	2020-06-24 11:50:40 +03:00
Piotr Sobczak	b9c5311c77	Revert "[AMDGPU] Select s_cselect" This caused some failures detected by the buildbot with expensive checks enabled. This reverts commit 4067de569f119a81419fbf2e79d5f3307dfdda5b.	2020-06-19 16:41:04 +02:00
Piotr Sobczak	3a8e847d39	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81925	2020-06-19 16:17:46 +02:00
alex-t	1e68a16e75	[AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection. Summary: This change enables all kinds of carry out ISD opcodes to be selected according to the node divergence. Reviewers: rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78091	2020-05-04 16:42:25 +03:00
Konstantin Pyzhov	dcee563d39	[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU. Reviewers: sameerds, dstuttard Differential Revision: https://reviews.llvm.org/D77228	2020-04-06 09:05:58 -04:00
Konstantin Pyzhov	e9d27deea3	Revert e1730cfeb3588f20dcf4a96b181ad52761666e52	2020-04-06 05:56:11 -04:00
Konstantin Pyzhov	64d224eff5	[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU. Reviewers: sameerds, dstuttard Differential Revision: https://reviews.llvm.org/D77228	2020-04-06 05:10:37 -04:00
alex-t	fd26f332c3	[AMDGPU] Enable divergence driven ISel for ADD/SUB i64 Summary: Currently we custom select add/sub with carry out to scalar form, relying on later replacing them with the vector form if necessary. This change enables custom selection code to take the divergence of adde/addc SDNodes into account and select the appropriate form in one step. Reviewers: arsenm, vpykhtin, rampitec Reviewed By: arsenm, vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa Differential Revision: https://reviews.llvm.org/D76371	2020-03-20 17:06:11 +03:00
Jay Foad	cc94ceb292	[AMDGPU] Extend macro fusion for ADDC and SUBB to SUBBREV Summary: There's a lot of test case churn but the overall effect is to increase the number of back-to-back v_sub,v_subbrev pairs, which can execute with no delay even on gfx10. Reviewers: arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75999	2020-03-11 17:59:21 +00:00
Matt Arsenault	93f691bf87	AMDGPU: Enable integer division bypass We probably want this, and I've meant to turn this on for a long time. SC actually emits a special case to early-out for a 1 denominator, which perhaps should also be considered.	2020-02-19 17:50:19 -05:00
Tim Renouf	10fe28f702	[AMDGPU] Fix some tests that did not specify -mcpu Summary: This fixes some tests that did not specify -mcpu. Omitting -mcpu disables all subtarget features, which gives behavior that (a) does not necessarily correspond to any actual target, and (b) can change as we add new subtarget features. Also added gfx1010 to memtime test. Differential Revision: https://reviews.llvm.org/D74594 Change-Id: I8c0fe4fa03e9a93ef8bb722cd42d22e064526309	2020-02-17 14:02:32 +00:00
Matt Arsenault	36eb29ea7a	AMDGPU: Don't preserve analyses with div64 IR expansion The dominator tree needs to be updated, but that isn't handled now.	2020-02-14 20:06:02 -05:00
Matt Arsenault	1b559ce622	AMDGPU: Add option to expand 64-bit integer division in IR I didn't realize we were already expanding 24/32-bit division here. Use the available IntegerDivision utilities. This uses loops, so produces significantly smaller code than the inline DAG expansion. This now requires width reductions of 64-bit divisions before introducing the expanded loops. This helps work around missing legalization in GlobalISel for the division instructions, which are the only remaining core instructions that didn't work at all. I think this is plausibly a better implementation than exists in the DAG, although turning it on by default misses out on the constant value optimizations and also needs benchmarking.	2020-02-14 11:16:08 -08:00
Matt Arsenault	3b77fefca7	AMDGPU: Cleanup and generate 64-bit div tests Split out r600 tests, and try to be more consistent with coverage. Cover a few more cases for 24-bit optimization and constants.	2020-01-20 17:19:39 -05:00

18 Commits
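
The [MachineScheduler] entry above (3f23d4b8c3) describes a change to when the TopDepthReduce heuristic is allowed to fire. The condition it describes can be sketched as a small standalone model. This is not the LLVM source: the `Candidate` struct and both function names are hypothetical, and the commit message does not say which of the two candidates the old code tested, so the "before" rule here simply tests its first argument as an assumption.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a scheduling candidate; only its depth is modeled.
struct Candidate {
  unsigned depth; // latency of the longest dependency path above the node
};

// Modeled rule before the fix: the depth heuristic is considered only when
// ONE candidate's depth exceeds the latency already scheduled.
// Assumption: the tested candidate is the first argument.
bool depthRuleAppliesBefore(const Candidate &tested, const Candidate &other,
                            unsigned scheduledLatency) {
  (void)other; // the other candidate's depth is ignored in this model
  return tested.depth > scheduledLatency;
}

// Modeled rule after the fix: the heuristic is considered when EITHER
// candidate's depth exceeds the latency already scheduled.
bool depthRuleAppliesAfter(const Candidate &a, const Candidate &b,
                           unsigned scheduledLatency) {
  return std::max(a.depth, b.depth) > scheduledLatency;
}

// The case from the commit message: the tested candidate is shallow, the
// other one is deep enough to stall.
void example() {
  const Candidate shallow{2}, deep{10};
  const unsigned scheduledLatency = 5;
  // Before: the heuristic is skipped, so a lower-priority heuristic may
  // still pick the deep candidate and cause a stall.
  assert(!depthRuleAppliesBefore(shallow, deep, scheduledLatency));
  // After: the heuristic is considered, so the lesser depth can be preferred.
  assert(depthRuleAppliesAfter(shallow, deep, scheduledLatency));
}
```

Per the commit message, the same reasoning applies in the bottom zone, with height in place of depth (BotHeightReduce).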