llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00

Author	SHA1	Message	Date
Marek Olsak	f30857fef7	AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing Summary: Mesa and other users must set this to enable coalescing: - STRIDE = 0 - SWIZZLE_ENABLE = 1 This makes one particular compute shader 8x faster. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21136 llvm-svn: 272556	2016-06-13 16:05:57 +00:00
Matt Arsenault	f138b4af09	AMDGPU: Fix post-RA verifier errors with trackLivenessAfterRegAlloc The condition reg of the cndmask_b64 expansion can't be killed by the first one, and the implicit super register implicit def is needed. llvm-svn: 272554	2016-06-13 15:53:52 +00:00
Benjamin Kramer	e80783f62f	Pass DebugLoc and SDLoc by const ref. This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512	2016-06-12 15:39:02 +00:00
Matt Arsenault	93cec24ce4	AMDGPU: Add function for getting instruction size llvm-svn: 271936	2016-06-06 20:10:33 +00:00
Matt Arsenault	bc59009ab9	AMDGPU: Handle flat in getMemOpBaseRegImmOfs It can still report the base register, and the uses give up when it fails. llvm-svn: 271575	2016-06-02 20:05:20 +00:00
Matt Arsenault	dffa97a6fa	AMDGPU: Fix incorrectly setting kill flag when copying register tuples This fixes some verifier errors when trackLivenessAfterRegAlloc is enabled. llvm-svn: 271446	2016-06-02 00:04:30 +00:00
Matt Arsenault	06895e862f	AMDGPU: Fix verifier error when spilling SGPRs The current SGPR spilling test does not stress this because it is using s_buffer_load instructions to increase SGPR pressure and spill, but their output operands have the same SReg_32_XM0 constraint. This fixes an error when the SReg_32 output from most instructions is spilled. llvm-svn: 270301	2016-05-21 00:53:42 +00:00
Matt Arsenault	c34a7d2258	AMDGPU: Handle cbranch vccz/vccnz llvm-svn: 270297	2016-05-21 00:29:40 +00:00
Matt Arsenault	5438a4669d	AMDGPU: Implement ReverseBranchCondition llvm-svn: 270296	2016-05-21 00:29:34 +00:00
Matt Arsenault	a197a65904	AMDGPU: Implement AnalyzeBranch Original patch by Tom Stellard llvm-svn: 270295	2016-05-21 00:29:27 +00:00
Matt Arsenault	4449ad7408	AMDGPU: Remove verifier check for scc live ins We only really need this to be true for SIFixSGPRCopies. I'm not sure there's any way this could happen before that point. Fixes a case where MachineCSE could introduce a cross block scc use. llvm-svn: 269391	2016-05-13 04:15:48 +00:00
Tom Stellard	097fcbcf91	AMDGPU/SI: Fix bug in SIInstrInfo::insertWaitStates() uncovered by r268260 We can't use MI->getDebugLoc() when MI is an iterator that could be MBB.end(). llvm-svn: 268265	2016-05-02 18:02:24 +00:00
Tom Stellard	51b37329c1	AMDGPU/SI: Enable the post-ra scheduler Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143	2016-04-30 00:23:06 +00:00
Tom Stellard	33134ca52e	AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructions Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043	2016-04-29 14:34:26 +00:00
Etienne Bergeron	37376d525a	Fix incorrect redundant expression in target AMDGPU. Summary: The expression is detected as a redundant expression. Turn out, this is probably a bug. ``` /home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression] if (isSMRD(FirstLdSt) && isSMRD(FirstLdSt)) { ``` Reviewers: rnk, tstellarAMD Subscribers: arsenm, cfe-commits Differential Revision: http://reviews.llvm.org/D19460 llvm-svn: 267415	2016-04-25 15:06:33 +00:00
Nicolai Haehnle	f54e57a212	AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102	2016-04-22 04:04:08 +00:00
Nicolai Haehnle	e4cebbe0dc	AMDGPU: Guard VOPC instructions against incorrect commute Summary: The added testcase, which triggered this, was derived from a shader-db case via bugpoint. A separate question is why scalar branching wasn't used. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19208 llvm-svn: 266825	2016-04-19 21:58:22 +00:00
Jun Bum Lim	ad7ab4cf46	[MachineScheduler]Add support for store clustering Perform store clustering just like load clustering. This change add StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now. This change also add support for unscaled stores which were not handled in getMemOpBaseRegImmOfs(). llvm-svn: 266437	2016-04-15 14:58:38 +00:00
Matt Arsenault	2dfb6d03c5	AMDGPU: Run SIFoldOperands after PeepholeOptimizer PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective. shader-db stats: Totals: SGPRS: 34200 -> 34336 (0.40 %) VGPRS: 22118 -> 21655 (-2.09 %) Code Size: 632144 -> 633460 (0.21 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 10240 -> 11264 (10.00 %) bytes per wave Max Waves: 8822 -> 8918 (1.09 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 7704 -> 7840 (1.77 %) VGPRS: 5169 -> 4706 (-8.96 %) Code Size: 234444 -> 235760 (0.56 %) bytes LDS: 2 -> 2 (0.00 %) blocks Scratch: 0 -> 1024 (0.00 %) bytes per wave Max Waves: 1188 -> 1284 (8.08 %) Wait states: 0 -> 0 (0.00 %) Increases: SGPRS: 35 (0.01 %) VGPRS: 1 (0.00 %) Code Size: 59 (0.02 %) LDS: 0 (0.00 %) Scratch: 1 (0.00 %) Max Waves: 48 (0.02 %) Wait states: 0 (0.00 %) Decreases: SGPRS: 26 (0.01 %) VGPRS: 54 (0.02 %) Code Size: 68 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Max Waves: 4 (0.00 %) Wait states: 0 (0.00 %) llvm-svn: 266378	2016-04-14 21:58:24 +00:00
Tom Stellard	7c8ab79409	AMDGPU/SI: Fix spilling of 96-bit registers Summary: It seems like this was broken in r252327. I thought we had test cases for this, but it's really hard to tirgger spills of this exact register size since they aren't used very much. Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19021 llvm-svn: 266152	2016-04-12 23:57:30 +00:00
Tom Stellard	b255b03243	AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates Summary: This makes it possible to insert nops at the end of blocks. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18549 llvm-svn: 265678	2016-04-07 14:47:07 +00:00
Nicolai Haehnle	69b2d0adeb	AMDGPU: Add a shader calling convention This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589	2016-04-06 19:40:20 +00:00
Matthias Braun	8623ebcaee	RegisterScavenger: Take a reference as enterBasicBlock() argument. Make it obvious that the argument cannot be nullptr. Remove an unnecessary nullptr check in initRegState. llvm-svn: 265511	2016-04-06 02:47:09 +00:00
Tom Stellard	fe96362048	AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fixes yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589	2016-03-28 16:10:13 +00:00
Nicolai Haehnle	94ebbdf753	AMDGPU: Add SIWholeQuadMode pass Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 llvm-svn: 263982	2016-03-21 20:28:33 +00:00
Michel Danzer	fe529a11c9	AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormat Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 263626	2016-03-16 09:10:35 +00:00
Nikolay Haustov	941b85256d	[AMDGPU] Assembler: change v_madmk operands to have same order as mad. The constant is now at source operand 1 (previously at 2). This is also how it is in legacy AMD sp3 assembler. Update tests. Differential Revision: http://reviews.llvm.org/D17984 llvm-svn: 263212	2016-03-11 09:27:25 +00:00
Chad Rosier	689669fdb4	[TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC. http://reviews.llvm.org/D17967 llvm-svn: 263021	2016-03-09 16:00:35 +00:00
Tom Stellard	c656bfbad2	AMDGPU/SI: Add support for spiling SGPRs to scratch buffer Summary: This is necessary for when we run out of VGPRs and can no longer use v_{read,write}_lane for spilling SGPRs. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17592 llvm-svn: 262732	2016-03-04 18:31:18 +00:00
Matt Arsenault	7fe831ea78	AMDGPU: Simplify boolean conditional return statements Patch by Richard Thomson llvm-svn: 262536	2016-03-02 23:00:21 +00:00
Matt Arsenault	f18c3f7466	AMDGPU: Cleanup suggested in bug 23960 llvm-svn: 262456	2016-03-02 04:05:14 +00:00
Changpeng Fang	929a348e60	AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics Summary: This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics, which are new since VI. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17614 llvm-svn: 262356	2016-03-01 17:51:23 +00:00
Tom Stellard	060bccc1f3	AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointer Summary: Instead of trying to replace SMRD instructions with a VGPR base pointer with an equivalent MUBUF instruction, we now copy the base pointer to SGPRs using v_readfirstlane. This is safe to do, because any load selected as an SMRD instruction has been proven to have a uniform base pointer, so each thread in the wave will have the same pointer value in VGPRs. This will fix some errors on VI from trying to replace SMRD instructions with addr64-enabled MUBUF instructions that don't exist. Reviewers: arsenm, cfang, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17305 llvm-svn: 261385	2016-02-20 00:37:25 +00:00
Tom Stellard	5d2e8e7ab0	[AMDGPU] Rename $dst operand to $vdst for VOP instructions. Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler. Reviewers: vpykhtin, tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, nhaustov Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D16920 llvm-svn: 260986	2016-02-16 18:14:56 +00:00
Tom Stellard	9943755afb	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765	2016-02-12 23:45:29 +00:00
Matt Arsenault	628b2818b6	AMDGPU: Set element_size in private resource descriptor Introduce a subtarget feature for this, and leave the default with the current behavior which assumes up to 16-byte loads/stores can be used. The field also seems to have the ability to be set to 2 bytes, but I'm not sure what that would be used for. llvm-svn: 260651	2016-02-12 02:40:47 +00:00
Tom Stellard	7b646abe2d	AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599	2016-02-11 21:45:07 +00:00
Tom Stellard	4c0cead548	AMDGPU/SI: When splitting SMRD instructions, add its users to VALU worklist Summary: When we split SMRD instructions into two MUBUFs we were adding the users of the newly created MUBUFs to the VALU worklist. However, the only users these instructions had was the REG_SEQUENCE that was inserted by splitSMRD when the original SMRD instruction was split. We need to make sure to add the users of the original SMRD to the VALU worklist before it is split. I have a test case, but it requires one other bug fix, so it will be added in a later commt. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17101 llvm-svn: 260588	2016-02-11 21:14:34 +00:00
Matt Arsenault	36dc1c179e	AMDGPU: Fix constant bus use check with subregisters If the two operands to an instruction were both subregisters of the same super register, it would incorrectly think this counted as the same constant bus use. This fixes the verifier error in fmin_legacy.ll which was missing -verify-machineinstrs. llvm-svn: 260495	2016-02-11 06:15:39 +00:00
Tom Stellard	350a5c65b3	AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16862 llvm-svn: 259900	2016-02-05 18:44:57 +00:00
Tom Stellard	4b06827ba4	AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cpp Summary: Also delete all the stub functions that are identical to the implementations in TargetInstrInfo.cpp. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16609 llvm-svn: 259054	2016-01-28 16:04:37 +00:00
Nicolai Haehnle	a2b45f7350	AMDGPU/SI: Add SI Machine Scheduler Summary: It is off by default, but can be used with --misched=si Patch by: Axel Davy Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D11885 llvm-svn: 257609	2016-01-13 16:10:10 +00:00
Nicolai Haehnle	8f90c2a416	AMDGPU/SI: Fold operands with sub-registers Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor: 7063 shaders in 3531 tests Totals: SGPRS: 351872 -> 352560 (0.20 %) VGPRS: 199984 -> 200732 (0.37 %) Code Size: 9876968 -> 9881112 (0.04 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave Wait states: 295164 -> 295337 (0.06 %) Totals from affected shaders: SGPRS: 65784 -> 66472 (1.05 %) VGPRS: 38064 -> 38812 (1.97 %) Code Size: 1993828 -> 1997972 (0.21 %) bytes LDS: 42 -> 42 (0.00 %) blocks Scratch: 795648 -> 783360 (-1.54 %) bytes per wave Wait states: 54026 -> 54199 (0.32 %) Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074	2016-01-07 17:10:29 +00:00
Nicolai Haehnle	50527c616a	AMDGPU/SI: use S_MOV_B64 for larger copies in copyPhysReg Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15629 llvm-svn: 256073	2015-12-19 01:36:26 +00:00
Nicolai Haehnle	d44890b802	AMDGPU: fix overlapping copies in copyPhysReg Summary: When copying aggregate registers within the same register class, there may be an overlap between source and destination that forces us to do the copy backwards. Do the simplest possible thing that guarantees the correct order of moves when there are overlaps, and does whatever when there is no overlap. (The last part forces some trivial adjustments to test cases.) Together with r255906, this fixes a VM fault in Unreal Elemental Demo. While at it, change the generation of kill and def flags to something that looks more reasonable. This method is used very late during compilation, so it probably doesn't matter in practice, and to be honest, I don't know if this change is actually correct because the semantics in connection with aggregate registers vs. sub-registers are not clear to me. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15622 llvm-svn: 256072	2015-12-19 01:16:06 +00:00
Changpeng Fang	555ffaefab	AMDGPU/SI: Test commit Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256022	2015-12-18 20:04:28 +00:00
Changpeng Fang	5a4421c8c1	Revert "AMDGPU/SI: Test commit" This reverts commit a493cb636e0152ad28210934a47c6c44b1437193. llvm-svn: 256021	2015-12-18 20:04:26 +00:00
Changpeng Fang	aa7bdc66d8	AMDGPU/SI: Test commit Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256020	2015-12-18 19:57:41 +00:00
Nicolai Haehnle	5b0b47c0c0	AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex Summary: The method insertNOPs expected the number of wait states to be passed as parameter, while eliminateFrameIndex passed the immediate argument for the S_NOP, leading to an off-by-one error. Rename the method to make the meaning of its parameter clearer. The number of 4 / 5 wait states (which is what the method has always _tried_ to do according to the comment) is correct according to the hardware docs. I stumbled upon this while trying to track down the cause of https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed, this patch unfortunately does not fix that bug... Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15542 llvm-svn: 255906	2015-12-17 16:46:42 +00:00
Tom Stellard	d2a657e08d	AMDGPU/SI: Emit constant arrays in the .text section Summary: This allows us to remove the END_OF_TEXT_LABEL hack we had been using and simplifies the fixups used to compute the address of constant arrays. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15257 llvm-svn: 255204	2015-12-10 02:13:01 +00:00

1 2 3

104 Commits