llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

Author	SHA1	Message	Date
Matt Arsenault	7af9eb8150	AMDGPU/GlobalISel: Use different technique for sample v3s16 values Avoid relying on implicit_def values, and odd sized G_INSERT/G_EXTRACT	2020-08-24 10:07:30 -04:00
Matt Arsenault	1221909666	AMDGPU/GlobalISel: Add baseline, failing unmerge tests	2020-08-24 10:07:30 -04:00
Matt Arsenault	9583788fb1	AMDGPU/GlobalISel: Start implementing computeKnownBitsForTargetInstr Handle workitem intrinsics. There isn't really away to adequately test this right now, since none of the known bits users are fine grained enough to test the edge conditions. This triggers a number of instances of the new 64-bit to 32-bit shift combine in the existing tests.	2020-08-24 09:53:27 -04:00
Matt Arsenault	21aca8a3e2	GlobalISel: Reduce G_SHL width if source is extension shl ([sza]ext x, y) => zext (shl x, y). Turns expensive 64 bit shifts into 32 bit if it does not overflow the source type: This is a port of an AMDGPU DAG combine added in 5fa289f0d8ff85b9e14d2f814a90761378ab54ae. InstCombine does this already, but we need to do it again here to apply it to shifts introduced for lowered getelementptrs. This will help matching addressing modes that use 32-bit offsets in a future patch. TableGen annoyingly assumes only a single match data operand, so introduce a reusable struct. However, this still requires defining a separate GIMatchData for every combine which is still annoying. Adds a morally equivalent function to the existing getShiftAmountTy. Without this, we would have to do try to repeatedly query the legalizer info and guess at what type to use for the shift.	2020-08-24 09:42:40 -04:00
Bjorn Pettersson	8f041837f3	[SelectionDAG] Fix miscompile bug in expandFunnelShift This is a fixup of commit 0819a6416fd217 (D77152) which could result in miscompiles. The miscompile could only happen for targets where isOperationLegalOrCustom could return different values for FSHL and FSHR. The commit mentioned above added logic in expandFunnelShift to convert between FSHL and FSHR by swapping direction of the funnel shift. However, that transform is only legal if we know that the shift count (modulo bitwidth) isn't zero. Basically, since fshr(-1,0,0)==0 and fshl(-1,0,0)==-1 then doing a rewrite such as fshr(X,Y,Z) => fshl(X,Y,0-Z) would be incorrect if Z modulo bitwidth, could be zero. ``` $ ./alive-tv /tmp/test.ll ---------------------------------------- define i32 @src(i32 %x, i32 %y, i32 %z) { %0: %t0 = fshl i32 %x, i32 %y, i32 %z ret i32 %t0 } => define i32 @tgt(i32 %x, i32 %y, i32 %z) { %0: %t0 = sub i32 32, %z %t1 = fshr i32 %x, i32 %y, i32 %t0 ret i32 %t1 } Transformation doesn't verify! ERROR: Value mismatch Example: i32 %x = #x00000000 (0) i32 %y = #x00000400 (1024) i32 %z = #x00000000 (0) Source: i32 %t0 = #x00000000 (0) Target: i32 %t0 = #x00000020 (32) i32 %t1 = #x00000400 (1024) Source value: #x00000000 (0) Target value: #x00000400 (1024) ``` It could be possible to add back the transform, given that logic is added to check that (Z % BW) can't be zero. Since there were no test cases proving that such a transform actually would be useful I decided to simply remove the faulty code in this patch. Reviewed By: foad, lebedev.ri Differential Revision: https://reviews.llvm.org/D86430	2020-08-24 09:52:11 +02:00
Matt Arsenault	4175888640	GlobalISel: Merge FewerElements for G_BUILD_VECTOR/G_CONCAT_VECTORS This switches from using G_EXTRACT in odd cases to widen with undef and unmerge.	2020-08-22 10:25:53 -04:00
Stanislav Mekhanoshin	f6faf01d47	[AMDGPU] Avoid sorting stalls in regbank-reassign This is the slowest operation in the already slow pass. Instead of sorting just put a stall list into an ordered map. Differential Revision: https://reviews.llvm.org/D86253	2020-08-21 11:49:41 -07:00
Mirko Brkusanin	afddfeaf4a	[AMDGPU] Use ds_read/write_b96/b128 when possible for SDag Do not break down local loads and stores so ds_read/write_b96/b128 in ISelLowering can be selected on subtargets that support them and if align requirements allow them. Differential Revision: https://reviews.llvm.org/D84403	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	09694f5b10	[AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores Fix local ds_read/write_b96/b128 so they can be selected if the alignment allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break them down. Differential Revision: https://reviews.llvm.org/D81638	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	08706e7bce	[AMDGPU] Reorganize GCN subtarget features for unaligned access Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both. Differential Revision: https://reviews.llvm.org/D84522	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	49f2d14543	[AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores Adjust alignment requirements for ds_read/write_b96/b128. GFX9 and onwards allow misaligned access for reads and writes but only if SH_MEM_CONFIG.alignment_mode allows it. UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we can relax alignment requirements. UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions but only from GFX9 onward and is supposed to match alignment_mode. By default alignment of 4 is required. Differential Revision: https://reviews.llvm.org/D82788	2020-08-21 12:26:31 +02:00
Jay Foad	6d725be5b3	[SelectionDAG] Better legalization for FSHL and FSHR In SelectionDAGBuilder always translate the fshl and fshr intrinsics to FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and ORs. Improve the legalization of FSHL and FSHR to avoid code quality regressions. Differential Revision: https://reviews.llvm.org/D77152	2020-08-21 10:32:49 +01:00
Michael Liao	1cf2d56956	[amdgpu] Add codegen support for HIP dynamic shared memory. Summary: - HIP uses an unsized extern array `extern __shared__ T s[]` to declare the dynamic shared memory, which size is not known at the compile time. Reviewers: arsenm, yaxunl, kpyzhov, b-sumner Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82496	2020-08-20 21:29:18 -04:00
Matt Arsenault	587fbc0a85	CodeGen: Don't drop AA metadata when splitting MachineMemOperands Assuming this is used to split a memory access into smaller pieces, the new access should still have the same aliasing properties as the original memory access. As far as I can tell, this wasn't intentionally dropped. It may be necessary to drop this if you are moving the operand outside of the bounds of the original object in such a way that it may alias another IR object, but I don't think any of the existing users are doing this. Some of the uses widen into unused alignment padding, which I think is OK.	2020-08-20 16:17:30 -04:00
Matt Arsenault	bd23f78f2f	AMDGPU/GlobalISel: Legalize odd sized loads with widening Custom lower and widen odd sized loads up to the alignment. The default set of legalization actions doesn't have a way to represent this. This fixes naturally aligned <3 x s8> and <3 x s16> loads. This also starts moving towards eliminating the buggy and overcomplicated legalization rules for narrowing. All the memory size changes should be done in the lower or custom action, not NarrowScalar / FewerElements. These currently have redundant and ambiguous code with the lower action.	2020-08-20 16:15:53 -04:00
Matt Arsenault	734b071bb5	GlobalISel: Implement fewerElementsVector for G_CONCAT_VECTORS sources This fixes <6 x s16> = G_CONCAT_VECTORS from <3 x s16> handling.	2020-08-19 18:53:24 -04:00
Matt Arsenault	92c99a3fbc	AMDGPU/GlobalISel: Add some bitcast tests	2020-08-19 10:38:39 -04:00
Matt Arsenault	a85deae728	AMDGPU/GlobalISel: Add selection tests for pointer constants	2020-08-19 10:23:56 -04:00
Ronak Chauhan	4697f34ed6	Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors" This reverts commit cacfb02d28a3cabd4e45d2535cb0686cef48a2c9. Reverting due to buildbot failures.	2020-08-19 13:12:29 +05:30
Ronak Chauhan	142f4dd209	[AMDGPU] Support disassembly for AMDGPU kernel descriptors Decode AMDGPU Kernel descriptors as assembler directives. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D80713	2020-08-19 08:49:07 +05:30
Changpeng Fang	c3904f6ffc	AMDGPU: Implement waterfall loop for MIMG instructions with 256-bit SRsrc Summary: When the resource descriptor is of vgpr, we need a waterfall loop to read into a sgpr. In this patchm we generalized the implementation to work for any regster class sizes, and extend the work to MIMG instructions. Fixes: SWDEV-223405 Reviewers: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D82603	2020-08-18 16:27:36 -07:00
Matt Arsenault	418515b7d0	GlobalISel: Implement fewerElementsVector for G_INSERT_VECTOR_ELT Add unit tests since AMDGPU will only trigger this for gigantic vectors, and won't use the annoying odd sized breakdown case.	2020-08-18 13:51:19 -04:00
Amara Emerson	d1d273ff1c	[GlobalISel] Add a combine for ashr(shl x, c), c --> sext_inreg x, c' By detecting this sign extend pattern early, we can uncover opportunities for more optimizations. Differential Revision: https://reviews.llvm.org/D85965	2020-08-18 10:42:15 -07:00
Jessica Paquette	7e08e6c7a3	[GlobalISel][CallLowering] Look through call parameters for flags We weren't looking through the parameters on calls at all. E.g., say you had ``` declare i32 @zext(i32 zeroext %x) ... %y = call i32 @zext(i32 %something) ... ``` At the point of the call, we wouldn't know that the %something should have the zeroext attribute. This sets flags in about the same way as TargetLoweringBase::ArgListEntry::setAttributes. Differential Revision: https://reviews.llvm.org/D86125	2020-08-18 08:48:56 -07:00
Matt Arsenault	e8f21df36b	AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize Previously, it would successfully select and assert if not HSA or PAL when expanding the pseudoinstruction. We don't need the pseudoinstruction anymore since we know the total size after legalization.	2020-08-18 09:28:01 -04:00
Matt Arsenault	441f0f056c	AMDGPU/GlobalISel: Fix selection of s1/s16 G_[F]CONSTANT The code to determine the value size was overcomplicated and only correct in the case where the result register already had a register class assigned. We can always take the size directly from the register's type.	2020-08-18 09:28:01 -04:00
Matt Arsenault	ec8cead821	AMDGPU/GlobalISel: Match global saddr addressing mode	2020-08-17 15:48:06 -04:00
Matt Arsenault	38b9cda6d0	AMDGPU: Match global saddr addressing mode The previous implementation was incorrect, and based off incorrect instruction definitions. Unfortunately we can't match natural addressing in a lot of cases due to the shift/scale applied in getelementptrs. This relies on reducing the 64-bit shift to 32-bits.	2020-08-17 15:28:14 -04:00
Matt Arsenault	956b5431ba	AMDGPU: Add baseline tests for global saddr matching	2020-08-17 15:23:13 -04:00
Stanislav Mekhanoshin	bcd631b8fb	[AMDGPU] Define spill opcodes for all AGPR sizes Since we have defined all these sizes I believe we shall be able to spill these as well. Differential Revision: https://reviews.llvm.org/D86098	2020-08-17 12:17:23 -07:00
Matt Arsenault	a71dba11e9	GlobalISel: Revisit users of other merge opcodes in artifact combiner The artifact combiner searches for the uses of G_MERGE_VALUES for unmerge/trunc that need further combining. This also needs to handle the vector merge opcodes the same way. This fixes leaving behind some pairs I expected to be removed, that were if the legalizer is run a second time.	2020-08-17 13:56:53 -04:00
Matt Arsenault	b3efd01f95	AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset We may have an SGPR->VGPR copy if a totally uniform pointer calculation is used for a VGPR pointer operand. Also hack around a bug in MUBUF matching which would incorrectly use MUBUF for global when flat was requested. This should really be a predicate on the parent pattern, but the DAG always checked this manually inside the complex pattern.	2020-08-17 12:31:38 -04:00
Matt Arsenault	630fcdd4be	AMDGPU/GlobalISel: Fix missing 256-bit AGPR mapping	2020-08-17 09:53:26 -04:00
Matt Arsenault	ccf0f19849	AMDGPU/GlobalISel: Fix using readfirstlane with ballot intrinsics This should use the default mapping and insert a copy to the vcc bank, and not try to insert a readfirstlane.	2020-08-17 09:53:25 -04:00
Matt Arsenault	aa9d3db2ef	AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics Global instructions have the signed offsets.	2020-08-17 09:19:15 -04:00
Matt Arsenault	68ef3ca8e3	AMDGPU: Fix global atomic saddr operand class	2020-08-15 12:12:28 -04:00
Matt Arsenault	ae1d280938	AMDGPU: Fix matching wrong offsets for global atomic loads These used signed offsets with a different size.	2020-08-15 12:12:17 -04:00
Matt Arsenault	2f56005d71	AMDGPU: Correct definitions for global saddr instructions The VGPR component is a 32-bit offset, not 64-bits. I'm not sure what the correct syntax is for this. This maintains the vaddr position and leaves saddr in the end "off" position. This is particularly terrible for stores, since the operand order is now <vgpr offset>, <data>, <sgpr base>, splitting the pointer operands. I suppose this is a logical consequence from the mistake of not putting the data operand first. I'm not sure what sp3 does.	2020-08-15 12:11:57 -04:00
Matt Arsenault	dd82110acb	AMDGPU: Remove SIFixupVectorISel pass This was only used for matching the saddr addressing mode of global instructions, but this was not implemented correctly. The instruction definitions aren't even correct, and are defined as using a 64-bit VGPR component. Eliminate this pass to enable correcting the instruction definitions. A new matching implementation can work in GlobalISel or relying on DAG divergence information for the base address.	2020-08-15 12:11:51 -04:00
Stanislav Mekhanoshin	96ccc24be1	[AMDGPU] Fix MAI ld/st hazard handling It did not process hazard for ds_permute because it does not load or store even though it is DS. Differential Revision: https://reviews.llvm.org/D86003	2020-08-14 17:07:37 -07:00
Matt Arsenault	2348cd927a	TableGen/GlobalISel: Partially handle immAllOnesV/immAllZerosV These should really match either G_BUILD_VECTOR or G_BUILD_VECTOR_TRUNC, but there doesn't seem to be an existing mechanism for matching alternative opcodes. There is GIM_SwitchOpcode, but it seems to assume it's oly only used for matcher optimization. I could also omit any opcode check and rely on the matcher directly checking the opcode, but the table optimizer currently assumes there has to be an opcode check. Also doesn't try to handle undef elements like the DAG version.	2020-08-14 13:55:30 -04:00
Matt Arsenault	895bb86d57	AMDGPU/GlobalISel: Match andn2/orn2 for more types Unfortunately this ends up not working as expected on targets with 16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform 16-bit ops to i32. The vector case annoyingly requires switching the checked opcode, since constants for vectors aren't directly handled. I also need to think more carefully about whether this is valid for i1.	2020-08-14 13:18:03 -04:00
Sebastian Neubauer	48229dd1c2	[AMDGPU] Enable .rodata for amdpal os PAL recently got support for multiple ELF sections and relocations, therefore we can now use .rodata sections instead of forcing constants into .text. Differential Revision: https://reviews.llvm.org/D85895	2020-08-14 09:05:48 +02:00
Austin Kerbow	fce2dc7bf9	[AMDGPU] Fix FP/BP spills when MUBUF constant offset exceeded If we need a scratch register for the spill don't use the same scratch register that is being used for the MBUF offset. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85772	2020-08-13 14:12:00 -07:00
Stanislav Mekhanoshin	e9e70ce1dd	[AMDGPU] Inhibit SDWA if target instruction has FI Differential Revision: https://reviews.llvm.org/D85918	2020-08-13 11:34:28 -07:00
Matt Arsenault	d149ae14cc	DAG: Don't pass 0 alignment value to allowsMisalignedMemoryAccesses I think not unconditionally passing getDstAlign is broken, but leave that for another change.	2020-08-13 09:33:17 -04:00
Carl Ritson	3802b95d05	[AMDGPU] Fix missed SI_RETURN_TO_EPILOG in pre-emit peephole SIPreEmitPeephole does not process all terminators, which means it can fail to handle SI_RETURN_TO_EPILOG if immediately preceeded by a branch to the early exit block. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D85872	2020-08-13 21:52:41 +09:00
Carl Ritson	e685b4674e	[AMDGPU] Pre-commit test for D85872	2020-08-13 13:07:27 +09:00
Ruiling Song	9b5efb57d0	[AMDGPU] Fix crash when dag-combining bitcast From the code after the 'break', they are processing 64bit scalar and vector bitcast. So I think the break-condition should be (cond1 \|\| cond2) This means we only execute following code if (64bit and dest-is-vector). Also remove a previous fix which is not needed with this new fix. (introduced in: 1349a04ef5f594dda705ec80474dda4837f26dba) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85804	2020-08-13 10:23:13 +08:00
Matt Arsenault	a7f874dd31	AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.fadd Remove the intermediate transform in the DAG path. I believe this is the last non-deprecated intrinsic that needs handling.	2020-08-12 10:04:53 -04:00

1 2 3 4 5 ...

3878 Commits