llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 12:43:36 +01:00

Author	SHA1	Message	Date
Michael Liao	b2d24acf01	[amdgpu] Add 64-bit PC support when expanding unconditional branches. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106445	2021-07-26 14:50:30 -04:00
Carl Ritson	8b2d2affc5	[AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr. Previously these were rounded up to VReg_256; avoiding that saves VGPRs. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D103800	2021-07-22 10:42:15 +09:00
Stanislav Mekhanoshin	b044663832	[AMDGPU] Mark relevant rematerializable VOP2 instructions Differential Revision: https://reviews.llvm.org/D106023	2021-07-21 14:24:59 -07:00
Stanislav Mekhanoshin	c90b3afab5	[AMDGPU] Mark all relevant VOP1 instructions rematerializable Differential Revision: https://reviews.llvm.org/D105919	2021-07-21 14:05:32 -07:00
Sebastian Neubauer	b992832f74	[AMDGPU] Use isMetaInstruction for instruction size Meta instructions have a size of 0. Use isMetaInstruction instead of listing them explicitly. Differential Revision: https://reviews.llvm.org/D106043	2021-07-15 12:23:11 +02:00
Stanislav Mekhanoshin	1dae83cabb	[AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization Any def of EXEC prevents rematerialization of any VOP instruction because of the physreg use. Create a callback to check if the physreg use can be ignored to allow rematerialization. Differential Revision: https://reviews.llvm.org/D105836	2021-07-14 13:03:58 -07:00
Sebastian Neubauer	9f94b4e5f6	[AMDGPU] Mark waterfall loops as SI_WATERFALL_LOOP This way, they can be detected later, e.g. by the SIOptimizeVGPRLiveRange pass. Differential Revision: https://reviews.llvm.org/D105467	2021-07-13 12:15:08 +02:00
Stanislav Mekhanoshin	327d625bef	[AMDGPU] Make some VOP1 instructions rematerializable This is a pilot change to verify the logic. The rest will be done in the same way, at least the rest of VOP1. Differential Revision: https://reviews.llvm.org/D105742	2021-07-12 23:43:45 -07:00
Stanislav Mekhanoshin	bb5a67dc29	[AMDGPU] Fix immediate sign during V_MOV_B64_PSEUDO expansion Creating a V_MOV_B32 with zero extended immediate source prevented conversion to V_BFREV_B32. Differential Revision: https://reviews.llvm.org/D105235	2021-07-01 09:00:29 -07:00
Stanislav Mekhanoshin	4b0ec23e84	[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants This is to allow 64-bit constant rematerialization. If a constant is split into two separate moves initializing sub0 and sub1, as is done now, RA cannot rematerialize a 64-bit register. This gives a 10-20% uplift in a set of huge apps heavily using double precision math. Fixes: SWDEV-292645 Differential Revision: https://reviews.llvm.org/D104874	2021-06-30 11:45:38 -07:00
Piotr Sobczak	8db71b419f	[AMDGPU] Fix 224-bit spills Related to D104622. Differential Revision: https://reviews.llvm.org/D105109	2021-06-29 17:52:16 +02:00
Carl Ritson	9a5c628361	[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D104622	2021-06-24 12:41:22 +09:00
Carl Ritson	f3236bf71f	[AMDGPU] Add v5f32/VReg_160 support for MIMG instructions Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are required for a MIMG address operand. Maintain _V8 instruction variants of pseudo instructions allowing assembly prior to GFX10 to work as-is. Currently the validator can tell for GFX10 what the correct size is, so it will disallow oversize address registers. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103672	2021-06-08 11:11:40 +09:00
Stanislav Mekhanoshin	82e56dec58	[AMDGPU] All GWS instructions need aligned VGPR on gfx90a Fixes: SWDEV-288006 Differential Revision: https://reviews.llvm.org/D103197	2021-06-01 17:08:03 -07:00
Brendon Cahoon	d300f6cd3b	[AMDGPU] Update SCC defs to VCC when uses are changed to VCC The FixSGPRCopies pass converts instructions to VALU when removing illegal VGPR to SGPR copies. Instructions that use SCC are changed to use VCC instead. When that happens, the pass must also change instructions that define SCC to define VCC. The pass was not changing the SCC definition when an ADDC is converted due to an input that is a VGPR to SGPR copy. But the initial ADD instruction, which defines SCC, is not converted. This causes a compilation failure due to a use of an undefined physical register. This patch adds code that inserts the SCC definition in the MoveToVALU worklist when an SCC use is converted to a VCC use. Differential Revision: https://reviews.llvm.org/D102111	2021-05-14 18:05:05 -04:00
Matt Arsenault	15058e16a1	AMDGPU: Fix assert when rewriting saddr d16 loads moveOperands does not handle moving tied operands since it would generally have to fix up the tied operand references. Avoid the assert by untying and retying after the modification. These in place modifications really aren't manageable.	2021-05-14 13:24:19 -04:00
Jay Foad	85f4b8dffe	[AMDGPU] getMemOperandsWithOffset: add vaddr operand for stack access BUF instructions A consequence is that checkInstOffsetsDoNotOverlap can now distinguish sp+offset from fp+offset, so it knows that it shouldn't try to work out whether the accesses overlap just by comparing the offsets. For example in these two instructions: MIR: BUFFER_STORE_DWORD_OFFSET %0:vgpr_32(s32), $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into stack + 4, addrspace 5) %4:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0.alloca, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from `i8 addrspace(5)* undef`, addrspace 5) ISA: buffer_store_dword v0, off, s[0:3], s32 offset:4 buffer_load_dword v0, off, s[0:3], s34 Differential Revision: https://reviews.llvm.org/D73957	2021-05-14 10:10:43 +01:00
David Stuttard	0a28768900	[AMDGPU][AsmParser/Disassembler] Correct A16 and G16 handling A16 support for image instructions assembly/disassembly (gfx10) was missing. Also refactor MIMG op addr size calcs into a common function; we'd got 3 places where the same operation was being done. One test is now marked XFAIL until a related codegen patch is in place. Differential Revision: https://reviews.llvm.org/D102231 Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb	2021-05-14 09:25:44 +01:00
Sebastian Neubauer	154e1ab9f4	[AMDGPU] Restrict immediate scratch offsets gfx9 does not work with negative offsets, gfx10 works only with aligned negative offsets, but not with unaligned negative offsets. This is slightly more conservative than needed, gfx9 does support negative offsets when a VGPR address is used and gfx10 supports negative, unaligned offsets when an SGPR address is used, but we do not make use of that with this patch. Differential Revision: https://reviews.llvm.org/D101292	2021-05-07 14:51:32 +02:00
Stanislav Mekhanoshin	a8a85ab673	[AMDGPU] Change FLAT Scratch SADDR to VADDR form in moveToVALU Extend the legalization of global SADDR loads and stores with changing to VADDR to the FLAT scratch instructions. Differential Revision: https://reviews.llvm.org/D101408	2021-05-03 10:57:14 -07:00
Stanislav Mekhanoshin	ba07f46415	[AMDGPU] Change FLAT SADDR to VADDR form in moveToVALU Instead of legalizing saddr operand with a readfirstlane when address is moved from SGPR to VGPR we can just change the opcode. Differential Revision: https://reviews.llvm.org/D101405	2021-05-03 10:36:26 -07:00
David Stuttard	dbb5e2911e	[AMDGPU] Tidy up some simple expressions for clarity NFC Slight refactor for clarity. Change-Id: Ib25e7f4582c67a7c57f066cfd5382c1405d7d4c5 Differential Revision: https://reviews.llvm.org/D101610	2021-04-30 11:13:54 +01:00
Jay Foad	55e098ac95	[AMDGPU] Allow multiple uses of the same literal In GFX10 VOP3 can have a literal, which opens up the possibility of two operands using the same literal value, which is allowed and only counts as one use of the constant bus. AMDGPUAsmParser::validateConstantBusLimitations already knew about this but SIInstrInfo::verifyInstruction did not. Differential Revision: https://reviews.llvm.org/D100770	2021-04-20 16:44:01 +01:00
Sebastian Neubauer	70e58c35e1	[AMDGPU] Use SIInstrFlags for flat variants. NFC Use SIInstrFlags to differentiate between the different variants of flat instructions (flat, global and scratch). This should make it easier to bundle the immediate offset logic in a single place and implement restrictions and bug workarounds. Fixed version of D99587, which does not rely on the address space. Differential Revision: https://reviews.llvm.org/D99743	2021-04-09 12:28:36 +02:00
Stanislav Mekhanoshin	17050632cf	[AMDGPU] Fix copyPhysReg to not produce unaligned vgpr access RA can insert something like a sub1_sub2 COPY of a wide VGPR tuple which results in the unaligned access with v_pk_mov_b32 after the copy is expanded. This is a regression after D97316. Differential Revision: https://reviews.llvm.org/D98549	2021-03-15 14:14:30 -07:00
Stanislav Mekhanoshin	196e7f3138	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and, I hope, the amount of code. These operands are mostly 0 anyway. An additional advantage is that the parser will accept these flags in any order, unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Jay Foad	d51dcd4c3d	[AMDGPU] Fix isReallyTriviallyReMaterializable for V_MOV_* D57708 changed SIInstrInfo::isReallyTriviallyReMaterializable to reject V_MOVs with extra implicit operands, but it accidentally rejected all V_MOVs because of their implicit use of exec. Fix it but avoid adding a moderately expensive call to MI.getDesc().getNumImplicitUses(). In real graphics shaders this changes quite a few vgpr copies into move-immediates, which is good for avoiding stalls on GFX10. Differential Revision: https://reviews.llvm.org/D98347	2021-03-10 16:18:12 +00:00
Ruiling Song	19ee89a560	[AMDGPU] Remove SI_MASK_BRANCH This is already deprecated, so remove code working on this. Also update the tests by using S_CBRANCH_EXECZ instead of SI_MASK_BRANCH. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97545	2021-03-09 09:13:23 +08:00
Piotr Sobczak	02c8fb7a0c	[AMDGPU] Introduce Strict WQM mode * Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active. * The difference between amdgcn_wqm and amdgcn_strict_wqm is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96258	2021-03-03 14:19:16 +01:00
Piotr Sobczak	97e89dc154	[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm * Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strict_wqm in the future. The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions. The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96257	2021-03-03 09:33:57 +01:00
Jay Foad	ba178527e9	[AMDGPU] Better codegen for i64 bitreverse Differential Revision: https://reviews.llvm.org/D97547	2021-02-26 15:51:36 +00:00
Matt Arsenault	f1ba6f4d9b	AMDGPU: Add even aligned VGPR/AGPR register classes gfx90a operations require even aligned registers, but this was previously achieved by reserving registers inside the full class. Ideally this would be captured in the static instruction definitions for the operands, and we would have different instructions per subtarget. The hackiest part of this is we need to manually reassign AGPR register classes after instruction selection (we get away without this for VGPRs since those types are actually registered for legal types).	2021-02-24 14:49:37 -05:00
Stanislav Mekhanoshin	f1c6dbc4d5	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Piotr Sobczak	55a4cf137d	[AMDGPU] Add implicit vcc_lo on S_CBRANCH_VCCNZ in wave32 * Update skip-if-dead.ll with tests for wave32. * Fix the crash in verifier in one newly enabled test by adding missing fixImplicitOperands in branch insertion code. ``` * Bad machine code: Using an undefined physical register * - function: test_kill_divergent_loop - basic block: %bb.2 bb (0xad96308) - instruction: S_CBRANCH_VCCNZ %bb.1, implicit $vcc_lo - operand 1: implicit $vcc_lo LLVM ERROR: Found 1 machine code errors. ``` * Simplify "cbranch_kill" to not use interp instructions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96793	2021-02-17 15:14:57 +01:00
Carl Ritson	fb4b457dbd	[AMDGPU] Move kill lowering to WQM pass and add live mask tracking Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Use live lane tracking to enable early termination of shader at any point in control flow. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94746	2021-02-11 20:31:29 +09:00
Jay Foad	ee839f7374	[AMDGPU] Use named unified buffer format constant. NFC.	2021-02-08 17:34:36 +00:00
Carl Ritson	88bf797971	[AMDGPU] Mark V_SET_INACTIVE as defining SCC V_SET_INACTIVE is implemented with S_NOT which clobbers SCC. Make sure it is marked appropriately. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D95509	2021-01-29 09:46:41 +09:00
dfukalov	f3ae5b9b8c	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
Joe Nash	521d6a1785	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
Sebastian Neubauer	94e83a8359	[AMDGPU] Fix failing assert with scratch ST mode In ST mode, flat scratch instructions have neither an sgpr nor a vgpr for the address. This led to an assertion when inserting hard clauses. Differential Revision: https://reviews.llvm.org/D94406	2021-01-12 09:54:02 +01:00
dfukalov	d069b95364	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
dfukalov	b7b67e3e9a	[NFC] Reduce include files dependency and AA header cleanup (part 2). Continuing work started in https://reviews.llvm.org/D92489: Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h". Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92852	2020-12-17 14:04:48 +03:00
Sebastian Neubauer	4b0b1e7b26	[AMDGPU] Unify flat offset logic Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into AMDGPUBaseInfo. Differential Revision: https://reviews.llvm.org/D93287	2020-12-15 14:59:59 +01:00
Austin Kerbow	c65fb146ba	[AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing It is possible for copies or spills to be inserted in the middle of indirect addressing sequences which use VGPR indexing. Spills to accvgprs could be affected by the indexing mode. Add new pseudo instructions that are expanded after register allocation to avoid the problematic spill or copy placement. Differential Revision: https://reviews.llvm.org/D91048	2020-12-08 12:24:12 -08:00
Matt Arsenault	b3aa28b9d4	AMDGPU: Factor out large flat offset splitting	2020-11-13 11:22:13 -05:00
Stanislav Mekhanoshin	4dbdbbe753	[AMDGPU] Remove scratch rsrc from spill pseudos Differential Revision: https://reviews.llvm.org/D91110	2020-11-12 15:23:37 -08:00
Stanislav Mekhanoshin	f12ec97f64	[AMDGPU] Enable multi-dword flat scratch load/stores Differential Revision: https://reviews.llvm.org/D91384	2020-11-12 13:38:56 -08:00
Jay Foad	8325244191	[AMDGPU] Make use of SIInstrInfo::isEXP. NFC.	2020-11-11 17:01:20 +00:00
Stanislav Mekhanoshin	682b617208	[AMDGPU] Omit buffer resource with flat scratch. Differential Revision: https://reviews.llvm.org/D90979	2020-11-09 08:05:20 -08:00
Sebastian Neubauer	7e4be9501b	[AMDGPU] Add amdgpu_gfx calling convention Add a calling convention called amdgpu_gfx for real function calls within graphics shaders. For the moment, this uses the same calling convention as other calls in amdgpu, with registers excluded for return address, stack pointer and stack buffer descriptor. Differential Revision: https://reviews.llvm.org/D88540	2020-11-09 16:51:44 +01:00
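Commit ba178527e9 ("Better codegen for i64 bitreverse") does not spell out its lowering in the message. One standard way to build a 64-bit bit reverse on a target with a 32-bit reverse (such as V_BFREV_B32) is to reverse each half and swap them. The sketch below shows that technique in portable C++; `bitreverse32` is a software stand-in for the hardware instruction, and nothing here is claimed to be the exact sequence the patch emits.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bits of a 32-bit value with the classic swap-by-halves method.
// Software stand-in for a hardware bit reverse such as V_BFREV_B32.
uint32_t bitreverse32(uint32_t x) {
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  // swap adjacent bits
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  // swap bit pairs
  x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  // swap nibbles
  x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);  // swap bytes
  return (x >> 16) | (x << 16);                              // swap halfwords
}

// 64-bit bit reverse from two 32-bit reversals: the reversed low half
// becomes the new high half, and the reversed high half the new low half.
uint64_t bitreverse64(uint64_t x) {
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  return (static_cast<uint64_t>(bitreverse32(lo)) << 32) | bitreverse32(hi);
}
```

Bit `i` of the low half lands at `31 - i` after the 32-bit reverse and at `63 - i` after the shift, which is exactly where a full 64-bit reverse puts it; the high half works symmetrically. So the 64-bit operation costs two 32-bit reverses plus a register swap.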

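Commits 4b0ec23e84 and bb5a67dc29 both deal with materializing a 64-bit constant as two 32-bit moves into sub0 (low half) and sub1 (high half), and bb5a67dc29 notes that a zero-extended 32-bit immediate blocked a later conversion. The sketch below is a hypothetical illustration, not LLVM code, of why the extension matters once each half is held in a 64-bit immediate field: the same bits compare very differently. The helper `isInlineInt` assumes the AMDGPU integer inline-constant range of -16 to 64 and is only an example of a check that is sensitive to this.

```cpp
#include <cassert>
#include <cstdint>

// The two 32-bit immediates for the sub0 (lo) and sub1 (hi) moves, each
// stored in a 64-bit field the way a MachineOperand immediate is.
struct SplitImm {
  int64_t lo;
  int64_t hi;
};

// Sign-extend each half: a half of 0xFFFFFFFF becomes -1.
SplitImm splitSignExtended(uint64_t imm) {
  return {static_cast<int32_t>(static_cast<uint32_t>(imm)),
          static_cast<int32_t>(static_cast<uint32_t>(imm >> 32))};
}

// Zero-extend each half: the same bits, but 0xFFFFFFFF becomes 4294967295.
SplitImm splitZeroExtended(uint64_t imm) {
  return {static_cast<int64_t>(static_cast<uint32_t>(imm)),
          static_cast<int64_t>(imm >> 32)};
}

// Hypothetical check: integer inline constants are assumed to be in [-16, 64],
// so they need no separate literal.
bool isInlineInt(int64_t v) { return v >= -16 && v <= 64; }
```

With `0xFFFFFFFF00000001`, the sign-extended split yields `hi == -1`, which passes the inline-constant check, while the zero-extended split yields `hi == 4294967295`, which does not. Any later pass that inspects the immediate as a signed 32-bit value sees the zero-extended form as out of range.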

572 Commits
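Commit 196e7f3138 replaces the separate GLC, SLC and DLC operands with a single cache_policy bitmask, and notes that the parser can then accept the flags in any order. The sketch below shows that idea with a hypothetical parser: each modifier sets one bit, so `glc slc` and `slc glc` produce the same operand value. The bit assignments and the function name `parseCachePolicy` are assumptions for illustration, not the AMDGPU assembler's actual code.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Assumed bit assignments for the combined cache_policy operand.
enum CachePolicy : unsigned {
  GLC = 1u << 0,
  SLC = 1u << 1,
  DLC = 1u << 2,
};

// Hypothetical parser: reads whitespace-separated modifiers in any order and
// ORs them into one bitmask. Returns false on an unknown or repeated modifier.
bool parseCachePolicy(const std::string &text, unsigned &mask) {
  mask = 0;
  std::istringstream in(text);
  std::string tok;
  while (in >> tok) {
    unsigned bit = 0;
    if (tok == "glc")
      bit = GLC;
    else if (tok == "slc")
      bit = SLC;
    else if (tok == "dlc")
      bit = DLC;
    else
      return false;  // not a cache-policy modifier
    if (mask & bit)
      return false;  // the same modifier was given twice
    mask |= bit;
  }
  return true;
}
```

Because the operand is a single integer, an instruction with no cache-policy modifiers simply carries `0`. This matches the commit's remark that these operands are mostly 0 anyway.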