llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00

Author	SHA1	Message	Date
Matt Arsenault	9a23ae4dfd	AMDGPU: Enable ConstrainCopy DAG mutation This fixes a probably unintended divergence from the default scheduler behavior. llvm-svn: 287146	2016-11-16 20:35:23 +00:00
Tom Stellard	22310389fc	AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies with of registers with immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 llvm-svn: 287131	2016-11-16 18:42:17 +00:00
Konstantin Zhuravlyov	2cfadc98ed	[AMDGPU] Refactor v_mac_{f16, f32} patterns into a class NFC Differential Revision: https://reviews.llvm.org/D26711 llvm-svn: 287077	2016-11-16 03:39:12 +00:00
Konstantin Zhuravlyov	3b80a654dd	[AMDGPU] Handle f16 select{_cc} - Select `select` to `v_cndmask_b32` - Expand `select_cc` - Refactor patterns Differential Revision: https://reviews.llvm.org/D26714 llvm-svn: 287074	2016-11-16 03:16:26 +00:00
Jan Vesely	725ec9f30f	AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument wbinvl.* are vector instruction that do not sue vector registers. v2: check only M?BUF instructions Differential Revision: https://reviews.llvm.org/D26633 llvm-svn: 287056	2016-11-15 23:55:15 +00:00
Tom Stellard	5b2ea8ac3c	AMDGPU/SI: Fix pattern for i16 = sign_extend i1 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26670 llvm-svn: 287035	2016-11-15 21:25:56 +00:00
Matt Arsenault	ec52b721e2	AMDGPU: Enable store clustering Also respect the TII hook for these like the generic code does in case we want a flag later to disable this. llvm-svn: 287021	2016-11-15 20:22:55 +00:00
Matt Arsenault	3f9081ee63	AMDGPU: Analyze mubuf with immediate soffset Fixes giving up on clustering common addr64 accesses with constant 0 soffset. llvm-svn: 287018	2016-11-15 20:14:27 +00:00
Matt Arsenault	9b1cfd8def	AMDGPU: Fix return after else llvm-svn: 287015	2016-11-15 19:58:54 +00:00
Matt Arsenault	00156f1cc9	AMDGPU: Replace assert(false) with unreachable llvm-svn: 287013	2016-11-15 19:34:37 +00:00
Stanislav Mekhanoshin	3f2c8aa170	[AMDGPU] Add wave barrier builtin The wave barrier represents the discardable barrier. Its main purpose is to carry convergent attribute, thus preventing illegal CFG optimizations. All lanes in a wave come to convergence point simultaneously with SIMT, thus no special instruction is needed in the ISA. The barrier is discarded during code generation. Differential Revision: https://reviews.llvm.org/D26585 llvm-svn: 287007	2016-11-15 19:00:15 +00:00
Sam Kolton	8216be3f47	[AMDGPU] TableGen: change individual instruction flags to bit type from bits<1> Summary: This is needed to be able to use this flags in InstrMappings. Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26666 llvm-svn: 286960	2016-11-15 13:39:07 +00:00
Matt Arsenault	888d5d9299	AMDGPU: Fix f16 fabs/fneg llvm-svn: 286931	2016-11-15 02:25:28 +00:00
Matt Arsenault	8a32f9e2eb	AMDGPU: Set hasExtraSrcRegAllocReq on v_div_scale_* This doesn't solve any problems I know about, but this should have more conservative assumptions about the operands' llvm-svn: 286913	2016-11-15 00:05:42 +00:00
Matt Arsenault	a70754dc4d	AMDGPU: Fix formatting of 1/2pi immediate llvm-svn: 286912	2016-11-15 00:04:33 +00:00
Changpeng Fang	7bf8e07bd1	AMDGPU/SI: Support data types other than V4f32 in image intrinsics Summary: Extend image intrinsics to support data types of V1F32 and V2F32. TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active, even though such case should be very rare. Reviewers: tstellarAMD Differential Revision: http://reviews.llvm.org/D26472 llvm-svn: 286860	2016-11-14 18:33:18 +00:00
Matt Arsenault	96a200f6e7	AMDGPU: Implement SGPR spilling with scalar stores nThis avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. llvm-svn: 286766	2016-11-13 18:20:54 +00:00
Konstantin Zhuravlyov	a5d550fe9d	[AMDGPU] Add f16 support (VI+) Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753	2016-11-13 07:01:11 +00:00
Tom Stellard	57da18085b	AMDGPU/SI: Promote i16 = fp_[us]int f32 for VI Summary: This fixes a regression caused by r286464. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26570 llvm-svn: 286687	2016-11-12 00:19:11 +00:00
Tom Stellard	8b29b594be	AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopies Summary: This pass was assuming that when a PHI instruction defined a register used by another PHI instruction that the defining insstruction would be legalized before the using instruction. This assumption was causing the pass to not legalize some PHI nodes within divergent flow-control. This fixes a bug that was uncovered by r285762. Reviewers: nhaehnle, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26303 llvm-svn: 286676	2016-11-11 23:35:42 +00:00
Sam Kolton	4adb37c807	[AMDGPU] TargetStreamer: Fix .note section name llvm-svn: 286591	2016-11-11 13:41:52 +00:00
Yaxun Liu	c5e52544d9	AMDGPU: Attempt to fix build failure on x86-64 selfhost build Remove redundant include file. llvm-svn: 286552	2016-11-11 02:48:50 +00:00
Stanislav Mekhanoshin	b36f36e6c4	Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies" This reverts commit r286171, it breaks piglit test fs-discard-exit-2 llvm-svn: 286530	2016-11-11 00:22:34 +00:00
Joerg Sonnenberger	e3fe95c483	Fix requirements. llvm-svn: 286527	2016-11-10 23:53:45 +00:00
Evandro Menezes	7a30a0c01d	[DAG Combiner] Fix the native computation of the Newton series for reciprocals The generic infrastructure to compute the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly consider this condition if returned by a target. This patch addresses the issues to allow a target to compute the series on its own. Differential revision: https://reviews.llvm.org/D22975 llvm-svn: 286523	2016-11-10 23:31:06 +00:00
Yaxun Liu	a4fdadbaa3	AMDGPU: Emit runtime metadata as a note element in .note section Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata. However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html). This patch lets AMDGPU backend emits runtime metadata as a note element in .note section. Differential Revision: https://reviews.llvm.org/D25781 llvm-svn: 286502	2016-11-10 21:18:49 +00:00
Tom Stellard	fca8e2011d	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464	2016-11-10 16:02:37 +00:00
Stanislav Mekhanoshin	8fd3071d3d	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies Codegen prepare sinks comparisons close to a user is we have only one register for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a forward propagation of a v_cmp 64 bit result to an user is implemented. Additional side effect of this is that we may consume less VGPRs in a cost of more SGPRs in case if holding of multiple conditions is needed, and that is a clear win in most cases. llvm-svn: 286171	2016-11-07 23:04:50 +00:00
Matt Arsenault	74174e09f8	AMDGPU: Remove unnecessary and on conditional branch The comment explaining why this was necessary is incorrect in its description of v_cmp's behavior for inactive workitems. llvm-svn: 286134	2016-11-07 19:09:33 +00:00
Matt Arsenault	2e0eb02554	AMDGPU: Preserve vcc undef flags when inverting branch If the branch was on a read-undef of vcc, passes that used analyzeBranch to invert the branch condition wouldn't preserve the undef flag resulting in a verifier error. Fixes verifier failures in a future commit. Also fix verifier error when inserting copy for vccz corruption bug. llvm-svn: 286133	2016-11-07 19:09:27 +00:00
Matt Arsenault	963aa9b692	AMDGPU: Try to fix (non-clang?) bot builds llvm-svn: 286120	2016-11-07 16:52:50 +00:00
Matt Arsenault	c8bd3705e6	AMDGPU: Refactor copyPhysReg Separate the subregister splitting logic to re-use later. llvm-svn: 286118	2016-11-07 16:39:22 +00:00
Tom Stellard	8e72cd5271	Revert "AMDGPU: Add VI i16 support" This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995	2016-11-04 13:06:34 +00:00
Tom Stellard	1ea5ffdcf4	AMDGPU/SI: Re add VIInstructions.td to unbreak bots This file is unused as of r285939, but we need to keep it around for bots that don't do full rebuilds. We should be able to delete this again in a few days. llvm-svn: 285948	2016-11-03 17:56:46 +00:00
Tom Stellard	1eb5b9fee5	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939	2016-11-03 17:13:50 +00:00
Alexander Timofeev	a4e8e9604c	[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads. hange explores the fact that LDS reads may be reordered even if access the same location. Prior the change, algorithm immediately stops as soon as any memory access encountered between loads that are expected to be merged together. Although, Read-After-Read conflict cannot affect execution correctness. Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%. Also improvement expected on any massive sequences of reads from LDS. Differential Revision: https://reviews.llvm.org/D25944 llvm-svn: 285919	2016-11-03 14:37:13 +00:00
Nicolai Haehnle	f8cd283595	AMDGPU: Allow additional implicit operands on MOVRELS instructions Summary: The post-RA scheduler occasionally uses additional implicit operands when the vector implicit operand as a whole is killed, but some subregisters are still live because they are directly referenced later. Unfortunately, this seems incredibly subtle to reproduce. Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test and others. Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25656 llvm-svn: 285835	2016-11-02 17:03:11 +00:00
Matt Arsenault	3fd81027a1	AMDGPU: Handle CopyToReg in getOperandRegClass llvm-svn: 285768	2016-11-01 23:22:17 +00:00
Matt Arsenault	269a08f691	AMDGPU: Use brev for materializing SGPR constants This is already done with VGPR immediates and saves 4 bytes. llvm-svn: 285765	2016-11-01 23:14:20 +00:00
Matt Arsenault	930d9b8579	AMDGPU: Default to using scalar mov to materialize immediate This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is inlineasm has an SGPR constraint. Also start verifying the register classes of inlineasm operands. llvm-svn: 285762	2016-11-01 22:55:07 +00:00
Matt Arsenault	3085819547	AMDGPU: Stop creating unused virtual registers These are only used in the spill to VMEM path. Move them to the one use. llvm-svn: 285756	2016-11-01 21:58:07 +00:00
Matt Arsenault	5a24f1da90	AMDGPU: Workaround for instruction size with literals Instructions with a 32-bit base encoding with an optional 32-bit literal encoded after them report their size as 4 for the disassembler. Consider these when computing the MachineInstr size. This fixes problems caused by size estimate consistency in BranchRelaxation. llvm-svn: 285743	2016-11-01 20:42:24 +00:00
Konstantin Zhuravlyov	f9b60c4ef0	[AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32 This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716	2016-11-01 17:49:33 +00:00
Tom Stellard	dc07351d2b	AMDGPU: Fix buildbots broken by r285704 llvm-svn: 285711	2016-11-01 17:20:03 +00:00
Alex Bradbury	2fa138eff6	[TableGen] Move OperandMatchResultTy enum to MCTargetAsmParser.h As it stands, the OperandMatchResultTy is only included in the generated header if there is custom operand parsing. However, almost all backends make use of MatchOperand_Success and friends from OperandMatchResultTy for e.g. parseRegister. This is a pain when starting an AsmParser for a new backend that doesn't yet have custom operand parsing. Move the enum to MCTargetAsmParser.h. This patch is a prerequisite for D23563 Differential Revision: https://reviews.llvm.org/D23496 llvm-svn: 285705	2016-11-01 16:32:05 +00:00
Tom Stellard	f13abceda7	AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64 I wanted to implement this as a target independent expansion, however when targets say they want to expand FP_TO_FP16 what they actually want is the unsafe math expansion when possible and expansion to a libcall in all other cases. The only way to make this work as a target independent would be to add logic to target's TargetLowering construction to mark theses nodes as Expand when LegalizeDAG can use the unsafe expansion and mark them as LibCall when it cannot. I think this would be possible, but I think it would be too fragile and complex as it would require targets to keep their expansion logic up to date with the code in LegalizeDAG. Reviewers: bogner, ab, t.p.northover, arsenm Subscribers: wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25999 llvm-svn: 285704	2016-11-01 16:31:48 +00:00
Valery Pykhtin	d1c9440bfa	[AMDGPU] Expand vector mulhu/mulhs Differential revision: https://reviews.llvm.org/D26077 llvm-svn: 285684	2016-11-01 10:26:48 +00:00
Matt Arsenault	bb971d2e8b	AMDGPU: Whitespace fixes llvm-svn: 285659	2016-11-01 00:55:14 +00:00
Artem Tamazov	0eaeb67ad9	[AMDGPU][MC][gfx8] Support 20-bit immediate offset in SMEM instructions. Fixes Bug 30808. Note that passing subtarget information to predicates seems too complicated, so gfx8-specific def smrd_offset_20 introduced. Old gfx6/7-specific def renamed to smrd_offset_8 for clarity. Lit tests updated. Differential Revision: https://reviews.llvm.org/D26085 llvm-svn: 285590	2016-10-31 16:07:39 +00:00
Matt Arsenault	3ee7b5cf1b	AMDGPU: Use 1/2pi inline imm on VI I'm guessing at how it is supposed to be printed llvm-svn: 285490	2016-10-29 04:05:06 +00:00

1 2 3 4 5 ...

1267 Commits