mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

59994 Commits

Author SHA1 Message Date
Kazushi (Jam) Marukawa
93a65c706c [VE] Optimize address calculation
Optimize address calculations using LEA/LEASL instructions.
Also update comments in VEISelLowering.cpp, and update an
existing regression test optimized by this modification.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90878
2020-11-06 19:46:59 +09:00
Simon Moll
9bff67230d [VE][TTI] don't advertise vregs/vops
Claim to not have any vector support to dissuade SLP, LV and friends
from generating SIMD IR for the VE target.  We will take this back once
vector isel is stable.

Reviewed By: kaz7, fhahn

Differential Revision: https://reviews.llvm.org/D90462
2020-11-06 11:12:10 +01:00
Craig Topper
752bbf7d27 [RISCV] Only enable GPR<->FPR32 bitconvert isel patterns on RV32. NFCI
Bitconvert requires the bitwidth to match on both sides. On RV64
the GPR size is i64, so a bitconvert to or from f32 isn't possible. The
node should never be generated, so the pattern will never match, but
moving the patterns under IsRV32 makes the impossibility more obvious.
It also moves them next to the patterns for the custom nodes we use
for RV64.
2020-11-05 16:15:25 -08:00
Konstantin Pyzhov
8ae1180dde [AMDGPU] Corrected declaration of VOPC instructions with SDWA addressing mode.
Removed "implicit def VCC" from declarations of AMDGPU VOPC instructions since they do not implicitly write to VCC in SDWA mode.

Differential Revision: https://reviews.llvm.org/D89168
2020-11-05 11:15:50 -05:00
Michael Liao
a374f1fd9a [amdgpu] Add llvm.amdgcn.endpgm support.
- `llvm.amdgcn.endpgm` is added to enable "abort" support.

Differential Revision: https://reviews.llvm.org/D90809
2020-11-05 19:06:50 -05:00
Yuriy Chernyshov
db5411a244 Do not construct std::string from nullptr
While I am trying to forbid such usages systematically via https://reviews.llvm.org/D79427 / P2166R0 in the C++ standard, this PR fixes one (definitely incorrect) usage in LLVM.

This code is unreachable, so it could not cause any harm.
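
For illustration, the bug class and the usual fix look like this (a hedged sketch with hypothetical names, not the code touched by this commit):

```
#include <string>

// Constructing std::string from a null const char* is undefined behavior.
const char *Name = nullptr;
// std::string Bad(Name);             // UB when Name is null
std::string Good(Name ? Name : "");   // fall back to an empty string
```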

Reviewed By: nikic, dblaikie

Differential Revision: https://reviews.llvm.org/D87697
2020-11-05 15:23:26 -08:00
Craig Topper
699db15672 [RISCV] Add isel patterns for fnmadd/fnmsub with an fneg on the second operand instead of the first.
The multiply part of FMA is commutable, but TargetSelectionDAG.td
doesn't have it marked as commutable so tablegen won't automatically
create the additional patterns.

So manually add commuted patterns.
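
The identity being exploited, checked in plain C++ (sign negation is exact in IEEE arithmetic, so the comparison is exact):

```
#include <cassert>

int main() {
  // -(a*b) == (-a)*b == a*(-b): negation commutes with the multiply, so a
  // pattern with fneg on the first multiplicand needs a commuted twin
  // with fneg on the second.
  double a = 3.0, b = -7.5, c = 1.25;
  assert((-a) * b - c == a * (-b) - c);
  return 0;
}
```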
2020-11-05 14:00:25 -08:00
Kazushi (Jam) Marukawa
81d1ab5e65 [VE] Add isReMaterializable and isAsCheapAsAMove flags
Add isReMaterializable and isAsCheapAsAMove flags to integer instructions
that are cheap.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90833
2020-11-06 06:09:10 +09:00
Sanjay Patel
a8868babff [ARM] remove cost-kind predicate for cmp/sel costs
This is the cmp/sel sibling to D90692.
Again, the reasoning is: the throughput cost is the number of instructions/uops,
so size/blended costs are identical except in special cases (for example,
fdiv or other known-expensive machine instructions or things like MVE that
may require cracking into >1 uop).

We need to check for a valid (non-null) condition type parameter because
SimplifyCFG may pass nullptr for that (and so we will crash multiple
regression tests without that check). I'm not sure if passing nullptr makes
sense, but other code in the cost model does appear to check if that param
is set or not.
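
A minimal sketch of such a guard (hypothetical names; the real ARM TTI hook takes more parameters):

```
struct Type;  // stand-in for llvm::Type in this sketch

// Return a conservative default cost when the optional condition type is
// missing, instead of dereferencing a null pointer.
unsigned getCmpSelCost(Type *ValTy, Type *CondTy) {
  if (!CondTy)
    return 1;  // SimplifyCFG may pass nullptr; use a default cost
  // ... the real cost computation would inspect ValTy and CondTy ...
  return 1;
}
```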

Differential Revision: https://reviews.llvm.org/D90781
2020-11-05 14:52:25 -05:00
Amara Emerson
d203a1a538 [AArch64][GlobalISel] Add AArch64::G_DUPLANE[X] opcodes for lane duplicates.
These were previously handled by pattern matching shuffles in the selector, but
adding a new opcode and making it equivalent to the AArch64duplane SDAG node
allows us to select more patterns, like lane indexed FMLAs (patch adding a test
for that will be committed later).

The pattern matching code has been simply moved to postlegalize lowering.

Differential Revision: https://reviews.llvm.org/D90820
2020-11-05 11:18:11 -08:00
Craig Topper
88538f2b2f [RISCV] Use the 'si' lib call for (double (fp_to_sint/uint i32 X)) when F extension is enabled.
D80526 added custom lowering to pick the si lib call on RV64, but this custom handling is only enabled when the F and D extensions are both disabled. This prevents the si library call from being used for double when F is enabled but D is not.

This patch changes the behavior so we always enable the Custom hook on RV64 and decide in ReplaceNodeResults if we should emit a libcall based on whether the FP type should be softened or not.

Differential Revision: https://reviews.llvm.org/D90817
2020-11-05 10:46:45 -08:00
Stanislav Mekhanoshin
6d0a401669 [AMDGPU] Add default 1 glc operand to rtn atomics
This change adds a real glc operand to the return atomic
instead of just the string " glc" in the middle of the asm
string.

Improves asm parser diagnostics.

Differential Revision: https://reviews.llvm.org/D90730
2020-11-05 10:41:59 -08:00
Craig Topper
30b0a5166f [RISCV] Remove shadow register list passed to AllocateReg when allocating FP registers for calling convention
The _F and _D registers are already sub/super registers. When one is allocated, all of its aliases are marked as allocated, so we don't need to explicitly shadow it too.

I believe shadowing is for calling conventions like 64-bit Windows on X86, where we have rules like this:

```
CCIfType<[i32], CCAssignToRegWithShadow<[ECX , EDX , R8D , R9D ],
                                        [XMM0, XMM1, XMM2, XMM3]>>
```

For that calling convention the argument number determines which register is used regardless of how many scalars or vectors came before it.
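
For instance, on Windows x64 (an illustration of shadowing, unrelated to the RISC-V change itself):

```
// Each argument slot claims both its GPR and its XMM register, so the
// slot index alone determines which register holds each argument.
void f(int a,     // slot 0 -> ECX  (XMM0 is shadowed)
       double b,  // slot 1 -> XMM1 (EDX is shadowed)
       int c);    // slot 2 -> R8D  (XMM2 is shadowed)
```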

Removing this removes a question I had in D90738.

Differential Revision: https://reviews.llvm.org/D90801
2020-11-05 09:49:42 -08:00
Craig Topper
c3e6dee7df [RISCV] Add isel patterns for fshl with immediate to select FSRI/FSRIW
There is no FSLI instruction, but we can emulate it using FSRI by swapping operands and subtracting the immediate from the bitwidth.
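
The bitwidth-subtraction half of the trick is the standard funnel-shift identity, checkable in plain C++ (a sketch over 32-bit values; the helpers are only valid for 0 < s < 32):

```
#include <cassert>
#include <cstdint>

static uint32_t fshl32(uint32_t a, uint32_t b, unsigned s) {
  return (a << s) | (b >> (32 - s));  // funnel shift left, 0 < s < 32
}
static uint32_t fshr32(uint32_t a, uint32_t b, unsigned s) {
  return (a << (32 - s)) | (b >> s);  // funnel shift right, 0 < s < 32
}

int main() {
  // fshl(a, b, s) == fshr(a, b, bitwidth - s), hence "subtract the
  // immediate from the bitwidth"; the operand swap comes from fsl/fsr's
  // own operand convention (see the fsl/fsr operand-order commit below).
  for (unsigned s = 1; s < 32; ++s)
    assert(fshl32(0xDEADBEEF, 0x12345678, s) ==
           fshr32(0xDEADBEEF, 0x12345678, 32 - s));
  return 0;
}
```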

Differential Revision: https://reviews.llvm.org/D90826
2020-11-05 09:37:43 -08:00
Sander de Smalen
e0eda3654e [SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects
on the stack, describing a stack location or offset using a pointer + uint64_t
is not sufficient. For this reason, we've introduced the StackOffset class,
which models both fixed- and scalable-sized offsets.
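
Conceptually (a simplified sketch; the real class lives in TypeSize.h and has a richer API):

```
#include <cstdint>

// An offset with a compile-time-fixed byte part and a part that is
// scaled by the hardware vector length at runtime.
struct Offset {
  int64_t Fixed = 0;     // bytes known at compile time
  int64_t Scalable = 0;  // bytes multiplied by the runtime vector scale

  Offset operator+(const Offset &RHS) const {
    return {Fixed + RHS.Fixed, Scalable + RHS.Scalable};
  }
};
```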

TargetFrameLowering::getFrameIndexReference is changed to return a StackOffset,
so that this can be used in other interfaces, such as to eliminate frame indices
in PEI or to emit Debug locations for variables on the stack.

This patch is purely mechanical and doesn't change the behaviour of how
the result of this function is used for fixed-sized offsets. The patch adds
various checks to assert that the offset has no scalable component, as frame
offsets with a scalable component are not yet supported in various places.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90018
2020-11-05 11:02:18 +00:00
Fangrui Song
c34a7c7b51 [X86] Enable shrink-wrapping for no-frame-pointer non-nounwind functions on platforms not using compact unwind
The current compact unwind scheme does not work when the prologue is not at the
start (the instructions before the prologue cannot be described).  (Technically
this is fixable, but it requires multiple compact unwind descriptors for one
function.)

rL255175 chose to not perform shrink-wrapping for no-frame-pointer functions not
marked as nounwind to work around PR25614. This is overly limited, as platforms
not supporting compact unwind (all non-Darwin) do not need the workaround.
This patch restricts the limitation to compact unwind platforms.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D89930
2020-11-04 16:51:48 -08:00
Arthur Eubanks
b3f5096b36 Reland [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 13:11:40 -08:00
Arthur Eubanks
3a6a4e9f83 Revert "[NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback"
This reverts commit 7a83aa0520d24ee5285a9c60b97b57a1db1d65e8.

Causing buildbot failures.
2020-11-04 12:57:32 -08:00
Arthur Eubanks
753c4830f9 [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 12:53:30 -08:00
Eric Astor
40fb1cd465 [ms] [llvm-ml] Lex MASM strings, including escaping
Allow single-quoted strings and double-quoted character values, as well as doubled-quote escaping.
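
A sketch of the doubled-quote rule (hypothetical helper, not llvm-ml's actual lexer code): within a quoted MASM string, a doubled quote character stands for one literal quote.

```
#include <cstddef>
#include <string>

// Collapse doubled quotes inside a string body: it''s -> it's.
std::string unescapeDoubledQuotes(const std::string &Body, char Quote) {
  std::string Out;
  for (std::size_t I = 0; I < Body.size(); ++I) {
    Out += Body[I];
    if (Body[I] == Quote && I + 1 < Body.size() && Body[I + 1] == Quote)
      ++I;  // skip the second quote of the pair
  }
  return Out;
}
```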

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89731
2020-11-04 15:28:43 -05:00
Cameron McInally
4e8609f657 [SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.

Differential Revision: https://reviews.llvm.org/D90644
2020-11-04 14:20:31 -06:00
Mircea Trofin
724ebfc377 [NFC] Use Register/MCRegister
Differential Revision: https://reviews.llvm.org/D90724
2020-11-04 12:20:17 -08:00
Craig Topper
3631995968 [RISCV] Remove assertsexti32 from fslw/fsrw isel patterns.
The operations in these patterns shouldn't be affected by sign
bits, and since the pattern starts from a sign_extend_inreg, we
aren't expecting sign bits to be passed through either.

Differential Revision: https://reviews.llvm.org/D90739
2020-11-04 11:37:58 -08:00
Craig Topper
1d80c19eed [RISCV] Correct the operand order for fshl/fshr to fsl/fsr instructions.
fsl/fsr take their shift amount in $rs2 or an immediate. The
sources are $rs1 and $rs3.

fshl/fshr ISD opcodes both concatenate operand 0 in the high bits and
operand 1 in the lower bits. fshl returns the high bits after
shifting and fshr returns the low bits. So a shift amount of 0
returns operand 0 for fshl and operand 1 for fshr.

fsl/fsr concatenate their operands in different orders such that
$rs1 will be returned for a shift amount of 0. So $rs1 needs to
come from operand 0 of fshl and operand 1 of fshr.
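
The zero-shift behavior described above, in plain C++ (a sketch; the modulo-bitwidth masking matches the LangRef definition of the funnel-shift intrinsics):

```
#include <cassert>
#include <cstdint>

static uint32_t fshl(uint32_t a, uint32_t b, unsigned s) {
  s &= 31;
  return s ? (a << s) | (b >> (32 - s)) : a;  // s == 0 returns operand 0
}
static uint32_t fshr(uint32_t a, uint32_t b, unsigned s) {
  s &= 31;
  return s ? (a << (32 - s)) | (b >> s) : b;  // s == 0 returns operand 1
}

int main() {
  assert(fshl(0xAAAAAAAA, 0x55555555, 0) == 0xAAAAAAAA);
  assert(fshr(0xAAAAAAAA, 0x55555555, 0) == 0x55555555);
  return 0;
}
```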

Differential Revision: https://reviews.llvm.org/D90735
2020-11-04 11:13:25 -08:00
Craig Topper
4bde11ee29 [RISCV] Remove assertsexti32 from inputs to riscv_sllw/srlw nodes in B extension isel patterns.
riscv_sllw/srlw only read the lower 32 bits of the first operand
and the lower 5 bits of the second operand. Whether the upper
32 bits of the input are sign bits or not doesn't matter.

Also use ineg and not to shorten the patterns.

Differential Revision: https://reviews.llvm.org/D90668
2020-11-04 10:35:05 -08:00
Craig Topper
f850868dc3 [RISCV] Check all 64-bits of the mask in SelectRORIW.
We need to ensure the upper 32 bits of the mask are zero,
so that the srl shifts zeroes into the lower 32 bits.
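
The requirement in miniature (a sketch, valid for shift amounts 0 < S < 32):

```
#include <cassert>
#include <cstdint>

// A 32-bit rotate-right built from 64-bit shifts only works when the
// value feeding the srl has bits 63:32 cleared, so the srl shifts
// zeroes (not stale upper bits) into the low 32 bits.
static uint64_t rotr32ViaShifts(uint64_t X, unsigned S) {
  uint64_t Lo = (X & 0xFFFFFFFF) >> S;         // mask must clear bits 63:32
  uint64_t Hi = (X << (32 - S)) & 0xFFFFFFFF;  // wrap-around part
  return Hi | Lo;
}

int main() {
  assert(rotr32ViaShifts(0x80000001, 1) == 0xC0000000);
  return 0;
}
```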

Differential Revision: https://reviews.llvm.org/D90585
2020-11-04 10:15:30 -08:00
Christopher Tetreault
78607d1352 [UBSan] Cannot negate smallest negative signed integer
Silence an Undefined Behavior Sanitizer warning:
runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself
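
The usual UB-free rewrite (illustrative; the exact cast in the patched code may differ — converting the result back to signed is implementation-defined before C++20, but is two's complement on all supported targets):

```
#include <cstdint>

int64_t safeNegate(int64_t X) {
  // return -X;                  // UB when X == INT64_MIN
  return int64_t(-uint64_t(X));  // negate in the unsigned domain instead
}
```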

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D90710
2020-11-04 10:07:52 -08:00
Craig Topper
7eeba8eadb [RISCV] Remove custom isel for (srl (shl val, 32), imm). Use pattern instead. NFCI
We don't need custom matching; we just need a predicate to check
that the immediate is greater than 32. We can use the existing
ImmSub32 to adjust the immediate.

I've also used the new predicate in the other location that used
ImmSub32. I tried to create a test case where we would break without
the greater than 32 check on that pattern, but DAG combine defeated me.
Still seemed safer to have it.

Differential Revision: https://reviews.llvm.org/D90546
2020-11-04 09:59:14 -08:00
Joe Nash
3dea5f368c [AMDGPU] Resolve pseudo registers at encoding uses
Pseudo-registers allow different register encodings
between GPU generations. Make sure we resolve the
pseudo registers to real registers whenever we query their
hardware encoding.
Using the correct encodings revealed a register
bank conflict and an unnecessary write dependency.
Tests have been updated to match.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90721

Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca
2020-11-04 12:52:32 -05:00
Sebastian Neubauer
0003aeadad [AMDGPU] Fix iterating in SIFixSGPRCopies
The insertion of waterfall loops splits the current basic block into
three blocks. So the basic block that we iterate over must be updated.

Previously, this triggered assert(!NodePtr->isKnownSentinel()) in
ilist_iterator for divergent calls in branches.

Differential Revision: https://reviews.llvm.org/D90596
2020-11-04 18:43:19 +01:00
Paul C. Anagnostopoulos
9295b21984 [TableGen] Add !interleave operator to concatenate a list of values with delimiters
Add a test. Use it in some TableGen files.
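
The operator is the usual string join; in C++ terms (illustration only):

```
#include <cstddef>
#include <string>
#include <vector>

// What !interleave computes, as an ordinary string join:
// interleave({"a", "b", "c"}, ", ") == "a, b, c".
std::string interleave(const std::vector<std::string> &Items,
                       const std::string &Delim) {
  std::string Out;
  for (std::size_t I = 0; I < Items.size(); ++I) {
    if (I)
      Out += Delim;
    Out += Items[I];
  }
  return Out;
}
```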

Differential Revision: https://reviews.llvm.org/D90469
2020-11-04 09:23:54 -05:00
Simon Moll
1caff81b49 [VE] Add +vpu attribute
`+vpu` controls whether VEISelLowering adds any vregs.  This defaults to
`-vpu` to have scalar code generation out of the box.  We bring up
vector isel under the `+vpu` flag. Once vector isel is stable we switch
to `+vpu` and advertise vregs and vops in TTI.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90465
2020-11-04 12:42:00 +01:00
Kerry McLaughlin
1de78af5c4 [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
Sebastian Neubauer
7ea56efeb5 [AMDGPU] Set rsrc1 flags for graphics shaders
Before they were only set for compute kernels and compute shaders but
not for other shaders.

Differential Revision: https://reviews.llvm.org/D89399
2020-11-04 12:25:41 +01:00
Sebastian Neubauer
537305eda7 [AMDGPU] Fix ieee mode default value
Previously, the default value for ieee mode was
- on for compute kernels and compute shaders,
- off for all shaders except compute shaders.

This commit changes the default to be
- on for compute kernels,
- off for shaders.

This aligns the default value with the settings that are actually in
use.  To my knowledge, all users of shader calling conventions (mesa and
llpc) disable the ieee mode by default.

Differential Revision: https://reviews.llvm.org/D89388
2020-11-04 12:25:38 +01:00
David Green
2f8823c9c5 [ARM] Remove unused variable. NFC 2020-11-04 09:00:03 +00:00
Sander de Smalen
ca12e64408 [NFCI] Replace AArch64StackOffset by StackOffset.
This patch replaces the AArch64StackOffset class by the generic one
defined in TypeSize.h.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D88983
2020-11-04 08:49:00 +00:00
Amara Emerson
4126569494 [AArch64][GlobalISel] Add combine for G_EXTRACT_VECTOR_ELT to allow selection of pairwise FADD.
For the <2 x float> case, instead of adding another combine or legalization to
get it into a <4 x float> form, I'm just adding a GISel specific selection
pattern to cover it.

Differential Revision: https://reviews.llvm.org/D90699
2020-11-03 17:25:14 -08:00
Julien Jorge
8488b68226 [WebAssembly] Don't fold frame offset for global addresses
When machine instructions are in the form of
```
%0 = CONST_I32 @str
%1 = ADD_I32 %stack.0, %0
%2 = LOAD 0, 0, %1
```

In the `ADD_I32` instruction, it is possible to fold it if `%0` is a
`CONST_I32` from an immediate number. But in this case it is a global
address, so we shouldn't. However, we hadn't been checking whether the
operand of `ADD` is an immediate. This fixes the problem. (The same
applies to the `ADD_I64` and `CONST_I64` instructions.)

Fixes https://bugs.llvm.org/show_bug.cgi?id=47944.

Patch by Julien Jorge (jjorge@quarkslab.com)

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D90577
2020-11-03 14:56:25 -08:00
Sanjay Patel
4b28ca8f6f [ARM] remove cost-kind predicate for most math op costs
This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: the throughput cost is the number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uop).

Differential Revision: https://reviews.llvm.org/D90692
2020-11-03 17:23:46 -05:00
Jordan Rupprecht
e4fd7971cd [NFC] Inline wasm assertion-only variable 2020-11-03 13:06:59 -08:00
Andy Wingo
d85c4c569d [WebAssembly] Implement ref.null
This patch adds a new "heap type" operand kind to the WebAssembly MC
layer, used by ref.null. Currently the possible values are "extern" and
"func"; when typed function references come, though, this operand may be
a type index.

Note that the "heap type" production is still known as "refedtype" in
the draft proposal; changing its name in the spec is
ongoing (https://github.com/WebAssembly/reference-types/issues/123).

The register form of ref.null is still untested.

Differential Revision: https://reviews.llvm.org/D90608
2020-11-03 10:46:23 -08:00
Craig Topper
b3e56d6425 [RISCV] Add missing patterns for rotr with immediate for Zbb/Zbp extensions.
DAGCombine doesn't canonicalize rotl/rotr with immediate so we
need patterns for both.

Remove the custom matcher for rotl to RORI and just use an SDNodeXForm
to convert the immediate instead. Doing this gives priority to the
rev32/rev16 versions of grevi over rori, since a pattern with a specific
immediate is more specific than one matching any immediate. I also added
rotr patterns for rev32/rev16, and removed the (or (shl), (shr)) patterns
that should be combined to rotl by DAG combine.

There is at least one other grev pattern that probably needs
another rotr pattern, but we need more test coverage first.

Differential Revision: https://reviews.llvm.org/D90575
2020-11-03 10:04:52 -08:00
Esme-Yi
f754a1db1b Revert "[PowerPC] Extend folding RLWINM + RLWINM to post-RA."
This reverts commit 119ab2181e6ed823849c93d55af8e989c28c9f3c.
2020-11-03 16:34:02 +00:00
Tim Renouf
83e3834a8d [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf
2a63696860 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Jay Foad
d6e4e20e6b [AMDGPU] Fix ds_read2/write2 with unaligned offsets
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.
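
The constraint being enforced, as a sketch (hypothetical helper and field width; the real check lives in the AMDGPU load/store selection code):

```
#include <cstdint>

// A scaled-offset instruction encodes ByteOffset / Scale in a fixed-width
// field, so the byte offset must be an exact multiple of the scale and
// the scaled value must fit the field.
bool canEncodeScaledOffset(int64_t ByteOffset, int64_t Scale) {
  if (ByteOffset % Scale != 0)  // previously missing: reject non-multiples
    return false;
  int64_t Scaled = ByteOffset / Scale;
  return Scaled >= 0 && Scaled <= 255;  // assuming an 8-bit offset field
}
```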

Differential Revision: https://reviews.llvm.org/D90607
2020-11-03 15:16:10 +00:00
Jameson Nash
11a667f122 make the AsmPrinterHandler array public
This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed; this just cleans up the code
so that it can be exposed publicly.

Replaces https://reviews.llvm.org/D74158

Differential Revision: https://reviews.llvm.org/D89613
2020-11-03 10:02:09 -05:00
Sanjay Patel
d761b7c23c [x86] update cost table comments for maxnum; NFC
Follow-up suggested in D90613.
2020-11-03 08:09:59 -05:00
David Green
be5f4f896c [ARM] Remove unused variable. NFC 2020-11-03 12:58:10 +00:00