llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00

Author	SHA1	Message	Date
Stanislav Mekhanoshin	2faf2dbdbb	[AMDGPU] Mark sin/cos load folding as modifying the function. When the load value is folded into the sin/cos operation, the AMDGPU library calls simplifier could still mark the function as unmodified. Instead ensure if there is an early return, return whether the load was folded into the sin/cos call. Authored by MJDSys Differential Revision: https://reviews.llvm.org/D91401	2020-11-13 14:49:33 -08:00
Sam Clegg	372af5a9de	[WebAssembly] Move GlobalTLSAddress handling to WebAssemblyISelLowering. NFC I'm not why it was added to DAGToDAG oringally but it seems to make sense alongside the non-TLS version: LowerGlobalAddress Differential Revision: https://reviews.llvm.org/D91432	2020-11-13 14:35:51 -08:00
Heejin Ahn	fb835643a1	[WebAssembly] Rename atomic.notify and *.atomic.wait - atomic.notify -> memory.atomic.notify - i32.atomic.wait -> memory.atomic.wait32 - i64.atomic.wait -> memory.atomic.wait64 See https://github.com/WebAssembly/threads/pull/149. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D91447	2020-11-13 12:04:48 -08:00
Baptiste Saleil	f21e542c3b	[PowerPC] Add paired vector load and store builtins and intrinsics This patch adds the Clang builtins and LLVM intrinsics to load and store vector pairs. Differential Revision: https://reviews.llvm.org/D90799	2020-11-13 12:35:10 -06:00
Jessica Paquette	ea446fa791	[AArch64][GlobalISel] Select G_SELECT cc, t, (G_SUB 0, x) -> CSNEG t, x, cc When we see ``` %sub = G_SUB 0, %x %select = G_SELECT %cc, %t, %sub ``` Fold away the G_SUB by producing ``` %select = CSNEG %t, %x, cc ``` Simple IR example: https://godbolt.org/z/K8TEnh This is valid on both sides of the select, but for now, just handle one side. It may make more sense to handle swapping sides during post-legalizer lowering. Differential Revision: https://reviews.llvm.org/D90723	2020-11-13 10:12:51 -08:00
Jessica Paquette	ea9106c312	[AArch64][GlobalISel] NFC: Use CmpInst::isUnsigned instead of static helper Reducing some code duplication. We had a helper for checking if a predicate is unsigned. Remove that and use the existing function in Instructions.cpp. Differential Revision: https://reviews.llvm.org/D91288	2020-11-13 09:35:42 -08:00
Wouter van Oortmerssen	fcaec2bb33	[WebAssembly] Added R_WASM_FUNCTION_OFFSET_I64 for use with DWARF DW_AT_low_pc Needed for wasm64, see discussion in https://reviews.llvm.org/D91203 Differential Revision: https://reviews.llvm.org/D91395	2020-11-13 09:32:31 -08:00
Jessica Paquette	18f4a04bc7	[GlobalISel] Add matchers for specific constants and a matcher for negations It's fairly common to need matchers for a specific constant value, or for common idioms like finding a negated register. Add - `m_SpecificICst`, which returns true when matching a specific value.. - `m_ZeroInt`, which returns true when an integer 0 is matched. - `m_Neg`, which returns when a register is negated. Also update a few places which use idioms related to the new matchers. Differential Revision: https://reviews.llvm.org/D91397	2020-11-13 09:24:54 -08:00
Matt Arsenault	b3aa28b9d4	AMDGPU: Factor out large flat offset splitting	2020-11-13 11:22:13 -05:00
Sam Clegg	258639659d	[WebAssembly] Add new relocation type for TLS data symbols These relocations represent offsets from the __tls_base symbol. Previously we were just using normal MEMORY_ADDR relocations and relying on the linker to select a segment-offset rather and absolute value in Symbol::getVirtualAddress(). Using an explicit relocation type allows allow us to clearly distinguish absolute from relative relocations based on the relocation information alone. One place this is useful is being able to reject absolute relocation in the PIC case, but still accept TLS relocations. Differential Revision: https://reviews.llvm.org/D91276	2020-11-13 07:59:29 -08:00
Matt Arsenault	5020d4daa4	AMDGPU: Refactor getBaseWithOffsetUsingSplitOR usage	2020-11-13 10:58:17 -05:00
Kerry McLaughlin	c26a89a1b4	[SVE][CodeGen] Improve codegen of scalable masked scatters If the scatter store is able to perform the sign/zero extend of its index, this is folded into the instruction with refineIndexType(). Additionally, refineUniformBase() will return the base pointer and index from an add + splat_vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90942	2020-11-13 11:19:36 +00:00
Simon Pilgrim	16ce8e35dd	Fix MSVC signed/unsigned comparison warning. NFCI.	2020-11-13 10:20:48 +00:00
Kazushi (Jam) Marukawa	c5d772e8d0	[VE] Add vst intrinsic instructions Add vst intrinsic instructions and a regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D91406	2020-11-13 19:11:57 +09:00
Jay Foad	a68bf7cb45	[AMDGPU] One more use of the new export target names. NFC.	2020-11-13 09:44:09 +00:00
serge-sans-paille	82b6e6053d	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_componentlibrary` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These function store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
Craig Topper	c2ee7a175e	[X86] Use EVT::getIntegerVT instead of MVT::getIntegerVT where the type can be i2 or i4. This was a mistake introduced in D91294. I'm not sure how to exercise this with the existing code, but I hit it while trying some follow up experiments.	2020-11-12 21:48:45 -08:00
Craig Topper	da8115ff1d	[X86] When storing v1i1/v2i1/v4i1 to memory, make sure we store zeros in the rest of the byte We can't store garbage in the unused bits. It possible that something like zextload from i1/i2/i4 is created to read the memory. Those zextloads would be legalized assuming the extra bits are 0. I'm not sure that the code in lowerStore is executed for the v1i1/v2i1/v4i1 case. It looks like the DAG combine in combineStore may have converted them to v8i1 first. And I think we're missing some cases to avoid going to the stack in the first place. But I don't have time to investigate those things at the moment so I wanted to focus on the correctness issue. Should fix PR48147. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D91294	2020-11-12 21:28:18 -08:00
Stanislav Mekhanoshin	4dbdbbe753	[AMDGPU] Remove scratch rsrc from spill pseudos Differential Revision: https://reviews.llvm.org/D91110	2020-11-12 15:23:37 -08:00
Jessica Paquette	2e5d95bf74	[AArch64][GlobalISel] Select CSINC and CSINV for G_SELECT with constants Select the following: - G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc - G_SELECT cc 0, -1 -> CSINV zreg, zreg cc - G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc - G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc - G_SELECT cc, t, 1 -> CSINC t, zreg, cc - G_SELECT cc, t, -1 -> CSINC t, zreg, cc (IR example: https://godbolt.org/z/YfPna9) These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td. Unfortunately, it doesn't seem like we can import patterns that use NZCV like those ones do. E.g. ``` def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV), (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>; ``` So we have to manually select these for now. This replaces `selectSelectOpc` with an `emitSelect` function, which performs these optimizations. Differential Revision: https://reviews.llvm.org/D90701	2020-11-12 14:44:01 -08:00
Kazushi (Jam) Marukawa	aa1acddbb3	[VE] Support vld intrinsics Add intrinsics for vector load instructions. Add a regression test also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D91332	2020-11-13 07:34:42 +09:00
Stanislav Mekhanoshin	f12ec97f64	[AMDGPU] Enable multi-dword flat scratch load/stores Differential Revision: https://reviews.llvm.org/D91384	2020-11-12 13:38:56 -08:00
Jay Foad	7dc590a5ce	[AMDGPU] Fix scheduling of exp pos4 Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix has any effect in practice. Differential Revision: https://reviews.llvm.org/D91290	2020-11-12 19:57:14 +00:00
Jay Foad	d376fdebe3	[AMDGPU] Define and use names for export targets. NFC. Differential Revision: https://reviews.llvm.org/D91289	2020-11-12 19:57:14 +00:00
Craig Topper	fbc8fa8586	[MSP430] Remove unused MVT::Glue output from MSP430ISD::SELECT_CC nodes. Follow up from a similar patch on RISCV 637f19c36b323cc3ab597f6ef138db53be395949 Nothing reads this Glue value that I could see. The SDNode def in the td file does not have the SDNPOutGlue flag so I don't think this glue would get properly propagated to MachineSDNodes if it was used.	2020-11-12 10:34:01 -08:00
Craig Topper	03edfee10d	[RISCV] Don't include CodeGen layer files in MC layer -Use MCRegister instead of Register in MC layer. -Move some enums from RISCVInstrInfo.h to RISCVBaseInfo.h to be with other TSFlags bits. Differential Revision: https://reviews.llvm.org/D91114	2020-11-12 07:45:38 -08:00
Craig Topper	5ab383a3c0	[RISCV] Add an ANDI to shift amount of FSL/FSR instructions The fshl and fshr intrinsics are defined to modulo their shift amount by the bitwidth of one of their inputs. The FSR/FSL instructions read one extra bit from the shift amount. If that bit is set the inputs are swapped. In order to preserve the semantics of the llvm intrinsics we need to make sure that the extra bit isn't set. DAG combine or instcombine may have removed any mask that was originally present. We could be smarter here and try to use computeKnownBits to check if the bit is known zero, but wanted to start with correctness. Differential Revision: https://reviews.llvm.org/D90905	2020-11-12 07:33:40 -08:00
David Green	9d0284e0e8	[ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP Of course there was something missing, in this case a check that the def of the count register we are adding to a t2DoLoopStartTP would dominate the insertion point. In the future, when we remove some of these COPY's in between, the t2DoLoopStartTP will always become the last instruction in the block, preventing this from happening. In the meantime we need to check they are created in a sensible order. Differential Revision: https://reviews.llvm.org/D91287	2020-11-12 13:47:46 +00:00
Kazushi (Jam) Marukawa	8fbbc2d925	[VE] Change the default type of v64 register class Change the default type of v64 register class from v512i32 to v256f64. Add a regression test also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D91301	2020-11-12 19:07:07 +09:00
David Sherwood	187863ee97	[SVE] Deal with SVE tuple call arguments correctly when running out of registers When passing SVE types as arguments to function calls we can run out of hardware SVE registers. This is normally fine, since we switch to an indirect mode where we pass a pointer to a SVE stack object in a GPR. However, if we switch over part-way through processing a SVE tuple then part of it will be in registers and the other part will be on the stack. I've fixed this by ensuring that: 1. When we don't have enough registers to allocate the whole block we mark any remaining SVE registers temporarily as allocated. 2. We temporarily remove the InConsecutiveRegs flags from the last tuple part argument and reinvoke the autogenerated calling convention handler. Doing this prevents the code from entering an infinite recursion and, in combination with 1), ensures we switch over to the Indirect mode. 3. After allocating a GPR register for the pointer to the tuple we then deallocate any SVE registers we marked as allocated in 1). We also set the InConsecutiveRegs flags back how they were before. 4. I've changed the AArch64ISelLowering LowerCALL and LowerFormalArguments functions to detect the start of a tuple, which involves allocating a single stack object and doing the correct numbers of legal loads and stores. Differential Revision: https://reviews.llvm.org/D90219	2020-11-12 08:41:50 +00:00
Amara Emerson	bda2529bef	[AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB.	2020-11-11 22:46:53 -08:00
Baptiste Saleil	c452241c8f	[PowerPC] Accumulator/Unprimed Accumulator register copy, spill and restore This patch adds support for accumulator/unprimed accumulator register copy, spill and restore for MMA. Authored By: Baptiste Saleil Reviewed By: #powerpc, bsaleil, amyk Differential Revision: https://reviews.llvm.org/D90616	2020-11-11 16:23:45 -06:00
Jessica Paquette	3ff178e54c	[AArch64][GlobalISel] Mark G_FCONSTANT as legal when there is full fp16 support When there is full fp16 support, there is no reason to widen 16-bit G_FCONSTANTs to 32 bits. Mark them as legal in this case. Also, we currently import a pattern for materializing a 16-bit 0.0. Add a testcase showing we select it. (All other 16-bit G_FCONSTANTS are not yet selected.) Differential Revision: https://reviews.llvm.org/D89164	2020-11-11 13:25:11 -08:00
Craig Topper	74aa09eae4	[RISCV] Remove traces of Glue from RISCVISD::SELECT_CC We were creating RISCVISD::SELECT_CC nodes with Glue output that was never being used, and the tablegen SDNode had the SDNPInGlue flag instead of the SDNPOutGlue flag. Since we don't seem to need the Glue just get rid of it from both places. Differential Revision: https://reviews.llvm.org/D91199	2020-11-11 09:30:48 -08:00
Jessica Paquette	41541293b2	[AArch64][GlobalISel] Select arith extended add/sub in manual selection code The manual selection code for add/sub was not checking if it was possible to fold in shifts + extends (the *rx opcode variants). As a result, we could never select things like ``` cmp x1, w0, uxtw #2 ``` Because we don't import any patterns for compares. This adds support for the arithmetic shifted register forms and updates tests for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`. This is a 0.1% geomean code size improvement on SPECINT2000 at -Os. Differential Revision: https://reviews.llvm.org/D91207	2020-11-11 09:26:03 -08:00
Jessica Paquette	ea9b12d4bc	[AArch64][GlobalISel] Select negative arithmetic immediates in manual selector Previously, we only handled negative arithmetic immediates in the imported selector code. Since we don't import code for, say, compares, we were missing opportunities for things like ``` %cst:gpr(s64) = G_CONSTANT i64 -10 %cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst -> %adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv %cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv ``` Instead, we would have to materialize the constant and emit a SUBS. This adds support for selection like above for SUB, SUBS, ADD, and ADDS. This is a 0.1% geomean code size improvement on SPECINT2000 at -Os. Differential Revision: https://reviews.llvm.org/D91108	2020-11-11 09:20:05 -08:00
Jay Foad	0665c5dea6	[AMDGPU] Separate out real exp instructions by subtarget. NFC. Differential Revision: https://reviews.llvm.org/D91247	2020-11-11 17:13:40 +00:00
Jay Foad	dfb3b2257d	[AMDGPU] Split exp instructions out into their own tablegen file. NFC. Differential Revision: https://reviews.llvm.org/D91246	2020-11-11 17:13:40 +00:00
Jay Foad	8325244191	[AMDGPU] Make use of SIInstrInfo::isEXP. NFC.	2020-11-11 17:01:20 +00:00
Jay Foad	6dcb8b5cd3	Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"" This reverts commit 8b08fa0103c8d8e624b19fad5a5006e7a783ecb7. The underlying problems were fixed by D90607.	2020-11-11 14:40:14 +00:00
Caroline Concatto	bc83bf6558	[AArch64]Add memory op cost model for SVE This patch adds/fixes memory op cost model for SVE with fixed-width vector. Differential Revision: https://reviews.llvm.org/D90950	2020-11-11 12:49:19 +00:00
Simon Pilgrim	a67d0946d3	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Kerry McLaughlin	859b066624	[SVE][CodeGen] Lower scalable masked scatters Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only) Changes included in this patch: - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use. Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts. - Added the getCanonicalIndexType function to convert redundant addressing modes (e.g. scaling is redundant when accessing bytes) - Tests with 32 & 64-bit scaled & unscaled offsets Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90941	2020-11-11 11:50:22 +00:00
Kerry McLaughlin	e6fbb8fc55	[SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter(). Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print the details of masked scatters (is truncating, signed or scaled). This is the first in a series of patches which adds support for scalable masked scatters Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90939	2020-11-11 10:58:24 +00:00
Sam Parker	d182f68a32	[NFC][ARM] Replace lambda with any_of	2020-11-11 10:02:55 +00:00
Amara Emerson	8ba538f8a2	[AArch64][GlobalISel] Port some AArch64 target specific MUL combines from SDAG. These do things like turn a multiply of a pow-2+1 into a shift and and add, which is a common pattern that pops up, and is universally better than expensive madd instructions with a constant. I've added check lines to an existing codegen test since the code being ported is almost identical, however the mul by negative pow2 constant tests don't generate the same code because we're missing some generic G_MUL combines still. Differential Revision: https://reviews.llvm.org/D91125	2020-11-10 22:21:13 -08:00
Gaurav Jain	c4730a384c	[NFC] Use [MC]Register for x86 target Differential Revision: https://reviews.llvm.org/D91161	2020-11-10 15:49:39 -08:00
Kazushi (Jam) Marukawa	51f907845b	[VE] Implement FoldImmediate Implement FoldImmediate for only integer aritihmetic operations. Add regression tests also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D91150	2020-11-11 08:08:32 +09:00
Pirama Arumuga Nainar	8ada0737df	[ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt. Previously we used setRegClass to rgpr, which may expand the register domain if the result was already in a constrained class (tcgpr in the above PR). Differential Revision: https://reviews.llvm.org/D91192	2020-11-10 13:38:11 -08:00
Stanislav Mekhanoshin	a006cc60e7	[AMDGPU] Set default op_sel_hi on accvgpr read/write These are opsel opcodes with op_sel actually being ignored. As a such op_sel_hi needs to be set to default 1 even though these bits are ignored. This is compatibility change. Differential Revision: https://reviews.llvm.org/D91202	2020-11-10 13:07:29 -08:00

1 2 3 4 5 ...

60088 Commits