llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 04:02:41 +01:00

Author	SHA1	Message	Date
Alexander Belyaev	4bc5332eae	[llvm] Inline getAssociatedFunction() in LLVM_DEBUG. Function* F is used only inside LLVM_DEBUG, so that it causes unused variable warning.	2021-07-24 11:49:21 +02:00
Amara Emerson	c79055fae4	[GlobalISel] Add GUnmerge, GMerge, GConcatVectors, GBuildVector abstractions. NFC. Use these to slightly simplify some code in the artifact combiner.	2021-07-23 22:32:26 -07:00
Kuter Dinel	a502a514d2	[AMDGPU] Deduce attributes with the Attributor This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes. Reviewed By: jdoerfert, arsenm Differential Revision: https://reviews.llvm.org/D104997	2021-07-24 06:07:15 +03:00
Thomas Lively	a913c9bb30	[WebAssembly] Codegen for pmin and pmax Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax} with standard codegen patterns. Since wasm_simd128.h uses an integer vector as the standard single vector type, the IR for the pmin and pmax intrinsic functions contains bitcasts that would not be there otherwise. Add extra codegen patterns that can still select the pmin and pmax instructions in the presence of these bitcasts. Differential Revision: https://reviews.llvm.org/D106612	2021-07-23 14:49:21 -07:00
Thomas Lively	40588d371f	[WebAssembly][NFC] Simplify SIMD bitconvert pattern Differential Revision: https://reviews.llvm.org/D106680	2021-07-23 14:43:48 -07:00
Craig Topper	a4ed0c97a6	[RISCV] Avoid using x0,x0 vsetvli for vmv.x.s and vfmv.f.s unless we know the sew/lmul ratio is constant. Since we're changing VTYPE, we may change VLMAX which could invalidate the previous VL. If we can't tell if it is safe we should use an AVL of 1 instead of keeping the old VL. This is a quick fix. We may want to thread VL to the pseudo instruction instead of making up a value. That will require ISD opcode changes and changes to the C intrinsic interface. This fixes the issue raised in D106286. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106403	2021-07-23 09:12:05 -07:00
Craig Topper	da55d61e8f	[X86] Fix a bug in TEST with immediate creation This code tries to form a TEST from CMP+AND with an optional truncate in between. If we looked through the truncate, we may have extra bits in the AND mask that shouldn't participate in the checks. Normally SimplifyDemendedBits takes care of this, but the AND may have another user. So manually mask out any extra bits. Fixes PR51175. Differential Revision: https://reviews.llvm.org/D106634	2021-07-23 09:03:53 -07:00
Benjamin Kramer	f11d6ca5ae	[llvm][sve] Silence unused variable warning in Release builds. NFC	2021-07-23 16:16:35 +02:00
Sanjay Patel	845cd9f16b	[x86] improve CMOV codegen by pushing add into operands This is not the transform direction we want in general, but by the time we have a CMOV, we've already tried everything else that could be better. The transform increases the uses of the other add operand, but that is safe according to Alive2: https://alive2.llvm.org/ce/z/Yn6p-A We could probably extend this to other binops (not just add). This is the motivating pattern discussed in: https://llvm.org/PR51069 The test with i8 shows a missed fold because there's a trunc sitting in front of the add. That can be handled with a small follow-up. Differential Revision: https://reviews.llvm.org/D106607	2021-07-23 09:39:32 -04:00
David Truby	130948388d	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-23 14:04:55 +01:00
Simon Pilgrim	2e0db61480	[X86][AVX] lowerV2X128Shuffle - attempt to recognise broadcastf128 subvector load As noticed on PR50053 we were failing to recognise when a shuffle of a load was really a subvector broadcast load	2021-07-23 13:10:38 +01:00
David Green	f41dff2733	[AArch64] Add worst case shuffle costs This adds some missing single source shuffle costs for AArch64, of i16 and i8 vectors. v4i16 are the same as v4i32 with a worse case cost of 3 coming from the perfect shuffle tables. The larger vector sizes expand into a constant pool, plus a load (and adrp) and a tbl. I arbitrarily chose 8 for the cost to be expensive but not too expensive. Differential Revision: https://reviews.llvm.org/D106241	2021-07-23 09:01:58 +01:00
Sebastian Neubauer	0e5d17756d	[AMDGPU] Fix running ResourceUsageAnalysis Clear the map when running the analysis multiple times. The assertion that should ensure that every function is only analyzed once triggered sometimes (once every ~70 compiles of some graphics pipelines) when two functions of subsequent runs were allocated at the same address. Differential Revision: https://reviews.llvm.org/D106452	2021-07-23 09:25:15 +02:00
Carl Ritson	924e1a4716	[AMDGPU] Add maximum NSA size limit ISA feature Add maximum NSA size limit as an ISA feature. Use this to reduce NSA usage on GFX10.1 to avoid stability issues with 4 and 5 dwords NSA instructions. Maintain use of longer NSA instructions on GFX10.3. Note: this also contains some minor fixes for GlobalISel which did not work correctly with non-NSA form instructions on GFX10. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D103348	2021-07-23 16:16:06 +09:00
Cullen Rhodes	9458751292	[AArch64][AsmParser] NFC: when creating a token IsSuffix=false should be default Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106568	2021-07-23 06:36:06 +00:00
Hsiangkai Wang	21b283ef66	[RISCV] Add FrameSetup/FrameDestroy flag to prologue/epilog instructions. Differential Revision: https://reviews.llvm.org/D105086	2021-07-23 11:35:19 +08:00
Vitaly Buka	381f03cdf9	[NFC][asan] Always pass Dominator Trees into forAllReachableExits	2021-07-22 18:01:38 -07:00
Thomas Johnson	4866aceb76	[ARC] Add tablegen definition for the Find Leading Set (FLS) instruction Differential Revision: https://reviews.llvm.org/D106602	2021-07-22 17:42:25 -07:00
Paulo Matos	e8be0ee828	[WebAssembly] Implementation of global.get/set for reftypes in LLVM IR Reland of 31859f896. This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and lowering methods for load and stores of reference types from IR globals. Once the lowering creates the new nodes, tablegen pattern matches those and converts them to Wasm global.get/set. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D104797	2021-07-22 22:07:24 +02:00
Simon Pilgrim	c4477aac03	[CostModel][X86] Adjust shift SSE4 legalized costs based on llvm-mca reports. Update shl/lshr/ashr costs based on the worst case costs from the script in D103695 - many of the 128-bit shifts (usually where integer multiplies aren't used) have similar behaviour to AVX1 so we can merge them.	2021-07-22 20:07:32 +01:00
Simon Pilgrim	815b215830	[X86] Fix SLM FP<->INT throughputs. Noticed while trying to clean up the shift costs model for SSE4 targets using the script in D10369 - SLM double-pumps all the 128-bit vector conversion ops and only use FP0 pipe - numbers taken from Intel AOM + Agner.	2021-07-22 19:39:04 +01:00
Thomas Johnson	60773756e3	[ARC] Add disassembly for the conditioned RSUB immediate instruction Differential Revision: https://reviews.llvm.org/D106497	2021-07-22 11:34:39 -07:00
David Green	f13ef26613	[AArch64] Adjust the cost of integer sum reductions This changes the cost to (LT.first-1) * cost(add) + 2, where the cost of an add is assumed to be 1. This brings it inline with the other reductions. Differential Revision: https://reviews.llvm.org/D106240	2021-07-22 18:19:54 +01:00
Simon Pilgrim	8371f55768	[CostModel][X86] Adjust shift SSE legalized costs based on llvm-mca reports. Update shl/lshr/ashr costs based on the worst case costs from the script in D103695.	2021-07-22 18:12:49 +01:00
Victor Huang	017f21fed1	[PowerPC] Add PowerPC "__stbcx" builtin and intrinsic for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds the builtin and intrinsic for "__stbcx". Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D106484	2021-07-22 10:48:46 -05:00
Cullen Rhodes	5202ca9718	[AArch64][SME] Improve diagnostic for vector select register Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D106540	2021-07-22 13:46:40 +00:00
Fraser Cormack	4c884f4ac3	[RISCV] Fix a crash when lowering split float arguments Lowering certain float vectors without legal vector types could cause a crash due to a bad interaction between passing floats via GPRs and argument splitting. Split vector floats appear just like scalar floats. Under certain situations we choose to pass these float arguments via GPRs and use an XLenVT location and set the 'BCvt' info to track how they must be converted back to floating-point values. However, later logic for handling split arguments may take over, in which case we lose the previous information and set the 'Indirect' info, thus incorrectly lowering to integer types. I don't believe that we would have come across the notion of split floating-point arguments before. This patch addresses the issue by updating the lowering so that split arguments are only passed indirectly when they are scalar integer types. This has some change to how we lower some larger illegal float vectors, as can be seen in 'fastcc-float.ll' where the vector is now passed partly in registers and partly on the stack. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D102852	2021-07-22 09:55:26 +01:00
Fraser Cormack	ecdacaf414	[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID This relands a6ca88e908b5befcd9b0f8c8cb40f53095cc17bc which was originally reverted due to overflow bugs in e3fa2b1eab60342dc882b7b888658b03c472fa2b. This patch teaches the compiler to identify a wider variety of `BUILD_VECTOR`s which form integer arithmetic sequences, and to lower them to `vid.v` with modifications for non-unit steps and non-zero addends. The sequences handled by this optimization must either be monotonically increasing or decreasing. Consecutive elements holding the same value indicate a fractional step which, while simple mathematically, becomes more complex to handle both in the realm of lossy integer division and in the presence of `undef`s. For example, a common "interleaving" shuffle index will be lowered by LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR` nodes. Either of these would ideally be lowered to `vid.v` shifted right by 1. Detection of this sequence in presence of general `undef` values is more complicated, however: `<0,u,u,1,>` could match either `<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence. Both are possible, so backtracking or multiple passes is inevitable. Sticking to monotonic sequences keeps the logic simpler as it can be done in one pass. Fractional steps will likely be a separate optimization in a future patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104921	2021-07-22 09:36:12 +01:00
Ben Shi	7d4933eff2	[RISCV] Optimize multiplication in the zba extension with SHADD This patch make the following optimization. (mul x, 3 power_of_2) -> (SLLI (SH1ADD x, x), bits) (mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits) (mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D105796	2021-07-22 10:28:41 +08:00
Carl Ritson	8b2d2affc5	[AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr. Previously these were rounded up to VReg_256 this saves VGPRs. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D103800	2021-07-22 10:42:15 +09:00
Carl Ritson	41b211a722	[AMDGPU] Allow frontends to disable null export for pixel shaders Disable null export (for kills) when a frontend defines a pixel shader as not exporting using amdgpu-color-export and amdgpu-depth-export function attrbutes. This allows the generation of export free pixel shaders. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D105683	2021-07-22 10:20:46 +09:00
Thomas Lively	8403ff42b3	[WebAssembly] Replace @llvm.wasm.popcnt with @llvm.ctpop.v16i8 Use the standard target-independent intrinsic to take advantage of standard optimizations. Differential Revision: https://reviews.llvm.org/D106506	2021-07-21 16:45:54 -07:00
Jessica Paquette	f9cc27a4e9	[AArch64][GlobalISel] Change \| -> \|\| in an if I wrote the wrong type of OR by mistake.	2021-07-21 14:57:31 -07:00
Stanislav Mekhanoshin	54565ba720	[AMDGPU] Mark relevant rematerializable VOP3 instructions Differential Revision: https://reviews.llvm.org/D106110	2021-07-21 14:44:13 -07:00
Stanislav Mekhanoshin	b044663832	[AMDGPU] Mark relevant rematerializable VOP2 instructions Differential Revision: https://reviews.llvm.org/D106023	2021-07-21 14:24:59 -07:00
David Green	f01cf44407	[ARM] Pass SelectionDAG to methods that dont require DCI. NFC In these methods DCI is never used, only the DAG from it. Pass the DAG directly, cleaning up the code a little.	2021-07-21 22:11:09 +01:00
Stanislav Mekhanoshin	c90b3afab5	[AMDGPU] Mark all relevant VOP1 instructions rematerializable Differential Revision: https://reviews.llvm.org/D105919	2021-07-21 14:05:32 -07:00
Stanislav Mekhanoshin	5ee240dc93	[AMDGPU] Move perfhint analysis This is SCC pass, moving it to the end of SCC PM saves one Function PM. This needs the analysis to take into account memory access width since it is now places after the load/store optimizer (D105651). Differential Revision: https://reviews.llvm.org/D105652	2021-07-21 13:06:49 -07:00
Jessica Paquette	cb309a9442	[AArch64][GlobalISel] Widen s2 and s4 G_IMPLICIT_DEF + G_FREEZE These had ``` .clampScalar(0, s1, 64) .widenScalarToNextPow2(0, 8) ``` If you have s2 or s4, then `widenScalarToNextPow2` does nothing. This changes the `widenScalarToNextPow2` rule to use s8 as the minimum type instead, allowing us to correctly widen s2 and s4. This does not impact s1, since it's marked as legal already. Differential Revision: https://reviews.llvm.org/D106413	2021-07-21 12:59:20 -07:00
Stanislav Mekhanoshin	5b3e6630e5	[AMDGPU] Tune perfhint analysis to account access width A function with less memory instructions but wider access is the same as a function with more but narrower accesses in terms of memory boundness. In fact the pass would give different answers before and after vectorization without this change. Differential Revision: https://reviews.llvm.org/D105651	2021-07-21 12:46:10 -07:00
Craig Topper	6a9e481d78	[RISCV] Cleanup comment around vector tail policy handling. NFC vmv.x.s and reductions don't ignore tail policy anymore.	2021-07-21 12:45:08 -07:00
Eli Friedman	4384bac220	[SelectionDAG] Fix the representation of ISD::STEP_VECTOR. The existing rule about the operand type is strange. Instead, just say the operand is a TargetConstant with the right width. (Legalization ignores TargetConstants, so it doesn't matter if that width is legal.) Highlights: 1. I had to substantially rewrite the AArch64 isel patterns to expect a TargetConstant. Nothing too exotic, but maybe a little hairy. Maybe worth considering a target-specific node with some dagcombines instead of this complicated nest of isel patterns. 2. Our behavior on RV32 for vectors of i64 has changed slightly. In particular, we correctly preserve the width of the arithmetic through legalization. This changes the DAG a bit. Maybe room for improvement here. 3. I explicitly defined the behavior around overflow. This is necessary to make the DAGCombine transforms legal, and I don't think it causes any practical issues. Differential Revision: https://reviews.llvm.org/D105673	2021-07-21 10:58:40 -07:00
Thomas Lively	d9c07b1710	[WebAssembly] Codegen for v128.load{32,64}_zero Replace the experimental clang builtins and LLVM intrinsics for these instructions with normal instruction selection patterns. The wasm_simd128.h intrinsics header was already using portable code for the corresponding intrinsics, so now it produces the correct instructions. Differential Revision: https://reviews.llvm.org/D106400	2021-07-21 09:02:12 -07:00
Eric Astor	be0514d8cd	[ms] [llvm-ml] Restrict implicit RIP-relative addressing to named-variable references ML64.EXE applies implicit RIP-relative addressing only to memory references that include a named-variable reference. Reviewed By: mstorsjo Differential Revision: https://reviews.llvm.org/D105372	2021-07-21 11:49:58 -04:00
Quinn Pham	2b9fe667ec	[PowerPC] Floating Point Builtins for XL Compat. This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds builtins related to floating point operations Reviewed By: #powerpc, nemanjai, amyk, NeHuang Differential Revision: https://reviews.llvm.org/D103986	2021-07-21 08:33:39 -05:00
Sebastian Neubauer	5f547ad156	[AMDGPU] Improve killed check for vgpr optimization The killed flag is not always set. E.g. when a variable is used in a loop, it is never marked as killed, although it is unused in following basic blocks. Also, we try to deprecate kill flags and not use them. Check if the register is live in the endif block. If not, consider it killed in the then and else blocks. The vgpr-liverange tests have two new tests with loops (pre-committed, so the diff is visible). I also needed to change the subtarget to gfx10.1, otherwise calls are not working. Differential Revision: https://reviews.llvm.org/D106291	2021-07-21 15:24:59 +02:00
Jay Foad	fd3020376e	[AMDGPU] NFC refactoring in isel for buffer access intrinsics Rename getBufferOffsetForMMO to updateBufferMMO and pass in the MMO to be updated, in preparation for the bug fix in D106284. Call updateBufferMMO consistently for all buffer intrinsics, even the ones that use setBufferOffsets to decompose a combined offset expression. Add a getIdxEn helper function. Differential Revision: https://reviews.llvm.org/D106354	2021-07-21 11:12:49 +01:00
Cullen Rhodes	23e61e0bd4	[AArch64][SME] Support .arch and .arch_extension assembler directives Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D105566	2021-07-21 08:40:27 +00:00
Tim Northover	a5f7171155	ARM: don't return by popping PC if we have to adjust the stack afterwards. In mandatory tail calling conventions we might have to deallocate stack space used by our arguments before return. This happens after popping CSRs, so the pop cannot be turned into the return itself in this case. The else branch here was already a nop, so removing it as a tidy-up.	2021-07-21 09:35:14 +01:00
Tim Northover	bd9402142a	AArch64: support 8 & 16-bit atomic operations in GlobalISel We have SelectionDAG patterns for 8 & 16-bit atomic operations, but they assume the value types will have been legalized to 32-bits. So this adds the ability to widen them to both AArch64 & generic GISel infrastructure.	2021-07-21 09:35:14 +01:00
Cullen Rhodes	f8b2068905	[AArch64][SME] Add mova instructions This patch adds the mova instruction to insert/extract an SVE vector register to/from a ZA tile vector. The preferred MOV aliases are also implemented. Depends on D105572. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: david-arm, CarolineConcatto Differential Revision: https://reviews.llvm.org/D105574	2021-07-21 08:20:01 +00:00
Cullen Rhodes	db780333ba	[AArch64][SME] Add ldr and str instructions The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D105573	2021-07-21 08:17:13 +00:00
Tianqing Wang	5f5c9808cd	[X86] Update MachineLoopInfo in CMOV conversion. If a CMOV is in a loop and is converted to branches, CMOV conversion wouldn't add newly created basic blocks to loop info. Since the candidates is collected based on loops, instructions in these basic blocks will be ignored. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D104623	2021-07-21 10:53:46 +08:00
Albion Fung	d88c540901	[PowerPC] Implemented mtmsr, mfspr, mtspr Builtins Implemented builtins for mtmsr, mfspr, mtspr on PowerPC; the patch is intended for XL Compatibility. Differential revision: https://reviews.llvm.org/D106130	2021-07-20 17:51:00 -05:00
Jon Roelofs	ff40770cd1	[AArch64][GlobalISel] Legalize ctpop for v2s64, v2s32, v4s32, v4s16, v8s16 https://llvm.godbolt.org/z/nTTK6M5qe Differential revision: https://reviews.llvm.org/D106388	2021-07-20 15:37:56 -07:00
Albion Fung	63ce4846c7	[PowerPC] Store, load, move from and to registers related builtins This patch implements store, load, move from and to registers related builtins, as well as the builtin for stfiw. The patch aims to provide feature parady with xlC on AIX. Differential revision: https://reviews.llvm.org/D105946	2021-07-20 15:46:14 -05:00
Jessica Paquette	16d5479ea5	[AArch64][GlobalISel] Select llvm.aarch64.neon.st2 intrinsics Add manual selection code similar to the code in AArch64ISelDAGToDAG, and add `createTuple` helpers similar to the code there as well. This accounted for around 111 fallbacks while building clang for AArch64 with GlobalISel. This also should make it easy to add selection code for other store intrinsics. As a minor cleanup, this uses `createQTuple` in the other place where we use REG_SEQUENCE. Differential Revision: https://reviews.llvm.org/D106332	2021-07-20 13:23:46 -07:00
Eli Friedman	a773ea23aa	[AArch64] Use the CMP_SWAP_128 variants added in 843c6140. Accidentally forgot to flip the opcode... and I didn't notice because it was working fine for the GlobalISel.	2021-07-20 13:23:27 -07:00
Fangrui Song	afd2d1339a	[AArch64] Delete unused Opcode after D106039	2021-07-20 12:51:44 -07:00
Eli Friedman	91da841387	[AArch64] Fix i128 cmpxchg using ldxp/stxp. Basically two parts to this fix: 1. Stop using AtomicExpand to expand cmpxchg i128 2. Fix AArch64ExpandPseudoInsts to use a correct expansion. From ARM architecture reference: To atomically load two 64-bit quantities, perform a Load-Exclusive pair/Store-Exclusive pair sequence of reading and writing the same value for which the Store-Exclusive pair succeeds, and use the read values from the Load-Exclusive pair. Fixes https://bugs.llvm.org/show_bug.cgi?id=51102 Differential Revision: https://reviews.llvm.org/D106039	2021-07-20 12:38:12 -07:00
Victor Huang	f281101580	[PowerPC] Add PowerPC cmpb builtin and emit target indepedent code for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch add the builtin and emit target independent code for __cmpb. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D105194	2021-07-20 13:06:22 -05:00
Craig Topper	a2254d3fcc	[RISCV] Teach RISCVMatInt about cases where it can use LUI+SLLI to replace LUI+ADDI+SLLI for large constants. If we need to shift left anyway we might be able to take advantage of LUI implicitly shifting its immediate left by 12 to cover part of the shift. This allows us to use more bits of the LUI immediate to avoid an ADDI. isDesirableToCommuteWithShift now considers compressed instruction opportunities when deciding if commuting should be allowed. I believe this is the same or similar to one of the optimizations from D79492. Reviewed By: luismarques, arcbbb Differential Revision: https://reviews.llvm.org/D105417	2021-07-20 09:22:06 -07:00
Craig Topper	95c86f6829	[RISCV] Add custom isel to select (and (srl X, C1), C2) and (and (shl X, C1), C2) Replace some existing isel patterns that are covered by the new code. SLLIUWPat has been removed in favor of folding its root case into the new code. The other uses in isel patterns for shXadd.uw have been switched to using hardcoded AND masks. This is based on the original version of D49585 from ARM. The final version of that was made a DAG combine, but I've chosen to keep it as custom isel. I'm not convinced DAG combine is as good with shift pairs as it is with and+shift. I saw some issues optimizing the shifts created by vscale lowering if an and isn't created for from a shift pair. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D106230	2021-07-20 08:53:55 -07:00
Stefan Pintilie	dbd743acfc	[PowerPC] Inefficient register allocation of ACC registers results in many copies. ACC registers are a combination of four consecutive vector registers. If the vector registers are assigned first this often forces a number of copies to appear just before the ACC register is created. If the ACC register is assigned first then fewer copies are generated when the vector registers are assigned. This patch tries to force the register allocator to assign the ACC registers first and then the UACC registers and then the vector pair registers. It does this by changing the priority of the register classes. This patch also adds hints to help the register allocator assign UACC registers from known ACC registers and vector pair registers from known UACC registers. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D105854	2021-07-20 10:53:40 -05:00
Craig Topper	9d554c2dc6	[RISCV] Use unordered indexed loads for MGATHER. I don't think the semantics of the llvm masked gather intrinsic care about the order the elements are loaded. For example, type legalization by splitting will chain them in parallel. This is different than scatter which we do chain in order. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106025	2021-07-20 08:46:02 -07:00
Bradley Smith	b354b5b95c	[AArch64][SVE] Move instcombine like transforms out of SVEIntrinsicOpts Instead move them to the instcombine that happens in AArch64TargetTransformInfo. Differential Revision: https://reviews.llvm.org/D106144	2021-07-20 14:17:30 +00:00
Simon Pilgrim	9e495cac81	[X86] X86InstCombineIntrinsic.cpp - silence clang-tidy warnings about incorrect uses of auto. NFCI. We were using auto instead of auto* in a number of places which failed the llvm-qualified-auto check. Additionally we were using auto in some places where the type wasn't immediately obvious - the style guide rule of thumb is only to use auto from casts etc. where the type is already explicitly stated.	2021-07-20 13:37:45 +01:00
Sebastian Neubauer	d89384c520	[AMDGPU] Improve register computation for indirect calls First, collect the register usage in each function, then apply the maximum register usage of all functions to functions with indirect calls. This is more accurate than guessing the maximum register usage without looking at the actual usage. As before, assume that indirect calls will hit a function in the current module. Differential Revision: https://reviews.llvm.org/D105839	2021-07-20 13:48:50 +02:00
Stanislav Mekhanoshin	c99f31d90e	[AMDGPU] Disable LDS lowering for GFX shaders Apparently these need external LDS symbols to remain. Fixes: SC1-3279 Differential Revision: https://reviews.llvm.org/D106288	2021-07-20 02:55:25 -07:00
Sander de Smalen	04cb04f74d	[AArch64][SVE][InstCombine] last{a,b} of a splat vector Replace last{a,b}(splat(X)) with X, irrespective of the predicate. Patch by/Committing on behalf of: Usman Nadeem (mnadeem) Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105520	2021-07-20 09:44:43 +01:00
Cullen Rhodes	6765229853	[AArch64][SME] Add system registers and related instructions This patch adds the new system registers introduced in SME: - ID_AA64SMFR0_EL1 (ro) SME feature identifier. - SMCR_ELx (r/w) streaming mode control register for configuring effective SVE Streaming SVE Vector length when the PE is in Streaming SVE mode. - SVCR (r/w) streaming vector control register, visible at all exception levels. Provides access to PSTATE.SM and PSTATE.ZA using MSR and MRS instructions. - SMPRI_EL1 (r/w) streaming mode execution priority register. - SMPRIMAP_EL2 (r/w) streaming mode priority mapping register. - SMIDR_EL1 (ro) streaming mode identification register. - TPIDR2_EL0 (r/w) for use by SME software to manage per-thread SME context. - MPAMSM_EL1 (r/w) MPAM (v8.4) streaming mode register, for labelling memory accesses performed in streaming mode. Also added in this patch are the SME mode change instructions. Three MSR immediate instructions are implemented to set or clear PSTATE.SM, PSTATE.ZA, or both respectively: - MSR SVCRSM, #<imm1> - MSR SVCRZA, #<imm1> - MSR SVCRSMZA, #<imm1> The following smstart/smstop aliases are also implemented for convenience: smstart -> MSR SVCRSMZA, #1 smstart sm -> MSR SVCRSM, #1 smstart za -> MSR SVCRZA, #1 smstop -> MSR SVCRSMZA, #0 smstop sm -> MSR SVCRSM, #0 smstop za -> MSR SVCRZA, #0 The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D105576	2021-07-20 08:06:26 +00:00
Amara Emerson	c7647f3217	[AArch64][GlobalISel] Don't form truncstores in postlegalizer-lowering for s128. We don't support truncating s128 stores, so don't form them.	2021-07-20 00:04:34 -07:00
Kai Luo	98d913fa91	[PowerPC] Fallback to base's implementation of shouldExpandAtomicCmpXchgInIR and shouldExpandAtomicCmpXchgInIR If we can't decide `shouldExpandAtomicCmpXchgInIR` or `shouldExpandAtomicCmpXchgInIR` in PPC's implementation after https://reviews.llvm.org/rGb9c3941cd61de1e1b9e4f3311ddfa92394475f4b, resort to base's implementation. This fixes internal build of OpenMP which uses atomic operations on float. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D106234	2021-07-20 06:14:24 +00:00
Matt Arsenault	d58cbace52	AArch64/GlobalISel: Preserve memory types	2021-07-19 20:21:05 -04:00
Derek Schuff	0c7127a873	[WebAssembly] Generate R_WASM_FUNCTION_OFFSET relocs in debuginfo sections Debug info sections need R_WASM_FUNCTION_OFFSET_I32 relocs (with FK_Data_4 fixup kinds) to refer to functions (instead of R_WASM_TABLE_INDEX as is used in data sections). Usually this is done in a convoluted way, with unnamed temp data symbols which target the start of the function, in which case WasmObjectWriter::recordRelocation converts it to use the section symbol instead. However in some cases the function can actually be undefined; in this case the dwarf generator uses the function symbol (a named undefined function symbol) instead. In that case the section-symbol transform doesn't work and we need to generate the correct reloc type a different way. In this change WebAssemblyWasmObjectWriter::getRelocType takes the fixup section type into account to choose the correct reloc type. Fixes PR50408 Differential Revision: https://reviews.llvm.org/D103557	2021-07-19 14:02:33 -07:00
Jonas Paulsson	6713c55c41	[SystemZ] Handle NoRegister in SystemZTargetLowering::emitMemMemWrapper(). Bugfix: The compiler should be able to generate a memset to nullptr. Review: Ulrich Weigand	2021-07-19 20:04:44 +02:00
Amy Huang	f2ec69fb59	Revert "[llvm][sve] Lowering for VLS truncating stores" because it causes a seg fault (see https://reviews.llvm.org/D104471). This reverts commit c305557acdaad453e32309d575fe9c6c7090c099.	2021-07-19 11:03:33 -07:00
Wouter van Oortmerssen	8a42745952	[WebAssembly] Support R_WASM_MEMORY_ADDR_TLS_SLEB64 for wasm64 Also fixed TLS tests swapping addr & value in store op Differential Revision: https://reviews.llvm.org/D106096	2021-07-19 10:22:43 -07:00
Craig Topper	fa61017019	[SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend. RISCV would prefer a sign extended constant since that works better with our constant materialization. We have an existing TLI hook we use to control sign extension of setcc operands in type legalization. That hook happens to do the right check we need here, but might be straying from its original purpose. With only RISCV defining this hook in tree, I wasn't sure if it was worth adding another hook with identical behavior. This is an alternative to D105785 where I tried to handle this in the RISCV backend by not creating ANY_EXTENDs in some places. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D105918	2021-07-19 09:25:28 -07:00
Simon Pilgrim	5157b0c187	[X86] Fix case of IsAfterLegalize argument. NFC. Pulled out of D106280	2021-07-19 17:15:28 +01:00
David Green	2fbf734dc0	[ARM] Remove PromotedBitwiseVT for NEON types This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32, treating them the same as the AArch64 and MVE backends where we just add the relevant patterns for each legal type. This prevents a lot of bitcasts from being added to the DAG, which have the potential to make optimizations more difficult. It does mean adding extra patterns, and some codegen can change due to the types now being legal, not promoted. Differential Revision: https://reviews.llvm.org/D105588	2021-07-19 16:36:33 +01:00
Matt Arsenault	2b17cee2de	AArch64/GlobalISel: Cleanup unnecessary size checks in call lowering The CCValAssign types should now be accurate, so these are no longer necessary.	2021-07-19 11:01:30 -04:00
Jeremy Morse	6df259fb43	[InstrRef][X86] Drop debug instruction numbers from x87 instructions Avoid a crash when using instruction referencing if x87 floating point instructions are used. These instructions are significantly mutated when they're rewritten from referring to registers, to referring to floating-point-stack positions. As a result, their operands are re-ordered, and (InstrRef) LiveDebugValues asserts when it sees a DBG_INSTR_REF referring to a non-reg non-def register operand. To fix this, drop the instruction numbers, and thus variable locations. This patch adds a helper utility do do that. Dropping the variable locations is sub-optimal, but applying DBG_VALUEs to the $fp0 and similar registers is dropped on emission too. It seems we've never done well at describing variables that live in x87 registers, at all. Differential Revision: https://reviews.llvm.org/D105657	2021-07-19 15:08:27 +01:00
Jay Foad	0888947b95	[AMDGPU] Fix typo in comments idexen -> idxen	2021-07-19 13:39:30 +01:00
Kazushi (Jam) Marukawa	b3be1e45dd	[VE] Set getExtendForAtomicOps to ISD::ANY_EXTEND The implementation of subword atomics does not actually guarantee the result is zero-extended, which now caused failures after https://reviews.llvm.org/D101342 was landed. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D106225	2021-07-19 19:58:44 +09:00
Kazushi (Jam) Marukawa	92e22e34b8	[VE] Disable relative lookup table converter pass for VE VE's linker, /opt/nec/ve/bin/nld, doesn't implement relative lookup table. The relative lookup table is introduced by https://reviews.llvm.org/D94355, but we need to disable it at the moment. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D106224	2021-07-19 19:25:33 +09:00
Florian Mayer	070bb38c2e	[NFC] [MTE] helper for stack tagging lifetimes. Reviewed By: eugenis, vitalybuka Differential Revision: https://reviews.llvm.org/D106135	2021-07-19 11:09:16 +01:00
Cullen Rhodes	fa9e7701df	[AArch64][SME] Add SVE2 instructions added in SME This patch adds support for the following instructions: SCLAMP, UCLAMP, REV, DUP (predicate) The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D105577	2021-07-19 08:03:05 +00:00
David Green	48eaab481d	[ARM] Extend more reductions during lowering This relaxes the VMLAV and VADDV reduction recognition code to handle smaller than legal types, extending them as needed. That was already handled for some reductions, this extends it to more types in a more generic way. If a smaller than legal value is found it is extended to the legal type as needed. Differential Revision: https://reviews.llvm.org/D106051	2021-07-19 08:58:03 +01:00
Sander de Smalen	14bad329d1	[AArch64][SVE] Optimize bitcasts between unpacked half/i16 vectors. The case for nxv2f32/nxv2i32 was already covered by D104573. This patch builds on top of that by making the mechanism work for nxv2[b]f16/nxv2i16, nxv4[b]f16/nxv4i16 as well. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D106138	2021-07-19 08:29:28 +01:00
Eli Friedman	5fd061997c	[X86] Remove incorrect use of known bits in shuffle simplification. This reverts commit 2a419a0b9957ebac9e11e4b43bc9fbe42a9207df. The result of a shufflevector must not propagate poison from any element other than the one noted in the shuffle mask. The regressions outside of fptoui-may-overflow.ll can probably be recovered some other way; for example, using isGuaranteedNotToBePoison. See discussion on https://reviews.llvm.org/D106053 for more background. Differential Revision: https://reviews.llvm.org/D106222	2021-07-18 18:13:11 -07:00
Simon Pilgrim	1d449cdfe0	[X86][SSE] matchShuffleWithPACK - avoid poison pollution from bitcasting multiple elements together. D106053 exposed that we've not been taking into account that by bitcasting smaller elements together and then performing a ComputeKnownBits on the result we'd be allowing a poison element to influence other neighbouring elements being used in the pack. Instead we now peek through any existing bitcast to ensure that the source type already matches the width source of the pack node we're trying to match. This has also been a chance to stop matchShuffleWithPACK creating unused nodes on the fly which could affect oneuse tests during shuffle lowering/combining. The only regression we're seeing is due to being unable to peek through a bitcast as its on the other side of a extract_subvector - which should go away once we finally allow shuffle combining across different vector widths (by making matchShuffleWithPACK using const SelectionDAG& we've gotten closer to this - see PR45974).	2021-07-18 14:25:28 +01:00
Jon Roelofs	970c655f5e	[AArch64][GlobalISel] Legalize bswap <2 x i16> Differential revision: https://reviews.llvm.org/D105935	2021-07-17 15:31:15 -07:00
David Green	cd0a482d93	[ARM] Lower non-extended small gathers via truncated gathers. Corollary to 1113e06821e6baffc84b8caf96a28bf62e6d28dc this allows us to match gather that dont produce a full vector width results. They use an extended gather which is truncated back to the original type.	2021-07-17 22:38:31 +01:00
Eli Friedman	d7fd0583bf	[AArch64] Prepare for changes to STEP_VECTOR. Rewrite patterns to assume that the operand of STEP_VECTOR is a constant. The old patterns will stop working when the operand is changed from a Constant to a TargetConstant. (See D105673.) Add test coverage for certain patterns that weren't exercised by existing regression tests. Differential Revision: https://reviews.llvm.org/D105847	2021-07-17 14:13:41 -07:00
Nikita Popov	17b52c557b	[OpaquePtr] Remove uses of CreateGEP() without element type Remove uses of to-be-deprecated API. In cases where the correct element type was not immediately obvious to me, fall back to explicit getPointerElementType().	2021-07-17 22:56:27 +02:00
Craig Topper	f02d48e3e1	[RISCV] Teach computeKnownBitsForTargetNode that VLENB will never be more than 65536/8.	2021-07-17 11:24:20 -07:00
Nikita Popov	2c50f2c7e2	[OpaquePtr] Remove uses of CreateConstGEP1_32() without element type Remove uses of to-be-deprecated API. I've fallen back to calling getPointerElementType() in some cases where the correct type wasn't immediately obvious to me.	2021-07-17 18:32:36 +02:00
Nikita Popov	2e74b1c954	[OpaquePtr] Remove uses of CreateConstGEP1_64() without element type Remove uses of to-be-deprecated API.	2021-07-17 16:43:20 +02:00
Nikita Popov	dd3e030cca	[BPF] Use elementtype attribute for preserve.array/struct.index intrinsics Use the elementtype attribute introduced in D105407 for the llvm.preserve.array/struct.index intrinsics. It carries the element type of the GEP these intrinsics effectively encode. This patch: * Adds a verifier check that the attribute is required. * Adds it in the IRBuilder methods for these intrinsics. * Autoupgrades old bitcode without the attribute. * Updates the lowering code to use the attribute rather than the pointer element type. * Updates lots of tests to specify the attribute. * Adds -force-opaque-pointers to the intrinsic-array.ll test to demonstrate they work now. https://reviews.llvm.org/D106184	2021-07-17 11:09:18 +02:00
Craig Topper	b2f709390a	[RISCV] Manually emit the best shift for VSCALE lowering to improve codegen. We assume VLENB is a multiple of 8 and previously relied on shift pairs being optimized to an AND+SHL/SHR and computeKnownBits removing the AND. This doesn't happen if (vlenb >> 3) gets CSEd to have multiple uses. This patch manually emits the best shift to workaround this.	2021-07-17 00:52:07 -07:00
jacquesguan	fa273ae7c0	[RISCV] Make VLEN no greater than 65536 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106134	2021-07-17 12:47:46 +08:00
Carl Ritson	9ee7bab63e	[AMDGPU] Tidy SReg/SGPR definitions using template class Use a multiclass to consistently define SReg/SGPR/TTMP register classes. Add missing TTMP registers for 96b, 160b, 192b, 224b. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D105800	2021-07-17 11:26:46 +09:00
Matt Arsenault	acc2bc5604	Mips/GlobalISel: Remove leftover dead code	2021-07-16 20:20:55 -04:00
David Green	37fc7e01a1	[ARM] Fix for matching reductions that are both sext and zext. Fix a silly mistake that was not making sure that _both_ operands were the correct extend code.	2021-07-16 23:11:42 +01:00
Nemanja Ivanovic	b425bc6346	[PowerPC] Implement intrinsics for mtfsf[i] This provides intrinsics for emitting instructions that set the FPSCR (`mtfsf/mtfsfi`). The patch also conservatively marks the rounding mode as an implicit def for both since they both may set the rounding mode depending on the operands. Reviewed By: #powerpc, qiucf Differential Revision: https://reviews.llvm.org/D105957	2021-07-16 16:26:11 -05:00
Jon Roelofs	1b0137f82a	[RISCV] Compose vector subregs hierarchically This fixes the test I broke in: https://reviews.llvm.org/D105953#2883579 Differential revision: https://reviews.llvm.org/D106168	2021-07-16 12:32:13 -07:00
Simon Pilgrim	6a5b1b579f	[X86][SSE] combineX86ShufflesRecursively - bail if constant folding fails due to oneuse limits. Fixes issue reported on D105827 where a single shuffle of a constant (with multiple uses) was caught in an infinite loop where one shuffle (UNPCKL) used an undef arg but then that got recombined to SHUFPS as the constant value had its own undef that confused matching.....	2021-07-16 19:21:46 +01:00
Lei Huang	a6ca36648b	[PowerPC] Implement XL compact math builtins Implement a subset of builtins required for compatiblilty with AIX XL compiler. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D105930	2021-07-16 13:21:13 -05:00
Craig Topper	ed610623c2	[RISCV] Use tail agnostic policy for fixed vector vwmacc(u). This adds new pseudoinstructions with ForceTailAgnostic set. This matches what we did for non-widening VMACC. We should move to a tail policy operand on the pseudos when we expand the intrinsic interface to include the tail policy.	2021-07-16 10:41:09 -07:00
Craig Topper	290e015b4f	[RISCV] Refactor where in the multiclass hierarchy we add commutable VFMADD/VFMACC instructions. NFC I'm preparing to add tail agnostic versions of VWMACC and VFWMACC so this will make them more consistent.	2021-07-16 10:41:09 -07:00
Guozhi Wei	7d6ba24baf	[X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB This patch transforms the sequence lea (reg1, reg2), reg3 sub reg3, reg4 to two sub instructions sub reg1, reg4 sub reg2, reg4 Similar optimization can also be applied to LEA/ADD sequence. The modifications to TwoAddressInstructionPass is to ensure the operands of ADD instruction has expected order (the dest register of LEA should be src register of ADD). Differential Revision: https://reviews.llvm.org/D104684	2021-07-16 10:16:03 -07:00
Craig Topper	84de2a93a3	[RISCV] Teach constant materialization that it can use zext.w at the end with Zba to reduce number of instructions. If the upper 32 bits are zero and bit 31 is set, we might be able to use zext.w to fill in the zeros after using an lui and/or addi. Most of this patch is plumbing the subtarget features into the constant materialization. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D105509	2021-07-16 09:35:56 -07:00
Craig Topper	ad9e84d420	[RISCV] Add curly braces around a case body that declares variables. NFC This is at the end of the switch so doesn't cause any issues now, but if a new case is added it will break.	2021-07-16 09:35:56 -07:00
Matt Arsenault	7e3f114325	Mips/GlobalISel: Use LLT form of getMachineMemOperand NFC here since it's just using a scalar anyway.	2021-07-16 11:41:32 -04:00
Masoud Ataei	33e29ff74a	[PowerPC] Updated the error message of MASSV pass to mention vectorization is needed be enable on P8 and later targets. Differential Revision: https://reviews.llvm.org/D106091	2021-07-16 14:45:09 +00:00
Amy Kwan	9f4ca39c11	[PowerPC] Update Refactored Load/Store Implementation, XForm VSX Patterns, and Tests This patch includes the following updates to the load/store refactoring effort introduced in D93370: - Update various VSX patterns that use to "force" an XForm, to instead just XForm. This allows the ability for the patterns to compute the most optimal addressing mode (and to produce a DForm instruction when possible) - Update pattern and test case for the LXVD2X/STXVD2X intrinsics - Update LIT test cases that use to use the XForm instruction to use the DForm instruction Differential Revision: https://reviews.llvm.org/D95115	2021-07-16 09:28:48 -05:00
Fraser Cormack	b8f7435c4c	Revert "[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID" This reverts commit a6ca88e908b5befcd9b0f8c8cb40f53095cc17bc. More caution is required to avoid overflow/underflow. Thanks to the santizers for catching this.	2021-07-16 15:00:20 +01:00
Matt Arsenault	f27b93051e	AMDGPU/GlobalISel: Preserve more memory types	2021-07-16 08:57:26 -04:00
Matt Arsenault	dc97583234	AMDGPU/GlobalISel: Redo kernel argument load handling This avoids relying on G_EXTRACT on unusual types, and also properly decomposes structs into multiple registers. This also preserves the LLTs in the memory operands.	2021-07-16 08:56:54 -04:00
Dmitry Preobrazhensky	aa2ec7ce25	[AMDGPU][MC] Added missing isCall/isBranch flags Added isCall for S_CALL_B64; added isBranch for S_SUBVECTOR_LOOP_*. Differential Revision: https://reviews.llvm.org/D106072	2021-07-16 14:59:10 +03:00
Nicholas Guy	340bf152dd	[AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling Specifying the latencies of specific LDP variants appears to improve performance almost universally. Differential Revision: https://reviews.llvm.org/D105882	2021-07-16 12:00:57 +01:00
Cullen Rhodes	3f31269929	[AArch64][SME] Add load and store instructions This patch adds support for following contiguous load and store instructions: * LD1B, LD1H, LD1W, LD1D, LD1Q * ST1B, ST1H, ST1W, ST1D, ST1Q A new register class and operand is added for the 32-bit vector select register W12-W15. The differences in the following tests which have been re-generated are caused by the introduction of this register class: * llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll * llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir * llvm/test/CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir * llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir D88663 attempts to resolve the issue with the store pair test differences in the AArch64 load/store optimizer. The GlobalISel differences are caused by changes in the enum values of register classes, tests have been updated with the new values. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D105572	2021-07-16 10:11:10 +00:00
Fraser Cormack	28d982df90	[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID This patch teaches the compiler to identify a wider variety of `BUILD_VECTOR`s which form integer arithmetic sequences, and to lower them to `vid.v` with modifications for non-unit steps and non-zero addends. The sequences handled by this optimization must either be monotonically increasing or decreasing. Consecutive elements holding the same value indicate a fractional step which, while simple mathematically, becomes more complex to handle both in the realm of lossy integer division and in the presence of `undef`s. For example, a common "interleaving" shuffle index will be lowered by LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR` nodes. Either of these would ideally be lowered to `vid.v` shifted right by 1. Detection of this sequence in presence of general `undef` values is more complicated, however: `<0,u,u,1,>` could match either `<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence. Both are possible, so backtracking or multiple passes is inevitable. Sticking to monotonic sequences keeps the logic simpler as it can be done in one pass. Fractional steps will likely be a separate optimization in a future patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104921	2021-07-16 10:35:13 +01:00
Mehdi Amini	7d809bb14e	Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer We can build it with -Werror=global-constructors now. This helps in situation where libSupport is embedded as a shared library, potential with dlopen/dlclose scenario, and when command-line parsing or other facilities may not be involved. Avoiding the implicit construction of these cl::opt can avoid double-registration issues and other kind of behavior. Reviewed By: lattner, jpienaar Differential Revision: https://reviews.llvm.org/D105959	2021-07-16 07:38:16 +00:00
Mehdi Amini	b708f244c7	Revert "Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer" This reverts commit af9321739b20becf170e6bb5060b8d780e1dc8dd. Still some specific config broken in some way that requires more investigation.	2021-07-16 07:35:13 +00:00
Mehdi Amini	64ec18abb6	Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer We can build it with -Werror=global-constructors now. This helps in situation where libSupport is embedded as a shared library, potential with dlopen/dlclose scenario, and when command-line parsing or other facilities may not be involved. Avoiding the implicit construction of these cl::opt can avoid double-registration issues and other kind of behavior. Reviewed By: lattner, jpienaar Differential Revision: https://reviews.llvm.org/D105959	2021-07-16 06:54:26 +00:00
Mehdi Amini	0fd38b8415	Revert "Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer" This reverts commit 42f588f39c5ce6f521e3709b8871d1fdd076292f. Broke some buildbots	2021-07-16 03:46:53 +00:00
Mehdi Amini	a9a8a9a361	Use ManagedStatic and lazy initialization of cl::opt in libSupport to make it free of global initializer We can build it with -Werror=global-constructors now. This helps in situation where libSupport is embedded as a shared library, potential with dlopen/dlclose scenario, and when command-line parsing or other facilities may not be involved. Avoiding the implicit construction of these cl::opt can avoid double-registration issues and other kind of behavior. Reviewed By: lattner, jpienaar Differential Revision: https://reviews.llvm.org/D105959	2021-07-16 03:33:20 +00:00
Matt Arsenault	240dff7427	GlobalISel: Track argument pointeriness with arg flags Since we're still building on top of the MVT based infrastructure, we need to track the pointer type/address space on the side so we can end up with the correct pointer LLTs when interpreting CCValAssigns.	2021-07-15 19:11:40 -04:00
Victor Huang	61ce66a632	[PowerPC] Add PowerPC population count, reversed load and store related builtins and instrinsics for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds the builtins and instrisics for population count, reversed load and store related operations. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D106021	2021-07-15 17:23:56 -05:00
Harald van Dijk	f675df37ba	[X86] Fix handling of maskmovdqu in X32 The maskmovdqu instruction is an odd one: it has a 32-bit and a 64-bit variant, the former using EDI, the latter RDI, but the use of the register is implicit. In 64-bit mode, a 0x67 prefix can be used to get the version using EDI, but there is no way to express this in assembly in a single instruction, the only way is with an explicit addr32. This change adds support for the instruction. When generating assembly text, that explicit addr32 will be added. When not generating assembly text, it will be kept as a single instruction and will be emitted with that 0x67 prefix. When parsing assembly text, it will be re-parsed as ADDR32 followed by MASKMOVDQU64, which still results in the correct bytes when converted to machine code. The same applies to vmaskmovdqu as well. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103427	2021-07-15 22:56:08 +01:00
Jessica Paquette	9f8303a62c	[AArch64][GlobalISel] Clamp <n x p0> vecs when legalizing G_EXTRACT_VECTOR_ELT This case was missing from G_EXTRACT_VECTOR_ELT. It's the same as for s64. https://godbolt.org/z/Tnq4acY8z Differential Revision: https://reviews.llvm.org/D105952	2021-07-15 14:05:28 -07:00
Artem Belevich	0226799a56	[NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction. That should allow clang to compile mma.h from CUDA-11.3. Differential Revision: https://reviews.llvm.org/D105384	2021-07-15 12:02:09 -07:00
Sushma Unnibhavi	f7010a4bb7	[M68k][GloballSel] LegalizerInfo implementation Added rules for G_ADD, G_SUB, G_MUL, G_UDIV to be legal. Differential Revision: https://reviews.llvm.org/D105536	2021-07-15 13:00:43 -06:00
Sam Tebbs	d16f1096a9	[ARM][LowOverheadLoops] Make some stack spills valid for tail predication This patch makes vector spills valid for tail predication when all loads from the same stack slot are within the loop Differential Revision: https://reviews.llvm.org/D105443	2021-07-15 19:23:52 +01:00
Quinn Pham	ba35dd5a19	[PowerPC] Fix popcntb XL Compat Builtin for 32bit This patch implements the `__popcntb` XL compatibility builtin for 32bit in the frontend and backend. This patch also updates tests for `__popcntb` and other XL Compat sync related builtins. Reviewed By: #powerpc, nemanjai, amyk Differential Revision: https://reviews.llvm.org/D105360	2021-07-15 13:19:47 -05:00
Stanislav Mekhanoshin	5c27957af8	[AMDGPU] Refine -O0 and -O1 passes. Differential Revision: https://reviews.llvm.org/D105579	2021-07-15 09:51:54 -07:00
David Green	47a6aacab4	[ARM] Expand types handled in VQDMULH recognition We have a DAG combine for recognizing the sequence of nodes that make up an MVE VQDMULH, but only currently handles specifically legal types. This patch expands that to other power-2 vector types. For smaller than legal types this means any_extending the type and casting it to a legal type, using a VQDMULH where we only use some of the lanes. The result is sign extended back to the original type, to properly set the invalid lanes. Larger than legal types are split into chunks with extracts and concat back together. Differential Revision: https://reviews.llvm.org/D105814	2021-07-15 14:47:53 +01:00
Simon Pilgrim	87679ebe78	[TTI] Consistently make getMinVectorRegisterBitWidth() methods const. NFCI. The underlying getMinVectorRegisterBitWidth() methods are const, but it was missed in a couple of TargetTransformInfo wrappers. Noticed while working on D103925	2021-07-15 13:27:55 +01:00
Irina Dobrescu	16f19bf26e	[AArch64][GlobalISel] Optimise lowering for some vector types for min/max Differential Revision: https://reviews.llvm.org/D105696	2021-07-15 11:34:32 +01:00
Sebastian Neubauer	b992832f74	[AMDGPU] Use isMetaInstruction for instruction size Meta instructions have a size of 0. Use isMetaInstruction instead of listing them explicitly. Differential Revision: https://reviews.llvm.org/D106043	2021-07-15 12:23:11 +02:00
Cullen Rhodes	c55c74c634	[AArch64][SME] Add outer product instructions This patch adds support for the following outer product instructions: * BFMOPA, BFMOPS, FMOPA, FMOPS, SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA, UMOPS, USMOPA, USMOPS. Depends on D105570. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D105571	2021-07-15 09:51:06 +00:00
Bogdan Graur	baec013412	Fixes memory sanitizer 'use-of-uninitialized-value' diagnostic. Differential Revision: https://reviews.llvm.org/D106047	2021-07-15 11:17:04 +02:00
Kai Luo	bb52bc77a5	[PowerPC] Generate inlined quadword lock free atomic operations via AtomicExpand This patch uses AtomicExpandPass to implement quadword lock free atomic operations. It adopts the method introduced in https://reviews.llvm.org/D47882, which expand atomic operations post RA to avoid spilling that might prevent LL/SC progress. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D103614	2021-07-15 01:12:09 +00:00
Thomas Lively	3c50e4a7a7	[WebAssembly] Codegen for v128.storeX_lane instructions Replace the experimental clang builtins and LLVM intrinsics for these instructions with normal codegen patterns. Resolves PR50435. Differential Revision: https://reviews.llvm.org/D106019	2021-07-14 16:15:25 -07:00
Jon Roelofs	ffc7470172	[AArch64] Fix selection of G_UNMERGE <2 x s16> Differential revision: https://reviews.llvm.org/D106007	2021-07-14 13:40:56 -07:00
Stanislav Mekhanoshin	1dae83cabb	[AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization Any def of EXEC prevents rematerialization of any VOP instruction because of the physreg use. Create a callback to check if the physreg use can be ingored to allow rematerialization. Differential Revision: https://reviews.llvm.org/D105836	2021-07-14 13:03:58 -07:00
David Green	5da5b28644	[ARM] Move add(VMLALVA(A, X, Y), B) to VMLALVA(add(A, B), X, Y) For i64 reductions we currently try and convert add(VMLALV(X, Y), B) to VMLALVA(B, X, Y), incorporating the addition into the VMLALVA. If we have an add of an existing VMLALVA, this patch pushes the add up above the VMLALVA so that it may potentially be simplified further, for example being folded into another VMLALV. Differential Revision: https://reviews.llvm.org/D105686	2021-07-14 20:06:49 +01:00
Eli Friedman	0af449d2a7	[SelectionDAG] Add an overload of getStepVector that assumes step 1. This is mostly a minor convenience, but the pattern seems frequent enough to be worthwhile (and we'll probably add more uses in the future). Differential Revision: https://reviews.llvm.org/D105850	2021-07-14 11:37:01 -07:00

1 2 3 4 5 ...

63656 Commits