llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 04:02:41 +01:00

Author	SHA1	Message	Date
Craig Topper	a7bbefe73d	[LegalizeTypes][VE] Don't Expand BITREVERSE/BSWAP during type legalization promotion if they will be promoted for NVT in op legalization. We were trying to expand these if they were going to be expanded in op legalization so that we generated the minimum number of operations. We failed to take into account that NVT could be promoted to another legal type in op legalization. Hoping this fixes the issue on the VE target reported as a follow up to D96681. The check line changes were taken from before 1e46b6f4012399a2fef5fbbb4ed06fc919835414 so this patch does appear to improve some cases that had previously regressed.	2021-06-29 11:00:11 -07:00
Jeremy Morse	cc491fbe18	Catch an extremely obvious memory leak, thanks asan https://lab.llvm.org/buildbot/#/builders/5/builds/9208 (dbg-phis-merging-in-ldv.mir and dbg-phis-with-loops.mir in the asan check stage)	2021-06-29 15:47:17 +01:00
Jeremy Morse	62063d6d86	[DebugInstrRef][3/3] Follow DBG_PHI instructions through LiveDebugValues This patch reads machine value numbers from DBG_PHI instructions (marking where SSA PHIs used to be), and matches them up with DBG_INSTR_REF instructions that refer to them. Essentially they are two separate parts of a DBG_VALUE: the place to read the value (register and program position), and where the variable is assigned that value. Sometimes these DBG_PHIs can be duplicated, usually by tail duplication. This corresponds to the SSA structure of the program being destroyed, and the original PHI being split. When this happens: run LLVMs standard SSAUpdater utility, to work out what values should appear in which blocks. The majority of this patch is boilerplate to make use of SSAUpdater. If there are any additional PHIs on the path between multiple DBG_PHIs and their using DBG_INSTR_REF, their existance is validated, just in case a value gets clobbered along the way (see dbg-phis-with-loops.mir for several examples). Differential Revision: https://reviews.llvm.org/D86814	2021-06-29 14:45:13 +01:00
Michael Liao	58a44149ab	[MIRParser] Add machine metadata. - Add standalone metadata parsing support so that machine metadata nodes could be populated before and accessed during MIR is parsed. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103282	2021-06-28 22:29:36 -04:00
Scott Linder	47e3a5ca06	[DebugInfo] Enforce implicit constraints on `distinct` MDNodes Add UNIQUED and DISTINCT properties in Metadata.def and use them to implement restrictions on the `distinct` property of MDNodes: * DIExpression can currently be parsed from IR or read from bitcode as `distinct`, but this property is silently dropped when printing to IR. This causes accepted IR to fail to round-trip. As DIExpression appears inline at each use in the canonical form of IR, it cannot actually be `distinct` anyway, as there is no syntax to describe it. * Similarly, DIArgList is conceptually always uniqued. It is currently restricted to only appearing in contexts where there is no syntax for `distinct`, but for consistency it is treated equivalently to DIExpression in this patch. * DICompileUnit is already restricted to always being `distinct`, but along with adding general support for the inverse restriction I went ahead and described this in Metadata.def and updated the parser to be general. Future nodes which have this restriction can share this support. The new UNIQUED property applies to DIExpression and DIArgList, and forbids them to be `distinct`. It also implies they are canonically printed inline at each use, rather than via MDNode ID. The new DISTINCT property applies to DICompileUnit, and requires it to be `distinct`. A potential alternative change is to forbid the non-inline syntax for DIExpression entirely, as is done with DIArgList implicitly by requiring it appear in the context of a function. For example, we would forbid: !named = !{!0} !0 = !DIExpression() Instead we would only accept the equivalent inlined version: !named = !{!DIExpression()} This essentially removes the ability to create a `distinct` DIExpression by construction, as there is no syntax for `distinct` inline. If this patch is accepted as-is, the result would be that the non-canonical version is accepted, but the following would be an error and produce a diagnostic: !named = !{!0} ; error: 'distinct' not allowed for !DIExpression() !0 = distinct !DIExpression() Also update some documentation to consistently use the inline syntax for DIExpression, and to describe the restrictions on `distinct` for nodes where applicable. Reviewed By: StephenTozer, t-tye Differential Revision: https://reviews.llvm.org/D104827	2021-06-28 21:20:04 +00:00
Melanie Blower	423a70f3f3	[llvm][clang][fpenv] Create new intrinsic llvm.arith.fence to control FP optimization at expression level This intrinsic blocks floating point transformations by the optimizer. Author: Pengfei Reviewed By: LuoYuanke, Andy Kaylor, Craig Topper, kpn Differential Revision: https://reviews.llvm.org/D99675	2021-06-28 12:26:52 -04:00
Sander de Smalen	a82143ea32	Reland [GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize. This patch relands https://reviews.llvm.org/D104454, but fixes some failing builds on Mac OS which apparently has a different definition for size_t, that caused 'ambiguous operator overload' for the implicit conversion of TypeSize to a scalar value. This reverts commit b732e6c9a8438e5204ac96c8ca76f9b11abf98ff.	2021-06-28 15:24:27 +01:00
Ahsan Saghir	74fd9c1f8e	Teach peephole optimizer to not emit sub-register defs Peephole optimizer should not be introducing sub-reg definitions as they are illegal in machine SSA phase. This patch modifies the optimizer to not emit sub-register definitions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103408	2021-06-28 09:24:07 -05:00
Brendon Cahoon	16a6bb6581	[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX Adds legalizer, register bank select, and instruction select support for G_SBFX and G_UBFX. These opcodes generate scalar or vector ALU bitfield extract instructions for AMDGPU. The instructions allow both constant or register values for the offset and width operands. The 32-bit scalar version is expanded to a sequence that combines the offset and width into a single register. There are no 64-bit vgpr bitfield extract instructions, so the operations are expanded to a sequence of instructions that implement the operation. If the width is a constant, then the 32-bit bitfield extract instructions are used. Moved the AArch64 specific code for creating G_SBFX to CombinerHelper.cpp so that it can be used by other targets. Only bitfield extracts with constant offset and width values are handled currently. Differential Revision: https://reviews.llvm.org/D100149	2021-06-28 09:06:44 -04:00
David Blaikie	dfefcbf53b	PR37255: DebugInfo: LTO with -g inlined into -gmlt combined with Split DWARF without CU cross-references A combination of features ^ that lead to a mismatch of expectations about how a subprogram definition DIE would be produced with/without a declaration when taking full -g debug info and inlining it into a -gmlt CU - specifically when using Split DWARF that doesn't support cross-CU references, so we have to put the -g debug info into the -gmlt CU, which gets confusing about which mode is respected. This patch comes down on respecting the CU the debug info is emitted into, rather than preserving the full debug info when it's emitted into the gmlt CU.	2021-06-27 14:40:38 -07:00
David Green	6a316cf978	[ISel] Port AArch64 SABD and UABD to DAGCombine This ports the AArch64 SABD and USBD over to DAG Combine, where they can be used by more backends (notably MVE in a follow-up patch). The matching code has changed very little, just to handle legal operations and types differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing a ubds/abdu which is zexted to the original type. Differential Revision: https://reviews.llvm.org/D91937	2021-06-26 19:34:16 +01:00
David Green	20d5a26896	[DAG] Fold neg(splat(neg(x)) -> splat(x) This add as a fold of sub(0, splat(sub(0, x))) -> splat(x). This can come up in the lowering of right shifts under AArch64, where we generate a shift left of a negated number. Differential Revision: https://reviews.llvm.org/D103755	2021-06-25 19:53:29 +01:00
Hendrik Greving	36b5d3f1af	[ModuloSchedule] Pass loop block explicitly to kernel rewriter. This change is NFC upstream. We pass in the loop's block to the kernel rewriter explicitly, instead of assuming it's the loop's top block. This change is made for downstream targets where this assumption doesn't hold. Differential Revision: https://reviews.llvm.org/D104811	2021-06-25 09:51:22 -07:00
Sander de Smalen	4d07cbe876	Revert "[GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize." This patch seems to be causing build errors, reverting it for now. This reverts commit aeab9d9570ac8cb554aff6e1af24a471fdf5b4e5.	2021-06-25 17:37:16 +01:00
Sander de Smalen	9d34fb6e49	[GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize. To reflect that the size may be scalable, a TypeSize is returned instead of an unsigned. In places where the result is used, it currently relies on an implicit cast of TypeSize -> uint64_t, which asserts that the type is not scalable. This patch is NFC for fixed-width vectors. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104454	2021-06-25 17:06:50 +01:00
Sander de Smalen	5eab663b62	[GlobalISel] NFC: Change LLT::changeNumElements to LLT::changeElementCount. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104453	2021-06-25 15:54:00 +01:00
Sander de Smalen	88c55d538f	[GlobalISel] NFC: Change LLT::scalarOrVector to take ElementCount. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104452	2021-06-25 11:26:16 +01:00
Martin Storsjö	9d14adb9f6	[llvm] Rename StringRef _lower() method calls to _insensitive() This is a mechanical change. This actually also renames the similarly named methods in the SmallString class, however these methods don't seem to be used outside of the llvm subproject, so this doesn't break building of the rest of the monorepo.	2021-06-25 00:22:01 +03:00
Craig Topper	c1782c733f	[TargetLowering][ARM] Don't alter opaque constants in TargetLowering::ShrinkDemandedConstant. We don't constant fold based on demanded bits elsewhere in SimplifyDemandedBits, so I don't think we should shrink them either. The affected ARM test changes because a constant become non-opaque and eventually enabled some constant folding. This no longer happens. I checked and InstCombine is able to simplify this test. I'm not sure exactly what it was trying to test. Reviewed By: lebedev.ri, dmgreen Differential Revision: https://reviews.llvm.org/D104832	2021-06-24 10:09:36 -07:00
Sander de Smalen	ac11cfc716	[GlobalISel] NFC: Change LLT::vector to take ElementCount. This also adds new interfaces for the fixed- and scalable case: * LLT::fixed_vector * LLT::scalable_vector The strategy for migrating to the new interfaces was as follows: * If the new LLT is a (modified) clone of another LLT, taking the same number of elements, then use LLT::vector(OtherTy.getElementCount()) or if the number of elements is halfed/doubled, it uses .divideCoefficientBy(2) or operator. That is because there is no reason to specifically restrict the types to 'fixed_vector'. If the algorithm works on the number of elements (as unsigned), then just use fixed_vector. This will need to be fixed up in the future when modifying the algorithm to also work for scalable vectors, and will need then need additional tests to confirm the behaviour works the same for scalable vectors. * If the test used the '/Scalable=/true` flag of LLT::vector, then this is replaced by LLT::scalable_vector. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104451	2021-06-24 11:26:12 +01:00
Stephen Tozer	782c047ef4	Partial Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This is a partial reapply of the original commit and the followup commit that were previously reverted; this reapply also includes a small fix for a potential source of non-determinism, but also has a small change to turn off variadic debug value salvaging, to ensure that any future revert/reapply steps to disable and renable this feature do not risk causing conflicts. Differential Revision: https://reviews.llvm.org/D91722 This reverts commit 386b66b2fc297cda121a3cc8a36887a6ecbcfc68.	2021-06-24 09:46:38 +01:00
Carl Ritson	e6a4177023	[ValueTypes] Define MVTs for v3i64/v3f64 to complement v6i32/v6f32 Having type symmetry with these is somewhat necessary when implementing support for 192-bit values. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104621	2021-06-24 12:41:22 +09:00
modimo	e702056af0	[NFC] [DwarfEHPrepare] Add additional stats for EH Stats added: 1. NumCleanupLandingPadsUnreachable: how many cleanup landing pads were optimized as unreachable 1. NumCleanupLandingPadsRemaining: how many cleanup landing pads remain 1. NumNoUnwind: Number of functions with nounwind attribute 1. NumUnwind: Number of functions with unwind attribute DwarfEHPrepare is always run a single time as part of `TargetPassConfig::addISelPasses()` which makes it an ideal place near the end of the pipeline to record this information. Example output from clang built with exceptions cumulative during thinLTO backend (NumCleanupLandingPadsUnreachable was not incremented): "dwarfehprepare.NumCleanupLandingPadsRemaining": 123660, "dwarfehprepare.NumNoUnwind": 323836, "dwarfehprepare.NumUnwind": 472893, Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D104161	2021-06-23 17:09:30 -07:00
Craig Topper	db11347a21	[CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt. This optimization pre-promotes the input and constants for a switch instruction to a legal type so that all the generated compares share the same extend. Since RISCV prefers sext for i32 to i64 extends, we should honor that to use sext.w instead of a pair of shifts. Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D104612	2021-06-23 15:38:11 -07:00
Xun Li	3626fc401a	[SjLj] Insert UnregisterFn before musttail call When inserting UnregisterFn, if there is a musttail call, we must insert before the call so that we don't break the musttail call contract. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D104807	2021-06-23 15:33:55 -07:00
Xun Li	99fd766094	Revert "[SjLj] Insert UnregisterFn before musttail call" This reverts commit f36703ada3dc18388ef5cdcbb8f39f74c27ad8e9. Test failure: https://lab.llvm.org/buildbot#builders/104/builds/3450	2021-06-23 15:31:35 -07:00
Xun Li	6c523b4fc3	[SjLj] Insert UnregisterFn before musttail call When inserting UnregisterFn, if there is a musttail call, we must insert before the call so that we don't break the musttail call contract. Differential Revision: https://reviews.llvm.org/D104807	2021-06-23 14:29:46 -07:00
Jinsong Ji	4cae41ddbe	[DAGCombine] Check reassoc flags in aggressive fsub fusion The is from discussion in https://reviews.llvm.org/D104247#inline-993387 The contract and reassoc flags shouldn't imply each other . All the aggressive fsub fusion reassociate operations, we should guard them with reassoc flag check. Reviewed By: mcberg2017 Differential Revision: https://reviews.llvm.org/D104723	2021-06-23 13:59:40 +00:00
Jon Roelofs	f2b70884ff	[Remarks] Make memsize remarks report as an analysis, not a missed opportunity. Differential revision: https://reviews.llvm.org/D104078	2021-06-22 18:22:47 -07:00
zhijian	3f6513e8c5	[AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table. Summary: generate eh_info when vector registers are saved according to the traceback table. struct eh_info_t { unsigned version; /* EH info version 0 / #if defined(64BIT) char _pad[4]; / padding / #endif unsigned long lsda; / Pointer to Language Specific Data Area / unsigned long personality; / Pointer to the personality routine */ }; the value of lsda and personality is zero when the number of vector registers saved is large zero and there is not personality of the function Reviewers: Jason Liu Differential Revision: https://reviews.llvm.org/D103651	2021-06-22 13:01:31 -04:00
Fangrui Song	61ffec433f	Improve the diagnostic of DiagnosticInfoResourceLimit (and warn-stack-size in particular) Before: `warning: stack size limit exceeded (888) in main` After: `warning: stack frame size (888) exceeds limit (100) in function 'main'` (the -Wframe-larger-than limit will be mentioned) Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D104667	2021-06-22 09:55:20 -07:00
Sander de Smalen	fd053d5ffe	[GlobalISel] Add scalable property to LLT types. This patch aims to add the scalable property to LLT. The rest of the patch-series changes the interfaces to take/return ElementCount and TypeSize, which both have the ability to represent the scalable property. The changes are mostly mechanical and aim to be non-functional changes for fixed-width vectors. For scalable vectors some unit tests have been added, but no effort has been put into making any of the GlobalISel algorithms work with scalable vectors yet. That will be left as future work. The work is split into a series of 5 patches to make reviews easier. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D104450	2021-06-22 08:43:34 +01:00
Eli Friedman	0c81356419	Rename MachineMemOperand::getOrdering -> getSuccessOrdering. Since this method can apply to cmpxchg operations, make sure it's clear what value we're actually retrieving. This will help ensure we don't accidentally ignore the failure ordering of cmpxchg in the future. We could potentially introduce a getOrdering() method on AtomicSDNode that asserts the operation isn't cmpxchg, but not sure that's worthwhile. Differential Revision: https://reviews.llvm.org/D103338	2021-06-21 16:49:27 -07:00
Nick Desaulniers	2aca733d9e	[IR] convert warn-stack-size from module flag to fn attr Otherwise, this causes issues when building with LTO for object files that use different values. Link: https://github.com/ClangBuiltLinux/linux/issues/1395 Reviewed By: dblaikie, MaskRay Differential Revision: https://reviews.llvm.org/D104342	2021-06-21 15:09:25 -07:00
Jinsong Ji	4d8c0e830b	[DAGCombine] reassoc flag shouldn't enable contract According to IR LangRef, the FMF flag: contract Allow floating-point contraction (e.g. fusing a multiply followed by an addition into a fused multiply-and-add). reassoc Allow reassociation transformations for floating-point instructions. This may dramatically change results in floating-point. My understanding is that these two flags shouldn't imply each other, as we might have a SDNode that can be reassociated with others, but not contractble. eg: We may want following fmul/fad/fsub to freely reassoc, but don't want fma being generated here. %F = fmul reassoc double %A, %B ; <double> [#uses=1] %G = fmul reassoc double %C, %D ; <double> [#uses=1] %H = fadd reassoc double %F, %G ; <double> [#uses=1] %I = fsub reassoc double %H, %E ; <double> [#uses=1] Before https://reviews.llvm.org/D45710, `reassoc` flag actually did not imply isContratable either. The current implementation also only check the flag in fadd node, ignoring fmul node, this patch update that as well. Reviewed By: spatel, qiucf Differential Revision: https://reviews.llvm.org/D104247	2021-06-21 21:15:43 +00:00
Hendrik Greving	6abba09064	RegisterCoalescer: Fix iterating through use operands. Fixes a minor bug when trying to iterate through use operands when updating debug use operands. Extends a test to include above. Differential Revision: https://reviews.llvm.org/D104576	2021-06-21 09:17:54 -07:00
Craig Topper	0dab191ced	[TypePromotion] Prune Intrinsic includes. NFC TypePromotion is meant to be a generic pass and doesn't reference any ARM intrinsics so it shouldn't include IntrinsicsARM.h. The other Intrinsic related headers appear to be unneeded as well.	2021-06-20 13:04:02 -07:00
Fangrui Song	2beef4520d	Simplify some typedef struct	2021-06-19 11:36:44 -07:00
Michael Liao	d21f701c76	[MIRPrinter] Add machine metadata support. - Distinct metadata needs generating in the codegen to attach correct AAInfo on the loads/stores after lowering, merging, and other relevant transformations. - This patch adds 'MachhineModuleSlotTracker' to help assign slot numbers to these newly generated unnamed metadata nodes. - To help 'MachhineModuleSlotTracker' track machine metadata, the original 'SlotTracker' is rebased from 'AbstractSlotTrackerStorage', which provides basic interfaces to create/retrive metadata slots. In addition, once LLVM IR is processsed, additional hooks are also introduced to help collect machine metadata and assign them slot numbers. - Finally, if there is any such machine metadata, 'MIRPrinter' outputs an additional 'machineMetadataNodes' field containing all the definition of those nodes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103205	2021-06-19 12:48:08 -04:00
Hongtao Yu	7fbb587058	[CSSPGO] Undoing the concept of dangling pseudo probe As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen. I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D104477	2021-06-18 15:14:11 -07:00
Simon Pilgrim	ccd7a98f97	[DAG] SelectionDAG::computeKnownBits - use APInt::insertBits to merge subvector knownbits. NFCI. As noticed on D104472 we can use APInt::insertBits which will avoid a lot of temporary APInt creations	2021-06-18 14:59:01 +01:00
Jon Roelofs	921bd72ae7	[GISel] Eliminate redundant bitmasking This was a GISel vs SDAG regression that showed up at -Os on arm64 in: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding.test https://llvm.godbolt.org/z/aecjodsjG Differential revision: https://reviews.llvm.org/D103334	2021-06-17 12:53:00 -07:00
David Green	751ee64aee	[InterleaveAccess] Copy fast math flags when adjusting binary operators in interleave access pass The Interleave Access pass will convert shuffle(binop(load, load)) to binop(shuffle(load), shuffle(load)), in order to create more interleaving load patterns (VLD2/3/4) that might have been messed up by instcombine. As shown in D104247 we were missing copying IR flags to the new instruction though, which should just be kept the same as the original instruction. Differential Revision: https://reviews.llvm.org/D104255	2021-06-17 09:53:33 +01:00
Bjorn Pettersson	29ffba4b56	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit 0ee439b705e82a4fe20e2, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seem to have been missing in that patch was to make sure that the rest of LLVM also handle that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions, typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Sushma Unnibhavi	900cbf02d0	[M68k][GloballSel] Adding initial GlobalISel infrastructure Wiring up GlobalISel for the M68k backend Differential Revision: https://reviews.llvm.org/D101819	2021-06-16 10:48:38 -06:00
David Spickett	0a8120a8f5	[llvm][AArch64] Handle arrays of struct properly (from IR) This only applies to FastIsel. GlobalIsel seems to sidestep the issue. This fixes https://bugs.llvm.org/show_bug.cgi?id=46996 One of the things we do in llvm is decide if a type needs consecutive registers. Previously, we just checked if it was an array or not. (plus an SVE specific check that is not changing here) This causes some confusion when you arbitrary IR like: ``` %T1 = type { double, i1 }; define [ 1 x %T1 ] @foo() { entry: ret [ 1 x %T1 ] zeroinitializer } ``` We see it is an array so we call CC_AArch64_Custom_Block which bails out when it sees the i1, a type we don't want to put into a block. This leaves the location of the double in some kind of intermediate state and leads to odd codegen. Which then crashes the backend because it doesn't know how to implement what it's been asked for. You get this: ``` renamable $d0 = FMOVD0 $w0 = COPY killed renamable $d0 ``` Rather than this: ``` $d0 = FMOVD0 $w0 = COPY $wzr ``` The backend knows how to copy 64 bit to 64 bit registers, but not 64 to 32. It can certainly be taught how but the real issue seems to be us even trying to assign a register block in the first place. This change makes the logic of AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters a bit more in depth. If we find an array, also check that all the nested aggregates in that array have a single member type. Then CC_AArch64_Custom_Block's assumption of a type that looks like [ N x type ] will be valid and we get the expected codegen. New tests have been added to exercise these situations. Note that some of the output is not ABI compliant. The aim of this change is to simply handle these situations and not to make our processing of arbitrary IR ABI compliant. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D104123	2021-06-16 13:56:01 +00:00
Dylan Fleming	17f5ba8d25	[SVE] Fix PromoteIntRes_TRUNCATE not to call getVectorNumElements Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D104115	2021-06-16 13:09:43 +01:00
Rong Xu	022ca8be28	[SampleFDO] Place the discriminator flag variable into the used list. We create flag variable "__llvm_fs_discriminator__" in the binary to indicate that FSAFDO hierarchical discriminators are used. This variable might be GC'ed by the linker since it is not explicitly reference. I initially added the var to the use list in pass MIRFSDiscriminator but it did not work. It turned out the used global list is collected in lowering (before MIR pass) and then emitted in the end of pass pipeline. Here I add the variable to the use list in IR level's AddDiscriminators pass. The machine level code is still keep in the case IR's AddDiscriminators is not invoked. If this is the case, this just use -Wl,--export-dynamic-symbol=__llvm_fs_discriminator__ to force the emit. Differential Revision: https://reviews.llvm.org/D103988	2021-06-15 21:51:04 -07:00
Rong Xu	17842d79ed	Revert "[SampleFDO] Using common linkage for the discriminator flag variable" This reverts commit 434fed5aff5e62460e2e984c7cc2674c12779b1e. Post commit review suggested to use another implmenentation. Detailed can be found in the review.	2021-06-15 21:22:23 -07:00
Rong Xu	42c862d6e9	[SampleFDO] Using common linkage for the discriminator flag variable We create flag variable "__llvm_fs_discriminator__" in the binary to indicate that FSAFDO hierarchical discriminators are used. This variable might be GC'ed by the linker since it is not explicitly reference. I initially added the var to the use list in pass MIRFSDiscriminator but it did not work. It turned out the used global list is collected in lowering (before MIR pass) and then emitted in the end of pass pipeline. In this patch, we use a "common" linkage for this variable so that it will be GC'ed by the linker. Differential Revision: https://reviews.llvm.org/D103988	2021-06-15 14:51:27 -07:00
Roman Lebedev	8401a57c05	[TLI] SimplifyDemandedVectorElts(): handle SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(?, 0)) Iff we have `SCALAR_TO_VECTOR` (and we demand it's only defined 0'th element), and said scalar was produced by `EXTRACT_VECTOR_ELT` from the 0'th element of some vector, then we can just continue traversal into said source vector. This comes up in X86 vector uniform shift lowering. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104250	2021-06-14 23:52:53 +03:00
Saleem Abdulrasool	c92efb7d33	SelectionDAG: repair the Windows build 6e5628354e22f3ca40b04295bac540843b8e6482 regressed the Windows build as the return type no longer matched in both branches for the return value type deduction. This uses a bit more compiler magic to deal with that.	2021-06-14 08:25:36 -07:00
zhijian	ad7e1ecf68	[AIX][XCOFF] emit vector info of traceback table. Summary: emit vector info of traceback table. Reviewers: Jason Liu,Hubert Tong Differential Revision: https://reviews.llvm.org/D93659	2021-06-14 11:15:22 -04:00
Roman Lebedev	7a71822528	[NFC][DAGCombine] Extract getFirstIndexOf() lambda back into a function Not all supported compilers like such lambdas, at least one buildbot is unhappy.	2021-06-14 16:25:59 +03:00
Roman Lebedev	9f4eaf3945	[DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size The sorting, obviously, must be stable, else we will have random assembly fluctuations. Apparently there was no test coverage that would benefit from that, so i've added one test. The sorting consists of two parts - just sort the input vectors, and recompute the shuffle mask -> input vector mapping. I don't believe we need to do anything else. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104187	2021-06-14 16:18:37 +03:00
Jeroen Dobbelaere	c08eaddde6	Intrinsic::getName: require a Module argument Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary. This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288 (as mentioned by @fhahn), after committing D91250. Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99173	2021-06-14 14:52:29 +02:00
RamNalamothu	a2306da6e0	Implement DW_CFA_LLVM_* for Heterogeneous Debugging Add support in MC/MIR for writing/parsing, and DebugInfo. This is part of the Extensions for Heterogeneous Debugging defined at https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html Specifically the CFI instructions implemented here are defined at https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#cfa-definition-instructions Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D76877	2021-06-14 08:51:50 +05:30
Simon Pilgrim	20d33f98a5	RegUsageInfoPropagate.cpp - remove unused <string> and <map> includes. NFCI.	2021-06-13 15:19:24 +01:00
Florian Hahn	f2662d35c8	Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB" This reverts commit 1b748faf2bae246e2fc77d88420df13c2e60f4df because it breaks building the llvm-test-suite with -verify-machineinstrs on X86: http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/9585/ Running llc -verify-machineinstr on X86 crashes on the IR below: target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" %struct.widget = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [16 x [16 x i16]], [6 x [32 x i32]], [16 x [16 x i32]], [4 x [12 x [4 x [4 x i32]]]], [16 x i32], i8, i32, i32*, i32, i32, i32, i32, i32, %struct.baz, %struct.wobble.1, i32, i32, i32, i32, i32, i32, %struct.quux.2, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32**, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [3 x [2 x i32]], i32, i32, i64, i64, %struct.zot.3, %struct.zot.3, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.baz = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, %struct.snork, %struct.wombat.0, %struct.wobble, i32, i32, i32, i32, i32, i32, i32, i32, i32 (%struct.widget, %struct.eggs), i32, i32, i32, i32 } %struct.snork = type { %struct.spam, %struct.zot, i32 (%struct.wombat, %struct.widget, %struct.snork) } %struct.spam = type { i32, i32, i32, i32, i8, i32 } %struct.zot = type { i32, i32, i32, i32, i32, i8, i32* } %struct.wombat = type { i32, i32, i32, i32, i32, i32, i32, i32, void (i32, i32, i32, i32), void (%struct.wombat, %struct.widget, %struct.zot)* } %struct.wombat.0 = type { [4 x [11 x %struct.quux]], [2 x [9 x %struct.quux]], [2 x [10 x %struct.quux]], [2 x [6 x %struct.quux]], [4 x %struct.quux], [4 x %struct.quux], [3 x %struct.quux] } %struct.quux = type { i16, i8 } %struct.wobble = type { [2 x %struct.quux], [4 x %struct.quux], [3 x [4 x %struct.quux]], [10 x [4 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]] } %struct.eggs = type { [1000 x i8], [1000 x i8], [1000 x i8], i32, i32, i32, i32, i32, i32, i32, i32 } %struct.wobble.1 = type { i32, [2 x i32], i32, i32, %struct.wobble.1, %struct.wobble.1, i32, [2 x [4 x [4 x [2 x i32]]]], i32, i64, i64, i32, i32, [4 x i8], [4 x i8], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.quux.2 = type { i32, i32, i32, i32, i32, %struct.quux.2* } %struct.zot.3 = type { i64, i16, i16, i16 } define void @blam(%struct.widget* %arg, i32 %arg1) local_unnamed_addr { bb: %tmp = load i32, i32* undef, align 4 %tmp2 = sdiv i32 %tmp, 6 %tmp3 = sdiv i32 undef, 6 %tmp4 = load i32, i32* undef, align 4 %tmp5 = icmp eq i32 %tmp4, 4 %tmp6 = select i1 %tmp5, i32 %tmp3, i32 %tmp2 %tmp7 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 0, i64 0 %tmp8 = zext i16 undef to i32 %tmp9 = zext i16 undef to i32 %tmp10 = load i16, i16* undef, align 2 %tmp11 = zext i16 %tmp10 to i32 %tmp12 = zext i16 undef to i32 %tmp13 = zext i16 undef to i32 %tmp14 = zext i16 undef to i32 %tmp15 = load i16, i16* undef, align 2 %tmp16 = zext i16 %tmp15 to i32 %tmp17 = zext i16 undef to i32 %tmp18 = sub nsw i32 %tmp8, %tmp9 %tmp19 = shl nsw i32 undef, 1 %tmp20 = add nsw i32 %tmp19, %tmp18 %tmp21 = sub nsw i32 %tmp11, %tmp12 %tmp22 = shl nsw i32 undef, 1 %tmp23 = add nsw i32 %tmp22, %tmp21 %tmp24 = sub nsw i32 %tmp13, %tmp14 %tmp25 = shl nsw i32 undef, 1 %tmp26 = add nsw i32 %tmp25, %tmp24 %tmp27 = sub nsw i32 %tmp16, %tmp17 %tmp28 = shl nsw i32 undef, 1 %tmp29 = add nsw i32 %tmp28, %tmp27 %tmp30 = sub nsw i32 %tmp20, %tmp29 %tmp31 = sub nsw i32 %tmp23, %tmp26 %tmp32 = shl nsw i32 %tmp30, 1 %tmp33 = add nsw i32 %tmp32, %tmp31 store i32 %tmp33, i32* undef, align 4 %tmp34 = mul nsw i32 %tmp31, -2 %tmp35 = add nsw i32 %tmp34, %tmp30 store i32 %tmp35, i32* undef, align 4 %tmp36 = select i1 %tmp5, i32 undef, i32 undef br label %bb37 bb37: ; preds = %bb %tmp38 = load i32, i32* undef, align 4 %tmp39 = ashr i32 %tmp38, %tmp6 %tmp40 = load i32, i32* undef, align 4 %tmp41 = sdiv i32 %tmp39, %tmp40 store i32 %tmp41, i32* undef, align 4 ret void }	2021-06-12 11:41:38 +01:00
Arthur Eubanks	5f73af13bd	[NFC][OpaquePtr] Explicitly pass GEP source type in optimizeGatherScatterInst() Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103480	2021-06-11 11:49:59 -07:00
Matt Arsenault	71dc8a693a	GlobalISel: Reduce indentation and remove dead path	2021-06-11 13:45:24 -04:00
Matt Arsenault	f133d28419	CodeGen: Fix missing const	2021-06-11 13:45:24 -04:00
Tomas Matheson	bb84fe4048	[CodeGen][regalloc] Don't align stack slots if the stack can't be realigned Register allocation may spill virtual registers to the stack, which can increase alignment requirements of the stack frame. If the the function did not require stack realignment before register allocation, the registers required to do so may not be reserved/available. This results in a stack frame that requires realignment but can not be realigned. Instead, only increase the alignment of the stack if we are still able to realign. The register SpillAlignment will be ignored if we can't realign, and the backend will be responsible for emitting the correct unaligned loads and stores. This seems to be the assumed behaviour already, e.g. ARMBaseInstrInfo::storeRegToStackSlot and X86InstrInfo::storeRegToStackSlot are both `canRealignStack` aware. Differential Revision: https://reviews.llvm.org/D103602	2021-06-11 16:49:12 +01:00
Simon Pilgrim	165132af1b	[ADT] Remove APInt/APSInt toString() std::string variants <string> is currently the highest impact header in a clang+llvm build: https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps. This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code. Differential Revision: https://reviews.llvm.org/D103888	2021-06-11 13:19:15 +01:00
Carl Ritson	15d8bd80ce	[ValueTypes] Define MVTs for v6i32, v6f32, v7i32, v7f32 For use in AMDGPU selection DAG. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103881	2021-06-11 08:58:16 +09:00
Carl Ritson	03819dac19	[SDAG] Fix pow2 assumption when splitting vectors When reducing vector builds to shuffles it possible that the DAG combiner may try to extract invalid subvectors. This happens as the existing code assumes vectors will be power of 2 sizes, which is already untrue, but becomes more noticable with v6 and v7 types. Specifically the existing code assumes that half PowerOf2Ceil of a given vector index will fit twice into a given vector. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103880	2021-06-11 08:58:16 +09:00
Wolfgang Pieb	249a0bf5b1	[static initializers] Emit global_ctors and global_dtors in reverse order when .ctors/.dtors are used. Reviewed By: rnk, MaskRay, efriedma Differential Revision: https://reviews.llvm.org/D103495	2021-06-10 16:44:47 -07:00
Nick Desaulniers	e9e1661fa2	[IR] make -warn-frame-size into a module attr -Wframe-larger-than= is an interesting warning; we can't know the frame size until PrologueEpilogueInsertion (PEI); very late in the compilation pipeline. -Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO and needed to be re-specified via -plugin-opt. Instead, make it part of the IR proper as a module level attribute, similar to D103048. Introduce -fwarn-stack-size CC1 option. Reviewed By: rsmith, qcolombet Differential Revision: https://reviews.llvm.org/D103928	2021-06-10 16:15:27 -07:00
David Spickett	17058150e4	Revert "Implementation of global.get/set for reftypes in LLVM IR" This reverts commit 31859f896cf90d64904134ce7b31230f374c3fcc. Causing SVE and RISCV-V test failures on bots.	2021-06-10 10:11:17 +00:00
Paulo Matos	fb9c8fd3dd	Implementation of global.get/set for reftypes in LLVM IR This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and lowering methods for load and stores of reference types from IR globals. Once the lowering creates the new nodes, tablegen pattern matches those and converts them to Wasm global.get/set. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D95425	2021-06-10 10:07:45 +02:00
Jinsong Ji	34315a15e8	[AIX] Add traceback ssp canary bit support We will need to set the ssp canary bit in traceback table to communicate with unwinder about the canary. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D103202	2021-06-10 02:40:02 +00:00
Sanjay Patel	56083624ee	[SDAG] fix miscompile from merging stores of different sizes As shown in: https://llvm.org/PR50623 ...and the similar tests here, we were not accounting for store merging of different sizes that do not cover the entire range of the wide value to be stored. This is the easy fix: just make sure that all of the original stores are the same size, so when we calculate the wide width, it's a simple N * M check. This still allows all of the motivating optimizations from: D86420 / 54a5dd485c4d D87112 / 7a06b166b1af We could enhance this code to track individual bytes and allow merging multiple sizes.	2021-06-09 09:51:39 -04:00
Fraser Cormack	cb0fa6245f	[ValueTypes][RISCV] Cap RVV fixed-length vectors by size This patch changes RVV's policy for its supported list of fixed-length vector types by capping by vector size rather than element count. Now all 1024-byte vectors (of supported element types) are supported, rather than all 256-element vectors. This is a more natural fit for the architecture, and allows us to, for example, improve the support for vector bitcasts. This change necessitated the adding of some new simple types to avoid "regressing" on the number of currently-supported vectors. We round out the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and `v512f16`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103884	2021-06-09 12:15:37 +01:00
Matt Arsenault	fca6ba66d2	GlobalISel: Avoid use of G_INSERT in insertParts G_INSERT legalization is incomplete and doesn't work very well. Instead try to use sequences of G_MERGE_VALUES/G_UNMERGE_VALUES padding with undef values (although this can get pretty large). For the case of load/store narrowing, this is still performing the load/stores in irregularly sized pieces. It might be cleaner to split this down into equal sized pieces, and rely on load/store merging to optimize it.	2021-06-08 14:44:24 -04:00
Matt Arsenault	96cc3e31f6	GlobalISel: Hide virtual register creation in MIRBuilder	2021-06-08 14:44:24 -04:00
Nick Desaulniers	42052632ff	reland [IR] make -stack-alignment= into a module attr Relands commit 433c8d950cb3a1fa0977355ce0367e8c763a3f13 with fixes for MIPS. Similar to D102742, specifying the stack alignment via CodegenOpts means that this flag gets dropped during LTO, unless the command line is re-specified as a plugin opt. Instead, encode this information as a module level attribute so that we don't have to expose this llvm internal flag when linking the Linux kernel with LTO. Looks like external dependencies might need a fix: * https://github.com/llvm-hs/llvm-hs/issues/345 * https://github.com/halide/Halide/issues/6079 Link: https://github.com/ClangBuiltLinux/linux/issues/1377 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D103048	2021-06-08 10:59:46 -07:00
Justin Bogner	521d48da10	[GlobalISel] Handle non-multiples of the base type in narrowScalarAddSub When narrowing G_ADD and G_SUB, handle types that aren't a multiple of the type we're narrowing to. This allows us to handle types like s96 on 64 bit targets. Note that the test here has a couple of dead instructions because of the way the setup legalizes. I wasn't able to come up with a way to write this test that avoids that easily. Differential Revision: https://reviews.llvm.org/D97811	2021-06-08 10:13:38 -07:00
Justin Bogner	5328161b2b	[GlobalISel] Handle non-multiples of the base type in narrowScalarInsert When narrowing G_INSERT, handle types that aren't a multiple of the type we're narrowing to. This comes up if we're narrowing something like an s96 to fit in 64 bit registers and also for non-byte multiple packed types if they come up. This implementation handles these cases by extending the extra bits to the narrow size and truncating the result back to the destination size. Differential Revision: https://reviews.llvm.org/D97791	2021-06-08 10:13:38 -07:00
Simon Pilgrim	1fe60d3f0a	InstrEmitter.cpp - don't dereference a dyn_cast<>. dyn_cast<> can return nullptr which we would then dereference - use cast<> which will assert that the type is correct.	2021-06-08 17:59:04 +01:00
Nick Desaulniers	579f298a64	Revert "[IR] make -stack-alignment= into a module attr" This reverts commit 433c8d950cb3a1fa0977355ce0367e8c763a3f13. Breaks the MIPS build.	2021-06-08 08:55:50 -07:00
Nick Desaulniers	5c936095e3	[IR] make -stack-alignment= into a module attr Similar to D102742, specifying the stack alignment via CodegenOpts means that this flag gets dropped during LTO, unless the command line is re-specified as a plugin opt. Instead, encode this information as a module level attribute so that we don't have to expose this llvm internal flag when linking the Linux kernel with LTO. Looks like external dependencies might need a fix: * https://github.com/llvm-hs/llvm-hs/issues/345 * https://github.com/halide/Halide/issues/6079 Link: https://github.com/ClangBuiltLinux/linux/issues/1377 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D103048	2021-06-08 08:31:04 -07:00
Simon Pilgrim	8eda87aacf	[DAG] foldShuffleOfConcatUndefs - ensure shuffles of upper (undef) subvector elements is undef (PR50609) shuffle(concat(x,undef),concat(y,undef)) -> concat(shuffle(x,y),shuffle(x,y)) If the original shuffle references any of the upper (undef) subvector elements, ensure the split shuffle masks uses undef instead of an out-of-bounds value. Fixes PR50609	2021-06-08 15:49:41 +01:00
Hans Wennborg	5697956ae9	Revert "3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"" > This reapplies c0f3dfb9, which was reverted following the discovery of > crashes on linux kernel and chromium builds - these issues have since > been fixed, allowing this patch to re-land. This reverts commit 36ec97f76ac0d8be76fb16ac521f55126766267d. The change caused non-determinism in the compiler, see comments on the code review at https://reviews.llvm.org/D91722. Reverting to unbreak people's builds until that can be addressed. This also reverts the follow-up "[DebugInfo] Limit the number of values that may be referenced by a dbg.value" in a0bd6105d80698c53ceaa64bbe6e3b7e7bbf99ee.	2021-06-08 14:54:08 +02:00
Kerry McLaughlin	d259f6577a	[CostModel] Return an invalid cost for memory ops with unsupported types Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector type cannot be legalized by widening/splitting. When this is the method of legalization found, getTypeLegalizationCost will return an Invalid cost. The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call getTypeLegalizationCost and will now also return an Invalid cost for unsupported types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102515	2021-06-08 12:07:36 +01:00
David Green	a2815f65e2	[DAG] Allow isNullOrNullSplat to see truncated zeroes This sets the AllowTruncation flag on isConstOrConstSplat in isNullOrNullSplat, allowing it to see truncated constant zeroes on architectures such as AArch64, where only a i32.i64 are legal. As a truncation of 0 is always 0, this should always be valid, allowing some extra folding to happen including some of the cases from D103755. Differential Revision: https://reviews.llvm.org/D103756	2021-06-08 10:18:58 +01:00
Arthur Eubanks	4cc1a31398	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" Needs to be discussed more. This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46 This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1 This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23 This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd	2021-06-07 16:07:44 -07:00
Matt Arsenault	eb4a9081cb	GlobalISel: Use MMO helper for getting the size in bits	2021-06-07 14:26:48 -04:00
Matt Arsenault	81260d0120	GlobalISel: Remove unnecessary .getReg(0)s	2021-06-07 14:26:48 -04:00
Guillaume Chatelet	517914e3e7	[NFC] Fix semantic discrepancy for MVT::LAST_VALUETYPE Differential Revision: https://reviews.llvm.org/D103251	2021-06-07 10:04:16 +00:00
Nikita Popov	6fa17c5cd4	[TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC) Don't require a specific kind of IRBuilder for TargetLowering hooks. This allows us to drop the IRBuilder.h include from TargetLowering.h. Differential Revision: https://reviews.llvm.org/D103759	2021-06-06 16:29:50 +02:00
Nikita Popov	435f6cc870	[TargetLowering] Move methods out of line (NFC) Move methods using IRBuilder out of line, so we can drop the dependency on the header.	2021-06-06 16:02:10 +02:00
Nikita Popov	a84c0ba160	[CodeGen] Add missing includes (NFC) These currently rely on the IRBuilder.h include in TargetLowering.h. Make them explicit.	2021-06-06 15:48:27 +02:00
Mirko Brkusanin	2afa995d4a	[AMDGPU][GlobalISel] Legalize G_ABS Legalize and select G_ABS so that we can use llvm.abs intrinsic Differential Revision: https://reviews.llvm.org/D102391	2021-06-04 14:46:43 +02:00
Jeremy Morse	f08c337615	Re-land ae4303b42c, "Track PHI values through register coalescing" Was reverted in 0507fc2ffc9, in phi-coalesce-subreg.mir I'd explicitly named some passes to run instead of specifying a range. As a result some two-address-instrs weren't correctly rewritten and the verifier got upset. Original commit message: [DebugInstrRef][2/3] Track PHI values through register coalescing In the instruction referencing variable location model, we store variable locations that point at PHIs in MachineFunction during register allocation. Unfortunately, register coalescing can substantially change the locations of registers, and so that PHI-variable-location side table needs maintenence during the pass. This patch builds an index from the side table, and whenever a vreg gets coalesced into another vreg, update the index to record the new vreg that the PHI happens in. It also accepts a limited range of subregister coalescing, for example merging a subregister into a larger class. Differential Revision: https://reviews.llvm.org/D86813	2021-06-04 11:32:02 +01:00
Fraser Cormack	5ee63b3d02	[SelectionDAG] Extend FoldConstantVectorArithmetic to SPLAT_VECTOR This patch extends the SelectionDAG's ability to constant-fold vector arithmetic to include support for SPLAT_VECTOR. This is not only for scalable-vector types but also for fixed-length vector types, which helps Hexagon in a couple of cases. The original RISC-V test case was in fact an infinite DAGCombine loop. The pattern `and (truncate v1), (truncate v2)` can be combined to `truncate (and v1, v2)` but the truncate can similarly be combined back to `truncate (and v1, v2)` (but, crucially, only when one of `v1` or `v2` is a constant vector). It wasn't exposed in on fixed-length types because a TRUNCATE of a constant BUILD_VECTOR was folded into the BUILD_VECTOR itself, whereas this did not happen for the equivalent (scalable-vector) SPLAT_VECTOR. Reviewed By: RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D103246	2021-06-04 09:53:15 +01:00
Esme-Yi	914b5eeb20	[Debug-Info] handle DW_CC_pass_by_value/DW_CC_pass_by_reference under strict DWARF. Summary: When -strict-dwarf=true is specified, the calling convention info DW_CC_pass_by_value or DW_CC_pass_by_reference can only be generated at DWARF5. Reviewed By: shchenz, dblaikie Differential Revision: https://reviews.llvm.org/D103300	2021-06-04 08:14:47 +00:00
Arthur Eubanks	b01ec7e228	[TargetLowering] Only inspect attributes in the arguments for ArgListEntry Parameter attributes are considered part of the function [1], and like mismatched calling conventions [2], we can't have the verifier check for mismatched parameter attributes. Issues can be diagnosed with D103412. [1] https://llvm.org/docs/LangRef.html#parameter-attributes [2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D101806	2021-06-03 15:52:01 -07:00
Brendon Cahoon	f5ff020d9c	[GlobalISel] Add G_SBFX/G_UBFX to computeKnownBits Differential Revision: https://reviews.llvm.org/D102969	2021-06-03 16:01:47 -04:00
Eli Friedman	3474ad7aad	[AtomicExpand] Merge cmpxchg success and failure ordering when appropriate. If we're not emitting separate fences for the success/failure cases, we need to pass the merged ordering to the target so it can emit the correct instructions. For the PowerPC testcase, we end up with extra fences, but that seems like an improvement over missing fences. If someone wants to improve that, the PowerPC backed could be taught to emit the fences after isel, instead of depending on fences emitted by AtomicExpand. Fixes https://bugs.llvm.org/show_bug.cgi?id=33332 . Differential Revision: https://reviews.llvm.org/D103342	2021-06-03 11:34:35 -07:00
Nikita Popov	1c866d4e4f	[ADT] Move DenseMapInfo for ArrayRef/StringRef into respective headers (NFC) This is a followup to D103422. The DenseMapInfo implementations for ArrayRef and StringRef are moved into the ArrayRef.h and StringRef.h headers, which means that these two headers no longer need to be included by DenseMapInfo.h. This required adding a few additional includes, as many files were relying on various things pulled in by ArrayRef.h. Differential Revision: https://reviews.llvm.org/D103491	2021-06-03 18:34:36 +02:00
Jeremy Morse	5cea2689df	Revert "[DebugInstrRef][2/3] Track PHI values through register coalescing" This reverts commit ae4303b42cfa5c8c14e3fff67d73af2f154aea9a. Expensive checks buildbot has found a problem with this: https://lab.llvm.org/buildbot/#/builders/16/builds/11863	2021-06-03 17:16:58 +01:00
Jeremy Morse	8e310e3825	[DebugInstrRef][2/3] Track PHI values through register coalescing In the instruction referencing variable location model, we store variable locations that point at PHIs in MachineFunction during register allocation. Unfortunately, register coalescing can substantially change the locations of registers, and so that PHI-variable-location side table needs maintenence during the pass. This patch builds an index from the side table, and whenever a vreg gets coalesced into another vreg, update the index to record the new vreg that the PHI happens in. It also accepts a limited range of subregister coalescing, for example merging a subregister into a larger class. Differential Revision: https://reviews.llvm.org/D86813	2021-06-03 17:06:51 +01:00
Fraser Cormack	69bae66509	[CodeGen] Fix a scalable-vector crash in VSELECT legalization The `DAGTypeLegalizer::WidenVSELECTMask` function is not (yet) ready for scalable vector types, and has numerous places in which it tries to grab either the fixed size or number of elements of its types. I believe that it should be possible to update this method to properly account for scalable-vector types, but we don't have test cases for that; RISC-V bails out early on as it has legal i1 vector masks. As such, this patch just prevents it from crashing. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103536	2021-06-03 10:24:55 +01:00
Fraser Cormack	5419f2ccbb	[ValueTypes] Fix scalable-vector changeExtendedVectorTypeToInteger The attached tests check for the regression in DAGCombiner's `visitVSELECT`, which may call this method. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103534	2021-06-03 09:36:56 +01:00
Sanjay Patel	e3cc7798ec	[SDAG] allow cast folding for vector sext-of-setcc with signed compare This extends 434c8e013a2c and ede3982792df to handle signed predicates by sign-extending the setcc operands. This is not shown directly in https://llvm.org/PR50055 , but the pattern is visible by changing the unsigned convert to signed in the source code.	2021-06-02 15:05:02 -04:00
Rong Xu	f505b894a2	[SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part) This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is mainly for ProfileData part of change. It will load FS Profile when such profile is detected. For an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. For other format profiles, the users need to use an internal option (-profile-isfs) to tell the compiler that the profile uses FS discriminators. This patch also simplified the bit API used by FS discriminators. Differential Revision: https://reviews.llvm.org/D103041	2021-06-02 10:32:52 -07:00
Sanjay Patel	bb75641fb0	[SDAG] allow more cast folding for vector sext-of-setcc This is a follow-up to D103280 that eases the use restrictions, so we can handle the motivating case from: https://llvm.org/PR50055 The loop code is adapted from similar use checks in ExtendUsesToFormExtLoad() and SliceUpLoad(). I did not see an easier way to filter out non-chain uses of load values. Differential Revision: https://reviews.llvm.org/D103462	2021-06-02 13:14:49 -04:00
Bjorn Pettersson	e9cfe99a84	[CodeGen] Refactor libcall lookups for RTLIB::POWI_* Use RuntimeLibcalls to get a common way to pick correct RTLIB::POWI_* libcall for a given value type. This includes a small refactoring of ExpandFPLibCall and ExpandArgFPLibCall in SelectionDAGLegalize to share a bit of code, plus adding an ExpandFPLibCall version that can be called directly when expanding FPOWI/STRICT_FPOWI to ensure that we actually use the same RTLIB::Libcall when expanding the libcall as we used when checking the legality of such a call by doing a getLibcallName check. Differential Revision: https://reviews.llvm.org/D103050	2021-06-02 11:40:34 +02:00
Bjorn Pettersson	1603844caa	[LegalizeTypes] Avoid promotion of exponent in FPOWI The FPOWI DAG node is normally lowered to a libcall to one of the RTLIB::POWI* runtime functions and the exponent should normally have a type matching sizeof(int) when making the call. Thus, type promotion of the exponent could lead to an FPOWI with a type for the second operand that would be incorrect when doing the libcall (a situation which would be hard to detect post-legalization if we allow such FPOWI nodes). This patch is changing DAGTypeLegalizer::PromoteIntOp_FPOWI to do the rewrite into a libcall directly instead of promoting the operand. This way we can check that the exponent is smaller than sizeof(int) and we can let TargetLowering handle promotion as part of making the libcall. It could be noticed here that makeLibCall has some knowledge about targets such as 64-bit RISCV, for which the libcall argument should be extended to a type larger than sizeof(int). Differential Revision: https://reviews.llvm.org/D102950	2021-06-02 11:40:34 +02:00
Sriraman Tallam	f1e4168f92	Resubmit D85085 after fixing the tests that were failing. D85085 was pushed earlier but broke tests on mac and win: http://lab.llvm.org:8080/green/job/clang-stage1-RA/21182/consoleFull#-706149783d489585b-5106-414a-ac11-3ff90657619c Recommitting it after adding mtriple to the llc commands. Emit correct location lists with basic block sections. This patch addresses multiple things: 1) It ensures that const_value is emitted when possible with basic block sections. 2) It emits location lists such that the labels are always within the section boundary. 3) It fixes a bug when the parameter is first used in a non-entry block which is in a different section from the entry block. Differential Revision: https://reviews.llvm.org/D85085	2021-06-01 21:59:47 -07:00
Daniel Sanders	8c8a0a3167	fixup: Missing operator in [globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one My local compiler was fine with it but the bots complain about ambiguous types.	2021-06-01 13:58:03 -07:00
Daniel Sanders	71a22fb7f8	[globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one It's still in use in a few places so we can't delete it yet but there's not many at this point. Differential Revision: https://reviews.llvm.org/D103352	2021-06-01 13:23:48 -07:00
Jessica Paquette	852a8449e7	[GlobalISel][AArch64] Combine and (lshr x, cst), mask -> ubfx x, cst, width Also add a target hook which allows us to get around custom legalization on AArch64. Differential Revision: https://reviews.llvm.org/D99283	2021-06-01 10:56:17 -07:00
Guozhi Wei	b2dfe60e88	[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB This patch transforms the sequence lea (reg1, reg2), reg3 sub reg3, reg4 to two sub instructions sub reg1, reg4 sub reg2, reg4 Similar optimization can also be applied to LEA/ADD sequence. The modifications to TwoAddressInstructionPass is to ensure the operands of ADD instruction has expected order (the dest register of LEA should be src register of ADD). Differential Revision: https://reviews.llvm.org/D101970	2021-06-01 10:31:30 -07:00
Sanjay Patel	5888c31732	[SDAG] add helper function for sext-of-setcc folds; NFC Try to make this easier to read as noted in D103280	2021-06-01 08:07:17 -04:00
Arthur Eubanks	8f3353aa63	[OpaquePtr] Remove some uses of PointerType::getElementType()	2021-05-31 16:11:25 -07:00
Arthur Eubanks	cfc7e26853	[OpaquePtr] Clean up some uses of Type::getPointerElementType() These depend on pointee types.	2021-05-31 09:54:57 -07:00
Arthur Eubanks	7178161bf7	Remove "Rewrite Symbols" from codegen pipeline It breaks up the function pass manager in the codegen pipeline. With empty parameters, it looks at the -mllvm flag -rewrite-map-file. This is likely not in use. Add a check that we only have one function pass manager in the codegen pipeline. Some tests relied on the fact that we had a module pass somewhere in the codegen pipeline. addr-label.ll crashes on ARM due to this change. This is because a ARMConstantPoolConstant containing a BasicBlock to represent a blockaddress may hold an invalid pointer to a BasicBlock if the blockaddress is invalidated by its BasicBlock getting removed. In that case all referencing blockaddresses are RAUW a constant int. Making ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure that's the right fix. As a workaround, create a barrier right before ISel so that IR optimizations can't happen while a ARMConstantPoolConstant has been created. Reviewed By: rnk, MaskRay, compnerd Differential Revision: https://reviews.llvm.org/D99707	2021-05-31 08:32:36 -07:00
Sanjay Patel	4f81b668e8	[SDAG] add check to sext-of-setcc fold to bypass changing a legal op I accidentaly pushed a draft of D103280 that was discussed during the review, but it was not supposed to be the final version. Rather than revert and recommit, I'm updating the existing code. This way we have a record of the codegen diff that would result if we decide to remove this predicate in the future.	2021-05-31 08:58:11 -04:00
Sanjay Patel	dceff7ed2b	[SDAG] try harder to fold casts into vector compare sext (vsetcc X, Y) --> vsetcc (zext X), (zext Y) -- (when the zexts are free and a bunch of other conditions) We have a couple of similar folds to this already for vector selects, but this pattern slips through because it is only a setcc. The tests are based on the motivating case from: https://llvm.org/PR50055 ...but we need extra logic to get that example, so I've left that as a TODO for now. Differential Revision: https://reviews.llvm.org/D103280	2021-05-31 07:14:01 -04:00
Djordje Todorovic	c362f0a17e	[LiveDebugVariables] Stop trimming locations of non-inlined vars The D35953, D62650 and D73691 introduced trimming of variables locations in LiveDebugVariables pass, since there are some cases where after the virtregrewrite we have exploded number of DBG_VALUEs created for some inlined variables. As it looks, all problematic cases were regarding inlined variables, so it seems reasonable to stop trimming the location ranges for non-inlined variables. It has very good impact on the llvm-locstats report. Differential Revision: https://reviews.llvm.org/D102917	2021-05-31 02:59:19 -07:00
Florian Hahn	0fab9e3072	[DAGCombine] Poison-prove scalarizeExtractedVectorLoad. extractelement is poison if the index is out-of-bounds, so just scalarizing the load may introduce an out-of-bounds load, which is UB. To avoid introducing new UB, we can mask the index so it only contains valid indices. Fixes PR50382. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D103077	2021-05-30 11:40:55 +01:00
Pengxuan Zheng	e434a8e756	[SafeStack] Use proper API to get stack guard Using the proper API automatically sets `__stack_chk_guard` to `dso_local` if `Reloc::Static`. This wasn't strictly necessary until recently when dso_local was no longer implied by `TargetMachine::shouldAssumeDSOLocal` for `__stack_chk_guard`. By using the proper API, we can avoid generating unnecessary GOT relocations. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102646	2021-05-30 00:52:48 -07:00
Arthur Eubanks	f9c1930dea	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" This reverts commit 1c7f32334d4becc725b9025fd32291a0e5729acd. Some code still needs to properly set parameter ABI attributes, see D101806.	2021-05-29 23:08:15 -07:00
Arthur Eubanks	85767d0682	Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering" This reverts commit bc7d15c61da78864b35e3c114294d6e4db645611. Dependent change is to be reverted.	2021-05-29 22:40:33 -07:00
LemonBoy	f2c842c94a	[AtomicExpandPass][AArch64] Promote xchg with floating-point types to integer ones Follow the same strategy used for atomic loads/stores by converting the operands to equally-sized integer types. This change prevents the atomic expansion pass from generating illegal LL/SC pairs when targeting AArch64: `expand-atomicrmw-xchg-fp.ll` would previously instantiate intrinsics such as `llvm.aarch64.ldaxr.p0f32` that cannot be lowered. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D103232	2021-05-29 08:57:27 +02:00
Eli Friedman	1638fc9086	[AArch64][RISCV] Make sure isel correctly honors failure orderings. If a cmpxchg specifies acquire or seq_cst on failure, make sure we generate code consistent with that ordering even if the success ordering is not acquire/seq_cst. At one point, it was ambiguous whether this sort of construct was valid, but the C++ standad and LLVM now accept arbitrary combinations of success/failure orderings. This doesn't address the corresponding issue in AtomicExpand. (This was reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .) Fixes https://bugs.llvm.org/show_bug.cgi?id=50512. Differential Revision: https://reviews.llvm.org/D103284	2021-05-28 12:47:40 -07:00
Craig Topper	22fc6f8fbe	[VP] Make getMaskParamPos/getVectorLengthParamPos return unsigned. Lowercase function names. Parameter positions seem like they should be unsigned. While there, make function names lowercase per coding standards. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D103224	2021-05-28 11:28:47 -07:00
Craig Topper	7d56e782b7	[SelectionDAG] Fix typo in assert. NFC	2021-05-28 10:37:11 -07:00
Tim Northover	859ff3505c	SwiftTailCC: teach verifier musttail rules applicable to this CC. SwiftTailCC has a different set of requirements than the C calling convention for a tail call. The exact argument sequence doesn't have to match, but fewer ABI-affecting attributes are allowed. Also make sure the musttail diagnostic triggers if a musttail call isn't actually a tail call.	2021-05-28 11:12:00 +01:00
Amara Emerson	28e85e594b	[AArch64][GlobalISel] Legalize oversize G_EXTRACT_VECTOR_ELT sources. Also changes the fewerElements helper to use the lookthrough constant helper instead of m_ICst, since m_ICst doesn't look through extends. Differential Revision: https://reviews.llvm.org/D103227	2021-05-27 23:52:24 -07:00
Matt Arsenault	6a84ed648e	GlobalISel: Do not change register types in lowerLoad Adjusting the load register type is a widenScalar type action, not a lowering. lowerLoad should be reserved for operations that change the memory access size, such as unaligned load decomposition. With this trying to adjust the register type, it was hard to avoid infinite loops in the legalizer. Adds a bandaid to avoid regressing a few AArch64 tests, but I'm not sure what the exact condition is and there's probably a cleaner way to do this. For AMDGPU this regresses handling of some cases for unaligned loads, but the way this is currently working is a pretty ugly hack.	2021-05-27 11:49:37 -04:00
Nico Weber	7a5ca2da76	Revert "Emit correct location lists with basic block sections." Breaks check-llvm on non-linux, see comments on https://reviews.llvm.org/D85085 This reverts commit caae570978c490a137921b9516162a382831209e and follow-up commit 1546c52d971292ed4145b6d41aaca0d02229ebff.	2021-05-27 11:42:04 -04:00
Matt Arsenault	776d715ebe	VirtRegMap: Preserve LiveDebugVariables This avoids recomputing it between regalloc runs when allocation is split, and also avoids a debug info test regression.	2021-05-27 10:40:14 -04:00
Fraser Cormack	58a6d02787	[VP][SelectionDAG] Add a target-configurable EVL operand type This patch adds a way for the target to configure the type it uses for the explicit vector length operands of VP SDNodes. The type must be a legal integer type (there is still no target-independent legalization of this operand) and must currently be at least as big as i32, the type used by the IR intrinsics. An implicit zero-extension takes place on targets which choose a larger type. All VP nodes should be created with this type used for the EVL operand. This allows 64-bit RISC-V to avoid custom legalization of all VP nodes, keeping them in their target-independent form for that bit longer. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D103027	2021-05-27 15:27:36 +01:00
Fraser Cormack	5a44777de2	[DAGCombine][RISCV] Don't try to trunc-store combined vector stores DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told whether it's to use vector types and also whether it's to issue a truncating store. However, the truncating store code path assumes a scalar integer `ConstantSDNode`, and when using vector types it creates either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which is a constant. The `riscv64` target is able to expose a crash here because it switches on both code paths at the same time. The `f32` is stored as `i32` which must be promoted to `i64`, necessitating a truncating store. It also decides later that it prefers a vector store of `v2f32`. While vector truncating stores are legal, this combine is not able to emit them. We also don't have a test case. This patch adds an assert to catch this case more gracefully, and updates one of the caller functions to the function to turn off the use of truncating stores when preferring vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103173	2021-05-27 14:16:32 +01:00
Fraser Cormack	ab12d83795	[SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs This patch extends the cases in which the legalizer is able to express VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between boolean vector types, the mask itself is an all-ones or all-ones value of the operand type, so a 0/1 boolean type behaves identically to a 0/-1 type. This greatly helps RISC-V which relies on expansion for these nodes. It also allows scalable-vector bool VSELECTs to use the default expansion, where before it would crash in SelectionDAG::UnrollVectorOp. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103147	2021-05-27 10:08:57 +01:00
Amara Emerson	d5383816bc	[GlobalISel] Implement splitting of G_SHUFFLE_VECTOR. Thhis is a port from the DAG legalization. We're still missing some of the canonicalizations of shuffles but it's a start. Differential Revision: https://reviews.llvm.org/D102828	2021-05-27 00:28:38 -07:00
Jessica Paquette	d821abe3ce	[GlobalISel] Don't emit lost debug location remarks when legalizing tail calls There were a bunch of lost debug location remarks that show up when legalizing tail calls on AArch64. This would happen because we drop the return in the block where we emit the tail call. So, we end up dropping the debug location, which makes the LostDebugLocObserver report a missing debug location. Although it's true that we lose these debug locations, this isn't a particularly useful remark. We expect to drop these debug locations when emitting tail calls. Suppressing remarks in this case is preferable, since the amount of noise could hide actual debug location related bugs. To do this, I just plumbed the LostDebugLocObserver through the relevant LegalizerHelper functions. This is the only case I can think of where we need the LostDebugLocObserver in the LegalizerHelper. So, rather than storing it in the LegalizerHelper proper and mucking around with the constructors, I figured it'd be cleanest to take the simplest path for now. This clears up ~20 noisy lost debug location remarks on CTMark in AArch64 at -Os. Differential Revision: https://reviews.llvm.org/D103128	2021-05-26 17:16:11 -07:00
Sriraman Tallam	40b2a440c5	Emit correct location lists with basic block sections. This patch addresses multiple things: 1) It ensures that const_value is emitted when possible with basic block sections. 2) It emits location lists such that the labels are always within the section boundary. 3) It fixes a bug when the parameter is first used in a non-entry block which is in a different section from the entry block. Differential Revision: https://reviews.llvm.org/D85085	2021-05-26 17:12:31 -07:00
Jeremy Morse	2a2a79e361	[DebugInstrRef][1/3] Track PHI values through register allocation This patch introduces "DBG_PHI" instructions, a marker of where a PHI instruction used to be, before PHI elimination. Under the instruction referencing model, we want to know where every value in the function is defined -- and a PHI, even if implicit, is such a place. Just like instruction numbers, we can use this to identify a value to be used as a variable value, but we don't need to know what instruction defines that value, for example: bb1: DBG_PHI $rax, 1 [... more insts ... ] bb2: DBG_INSTR_REF 1, 0, !1234, !DIExpression() This specifies that on entry to bb1, whatever value is in $rax is known as value number one -- and the later DBG_INSTR_REF marks the position where variable !1234 should take on value number one. PHI locations are stored in MachineFunction for the duration of the regalloc phase in the DebugPHIPositions map. The map is populated by PHIElimination, and then flushed back into the instruction stream by virtregrewriter. A small amount of maintenence is needed in LiveDebugVariables to account for registers being split, but only for individual positions, not for entire ranges of blocks. Differential Revision: https://reviews.llvm.org/D86812	2021-05-26 20:24:00 +01:00
Heejin Ahn	3f66a78716	[WebAssembly] Add TargetInstrInfo::getCalleeOperand DwarfDebug unconditionally assumes for all call instructions the 0th operand is the callee operand, which seems to be true for other targets, but not for WebAssembly. This adds `TargetInstrInfo::getCallOperand` method whose default implementation returns `getOperand(0)` and makes WebAssembly overrides it to use its own utility method to get the callee operand. This also fixes an existing bug in `WebAssembly::getCalleeOp`, which was uncovered by this CL. Reviewed By: dschuff, djtodoro Differential Revision: https://reviews.llvm.org/D102978	2021-05-26 11:43:59 -07:00
Jonas Paulsson	0a9439773a	[SystemZ] Support i128 inline asm operands. Support virtual, physical and tied i128 register operands in inline assembly. i128 is on SystemZ not really supported and is not a legal type and generally such a value will be split into two i64 parts. There are however some instructions that require a pair of two GPR64 registers contained in the GR128 bit reg class, which is untyped. For inline assmebly operands, it proved to be very cumbersome to first follow the general behavior of splitting an i128 operand into two parts and then later rebuild the INLINEASM MI to have one GR128 register. Instead, some minor common code changes were made to SelectionDAGBUilder to only create one GR128 register part to begin with. In particular: - getNumRegisters() now has an optional parameter "RegisterVT" which is passed by AddInlineAsmOperands() and GetRegistersForValue(). - The bitcasting in GetRegistersForValue is not performed if RegVT is Untyped. - The RC for a tied use in AddInlineAsmOperands() is now computed either from the tied def (virtual register), or by getMinimalPhysRegClass() (physical register). - InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register class (DstRC) can also be computed for an illegal type. In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and joinRegisterPartsIntoValue() have been implemented to handle i128 operands. Differential Revision: https://reviews.llvm.org/D100788 Review: Ulrich Weigand	2021-05-26 10:08:32 -05:00
Tomas Matheson	4b71dc385f	[MC][NFCI] Factor out ELF section unique ID calculation Precursor to D100944. The logic for determining the unique ID had become quite difficult to reason about, so I have factored this out into a separate function. Differential Revision: https://reviews.llvm.org/D102336	2021-05-26 11:51:29 +01:00
Michael Liao	1d1d6e0b78	[SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics. - When memory intrinsics, such as memcpy, the attached scoped AA metadata is not passed down to the backend. As a result, the backend cannot schedule relevant memory operations around them following that hint. In this patch, SelectionDAG is enhanced to propagate that metadata (scoped AA only) when they are lowered into loads and stores. Differential Revision: https://reviews.llvm.org/D102215	2021-05-25 14:42:26 -04:00
Benjamin Kramer	9d856bcc84	[GlobalISel] Silence unused variable warning in Release builds. NFC.	2021-05-25 10:55:29 +02:00
Amara Emerson	afaf301e48	[GlobalISel] Fix MachineIRBuilder not using the DstOp argument for G_SHUFFLE_VECTOR.	2021-05-25 00:43:26 -07:00
Christudasan Devadasan	022be2495f	AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100726	2021-05-25 10:51:07 +05:30
Jon Roelofs	66a4976b23	[Remarks] Add analysis remarks for memset/memcpy/memmove lengths Re-landing now that the crasher this patch previously uncovered has been fixed in: https://reviews.llvm.org/D102935 Differential revision: https://reviews.llvm.org/D102452	2021-05-24 10:10:44 -07:00
David Green	35e013cb3d	[ARM] Allow findLoopPreheader to return headers with multiple loop successors The findLoopPreheader function will currently not find a preheader if it branches to multiple different loop headers. This patch adds an option to relax that, allowing ARMLowOverheadLoops to process more loops successfully. This helps with WhileLoopStart setup instructions that can branch/fallthrough to the low overhead loop and to branch to a separate loop from the same preheader (but I don't believe it is possible for both loops to be low overhead loops). Differential Revision: https://reviews.llvm.org/D102747	2021-05-24 12:22:15 +01:00
Philipp Krones	df7a8b162e	[MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo This makes it possible for targets to define their own MCObjectFileInfo. This MCObjectFileInfo is then used to determine things like section alignment. This is a follow up to D101462 and prepares for the RISCV backend defining the text section alignment depending on the enabled extensions. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101921	2021-05-23 14:15:23 -07:00
LemonBoy	ca1236f15d	[SelectionDAG] Fix argument copy elision with irregular types D29668 enabled to avoid a useless copy of the argument value into an alloca if the caller places it in memory (as it often happens on x86) by directly forwarding the pointer to it. This optimization is illegal if the type contains padding bytes: if a truncating store into the alloca is replaced the upper bits are filled with garbage and produce code misbehaving at runtime. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D102153	2021-05-22 09:43:37 +02:00
Nick Desaulniers	753fcd644e	[IR] make stack-protector-guard-* flags into module attrs D88631 added initial support for: - -mstack-protector-guard= - -mstack-protector-guard-reg= - -mstack-protector-guard-offset= flags, and D100919 extended these to AArch64. Unfortunately, these flags aren't retained for LTO. Make them module attributes rather than TargetOptions. Link: https://github.com/ClangBuiltLinux/linux/issues/1378 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D102742	2021-05-21 15:53:30 -07:00
Matt Arsenault	c2fa720a37	AMDGPU/GlobalISel: Add subtarget to a test SelectionDAG forces us to have a weird ABI for 16-bit values without legal 16-bit operations, but currently GlobalISel bypasses this and sometimes ends up using the gfx8+ ABI in some contexts. Make sure we're testing the normal ABI to avoid a test change in a future patch.	2021-05-21 23:57:38 +09:00
Stephen Tozer	15caaeb8e5	3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This reapplies c0f3dfb9, which was reverted following the discovery of crashes on linux kernel and chromium builds - these issues have since been fixed, allowing this patch to re-land. This reverts commit 4397b7095d640f9b9426c4d0135e999c5a1de1c5.	2021-05-21 11:06:20 +01:00
Christudasan Devadasan	44fa7cd53e	GlobalISel: Help reduce operation width for instruction with two results. The function `reduceOperationWidth` helps to legalize a vector operation either by narrowing its type or by scalarizing the operation itself. It currently supports instructions with one result. This patch, in addition allows the same for instructions with two results (for instance, G_SDIVREM). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100725	2021-05-21 10:34:18 +05:30
Serge Pavlov	72fd6b9af9	[APFloat] convertToDouble/Float can work on shorter types Previously APFloat::convertToDouble may be called only for APFloats that were built using double semantics. Other semantics like single precision were not allowed although corresponding numbers could be converted to double without loss of precision. The similar restriction applied to APFloat::convertToFloat. With this change any APFloat that can be precisely represented by double can be handled with convertToDouble. Behavior of convertToFloat was updated similarly. It make the conversion operations more convenient and adds support for formats like half and bfloat. Differential Revision: https://reviews.llvm.org/D102671	2021-05-21 11:02:51 +07:00
Jessica Clarke	8c42ad8897	[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics Unlike normal loads these don't have an extension field, but we know from TargetLowering whether these are sign-extending or zero-extending, and so can optimise away unnecessary extensions. This was noticed on RISC-V, where sign extensions in the calling convention would result in unnecessary explicit extension instructions, but this also fixes some Mips inefficiencies. PowerPC sees churn in the tests as all the zero extensions are only for promoting 32-bit to 64-bit, but these zero extensions are still not optimised away as they should be, likely due to i32 being a legal type. This also simplifies the WebAssembly code somewhat, which currently works around the lack of target-independent combines with some ugly patterns that break once they're optimised away. Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits, where zero-extending atomics were incorrectly returning 0 rather than the (slightly confusing) required return value of 1. Re-landed again after D102819 fixed PowerPC to correctly zero-extend all of its atomics as it claimed to do, since the combination of that bug and this optimisation caused buildbot regressions. Reviewed By: RKSimon, atanasyan Differential Revision: https://reviews.llvm.org/D101342	2021-05-20 20:34:23 +01:00
Jon Roelofs	8ecb13b84d	Revert "[Remarks] Add analysis remarks for memset/memcpy/memmove lengths" This reverts commit 4bf69fb52b3c445ddcef5043c6b292efd14330e0. This broke spec2k6/403.gcc under -global-isel. Details to follow once I've reduced the problem.	2021-05-20 12:19:16 -07:00
Fraser Cormack	a3089bc5eb	[RISCV] Ensure shuffle splat operands are type-legal The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a type-legal splat value as it may implicitly extract a vector element from another shuffle. It is not permitted to introduce an illegal type when lowering shuffles. This patch addresses the crash by adding a boolean flag to `getSplatValue`, defaulting to false, which when set will ensure a type-legal return value. If it is unable to do that it will fail to return a splat value. I've been through the existing uses of `getSplatValue` in other targets and was unable to find a need or test cases showing a need to update their uses. In some cases, the call is made during `LegalizeVectorOps` which may still produce illegal scalar types. In other situations, the illegally-typed splat value may be quickly patched up to a legal type (such as any-extending the returned `extract_vector_elt` up to a legal type) before `LegalizeDAG` notices. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102687	2021-05-20 18:00:03 +01:00
Stephen Tozer	90c1db3c3c	[DebugInfo] Handle DIArgList in FastISel or GlobalIsel Currently, variadic dbg.values (i.e. those using a DIArgList as part of their location) are not handled properly by FastISel or GlobalISel, and will produce invalid DBG_VALUE instructions if they encounter them. This patch fixes this issue by emitting undef DBG_VALUE instructions for variadic dbg.values, so that no incorrect instruction is produced and any prior variable location is terminated. This is simply a quick-fix to prevent errors; a correct implementation should come later for these ISel pipelines to ensure that we do not drop debug information unnecessarily. Differential Revision: https://reviews.llvm.org/D102500	2021-05-20 17:37:28 +01:00
David Sherwood	f86b85d85e	[CodeGen] Add support for widening the result of EXTRACT_SUBVECTOR When trying to return a type such as <vscale x 1 x i32> from a function we crash in DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR when attempting to get the fixed number of elements in the vector. For the simple case we are dealing with, i.e. extracting <vscale x 1 x i32> from index 0 of input vector <vscale x 4 x i32> we can simply rely upon existing code that just returns the input. Differential Revision: https://reviews.llvm.org/D102605	2021-05-20 12:27:08 +01:00
David Sherwood	3bda651940	[CodeGen] Add support for widening INSERT_SUBVECTOR operands When attempting to return something like a <vscale x 1 x i32> type from a function we end up trying to widen the vector by inserting a <vscale x 1 x i32> subvector into an undefined <vscale x 4 x i32> vector. However, during legalisation we then attempt to widen the INSERT_SUBVECTOR operands and hit an error in WidenVectorOperand. This patch adds a new WidenVecOp_INSERT_SUBVECTOR function that currently only supports inserting subvectors into undefined vectors. Differential Revision: https://reviews.llvm.org/D102501	2021-05-20 10:37:03 +01:00
Amara Emerson	e9e3784d95	[GlobalISel] Fix div+rem -> divrem combine causing use-def violation.	2021-05-19 23:13:41 -07:00
Jon Roelofs	ad30e385ed	[Remarks] Add analysis remarks for memset/memcpy/memmove lengths Differential revision: https://reviews.llvm.org/D102452	2021-05-19 15:09:18 -07:00
Jessica Paquette	c0b812fa18	Recommit "[GlobalISel] Simplify G_ICMP to true/false when the result is known" Add missing REQUIRES line to prelegalizer-combiner-icmp-to-true-false-known-bits.	2021-05-19 09:29:19 -07:00
Simon Moll	0dc8431dd3	[VP] make getFunctionalOpcode return an Optional The operation of some VP intrinsics do/will not map to regular instruction opcodes. Returning 'None' seems more intuitive here than 'Instruction::Call'. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102778	2021-05-19 17:08:34 +02:00
Anirudh Prasad	5ae5cfdb6b	[AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 1 - This patch (is one in a series of patches) which introduces HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html \| main RFC here ]]) - This patch in particular introduces HLASM Parser support for Z machine instructions. - The approach taken here was to subclass `AsmParser`, and make various functions and variables as "protected" wherever appropriate. - The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well. The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf \| HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format): ``` <TokA><spaces.><TokB><spaces.><TokC><spaces.*><TokD> ``` 1. TokA is referred to as the Name Entry. This token is optional 2. TokB is referred to as the Operation Entry. This token is mandatory. 3. TokC is referred to as the Operand Entry. This token is mandatory 4. TokD is referred to as the Remarks Entry. This token is optional - If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), Tok B as the Operation Entry and so on. - If TokA is not provided (i.e. we have one or more spaces and then the first token), then we will parse the first token (i.e TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction and TokD as a possible Remark field - TokC (Operand Entry), no spaces are allowed between OperandEntries. If a space occurs it is classified as an error. - TokD if provided is taken as is, and emitted as a comment. The following additional approach was examined, but not taken: - Adding custom private only functions to base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of non-use to every other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would also have the same disadvantage. Testing: - This patch doesn't have tests added with it, for the sole reason that MCStreamer Support and Object File support hasn't been added for the z/OS target (yet). Hence, it's not possible generate code outright for the z/OS target. They are in the process of being committed / process of being worked on. - Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated. Reviewed By: Kai, uweigand Differential Revision: https://reviews.llvm.org/D98276	2021-05-19 11:05:30 -04:00
Simon Pilgrim	3aef61c246	Revert rG528bc10e95d5f9d6a338f9bab5e91d7265d1cf05 : "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB" Reports on D101970 indicate this is causing failures on multi-stage compiles.	2021-05-19 15:01:20 +01:00
Nico Weber	a80d7c0abf	Revert "[GlobalISel] Simplify G_ICMP to true/false when the result is known" This reverts commit 892497c806306a4b7185ead16d60b0ebcca0a304. Breaks tests, see comments on https://reviews.llvm.org/D102542	2021-05-19 09:02:27 -04:00
Sanjay Patel	151c1a6abb	[SDAG] propagate FMF from target-specific IR intrinsics This is a step towards relying more on node-level FMF rather than function-wide or target settings. I think it was just an oversight that we didn't get this path in D87361 or follow-on patches. The lack of FMF propagation is blocking D90901 from converting tests to IR-level FMF. We can't do much more than this currently because we also fail to propagate flags from x86-specific node to generic FMA node. That would be another patch, so the test just verifies that we can transfer from IR to initial SDAG node. Differential Revision: https://reviews.llvm.org/D102725	2021-05-19 07:50:50 -04:00
Tim Northover	d6d3226a57	MachineBasicBlock: add liveout iterator aware of which liveins are defined by the runtime. Using this in RegAlloc fast reduces register pressure, and in some cases allows x86 code to compile that wouldn't before.	2021-05-19 11:00:24 +01:00
Rong Xu	275da5fb80	Fix sanitizer test errors from commit 886629a8 Explictly handle the empty string in the Hash calculation.	2021-05-18 22:46:51 -07:00
Guozhi Wei	5f78ed293a	[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB This patch transforms the sequence lea (reg1, reg2), reg3 sub reg3, reg4 to two sub instructions sub reg1, reg4 sub reg2, reg4 Similar optimization can also be applied to LEA/ADD sequence. The modifications to TwoAddressInstructionPass is to ensure the operands of ADD instruction has expected order (the dest register of LEA should be src register of ADD). Differential Revision: https://reviews.llvm.org/D101970	2021-05-18 18:02:36 -07:00
Rong Xu	8bcc53c7a0	Fix a buildbot failure from commit 886629a8	2021-05-18 16:53:34 -07:00
Rong Xu	4e30d875bf	[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This patch implements first part of Flow Sensitive SampleFDO (FSAFDO). It has the following changes: (1) disable current discriminator encoding scheme, (2) new hierarchical discriminator for FSAFDO. For this patch, option "-enable-fs-discriminator=true" turns on the new functionality. Option "-enable-fs-discriminator=false" (the default) keeps the current SampleFDO behavior. When the fs-discriminator is enabled, we insert a flag variable, namely, llvm_fs_discriminator, to the object. This symbol will checked by create_llvm_prof tool, and used to generate a profile with FS-AFDO discriminators enabled. If this happens, for an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. Differential Revision: https://reviews.llvm.org/D102246	2021-05-18 16:23:43 -07:00
Arthur Eubanks	2682845f03	[NFC] Use ArgListEntry indirect types more in ISel lowering For opaque pointers, we're trying to avoid uses of PointerType::getElementType(). A couple of ISel places use PointerType::getElementType(). Some of these are easy to fix by using ArgListEntry's indirect types. The inalloca type wasn't stored there, as opposed to preallocated and byval which have their indirect types available, so add it and use it. This is a reland after an MSan fix in D102667. Differential Revision: https://reviews.llvm.org/D101713	2021-05-18 14:30:22 -07:00
Arthur Eubanks	b8775b2a78	[TargetLowering] Only inspect attributes in the arguments for ArgListEntry Parameter attributes are considered part of the function [1], and like mismatched calling conventions [2], we can't have the verifier check for mismatched parameter attributes. This is a reland after fixing MSan issues in D102667. [1] https://llvm.org/docs/LangRef.html#parameter-attributes [2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D101806	2021-05-18 14:30:22 -07:00
Arthur Eubanks	7a1762f190	[NewPM] Don't mark AA analyses as preserved Currently all AA analyses marked as preserved are stateless, not taking into account their dependent analyses. So there's no need to mark them as preserved, they won't be invalidated unless their analyses are. SCEVAAResults was the one exception to this, it was treated like a typical analysis result. Make it like the others and don't invalidate unless SCEV is invalidated. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D102032	2021-05-18 13:49:03 -07:00
Jessica Paquette	71450116c7	[GlobalISel] Simplify G_ICMP to true/false when the result is known Use existing KnownBits helpers from KnownBits.h to simplify G_ICMPs. E.g. x == x -> true x != x -> false load(x) > 1 -> true (when the load is known to be greater than 1) And so on. Differential Revision: https://reviews.llvm.org/D102542	2021-05-18 09:26:41 -07:00
David Green	cedff971cb	[RDA] Fix printing of regs / reg units. NFC It was printing RegUnits as Regs, leading to much confusion in the debug logs.	2021-05-18 08:07:30 +01:00
Ten Tzen	9ff115e8b2	[Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 1 This patch is the Part-1 (FE Clang) implementation of HW Exception handling. This new feature adds the support of Hardware Exception for Microsoft Windows SEH (Structured Exception Handling). This is the first step of this project; only X86_64 target is enabled in this patch. Compiler options: For clang-cl.exe, the option is -EHa, the same as MSVC. For clang.exe, the extra option is -fasync-exceptions, plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual. NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change. The rules for C code: For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow three rules: * First, no exception can move in or out of _try region., i.e., no "potential faulty instruction can be moved across _try boundary. * Second, the order of exceptions for instructions 'directly' under a _try must be preserved (not applied to those in callees). * Finally, global states (local/global/heap variables) that can be read outside of _try region must be updated in memory (not just in register) before the subsequent exception occurs. The impact to C++ code: Although SEH is a feature for C code, -EHa does have a profound effect on C++ side. When a C++ function (in the same compilation unit with option -EHa ) is called by a SEH C function, a hardware exception occurs in C++ code can also be handled properly by an upstream SEH _try-handler or a C++ catch(...). As such, when that happens in the middle of an object's life scope, the dtor must be invoked the same way as C++ Synchronous Exception during unwinding process. Design: A natural way to achieve the rules above in LLVM today is to allow an EH edge added on memory/computation instruction (previous iload/istore idea) so that exception path is modeled in Flow graph preciously. However, tracking every single memory instruction and potential faulty instruction can create many Invokes, complicate flow graph and possibly result in negative performance impact for downstream optimization and code generation. Making all optimizations be aware of the new semantic is also substantial. This design does not intend to model exception path at instruction level. Instead, the proposed design tracks and reports EH state at BLOCK-level to reduce the complexity of flow graph and minimize the performance-impact on CPP code under -EHa option. One key element of this design is the ability to compute State number at block-level. Our algorithm is based on the following rationales: A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping into a _try is not allowed. The single entry must start with a seh_try_begin() invoke with a correct State number that is the initial state of the SEME. Through control-flow, state number is propagated into all blocks. Side exits marked by seh_try_end() will unwind to parent state based on existing SEHUnwindMap[]. Note side exits can ONLY jump into parent scopes (lower state number). Thus, when a block succeeds various states from its predecessors, the lowest State triumphs others. If some exits flow to unreachable, propagation on those paths terminate, not affecting remaining blocks. For CPP code, object lifetime region is usually a SEME as SEH _try. However there is one rare exception: jumping into a lifetime that has Dtor but has no Ctor is warned, but allowed: Warning: jump bypasses variable with a non-trivial destructor In that case, the region is actually a MEME (multiple entry multiple exits). Our solution is to inject a eha_scope_begin() invoke in the side entry block to ensure a correct State. Implementation: Part-1: Clang implementation described below. Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end(). _scope_begin() is immediately added after ctor() is called and EHStack is pushed. So it must be an invoke, not a call. With that it's also guaranteed an EH-cleanup-pad is created regardless whether there exists a call in this scope. _scope_end is added before dtor(). These two intrinsics make the computation of Block-State possible in downstream code gen pass, even in the presence of ctor/dtor inlining. Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark _try boundary and to prevent from exceptions being moved across _try boundary. All memory instructions inside a _try are considered as 'volatile' to assure 2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's acceptable as the amount of code directly under _try is very small. Part-2 (will be in Part-2 patch): LLVM implementation described below. For both C++ & C-code, the state of each block is computed at the same place in BE (WinEHPreparing pass) where all other EH tables/maps are calculated. In addition to _scope_begin & _scope_end, the computation of block state also rely on the existing State tracking code (UnwindMap and InvokeStateMap). For both C++ & C-code, the state of each block with potential trap instruction is marked and reported in DAG Instruction Selection pass, the same place where the state for -EHsc (synchronous exceptions) is done. If the first instruction in a reported block scope can trap, a Nop is injected before this instruction. This nop is needed to accommodate LLVM Windows EH implementation, in which the address in IPToState table is offset by +1. (note the purpose of that is to ensure the return address of a call is in the same scope as the call address. The handler for catch(...) for -EHa must handle HW exception. So it is 'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches C++ exceptions). Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW exceptions can be passed through. Original llvm-dev [RFC] discussions can be found in these two threads below: https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html Differential Revision: https://reviews.llvm.org/D80344/new/	2021-05-17 22:42:17 -07:00
Serguei Katkov	b6a8f6d36b	[Statepoint Lowering] Cleanup: remove unused option statepoint-always-spill-base.	2021-05-18 12:15:15 +07:00
Nick Desaulniers	e282834e98	[AArch64] Support customizing stack protector guard Follow up to D88631 but for aarch64; the Linux kernel uses the command line flags: 1. -mstack-protector-guard=sysreg 2. -mstack-protector-guard-reg=sp_el0 3. -mstack-protector-guard-offset=0 to use the system register sp_el0 for the stack canary, enabling the kernel to have a unique stack canary per task (like a thread, but not limited to userspace as the kernel can preempt itself). Address pr/47341 for aarch64. Fixes: https://github.com/ClangBuiltLinux/linux/issues/289 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen Differential Revision: https://reviews.llvm.org/D100919	2021-05-17 11:49:22 -07:00
Simon Pilgrim	633bafaf5f	[TargetLowering] prepareUREMEqFold/prepareSREMEqFold - account for non legal shift types Ensure we tell getShiftAmountTy that we're working with pre-legalized types to prevent cases where the (legalized) shift type can no longer handle the (non-legalized) type width. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34366	2021-05-17 11:03:27 +01:00
Tim Northover	fc5daa6083	IR/AArch64/X86: add "swifttailcc" calling convention. Swift's new concurrency features are going to require guaranteed tail calls so that they don't consume excessive amounts of stack space. This would normally mean "tailcc", but there are also Swift-specific ABI desires that don't naturally go along with "tailcc" so this adds another calling convention that's the combination of "swiftcc" and "tailcc". Support is added for AArch64 and X86 for now.	2021-05-17 10:48:34 +01:00
Fraser Cormack	69fc258fc0	[DAGCombiner] Relax an assertion to an early return The select-of-constants transform was asserting that its constant vector inputs did not implicitly truncate their input without that as an explicit precondition to the function. This patch relaxes that assertion into an early return to skip the optimization. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D102393	2021-05-17 09:15:55 +01:00
Arthur Eubanks	99d811bdc5	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" This reverts commit 16748bd2fb1fe10d7d097961f1988327338f3f9f. Causes https://crbug.com/1209013	2021-05-16 22:02:10 -07:00
Arthur Eubanks	630c86e151	Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering" This reverts commit 85af8a8c1b574faa0d5d57d189ae051debdfada8.	2021-05-16 22:00:54 -07:00
Pan, Tao	f9d8052498	[SelectionDAG] Make fast and linearize visible by clang -pre-RA-sched ScheduleDAGFast.cpp is compiled to object file, but the ScheduleDAGFast object file isn't linked into clang executable file as no symbol is referred by outside. Add calling to createXxx of ScheduleDAGFast.cpp, then the ScheduleDAGFast object file will be linked into clang executable file. The static RegisterScheduler will register scheduler fast and linearize at clang boot time. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D101601	2021-05-17 11:25:15 +08:00
David Green	99f589780c	[CPG][ARM] Optimize towards branch on zero in codegenprepare This adds a simple fold into codegenprepare that converts comparison of branches towards comparison with zero if possible. For example: %c = icmp ult %x, 8 br %c, bla, blb %tc = lshr %x, 3 becomes %tc = lshr %x, 3 %c = icmp eq %tc, 0 br %c, bla, blb As a first order approximation, this can reduce the number of instructions needed to perform the branch as the shift is (often) needed anyway. At the moment this does not effect very much, as llvm tends to prefer the opposite form. But it can protect against regressions from commits like rG9423f78240a2. Simple cases of Add and Sub are added along with Shift, equally as the comparison to zero can often be folded with cpsr flags. Differential Revision: https://reviews.llvm.org/D101778	2021-05-16 17:54:06 +01:00
Pengxuan Zheng	3d401a1bf6	Support GCC's -fstack-usage flag This patch adds support for GCC's -fstack-usage flag. With this flag, a stack usage file (i.e., .su file) is generated for each input source file. The format of the stack usage file is also similar to what is used by GCC. For each function defined in the source file, a line with the following information is produced in the .su file. <source_file>:<line_number>:<function_name> <size_in_byte> <static/dynamic> "Static" means that the function's frame size is static and the size info is an accurate reflection of the frame size. While "dynamic" means the function's frame size can only be determined at run-time because the function manipulates the stack dynamically (e.g., due to variable size objects). The size info only reflects the size of the fixed size frame objects in this case and therefore is not a reliable measure of the total frame size. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100509	2021-05-15 10:22:49 -07:00
Simon Pilgrim	109645876c	IfConverter::MeetIfcvtSizeLimit - Fix uninitialized variable warnings. NFCI. Ensure the duplication instruction counts are initialized to zero (even though they aren't used) to silence static analysis warnings.	2021-05-15 14:51:54 +01:00
Nikita Popov	d8b6f240a9	[IR] Add BasicBlock::isEntryBlock() (NFC) This is a recurring and somewhat awkward pattern. Add a helper method for it.	2021-05-15 12:41:58 +02:00
Amara Emerson	7c35883ebc	[GlobalISel][CallLowering] Fix crash when handling a v3s32 type that's being passed as v2s64.	2021-05-14 16:30:51 -07:00
Sanjay Patel	1a03ebd2a9	[SDAG] reduce code duplication for extend_vec_inreg combines; NFC These are identical so far, and I was looking at adding a fold for a pattern with scalar_to_vector which would also nd up duplicated.	2021-05-14 08:29:57 -04:00
Tim Northover	5661b7eb80	IR+AArch64: add a "swiftasync" argument attribute. This extends any frame record created in the function to include that parameter, passed in X22. The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001 in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect of this is that tools walking the stack should expect to see one of three values there: * 0b0000 => a normal, non-extended record with just [FP, LR] * 0b0001 => the extended record [X22, FP, LR] * 0b1111 => kernel space, and a non-extended record. All other values are currently reserved. If compiling for arm64e this context pointer is address-discriminated with the discriminator 0xc31a and the DB (process-specific) key. There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing front-ends access to this slot (and forcing its creation initialized to nullptr if necessary).	2021-05-14 11:43:58 +01:00
David Spickett	3d7d76ad6f	[llvm][AsmPrinter] Restore source location to register clobber warning Since 5de2d189e6ad466a1f0616195e8c524a4eb3cbc0 this particular warning hasn't had the location of the source file containing the inline assembly. Fix this by reporting via LLVMContext. Which means that we no longer have the "instantiated into assembly here" lines but they were going to point to the start of the inline asm string anyway. This message is already tested via IR in llvm. However we won't have the required location info there so I've added a C file test in clang to cover it. (though strictly, this is testing llvm code) Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D102244	2021-05-14 08:22:57 +00:00
Chen Zheng	4eed210a79	[Debug-Info] change Tag type to dwarf::Tag for createAndAddDIE; NFC Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D102207	2021-05-13 21:15:06 -04:00
Chen Zheng	b9ce1812f9	[Debug-Info] make DIE attributes generation under strict DWARF control Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D101024	2021-05-13 20:34:07 -04:00

... 2 3 4 5 6 ...

30976 Commits