llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00

Author	SHA1	Message	Date
Steven Wu	0d2a8c0614	[Object][MachO] Handle end iterator in getSymbolType() Fix a bug in MachOObjectFile::getSymbolType() where the iterator was dereferenced without first checking whether it is end(). Instead, return the `Other` type, which aligns with the behavior of `llvm-nm`. rdar://75291638 Reviewed By: davide, ab Differential Revision: https://reviews.llvm.org/D98739	2021-03-17 15:06:45 -07:00
David Green	6698320324	[ARM] Add VREV MVE shuffle costs This uses the shuffle mask cost from D98206 to give a better cost for MVE VREV instructions. This helps especially in VectorCombine, where the cost of shuffles is used to reorder bitcasts, and it keeps the phase ordering test for fp16 reductions producing optimal code. The isVREVMask has been moved to a header file to allow it to be used across target transform and isel lowering. Differential Revision: https://reviews.llvm.org/D98210	2021-03-17 21:21:43 +00:00
Arthur Eubanks	57267d06c3	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those. Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98805	2021-03-17 13:37:22 -07:00
Ricky Taylor	7965564c8e	[M68k] Forward declare getMCInstrBeads in one place At the moment `getMCInstrBeads` is forward-declared in a few places; this brings those declarations together into a single header file. This was done as part of the disassembler work, since the disassembler would otherwise add one more forward declaration. Differential Revision: https://reviews.llvm.org/D98533	2021-03-17 13:31:27 -07:00
Ricky Taylor	762c21ab09	[M68k] Use fixed asm string for MxPseudo instructions This is required because empty strings are not allowed when generating the assembly parser tables. Differential Revision: https://reviews.llvm.org/D98532	2021-03-17 13:31:27 -07:00
Nico Weber	418733b2b4	[lld-link] emit an error when writing a PDB > 4 GiB Maybe there's a way to make them work, but until I've investigated if tools can consume large PDBs, erroring out is better than slowly and silently consuming all available ram due to internal invariants being violated. (Patch to make writing larger files work at https://bugs.chromium.org/p/chromium/issues/detail?id=1179085#c25 but I haven't had time to check if windbg & co can consume these large PDBs. llvm-pdbutil can't, but we can fix that one at least :) ) Differential Revision: https://reviews.llvm.org/D98788	2021-03-17 15:15:08 -04:00
Philip Reames	2b6a185756	[LCSSA] Extract a utility for deciding if a new use requires a new lcssa phi [NFC] (Triggered by a review comment on D98728, but otherwise unrelated.)	2021-03-17 12:14:01 -07:00
Craig Topper	be8cccd26d	[RISCV] Use getTargetExtractSubreg and getTargetInsertSubreg to simplify some code. NFCI	2021-03-17 12:10:19 -07:00
Philip Reames	b2328e9cf3	[LICM] Fix a crash when sinking instructions w/token operands It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly - by not forming LCSSA for such cases - but the ad hoc fixup implementation in LICM did not. This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate; it also applies to usage of the preallocated family of intrinsics. Differential Revision: https://reviews.llvm.org/D98728	2021-03-17 11:18:46 -07:00
David Green	f5b24f17f1	[TTI] Add a Mask to getShuffleCost This adds a Mask ArrayRef to getShuffleCost, so that if an exact mask can be provided, a more accurate cost can be returned by the backend. For example VREV costs could be returned by the ARM backend. This should be an NFC until then, laying the groundwork for that to be added. Differential Revision: https://reviews.llvm.org/D98206	2021-03-17 17:46:26 +00:00
Craig Topper	2c438f2a59	[RISCV] Support masked load/store for fixed vectors. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98561	2021-03-17 10:26:15 -07:00
Stephen Tozer	fea97b90a1	Reapply "[DebugInfo] Handle multiple variable location operands in IR" Fixed section of code that iterated through a SmallDenseMap and added instructions in each iteration, causing non-deterministic code; replaced SmallDenseMap with MapVector to prevent non-determinism. This reverts commit 01ac6d1587e8613ba4278786e8341f8b492ac941.	2021-03-17 16:45:25 +00:00
Bardia Mahjour	487229f50e	[CGSCC] Print CG node itself instead of its address Fix the debug output from cgscc	2021-03-17 12:36:55 -04:00
LemonBoy	f587aff82a	[LoopVectorize] Refine hasIrregularType predicate The `hasIrregularType` predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector. The previous check returned invalid results in some cases where there is padding between the array elements: e.g. a 4-element array of u7 values is considered compatible with <4 x u7>, even though the vector is only loading/storing 28 bits instead of 32. The problem causes LLVM to generate incorrect code for some targets: for AArch64 the vector loads/stores are lowered in terms of ubfx/bfi, effectively losing the top (N * padding bits). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D97465	2021-03-17 17:03:47 +01:00
David Green	1f8f3be18e	[ARM] Use ldrsb for more thumb1 loads. Given a sextload i16, we can usually generate "ldrsh [rn, rm]". If we don't naturally have a rn, rm addressing mode, we can either generate "ldrh [rn, #0]; sxth" or "mov rm, #0; ldrsh [rn, rm]". We currently generate the first, always creating a sxth. They are both the same number of instructions, but if we generate the second then the mov #0 will likely be CSE'd or pulled out of a loop, etc. This adjusts the ISel patterns to do that, creating a mov instead of a sxth. Differential Revision: https://reviews.llvm.org/D98693	2021-03-17 15:29:02 +00:00
Alexey Lapshin	da19cde461	[llvm-objcopy][NFC] Move ownership keeping code into restoreStatOnFile(). D93881 added functionality that preserves ownership of the output file if llvm-objcopy is called under root. That code was added at the place where the output file is created. llvm-objcopy already has a function that sets/restores rights/permissions for the output file: restoreStatOnFile(). This patch moves the ownership-preserving code into the restoreStatOnFile() function. Differential Revision: https://reviews.llvm.org/D98511	2021-03-17 17:27:00 +03:00
Hans Wennborg	d0e43622c0	Revert "[DebugInfo] Handle multiple variable location operands in IR" This caused non-deterministic compiler output; see comment on the code review. > This patch updates the various IR passes to correctly handle dbg.values with a > DIArgList location. This patch does not actually allow DIArgLists to be produced > by salvageDebugInfo, and it does not affect any pass after codegen-prepare. > Other than that, it should cover every IR pass. > > Most of the changes simply extend code that operated on a single debug value to > operate on the list of debug values in the style of any_of, all_of, for_each, > etc. Instances of setOperand(0, ...) have been replaced with > replaceVariableLocationOp, which takes the value that is being replaced as an > additional argument. In places where this value isn't readily available, we have > to track the old value through to the point where it gets replaced. > > Differential Revision: https://reviews.llvm.org/D88232 This reverts commit df69c69427dea7f5b3b3a4d4564bc77b0926ec88.	2021-03-17 13:36:48 +01:00
Bradley Smith	85ceade375	[AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE Previously NEON used a target specific intrinsic for frintn, given that the FROUNDEVEN ISD node now exists, move over to that instead and add codegen support for that node for both NEON and fixed length SVE. Differential Revision: https://reviews.llvm.org/D98487	2021-03-17 11:41:22 +00:00
David Green	b0820d90be	[LV] Account for the cost of predication of scalarized load/store This adds the cost of an i1 extract and a branch to the cost in getMemInstScalarizationCost when the instruction is predicated. These predicated loads/stores would generate blocks of something like: %c1 = extractelement <4 x i1> %C, i32 1 br i1 %c1, label %if, label %else if: %sa = extractelement <4 x i32> %a, i32 1 %sb = getelementptr inbounds float, float* %pg, i32 %sa %sv = extractelement <4 x float> %x, i32 1 store float %sv, float* %sb, align 4 else: So this increases the cost by the extract and branch. This is probably still too low in many cases due to the cost of all that branching, but there is already an existing hack increasing the cost using useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is a load or there is more than one store. This patch improves the cost for when there is only a single store, and hopefully at some point in the future the hack can be removed. Differential Revision: https://reviews.llvm.org/D98243	2021-03-17 10:57:50 +00:00
Bu Le	b2b1c4104c	[SLP] Fix the trunc instruction insertion problem The current SLP pass has a piece of code that inserts a trunc instruction after the vectorized instruction. If the vectorized instruction is a phi node and not the last phi node in the BB, the trunc instruction is inserted between two phi nodes, which triggers a verifier failure in debug builds or an unpredictable error in another pass. This patch changes the algorithm to 'if the last vectorized instruction is a phi, insert it after the last phi node in current BB' to fix this problem.	2021-03-17 13:51:08 +03:00
Fraser Cormack	f84a9cd429	[RISCV] Optimize "dominant element" BUILD_VECTORs This patch adds an optimization path for BUILD_VECTOR nodes where the majority of the elements are identical. These can be splatted, with the remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can be tweaked as required - it is currently conservative. Undef elements are disregarded when judging the dominance of a particular element. This allows them to be covered by the splat value. In addition, vectors of 2 elements are always optimized to a splat (for the upper element) and an insert at element zero. This optimization is disabled when optimizing for size. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98700	2021-03-17 10:09:04 +00:00
Jay Foad	9d2225fc35	[AMDGPU] Split dot2-insts feature Split out some of the instructions predicated on the dot2-insts target feature into a new dot7-insts, in preparation for subtargets that have some but not all of these instructions. NFCI. Differential Revision: https://reviews.llvm.org/D98717	2021-03-17 09:42:21 +00:00
Arthur Eubanks	45770e01eb	[Unswitch] Guard dbgs logging with LLVM_DEBUG	2021-03-16 22:31:57 -07:00
Max Kazantsev	c38d6febb5	[BasicAA] Drop dependency on Loop Info. PR43276 BasicAA stores a reference to LoopInfo inside. This imposes an implicit requirement of keeping it up to date whenever we modify the IR (in particular, whenever we modify terminators of blocks that belong to loops). Failing to do so leads to an incorrect state of the LoopInfo. Because general AA does not require loop info updates and provides no API to update it properly, the users of AA reasonably assume that there is no need to update the loop info. This can be a source of bugs, as the example in PR43276 shows. This patch drops the dependence of BasicAA on LoopInfo to avoid this problem. This may potentially pessimize the result of queries to BasicAA. Differential Revision: https://reviews.llvm.org/D98627 Reviewed By: nikic	2021-03-17 11:43:44 +07:00
Anirudh Prasad	1c455c1c4d	Revert "[AsmParser][SystemZ][z/OS] Reland "Introduce HLASM Comment Syntax"" This reverts commit b605cfb336989705f391d255b7628062d3dfe9c3. Differential Revision: https://reviews.llvm.org/D98744	2021-03-16 18:39:04 -04:00
Zequan Wu	380ae2df93	Revert "[ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()" That commit caused chromium build to crash: https://bugs.chromium.org/p/chromium/issues/detail?id=1188885 This reverts commit edf7004851519464f86b0f641da4d6c9506decb1.	2021-03-16 14:36:21 -07:00
Sanjay Patel	0d0126a35e	[SLP] separate min/max matching from its instruction-level implementation; NFC The motivation is to handle integer min/max reductions independently of whether they are in the current cmp+sel form or the planned intrinsic form. We assumed that min/max included a select instruction, but we can decouple that implementation detail by checking the instructions themselves rather than relying on the recurrence (reduction) type.	2021-03-16 17:16:11 -04:00
Anirudh Prasad	efc20bdcf7	[AsmParser][SystemZ][z/OS] Reland "Introduce HLASM Comment Syntax" - Previously, https://reviews.llvm.org/D97703 was [[ https://reviews.llvm.org/D98543 | reverted ]] as it broke when building the unit tests with shared libs on. - This patch reverts the "revert" and makes two minor changes - The first is that it also links in the MCParser lib when building the unittest. This should resolve the issue when building with shared libs on and off - The second renames the unit test from `SystemZAsmLexer` to `SystemZAsmLexerTests` since the convention for unittest binaries is to suffix the name of the unit test with "Tests" Reviewed By: Kai Differential Revision: https://reviews.llvm.org/D98666	2021-03-16 17:11:46 -04:00
Roland McGrath	0f93f116e0	[AArch64] Parse "rng" feature flag in .arch directive Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D98566	2021-03-16 14:10:19 -07:00
Mohammad Hadi Jooybar	0efd7ce0e4	[InstCombine] Avoid Bitcast-GEP fusion for pointers directly from allocation functions Elimination of bitcasts with void pointer arguments results in GEPs with pure byte indexes. These GEPs do not preserve struct/array information and interrupt phi address translation in later pipeline stages. Here is the original motivation for this patch: ``` #include<stdio.h> #include<malloc.h> typedef struct __Node{ double f; struct __Node *next; } Node; void foo () { Node *a = (Node*) malloc (sizeof(Node)); a->next = NULL; a->f = 11.5f; Node *ptr = a; double sum = 0.0f; while (ptr) { sum += ptr->f; ptr = ptr->next; } printf("%f\n", sum); } ``` By explicit assignment `a->next = NULL`, we can infer the length of the linked list is `1`. In this case we can eliminate the while loop traversal entirely. This elimination is supposed to be performed by GVN/MemoryDependencyAnalysis/PhiTranslation. The final IR before this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2 %f = bitcast i8* %call to double* store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.lr.ph while.body.lr.ph: ; preds = %entry %1 = bitcast i8* %call to %struct.__Node* br label %while.body while.body: ; preds = %while.body.lr.ph, %while.body %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ] %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %2 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %2 %next2 = getelementptr inbounds %struct.__Node, 
%struct.__Node* %ptr.013, i64 0, i32 1 %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %3, null br i1 %tobool, label %while.end, label %while.body while.end: ; preds = %while.body, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %while.body ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` Final IR after this patch: ``` ; Function Attrs: nofree nounwind define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { while.end: %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.150000e+01) ret void } ``` IR before GVN before this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2 %f = bitcast i8* %call to double* store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.lr.ph while.body.lr.ph: ; preds = %entry %1 = bitcast i8* %call to %struct.__Node* br label %while.body while.body: ; preds = %while.body.lr.ph, %while.body %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ] %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %2 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %2 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, 
!tbaa !2 %tobool = icmp eq %struct.__Node* %3, null br i1 %tobool, label %while.end.loopexit, label %while.body while.end.loopexit: ; preds = %while.body %add.lcssa = phi double [ %add, %while.body ] br label %while.end while.end: ; preds = %while.end.loopexit, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` IR before GVN after this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %0 = bitcast i8* %call to %struct.__Node* %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1 store %struct.__Node* null, %struct.__Node** %next, align 8, !tbaa !2 %f = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 0 store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.preheader while.body.preheader: ; preds = %entry br label %while.body while.body: ; preds = %while.body.preheader, %while.body %sum.014 = phi double [ %add, %while.body ], [ 0.000000e+00, %while.body.preheader ] %ptr.013 = phi %struct.__Node* [ %2, %while.body ], [ %0, %while.body.preheader ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %1 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %1 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %2 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %2, null br i1 %tobool, label %while.end.loopexit, label %while.body while.end.loopexit: ; preds = %while.body %add.lcssa = phi double [ %add, %while.body ] br label %while.end while.end: ; preds = 
%while.end.loopexit, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` Before this patch, phi translation fails, which prevents GVN from removing the loop. The reason for this failure is in InstCombine, when the instruction combining pass decides to convert: ``` %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) %0 = bitcast i8* %call to %struct.__Node* %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1 store %struct.__Node* null, %struct.__Node** %next ``` to ``` %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0 ``` GEP instructions with pure byte indexes (e.g. `getelementptr inbounds i8, i8* %call, i64 8`) are obstacles for address translation. Address translation looks for structural similarity between GEPs, and these GEPs usually do not match since they have a different structure. This change will cause a couple of failures in LLVM tests. However, in all cases we only need to change the result expected by the test. I will update those tests as soon as I get the green light on this patch. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96881	2021-03-16 17:05:44 -04:00
Ricky Taylor	ad0007c85e	[M68k] Add more specific operand classes This change adds an operand class for each addressing mode, which can then be used as part of the assembler to match instructions. Differential Revision: https://reviews.llvm.org/D98535	2021-03-16 13:37:50 -07:00
Philip Reames	422700f45c	[rs4gc] Simplify code by cloning existing instructions when inserting base chain [NFC] Previously we created a new node, then filled in the pieces. Now, we clone the existing node, then change the respective fields. The only change in handling is with phis since we have to handle multiple incoming edges from the same block a bit differently. Differential Revision: https://reviews.llvm.org/D98316	2021-03-16 13:10:32 -07:00
Philip Reames	bedda2fc60	[rs4gc] don't force a conflict for a canonical broadcast A broadcast is a shufflevector where only one input is used. Because of the way we handle constants (undef is a constant), the canonical shuffle sees a meet of (some value) and (nullptr). Given this, every broadcast gets treated as a conflict and a new base pointer computation is added. The other way to tackle this would be to change constant handling specifically for undefs, but this seems easier. Differential Revision: https://reviews.llvm.org/D98315	2021-03-16 12:59:06 -07:00
Philip Reames	93763f7273	[rs4gc] don't duplicate existing values which are provably base pointers RS4GC needs to rewrite the IR to ensure that every relocated pointer has an associated base pointer. The existing code isn't particularly smart about avoiding duplication of existing IR when it turns out the original pointer we were asked to materialize a base pointer for is itself a base pointer. This patch adds a stage to the algorithm which prunes nodes proven (with a simple forward dataflow fixed point) to be base pointers from the list of nodes considered for duplication. This does require changing some of the later invariants slightly, that's probably the riskiest part of the change. Differential Revision: D98122	2021-03-16 12:51:21 -07:00
Nikita Popov	52b2bd3243	Revert "[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration" This reverts commit d40b4911bd9aca0573752e065f29ddd9aff280e1. This causes a large compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=0aa637b2037d882ddf7861284169abf63f524677&to=d40b4911bd9aca0573752e065f29ddd9aff280e1&stat=instructions	2021-03-16 20:41:26 +01:00
Liam Keegan	6c9c7f0375	[MemCpyOpt] Add missing MemorySSAWrapperPass dependency macro Add MemorySSAWrapperPass as a dependency to MemCpyOptLegacyPass, since MemCpyOpt now uses MemorySSA by default. Differential Revision: https://reviews.llvm.org/D98484	2021-03-16 20:30:00 +01:00
Mircea Trofin	d68c0a9cbc	[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs() needs to be called before iterating the interfering live ranges. The rest of the patch helps ensure that this is the case: instead of clearing the query's InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix is invalidated (existing triggering mechanism). Without the change in RegAllocGreedy.cpp, the compiler ICEs. This patch should make it more easily discoverable by developers that collectInterferringVregs needs to be called before iterating. I will follow up with a subsequent patch to improve the usability and maintainability of Query. Differential Revision: https://reviews.llvm.org/D98232	2021-03-16 12:10:10 -07:00
Maksym Wezdecki	abe2655fe0	Fix for memory leak reported by Valgrind If the llvm so lib is dlopened and dlclosed several times, a memory leak can be observed, as reported by Valgrind. This patch fixes the issue. Reviewed By: lattner, dblaikie Differential Revision: https://reviews.llvm.org/D83372	2021-03-16 11:01:31 -07:00
Philip Reames	3be79b1237	[gvn] CSE gc.relocates based on meaning, not spelling (try 2) This was (partially) reverted in cfe8f8e0 because the conversion from readonly to readnone in Intrinsics.td exposed a couple of problems. This change has been reworked to not need that change (via some explicit checks in client code). This is being done to address the original optimization issue and simplify the testing of the readonly changes. I'm working on that piece under 49607. Original commit message follows: The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoint's multiple return values.) We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particularly useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass. Differential Revision: https://reviews.llvm.org/D97974	2021-03-16 10:59:31 -07:00
Florian Hahn	e31c38aacf	[VPlan] Remove PredInst2Recipe, use VP operands instead. (NFC) Instead of maintaining a separate map from predicated instructions to recipes, we can instead directly look at the VP operands. If the operand comes from a predicated instruction, the operand will be a VPPredInstPHIRecipe with a VPReplicateRecipe as its operand.	2021-03-16 17:40:35 +00:00
Adrian Prantl	7fb4439f85	Support !heapallocsite attachments in StripDebugInfo(). They point into the DIType type system, so they need to be stripped as well. rdar://75341300 Differential Revision: https://reviews.llvm.org/D98668	2021-03-16 10:05:13 -07:00
Adrian Prantl	2a7cba934d	Support !heapallocsite attachments in stripNonLineTableDebugInfo(). They point into the DIType type system, so they need to be stripped as well. rdar://75341300 Differential Revision: https://reviews.llvm.org/D98667	2021-03-16 10:05:12 -07:00
Fangrui Song	7a78fa03a9	[RISCV] Support clang -fpatchable-function-entry && GNU function attribute 'patchable_function_entry' Similar to D72215 (AArch64) and D72220 (x86). ``` % clang -target riscv32 -march=rv64g -c -fpatchable-function-entry=2 a.c && llvm-objdump -dr a.o ... 0000000000000000 <main>: 0: 13 00 00 00 nop 4: 13 00 00 00 nop % clang -target riscv32 -march=rv64gc -c -fpatchable-function-entry=2 a.c && llvm-objdump -dr a.o ... 00000002 <main>: 2: 01 00 nop 4: 01 00 nop ``` Recently the mainline kernel started to use -fpatchable-function-entry=8 for riscv (https://git.kernel.org/linus/afc76b8b80112189b6f11e67e19cf58301944814). Differential Revision: https://reviews.llvm.org/D98610	2021-03-16 10:02:35 -07:00
Simonas Kazlauskas	59b63b74d5	[InstSimplify] Restrict a GEP transform to avoid provenance changes This is a follow-up to D98588, and fixes the inline `FIXME` about a GEP-related simplification not preserving the provenance. https://alive2.llvm.org/ce/z/qbQoAY Additional tests were added in {rGf125f28afdb59eba29d2491dac0dfc0a7bf1b60b} Depends on D98672 Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98611	2021-03-16 18:53:05 +02:00
Thomas Preud'homme	efde4985b2	[MemDepAnalysis] Remove redundant comment. Exact same comment is found 2 lines above.	2021-03-16 15:51:17 +00:00
Joe Ellis	859445ea3f	[AArch64][SVE] Fold vector ZExt/SExt into gather loads where possible This commit folds sxtw'd or uxtw'd offsets into gather loads where possible with a DAGCombine optimization. As an example, the following code: #include <arm_sve.h> svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) { return svld1sw_gather_s64offset_u64( pred, base, svextw_s64_x(pred, offsets) ); } would previously lower to the following assembly: sxtw z0.d, p0/m, z0.d ld1sw { z0.d }, p0/z, [x0, z0.d] ret but now lowers to: ld1sw { z0.d }, p0/z, [x0, z0.d, sxtw] ret Differential Revision: https://reviews.llvm.org/D97858	2021-03-16 15:09:46 +00:00
Max Kazantsev	9f606251f1	[SCEV][NFC] Move check up the stack One of the callers (and the primary one) of isBasicBlockEntryGuardedByCond is isKnownPredicateAt, which makes an isKnownPredicate check before it. It already makes a non-recursive check inside. So, on this execution path this check is made twice. The only other caller is isLoopEntryGuardedByCond. Moving the check there should save some compile time.	2021-03-16 22:09:17 +07:00
Craig Topper	eeed21c58e	[RISCV] Look through copies when trying to find an implicit def in addVSetVL. The InstrEmitter can sometimes insert a copy after an IMPLICIT_DEF before connecting it to the vector instruction. This occurs when constrainRegClass reduces to a class with less than 4 registers. I believe LMUL8 on masked instructions triggers this since the result can only use the v8, v16, or v24 register group as the mask is using v0. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98567	2021-03-16 07:59:09 -07:00
Joe Ellis	cab71b0250	[AArch64][SVEIntrinsicOpts] Factor out redundant SVE mul/fmul intrinsics This commit implements an IR-level optimization to eliminate idempotent SVE mul/fmul intrinsic calls. Currently, the following patterns are captured: fmul pg (dup_x 1.0) V => V mul pg (dup_x 1) V => V fmul pg V (dup_x 1.0) => V mul pg V (dup_x 1) => V fmul pg V (dup v pg 1.0) => V mul pg V (dup v pg 1) => V The result of this commit is that code such as: #include <arm_sve.h> svfloat64_t foo(svfloat64_t a) { svbool_t t = svptrue_b64(); svfloat64_t b = svdup_f64(1.0); return svmul_m(t, a, b); } will lower to a nop. This commit does not capture all possibilities; only the simple cases described above. There is still room for further optimisation. Differential Revision: https://reviews.llvm.org/D98033	2021-03-16 14:50:17 +00:00
Craig Topper	a9b14b0ccd	[RISCV] Improve i32 UADDSAT/USUBSAT on RV64. The default promotion uses zero extends that become shifts. We can use sign extend instead which is better for RISCV. I've used two different implementations based on whether we have minu/maxu instructions. Differential Revision: https://reviews.llvm.org/D98683	2021-03-16 07:44:06 -07:00
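
For readers unfamiliar with the UADDSAT/USUBSAT operations named in the last entry above, the following is a minimal C++ sketch of their 32-bit semantics, together with the well-known branch-free formulations based on unsigned min/max. The function names are hypothetical and chosen only for illustration. This is not the lowering implemented in D98683; it only shows why the availability of minu/maxu-style instructions can change which implementation is attractive.

```cpp
#include <algorithm>
#include <cstdint>

// Reference semantics of a 32-bit unsigned saturating add:
// clamp to UINT32_MAX instead of wrapping around.
inline uint32_t uaddsat32(uint32_t a, uint32_t b) {
  uint32_t sum = a + b;  // wraps modulo 2^32 on overflow
  return sum < a ? UINT32_MAX : sum;
}

// Reference semantics of a 32-bit unsigned saturating subtract:
// clamp to 0 instead of wrapping around.
inline uint32_t usubsat32(uint32_t a, uint32_t b) {
  return a > b ? a - b : 0;
}

// Branch-free variant using unsigned min (hypothetical helper name).
// ~a is the headroom left before a overflows, so adding at most
// that amount can never wrap.
inline uint32_t uaddsat32_minmax(uint32_t a, uint32_t b) {
  return a + std::min(b, static_cast<uint32_t>(~a));
}

// Branch-free variant using unsigned max (hypothetical helper name).
// When b >= a, max(a, b) is b and the result is 0.
inline uint32_t usubsat32_minmax(uint32_t a, uint32_t b) {
  return std::max(a, b) - b;
}
```

Both pairs of functions agree on every input; the min/max forms trade a compare-and-select for a single min or max operation, which is presumably only a win on targets that have such instructions.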

145085 Commits