Just computing the alignment makes sense without caring about the
general known bits, such as for non-integral pointers. Separate the
two, and start calling into the TargetLowering implementation for FrameIndexes,
which improves the AMDGPU matching for stack addressing modes. Also
introduce a new hook for returning known alignment of target
instructions. For AMDGPU, it would be useful to report the known
alignment implied by certain intrinsic calls.
Also stop using MaybeAlign.
PendingInLocs ends up having the same value as InLocs, just computed
a bit more indirectly. It is a leftover of a previous implementation
approach.
This patch drops PendingInLocs, as well as the Diff and Removed
calculations, which are no longer needed.
Differential Revision: https://reviews.llvm.org/D80868
This patch updates TargetLoweringBase::computeRegisterProperties and
TargetLoweringBase::getTypeConversion to support scalable vectors,
and makes the right calls on how to legalise them. These changes are required
to legalise both MVTs and EVTs.
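For a rough, hand-written illustration (not from the patch's tests; the function name @add_wide and the assumption that this type is legalised by splitting are mine), an over-wide scalable type that getTypeConversion now has to classify rather than reject:
```
; Hypothetical example: <vscale x 8 x i64> is wider than any single legal
; SVE register type, so the assumption here is that getTypeConversion
; legalises it by splitting rather than asserting on it.
define <vscale x 8 x i64> @add_wide(<vscale x 8 x i64> %a, <vscale x 8 x i64> %b) {
  %r = add <vscale x 8 x i64> %a, %b
  ret <vscale x 8 x i64> %r
}
```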
Reviewers: efriedma, david-arm, ctetreau
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80640
Current implementation of emitPatchpoint() is very inefficient:
for every FrameIndex operand it creates a new MachineInstr with
that operand expanded and all others copied as is.
Since PATCHPOINT/STATEPOINT instructions may have *a lot* of
FrameIndex operands, we end up creating and erasing many
machine instructions. But we can do it in a single pass, with only
one new machine instruction generated.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D81181
Summary:
This patch adds legalisation of extensions where the operand
of the extend is a legal scalable type but the result is not.
EXTRACT_SUBVECTOR is used to split the result, before
being replaced by target-specific [S|U]UNPK[HI|LO] operations.
For example:
```
zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
```
should emit:
```
uunpklo z2.h, z0.b
uunpkhi z1.h, z0.b
```
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79587
Summary:
Cache the results from getMachineBasicBlocks in LexicalScopes to speed
up UserValueScopes::dominates queries. This replaces the caching done
in UserValueScopes. Compared to the old caching method, this reduces
memory traffic when a VarLoc is copied (e.g. when a VarLocMap grows),
and enables caching across basic blocks.
When compiling sqlite 3.5.7 (CTMark version), this patch reduces the
number of calls to getMachineBasicBlocks from 10,207 to 1,093. I also
measured a small compile-time reduction (~ 0.1% of total wall time, on
average, on my machine).
As a drive-by, I made the DebugLoc in UserValueScopes a const reference
to cut down on MetadataTracking traffic.
Reviewers: jmorse, Orlando, aprantl, nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80957
This wasn't getting much value from the DAG or depth arguments, since
it's only called on the frame index root nodes. FrameIndexes can also
only return a scalar value, so it also didn't need DemandedElts.
D79003/rG9fa58d1bf2f8 exposed an issue with scalarizeBinOpOfSplats: we were extracting from the splatted vector result instead of the source. The splat index is only valid for the source vector, not the result, which may contain undefs, including at the splat index.
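A hedged, hand-written sketch of the failure mode (the function @splat_binop is hypothetical, not from the patch's tests):
```
define <4 x float> @splat_binop(<4 x float> %a, <4 x float> %b) {
  ; The splat index here is 1. scalarizeBinOpOfSplats must extract lane 1
  ; of the *sources* %a and %b; lane 1 of the shuffle *results* %s0/%s1 is
  ; undef under this mask, so extracting from the results loses the value.
  %s0 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 1, i32 1>
  %s1 = shufflevector <4 x float> %b, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 1, i32 1>
  %r = fadd <4 x float> %s0, %s1
  ret <4 x float> %r
}
```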
This reverts commit 21dadd774f56778ef68c1ce307205dfbdacc793a.
In at least PromoteIntBinOps, they wanted to know about users of *all* values
produced by the node, not just the integer being promoted. For example, not
replacing chain users if the operation was a load breaks the ordering of the
DAG.
Summary:
This patch adds support for dumping .dot
representation of SelectionDAG. It is inspired by the fact that
a developer may want to just dump the graph at
a predictable path with a simple name to compare.
The existing utility (i.e. viewGraph) is overkill
for this motive, hence this patch adds the required support
while using the core routines from GraphWriter.
Example usage: DAG.dumpDotGraph("/tmp/graph.dot", "MyGraph")
will create the /tmp/graph.dot file when DAG is an
object of the SelectionDAG class.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D80711
To do so, I had to sink the old-school inline operand handling into GCStatepointInst, which is non-ideal. This code should be removed shortly, but I was able to at least clean it up a bunch.
The AMDGPU lowering for unconstrained G_FDIV sometimes needs to
introduce a mode switch in the middle, so it's helpful to have
constrained instructions available to legalize this. Right now nothing
is preventing reordering of the mode switch with the other
instructions in the expansion.
When we rematerialize a value as part of the coalescing, we may
widen the register class of the destination register.
When this happens, updateRegDefUses may create additional subranges
to account for the wider register class.
The created subranges are empty, and if they are not defined by
the rematerialized instruction, we clean them up.
However, if they are defined by the rematerialized instruction but
unused, we failed to flag them as dead definitions and would leave
them as empty live-ranges.
This is wrong because empty live-ranges don't interfere with anything;
thus, if we don't fix them, we fail to account for the fact that the
rematerialized instruction clobbers some lanes.
E.g., let us consider the following pseudo code:
```
def.lane_low64:reg128 = ldimm
newdef:reg32 = COPY def.lane_low64_low32
```
When rematerialization happens for newdef, we end up with:
```
newdef.lane_low64:reg128 = ldimm
         = use newdef.lane_low64_low32
```
Let's look at the live interval of newdef.
Before rematerialization, we would get:
```
newdef [defIdx, useIdx:0) 0@defIdx
```
Right after updateRegDefUses, newdef's register class is widened to reg128
and the subrange definitions will be augmented to fill the subreg that
is used at the definition point, here lane_low64.
The resulting live interval would be:
```
newdef [newDefIdx, useIdx:0) 0@newDefIdx
 * lane_low64_high32 EMPTY
 * lane_low64_low32 [newDefIdx, useIdx:0)
```
Before this patch, this would be the final status of the live interval.
Therefore we miss that lane_low64_high32 is actually live at the
definition point of newdef.
With this patch, after rematerializing, we check all the added subranges
and for the ones that are defined but empty, we flag them as dead def.
Thus, in that case, newdef would look like this:
```
newdef [newDefIdx, useIdx:0) 0@newDefIdx
 * lane_low64_high32 [newDefIdx, newDefIdxDead) ; <-- instead of EMPTY
 * lane_low64_low32 [newDefIdx, useIdx:0)
```
This fixes https://www.llvm.org/PR46154
Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.
Original patch by Matthias Braun, but I've been rebasing and fixing it
for almost 2 years and fixed a few bugs causing intermediate failures
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
In the function "Analysis.cpp:isInTailCallPosition", it only checks whether
a call is in a tail call position if the call has side effects, access memory
or it is not safe to speculative execute. Therefore, a speculatable function
will not go through tail call position check and improperly tail called when
it is not in a tail-call position. This patch enables tail call position check
for speculatable functions.
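A hedged, hand-written illustration of the problematic shape (the names @callee and @caller are hypothetical, not from this patch's tests): the callee is speculatable and touches no memory, yet the call is not in a tail-call position because another instruction sits between it and the return:
```
declare float @callee(float) readnone nounwind speculatable

define float @caller(float %x, float* %p) {
  %r = call float @callee(float %x)
  store float %r, float* %p   ; instruction between the call and the return
  ret float %r
}
```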
Differential Revision: https://reviews.llvm.org/D80661
Summary:
Patch D73152 added a new function, LiveVariables::addNewBlock.
This new function adds the register used by a PHI to the MBB the register
comes from.
But the new function may cause LiveVariables verification to fail when the
Src reg in the PHI is undef.
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D80077
If we're only demanding the (shifted) sign bits of the shift source value, then we can use the value directly.
This handles SimplifyDemandedBits/SimplifyMultipleUseDemandedBits for both ISD::SHL and X86ISD::VSHLI.
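A hedged, hand-written example of the idea (assuming the shift source is known to have enough sign bits; @sign_only is not from the patch's tests):
```
define i1 @sign_only(i32 %a) {
  %x = ashr i32 %a, 24          ; %x has at least 25 known sign bits
  %shl = shl i32 %x, 4
  %c = icmp slt i32 %shl, 0     ; only the sign bit of %shl is demanded
  ; bit 31 of %shl is bit 27 of %x, which is one of %x's sign bits,
  ; so the compare can use %x directly: icmp slt i32 %x, 0
  ret i1 %c
}
```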
Differential Revision: https://reviews.llvm.org/D80869
Move TargetFrameLowering.h include to the top of the TargetFrameLoweringImpl.cpp includes (clang-format doesn't do this by default as the filenames don't match).
This adds call site info support for call instructions with a delay slot.
We search for instructions inside the call delay slot which load values
into parameter forwarding registers.
The return address of the call points to the instruction after the call delay
slot, which is not the one immediately after the call instruction.
Patch by Nikola Tesic
Differential revision: https://reviews.llvm.org/D78107
This patch implements a target independent DAG combine to produce multiply-high
instructions from shifts. This DAG combine will combine shifts for any type as
long as the MULH on the narrow type is legal.
For now, it is enabled on PowerPC as PowerPC is the only target that has an
implementation of the isMulhCheaperThanMulShift TLI hook introduced in
D78271.
Moreover, this DAG combine focuses on catching the pattern:
(shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>)
to produce mulhs when we have a sign-extend, and mulhu when we have
a zero-extend.
The patch performs the following checks:
- Operation is a right shift arithmetic (sra) or logical (srl)
- Input to the shift is a multiply
- Both operands to the shift are sext/zext nodes
- The extends into the multiply are both the same
- The narrow type is half the width of the wide type
- The shift amount is the width of the narrow type
- The respective mulh operation is legal
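As a hedged, hand-written illustration (not from the patch's tests), IR of this shape for an i32 narrow type and sign extends would now be matched to a mulhs:
```
define i32 @mulhs_pattern(i32 %a, i32 %b) {
  %ea  = sext i32 %a to i64          ; both extends are sign extends
  %eb  = sext i32 %b to i64
  %mul = mul i64 %ea, %eb            ; multiply in the wide type
  %hi  = ashr i64 %mul, 32           ; shift by the narrow width
  %res = trunc i64 %hi to i32
  ret i32 %res                       ; becomes ISD::MULHS on the narrow type
}
```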
Differential Revision: https://reviews.llvm.org/D78272
The collectCallSiteParameters() method searches for instructions
which load values into registers used for parameters passing.
Previously, interpretation of those values, loaded by one such
instruction, was implemented inside collectCallSiteParameters() method.
This patch moves the interpretation code from collectCallSiteParameters()
method into a separate static method named interpretValue. New method is
called from collectCallSiteParameters() to process each instruction from
targeted instruction scope.
The collectCallSiteParameters() searches for loaded parameter value
among instructions which precede the call instruction, inside the same
basic block. When needed, new method (interpretValue) could be used for
searching any instruction scope.
This is preparation for search of parameter value, loaded inside call
delay slot.
Patch by Nikola Tesic
Differential revision: https://reviews.llvm.org/D78106
This patch adds clang options:
-fbasic-block-sections={all,<filename>,labels,none} and
-funique-basic-block-section-names.
LLVM Support for basic block sections is already enabled.
+ -fbasic-block-sections={all, <file>, labels, none} : Enables/Disables basic
block sections for all or a subset of basic blocks. "labels" only enables
basic block symbols.
+ -funique-basic-block-section-names: Enables unique section names for
basic block sections, disabled by default.
Differential Revision: https://reviews.llvm.org/D68049
Do not spill UNDEF GC values. Instead, replace corresponding
gc.relocate intrinsic with an (arbitrary, but recognizable) constant.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80714
These cases all follow the same pattern:
```
struct A {
  friend class X;
  //...
  class X {};
};
```
But 'friend class X;' injects 'X' into the surrounding namespace scope,
rather than introducing a class member. So the second 'class X {}' is a
completely different type, which changes the meaning of the earlier name
'X' from '::X' to 'A::X'.
Additionally, the friend declaration is pointless -- members of a class
don't need to be befriended to be able to access private members.
Summary:
Instead of iterating over all VarLoc IDs in removeEntryValue(), just
iterate over the interval reserved for entry value VarLocs. This changes
the iteration order, hence the test update -- otherwise this is NFC.
This appears to give an ~8.5x wall time speed-up for LiveDebugValues when
compiling sqlite3.c 3.30.1 with a Release clang (on my machine):
```
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
Before: 2.5402 ( 18.8%) 0.0050 ( 0.4%) 2.5452 ( 17.3%) 2.5452 ( 17.3%) Live DEBUG_VALUE analysis
After: 0.2364 ( 2.1%) 0.0034 ( 0.3%) 0.2399 ( 2.0%) 0.2398 ( 2.0%) Live DEBUG_VALUE analysis
```
The change in removeEntryValue() is the only one that appears to affect
wall time, but for consistency (and to resolve a pending TODO), I made
the analogous changes for iterating over SpillLocKind VarLocs.
Reviewers: nikic, aprantl, jmorse, djtodoro
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80684
The AMDGPU non-strict fdiv lowering needs to introduce an FP mode
switch in some cases, and has custom nodes to provide chain/glue for
the intermediate FP operations. We need to propagate nofpexcept here,
but getNode was dropping the flags.
Adding nofpexcept in the AMDGPU custom lowering is left to a future
patch.
Also fix a second case where flags were dropped, but in this case it
seems it just didn't handle this number of operands.
Test will be included in future AMDGPU patch.
Summary:
While clustering mem ops, the AMDGPU target needs to consider the number of
clustered bytes to decide on the max number of mem ops that can be clustered.
This patch adds support to pass the number of clustered bytes to the target
mem ops clustering logic.
Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar
Reviewed By: foad
Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80545
I inverted the mask when I ported to the new form of G_PTRMASK in
8bc03d2168241f7b12265e9cd7e4eb7655709f34.
I don't think this really broke anything, since G_VASTART isn't
handled for types with an alignment higher than the stack alignment.
In some cases ScheduleDAGRRList has to add new nodes to resolve problems
with interfering physical registers. When new nodes are added, it
completely re-computes the topological order, which can take a long
time, but is unnecessary. We only add nodes one by one, and initially
they do not have any predecessors. So we can just insert them at the end
of the vector. Later we add predecessors, but the helper function
properly updates the topological order much more efficiently. With this
change, the compile time for the program below drops from 300s to 30s on
my machine.
```
define i11129 @test1() {
  %L1 = load i11129, i11129* undef
  %B30 = ashr i11129 %L1, %L1
  store i11129 %B30, i11129* undef
  ret i11129 %L1
}
```
This should be generally beneficial, as we can skip a large amount of
work. Theoretically there are some scenarios where we might not save
much, e.g. when we add a dependency between the first and last node.
Then we would have to shift all nodes. But we still do not have to spend
the time re-computing the initial order.
Reviewers: MatzeB, atrick, efriedma, niravd, paquette
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D59722