llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

Author	SHA1	Message	Date
Simon Pilgrim	9edfcde2eb	[CodeGen] Ensure UserValue::getDebugLoc() and UserLabel::getDebugLoc() consistently return a const reference NFCI. Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.	2021-05-07 14:48:23 +01:00
Simon Pilgrim	b5e4cc0124	[DAG] Ensure all SD classes consistently return a const reference with getDebugLoc(). NFCI. Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.	2021-05-07 14:48:23 +01:00
Benjamin Kramer	f2e0efd85f	Retire TargetRegisterInfo::getSpillAlignment getSpillAlign does the same thing.	2021-05-07 15:16:22 +02:00
Sebastian Neubauer	154e1ab9f4	[AMDGPU] Restrict immediate scratch offsets gfx9 does not work with negative offsets, gfx10 works only with aligned negative offsets, but not with unaligned negative offsets. This is slightly more conservative than needed, gfx9 does support negative offsets when a VGPR address is used and gfx10 supports negative, unaligned offsets when an SGPR address is used, but we do not make use of that with this patch. Differential Revision: https://reviews.llvm.org/D101292	2021-05-07 14:51:32 +02:00
David Stuttard	754f532bdd	AMDGPU: Correct const_index_stride for wave 32 for PAL ABI Retrying after revert and fix (removed implicit def flag from operand). Now passes with expensive_checks enabled. Since there is a single scratch resource descriptor for all shaders, if there is a wave32 and a wave64 shader (for instance for VsFs pairs) then the const_index_stride will be incorrect for wave32 shaders. Differential Revision: https://reviews.llvm.org/D101830 Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360	2021-05-07 13:42:57 +01:00
Malhar Jajoo	0a055c77d0	[ARM] Transforming memset to Tail predicated Loop This patch converts llvm.memset intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE). The llvm.memset is converted to a TP loop for both constant and non-constant input sizes (of llvm.memset). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100435	2021-05-07 13:35:53 +01:00
Joachim Meyer	6aa024bb17	[NFC] Correctly assert the indents for printEnumValHelpStr. Only verify that there's no negative indent. Noted by @chapuni in https://reviews.llvm.org/D93494. Reviewed By: chapuni Differential Revision: https://reviews.llvm.org/D102021	2021-05-07 14:30:43 +02:00
Stephen Tozer	59f38d865c	[DebugInfo] Fix crash when emitting an invalidated SDDbgValue This patch fixes a crash in the compiler that occurs when certain invalidated SDDbgValues are emitted. The cause of this was that we would attempt to check the liveness of the debug value's operands, which triggers an assert if any of those operands are invalid. This patch changes this check such that it only occurs if the SDDbgValue is valid; if not, the check is irrelevant anyway, so can be safely ignored. Differential Revision: https://reviews.llvm.org/D101540	2021-05-07 13:13:56 +01:00
Simon Pilgrim	742cb688cc	[DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts. Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication. I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to). NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly. Differential Revision: https://reviews.llvm.org/D101987	2021-05-07 13:12:30 +01:00
David Stuttard	9e3c57dd1e	Revert "AMDGPU: Correct const_index_stride for wave 32 for PAL ABI" This reverts commit 442de0c1adf36bfddb5fb66b442bba8999fa733b.	2021-05-07 12:49:17 +01:00
Simon Pilgrim	c787d6b082	[X86] Ensure we pass DebugLoc by const reference where possible. NFCI. Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef	2021-05-07 12:32:59 +01:00
David Stuttard	fc89cd7d12	AMDGPU: Correct const_index_stride for wave 32 for PAL ABI Since there is a single scratch resource descriptor for all shaders, if there is a wave32 and a wave64 shader (for instance for VsFs pairs) then the const_index_stride will be incorrect for wave32 shaders. Differential Revision: https://reviews.llvm.org/D101830 Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c	2021-05-07 12:19:49 +01:00
Roman Lebedev	01dd090cc4	[X86] AMD Zen 3: 32/64 -bit GPR register moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers. Refs: AMD SOG 19h, 2.9.4 Zero Cycle Move The processor is able to execute certain register to register mov operations with zero cycle delay. Agner, 22.13 Instructions with no latency Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle.	2021-05-07 13:56:07 +03:00
Stephen Tozer	8292a6702b	[DebugInfo] Fix updateDbgUsersToReg to support DBG_VALUE_LIST This patch modifies updateDbgUsersToReg to properly handle DBG_VALUE_LIST instructions, by replacing the hard-coded operand indices (i.e. getOperand(0)) with the more general getDebugOperandsForReg(), and updating the register for all matching operands. Differential Revision: https://reviews.llvm.org/D101523	2021-05-07 11:47:50 +01:00
Guillaume Chatelet	1a5a03f259	[llvm][NFC] Remove remaining deprecated alignment functions from CodeGen Differential Revision: https://reviews.llvm.org/D102058	2021-05-07 10:22:41 +00:00
LemonBoy	8144a0e6eb	[AsmParser][ARM] Make .thumb_func imply .thumb GNU as documentation states that a `.thumb_func` directive implies `.thumb`, teach the asm parser to switch mode whenever it's encountered. On the other hand the labeled form, exclusive to Apple's toolchain, doesn't switch mode at all. Reviewed By: nickdesaulniers, peter.smith Differential Revision: https://reviews.llvm.org/D101975	2021-05-07 12:13:36 +02:00
Sebastian Neubauer	e872e10570	[AMDGPU] Serialize MFInfo::ScavengeFI Serialize ScavengeFI from SIMachineFunctionInfo into yaml. ScavengeFI is not used outside of the PrologEpilogInserter, so this shouldn't change anything. Differential Revision: https://reviews.llvm.org/D101367	2021-05-07 11:15:25 +02:00
Caroline Concatto	da71820b1a	[LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction The function fixReduction used to assert/crash for scalable vector when a vector reduce could be done with a smaller vector. This patch removes this assertion as it is safe to use scalable vector for vector reduce and truncate. Differential Revision: https://reviews.llvm.org/D101260	2021-05-07 09:37:37 +01:00
Peilin Guo	1d55cf8e66	[LazyValueInfo] Insert an Overdefined placeholder to prevent infinite recursion getValueFromCondition() uses a Visited set to record the intermediate value. However, it uses a postorder way to compute the value first and update the Visited set later. Thus it will be trapped into an infinite recursion if there exists IRs that use no dominated by its def as in this example: %tmp3 = or i1 undef, %tmp4 %tmp4 = or i1 undef, %tmp3 To prevent this, we can insert an Overdefined placeholder into the set before computing the actual value. Reviewed by: nikic Differential Revision: https://reviews.llvm.org/D101273	2021-05-07 16:05:50 +08:00
Chen Zheng	9afc3b9839	[Debug-Info][NFC] add a wrapper for Die.addValue Add a new wrapper function addAttribute() for Die.addValue() function, so we can do some attributes control in one single interface. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D101125	2021-05-07 07:24:09 +00:00
Amara Emerson	dbed2fcc3c	[GlobalISel] Micro-optimize the conditional branch optimization. Convert a check into an assert and pass an MI instead of recomputing in the apply function.	2021-05-07 00:03:09 -07:00
Chen Zheng	7189f2452c	[XCOFF] handle string constants generation for AIX This follows https://www.ibm.com/docs/en/aix/7.2?topic=constants-string Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D101280	2021-05-07 06:43:36 +00:00
Qiu Chaofan	f654201998	[PowerPC] Remove extra swap for extract+vperm on LE This is a simple fix on LE. On BE, vector shuffles are categorized into different ops. We may need more work to eliminate these in tablegen/pre-isel. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D101605	2021-05-07 13:48:08 +08:00
Yonghong Song	b2cd040e05	BPF: fix FIELD_EXISTS relocation with array subscripts Lorenz Bauer reported an issue in bpf mailing list ([1]) where for FIELD_EXISTS relocation, if the object is an array subscript, the patched immediate is the object offset from the base address, instead of 1. Currently in BPF AbstractMemberAccess pass, the final offset from the base address is the patched offset except FIELD_EXISTS which is 1 unconditionally. In this particular case, the last data structure access is not a field (struct/union offset) so it didn't hit the place to set patched immediate to be 1. This patch fixed the issue by checking the relocation type. If the type is FIELD_EXISTS, just set to 1. Tested by modifying some bpf selftests, libbpf is okay with such types with FIELD_EXISTS relocation. [1] https://lore.kernel.org/bpf/CACAyw99n-cMEtVst7aK-3BfHb99GMEChmRLCvhrjsRpHhPrtvA@mail.gmail.com/ Differential Revision: https://reviews.llvm.org/D102036	2021-05-06 22:37:02 -07:00
Bruno Cardoso Lopes	da10d555a9	[CGAtomic] Lift strong requirement for remaining compare_exchange combinations Follow up on 431e3138a and complete the other possible combinations. Besides enforcing the new behavior, it also mitigates TSAN false positives when combining orders that used to be stronger.	2021-05-06 21:05:20 -07:00
Cyndy Ishida	98255d6170	[llvm][TextAPI] add mapping from OS string to Platform * add utility for matching target triple OS value strings to PlatformKind This was reviewed offline by ributzka, steven_wu	2021-05-06 16:25:56 -07:00
Stanislav Mekhanoshin	ab08d36d20	[AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32 Differential Revision: https://reviews.llvm.org/D102022	2021-05-06 16:17:33 -07:00
Malhar Jajoo	4adadbc511	[ARM] Transforming memcpy to Tail predicated Loop This patch converts llvm.memcpy intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE). From an implementation point of view, the patch - adds an ARM specific SDAG Node (to which the llvm.memcpy intrinsic is lowered to, during first phase of ISel) - adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter, on matching the above node. - Adds a custom inserter function that expands the pseudo instruction into MIR suitable to be (by later passes) into a WLSTP loop. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99723	2021-05-06 23:21:28 +01:00
Sanjay Patel	290838c72e	[PassManager] add helper function to hold set of vector passes This is no-functional-change-intended (NFC) and split off from D102002 (which proposes to eliminate the LTO-based differences).	2021-05-06 15:36:15 -04:00
Mircea Trofin	d07528fa7d	[NPM] Do not run function simplification pipeline unnecessarily The CGSCC pass manager interplay with the FunctionAnalysisManagerCGSCCProxy is 'special' in the sense that the former will rerun the latter if there are changes to a SCC structure; that being said, some of the functions in the SCC may be unchanged. In that case, the function simplification pipeline will be re-run, which impacts compile time[1]. This patch allows the function simplification pipeline be skipped if it was already run and the function was not modified since. The behavior is currently disabled by default. This is because, currently, the rerunning of the function simplification pipeline on an unchanged function may still result in changes. The patch simplifies investigating and fixing those cases where repeated function pass runs do actually positively impact code quality, while offering an easy workaround for those impacted negatively by compile time regressions, and not impacting mainline scenarios. [1] A [[ http://llvm-compile-time-tracker.com/compare.php?from=eb37d3546cd0c6e67798496634c45e501f7806f1&to=ac722d1190dc7bbdd17e977ef7ec95e69eefc91e&stat=instructions \| compile time tracker ]] run with the option enabled. Differential Revision: https://reviews.llvm.org/D98103	2021-05-06 12:24:33 -07:00
Craig Topper	0265102f2f	[RISCV] Remove unused ComplexPatterns. NFC	2021-05-06 12:17:41 -07:00
Craig Topper	2c6f23cea3	[RISCV] Minor vector instruction tablegen cleanup. NFC Use result_type for the IMPLICIT_DEF in masked vector patterns. This doesn't matter today because result_type and op_type are always the same. Use multiclass inheritance to reduce repeated code.	2021-05-06 11:23:59 -07:00
Fangrui Song	b76e201ee8	[AArch64] Fix namespace issue. NFC	2021-05-06 11:16:07 -07:00
Craig Topper	c150a3b9a7	[RISCV] Remove unused RISCV::VLEFF and VLEFF_MASK. NFC Looks like these got left behind when vleff isel was moved to X86ISelDAGToDAG.cpp	2021-05-06 09:41:29 -07:00
Jonas Paulsson	e554dc2124	[SystemZ] Don't use libcall for 128 bit shifts. Expand 128 bit shifts instead of using a libcall. This patch removes the 128 bit shift libcalls and thereby causes ExpandShiftWithUnknownAmountBit() to be called. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D101993	2021-05-06 18:14:41 +02:00
Craig Topper	53ff8f3378	[RISCV] Cleanup instruction formats used for B extension ternary operations. Rename RVInstR4 as used by F/D/Zfh extensions to RVInstR4Frm. Introduce new RVInstR4 that takes funct3 as a parameter. Add new format classes for FSRI and FSRIW instead of trying to bend RVInstR4 to use a shamt overlayed on rs2 and funct2. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100427	2021-05-06 08:59:05 -07:00
Stanislav Mekhanoshin	8091cc1712	[AMDGPU] Fix 64 bit DPP validation AMDGPUAsmParser::isSupportedDPPCtrl() was failing to correctly find a DPP register operand, regadless of the position it is always src0. Moved this check into a new validateDPP() method where we have full instruction already. In particular it was failing to reject this case: v_cvt_u32_f64 v5, v[0:1] quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf Essentially it was broken for any case where size of dst and src0 differ. It also improves the diagnostics with a proper error message. The check in the InstPrinter also drops verification of the dst register as it does not have anything to do with the dpp operand. Differential Revision: https://reviews.llvm.org/D101930	2021-05-06 08:40:26 -07:00
Simon Pilgrim	0bc61be53e	[SLP] Constify the TreeEntry* input into getEntryCost() + setInsertPointAfterBundle(). NFCI.	2021-05-06 16:20:19 +01:00
Simon Pilgrim	4dd73bffc3	[SLP] Constify the TreeEntry* input into dumpTreeCosts(). NFCI.	2021-05-06 16:20:19 +01:00
Simon Pilgrim	07a2f5ec27	[SLP] Use empty() instead of size() == 0. NFCI.	2021-05-06 16:20:18 +01:00
Austin Kerbow	3e889f931b	[AMDGPU][NFC] Fix typos in SIFormMemoryClauses description NFC.	2021-05-06 07:47:39 -07:00
Victor Huang	328fd59846	[AIX][TLS] Add support for TLSGD relocations to XCOFF objects - Add branch absolute reloction R_RBA, R_TLS relocation for the variable offset for the tlsgd model and R_TLSM for the region handle for the tlsgd model - Properly set the relocation fixed values for R_TLS and R_TLSM - Emit the TCEntry with the variant kind in the XCOFFStreamer Reviewed by: sfertile, nemanjai, DiggerLin Differential Revision: https://reviews.llvm.org/D100214	2021-05-06 09:01:47 -05:00
Jay Foad	0eb6d67512	[AMDGPU] SIInsertHardClauses: move more stuff into the class. NFC.	2021-05-06 14:47:54 +01:00
Carl Ritson	c515c8773d	[AMDGPU] Fix WQM failure with single block inactive demote Instruction test for inactive kill/demote needs to be based on actual opcode not whether instruction would be lowered to demote. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D101966	2021-05-06 21:02:26 +09:00
Malhar Jajoo	aef884b9ab	Revert "[ARM] Transforming memcpy to Tail predicated Loop" Reverting commit since it causes failure (10462). This reverts commit b856f4a232cbd43476e9b9f75c80aacfc6f5c152.	2021-05-06 12:39:08 +01:00
David Green	7d3e19f39a	[LV] Account for tripcount when calculation vectorization profitability The loop vectorizer will currently assume a large trip count when calculating which of several vectorization factors are more profitable. That is often not a terrible assumption to make as small trip count loops will usually have been fully unrolled. There are cases however where we will try to vectorize them, and especially when folding the tail by masking can incorrectly choose to vectorize loops that are not beneficial, due to the folded tail rounding the iteration count up for the vectorized loop. The motivating example here has a trip count of 5, so either performs 5 scalar iterations or 2 vector iterations (with VF=4). At a high enough trip count the vectorization becomes profitable, but the rounding up to 2 vector iterations vs only 5 scalar makes it unprofitable. This adds an alternative cost calculation when we know the max trip count and are folding tail by masking, rounding the iteration count up to the correct number for the vector width. We still do not account for anything like setup cost or the mixture of vector and scalar loops, but this is at least an improvement in a few cases that we have had reported. Differential Revision: https://reviews.llvm.org/D101726	2021-05-06 12:36:46 +01:00
Jonas Paulsson	a97c722aa8	[SystemZ] Support builtin_frame_address with packed stack without backchain. In order to use __builtin_frame_address(0) with packed stack and no backchain, the address of where the backchain would have been written is returned (like GCC). This address may either contain a saved register or be unused. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D101897	2021-05-06 12:50:49 +02:00
Kerry McLaughlin	5dadd98082	[SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences Adds support for scalable vectorization of loops containing first-order recurrences, e.g: ``` for(int i = 0; i < n; i++) b[i] = a[i] + a[i - 1] ``` This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into account when inserting into and extracting from the last lane of a vector. CreateVectorSplice has been added to construct a vector for the recurrence, which returns a splice intrinsic for scalable types. For fixed-width the behaviour remains unchanged as CreateVectorSplice will return a shufflevector instead. The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll Reviewed By: david-arm, fhahn Differential Revision: https://reviews.llvm.org/D101076	2021-05-06 11:35:39 +01:00
Jay Foad	ba54efe9d6	[AMDGPU] SIFoldOperands: clean up tryConstantFoldOp First clean up the strange API of tryConstantFoldOp where it took an immediate operand value, but no indication of which operand it was the value for. Second clean up the loop that calls tryConstantFoldOp so that it does not have to restart from the beginning every time it folds an instruction. This is NFCI but there are some minor changes caused by the order in which things are folded. Differential Revision: https://reviews.llvm.org/D100031	2021-05-06 09:55:22 +01:00
Malhar Jajoo	e6b0d4f6de	[ARM] Transforming memcpy to Tail predicated Loop This patch converts llvm.memcpy intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE). From an implementation point of view, the patch - adds an ARM specific SDAG Node (to which the llvm.memcpy intrinsic is lowered to, during first phase of ISel) - adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter, on matching the above node. - Adds a custom inserter function that expands the pseudo instruction into MIR suitable to be (by later passes) into a WLSTP loop. Note: A cli option is used to control the conversion of memcpy to TP loop and this option is currently disabled by default. It may be enabled in the future after further downstream testing. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99723	2021-05-06 09:34:09 +01:00

1 2 3 4 5 ...

146714 Commits