llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00

Author	SHA1	Message	Date
Arthur Eubanks	920d09ef67	Revert "[CFGuard] Add address-taken IAT tables and delay-load support" This reverts commit ef4e971e5e18ae796466623df8f26265ba6bdfb5.	2020-10-01 11:29:54 -07:00
Sanjay Patel	65ea3c84f4	[InstCombine] auto-generate complete test checks; NFC	2020-10-01 13:44:31 -04:00
zoecarver	b38d4083e5	[DSE] Look through memory PHI arguments when removing noop stores in MSSA. Summary: Adds support for "following" memory through MSSA PHI arguments. This will help catch more noop stores that exist between blocks. Originally part of D79391. Reviewers: fhahn, jfb, asbirlea Differential Revision: https://reviews.llvm.org/D82588	2020-10-01 10:42:02 -07:00
Jamie Schmeiser	37e405d0af	Reland No.3: Add new hidden option -print-changed which only reports changes to IR A new hidden option -print-changed is added along with code to support printing the IR as it passes through the opt pipeline in the new pass manager. Only those passes that change the IR are reported, with others only having the banner reported, indicating that they did not change the IR, were filtered out or ignored. Filtering of output via the -filter-print-funcs is supported and a new supporting hidden option -filter-passes is added. The latter takes a comma-separated list of pass names and filters the output to only show those passes in the list that change the IR. The output can also be modified via the -print-module-scope option. The code introduces an abstract template base class, parameterized on the IR representation, that generalizes the comparison of IRs. Derived classes provide overrides that provide an event-based API for generalized reporting of IRs as they are changed in the opt pipeline through the new pass manager. The first of several instantiations is provided that prints the IR in a form similar to that produced by -print-after-all with the above-mentioned filtering capabilities. This version, and the others to follow, will be introduced at the upcoming developer's conference. Reviewed By: aeubanks (Arthur Eubanks), yrouban (Yevgeny Rouban), ychen (Yuanfang Chen), MaskRay (Fangrui Song) Differential Revision: https://reviews.llvm.org/D86360	2020-10-01 17:39:13 +00:00
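The design described in this commit (a template base class parameterized on the IR representation, with derived classes overriding event hooks) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not LLVM's actual implementation: the class names `ChangeReporter` and `TextChangePrinter`, the hook names, and the banner strings are all hypothetical, and the IR is modeled as a plain `std::string`.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical base class templated on the IR representation IRT.
template <typename IRT> class ChangeReporter {
public:
  // PassFilter: comma-separated pass names, in the spirit of -filter-passes.
  // An empty filter accepts every pass.
  explicit ChangeReporter(const std::string &PassFilter) {
    std::stringstream SS(PassFilter);
    std::string Name;
    while (std::getline(SS, Name, ','))
      if (!Name.empty())
        Allowed.insert(Name);
  }
  virtual ~ChangeReporter() = default;

  // Event: a pass is about to run; snapshot the IR for later comparison.
  void beforePass(const IRT &IR) { Before = IR; }

  // Event: a pass has finished; compare and dispatch to one of the hooks.
  void afterPass(const std::string &Pass, const IRT &IR) {
    if (!Allowed.empty() && !Allowed.count(Pass))
      handleFiltered(Pass);
    else if (IR == Before)
      handleUnchanged(Pass);
    else
      handleChanged(Pass, IR);
  }

protected:
  virtual void handleChanged(const std::string &Pass, const IRT &After) = 0;
  virtual void handleUnchanged(const std::string &Pass) = 0;
  virtual void handleFiltered(const std::string &Pass) = 0;

private:
  IRT Before{};
  std::set<std::string> Allowed;
};

// Hypothetical instantiation: textual IR; changed passes report the full IR,
// the others report only a banner line.
class TextChangePrinter : public ChangeReporter<std::string> {
public:
  explicit TextChangePrinter(const std::string &Filter)
      : ChangeReporter<std::string>(Filter) {}
  std::vector<std::string> Out;

protected:
  void handleChanged(const std::string &Pass, const std::string &After) override {
    Out.push_back("[changed] " + Pass + "\n" + After);
  }
  void handleUnchanged(const std::string &Pass) override {
    Out.push_back("[unchanged] " + Pass);
  }
  void handleFiltered(const std::string &Pass) override {
    Out.push_back("[filtered] " + Pass);
  }
};
```

Keeping the comparison logic in the base class means each new output format only has to implement the three hooks.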
Mircea Trofin	14a8d83207	[NFC] Let (MC)Register APIs check isStackSlot The user is expected to make the isStackSlot check before calling isPhysicalRegister or isVirtualRegister. The APIs assert otherwise. We can improve the usability of these APIs by carrying out the check in the 2 APIs: they become a complete "source of truth" and remove an extra responsibility from the user. Differential Revision: https://reviews.llvm.org/D88598	2020-10-01 09:55:20 -07:00
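A sketch of why folding the stack-slot check into the predicates makes them a complete "source of truth": if the register number space is split into disjoint ranges, each predicate can test its own range and never needs a precondition. The encoding below is an assumption modeled on the (MC)Register layout (0 = no register, then physical registers, then stack slots from `1 << 30`, then virtual registers flagged by bit 31); these are free-standing illustrative functions, not the real `Register`/`MCRegister` API.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding (illustrative):
//   0                : no register
//   [1, 1<<30)       : physical registers
//   [1<<30, 1<<31)   : stack slots
//   [1<<31, 1<<32)   : virtual registers
constexpr uint32_t FirstPhysicalReg = 1;
constexpr uint32_t FirstStackSlot = 1u << 30;
constexpr uint32_t VirtualRegFlag = 1u << 31;

constexpr bool isStackSlot(uint32_t Reg) {
  return FirstStackSlot <= Reg && Reg < VirtualRegFlag;
}

// Each predicate checks its own range, so a stack slot simply yields false
// instead of tripping an assertion; callers need no prior isStackSlot check.
constexpr bool isPhysicalRegister(uint32_t Reg) {
  return FirstPhysicalReg <= Reg && Reg < FirstStackSlot;
}

constexpr bool isVirtualRegister(uint32_t Reg) {
  return (Reg & VirtualRegFlag) != 0;
}
```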
Shoaib Meenai	8896c8f66c	[runtimes] Remove TOOLCHAIN_TOOLS specialization https://reviews.llvm.org/D88310 fixed the AIX issue in LLVMExternalProjectUtils, so we shouldn't need the workaround in the runtimes build anymore. I'm reverting it because it prevents the target-specific tool selection in LLVMExternalProjectUtils from taking effect, which we rely on for our runtimes builds. Reviewed By: daltenty Differential Revision: https://reviews.llvm.org/D88627	2020-10-01 09:53:10 -07:00
Vy Nguyen	a8dbe39f1b	Reland rG4fcd1a8e6528:[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking. This is mostly for the benefit of the LBR latency mode. Right now, it performs no checking. If this is run on non-supported hardware, it will produce all zeroes for latency. Differential Revision: https://reviews.llvm.org/D85254 New change: Updated lit.local.cfg to pass the right argument to llvm-exegesis to actually request the LBR mode. Differential Revision: https://reviews.llvm.org/D88670	2020-10-01 12:21:16 -04:00
Martin Storsjö	0bc7a3ead4	[AArch64] Don't merge sp decrement into later stores when using WinCFI This matches the corresponding existing case in AArch64LoadStoreOpt::findMatchingUpdateInsnForward. Both cases could also be modified to check MBBI->getFlag(FrameSetup/FrameDestroy) instead of forbidding any optimization involving SP, but the effect is probably pretty much the same. Differential Revision: https://reviews.llvm.org/D88541	2020-10-01 19:03:27 +03:00
Martin Storsjö	bbdb6c4cf7	[AArch64] Remove a duplicate call to setHasWinCFI. NFCI. The function already has a cleanup scope that calls the same whenever the function is exited. When reading the code, seeing that this return codepath has an explicit call while other return paths lack it is confusing. In the hypothetical case of a function having a prologue that set the HasWinCFI flag in the MF, but the epilogue containing no WinCFI instructions, the HasWinCFI flag in the MF would end up reset back to false. Differential Revision: https://reviews.llvm.org/D88636	2020-10-01 19:03:27 +03:00
Simon Pilgrim	744dd3076c	[InstCombine] collectBitParts - convert to use PatternMatch matchers and avoid IntegerType casts. Make sure we're using getScalarSizeInBits instead of cast<IntegerType> to get Type bit widths. This is preliminary cleanup before we can start adding vector support to the bswap/bitreverse (element level) matching.	2020-10-01 16:44:14 +01:00
Meera Nakrani	1ebfeef2a5	[ARM] Removed hasSideEffects from signed/unsigned saturates Removed hasSideEffects from SSAT and USAT so that they are no longer marked as unpredictable. Differential Revision: https://reviews.llvm.org/D88545	2020-10-01 14:55:01 +00:00
Jay Foad	b964542465	[AMDGPU] Simplify getNumFlatOffsetBits. NFC. Remove some checks that have already been done in the only caller.	2020-10-01 15:24:09 +01:00
LLVM GN Syncbot	fb69abb068	[gn build] Port f6b1323bc68	2020-10-01 14:18:52 +00:00
Simon Pilgrim	a4a9ce83e5	[InstCombine] Use m_FAbs matcher helper. NFCI.	2020-10-01 14:42:34 +01:00
Simon Pilgrim	038cfb6d37	[IR] PatternMatch - add m_FShl/m_FShr funnel shift intrinsic matchers. NFCI.	2020-10-01 14:42:34 +01:00
Jay Foad	0359573ae0	[AMDGPU] Tiny cleanup in isLegalFLATOffset. NFC.	2020-10-01 14:26:03 +01:00
David Sherwood	42f980da4f	[SVE][CodeGen] Replace use of TypeSize operator< in GlobalMerge::doMerge We don't support global variables with scalable vector types so I've changed the code to compare the fixed sizes instead. Differential Revision: https://reviews.llvm.org/D88564	2020-10-01 14:06:59 +01:00
James Henderson	6428f0596b	[Archive] Don't throw away errors for malformed archive members When adding an archive member with a problem, e.g. a new bitcode with an old archiver, containing an unsupported attribute, or an ELF file with a malformed symbol table, the archiver would throw away the error and simply add the member to the archive without any symbol entries. This meant that the resultant archive could be silently unusable when not using --whole-archive, and result in unexpected undefined symbols. This change fixes this issue by addressing two FIXMEs and only throwing away not-an-object errors. However, this meant that some LLD tests which didn't need symbol tables and were using invalid members deliberately to test the linker's malformed input handling no longer worked, so this patch also stops the archiver from looking for symbols in an object if it doesn't require a symbol table, and updates the tests accordingly. Differential Revision: https://reviews.llvm.org/D88288 Reviewed by: grimar, rupprecht, MaskRay	2020-10-01 14:03:34 +01:00
LLVM GN Syncbot	77de5cb36a	[gn build] Port d53b4bee0cc	2020-10-01 12:55:59 +00:00
Sjoerd Meijer	c208ece583	[LoopFlatten] Add a loop-flattening pass This is a simple pass that flattens nested loops. The intention is to optimise loop nests like this, which together access an array linearly: for (int i = 0; i < N; ++i) for (int j = 0; j < M; ++j) f(A[i*M+j]); into one loop: for (int i = 0; i < (N*M); ++i) f(A[i]); It can also flatten loops where the induction variables are not used in the loop. This can help with codesize and runtime, especially on simple CPUs without advanced branch prediction. This is only worth flattening if the induction variables are only used in an expression like i*M+j. If they had any other uses, we would have to insert a div/mod to reconstruct the original values, so this wouldn't be profitable. This partially fixes PR40581 as this pass triggers on one of the two cases. I will follow up on this to teach LoopFlatten a few more (small) tricks. Please note that LoopFlatten is not yet enabled by default. Patch by Oliver Stannard, with minor tweaks from Dave Green and myself. Differential Revision: https://reviews.llvm.org/D42365	2020-10-01 13:54:45 +01:00
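The source-level effect of the flattening described above can be checked with a small C++ sketch. It shows only the equivalence the pass relies on, not the pass itself: when `i` and `j` are used solely in `i*M+j`, a single induction variable visits the same elements in the same order. The function names are hypothetical, and the sketch assumes `N*M` does not overflow `int`, since a flattening like this is only valid when the product cannot overflow.

```cpp
#include <cassert>
#include <vector>

// Original nest: i and j are used only to form the linear index i*M + j.
template <typename F>
void nestedLoops(std::vector<int> &A, int N, int M, F f) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      f(A[i * M + j]);
}

// Flattened form: one induction variable walks the same N*M elements in the
// same order, with one loop's worth of compare-and-branch overhead.
template <typename F>
void flattenedLoop(std::vector<int> &A, int N, int M, F f) {
  for (int i = 0; i < N * M; ++i)
    f(A[i]);
}
```

If the body also used `i` or `j` on their own, the flattened loop would need `i / M` and `i % M` to recover them, which is exactly the div/mod cost the commit message says makes flattening unprofitable.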
Sam Parker	55afff27c3	[NFC][ARM] LowOverheadLoop DEBUG statements	2020-10-01 13:38:16 +01:00
Simon Pilgrim	7c21647448	[InstCombine] collectBitParts - use APInt directly to check for out of range bit shifts. NFCI.	2020-10-01 12:50:36 +01:00
Andrew Paverd	ee355f6f7f	[CFGuard] Add address-taken IAT tables and delay-load support This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table. Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D87544	2020-10-01 12:45:07 +01:00
Kerry McLaughlin	6865916c61	[SVE][CodeGen] Lower scalable fp_extend & fp_round operations This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU ISD nodes, used to lower scalable vector fp_extend/fp_round operations. fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one. This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll, resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND. Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D88321	2020-10-01 12:17:37 +01:00
Max Kazantsev	407ffa5a5f	[SCEV] Prove implications via AddRec start If we know that some predicate is true for AddRec and an invariant (w.r.t. this AddRec's loop), this fact is, in particular, true on the first iteration. We can try to prove the facts we need using the start value. The motivating example is proving things like `isImpliedCondOperands(>=, X, 0, {X,+,-1}, 0)` Differential Revision: https://reviews.llvm.org/D88208 Reviewed By: reames	2020-10-01 17:09:38 +07:00
Sam Parker	825974110a	[ARM][LowOverheadLoops] Adjust Start insertion. Try to move the insertion point to become the terminator of the block, usually the preheader. Differential Revision: https://reviews.llvm.org/D88638	2020-10-01 10:49:19 +01:00
Kerry McLaughlin	3df0efdb34	[SVE][CodeGen] Legalisation of integer -> floating point conversions Splitting the operand of a scalable [S|U]INT_TO_FP results in a concat_vectors operation where the operands are unpacked FP scalable vectors (e.g. nxv2f32). This patch adds custom lowering of concat_vectors which checks that the number of operands is 2, and isel patterns to match concat_vectors of scalable FP types with uzp1. Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D88033	2020-10-01 10:43:20 +01:00
Paul Walker	cb05bf7165	[NFC] Iterate across an explicit list of scalable MVTs when driving setOperationAction. Iterating across all of integer_scalable_vector_valuetypes seems wasteful when there's only a handful we care about. Also removes some rogue whitespace. Differential Revision: https://reviews.llvm.org/D88552	2020-10-01 10:17:59 +01:00
Sam Parker	5a6c4a60c6	[ARM][LowOverheadLoops] Iteration count liveness Before deciding to insert a [W|D]LSTP, check that defining LR with the element count won't affect any other instructions that should be taking the iteration count. Differential Revision: https://reviews.llvm.org/D88549	2020-10-01 10:11:10 +01:00
Sam Parker	b42a8cceb4	[ARM][LowOverheadLoops] Start insertion point If possible, try not to move the start position earlier than it already is. Differential Revision: https://reviews.llvm.org/D88542	2020-10-01 10:05:25 +01:00
Stefan Gränitz	bf37528089	[ORC][examples] Temporarily remove LLJITWithChildProcess until ORC TPC lands This solves a phase ordering problem: OrcV2 remote process support depends on OrcV2 removable code, OrcV2 removable code depends on OrcV1 removal, OrcV1 removal depends on LLJITWithChildProcess migration, and LLJITWithChildProcess migration depends on OrcV2 TargetProcessControl support.	2020-10-01 10:25:13 +02:00
Stefan Gränitz	32fd6cdf35	[ORC][examples] Remove ThinLtoJIT example after LLJITWithThinLTOSummaries landed in OrcV2Examples The ThinLtoJIT example was aiming to utilize ThinLTO summaries and concurrency in ORC for speculative compilation. The latter is heavily dependent on asynchronous task scheduling which is probably done better out-of-tree with a mature library like Boost-ASIO. The pure utilization of ThinLTO summaries in ORC is demonstrated in OrcV2Examples/LLJITWithThinLTOSummaries.	2020-10-01 10:22:09 +02:00
Sam Parker	bf6bcf083d	[ARM][LowOverheadLoops] Use iterator for InsertPt. Use a MachineBasicBlock::iterator instead of a MachineInstr* for the position of our LoopStart instruction. NFC-ish, as it changes debug info.	2020-10-01 08:32:35 +01:00
Fangrui Song	3485738376	[MC] Inline MCExpr::printVariantKind & remove UseParensForSymbolVariantBit Note, MAI may be nullptr in -show-encoding.	2020-10-01 00:10:06 -07:00
Amara Emerson	1095c55378	[AArch64][GlobalISel] Select all-zero G_BUILD_VECTOR into a zero mov. Unfortunately the leaf SDAG patterns aren't supported yet so we need to do this manually, but it's not a significant amount of code anyway. Differential Revision: https://reviews.llvm.org/D87924	2020-09-30 23:53:38 -07:00
Andrew Dona-Couch	fb68d2c9e8	[AVR] fix interrupt stack pointer restoration This patch fixes a corruption of the stack pointer and several registers in any AVR interrupt with non-empty stack frame. Previously, the callee-saved registers were popped before restoring the stack pointer, causing the pointer math to use the wrong base value while also corrupting the caller's register. This change fixes the code to restore the stack pointer last before exiting the interrupt service routine. https://bugs.llvm.org/show_bug.cgi?id=47253 Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D87735 Patch by Andrew Dona-Couch.	2020-10-01 18:52:13 +13:00
Chris Lattner	bff360bbf9	We don't need two different ways to get commit access, just simplify the policy here so that old SVN users and new contributors do the same thing.	2020-09-30 22:36:44 -07:00
Max Kazantsev	6b8dcdbad5	[SCEV][NFC] Introduce isKnownPredicateAt method We can query known predicates in different points, respecting their dominating conditions.	2020-10-01 12:11:24 +07:00
Michael Liao	fe455e705f	Revert "[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking." This reverts commit 4fcd1a8e6528ca42fe656f2745e15d2b7f5de495 as `llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s` failed on hosts without LBR support if the build has LIBPFM enabled. On such a host, `perf_event_open` fails with `EOPNOTSUPP` on the LBR config. That change's basic assumption, "If this is run on a non-supported hardware, it will produce all zeroes for latency.", does not hold, as the `perf_event_open` system call fails if the underlying hardware really doesn't support LBR.	2020-09-30 23:15:35 -04:00
Craig Topper	1d58f0df56	[APFloat] Improve asserts in isSignificandAllOnes and isSignificandAllZeros so they protect shift operations from undefined behavior. For example, the assert in isSignificandAllZeros allowed NumHighBits to be integerPartWidth. But since it is used directly as a shift amount it must be less than integerPartWidth.	2020-09-30 19:32:34 -07:00
Amara Emerson	18b73b8cb3	[AArch64][GlobalISel] Clamp oversize FP arithmetic vectors.	2020-09-30 18:03:37 -07:00
Amara Emerson	81128d0ecd	Try to fix build. May have used a C++ feature too new/not supported on all platforms.	2020-09-30 17:36:38 -07:00
Arthur Eubanks	e702df7d7f	[WholeProgramDevirt][NewPM] Add NPM testing path to match legacy pass The legacy pass's default constructor sets UseCommandLine = true and goes down a separate testing route. Match that in the NPM pass. This fixes all tests in llvm/test/Transforms/WholeProgramDevirt under NPM. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D88588	2020-09-30 17:27:37 -07:00
Amara Emerson	2f3a33fbc1	[AArch64][GlobalISel] Add some more legal types for G_PHI, G_IMPLICIT_DEF, G_FREEZE. Also use this opportunity to start cleaning up the mess of vector type lists we have in the LegalizerInfo. Unfortunately since the legalizer rule builders require std::initializer_list objects as parameters we can't programmatically generate the type lists.	2020-09-30 17:25:33 -07:00
Jessica Paquette	9348344fc3	[AArch64][GlobalISel] NFC: Refactor G_FCMP selection code Refactor this so it's similar to the existing integer comparison code. Also add some missing 64-bit testcases to select-fcmp.mir. Refactoring to prep for improving selection for G_FCMP-related conditional branches etc. Differential Revision: https://reviews.llvm.org/D88614	2020-09-30 16:50:39 -07:00
Craig Topper	ffd58b5f83	Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes (bug 34579) Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes to behave correctly in the case that the size of the significand is a multiple of the width of the integerParts making up the significand. The patch to IEEEFloat::isSignificandAllOnes fixes bug 34579, and the patch to IEEEFloat::isSignificandAllZeros fixes the unit test "APFloatTest.x87Next" I added here. I have included both in this diff since the changes are very similar. Patch by Andrew Briand	2020-09-30 16:07:15 -07:00
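The edge case fixed in this commit (and guarded by the tightened asserts in 1d58f0df56) is the classic shift-width pitfall: when the significand width is an exact multiple of the part width, a mask computed as `(1 << TopBits) - 1` shifts by the full part width, which is undefined behavior in C++. The following is a minimal sketch of an "all ones" check that handles that case explicitly. It assumes 64-bit parts stored least significant first; `significandAllOnes` is a hypothetical helper, not APFloat's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using integerPart = uint64_t; // assumption: 64-bit parts
constexpr unsigned integerPartWidth = 64;

// Hypothetical helper: are the low `Precision` bits of `Parts` all ones?
// Parts[0] holds the least significant bits.
bool significandAllOnes(const std::vector<integerPart> &Parts,
                        unsigned Precision) {
  assert(Precision > 0);
  const unsigned PartCount =
      (Precision + integerPartWidth - 1) / integerPartWidth;
  assert(Parts.size() >= PartCount);
  // Every part below the top one must be completely set.
  for (unsigned i = 0; i + 1 < PartCount; ++i)
    if (Parts[i] != ~integerPart(0))
      return false;
  // Significand bits that live in the top part: always in [1, 64].
  const unsigned TopBits = Precision - (PartCount - 1) * integerPartWidth;
  // (integerPart(1) << TopBits) would be undefined when TopBits equals the
  // part width, i.e. when Precision is a multiple of 64, so handle it apart.
  const integerPart TopMask = TopBits == integerPartWidth
                                  ? ~integerPart(0)
                                  : (integerPart(1) << TopBits) - 1;
  // Bits above the significand in the top part are ignored.
  return (Parts[PartCount - 1] & TopMask) == TopMask;
}
```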
Ahsan Saghir	f1747882b8	[PowerPC] Add outer product instructions for MMA This patch adds outer product instructions for MMA, including related infrastructure, and their tests. Depends on D84968. Reviewed By: #powerpc, bsaleil, amyk Differential Revision: https://reviews.llvm.org/D88043	2020-09-30 18:06:49 -05:00
Reid Kleckner	03ab9ed5e9	Re-land "[PDB] Merge types in parallel when using ghashing" Stored Error objects have to be checked, even if they are success values. This reverts commit 8d250ac3cd48d0f17f9314685a85e77895c05351. Relands commit 49b3459930655d879b2dc190ff8fe11c38a8be5f. Original commit message: ----------------------------------------- This makes type merging much faster (-24% on chrome.dll) when multiple threads are available, but it slightly increases the time to link (+10%) when /threads:1 is passed. With only one more thread, the new type merging is faster (-11%). The output PDB should be identical to what it was before this change. To give an idea, here is the /time output placed side by side:

                              BEFORE   | AFTER
Input File Reading:           956 ms   | 968 ms
Code Layout:                  258 ms   | 190 ms
Commit Output File:           6 ms     | 7 ms
PDB Emission (Cumulative):    6691 ms  | 4253 ms
Add Objects:                  4341 ms  | 2927 ms
Type Merging:                 2814 ms  | 1269 ms   -55%!
Symbol Merging:               1509 ms  | 1645 ms
Publics Stream Layout:        111 ms   | 112 ms
TPI Stream Layout:            764 ms   | 26 ms     trivial
Commit to Disk:               1322 ms  | 1036 ms   -300ms
-----------------------------------------------------------
Total Link Time:              8416 ms  | 5882 ms   -30% overall

The main source of the additional overhead in the single-threaded case is the need to iterate all .debug$T sections up front to check which type records should go in the IPI stream. See fillIsItemIndexFromDebugT. With changes to the .debug$H section, we could pre-calculate this info and eliminate the need to do this walk up front. That should restore single-threaded performance back to what it was before this change. This change will cause LLD to be much more parallel than it used to, and for users who do multiple links in parallel, it could regress performance. However, when the user is only doing one link, it's a huge improvement.
In the future, we can use NT worker threads to avoid oversaturating the machine with work, but for now, this is such an improvement for the single-link use case that I think we should land this as is. Algorithm ---------- Before this change, we essentially used a DenseMap<GloballyHashedType, TypeIndex> to check if a type has already been seen, and if it hasn't been seen, insert it now and use the next available type index for it in the destination type stream. DenseMap does not support concurrent insertion, and even if it did, the linker must be deterministic: it cannot produce different PDBs by using different numbers of threads. The output type stream must be in the same order regardless of the order of hash table insertions. In order to create a hash table that supports concurrent insertion, the table cells must be small enough that they can be updated atomically. The algorithm I used for updating the table using linear probing is described in this paper, "Concurrent Hash Tables: Fast and General(?)!": https://dl.acm.org/doi/10.1145/3309206 The GHashCell in this change is essentially a pair of 32-bit integer indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the TpiSource object, and it represents an input type stream. The typeIndex is the index of the type in the stream. Together, we have something like a ragged 2D array of ghashes, which can be looked up as: tpiSources[tpiSrcIndex]->ghashes[typeIndex] By using these side tables, we can omit the key data from the hash table, and keep the table cell small. There is a cost to this: resolving hash table collisions requires many more loads than simply looking at the key in the same cache line as the insertion position. However, most supported platforms should have a 64-bit CAS operation to update the cell atomically. To make the result of concurrent insertion deterministic, the cell payloads must have a priority function. 
Defining one is pretty straightforward: compare the two 32-bit numbers as a combined 64-bit number. This means that types coming from inputs earlier on the command line have a higher priority and are more likely to appear earlier in the final PDB type stream than types from an input appearing later on the link line. After table insertion, the non-empty cells in the table can be copied out of the main table and sorted by priority to determine the ordering of the final type index stream. At this point, item and type records must be separated, either by sorting or by splitting into two arrays, and I chose sorting. This is why the GHashCell must contain the isItem bit. Once the final PDB TPI stream ordering is known, we need to compute a mapping from source type index to PDB type index. To avoid starting over from scratch and looking up every type again by its ghash, we save the insertion position of every hash table insertion during the first insertion phase. Because the table does not support rehashing, the insertion position is stable. Using the array of insertion positions indexed by source type index, we can replace the source type indices in the ghash table cells with the PDB type indices. Once the table cells have been updated to contain PDB type indices, the mapping for each type source can be computed in parallel. Simply iterate the list of cell positions and replace them with the PDB type index, since the insertion positions are no longer needed. Once we have a source to destination type index mapping for every type source, there are no more data dependencies. We know which type records are "unique" (not duplicates), and what their final type indices will be. We can do the remapping in parallel, and accumulate type sizes and type hashes in parallel by type source. Lastly, TPI stream layout must be done serially. Accumulate all the type records, sizes, and hashes, and add them to the PDB. 
Differential Revision: https://reviews.llvm.org/D87805	2020-09-30 15:44:38 -07:00
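The insertion and ordering phases described in this commit message can be sketched in C++ as follows. This is a simplified illustration under stated assumptions, not LLD's code: ghashes are modeled as 64-bit integers in a hypothetical `GHashTables` side table, the `isItem` bit and the index-remapping phase are omitted, the table is assumed never to fill, and all names (`packCell`, `ConcurrentGHashTable`, `sortedCells`) are hypothetical. It keeps the three properties the message relies on: cells hold only `<sourceIndex, typeIndex>` and resolve keys through the side table, a 64-bit CAS updates a cell, and comparing packed cells as 64-bit numbers gives the priority that makes the result deterministic.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical side table: HashTables[src][typeIdx] is that type's ghash.
using GHashTables = std::vector<std::vector<uint64_t>>;

// Pack <sourceIndex, typeIndex> into 64 bits. The source index is stored +1
// so that 0 means "empty". Comparing packed words compares the source index
// first, then the type index; the smaller word has the higher priority.
inline uint64_t packCell(uint32_t Src, uint32_t TypeIdx) {
  return ((uint64_t(Src) + 1) << 32) | TypeIdx;
}
inline uint32_t cellSource(uint64_t C) { return uint32_t(C >> 32) - 1; }
inline uint32_t cellType(uint64_t C) { return uint32_t(C); }

class ConcurrentGHashTable {
public:
  ConcurrentGHashTable(size_t Capacity, const GHashTables &HashTables)
      : Cells(Capacity), Hashes(HashTables) {
    for (auto &C : Cells)
      C.store(0, std::memory_order_relaxed);
  }

  // Safe to call from many threads at once. Returns the slot holding this
  // ghash; with no rehashing, that slot is stable. Assumes free space remains.
  size_t insert(uint32_t Src, uint32_t TypeIdx) {
    const uint64_t Key = Hashes[Src][TypeIdx];
    const uint64_t New = packCell(Src, TypeIdx);
    size_t Pos = Key % Cells.size();
    for (;;) {
      uint64_t Old = Cells[Pos].load();
      if (Old == 0) {
        // Empty slot: try to claim it; on failure, re-examine this slot.
        if (Cells[Pos].compare_exchange_strong(Old, New))
          return Pos;
        continue;
      }
      // The key is not stored in the cell; resolve it via the side table.
      if (Hashes[cellSource(Old)][cellType(Old)] == Key) {
        // Duplicate: keep the higher-priority (smaller) cell, so the winning
        // set of cells does not depend on thread interleaving.
        while (New < Old && !Cells[Pos].compare_exchange_weak(Old, New)) {
        }
        return Pos;
      }
      Pos = (Pos + 1) % Cells.size(); // linear probing
    }
  }

  // Non-empty cells sorted by priority; this order defines the output stream.
  std::vector<uint64_t> sortedCells() const {
    std::vector<uint64_t> Out;
    for (const auto &C : Cells)
      if (uint64_t V = C.load())
        Out.push_back(V);
    std::sort(Out.begin(), Out.end());
    return Out;
  }

private:
  std::vector<std::atomic<uint64_t>> Cells;
  const GHashTables &Hashes;
};
```

Because `insert` only ever replaces a cell with a smaller cell for the same key, worker threads (for example, one per type source) can call it concurrently and still end with the same sorted result as a serial run.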
Stanislav Mekhanoshin	4c0f9685b0	[AMDGPU] Reorganize VOP3P encoding This changes width of encoding and opcode fields to match the documentation. Differential Revision: https://reviews.llvm.org/D88619	2020-09-30 15:27:06 -07:00
Reid Kleckner	fb7d976110	Revert "[PDB] Merge types in parallel when using ghashing" This reverts commit 49b3459930655d879b2dc190ff8fe11c38a8be5f.	2020-09-30 14:55:32 -07:00


204483 Commits