function-instrument=xray-never wasn't actually honored before. We were
getting lucky that it worked because CodeGenFunction would omit the
other xray attributes when a function was annotated with
xray_never_instrument. This patch adds proper support.
Differential Revision: https://reviews.llvm.org/D89441
For ThinLTO, all the function profiles without context have been annotated to
outline functions, where possible, in the prelink phase. In the postlink phase,
profile annotation is only meaningful for function profiles with
context. If the profile is large, it is better to split it into two
parts, one with context and one without, so the profile reading in the postlink
phase only has to read the part with context. To support this splitting,
we extend the ExtBinary format to allow different section arrangements. This
makes it flexible to add other section layouts in the future without needing
to create a new class inheriting from the ExtBinary class.
Differential Revision: https://reviews.llvm.org/D94435
This reverts commit 418df4a6ab35d343cc0f2608c90a73dd9b8d0ab1.
This change broke emscripten tests, I believe because it started
generating a 5-byte-wide table index in the call_indirect instruction.
Neither v8 nor wabt seems to be able to handle that. The spec
currently says that this is a single 0x0 byte and:
"In future versions of WebAssembly, the zero byte occurring in the
encoding of the call_indirect instruction may be used to
index additional tables."
So we need to revisit this change. For backwards compat I guess
we need to guarantee that __indirect_function_table is always at
address zero. We could also consider making this a single-byte
relocation with an assert if we have more than 127 tables (for now).
Differential Revision: https://reviews.llvm.org/D95005
NotHasStdExtZbb doesn't have an AssemblerPredicate associated with it
so it didn't do anything. We don't need it either because the sorting
rules in tablegen prioritize by number of predicates. So the
dedicated instructions in the B extension that have predicates
will be prioritized automatically.
In preparation for turning on opt's -enable-new-pm by default, this pins
uses of passes via the legacy "opt -passname" with pass names beginning
with "polly-" and "polyhedral-info" to the legacy PM. Many of these
tests use -analyze, which isn't supported in the new PM.
(This doesn't affect uses of "opt -passes=passname").
rL240766 accidentally removed `-polly-prepare` in
phi_not_grouped_at_top.ll, and it also doesn't use the output of
-analyze.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D94266
This reverts commit 438682de6a38ac97f89fa38faf5c8dc9b09cd9ad to fix a
bug with reducing the size of the resulting vector for an entry node
with multiple users.
D75825 and D75828 modified llvm/test/Transforms/Inline/noalias2.ll to handle llvm.assume. The checking, though, was broken.
The NO_ASSUME checks have been replaced by normal CHECK lines; the ASSUME rules were never triggered and have been removed.
The test checks have been regenerated.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94978
Use a mutex to protect concurrent access to the signpost map. This fixes
nondeterministic crashes in LLDB that appeared after using signposts in
the timer implementation.
Differential revision: https://reviews.llvm.org/D94285
The pass has a dependency on 'TargetTransformInfoWrapperPass', but the
corresponding call to INITIALIZE_PASS_DEPENDENCY was missing.
Differential Revision: https://reviews.llvm.org/D94916
Relative to the original change, this adds a check that the
instruction on which we're replacing operands is safe to speculatively
execute, because that's what we're effectively doing. We're executing
the instruction with the replaced operand, which is fine if it's pure,
but not fine if it can cause side effects or UB (i.e. it is not speculatable).
Additionally, we cannot (generally) replace operands in phi nodes,
as these may refer to a different loop iteration. This is also covered
by the speculation check.
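A hedged, source-level sketch (not taken from the patch) of why that check
matters; the division below runs unconditionally, mirroring how an IR
instruction executes regardless of which select arm is later chosen:
```
int select_after_div(int y, int x, int z) {
  int d = y / x;          // evaluated unconditionally, fine whenever x != 0
  return x == 0 ? d : z;  // substituting 0 for x inside "y / x" above would
                          // make the division UB on every call, not just when
                          // x happens to be 0, so the replacement must only be
                          // done on speculatable instructions.
}
```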
-----
InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.
This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.
Among other things, this also subsumes D94860.
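A small, hypothetical example of the single-use replacement (the constants and
names are illustrative, not taken from the patch):
```
int fold_example(int x, int z) {
  // Before: the true arm computes x + 1, whose only use is this select.
  // After:  since x == 7 on that arm, the operand x can be replaced by the
  //         constant 7, and x + 1 then folds to 8.
  return x == 7 ? x + 1 : z;
}
```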
Differential Revision: https://reviews.llvm.org/D94862
If we are able to compare with 0 instead of 1, we might be able
to fold the setcc into a beqz/bnez.
Often these setccs start life as an xor that gets converted to
a setcc by DAG combiner's rebuildSetcc. I looked into detecting
(xor X, 1) and converting it to (seteq X, 0) based on boolean contents
being 0/1 in rebuildSetcc instead of using computeKnownBits. It was
very perturbing to AMDGPU tests, which I didn't look at closely.
It had a few changes on a couple of other targets, but didn't seem
to be much, if any, improvement.
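A minimal sketch (not from the patch) of the equivalence being exploited: for a
value already known to be 0 or 1, xor-ing with 1 and comparing against 0
compute the same inversion, and the compare-with-zero form maps directly onto
seqz/beqz instead of needing the constant 1:
```
bool flip_via_xor(unsigned b) { return (b ^ 1u) != 0; } // assuming b is 0 or 1
bool flip_via_cmp(unsigned b) { return b == 0; }        // same result for 0/1 inputs
```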
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D94730
Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93042
The Hexagon Vector Combine pass generates stores for a complete
aligned vector. The start of each section is a multiple of the
vector size, so that value is passed to normalize to compute
the offset of the stores in the section. The first store may
not occur at offset 0 when there is a gap between sections.
Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity,
and use the corresponding isGFX9Plus predicate to decide when to use
them instead of the old *_vi versions.
Differential Revision: https://reviews.llvm.org/D94975
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)
This tries to recognize patterns like below (assuming a little-endian target):
```
s8* a = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((s32*)a)

s8* a = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32*)a))
```
(This patch also handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second example above does not.)
To recognize the pattern, this searches from the last G_OR in the expression
tree.
E.g.
```
Reg Reg
\ /
OR_1 Reg
\ /
OR_2
\ Reg
.. /
Root
```
Each non-OR register in the tree is put in a list. Each register in the list is
then checked to see if it's an appropriate load + shift logic.
If every register is a load + potentially a shift, the combine checks if those
loads + shifts, when OR'd together, are equivalent to a wide load (possibly with
a BSWAP.)
To simplify things, this patch
(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
specific location
An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)
At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.
Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.
Differential Revision: https://reviews.llvm.org/D94350
Original patch by @rogfer01.
This patch adds support for sign-, zero-, and any-extension from
scalable mask vector types to integer vector types, as well as
truncation in the opposite direction.
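As a hedged scalar illustration of the semantics involved (plain C++, not RVV
code from the patch): sign-extending a mask element yields 0 or -1,
zero-extending yields 0 or 1, and truncating an integer back to a mask keeps
only the low bit:
```
#include <cstdint>

int32_t sext_mask(bool m)        { return m ? -1 : 0; } // sign-extension of an i1 lane
int32_t zext_mask(bool m)        { return m ?  1 : 0; } // zero-extension of an i1 lane
bool    trunc_to_mask(int32_t v) { return v & 1; }      // truncation back to i1
```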
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94590
On z/OS, the error message "EDC5111I Permission denied." is not matched correctly in lit tests. This patch updates the check expression to match successfully.
Differential Revision: https://reviews.llvm.org/D94432
We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as
expanded to turn them into a series of logical operations.
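The expansion amounts to the classic bitwise select; a minimal sketch (an
assumption about the shape of the expansion, not code from the patch), treating
each word as packed i1 lanes:
```
unsigned vselect_i1(unsigned c, unsigned t, unsigned f) {
  return (c & t) | (~c & f);   // per lane: pick t where the condition bit is set, else f
}
```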
Differential Revision: https://reviews.llvm.org/D94946
Split impliesPoison into two recursive walks, one over V, the
other over ValAssumedPoison. This allows us to reason about poison
implications in a number of additional cases that are important
in practice. This is a generalized form of D94859, which handles
the cmp to cmp implication in particular.
Differential Revision: https://reviews.llvm.org/D94866
This patch factors out the "VLMax" operand passed to most
scalable-vector ISel patterns into a property of each VType.
This is seen as a preparatory change to allow RVV in the future to
more easily support fixed-length vector types with constrained vector
lengths, with the AVL operand set to the length of the fixed-length
vector. It has no effect on the scalable code generation path.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D94594
This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs,
based on when the instruction is legal. With smaller-than-legal types
that are promoted we generate shr(qadd(shl, shl)), so the cost is set
to 4 accordingly.
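A hedged scalar sketch of that shr(qadd(shl, shl)) shape, for an i8 saturating
add promoted through i32 (illustrative only; the qadd step is written out as an
explicit clamp):
```
#include <cstdint>
#include <limits>

int8_t sadd_sat_i8(int8_t a, int8_t b) {
  // shl: move both operands to the top of a 32-bit lane
  int64_t wa = int64_t(a) * (int64_t(1) << 24);
  int64_t wb = int64_t(b) * (int64_t(1) << 24);
  // qadd: saturating 32-bit add, so i8 overflow becomes i32 saturation
  int64_t sum = wa + wb;
  int64_t lo = std::numeric_limits<int32_t>::min();
  int64_t hi = std::numeric_limits<int32_t>::max();
  int32_t sat = int32_t(sum < lo ? lo : (sum > hi ? hi : sum));
  // shr: arithmetic shift back down to the i8 range
  return int8_t(sat >> 24);
}
```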
Differential Revision: https://reviews.llvm.org/D94958
The TableGen emitter for directives has two slots for flangClass information; this was mainly
to keep up with the legacy OpenMP parser at the time. Now that all clauses are encapsulated in
AccClause or OmpClause, these two strings are no longer necessary and were the source of a couple
of problems while working with the generic structure checker for OpenMP.
This patch removes the flangClassValue string from DirectiveBase.td and uses the string flangClass as the
placeholder for the encapsulated class.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D94821
When performing peephole optimization to simplify the code, after removing
a FRSP/XSRSP instruction we set any uses of that FRSP/XSRSP to the
source of the FRSP/XSRSP.
We find the machine instruction using the virtual register holding the
FRSP/XSRSP result by searching all following instructions, and we hit an
issue when the first use of the virtual register is a debug MI, causing:
1. the virtual register in the debug MI to be removed unexpectedly.
2. the virtual register used in a non-debug MI not to be replaced with the
source of the FRSP/XSRSP, leaving it in an undef state.
This patch fixes the issue by only searching non-debug machine instructions
using the virtual register holding the FRSP/XSRSP result, and only when the
virtual register has a single non-debug use.
Differential Revision: https://reviews.llvm.org/D94711
Reviewed by: nemanjai
cmake_minimum_required(VERSION) calls cmake_policy(VERSION),
which sets all policies up to VERSION to NEW.
LLVM started requiring CMake 3.13 last year, so we can remove
a bunch of code setting policies prior to 3.13 to NEW as it
no longer has any effect.
Reviewed By: phosek, #libunwind, #libc, #libc_abi, ldionne
Differential Revision: https://reviews.llvm.org/D94374
83daa49758a1 made loop-rotate more conservative in the presence of
function calls in the prepare-for-lto stage. The code did not properly
account for calls that are not actual function calls, like calls to
intrinsics. This patch updates the code to ensure only calls that are
lowered to actual calls are considered inline candidates.
This CPU supports all v8.5a features except BTI, and so identifies as v8.5a to
Clang. A bit weird, but the best way for things like xnu to detect the new
features it cares about.
Such files (Thin-%%%%%%.tmp.o) are supposed to be deleted immediately
after they're used (either by renaming or deletion). However, we've seen
instances on Windows where this doesn't happen, probably due to the
filesystem being flaky. This is effectively a resource leak which has
prevented us from using the ThinLTO cache on Windows.
Since those temporary files are in the thinlto cache directory which we
prune periodically anyway, allowing them to be pruned too seems like a
tidy way to solve the problem.
Differential revision: https://reviews.llvm.org/D94962
This patch updates the llvm module map to reflect changes made in
`24672ddea3c97fd1eca3e905b23c0116d7759ab8` and fixes the module builds
(`-DLLVM_ENABLE_MODULES=On`).
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
This patch computes the cost for vector.reduce<operand> for scalable vectors.
The cost is split into two parts: the legalization cost and the horizontal
reduction.
Differential Revision: https://reviews.llvm.org/D93639
If a srl doesn't introduce any sign bits into the truncated result, then replace it with a sra to let us use a PACKSS truncation. This fixes a regression noticed in D56387 on pre-SSE41 targets that don't have PACKUSDW.
This caused a miscompile in Chromium, see comments on the codereview for
discussion and pointer to a reproducer.
> InstCombine already performs a fold where X == Y ? f(X) : Z is
> transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
> if f(X) only has one use, then we can always directly replace the
> use inside the instruction. To actually be profitable, limit it to
> the case where Y is a non-expr constant.
>
> This could be further extended to replace uses further up a one-use
> instruction chain, but for now this only looks one level up.
>
> Among other things, this also subsumes D94860.
>
> Differential Revision: https://reviews.llvm.org/D94862
This also reverts the follow-up
a003f26539cf4db744655e76c41f4c4a8913f116:
> [llvm] Prevent infinite loop in InstCombine of select statements
>
> This fixes an issue where the RHS and LHS the comparison operation
> creating the predicate were swapped back and forth forever.
>
> Differential Revision: https://reviews.llvm.org/D94934
D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.
The problem boils down to: we now rotate a loop just before the vectorizer
which requires duplicating a function call in the preheader when compiling
the individual files ('prepare for LTO'). But this then prevents further
inlining of the function during LTO.
This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.
I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens during full LTO and it is possible that a previously
duplicated call is actually a huge function which gets inlined
during LTO.
From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory and has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.
There could of course be scenarios where we inline a sufficiently large
function with code not profitable to vectorize, which would have been
vectorized earlier (by scalarizing the call). But even in that case,
there probably is no big performance impact, because it should be mostly
down to the cost-model to reject vectorization in that case. And then
the version with scalarized calls should also not be beneficial. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).
There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close look at actual performance and address any
follow-up issues.
I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.
Reviewed By: sanwou01
Differential Revision: https://reviews.llvm.org/D94232
This patch handles cases where we have to save/restore the link register
onto the stack and load/store instructions which use the stack are
part of the outlined region. It checks that no overflow is introduced
by the new offset and fixes up these instructions accordingly.
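A minimal, self-contained sketch of that check (the names and the notion of a
single spill-slot adjustment are assumptions for illustration, not the patch's
actual code):
```
#include <cstdint>
#include <optional>

// After spilling the link register, SP-relative offsets inside the outlined
// region must be rebased; reject candidates whose rebased offset no longer
// fits the instruction's immediate range.
std::optional<int64_t> rebaseStackOffset(int64_t Offset, int64_t SpillSize,
                                         int64_t MinImm, int64_t MaxImm) {
  int64_t NewOffset = Offset + SpillSize;   // account for the LR spill slot
  if (NewOffset < MinImm || NewOffset > MaxImm)
    return std::nullopt;                    // would overflow the encoding
  return NewOffset;
}
```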
Differential Revision: https://reviews.llvm.org/D92934
This fixes an issue where the RHS and LHS the comparison operation
creating the predicate were swapped back and forth forever.
Differential Revision: https://reviews.llvm.org/D94934
In addition to consistency, we'll hit a wall when 11.1.0 gets released, because
we cannot represent it with the lit versioning scheme.
Differential Revision: https://reviews.llvm.org/D94157
Previously uniqueCallSite could have race conditions between different
threads. Now it is accessed with an atomic RMW and will be unique
between different threads.
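A minimal sketch of the idea (standalone, illustrative code rather than the
actual LLVM source): incrementing the counter with an atomic read-modify-write
guarantees each thread observes a distinct call-site ID:
```
#include <atomic>

static std::atomic<unsigned> uniqueCallSite{0};

unsigned nextCallSiteID() {
  // fetch_add is an atomic RMW, so concurrent callers never get the same value.
  return uniqueCallSite.fetch_add(1, std::memory_order_relaxed);
}
```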
Differential Revision: https://reviews.llvm.org/D94784
A previous patch has already changed getInstructionCost to return
an InstructionCost type. This patch changes the other various
getXXXCost functions to return an InstructionCost too. This is a
non-functional change - I've added a few asserts that the costs
are valid in places where we're selecting between vector call
and intrinsic costs. However, since we don't yet return invalid
costs from any of the TTI implementations these asserts should
not fire.
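A self-contained sketch of the valid-cost assertion pattern described above
(this mirrors the spirit of the change with a stand-in Cost type, not the real
llvm::InstructionCost API):
```
#include <cassert>

struct Cost {
  long Value;
  bool Valid;
  bool isValid() const { return Valid; }
};

// Choosing between a vector call and an intrinsic: assert both costs are
// valid before comparing, matching the new asserts mentioned above.
Cost pickCheaper(Cost CallCost, Cost IntrinsicCost) {
  assert(CallCost.isValid() && IntrinsicCost.isValid() &&
         "expected valid costs when selecting between call and intrinsic");
  return CallCost.Value <= IntrinsicCost.Value ? CallCost : IntrinsicCost;
}
```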
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Differential Revision: https://reviews.llvm.org/D94065