llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 18:42:46 +02:00

Author	SHA1	Message	Date
Max Kazantsev	5f3ecc5640	Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"" This reverts commit 43d2e51c2e86788b9e2a582fdd3d8ffa7829328a. Committed wrong version.	2021-05-26 19:29:07 +07:00
Max Kazantsev	5577bed3a9	Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" The patch was reverted due to the compile-time impact of contextual SCEV queries. It also turned out to introduce a miscompile on irreducible CFG. Changes made: 1. isKnownPredicateAt is replaced with the more lightweight isKnownPredicate; 2. Irreducible CFG in live code is now detected and excluded from processing. Differential Revision: https://reviews.llvm.org/D102615	2021-05-26 19:23:21 +07:00
Andrew Savonichev	44a60c9c6a	[AArch64] Generate LD1 for anyext i8 or i16 vector load The existing LD1 patterns do not cover cases where the result type does not match the memory type. This happens when illegal vector types are extended and scalarized, for example: load <2 x i16>* %v2i16 is lowered into: // first element (v4i32 (insert_subvector (v2i32 (scalar_to_vector (load anyext from i16))))) // other elements (v4i32 (insert_vector_elt (i32 (load anyext from i16)) idx)) Before this patch these patterns were compiled into LDR + INS. Now they are compiled into LD1. The problem was reported in PR24820: LLVM Generates abysmal code in simple situation. Differential Revision: https://reviews.llvm.org/D102938	2021-05-26 14:44:21 +03:00
Max Kazantsev	65b3a9381f	[Test] Add Loop Deletion test with irreducible CFG Authored by Mikael Holmén. It demonstrated a miscompile on irreducible CFG with the patch "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration". That patch has been reverted. This test is checked in to make sure the bug does not return.	2021-05-26 18:40:14 +07:00
Tomas Matheson	d1c0dd082f	[MC] Move elf-unique-sections-by-flags.ll to X86/	2021-05-26 12:28:17 +01:00
pooja2299	e5cec4b7a2	[Docs] Updated the content of getting started documentation under llvm/lib/MC Wrote about the llvm/lib/MC subproject on the https://llvm.org/docs/GettingStarted.html page. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D101047	2021-05-26 16:25:26 +05:30
Tomas Matheson	79405b0d62	[MC][ELF] Emit unique sections for different flags Global values imply flags such as readable, writable, executable for the sections that they will be placed in. Currently MC places all such entries into the same section, using the first set of flags seen. This can lead to situations in LTO where a writable global is placed in the same named section as a readable global from another file, and the section may not be marked writable. D72194 ensures that mergeable globals with explicit sections are placed in separate sections with compatible entry size, by emitting the `unique` assembly syntax where appropriate. This change extends that approach to include section flags, so that globals with different section flags are emitted in separate unique sections. Differential revision: https://reviews.llvm.org/D100944	2021-05-26 11:51:29 +01:00
Tomas Matheson	4b71dc385f	[MC][NFCI] Factor out ELF section unique ID calculation Precursor to D100944. The logic for determining the unique ID had become quite difficult to reason about, so I have factored this out into a separate function. Differential Revision: https://reviews.llvm.org/D102336	2021-05-26 11:51:29 +01:00
Simon Pilgrim	6128b86b1d	[X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs Match what's documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate. Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.	2021-05-26 11:14:21 +01:00
Florian Hahn	dc7c658d73	[SCEV] Add tests with signed predicates for applyLoopGuards.	2021-05-26 11:10:11 +01:00
Kerry McLaughlin	fe86cc6dc2	[NFC] Add CHECK lines for unordered FP reductions An additional RUN line has been added to both strict-fadd.ll & scalable-strict-fadd.ll to ensure the correct behaviour of these tests where `-enable-strict-reductions` is false. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D103015	2021-05-26 11:00:20 +01:00
Mirko Brkusanin	f5f0329306	[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks This function can change regbank for registers which already have a selected bank. Depending on the instruction where these registers were used it can cause instruction selection to fail. Differential Revision: https://reviews.llvm.org/D98515	2021-05-26 11:57:41 +02:00
Mirko Brkusanin	7eda2c55b2	Revert "[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks" This reverts commit 18c5444702893fd63b0a99ec7133dd714284f9d2.	2021-05-26 11:57:41 +02:00
Fraser Cormack	691b0e7fba	[RISCV] Pre-commit fixed-length mask vselect tests These are default-expanded but later unrolled due to RISC-V's vector boolean content policy. A patch to improve this codegen will follow shortly.	2021-05-26 10:44:45 +01:00
Max Kazantsev	35b10351c0	[Test] Add simplified versions of tests for loop deletion that don't need context	2021-05-26 16:39:00 +07:00
Tim Northover	22efee723c	AArch64: support post-indexed stores to bfloat types.	2021-05-26 10:35:52 +01:00
Simon Pilgrim	215df0a901	[CostModel][X86] Remove old testshift* tests The vector shift cost tests are better covered (more cpu/sse levels) by the vshift-*-cost files, and we're trying to avoid codegen tests in here as it makes it harder to maintain the test files.	2021-05-26 10:31:00 +01:00
Simon Pilgrim	f18fae2383	[X86][Atom] Fix vector variable shift resource/throughputs Match what's documented in the Intel AOM - the non-immediate variants of the PSLL/PSRA/PSRL* shift instructions require BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-26 10:30:59 +01:00
Max Kazantsev	2adf59415e	[Test] Add test on unrolling to make sure it won't fail Initially it failed an assertion with "Do actual DCE in LoopUnroll (try 2)", which was later reverted. This makes sure the test passes when that patch is relanded.	2021-05-26 16:30:41 +07:00
Roman Lebedev	395eb3f2e5	[NFC][X86] clang-format X86TTIImpl::getInterleavedMemoryOpCostAVX2() I plan to make changes to it, and undoing formatting each time is not going to be fun.	2021-05-26 12:27:47 +03:00
David Sherwood	8526f9ba5b	Fix warning introduced by 9c766f4090d19e3e2f56e87164177f8c3eba4b96	2021-05-26 10:20:39 +01:00
David Sherwood	8067824e3d	[InstCombine] Fold extractelement + vector GEP with one use We sometimes see code like this: Case 1: %gep = getelementptr i32, i32* %a, <2 x i64> %splat %ext = extractelement <2 x i32*> %gep, i32 0 or this: Case 2: %gep = getelementptr i32, <4 x i32*> %a, i64 1 %ext = extractelement <4 x i32*> %gep, i32 0 where there is only one use of the GEP. In such cases it makes sense to fold the two together such that we create a scalar GEP: Case 1: %ext = extractelement <2 x i64> %splat, i32 0 %gep = getelementptr i32, i32* %a, i64 %ext Case 2: %ext = extractelement <4 x i32*> %a, i32 0 %gep = getelementptr i32, i32* %ext, i64 1 This may create further folding opportunities as a result, e.g. the extract of a splat vector can be completely eliminated. Also, even for the general case where the vector operand is not a splat, it seems beneficial to create a scalar GEP and extract the scalar element from the operand. Therefore, in this patch I've assumed that a scalar GEP is always preferable to a vector GEP and have added code to unconditionally fold the extract + GEP. I haven't added folds for the case when we have both a vector of pointers and a vector of indices, since this would require generating an additional extractelement operation. Tests have been added here: Transforms/InstCombine/gep-vector-indices.ll Differential Revision: https://reviews.llvm.org/D101900	2021-05-26 09:54:26 +01:00
Esme-Yi	b08bc2bd30	[NFC][object] Change the input parameter of the method isDebugSection. Summary: This is an NFC patch to change the input parameter of the method SectionRef::isDebugSection(), by replacing the StringRef SectionName with DataRefImpl Sec. This allows us to determine if a section is debug type in more ways than just by section name. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D102601	2021-05-26 08:47:53 +00:00
David Green	75476d7b62	[ARM] Add patterns for vmulh Now that vmulh can be selected, this adds the MVE patterns to make it legal and generate instructions. Differential Revision: https://reviews.llvm.org/D88011	2021-05-26 09:22:12 +01:00
LLVM GN Syncbot	4876ff1ec7	[gn build] Port 36d0fdf9ac3b	2021-05-26 04:31:12 +00:00
Arthur Eubanks	9085f7d6c9	[OpaquePtr] Make atomicrmw work with opaque pointers FullTy is only necessary when we need to figure out what type an instruction works with given a pointer's pointee type. However, we just end up using the value operand's type, so FullTy isn't necessary. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D102788	2021-05-25 20:16:21 -07:00
Teresa Johnson	9d60059fb7	[LTT] Handle merged llvm.assume when dropping type tests When the lower type test pass is invoked a second time with DropTypeTests set to true, it expects that all remaining type tests feed assume instructions, which are removed along with the type tests. In some cases the llvm.assume might have been merged with another one, i.e. from a builtin_assume instruction, in which case the type test would actually feed a phi that in turn feeds the merged assume instruction. In this case we can simply replace that operand of the phi with "true" before removing the type test. Differential Revision: https://reviews.llvm.org/D103073	2021-05-25 17:02:13 -07:00
Arthur Eubanks	a1a83c59b0	[OpaquePtr] Create new bitcode encoding for atomicrmw Since the opaque pointer type won't contain the pointee type, we need to separately encode the value type for an atomicrmw. Emit this new code for atomicrmw. Handle this new code and the old one in the bitcode reader. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D103123	2021-05-25 16:30:34 -07:00
Kevin Athey	671838d601	LLVM Detailed IR tests for introduction of flag -fsanitize-address-detect-stack-use-after-return-mode. Rework all tests that interact with use after return to correctly handle the case where the mode has been explicitly set to Never or Always. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102462	2021-05-25 16:17:39 -07:00
Alexandre Ganea	27303db33a	[benchmark] Silence 'suggest override' and 'missing override' warnings When building with Clang 11 on Windows, silence the following: F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(955,8): warning: 'Run' overrides a member function but is not marked 'override' [-Wsuggest-override] void Run(State& st); ^ F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(895,16): note: overridden virtual function is here virtual void Run(State& state) = 0; ^ 1 warning generated.	2021-05-25 18:46:37 -04:00
David Green	21ada5fffa	[ARM] Extra predicated tests for VMULH. NFC	2021-05-25 22:24:06 +01:00
Fangrui Song	875f91ef4b	[Internalize] Rename instead of removal if a to-be-internalized comdat has more than one member Besides the `comdat any` deduplication feature, instrumentations use comdat to establish dependencies among a group of sections, to prevent section-based linker garbage collection from discarding some members without discarding all. LangRef acknowledges this usage with the following wording: > All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key. On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated `__llvm_prf_data` section are placed in the same GRP_COMDAT group. A `__llvm_prf_data` is usually not referenced and expects the liveness of its associated `__llvm_prf_cnts` to retain it. The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained). The main goal of this patch is to fix the dependency relationship. I think it makes sense for InternalizePass to internalize a comdat and thus suppress the deduplication feature, e.g. a relocatable link of a regular LTO can create an object file affected by InternalizePass. If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the a.o references to the comdat definitions will be non-resolvable (references cannot bind to STB_LOCAL definitions in b.o). On PE-COFF, for a non-external selection symbol, deduplication is naturally suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend to believe the spec creator has not thought about this use case (see D102973). GNU ld and gold are still using the "signature is name based" interpretation. So even if D102973 for ld.lld is accepted, for portability, a better approach is to rename the comdat. 
A comdat with a single member is the common case, and leaving the comdat in place can waste (sizeof(Elf64_Shdr)+4*2) bytes, so in that case we optimize by deleting the comdat; otherwise we rename the comdat. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D103043	2021-05-25 14:15:27 -07:00
Matt Morehouse	86272fe99b	Revert "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" This reverts commit 2531fd70d19aa5d61feb533bbdeee7717a4129eb due to performance regression on the PPC buildbot.	2021-05-25 13:58:42 -07:00
Martin Storsjö	77c89471fd	[docs] [CMake] Change recommendations for how to use LLVM_DEFINITIONS LLVM_DEFINITIONS is a string variable containing a list of arguments to pass to the compiler. When CMake's add_definitions is passed a string variable, this is interpreted as one argument. To make it behave properly, the string variable needs to be split into a list. Despite the fact that add_definitions isn't supposed to be used like the LLVM docs recommended, it worked fine in practice in many cases. However, if the first argument in LLVM_DEFINITIONS is of the form -DFOO=42 instead of plain -DFOO, the rest of the string is treated as the value of this define. For example, if LLVM_DEFINITIONS consists of `-DFOO=42 -DBAR`, CMake ended up passing `-DFOO="42 -DBAR"` to the compiler. See https://gitlab.kitware.com/cmake/cmake/-/issues/22162 for discussion on the matter. Changing LLVM_DEFINITIONS to be a list variable would possibly be more disruptive; instead, keep the variable defined as before but change the recommendation for how to use it. Then projects using it can gradually be updated to follow the new recommendation. Differential Revision: https://reviews.llvm.org/D103044	2021-05-25 22:56:51 +03:00
Krzysztof Parzyszek	a0acaf9a48	[Hexagon] Remove unused function from HexagonISelDAGToDAGHVX.cpp It will be reintroduced shortly with an actual use. This change is simply to eliminate a compilation warning.	2021-05-25 14:47:15 -05:00
Arthur Eubanks	c0b1fddf31	[docs] Explain address spaces a bit more in opaque pointers doc Reviewed By: theraven Differential Revision: https://reviews.llvm.org/D102523	2021-05-25 12:35:43 -07:00
Stanislav Mekhanoshin	ac5c5988ca	[AMDGPU] Fix unused variable warning. NFC.	2021-05-25 12:32:28 -07:00
Vitaly Buka	27ccb6a338	[NFC] Fix 'unused' warning	2021-05-25 12:23:57 -07:00
Lang Hames	6214a6b655	[JITLink][MachO][arm64] Build GOT entries for defined symbols too. During the generic x86-64 support refactor in ecf6466f01c52 the implementation of MachO_arm64_GOTAndStubsBuilder::isGOTEdgeToFix was altered to only return true for external symbols. This behavior is incorrect: GOT entries may be required for defined symbols (e.g. in the large code model). This patch fixes the bug and adds a test case for it (renaming an old test case to avoid any ambiguity).	2021-05-25 12:19:09 -07:00
Lang Hames	fc88d818ee	[JITLink][MachO][arm64] Use a more descriptive test name.	2021-05-25 12:19:08 -07:00
Benjamin Kramer	3d36005d3c	[Matrix] Use LLVM_DEBUG for a debug flag dump() doesn't exist in release builds. ld.lld: error: undefined symbol: llvm::Value::dump() const >>> referenced by LowerMatrixIntrinsics.cpp >>> LowerMatrixIntrinsics.o:((anonymous namespace)::LowerMatrixIntrinsics::Visit())	2021-05-25 21:10:19 +02:00
Nikita Popov	a348a4b365	[SCEV] Cache operands used in BEInfo (NFC) When memoized values for a SCEV expression are dropped, we also drop all BECounts that make use of the SCEV expression. This is done by iterating over all the ExitNotTaken counts and (recursively) checking whether they use the SCEV expression. If there are many exits, this will take a lot of time. This patch improves the situation by pre-computing a set of all used operands, so that we can determine whether a certain BEInfo needs to be invalidated using a simple set lookup. We still need to loop over all BEInfos, though. This makes for a mild improvement on non-degenerate cases: https://llvm-compile-time-tracker.com/compare.php?from=b661a55a253f4a1cf5a0fbcb86e5ba7b9fb1387b&to=be1393f450e594c53f0ad7e62339a6bc831b16f6&stat=instructions For the degenerate case from https://bugs.llvm.org/show_bug.cgi?id=50384, for n=128 I'm seeing run time drop from 1.6s to 1.1s. Differential Revision: https://reviews.llvm.org/D102796	2021-05-25 21:03:33 +02:00
LLVM GN Syncbot	51ec30d56f	[gn build] Port 33706191d88d	2021-05-25 18:58:50 +00:00
Nikita Popov	b926818b2f	[CVP] Guard against poison in common phi value transform (PR50399) The common phi value transform replaces constants with values that have the same value as the constant on a given edge. However, LVI generally only provides information that is correct up to poison, so this can end up replacing a well-defined value with poison. D69442 addressed an instance of this problem by clearing poison flags on the generating instruction, which was sufficient at the time. rGa917fb89dc28 made LVI's edge value analysis slightly more powerful, and clearing poison flags is no longer sufficient. This patch changes the transform to explicitly guard against a poison value instead. This should be satisfied for most cases due to a prior branch on poison. Fixes https://bugs.llvm.org/show_bug.cgi?id=50399. Differential Revision: https://reviews.llvm.org/D102966	2021-05-25 20:47:17 +02:00
Michael Liao	1d1d6e0b78	[SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics. - When memory intrinsics such as memcpy are lowered, the attached scoped AA metadata is not passed down to the backend. As a result, the backend cannot schedule relevant memory operations around them following that hint. In this patch, SelectionDAG is enhanced to propagate that metadata (scoped AA only) when the intrinsics are lowered into loads and stores. Differential Revision: https://reviews.llvm.org/D102215	2021-05-25 14:42:26 -04:00
Michael Liao	7a35f0efe7	Add pre-commit tests for [D102215](https://reviews.llvm.org/D102215).	2021-05-25 14:42:25 -04:00
Stanislav Mekhanoshin	b8b00b1711	[AMDGPU] Lower kernel LDS into a sorted structure Differential Revision: https://reviews.llvm.org/D102954	2021-05-25 11:29:29 -07:00
Sanjay Patel	831c97ae39	[InstSimplify] allow undef element match in vector select condition value The semantics of select with undefined/poison condition are not explicitly stated in the LangRef, but this matches comments in the code and Alive2 appears to concur: https://alive2.llvm.org/ce/z/KXytmd We can find this pattern after demanded elements transforms. As noted in D101191, fuzzers are finding infinite loops because we may not account for this pattern in other passes.	2021-05-25 14:25:34 -04:00
Adam Nemet	e8d17ebfac	[Matrix] Factor and distribute transposes across multiplies Now that we can fold some transposes into multiplies (CM: A * B^t and RM: A^t * B), we want to move them around to create the optimal expressions: * fold away double transposes while still using them to assert the shape * sink transposes hoping they cancel out * lift transposes when both operands are transposed This also modifies the matrix remarks to include the number of exposed transposes (i.e. transposes that we couldn't fold into a multiply). The adjustment to the test remarks-inlining is a bit subtle: I am changing the double transpose to a single transpose so that we don't remove it completely. More importantly, this changes some of the total instruction counts, most notably stores, because we can no longer use a vector store. Differential Revision: https://reviews.llvm.org/D102733	2021-05-25 11:12:20 -07:00
Roman Lebedev	c7aa0b49d1	[LoopIdiom] 'arithmetic right-shift until zero': don't turn potentially infinite loops into finite ones Nowadays LLVM does not assume that all loops are finite, so if we want to produce a finite loop from a potentially-infinite one, we must ensure that the original loop is known to be finite. For this transform, it only matters for arithmetic right-shifts. For them, either the function or the loop must be known to be `mustprogress`, or the original value being shifted must be known to be non-negative (because iff the sign bit is set, the value never becomes zero and instead ends up as `-1`). It would be really good for alive2 to actually complain about this, but it currently does not: https://github.com/AliveToolkit/alive2/issues/726	2021-05-25 21:02:28 +03:00
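The LoopIdiom change above depends on the fact that an arithmetic right shift of a negative value never reaches zero. Below is a minimal C++ sketch of that arithmetic. It is not LLVM's implementation. `shiftsUntilZero` and `settleValue` are hypothetical helpers, and the loop is a simplified model that shifts by one bit per iteration. The sketch assumes `>>` on a negative `std::int32_t` sign-extends, which C++20 guarantees.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): simplified model of an
// "arithmetic right-shift until zero" loop. Shifts `x` right by one bit per
// iteration and returns how many shifts it took to reach zero, or -1 if
// zero was not reached within `maxIters` iterations.
// Assumes `>>` on a negative std::int32_t is an arithmetic (sign-extending)
// shift, as C++20 guarantees.
constexpr int shiftsUntilZero(std::int32_t x, int maxIters) {
  for (int i = 0; i < maxIters; ++i) {
    if (x == 0)
      return i;
    x >>= 1;  // the sign bit is replicated, so a negative x stays negative
  }
  return x == 0 ? maxIters : -1;
}

// Hypothetical helper: the value reached after 32 one-bit arithmetic shifts.
// Non-negative inputs settle at 0; negative inputs settle at -1.
constexpr std::int32_t settleValue(std::int32_t x) {
  for (int i = 0; i < 32; ++i)
    x >>= 1;
  return x;
}
```

For example, `shiftsUntilZero(8, 100)` is 4, while `shiftsUntilZero(-8, 100)` is -1 because the value gets stuck at -1. The transform therefore needs either `mustprogress` or a proof that the shifted value is non-negative before it can treat the loop as finite.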


216417 Commits