llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-21 18:22:53 +01:00

Author	SHA1	Message	Date
Bradley Smith	28d769100d	Workaround incorrect types when lowering fixed length gather/scatter When lowering a fixed length gather/scatter the index type is assumed to be the same as the memory type, this is incorrect in cases where the extension of the index has been folded into the addressing mode. For now add a temporary workaround to fix the codegen faults caused by this by preventing the removal of this extension. At a later date the lowering for SVE gather/scatters will be redesigned to improve the way addressing modes are handled. As a short term side effect of this change, the addressing modes generated for fixed length gather/scatters will not be optimal. Differential Revision: https://reviews.llvm.org/D109145 (cherry picked from commit 14e1a4a6eef2fb95ec852c9ddfc597f80bba3226)	2021-09-09 09:05:58 -07:00
Bjorn Pettersson	b37f5f2114	Inform pass manager when child loops are deleted As part of the nontrivial unswitching we could end up removing child loops. This patch adds a notification to the pass manager when that happens (using the markLoopAsDeleted callback). Without this there could be stale LoopAccessAnalysis results cached in the analysis manager. Those analysis results are cached based on a Loop* as key. Since the BumpPtrAllocator used to allocate Loop objects could be reset between different runs of for example the loop-distribute pass (running on different functions), a new Loop object could be created using the same Loop pointer. And then when requiring the LoopAccessAnalysis for the loop we got the stale (corrupt) result from the destroyed loop. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D109257 (fixes PR51754) (cherry-picked from commit 0f0344dd1e3b53387bb396070916e67f4c426da6)	2021-09-09 09:04:59 -07:00
serge-sans-paille	601c2dd9dd	Fine grain control over some symbol visibility Setting -fvisibility=hidden when compiling Target libs has the advantage of not being intrusive on the codebase, but it also sets the visibility of all functions within header-only components like ADT. In the end, we end up with some symbols with hidden visibility within llvm dylib (through the target libs), and some with external visibility (through other libs). This paves the way for subtle bugs like https://reviews.llvm.org/D101972 This patch explicitly sets the visibility of some classes to `default` so that `llvm::Any` related symbols keep a `default` visibility. Indeed a template function with `default` visibility parametrized by a type with `hidden` visibility is granted `hidden` visibility, and we don't want this for the uniqueness of `llvm::Any::TypeId`. Differential Revision: https://reviews.llvm.org/D108943	2021-09-08 21:06:19 -07:00
Cullen Rhodes	5f6ef6fbfd	[AArch64][SME] Fix imm bug in mov vector to tile aliases Also fixes a warning mentioned in D109359. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D109363 (cherry picked from commit 89786c2b992c3cb4c4a230542d2af34ec2915a08)	2021-09-08 20:47:08 -07:00
Chen Zheng	921995afd5	Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount." This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and is not a right patch according to comments in D91724 This reverts commit 42eaf4fe0adef3344adfd9fbccd49f325cb549ef. (cherry picked from commit 34badc409cc452575c538c4b6449546adc38f121)	2021-09-08 20:46:17 -07:00
Jonas Paulsson	d21237cb11	[SelectionDAGBuilder] Bugfix in visitInlineAsm() In case of a virtual register tied to a phys-def, the register class needs to be computed. Make sure that this works generally also with fast regalloc by using TLI.getRegClassFor() whenever possible, and make only the case of 'Untyped' use getMinimalPhysRegClass(). Fixes https://bugs.llvm.org/show_bug.cgi?id=51699. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109291 (cherry picked from commit 118997d8e931dcb4c6e972611a7e4febcc33a061)	2021-09-08 14:03:50 -07:00
Maksim Panchenko	64f4a52b52	[llvm-objdump] Fix 'llvm-objdump -dr' for executables with relocations Print relocations interleaved with disassembled instructions for executables with relocatable sections, e.g. those built with "-Wl,-q". Differential Revision: https://reviews.llvm.org/D109016 (cherry picked from commit 6300e4ac5806c9255c68c6fada37b2ce70efc524)	2021-09-08 08:45:39 -07:00
Hans Wennborg	09c230ecec	Add llvm-ml to LLVM_TOOLCHAIN_TOOLS (PR50536) so that it gets installed in LLVM_INSTALL_TOOLCHAIN_ONLY builds, such as used by the Windows installer. Differential revision: https://reviews.llvm.org/D109358 (cherry picked from commit c364dcbf1fd81c6291e935564fce2d9ebb97a3d0)	2021-09-08 08:44:45 -07:00
David Truby	e0d7c39869	[AArch64][sve] Prevent incorrect function call on fixed width vector The isEssentiallyExtractHighSubvector function currently calls getVectorNumElements on a type that in specific cases might be scalable. Since this function only has correct behaviour at the moment on scalable types anyway, the function can just return false when given a fixed type. Differential Revision: https://reviews.llvm.org/D109163 (cherry picked from commit b297531ece896fb9ec36f001a74aef144082602b)	2021-09-08 06:09:19 -07:00
Nikita Popov	3c59cf5aa7	[SCEV] Fix applyLoopGuards() with range check idiom (PR51760) Due to a typo, this replaced %x with umax(C1, umin(C2, %x + C3)) rather than umax(C1, umin(C2, %x)). This didn't make a difference for the existing tests, because the result is only used for range calculation, and %x will usually have an unknown starting range, and the additional offset keeps it unknown. However, if %x already has a known range, we may compute a result range that is too small. (cherry picked from commit 8d54c8a0c3d7d4a50186ae7087780c6082e5bb46)	2021-09-07 22:34:39 -07:00
Sanjay Patel	8830999b87	[DAGCombine] Prevent the transform of combine for multi-use operand The test is based on a miscompile example in: https://llvm.org/PR51321 Differential Revision: https://reviews.llvm.org/D107692 (cherry picked from commit e1e4bf174b09bcd4b25cd624f177537890bff785)	2021-09-07 22:33:53 -07:00
Yunde Zhong	6d085fa2f5	[tests] precommit tests for D107692 (cherry picked from commit 9790a2a72f60bb2caf891658c3c6a02b61e1f1a2)	2021-09-07 22:33:46 -07:00
Andrzej Warzynski	bf02d31357	[docs] Update release notes with items related to Flang Differential Revision: https://reviews.llvm.org/D109317	2021-09-06 14:48:43 +01:00
Joachim Protze	12537ac90c	[libomptarget][amdcgn] Only add opt/llvm-link dependency if TARGET is available In some build configurations, the target we depend on is not available for declaring the build dependency. We only need to declare the build dependency, if the build target is available in the same build. Fixes the issue raised in https://reviews.llvm.org/D107156#2969862 This patch should go into release/13 together with D108404 Differential Revision: https://reviews.llvm.org/D108868 (cherry picked from commit 5ea1c37118699f0ed1da17e0d8562011d0002edd)	2021-09-03 15:49:42 -07:00
Joachim Protze	4b60833c5b	[libomptarget][amdcgn] Add build dependency for llvm-link and opt D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime (LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same cmake instance. We could limit the dependency to cases where libomptarget/plugins are really built. But compared to the whole llvm project, building openmp runtime is negligible and postponing the build of OpenMP runtime after the dependencies are ready seems reasonable. The direct dependency introduced in D107156 and D107320 is necessary for the case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp). Differential Revision: https://reviews.llvm.org/D108404 (cherry picked from commit 4bb36df144127c5bee6ea2607bc544c003aae446)	2021-09-03 15:49:31 -07:00
Fraser Cormack	ba85498148	[RISCV] Fix reporting of incorrect commutable operand indices This patch fixes an issue where RISCV's `findCommutedOpIndices` would incorrectly return the pseudo `CommuteAnyOperandIndex` as a commutable operand index, rather than fixing a specific index. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D108206 (cherry picked from commit 5b06cbac11e53ce55f483c1852a108012507a6bb)	2021-09-03 15:48:26 -07:00
Stefan Gränitz	bf7908ca9b	[Orc] Enable debug object tests only on x86_64 hosts These tests rely on running IR code with an explicit x86_64 target triple. They won't work on other architectures. (They won't work for 32-bit processes on x86_64 hosts either. We will take care of this later.) Differential Revision: https://reviews.llvm.org/D107640 (cherry picked from commit c5ab55f5331c9da3c352b61d10d2f8a470a08b5b)	2021-09-02 14:16:41 -07:00
Petr Hosek	683bdb08a0	[Linker] Support weak symbols in nodeduplicate COMDAT group When a nodeduplicate COMDAT group contains a weak symbol, choose a non-weak symbol (or one of the weak ones) rather than reporting an error. This should address issue PR51394. With the current IR representation, a generic comdat nodeduplicate semantics is not representable for LTO. In the linker, sections and symbols are separate concepts. A dropped weak symbol does not force the defining input section to be dropped as well (though it can be collected by GC). In the IR, when a weak linkage symbol is dropped, its associated section content is dropped as well. For InstrProfiling, which is where we ran into this issue in PR51394, the deduplication semantic is a sufficient workaround. Differential Revision: https://reviews.llvm.org/D108689	2021-09-02 14:15:28 -07:00
Arthur Eubanks	e415eb692a	[docs] Mention that the legacy PM is deprecated and will be removed after 14 Per https://lists.llvm.org/pipermail/llvm-dev/2021-August/152305.html. Reviewed By: MaskRay, fhahn Differential Revision: https://reviews.llvm.org/D109080 (cherry picked from commit 2413d6063b788c3abc69072d48afa0b2a6e3583c)	2021-09-01 23:33:00 -07:00
Nikita Popov	fda0edff63	[NewPM] Add missing LTO ArgPromotion pass This is a followup to D96780 to add one more pass missing from the NewPM LTO pipeline. The missing ArgPromotion run is inserted at the same position as in the LegacyPM, resolving the already present FIXME: `16086d47c0/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp (L1096-L1098)` The compile-time impact is minimal with ~0.1% geomean regression on CTMark. Differential Revision: https://reviews.llvm.org/D108866 (cherry picked from commit b28c3b9d9f4292d7779a0e2661d308f1230c6ecd)	2021-09-01 17:37:57 -07:00
Philip Reames	72d2352801	[AlignFromAssume] Bailout w/non-constant alignments (pr51680) This is a bailout for pr51680. This pass appears to assume that the alignment operand to an align tag on an assume bundle is constant. This doesn't appear to be required anywhere, and clang happily generates non-constant alignments for cases such as this case taken from the bug report: // clang -cc1 -triple powerpc64-- -S -O1 opal_pci-min.c extern int a[]; long b; long c; void d(long, int *, int, long, long, long) __attribute__((__alloc_align__(6))); void e() { b = d(c, a, 0, 0, 5, c); b[0] = 0; } This was exposed by a SCEV change which allowed a non-constant alignment to reach further into the pass' code. We could generalize the pass, but for now, let's fix the crash. (cherry picked from commit 9b45fd909ffa754acbb4e927bc2d55c7ab0d4e3f)	2021-09-01 17:36:37 -07:00
Bjorn Pettersson	6d1749e6c0	[SelectionDAG] Fix miscompile bugs related to smul.fix.sat with scale zero When expanding a SMULFIXSAT ISD node (usually originating from a smul.fix.sat intrinsic) we've applied some optimizations for the special case when the scale is zero. The idea has been that it would be cheaper to use an SMULO instruction (if legal) to perform the multiplication and at the same time detect any overflow. And in case of overflow we could use some SELECT:s to replace the result with the saturated min/max value. The only tricky part is to know if we overflowed on the min or max value, i.e. if the product is positive or negative. Unfortunately the implementation has been incorrect as it has looked at the product returned by the SMULO to determine the sign of the product. In case of overflow that product is truncated and won't give us the correct sign bit. This patch is adding an extra XOR of the multiplication operands, which is used to determine the sign of the non truncated product. This patch fixes PR51677. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D108938 (cherry picked from commit 789f01283d52065b10049b58a3288c4abd1ef351)	2021-08-31 20:59:28 -07:00
Nikita Popov	4ddceef928	[WebAssembly] Fix FastISel of condition in different block (PR51651) If the icmp is in a different block, then the register for the icmp operand may not be initialized, as it nominally does not have cross-block uses. Add a check that the icmp is in the same block as the branch, which should be the common case. This matches what X86 FastISel does: `5b6b090cf2/llvm/lib/Target/X86/X86FastISel.cpp (L1648)` The "not" transform that could have a similar issue is dropped entirely, because it is currently dead: The incoming value is a branch or select condition of type i1, but this code requires an i32 to trigger. Fixes https://bugs.llvm.org/show_bug.cgi?id=51651. Differential Revision: https://reviews.llvm.org/D108840 (cherry picked from commit 16086d47c0d0cd08ffae8e69a69c88653e654d01)	2021-08-31 20:58:25 -07:00
Ricky Taylor	8fbe4ddc7c	[M68k] Update pointer data layout Fixes PR51626. The M68k requires that all instruction, word and long word reads are aligned to word boundaries. From the 68020 onwards, there is a performance benefit from aligning long words to long word boundaries. The M68k uses the same data layout for pointers and integers. In line with this, this commit updates the pointer data layout to match the layout already set for 32-bit integers: 32:16:32. Differential Revision: https://reviews.llvm.org/D108792 (cherry picked from commit 8d3f112f0cdbed2311aead86bcd72e763ad55255)	2021-08-31 20:56:41 -07:00
Ricky Taylor	de85b171b7	[M68k][NFC] Rename M68kOperand::Kind to KindTy Rename the M68kOperand::Type enumeration to KindTy to avoid ambiguity with the Kind field when referencing enumeration values e.g. `Kind::Value`. This works around a compilation error under GCC 5, where GCC won't lookup enum class values if you have a similarly named field (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60994). The error in question is: `M68kAsmParser.cpp:857:8: error: 'Kind' is not a class, namespace, or enumeration` Differential Revision: https://reviews.llvm.org/D108723 (cherry picked from commit f659b6b1fa43ffb8c95dbbf767ef57f6e964e7f6)	2021-08-30 21:40:39 -07:00
Fangrui Song	ab550c798f	[CMake] Change -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=off to -DLLVM_ENABLE_NEW_PASS_MANAGER=off LLVM_ENABLE_NEW_PASS_MANAGER is set to ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER, so -DLLVM_ENABLE_NEW_PASS_MANAGER=off has no effect. Change the cache variable to LLVM_ENABLE_NEW_PASS_MANAGER instead. A user opting out the new PM needs to switch from -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=off to -DLLVM_ENABLE_NEW_PASS_MANAGER=off. Also give a warning that -DLLVM_ENABLE_NEW_PASS_MANAGER=off is deprecated. Reviewed By: aeubanks, phosek Differential Revision: https://reviews.llvm.org/D108775 (cherry picked from commit a42bd1b560524905d3b9aebcb658cf6dc9521d26)	2021-08-26 16:40:38 -07:00
Dawid Jurczak	79ff6e4b2d	[LoopIdiom] Don't transform loop into memmove when load from body has more than one use This change fixes an issue found by Markus: https://reviews.llvm.org/rG11338e998df1 Before this patch the following code was transformed to memmove: for (int i = 15; i >= 1; i--) { p[i] = p[i-1]; sum += p[i-1]; } However, the load from p[i-1] is used not only by the store to p[i] but also by the sum computation. Therefore we cannot emit memmove in the loop header. Differential Revision: https://reviews.llvm.org/D107964 (cherry picked from commit bdcf04246c401aec9bdddf32fabc99fa4834a477)	2021-08-25 16:19:10 +02:00
Dawid Jurczak	b61f80921f	[NFC][LoopIdiom] Add reproducer of wrong memmove transformation That's precommit test for D107964. Differential Revision: https://reviews.llvm.org/D108537	2021-08-25 16:09:09 +02:00
Tom Stellard	5aea8f0472	Revert "[RISCV] Fix reporting of incorrect commutable operand indices" This reverts commit a7933290f72a08dc060d38fa52772a9cc33ed9ba. This commit caused some bot failures: clang-with-thin-lto-ubuntu-release lld-x86_64-win-release llvm-clang-x86_64-expensive-checks-debian-release	2021-08-24 21:59:54 -07:00
Sami Tolvanen	8a19c7f52b	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. Relands 700d07f8ce6f2879610fd6b6968b05c6f17bb915 with -msvc targets fixed. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D104058 (cherry picked from commit 7ce1c4da7726577986535cb7766d782f325145fe)	2021-08-24 18:49:13 -07:00
Fraser Cormack	2424302e99	[RISCV] Fix reporting of incorrect commutable operand indices This patch fixes an issue where RISCV's `findCommutedOpIndices` would incorrectly return the pseudo `CommuteAnyOperandIndex` as a commutable operand index, rather than fixing a specific index. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D108206 (cherry picked from commit 5b06cbac11e53ce55f483c1852a108012507a6bb)	2021-08-24 10:20:28 -07:00
Christian Fetzer	13fbbfff3a	[Coverage][llvm-cov] Correctly export branch coverage in LCOV format Commit 9f2967bcfe2f7d1fc02281f0098306c90c2c10a5 introduced support for branch coverage including export to the LCOV format. This commit corrects the LCOV field name for branches from BFH to BRH. The mistake seems to have slipped in as typo because the correct field name BRH is used in the comment section at the beginning of the file. Differential Revision: https://reviews.llvm.org/D108358 (cherry picked from commit 9116211d180ca417fa93d4e97e60f4ba849d58d9)	2021-08-23 21:02:01 -07:00
Jeremy Morse	fa3a9f0fc7	Revert sharing subprograms across CUs This patch is a revert of e08f205f5c2c. In that patch, DW_TAG_subprograms were permitted to be referenced across CU boundaries, to improve stack trace construction using call site information. Unfortunately, as documented in PR48790, the way that subprograms are "owned" by dwarf units is sufficiently complicated that subprograms end up in unexpected units, invalidating cross-unit references. There's no obvious way to easily fix this, and several attempts have failed. Revert this to ensure correct DWARF is always emitted. Three tests change in addition to the reversion, but they're all very light alterations. Differential Revision: https://reviews.llvm.org/D107076 (cherry picked from commit d4ce9e463d51b18547dbd181884046abf77c5c91) Signed-off-by: Jeremy Morse <jeremy.morse@sony.com> Conflicts: llvm/test/DebugInfo/X86/convert-loclist.ll	2021-08-23 11:18:57 -07:00
Nikita Popov	e4e6f3eeff	[AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476) This is a non-intrusive fix for https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport to the 13.x release branch. It expands on the current hack by distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have the obvious meaning and 2 means "anything else". The new optimization from D98564 should only be performed for CmpValue of 0 or 1. For main, I think we should switch the analyzeCompare() and optimizeCompare() APIs to use int64_t instead of int, which is in line with MachineOperand's notion of an immediate, and avoids this problem altogether. Differential Revision: https://reviews.llvm.org/D108076 (cherry picked from commit 81b106584f2baf33e09be2362c35c1bf2f6bfe94)	2021-08-18 20:07:23 -07:00
Simon Pilgrim	45d26b8826	[X86][AVX] Extract SUBV_BROADCAST constant bits from just the lower subvector range (PR51281) As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width. The getTargetConstantBitsFromNode was assuming that the Constant would be the same size as the subvector, resulting in the incorrect packing of the per-element bits data. This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest bits from Constant of that subvector bitsize. Differential Revision: https://reviews.llvm.org/D107158 (cherry picked from commit 18e6a03b1a15b2661259af15ae604b4c4850cd61)	2021-08-18 12:15:46 -07:00
Tomas Matheson	4d78ad44fb	[ARM][atomicrmw] Fix CMP_SWAP_32 expand assert This assert is intended to ensure that the high registers are not selected when it is passed to one of the thumb UXT instructions. However it was triggering even for 32 bit where no UXT instruction is emitted. Fixes PR51313. Differential Revision: https://reviews.llvm.org/D107363 (cherry picked from commit 40650f27b5df95b2f96d25ea03976d8136804441)	2021-08-18 12:14:24 -07:00
Amy Kwan	8202608068	[PowerPC] Disable CTR Loop generate for fma with the PPC double double type. It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this intrinsic needs to remain as a call as it cannot be lowered to any instruction, which also means we need to disable CTR loop generation for fma involving the ppcf128 type. This patch accomplishes this behaviour. Differential Revision: https://reviews.llvm.org/D107914 (cherry picked from commit 581a80304c671b6cb2b1b1f87feb9fbe14875f2a)	2021-08-17 20:22:13 -07:00
Paul Walker	22b0c02125	[DAGCombiner] Stop visitEXTRACT_SUBVECTOR creating illegal BITCASTs post legalisation. visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when removing "redundant" INSERT_SUBVECTOR operations. This patch adds an extra check to ensure such combines only occur after operation legalisation if any resulting BITCAST is itself legal. Differential Revision: https://reviews.llvm.org/D108086 (cherry picked from commit cd0e1964137f1cd7b508809ec80c7d9dcb3f0458)	2021-08-16 23:26:32 -07:00
Johannes Doerfert	0bbcb191e3	[Attributor][FIX] Guard constant casts with type size checks (cherry picked from commit 5f543919b2646d36f2ddc1424acdd555bfcebe4f)	2021-08-16 11:36:30 -07:00
Sanjay Patel	4345e14522	[InstCombine] avoid infinite loops from min/max canonicalization The intrinsics have an extra chunk of known bits logic compared to the normal cmp+select idiom. That allows folding the icmp in each case to something better, but that then opposes the canonical form of min/max that we try to form for a select. I'm carving out a narrow exception to preserve all existing regression tests while avoiding the inf-loop. It seems unlikely that this is the only bug like this left, but this should fix: https://llvm.org/PR51419 (cherry picked from commit b267d3ce8defa092600bda717ff18440d002f316)	2021-08-16 11:35:38 -07:00
Sanjay Patel	5a14ea148e	[InstSimplify] fold min/max with limit constant This is already done within InstCombine: https://alive2.llvm.org/ce/z/MiGE22 ...but leaving it out of analysis makes it harder to avoid infinite loops there. (cherry picked from commit e260e10c4a21784c146c94a2a14b7e78b09a9cf7)	2021-08-16 11:35:29 -07:00
Sanjay Patel	de802b8e6e	[InstSimplify] add tests for min/max idioms; NFC (cherry picked from commit 9b942a545cb53d4bae2071a2dea513be74f68221)	2021-08-16 11:35:24 -07:00
David Sherwood	9740b5c5ef	[LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform This patch adds more instructions to the Uniforms list, for example certain intrinsics that are uniform by definition or whose operands are loop invariant. This list includes: 1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which are always uniform by definition. 2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have loop invariant input operands then these are also uniform. Also, in VPRecipeBuilder::handleReplication we check if an instruction is uniform based purely on whether or not the instruction lives in the Uniforms list. However, there are certain cases where calls to some intrinsics can be effectively treated as uniform too. Therefore, we now also treat the following cases as uniform for scalable vectors: 1. If the 'assume' intrinsic's operand is not loop invariant, then we are free to treat this as uniform anyway since it's only a performance hint. We will get the benefit for the first lane. 2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop variant then for scalable vectors we assume these still ultimately come from the broadcast of an alloca. We do not support scalable vectorisation of loops containing alloca instructions, hence the alloca itself would be invariant. If the pointer does not come from an alloca then the intrinsic itself has no effect. I have updated the assume test for fixed width, since we now treat it as uniform: Transforms/LoopVectorize/assume.ll I've also added new scalable vectorisation tests for other intrinsics: Transforms/LoopVectorize/scalable-assume.ll Transforms/LoopVectorize/scalable-lifetime.ll Transforms/LoopVectorize/scalable-noalias-scope-decl.ll Differential Revision: https://reviews.llvm.org/D107284 (cherry picked from commit 3fd96e1b2e129b981f1bc1be2615486187e74687)	2021-08-16 11:32:41 -07:00
David Sherwood	00203829b4	[NFC] Clean up tests in test/Transforms/LoopVectorize/assume.ll The tests previously had lots of unnecessary CHECK lines, where all we really need to check is the presence (or absence) of the assume intrinsic and the correct input operands. Differential Revision: https://reviews.llvm.org/D107157 (cherry picked from commit 1172a8a7639399fe0b8a6c78a7123b1c3f9cf833)	2021-08-16 11:32:33 -07:00
Martin Storsjö	3a88dc8338	Add release notes for things relating to MinGW in the release	2021-08-16 12:26:49 +03:00
Rainer Orth	4a38ef8718	[ELF] Don't emit SHF_GNU_RETAIN on Solaris The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris. Initially, as reported in Bug 49437, it caused dozens of testsuite failures on both sparc and x86. The objects were marked as `ELFOSABI_NONE`, but `SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag (in the range for OS-specific values) is `SHF_SUNW_ABSENT` with a completely different semantics, which confuses Solaris `ld` very much. Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris `ld` doesn't support, causing it to SEGV and break the build. The linker is currently being hardened to not accept non-native OS ABIs to avoid this. The need for linker support is already documented in `clang/include/clang/Basic/AttrDocs.td`, but not currently checked. This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all. Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D107747 (cherry picked from commit 7bbbf2956181f375ab193321b37ea71c5fc44054)	2021-08-12 22:51:57 -07:00
Petr Hosek	fd411b2b5d	[profile] Fix profile merging with binary IDs This fixes support for merging profiles which broke as a consequence of e50a38840dc3db5813f74b1cd2e10e6d984d0e67. The issue was missing adjustment in merge logic to account for the binary IDs which are now included in the raw profile just after header. In addition, this change also: * Includes the version in module signature that's used for merging to avoid accidental attempts to merge incompatible profiles. * Moves the binary IDs size field after version field in the header as was suggested in the review. Differential Revision: https://reviews.llvm.org/D107143 (cherry picked from commit 83302c84890e5e6cb74c7d6c9f8eaaa56db0077c)	2021-08-12 22:46:22 -07:00
Andrea Di Biagio	681b643c07	[X86][SchedModel] Add missing ReadAdvance for some arithmetic ops (PR51318 and PR51322). This fixes a bug where implicit uses of EFLAGS were not marked as ReadAdvance in the RM/MR variants of ADC/SBB (PR51318) This also fixes the absence of ReadAdvance for the register operand of RMW arithmetic instructions (PR51322). Differential Revision: https://reviews.llvm.org/D107367 (cherry picked from commit 7a1a35a1d1ae2e69769505c9f39910067c53d53b)	2021-08-11 21:40:03 -07:00
Andrea Di Biagio	fb132cb74b	[MCA][NFC] Add tests for PR51318 and PR51322. Also, regenerate existing X86 tests using update_mca_test.py. (cherry picked from commit f0658c7a429b9e356da1670b280ab943ad0b0b94)	2021-08-11 21:39:56 -07:00
Andrea Di Biagio	db94372a40	[MCA] Simplify the rounding logic used in TimelineView::printWaitTimeEntry. This is related to PR51392. Before this patch, the timeline view was rounding doubles to the first decimal, using a logic similar to this: ``` double AverageTime = (double)Input / CumulativeExecutions; double Result = floor((AverageTime * 10) + 0.5) / 10 ``` Here, Input and CumulativeExecutions are both unsigned integers. The last operation is what effectively performs the rounding of AverageTime. PR51392 has been raised because - under specific -m32 configurations of GCC - one of the timeline tests reports slightly different values (due to a different rounding choice). This patch tries to minimise the propagation of floating-point error by hoisting the multiply by 10, so that it is performed on the unsigned. ``` double AverageTime = (double)(Input * 10) / CumulativeExecutions; floor(AverageTime + 0.5) / 10 ``` So we are trading a floating point multiply for an integer multiply (which can be expanded using a simple MUL or using an `ADD + LEA` sequence). This decrease in floating point operations executed should also help with decreasing the error in the computation. Strictly speaking, that computation will always be potentially subject to error (depending on what values are passed in input). However, this patch should improve the situation and make bugs like PR51392 less frequent. (cherry picked from commit 45685a1fc4524579a25b03eb1a27e8fcb792afc7)	2021-08-11 13:42:58 -07:00
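
The rounding change described in db94372a40 can be sketched as two small functions. This is a minimal illustration and not the actual TimelineView code: the names `roundBefore` and `roundAfter` are hypothetical, and only the two arithmetic expressions are taken from the commit message. The sketch also assumes `Input` is small enough that `Input * 10` does not wrap in unsigned arithmetic.

```cpp
#include <cmath>

// Hypothetical helper: the pre-patch form quoted in the commit message.
// The division happens first, so the multiply by 10 and the rounding both
// operate on an already inexact double.
double roundBefore(unsigned Input, unsigned CumulativeExecutions) {
  double AverageTime = (double)Input / CumulativeExecutions;
  return std::floor((AverageTime * 10) + 0.5) / 10;
}

// Hypothetical helper: the post-patch form quoted in the commit message.
// The multiply by 10 is hoisted onto the unsigned value, leaving one fewer
// floating-point operation before the rounding step.
// Assumption: Input * 10 does not overflow the unsigned type.
double roundAfter(unsigned Input, unsigned CumulativeExecutions) {
  double AverageTime = (double)(Input * 10) / CumulativeExecutions;
  return std::floor(AverageTime + 0.5) / 10;
}
```

For inputs whose average is exactly representable, such as 7 executions over 2 iterations, both forms return 3.5. The forms can only differ when the intermediate double lands close to a rounding boundary, which is the situation the commit message attributes to PR51392.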


219405 Commits