llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 18:42:46 +02:00

Author	SHA1	Message	Date
Fangrui Song	8d6669fe11	[VE] Fix -Wunused-private-field after D72598 and -Wdeprecated-declarations after D76348	2020-03-20 15:06:58 -07:00
Huihui Zhang	7fbf09d073	[ValueTracking][SVE] Fix getOffsetFromIndex for scalable vector. Summary: Return None if GEP index type is scalable vector. Size of scalable vectors are multiplied by a runtime constant. Avoid transforming: %a = bitcast i8* %p to <vscale x 16 x i8>* %tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0 store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp0 %tmp1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 1 store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp1 into: %a = bitcast i8* %p to <vscale x 16 x i8>* %tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0 %1 = bitcast <vscale x 16 x i8>* %tmp0 to i8* call void @llvm.memset.p0i8.i64(i8* align 16 %1, i8 0, i64 32, i1 false) Reviewers: sdesmalen, efriedma, apazos, reames Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, rkruppe, arphaman, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76464	2020-03-20 14:48:29 -07:00
Nikita Popov	a84d00a0cd	[InstSimplify] Reorder checks to be more efficient; NFC First check whether the RHS is a null pointer, and only then perform a potentially expensive non-zero query.	2020-03-20 22:05:38 +01:00
Pirama Arumuga Nainar	119fe7e34e	[llvm-ar] Use target triple to deduce archive kind for bitcode inputs Summary: When using full LTO on cross-compile settings, instead of generating the default archive kind of the host platform, we could deduce the archive kind based on the target triple. This specifically addresses https://github.com/android/ndk/issues/1209 by making it possible to drop llvm-ar in place of GNU ar without extra flags. Reviewers: compnerd, pcc, srhines, danalbert Subscribers: hiraditya, MaskRay, steven_wu, dexonsmith, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76461	2020-03-20 13:19:44 -07:00
Nikita Popov	d21f6c74eb	[InstCombine] Remove known bits constant folding If ExpensiveCombines is enabled (which is the case with -O3 on the legacy PM and always on the new PM), InstCombine tries to compute the known bits of all instructions in the hope that all bits end up being known, which is fairly expensive. How effective is it? If we add some statistics on how often the constant folding succeeds and how many KnownBits calculations are performed and run test-suite we get: "instcombine.NumConstPropKnownBits": 642, "instcombine.NumConstPropKnownBitsComputed": 18744965, In other words, we get one fold for every 30000 KnownBits calculations. However, the truth is actually much worse: Currently, known bits are computed before performing other folds, so there is a high chance that cases that get folded by known bits would also have been handled by other folds. What happens if we compute known bits after all other folds (hacky implementation: https://gist.github.com/nikic/751f25b3b9d9e0860db5dde934f70f46)? "instcombine.NumConstPropKnownBits": 0, "instcombine.NumConstPropKnownBitsComputed": 18105547, So it turns out despite doing 18 million known bits calculations, the known bits fold does not do anything useful on test-suite. I was originally planning to move this into AggressiveInstCombine so it only runs once in the pipeline, but seeing this, I think we're better off removing it entirely. As this is the only use of the "expensive combines" mechanism, it may be removed afterwards, but I'll leave that to a separate patch. Differential Revision: https://reviews.llvm.org/D75801	2020-03-20 20:54:06 +01:00
Vedant Kumar	1230f7058d	unittest: Work around build failure on MSVC builders MSVC insists on using the deleted move constructor instead of the copy constructor: http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/41203 C:\ps4-buildslave2\lld-x86_64-win7\llvm-project\llvm\unittests\ADT\CoalescingBitVectorTest.cpp(193): error C2280: 'llvm::CoalescingBitVector<unsigned int,16>::CoalescingBitVector(llvm::CoalescingBitVector<unsigned int,16> &&)': attempting to reference a deleted function	2020-03-20 12:38:00 -07:00
Vedant Kumar	56222dd183	[LiveDebugValues] Speed up collectIDsForRegs, NFC Use the advanceToLowerBound operation available on CoalescingBitVector iterators to speed up collection of variables which reside within some set of registers. The speedup comes from avoiding repeated top-down traversals in IntervalMap::find. The linear scan forward from one register interval to the next is unlikely to be as expensive as a full IntervalMap search starting from the root. This reduces time spent in LiveDebugValues when compiling sqlite3 by 200ms (about 0.1% - 0.2% of the total User Time). Depends on D76466. rdar://60046261 Differential Revision: https://reviews.llvm.org/D76467	2020-03-20 12:18:26 -07:00
Vedant Kumar	5f49ad1d16	[ADT] CoalescingBitVector: Add advanceToLowerBound iterator operation advanceToLowerBound moves an iterator to the first bit set at, or after, the given index. This can be faster than doing IntervalMap::find. rdar://60046261 Differential Revision: https://reviews.llvm.org/D76466	2020-03-20 12:18:26 -07:00
Vedant Kumar	c1c993b3e8	[ADT] CoalescingBitVector: Avoid initial heap allocation, NFC Avoid making a heap allocation when constructing a CoalescingBitVector. This reduces time spent in LiveDebugValues when compiling sqlite3 by 700ms (0.5% of the total User Time). rdar://60046261 Differential Revision: https://reviews.llvm.org/D76465	2020-03-20 12:18:25 -07:00
Fangrui Song	d737b91aa4	[X86] Reland D71360 Clean up UseInitArray initialization for X86ELFTargetObjectFile UseInitArray is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false. The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called. clang -target i386 -c a.c clang -target x86_64 -c a.c This cleanup fixes this as a bonus. Differential Revision: https://reviews.llvm.org/D71360	2020-03-20 11:18:36 -07:00
Fangrui Song	0ac65271c0	[llc] Initialize TargetLoweringObjectFile for MIR input MIRParser uses MC and transitively calls MCObjectFileInfo::getObjectFileType(). TargetLoweringObjectFile::Initialize should be called beforehand to initialize MCObjectFileInfo::Env. This manifested as a -fsanitize=undefined test/CodeGen/MIR/X86/instr-symbols-and-mcsymbol-operands.mir failure when D71360/aa5ee8f244441a8ea103a7e0ed8b6f3e74454516 was committed.	2020-03-20 11:18:36 -07:00
Vedant Kumar	3de3902a62	PR45181: Fix another invalid DIExpression combination The original test case from PR45181 triggers a DIExpression combination that wasn't fixed in D76164.	2020-03-20 11:18:05 -07:00
Adrian Prantl	a194f87a9b	Add missing module map entry	2020-03-20 11:11:27 -07:00
Sterling Augustine	b72765b119	Cleanup the plumbing for DILineInfoSpecifier. [NFC - Try 2]	2020-03-20 10:29:57 -07:00
Nikita Popov	609bfedb6b	[InstCombine] Handle known shl nsw sign bit in SimplifyDemanded Ideally SimplifyDemanded should compute the same known bits as computeKnownBits(). This patch addresses one discrepancy, where ValueTracking is more powerful: If we have a shl nsw shift, we know that the sign bit of the input and output must be the same. If this results in a conflict, the result is poison. This is implemented in `2c4ca6832f/lib/Analysis/ValueTracking.cpp (L1175-L1179)` and `2c4ca6832f/lib/Analysis/ValueTracking.cpp (L904-L908)`. This implements the same basic logic in SimplifyDemanded. It's slightly stronger, because I return undef instead of zero for the poison case (which is not an option inside ValueTracking). As mentioned in https://reviews.llvm.org/D75801#inline-698484, we could detect poison in more cases, this just establishes parity with the existing logic. Differential Revision: https://reviews.llvm.org/D76489	2020-03-20 18:16:05 +01:00
Pirama Arumuga Nainar	011eaae1ea	[DAGCombiner] Do not fold truncate(build_vector(..)) if it creates an illegal type Summary: It can be the case that a vector type is legal but the corresponding scalar type is not legal for an architecture (i8 vs. v16i8 on AArch64). Check if the scalar type created when folding truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y)) is legal if we are running after the type legalizer. This fixes https://github.com/android/ndk/issues/1207. Reviewers: RKSimon, srhines Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76312	2020-03-20 09:20:16 -07:00
Sean Fertile	efd6d16a3e	[PowerPC][AIX][NFC] Extend the test coverage of ByVal args. Adds/changes some types in the ByVal cc test so that they aren't all structs of arrays of bytes, and adds testing for passing multiple ByVal arguments.	2020-03-20 12:19:08 -04:00
Craig Topper	0f922585b3	[X86] Prevent (bitcast (broadcast_load)) combine from producing vXf16 broadcast instructions. The combine tries to put the broadcast in either the integer or fp domain to match the bitcast domain. But we can only do this if the broadcast size is 32 or larger.	2020-03-20 09:15:07 -07:00
Simon Pilgrim	d8bac0b803	[InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by scalar amounts to generic shifts (PR40391) The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range. This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).	2020-03-20 15:48:06 +00:00
Simon Tatham	98323f59a2	[ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family. Summary: I've implemented them as target-specific IR intrinsics rather than using `@llvm.experimental.vector.reduce.add`, on the grounds that the 'experimental' intrinsic doesn't currently have much code generation benefit, and my replacements encapsulate the sign- or zero-extension so that you don't expose the illegal MVE vector type (`<4 x i64>`) in IR. The machine instructions come in two versions: with and without an input accumulator. My new IR intrinsics, like the 'experimental' one, don't take an accumulator parameter: we represent that by just adding on the input value using an ordinary i32 or i64 add. So if you write the `vaddvaq` C-language intrinsic with an input accumulator of zero, it can be optimised to VADDV, and conversely, if you write something like `x += vaddvq(y)` then that can be combined into VADDVA. Most of this is achieved in isel lowering, by converting these IR intrinsics into the existing `ARMISD::VADDV` family of custom SDNode types. For the difficult case (64-bit accumulators), isel lowering already implements the optimization of folding an addition into a VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is already done, except that I had to introduce a parallel set of ARMISD nodes for the //predicated// forms of VADDLV. For the simpler VADDV, we handle the predicated form by just leaving the IR intrinsic alone and matching it in an ordinary dag pattern. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76491	2020-03-20 15:42:33 +00:00
Simon Tatham	da80cc9fe7	[ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family. Summary: I've implemented these as target-specific IR intrinsics, because they're not //quite// enough like @llvm.experimental.vector.reduce.min (which doesn't take the extra scalar parameter). Also this keeps the predicated and unpredicated versions looking similar, and the floating-point minnm/maxnm versions fold into the same schema. We had a couple of min/max reductions already implemented, from the initial pathfinding exercise in D67158. Those were done by having separate IR intrinsic names for the signed and unsigned integer versions; as part of this commit, I've changed them to use a flag parameter indicating signedness, which is how we ended up deciding that the rest of the MVE intrinsics family ought to work. So now hopefully the whole lot is consistent. In the new llc test, the output code from the `v8f16` test functions looks quite unpleasant, but most of it is PCS lowering (you can't pass a `half` directly in or out of a function). In other circumstances, where you do something else with your `half` in the same function, it doesn't look nearly as nasty. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76490	2020-03-20 15:42:33 +00:00
Sean Fertile	7a4f5c2016	[PowerPC][AIX][NFC] Add zero-sized by val params to cc test. The zero sized structs force creation of a stack object of size 1, align 8 in the locals area, but otherwise have no effect on the calling convention code. i.e. They consume no registers or stack space in the parameter save area. The 32-bit codegen has 8 bytes of padding to fit the new stack object so stack size stays the same. 64-bit codegen has no padding in the stack frames allocated so 8 bytes is added, and because of 16-byte aligned stack, the stack size increases from 112 bytes to 128.	2020-03-20 11:24:46 -04:00
Bjorn Pettersson	b09ad4fef7	[DAGCombiner] Fix non-determinism problem related to argument evaluation order in visitFDIV Summary: For some reason the order in which we call getNegatedExpression for the involved operands, after a call to isCheaperToUseNegatedFPOps, seems to matter. This patch includes a new test case in test/CodeGen/X86/fdiv.ll that crashes if we reverse the order of those calls. Before this patch that could happen depending on which compiler was used when building llvm. With my GCC version (7.4.0) I got the crash, because it seems like it is using a different order for the argument evaluation compared to clang. All other users of isCheaperToUseNegatedFPOps already used this pattern with unfolded/ordered calls to getNegatedExpression, so this patch is aligning visitFDIV with the other use cases. This patch simply deals with the non-determinism for FDIV. While the underlying problem with getNegatedExpression is discussed further in D76439. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: hiraditya, mgrang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76319	2020-03-20 16:11:17 +01:00
Matt Arsenault	ef93b09932	AMDGPU: Move towards deprecating alignbit intrinsic This is equivalent to llvm.fshr, so legalize the intrinsic to the generic node.	2020-03-20 11:03:04 -04:00
Matt Arsenault	134b259429	AMDGPU: Add more tests for fshr	2020-03-20 11:01:51 -04:00
alex-t	fd26f332c3	[AMDGPU] Enable divergence driven ISel for ADD/SUB i64 Summary: Currently we custom select add/sub with carry out to scalar form relying on later replacing them to vector form if necessary. This change enables custom selection code to take the divergence of adde/addc SDNodes into account and select the appropriate form in one step. Reviewers: arsenm, vpykhtin, rampitec Reviewed By: arsenm, vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa Differential Revision: https://reviews.llvm.org/D76371	2020-03-20 17:06:11 +03:00
Mikhail Maltsev	4fa24484ce	[ARM,CDE] Implement CDE unpredicated Q-register intrinsics Summary: This patch implements the following intrinsics: uint8x16_t __arm_vcx1q_u8 (int coproc, uint32_t imm); T __arm_vcx1qa(int coproc, T acc, uint32_t imm); T __arm_vcx2q(int coproc, T n, uint32_t imm); uint8x16_t __arm_vcx2q_u8(int coproc, T n, uint32_t imm); T __arm_vcx2qa(int coproc, T acc, U n, uint32_t imm); T __arm_vcx3q(int coproc, T n, U m, uint32_t imm); uint8x16_t __arm_vcx3q_u8(int coproc, T n, U m, uint32_t imm); T __arm_vcx3qa(int coproc, T acc, U n, V m, uint32_t imm); Most of them are polymorphic. Furthermore, some intrinsics are polymorphic by 2 or 3 parameter types, such polymorphism is not supported by the existing MVE/CDE tablegen backends, also we don't really want to have a combinatorial explosion caused by 1000 different combinations of 3 vector types. Because of this some intrinsics are implemented as macros involving a cast of the polymorphic arguments to uint8x16_t. The IR intrinsics are even more restricted in terms of types: all MVE vectors are cast to v16i8. Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76299	2020-03-20 14:01:56 +00:00
Mikhail Maltsev	821822dd9a	[ARM,CDE] Implement CDE S and D-register intrinsics Summary: This patch implements the following ACLE intrinsics: uint32_t __arm_vcx1_u32(int coproc, uint32_t imm); uint32_t __arm_vcx1a_u32(int coproc, uint32_t acc, uint32_t imm); uint32_t __arm_vcx2_u32(int coproc, uint32_t n, uint32_t imm); uint32_t __arm_vcx2a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t imm); uint32_t __arm_vcx3_u32(int coproc, uint32_t n, uint32_t m, uint32_t imm); uint32_t __arm_vcx3a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t m, uint32_t imm); uint64_t __arm_vcx1d_u64(int coproc, uint32_t imm); uint64_t __arm_vcx1da_u64(int coproc, uint64_t acc, uint32_t imm); uint64_t __arm_vcx2d_u64(int coproc, uint64_t m, uint32_t imm); uint64_t __arm_vcx2da_u64(int coproc, uint64_t acc, uint64_t m, uint32_t imm); uint64_t __arm_vcx3d_u64(int coproc, uint64_t n, uint64_t m, uint32_t imm); uint64_t __arm_vcx3da_u64(int coproc, uint64_t acc, uint64_t n, uint64_t m, uint32_t imm); Since the semantics of CDE instructions is opaque to the compiler, the ACLE intrinsics require dedicated LLVM IR intrinsics. The 64-bit and 32-bit variants share the same IR intrinsic. Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76298	2020-03-20 14:01:53 +00:00
Mikhail Maltsev	bcefc6091f	[ARM,CDE] Implement GPR CDE intrinsics Summary: This change implements ACLE CDE intrinsics that translate to instructions working with general-purpose registers. The specification is available at https://static.docs.arm.com/101028/0010/ACLE_2019Q4_release-0010.pdf Each ACLE intrinsic gets a corresponding LLVM IR intrinsic (because they have distinct function prototypes). Dual-register operands are represented as pairs of i32 values. Because of this the instruction selection for these intrinsics cannot be represented as TableGen patterns and requires custom C++ code. Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76296	2020-03-20 14:01:51 +00:00
Florian Hahn	886b9aca97	[DSE,MSSA] Precommit additional tests for D73763.	2020-03-20 13:39:46 +00:00
Simon Pilgrim	7cc5fa5f26	[ValueTracking] Add some initial isKnownNonZero DemandedElts support (PR36319)	2020-03-20 13:29:00 +00:00
Alexey Bataev	960e248f36	[OPENMP50]Initial support for scan directive. Added basic parsing/sema/serialization support for scan directive.	2020-03-20 07:58:15 -04:00
Nikita Popov	1cc3b3c8a9	[InstCombine] Move test to instcombine; NFC This test uses -instcombine, so move it into the appropriate directory. Also fork it for expensive checks enabled/disabled.	2020-03-20 12:41:19 +01:00
Simon Pilgrim	9d05ebb2cb	[ValueTracking] Add computeKnownBits DemandedElts support to shift instructions (PR36319)	2020-03-20 11:08:08 +00:00
Nikita Popov	330cabedf4	[Tests] Regenerate some test checks; NFC	2020-03-20 12:06:53 +01:00
James Henderson	c2469985d5	[llvm-readobj] Allow syms from all sections to match stack size entries Prior to this change, for non-relocatable objects llvm-readobj would assume that all symbols that corresponded to a stack size section's entries were in the section specified by the section's sh_link field. In the presence of an output section description combining SHF_LINK_ORDER sections linking different output sections, this cannot be respected, since linker script section patterns are "by name" by nature. Consequently, the sh_link value would not be correct for all section entries. This patch changes llvm-readobj to ignore the section of symbols in a non-relocatable object. Fixes https://bugs.llvm.org/show_bug.cgi?id=45228. Reviewed by: grimar, MaskRay Differential Revision: https://reviews.llvm.org/D76425	2020-03-20 10:54:18 +00:00
Georgii Rymar	5defb79081	[llvm-readobj][llvm-readelf][test] - Add a test to check how we dump relocation addends. Seems we do not test how we print relocation addends well. And the behavior of dumpers does not seem to be ideal here (and llvm-readelf does not match GNU as the test case shows). This patch adds a test case to document the current behavior. Differential revision: https://reviews.llvm.org/D75671	2020-03-20 13:41:32 +03:00
Adrian Kuegel	00bbf66f21	Revert "[TableGen][GlobalISel] Account for HwMode in RegisterBank register sizes" This reverts commit e9f22fd4293a65bcdcf1b18b91c72f63e5e9e45b. When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70 failing tests with this revision, and 29 without this revision.	2020-03-20 11:02:50 +01:00
David Green	b0d54a04ab	[ARM] Change VDUP type to i32 for MVE The MVE VDUP instruction takes a GPR and splats into every lane of a vector register. Unlike NEON we do not have a VDUPLANE equivalent instruction, doing the same splat from a fp register. Previously a VDUP to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which would mean the instruction pattern needs to add a COPY_TO_REGCLASS to the GPR. Instead this now converts that earlier during an ISel DAG combine, converting (VDUP x) to (VDUP (bitcast x)). This can allow instruction selection to tell that the input needs to be an i32, which in one of the testcases allows it to use ldr (or specifically ldm) over (vldr;vmov). Whilst being simple enough for floats, as the types sizes are the same, there is no BITCAST equivalent for getting a half into a i32. This uses a VMOVrh ARMISD node, which doesn't know the same tricks yet. Differential Revision: https://reviews.llvm.org/D76292	2020-03-20 09:48:45 +00:00
Roger Ferrer Ibanez	ef495f25d0	[RISCV] Select +0.0 immediate using fmv.{w,d}.x / fcvt.d.w Floating point positive zero can be selected using fmv.w.x / fmv.d.x / fcvt.d.w and the zero source register. Differential Revision: https://reviews.llvm.org/D75729	2020-03-20 09:42:24 +00:00
Roger Ferrer Ibanez	1f76f9f3fb	[NFC][RISCV] Test for 0.0 fp immediate To show a later change that impacts 0.0 fp constant generation. Differential Revision: https://reviews.llvm.org/D75728	2020-03-20 09:42:24 +00:00
Nikita Popov	f73282e3d5	[InstCombine] Simplify calls with "returned" attribute If a call argument has the "returned" attribute, we can simplify the call to the value of that argument. This was already partially handled by InstSimplify/InstCombine for the case where the argument is an integer constant, and the result is thus known via known bits. The non-constant (or non-int) argument cases weren't handled though. This previously landed as an InstSimplify transform, but was reverted due to assertion failures when compiling the Linux kernel. The reason is that simplifying a call to another call breaks assumptions in call graph updating during inlining. As the code is not easy to fix, and there is no particularly strong motivation for having this in InstSimplify, the transform is only performed in InstCombine instead. Differential Revision: https://reviews.llvm.org/D75815	2020-03-20 10:23:39 +01:00
David Green	469aaafeb8	[ARM] Extra MVE float loop tests. NFC	2020-03-20 09:21:45 +00:00
Nikita Popov	117eda6206	[InstCombine] Don't replace musttail result based on known bits This is the same change as D75824, but for two cases where InstCombine performs the same optimization: Replacing an instruction whose bits are fully known with a constant. This is not (generally) legal for musttail calls. Differential Revision: https://reviews.llvm.org/D76457	2020-03-20 10:17:09 +01:00
Florian Hahn	ad3bbfd795	[Matrix] Generalize ColumnMatrixTy to MatrixTy (NFC). This patch sets the stage for supporting both row and column major layouts for matrixes. It renames ColumnMatrixTy to MatrixTy, adds booleans indicating the underlying layout to both MatrixTy and ShapeInfo and generalizes the methods of MatrixTy to support both row and column major layouts. Reviewers: Gerolf, anemet, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D76324	2020-03-20 08:32:13 +00:00
Florian Hahn	30717a4559	[DSE] Support traversing MemoryPhis. For MemoryPhis, we have to avoid that the MemoryPhi may be executed before the access we are currently looking at. To do this we do a post-order numbering of the basic blocks in the function and bail out once we reach a MemoryPhi with a larger (or equal) post-order block number than the current MemoryAccess. This changes the order in which we visit stores for elimination. This patch also adds support for exploring multiple paths. We keep a worklist (ToCheck) of memory accesses that might be eliminated by our starting MemoryDef or MemoryPhis for further exploration. For MemoryPhis, we add the incoming values to the worklist, for MemoryDefs we add the defining access. Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D72148	2020-03-20 07:51:42 +00:00
Austin Kerbow	ba7de4f645	[AMDGPU] Reuse register during frame index elimination If there were no free VGPRs we would need two emergency spill slots for register scavenging during PEI/frame index elimination. Reuse 'ResultReg' for scale calculation so that only one spill is needed. Differential Revision: https://reviews.llvm.org/D76387	2020-03-20 00:19:15 -07:00
cdevadas	555c7ce155	[AMDGPU] Set the CostPerUse value for vgpr registers. Apart from the argument registers, set the CostPerUse value as per the ratio reg_index/allocation_granularity. It is a pre-commit for introducing the scratch registers in the ABI. This change should help in a balanced register allocation. Differential Revision: https://reviews.llvm.org/D76417	2020-03-20 11:49:35 +05:30
Wei Mi	2f9d0b467d	Revert "Generate Callee Saved Register (CSR) related cfi directives like .cfi_restore." This reverts commit 3c96d01d2e3de63304ca3429d349ec62ae2adef3. Got report that it caused test failures in libc++.	2020-03-19 22:45:27 -07:00
Jun Ma	030942f981	[Coroutines] Fix PR45130 For now, when final suspend can be simplified by simplifySuspendPoint, handleFinalSuspend is executed as well to remove last case in switch instruction. This patch fixes it. Differential Revision: https://reviews.llvm.org/D76345	2020-03-20 11:27:08 +08:00
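
The 64-bit stack-size change described in 7a4f5c2016 (112 bytes growing to 128) can be checked with a small round-up calculation. This is only a sketch and not code from the commit: the helper `align_up` and the variable names are hypothetical, and the 112, 8, and 16 byte figures are taken from the commit message.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (must be a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


# Figures quoted in the commit message for the 64-bit case.
old_frame_size = 112   # bytes, stack size before the change
added_bytes = 8        # bytes added for the new stack object
stack_alignment = 16   # bytes, stack alignment stated in the message

# 112 + 8 = 120, which is not a multiple of 16, so the frame is rounded up.
new_frame_size = align_up(old_frame_size + added_bytes, stack_alignment)
print(new_frame_size)
```

Under those assumptions the unaligned total of 120 bytes rounds up to the next 16-byte boundary, which matches the 128 bytes reported in the message.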


193690 Commits