llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 10:42:39 +01:00

Author	SHA1	Message	Date
Evandro Menezes	0c8a79e78d	[RISCV] Add scheduling resources for V Add the scheduling resources for the V extension instructions. Differential Revision: https://reviews.llvm.org/D98002 (cherry picked from commit 63a5ac4e0d969f41bf71785cc3979349a45a2892)	2021-08-10 23:11:38 -07:00
Bradley Smith	ee15bdbb06	[AArch64][SVE] Fix assertion failure when lowering fixed length gather/scatter The patterns for fixed length gather/scatter with 32-bit offsets and 64-bit memory type are slightly different than the rest of the patterns, as such the lowering needs to be slightly different to ensure the correct types are used. Differential Revision: https://reviews.llvm.org/D107576 (cherry picked from commit 73ecb9987b00db274b7b2ac34b0602ffdb906a4b)	2021-08-10 15:34:36 -07:00
Yonghong Song	f6a86e448a	BPF: avoid NE/EQ loop exit condition Kuniyuki Iwashima reported in [1] that llvm compiler may convert a loop exit condition with "i < bound" to "i != bound", where "i" is the loop index variable and "bound" is the upper bound. In case that "bound" is not a constant, verifier will always have "i != bound" true, which will cause verifier failure since to verifier this is an infinite loop. The fix is to avoid transforming "i < bound" to "i != bound". In llvm, the transformation is done by IndVarSimplify pass. The compiler checks loop condition cost (i = i + 1) and if the cost is lower, it may transform "i < bound" to "i != bound". This patch implemented getArithmeticInstrCost() in BPF TargetTransformInfo class to return a higher cost for such an operation, which will prevent the transformation for the test case added in this patch. [1] https://lore.kernel.org/netdev/1994df05-8f01-371f-3c3b-d33d7836878c@fb.com/ Differential Revision: https://reviews.llvm.org/D107483 (cherry picked from commit e52946b9ababcbf8e6f40b1b15900ae2e795a1c6)	2021-08-06 12:45:53 -07:00
Craig Topper	7c9c296915	[RISCV] Restrict performANY_EXTENDCombine to prevent an infinite loop. The sign_extend we insert here can get turned into a zero_extend if the sign bit is known zero. This can enable a setcc combine that shrinks compares with zero_extend. This reduces the use count of the zero_extend allowing other combines to turn it back into an any_extend. This restricts the combine to only cases where the result is used by a CopyToReg. This works for my original motivating case. I hope the CopyToReg use will prevent any converted extends from turning back into an any_extend. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D106754 (cherry picked from commit 54588bcc052e5b08f90e672c33d0c1ad4eda2424)	2021-08-02 11:31:08 -07:00
Alexandros Lamprineas	276fcebbe0	[AArch64] Legalize MVT::i64x8 in DAG isel lowering This patch legalizes the Machine Value Type introduced in D94096 for loads and stores. A new target hook named getAsmOperandValueType() is added which maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization. Differential Revision: https://reviews.llvm.org/D94097	2021-08-02 15:45:58 +01:00
Bradley Smith	183b0c7c98	[AArch64][SVE] Fix incorrect mask type when lowering fixed type SVE gather/scatter An incorrect mask type when lowering an SVE gather/scatter was causing a codegen fault which manifested as the incorrect predicate size being used for an SVE gather/scatter (e.g. p0.b rather than p0.d). Fixes PR51182. Differential Revision: https://reviews.llvm.org/D106943 (cherry picked from commit 191831e380f317cd2baa5d48abe02d1d11cd44cb)	2021-07-29 07:03:40 -07:00
Xiang1 Zhang	a6d5003afd	[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D106780	2021-07-28 08:16:59 +08:00
Xiang1 Zhang	5d447ad589	Revert "[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT" This reverts commit 6ff73efea94621e74642e4d7a15cc86a5fb6d411.	2021-07-28 08:12:29 +08:00
Xiang1 Zhang	409f0eedd6	[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT	2021-07-28 08:08:30 +08:00
Krzysztof Parzyszek	60850cdc6a	[Hexagon] Fix resetting dead registers in DBG_VALUE_LISTs This fixes https://llvm.org/PR51229.	2021-07-27 18:36:28 -05:00
Nemanja Ivanovic	8b3f85a32c	[PowerPC] Turn deprecated altivec prefetch instrs to nops on AIX The dst/dstt/dstst/dststt instructions are nop's on all PowerPC cores that AIX supports. The AIX assembler also does not accept these mnemonics. Turn them into nop's on AIX (similar to dstall).	2021-07-27 15:50:02 -05:00
Sanjay Patel	11aa71a71d	[x86] update stale code comment; NFC The transform was generalized with: 1ce05ad619a5	2021-07-27 16:45:52 -04:00
Matt Arsenault	ece3299a71	AMDGPU/GlobalISel: Fix selecting G_SEXTLOAD/G_ZEXTLOAD pre-gfx9 The patterns for the m0 glue patterns were failing to import.	2021-07-27 15:56:42 -04:00
Amara Emerson	6ce8f2f7c1	[AArch64][GlobalISel] Fix constraining LDXPX intrinsic selection. Causes a fallback because of lack of regclasses on vregs, unless it's without asserts, where we end up crashing later in codegen.	2021-07-27 12:13:56 -07:00
Craig Topper	6bfc6b8665	[RISCV] Select vector shl by 1 to a vector add. A vector add may be faster than a vector shift. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106689	2021-07-27 10:57:28 -07:00
Matt Arsenault	8979bda8e8	AMDGPU: Treat IMPLICIT_DEF like a constant lanemask source This is partially a workaround. SILowerI1Copies does not understand unstructured loops. This would result in inserting instructions to merge a mask register in the same block where it was defined in an unstructured loop.	2021-07-27 11:44:38 -04:00
Thomas Lively	bb4e957ebf	[WebAssembly] Codegen for extmul SIMD instructions Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions with normal codegen patterns. Differential Revision: https://reviews.llvm.org/D106724	2021-07-27 08:41:30 -07:00
Anirudh Prasad	2b967460b4	[SystemZ][z/OS] Initial code to generate assembly files on z/OS - This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target. - Only the .text and the .bss sections are added for now. - The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections. - This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target - Further improvements and additions will be made in future patches. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D106380	2021-07-27 11:29:15 -04:00
Tres Popp	05308b27ea	Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI." This reverts commit 1cfecf4fc4278afb0005923f6dff595cd372da5c. This commit broke LLVM code generated through XLA by removing a conditional on Ld->getExtensionType() == ISD::NON_EXTLOAD This is not a perfect revert. The new function is left as other uses of it exist now.	2021-07-27 16:55:50 +02:00
Tres Popp	d809395fde	Revert "Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI."" This reverts commit d7bbb1230a94cb239aa4a8cb896c45571444675d. There were follow up uses of a deleted method and I didn't run the tests. Undo the revert, so I can do it properly.	2021-07-27 16:48:31 +02:00
Tres Popp	9d32182a3a	Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI." This reverts commit 1cfecf4fc4278afb0005923f6dff595cd372da5c. This commit broke LLVM code generated through XLA by removing a conditional on Ld->getExtensionType() == ISD::NON_EXTLOAD	2021-07-27 16:22:25 +02:00
Fraser Cormack	5afacc5171	[RISCV] Add support for vector saturating add/sub operations This patch adds support for lowering the saturating vector add/sub intrinsics to RVV instructions, for both fixed-length and scalable-vector forms alike. Note that some of the DAG combines are still not triggering for the scalable-vector tests. These require a bit more work in the DAGCombiner itself. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106651	2021-07-27 10:04:14 +01:00
Cullen Rhodes	1ae2c0fb16	[AArch64][SME] Add zero instruction This patch adds the zero instruction for zeroing a list of 64-bit element ZA tiles. The instruction takes a list of up to eight tiles ZA0.D-ZA7.D, which must be in order, e.g. zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6.d,za7.d} zero {za1.d,za3.d,za5.d,za7.d} The assembler also accepts 32-bit, 16-bit and 8-bit element tiles which are mapped to corresponding 64-bit element tiles in accordance with the architecturally defined mapping between different element size tiles, e.g. * Zeroing ZA0.B, or the entire array name ZA, is equivalent to zeroing all eight 64-bit element tiles ZA0.D to ZA7.D. * Zeroing ZA0.S is equivalent to zeroing ZA0.D and ZA4.D. The preferred disassembly of this instruction uses the shortest list of tile names that represent the encoded immediate mask, e.g. * An immediate which encodes 64-bit element tiles ZA0.D, ZA1.D, ZA4.D and ZA5.D is disassembled as {ZA0.S, ZA1.S}. * An immediate which encodes 64-bit element tiles ZA0.D, ZA2.D, ZA4.D and ZA6.D is disassembled as {ZA0.H}. * An all-ones immediate is disassembled as {ZA}. * An all-zeros immediate is disassembled as an empty list {}. This patch adds the MatrixTileList asm operand and related parsing to support this. Depends on D105570. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D105575	2021-07-27 08:35:45 +00:00
David Green	4d9d8789ad	[ARM] Implement isLoad/StoreFromStackSlot for MVE stack stores accesses This implements the isLoadFromStackSlot and isStoreToStackSlot for MVE MVE_VSTRWU32 and MVE_VLDRWU32 functions. They behave the same as many other loads/stores, expecting a FI in Op1 and zero offset in Op2. At the same time this alters VLDR_P0_off and VSTR_P0_off to use the same code too, as they too should be returning VPR in Op0, take a FI in Op1 and zero offset in Op2. Differential Revision: https://reviews.llvm.org/D106797	2021-07-27 09:11:58 +01:00
Craig Topper	eb0ba2928d	[AArch64] Fix -Wparentheses warning with gcc 5.4. NFC	2021-07-26 21:08:56 -07:00
Carl Ritson	5997b61680	[AMDGPU] Add SelectionDAG support for insert_subvector on v4f64 Enable custom insert_subvector for larger vector types. This is necessary now that SelectionDAG can attempt v3f64 insert to v4f64, etc. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D105385	2021-07-27 10:11:34 +09:00
Nemanja Ivanovic	64c9fc2e9c	[PowerPC] Fix materialization of SP float values on Power10 All floating point values in registers are in double precision representation. In order to materialize the correct single precision value, we need to convert the APFloat that represents the value to double precision first. Reviewed By: amyk, NeHuang Differential Revision: https://reviews.llvm.org/D106812	2021-07-26 19:43:10 -05:00
Jon Roelofs	69e5ee4faa	Revert "[AArch64][GlobalISel] Legalize ctpop s128" This reverts commit 97e95fea53fc403c2a12e356dc835fc922123575. It broke test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll. Not sure why I didn't see that.	2021-07-26 17:06:43 -07:00
Jon Roelofs	18108c50ec	[AArch64][GlobalISel] Legalize ctpop s128 Differential revision: https://reviews.llvm.org/D106494	2021-07-26 16:33:50 -07:00
Masoud Ataei	8bfb4cf745	[PowerPC] Add pwr7 and pwr10 support to IBM MASSV pass on AIX Before MASSV only supported P8 and P9 on AIX and Linux. This patch proposes MASSV to add support of P7 and P10 only on AIX too. Differential: https://reviews.llvm.org/D106678	2021-07-26 23:21:38 +00:00
Amara Emerson	6634a71bd1	[AArch64][GlobalISel] Add identity combines to post-legal combiner. We see some shifts of zero emitted during legalization. Differential Revision: https://reviews.llvm.org/D106816	2021-07-26 15:17:11 -07:00
Amara Emerson	5e3045ed0c	[GlobalISel] Add a constant folding combine. Use it in the AArch64 post-legal combiner. These don't always get folded because when the instructions are created the constants are obscured by artifacts. Differential Revision: https://reviews.llvm.org/D106776	2021-07-26 14:53:33 -07:00
Heejin Ahn	9b48b7e7be	[WebAssembly] Make Emscripten EH work with Emscripten SjLj When Emscripten EH mixes with Emscripten SjLj, we are not currently handling some of them correctly. There are three cases: 1. The current function calls `setjmp` and there is an `invoke` to a function that can either throw or longjmp. In this case, we have to check both for exception and longjmp. We are currently handling this case correctly: `0c0eb76782/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L1058-L1090)` When inserting routines for functions that can longjmp, which we do only for setjmp-calling functions, we check if the function was previously an `invoke` and handle it correctly. 2. The current function does NOT call `setjmp` and there is an `invoke` to a function that can either throw or longjmp. Because there is no `setjmp` call, we haven't been doing any check for functions that can longjmp. But in that case, for `invoke`, we only check for an exception and if it is not an exception we reset `__THREW__` to 0, which can silently swallow the longjmp: `0c0eb76782/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L70-L80)` This CL fixes this. 3. The current function calls `setjmp` and there is no `invoke`. Because it is not an `invoke`, we haven't been doing any check for functions that can throw, and only insert longjmp-checking routines for functions that can longjmp. But in that case, if a longjmpable function throws, we only check for a longjmp so if it is not a longjmp we reset `__THREW__` to 0, which can silently swallow the exception: `0c0eb76782/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L156-L169)` This CL fixes this. To do that, this moves around some code, so we register necessary functions for both EH and SjLj and precompute some data (the set of functions that contains `setjmp`) before doing actual EH or SjLj transformation. This CL makes 2nd and 3rd tests in https://github.com/emscripten-core/emscripten/pull/14732 work. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D106525	2021-07-26 13:48:31 -07:00
Lei Huang	bbc51b9f17	[PowerPC]Add addex instruction definition and MC tests Add td definitions and asm/disasm tests for the addex instruction introduced in ISA 3.0. Reviewed By: nemanjai, amyk, NeHuang Differential Revision: https://reviews.llvm.org/D106666	2021-07-26 14:55:38 -05:00
Lei Huang	c4acdbbb3c	[PowerPC] Add implicit-def RM to instructions mtfsb[01] This is a followup patch for D105930 to add implicit-def of RM for mtfsb[01] instructions as per review comments. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D106603	2021-07-26 14:07:08 -05:00
Michael Liao	b2d24acf01	[amdgpu] Add 64-bit PC support when expanding unconditional branches. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106445	2021-07-26 14:50:30 -04:00
Amara Emerson	6a4c66376a	[AArch64][GlobalISel] Post-legalize combine s64 = G_MERGE s32, 0 -> G_ZEXT. These are generated as a byproduct of legalization. Differential Revision: https://reviews.llvm.org/D106768	2021-07-26 10:58:04 -07:00
Amara Emerson	d0d4c1578a	[AArch64][GlobalISel] Enable some select combines after legalization. The legalizer generates selects for some operations, which can have constant condition values, resulting in lots of dead code if it's not folded away. Differential Revision: https://reviews.llvm.org/D106762	2021-07-26 10:40:32 -07:00
Amara Emerson	b09f2e63d9	[GlobalISel] Add combine for merge(unmerge) and use AArch64 postlegal-combiner. Differential Revision: https://reviews.llvm.org/D106761	2021-07-26 10:37:31 -07:00
Heejin Ahn	cd521f45d9	[WebAssembly] Improve pseudocode in LowerEmscriptenEHSjLj Both `__THREW__` and `__threwValue` are global variables, and we have been distinguishing the global variable `__THREW__` and the loaded value `%__THREW__.val` in comments but not doing it for `__threwValue`. Made the pseudocode comments consistent for both variables. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D106524	2021-07-26 10:13:28 -07:00
Paul Walker	6d505ef4dd	[SVE] Use reg+reg addressing mode for immediate offsets. For reg+imm SVE addressing mode imm is implicitly scaled by VL, making them impractical for truly immediate offsets. However, if the offset can be unscaled based on the storage element type we can use the reg+reg SVE addressing mode and thus either reduce the number of generated add instructions or replace them with a mov instruction that can be hoisted from the hot code path. Differential Revision: https://reviews.llvm.org/D106744	2021-07-26 16:24:16 +01:00
Bradley Smith	5e51e7ed64	[AArch64][SVE] Break false dependencies for inactive lanes of unary operations Differential Revision: https://reviews.llvm.org/D105889	2021-07-26 15:01:21 +00:00
Ulrich Weigand	81afdbc83c	[SystemZ] Add support for new cpu architecture - arch14 This patch adds support for the next-generation arch14 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Detection of arch14 as host processor. - Assembler/disassembler support for new instructions. - New LLVM intrinsics for certain new instructions. - Support for low-level builtins mapped to new LLVM intrinsics. - New high-level intrinsics in vecintrin.h. - Indicate support by defining __VEC__ == 10304. Note: No currently available Z system supports the arch14 architecture. Once new systems become available, the official system name will be added as supported -march name.	2021-07-26 16:57:28 +02:00
Jay Foad	453349344b	[AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset Codegen for the raw/struct buffer access intrinsics would update the offset in the MMO to reflect the combined offset, if it was known to be constant. If the combined offset was not known to be constant, or if there was an index, it would set the offset in the MMO to 0. This is unsafe because it makes it look like the access does not alias with another access with a fixed non-zero offset. Fix these cases by setting the pointer in the MMO to null, to reflect the fact that we do not have any known IR value pointer + constant offset for the access. D106284 did this for SelectionDAG. This is the corresponding fix for GlobalISel. Differential Revision: https://reviews.llvm.org/D106451	2021-07-26 14:27:30 +01:00
Jay Foad	7f0f3d6b7b	[AMDGPU] Fix MMO for raw/struct buffer access with non-constant offset Codegen for the raw/struct buffer access intrinsics would update the offset in the MMO to reflect the combined offset, if it was known to be constant. If the combined offset was not known to be constant, or if there was an index, it would set the offset in the MMO to 0. This is unsafe because it makes it look like the access does not alias with another access with a fixed non-zero offset. Fix these cases by setting the pointer in the MMO to null, to reflect the fact that we do not have any known IR value pointer + constant offset for the access. Differential Revision: https://reviews.llvm.org/D106284	2021-07-26 14:27:30 +01:00
David Green	1cb5f9c5d7	[ARM] Ensure correct regclass in distributing postinc The register class required for some MVE loads/stores is more constrained than the register we use when creating postinc. Make sure we constrain the register class to keep the code correct.	2021-07-26 14:26:38 +01:00
Tim Northover	c8cc09ffa5	AArch64: support i128 (& larger) returns in GlobalISel	2021-07-26 14:16:35 +01:00
Caroline Concatto	7e3be1e953	[AArch64][SVE] Remove vector_splice from AddedComplexity pattern The pattern for vector_splice with Index equal or bigger than zero was misplaced in the AddedComplexity = 1 pattern in the AArch64 tablegen file. This patch fixes it by removing vector_splice pattern from inside AddedComplexity = 1.	2021-07-26 13:35:51 +01:00
Caroline Concatto	9af238dfc2	[SVE][AArch64] Improve code generation for vector_splice for Imm > 0 This patch implements vector_splice in tablegen for all cases when the Immediate is positive and lower than the known minimum value of a scalable vector. Vector_splice can be implemented using SVE instruction EXT. For instance : @llvm.experimental.vector.splice(Vector_1, Vector_2, Imm) @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> EXT Vector_1, Vector_2, Imm // Vector_1 = B, C, D + Vector_2 = E Depends on D105633 Differential Revision: https://reviews.llvm.org/D106273	2021-07-26 11:45:46 +01:00
Caroline Concatto	d9b910d9d2	[AArch64][SVE] Improve code generation for vector_splice for Imm == -1 This patch implements vector_splice in tablegen for: a) when the immediate is equal to -1 (Imm==-1) and uses: INSR + LASTB For instance : @llvm.experimental.vector.splice(Vector_1, Vector_2, -1) @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -1) ==> <D, E, F, G> LASTB RegLast, Vector_1 // RegLast = D INSR Res, (Vector_1 >> 1), RegLast // Res = D + E, F, G Differential Revision: https://reviews.llvm.org/D105633	2021-07-26 11:25:01 +01:00
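
The two vector_splice commits above describe the intrinsic by example. As a rough reference model only (not LLVM code), the semantics can be sketched in Python on fixed-length lists. The function name `vector_splice` and the use of plain lists are assumptions made for illustration; the real intrinsic also operates on scalable vectors, which this sketch does not model.

```python
def vector_splice(vec1, vec2, imm):
    """Reference model of llvm.experimental.vector.splice on plain lists.

    Hypothetical helper for illustration; not part of LLVM.
    """
    n = len(vec1)
    if len(vec2) != n:
        raise ValueError("both operands must have the same length")
    if not -n <= imm < n:
        raise ValueError("imm must satisfy -n <= imm < n")

    concat = list(vec1) + list(vec2)
    # Non-negative imm: start at index imm of the concatenation.
    # Negative imm: take the last -imm elements of vec1, then fill from vec2.
    start = imm if imm >= 0 else n + imm
    return concat[start:start + n]


v1 = ["A", "B", "C", "D"]
v2 = ["E", "F", "G", "H"]
print(vector_splice(v1, v2, 1))   # ['B', 'C', 'D', 'E']
print(vector_splice(v1, v2, -1))  # ['D', 'E', 'F', 'G']
```

The positive case corresponds to the EXT lowering and the `-1` case to the LASTB + INSR lowering mentioned in the commit messages.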

63622 Commits