For now, we only correct the result for sqrt when iteration > 0. This doesn't
make sense, as the two are not strictly related.
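To illustrate why the fixup shouldn't be tied to the iteration count, here is a
minimal self-contained C++ sketch of an estimate-based sqrt expansion (this
assumes the correction in question is the usual fixup of the x == 0 case, which
the message above does not spell out): the zero input misbehaves whether or not
a Newton-Raphson refinement step runs.
```
#include <cmath>
#include <cstdio>

// Stand-in for a hardware reciprocal-sqrt estimate instruction.
static float rsqrtEstimate(float x) { return 1.0f / std::sqrt(x); }

// sqrt(x) computed as x * rsqrt(x), optionally refined with Newton-Raphson.
static float sqrtViaEstimate(float x, int iterations) {
  float est = rsqrtEstimate(x);
  for (int i = 0; i < iterations; ++i)
    est = est * (1.5f - 0.5f * x * est * est); // one refinement step
  return x * est; // 0 * inf == NaN when x == 0, regardless of `iterations`
}

int main() {
  std::printf("%f\n", sqrtViaEstimate(0.0f, 0)); // NaN without the fixup
  std::printf("%f\n", sqrtViaEstimate(0.0f, 1)); // still NaN with one step
}
```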
Reviewed By: dmgreen, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D94480
Reassociate some patterns to generate more fma instructions in order to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
InstrEmitter.h needs TargetMachine but relies on a forward declaration
of TargetMachine in MachineOperand.h. This patch adds a forward
declaration right in InstrEmitter.h.
While we are at it, this patch removes the one in MachineOperand.h,
where it is unnecessary.
In RISC-V there is a single addressing mode of the form imm(reg), where imm
is a signed 12-bit integer, giving a range of [-2048..2047] bytes from reg.
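As a concrete reference for the reachability arguments below, here is a small
self-contained check for whether an offset fits that addressing mode (LLVM
itself uses isInt<12>() from MathExtras.h for this purpose):
```
#include <cstdint>

// True if Offset can be folded directly into the imm(reg) addressing mode,
// i.e. it fits in a signed 12-bit immediate.
static bool fitsInRVImm12(int64_t Offset) {
  return Offset >= -2048 && Offset <= 2047;
}
```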
The test MultiSource/UnitTests/C++11/frame_layout of the LLVM test-suite
exercises several scenarios with the stack, including function calls
where the stack will need to be realigned due to a local variable having
a large alignment of 4096 bytes.
In situations of large stacks, the RISC-V backend (in
RISCVFrameLowering) reserves an extra emergency spill slot which can be
used (if no free register is found) by the register scavenger after the
frame indexes have been eliminated. PrologEpilogInserter already takes
care of keeping the emergency spill slots as close as possible to the
stack pointer or frame pointer (depending on what the function will
use). However, there is a final alignment step to honour the maximum
alignment of the stack which, when the stack pointer is used to access the
emergency spill slots, has the side effect of moving them farther away from
the stack pointer.
In the case of the frame_layout testcase, the net result is that we do
have an emergency spill slot but it is so far from the stack pointer
(more than 2048 bytes due to the extra alignment of a variable to 4096
bytes) that it becomes unreachable via any immediate offset.
During elimination of the frame index, many (regular) offsets of the
stack may be immediately unreachable already. Their address needs to be
computed using a register. A virtual register is created and later
RegisterScavenger should be able to find an unused (physical) register.
However if no register is available, RegisterScavenger will pick a
physical register and spill it onto an emergency stack slot, while we
compute the offset (restoring the chosen register after all this). This
assumes that the emergency stack slot is easily reachable (that is,
without requiring another register!).
This is the assumption we seem to break when we perform the extra
alignment in PrologEpilogInserter.
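For reference, a self-contained sketch of the usual hi/lo split applied when a
frame offset is out of range: the bulk of the offset goes into a scavenged
scratch register and only a 12-bit remainder is folded into the access.
Spilling that scratch register into the emergency slot only works if the slot
itself is reachable with a plain imm(reg) access, which is exactly the
assumption described above.
```
#include <cassert>
#include <cstdint>

// Split an out-of-range frame offset into a part that is materialized into a
// scratch register (e.g. via LUI/ADDI) and a remainder that still fits the
// signed 12-bit immediate field.
struct SplitOffset {
  int64_t ScratchPart; // materialized into the scavenged register
  int64_t Imm12;       // folded into the final imm(reg) access
};

static SplitOffset splitFrameOffset(int64_t Offset) {
  int64_t Hi = (Offset + 0x800) & ~int64_t(0xFFF); // round to nearest 0x1000
  int64_t Lo = Offset - Hi;
  assert(Lo >= -2048 && Lo <= 2047 && "remainder must fit in 12 bits");
  return {Hi, Lo};
}
```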
We can "float" the emergency spill slots by increasing (in absolute
value) their offsets from the incoming stack pointer. This way the
emergency spill slots will remain close to the stack pointer (once the
function has allocated storage for the stack, including the needed
realignment). The new size computed in PrologEpilogInserter is padding, so
it should be OK to move the emergency spill slots there. Also, because we
are increasing the alignment, the new location should stay aligned for the
purposes of the emergency spill slots.
Note that this change also impacts other backends as shown by the tests.
Changes are minor adjustments to the emergency stack slot offset.
Differential Revision: https://reviews.llvm.org/D89239
The only caller of this function is in LocalStackSlotAllocation, and it
creates a base register of the class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different register class here,
so let materializeFrameBaseRegister just create and return whatever it
wants.
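A rough before/after sketch of what such an interface change looks like; the
signatures and the sketch class below are illustrative, not copied from the
tree:
```
#include <cstdint>

// Illustrative stand-ins so the sketch is self-contained; in LLVM these are
// llvm::Register, llvm::MachineBasicBlock and llvm::TargetRegisterInfo.
using Register = unsigned;
struct MachineBasicBlock;

struct TargetRegisterInfoSketch {
  // Before: the caller (LocalStackSlotAllocation) created BaseReg from
  // getPointerRegClass() and passed it in:
  //   void materializeFrameBaseRegister(MachineBasicBlock *MBB,
  //                                     Register BaseReg, int FrameIdx,
  //                                     int64_t Offset) const;
  //
  // After: the hook creates a register of whatever class it prefers (e.g.
  // AMDGPU can pick its own) and returns it.
  virtual Register materializeFrameBaseRegister(MachineBasicBlock *MBB,
                                                int FrameIdx,
                                                int64_t Offset) const = 0;
  virtual ~TargetRegisterInfoSketch() = default;
};
```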
Differential Revision: https://reviews.llvm.org/D95268
The widenScalar implementations for signed and unsigned overflowing
operations were very similar: in both, overflow is checked by truncating the
result, re-sign/zero-extending it, and checking that it matches the computed
operation.
Using a truncate + zero-extend for the unsigned case instead of manually
producing the AND instruction like before leads to an extra copy
instruction during legalization, but this should be harmless.
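The shape of the check, as a self-contained C++ sketch for an 8-bit add widened
to 32 bits (the signed and unsigned flavours differ only in how the truncated
result is re-extended):
```
#include <cstdint>

// Signed: truncate the wide result, sign-extend it back, and compare.
static bool saddOverflow8(int8_t A, int8_t B, int8_t &Res) {
  int32_t Wide = int32_t(A) + int32_t(B);
  Res = static_cast<int8_t>(Wide); // truncate
  return int32_t(Res) != Wide;     // sext(trunc(x)) != x => overflow
}

// Unsigned: same structure, but zero-extend instead of sign-extend.
static bool uaddOverflow8(uint8_t A, uint8_t B, uint8_t &Res) {
  uint32_t Wide = uint32_t(A) + uint32_t(B);
  Res = static_cast<uint8_t>(Wide); // truncate
  return uint32_t(Res) != Wide;     // zext(trunc(x)) != x => overflow
}
```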
Differential Revision: https://reviews.llvm.org/D95035
Noticed while I was touching other nearby code. I don't have a
test where this matters because the targets I work on
use zero-or-one boolean contents. And the test cases I've seen this fire on
happen before type legalization, where the result type is MVT::i1, so the
distinction doesn't matter.
There was code to handle the first operand being different from the result
type, and code to handle the first operand having the same type as the type
to extend from. Neither should ever happen for a correctly formed
SIGN_EXTEND_INREG, so I've replaced that code with asserts.
I also noticed we created the same APInt twice so I've reused it.
Implement widening for G_SADDO and G_SSUBO. Previously it was only
implemented for G_UADDO and G_USUBO. Also add legalize-add/sub tests for
narrow overflowing add/sub on AArch64.
Differential Revision: https://reviews.llvm.org/D95034
Make this look more like the DAG handling and move to common code.
I also noticed AArch64 seems to not be properly adding the
physreg:virtreg mapping to the function live ins.
Add DemandedElts support inside the TRUNCATE analysis.
REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724, with an additional test case added at rG0ca81b90d19d.
Differential Revision: https://reviews.llvm.org/D56387
As noticed on D56387, for vectors we must always correctly adjust the shift amount type during truncation (not just after legalization). We were getting away with it as we currently only accepted scalars via the dyn_cast<ConstantSDNode>.
Previous code built a model in which the tile config register is a user of
each AMX instruction. This is a problem for spilling the tile config
register: when it is live across a function call, an ldtilecfg instruction
may be inserted for each AMX instruction that uses the tile config register,
which causes all tile data registers to be clobbered.
To fix this issue, we remove the modeling of the tile config register. We
analyze the regmask of each call instruction and insert an ldtilecfg if any
tile data register is live across the call. Inserting an sttilecfg before
the call is unnecessary, because the tile config doesn't change and we can
just reload it.
Besides that, we also need to check tile config register interference.
Since we don't model the config register, we should check interference from
the ldtilecfg to each tile data register def.
```
               ldtilecfg
               /       \
            BB1         BB2
           /    \
        call    BB3
       /    \
%1=tileload  %2=tilezero
```
We can start from each tile def instruction and walk backward to the
ldtilecfg. If there is any call instruction along the way, and the tile data
registers are not preserved across it, we should insert an ldtilecfg after
the call instruction.
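A simplified single-block sketch of that backward walk; the helpers
definesTileReg, isTileConfigLoad, preservesTileRegs and insertConfigReloadAfter
are hypothetical stand-ins for the real checks rather than actual LLVM APIs:
```
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Hypothetical helpers, used only to outline the walk.
bool definesTileReg(const MachineInstr &MI);      // does MI def a tile reg?
bool isTileConfigLoad(const MachineInstr &MI);    // is MI an ldtilecfg?
bool preservesTileRegs(const MachineInstr &Call); // regmask keeps tile regs?
void insertConfigReloadAfter(MachineInstr &Call); // emit ldtilecfg after Call

void reloadConfigAcrossCalls(MachineBasicBlock &MBB) {
  for (MachineInstr &Def : MBB) {
    if (!definesTileReg(Def))
      continue;
    // Walk backward from the tile def towards the dominating ldtilecfg.
    MachineBasicBlock::iterator It(Def);
    while (It != MBB.begin()) {
      MachineInstr &MI = *--It;
      if (isTileConfigLoad(MI))
        break; // the config reaches this def untouched
      if (MI.isCall() && !preservesTileRegs(MI)) {
        insertConfigReloadAfter(MI); // tile config may be clobbered: reload
        break;
      }
    }
  }
}
```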
Differential Revision: https://reviews.llvm.org/D94155
It caused "Vector shift amounts must be in the same as their first arg"
asserts in Chromium builds. See the code review for repro instructions.
> Add DemandedElts support inside the TRUNCATE analysis.
>
> Differential Revision: https://reviews.llvm.org/D56387
This reverts commit cad4275d697c601761e0819863f487def73c67f8.
Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64.
The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are:
- Wiring up the new target to enable ILP32 codegen and MC.
- ILP32 va_list support.
- ILP32 TLSDESC relocation support.
There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain.
This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain.
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D94143
If constants are hidden behind G_ANYEXT, we can treat them the same way as
G_SEXT. For that purpose we extend getConstantVRegValWithLookThrough with an
option to handle G_ANYEXT the same way as G_SEXT.
Differential Revision: https://reviews.llvm.org/D92219
When constraining an operand register using constrainOperandRegClass(),
the function may emit a COPY in case the provided register class does
not match the current operand register class. However, the operand
itself is not updated to make use of the COPY, thereby resulting in
incorrect code. This patch fixes that bug by updating the machine
operand accordingly.
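A minimal sketch of the shape of the fix, assuming the GlobalISel utility
constrainRegToClass and a surrounding loop over the instruction's operands
(simplified fragment, not the verbatim patch):
```
// Inside constrainOperandRegClass (simplified): if constraining inserted a
// COPY into a new virtual register, rewrite the operand to use it.
Register Reg = MO.getReg();
Register Constrained =
    constrainRegToClass(MRI, TII, RBI, Reg, *OpRC); // may emit a COPY
if (Constrained != Reg)
  MO.setReg(Constrained); // previously the operand kept the old register
```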
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D91244
function-instrument=xray-never wasn't actually honored before. We were
getting lucky that it worked because CodeGenFunction would omit the
other xray attributes when a function was annotated with
xray_never_instrument. This patch adds proper support.
Differential Revision: https://reviews.llvm.org/D89441
Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93042
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)
This tries to recognize patterns like below (assuming a little-endian target):
```
s8* a = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((s32)a)

s8* a = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32)a))
```
(This patch handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second does not.)
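For reference, the same equivalence expressed in plain C++, assuming a
little-endian host (matching the first example above):
```
#include <cassert>
#include <cstdint>
#include <cstring>

// OR of zero-extended, shifted byte loads...
static uint32_t orOfBytes(const uint8_t *A) {
  return uint32_t(A[0]) | (uint32_t(A[1]) << 8) | (uint32_t(A[2]) << 16) |
         (uint32_t(A[3]) << 24);
}

// ...is the same value as one wide load of those four bytes.
static uint32_t wideLoad(const uint8_t *A) {
  uint32_t V;
  std::memcpy(&V, A, sizeof(V));
  return V;
}

int main() {
  const uint8_t Bytes[] = {0x78, 0x56, 0x34, 0x12};
  assert(orOfBytes(Bytes) == wideLoad(Bytes)); // 0x12345678 on little-endian
}
```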
To recognize the pattern, this searches from the last G_OR in the expression
tree.
E.g.
```
Reg Reg
\ /
OR_1 Reg
\ /
OR_2
\ Reg
.. /
Root
```
Each non-OR register in the tree is put in a list. Each register in the
list is then checked to see if it is defined by an appropriate load plus an
optional shift.
If every register is a load + potentially a shift, the combine checks if
those loads + shifts, when OR'd together, are equivalent to a wide load
(possibly with a BSWAP).
To simplify things, this patch
(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
specific location
An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)
At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.
Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.
Differential Revision: https://reviews.llvm.org/D94350
This is an additional bug fix for c5be0e0cc0. The distance for the spill
instructions was wrong in the previous patch.
Differential Revision: https://reviews.llvm.org/D94772
This recommits 2c51bef76cbf0149101b9e7c7c658b4a58657929.
I've fixed the broken check line from when I renamed the test function.
Original commit message:
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
Differential Revision: https://reviews.llvm.org/D94149
Currently, when spilling statepoint register operands in FixupStatepoints,
we do not pay attention to the fact that an operand might be `undef`. We
just generate a spill, which may lead to a verifier error because we have a
use without a def.
To handle it, let FixupStatepoints ignore `undef` register operands
completely and change them to some constant value when generating the stack
map. Use the same value as ISel uses for this purpose (0xFEFEFEFE).
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D94703
Use the KnownBits icmp comparisons to determine when an ISD::UMIN/UMAX op is unnecessary, i.e. when either operand is known to be ULT/ULE or UGT/UGE relative to the other.
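A short sketch of the kind of check involved, assuming the static KnownBits
comparison helpers (KnownBits::ule and friends), which only give an answer when
the known bits decide the comparison:
```
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// umin(A, B) can be folded to A when the known bits prove A <= B (unsigned);
// symmetrically, umax(A, B) folds to B.
static bool uminFoldsToLHS(const KnownBits &LHS, const KnownBits &RHS) {
  if (auto Res = KnownBits::ule(LHS, RHS)) // set only when decidable
    return *Res;
  return false;
}
```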
Differential Revision: https://reviews.llvm.org/D94532
RISC-V would like to use a struct of scalable vectors to return multiple
values from intrinsics. This would also be needed for target-independent
intrinsics like llvm.sadd.overflow.
This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.
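A small example of the resulting behaviour using the IR type APIs (a minimal
sketch; error handling omitted):
```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

int main() {
  LLVMContext Ctx;
  // A struct containing scalable vectors can now be formed...
  Type *VecTy = ScalableVectorType::get(Type::getInt64Ty(Ctx), 2);
  StructType *STy = StructType::get(Ctx, {VecTy, VecTy});
  // ...but it is treated as unsized, so the verifier rejects
  // loads/stores/allocas of it.
  return STy->isSized() ? 1 : 0; // expected to return 0 (not sized)
}
```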
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D94142
Reassociate some patterns to generate more fma instructions in order to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
This patch promotes the result integer type of FP_TO_XINT during expansion,
so the crash in the conversion from ppc_fp128 to i1 is fixed.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92473
Add a one-use check to lookThruCopyLike.
The root node is safe to delete if we are sure that every definition in the
copy chain has only one use.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92069