llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 12:12:47 +01:00

Author	SHA1	Message	Date
Craig Topper	4511712eb4	[RISCV] Implement lround/llround/lrint/llrint with fcvt instruction with -fno-math-errno These are fp->int conversions using either RMM or dynamic rounding modes. The lround and lrint opcodes have a return type of either i32 or i64 depending on sizeof(long) in the frontend which should follow xlen. llround/llrint should always return i64 so we'll need a libcall for those on rv32. The frontend will only emit the intrinsics if -fno-math-errno is in effect otherwise a libcall will be emitted which will not use these ISD opcodes. gcc also does this optimization. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D105206	2021-07-06 11:43:22 -07:00
David Green	fc3dffa9a7	[Tests] Update some tests for D104765. NFC	2021-07-06 19:23:52 +01:00
LLVM GN Syncbot	7ae669846a	[gn build] Port 8517a26d442f	2021-07-06 18:17:43 +00:00
Eli Friedman	61b59d3278	Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers." This reverts commit 74d6ce5d5f169e9cf3fac0eb1042602e286dd2b9. Seeing crashes on buildbots in MemoryDepChecker::isDependent.	2021-07-06 11:17:13 -07:00
Sanjay Patel	b4665b3453	[InstSimplify] fix bug in poison propagation for FP ops If any operand of a math op is poison, that takes precedence over general undef/NaN. This should not be visible with binary ops because it requires 2 constant operands to trigger (and if both operands of a binop are constant, that should get handled first in ConstantFolding).	2021-07-06 14:06:50 -04:00
Sanjay Patel	73092c39ee	[InstSimplify][test] add tests for poison propagation through FP calls; NFC	2021-07-06 14:06:50 -04:00
Philip Reames	394bad8809	[LV] Disable epilogue vectorization for non-latch exits When skimming through old review discussion, I noticed a post commit comment on an earlier patch which had gone unaddressed. Better late (4 months), than never right? I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not modivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error. Thus, let's play it safe for now.	2021-07-06 10:57:10 -07:00
Philip Reames	2a3dbbb793	[LoopVersion] Move an assert [nfc-ish]	2021-07-06 10:57:10 -07:00
Eli Friedman	b011bc0424	[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out how to repair it to test what it was actually trying to test. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 10:54:41 -07:00
Jonas Paulsson	e4905eeb01	[SystemZ] Support the 'N' code for the odd register in inline-asm. The odd register of a (128 bit) register pair is accessed with the 'N' code with an inline assembly operand. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D105502	2021-07-06 19:46:49 +02:00
Jeremy Morse	409363cd51	[DebugInfo][InstrRef][3/4] Produce DBG_INSTR_REFs for all variable locations This patch emits DBG_INSTR_REFs for two remaining flavours of variable locations that weren't supported: copies, and inter-block VRegs. There are still some locations that must be represented by DBG_VALUE such as constants, but they're mostly independent of optimisations. For variable locations that refer to values defined in different blocks, vregs are allocated before isel begins, but the defining instruction might not exist until late in isel. To get around this, emit DBG_INSTR_REFs in a "half done" state, where the first operand refers to a VReg. Then at the end of isel, patch these back up to refer to instructions, using the finalizeDebugInstrRefs method. Copies are something that I complained about the original RFC, and I really don't want to have to put instruction numbers on copies. They don't define a value: they move them. To address this isel, salvageCopySSA interprets: * COPYs, * SUBREG_TO_REG, * Anything that isCopyInstr thinks is a copy. And follows chains of copies back to the defining instruction that they read from. This relies on any physical registers that COPYs read being defined in the same block, or being entry-block arguments. For the former we can put an instruction number on the defining instruction; for the latter we can drop a DBG_PHI that reads the incoming value. Differential Revision: https://reviews.llvm.org/D88896	2021-07-06 18:31:38 +01:00
Craig Topper	ddaf66b4f7	[RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors. This adds a DAG combine to detect sext/zext inputs and emit a new ISD opcode. The extends will either be removed or replaced with narrower extends. Isel patterns are used to match add and widening mul to vwmacc similar to the recently added vmacc patterns. There's still some work to be to match vmulsu. We should also rewrite splats that were extended as scalars and then splatted. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D104802	2021-07-06 10:24:31 -07:00
Arnold Schwaighofer	2b83fc820a	Fix coro lowering of single predecessor phis Code assumes that uses of single predecessor phis are not live accross suspend points. Cleanup any single predecessor phis preceeding the code making this assumption. rdar://76020301 Differential Revision: https://reviews.llvm.org/D105488	2021-07-06 10:22:25 -07:00
Simon Pilgrim	b8bec02bc7	[CostModel][X86] fptosi/fptoui to i8/i16 are truncated from fptosi to i32 Provide a generic fallback that performs the fptosi to i32 types, then truncates to sub-i32 scalars. These numbers can be tweaked for specific sse levels, but we should get the default handling in place first.	2021-07-06 17:28:03 +01:00
Jonas Paulsson	f20b0fa5e7	[SystemZ] Generate XC loop for memset 0 of variable length. Benchmarking has shown that it is worthwhile to implement a variable length memset of 0 with XC (exclusive or) like gcc does, instead of using a libcall. This requires the use of the EXecute Relative Long (EXRL) instruction which can now be done in a framework that can also be used with other target instructions (not just XC). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D103865	2021-07-06 18:07:31 +02:00
Alexey Bataev	28343f1971	[SLP]Fix non-determinism in PHI sorting. Compare type IDs and DFS numbering for basic block instead of addresses to fix non-determinism. Differential Revision: https://reviews.llvm.org/D105031	2021-07-06 08:45:45 -07:00
Arnold Schwaighofer	d7fbb97dd0	Use swift mangling for resume functions The resume partial functions generated for swift suspend points will now use a Swift mangling suffix. Await resume partial functions will use the suffix 'TQ'[0-9]+'_' (e.g "...TQ0_") and suspend resume partial functions will use the suffix 'TY'[0-9]+'_' (e.g "...TY1_"). Reviewed By: nate_chandler Differential Revision: https://reviews.llvm.org/D104144	2021-07-06 08:27:46 -07:00
Bradley Smith	5f6637b99b	[AArch64][SVE] Fix selection failures for scalable MLOAD nodes with passthru Differential Revision: https://reviews.llvm.org/D105348	2021-07-06 14:17:23 +00:00
Florian Hahn	d6b40438e4	Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups. This reverts commit 706bbfb35bd31051e46ac77aab3e9b2dbc3abe78. The committed version moves the definition of VPReductionPHIRecipe out of an ifdef only intended for ::print helpers. This should resolve the build failures that caused the revert	2021-07-06 14:15:42 +01:00
Simon Pilgrim	1787dc0460	[CostModel][X86] i8/i16 sitofp/uitofp are sext/zext to i32 for sitofp Provide a generic fallback that extends sub-i32 scalars before using the existing sitofp instructions. These numbers can be tweaked for specific sse levels, but we should get the default handling in place first. We get the extension for free for non-vector loads.	2021-07-06 13:58:52 +01:00
Kerry McLaughlin	8dd39c43b3	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Peter Waller	f69dc8533f	[CodeGen][AArch64][SVE] Use ld1r[bhsd] for vector splat from memory This avoids the use of the vector unit for copying from scalar to vector. There is an extra ptrue instruction, but a predicate register with the ptrue pattern populated is likely to be free in the context of real code. Tests were generated from a template to cover the axes mentioned at the top of the test file. Co-authored-by: Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D103170	2021-07-06 12:03:54 +00:00
Florian Hahn	4e40134cc8	Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups This reverts commit 3fed6d443f802c43aade1b5b1b09f5e2f8b3edb1, bbcbf21ae60c928e07dde6a1c468763b3209d1e6 and 6c3451cd76cbd0cd973d9c2b08b168dcd0bce3c2. The changes causing build failures with certain configurations, e.g. https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction, llvm::ArrayRef<llvm::VPValue>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]': LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe' collect2: error: ld returned 1 exit status	2021-07-06 12:10:03 +01:00
Florian Hahn	47d904c90f	[VPlan] Mark overriden function in VPWidenPHIRecipe as virtual. VPReductionRecipe overrides those implementations. Mark them as virtual in the VPWidenPHIRecipe to unbreak build in certain configurations.	2021-07-06 12:00:41 +01:00
Florian Hahn	d1df2cf22b	[VPlan] Add destructor to VPReductionRecipe to unbreak build. Attempt to unbreak https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio	2021-07-06 11:41:20 +01:00
Jay Foad	fc0855a822	[AMDGPU] Remove outdated comment and tidy up. NFC. This was left over from D94746.	2021-07-06 11:29:36 +01:00
Florian Hahn	29223963b7	[VPlan] Add VPReductionPHIRecipe (NFC). This patch is a first step towards splitting up VPWidenPHIRecipe into separate recipes for the 3 distinct cases they model: 1. reduction phis, 2. first-order recurrence phis, 3. pointer induction phis. This allows untangling the code generation and allows us to reduce the reliance on LoopVectorizationCostModel during VPlan code generation. Discussed/suggested in D100102, D100113, D104197. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104989	2021-07-06 11:25:28 +01:00
Sebastian Neubauer	c9afbee37c	[AMDGPU] Set optional PAL metadata Set informational fields in the .shader_functions table. Also correct the documentation, .scratch_memory_size and .lds_size are integers. Differential Revision: https://reviews.llvm.org/D105116	2021-07-06 11:58:00 +02:00
Kerry McLaughlin	562db8dad2	[LV] Collect a list of all element types found in the loop (NFC) Splits `getSmallestAndWidestTypes` into two functions, one of which now collects a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not have to iterate over all instructions in the loop again in other places, such as in D102253 which disables scalable vectorization of a loop if any of the instructions use invalid types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105437	2021-07-06 10:37:41 +01:00
Albion Fung	2776c1ab5d	[PowerPC] Implament Load and Reserve and Store Conditional Builtins This patch implaments the load and reserve and store conditional builtins for the PowerPC target, in order to have feature parody with xlC on AIX. Differential revision: https://reviews.llvm.org/D105236	2021-07-05 21:35:41 -05:00
David Green	797d9342f0	[ARM] Fix arm.mve.pred.v2i range upper limit The range metadata specifies a half open range, so our top limit was one off.	2021-07-05 21:06:30 +01:00
Akira Hatanaka	0cc046d72c	[ObjC][ARC] Prevent moving objc_retain calls past objc_release calls that release the retained object This patch fixes what looks like a longstanding bug in ARC optimizer where it reverses the order of objc_retain calls and objc_release calls that retain and release the same object. The code in ARC optimizer that is responsible for code motion takes the following steps: 1. Traverse the CFG bottom-up and determine how far up objc_release calls can be moved. Determine the insertion points for the objc_release calls, but don't actually move them. 2. Traverse the CFG top-down and determine how far down objc_retain calls can be moved. Determine the insertion points for the objc_retain calls, but don't actually move them. 3. Try to move the objc_retain and objc_release calls if they can't be removed. The problem is that the insertion points for the objc_retain calls are determined in step 2 without taking into consideration the insertion points for objc_release calls determined in step 1, so the order of an objc_retain call and an objc_release call can be reversed, which is incorrect, even though each step is correct in isolation. To fix this bug, this patch teaches the top-down traversal step to take into consideration the insertion points for objc_release calls determined in the bottom-up traversal step. Code motion for an objc_retain call is disabled if there is a possibility that it can be moved past an objc_release call that releases the retained object. rdar://79292791 Differential Revision: https://reviews.llvm.org/D104953	2021-07-05 12:16:15 -07:00
Nico Weber	27befb62e0	[gn build] (manually) port 98f078324fc5 (llvm-strings Opts.td)	2021-07-05 14:43:05 -04:00
Sushma Unnibhavi	44791e33f4	[M68k][GloballSel] Lower outgoing return values in IRTranslator Implementation of lowerReturn in the IRTranslator for the M68k backend. Differential Revision: https://reviews.llvm.org/D105332	2021-07-05 11:39:09 -07:00
Fangrui Song	98d2a19fea	[llvm-strings] Switch command line parsing from llvm::cl to OptTable Some behavior changes: * `-t=d` is removed. Use `-t d` instead. * one-dash long options like `-all` are supported. Use `--all` instead. * `--all=0` or `--all=false` cannot be used. (Note: `--all` is silently ignored anyway) * `--help-list` is removed. This is a `cl::` specific option. Nobody is likely leveraging any of the above. Advantages: * `-t` diagnostic gets improved. * in the absence of `HideUnrelatedOptions`, `--help` will not list unrelated options if linking against libLLVM-13git.so or linker GC is not used. * Decrease the probability of cl::opt collision if we do decide to support multiplexing Note: because the tool is so simple, used more for forensics instead of a building tool, and its long options are unlikely used in one-dash form, I just drop the one-dash form in this patch. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D104889	2021-07-05 10:46:17 -07:00
Akira Hatanaka	bb430adbb5	Precommit another test for https://reviews.llvm.org/D104953	2021-07-05 10:28:03 -07:00
Tiehu Zhang	66e4833ec7	[AArch64ISelDAGToDAG] Fix ORRWrs/ORRXrs usefulbits calculation bug For the following case: t8: i32 = or t7, t4 t10: i32 = ORRWrs t8, t8, TargetConstant:i32<73> Current code wrongly returns (t8 >> shiftConstant) as the UsefulBits of t8, which in fact is (t8 \| (t8 >> shiftConstant)). Reviewed by: sdesmalen, mdchen Differential Revision: https://reviews.llvm.org/D102759	2021-07-06 00:38:42 +08:00
Paul Walker	3ed0563b6c	Fix typo in help text for -aarch64-enable-branch-targets.	2021-07-05 16:15:40 +01:00
Anirudh Prasad	5a7de1d054	[MCParser][z/OS] Mark a few tests as unsupported for the z/OS Target - Background here is that that these sets of tests are "invalid" to be run on z/OS - The reason is because these test constructs that HLASM never supports (HLASM doesn't support GNU style directives) - Usually tests are geared towards a particular target via the use of a triple that targets just that platform, but these tests require the use of a "default triple" - Thus, we mark these tests as "UNSUPPORTED" for z/OS since we don't want to run these for z/OS Reviewed By: yusra.syeda, abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D105204	2021-07-05 11:06:52 -04:00
Florian Hahn	137df65805	[LV] Extend FIXME in test add in 91ee1e379901af3.	2021-07-05 15:56:03 +01:00
Florian Hahn	494eb99702	[LV] Add initial test cases with small clamped indices.	2021-07-05 15:51:12 +01:00
Sanjay Patel	4906453b2f	[InstCombine] fold icmp slt/sgt of offset value with constant This follows up patches for the unsigned siblings: 0c400e895306 c7b658aeb526 We are translating an offset signed compare to its unsigned equivalent when one end of the range is at the limit (zero or unsigned max). (X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1) (X + C2) <s C --> X >u (C ^ SMAX) (if C == C2) This probably does not show up much in IR derived from C/C++ source because that would likely have 'nsw', and we have folds for that already. As with the previous unsigned transforms, the folds could be generalized to handle non-constant patterns: https://alive2.llvm.org/ce/z/Y8Xrrm ; sgt define i1 @src(i8 %a, i8 %c) { %c2 = add i8 %c, 1 %t = add i8 %a, %c2 %ov = icmp sgt i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c) { %c_off = sub i8 127, %c ; SMAX %ov = icmp ult i8 %a, %c_off ret i1 %ov } https://alive2.llvm.org/ce/z/c8uhnk ; slt define i1 @src(i8 %a, i8 %c) { %t = add i8 %a, %c %ov = icmp slt i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c) { %c_offnot = xor i8 %c, 127 ; SMAX %ov = icmp ugt i8 %a, %c_offnot ret i1 %ov }	2021-07-05 10:08:31 -04:00
Sanjay Patel	0acfe2bcae	[InstCombine][tests] add tests for signed icmp with constant and offset; NFC	2021-07-05 10:08:31 -04:00
Caroline Concatto	1631d2fbaa	[AArch64][CostModel] Add cost model for experimental.vector.splice This patch adds a new ShuffleKind SK_Splice and then handle the cost in getShuffleCost, as in experimental.vector.reverse. Differential Revision: https://reviews.llvm.org/D104630	2021-07-05 14:30:24 +01:00
Wang, Pengfei	cd7ebd1f50	[X86] Twist shuffle mask when fold HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y)) This patch fixes PR50823. The shuffle mask should be twisted twice before gotten the correct one due to the difference between inner HOP and outer. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104903	2021-07-05 21:29:42 +08:00
Simon Pilgrim	8ed3c99d18	[CostModel][X86] Handle costs for insert/extractelement with non-immediate indices via stack Determine the insert/extractelement costs when performing this as a sequence of aliased loads+stores via the stack.	2021-07-05 13:26:53 +01:00
Simon Pilgrim	4d75d8ef44	[CostModel][X86] Adjust i32/i64 to f32/f64 scalar based on llvm-mca reports (+ Agner). Older SSE targets have slower gpr->fpu scalar conversions - we also need to account for uitofp i32 > f32/f64 being lowered as sitofp i64 -> f32/f64	2021-07-05 13:26:53 +01:00
Sanjay Patel	ab96742bc5	[InstSimplify] fold extractelement of splat with variable extract index We already have a fold for variable index with constant vector, but if we can determine a scalar splat value, then it does not matter whether that value is constant or not. We overlooked this fold in D102404 and earlier patches, but the fixed vector variant is shown in: https://llvm.org/PR50817 Alive2 agrees on that: https://alive2.llvm.org/ce/z/HpijPC The same logic applies to scalable vectors. Differential Revision: https://reviews.llvm.org/D104867	2021-07-05 08:19:40 -04:00
Caroline Concatto	eae00d0f38	[SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector. The function vectorizeChainsInBlock does not support scalable vector, because function like canReuseExtract and isCommutative in the code path assert with scalable vectors. This patch avoids vectorizing blocks that have extract instructions with scalable vector.. Differential Revision: https://reviews.llvm.org/D104809	2021-07-05 12:43:41 +01:00
Bradley Smith	e338c9199a	[AArch64][SVE] Improve fixed length codegen for common vector shuffle case Improve codegen when lowering the common vector shuffle case from the vectorizer (op1[last]:op2[0:last-1]). This patch only handles this common case as it is difficult to handle this more generally when using fixed length vectors, due to being unable to use the SVE ext instruction. Differential Revision: https://reviews.llvm.org/D105289	2021-07-05 12:09:27 +01:00

1 2 3 4 5 ...

218189 Commits