llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00

Author	SHA1	Message	Date
David Green	fd61052e59	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between pairwise and non-pairwise reductions. This also removes some code that was detecting reductions trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484	2021-07-09 11:51:16 +01:00
Bjorn Pettersson	9033ba7995	[NewPM] Rename 'unswitch' to 'simple-loop-unswitch' in PassRegistry It is confusing to have two ways of specifying the same pass ('simple-loop-unswitch' and 'unswitch'). This patch replaces 'unswitch' by 'simple-loop-unswitch' to get a unique identifier. Using 'simple-loop-unswitch' instead of 'unswitch' also has the advantage of matching how the pass is named in DEBUG_TYPE etc. So this makes it a bit more consistent how we refer to the pass in options such as -passes, -print-after and -debug-only. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D105628	2021-07-09 09:47:33 +02:00
Bjorn Pettersson	8be96b95c4	[NewPM] Consistently use 'simplifycfg' rather than 'simplify-cfg' There was an alias between 'simplifycfg' and 'simplify-cfg' in the PassRegistry. That was the original reason for this patch, which effectively removes the alias. This patch also replaces all occurrences of 'simplify-cfg' by 'simplifycfg'. Reason for choosing that form for the name is that it matches the DEBUG_TYPE for the pass, and the legacy PM name and also how it is spelled out in other passes such as 'loop-simplifycfg', and in other options such as 'simplifycfg-merge-cond-stores'. If for some reason the name should be changed to 'simplify-cfg' in the future, then I think such a renaming should be more widely done and not only impacting the PassRegistry. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D105627	2021-07-09 09:47:03 +02:00
Eli Friedman	aa27065cdf	[NFC][ScalarEvolution] Cleanup howManyLessThans. In preparation for D104075. Some NFC cleanup, and some test coverage for planned changes.	2021-07-08 17:56:26 -07:00
Eli Friedman	915fc454ff	[ScalarEvolution] Fix overflow in computeBECount. There are two issues with the current implementation of computeBECount: 1. It doesn't account for the possibility that adding "Stride - 1" to Delta might overflow. For almost all loops, it doesn't, but it's not actually proven anywhere. 2. It doesn't account for the possibility that Stride is zero. If Delta is zero, the backedge is never taken; the value of Stride isn't relevant. To handle this, we have to make sure that the expression returned by computeBECount evaluates to zero. To deal with this, add two new checks: 1. Use a variety of tricks to try to prove that the addition doesn't overflow. If the proof is impossible, use an alternate sequence which never overflows. 2. Use umax(Stride, 1) to handle the possibility that Stride is zero. Differential Revision: https://reviews.llvm.org/D105216	2021-07-08 10:09:55 -07:00
Simon Pilgrim	5d69b0490b	[CostModel][X86] Account for older SSE targets with slow fp->int conversions Both the conversion cost and the xmm->gpr transfer cost tend to be a lot higher on early SSE targets	2021-07-08 18:08:24 +01:00
Sander de Smalen	3bbfdfb241	[CostModel] Express cost(urem) as cost(div+mul+sub) when set to Expand. The Legalizer expands the operations of urem/srem into a div+mul+sub or divrem when those are legal/custom. This patch changes the cost-model to reflect that cost. Since there is no 'divrem' Instruction in LLVM IR, the cost of divrem is assumed to be the same as div+mul+sub since the three operations will need to be executed at runtime regardless. Patch co-authored by David Sherwood (@david-arm) Reviewed By: RKSimon, paulwalker-arm Differential Revision: https://reviews.llvm.org/D103799	2021-07-07 14:40:28 +01:00
Simon Pilgrim	8dde5bc762	[CostModel][X86] Adjust sext/zext SSE/AVX legalized costs based on llvm-mca reports. Update costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-07 13:58:27 +01:00
Simon Pilgrim	929bb9374e	[CostModel][X86] Adjust sitofp/uitofp SSE/AVX legalized costs based on llvm-mca reports. Update (mainly) vXi8/vXi16 -> vXf32/vXf64 sitofp/uitofp costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-07 12:03:45 +01:00
Eli Friedman	b83eae9454	Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subtracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll because I couldn't figure out how to repair it to test what it was actually trying to test. Recommitting with fix to MemoryDepChecker::isDependent. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 12:16:05 -07:00
Eli Friedman	61b59d3278	Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers." This reverts commit 74d6ce5d5f169e9cf3fac0eb1042602e286dd2b9. Seeing crashes on buildbots in MemoryDepChecker::isDependent.	2021-07-06 11:17:13 -07:00
Eli Friedman	b011bc0424	[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subtracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll because I couldn't figure out how to repair it to test what it was actually trying to test. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 10:54:41 -07:00
Simon Pilgrim	b8bec02bc7	[CostModel][X86] fptosi/fptoui to i8/i16 are truncated from fptosi to i32 Provide a generic fallback that performs the fptosi to i32 types, then truncates to sub-i32 scalars. These numbers can be tweaked for specific sse levels, but we should get the default handling in place first.	2021-07-06 17:28:03 +01:00
Simon Pilgrim	1787dc0460	[CostModel][X86] i8/i16 sitofp/uitofp are sext/zext to i32 for sitofp Provide a generic fallback that extends sub-i32 scalars before using the existing sitofp instructions. These numbers can be tweaked for specific sse levels, but we should get the default handling in place first. We get the extension for free for non-vector loads.	2021-07-06 13:58:52 +01:00
Caroline Concatto	1631d2fbaa	[AArch64][CostModel] Add cost model for experimental.vector.splice This patch adds a new ShuffleKind SK_Splice and then handles the cost in getShuffleCost, as in experimental.vector.reverse. Differential Revision: https://reviews.llvm.org/D104630	2021-07-05 14:30:24 +01:00
Simon Pilgrim	8ed3c99d18	[CostModel][X86] Handle costs for insert/extractelement with non-immediate indices via stack Determine the insert/extractelement costs when performing this as a sequence of aliased loads+stores via the stack.	2021-07-05 13:26:53 +01:00
Simon Pilgrim	4d75d8ef44	[CostModel][X86] Adjust i32/i64 to f32/f64 scalar based on llvm-mca reports (+ Agner). Older SSE targets have slower gpr->fpu scalar conversions - we also need to account for uitofp i32 > f32/f64 being lowered as sitofp i64 -> f32/f64	2021-07-05 13:26:53 +01:00
Sjoerd Meijer	cda6f69edf	[AArch64] Cost-model i8 vector loads/stores Loads of <4 x i8> vectors were modeled as extremely expensive. And while we don't have a load instruction that supports this, it isn't that expensive to create a vector of i8 elements. The codegen for this was fixed/optimised in D105110. This now tweaks the cost model and enables SLP vectorisation of my motivating case loadi8.ll. Differential Revision: https://reviews.llvm.org/D103629	2021-07-05 11:25:10 +01:00
Simon Pilgrim	5e6cee0948	[CostModel][X86] Drop some hard coded fp<->int scalarization costs Scalarization costs handling is a lot better now, and the hard coded costs were higher than the worst case numbers from the script in D103695	2021-07-02 14:29:32 +01:00
Simon Pilgrim	1a043ee5ed	[CostModel][X86] Find AVX conversion costs using legalized types if custom types didn't match Building on rG2a1ef8784ad9a, fallback to attempting to match against legalized types like we do for SSE targets.	2021-07-02 13:49:31 +01:00
Simon Pilgrim	c54e15d18e	[CostModel][X86] Adjust uitofp(vXi64) SSE/AVX legalized costs based on llvm-mca reports. Update v4i64 -> v4f32/v4f64 uitofp costs based on the worst case costs from the script in D103695. Fixes a few regressions before we start adding AVX costs for legalized types.	2021-07-02 13:09:00 +01:00
Florian Hahn	6aab8b237f	[AArch64] Use custom lowering for fp16 vector copysign. The custom copysign lowering already supports fp16. Use it. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105277	2021-07-02 11:15:30 +01:00
Simon Pilgrim	f0a2f7f0f9	[CostModel][X86] Adjust fp<->int vXi32 SSE legalized costs based on llvm-mca reports. Building on rG2a1ef8784ad9a, adjust the SSE cost tables to use the legalized types based on the worst case costs from the script in D103695. To account for different numbers of src/dst legalized type registers we must scale the cost by maximum of the src/dst, not just use src	2021-07-01 15:34:20 +01:00
Simon Pilgrim	b53b3c8d1c	[CostModel][X86] getCastInstrCost - attempt to match custom cast/conversion before legalized types. Move the (SSE-only) generic, legalized type conversion matching after the specific,custom conversion cases, allowing us to properly provide cost overrides. The next step will be to clean up some of the weird existing costs and then to enable AVX+ legalized costs, which will let us strip out a lot of the cost tables entries.	2021-07-01 12:06:40 +01:00
Florian Hahn	bd03d58d2a	[BasicAA] Use separate scale variable for GCD. Use separate variable for adjusted scale used for GCD computations. This fixes an issue where we incorrectly determined that all indices are non-negative and returned noalias because of that. Follow up to 91fa3565da16.	2021-06-30 20:04:39 +01:00
Florian Hahn	78eb09629d	[BasicAA] Add test for incorrectly inferring noalias due to scale sign. This patch adds a test where we currently incorrectly determine noalias, because the sign of Scale is adjusted after 91fa3565da16.	2021-06-30 19:57:29 +01:00
Philip Reames	0b160e9666	[SCEV] Fold (0 udiv %x) to 0 We have analogous rules in instsimplify, etc.., but were missing the same in SCEV. The fold is near trivial, but came up in the context of a larger change.	2021-06-30 08:31:13 -07:00
Philip Reames	7110dfba97	[test] precommit a test for missing (0 /u %x) SCEV fold	2021-06-30 08:26:34 -07:00
Simon Pilgrim	3ab40c9e92	[CostModel][X86] Adjust fp<->int vXi32 AVX1+ costs based on llvm-mca reports Based off the worst case numbers generated by D103695, the AVX1/2/512 sitofp/uitofp/fptosi/fptoui costs were higher than necessary (based off instruction counts instead of actual throughput). The SSE costs still need further fixes, but I hit an issue with the order in which SSE costs are checked - we need to check CUSTOM costs (with non-legal types) first, and then fallback to LEGALIZED types. I'm looking at this now, and this should let us start thinning out a lot of the duplicates in the costs tables. Then we can finally start work on vXi64 / vXi16 / vXi8 / vXi1 integers, which should let us look at sub-128-bit vectorization (D103925).	2021-06-30 15:23:34 +01:00
alex-t	a236991319	[AMDGPU] PHI node cost should not be counted for the size and latency. Details: https://reviews.llvm.org/D96805 changed the GCNTTIImpl::getCFInstrCost to return 1 for the PHI nodes for the TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that are the result of the PHI lowering are inserted into the basic block predecessors - not into the block itself. As a result of this change LoopRotate and LoopUnroll were broken because of the incorrect Loop header and loop body size/cost estimation. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D105104	2021-06-30 16:11:17 +03:00
Florian Hahn	8002fe7d67	[BasicAA] Be more careful with modulo ops on VariableGEPIndex. (V * Scale) % X may not produce the same result for any possible value of V, e.g. if the multiplication overflows. This means we currently incorrectly determine NoAlias in some cases. This patch updates LinearExpression to track whether the expression has NSW and uses that to adjust the scale used for alias checks. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99424	2021-06-29 09:22:36 +01:00
Florian Hahn	412987a7f5	[BasicAA] Add test to cover GetIndexDifference change in D99424. Precommit test case for a change to GetIndexDifference in D99424.	2021-06-28 16:03:05 +01:00
Florian Hahn	068181fb4a	[SCEV] Support single-cond range check idiom in applyLoopGuards. This patch extends applyLoopGuards to detect a single-cond range check idiom that InstCombine generates. It extends applyLoopGuards to detect conditions of the form (-C1 + X < C2). InstCombine will create this form when combining two checks of the form (X u< C2 + C1) and (X >=u C1). In practice, this enables us to correctly compute tight trip count bounds for code as in the function below. InstCombine will fold the minimum iteration check created by LoopRotate with the user check (< 8). void unsigned_check(short *pred, unsigned width) { if (width < 8) { for (int x = 0; x < width; x++) pred[x] = pred[x] * pred[x]; } } As a consequence, LLVM creates dead vector loops for the code above, e.g. see https://godbolt.org/z/cb8eTcqET https://alive2.llvm.org/ce/z/SHHW4d Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104741	2021-06-25 10:24:40 +01:00
Rosie Sumpter	de55f5a044	[CostModel][AArch64] Improve cost model for vector reduction intrinsics OR, XOR and AND entries are added to the cost table. An extra cost is added when vector splitting occurs. This is done to address the issue of a missed SLP vectorization opportunity due to unreasonably high costs being attributed to the vector Or reduction (see: https://bugs.llvm.org/show_bug.cgi?id=44593). Differential Revision: https://reviews.llvm.org/D104538	2021-06-24 12:02:58 +01:00
Florian Hahn	9acf2cef1e	[SCEV] Support signed predicates in applyLoopGuards. This adds handling for signed predicates, similar to how unsigned predicates are already handled. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104732	2021-06-23 10:21:05 +01:00
Florian Hahn	71133f8ae4	[SCEV] Add tests with single-cond range check generated by InstComb.	2021-06-23 10:16:57 +01:00
Florian Hahn	e93c4c3336	[SCEV] Retain AddExpr flags when subtracting a foldable constant. Currently we drop wrapping flags for expressions like (A + C1)<flags> - C2. But we can retain flags under certain conditions: * Adding a smaller constant is NUW if the original AddExpr was NUW. * Adding a constant with the same sign and small magnitude is NSW, if the original AddExpr was NSW. This can improve results after using `SimplifyICmpOperands`, which may subtract one in order to use stricter predicates, as is the case for `isKnownPredicate`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D104319	2021-06-22 11:27:51 +01:00
Vitaly Buka	379a6155dd	[NFC] Add getUnderlyingObjects test Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D104585	2021-06-21 16:36:50 -07:00
Eli Friedman	19ad020e28	[ScalarEvolution] Ensure backedge-taken counts are not pointers. A backedge-taken count doesn't refer to memory; returning a pointer type is nonsense. So make sure we always return an integer. The obvious way to do this would be to just convert the operands of the icmp to integers, but that doesn't quite work out at the moment: isLoopEntryGuardedByCond currently gets confused by ptrtoint operations. So we perform the ptrtoint conversion late for lt/gt operations. The test changes are mostly innocuous. The most interesting changes are more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to i64)) + %ptr)". This is expected: we can't fold this to zero because we need to preserve the pointer base. The call to isLoopEntryGuardedByCond in howFarToZero is less precise because of ptrtoint operations; this shows up in the function pr46786_c26_char in ptrtoint.ll. Fixing it here would require more complex refactoring. It should eventually be fixed by future improvements to isImpliedCond. See https://bugs.llvm.org/show_bug.cgi?id=46786 for context. Differential Revision: https://reviews.llvm.org/D103656	2021-06-21 16:24:16 -07:00
Philip Reames	160f03fe5f	Split a test for ease of auto update	2021-06-21 11:02:26 -07:00
Eli Friedman	e4884552df	[ScalarEvolution] Fix pointer/int type handling converting select/phi to min/max. The old version of this code would blindly perform arithmetic without paying attention to whether the types involved were pointers or integers. This could lead to weird expressions like negating a pointer. Explicitly handle simple cases involving pointers, like "x < y ? x : y". In all other cases, coerce the operands of the comparison to integer types. This avoids the weird cases, while handling most of the interesting cases. Differential Revision: https://reviews.llvm.org/D103660	2021-06-17 14:05:12 -07:00
Bjorn Pettersson	29ffba4b56	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit 0ee439b705e82a4fe20e2, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handles that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Rosie Sumpter	b4082e9aee	[CostModel][AArch64] Improve the cost estimate of CTPOP intrinsic Added a case for CTPOP to AArch64TTIImpl::getIntrinsicInstrCost so that the cost estimate matches the codegen in test/CodeGen/AArch64/arm64-vpopcnt.ll Differential Revision: https://reviews.llvm.org/D103952	2021-06-11 11:15:46 +01:00
Philip Reames	d870f55264	[SCEV] Use mustprogress flag on loops (in addition to function attribute) This addresses a performance regression reported against 3c6e4191. That change (correctly) limited a transform based on assumed finiteness to mustprogress loops, but the previous change (38540d7) which introduced the mustprogress check utility only handled function attributes, not the loop metadata form. It turns out that clang uses the function attribute form for C++, and the loop metadata form for C. As a result, 3c6e4191 ended up being a large regression in practice for C code as loops weren't being considered mustprogress despite the language semantics.	2021-06-10 13:20:28 -07:00
Irina Dobrescu	e44ee5c21e	[AArch64] Add cost tests for bitreverse This patch includes cost tests for bit reverse as well as some adjustments to the cost model. Differential Revision: https://reviews.llvm.org/D102755	2021-06-10 14:51:33 +01:00
Philip Reames	9d72ae84c7	[tests] Precommit test for D103991	2021-06-09 15:05:54 -07:00
Florian Hahn	f91e95602e	[SCEV] Keep common NUW flags when inlining Add operands. Currently, NoWrapFlags are dropped if we inline operands of SCEVAddExpr operands. As a consequence, we always drop flags when building expressions like `getAddExpr(A, getAddExpr(B, C, NUW), NUW)`. We should be able to retain NUW flags common among all inlined SCEVAddExpr and the original flags. Reviewed By: nikic, mkazantsev Differential Revision: https://reviews.llvm.org/D103877	2021-06-09 17:13:21 +01:00
Florian Hahn	92b2a39d20	[ScalarEvolution] Add test for preserving add overflow flags.	2021-06-09 09:20:02 +01:00
Kerry McLaughlin	d259f6577a	[CostModel] Return an invalid cost for memory ops with unsupported types Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector type cannot be legalized by widening/splitting. When this is the method of legalization found, getTypeLegalizationCost will return an Invalid cost. The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call getTypeLegalizationCost and will now also return an Invalid cost for unsupported types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102515	2021-06-08 12:07:36 +01:00
Simon Pilgrim	b21f333bb4	[CostModel][X86] Improve AVX1/AVX2 truncation costs Based off the worst case numbers generated by D103695, we were overestimating the cost of a number of vector truncations: AVX2: v2i32->v2i8, v2i64->v2i16 + v4i64->v4i32 AVX1: v2i32->v2i8, v4i64->v4i16 + v16i16->v16i8 Once we have a working set of conversion costs, the intention is to cleanup the tables and use legalized types a lot more to reduce the number of entries we currently have.	2021-06-08 10:41:03 +01:00
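
Note on commit 068181fb4a (single-cond range check idiom): the message says InstCombine folds the two checks (X >=u C1) and (X u< C2 + C1) into the single form (-C1 + X u< C2). The sketch below is only an illustration of why the two forms agree, not code from the patch. It models fixed-width unsigned arithmetic in Python with an explicit mask; the function names and the `bits` parameter are hypothetical, and the equivalence is checked under the assumption that C1 + C2 does not wrap.

```python
def two_checks(x, c1, c2, bits=8):
    """Original pair of guards: (x >=u c1) and (x u< c1 + c2)."""
    mask = (1 << bits) - 1
    return x >= c1 and x < ((c1 + c2) & mask)


def single_check(x, c1, c2, bits=8):
    """Folded form: (-c1 + x) u< c2, with the subtraction wrapping modulo 2**bits."""
    mask = (1 << bits) - 1
    return ((x - c1) & mask) < c2


def forms_agree(bits=4):
    """Exhaustively compare both forms for every x, c1, c2 where c1 + c2 does not wrap."""
    limit = 1 << bits
    for c1 in range(limit):
        for c2 in range(limit - c1):  # keeps c1 + c2 < 2**bits
            for x in range(limit):
                if two_checks(x, c1, c2, bits) != single_check(x, c1, c2, bits):
                    return False
    return True
```

When x is below c1, the wrapped subtraction produces a large unsigned value, so the single comparison fails exactly when the lower-bound check would have failed.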


2799 Commits
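
Note on commit 915fc454ff (overflow in computeBECount): the message describes two hazards in a rounding-up division, namely that adding "Stride - 1" to Delta might overflow and that Stride might be zero. The sketch below illustrates both with fixed-width unsigned arithmetic modelled in Python. It is an assumption-laden illustration, not the patch itself: the function names and `bits` parameter are hypothetical, and the overflow-free formula shown (umin(Delta, 1) + (Delta - umin(Delta, 1)) / Stride) is one known way to round up without overflow, which may differ from the exact sequence the patch emits.

```python
def ceil_div_naive(delta, stride, bits=8):
    """(delta + stride - 1) udiv stride, where the addition wraps modulo 2**bits."""
    mask = (1 << bits) - 1
    return ((delta + stride - 1) & mask) // stride


def ceil_div_safe(delta, stride, bits=8):
    """Rounding-up division that never forms a value larger than delta.

    umax(stride, 1) guards against a zero stride, as the commit message describes.
    The bits parameter is unused here and kept only for symmetry with the naive form.
    """
    stride = max(stride, 1)
    nonzero = min(delta, 1)  # 1 if delta != 0, else 0
    return nonzero + (delta - nonzero) // stride
```

With 8-bit values, delta = 250 and stride = 10, the naive form wraps 259 to 3 and yields 0, while the safe form yields 25. With delta = 0 and stride = 0 the safe form evaluates to 0, matching the requirement that the backedge is never taken.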