llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00

Author	SHA1	Message	Date
Philip Reames	be7020e420	[tests] Add a couple of tests for zero stride trip counts w/loop varying exit values	2021-07-19 16:33:10 -07:00
Eli Friedman	18c83b5d27	[ScalarEvolution] Refine computeMaxBECountForLT to be accurate in more cases. Allow arbitrary strides, and make sure we return the correct result when the backedge-taken count is zero. Differential Revision: https://reviews.llvm.org/D106197	2021-07-19 15:43:30 -07:00
Simon Pilgrim	6e7877566a	[CostModel][X86] Add fast math tests for float reductions As noticed on D105432 we didn't have any coverage to distinguish between fast/exact float reductions	2021-07-19 13:01:28 +01:00
Eli Friedman	05a71b0a6d	[ScalarEvolution] Fix overflow in computeBECount. The current implementation of computeBECount doesn't account for the possibility that adding "Stride - 1" to Delta might overflow. For almost all loops, it doesn't, but it's not actually proven anywhere. To deal with this, use a variety of tricks to try to prove that the addition doesn't overflow. If the proof is impossible, use an alternate sequence which never overflows. Differential Revision: https://reviews.llvm.org/D105216	2021-07-16 16:15:18 -07:00
Philip Reames	e01009f13f	[tests] Precommit test for D104140	2021-07-16 10:57:59 -07:00
Philip Reames	1c5ed99606	[test] Extend negative stride backedge tests to cover signed comparisons	2021-07-16 10:29:22 -07:00
Philip Reames	055a12795f	[SCEV] Add tests for known negative strides in trip count logic	2021-07-16 10:08:31 -07:00
Eli Friedman	b7cade9437	[DependenceAnalysis] Guard analysis using getPointerBase(). D104806 broke some uses of getMinusSCEV() in DependenceAnalysis: subtraction with different pointer bases returns a SCEVCouldNotCompute. Make sure we avoid cases involving such subtractions. Differential Revision: https://reviews.llvm.org/D106099	2021-07-15 14:57:32 -07:00
Philip Reames	6a45d08863	[SCEV] Fix unsound reasoning in howManyLessThans This is split from D105216, it handles only a subset of the cases in that patch. Specifically, the issue being fixed is that the code incorrectly assumed that (Start-Stride) < End implied that the backedge was taken at least once. This is not true when e.g. Start = 4, Stride = 2, and End = 3. Note that we often do produce the right backedge taken count despite the flawed reasoning. The fix chosen here is to use an alternate form of uceil (ceiling of unsigned divide) lowering which is safe when max(RHS,Start) > Start - Stride. (Note that signedness of both max expression and comparison depend on the signedness of the comparison being analyzed, and that overflow in the Start - Stride expression is allowed.) Note that this is weaker than proving the backedge is taken because it allows start - stride < end < start. Some cases which can't be proven safe are sent down the generic path, and we do end up generating less optimal expressions in a few cases. Credit for coming up with the approach goes entirely to Eli. I just split it off, tweaked the comments a bit, and did some additional testing. Differential Revision: https://reviews.llvm.org/D105942	2021-07-15 10:32:47 -07:00
Philip Reames	99a7d1e6cf	[tests] Stabilize tests for possible change in deref semantics This is conceptually part of e75a2dfe. This file contains both tests whose results don't change (with the right attributes added), and tests which fundamentally regress with the current proposal. Doing the update took some care, thus the separate change. Here's the e75a2dfe context repeated: There's a potential change in dereferenceability attribute semantics in the nearish future. See llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree" and D99100 for context. This change simply adds appropriate attributes to tests to keep transform logic exercised under both old and new/proposed semantics. Note that for many of these cases, O3 would infer exactly these attributes on the test IR. This change handles the idiomatic pattern of a dereferenceable object being passed to a call which can not free that memory. There's a couple other tests which need more one-off attention, they'll be handled in another change.	2021-07-14 13:37:50 -07:00
Philip Reames	b12ecd2ebd	[tests] Stabilize tests for possible change in deref semantics There's a potential change in dereferenceability attribute semantics in the nearish future. See llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree" and D99100 for context. This change simply adds appropriate attributes to tests to keep transform logic exercised under both old and new/proposed semantics. Note that for many of these cases, O3 would infer exactly these attributes on the test IR. This change handles the idiomatic pattern of a dereferenceable object being passed to a call which can not free that memory. There's a couple other tests which need more one-off attention, they'll be handled in another change.	2021-07-14 13:05:43 -07:00
Sander de Smalen	d616b58c4d	[CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid. At the moment, <vscale x 1 x eltty> are not yet fully handled by the code-generator, so to avoid vectorizing loops with that VF, we mark the cost for these types as invalid. The reason for not adding a new "TTI::getMinimumScalableVF" is because the type is supposed to be a type that can be legalized. It partially is, although the support for these types needs some more work. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D103882	2021-07-14 16:44:22 +01:00
Simon Pilgrim	32c30ddee7	[X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction. We know that "CVTTPS2SI" returns 0x80000000 for out of range inputs (and for FP_TO_UINT, negative float values are undefined). We can use this to make unsigned conversions from vXf32 to vXi32 more efficient, particularly on targets without blend using the following logic: small := CVTTPS2SI(x); fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31)) Even on targets where "PBLENDVPS"/"PBLENDVB" exists, it is often a latency 2, low throughput instruction so this logic is applied there too (in particular for AVX2 also). It furthermore gets rid of one high latency floating point comparison in the previous lowering. @TomHender checked the correctness of this for all possible floats between -1 and 2^32 (both ends excluded). Original Patch by @TomHender (Tom Hender) Differential Revision: https://reviews.llvm.org/D89697	2021-07-14 12:03:49 +01:00
Philip Reames	dbffa1f255	[SCEV] Handle zero stride correctly in howManyLessThans This is split from D105216, but the code is hoisted much earlier into the path where we can actually get a zero stride flowing through. Some fairly simple proofs handle the cases which show up in practice. The only test changes are the cases where we really do need a non-zero divider to produce the right result. Recommitting with isLoopInvariant() check. Differential Revision: https://reviews.llvm.org/D105921	2021-07-13 19:14:01 -07:00
Arthur Eubanks	222304d82f	Revert "[SCEV] Handle zero stride correctly in howManyLessThans" This reverts commit 4df591b5c960affd1612e330d0c9cd3076c18053. Causes crashes, see comments on D105921.	2021-07-13 17:53:48 -07:00
Eli Friedman	60fc4afb9a	[ScalarEvolution] Make isKnownNonZero handle more cases. Using an unsigned range instead of signed ranges is a bit more precise. Differential Revision: https://reviews.llvm.org/D105941	2021-07-13 15:36:45 -07:00
Philip Reames	50475a84ee	[SCEV] Handle zero stride correctly in howManyLessThans This is split from D105216, but the code is hoisted much earlier into the path where we can actually get a zero stride flowing through. Some fairly simple proofs handle the cases which show up in practice. The only test changes are the cases where we really do need a non-zero divider to produce the right result. Differential Revision: https://reviews.llvm.org/D105921	2021-07-13 13:31:40 -07:00
Eli Friedman	240a6f4cf4	[NFC] Use CHECK-LABEL in trip-count-unknown-stride.ll	2021-07-13 12:21:13 -07:00
Philip Reames	b8173e8f1b	[tests] Precommit a test case from D105216	2021-07-13 12:02:44 -07:00
Philip Reames	bb03652033	[test] Add a SCEV backedge computation test with an explicit zero stride	2021-07-13 11:08:26 -07:00
Simon Pilgrim	436c202090	[CostModel][X86] Adjust fptosi/fptoui SSE/AVX legalized costs based on llvm-mca reports. Update (mainly) vXf32/vXf64 -> vXi8/vXi16 fptosi/fptoui costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-12 20:38:25 +01:00
Simon Pilgrim	063de0d715	[CostModel][X86] Adjust truncate SSE/AVX legalized costs based on llvm-mca reports. Update truncation costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-12 13:50:43 +01:00
Eli Friedman	9219080ab8	[NFC][ScalarEvolution] Precommit tests for D104075.	2021-07-09 23:33:19 -07:00
Nikita Popov	575750b257	Reapply [IR] Don't mark mustprogress as type attribute Reapply with fixes for clang tests. ----- This is a simple enum attribute. Test changes are because enum attributes are sorted before type attributes, so mustprogress is now in a different position.	2021-07-09 20:57:44 +02:00
Nikita Popov	ac1ee01737	Revert "[IR] Don't mark mustprogress as type attribute" This reverts commit 84ed3a794b4ffe7bd673f1e5a17d507aa3113d12. A number of clang tests are also affected by this change. Revert until I can update them.	2021-07-09 18:46:00 +02:00
Nikita Popov	7845932811	[IR] Don't mark mustprogress as type attribute This is a simple enum attribute. Test changes are because enum attributes are sorted before type attributes.	2021-07-09 18:24:16 +02:00
Martin Storsjö	28e1cf3bb9	Revert "[ScalarEvolution] Fix overflow in computeBECount." This reverts commit 5b350183cdabd83573bc760ddf513f3e1d991bcb (and also "[NFC][ScalarEvolution] Cleanup howManyLessThans.", 009436e9c1fee1290d62bc0faafe0c0295542f56, to make it apply). See https://reviews.llvm.org/D105216 for discussion on various miscompilations caused by that commit.	2021-07-09 14:26:48 +03:00
David Green	fd61052e59	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between pairwise and non-pairwise reductions. This also removes some code that was detecting reductions trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484	2021-07-09 11:51:16 +01:00
Bjorn Pettersson	9033ba7995	[NewPM] Rename 'unswitch' to 'simple-loop-unswitch' in PassRegistry It is confusing to have two ways of specifying the same pass ('simple-loop-unswitch' and 'unswitch'). This patch replaces 'unswitch' by 'simple-loop-unswitch' to get a unique identifier. Using 'simple-loop-unswitch' instead of 'unswitch' also has the advantage of matching how the pass is named in DEBUG_TYPE etc. So this makes it a bit more consistent how we refer to the pass in options such as -passes, -print-after and -debug-only. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D105628	2021-07-09 09:47:33 +02:00
Bjorn Pettersson	8be96b95c4	[NewPM] Consistently use 'simplifycfg' rather than 'simplify-cfg' There was an alias between 'simplifycfg' and 'simplify-cfg' in the PassRegistry. That was the original reason for this patch, which effectively removes the alias. This patch also replaces all occurrences of 'simplify-cfg' by 'simplifycfg'. Reason for choosing that form for the name is that it matches the DEBUG_TYPE for the pass, and the legacy PM name and also how it is spelled out in other passes such as 'loop-simplifycfg', and in other options such as 'simplifycfg-merge-cond-stores'. If for some reason the name should be changed to 'simplify-cfg' in the future, then I think such a renaming should be more widely done and not only impacting the PassRegistry. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D105627	2021-07-09 09:47:03 +02:00
Eli Friedman	aa27065cdf	[NFC][ScalarEvolution] Cleanup howManyLessThans. In preparation for D104075. Some NFC cleanup, and some test coverage for planned changes.	2021-07-08 17:56:26 -07:00
Eli Friedman	915fc454ff	[ScalarEvolution] Fix overflow in computeBECount. There are two issues with the current implementation of computeBECount: 1. It doesn't account for the possibility that adding "Stride - 1" to Delta might overflow. For almost all loops, it doesn't, but it's not actually proven anywhere. 2. It doesn't account for the possibility that Stride is zero. If Delta is zero, the backedge is never taken; the value of Stride isn't relevant. To handle this, we have to make sure that the expression returned by computeBECount evaluates to zero. To deal with this, add two new checks: 1. Use a variety of tricks to try to prove that the addition doesn't overflow. If the proof is impossible, use an alternate sequence which never overflows. 2. Use umax(Stride, 1) to handle the possibility that Stride is zero. Differential Revision: https://reviews.llvm.org/D105216	2021-07-08 10:09:55 -07:00
Simon Pilgrim	5d69b0490b	[CostModel][X86] Account for older SSE targets with slow fp->int conversions Both the conversion cost and the xmm->gpr transfer cost tend to be a lot higher on early SSE targets	2021-07-08 18:08:24 +01:00
Sander de Smalen	3bbfdfb241	[CostModel] Express cost(urem) as cost(div+mul+sub) when set to Expand. The Legalizer expands the operations of urem/srem into a div+mul+sub or divrem when those are legal/custom. This patch changes the cost-model to reflect that cost. Since there is no 'divrem' Instruction in LLVM IR, the cost of divrem is assumed to be the same as div+mul+sub since the three operations will need to be executed at runtime regardless. Patch co-authored by David Sherwood (@david-arm) Reviewed By: RKSimon, paulwalker-arm Differential Revision: https://reviews.llvm.org/D103799	2021-07-07 14:40:28 +01:00
Simon Pilgrim	8dde5bc762	[CostModel][X86] Adjust sext/zext SSE/AVX legalized costs based on llvm-mca reports. Update costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-07 13:58:27 +01:00
Simon Pilgrim	929bb9374e	[CostModel][X86] Adjust sitofp/uitofp SSE/AVX legalized costs based on llvm-mca reports. Update (mainly) vXi8/vXi16 -> vXf32/vXf64 sitofp/uitofp costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-07 12:03:45 +01:00
Eli Friedman	b83eae9454	Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subtracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll because I couldn't figure out how to repair it to test what it was actually trying to test. Recommitting with fix to MemoryDepChecker::isDependent. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 12:16:05 -07:00
Eli Friedman	61b59d3278	Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers." This reverts commit 74d6ce5d5f169e9cf3fac0eb1042602e286dd2b9. Seeing crashes on buildbots in MemoryDepChecker::isDependent.	2021-07-06 11:17:13 -07:00
Eli Friedman	b011bc0424	[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subtracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll because I couldn't figure out how to repair it to test what it was actually trying to test. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 10:54:41 -07:00
Simon Pilgrim	b8bec02bc7	[CostModel][X86] fptosi/fptoui to i8/i16 are truncated from fptosi to i32 Provide a generic fallback that performs the fptosi to i32 types, then truncates to sub-i32 scalars. These numbers can be tweaked for specific sse levels, but we should get the default handling in place first.	2021-07-06 17:28:03 +01:00
Simon Pilgrim	1787dc0460	[CostModel][X86] i8/i16 sitofp/uitofp are sext/zext to i32 for sitofp Provide a generic fallback that extends sub-i32 scalars before using the existing sitofp instructions. These numbers can be tweaked for specific sse levels, but we should get the default handling in place first. We get the extension for free for non-vector loads.	2021-07-06 13:58:52 +01:00
Caroline Concatto	1631d2fbaa	[AArch64][CostModel] Add cost model for experimental.vector.splice This patch adds a new ShuffleKind SK_Splice and then handle the cost in getShuffleCost, as in experimental.vector.reverse. Differential Revision: https://reviews.llvm.org/D104630	2021-07-05 14:30:24 +01:00
Simon Pilgrim	8ed3c99d18	[CostModel][X86] Handle costs for insert/extractelement with non-immediate indices via stack Determine the insert/extractelement costs when performing this as a sequence of aliased loads+stores via the stack.	2021-07-05 13:26:53 +01:00
Simon Pilgrim	4d75d8ef44	[CostModel][X86] Adjust i32/i64 to f32/f64 scalar based on llvm-mca reports (+ Agner). Older SSE targets have slower gpr->fpu scalar conversions - we also need to account for uitofp i32 -> f32/f64 being lowered as sitofp i64 -> f32/f64	2021-07-05 13:26:53 +01:00
Sjoerd Meijer	cda6f69edf	[AArch64] Cost-model i8 vector loads/stores Loads of <4 x i8> vectors were modeled as extremely expensive. And while we don't have a load instruction that supports this, it isn't that expensive to create a vector of i8 elements. The codegen for this was fixed/optimised in D105110. This now tweaks the cost model and enables SLP vectorisation of my motivating case loadi8.ll. Differential Revision: https://reviews.llvm.org/D103629	2021-07-05 11:25:10 +01:00
Simon Pilgrim	5e6cee0948	[CostModel][X86] Drop some hard coded fp<->int scalarization costs Scalarization costs handling is a lot better now, and the hard coded costs were higher than the worst case numbers from the script in D103695	2021-07-02 14:29:32 +01:00
Simon Pilgrim	1a043ee5ed	[CostModel][X86] Find AVX conversion costs using legalized types if custom types didn't match Building on rG2a1ef8784ad9a, fallback to attempting to match against legalized types like we do for SSE targets.	2021-07-02 13:49:31 +01:00
Simon Pilgrim	c54e15d18e	[CostModel][X86] Adjust uitofp(vXi64) SSE/AVX legalized costs based on llvm-mca reports. Update v4i64 -> v4f32/v4f64 uitofp costs based on the worst case costs from the script in D103695. Fixes a few regressions before we start adding AVX costs for legalized types.	2021-07-02 13:09:00 +01:00
Florian Hahn	6aab8b237f	[AArch64] Use custom lowering for fp16 vector copysign. The custom copysign lowering already supports fp16. Use it. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105277	2021-07-02 11:15:30 +01:00
Simon Pilgrim	f0a2f7f0f9	[CostModel][X86] Adjust fp<->int vXi32 SSE legalized costs based on llvm-mca reports. Building on rG2a1ef8784ad9a, adjust the SSE cost tables to use the legalized types based on the worst case costs from the script in D103695. To account for different numbers of src/dst legalized type registers we must scale the cost by maximum of the src/dst, not just use src	2021-07-01 15:34:20 +01:00
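
The FP_TO_UINT logic quoted in commit 32c30ddee7 can be checked with a small scalar model. The sketch below is illustrative only: `cvttps2si`, `ashr31` and `fp_to_ui` are hypothetical Python helpers that model one 32-bit lane using Python floats, not LLVM or X86 APIs, and the model assumes only what the commit message states (out-of-range inputs yield 0x80000000).

```python
import math

INDEFINITE = 0x80000000  # value the commit message says is returned when out of range


def cvttps2si(x: float) -> int:
    """Hypothetical model of a truncating float -> int32 conversion.

    Returns the 32-bit result as an unsigned bit pattern.
    """
    if math.isnan(x) or math.isinf(x):
        return INDEFINITE
    t = math.trunc(x)
    if not (-2**31 <= t < 2**31):
        return INDEFINITE
    return t & 0xFFFFFFFF


def ashr31(v: int) -> int:
    """Arithmetic right shift by 31 of a 32-bit pattern: all ones if the sign bit is set."""
    return 0xFFFFFFFF if v & 0x80000000 else 0


def fp_to_ui(x: float) -> int:
    """small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))"""
    small = cvttps2si(x)
    return small | (cvttps2si(x - 2.0**31) & ashr31(small))


if __name__ == "__main__":
    for x in (0.0, 3.7, 2147483647.0, 2147483648.0, 3e9, 4294967295.0):
        print(x, fp_to_ui(x))
```

For inputs below 2^31 the sign bit of `small` is clear, so the mask is zero and `small` is returned unchanged. For inputs in [2^31, 2^32) `small` is 0x80000000, the mask is all ones, and OR-ing in the conversion of `x - 2^31` restores the low 31 bits.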

2826 Commits
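
The overflow concern raised in the computeBECount commits (05a71b0a6d, 915fc454ff) and the Start = 4, Stride = 2, End = 3 case from 6a45d08863 can be reproduced with 8-bit unsigned arithmetic. This is a rough sketch under stated assumptions: the function names are hypothetical, the 8-bit width is chosen only to make wraparound easy to see, and the `min_one` form is one standard overflow-free ceiling division, not necessarily the exact sequence the patches emit. The `max(stride, 1)` clamp mirrors the umax(Stride, 1) mentioned in 915fc454ff.

```python
BITS = 8                  # assumption: small width so wraparound is easy to see
MASK = (1 << BITS) - 1


def naive_udiv_ceil(delta: int, stride: int) -> int:
    """(delta + stride - 1) /u stride, with the addition wrapping at BITS bits."""
    return ((delta + stride - 1) & MASK) // stride


def safe_udiv_ceil(delta: int, stride: int) -> int:
    """Ceiling of an unsigned divide with no intermediate that can wrap.

    Evaluates to zero when delta is zero, even if stride is zero.
    """
    stride = max(stride, 1)        # umax(Stride, 1)
    min_one = min(delta, 1)        # 0 if delta == 0, else 1
    return min_one + (delta - min_one) // stride


def ceil_div_count(start: int, end: int, stride: int) -> int:
    """ceil((max(end, start) - start) / stride) using the safe form (unsigned inputs)."""
    delta = (max(end, start) - start) & MASK
    return safe_udiv_ceil(delta, stride)


if __name__ == "__main__":
    print(naive_udiv_ceil(250, 10))   # the addition wraps, so the quotient is wrong
    print(safe_udiv_ceil(250, 10))
    print(ceil_div_count(4, 3, 2))    # the Start = 4, Stride = 2, End = 3 case
```

With delta = 250 and stride = 10 the naive sum 259 wraps to 3 in 8 bits and the quotient collapses to 0, while the safe form yields 25. For Start = 4, End = 3 the clamped delta is zero, so the expression evaluates to zero regardless of the stride.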