llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

Author	SHA1	Message	Date
Simon Pilgrim	bb7e18f9de	[CostModel][X86] Remove unused check-prefixes	2020-11-10 12:48:35 +00:00
Sanjay Patel	a8868babff	[ARM] remove cost-kind predicate for cmp/sel costs This is the cmp/sel sibling to D90692. Again, the reasoning is: the throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uops). We need to check for a valid (non-null) condition type parameter because SimplifyCFG may pass nullptr for that (and so we will crash multiple regression tests without that check). I'm not sure if passing nullptr makes sense, but other code in the cost model does appear to check if that param is set or not. Differential Revision: https://reviews.llvm.org/D90781	2020-11-05 14:52:25 -05:00
Sanjay Patel	4b28ca8f6f	[ARM] remove cost-kind predicate for most math op costs This is based on the same idea that I am using for the basic model implementation and what I have partly already done for x86: throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uop). Differential Revision: https://reviews.llvm.org/D90692	2020-11-03 17:23:46 -05:00
Sanjay Patel	81f8bd9111	[CostModel] fix cost calc bug for sadd/ssub with overflow As noted in D90554, there's an opcode typo in using an easily misused cost model API: getCmpSelInstrCost(). Beyond that, the assumed sequence of ops is questionable, but that would be another patch. My guess is that the x86 test diffs show that we are probably wrong both before and after this change, so there will be no practical difference. As an example, I tried this test which shows a cost of '7' either way: define <4 x i32> @sadd(<4 x i32> %va, <4 x i32> %vb) { %V4I32 = call {<4 x i32>, <4 x i1>} @llvm.sadd.with.overflow.v4i32(<4 x i32> %va, <4 x i32> %vb) %ov = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 1 %r = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 0 %z = select <4 x i1> %ov, <4 x i32> <i32 42, i32 42, i32 42, i32 42>, <4 x i32> %r ret <4 x i32> %z } $ llc -o - sadd.ll -mattr=avx vpaddd %xmm1, %xmm0, %xmm2 vpcmpgtd %xmm2, %xmm0, %xmm0 vpxor %xmm0, %xmm1, %xmm0 vblendvps %xmm0, LCPI0_0(%rip), %xmm2, %xmm0a Differential Revision: https://reviews.llvm.org/D90681	2020-11-03 11:03:47 -05:00
David Green	41688b499e	[CostModel] Make target intrinsics cheap by default This patch changes the intrinsics cost model to assume that by default target intrinsics are cheap. This didn't seem to be the case for all intrinsics, and is potentially an MVE problem due to our scalarization overheads. Cheap seems to be a good default in general though. Differential Revision: https://reviews.llvm.org/D90597	2020-11-03 09:58:28 +00:00
David Green	7290582ceb	[ARM] Cost model test for target intrinsics. NFC	2020-11-02 17:46:48 +00:00
Sanjay Patel	389858bbc4	[x86] add AVX2 cost model entries for maxnum of 256-bit vectors As noticed in D90554, the AVX2 costs for 256-bit vectors did not include FMAXNUM entries, so we fell back to AVX1, which assumes those ops will be split into 128-bit halves or something close to that. Differential Revision: https://reviews.llvm.org/D90613	2020-11-02 12:20:17 -05:00
Florian Hahn	1db8566f5e	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit 408c4408facc3a79ee4ff7e9983cc972f797e176. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Fangrui Song	0536175cbb	[test] Fix some unused check prefixes in test/Analysis/CostModel/X86	2020-10-31 23:29:57 -07:00
David Green	e722c3abc4	[ARM] Fix crash for gather of pointer costs. If the elt size is unknown due to it being a pointer, a comparison against 0 will cause an assert. Make sure the elt size is large enough before comparing and for the moment just return the scalar cost.	2020-10-31 13:10:14 +00:00
Florian Hahn	f44af1a603	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit 73f01e3df58dca9d1596440b866b52929e3878de. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Sanjay Patel	a82a184b76	[x86] add cost overrides for mul with overflow I'm assuming the standard size integer instructions for this end up as something like: mulq %rsi seto %al And the 'mul' generally has reciprocal throughput of 1 on typical implementations (higher latency, but that's not handled here). The default costs may end up much higher than that, and that's what we see in the test diffs. Vector types are left as a 'TODO'. Differential Revision: https://reviews.llvm.org/D90431	2020-10-30 12:38:16 -04:00
Florian Hahn	d8012eedb7	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
Sanjay Patel	b21114cf03	[x86] add test for umul intrinsic costs; NFC	2020-10-29 12:12:52 -04:00
Sanjay Patel	ac57df4098	[CostModel][x86] remove cost-kind predicate for intrinsic costs We model cost as number of instructions / uops, so it does not make sense to treat size/blended costs any differently than throughput.	2020-10-28 14:33:37 -04:00
Sanjay Patel	4542ce5444	[CostModel] remove cost-kind predicate for funnel shift costs Completing the series of FIXME removals for special-case intrinsics: 50dfa19cc799 f2c25c70791d c963bde0152a 01ea93d85d6e This one looks quite different than the others. The size/blended cost is still potentially very far off from the throughput cost, but this is hopefully not worse on the whole. It looks like the underlying costs for the expanded shift/logic have their own cost-kind limitations. Also, we are not asking the target if it has a legal funnel shift op, so we just assume that the intrinsic gets expanded.	2020-10-28 14:02:34 -04:00
David Green	61a282c3b7	[AArch64] Remove AArch64ISD::NOT, use vnot instead vnot (xor -1) should be equivalent to the AArch64-specific AArch64ISD::NOT node, but allows more folding thanks to all the target-independent optimizations. Specifically, this allows select(icmp ne, x, y) to become "cmeq; bsl y, x" as opposed to needing to convert the predicate with "cmeq; mvn; bsl x, y". Unfortunately there is a regression in a cmtst test, but the code it selected from was already non-canonical, with instcombine preferring to use an eq predicate instead. Plus the more common case of icmp ne is improved. Differential Revision: https://reviews.llvm.org/D90126	2020-10-28 08:15:37 +00:00
Sanjay Patel	b47683ecbd	[CostModel] remove cost-kind predicate for FP add/mul vector reduction costs This was originally part of: f2c25c70791d but that was reverted because there was an underlying bug in processing the vector type of these intrinsics. That was fixed with: 74ffc823ed21 This is similar in spirit to 01ea93d85d6e (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant targets could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. Paraphrasing from the previous commits: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-27 18:00:20 -04:00
Sanjay Patel	ab0890e928	[CostModel] add tests for FP reductions; NFC	2020-10-27 18:00:20 -04:00
Bing1 Yu	450c5e89bb	[CostModel][X86] teach TTI to calculate the cost of a chain of vector inserts/extracts more precisely and correctly: In each 128-bit lane, if at least one index is demanded, not all indices are demanded, and this lane is not the first 128-bit lane of the legalized vector, then this lane needs an extracti128. If at least one index in a 128-bit lane is demanded, this lane needs an inserti128. The following cases illustrate the rule. Assume we insert several elements into a v8i32 vector with AVX2: Case#1: inserting into index 1 needs vpinsrd + inserti128. Case#2: inserting into index 5 needs extracti128 + vpinsrd + inserti128. Case#3: inserting into indices 4, 5, 6, 7 needs 4*vpinsrd + inserti128. Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D89767	2020-10-27 11:21:13 +08:00
Joe Ellis	a376939c97	[SVE][AArch64] Fix TypeSize warning in GEP cost analysis The warning would fire when calling getGEPCost for analyzing the cost of a GEP instruction. This would result in the use of the now deprecated implicit cast of TypeSize to uint64_t through the overloaded operator. This patch fixes the issue by using getKnownMinSize instead of the implicit cast. This is possible because the code is already scalable-vector aware. The semantic behaviour of the code is unchanged by this patch. Reviewed By: sdesmalen, fpetrogalli Differential Revision: https://reviews.llvm.org/D89872	2020-10-26 17:40:19 +00:00
Tyker	1d03399e60	[Annotation] Allows annotation to carry some additional constant arguments. This allows annotations to be used in many more contexts than currently, especially with templates or constexpr. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D88645	2020-10-26 10:50:05 +01:00
Sanjay Patel	fe343140eb	[CostModel] remove cost-kind predicate for some vector reduction costs This is a modified 2nd try of 22d10b8ab44f (reverted by 1c8371692d because it managed to expose an existing crashing bug that should be fixed by 74ffc823). Original commit message: This is similar in spirit to 01ea93d85d6e (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant targets could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. The ARM costs show a small difference between throughput and size because there's an underlying difference in cmp/sel costs that is also predicated on cost-kind. Paraphrasing from the previous commits: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-25 15:17:52 -04:00
Sanjay Patel	74732bce4a	[CostModel] fix operand/type accounting for fadd/fmul reductions I'm not sure if/how this ever worked, but it must not be tested currently because the basic tests added here were crashing as noted in the post-review comments for 1c83716 (which reverted another cost-model fix in 22d10b8ab44f).	2020-10-25 15:01:19 -04:00
Martin Storsjö	e7cc77c1de	Revert "[CostModel] remove cost-kind predicate for vector reduction costs" This reverts commit 22d10b8ab44f703b72b8316a9b3b8adc623ca73f. This broke compilation e.g. like this: $ cat synth.c a; float b; c() { for (;;) { float d = -b a++; d -= --b * a++; d -= --b * a; d -= --b * a; e(d); } } $ clang -target x86_64-linux-gnu -c -O2 -ffast-math synth.c clang: ../include/llvm/Support/Casting.h:104: static bool llvm::isa_impl_cl<To, const From>::doit(const From*) [with To = llvm::PointerType; From = llvm::Type]: Assertion `Val && "isa<> used on a null pointer"' failed.	2020-10-25 08:47:54 +02:00
Sanjay Patel	ed4fa653c2	[CostModel] remove cost-kind predicate for vector reduction costs This is similar in spirit to 01ea93d85d6e (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant targets could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. The ARM costs show a small difference between throughput and size because there's an underlying difference in cmp/sel costs that is also predicated on cost-kind. Paraphrasing from the previous commits: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-24 13:20:17 -04:00
dfukalov	efaecfc60e	[AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions. 1. Throughput and codesize cost estimations were separated and updated. 2. Updated fdiv cost estimation for different cases. 3. Added scalarization processing for types that are treated as !isSimple() to improve codesize estimation in getArithmeticInstrCost() and getArithmeticInstrCost(). The code was borrowed from the TCK_RecipThroughput path of the base implementation. The next step is to unify the scalarization part in the base class, which currently works for the TCK_RecipThroughput path only. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89973	2020-10-24 19:53:08 +03:00
Florian Hahn	9f772c5b7d	[AArch64] Add vector compare/select cost-model tests.	2020-10-23 20:43:04 +01:00
Florian Hahn	785022033e	[AArch64] Implement getIntrinsicInstrCost, handle min/max intrinsics. This patch adds a specialized implementation of getIntrinsicInstrCost and adds initial cost-modeling for min/max vector intrinsics. AArch64 NEON supports umin/smin/umax/smax for vectors <8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>. Notably, it does not support vectors with i64 elements. This change by itself should have very little impact on codegen, but in follow-up patches I plan to teach the vectorizers to consider using those intrinsics on platforms where it is profitable, e.g. because there is no general 'select'-like instruction. The current cost returned should be better for throughput, latency and size. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D89953	2020-10-23 11:32:42 +01:00
Florian Hahn	0d2c92bfc9	[AArch64] Add min/max cost-model tests for v2i32.	2020-10-22 16:04:13 +01:00
Florian Hahn	2db9fbe33c	[AArch64] Add min/max cost-model tests for v4i16.	2020-10-22 15:47:50 +01:00
Florian Hahn	c258c898fb	[AArch64] Add cost model tests for min/max intrinsics.	2020-10-22 13:28:04 +01:00
Sanjay Patel	3393df6056	[CostModel] remove cost-kind predicate for scatter/gather cost This is similar in spirit to 01ea93d85d6e (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant ARM could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. X86 has the same throughput restriction as the basic implementation, so it is still unchanged. Paraphrasing from the previous commit: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-21 14:26:05 -04:00
Sanjay Patel	7e2375171f	[ARM] add cost-kind tests for intrinsics; NFC This is a copy of the x86 file to provide better coverage; x86 may have strange overrides that mask changes in the generic model.	2020-10-21 14:26:04 -04:00
Sanjay Patel	d345d7741b	[CostModel] remove cost-kind predicate for memcpy cost The base class default implementation returns TCC_Expensive (currently set to '4'), so that explains the test diff. This probably does not make sense for most callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. The ARM target has an override that tries to model codegen expansion, and that should likely be adapted for general usage. This probably does not affect anything because the vectorizers are the primary users of the throughput cost, and memcpy is not listed as a trivially vectorizable intrinsic.	2020-10-21 08:50:44 -04:00
David Green	e1c6fdf0df	[ARM] Basic getArithmeticReductionCost reduction costs This adds some basic costs for MVE reductions - currently just costing the simple legal add vectors as a single MVE instruction. More complex costing can be added in the future when the framework more readily allows it. Differential Revision: https://reviews.llvm.org/D88980	2020-10-17 10:29:00 +01:00
David Green	afa83bbf3d	[ARM] Add a very basic active_lane_mask cost This adds a very basic cost for active_lane_mask under MVE, making the assumption that they will be free and then apologizing for that in a comment. In reality they may either be free (by being nicely folded into a tail-predicated loop), cost the same as a VCTP, or be expanded into vdup's, adds and cmp's. It is difficult to detect the difference from a single getIntrinsicInstrCost call, so this makes the assumption that the vectorizer is adding them, and only adds them where it makes sense. We may need to change this in the future to better model predicate costs in the vectorizer, especially at -Os or in non-tail-predicated loops. The vectorizer currently does not query the cost of these instructions, but that will change in the future, and a zero cost there probably makes the most sense at the moment. Differential Revision: https://reviews.llvm.org/D88989	2020-10-17 10:09:42 +01:00
Sanjay Patel	21d0e78943	[CostModel] remove cost-kind predicate for ctlz/cttz intrinsics in basic TTI implementation The cost modeling for intrinsics is a patchwork based on different expectations from the callers, so it's a mess. I'm hoping to untangle this to allow canonicalization to the new min/max intrinsics in IR. The general goal is to remove the cost-kind restriction here in the basic implementation class. I.e., if some intrinsic has throughput cost of 104, assume that it has the same size, latency, and blended costs. Effectively, an intrinsic with cost N is composed of N simple instructions. If that's not correct, the target should provide a more accurate override. The x86-64 SSE2 subtarget cost diffs require explanation: 1. The scalar ctlz/cttz are assuming "BSR+XOR+CMOV" or "TEST+BSF+CMOV/BRANCH", so not cheap. 2. The 128-bit SSE vector width versions assume cost of 18 or 26 (no explanation provided in the tables, but this corresponds to a bunch of shift/logic/compare). 3. The 512-bit vectors in the test file are scaled up by a factor of 4 from the legal vector width costs. 4. The plain latency cost-kind is not affected in this patch because that calc is diverted before we get to getIntrinsicInstrCost(). Differential Revision: https://reviews.llvm.org/D89461	2020-10-15 13:14:41 -04:00
Sanjay Patel	8874abb153	[CostModel] rearrange basic intrinsic cost implementation This is bigger/uglier than before, but it should allow fixing all of the broken paths more easily. Test coverage added with rGfab028b and other commits. This is not NFC - the scalable vector test would crash without this patch.	2020-10-13 11:52:00 -04:00
Sanjay Patel	9dd0476f6d	[x86] add cost model test for memcpy; NFC This is treated as a special-case in the base class implementation of getIntrinsicInstrCost().	2020-10-13 11:42:44 -04:00
Sanjay Patel	4b1b25be42	[AArch64] fix spacing in test's RUN lines; NFC	2020-10-13 10:44:18 -04:00
Sanjay Patel	b896e3d245	[x86] add tests for cost model kinds of intrinsics; NFC This provides coverage for existing special-cases and a sampling of other intrinsics. Current output appears to be wrong in several cases.	2020-10-13 10:39:43 -04:00
Sanjay Patel	8b55364bd6	[AArch64] add cost model test for scalable vector math; NFC Testing for the various cost model "TargetCostKind" is limited, and testing for scalable vectors is limited. The motivating example of an intrinsic is not included here yet because that just crashes.	2020-10-13 08:39:04 -04:00
Simon Pilgrim	c913b82065	[X86][SSE2] Use smarter instruction patterns for lowering UMIN/UMAX with v8i16. This is my first LLVM patch, so please tell me if there are any process issues. The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by flipping the sign bit of both inputs and the output, which turns the unsigned minimum/maximum into a signed one. We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput, this needs one large move instruction. It's just that the sign-bit flipping has a greater chance of being optimized further. This is particularly apparent in the "reduce" test cases. However, due to the slight regression in the single-use case, this patch no longer proposes this. Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16, which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However, independent of that, I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweigh the downsides in that specific case. Patch By: @TomHender (Tom Hender) ActuallyaDeviloper Differential Revision: https://reviews.llvm.org/D87236	2020-10-11 11:21:23 +01:00
David Green	863d69bc9d	[ARM] Add MVE vecreduce costmodel tests. NFC There were some existing tests that were not super useful. New ones are added for testing MVE specific patterns.	2020-10-09 16:25:25 +01:00
Amara Emerson	59c2440372	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Sanjay Patel	9fe6872096	[CostModel] add cl option to check size and latency costs; NFC This is a setting used by SimplifyCFG, LoopUnroll, and InlineCost, but there is apparently no direct test coverage for any of those cost model values.	2020-09-27 09:52:56 -04:00
Jonas Paulsson	2e5f4ba2ce	[SystemZ] Make sure not to call getZExtValue on a >64 bit constant. Better use isZero() and isIntN() in SystemZTargetTransformInfo rather than calling getZExtValue() since the immediate operand may be wider than 64 bits, which is not allowed with getZExtValue(). Fixes https://bugs.llvm.org/show_bug.cgi?id=47600 Review: Simon Pilgrim	2020-09-23 15:36:32 +02:00
Bing1 Yu	9b182096ba	[CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D87884	2020-09-23 10:29:10 +08:00
Simon Pilgrim	1e7924bec0	[CostModel][X86] Add some select shuffle costs tests for D87884	2020-09-21 16:09:05 +01:00
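
The x86 sequence quoted in the 81f8bd9111 message (vpaddd, vpcmpgtd, vpxor, vblendvps) detects signed overflow without using a flags register. As a rough illustration only, the Python sketch below models one 32-bit lane of that sequence. It is not LLVM code. The helper names (`to_i32`, `sadd_with_overflow_i32`, `sadd_or_42`) are hypothetical, and the inputs are assumed to already be in i32 range.

```python
# Per-lane scalar model of the AVX sequence quoted in 81f8bd9111.
# Function names are hypothetical; inputs are assumed to be in i32 range.

def to_i32(x):
    """Reinterpret the low 32 bits of x as a signed two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sadd_with_overflow_i32(a, b):
    """Return (wrapped sum, overflow flag), mirroring vpaddd/vpcmpgtd/vpxor."""
    s = to_i32(a + b)        # vpaddd: wrapping 32-bit add
    a_gt_s = a > s           # vpcmpgtd: signed compare, a > sum
    # vpxor with b: the sign bit of the result is (a > sum) XOR (b < 0),
    # which is exactly the signed-overflow condition.
    overflow = a_gt_s != (b < 0)
    return s, overflow


def sadd_or_42(a, b):
    """Model of the IR test's select: 42 on overflow, else the sum (vblendvps)."""
    s, ov = sadd_with_overflow_i32(a, b)
    return 42 if ov else s


print(sadd_with_overflow_i32(2**31 - 1, 1))  # (-2147483648, True)
print(sadd_or_42(5, -3))                     # 2
```

The XOR is the key step. When b >= 0, overflow means the sum wrapped below a. When b < 0, overflow means the sum wrapped above a. The sign of b therefore inverts the meaning of the single "a > sum" compare.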

672 Commits
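
The 450c5e89bb (D89767) message states a counting rule for chains of inserts on AVX2. The Python sketch below restates that rule so that the three listed cases can be checked. It is not the actual X86 TTI implementation, and `insert_chain_cost` is a hypothetical name. The sketch assumes i32 elements (four per 128-bit lane) and a 256-bit legal vector (two lanes).

```python
# Restates the insert-chain counting rule from 450c5e89bb; not LLVM's code.
# Assumption: i32 elements (4 per 128-bit lane), 256-bit legal vectors (2 lanes).

def insert_chain_cost(num_elts, demanded, elts_per_lane=4, lanes_per_legal_vec=2):
    """Count instructions needed to insert scalars at the `demanded` indices."""
    demanded = set(demanded)
    counts = {}

    def add(name, n=1):
        counts[name] = counts.get(name, 0) + n

    for lane in range(num_elts // elts_per_lane):
        lane_elts = range(lane * elts_per_lane, (lane + 1) * elts_per_lane)
        hit = demanded.intersection(lane_elts)
        if not hit:
            continue  # untouched lane: no cost
        first_lane = lane % lanes_per_legal_vec == 0
        if not first_lane and len(hit) < elts_per_lane:
            # Partially overwritten non-first lane: old contents must be extracted.
            add("extracti128")
        add("vpinsrd", len(hit))  # one scalar insert per demanded element
        add("inserti128")         # every touched lane is inserted back
    return counts


# The three cases from the commit message (v8i32, AVX2):
print(insert_chain_cost(8, [1]))           # {'vpinsrd': 1, 'inserti128': 1}
print(insert_chain_cost(8, [5]))           # {'extracti128': 1, 'vpinsrd': 1, 'inserti128': 1}
print(insert_chain_cost(8, [4, 5, 6, 7]))  # {'vpinsrd': 4, 'inserti128': 1}
```

Under the stated rule, a fully overwritten upper lane (Case#3) skips the extracti128 because none of its old contents survive.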