mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

2384 Commits

Author SHA1 Message Date
Sanjay Patel
4b28ca8f6f [ARM] remove cost-kind predicate for most math op costs
This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: throughput cost is number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uop).
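As a rough illustration of that idea (a hand-written sketch with made-up cost numbers, not the ARM backend's actual tables):

```
#include <cstdint>

// Sketch: when throughput cost is defined as instruction/uop count, size
// and blended costs can reuse the same number, and only known-expensive
// ops such as fdiv need a carve-out. All names and numbers are illustrative.
enum class CostKind { Throughput, Latency, CodeSize, Blended };
enum class MathOp { FAdd, FMul, FDiv };

uint32_t mathOpCost(MathOp Op, CostKind Kind) {
  if (Op == MathOp::FDiv)
    return Kind == CostKind::Throughput ? 14u : 1u; // expensive, but one inst
  return 1u; // one uop: identical across cost kinds by construction
}
```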

Differential Revision: https://reviews.llvm.org/D90692
2020-11-03 17:23:46 -05:00
Sanjay Patel
81f8bd9111 [CostModel] fix cost calc bug for sadd/ssub with overflow
As noted in D90554, there's an opcode typo in using an easily
misused cost model API: getCmpSelInstrCost(). Beyond that, the
assumed sequence of ops is questionable, but that would be
another patch.

My guess is that the x86 test diffs show that we are probably
wrong both before and after this change, so there will be no
practical difference.
As an example, I tried this test which shows a cost of '7'
either way:

  define <4 x i32> @sadd(<4 x i32> %va, <4 x i32> %vb) {
    %V4I32  = call {<4 x i32>, <4 x i1>}  @llvm.sadd.with.overflow.v4i32(<4 x i32> %va, <4 x i32> %vb)
    %ov = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 1
    %r = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 0
    %z = select <4 x i1> %ov, <4 x i32> <i32 42, i32 42, i32 42, i32 42>, <4 x i32> %r
    ret <4 x i32> %z
  }

  $ llc -o - sadd.ll -mattr=avx
        vpaddd  %xmm1, %xmm0, %xmm2
        vpcmpgtd        %xmm2, %xmm0, %xmm0
        vpxor   %xmm0, %xmm1, %xmm0
        vblendvps       %xmm0, LCPI0_0(%rip), %xmm2, %xmm0

Differential Revision: https://reviews.llvm.org/D90681
2020-11-03 11:03:47 -05:00
David Green
41688b499e [CostModel] Make target intrinsics cheap by default
This patch changes the intrinsics cost model to assume that by default
target intrinsics are cheap. That previously didn't seem to be the case
for all intrinsics, and is potentially an MVE problem due to our
scalarization overheads. Cheap seems to be a good default in general,
though.

Differential Revision: https://reviews.llvm.org/D90597
2020-11-03 09:58:28 +00:00
Fangrui Song
c9829bfb08 [LazyCallGraph] Build SCCs of the reference graph in order
```
// The legacy PM CGPassManager discovers SCCs this way:
for function in the source order
  tarjanSCC(function)

// While the new PM CGSCCPassManager does:
for function in the reversed source order [1]
  discover a reference graph SCC
  build call graph SCCs inside the reference graph SCC
```

In the common case, where reference graph ~= call graph, the new PM order is
undesired because for `a | b | c` (3 independent functions), the new PM will
process them in the reversed order: c, b, a. If `a <-> b <-> c`, we can see
that `-print-after-all` will report the sole SCC as `scc: (c, b, a)`.

This patch corrects the iteration order. The discovered SCC order will match
the legacy PM in the common cases.

For some tests (`Transforms/Inline/cgscc-*.ll` and
`unittests/Analysis/CGSCCPassManagerTest.cpp`), the behaviors are dependent on
the SCC discovery order and there are too many check lines for the particular
order.  This patch simply reverses the function order to avoid changing too many
check lines.

Differential Revision: https://reviews.llvm.org/D90566
2020-11-02 13:22:42 -08:00
David Green
7290582ceb [ARM] Cost model test for target intrinsics. NFC 2020-11-02 17:46:48 +00:00
Sanjay Patel
389858bbc4 [x86] add AVX2 cost model entries for maxnum of 256-bit vectors
As noticed in D90554,
the AVX2 costs for 256-bit vectors did not include FMAXNUM entries,
so we fell back to AVX1 which assumes those ops will be split into
128-bit halves or something close to that.

Differential Revision: https://reviews.llvm.org/D90613
2020-11-02 12:20:17 -05:00
Florian Hahn
1db8566f5e Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408facc3a79ee4ff7e9983cc972f797e176.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
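A sketch of how a caller might pass the new argument (the exact signature below is an assumption based on this description):

```
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Query the cost of a vector select whose condition is known to come from
// an slt compare, instead of falling back to a pessimistic default
// predicate when no context instruction is available.
int vectorSelectCost(TargetTransformInfo &TTI, Type *VecTy, Type *CondTy) {
  return TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy,
                                CmpInst::ICMP_SLT,
                                TargetTransformInfo::TCK_RecipThroughput);
}
```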
2020-11-02 15:39:29 +00:00
Fangrui Song
0536175cbb [test] Fix some unused check prefixes in test/Analysis/CostModel/X86 2020-10-31 23:29:57 -07:00
David Green
e722c3abc4 [ARM] Fix crash for gather of pointer costs.
If the elt size is unknown because the element type is a pointer, a
comparison against 0 will cause an assert. Make sure the elt size is large
enough before comparing, and for the moment just return the scalar cost.
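In spirit, the guard looks something like this toy sketch (illustrative only, not the actual ARM TTI code):

```
#include <cstdint>

// If the element size is unknown (modeled here as 0, e.g. for a pointer
// element) or outside the legal gather range, return a conservative
// scalarized cost instead of asserting.
uint32_t gatherCost(uint32_t EltSizeInBits, uint32_t NumElts,
                    uint32_t ScalarEltCost) {
  if (EltSizeInBits < 8 || EltSizeInBits > 32)
    return NumElts * ScalarEltCost; // scalar fallback
  return 1; // illustrative: a single hardware gather
}
```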
2020-10-31 13:10:14 +00:00
Arthur Eubanks
bb84082e59 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662d8d72eaac48d3e9b31ca8dc90df5a4.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Florian Hahn
f44af1a603 Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df58dca9d1596440b866b52929e3878de.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Arthur Eubanks
f52f1e83f5 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Sanjay Patel
a82a184b76 [x86] add cost overrides for mul with overflow
I'm assuming the standard size integer instructions for this end up as something like:
mulq %rsi
seto %al

And the 'mul' generally has reciprocal throughput of 1 on typical implementations
(higher latency, but that's not handled here).
The default costs may end up much higher than that, and that's what we see in the test diffs.

Vector types are left as a 'TODO'.

Differential Revision: https://reviews.llvm.org/D90431
2020-10-30 12:38:16 -04:00
Florian Hahn
d8012eedb7 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
Roman Lebedev
f3176525e6 [SCEV] SCEVPtrToIntExpr simplifications
If we've got an SCEVPtrToIntExpr(op), where op is not an SCEVUnknown,
we want to sink the SCEVPtrToIntExpr into an operand,
so that the operation is performed on integers,
and eventually we end up with just an `SCEVPtrToIntExpr(SCEVUnknown)`.
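For illustration, the `ScalarEvolution::getPtrToIntExpr()` helper (see the commit below) can be used to request the integer view; the usage here is a sketch based on this description:

```
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// Ask SCEV for an integer view of a pointer expression. If Ptr is, say,
// %base + 8 * %i, the cast is sunk into the operands, yielding
// (ptrtoint %base) + 8 * %i, i.e. integer arithmetic over
// SCEVPtrToIntExpr(SCEVUnknown) rather than an opaque cast of the sum.
const SCEV *asIntegerSCEV(ScalarEvolution &SE, Value *Ptr, Type *IntTy) {
  return SE.getPtrToIntExpr(SE.getSCEV(Ptr), IntTy);
}
```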

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89692
2020-10-30 11:13:35 +03:00
Roman Lebedev
78e0d7e6ab [SCEV] Introduce SCEVPtrToIntExpr (PR46786)
And use it to model LLVM IR's `ptrtoint` cast.

This is essentially an alternative to D88806, but with no chance for
all the problems it caused due to having the cast as implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)

As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
  and their operands, instead of treating them as `SCEVUnknown`
* It should help with the initial problem of PR46786 - this should eventually allow us
  to not lose the pointer-ness of an expression in more cases

As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with an `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.

But I think that isn't the best solution, because it hardly matters from
the memory consumption side - there probably won't be *that* many
`SCEVPtrToIntExpr`s - and a separate expression allows for much better
discoverability.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89456
2020-10-30 11:13:35 +03:00
Dávid Bolvanský
153ebe5596 [InferAttrs] Add nocapture/writeonly to string/mem libcalls
One step closer to fixing PR47644.

Differential Revision: https://reviews.llvm.org/D89645
2020-10-29 20:06:43 +01:00
Sanjay Patel
b21114cf03 [x86] add test for umul intrinsic costs; NFC 2020-10-29 12:12:52 -04:00
Sanjay Patel
ac57df4098 [CostModel][x86] remove cost-kind predicate for intrinsic costs
We model cost as number of instructions / uops, so it does not
make sense to treat size/blended costs any differently than
throughput.
2020-10-28 14:33:37 -04:00
Sanjay Patel
4542ce5444 [CostModel] remove cost-kind predicate for funnel shift costs
Completing the series of FIXME removals for special-case intrinsics:
50dfa19cc799
f2c25c70791d
c963bde0152a
01ea93d85d6e

This one looks quite different than the others. The size/blended
cost is still potentially very far off from the throughput cost,
but this is hopefully not worse on the whole. It looks like the
underlying costs for the expanded shift/logic have their own
cost-kind limitations. Also, we are not asking the target if
it has a legal funnel shift op, so we just assume that the
intrinsic gets expanded.
2020-10-28 14:02:34 -04:00
Max Kazantsev
745df76f6b Re-enable "[SCEV] Prove implications of different type via truncation"
When we need to prove implication of expressions of different type width,
the default strategy is to widen everything to wider type and prove in this
type. This does not interact well with AddRecs with negative steps and
unsigned predicates: such AddRec will likely not have a `nuw` flag, and its
`zext` to wider type will not be an AddRec. In contrast, a `trunc` of an AddRec
in some cases can easily be proved to be an `AddRec` too.

This patch introduces an alternative way of handling implications of
different type widths. If we can prove that the wider-type values actually
fit in the narrow type,
we truncate them and prove the implication in narrow type.

The earlier revert was due to the revert of an underlying patch that this
one depends on.

The unit test is temporarily disabled because the required logic in SCEV
is switched off for compile-time reasons.

Differential Revision: https://reviews.llvm.org/D89548
2020-10-28 16:02:14 +07:00
David Green
61a282c3b7 [AArch64] Remove AArch64ISD::NOT, use vnot instead
vnot (xor -1) should be equivalent to the AArch64-specific AArch64ISD::NOT
node, but allows more folding thanks to all the target-independent
optimizations. Specifically, this allows select(icmp ne, x, y) to
become "cmeq; bsl y, x" as opposed to needing to convert the predicate
with "cmeq; mvn; bsl x, y".

Unfortunately there is a regression in a cmtst test, but the code it
selected from was already non-canonical, with instcombine preferring to
use an eq predicate instead. Plus the more common case of icmp ne is
improved.

Differential Revision: https://reviews.llvm.org/D90126
2020-10-28 08:15:37 +00:00
Max Kazantsev
8c1c53f697 [SCEV] Re-enable "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 3
We can sharpen the range of an AddRec if we know that it does not
self-wrap and we know the symbolic iteration count of the loop. If we can
evaluate the value of the AddRec on the last iteration and prove that at
least one of its intermediate values lies between start and end, then the
no-wrap flag allows us to conclude that all of them also lie between start
and end, so the range estimate can be improved to the union of the ranges
of start and end. (For example, a no-wrap {10,+,-1} that provably ends at
6 must take all of its values between 6 and 10.)

It is switched off by default and can be turned on by a flag.

Differential Revision: https://reviews.llvm.org/D89381
Reviewed By: lebedev.ri, nikic
2020-10-28 12:39:41 +07:00
Sanjay Patel
b47683ecbd [CostModel] remove cost-kind predicate for FP add/mul vector reduction costs
This was originally part of:
f2c25c70791d
but that was reverted because there was an underlying bug in
processing the vector type of these intrinsics. That was
fixed with:
74ffc823ed21

This is similar in spirit to 01ea93d85d6e (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.

That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.

Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.

Targets should provide better overrides if the current modeling
is not accurate.
2020-10-27 18:00:20 -04:00
Sanjay Patel
ab0890e928 [CostModel] add tests for FP reductions; NFC 2020-10-27 18:00:20 -04:00
Nico Weber
1ea6033a22 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c62c185632e3a75bf45b313eadab774b.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Shimin Cui
29a0e7a508 [ValueTracking] Add tracking of the alignment assume bundle
This patch adds support for value tracking of the alignment assume bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88669
2020-10-27 12:16:45 +00:00
Arthur Eubanks
3273c1a681 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Bing1 Yu
450c5e89bb [CostModel][X86] teach TTI to calculate the cost of a chain of vector inserts/extracts more precisely and correctly: In each 128-lane, if at least one index is demanded and not all indices are demanded...
For each 128-bit lane: if at least one index is demanded but not all
indices are demanded, and this lane is not the first 128-bit lane of the
legalized vector, then this lane needs an extracti128. Whenever at least
one index in a lane is demanded, that lane needs an inserti128.

The following cases will help you build a better understanding:
Assume we insert several elements into a v8i32 vector with AVX2:
Case #1: inserting into index 1 needs vpinsrd + inserti128.
Case #2: inserting into index 5 needs extracti128 + vpinsrd + inserti128.
Case #3: inserting into indices 4,5,6,7 needs 4*vpinsrd + inserti128.

Reviewed By: pengfei, RKSimon

Differential Revision: https://reviews.llvm.org/D89767
2020-10-27 11:21:13 +08:00
Joe Ellis
50136e3679 [SVE] Fix TypeSize warning in llvm::getGEPInductionOperand
We do not need the implicit cast here; we can rely on a comparison
between two TypeSize objects instead. This algorithm will work fine with
scalable vectors.
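A minimal sketch of that pattern (assumed shape, not the patch itself):

```
#include "llvm/Support/TypeSize.h"
using namespace llvm;

// Comparing two TypeSize objects directly stays valid for scalable
// vectors; the deprecated implicit conversion to uint64_t does not.
bool sameSize(TypeSize A, TypeSize B) {
  return A == B; // scalable-aware equality, no implicit cast
}
```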

Reviewed By: DavidTruby

Differential Revision: https://reviews.llvm.org/D90146
2020-10-26 17:40:32 +00:00
Joe Ellis
a376939c97 [SVE][AArch64] Fix TypeSize warning in GEP cost analysis
The warning would fire when calling getGEPCost for analyzing the cost of
a GEP instruction. This would result in the use of the now deprecated
implicit cast of TypeSize to uint64_t through the overloaded operator.

This patch fixes the issue by using getKnownMinSize instead of the
implicit cast. This is possible because the code is already
scalable-vector aware. The semantic behaviour of the code is unchanged
by this patch.
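The replacement pattern, sketched (illustrative):

```
#include <cstdint>
#include "llvm/Support/TypeSize.h"
using namespace llvm;

// Take the known minimum size explicitly rather than going through the
// deprecated implicit TypeSize -> uint64_t conversion. For scalable
// vectors this is the lower bound, i.e. the size at vscale = 1.
uint64_t knownMinBits(TypeSize Sz) {
  return Sz.getKnownMinSize();
}
```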

Reviewed By: sdesmalen, fpetrogalli

Differential Revision: https://reviews.llvm.org/D89872
2020-10-26 17:40:19 +00:00
Tyker
1d03399e60 [Annotation] Allows annotation to carry some additional constant arguments.
This allows using annotations in many more contexts than before,
especially when annotating templates or constexpr code.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D88645
2020-10-26 10:50:05 +01:00
Sanjay Patel
fe343140eb [CostModel] remove cost-kind predicate for some vector reduction costs
This is a modified 2nd try of 22d10b8ab44f
(reverted by 1c8371692d because it managed
to expose an existing crashing bug that should be fixed by
74ffc823).

Original commit message:

This is similar in spirit to 01ea93d85d6e (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.

That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
The ARM costs show a small difference between throughput and
size because there's an underlying difference in cmp/sel
costs that is also predicated on cost-kind.

Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.

Targets should provide better overrides if the current modeling
is not accurate.
2020-10-25 15:17:52 -04:00
Sanjay Patel
74732bce4a [CostModel] fix operand/type accounting for fadd/fmul reductions
I'm not sure if/how this ever worked, but it must not be tested
currently because the basic tests added here were crashing as
noted in the post-review comments for 1c83716 (which reverted
another cost-model fix in 22d10b8ab44f).
2020-10-25 15:01:19 -04:00
Nikita Popov
049b2e39fb [SCEV] Strengthen nowrap flags after constant folding for mul exprs
Same change as 0dda6333175c1749f12be660456ecedade3bcf21, but for
mul expressions. We want to first fold any constant operans and
then strengthen the nowrap flags, as we can compute more precise
flags at that point.
2020-10-25 19:43:58 +01:00
Nikita Popov
1b8fa86304 [SCEV] Always constant fold mul expression operands
Establish parity with the handling of add expressions, by always
constant folding mul expression operands before checking the depth
limit (this is a non-recursive simplification). The code was already
unconditionally constant folding the case where all operands were
constants, but was not folding multiple constant operands together
if there were also non-constant operands.

This requires picking out a different demonstration for depth-based
folding differences in the limit-depth.ll test.
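An illustrative construction (assumed ScalarEvolution API; the constants are made up):

```
#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;

// With eager constant folding of mul operands, nested constants still
// collapse: the expression below becomes (6 * X) even when the depth
// limit would otherwise stop recursive simplification.
const SCEV *sixTimesX(ScalarEvolution &SE, const SCEV *X) {
  Type *Ty = X->getType();
  return SE.getMulExpr(SE.getConstant(Ty, 2),
                       SE.getMulExpr(SE.getConstant(Ty, 3), X));
}
```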
2020-10-25 18:50:06 +01:00
Nikita Popov
fa8894df7b [SCEV] Strengthen nowrap flags after constant folding
We should first try to constant fold the add expression and only
strengthen nowrap flags afterwards. This allows us to determine
stronger flags if e.g. only two operands are left after constant
folding (and thus "guaranteed no wrap region" code applies) or the
resulting operands are non-negative and thus nsw->nuw strengthening
applies.
2020-10-25 18:00:22 +01:00
Martin Storsjö
e7cc77c1de Revert "[CostModel] remove cost-kind predicate for vector reduction costs"
This reverts commit 22d10b8ab44f703b72b8316a9b3b8adc623ca73f.

This broke compilation e.g. like this:
$ cat synth.c
*a;
float *b;
c() {
  for (;;) {
    float d = -*b * *a++;
    d -= *--b * *a++;
    d -= *--b * *a;
    d -= *--b * *a;
    e(d);
  }
}
$ clang -target x86_64-linux-gnu -c -O2 -ffast-math synth.c
clang: ../include/llvm/Support/Casting.h:104: static bool llvm::isa_impl_cl<To, const From*>::doit(const From*) [with To = llvm::PointerType; From = llvm::Type]: Assertion `Val && "isa<> used on a null pointer"' failed.
2020-10-25 08:47:54 +02:00
Sanjay Patel
ed4fa653c2 [CostModel] remove cost-kind predicate for vector reduction costs
This is similar in spirit to 01ea93d85d6e (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.

That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
The ARM costs show a small difference between throughput and
size because there's an underlying difference in cmp/sel
costs that is also predicated on cost-kind.

Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.

Targets should provide better overrides if the current modeling
is not accurate.
2020-10-24 13:20:17 -04:00
dfukalov
efaecfc60e [AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions.
1. Throughput and codesize cost estimations were separated and updated.
2. Updated fdiv cost estimation for different cases.
3. Added scalarization processing for types that are treated as !isSimple() to
improve codesize estimation in getArithmeticInstrCost(). The code was borrowed
from the TCK_RecipThroughput path of the base implementation.

The next step is to unify the scalarization part in the base class, which
currently works for the TCK_RecipThroughput path only.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89973
2020-10-24 19:53:08 +03:00
Florian Hahn
9f772c5b7d [AArch64] Add vector compare/select cost-model tests. 2020-10-23 20:43:04 +01:00
Florian Hahn
785022033e [AArch64] Implement getIntrinsicInstrCost, handle min/max intrinsics.
This patch adds a specialized implementation of getIntrinsicInstrCost
and adds initial cost modeling for min/max vector intrinsics.

AArch64 NEON supports umin/smin/umax/smax for vectors
<8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>.
Notably, it does not support vectors with i64 elements.

This change by itself should have very little impact on codegen, but in
follow-up patches I plan to teach the vectorizers to consider using
those intrinsics on platforms where it is profitable, e.g. because there
is no general 'select'-like instruction.

The current cost returned should be better for throughput, latency and size.
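A sketch of the kind of query this affects (assumed cost-model API):

```
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Cost of an llvm.smin intrinsic on a NEON-supported type such as <4 x i32>.
int sminCost(TargetTransformInfo &TTI, VectorType *VecTy) {
  IntrinsicCostAttributes Attrs(Intrinsic::smin, VecTy, {VecTy, VecTy});
  return TTI.getIntrinsicInstrCost(Attrs,
                                   TargetTransformInfo::TCK_RecipThroughput);
}
```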

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D89953
2020-10-23 11:32:42 +01:00
Nikita Popov
a827007b92 [BasicAA] Only add visited phi blocks temporarily
Visited phi blocks only need to be added for the duration of the
recursive alias queries, they should not leak into following code.

Once again, while this also improves analysis precision, this is
mainly intended to clarify the applicability scope of VisitedPhiBBs.
2020-10-22 22:26:29 +02:00
Nikita Popov
077958bf5e [BasicAA] Don't track visited blocks for phi-phi alias query
We only need the VisitedPhiBBs to disambiguate comparisons of
values from two different loop iterations. If we're comparing
two phis from the same basic block in lock-step, the compared
values will always be on the same iteration.

While this also increases precision, this is mainly intended
to clarify the scope of VisitedPhiBBs.
2020-10-22 22:12:21 +02:00
Nikita Popov
7c90dbc2ed [BasicAA] Add additional phi tests (NFC) 2020-10-22 21:53:19 +02:00
Florian Hahn
0d2c92bfc9 [AArch64] Add min/max cost-model tests for v2i32. 2020-10-22 16:04:13 +01:00
Florian Hahn
2db9fbe33c [AArch64] Add min/max cost-model tests for v4i16. 2020-10-22 15:47:50 +01:00
Florian Hahn
c258c898fb [AArch64] Add cost model tests for min/max intrinsics. 2020-10-22 13:28:04 +01:00
Arthur Eubanks
f762198a61 [test] Fix tests using -analyze that fail under NPM
Many of these tests don't use the output of -analyze.
2020-10-21 21:54:30 -07:00
Arthur Eubanks
55eace9cda [test] Fix quadradic-exit-value.ll under NPM 2020-10-21 13:33:01 -07:00