1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

2836 Commits

Author SHA1 Message Date
Alina Sbirlea
5bc83521a0 [MemorySSA] Add test for PR49859. 2021-04-13 10:56:09 -07:00
Sander de Smalen
7bd4ae29c1 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
dfukalov
5a24e56a24 [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".

Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96805
2021-04-10 09:20:24 +03:00
Philip Reames
d69254e263 [funcattrs] Infer nosync from instruction walk
Pretty straightforward use of existing infrastructure and port of the attributor inference rules for nosync.

A couple points of interest:
* I deliberately switched from "monotonic or better" to "unordered or better". This is simply me being conservative and is better in line with the rest of the optimizer. We treat monotonic conservatively pretty much everywhere.
* The operand bundle test change is suspicious. It looks like we might have missed something here, but if so, it's an issue with the existing nofree inference as well. I'm going to take a closer look at that separately.
* I needed to keep the previous inference from readnone. This surprised me, but made sense once I realized readonly inference goes to lengths to reason about local vs non-local memory and that writes to local memory are okay. This is fine for the purpose of nosync, but would e.g. prevent us from inferring nofree from readnone - which is slightly surprising.

Differential Revision: https://reviews.llvm.org/D99769
2021-04-08 14:05:00 -07:00
Stanislav Mekhanoshin
3071fb3566 Set IgnoreLLVMUsed to false in CallGraph::addToCallGraph()
clang++ uses llvm.compiler.used in certain cases to preserve
symbol which is fully inlined. D96087 has resulted in undefined
symbols in such cases. Set it to false by default to preserve
old behavior but keep the option for specific uses where we
want to ignore these (e.g. to detect a potential indirect call
to a function).

Differential Revision: https://reviews.llvm.org/D99897
2021-04-08 11:14:09 -07:00
Roman Lebedev
b825875812 [NFC][X86][CostModel] Add some load/store tests w/ non-power-of-two elt cnt vectors
For example the cost to load <48 x i16> should likely be 3,
because that's just 3x load i256.
2021-04-08 15:00:28 +03:00
Florian Hahn
4b85eda637 [BasicAA] Add another GEP modulo test with shl with odd op. 2021-04-07 22:31:51 +01:00
Florian Hahn
50a9fd3de9 [BasicAA] Extend test coverage for GEP modulo logic.
Add a few additional test cases which combine multiplies with
powers-of-2, different wrapping flags.
2021-04-07 20:13:35 +01:00
Sander de Smalen
4bfea803ed [SVE] Remove checks for warnings in scalable-vector tests.
After D98856 these tests will by default break (fatal_error) if any of
the wrong interfaces are used, so there's no longer a need to have a
RUN line that checks for a warning message emitted by the compiler.
2021-04-07 15:59:32 +01:00
Max Kazantsev
c4958f4b06 [SCEV] Fix false-positive recognition of simple recurrences. PR49856
A value from reachable block may come to a Phi node as its input from
unreachable block. This may confuse matchSimpleRecurrence  which
has no access to DomTree and can falsely recognize something as a recurrency
because of this effect, as the attached test shows.

Patch `ae7b1e` deals with half of this problem, but it only accounts from
the case when an unreachable instruction comes to Phi as an input.

This patch provides a generalization by checking that no Phi block's
predecessor is unreachable (no matter what the input is).

Differential Revision: https://reviews.llvm.org/D99929
Reviewed By: reames
2021-04-07 13:55:17 +07:00
Jan Svoboda
298bb15218 Revert "[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction"
This reverts commit 167ea67d

This causes a bunch of build failures:
* http://lab.llvm.org:8011/#/builders/121/builds/6287
* http://green.lab.llvm.org/green/job/clang-stage1-RA/19915
2021-04-06 16:33:28 +02:00
Simon Pilgrim
d50aefd3c1 [CostModel][X86] Improve accuracy of vXi8 multiply reduction costs
After rG47321c311bdbe0145b9bf45d822185c37b19fa50 we promote vXi8 reductions to vXi16 to create a much faster PMULLW mul reduction, followed by a (free) truncation. This avoids the high cost of repeated vXi8 multiplications (which extend+multiply+truncate to/from vXi16 types....).

Fixes the missing vXi8 mul reduction vectorization in PR42674 (Comment #20) 'mul16' test case.
2021-04-06 11:53:22 +01:00
madhur13490
377aa4b4a7 [IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction
This patch enhances hasAddressTaken() to ignore bitcasts as a
callee in callbase instruction. Such bitcast usage doesn't really take
the address in a useful meaningful way.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D98884
2021-04-06 09:23:46 +00:00
Philip Reames
103039b1aa Exact ashr/lshr don't loose any set bits and are thus trivially invertible
Use that fact to improve isKnownNonEqual.
2021-04-05 19:22:36 -07:00
Philip Reames
5c362c1cd5 Fix copy paste errors in tests from be11bd1e
Several of these weren't testing what was intented.
2021-04-05 12:38:36 -07:00
Philip Reames
126765f727 [tests] Precommmit tests for reasoning about equality of recurrences 2021-04-05 12:14:05 -07:00
Philip Reames
db585a3ff9 [funcattrs] Infer nosync from readnone and non-convergent
This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in attributor and the discussion on the review of the change which introduced the nosync attribute (0626367202c).

This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive.

Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work.

Differential Revision: https://reviews.llvm.org/D99749
2021-04-01 11:37:34 -07:00
Philip Reames
1695a86b7f Mark unordered memset/memmove/memcpy as nosync
Mostly a means to remove a bit of code from attributor in advance of implementing a FuncAttr inference for nosync.
2021-04-01 10:38:54 -07:00
Philip Reames
5a0375405f [ValueTracking] Handle non-zero ashr/lshr recurrences
If we know we don't shift out bits (e.g. exact), all we need to know is that input is non-zero.
2021-03-31 16:48:32 -07:00
Philip Reames
e571959eb5 [tests] Add tests for ashr/lshr recurrences in isKnownNonZero 2021-03-31 16:48:32 -07:00
Philip Reames
b4fffe7b16 [tests] Exercise cases where SCEV can use trip counts to refine ashr/lshr recurrences 2021-03-31 12:48:50 -07:00
Philip Reames
4da7b4cf4f [SCEV] Handle unreachable binop when matching shift recurrence
This fixes an issue introduced with my change d4648e, and reported in pr49768.

The root problem is that dominance collapses in unreachable code, and that LoopInfo explicitly only models reachable code.  Since the recurrence matcher doesn't filter by reachability (and can't easily because not all consumers have domtree), we need to bailout before assuming that finding a recurrence implies we found a loop.
2021-03-31 10:33:34 -07:00
Sander de Smalen
772162c240 [SVE] Fix LoopVectorizer test scalalable-call.ll
This marks FSIN and other operations to EXPAND for scalable
vectors, so that they are not assumed to be legal by the cost-model.

Depends on D97470

Reviewed By: dmgreen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97471
2021-03-31 14:52:49 +01:00
Sander de Smalen
d0edfcf635 [CostModel] Align the cost model for intrinsics for scalable/fixed-width vectors.
Let getIntrinsicInstrCost call getTypeBasedIntrinsicInstrCost for scalable vectors,
similar to how this is done for fixed-width vectors, instead of falling back
on BaseT::getIntrinsicInstrCost().

If the intrinsic cannot be costed (or is not overloaded by the target),
it will return InstructionCost::getInvalid() instead.

Depends on D97469

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D97470
2021-03-31 14:52:49 +01:00
Sander de Smalen
76e5989934 [InstructionCost] Don't conflate Invalid costs with Unknown costs.
We previously made a change to getUserCost to return a Invalid cost
when one of the TTI costs returned '-1' (meaning 'unknown' or
'infinitely expensive'). It makes no sense to say that:

  shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

has an invalid cost. Perhaps the cost is not known, but the IR is valid
and can be code-generated. Invalid should only be used for IR that
cannot possibly be code-generated and where a cost is nonsensical.

With more passes now asserting that the cost must be valid, it is possible
that those assertions will fail for perfectly valid IR. An incomplete
cost-model probably shouldn't be a reason for the compiler to break.

It's better to consider these costs as 'very expensive' and ignore them
for other reasons. At some point, we should consider replacing -1 with
some other mechanism.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D99502
2021-03-30 09:29:42 +01:00
Nashe Mncube
762a55526d [SVE][Analysis]Instruction costs for ops on scalable-vec
The following operations have no associated cost for them
when applied to scalable vectors, and as a consequence
can trigger a crash when a call is made to
AArch64TTIImpl::getCastInstrCost():
- fptrunc
- trunc
- fpext
- fpto(u,s)i

This patch adds costs for these operations and
relevant regression tests.

Differential Revision: https://reviews.llvm.org/D98934
2021-03-29 11:15:50 +01:00
Nikita Popov
ab6e561d30 [BasicAA] Make sure types match in constant offset heuristic
This can only happen if offset types that are larger than the
pointer size are involved. The previous implementation did not
assert in this case because it initialized the APInts to the
width of one of the variables -- though I strongly suspect it
did not compute correct results in this case.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32621
reported by fhahn.
2021-03-28 21:38:09 +02:00
Nikita Popov
f2b0645b2f [BasicAA] Refactor linear expression decomposition
The current linear expression decomposition handles zext/sext by
decomposing the casted operand, and then checking NUW/NSW flags
to determine whether the extension can be distributed. This has
some disadvantages:

First, it is not possible to perform a partial decomposition. If
we have zext((x + C1) +<nuw> C2) then we will fail to decompose
the expression entirely, even though it would be safe and
profitable to decompose it to zext(x + C1) +<nuw> zext(C2)

Second, we may end up performing unnecessary decompositions,
which will later be discarded because they lack nowrap flags
necessary for extensions.

Third, correctness of the code is not entirely obvious: At a high
level, we encounter zext(x -<nuw> C) in the form of a zext on the
linear expression x + (-C) with nuw flag set. Notably, this case
must be treated as zext(x) + -zext(C) rather than zext(x) + zext(-C).
The code handles this correctly by speculatively zexting constants
to the final bitwidth, and performing additional fixup if the
actual extension turns out to be an sext. This was not immediately
obvious to me.

This patch inverts the approach: An ExtendedValue represents a
zext(sext(V)), and linear expression decomposition will try to
decompose V further, either by absorbing another sext/zext into the
ExtendedValue, or by distributing zext(sext(x op C)) over a binary
operator with appropriate nsw/nuw flags. At each step we can
determine whether distribution is legal and abort with a partial
decomposition if not. We also know which extensions we need to
apply to constants, and don't need to speculate or fixup.
2021-03-27 23:31:58 +01:00
Nikita Popov
4cf39be5a9 [BasicAA] Correct handle implicit sext in decomposition
While explicit sext instructions were handled correctly, the
implicit sext that occurs if the offset is smaller than the
pointer size blindly assumed that sext(X * Scale + Offset) is the
same as sext(X) * Scale + Offset, which is obviously not correct.

Fix this by extracting the code that handles linear expression
extension and reusing it for the implicit sext as well.
2021-03-27 15:15:47 +01:00
Nikita Popov
13fbfb64c9 [BasicAA] Clarify entry values of GetLinearExpression() (NFC)
A number of variables need to be correctly initialized on entry
to GetLinearExpression() for the implementation to behave reasonably.

The fact that SExtBits can currenlty be non-zero on entry is a bug,
as demonstrated by the added test: For implicit sexts by the GEP,
we do currently skip legality checks.
2021-03-27 14:50:09 +01:00
Nikita Popov
f99cc32fd4 [BasicAA] Retain shl nowrap flags in GetLinearExpression()
Nowrap flags between mul and shl differ in that mul nsw allows
multiplication of 1 * INT_MIN, while shl nsw does not. This means
that it is always fine to transfer shl nowrap flags to muls, but
not necessarily the other way around. In this case the NUW/NSW
results refer to mul/add operations, so it's fine to retain the
flags from the shl.
2021-03-27 12:26:22 +01:00
Nikita Popov
86d338c7f7 [ValueTracking] Handle shl pair in isKnownNonEqual()
Handle (x << s) != (y << s) where x != y and the shifts are
non-wrapping. Once again, this establishes parity with the
corresponing mul fold that already exists. The shift case is
more powerful because we don't need to guard against multiplies
by zero.
2021-03-26 20:21:05 +01:00
Nikita Popov
81d01bceaa [ValueTracking] Handle shl in isKnownNonEqual()
This handles the pattern X != X << C for non-zero X and C and a
non-overflowing shift. This establishes parity with the corresponing
fold for multiplies.
2021-03-26 20:21:05 +01:00
Nikita Popov
81173d98df [ValueTracking] Add tests for non equal shifts (NFC) 2021-03-26 20:21:05 +01:00
Nikita Popov
56b9e804f2 [ValueTracking] Handle non-zero shl recurrence
In this case we don't care about the step at all, and only require
that the starting value is non-zero.
2021-03-26 18:39:06 +01:00
Nikita Popov
07f0265381 [ValueTracking] Add tests for non-zero shl recurrences (NFC) 2021-03-26 18:35:38 +01:00
Nikita Popov
0926f8f2a9 [ValueTracking] Handle non-zero add/mul recurrences more precisely
This is mainly for clarity: It doesn't make sense to do any
negative/positive checks when dealing with a nuw add/mul. These
only make sense to nsw add/mul.
2021-03-26 18:30:07 +01:00
Nikita Popov
eb6c470c5f [ValueTracking] Add more non-zero add/mul recurrence tests (NFC) 2021-03-26 18:30:07 +01:00
Florian Hahn
0e377c5502 [BasicAA] Add a few more interesting modulo tests. 2021-03-26 16:56:49 +00:00
Florian Hahn
00978dd33d [BasicAA] Add a few cases with overflows in index computations.
This patch adds a few test cases where currently NoAlias is returned,
but the pointers can alias if the multiply overflows while computing
a GEP index value.
2021-03-26 14:50:03 +00:00
Jingu Kang
a1f6b21702 [ValueTracking] Handle two PHIs in isKnownNonEqual()
loop:
  %cmp.0 = phi i32 [ 3, %entry ], [ %inc, %loop ]
  %pos.0 = phi i32 [ 1, %entry ], [ %cmp.0, %loop ]
  ...
  %inc = add i32 %cmp.0, 1
  br label %loop

On above example, %pos.0 uses previous iteration's %cmp.0 with backedge
according to PHI's instruction's defintion. If the %inc is not same among
iterations, we can say the two PHIs are not same.

Differential Revision: https://reviews.llvm.org/D98422
2021-03-25 22:56:05 +00:00
Philip Reames
c967c78fcf [deref] Handle byval/byref/sret/inalloc/preallocated arguments for deref-at-point semantics
All of these are scoped allocations which remain dereferenceable during the lifetime of the callee.

Differential Revision: https://reviews.llvm.org/D99310
2021-03-25 14:47:31 -07:00
Craig Topper
deeab5c08f [RISCV] Reorder checks in RISCVTTIImpl::getGatherScatterOpCost to avoid calling getMinRVVVectorSizeInBits() when V extension is not enabled.
getMinRVVVectorSizeInBits() asserts if the V extension isn't
enabled. So check that gather/scatter is legal first since it
already contains a check for V extension being enabled. It
also already checks getMinRVVVectorSizeInBits for fixed length
vectors so we don't need a check in getGatherScatterOpCost.
2021-03-25 14:20:47 -07:00
Philip Reames
006e14ce80 [deref] Implement initial set of inference rules for deref-at-point
This implements a subset of the initial set of inference rules proposed in the llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree". The nolias one got moved to a separate review as there was some concerns raised which require further discussion.

Differential Revision: https://reviews.llvm.org/D99135
2021-03-24 16:20:41 -07:00
Nikita Popov
a718ad096c [SCEV] Improve handling of not expressions in isImpliedCond()
SCEV currently tries to prove implications of x pred y by also
trying to imply ~y pred ~x. This is expensive in terms of
compile-time (in fact, the majority of isImpliedCond compile-time
is spent here) and generally not fruitful. The issue is that this
also swaps the operands and thus breaks canonical ordering. If
originally we were trying to prove an implication like
X > C1 -> Y > C2, then we'll now try to prove X > C1 -> C3 > ~Y,
which will not work.

The only real case where we can get some use out of this transform
is if the original conditions were in the form X > C1 -> Y < C2, were
then swapped to X > C1 -> C2 > Y and are then swapped again here to
X > C1 -> ~Y > C3.

As such, handle this at a higher level, where we are doing the
swapping in the first place. There's four different ways that we
can line up a predicate and a swapped predicate, so we use some
heuristics to pick some profitable way.

Because we now try this transform at a higher level
(isImpliedCondOperands rather than isImpliedCondOperandsHelper),
we can also prove additional facts. Of the added tests, one was
proven previously while the other wasn't.

Differential Revision: https://reviews.llvm.org/D90926
2021-03-24 21:53:02 +01:00
Craig Topper
dbf0fc3a04 [RISCV] Add basic cost modelling for fixed vector gather/scatter.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99142
2021-03-24 11:14:14 -07:00
Jingu Kang
af9f082c95 [ValueTracking] Handle increasing mul recurrence in isKnownNonZero()
Differential Revision: https://reviews.llvm.org/D99069
2021-03-23 23:04:41 +00:00
Nikita Popov
57bd8da517 [BasicAA] Handle assumes with operand bundles
This fixes a regression reported on D99022: If a call has operand
bundles, then the inaccessiblememonly attribute on the function
will be ignored, as operand bundles can affect modref behavior in
the general case. However, for assume operand bundles in particular
this is not the case.

Adjust getModRefBehavior() to always report inaccessiblememonly
for assumes, regardless of presence of operand bundles.
2021-03-23 21:21:19 +01:00
Nikita Popov
7f52840c4e [BasicAA] Add test for assume with operand bundles (NFC) 2021-03-23 21:21:19 +01:00
David Sherwood
42a72164a2 [IR][SVE] Add new llvm.experimental.stepvector intrinsic
This patch adds a new llvm.experimental.stepvector intrinsic,
which takes no arguments and returns a linear integer sequence of
values of the form <0, 1, ...>. It is primarily intended for
scalable vectors, although it will work for fixed width vectors
too. It is intended that later patches will make use of this
new intrinsic when vectorising induction variables, currently only
supported for fixed width. I've added a new CreateStepVector
method to the IRBuilder, which will generate a call to this
intrinsic for scalable vectors and fall back on creating a
ConstantVector for fixed width.

For scalable vectors this intrinsic is lowered to a new ISD node
called STEP_VECTOR, which takes a single constant integer argument
as the step. During lowering this argument is set to a value of 1.
The reason for this additional argument at the codegen level is
because in future patches we will introduce various generic DAG
combines such as

  mul step_vector(1), 2 -> step_vector(2)
  add step_vector(1), step_vector(1) -> step_vector(2)
  shl step_vector(1), 1 -> step_vector(2)
  etc.

that encourage a canonical format for all targets. This hopefully
means all other targets supporting scalable vectors can benefit
from this too.

I've added cost model tests for both fixed width and scalable
vectors:

  llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll
  llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll

as well as codegen lowering tests for fixed width and scalable
vectors:

  llvm/test/CodeGen/AArch64/neon-stepvector.ll
  llvm/test/CodeGen/AArch64/sve-stepvector.ll

See this thread for discussion of the intrinsic:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html
2021-03-23 10:43:35 +00:00