mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

2705 Commits

Author SHA1 Message Date
Nicolai Hähnle
5d6415281f [IR] Memory intrinsics are not unconditionally nosync
Remove the `nosync` attribute from the memory intrinsic definitions
(i.e. memset, memcpy, memmove).

Like native memory accesses, memory intrinsics can be volatile. This is
indicated by an immarg in the intrinsic call. All else equal, a volatile
memory intrinsic is `sync`, so we cannot annotate the intrinsic functions
themselves as `nosync`. The attributor and function-attr passes know to
take the volatile bit into account.

Since `nosync` is a default attribute, this means we have to stop using
the DefaultAttrIntrinsic tablegen class for memory intrinsics, and
specify all default attributes other than `nosync` explicitly.

Most of the test changes are trivial churn, but one test case
(in nosync.ll) was in fact incorrect before this change.

Differential Revision: https://reviews.llvm.org/D102295
2021-05-21 03:40:59 +02:00
Simon Pilgrim
145caddc0a [CostModel][X86][AVX2] Improve 256-bit vector non-uniform shifts costs
Haswell, Excavator and early Ryzen all have slower 256-bit non-uniform vector shifts (confirmed on AMDSoG/Agner/instlatx64 and llvm models) - so bump the worst case costs accordingly.

Noticed while investigating PR50364
2021-05-20 12:16:16 +01:00
Caroline Concatto
85e935efe6 [CostModel][AArch64] Add missing costs for getShuffleCost with scalable vectors
Differential Revision: https://reviews.llvm.org/D102490
2021-05-20 09:08:31 +01:00
Simon Pilgrim
ef6c8d6574 [X86][AVX] Cleanup AVX2 vector integer truncation costs
Noticed while investigating PR50364, the truncation costs for v4i64->v4i16/v4i8 and v8i32->v8i8 were way too optimistic for a shuffle sequence that usually matches the AVX1 codegen (they matched AVX512 numbers which have actual truncation instructions!).
2021-05-18 13:07:29 +01:00
Simon Pilgrim
411894471f [CostModel][X86] Add scalar truncation cost checks
Ensure these are all zero
2021-05-18 12:24:59 +01:00
Simon Pilgrim
67a57a550e [CostModel][X86] Add missing check prefixes from cast.ll
We have checks for these but no actual RUNs were using them
2021-05-18 12:20:19 +01:00
Fraser Cormack
6af6b413bf [RISCV][NFC] Correct alignment in scatter/gather tests
This lays the groundwork for changes to alignment in D102493 to be more
apparent.
2021-05-17 15:12:55 +01:00
Roman Lebedev
ecdf27a217 [NFC][X86][Costmodel] Add tests for load/store with i1 element type
2021-05-16 14:29:37 +03:00
Roman Lebedev
7a524b3d94 Revert "[X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again"
As reported in post-commit feedback, this has issues with e.g. <16 x i1>:
https://llvm.godbolt.org/z/jxPvdGEW4

This reverts commit c02476f3158f2908ef0a6f628210b5380bd33695.
2021-05-14 00:03:36 +03:00
Roman Lebedev
014eba5801 Revert "[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()"
Depends on a commit that is about to be reverted.

This reverts commit 69ed93a4355123a45c1d7216aea7cd53d07a361b.
2021-05-14 00:03:36 +03:00
Florian Hahn
8bc2e95e00 [SCEV] Apply guards to max with non-unitary steps.
We already apply loop-guards when computing the maximum with unitary
steps. This extends the code to also do so when dealing with non-unitary
steps.

This allows us to infer a tighter maximum in some cases.
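The effect of a guard on the maximum for non-unit steps can be seen with a small model. The sketch below is a hypothetical illustration, not SCEV code. It assumes a loop of the shape `for (i = 0; i < n; i += step)` and an upper bound `n_upper` on `n` taken from a guard. It also ignores wrap-around of `i`, which SCEV itself has to reason about.

```python
def max_backedge_taken(step, n_upper):
    """Largest backedge-taken count of `for (i = 0; i < n; i += step)`
    over all unsigned n <= n_upper (hypothetical model; ignores wrap of i).

    For n >= 1 the body runs ceil(n / step) times, so the backedge is taken
    ceil(n / step) - 1 == (n - 1) // step times. This grows with n, so the
    maximum is reached at n == n_upper.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if n_upper == 0:
        return 0  # the body never runs
    return (n_upper - 1) // step


# With a guard n <= 16 and step 2, i takes the values 0, 2, ..., 14.
print(max_backedge_taken(2, 16))  # 7
```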

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D102267
2021-05-13 09:47:29 +01:00
Florian Hahn
48fc37748a [SCEV] Add loop-guard pessimizing test with step = 2.
2021-05-12 19:30:11 +01:00
Roman Lebedev
a7f61f4671 [X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()
Now that getMemoryOpCost() correctly handles all the vector variants,
we should no longer hand-roll our own version of it, but use it directly.

The AVX512 variant probably needs a similar change,
but there it is less obvious.
2021-05-11 16:28:00 +03:00
Roman Lebedev
f3383e948b [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again
Instead of handling power-of-two sized vector chunks,
try handling the large vector in a stream mode,
decreasing the operational vector size
once it no longer works for the elements left to process.

Notably, this improves costs for overaligned loads - loading padding is fine.
This more directly tracks when we need to insert/extract the YMM/XMM subvector,
so some costs fluctuate because of that.
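The "stream mode" split described above can be modeled roughly as follows. The sketch assumes illustrative op widths of 32 bytes (YMM), 16 bytes (XMM), and then 8/4/2/1-byte pieces. The function name and widths are hypothetical. The real cost model also accounts for alignment and subvector insert/extract costs, which this sketch ignores.

```python
def split_into_ops(total_bytes, widths=(32, 16, 8, 4, 2, 1)):
    """Greedy "stream" split of a memory access into operations.

    Keep issuing operations of the current width while enough bytes remain,
    then fall back to the next smaller width. The widths are illustrative:
    32 bytes for a YMM register, 16 for XMM, then scalar-sized pieces.
    """
    ops = []
    remaining = total_bytes
    for width in widths:  # widths are assumed to be in descending order
        while remaining >= width:
            ops.append(width)
            remaining -= width
    return ops


# A <12 x i16> access is 24 bytes: one XMM-sized op plus one 8-byte op.
print(split_into_ops(24))  # [16, 8]
```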

Reviewed By: RKSimon, ABataev

Differential Revision: https://reviews.llvm.org/D100684
2021-05-11 16:02:22 +03:00
Andy Kaylor
cc0c445bfc [Dependence Analysis] Enable delinearization of fixed sized arrays
Patch by Artem Radzikhovskyy!

Allow delinearization of fixed sized arrays if we can prove that the GEP indices do not overflow the array dimensions. The checks applied are similar to the ones that are used for delinearization of parametric size arrays. Make sure that the GEP indices are non-negative and that they are smaller than the range of that dimension.
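The in-bounds checks matter because of how row-major arrays alias. If an inner subscript can reach its dimension size, an access such as `A[i][4]` in an array with 4 columns lands on the same element as `A[i+1][0]`, so per-dimension dependence tests would be unsound. The helpers below are a hypothetical, simplified model of the check described above, with subscript ranges given as inclusive `(lo, hi)` pairs. They are not the DependenceAnalysis implementation.

```python
def linearize(idx, dims):
    """Row-major offset of A[idx[0]][idx[1]]... for dimension sizes dims.
    dims[0], the outermost size, does not affect the offset."""
    offset = 0
    for i, d in zip(idx, dims):
        offset = offset * d + i
    return offset


def subscripts_in_bounds(ranges, dims):
    """ranges[k] is the inclusive (lo, hi) range of subscript k.

    Every subscript must be non-negative, and every inner subscript must
    stay below its dimension size so that no access spills into the next
    row. The outermost size is not checked in this simplified model.
    """
    for k, (lo, hi) in enumerate(ranges):
        if lo < 0:
            return False
        if k > 0 and hi >= dims[k]:
            return False
    return True


dims = (10, 4)
# A[1][4] and A[2][0] alias in a 10x4 array, so the column range [0, 4] is rejected.
print(linearize((1, 4), dims) == linearize((2, 0), dims))  # True
print(subscripts_in_bounds([(0, 9), (0, 4)], dims))        # False
print(subscripts_in_bounds([(0, 9), (0, 3)], dims))        # True
```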

Changes Summary:

- Updated the LIT tests with more exact values, as we are able to delinearize and apply more exact tests
- profitability.ll - now able to delinearize in all cases, no need to use -da-disable-delinearization-checks flag and run the test twice
- loop-interchange-optimization-remarks.ll - in one of the cases we are able to delinearize without using -da-disable-delinearization-checks
- SimpleSIVNoValidityCheckFixedSize.ll - removed unnecessary "-da-disable-delinearization-checks" flag. Now can get the exact answer without it.
- SimpleSIVNoValidityCheckFixedSize.ll and PreliminaryNoValidityCheckFixedSize.ll - made negative tests more explicit, in order to demonstrate the need for "-da-disable-delinearization-checks" flag

Differential Revision: https://reviews.llvm.org/D101486
2021-05-10 10:30:15 -07:00
Nikita Popov
b57784b03b [SCEV] Handle and/or in applyLoopGuards()
applyLoopGuards() already combines conditions from multiple nested
guards. However, it cannot use multiple conditions on the same guard,
combined using and/or. Add support for this by recursing into either
`and` or `or`, depending on the direction of the branch.
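The direction rule works like this. On the true edge of `a and b`, both operands hold. On the false edge of `a or b`, both operands are false. The other two combinations give no per-operand facts. The sketch below models this rule; the tuple encoding of conditions and the function name are assumptions for illustration only.

```python
def collect_guard_facts(cond, taken_on_true, facts=None):
    """cond is ('and', lhs, rhs), ('or', lhs, rhs), or a leaf predicate
    such as ('ult', 'n', 16). Returns (leaf, holds) pairs known inside the
    guarded block, where holds=False means the leaf is known to be false."""
    if facts is None:
        facts = []
    op = cond[0]
    if (op == 'and' and taken_on_true) or (op == 'or' and not taken_on_true):
        # 'a and b' is true, or 'a or b' is false: both operands agree.
        collect_guard_facts(cond[1], taken_on_true, facts)
        collect_guard_facts(cond[2], taken_on_true, facts)
    elif op in ('and', 'or'):
        # 'a or b' true / 'a and b' false says nothing about either operand.
        pass
    else:
        facts.append((cond, taken_on_true))
    return facts


guard = ('and', ('ult', 'n', 16), ('ugt', 'n', 2))
print(collect_guard_facts(guard, True))
# [(('ult', 'n', 16), True), (('ugt', 'n', 2), True)]
print(collect_guard_facts(guard, False))  # []
```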

Differential Revision: https://reviews.llvm.org/D101692
2021-05-09 21:34:28 +02:00
Nikita Popov
b739e40444 [SCEV] Add additional loop guard and/or tests (NFC)
Add tests for and/and, and/or, or/or, or/and combinations.
2021-05-09 21:34:28 +02:00
Roman Lebedev
39fdf56ee3 [X86] Improve costmodel for scalar byte swaps
Currently we model i16 bswap as very high cost (`10`),
which doesn't seem right, with all others being at `1`.

Regardless of `MOVBE`, i16 reg-reg bswap is lowered into
(an extending move plus) rot-by-8:
https://godbolt.org/z/8jrq7fMTj
I think it should at worst have a throughput of `1`.
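An i16 bswap is cheap because swapping the two bytes of a 16-bit value is exactly a rotate by 8 bits. Here is a quick Python check of that identity; the helper names are illustrative, not LLVM APIs.

```python
def bswap16(x):
    """Swap the two bytes of a 16-bit value."""
    return ((x & 0x00FF) << 8) | ((x >> 8) & 0x00FF)


def rotl16(x, amount):
    """Rotate a 16-bit value left by `amount` bits."""
    amount %= 16
    x &= 0xFFFF
    return ((x << amount) | (x >> (16 - amount))) & 0xFFFF


# Byte swap and rotate-by-8 agree on every 16-bit value.
print(all(bswap16(v) == rotl16(v, 8) for v in range(1 << 16)))  # True
print(hex(bswap16(0x1234)))                                     # 0x3412
```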

Since i32/i64 already have cost of `1`,
`MOVBE` doesn't improve their costs any further.

BUT, `MOVBE` must have at least one memory operand,
with the other being a register. This means that if we have
a bswap of a load, iff the load has a single use,
we'll fold the bswap into the load.

Likewise, if we have a store of a bswap, iff the bswap
has a single use, we'll fold the bswap into the store.

So I think we should treat such a bswap as free,
unless of course we know that on the particular CPU
it performs badly.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D101924
2021-05-08 15:17:35 +03:00
Whitney Tsang
e5ca2592d4 [LoopNest] Consider loop nest with inner loop guard using outer loop
induction variable to be perfect

This patch allows more conditional branches to be considered as loop
guards, and so more loop nests can be considered perfect.

Reviewed By: bmahjour, sidbav

Differential Revision: https://reviews.llvm.org/D94717
2021-05-07 16:04:18 +00:00
Joseph Tremoulet
b501eafd98 BasicAA: Recognize inttoptr as isEscapeSource
Pointers escape when converted to integers, so a pointer produced by
converting an integer to a pointer must not be a local non-escaping
object.

Reviewed By: nikic, nlopes, aqjune

Differential Revision: https://reviews.llvm.org/D101541
2021-05-07 07:48:50 -07:00
Roman Lebedev
202c41a158 [NFC][X86][CostModel] Add tests for byteswap intrinsic
2021-05-05 20:11:46 +03:00
Philip Reames
c349ecadfa Recommit "Generalize getInvertibleOperand recurrence handling slightly"
This was reverted because of a reported problem.  It turned out this patch didn't introduce said problem, it just exposed it more widely.  15a4233 fixes the root issue, so this simply a) rebases over that, and b) adds a much more extensive comment explaining why that weakened assert is correct.

Original commit message follows:

Follow up to D99912, specifically the revert, fix, and reapply thereof.

This generalizes the invertible recurrence logic in two ways:
* By allowing mismatching operand numbers of the phi, we can recurse through a pair of phi recurrences whose operand orders have not been canonicalized.
* By allowing recurrences through operand 1, we can invert these odd (but legal) recurrences.

Differential Revision: https://reviews.llvm.org/D100884
2021-05-03 16:40:56 -07:00
Philip Reames
b123d90a62 One more test case inspired by PR50191
2021-05-03 16:23:04 -07:00
Philip Reames
f5c6ef2e13 Add some additional test cases inspired by PR50191
2021-05-03 15:56:37 -07:00
Nikita Popov
afad6987f7 [SCEV] Add test for non-unit stride with multiple exits (NFC)
We currently can't determine any exit counts here, because there
is no "controlling exit".
2021-05-02 18:14:05 +02:00
Nikita Popov
0c935681c7 [SCEV] Add tests for and/or loop guards (NFC)
2021-05-01 17:10:23 +02:00
Philip Reames
a17954e6f1 Revert "Generalize getInvertibleOperand recurrence handling slightly"
This reverts commit 0c01b37eeb18a51a7e9c9153330d8009de0f600e while a reported problem is investigated.
2021-04-29 13:06:26 -07:00
Philip Reames
5b3993fcf6 Generalize getInvertibleOperand recurrence handling slightly
Follow up to D99912, specifically the revert, fix, and reapply thereof.

This generalizes the invertible recurrence logic in two ways:
* By allowing mismatching operand numbers of the phi, we can recurse through a pair of phi recurrences whose operand orders have not been canonicalized.
* By allowing recurrences through operand 1, we can invert these odd (but legal) recurrences.

Differential Revision: https://reviews.llvm.org/D100884
2021-04-28 14:38:07 -07:00
Philip Reames
e5f1391c16 [tests] Precommit some extra tests for D100884
2021-04-28 13:46:35 -07:00
Philip Reames
cf98cdf20e [SCEV] Compute ranges for ashr recurrences
Straightforward extension to the recently added infrastructure, which was pioneered with shl. This was originally posted as part of D99687, but split off for ease of review.

(I also decided to exclude the unknown start sign case explicitly for simplicity of understanding.)
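The range argument for ashr works as follows. Repeatedly shifting a non-negative value right arithmetically moves it toward 0, and a negative value moves toward -1, so the start value bounds the recurrence on the far side. The sketch below assumes the range of the start value is known as `(start_lo, start_hi)`. It returns `None` when the sign is unknown, mirroring the exclusion mentioned above. It is a model, not the SCEV implementation.

```python
def ashr_recurrence_range(start_lo, start_hi):
    """Signed range of x = start; x = x >> s (arithmetic, s >= 0), given
    that start lies in [start_lo, start_hi]. Returns None if the sign of
    start is unknown, which this sketch does not handle."""
    if start_lo >= 0:
        return (0, start_hi)    # non-negative values shift toward 0
    if start_hi < 0:
        return (start_lo, -1)   # negative values shift toward -1
    return None


def ashr_sequence(start, shift, steps):
    """First `steps` values of the recurrence. Python's >> on int is an
    arithmetic shift (it rounds toward negative infinity)."""
    x, values = start, []
    for _ in range(steps):
        values.append(x)
        x >>= shift
    return values


print(ashr_recurrence_range(5, 100))    # (0, 100)
print(ashr_recurrence_range(-100, -5))  # (-100, -1)
print(ashr_sequence(-100, 2, 5))        # [-100, -25, -7, -2, -1]
```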

Differential Revision: https://reviews.llvm.org/D101181
2021-04-28 12:36:20 -07:00
Florian Hahn
939b677f1c [LAA] Support pointer phis in loop by analyzing each incoming pointer.
SCEV does not look through non-header PHIs inside the loop. Such phis
can be analyzed by adding separate accesses for each incoming pointer
value.

This results in 2 more loops vectorized in SPEC2000/186.crafty and
avoids regressions when sinking instructions before vectorizing.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101286
2021-04-28 20:19:40 +01:00
Nikita Popov
e7bbddfa9f [SCEV] Handle uge/ugt predicates in applyLoopGuards()
These can be handled the same way as ule/ult, just using umax
instead of umin. This is useful in cases where the umax prevents
the upper bound from overflowing.
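A simplified model of these rewrites tracks an unsigned interval `(lo, hi)` for the guarded value instead of building `umin`/`umax` SCEV expressions. In this model, `ult`/`ule` guards lower the upper end, while `ugt`/`uge` guards raise the lower end. The function name and interval representation are assumptions for illustration. For example, once a `uge 1` guard raises the lower bound to 1, computing `n - 1` can no longer wrap.

```python
UINT32_MAX = 2**32 - 1


def apply_guard(bounds, pred, c):
    """Narrow the unsigned range (lo, hi) of x using a guard `x <pred> c`.

    ult/ule lower the upper end (the umin rewrite); ugt/uge raise the lower
    end (the umax rewrite). An empty result (lo > hi) is not special-cased.
    """
    lo, hi = bounds
    if pred == 'ult':
        hi = min(hi, c - 1)
    elif pred == 'ule':
        hi = min(hi, c)
    elif pred == 'ugt':
        lo = max(lo, c + 1)
    elif pred == 'uge':
        lo = max(lo, c)
    else:
        raise ValueError(f"unsupported predicate: {pred}")
    return (lo, hi)


full = (0, UINT32_MAX)
lo, hi = apply_guard(full, 'uge', 1)
print(lo - 1 >= 0)  # True: n - 1 cannot wrap below zero under this guard
print(apply_guard(apply_guard(full, 'ugt', 2), 'ule', 16))  # (3, 16)
```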

Differential Revision: https://reviews.llvm.org/D101196
2021-04-27 22:41:05 +02:00
Nikita Popov
0fd5fddf4c [SCEV] Improve loop guard tests (NFC)
Invert the branch order to make the predicate more obvious.
Add tests with two predicates, to show that rewrites are
combined.
2021-04-27 22:34:56 +02:00
Arthur Eubanks
08189b14ea [test] Fix some func-attrs tests under the legacy PM
The new PM doesn't visit declarations in CGSCC passes. These tests
aren't testing that detail, so just run them against the new PM.
2021-04-27 13:07:56 -07:00
Andy Kaylor
64bce9007c [Dependence Analysis] Fix ExactSIV producing wrong analysis
Patch by Artem Radzikhovskyy!

Symptom: The ExactSIV test produced an incorrect analysis of dependencies; see the LIT tests.
Bug: At the end of the algorithm, when determining the dependence direction, the original author forgot to divide intermediate results by the gcd and round the result toward zero.

Although this bug could be fixed with significantly fewer changes, I opted to write the code in a way that reflects the original algorithm that Banerjee proposed, for easier reference in the future. This surprisingly results in shorter code, and fewer quotient and max/min calculations.

Changes Summary:

- fixed findGCD to return valid x and y so that they match the function description where: ax - by = gcd(a,b)
- Fixed ExactSIV test, to produce proper results
- Documented the extension of Banerjee's algorithm that the original code author introduced. Banerjee's original algorithm only tested whether Dst depends on Src, the extension also allows us to test whether Src depends on Dst, in one pass.
- The ExactRDIV test worked fine, but since it uses findGCD(), it needed to be updated. Since the ExactRDIV test has very few changes from the core algorithm of ExactSIV, I modified it to have a format consistent with ExactSIV.
- Updated the LIT tests to be testing for correct values.
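For reference, a `findGCD`-style helper that satisfies `a*x - b*y == gcd(a, b)` can be written as an extended Euclidean algorithm that negates the usual `b` coefficient. The sketch below also includes a division that rounds toward zero, as C and C++ integer division does (Python's `//` floors instead). Both functions are hypothetical illustrations that assume non-negative inputs. They are not the DependenceAnalysis code, and how the pass combines them is not shown.

```python
def find_gcd(a, b):
    """Return (g, x, y) with a*x - b*y == g == gcd(a, b), assuming
    a, b >= 0 and not both zero (iterative extended Euclid)."""
    old_r, r = a, b
    old_s, s = 1, 0   # coefficients of a
    old_t, t = 0, 1   # coefficients of b
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    # Invariant: a*old_s + b*old_t == old_r, so negate the b coefficient.
    return old_r, old_s, -old_t


def div_toward_zero(n, d):
    """Integer division rounding toward zero (C/C++ semantics)."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


g, x, y = find_gcd(240, 46)
print(g, 240 * x - 46 * y)   # 2 2
print(div_toward_zero(-7, 2), -7 // 2)  # -3 -4
```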

Differential Revision: https://reviews.llvm.org/D100331
2021-04-27 12:24:00 -07:00
Alexey Bataev
eadc8ecdd1 [COST][X86]Improve cost model for reverse shuffle v32i16/v64i8 in AVX512F.
Improved cost model for reverse shuffle on AVX512F for types
v32i16/v64i8.

Differential Revision: https://reviews.llvm.org/D100974
2021-04-27 11:14:21 -07:00
Florian Hahn
0cb75ec25f [LV,LAA] Add test cases with pointer phis in loops.
Pre-commits tests for D101286.
2021-04-27 13:49:32 +01:00
David Sherwood
6475ab5a00 [AArch64] Add AArch64TTIImpl::getMaskedMemoryOpCost function
When vectorising for AArch64 targets, if you specify the SVE attribute
we automatically then treat masked loads and stores as legal. Also,
since we have no cost model for masked memory ops, we believe it's
cheap to use the masked load/store intrinsics even for fixed width
vectors. This can lead to poor code quality as the intrinsics will
currently be scalarised in the backend. This patch adds a basic
cost model that marks fixed-width masked memory ops as significantly
more expensive than for scalable vectors.

Tests for the cost model are added here:

  Transforms/LoopVectorize/AArch64/masked-op-cost.ll

Differential Revision: https://reviews.llvm.org/D100745
2021-04-26 11:00:03 +01:00
Roman Lebedev
35f6937f2b [NFC][X86][AVX2] Add baseline CodeGen/CostModel tests for interleaved loads/stores of i16 w/ strides 2/3/4
`X86TTIImpl::getInterleavedMemoryOpCostAVX2()` currently contains data
only for a handful of tuples. For now, at least add tests for a few more.

I'm guessing that we care how well the patterns codegen, since
we use their presumed cost for vectorization decisions,
so I've added codegen tests too.

There's one really easy caveat for these codegen tests:
for interleaved load tests, we really have to ensure that the
deinterleaved vectors are escaped separately. Similarly for stores.
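For context, a stride-N interleaved access group reads N logical vectors packed element by element. The sketch below models only the data movement, not the shuffle cost, and uses hypothetical helper names. Each deinterleaved lane is a separate result; presumably a test that did not use every lane separately would let the optimizer drop or merge the unused ones.

```python
def deinterleave(data, stride):
    """Split [a0, b0, c0, a1, b1, c1, ...] into `stride` lanes:
    [[a0, a1, ...], [b0, b1, ...], [c0, c1, ...]]."""
    if len(data) % stride:
        raise ValueError("length must be a multiple of the stride")
    return [list(data[k::stride]) for k in range(stride)]


def interleave(lanes):
    """Inverse of deinterleave: merge equally long lanes element by element."""
    return [x for group in zip(*lanes) for x in group]


lanes = deinterleave([0, 1, 2, 3, 4, 5], 3)
print(lanes)              # [[0, 3], [1, 4], [2, 5]]
print(interleave(lanes))  # [0, 1, 2, 3, 4, 5]
```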
2021-04-26 01:13:07 +03:00
Nikita Popov
de1f20abe1 [SCEV] Fix applyLoopGuards() chaining for ne predicates
ICMP_NE predicates directly overwrote the rewritten result,
instead of chaining it with previous rewrites, as was done for
ICMP_ULT and ICMP_ULE. This means that some guards were effectively
discarded, depending on their order.
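The ordering bug can be reproduced with a toy interval model in which `ult c` lowers the upper bound and `ne 0` raises the lower bound to 1. With `chain=False`, an `ne` guard restarts from the unrewritten value, as the text above describes. Whether the `ult` bound survives then depends on the guard order. The names and the interval representation are hypothetical, and only `ult` and `ne 0` are modeled.

```python
FULL = (0, 2**32 - 1)


def refine(bounds, pred, c):
    """Apply one guard to unsigned bounds; only ult and ne 0 are modeled."""
    lo, hi = bounds
    if pred == 'ult':
        hi = min(hi, c - 1)
    elif pred == 'ne' and c == 0:
        lo = max(lo, 1)  # x != 0 implies x >= 1
    return (lo, hi)


def apply_guards(guards, chain=True):
    result = FULL
    for pred, c in guards:
        if pred == 'ne' and not chain:
            # Buggy behavior: restart from the unrewritten value,
            # discarding every rewrite made so far.
            result = refine(FULL, pred, c)
        else:
            result = refine(result, pred, c)
    return result


print(apply_guards([('ult', 16), ('ne', 0)]))               # (1, 15)
print(apply_guards([('ult', 16), ('ne', 0)], chain=False))  # (1, 4294967295)
print(apply_guards([('ne', 0), ('ult', 16)], chain=False))  # (1, 15)
```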
2021-04-24 21:43:46 +02:00
Nikita Popov
bdb95d235c [SCEV] Add additional NE applyLoopGuards() test (NFC)
This is the same as @test_guard_ult_ne, just with the order of
the conditions swapped.
2021-04-24 21:36:23 +02:00
Nikita Popov
d61fdd44b0 [SCEV] Add loop guard tests for ugt/uge predicates (NFC)
2021-04-23 22:21:06 +02:00
Simon Pilgrim
982c79726f [CostModel][X86] Improve v2f32 fadd reduction cost
This was being reported as a similar cost to v4f32 when it's a lot cheaper (just a shufps+addps).
2021-04-23 16:56:13 +01:00
Daniil Fukalov
3d4fc1f0f6 [TTI] Fix ScalarizationCost initialization.
In cases when ScalarizationCostPassed has no value, UINT_MAX is actually used
for cost estimation in `return ScalarCalls * ScalarCost + ScalarizationCost`.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101099
2021-04-23 17:59:59 +03:00
Philip Reames
555456c598 [SCEV] Compute ranges for lshr recurrences
Straightforward extension to the recently added infrastructure, which was pioneered with shl.

Differential Revision: https://reviews.llvm.org/D99687
2021-04-22 11:06:31 -07:00
David Sherwood
081c5cd9a4 [AArch64] Add instruction costs for FP_TO_UINT and FP_TO_SINT with half types
We were missing some instruction costs when converting vectors of
floating point half types into integers, so I've added those here.
I also manually generated assembly code for each FP->int case and
looked at the number of instructions generated, which meant
adjusting some of the existing costs too.

I've updated an existing test to reflect the new costs:

  Analysis/CostModel/AArch64/sve-fptoi.ll

Differential Revision: https://reviews.llvm.org/D99935
2021-04-21 09:39:45 +01:00
Alexey Bataev
907c8ed010 [COST][AARCH64] Improve cost of reverse shuffles for AArch64.
Introduced the cost of reverse shuffles for AArch64; for now this just
copies the costs for PermuteSingleSrc.

Differential Revision: https://reviews.llvm.org/D100871
2021-04-20 13:47:56 -07:00
Philip Reames
6215df8065 Reapply "Look through invertible recurrences in isKnownNonEqual"
I'd reverted this in commit 3b6acb179708ea2f3caf95ace0f134fcbc460333 due to buildbot failures.  This patch contains the fix for said issue.  I'd forgotten to handle the case where two phis in the same block have different operand order.  We canonicalize away from this, but it's still valid IR.  The tests included in this change (as opposed to simply having test output changed) crashed without the fix.

Original commit message follows...

This extends the phi handling in isKnownNonEqual with a special case based on invertible recurrences. If we can prove the recurrence is invertible (which many common ones are), we can recurse through the start operands of the recurrence skipping the phi cycle.
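The key fact behind this change is that an invertible step maps distinct values to distinct values. Two recurrences that apply the same invertible step to different start values can therefore never become equal. The sketch below checks this in 8-bit modular arithmetic; the helper names are illustrative assumptions, and it does not model how isKnownNonEqual matches the phis. Multiplication by an odd constant is invertible modulo 2^8, while multiplication by an even constant is not.

```python
BITS = 8
MASK = (1 << BITS) - 1


def add_step(x, c):
    return (x + c) & MASK


def mul_step(x, c):
    return (x * c) & MASK


def is_invertible(step, c):
    """True if x -> step(x, c) is a bijection on BITS-bit values."""
    return len({step(x, c) for x in range(1 << BITS)}) == 1 << BITS


def recurrence(start, step, c, n):
    """First n values of x = start; x = step(x, c)."""
    x, values = start, []
    for _ in range(n):
        values.append(x)
        x = step(x, c)
    return values


print(is_invertible(add_step, 3), is_invertible(mul_step, 5))  # True True
print(is_invertible(mul_step, 4))                             # False
# Different starts stay different under an invertible step...
a, b = recurrence(10, mul_step, 5, 50), recurrence(11, mul_step, 5, 50)
print(all(x != y for x, y in zip(a, b)))                      # True
# ...but a non-invertible step can merge them: 1*4 == 65*4 (mod 256).
print(recurrence(1, mul_step, 4, 2), recurrence(65, mul_step, 4, 2))  # [1, 4] [65, 4]
```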

(Side note: Instcombine currently does not push back through these cases. I will implement that in a follow up change w/separate review.)

Differential Revision: https://reviews.llvm.org/D99912
2021-04-20 12:47:59 -07:00
Philip Reames
952b6e81cd Revert "Look through invertible recurrences in isKnownNonEqual"
This reverts commit be20eae25f50f5ef648aeefa1143e1c31e4410fc.  It appears to have caused a crash on a buildbot (https://lab.llvm.org/buildbot#builders/77/builds/5653).  Reverting while investigating.
2021-04-20 11:47:10 -07:00
Philip Reames
35e505e294 [tests] Expand coverage for D99687
2021-04-20 11:31:39 -07:00