mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00

211920 Commits

Author SHA1 Message Date
Jay Foad
b3a27e03dc [AMDGPU] Add IntrWillReturn to recently added intrinsics
This adds IntrWillReturn to the gfx90a mfma intrinsics, to match all the
other mfma intrinsics, and llvm.amdgcn.live.mask, to match
llvm.amdgcn.ps.live.

Differential Revision: https://reviews.llvm.org/D97675
2021-03-01 17:35:26 +00:00
Jez Ng
5415c595ac [lld-macho] Switch default to new Darwin backend
The new Darwin backend for LLD is now able to link reasonably large
real-world programs on x86_64. For instance, we have achieved
self-hosting for the X86_64 target, where all LLD tests pass when
building lld with itself on macOS. As such, we would like to make it the
default back-end.

The new port is now named `ld64.lld`, and the old port remains
accessible as `ld64.lld.darwinold`

This [announcement email][1] has some context. (But note that, unlike
what the email says, we are no longer doing this as part of the LLVM 12
branch cut -- instead we will go into LLVM 13.)

Numerous mechanical test changes were required to make this change; in
the interest of creating something that's reviewable on Phabricator,
I've split out the boring changes into a separate diff (D95905). I plan to
merge its contents with those in this diff before landing.

(@gkm made the original draft of this diff, and he has agreed to let me
take over.)

[1]: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147665.html

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D95204
2021-03-01 12:30:10 -05:00
Juneyoung Lee
312b2e1b99 [TTI] Consider select form of and/or i1 as having arithmetic cost
This is a patch that updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b`
as well as the cost of `select i1 a, true, b` equivalent to `or i1 a, b`.

Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe.
This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked.
These selects should be lowered to the same assembly as the corresponding and/or i1, so the cost should be equivalent.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97360
2021-03-02 02:18:19 +09:00
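The select/and/or equivalence and the poison hazard described in the commit above can be illustrated with a small model. This is only a sketch: `POISON`, `select`, `and_i1`, and `or_i1` are hypothetical Python stand-ins for the IR semantics, not LLVM APIs.

```python
POISON = object()  # hypothetical sentinel standing in for an IR poison value


def select(cond, true_val, false_val):
    """Model of `select`: only the chosen arm reaches the result."""
    if cond is POISON:
        return POISON
    return true_val if cond else false_val


def and_i1(a, b):
    """Model of `and i1`: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a and b


def or_i1(a, b):
    """Model of `or i1`: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a or b


# For ordinary booleans the select forms compute the same values as and/or.
for a in (False, True):
    for b in (False, True):
        assert select(a, b, False) == and_i1(a, b)
        assert select(a, True, b) == or_i1(a, b)

# With poison in the unselected arm the two forms differ, which is why
# rewriting the select into and/or is not poison-safe.
assert select(False, POISON, False) is False
assert and_i1(False, POISON) is POISON
```

In this model the select form blocks poison from the arm that is not chosen, while the binary operator propagates it unconditionally.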
Florian Hahn
e4aed742a6 [VPlan] Remove recipes from back to front.
Update the deletion order when destroying VPBasicBlocks. This ensures
recipes that depend on earlier ones in the block are removed first.
Otherwise this may cause issues when recipes have remaining users later
in the block.
2021-03-01 16:06:30 +00:00
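The deletion-order issue in the commit above can be sketched with a toy def-use model. The `Recipe` class and `destroy_block` function are hypothetical and only mimic the idea of recipes that track their users; they are not the VPlan classes.

```python
class Recipe:
    """Toy recipe that records which later recipes use it."""

    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)
        self.users = []
        for op in self.operands:
            op.users.append(self)

    def erase(self):
        # Erasing a recipe that still has users would leave dangling uses.
        if self.users:
            raise RuntimeError(f"{self.name} still has users")
        for op in self.operands:
            op.users.remove(self)
        self.operands.clear()


def destroy_block(recipes):
    """Erase recipes from back to front so users go before their operands."""
    while recipes:
        recipes.pop().erase()


a = Recipe("a")
b = Recipe("b", [a])
c = Recipe("c", [a, b])
block = [a, b, c]
destroy_block(block)  # succeeds: c, then b, then a
```

Erasing `a` first in this model would raise, because `b` and `c` still use it.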
Andy Wingo
d9c224e71b [WebAssembly] call_indirect issues table number relocs
If the reference-types feature is enabled, call_indirect will explicitly
reference its corresponding function table via TABLE_NUMBER
relocations against a table symbol.

As before, address-taken functions can also cause the function
table to be created; with reference-types they additionally cause a
symbol table entry to be emitted.

Differential Revision: https://reviews.llvm.org/D90948
2021-03-01 16:49:00 +01:00
Simon Pilgrim
f9a4546a12 [TableGen] Avoid repeated TreePredicateFn::getCodeToRunOnSDNode() calls in MatcherTableEmitter::EmitNodePredicatesFunction loop. NFCI. 2021-03-01 15:43:37 +00:00
Masoud Ataei
eeb7eb5db7 [PowerPC] Removing sqrtd2 and sqrtf4 from list of vectorizable function with MASSV
Under -O3 and -Ofast, the MASSV conversion prevents the sqrt call from being inlined.
An inlined sqrt is faster than a MASSV call on leppc.

Differential Revision: https://reviews.llvm.org/D97487
2021-03-01 15:42:19 +00:00
Simon Pilgrim
918dfd8b20 [X86] Fold shuffle(not(x),undef) -> not(shuffle(x,undef))
Move NOT out to expose more AND -> ANDN folds
2021-03-01 14:47:39 +00:00
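The identity behind the fold above can be checked on a small model of 8-bit lanes. This is a sketch under assumptions: `UNDEF`, `vnot`, and `shuffle` are hypothetical helpers, and a negative mask index is used here to mean an undefined lane.

```python
UNDEF = None  # hypothetical marker for an undefined lane


def vnot(v):
    """Bitwise NOT of each 8-bit lane; undefined lanes stay undefined."""
    return [UNDEF if e is UNDEF else (~e & 0xFF) for e in v]


def shuffle(v, mask):
    """Single-input shuffle; a negative mask index yields an undefined lane."""
    return [UNDEF if i < 0 else v[i] for i in mask]


x = [0x01, 0x22, 0xF0, 0x7F]
mask = [3, 3, 0, -1]

# NOT commutes with a lane permutation, so it can be moved outside the shuffle.
assert shuffle(vnot(x), mask) == vnot(shuffle(x, mask))
```

Once the NOT sits outside the shuffle, a following AND can see it directly, which is what makes the ANDN pattern matchable.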
Jay Foad
d90cb44457 [AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32)
The expected use case is for frontends to insert this into
shaders that are to be run under a debugger. The shader can
then be resumed or single stepped from the point of the call
under debugger control.

Differential Revision: https://reviews.llvm.org/D97670
2021-03-01 14:30:23 +00:00
Jay Foad
82b9132694 [AMDGPU] Simplify SITargetLowering::isSDNodeSourceOfDivergence. NFC.
Check for read-modify-write AtomicSDNodes instead of using an exhaustive
list of ISD opcodes.

Differential Revision: https://reviews.llvm.org/D97671
2021-03-01 14:22:08 +00:00
Matt Arsenault
3e86dcbeed GlobalISel: Verify G_CONCAT_VECTORS has at least 2 sources 2021-03-01 09:10:36 -05:00
Matt Arsenault
fede57d2a9 GlobalISel: Move splitToValueTypes to generic code
I copied the nearly identical function from AArch64 into AMDGPU, so
fix this duplication.

Mips and X86 have their own more exotic versions which should be
removed. However replacing those is better left for a separate patch
since it requires other changes to avoid regressions.
2021-03-01 08:58:18 -05:00
Matt Arsenault
f75441bd01 AArch64/GlobalISel: Fix using wrong calling convention for calls
This was reusing the parent function calling convention instead of the
callee. I'm not sure if there's a case where there's an observable
difference.

I previously missed this in b72a23650f573299aec30846fb844c3558921fb8
2021-03-01 08:46:33 -05:00
Sander de Smalen
c44f10ad3d [AArch64] NFC: Cleanup some SVE cost-model tests.
Moved some of the `sve-getIntrinsicCost-<..>` into a single sve-intrinsics.ll
file, and simplified the tests a bit by bundling all the intrinsics in one
function (instead of testing one intrinsic per function). That makes it easier
to see the cost of the intrinsics.
2021-03-01 13:26:31 +00:00
serge-sans-paille
66ad213da3 Revert "Use the default seed value for djb hash for StringMap"
This reverts commit d84440ec919019ac446241db72cfd905c6ac9dfa.

It breaks (at least) lldb and lld validation

https://lab.llvm.org/buildbot/#/builders/68/builds/7837
https://lab.llvm.org/buildbot/#/builders/36/builds/5495
2021-03-01 14:00:39 +01:00
David Green
042f6e8e77 [AArch64] Add combine for add(udot(0, x, y), z) -> udot(z, x, y).
Given a zero input for a udot, an add can be folded in to take the place
of the input, using the addition that the instruction naturally
performs.

Differential Revision: https://reviews.llvm.org/D97188
2021-03-01 12:53:34 +00:00
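The combine above relies on the accumulate step built into the dot-product instruction. A minimal model, assuming 32-bit accumulator lanes that each sum four unsigned 8-bit products (the `udot` and `add` functions here are illustrative, not the intrinsic definitions):

```python
M32 = 0xFFFFFFFF


def udot(acc, x, y):
    """Each 32-bit lane i accumulates four u8*u8 products from x and y."""
    return [
        (acc[i] + sum(x[4 * i + j] * y[4 * i + j] for j in range(4))) & M32
        for i in range(len(acc))
    ]


def add(a, b):
    """Lane-wise 32-bit wrapping add."""
    return [(p + q) & M32 for p, q in zip(a, b)]


x = [1, 2, 3, 4, 255, 255, 255, 255]
y = [5, 6, 7, 8, 255, 255, 255, 255]
z = [10, 0xFFFFFFF0]

# add(udot(0, x, y), z) and udot(z, x, y) agree lane by lane.
assert add(udot([0, 0], x, y), z) == udot(z, x, y)
```

Both sides are computed modulo 2^32, so the identity also holds when the accumulation wraps.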
David Green
2e8c4023c8 [AArch64] Adjust dot product tests. NFC
This regenerates and splits out the dot product tests, adding a few extra
tests for upcoming changes.
2021-03-01 12:46:43 +00:00
serge-sans-paille
8e4de8e5a6 Use the default seed value for djb hash for StringMap
See original comment in 560ce2c70fb1fe8e4b9b5e39c54e494a50373ba8
Basically, the default seed value results in fewer collisions, but changes the
iteration order, which matters for a few test cases.

Differential Revision: https://reviews.llvm.org/D97396
2021-03-01 13:21:27 +01:00
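For context on the commit above, a DJB-style hash multiplies the running value by 33 and adds each byte, starting from a seed. The sketch below assumes the conventional starting value 5381 is the "default seed" the message refers to and that the alternative is a seed of 0; both are assumptions, and `djb_hash` is a hypothetical helper rather than the LLVM function.

```python
def djb_hash(key, seed=5381):
    """DJB-style hash: h = h * 33 + byte, truncated to 32 bits."""
    h = seed
    for byte in key.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


keys = ["foo", "bar", "baz", "qux"]
num_buckets = 8

# The same keys can land in different buckets depending on the seed, so any
# code that iterates over the table may observe a different order.
buckets_default = [djb_hash(k) % num_buckets for k in keys]
buckets_zero = [djb_hash(k, seed=0) % num_buckets for k in keys]
```

This is why a seed change can affect tests that depend on iteration order even though lookups behave the same.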
Fraser Cormack
bc12858624 [RISCV] Support INSERT_SUBVECTOR on vector masks
Like with EXTRACT_SUBVECTOR, INSERT_SUBVECTOR poses a problem
for vector masks as RVV isn't able to slide mask types around. We choose
instead to bitcast to equivalently-sized i8 types where we can, else we
zero-extend, perform the operation, and truncate back down.

One test was left disabled due to a crash in the legalizer.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97559
2021-03-01 12:04:11 +00:00
Fraser Cormack
21e7d609a7 [RISCV] Fix INSERT/EXTRACT_SUBVECTOR on fractional LMUL types
This patch fixes a bug where the lowering for INSERT_SUBVECTOR and
EXTRACT_SUBVECTOR would insist on first extracting a register-aligned
LMUL1 vector type before performing the slide up/down. This was even if
the vector was a fractional LMUL type, in which case the aligned
EXTRACT_SUBVECTOR was invalid.

This issue only occurred for scalable vector types, but a variety of
tests for both scalable and fixed-length vectors have been added to
ensure this does not regress in the future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97556
2021-03-01 11:51:05 +00:00
Fraser Cormack
b214f42157 [RISCV] Unify scalable- and fixed-vector INSERT_SUBVECTOR lowering
This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR
operations under one roof. Consequently, with this patch it is possible to
support any fixed-length subvector insertion, not just "cast-like" ones.

As before, support for the insertion of mask vectors will come in a
separate patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97543
2021-03-01 11:38:47 +00:00
Fraser Cormack
5b136b7998 [RISCV] Support EXTRACT_SUBVECTOR on vector masks
This patch adds support for extracting subvectors from vector masks.
This can be either extracting a scalable vector from another, or a fixed-length
vector from a fixed-length or scalable vector.

Since RVV lacks a way to slide vector masks down on an element-wise
basis and we don't know the true length of the vector registers, in many
cases we must resort to using equivalently-sized i8 vectors to perform
the operation. When this is not possible we fall back and extend to a
suitable i8 vector.

Support was also added for fixed-length truncation to mask types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97475
2021-03-01 11:20:09 +00:00
Florian Hahn
5ace0d2963 [LV] Generate RT checks up-front and remove them if required.
This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block, un-linked from
the existing IR, to allow better estimation of their cost. After deciding to
vectorize, the checks are moved back. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if it decides not to
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front, thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Reviewed By: lebedev.ri, ebrevnov

Differential Revision: https://reviews.llvm.org/D75980
2021-03-01 10:48:04 +00:00
Simon Pilgrim
fc7ed7f16c [DAG] visitVECTOR_SHUFFLE - attempt to match commuted shuffles with MergeInnerShuffle.
Try to match "shuffle(C, shuffle(A, B, M0), M1) -> shuffle(A, B, M2)" etc. by using MergeInnerShuffle's commuted inner shuffle mode.
2021-03-01 10:42:11 +00:00
Fraser Cormack
dd84fcbdc5 [CodeGen] Fix issues with subvector intrinsic index types
This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.

Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.

The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D97459
2021-03-01 10:28:21 +00:00
Serguei Katkov
1ff80409ac [Statepoint Lowering] Consider dead deopt gc values together with other gc values
Currently, dead gc values mentioned in the deopt section are not listed in the gc section
and so are processed separately.
With this CL, all deopt gc values are considered base pointers and are processed in the
same way as other gc values.

The fact that a deopt gc pointer is a base pointer was relied on all along, but
it is now explicitly documented here by putting the value in SI.Base.

The idea of the patch comes from Philip Reames.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97554
2021-03-01 17:23:02 +07:00
Simon Pilgrim
33efbc8acf [DAG] visitVECTOR_SHUFFLE - move shuffle canonicalization/merges all under the same legality test. NFCI.
Minor cleanup to move related combines closer together to make it more coherent, without changing the ordering.
2021-03-01 09:42:00 +00:00
Max Kazantsev
13859b219a [NFC] Detect IV increment expressed as uadd_with_overflow and usub_with_overflow
Current callers do not call it with such argument, so this is NFC.
But for further changes, it can be very useful to detect such cases.
2021-03-01 13:24:01 +07:00
Max Kazantsev
e3a0d79f40 [NFC] Introduce function getIVStep for further reuse 2021-03-01 13:04:56 +07:00
Max Kazantsev
3d0fa66f52 [NFC] Whitespace fix 2021-03-01 12:14:03 +07:00
Max Kazantsev
adc4cff078 [NFC] Factor out IV detector function for further reuse 2021-03-01 12:11:54 +07:00
Juneyoung Lee
e428efde6c [SimplifyCFG] Update FoldTwoEntryPHINode to handle and/or of select and binop equally
This is a minor change that fixes FoldTwoEntryPHINode to handle
phis with and/ors of select form and binop form equally.
2021-03-01 13:34:51 +09:00
Serguei Katkov
2c429002e4 [Statepoint lowering] Require spill of deopt value in case its type is not legal
If the deopt operand has an illegal type and we want to use a
register for it, then it needs to be legalized.
This is not currently supported by the legalizer, and it is not actually clear how to
legalize this kind of value.

Instead, we just spill such values and use the spill slot location in the statepoint.

Originally tests were created by Philip Reames.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97541
2021-03-01 10:23:53 +07:00
Craig Topper
ce5d619908 [DAGCombiner][X86] Don't peek through ANDs on the shift amount in matchRotateSub when called from MatchFunnelPosNeg.
Peeking through AND is only valid if the input to both shifts is
the same. If the inputs are different, then the original pattern
ORs the two values when the masked shift amount is 0. This is ok
if the values are the same, since the OR would be a NOP, which is
why it's ok for rotate.

Fixes PR49365 and reverts PR34641

Differential Revision: https://reviews.llvm.org/D97637
2021-02-28 12:58:00 -08:00
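The difference between the rotate and funnel-shift cases in the commit above can be reproduced on 32-bit integers. This is a sketch: `rotl`, `fshl`, and `masked_pattern` are hypothetical helpers that model the shift/OR pattern with AND-masked amounts, not DAGCombiner code.

```python
M32 = 0xFFFFFFFF


def rotl(x, amt):
    """32-bit rotate left."""
    amt %= 32
    if amt == 0:
        return x
    return ((x << amt) | (x >> (32 - amt))) & M32


def fshl(hi, lo, amt):
    """32-bit funnel shift left: top 32 bits of (hi:lo) shifted left by amt % 32."""
    amt %= 32
    return (((hi << 32) | lo) << amt >> 32) & M32


def masked_pattern(x, z, y):
    """(x << (y & 31)) | (z >> (-y & 31)), the pattern with masked amounts."""
    return ((x << (y & 31)) | (z >> (-y & 31))) & M32


x, z = 0x000000F0, 0x0000000F

# Same input on both shifts: the pattern is a rotate for every amount.
assert all(masked_pattern(x, x, y) == rotl(x, y) for y in range(64))

# Different inputs: when the masked amount is 0 the pattern ORs x and z,
# whereas a funnel shift by 0 returns x unchanged.
assert masked_pattern(x, z, 0) == (x | z)
assert fshl(x, z, 0) == x
```

For non-zero masked amounts the pattern and `fshl` agree in this model; only the zero case separates them.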
Kazu Hirata
a3564b4137 [IR] Use range-based for loops (NFC) 2021-02-28 10:59:23 -08:00
Kazu Hirata
2ca15ec5a6 [TableGen] Use ListSeparator (NFC) 2021-02-28 10:59:22 -08:00
Kazu Hirata
454751eae1 [llvm] Use set_is_subset (NFC) 2021-02-28 10:59:20 -08:00
Craig Topper
38291ba7f3 [DAGCombiner] Don't skip no overflow check on UMULO if the first computeKnownBits call doesn't return any 0 bits.
Even if the first computeKnownBits call doesn't find any zero
bits, it is possible the other operand has bitwidth-1 leading zeros.
In that case overflow is still impossible. So always call computeKnownBits
for both operands.
2021-02-28 08:26:22 -08:00
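The reasoning in the commit above can be checked with a bound on the largest values each operand can take. This sketch assumes the overflow test multiplies the maximum values implied by the known leading zeros; `max_value` and `never_overflows` are hypothetical helpers, and whether this matches the exact check in the combiner is an assumption.

```python
def max_value(known_leading_zeros, bits=32):
    """Largest unsigned value with at least this many leading zero bits."""
    return (1 << (bits - known_leading_zeros)) - 1


def never_overflows(lz_a, lz_b, bits=32):
    """True if the product of the two maximum values fits in `bits` bits."""
    return max_value(lz_a, bits) * max_value(lz_b, bits) < (1 << bits)


# Nothing known about the first operand, but the second is known to be 0 or 1
# (bitwidth-1 leading zeros): the multiply still cannot overflow.
assert never_overflows(0, 31)

# With one fewer known zero on the second operand, overflow is possible.
assert not never_overflows(0, 30)
```

Stopping after the first operand reports no known zero bits would miss the first case, which is why both operands need to be examined.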
Matt Arsenault
6e55b5e1e6 AMDGPU/GlobalISel: Add subtarget to a test
SelectionDAG forces us to have a weird ABI for 16-bit values without
legal 16-bit operations, but currently GlobalISel bypasses this and
sometimes ends up using the gfx8+ ABI in some contexts. Make sure
we're testing the normal ABI to avoid a test change in a future patch.
2021-02-28 10:29:25 -05:00
Sanjay Patel
a277205632 [InstCombine] avoid infinite loop in demanded bits for select
https://llvm.org/PR49205
2021-02-28 10:17:53 -05:00
David Green
941edbe847 [ARM] VMOVN undef folding
If we insert undef using a VMOVN, we can just use the original value in
three out of the four possible combinations. Using VMOVT into an undef
vector will still require the lanes to be moved, but otherwise the
non-undef value can be used.
2021-02-28 14:44:45 +00:00
Simon Pilgrim
cf1a34fda4 [X86][AVX] Reuse existing VBROADCAST(x) for SCALAR_TO_VECTOR(x)
Similar to what we already do for BROADCASTs of different vector sizes - if we're going to broadcast it anyway might as well reuse it.
2021-02-28 11:37:27 +00:00
David Green
ae50d26182 [ARM] VECTOR_REG_CAST undef -> undef
Propagate undef through VECTOR_REG_CAST nodes, allowing extra
simplification in some patterns.
2021-02-28 11:13:49 +00:00
Wei Mi
510612328d [SampleFDO] Add a cutoff flag to control how many symbols will be included
into profile symbol list.

When a test is unrepresentative of production behavior, a sample profile
collected from production can cause unexpected performance behavior
in the test. To triage such issues, it is useful to have a cutoff flag
to control how many symbols will be included in the profile symbol list
in order to do a binary search.

Differential Revision: https://reviews.llvm.org/D97623
2021-02-27 23:15:31 -08:00
Craig Topper
ac78a77509 [X86] Add avx512f command lines to vec_smulo and vec_umulo. 2021-02-27 21:16:42 -08:00
Chen Zheng
0abf3f2c78 [Debug-Info][NFC] use emitDwarfUnitLength for debug line section
Use emitDwarfUnitLength for debug line, so we can benefit from
overriding of emitDwarfUnitLength inside different streamers.

Reviewed By: ikudrin, dblaikie

Differential Revision: https://reviews.llvm.org/D95998
2021-02-27 22:33:49 -05:00
William S. Moses
fb8af1f3c6 [Attributor] Conditionally delete fns
Allow the attributor to delete functions only if requested

Differential Revision: https://reviews.llvm.org/D97238
2021-02-27 20:37:42 -05:00
Craig Topper
fffdcb057e [X86] Fix a couple comments that said LHS where they meant RHS. NFC 2021-02-27 17:14:17 -08:00
Craig Topper
6edf5e9419 [X86] Add back SSE check prefix for vec-umulo.ll. Regenerate vec-smulo.ll. NFC
Simon modified the check prefixes in these tests while D97160
was pending review. When D97160 was committed it wasn't updated;
it merged cleanly, but didn't comprehend the check prefix changes.
2021-02-27 15:18:09 -08:00
Tony Tye
e33a5d6364 [NFC][AMDGPU] Document the AMDGPU target feature defaults
Document the default for the XNACK and SRAMECC target features for code object V2-V3 and V4.

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D97598
2021-02-27 18:28:15 +00:00