This adds IntrWillReturn to the gfx90a mfma intrinsics, to match all the
other mfma intrinsics, and to llvm.amdgcn.live.mask, to match
llvm.amdgcn.ps.live.
Differential Revision: https://reviews.llvm.org/D97675
The new Darwin backend for LLD is now able to link reasonably large
real-world programs on x86_64. For instance, we have achieved
self-hosting for the X86_64 target, where all LLD tests pass when
building lld with itself on macOS. As such, we would like to make it the
default back-end.
The new port is now named `ld64.lld`, and the old port remains
accessible as `ld64.lld.darwinold`.
This [announcement email][1] has some context. (But note that, unlike
what the email says, we are no longer doing this as part of the LLVM 12
branch cut -- instead we will go into LLVM 13.)
Numerous mechanical test changes were required to make this change; in
the interest of creating something that's reviewable on Phabricator,
I've split out the boring changes into a separate diff (D95905). I plan to
merge its contents with those in this diff before landing.
(@gkm made the original draft of this diff, and he has agreed to let me
take over.)
[1]: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147665.html
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D95204
This patch updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b`,
as well as the cost of `select i1 a, true, b` to that of `or i1 a, b`.
Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe.
This is a step towards removing the unsafe transformation. D93065 has the relevant transformations linked.
These selects should lower to the same assembly as and/or i1, so their costs should be equivalent.
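As an illustration, a minimal IR sketch of the two forms (hypothetical
function names):

    define i1 @select_form(i1 %a, i1 %b) {
      ; poison-safe: when %a is false, poison in %b does not propagate
      %r = select i1 %a, i1 %b, i1 false
      ret i1 %r
    }

    define i1 @and_form(i1 %a, i1 %b) {
      ; propagates poison from %b even when %a is false
      %r = and i1 %a, %b
      ret i1 %r
    }

The `or` case is analogous, with `select i1 %a, i1 true, i1 %b`.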
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D97360
Update the deletion order when destroying VPBasicBlocks. This ensures
recipes that depend on earlier ones in the block are removed first.
Otherwise this may cause issues when recipes have remaining users later
in the block.
If the reference-types feature is enabled, call_indirect will explicitly
reference its corresponding function table via TABLE_NUMBER
relocations against a table symbol.
As before, address-taken functions can also cause the function
table to be created; with reference-types they additionally cause a
symbol table entry to be emitted.
Differential Revision: https://reviews.llvm.org/D90948
Under -O3 and -Ofast, the MASSV conversion prevents the sqrt call from being inlined.
Inline sqrt is faster than a MASSV call on leppc.
Differential Revision: https://reviews.llvm.org/D97487
The expected use case is for frontends to insert this into
shaders that are to be run under a debugger. The shader can
then be resumed or single stepped from the point of the call
under debugger control.
Differential Revision: https://reviews.llvm.org/D97670
I previously copied the nearly identical function from AArch64 into
AMDGPU; fix that duplication here.
Mips and X86 have their own more exotic versions which should be
removed. However, replacing those is better left for a separate patch
since it requires other changes to avoid regressions.
This was reusing the parent function calling convention instead of the
callee. I'm not sure if there's a case where there's an observable
difference.
I previously missed this in b72a23650f573299aec30846fb844c3558921fb8
Moved some of the `sve-getIntrinsicCost-<..>` tests into a single sve-intrinsics.ll
file, and simplified the tests a bit by bundling all the intrinsics in one
function (instead of testing one intrinsic per function). That makes it easier
to see the cost of the intrinsics.
Given a zero input for a udot, an add can be folded in to take the place
of the input, using the addition that the instruction naturally
performs.
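A minimal sketch of the fold, assuming the AArch64 NEON udot intrinsic
as the example:

    ; before: udot with a zero accumulator, then a separate add
    define <4 x i32> @udot_add(<4 x i32> %acc, <16 x i8> %a, <16 x i8> %b) {
      %d = call <4 x i32> @llvm.aarch64.neon.udot.v4i32.v16i8(<4 x i32> zeroinitializer, <16 x i8> %a, <16 x i8> %b)
      ; after the fold, %acc takes the place of the zero input:
      ;   %r = udot(%acc, %a, %b)
      %r = add <4 x i32> %d, %acc
      ret <4 x i32> %r
    }
    declare <4 x i32> @llvm.aarch64.neon.udot.v4i32.v16i8(<4 x i32>, <16 x i8>, <16 x i8>)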
Differential Revision: https://reviews.llvm.org/D97188
See original comment in 560ce2c70fb1fe8e4b9b5e39c54e494a50373ba8
Basically, the default seed value results in fewer collisions, but changes the
iteration order, which matters for a few test cases.
Differential Revision: https://reviews.llvm.org/D97396
As with EXTRACT_SUBVECTOR, INSERT_SUBVECTOR poses a problem
for vector masks, as RVV isn't able to slide mask types around. We choose
instead to bitcast to equivalently-sized i8 types where we can; otherwise we
zero-extend, perform the operation, and truncate back down.
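A minimal sketch of the zero-extend path at the IR level (using today's
generic vector.insert intrinsic name and hypothetical types; the actual
lowering happens during ISel):

    ; widen both masks to i8 elements, insert, then truncate back
    define <vscale x 4 x i1> @insert_mask(<vscale x 4 x i1> %v, <vscale x 1 x i1> %sv) {
      %v.i8  = zext <vscale x 4 x i1> %v to <vscale x 4 x i8>
      %sv.i8 = zext <vscale x 1 x i1> %sv to <vscale x 1 x i8>
      %ins = call <vscale x 4 x i8> @llvm.vector.insert.nxv4i8.nxv1i8(<vscale x 4 x i8> %v.i8, <vscale x 1 x i8> %sv.i8, i64 0)
      %res = trunc <vscale x 4 x i8> %ins to <vscale x 4 x i1>
      ret <vscale x 4 x i1> %res
    }
    declare <vscale x 4 x i8> @llvm.vector.insert.nxv4i8.nxv1i8(<vscale x 4 x i8>, <vscale x 1 x i8>, i64)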
One test was left disabled due to a crash in the legalizer.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97559
This patch fixes a bug where the lowering for INSERT_SUBVECTOR and
EXTRACT_SUBVECTOR would insist on first extracting a register-aligned
LMUL1 vector type before performing the slide up/down. It did so even if
the vector was a fractional LMUL type, in which case the aligned
EXTRACT_SUBVECTOR was invalid.
This issue only occurred for scalable vector types, but a variety of
tests for both scalable and fixed-length vectors have been added to
ensure this does not regress in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97556
This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR
operations under one roof. Consequently, with this patch it is possible to
support any fixed-length subvector insertion, not just "cast-like" ones.
As before, support for the insertion of mask vectors will come in a
separate patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97543
This patch adds support for extracting subvectors from vector masks.
This can be either extracting a scalable vector from another, or a fixed-length
vector from a fixed-length or scalable vector.
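For example, both of these extracts are now supported (a sketch using
today's generic vector.extract intrinsic name):

    ; a scalable mask extracted from a scalable mask
    define <vscale x 2 x i1> @extract_scalable(<vscale x 8 x i1> %m) {
      %r = call <vscale x 2 x i1> @llvm.vector.extract.nxv2i1.nxv8i1(<vscale x 8 x i1> %m, i64 0)
      ret <vscale x 2 x i1> %r
    }
    ; a fixed-length mask extracted from a scalable mask
    define <4 x i1> @extract_fixed(<vscale x 8 x i1> %m) {
      %r = call <4 x i1> @llvm.vector.extract.v4i1.nxv8i1(<vscale x 8 x i1> %m, i64 0)
      ret <4 x i1> %r
    }
    declare <vscale x 2 x i1> @llvm.vector.extract.nxv2i1.nxv8i1(<vscale x 8 x i1>, i64)
    declare <4 x i1> @llvm.vector.extract.v4i1.nxv8i1(<vscale x 8 x i1>, i64)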
Since RVV lacks a way to slide vector masks down on an element-wise
basis and we don't know the true length of the vector registers, in many
cases we must resort to using equivalently-sized i8 vectors to perform
the operation. When this is not possible, we fall back to extending to a
suitable i8 vector.
Support was also added for fixed-length truncation to mask types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97475
This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.
The runtime checks are created up-front in a temporary block, un-linked
from the existing IR, to allow better estimation of their cost. After deciding to
vectorize, the checks are moved back. If we decide not to vectorize, the
temporary block is removed completely.
This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks, it optimistically creates the
runtime checks early and discards them later if we decide not to
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and done up-front, thus preserving the
general code structure. I think delaying (part of) the decision to
vectorize would also make the VPlan migration a bit harder.
One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However, it seems
that the code required to do so is quite manageable.
Reviewed By: lebedev.ri, ebrevnov
Differential Revision: https://reviews.llvm.org/D75980
This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.
Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.
The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64, which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D97459
Currently, dead gc values mentioned in the deopt section are not listed in the gc section
and so are processed separately.
With this CL, all deopt gc values are considered base pointers and processed in the
same way as other gc values.
The fact that a deopt gc pointer is a base pointer was relied on all along, but
it is now explicitly documented by putting the value in SI.Base.
The idea of the patch comes from Philip Reames.
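A minimal sketch, assuming the statepoint-example GC strategy and a
hypothetical callee: a gc pointer that appears only in the deopt bundle
is its own base and is relocated like any other gc value.

    define void @caller(ptr addrspace(1) %obj) gc "statepoint-example" {
      ; %obj is live only in the deopt state; it is recorded as a base
      ; pointer (SI.Base) and relocated across the call
      call void @callee() [ "deopt"(ptr addrspace(1) %obj) ]
      ret void
    }
    declare void @callee()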
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97554
If the deopt operand has an illegal type and we want to use a
register for it, then it needs to be legalized.
This is not currently supported by the legalizer, and it is not actually clear how
to legalize this kind of value.
Instead we just spill such values and use the spill slot location in the statepoint.
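A minimal sketch (hypothetical i129 value, which is illegal on typical
targets, and a hypothetical callee):

    define void @caller(i129 %v) gc "statepoint-example" {
      ; %v has no legal register class, so it is spilled and the
      ; statepoint records the spill slot rather than a register
      call void @callee() [ "deopt"(i129 %v) ]
      ret void
    }
    declare void @callee()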
Originally tests were created by Philip Reames.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97541
Peeking through the AND is only valid if the input to both shifts is
the same. If the inputs are different, then the original pattern
ORs the two values when the masked shift amount is 0. This is OK
if the values are the same, since the OR would be a no-op, which is
why it's OK for rotate.
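A sketch of the rotate pattern in IR: with a single input, the
shift-amount-zero case yields x | x == x, but with two distinct inputs
it would yield x | y instead.

    define i32 @rot(i32 %x, i32 %amt) {
      %lo.amt = and i32 %amt, 31
      %neg    = sub i32 0, %amt
      %hi.amt = and i32 %neg, 31
      %lo = shl i32 %x, %lo.amt
      %hi = lshr i32 %x, %hi.amt
      ; when %amt == 0 this is %x | %x, i.e. just %x, so peeking
      ; through the ANDs is safe here
      %r = or i32 %lo, %hi
      ret i32 %r
    }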
Fixes PR49365 and reverts PR34641
Differential Revision: https://reviews.llvm.org/D97637
Even if the first computeKnownBits call doesn't find any zero
bits, it is possible the other operand has bitwidth-1 leading zeros.
In that case overflow is still impossible, so always call computeKnownBits
for both operands.
SelectionDAG forces us to have a weird ABI for 16-bit values without
legal 16-bit operations, but currently GlobalISel bypasses this and
sometimes ends up using the gfx8+ ABI in some contexts. Make sure
we're testing the normal ABI to avoid a test change in a future patch.
If we insert undef using a VMOVN, we can just use the original value in
three out of the four possible combinations. Using VMOVT into an undef
vector will still require the lanes to be moved, but otherwise the
non-undef value can be used.
Add a cutoff flag to control how many symbols are included
in the profile symbol list.
When a test is unrepresentative of production behavior, a sample profile
collected from production can cause unexpected performance behavior
in the test. To triage such issues, it is useful to have a cutoff flag
to control how many symbols will be included in the profile symbol list,
in order to do a binary search.
Differential Revision: https://reviews.llvm.org/D97623
Use emitDwarfUnitLength for debug line, so we can benefit from
overriding of emitDwarfUnitLength inside different streamers.
Reviewed By: ikudrin, dblaikie
Differential Revision: https://reviews.llvm.org/D95998
Simon modified the check prefixes in these tests while D97160
was pending review. When D97160 was committed, it was updated to
merge cleanly, but the update didn't account for the check prefix changes.
Document the default for the XNACK and SRAMECC target features for code object V2-V3 and V4.
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D97598