Add an `enable_if` to the generic `IntrusiveRefCntPtr` constructors so
that std::is_convertible gives an honest answer when the underlying
pointers cannot be converted. Added `static_assert`s to the test suite
to verify.
Also combine generic constructors from `IntrusiveRefCntPtr<X>&&` and
`const IntrusiveRefCntPtr<X>&`. At first glance this appears to be an
infinite loop, but the real copy/move constructors are spelled out
separately above. Added a unit test to verify.
Differential Revision: https://reviews.llvm.org/D95498
Treat hint instructions like G_ASSERT_ZEXT like COPY instructions in helpers
which walk through copies.
This ensures that instructions like G_ASSERT_ZEXT won't impact any optimizations
that rely on these helpers.
Differential Revision: https://reviews.llvm.org/D95577
Add LLVM to the DW_CFA_LLVM_def_aspace_cfa and
DW_CFA_LLVM_def_aspace_cfa_sf DWARF extensions.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D95640
This de-pessimizes the arguably more usual case of no masked mem intrinsics,
and gets rid of one more Dominator Tree recalculation.
As per llvm/test/CodeGen/X86/opt-pipeline.ll,
there's one more Dominator Tree recalculation left, we could get rid of.
These are widened to a wider UADDE/USUBE, with the overflow value
unused, and with the same synthesis of a new overflow value as for the
O operations.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D95326
This adds a generic opcode which communicates that a type has already been
zero-extended from a narrower type.
This is intended to be similar to AssertZext in SelectionDAG.
For example,
```
%x_was_extended:_(s64) = G_ASSERT_ZEXT %x, 16
```
Signifies that the top 48 bits of %x are known to be 0.
This is useful in cases like this:
```
define i1 @zeroext_param(i8 zeroext %x) {
%cmp = icmp ult i8 %x, -20
ret i1 %cmp
}
```
In AArch64, `%x` must use a 32-bit register, which is then truncated to a 8-bit
value.
If we know that `%x` is already zero-ed out in the relevant high bits, we can
avoid the truncate.
Currently, in GISel, this looks like this:
```
_zeroext_param:
and w8, w0, #0xff ; We don't actually need this!
cmp w8, #236
cset w0, lo
ret
```
While SDAG does not produce the truncation, since it knows that it's
unnecessary:
```
_zeroext_param:
cmp w0, #236
cset w0, lo
ret
```
This patch
- Adds G_ASSERT_ZEXT
- Adds MIRBuilder support for it
- Adds MachineVerifier support for it
- Documents it
It also puts G_ASSERT_ZEXT into its own class of "hint instruction." (There
should be a G_ASSERT_SEXT in the future, maybe a G_ASSERT_ALIGN as well.)
This allows us to skip over hints in the legalizer etc. These can then later
be selected like COPY instructions or removed.
Differential Revision: https://reviews.llvm.org/D95564
This patch adds the ability to evaluate the state machine for CIE and FDE unwind objects and produce a UnwindTable with all UnwindRow objects needed to unwind registers. It will also dump the UnwindTable for each CIE and FDE when dumping DWARF .debug_frame or .eh_frame sections in llvm-dwarfdump or llvm-objdump. This allows users to see what the unwind rows actually look like for a given CIE or FDE instead of just seeing a list of opcodes.
This patch adds new classes: UnwindLocation, RegisterLocations, UnwindRow, and UnwindTable.
UnwindLocation is a class that describes how to unwind a register or Call Frame Address (CFA).
RegisterLocations is a class that tracks registers and their UnwindLocations. It gets populated when parsing the DWARF call frame instruction opcodes for a unwind row. The registers are mapped from their register numbers to the UnwindLocation in a map.
UnwindRow contains the result of evaluating a row of DWARF call frame instructions for the CIE, or a row from a FDE. The CIE can produce a set of initial instructions that each FDE that points to that CIE will use as the seed for the state machine when parsing FDE opcodes. A UnwindRow for a CIE will not have a valid address, whille a UnwindRow for a FDE will have a valid address.
The UnwindTable is a class that contains a sorted (by address) vector of UnwindRow objects and is the result of parsing all opcodes in a CIE, or FDE. Parsing a CIE should produce a UnwindTable with a single row. Parsing a FDE will produce a UnwindTable with one or more UnwindRow objects where all UnwindRow objects have valid addresses. The rows in the UnwindTable will be sorted from lowest Address to highest after parsing the state machine, or an error will be returned if the table isn't sorted. To parse a UnwindTable clients can use the following methods:
static Expected<UnwindTable> UnwindTable::create(const CIE *Cie);
static Expected<UnwindTable> UnwindTable::create(const FDE *Fde);
A valid table will be returned if the DWARF call frame instruction opcodes have no encoding errors. There are a few things that can go wrong during the evaluation of the state machine and these create functions will catch and return them.
Differential Revision: https://reviews.llvm.org/D89845
Some cases may be transformed into 32 bit splats before hitting the boolean statement, which may cause incorrect behaviour and provide XXSPLTI32DX with the incorrect values of splat. The condition was reversed so that the shortcut prevents this problem.
Differential Revision: https://reviews.llvm.org/D95634
and fix a few edge cases that show up in the Swift compiler but
weren't caught by the existing tests. Most notably the old code wasn't
salvaging load operations correctly. The patch also gets rid of the
LoadFromFramePtr argument and replaces it with a more generalized
mechanism.
These instructions have been removed from the 0.94 bitmanip spec.
We should focus on optimizing the codegen without using them.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D95302
This reverts commit ef0dcb506300dc9644e8000c6028d14214be9d97.
This change is causing a lot of compiler crashes inside, sorry I don't have a
small repro/stacktrace with symbols to share right now.
Differential Revision: https://reviews.llvm.org/D95622
Ensure we check the valuetypes of all the HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) shuffle input ops - there was a copy+paste typo (noticed by MSVC analyzer) that meant we were checking the same input from one of the shuffles twice.
I haven't been able to create a test case for this yet - I don't think its currently possible to create a target/faux binary shuffle that scales to a 2x128 shuffle mask from two different value types.
Followup to D92052 as I missed an issue as shown via GCC bug https://gcc.gnu.org/PR97827, namely: (e.g.) ".rodata." implies ELF::SHF_ALLOC.
Crossref:
- D73999 / commit 75af9da755721123e62b45cd0bc0c5e688a9722a
added for LLVM 11 a check that sh_flags and sh_entsize (and sh_type)
changes are an error, in line with GNU assembler.
- D92052 / commit 1deff4009e0ae661b03682901bf6932297ce7ea1
permitted the abbreviated form which many assemblers accept and
GCC generates: while the first .section contains the flags and entsize,
subsequent sections simply contain the name without repeating entsize or
flags.
However, the latter patch missed in the check that some flags are automatically set, e.g. '.rodata." implies ELF::SHF_ALLOC.
Related https://bugs.llvm.org/show_bug.cgi?id=48201
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D94072
The header would include OrcJIT headers in OrcTargetProcess, which is not desired. All common declarations should be in OrcShared.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D95606
The MVE VLD2/4 and VST2/4 instructions require the pointer to be aligned
to at least the size of the element type. This adds a check for that
into the ARM lowerInterleavedStore and lowerInterleavedLoad functions,
not creating the intrinsics if they are invalid for the alignment of
the load/store.
Unfortunately this is one of those bug fixes that does effect some
useful codegen, as we were able to sometimes do some nice lowering of
q15 types. But they can cause problem with low aligned pointers.
Differential Revision: https://reviews.llvm.org/D95319
The layout of the stack frame for SVE means that using the frame pointer
rather than the stack pointer for an access to an SVE stack object
removes the need for an additional add to jump over the non-SVE objects.
Likewise the opposite is true for non-SVE stack objects.
This patch allows for the former to be done by having HasFP return true
in the presence of both SVE and non-SVE stack objects, and also fixes a
minor issue whereby the later would not be done for certain offsets.
Unlike VPERMILPS, VPERMILPD can have non-repeating masks in each 128-bit subvector, we weren't accounting for this when folding vperm2f128(vpermilpd(x,c),vpermilpd(y,c)) -> vpermilpd(vperm2f128(x,y),c).
I'm intending to add support for this but wanted to get a minimal fix in first for merging into 12.xx.
Fixes PR48908
SimplifyCFG is an utility pass, and the fact that it does not
preserve DomTree's, forces it's users to somehow workaround that,
likely by not preserving DomTrees's themselves.
Indeed, simplifycfg pass didn't know how to preserve dominator tree,
it took me just under a month (starting with e1133179587dd895962a2fe4d6eb0cb1e63b5ee2)
do rectify that, now it fully knows how to,
there's likely some problems with that still,
but i've dealt with everything i can spot so far.
I think we now can flip the switch.
Note that this is functionally an NFC change,
since this doesn't change the users to pass in the DomTree,
that is a separate question.
Reviewed By: kuhar, nikic
Differential Revision: https://reviews.llvm.org/D94827