This does three things:
1) Adds G_REV16, G_REV32, and G_REV64. These are equivalent to AArch64rev16,
AArch64rev32, and AArch64rev64 respectively.
2) Adds support for producing G_REV64 in the postlegalizer combiner.
We don't yet legalize any of the shuffles which could give us a G_REV32 or
G_REV16. Since the function for detecting the rev mask is lifted from
AArch64ISelLowering, it should work for G_REV32 and G_REV16 when we get
there. (A sketch of that mask check follows below.)
3) Adds a selection test for a good portion of the patterns imported for the rev
family. The only ones which are not tested are the ones with bitconvert.
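For reference, the sketch mentioned in 2), modeled on AArch64ISelLowering's
isREVMask (names here are illustrative, and the real helper also takes the
value type):

  #include <cstddef>
  #include <vector>

  // Sketch: a G_REV<BlockSize> reverses the elements within each
  // BlockSize-bit block of the vector. EltSize and BlockSize are in
  // bits; -1 marks an undef lane in the mask.
  static bool looksLikeRevMask(const std::vector<int> &Mask,
                               unsigned EltSize, unsigned BlockSize) {
    if (BlockSize <= EltSize || BlockSize % EltSize != 0)
      return false;
    unsigned BlockElts = BlockSize / EltSize; // elements per reversed block
    for (size_t I = 0, E = Mask.size(); I != E; ++I) {
      if (Mask[I] < 0)
        continue; // undef lanes match anything
      // Element I must pull from the mirrored position within its block.
      unsigned BlockStart = I - I % BlockElts;
      if ((unsigned)Mask[I] != BlockStart + (BlockElts - 1 - I % BlockElts))
        return false;
    }
    return true;
  }

E.g. for a <4 x s16> shuffle with BlockSize = 64, the mask (3, 2, 1, 0)
matches and can become a G_REV64.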
This also does a little cleanup and adds a struct for shuffle vector pseudo
matchdata. This lets us keep using `applyShuffleVectorPseudo` rather than
adding a new function.
It should also make it a bit easier to port some of the other masks from
AArch64ISelLowering (e.g. `isZIP_v_undef_Mask` and friends).
Differential Revision: https://reviews.llvm.org/D81112
As noted in a comment on D80937, all of these are specified as unsigned values, but the verifier code was using signed types. Given the practical values involved, the difference in range didn't matter, but we might as well clean it up.
Port the mask-matching logic for uzp1 and uzp2 from AArch64ISelLowering.
Add two custom opcodes: G_UZP1 and G_UZP2.
Produce them in the post-legalizer combiner when the mask checks out.
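A minimal sketch of that mask check, modeled on AArch64ISelLowering's
isUZPMask (illustrative names; the real code also validates the element
count against the vector type):

  #include <cstddef>
  #include <vector>

  // Sketch: uzp1 keeps the even-indexed elements of the concatenated
  // inputs and uzp2 the odd-indexed ones. WhichResult becomes 0 for
  // uzp1 and 1 for uzp2; -1 marks an undef lane.
  static bool looksLikeUZPMask(const std::vector<int> &Mask,
                               unsigned &WhichResult) {
    if (Mask.empty())
      return false;
    WhichResult = (Mask[0] == 1) ? 1 : 0;
    for (size_t I = 0, E = Mask.size(); I != E; ++I) {
      if (Mask[I] < 0)
        continue; // undef lanes are fine
      if ((unsigned)Mask[I] != 2 * I + WhichResult)
        return false;
    }
    return true;
  }

With four output elements, (0, 2, 4, 6) matches uzp1 and (1, 3, 5, 7)
matches uzp2.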
Tests:
- postlegalizer-combiner-uzp.mir verifies that we create G_UZP1 and G_UZP2.
The testcases that check that we create them come from neon-perm.ll.
- select-uzp.mir verifies that we can select G_UZP1 and G_UZP2.
Differential Revision: https://reviews.llvm.org/D81049
Currently, gc.relocates are defined in terms of indices into the statepoint's operand list. Given that the gc args are at the end of a variable-length list of operands, interpreting their indices by hand is a tad challenging. We can simplify the statepoint sequence and improve readability quite a bit by pulling these operands into their own named operand bundle.
This patch defines a new operand bundle tag "gc-live". The semantics of the bundle are the same as the existing gc arguments of a statepoint. This patch simply introduces the definition and codegen for the bundle; future patches will migrate RS4GC to emitting the new form.
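As a hedged sketch of what emitting the new form could look like (names are
illustrative, and RS4GC itself only migrates in future patches), a pass might
attach the live gc pointers via the generic operand bundle APIs:

  #include "llvm/IR/InstrTypes.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Sketch: move the gc args of a statepoint out of its trailing operand
  // list and into a "gc-live" operand bundle. The caller is responsible
  // for RAUW'ing and erasing the original instruction.
  static CallInst *addGCLiveBundle(CallInst *Statepoint,
                                   ArrayRef<Value *> LiveValues) {
    OperandBundleDef GCLive("gc-live", LiveValues);
    CallInst *NewCall = CallInst::Create(Statepoint, {GCLive}, Statepoint);
    NewCall->takeName(Statepoint);
    return NewCall;
  }

The gc.relocates can then index into the bundle's operands rather than into
the tail of the statepoint's full operand list.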
Interestingly, with this done and the recent migration to using deopt and gc-transition bundles, we really don't have much left in the statepoint itself. It really looks like the existing ID and flags fields are redundant; we have (existing!) attributes for all of them. I think we'll be able to reduce the gc.statepoint signature to simply a wrapped call (e.g. actual target and actual arguments).
Differential Revision: https://reviews.llvm.org/D80937
lld's symbol resolution algorithm does not depend on
the order of object files and libraries, but ld.bfd and
gold require dependencies to be listed later on the link line.
Put {{libs}} after {{inputs}} so that e.g. -lpthreads
appears after the object files, not before them.
Differential Revision: https://reviews.llvm.org/D81035
This reverts commit 755a89591528b692315ad0325347e2fd4637271b.
Although I was not able to reproduce any test failures locally,
aheejin was able to reproduce them and found a fix, applied here.
Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.
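A hedged sketch of the idea (illustrative names, not the patch's code):

  #include "llvm/ADT/BitVector.h"
  #include "llvm/MC/MCRegisterInfo.h"

  using namespace llvm;

  // Sketch: track liveness per register unit instead of per register.
  // Two registers overlap iff they share a unit, so a def only has to
  // touch its own units rather than every alias. LiveUnits is sized to
  // MCRI.getNumRegUnits().
  static void markDefined(BitVector &LiveUnits, MCRegister Reg,
                          const MCRegisterInfo &MCRI) {
    for (MCRegUnitIterator U(Reg, &MCRI); U.isValid(); ++U)
      LiveUnits.set(*U);
  }

  static bool overlapsLive(const BitVector &LiveUnits, MCRegister Reg,
                           const MCRegisterInfo &MCRI) {
    for (MCRegUnitIterator U(Reg, &MCRI); U.isValid(); ++U)
      if (LiveUnits.test(*U))
        return true;
    return false;
  }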
Original patch by Matthias Braun. I've been rebasing and fixing it
for almost two years, fixing a few bugs that caused intermediate failures,
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
Summary:
For retcon and retcon.once coroutines we assume that all uses of spills
can be sunk past coro.begin. This simplifies handling of instructions
that escape the address of an alloca.
The current implementation would have issues if the address of the
alloca escapes before coro.begin. (It also has issues with casts before,
and uses of those casts after, the coro.begin instruction.)
  %alloca_addr = alloca i64
  %escape = ptrtoint i64* %alloca_addr to i64
  %hdl = call i8* @llvm.coro.begin(...)
  store i64 %escape, i64* %alloca_addr
rdar://60272809
Subscribers: hiraditya, modocache, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81023
Debug sections will not be linked into the final executable and may contain
ambiguous relocations*. Skipping them avoids both some unnecessary processing
cost and the hassle of dealing with the problematic relocations.
* E.g. __debug_ranges contains non-extern relocations to the ends of functions
that begin with named symbols. Under the usual rules for interpreting non-extern
relocations, these will be incorrectly associated with the following block, or
with no block at all (if there is a gap between one block and the next).
Without this change, names starting with 'L' would get created as
temporary symbols in MCContext::createSymbol.
Some other potential prefixes we considered:
- .L does not work for AIX, as a function starting with L would end
  up with .L as the prefix for its function entry point.
- ..L could work, but it does not play well with the convention on
  AIX that anything starting with '.' is considered an entry point.
- L. could work, but it is not clearly safe enough: suffixes like
  .something can be appended to a plain L, giving L.something, which
  is not necessarily a temporary.
That's why we picked L.. for now.
Differential Revision: https://reviews.llvm.org/D80831
In the function "Analysis.cpp:isInTailCallPosition", it only checks whether
a call is in a tail call position if the call has side effects, access memory
or it is not safe to speculative execute. Therefore, a speculatable function
will not go through tail call position check and improperly tail called when
it is not in a tail-call position. This patch enables tail call position check
for speculatable functions.
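A hedged sketch of the corrected shape of the check (illustrative; the real
isInTailCallPosition also validates the returned value, ABI attributes, etc.):

  #include <iterator>

  #include "llvm/Analysis/ValueTracking.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Sketch: every instruction between the call and the terminator must be
  // harmless, and this walk must now run for every call, including calls
  // to speculatable functions.
  static bool sketchIsInTailCallPosition(const CallInst &Call) {
    const BasicBlock *BB = Call.getParent();
    const Instruction *Term = BB->getTerminator();
    for (auto It = std::next(Call.getIterator()); &*It != Term; ++It)
      if (It->mayHaveSideEffects() || It->mayReadFromMemory() ||
          !isSafeToSpeculativelyExecute(&*It))
        return false;
    return isa<ReturnInst>(Term);
  }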
Differential Revision: https://reviews.llvm.org/D80661
Summary:
Patch D73152 added a new function, LiveVariables::addNewBlock.
This new function adds the registers used by a PHI to the MBB the
corresponding incoming values come from.
But the new function may cause LiveVariables verification to fail when a
source register in the PHI is undef.
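A hedged sketch of the fix (illustrative names; PHI operands come in
(value, block) pairs after the def):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"

  using namespace llvm;

  // Sketch: when collecting the registers that PHIs in SuccBB receive
  // from PredBB, skip undef sources, since they read no register and
  // must not be marked live.
  static void collectPHIUses(MachineBasicBlock &SuccBB,
                             MachineBasicBlock &PredBB,
                             SmallVectorImpl<Register> &Uses) {
    for (MachineInstr &Phi : SuccBB.phis())
      for (unsigned I = 1, E = Phi.getNumOperands(); I != E; I += 2)
        if (Phi.getOperand(I + 1).getMBB() == &PredBB &&
            Phi.getOperand(I).readsReg()) // an undef source reads nothing
          Uses.push_back(Phi.getOperand(I).getReg());
  }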
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D80077
Commit 13f6c81c5d9a ("[BPF] simplify zero extension
with MOV_32_64") tried to use MOV_32_64 instructions
instead of lshift/rshift instructions for zero extension.
This has the benefit of reducing the number of instructions
and may help the verifier too.
But the same commit also removed the old MOV_32_64
pruning, deeming it unsafe since MOV_32_64 does have a
side effect: it zeroes out the top 32 bits of the register.
This caused the following failure in the kernel selftest
test_cls_redirect.o. In the Linux kernel, we have
  struct __sk_buff {
    __u32 data;
    __u32 data_end;
  };
The compiler generates a 32-bit load for __sk_buff->data
and __sk_buff->data_end. But the kernel verifier actually
loads an address (a 64-bit address on a 64-bit kernel) into the
result register. In this particular example, the explicit zext
was not optimized away; it destroyed the top 32 bits of the
address and the verifier rejected the program:
  w2 = *(u32 *)(r1 + 76)
  ...
  r2 = w2 /* MOV_32_64: this will clear top 32bit */
Currently, if the load and the zext are next to each other,
instruction pattern matching can capture this and
avoid the MOV_32_64; e.g., in BPFInstrInfo.td, we have
  def : Pat<(i64 (zextloadi32 ADDRri:$src)),
            (SUBREG_TO_REG (i64 0), (LDW32 ADDRri:$src), sub_32)>;
However, if they are not next to each other, LDW32 and
MOV_32_64 are generated, which may cause the above-mentioned
problem.
The BPF backend already tried to optimize away the pattern
  mov_32_64 + lshift + rshift
but commit 13f6c81c5d9a may generate a mov_32_64 not followed by shifts.
This patch adds the optimization for a lone mov_32_64 too.
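A hedged sketch of the added check (illustrative; the actual peephole lives
in the BPF backend and uses the real opcode enums):

  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/MachineRegisterInfo.h"

  using namespace llvm;

  // Sketch: a lone MOV_32_64 is removable when its 32-bit source is
  // defined by a 32-bit load, because that load already zero-fills the
  // upper half of the destination. Removing the move also avoids
  // clobbering the verifier-provided 64-bit values described above.
  static bool isRedundantMov32_64(const MachineInstr &Mov,
                                  const MachineRegisterInfo &MRI,
                                  unsigned Load32Opcode /* e.g. LDW32 */) {
    Register Src = Mov.getOperand(1).getReg();
    const MachineInstr *Def = MRI.getVRegDef(Src);
    return Def && Def->getOpcode() == Load32Opcode;
  }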
Differential Revision: https://reviews.llvm.org/D81048
If we're only demanding the (shifted) sign bits of the shift source value, then we can use the value directly.
This handles SimplifyDemandedBits/SimplifyMultipleUseDemandedBits for both ISD::SHL and X86ISD::VSHLI.
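A small numeric illustration of why this is sound (not the combine itself):
if the source has at least C + K sign bits, the top K bits of (x << C) equal
the top K bits of x, so a user demanding only those bits can skip the shift.

  #include <cassert>
  #include <cstdint>

  int main() {
    // X = 0b11110100 has 4 sign bits; shift by C = 2, demand K = 2 bits.
    int8_t X = int8_t(0b11110100);
    uint8_t TopMask = 0b11000000; // the two demanded sign bits
    uint8_t Shifted = uint8_t(uint8_t(X) << 2);
    // The shift leaves the demanded bits unchanged, so the user can read
    // X directly.
    assert((Shifted & TopMask) == (uint8_t(X) & TopMask));
    return 0;
  }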
Differential Revision: https://reviews.llvm.org/D80869
Summary:
The standard data emission directives (e.g. .short, .long) in the AIX assembler
have the unintended consequence of aligning their output to the natural byte
boundary. This causes problems because we aren't expecting this behavior from
the Data*bitsDirectives, so the final alignment of data isn't correct in some
cases on AIX.
This patch updates the Data*bitsDirectives to use .vbyte pseudo-ops to emit the
data instead, since we emit the .align directives as needed. We update the existing
testcases and add a test for emission of struct data.
Reviewers: hubert.reinterpretcast, Xiangling_L, jasonliu
Reviewed By: hubert.reinterpretcast, jasonliu
Subscribers: wuzish, nemanjai, hiraditya, kbarton, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80934
Summary:
Add a basic disassembler and regression tests for LEA/LD/ST
instructions. This patch also removes DecoderMethod declarations for
branch and call since those are not implemented in this patch. They
will be added again later. This patch also corrects the DecoderMethod
for one- and two-byte LD/ST instructions.
Differential Revision: https://reviews.llvm.org/D80912
Currently, extracting a lane for a VPValue def is not supported if it is
managed directly by VPTransformState (e.g. because it is created by a
VPInstruction or an external VPValue def).
For now, simply extract the requested lane. In the future, we should
also cache the extracted scalar values, similar to LV.
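A minimal sketch of the fallback (hedged; the real code lives in
VPTransformState::get and works per part):

  #include "llvm/IR/IRBuilder.h"

  using namespace llvm;

  // Sketch: when only a vector value exists for a VPValue, build an
  // extractelement for the requested lane on demand; no caching yet, as
  // noted above.
  static Value *getScalarLane(IRBuilder<> &Builder, Value *VectorValue,
                              unsigned Lane) {
    return Builder.CreateExtractElement(VectorValue, Builder.getInt32(Lane));
  }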
Reviewers: Ayal, rengolin, gilr, SjoerdMeijer
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D80787
Currently all code instances within the matrix lowering pass consider
matrix A to be MxN and B to be NxK, producing C which is MxK. Anyone
interacting with this API after reading the docs but without reading the pass
would expect A: MxK, B: KxN, and C: MxN. These changes bring the documentation
in line with the implementation.
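A hedged illustration of the documented convention after this change (plain
row-major C++ purely to make the index roles concrete; the pass itself
operates on flattened vectors):

  #include <vector>

  // Sketch: A is M x N, B is N x K, and the product C is M x K, matching
  // what the lowering pass implements.
  static std::vector<double> matmul(const std::vector<double> &A,
                                    const std::vector<double> &B,
                                    unsigned M, unsigned N, unsigned K) {
    std::vector<double> C(M * K, 0.0);
    for (unsigned Row = 0; Row < M; ++Row)
      for (unsigned Col = 0; Col < K; ++Col)
        for (unsigned Inner = 0; Inner < N; ++Inner)
          C[Row * K + Col] += A[Row * N + Inner] * B[Inner * K + Col];
    return C;
  }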
One point of concern: the original signature as described in the docs
may be better, or at least more expected. The interface as it was written
reflected other common matrix multiplication interfaces such as BLAS' [1], where
the matrices are MxK, KxN, and MxN respectively. Choosing to honor this requires
changing code and tests instead, but that should mostly be a renaming of variables.
Patch by Braedy Kuzma <braedy@ualberta.ca>
[1] http://www.netlib.org/lapack/explore-html/db/dc9/group__single__blas__level3_gafe51bacb54592ff5de056acabd83c260.html#gafe51bacb54592ff5de056acabd83c260
Reviewers: anemet, LuoYuanke, nicolasvasilache, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D80663
Move TargetFrameLowering.h include to the top of the TargetFrameLoweringImpl.cpp includes (clang-format doesn't do this by default as the filenames don't match).
This adds call site info support for call instructions with a delay slot.
We search for instructions inside the call delay slot which load values
into parameter forwarding registers.
The return address of the call points to the instruction after the call
delay slot, which is not the one immediately after the call instruction.
Patch by Nikola Tesic
Differential revision: https://reviews.llvm.org/D78107