Not all combinations of SEW and LMUL we need to support. For example, we
only need to support [M1, M2, M4, M8] for SEW = 64. There is no need to
define pseudos for PseudoVLSE64MF8, PseudoVLSE64MF4, and PseudoVLSE64MF2.
Differential Revision: https://reviews.llvm.org/D95667
If the shuffle mask can't be widened to match the original extracted element width, see if the upper bits are zeroable - which allows us to extract+zero-extend the smaller extraction.
Various *TargetStreamer.h need formatted_raw_ostream but rely on a
forward declaration of formatted_raw_ostream in MCStreamer.h. This
patch adds forward declarations right in *TargetStreamer.h.
While we are at it, this patch removes the one in MCStreamer.h, where
it is unnecessary.
This patch allows targets to define multiple cost
values for each register so that the cost model
can be more flexible and better used during the
register allocation as per the target requirements.
For AMDGPU the VGPR allocation will be more efficient
if the register cost can be associated dynamically
based on the calling convention.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D86836
SCC was not correctly preserved when entering WWM.
Current lit test was unable to detect this as entry block is
handled differently.
Additionally fix an issue where SCC was unnecessarily preserved
when exiting from WWM to Exact mode.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D95500
This adds generic regbankselect support for G_ASSERT_ZEXT.
It inherits whatever register bank the source was given, always, on all targets.
I think that at the point where we run into these, the source register bank
should be decided.
This also adds some AArch64-specific code which makes sure we can handle
G_ASSERT_ZEXT when deciding on register banks for G_STORE, G_PHI, ... etc.
Differential Revision: https://reviews.llvm.org/D95649
V_SET_INACTIVE is implemented with S_NOT which clobbers SCC.
Mark sure it is marked appropriately.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D95509
We try to do this optimization if we can determine that testing for the
truncated bits with an eq/ne predicate results in the same thing as testing
the lower bits.
Differential Revision: https://reviews.llvm.org/D95645
Some cases may be transformed into 32 bit splats before hitting the boolean statement, which may cause incorrect behaviour and provide XXSPLTI32DX with the incorrect values of splat. The condition was reversed so that the shortcut prevents this problem.
Differential Revision: https://reviews.llvm.org/D95634
These instructions have been removed from the 0.94 bitmanip spec.
We should focus on optimizing the codegen without using them.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D95302
Ensure we check the valuetypes of all the HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) shuffle input ops - there was a copy+paste typo (noticed by MSVC analyzer) that meant we were checking the same input from one of the shuffles twice.
I haven't been able to create a test case for this yet - I don't think its currently possible to create a target/faux binary shuffle that scales to a 2x128 shuffle mask from two different value types.
The MVE VLD2/4 and VST2/4 instructions require the pointer to be aligned
to at least the size of the element type. This adds a check for that
into the ARM lowerInterleavedStore and lowerInterleavedLoad functions,
not creating the intrinsics if they are invalid for the alignment of
the load/store.
Unfortunately this is one of those bug fixes that does effect some
useful codegen, as we were able to sometimes do some nice lowering of
q15 types. But they can cause problem with low aligned pointers.
Differential Revision: https://reviews.llvm.org/D95319
The layout of the stack frame for SVE means that using the frame pointer
rather than the stack pointer for an access to an SVE stack object
removes the need for an additional add to jump over the non-SVE objects.
Likewise the opposite is true for non-SVE stack objects.
This patch allows for the former to be done by having HasFP return true
in the presence of both SVE and non-SVE stack objects, and also fixes a
minor issue whereby the later would not be done for certain offsets.
Unlike VPERMILPS, VPERMILPD can have non-repeating masks in each 128-bit subvector, we weren't accounting for this when folding vperm2f128(vpermilpd(x,c),vpermilpd(y,c)) -> vpermilpd(vperm2f128(x,y),c).
I'm intending to add support for this but wanted to get a minimal fix in first for merging into 12.xx.
Fixes PR48908
Look throught G_PTRTOINT and G_PTR_ADD nodes when looking for constant
offset for buffer stores. This also helps with merging of these instructions
later on.
Differential Revision: https://reviews.llvm.org/D95242
If the APInt returned by BuildVectorSDNode::isConstantSplat() is narrower than
64 bits, the result produced by XXSPLTI32DX is incorrect. The result returned
by the function appears to be incorrect and we'll investigate/fix it in a
follow-up commit. However, since this causes miscompiles, we must
temporarily disable emitting this instruction for such values.
This patch adds support for the full range of vector int-to-float,
float-to-int, and float-to-float conversions on legal types.
Many conversions are supported natively in RVV so are lowered with
patterns. These include conversions between (element) types of the same
size, and those that are half/double the size of the input. When
conversions take place between types that are less than half or more
than double the size we must lower them using sequences of instructions
which go via intermediate types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95447
Before the patch it was possible to trigger a constant bus
violation when folding immediates into a shrunk instruction.
The patch adds a check to enforce the legality of the new operand.
Differential Revision: https://reviews.llvm.org/D95527
In d2927f786e877410d90c1e6f0e0c7d99524529c5, I added patterns
to remove (and X, 31) from sllw/srlw/sraw shift amounts.
There is code in SelectionDAGISel.cpp that knows to use
computeKnownBits to fill in bits of the mask that were removed
by SimplifyDemandedBits based on bits being known zero.
The non-W shift patterns use immbottomxlenset which allows the
mask to have more than log2(xlen) trailing ones, but doesn't
have a call to computeKnownBits to fill in bits of the mask that may
have been cleared by SimplifyDemandedBits.
This patch copies code from X86 to handle more than log2(xlen)
bottom bits set and uses computeKnownBits to fill in missing bits
before counting.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95422
We cannot call LRM::unassign() if LRM::assign() was never called
before, these are symmetrical calls. There are two ways of
assigning a physical register to virtual, via LRM::assign() and
via VRM::assignVirt2Phys(). LRM::assign() will call the VRM to
assign the register and then update LiveIntervalUnion. Inline
spiller calls VRM directly and thus LiveIntervalUnion never gets
updated. A call to LRM::unassign() then asserts about inconsistent
liveness.
We have to note that not all callers of the InlineSpiller even
have LRM to pass, RegAllocPBQP does not have it, so we cannot
always pass LRM into the spiller.
The only way to get into that spiller LRE_DidCloneVirtReg() call
is from LiveRangeEdit::eliminateDeadDefs if we split an LI.
This patch refuses to reassign a LiveInterval created by a split
to workaround the problem. In fact we cannot reassign a spill
anyway as all registers of the needed class are occupied and we
are spilling.
Fixes: SWDEV-267996
Differential Revision: https://reviews.llvm.org/D95489
We are allowed to store 128-bit-wide values using the q registers on AArch64.
GlobalISel was clamping the number of elements in vector stores into 64 bits
instead.
This results in some poor codegen like below:
https://godbolt.org/z/E56dq8
```
; SDAG uses a stp + q registers in both cases here.
define void @float(<16 x float> %val, <16 x float>* %ptr) {
store <16 x float> %val, <16 x float>* %ptr
ret void
}
define void @double(<8 x double> %val, <8 x double>* %ptr) {
store <8 x double> %val, <8 x double>* %ptr
ret void
}
```
This adds similar legalization for vector stores with s8 and s16 elements.
Differential Revision: https://reviews.llvm.org/D95107
RISCVBaseInfo.h belongs to the MC layer, but the Pseudo instructions
are only used by the CodeGen layer. So it makes sense to keep this
table in the CodeGen layer.
-Remove the ISD opcode for READ_VL. Just emit the MachineSDNode directly.
-Move segmented fault first only load intrinsic handling completely to
RISCVISelDAGToDAG.cpp and emit the ReadVL MachineSDNode there
instead of lowering to ISD opcodes first.
Remove the RISCVVMVTs namespace because I don't think it provides
a lot of value. If we change the mappings we'd likely have to add
or remove things from the list anyway.
Add a wrapper around addRegisterClass that can determine the
register class from the fixed size of the type.
Reviewed By: frasercrmck, rogfer01
Differential Revision: https://reviews.llvm.org/D95491
This adds sadd.sat, uadd.sat, ssub.sat and usub.sat costs for AArch64,
similar to how they were recently added for ARM.
Differential Revision: https://reviews.llvm.org/D95292
This patch fixes some crashes coming from
`RISCVISelLowering::getSetCCResultType`, which would occasionally return
an EVT constructed from an invalid MVT, which has a null Type pointer.
The attached test shows this happening currently for some fixed-length
vectors, which hit this issue when the V extension was enabled, even
though they're not legal types under the V extension. The fix was also
pre-emptively extended to scalable vectors which can't be represented as
an MVT, even though a test case couldn't be found for them.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95434
This adds some simple fp16 scalar_to_vector patterns, preventing a
selection failure if this came up.
Differential Revision: https://reviews.llvm.org/D95427
This makes G_SADDE and G_SSUBE legal in preparation for further work
legalizing overflowing operations. It's fine that they don't have an
instruction selector implementation yet, because G_UADDE and G_USUBE are
already legal on AArch64 without an instruction selector implementation. This
completes the set of G_[SU]{ADD,SUB}[EO] operations on AArch64.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D95325
AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a
forward declaration of TargetRegisterClass in InstructionSelector.h.
This patch adds a forward declaration right in
AMDGPUInstructionSelector.h.
While we are at it, this patch removes the one in
InstructionSelector.h, where it is unnecessary.
STRT, STRHT, and STRBT are store instructions and their source register
$Rt should be treated as an input operand instead of an output operand.
This should fix things (e.g., liveness tracking in LivePhysRegs) if
these instructions were used in CodeGen.
Differential Revision: https://reviews.llvm.org/D95074
Revert the change to use APInt::isSignedIntN from
5ff5cf8e057782e3e648ecf5ccf1d9990b53ee90.
Its clear that the games we were playing to avoid the topological
sort aren't working. So just fix it once and for all.
Fixes PR48888.
There are two use cases.
Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).
Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727https://sourceware.org/bugzilla/show_bug.cgi?id=22969)
Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table).
This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.
`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.
Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).
Differential Revision: https://reviews.llvm.org/D85474
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882