We use extract_subvector and insert_subvector to "cast" between
fixed-length and scalable vectors. This patch adds custom C++-based
ISel for the following cases:
fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0
scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0
These result in either an EXTRACT_SUBREG/INSERT_SUBREG for NEON-sized
vectors or a COPY_TO_REGCLASS otherwise.
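Roughly, the NEON-sized extract case looks like the sketch below (the helper
name, the parameterised subregister index and the surrounding structure are
assumptions for illustration, not the actual AArch64ISelDAGToDAG code):
```
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/TargetOpcodes.h"

using namespace llvm;

// A fixed 128-bit EXTRACT_SUBVECTOR at index 0 of a scalable vector becomes
// a plain subregister extract of the Z register; other sizes would instead
// use a COPY_TO_REGCLASS.
static SDNode *selectFixedExtractAtZero(SelectionDAG &DAG, SDNode *N,
                                        unsigned SubRegIdx) {
  EVT VT = N->getValueType(0);
  SDValue In = N->getOperand(0);
  uint64_t Idx = N->getConstantOperandVal(1);
  if (!VT.isFixedLengthVector() || !In.getValueType().isScalableVector() ||
      Idx != 0 || VT.getFixedSizeInBits() != 128)
    return nullptr; // not the NEON-sized case
  SDLoc DL(N);
  SDValue SubReg = DAG.getTargetConstant(SubRegIdx, DL, MVT::i32);
  return DAG.getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, VT, In, SubReg);
}
```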
Differential Revision: https://reviews.llvm.org/D82871
For the GetElementPtr case in function
AddressingModeMatcher::matchOperationAddr
I've changed the code to use the TypeSize class instead of relying
upon the implicit conversion to a uint64_t. As part of this we now
check for scalable types and, if we encounter one, bail out for
now, as the subsequent optimisations don't currently support them.
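As a rough illustration of the TypeSize usage (the helper name is made up;
the real change lives inside matchOperationAddr):
```
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/TypeSize.h"
#include <optional>

using namespace llvm;

// Query the allocation size as a TypeSize and bail out for scalable types
// instead of letting it implicitly convert to uint64_t.
static std::optional<uint64_t> getFixedAllocSize(const DataLayout &DL,
                                                 Type *Ty) {
  TypeSize TS = DL.getTypeAllocSize(Ty);
  if (TS.isScalable())
    return std::nullopt; // later addressing-mode folds assume a fixed offset
  return TS.getFixedValue();
}
```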
This change fixes up all warnings in the following tests:
llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll
llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll
Differential Revision: https://reviews.llvm.org/D83124
Summary:
When splitting a store of a scalable type, the new address is
calculated in SplitVecOp_STORE using a vscale and an add instruction.
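A hedged sketch of that address computation (helper and variable names are
assumed; the in-tree logic sits in SplitVecOp_STORE and its pointer-increment
helper):
```
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Advance the base pointer past the low half of a split scalable store using
// VSCALE plus ADD rather than a fixed byte offset.
static SDValue advancePtrByLoHalf(SelectionDAG &DAG, const SDLoc &DL,
                                  SDValue Ptr, EVT LoMemVT) {
  EVT PtrVT = Ptr.getValueType();
  TypeSize LoBytes = LoMemVT.getStoreSize(); // e.g. "vscale x 16" bytes
  SDValue Inc = DAG.getVScale(
      DL, PtrVT, APInt(PtrVT.getFixedSizeInBits(), LoBytes.getKnownMinValue()));
  return DAG.getNode(ISD::ADD, DL, PtrVT, Ptr, Inc);
}
```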
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: david-arm
Subscribers: tschuett, hiraditya, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83041
Summary:
When splitting a load of a scalable type, the new address is
calculated in SplitVecRes_LOAD using a vscale and an add instruction.
This patch also adds a DAG combiner fold to visitADD for vscale:
- Fold (add (vscale(C0)), (vscale(C1))) to (add (vscale(C0 + C1)))
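Written as a standalone helper, the fold looks roughly like this (the in-tree
version lives in DAGCombiner::visitADD; the names here are assumed):
```
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// (add (vscale C0), (vscale C1)) --> (vscale (C0 + C1))
static SDValue foldAddOfVScales(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                                SDValue N0, SDValue N1) {
  if (N0.getOpcode() != ISD::VSCALE || N1.getOpcode() != ISD::VSCALE)
    return SDValue();
  const APInt &C0 = N0->getConstantOperandAPInt(0);
  const APInt &C1 = N1->getConstantOperandAPInt(0);
  return DAG.getVScale(DL, VT, C0 + C1);
}
```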
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: david-arm
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82792
There are now more SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.
Differential Revision: https://reviews.llvm.org/D82943
In an earlier commit 584d0d5c1749c13625a5d322178ccb4121eea610 I
added functionality to allow AArch64 CodeGen support for falling
back to DAG ISel when Global ISel encounters scalable vector
types. However, it seems that we were not falling back early
enough as llvm::getLLTForType was still being invoked for scalable
vector types.
I've added a new fallback function to the call lowering class in
order to catch this problem early enough, rather than wait for
lowerFormalArguments to reject scalable vector types.
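The idea is roughly the following (the exact hook name and signature are
assumptions):
```
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Bail out of GlobalISel before any LLTs are built if the function takes or
// returns scalable vectors.
static bool shouldFallBackToDAGISel(const Function &F) {
  if (isa<ScalableVectorType>(F.getReturnType()))
    return true;
  return any_of(F.args(), [](const Argument &A) {
    return isa<ScalableVectorType>(A.getType());
  });
}
```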
Differential Revision: https://reviews.llvm.org/D82524
This patch fixes all remaining warnings in:
llvm/test/CodeGen/AArch64/sve-trunc.ll
llvm/test/CodeGen/AArch64/sve-vector-splat.ll
I hit some warnings related to getCopyToPartsVector. I fixed two
issues:
1. In widenVectorToPartType() we assumed that we'd always be
using BUILD_VECTOR nodes to expand from one vector type to another,
which is incorrect for scalable vector types. I've fixed this for now
by simply bailing out immediately for scalable vectors.
2. In getCopyToPartsVector() I've changed the code to compare
the element counts of different types.
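A hedged sketch of both fixes combined (variable and helper names assumed;
the real code is inside getCopyToPartsVector/widenVectorToPartType):
```
#include "llvm/CodeGen/ValueTypes.h"

using namespace llvm;

// (1) BUILD_VECTOR-based widening is only valid for fixed-length vectors.
// (2) Compare full element counts so a scalable count never silently matches
//     a fixed one.
static bool canWidenByPaddingElements(EVT ValueVT, EVT PartVT) {
  if (ValueVT.isScalableVector() || PartVT.isScalableVector())
    return false;
  return PartVT.getVectorElementCount().getKnownMinValue() >=
         ValueVT.getVectorElementCount().getKnownMinValue();
}
```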
Differential Revision: https://reviews.llvm.org/D83028
AArch64ExpandPseudo::expand_DestructiveOp contains an assert to
ensure the destructive operand's register is unique. However,
this is only required when pseudo expansion emits a movprfx.
A simple example when a movprfx is not required is
Z0 = FADD_ZPZZ_UNDEF_S P0, Z0, Z0
which expands to an unprefixed FADD_ZPmZ_S instruction.
This patch moves the assert to the places where a movprfx is emitted.
Differential Revision: https://reviews.llvm.org/D83029
Summary:
Teach LLVM to recognize the above pattern, which is usually a
transformation of (a + b + 1) >> 1, where the operands are either
signed or unsigned types.
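For reference, this is the kind of scalar source that produces such a pattern
(illustrative only; the operands are widened here so the +1 cannot overflow):
```
#include <cstdint>

// Rounding halving add of two unsigned bytes: (a + b + 1) >> 1.
uint8_t rhadd_u8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((uint16_t(a) + uint16_t(b) + 1u) >> 1);
}
```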
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82669
This patch upstreams support for the Arm-v8 Cortex-A77
processor for AArch64 and ARM.
In detail:
- Adding cortex-a77 as a cpu option for aarch64 and arm targets in clang
- Cortex-A77 CPU name and ProcessorModel in llvm
Details of the CPU can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a77
and a similar submission to GCC can be found here:
e0664b7a63
The following people contributed to this patch:
- Luke Geeson
- Mikhail Maltsev
Reviewers: t.p.northover, dmgreen, ostannard, SjoerdMeijer
Reviewed By: dmgreen
Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D82887
SelectionDAGBuilder converts logic-of-compares into multiple branches based
on a boolean TLI setting in isJumpExpensive(). But that probably never
considered the pattern of extracted bools from a vector compare - it seems
unlikely that we would want to turn vector logic into control-flow.
The motivating x86 reduction case is shown in PR44565:
https://bugs.llvm.org/show_bug.cgi?id=44565
...and that test shows the expected improvement from using pmovmsk codegen.
For AArch64, I modified the test to include an extra op because the simpler
test gets transformed by a codegen invocation of SimplifyCFG.
Differential Revision: https://reviews.llvm.org/D82602
This patch puts the _ZERO pseudos and corresponding patterns
under the predicate 'UseExperimentalZeroingPseudos', so that they
can be enabled/disabled through compile flags.
This is done because the zeroing pseudos use MOVPRFX to do merging of
the inactive lanes, but it depends on the uarch whether this operation
is actually merged with the destructive operation. If not, it may be
more profitable to use a SELECT and to give the compiler the freedom to
schedule these instructions as normal, rather than keeping them bundled
together. Additionally, this feature is not yet fully implemented and
there are still known bugs (see D80410) that need to be resolved before
the 'experimental' can be dropped from the name.
Reviewers: paulwalker-arm, cameron.mcinally, efriedma
Reviewed By: paulwalker-arm
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82780
I have added CHECK lines to the following tests:
llvm/test/CodeGen/AArch64/sve-breakdown-scalable-vectortype.ll
llvm/test/CodeGen/AArch64/sve-calling-convention-tuple-types.ll
llvm/test/CodeGen/AArch64/sve-intrinsics-create-tuple.ll
llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll
since they are now free of warnings related to invalid use of
EVT::getVectorNumElements() and VectorType::getNumElements().
Differential Revision: https://reviews.llvm.org/D82957
There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics
that shouldn't really have been there (I suspect this was a remnant from when
we expected the wider vector always to have come from a vector CONCAT).
When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:
extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')
This patch fixes both issues.
Reviewers: david-arm, efriedma, spatel
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82910
In visitSCALAR_TO_VECTOR we try to optimise cases such as:
scalar_to_vector (extract_vector_elt %x)
into vector shuffles of %x. However, it led to numerous warnings
when %x is a scalable vector type, so for now I've changed the
code to only perform the combination on fixed length vectors.
Although we probably could change the code to work with scalable
vectors in certain cases, without a proper profit analysis it
doesn't seem worth it at the moment.
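The guard amounts to something like the following (the helper name is assumed;
the real check sits inside visitSCALAR_TO_VECTOR):
```
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Only try to turn scalar_to_vector (extract_vector_elt %x) into a shuffle
// when %x is a fixed-length vector.
static bool canFoldExtractIntoShuffle(SDValue Scalar) {
  if (Scalar.getOpcode() != ISD::EXTRACT_VECTOR_ELT)
    return false;
  return !Scalar.getOperand(0).getValueType().isScalableVector();
}
```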
This change fixes up one of the warnings in:
llvm/test/CodeGen/AArch64/sve-merging-stores.ll
I've also added a simplified version of the same test to:
llvm/test/CodeGen/AArch64/sve-fp.ll
which already has checks for no warnings.
Differential Revision: https://reviews.llvm.org/D82872
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.
Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.
Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.
Reviewed By: nickdesaulniers, void
Differential Revision: https://reviews.llvm.org/D79794
This prevents the outlined functions from pulling in a lot of unnecessary code
in our downstream libraries/linker, which stops outlining from making code size
worse in C++ code built without exceptions.
Differential Revision: https://reviews.llvm.org/D57254
We currently lower SDIV to SDIV_MERGE_OP1. This forces the value
for inactive lanes in a way that can hamper register allocation;
however, the lowering has no requirement for inactive lanes.
Instead, this patch replaces SDIV_MERGE_OP1 with SDIV_PRED, thus
freeing the register allocator. Once that is done, the only user of
SDIV_MERGE_OP1 is intrinsic lowering, so I've removed the node
and now perform ISel on the intrinsic directly. This also allows
us to implement MOVPRFX-based zeroing in the same manner as SUB.
This patch also renames UDIV_MERGE_OP1 and [F]ADD_MERGE_OP1 for
the same reason, but in the ADD cases the ISel code is already
as required.
Differential Revision: https://reviews.llvm.org/D82783
When trying to reduce a BUILD_VECTOR to a SHUFFLE_VECTOR it's
important that we carefully check the vector types that led to
that BUILD_VECTOR. In the test I have attached to this commit
there is a case where the results of two SVE faddv instructions
are being stored to consecutive memory locations. With my fix,
as part of merging those stores we discover that each BUILD_VECTOR
element came from an extract of an SVE vector element and
therefore bail out.
Differential Revision: https://reviews.llvm.org/D82564
Summary:
performPostLD1Combine will introduce either a LD1LANEpost
or LD1DUPpost node, which will cause selection failure if the
return type is a scalable vector.
Reviewers: sdesmalen, c-rhodes, efriedma
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82670
The translation of cmpxchg added by
9481399c0fd2c198c81b92636c0dcff7d4c41df2 specifically skipped weak
cmpxchg due to not understanding the meaning. Weak cmpxchg was added
in 420a216817def01816186910a2e35885c9201951. As explained in the
commit message, the weak mode is implicit in how
ATOMIC_CMP_SWAP_WITH_SUCCESS is lowered. If it's expanded to a regular
ATOMIC_CMP_SWAP, it's replaced with a strong cmpxchg.
This handling seems weird to me, but this was already following the
DAG behavior. I would expect the strong IR instruction to not have the
boolean output. Failing that, I might expect the IRTranslator to emit
ATOMIC_CMP_SWAP and a constant for the boolean.
The original patch was reverted in
ff5ccf258e
because the C tests had accidentally been omitted.
This patch is an NFC reland of https://reviews.llvm.org/D82501, together
with the SVE ACLE tests for the C intrinsics of svreinterpret for brain
float types.
This reverts commit a15722c5ce4759c12960fe434ee6bd8aac70bb16.
The commit has to be reverted because I accidentally submitted
https://reviews.llvm.org/D82501 without the C tests that were added in
an earlier version of the patch.
Summary:
Permutation and selection bfloat16 intrinsic patterns should be guarded
on the feature flag `+bf16`. Missed in D82182 and D80850.
Reviewers: sdesmalen, fpetrogalli, kmclaughlin, efriedma
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D82492
The complex pattern for extended shift offsets only allows sxtw as the extend,
not sxth. Our equivalent function was not rejecting SXTH, so we
were miscompiling. This was exposed by D81992.
Given this:
```
%x:_(<n x sK>) = G_BUILD_VECTOR %lane, ...
...
%y:_(<n x sK>) = G_SHUFFLE_VECTOR %x(<n x sK>), %foo, shufflemask(0, 0, ...)
```
We can produce:
```
%y:_(<n x sK>) = G_DUP %lane(sK)
```
Doesn't seem to be too common, but AArch64ISelLowering attempts to do this
before trying to produce a DUPLANE. Might as well port it.
Also make it so that when the splat has an undef mask, we try setting it to
0. SDAG does this, and it makes sure that when we get the build vector operand,
we actually get a source operand.
Differential Revision: https://reviews.llvm.org/D81979
There's more smarts in AArch64ISelLowering that we don't have yet, but this
change incrementally improves some of the more common patterns. I think future
iterations will want to use some combination of PostLegalizerCombiner and the
selector to catch the other cases.
Differential Revision: https://reviews.llvm.org/D82340
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
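As a hedged illustration of the two alternatives (modern spellings;
GlobalObject::getAlign() is the MaybeAlign-returning form of the
declared-alignment query):
```
#include "llvm/IR/GlobalObject.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/Alignment.h"

using namespace llvm;

// Alignment of the pointer value itself versus the explicitly declared
// alignment of the object, queried with the appropriate APIs.
static void queryAlignments(const GlobalObject &GO, const Module &M) {
  Align PtrAlign = GO.getPointerAlignment(M.getDataLayout());
  MaybeAlign DeclAlign = GO.getAlign();
  (void)PtrAlign;
  (void)DeclAlign;
}
```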
Differential Revision: https://reviews.llvm.org/D80368
Implement them on top of sdiv/udiv, similar to what we do for integer
types.
Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.
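The scalar form of the expansion, for illustration:
```
#include <cstdint>

// rem computed from the corresponding division: a - (a / b) * b.
// The mul + sub pair is what could later fold into an mls.
int64_t srem_via_sdiv(int64_t a, int64_t b) {
  int64_t q = a / b; // sdiv
  return a - q * b;  // mul + sub
}
```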
Differential Revision: https://reviews.llvm.org/D81511
This patch contains:
- Support in LLVM CodeGen for bfloat16 types for ld2/3/4 and st2/3/4.
- New bfloat16 ACLE builtins for svld(2|3|4)[_vnum] and svst(2|3|4)[_vnum]
Reviewers: stuij, efriedma, c-rhodes, fpetrogalli
Reviewed By: fpetrogalli
Tags: #clang, #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D82187
Summary:
This patch adds base support for code generating fixed length
vector operations targeting a known SVE vector length. To achieve
this we lower fixed length vector operations to equivalent scalable
vector operations, whereby SVE predication is used to limit the
elements processed to those present within the fixed length vector.
Specifically this patch implements load and store operations, which
get lowered to their masked counterparts as follows:
V = load(Addr) =>
V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))
store(V, Addr) =>
masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr)
Reviewers: rengolin, efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80385