1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

3875 Commits

Author SHA1 Message Date
Paul Walker
f7143dfdb4 [SVE] Custom ISel for fixed length extract/insert_subvector.
We use extact_subvector and insert_subvector to "cast" between
fixed length and scalable vectors.  This patch adds custom c++
based ISel for the following cases:

  fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0
  scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0

Which result in either EXTRACT_SUBREG/INSERT_SUBREG for NEON sized
vectors or COPY_TO_REGCLASS otherwise.

Differential Revision: https://reviews.llvm.org/D82871
2020-07-08 09:49:28 +00:00
David Sherwood
bd3697d837 [CodeGen] Fix warnings in sve-ld1-addressing-mode-reg-imm.ll
For the GetElementPtr case in function
  AddressingModeMatcher::matchOperationAddr
I've changed the code to use the TypeSize class instead of relying
upon the implicit conversion to a uint64_t. As part of this we now
check for scalable types and if we encounter one just bail out for
now as the subsequent optimisations doesn't currently support them.

This changes fixes up all warnings in the following tests:

  llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll
  llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

Differential Revision: https://reviews.llvm.org/D83124
2020-07-08 09:16:00 +01:00
Kerry McLaughlin
a7ecd67d40 [SVE][CodeGen] Legalisation of unpredicated store instructions
Summary:
When splitting a store of a scalable type, the new address is
calculated in SplitVecOp_STORE using a vscale and an add instruction.

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83041
2020-07-07 11:47:10 +01:00
Kerry McLaughlin
884e13dafb [SVE][CodeGen] Legalisation of unpredicated load instructions
Summary:
When splitting a load of a scalable type, the new address is
calculated in SplitVecRes_LOAD using a vscale and an add instruction.

This patch also adds a DAG combiner fold to visitADD for vscale:
 - Fold (add (vscale(C0)), (vscale(C1))) to (add (vscale(C0 + C1)))

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82792
2020-07-07 11:05:03 +01:00
David Sherwood
f96f4eb390 [SVE] Add more warnings checks to clang and LLVM SVE tests
There are now more SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.

Differential Revision: https://reviews.llvm.org/D82943
2020-07-07 09:33:20 +01:00
David Sherwood
654ddede3d [SVE][CodeGen] Fix bug when falling back to DAG ISel
In an earlier commit 584d0d5c1749c13625a5d322178ccb4121eea610 I
added functionality to allow AArch64 CodeGen support for falling
back to DAG ISel when Global ISel encounters scalable vector
types. However, it seems that we were not falling back early
enough as llvm::getLLTForType was still being invoked for scalable
vector types.

I've added a new fallback function to the call lowering class in
order to catch this problem early enough, rather than wait for
lowerFormalArguments to reject scalable vector types.

Differential Revision: https://reviews.llvm.org/D82524
2020-07-07 09:23:04 +01:00
David Sherwood
204220c193 [CodeGen] Fix warnings in sve-vector-splat.ll and sve-trunc.ll
This patch fixes all remaining warnings in:

  llvm/test/CodeGen/AArch64/sve-trunc.ll
  llvm/test/CodeGen/AArch64/sve-vector-splat.ll

I hit some warnings related to getCopyPartsToVector. I fixed two
issues:

1. In widenVectorToPartType() we assumed that we'd always be
using BUILD_VECTOR nodes to expand from one vector type to another,
which is incorrect for scalable vector types. I've fixed this for now
by simply bailing out immediately for scalable vectors.
2. In getCopyToPartsVector() I've changed the code to compare
the element counts of different types.

Differential Revision: https://reviews.llvm.org/D83028
2020-07-07 09:21:47 +01:00
Simon Pilgrim
db4d0a9002 Regenerate neon copy tests. NFC.
To simplify the diffs in a patch in development.
2020-07-06 13:58:25 +01:00
Paul Walker
2d7a8bc6d4 [SVE] Fix invalid assert in expand_DestructiveOp.
AArch64ExpandPseudo::expand_DestructiveOp contains an assert to
ensure the destructive operand's register is unique.  However,
this is only required when psuedo expansion emits a movprfx.

A simple example when a movprfx is not required is
  Z0 = FADD_ZPZZ_UNDEF_S P0, Z0, Z0
which expands to an unprefixed FADD_ZPmZ_S instruction.

This patch moves the assert to the places where a movprfx is emitted.

Differential Revision: https://reviews.llvm.org/D83029
2020-07-04 09:21:40 +00:00
Petre-Ionut Tudor
235879e1a0 [ARM] Generate [SU]RHADD from (b - (~a)) >> 1
Summary:
Teach LLVM to recognize the above pattern, which is usually a
transformation of (a + b + 1) >> 1, where the operands are either
signed or unsigned types.

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82669
2020-07-03 16:00:06 +01:00
Luke Geeson
920620bd6d [ARM] Add Cortex-A77 Support for Clang and LLVM
This patch upstreams support for the Arm-v8 Cortex-A77
processor for AArch64 and ARM.

In detail:
- Adding cortex-a77 as a cpu option for aarch64 and arm targets in clang
- Cortex-A77 CPU name and ProcessorModel in llvm

details of the CPU can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a77

and a similar submission to GCC can be found here:
e0664b7a63

The following people contributed to this patch:
- Luke Geeson
- Mikhail Maltsev

Reviewers: t.p.northover, dmgreen, ostannard, SjoerdMeijer

Reviewed By: dmgreen

Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82887
2020-07-03 13:00:54 +01:00
Sanjay Patel
6e12757f99 [SelectionDAG] don't split branch on logic-of-vector-compares
SelectionDAGBuilder converts logic-of-compares into multiple branches based
on a boolean TLI setting in isJumpExpensive(). But that probably never
considered the pattern of extracted bools from a vector compare - it seems
unlikely that we would want to turn vector logic into control-flow.

The motivating x86 reduction case is shown in PR44565:
https://bugs.llvm.org/show_bug.cgi?id=44565
...and that test shows the expected improvement from using pmovmsk codegen.

For AArch64, I modified the test to include an extra op because the simpler
test gets transformed by a codegen invocation of SimplifyCFG.

Differential Revision: https://reviews.llvm.org/D82602
2020-07-02 17:05:24 -04:00
Sander de Smalen
67ab949978 [AArch64][SVE] Put zeroing pseudos and patterns under flag.
This patch puts the _ZERO pseudos and corresponding patterns
under the predicate 'UseExperimentalZeroingPseudos', so that they
can be enabled/disabled through compile flags.

This is done because the zeroing pseudos use MOVPRFX to do merging of
the inactive lanes, but it depends on the uarch whether this operation
is actually merged with the destructive operation. If not, it may be
more profitable to use a SELECT and to give the compiler the freedom to
schedule these instructions as normal, rather than keeping them bundled
together. Additionally, this feature is not yet fully implemented and
there are still known bugs (see D80410) that need to be resolved before
the 'experimental' can be dropped from the name.

Reviewers: paulwalker-arm, cameron.mcinally, efriedma

Reviewed By: paulwalker-arm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82780
2020-07-02 14:24:33 +01:00
Kerry McLaughlin
07d934b054 [AArch64][SVE] Add reg+imm addressing mode for unpredicated stores
Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82985
2020-07-02 12:00:01 +01:00
David Sherwood
4591bf7068 [SVE] Add warnings checks in four more LLVM SVE tests
I have added CHECK lines to the following tests:

  llvm/test/CodeGen/AArch64/sve-breakdown-scalable-vectortype.ll
  llvm/test/CodeGen/AArch64/sve-calling-convention-tuple-types.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-create-tuple.ll
  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll

since they are now free of warnings related to invalid use of
EVT::getVectorNumElements() and VectorType::getNumElements().

Differential Revision: https://reviews.llvm.org/D82957
2020-07-02 10:43:17 +01:00
Sander de Smalen
58e4673769 [CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR
There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics,
that shouldn't really have been there (I suspect this was a remnant from when
we expected the wider vector always to have come from a vector CONCAT).

When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:

      extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')

This patch fixes both issues.

Reviewers: david-arm, efriedma, spatel

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82910
2020-07-02 10:16:43 +01:00
Sander de Smalen
b6c6989aee [AArch64][SVE] Add unpred load/store patterns for bf16 types
Reviewers: kmclaughlin, c-rhodes, efriedma

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82909
2020-07-02 10:01:24 +01:00
David Sherwood
a2113da008 [CodeGen] Fix warnings in DAGCombiner::visitSCALAR_TO_VECTOR
In visitSCALAR_TO_VECTOR we try to optimise cases such as:

  scalar_to_vector (extract_vector_elt %x)

into vector shuffles of %x. However, it led to numerous warnings
when %x is a scalable vector type, so for now I've changed the
code to only perform the combination on fixed length vectors.
Although we probably could change the code to work with scalable
vectors in certain cases, without a proper profit analysis it
doesn't seem worth it at the moment.

This change fixes up one of the warnings in:

  llvm/test/CodeGen/AArch64/sve-merging-stores.ll

I've also added a simplified version of the same test to:

  llvm/test/CodeGen/AArch64/sve-fp.ll

which already has checks for no warnings.

Differential Revision: https://reviews.llvm.org/D82872
2020-07-01 18:47:13 +01:00
James Y Knight
af0734bc33 Change the INLINEASM_BR MachineInstr to be a non-terminating instruction.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.

Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.

Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D79794
2020-07-01 12:51:50 -04:00
David Green
d0933009b5 [Outliner] Set nounwind for outlined functions
This prevents the outlined functions from pulling in a lot of unnecessary code
in our downstream libraries/linker. Which stops outlining making codesize
worse in c++ code with no-exceptions.

Differential Revision: https://reviews.llvm.org/D57254
2020-07-01 17:18:34 +01:00
Kerry McLaughlin
11891a6394 [AArch64][SVE] Add reg+imm addressing mode for unpredicated loads
Reviewers: efriedma, sdesmalen, david-arm

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82893
2020-07-01 10:33:56 +01:00
Paul Walker
411d35313e [SVE] Relax merge requirement for IR based divides.
We currently lower SDIV to SDIV_MERGE_OP1. This forces the value
for inactive lanes in a way that can hamper register allocation,
however, the lowering has no requirement for inactive lanes.

Instead this patch replaces SDIV_MERGE_OP1 with SDIV_PRED thus
freeing the register allocator. Once done the only user of
SDIV_MERGE_OP1 is intrinsic lowering so I've removed the node
and perform ISel on the intrinsic directly. This also allows
us to implement MOVPRFX based zeroing in the same manner as SUB.

This patch also renames UDIV_MERGE_OP1 and [F]ADD_MERGE_OP1 for
the same reason but in the ADD cases the ISel code is already
as required.

Differential Revision: https://reviews.llvm.org/D82783
2020-07-01 08:18:42 +00:00
David Sherwood
4bd32c5a01 [SVE][CodeGen] Fix bug in DAGCombiner::reduceBuildVecToShuffle
When trying to reduce a BUILD_VECTOR to a SHUFFLE_VECTOR it's
important that we carefully check the vector types that led to
that BUILD_VECTOR. In the test I have attached to this commit
there is a case where the results of two SVE faddv instructions
are being stored to consecutive memory locations. With my fix,
as part of merging those stores we discover that each BUILD_VECTOR
element came from an extract of a SVE vector element and
therefore bail out.

Differential Revision: https://reviews.llvm.org/D82564
2020-06-30 07:28:15 +01:00
Cullen Rhodes
89c6c9f994 [AArch64][SVE] Add bfloat16 to outstanding tuple vector intrinsics
Summary:
* svget2/3/4
* svset2/3/4
* svcreate2/3/4
* svundef/2/3/4

Reviewers: sdesmalen, kmclaughlin, fpetrogalli, efriedma

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D82665
2020-06-29 17:00:58 +00:00
Francesco Petrogalli
b0f83ed2ae [sve][acle] Implement some of the C intrinsics for brain float.
Summary:
The following intrinsics have been extended to support brain float types:

svbfloat16_t svclasta[_bf16](svbool_t pg, svbfloat16_t fallback, svbfloat16_t data)
bfloat16_t svclasta[_n_bf16](svbool_t pg, bfloat16_t fallback, svbfloat16_t data)
bfloat16_t svlasta[_bf16](svbool_t pg, svbfloat16_t op)

svbfloat16_t svclastb[_bf16](svbool_t pg, svbfloat16_t fallback, svbfloat16_t data)
bfloat16_t svclastb[_n_bf16](svbool_t pg, bfloat16_t fallback, svbfloat16_t data)
bfloat16_t svlastb[_bf16](svbool_t pg, svbfloat16_t op)

svbfloat16_t svdup[_n]_bf16(bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_m(svbfloat16_t inactive, svbool_t pg, bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_x(svbool_t pg, bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_z(svbool_t pg, bfloat16_t op)

svbfloat16_t svdupq[_n]_bf16(bfloat16_t x0, bfloat16_t x1, bfloat16_t x2, bfloat16_t x3, bfloat16_t x4, bfloat16_t x5, bfloat16_t x6, bfloat16_t x7)
svbfloat16_t svdupq_lane[_bf16](svbfloat16_t data, uint64_t index)

svbfloat16_t svinsr[_n_bf16](svbfloat16_t op1, bfloat16_t op2)

Reviewers: sdesmalen, kmclaughlin, c-rhodes, ctetreau, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82345
2020-06-29 16:09:08 +00:00
Cullen Rhodes
f038a9beac [AArch64][SVE] Add bfloat16 support to svext intrinsic
Reviewers: sdesmalen, kmclaughlin, efriedma, david-arm, fpetrogalli

Reviewed By: sdesmalen, fpetrogalli

Differential Revision: https://reviews.llvm.org/D82391
2020-06-29 11:08:38 +00:00
Kerry McLaughlin
df0704f42c [AArch64][SVE] Bail out of performPostLD1Combine for scalable types
Summary:
performPostLD1Combine will introduce either a LD1LANEpost
or LD1DUPpost node, which will cause selection failure if the
return type is a scalable vector.

Reviewers: sdesmalen, c-rhodes, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82670
2020-06-29 11:59:53 +01:00
Matt Arsenault
c0c2638b8d GlobalISel: Don't fail translate on weak cmpxchg
The translation of cmpxchg added by
9481399c0fd2c198c81b92636c0dcff7d4c41df2 specifically skipped weak
cmpxchg due to not understanding the meaning. Weak cmpxchg was added
in 420a216817def01816186910a2e35885c9201951. As explained in the
commit message, the weak mode is implicit in how
ATOMIC_CMP_SWAP_WITH_SUCCESS is lowered. If it's expanded to a regular
ATOMIC_CMP_SWAP, it's replaced with a strong cmpxchg.

This handling seems weird to me, but this was already following the
DAG behavior. I would expect the strong IR instruction to not have the
boolean output. Failing that, I might expect the IRTranslator to emit
ATOMIC_CMP_SWAP and a constant for the boolean.
2020-06-26 17:52:18 -04:00
Francesco Petrogalli
142ed95012 [sve][acle] Recommit https://reviews.llvm.org/D82501
The original patch was reverted in
ff5ccf258e
as it was missing the C tests that got accidentally missing.

This patch is a NFC of https://reviews.llvm.org/D82501, together with
the SVE ACLE tests for the C intrinsics of svreinterpret for brain
float types.
2020-06-26 20:45:29 +00:00
Francesco Petrogalli
6655379f18 Revert "[sve][acle] Add reinterpret intrinsics for brain float."
This reverts commit a15722c5ce4759c12960fe434ee6bd8aac70bb16.

The commmit has to be reverted because I accidentally submit
https://reviews.llvm.org/D82501 without the C tests that were added in
an early version of the patch.
2020-06-26 20:19:49 +00:00
Paul Walker
2467d01420 [SVE] Code generation for fixed length vector adds.
Summary:
Teach LowerToPredicatedOp to lower fixed length vector operations.

Add AArch64ISD nodes and isel patterns for predicated integer
and floating point adds.

Together this enables SVE code generation for fixed length vector adds.

Reviewers: rengolin, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82483
2020-06-26 19:54:41 +00:00
Sanjay Patel
5dcf739606 [AArch64] add vector test for merged condition branching; NFC 2020-06-26 14:22:11 -04:00
Francesco Petrogalli
fb4e03c951 [sve][acle] Add reinterpret intrinsics for brain float.
Reviewers: kmclaughlin, efriedma, ctetreau, sdesmalen, david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82501
2020-06-26 15:20:58 +00:00
Kerry McLaughlin
eb10451902 [AArch64][SVE] Add bfloat16 support to store intrinsics
Summary:
Bfloat16 support added for the following intrinsics:
 - ST1
 - STNT1

Reviewers: sdesmalen, c-rhodes, fpetrogalli, efriedma, stuij, david-arm

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82448
2020-06-26 11:05:56 +01:00
Kerry McLaughlin
c4cb941ff9 [AArch64][SVE] Predicate bfloat16 load patterns with HasBF16
Reviewers: sdesmalen, c-rhodes, efriedma, fpetrogalli

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82464
2020-06-26 10:38:24 +01:00
Cullen Rhodes
af0ff0941f [AArch64][SVE] Guard perm and select bfloat16 intrinsic patterns
Summary:
Permutation and selection bfloat16 intrinsic patterns should be guarded
on the feature flag `+bf16`. Missed in D82182 and D80850.

Reviewers: sdesmalen, fpetrogalli, kmclaughlin, efriedma

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D82492
2020-06-26 09:35:36 +00:00
Amara Emerson
b3f9630cd8 [AArch64][GlobalISel] Fix extended shift addressing mode selection not handling sxth.
The complex pattern for extended shift offsets only allow sxtw as the extend,
not sxth. Our equivalent function to do this was not rejecting SXTH so we
were miscompiling. This was exposed by D81992.
2020-06-25 17:24:32 -07:00
Jessica Paquette
3c623abe2f [AArch64][GlobalISel] Port buildvector -> dup pattern from AArch64ISelLowering
Given this:

```
%x:_(<n x sK>) = G_BUILD_VECTOR %lane, ...
...
%y:_(<n x sK>) = G_SHUFFLE_VECTOR %x(<n x sK>), %foo, shufflemask(0, 0, ...)
```

We can produce:

```
%y:_(<n x sK) = G_DUP %lane(sK)
```

Doesn't seem to be too common, but AArch64ISelLowering attempts to do this
before trying to produce a DUPLANE. Might as well port it.

Also make it so that when the splat has an undef mask, we try setting it to
0. SDAG does this, and it makes sure that when we get the build vector operand,
we actually get a source operand.

Differential Revision: https://reviews.llvm.org/D81979
2020-06-25 14:19:06 -07:00
Francesco Petrogalli
fe31663523 [sve][acle] Add some C intrinsics for brain float types.
Summary:
The following intrinsics has been added:

svuint16_t svcnt[_bf16]_m(svuint16_t inactive, svbool_t pg, svbfloat16_t op)
svuint16_t svcnt[_bf16]_x(svbool_t pg, svbfloat16_t op)
svuint16_t svcnt[_bf16]_z(svbool_t pg, svbfloat16_t op)

svbfloat16_t svtbl[_bf16](svbfloat16_t data, svuint16_t indices)

svbfloat16_t svtbl2[_bf16](svbfloat16x2_t data, svuint16_t indices)

svbfloat16_t svtbx[_bf16](svbfloat16_t fallback, svbfloat16_t data, svuint16_t indices)

Reviewers: c-rhodes, kmclaughlin, efriedma, sdesmalen, ctetreau

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82429
2020-06-25 16:31:01 +00:00
Sanjay Patel
8ccfdd4e5c [x86][AArch64] add tests for fmul-fma combine; NFC
As discussed in D80801, there's a possible overstep in
what is allowed by the 'contract' fast-math-flag.
2020-06-24 15:56:32 -04:00
Cullen Rhodes
f2a50c987a [AArch64][SVE2] Add bfloat16 support to whilerw/whilewr intrinsics
Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D82399
2020-06-24 10:06:31 +00:00
Cullen Rhodes
8543c38ff5 [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
Summary:
Added for following intrinsics:

  * zip1, zip2, zip1q, zip2q
  * trn1, trn2, trn1q, trn2q
  * uzp1, uzp2, uzp1q, uzp2q
  * splice
  * rev
  * sel

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D82182
2020-06-24 10:04:51 +00:00
Kerry McLaughlin
31c721c75b [AArch64][SVE] Add bfloat16 support to load intrinsics
Summary:
Bfloat16 support added for the following intrinsics:
 - LD1
 - LD1RQ
 - LDNT1
 - LDNF1
 - LDFF1

Reviewers: sdesmalen, c-rhodes, efriedma, stuij, fpetrogalli, david-arm

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82298
2020-06-24 10:32:19 +01:00
Amara Emerson
7f4328cecf [AArch64][GlobalISel] Improve codegen for some constant vectors by using constant pool loads.
There's more smarts in AArch64ISelLowering that we don't have yet, but this
change incrementally improves some of the more common patterns. I think future
iterations will want to use some combination of PostLegalizerCombiner and the
selector to catch the other cases.

Differential Revision: https://reviews.llvm.org/D82340
2020-06-23 19:23:47 -07:00
Eli Friedman
9d315e1c2b Remove GlobalValue::getAlignment().
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value.  If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.

This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.

Differential Revision: https://reviews.llvm.org/D80368
2020-06-23 19:13:42 -07:00
Eli Friedman
bd863e1c80 [AArch64][SVE] Add legalization support for i32/i64 vector srem/urem
Implement them on top of sdiv/udiv, similar to what we do for integer
types.

Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.

Differential Revision: https://reviews.llvm.org/D81511
2020-06-23 16:27:52 -07:00
Mikhail Maltsev
14bad468ca [BFloat] Add convert/copy instrinsic support
This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

Specifically it adds intrinsic support in clang and llvm for Arm and AArch64.

The bfloat type, and its properties are specified in the Arm Architecture Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:
  - Alexandros Lamprineas
  - Luke Cheeseman
  - Mikhail Maltsev
  - Momchil Velikov
  - Luke Geeson

Differential Revision: https://reviews.llvm.org/D80928
2020-06-23 14:27:05 +00:00
Sander de Smalen
0b0c0d92c0 [AArch64][SVE] ACLE: Add bfloat16 to struct load/stores.
This patch contains:
- Support in LLVM CodeGen for bfloat16 types for ld2/3/4 and st2/3/4.
- New bfloat16 ACLE builtins for svld(2|3|4)[_vnum] and svst(2|3|4)[_vnum]

Reviewers: stuij, efriedma, c-rhodes, fpetrogalli

Reviewed By: fpetrogalli

Tags: #clang, #lldb, #llvm

Differential Revision: https://reviews.llvm.org/D82187
2020-06-23 12:12:35 +01:00
Kerry McLaughlin
6ec1271650 [SVE][CodeGen] Legalisation of vsetcc with scalable types
Summary: Changes SplitVecOp_VSETCC to use getVectorElementCount()

Reviewers: sdesmalen, efriedma, dancgr

Reviewed By: efriedma

Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79167
2020-06-23 11:56:29 +01:00
Paul Walker
833bdd906f [SVE] Code generation for fixed length vector loads & stores.
Summary:
This patch adds base support for code generating fixed length
vector operations targeting a known SVE vector length. To achieve
this we lower fixed length vector operations to equivalent scalable
vector operations, whereby SVE predication is used to limit the
elements processed to those present within the fixed length vector.

Specifically this patch implements load and store operations, which
get lowered to their masked counterparts thusly:

  V = load(Addr) =>
    V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))

  store(V, (Addr)) =>
    masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr))

Reviewers: rengolin, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80385
2020-06-23 09:39:03 +00:00