This patch adds the mova instruction to insert/extract an SVE vector
register to/from a ZA tile vector.
The preferred MOV aliases are also implemented.
Depends on D105572.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: david-arm, CarolineConcatto
Differential Revision: https://reviews.llvm.org/D105574
This patch adds the new system registers introduced in SME:
- ID_AA64SMFR0_EL1 (ro) SME feature identifier.
- SMCR_ELx (r/w) streaming mode control register for configuring
effective SVE Streaming SVE Vector length when the PE is in
Streaming SVE mode.
- SVCR (r/w) streaming vector control register, visible at all
exception levels. Provides access to PSTATE.SM and PSTATE.ZA
using MSR and MRS instructions.
- SMPRI_EL1 (r/w) streaming mode execution priority register.
- SMPRIMAP_EL2 (r/w) streaming mode priority mapping register.
- SMIDR_EL1 (ro) streaming mode identification register.
- TPIDR2_EL0 (r/w) for use by SME software to manage per-thread
SME context.
- MPAMSM_EL1 (r/w) MPAM (v8.4) streaming mode register, for
labelling memory accesses performed in streaming mode.
Also added in this patch are the SME mode change instructions.
Three MSR immediate instructions are implemented to set or clear
PSTATE.SM, PSTATE.ZA, or both respectively:
- MSR SVCRSM, #<imm1>
- MSR SVCRZA, #<imm1>
- MSR SVCRSMZA, #<imm1>
The following smstart/smstop aliases are also implemented for
convenience:
smstart -> MSR SVCRSMZA, #1
smstart sm -> MSR SVCRSM, #1
smstart za -> MSR SVCRZA, #1
smstop -> MSR SVCRSMZA, #0
smstop sm -> MSR SVCRSM, #0
smstop za -> MSR SVCRZA, #0
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105576
This patch adds support for following contiguous load and store
instructions:
* LD1B, LD1H, LD1W, LD1D, LD1Q
* ST1B, ST1H, ST1W, ST1D, ST1Q
A new register class and operand is added for the 32-bit vector select
register W12-W15. The differences in the following tests which have been
re-generated are caused by the introduction of this register class:
* llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
* llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir
* llvm/test/CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir
* llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir
D88663 attempts to resolve the issue with the store pair test
differences in the AArch64 load/store optimizer.
The GlobalISel differences are caused by changes in the enum values of
register classes, tests have been updated with the new values.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: CarolineConcatto
Differential Revision: https://reviews.llvm.org/D105572
This patch adds support for the following outer product instructions:
* BFMOPA, BFMOPS, FMOPA, FMOPS, SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA,
UMOPS, USMOPA, USMOPS.
Depends on D105570.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105571
SME introduces the ZA array, a new piece of architectural register state
consisting of a matrix of [SVLb x SVLb] bytes, where SVL is the
implementation defined Streaming SVE vector length and SVLb is the
number of 8-bit elements in a vector of SVL bits.
SME instructions consist of three types of matrix operands:
* Tiles: a ZA tile is a square, two-dimensional sub-array of elements
within the ZA array. These tiles make up the larger accumulator array
and the granularity varies based on the element size, i.e.
- ZAQ0..ZAQ15 (smallest tile granule)
- ZAD0..ZAD7
- ZAS0..ZAS3
- ZAH0..ZAH1
or ZAB0 (largest tile granule, single tile)
* Tile vectors: similar to regular tiles, but have an extra 'h' or 'v'
to tell how the vector at [reg+offset] is layed out in the tile,
horizontally or vertically. E.g. za1h.h or za15v.q, which corresponds
to vectors in registers ZAH1 and ZAQ15, respectively.
* Accumulator matrix: this is the entire accumulator array ZA.
This patch adds the register classes and related operands and parsing
for SME instructions operating on the accumulator array.
The ADDHA and ADDVA instructions which operate on tiles are also added
in this patch to make some use of the code added, later patches will
make use of the other operands introduced here.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Co-authored by: Sander de Smalen (@sdesmalen)
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105570
First patch in a series adding MC layer support for the Arm Scalable
Matrix Extension.
This patch adds the following features:
sme, sme-i64, sme-f64
The sme-i64 and sme-f64 flags are for the optional I16I64 and F64F64
features.
If a target supports I16I64 then the following instructions are
implemented:
* 64-bit integer ADDHA and ADDVA variants (D105570).
* SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA, UMOPS, USMOPA, and USMOPS
instructions that accumulate 16-bit integer outer products into 64-bit
integer tiles.
If a target supports F64F64 then the FMOPA and FMOPS instructions that
accumulate double-precision floating-point outer products into
double-precision tiles are implemented.
Outer products are implemented in D105571.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: CarolineConcatto
Differential Revision: https://reviews.llvm.org/D105569