1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 18:42:46 +02:00
Commit Graph

3627 Commits

Author SHA1 Message Date
Jessica Paquette
8e6b545e6e [GlobalISel][AArch64] Make FP constraint checks consider possible use/def banks
In a few places in getInstrMapping, we check if use/def instructions for the
instruction we're mapping have floating point constraints.

We can improve this check and reduce the number of copies in GISel-compiled code
if we make a couple observations:

- For a def instruction, it only matters if the def instruction must always
  output a value stored on a FPR

- For a use instruction, it only matters if the use instruction must always
  only take in values stored in FPRs

This adds two new functions:

- onlyUsesFP
- onlyDefinesFP

Then we can use those when we're checking the uses/defs instead.

Without this patch, the load, unmerge, store, and select in the added test
would have unnecessary copies.

Differential Revision: https://reviews.llvm.org/D62426

llvm-svn: 361679
2019-05-24 23:08:45 +00:00
Jessica Paquette
64b8570062 [GlobalISel][AArch64] NFC: Factor out HasFPConstraints into a proper function
Factor it out into a function, and replace places where we had the same check
with the new function.

Differential Revision: https://reviews.llvm.org/D62421

llvm-svn: 361677
2019-05-24 22:12:21 +00:00
Jessica Paquette
905dc0f2a9 [GlobalISel][AArch64] Improve register bank mappings for G_SELECT
The fcsel and csel instructions differ in only the register banks they work on.

So, they're entirely interchangeable otherwise.

With this in mind, this does two things:

- Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as
  the outputs.
- Teach it to choose the best register bank mapping based off the constraints
  of the inputs and outputs.

The "best" in this case means the one that requires the smallest number of
copies to properly emit a fcsel/csel.

For example, if the inputs are all already going to be on FPRs, we should
emit a fcsel, even if the output is a GPR. This costs one copy to produce the
result, but saves us from copying the inputs into GPRs.

Also update the regbank-select.mir to check that we end up with the right
select instruction.

Differential Revision: https://reviews.llvm.org/D62267

llvm-svn: 361665
2019-05-24 19:35:25 +00:00
Nick Desaulniers
e9fdb3a112 [AArch64] check for INLINEASM_BR along w/ INLINEASM
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.

pr/41999

Reviewers: t.p.northover, peter.smith

Reviewed By: peter.smith

Subscribers: craig.topper, javed.absar, kristof.beyls, hiraditya, llvm-commits, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62402

llvm-svn: 361661
2019-05-24 19:00:13 +00:00
Cullen Rhodes
7236b9fc52 [AArch64][SVE2] Asm: support SVE2 String Processing Group
Summary:
Patch adds support for the SVE2 character match instructions MATCH and NMATCH.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62206

llvm-svn: 361627
2019-05-24 10:32:01 +00:00
Cullen Rhodes
c8f153b14d [AArch64][SVE2] Asm: support SVE2 Narrowing Group
Summary:
Patch adds support for the following instructions:

SVE2 bitwise shift right narrow:
    * SQSHRUNB, SQSHRUNT, SQRSHRUNB, SQRSHRUNT, SHRNB, SHRNT, RSHRNB, RSHRNT,
      SQSHRNB, SQSHRNT, SQRSHRNB, SQRSHRNT, UQSHRNB, UQSHRNT, UQRSHRNB,
      UQRSHRNT

SVE2 integer add/subtract narrow high part:
    * ADDHNB, ADDHNT, RADDHNB, RADDHNT, SUBHNB, SUBHNT, RSUBHNB, RSUBHNT

SVE2 saturating extract narrow:
    * SQXTNB, SQXTNT, UQXTNB, UQXTNT, SQXTUNB, SQXTUNT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62205

llvm-svn: 361624
2019-05-24 10:22:30 +00:00
Cullen Rhodes
0c30c890bf [AArch64][SVE2] Asm: support SVE2 Accumulate Group
Summary:
Patch adds support for the following instructions:

SVE2 bitwise shift and insert:
    * SRI, SLI

SVE2 bitwise shift right and accumulate:
    * SSRA, USRA, SRSRA, URSRA

SVE2 complex integer add:
    * CADD, SQCADD

SVE2 integer absolute difference and accumulate:
    * SABA, UABA

SVE2 integer absolute difference and accumulate long:
    * SABALB, SABALT, UABALB, UABALT

SVE2 integer add/subtract long with carry:
    * ADCLB, ADCLT, SBCLB, SBCLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62204

llvm-svn: 361622
2019-05-24 10:10:34 +00:00
Cullen Rhodes
8c9c640bf8 [AArch64][SVE2] Asm: add PMULLB/PMULLT instructions
Summary:
This patch adds support for the polynomial multiplication instructions
PMULLB/PMULLT. The 64-bit source and 128-bit destination element
variants are enabled with crypto extensions (+sve2-aes), similar to the
NEON PMULL2 instruction. All other variants are enabled with +sve2.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62145

llvm-svn: 361619
2019-05-24 09:56:23 +00:00
Cullen Rhodes
79a36d0606 [AArch64][SVE2] Asm: add integer add/sub long/wide instructions
Summary:
Patch adds support for the following instructions:

SVE2 integer add/subtract long:
    * SADDLB, SADDLT, UADDLB, UADDLT, SSUBLB, SSUBLT, USUBLB, USUBLT,
      SABDLB, SABDLT, UABDLB, UABDLT

SVE2 integer add/subtract wide:
    * SADDWB, SADDWT, UADDWB, UADDWT, SSUBWB, SSUBWT, USUBWB, USUBWT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62142

llvm-svn: 361615
2019-05-24 09:28:27 +00:00
Cullen Rhodes
a2bc20e66c [AArch64][SVE2] Asm: add various bitwise shift instructions
Summary:
This patch adds support for the SVE2 saturating/rounding bitwise shift
left (predicated) group of instructions:

    * SRSHL, URSHL, SRSHLR, URSHLR, SQSHL, UQSHL, SQRSHL, UQRSHL,
      SQSHLR, UQSHLR, SQRSHLR, UQRSHLR

Immediate forms of the SQSHL and UQSHL instructions are also added to
the existing SVE bitwise shift by immediate (predicated) group, as well
as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in
this group are encoded similarly and are implemented using the same
TableGen class with a minimal change (1 bit in encoding).

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62140

llvm-svn: 361612
2019-05-24 09:17:23 +00:00
Cullen Rhodes
5d9e48d62f [AArch64][SVE2] Asm: add saturating add/sub instructions
Summary:
Patch adds support for the following instructions:

    * SQADD, UQADD, SUQADD, USQADD
    * SQSUB, UQSUB, SQSUBR, UQSUBR

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62130

llvm-svn: 361611
2019-05-24 09:06:37 +00:00
Cullen Rhodes
72174ee169 [AArch64][SVE2] Asm: fix overlapping bit
Summary:
Bit 20 in sve2_int_arith_pred TableGen class was overlapping. The
encodings are not affected as bit 20 is defined by the opc bits
and this was overwriting the earlier error of setting bit 20 to 0.

Raised by Momchil: https://reviews.llvm.org/D62130

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D62292

llvm-svn: 361609
2019-05-24 08:45:37 +00:00
Tim Northover
3994d25957 GlobalISel: support swifterror attribute on AArch64.
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.

llvm-svn: 361608
2019-05-24 08:40:13 +00:00
Reid Kleckner
c46660a5e0 [AArch64] Preserve X8 for thunks ending in variadic musttail calls
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.

Fixes PR41997

Reviewers: mgrang, efriedma

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62344

llvm-svn: 361585
2019-05-24 01:27:20 +00:00
Serge Pavlov
7d595fa212 [AArch64] Add nvcast patterns for v2f32 -> v1f64
Summary: Constant stores of f32 values can create such NvCast nodes.

Reviewers: t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62285

llvm-svn: 361584
2019-05-24 01:20:34 +00:00
Alexey Lapshin
1664547e1d [DebugInfo][AArch64] Recognise target specific instruction as mov instr
This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714.
Specifically, Simple Register Coalescing creates following conversion :

 undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24;

It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis
is not able to create debug location record for that instruction. So the problem
is in that debug info for argc variable is incorrect. The fix is
to write custom isCopyInstrImpl() which would recognize the ORRWrs instr.

llvm-svn: 361417
2019-05-22 18:48:58 +00:00
Sjoerd Meijer
ef37506e4d [AArch64] Subtarget crypto extension defaults
The Armv8.2-A crypto extensions all defaulted to true, but should default to
false, like all the other extensions.

Differential Revision: https://reviews.llvm.org/D62180

llvm-svn: 361354
2019-05-22 07:10:27 +00:00
Florian Hahn
fc5d6589f6 [AArch64] Skip mask checks for masks with an odd number of elements.
Some checks in isShuffleMaskLegal expect an even number of elements,
e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access
invalid elements and crash. This patch adds checks to the impacted
functions.

Fixes PR41951

Reviewers: t.p.northover, dmgreen, samparker

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D60690

llvm-svn: 361235
2019-05-21 10:05:26 +00:00
Cullen Rhodes
861f8b4a86 [AArch64][SVE2] Asm: add integer unary instructions (predicated)
Summary:
Patch adds support for the following instructions:

    * URECPE, URSQRTE, SQABS, SQNEG

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62129

llvm-svn: 361230
2019-05-21 09:06:51 +00:00
Cullen Rhodes
855ae54f60 [AArch64][SVE2] Asm: add integer pairwise arithmetic instructions
Summary:
Patch adds support for the following instructions:

    ADDP, SMAXP, UMAXP, SMINP, UMINP

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62128

llvm-svn: 361229
2019-05-21 08:59:00 +00:00
Martin Storsjo
d4561248c0 [AArch64] Handle lowering lround on windows, where long is 32 bit
Differential Revision: https://reviews.llvm.org/D62108

llvm-svn: 361192
2019-05-20 19:53:28 +00:00
Cullen Rhodes
31fcfd404f [AArch64][SVE2] Asm: add SADALP and UADALP instructions
Summary:
This patch adds support for the integer pairwise add and accumulate long
instructions SADALP/UADALP. These instructions are predicated.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62001

llvm-svn: 361154
2019-05-20 13:50:15 +00:00
Cullen Rhodes
9d12f27a52 [AArch64][SVE2] Asm: add int halving add/sub (predicated) instructions
Summary:
This patch adds support for the predicated integer halving add/sub
instructions:

    * SHADD, UHADD, SRHADD, URHADD
    * SHSUB, UHSUB, SHSUBR, UHSUBR

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D62000

llvm-svn: 361136
2019-05-20 10:35:23 +00:00
Cullen Rhodes
54ce3bc691 [AArch64][SVE2] Asm: add saturating multiply-add interleaved long instructions
Summary:
Patch adds support for SQDMLALBT and SQDMLSLBT instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61998

llvm-svn: 361135
2019-05-20 10:29:48 +00:00
Cullen Rhodes
7adb348bc8 [AArch64][SVE2] Asm: add saturating multiply-add long instructions
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:

    * SQDMLALB, SQDMLALT, SQDMLSLB, SQDMLSLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D61997

llvm-svn: 361005
2019-05-17 09:29:43 +00:00
Cullen Rhodes
2272f32d5a [AArch64][SVE2] Asm: add integer multiply-add long instructions
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:

    * SMLALB, SMLALT, UMLALB, UMLALT, SMLSLB, SMLSLT, UMLSLB, UMLSLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61951

llvm-svn: 361003
2019-05-17 09:19:41 +00:00
Cullen Rhodes
6bda113a4e [AArch64][SVE2] Asm: add integer multiply long instructions
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:

    * SMULLB, SMULLT, UMULLB, UMULLT, SQDMULLB, SQDMULLT

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61936

llvm-svn: 361002
2019-05-17 09:04:44 +00:00
Fangrui Song
f78f3148bd [AArch64] Support .reloc *, R_AARCH64_NONE, *
Summary:
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D61973

llvm-svn: 360981
2019-05-17 03:05:07 +00:00
Adhemerval Zanella
101145b734 [AArch64] Handle ISD::LROUND and ISD::LLROUND
This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas
instruction. It currently only handles the scalar version.

llvm-svn: 360894
2019-05-16 13:30:18 +00:00
Cullen Rhodes
7ef8584543 [AArch64][SVE2] Asm: implement CMLA/SQRDCMLAH instructions
Summary:
This patch adds support for the indexed and unpredicated vectors forms
of the CMLA and SQRDCMLAH instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D61906

llvm-svn: 360871
2019-05-16 09:42:22 +00:00
Cullen Rhodes
2eaf0460bc [AArch64][SVE2] Asm: implement CDOT instruction
Summary:
The complex DOT instructions perform a dot-product on quadtuplets from
two source vectors and the resuling wide real or wide imaginary is
accumulated into the destination register. The instructions come in two
forms:

Vector form, e.g.
  cdot z0.s, z1.b, z2.b, #90    - complex dot product on four 8-bit quad-tuplets,
                                  accumulating results in 32-bit elements. The
                                  complex numbers in the second source vector are
                                  rotated by 90 degrees.

  cdot z0.d, z1.h, z2.h, #180   - complex dot product on four 16-bit quad-tuplets,
                                  accumulating results in 64-bit elements.
                                  The complex numbers in the second source
                                  vector are rotated by 180 degrees.

Indexed form, e.g.
  cdot z0.s, z1.b, z2.b[3], #0  - complex dot product on four 8-bit quad-tuplets,
                                  with specified quadtuplet from second source vector,
                                  accumulating results in 32-bit elements.
  cdot z0.d, z1.h, z2.h[1], #0  - complex dot product on four 16-bit quad-tuplets,
                                  with specified quadtuplet from second source vector,
                                  accumulating results in 64-bit elements.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer, rovka

Differential Revision: https://reviews.llvm.org/D61903

llvm-svn: 360870
2019-05-16 09:33:44 +00:00
Cullen Rhodes
f96af16cbf [AArch64][SVE2] Asm: add unpredicated integer multiply instructions
Summary:
Add support for the following instructions:

  * MUL (indexed and unpredicated vectors forms)
  * SQDMULH (indexed and unpredicated vectors forms)
  * SQRDMULH (indexed and unpredicated vectors forms)
  * SMULH (unpredicated, predicated form added in SVE)
  * UMULH (unpredicated, predicated form added in SVE)
  * PMUL (unpredicated)

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: SjoerdMeijer, rovka

Differential Revision: https://reviews.llvm.org/D61902

llvm-svn: 360867
2019-05-16 09:07:26 +00:00
Mandeep Singh Grang
4fdc3d2e09 [AArch64] only indicate CFI on Windows if we emitted CFI
Summary:
Otherwise, we emit directives for CFI without any actual CFI opcodes to
go with them, which causes tools to malfunction.  The technique is
similar to what the x86 backend already does.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40876

Patch by: froydnj (Nathan Froyd)

Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric

Reviewed By: rnk

Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61960

llvm-svn: 360816
2019-05-15 21:23:41 +00:00
Richard Trieu
4e494af6c5 [AArch64] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360709
2019-05-14 21:33:53 +00:00
Cullen Rhodes
81830575d0 [AArch64][SVE2] Asm: add SQRDMLAH/SQRDMLSH instructions
Summary:
This patch adds support for the indexed and unpredicated vectors forms of the
SQRDMLAH and SQRDMLSH instructions.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61515

llvm-svn: 360683
2019-05-14 15:10:16 +00:00
Cullen Rhodes
dc081e21c0 [AArch64][SVE2] Asm: add integer multiply-add/subtract (indexed) instructions
Summary:
This patch adds support for the following instructions:

  MLA mul-add, writing addend (Zda = Zda +  Zn * Zm[idx])
  MLS mul-sub, writing addend (Zda = Zda + -Zn * Zm[idx])

Predicated forms of these instructions were added in SVE.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D61514

llvm-svn: 360682
2019-05-14 15:01:00 +00:00
Tim Northover
18a8d29140 AArch64: support binutils-like things on arm64_32.
This adds support for the arm64_32 watchOS ABI to LLVM's low level tools,
teaching them about the specific MachO choices and constants needed to
disassemble things.

llvm-svn: 360663
2019-05-14 11:25:44 +00:00
Cullen Rhodes
f378d68edf [AArch64][SVE2] Add SVE2 target features to backend and TargetParser
Summary:
This patch adds the following features defined by Arm SVE2 architecture
extension:

  sve2, sve2-aes, sve2-sm4, sve2-sha3, bitperm

For existing CPUs these features are declared as unsupported to prevent
scheduler errors.

The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest

Reviewers: SjoerdMeijer, sdesmalen, ostannard, rovka

Reviewed By: SjoerdMeijer, rovka

Subscribers: rovka, javed.absar, tschuett, kristof.beyls, kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61513

llvm-svn: 360573
2019-05-13 10:10:24 +00:00
Richard Trieu
29ecf8ea25 [AArch64] Move InstPrinter files to MCTargetDesc. NFC
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc.  Merging them together will fix this.  For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.

llvm-svn: 360486
2019-05-10 23:50:01 +00:00
Bill Wendling
3bb1a9e50f Add ".dword" directive
Summary:
The ".dword" directive is a synonym for ".xword" and is used used
by klibc, a minimalistic libc subset for initramfs.

Reviewers: t.p.northover, nickdesaulniers

Reviewed By: nickdesaulniers

Subscribers: nickdesaulniers, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61719

llvm-svn: 360381
2019-05-09 21:57:44 +00:00
Simon Pilgrim
3a3c43bf2f [AArch64] Remove scan-build "Value stored during its initialization is never read" warnings. NFCI.
llvm-svn: 360268
2019-05-08 16:29:39 +00:00
Simon Pilgrim
9a42a780eb [AArch64] Fix scan-build null/uninitialized pointer warnings. NFCI.
llvm-svn: 360267
2019-05-08 16:27:24 +00:00
Martin Storsjo
ed4e27a076 [AArch64] Default to SEH exception handling on MinGW
The SEH implementation is pretty mature at this point.

Differential Revision: https://reviews.llvm.org/D61590

llvm-svn: 360080
2019-05-06 21:18:15 +00:00
Amara Emerson
79073a3227 [GlobalISel] Handle <1 x T> vector return types properly.
After support for dealing with types that need to be extended in some way was
added in r358032 we didn't correctly handle <1 x T> return types. These types
don't have a GISel direct representation, instead we just see them as scalars.
When we need to pad them into <2 x T> types however we need to use a
G_BUILD_VECTOR instead of trying to do a G_CONCAT_VECTOR.

This fixes PR41738.

llvm-svn: 360068
2019-05-06 19:41:01 +00:00
Jessica Paquette
db19e9797c [AArch64][GlobalISel] Use fcsel instead of csel for G_SELECT on FPRs
This saves us some unnecessary copies.

If the inputs to a G_SELECT are floating point, we should use fcsel rather than
csel.

Changes here are...

- Teach selectCopy about s1-to-s1 copies across register banks.
- AArch64RegisterBankInfo about G_SELECT in general.
- Teach the instruction selector about the FCSEL instructions.

Also add two tests:

- select-select.mir to show that we get the expected FCSEL
- regbank-select.mir (unfortunately named) to show the register banks on
G_SELECT are properly preserved

And update fast-isel-select.ll to show that we do the same thing as other
instruction selectors in these cases.

llvm-svn: 359940
2019-05-03 22:37:46 +00:00
Mandeep Singh Grang
ab504d3276 [COFF, ARM64] Fix ABI implementation of struct returns
Summary:
Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values

Related clang patch: D60349

Reviewers: rnk, efriedma, TomTan, ssijaric

Reviewed By: rnk, efriedma

Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60348

llvm-svn: 359934
2019-05-03 21:12:36 +00:00
Simon Pilgrim
2628d88f52 Avoid cppcheck operator precedence warnings. NFCI.
Prefer ((X & Y) ? A : B) to (X & Y ? A : B)

llvm-svn: 359884
2019-05-03 13:50:38 +00:00
Eli Friedman
f714f84cd7 [AArch64][MC] Reject "add x0, x1, w2, lsl #1" etc.
Looks like just a minor oversight in the parsing code.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41504.

Differential Revision: https://reviews.llvm.org/D60840

llvm-svn: 359855
2019-05-03 00:59:52 +00:00
Evandro Menezes
46d8190766 [AArch64] Update for Exynos
Fix the forwarding of multiplication results for Exynos M4.

llvm-svn: 359834
2019-05-02 22:01:39 +00:00
Jessica Paquette
62380e005c [GlobalISel][AArch64] Use fmov for G_FCONSTANT when possible
This adds support for using fmov rather than a standard mov to materialize
G_FCONSTANT when it's safe to do so.

Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the
selection is correct.

llvm-svn: 359734
2019-05-01 22:39:43 +00:00
Sjoerd Meijer
8238653a0e [TargetLowering] Change getOptimalMemOpType to take a function attribute list
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.

This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.

Differential Revision: https://reviews.llvm.org/D59785

llvm-svn: 359537
2019-04-30 08:38:12 +00:00
Jessica Paquette
c024feffc8 [GlobalISel][AArch64] Select llvm.aarch64.crypto.sha1h
This was falling back and gives us a reason to create a selectIntrinsic function
which we would need eventually anyway. Update arm64-crypto.ll to show that we
correctly select it.

Also factor out the code for finding an intrinsic ID.

llvm-svn: 359501
2019-04-29 20:58:17 +00:00
Cullen Rhodes
ffbe24d53e [AArch64][SVE] Asm: add aliases for unpredicated bitwise logical instructions
This patch adds aliases for element sizes .B/.H/.S to the
AND/ORR/EOR/BIC bitwise logical instructions. The assembler now accepts
these instructions with all element sizes up to 64-bit (.D). The
preferred disassembly is .D.

llvm-svn: 359457
2019-04-29 15:27:27 +00:00
Jessica Paquette
2be787ea59 [GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for extracts
getConstantVRegValWithLookThrough does the same thing as the
getConstantValueForReg function, and has more visibility across GISel. Plus, it
supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code
reuse and more functionality for free by using it.

Add some test cases to select-extract-vector-elt.mir to show that we can now
look through those instructions.

llvm-svn: 359351
2019-04-26 21:53:13 +00:00
Nick Desaulniers
02a0e7f7fc [AsmPrinter] refactor to support %c w/ GlobalAddress'
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.

Refactors a few subclasses to support the target independent %a, %c, and
%n.

The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.

It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.

Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449

Reviewers: echristo, void

Reviewed By: void

Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60887

llvm-svn: 359337
2019-04-26 18:45:04 +00:00
Jessica Paquette
1b1fd6193b [AArch64][GlobalISel] Select G_BSWAP for vectors of s32 and s64
There are instructions for these, so mark them as legal. Select the correct
instruction in AArch64InstructionSelector.cpp.

Update select-bswap.mir and arm64-rev.ll to reflect the changes.

llvm-svn: 359331
2019-04-26 18:00:01 +00:00
Hans Wennborg
83fd9bda79 Fix alignment in AArch64InstructionSelector::emitConstantPoolEntry()
The code was using the alignment of a pointer to the value, not the
alignment of the constant itself.

Maybe we got away with it so far because the pointer alignment is
fairly high, but we did end up under-aligning <16 x i8> vectors,
which was caught in the Chromium build after lld stopped over-aligning
the .rodata.cst16 section in r356428. (See crbug.com/953815)

Differential revision: https://reviews.llvm.org/D61124

llvm-svn: 359287
2019-04-26 08:31:00 +00:00
Jessica Paquette
620cbaeb4e [GlobalISel][AArch64] Make G_EXTRACT_VECTOR_ELT legal for v8s16s
This case was missing before, so we couldn't legalize it.

Add it to AArch64LegalizerInfo.cpp and update select-extract-vector-elt.mir.

llvm-svn: 359231
2019-04-25 20:00:57 +00:00
Jessica Paquette
a49fa69bbb [GlobalISel][AArch64] Add generic legalization rule for extends
This adds a legalization rule for G_ZEXT, G_ANYEXT, and G_SEXT which allows
extends whenever the types will fit in registers (or the source is an s1).

Update tests. Add GISel checks throughout all of arm64-vabs.ll,
where we now select a good portion of the code. Add GISel checks to
arm64-subvector-extend.ll, which has a good number of vector extends in it.

Differential Revision: https://reviews.llvm.org/D60889

llvm-svn: 359222
2019-04-25 18:42:00 +00:00
Jessica Paquette
fd3dd81348 [GlobalISel][AArch64] Legalize G_FNEARBYINT
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.

Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.

llvm-svn: 359204
2019-04-25 16:44:40 +00:00
Jessica Paquette
a799766d83 [AArch64][GlobalISel] Select G_INTRINSIC_ROUND
Add selection support for G_INTRINSIC_ROUND, add a selection test, and add
check lines to arm64-vfloatintrinsics.ll and f16-instructions.ll.

llvm-svn: 359046
2019-04-23 23:03:03 +00:00
Jessica Paquette
6cf54d8ec9 [AArch64][GlobalISel] Mark G_INTRINSIC_ROUND as a pre-isel floating point opcode
Add G_INTRINSIC_ROUND to isPreISelGenericFloatingPointOpcode to ensure that its
input and output are assigned the correct register bank.

Add a regbankselect test to verify that we get what we expect here.

llvm-svn: 359044
2019-04-23 22:47:00 +00:00
Jessica Paquette
b31b0d9e64 [AArch64][GlobalISel] Legalize G_INTRINSIC_ROUND
Add it to the same rule as G_FCEIL etc. Add a legalizer test, and add a missing
switch case to AArch64LegalizerInfo.cpp.

llvm-svn: 359033
2019-04-23 21:11:57 +00:00
Jessica Paquette
d298f73700 [AArch64][GlobalISel] Actually select G_INTRINSIC_TRUNC
Apparently FileCheck wasn't actually matching the fallback check lines in
arm64-vfloatintrinsics.ll properly. So, there were selection fallbacks for
G_INTRINSIC_TRUNC there.

Actually hook it up into AArch64InstructionSelector.cpp and write a proper
selection test.

I guess I'll figure out the FileCheck magic to make the fallback checks work
properly in arm64-vfloatintrinsics.ll.

llvm-svn: 359030
2019-04-23 20:46:19 +00:00
Jessica Paquette
0651875195 [AArch64][GlobalISel] Teach regbankselect about G_INTRINSIC_TRUNC
Add it to isPreISelGenericFloatingPointOpcode, and add a regbankselect test.

Update arm64-vfloatintrinsics.ll now that we can select it.

llvm-svn: 359022
2019-04-23 18:20:47 +00:00
Jessica Paquette
ff3cf1d228 [AArch64][GlobalISel] Legalize G_INTRINSIC_TRUNC
Same patch as G_FCEIL etc.

Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct
rule in AArch64LegalizerInfo.cpp, and add a test.

llvm-svn: 359021
2019-04-23 18:20:44 +00:00
Jessica Paquette
d1184e3a9f [AArch64][GlobalISel] Legalize G_FMA for more vector types
Same as G_FCEIL, G_FABS, etc. Just move it into that rule.

Add a legalizer test for G_FMA, which we didn't have before and update
arm64-vfloatintrinsics.ll.

llvm-svn: 359015
2019-04-23 17:37:56 +00:00
Jessica Paquette
2b697e981f [AArch64][GlobalISel] Add G_FMA to isPreISelGenericFloatingPointOpcode
Noticed an unnecessary fallback in arm64-vmul caused by this.

Also add a regbankselect test for G_FMA.

llvm-svn: 359013
2019-04-23 17:17:06 +00:00
Simon Pilgrim
f3464e0685 Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 358969
2019-04-23 11:11:34 +00:00
Javed Absar
39dfbd5229 [AArch64] Add support for MTE intrinsics
This patch provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
The intrinsics are described in detail in the latest
ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest
Reviewed by: David Spickett
Differential Revision: https://reviews.llvm.org/D60486

llvm-svn: 358963
2019-04-23 09:39:58 +00:00
Amara Emerson
4cebcd0399 Revert r358800. Breaks Obsequi from the test suite.
The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with
a different issue.

llvm-svn: 358829
2019-04-20 21:25:00 +00:00
Amara Emerson
8abbc39fc7 Revert "Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores""
We were shifting the wrong component of a split load when trying to combine them
back into a single value.

llvm-svn: 358800
2019-04-19 23:54:44 +00:00
Jessica Paquette
d57f5959df [GlobalISel][AArch64] Legalize + select G_FRINT
Exactly the same as G_FCEIL, G_FABS, etc.

Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.

Differential Revision: https://reviews.llvm.org/D60895

llvm-svn: 358799
2019-04-19 23:41:52 +00:00
Eli Friedman
1ee278d235 [AArch64] Fix checks for AArch64MCExpr::VK_SABS flag.
VK_SABS is part of the SymLoc bitfield in the variant kind which should
be compared for equality, not by checking the VK_SABS bit.

As far as I know, the existing code happened to produce the correct
results in all cases, so this is just a cleanup.

Patch by Stephen Crane.

Differential Revision: https://reviews.llvm.org/D60596

llvm-svn: 358788
2019-04-19 21:58:10 +00:00
Amara Emerson
771972a5c7 Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores"
This introduces some runtime failures which I'll need to investigate further.

llvm-svn: 358771
2019-04-19 17:42:13 +00:00
Jessica Paquette
26fc1186e9 [GlobalISel][AArch64] Legalize vector G_FPOW
This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc.

Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change.

Differential Revision: https://reviews.llvm.org/D60218

llvm-svn: 358764
2019-04-19 16:28:08 +00:00
Bjorn Pettersson
e9abad235c [CodeGen] Add "const" to MachineInstr::mayAlias
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).

The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetTransformInfo::getMemOperandWithOffset.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60856

llvm-svn: 358744
2019-04-19 09:08:38 +00:00
Jessica Paquette
b1d320f69e [GlobalISel][AArch64] Legalize/select G_(S/Z/ANY)_EXT for v8s8s
This adds legalization for G_SEXT, G_ZEXT, and G_ANYEXT for v8s8s.

We were falling back on G_ZEXT in arm64-vabs.ll before, preventing us from
selecting the @llvm.aarch64.neon.sabd.v8i8 intrinsic.

This adds legalizer support for those 3, which gives us selection via the
importer. Update the relevant tests (legalize-ext.mir, select-int-ext.mir) and
add a GISel line to arm64-vabs.ll.

Differential Revision: https://reviews.llvm.org/D60881

llvm-svn: 358715
2019-04-18 21:15:48 +00:00
Jessica Paquette
872bc88b70 [GlobalISel][AArch64] Legalize v8s8 loads
Add legalizer support for loads of v8s8 and update legalize-load-store.mir.

Differential Revision: https://reviews.llvm.org/D60877

llvm-svn: 358714
2019-04-18 21:13:58 +00:00
Nick Desaulniers
85ce9e7ccd [AsmPrinter] hoist %a output template to base class for ARM+Aarch64
Summary:
X86 is quite complicated; so I intend to leave it as is. ARM+Aarch64 do
basically the same thing (Aarch64 did not correctly handle immediates,
ARM has a test llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll that uses
%a with an immediate) for a flag that should be target independent
anyways.

Reviewers: echristo, peter.smith

Reviewed By: echristo

Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60841

llvm-svn: 358618
2019-04-17 22:21:10 +00:00
Amara Emerson
c8667e9456 [GlobalISel] Add legalization support for non-power-2 loads and stores
Legalize things like i24 load/store by splitting them into smaller power of 2 operations.

This matches how SelectionDAG handles these operations.

Differential Revision: https://reviews.llvm.org/D59971

llvm-svn: 358613
2019-04-17 21:30:07 +00:00
Amara Emerson
5d4c0a60e2 [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.

This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.

I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.

Compile time:
Program                                        base   cse    diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
Geomean difference                                          -0.2%

Code size:
Program                                        base     cse      diff
test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
Geomean difference                                              -0.9%

Differential Revision: https://reviews.llvm.org/D60580

llvm-svn: 358369
2019-04-15 05:04:20 +00:00
Amara Emerson
442ab2552d [GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs.
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which is then used by the GISel passes for the CSEMIRBuilder.

This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.

llvm-svn: 358368
2019-04-15 04:53:46 +00:00
Amara Emerson
249244f9ca [AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and fix a crash.
This enables the simple copy combine that already exists in the CombinerHelper.
However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a
set of MIs to process, and so would end up causing a crash when deleted MIs were
being added to the combiner worklist again.

Differential Revision: https://reviews.llvm.org/D60579

llvm-svn: 358318
2019-04-13 00:33:25 +00:00
Amara Emerson
f16c13f916 [AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an undef mask element.
If a shufflevector's mask vector has an element with "undef" then the generic
instruction defining that element register is a G_IMPLICT_DEF instead of G_CONSTANT.
This fixes the selector to handle this case, and for now assumes that undef just means
zero. In future we'll optimize this case properly.

llvm-svn: 358312
2019-04-12 21:31:21 +00:00
Eric Christopher
b7a05e858c Add explicit dependencies on MCSection.h and MCDwarf.h to the .cpp
files rather than rely on transitive includes from MCStreamer.h.

llvm-svn: 358263
2019-04-12 07:40:01 +00:00
Amara Emerson
19a1782c85 [AArch64][GlobalISel] Flesh out vector load/store support for more types.
Some of these were legalizing into smaller vector types unnecessarily,
others were simply not supported yet.

llvm-svn: 358223
2019-04-11 20:40:01 +00:00
Amara Emerson
a14e5c054e [AArch64][GlobalISel] Legalization and ISel support for load/stores of vectors of pointers.
Loads and store of values with type like <2 x p0> currently don't get imported
because SelectionDAG has no knowledge of pointer types. To leverage the existing
support for vector load/stores, we can bitcast the value to have s64 element
types instead. We do this as a custom legalization.

This patch also adds support for general loads of <2 x s64>, and relaxes some
type conditions on selecting G_BITCAST.

Differential Revision: https://reviews.llvm.org/D60534

llvm-svn: 358221
2019-04-11 20:32:24 +00:00
Diogo N. Sampaio
db0243a51e [AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64
Summary:  Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64

Reviewers: pbarrio, DavidSpickett, LukeGeeson

Reviewed By: LukeGeeson

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60259

llvm-svn: 358171
2019-04-11 14:19:43 +00:00
Amara Emerson
81b0599afa [AArch64][GlobalISel] Make <2 x p0> = G_BUILD_VECTOR legal.
The existing isel support already works for p0 once the legalizer accepts it.

llvm-svn: 358144
2019-04-10 23:06:14 +00:00
Amara Emerson
b7a6d81ca1 [AArch64][GlobalISel] Add legalizer support for <8 x s16> and <16 x s8> G_ADD.
llvm-svn: 358143
2019-04-10 23:06:11 +00:00
Amara Emerson
ba18fe7b6a [AArch64][GlobalISel] Scalarize vector SDIV.
llvm-svn: 358142
2019-04-10 23:06:08 +00:00
Craig Topper
441f15067b [AArch64] Teach getTestBitOperand to look through ANY_EXTENDS
This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression.

Differential Revision: https://reviews.llvm.org/D60482

llvm-svn: 358108
2019-04-10 17:27:29 +00:00
Nick Desaulniers
ba87dcb4ad [AsmPrinter] refactor to remove remove AsmVariant. NFC
Summary:
The InlineAsm::AsmDialect is only required for X86; no architecture
makes use of it and as such it gets passed around between arch-specific
and general code while being unused for all architectures but X86.

Since the AsmDialect is queried from a MachineInstr, which we also pass
around, remove the additional AsmDialect parameter and query for it deep
in the X86AsmPrinter only when needed/as late as possible.

This refactor should help later planned refactors to AsmPrinter, as this
difference in the X86AsmPrinter makes it harder to make AsmPrinter more
generic.

Reviewers: craig.topper

Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60488

llvm-svn: 358101
2019-04-10 16:38:43 +00:00
Diogo N. Sampaio
072614629e [AArch64] Add lowering pattern for scalar fp16 facge and facgt
Summary: The fp16 scalar version of facge and facgt requires a custom patter matching, as the result type is not the same width of the operands.

Reviewers: olista01, javed.absar, pbarrio

Reviewed By: javed.absar

Subscribers: kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60212

llvm-svn: 358083
2019-04-10 13:34:18 +00:00
Clement Courbet
2c7520f781 [NFC] Fix unused variable warning.
llvm-svn: 358080
2019-04-10 13:18:05 +00:00
Amara Emerson
98071b2dbb [AArch64][GlobalISel] Add isel support for vector G_ICMP and G_ASHR & G_SHL
The selection for G_ICMP is unfortunately not currently importable from SDAG
due to the use of custom SDNodes. To support this, this selection method has an
opcode table which has been generated by a script, indexed by various
instruction properties. Ideally in future we will have a GISel native selection
patterns that we can write in tablegen to improve on this.

For selection of some types we also need support for G_ASHR and G_SHL which are
generated as a result of legalization. This patch also adds support for them,
generating the same code as SelectionDAG currently does.

Differential Revision: https://reviews.llvm.org/D60436

llvm-svn: 358035
2019-04-09 21:22:43 +00:00
Amara Emerson
de0bcea47f [AArch64][GlobalISel] Legalize vector G_ICMP.
Selection support will be coming in a later patch.

Differential Revision: https://reviews.llvm.org/D60435

llvm-svn: 358034
2019-04-09 21:22:40 +00:00
Amara Emerson
cd87034109 [AArch64][GlobalISel] Add legalization for some vector G_SHL and G_ASHR.
This is needed for some future support for vector ICMP.

Differential Revision: https://reviews.llvm.org/D60433

llvm-svn: 358033
2019-04-09 21:22:37 +00:00
Amara Emerson
5b2864cb25 [GlobalISel][AArch64] Allow CallLowering to handle types which are normally
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened in the appropriate way back.

Differential Revision: https://reviews.llvm.org/D60425

llvm-svn: 358032
2019-04-09 21:22:33 +00:00
Evandro Menezes
1fd81231ce [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Sander de Smalen
b0dbd990db [AArch64][AsmParser] Fix .arch_extension directive parsing
This patch fixes .arch_extension directive parsing to handle a wider
range of architecture extension options. The existing parser was parsing
extensions as an identifier which breaks for extensions containing a
"-", such as the "tlb-rmi" extension.

The extension is now parsed as a string. This is consistent with the
extension parsing in the .arch and .cpu directive parsing.

Patch by Cullen Rhodes (c-rhodes)

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D60118

llvm-svn: 357677
2019-04-04 09:11:17 +00:00
Jessica Paquette
1bef4155b3 [AArch64][GlobalISel] Legalize G_FEXP2
Same as G_EXP. Add a test, and update legalizer-info-validation.mir and
f16-instructions.ll.

Differential Revision: https://reviews.llvm.org/D60165

llvm-svn: 357605
2019-04-03 16:58:32 +00:00
Javed Absar
185877db73 [AArch64] Update v8.5a MTE LDG/STG instructions
The latest MTE specification adds register Xt to the STG instruction family:
  STG [Xn, #offset] -> STG Xt, [Xn, #offset]
The tag written to memory is taken from Xt rather than Xn.
Also, the LDG instruction also was changed to read return address from Xt:
  LDG Xt, [Xn, #offset].
This patch includes those changes and tests.
Specification is at: https://developer.arm.com/docs/ddi0596/c
Differential Revision: https://reviews.llvm.org/D60188

llvm-svn: 357583
2019-04-03 14:12:13 +00:00
Jessica Paquette
7443b6a2d9 [AArch64][GlobalISel] Select llvm.aarch64.stlxr(i64, i64*)
This adds partial instruction selection support for llvm.aarch64.stlxr. It also
factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The
new function removes the restriction that the intrinsic ID on the
G_INTRINSIC_W_SIDE_EFFECTS be on operand 0.

Also add a test, and add a GISel line to arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D60100

llvm-svn: 357518
2019-04-02 19:57:26 +00:00
Jessica Paquette
cc01c3459a [AArch64][GlobalISe] Select STRQui for stores into v264s instead of scalarizing
This improves selection for vector stores into v2s64s. Before we just
scalarized them, but we can just use a STRQui instead.

Differential Revision: https://reviews.llvm.org/D60083

llvm-svn: 357432
2019-04-01 22:19:13 +00:00
David Spickett
163262b85e [AArch64] Add v8.5-a Memory Tagging STZGM instruction
This instruction writes a block of allocation tags
and stores zero to the associated data locations.

It differs from STGM by 1 bit and has the same
arguments.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60065

llvm-svn: 357397
2019-04-01 14:56:37 +00:00
David Spickett
6761055449 [AArch64] Add v8.5-a Memory Tagging STGM/LDGM instructions
The STGV/LDGV instructions were replaced with
STGM/LDGM. The encodings remain the same but there
is no longer writeback so there are no unpredictable
encodings to check for.

The specfication can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60064

llvm-svn: 357395
2019-04-01 14:52:18 +00:00
David Spickett
3cba39ed76 [AArch64] Add v8.5-a Memory Tagging GMID_EL1 register
The latest version of the MTE spec added a system
register 'GMID_EL1'. It contains the block size used
by the LDGM and STGM instructions and is read only.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

llvm-svn: 357392
2019-04-01 14:41:14 +00:00
Jessica Paquette
e5318e0de2 [GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32s
This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59910

llvm-svn: 357318
2019-03-29 21:39:36 +00:00
Evandro Menezes
4a0ab6cec5 [CodeGen] Refactor the option for the maximum jump table size
Refactor the option `max-jump-table-size` to default to the maximum
representable number.  Essentially, NFC.

llvm-svn: 357280
2019-03-29 17:28:11 +00:00
Amara Emerson
6dd746aa24 [AArch64][GlobalISel] Make G_PHI of v2s64, v4s32, v2s32 legal.
llvm-svn: 357108
2019-03-27 18:31:46 +00:00
Sander de Smalen
6fb2b88ccc [AArch64][SVE] Asm: error on unexpected SVE vector register type suffix
This patch fixes an assembler bug that allowed SVE vector registers to contain a
type suffix when not expected. The SVE unpredicated movprfx instruction is the
only instruction affected.

The following are examples of what was previously valid:

    movprfx z0.b, z0.b
    movprfx z0.b, z0.s
    movprfx z0, z0.s

These instructions are now erroneous.

Patch by Cullen Rhodes (c-rhodes)

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D59636

llvm-svn: 357094
2019-03-27 17:23:38 +00:00
Sander de Smalen
e4a32ea8e7 [AArch64] NFC: Cleanup isAArch64FrameOffsetLegal
Cleanup isAArch64FrameOffsetLegal by:
- Merging the large switch statement to reuse AArch64InstrInfo::getMemOpInfo().
- Using AArch64InstrInfo::getUnscaledLdSt() to determine whether an instruction
  has an unscaled variant.
- Simplifying the logic that calculates the offset to fit the immediate.

Reviewers: paquette, evandro, eli.friedman, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59636

llvm-svn: 357064
2019-03-27 13:16:19 +00:00
Sander de Smalen
e4c844b90f [AArch64] Adds cases for LDRSHWui and LDRSHXui to getMemOpInfo
This patch also adds cases PRFUMi and PRFMui.
This change was discussed in https://reviews.llvm.org/D59635.

llvm-svn: 357059
2019-03-27 10:39:03 +00:00
Eli Friedman
ea5a6285ae [AArch64] Prefer "mov" over "orr" to materialize constants.
This is generally more readable due to the way the assembler aliases
work.

(This causes a lot of test changes, but it's not really as scary as it
looks at first glance; it's just mechanically changing a bunch of checks
for orr to check for mov instead.)

Differential Revision: https://reviews.llvm.org/D59720

llvm-svn: 356954
2019-03-25 21:25:28 +00:00
Evandro Menezes
c19776dc48 [AArch64, ARM] Add support for Exynos M5
Add Exynos M5 support and test cases.

llvm-svn: 356793
2019-03-22 18:42:14 +00:00
Amara Emerson
de0bb2f505 [AArch64] Split the neon.addp intrinsic into integer and fp variants.
This is the result of discussions on the list about how to deal with intrinsics
which require codegen to disambiguate them via only the integer/fp overloads.
It causes problems for GlobalISel as some of that information is lost during
translation, while with other operations like IR instructions the information is
encoded into the instruction opcode.

This patch changes clang to emit the new faddp intrinsic if the vector operands
to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to
upgrade existing calls to aarch64.neon.addp with fp vector arguments, and
we remove the workarounds introduced for GlobalISel in r355865.

This is a more permanent solution to PR40968.

Differential Revision: https://reviews.llvm.org/D59655

llvm-svn: 356722
2019-03-21 22:31:37 +00:00
Evandro Menezes
3bd1a2cbe6 [AArch64] Update for Exynos
Fix the feature set for Exynos M4 by removing support for `+fp16fml` and fix test case.

llvm-svn: 356698
2019-03-21 18:54:58 +00:00
Oliver Stannard
b5c44abcf5 [AArch64] Allow -mattr=tpidr-el[1|2|3]
Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base
register, rather than the default TPIDR_EL0.

Patch by Philip Derrin!

Differential revision: https://reviews.llvm.org/D54685

llvm-svn: 356657
2019-03-21 11:30:17 +00:00
Amara Emerson
36fb50659e [AArch64][GlobalISel] Add an optimization to select vector DUP instructions.
This adds pattern matching for the insert+shufflevector sequence so we can
generate dup instructions instead of the current TBL sequence.

Differential Revision: https://reviews.llvm.org/D59558

llvm-svn: 356526
2019-03-19 21:43:05 +00:00
Amara Emerson
3e478dfc75 [AArch64][GlobalISel] Make v4s32 G_IMPLICIT_DEF legal.
llvm-svn: 356525
2019-03-19 21:43:02 +00:00
Amara Emerson
48f1898b5d Revert r356304: remove subreg parameter from MachineIRBuilder::buildCopy()
After review comments, it was preferred to not teach MachineIRBuilder about
non-generic instructions beyond using buildInstr().

For AArch64 I've changed the buildCopy() calls to buildInstr() + a
separate addReg() call.

This also relaxes the MachineIRBuilder's COPY checking more because it may
not always have a SrcOp given to it.

llvm-svn: 356396
2019-03-18 19:20:10 +00:00
Adhemerval Zanella
badc888cad [AArch64] Small fix for getIntImmCost
It uses the generic AArch64_IMM::expandMOVImm to get the correct
number of instruction used in immediate materialization.

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58461

llvm-svn: 356391
2019-03-18 18:50:58 +00:00
Adhemerval Zanella
b2a4fe0946 [AArch64] Optimize floating point materialization
This patch follows some ideas from r352866 to optimize the floating
point materialization even further. It changes isFPImmLegal to
considere up to 2 mov instruction or up to 5 in case subtarget has
fused literals.

The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but
the mov+fmov sequence is always better because of the reduced d-cache
pressure. The timings are still the same if you consider movw+movk+fmov
vs. adrp+ldr will be fused (although one instruction longer).

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58460

llvm-svn: 356390
2019-03-18 18:45:57 +00:00
Adhemerval Zanella
3f35ea2992 [TargetLowering] Add code size information on isFPImmLegal. NFC
This allows better code size for aarch64 floating point materialization
in a future patch.

Reviewers: evandro

Differential Revision: https://reviews.llvm.org/D58690

llvm-svn: 356389
2019-03-18 18:40:07 +00:00
Adhemerval Zanella
7dcbe2d92b [AArch64] Refactor floating point materialization. NFC
It splits the login of actual instruction emission away from the logic
that figures out the appropriate sequence on AArch64ExpandPseudo::expandMOVImm.
The new function AArch64_IMM::expandMOVImm, which return the list of the 
instructions to materialize the immediate constant, is implemented on a 
separated unit because it will be used in a subsequent patch to optimize
floating point materialization.

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58915

llvm-svn: 356387
2019-03-18 18:23:23 +00:00
Christof Douma
59deedef1c [AArch64] Fix bug 35094 atomicrmw on Armv8.1-A+lse
Fixes https://bugs.llvm.org/show_bug.cgi?id=35094

The Dead register definition pass should leave alone the atomicrmw
instructions on AArch64 (LTE extension). The reason is the following
statement in the Arm ARM:

"The ST<OP> instructions, and LD<OP> instructions where the destination
register is WZR or XZR, are not regarded as doing a read for the purpose
of a DMB LD barrier."

A good example was given in the gcc thread by Will Deacon (linked in the
bugzilla ticket 35094):

    P0 (atomic_int* y,atomic_int* x) {
      atomic_store_explicit(x,1,memory_order_relaxed);
      atomic_thread_fence(memory_order_release);
      atomic_store_explicit(y,1,memory_order_relaxed);
    }

    P1 (atomic_int* y,atomic_int* x) {
      atomic_fetch_add_explicit(y,1,memory_order_relaxed);  // STADD
      atomic_thread_fence(memory_order_acquire);
      int r0 = atomic_load_explicit(x,memory_order_relaxed);
    }

    P2 (atomic_int* y) {
      int r1 = atomic_load_explicit(y,memory_order_relaxed);
    }

    My understanding is that it is forbidden for r0 == 0 and r1 == 2 after
    this test has executed. However, if the relaxed add in P1 compiles to
    STADD and the subsequent acquire fence is compiled as DMB LD, then we
    don't have any ordering guarantees in P1 and the forbidden result could
    be observed.

Change-Id: I419f9f9df947716932038e1100c18d10a96408d0
llvm-svn: 356360
2019-03-18 09:21:06 +00:00
Amara Emerson
3d592fdef6 [GlobalISel] Allow MachineIRBuilder to build subregister copies.
This relaxes some asserts about sizes, and adds an optional subreg parameter
to buildCopy().

Also update AArch64 instruction selector to use this in places where we
previously used MachineInstrBuilder manually.

Differential Revision: https://reviews.llvm.org/D59434

llvm-svn: 356304
2019-03-15 21:59:50 +00:00
Nikita Popov
b965ccb2fd [AArch64] Turn BIC immediate creation into a DAG combine
Switch BIC immediate creation for vector ANDs from custom lowering
to a DAG combine, which gives generic DAG combines a change to
apply first. In particular this avoids (and x, -1) being turned into
a (bic x, 0) instead of being eliminated entirely.

Differential Revision: https://reviews.llvm.org/D59187

llvm-svn: 356299
2019-03-15 21:04:34 +00:00
Amara Emerson
af364080e2 [AArch64][GlobalISel] Regbankselect: Fix G_BUILD_VECTOR trying to use s16 gpr sources.
Since we can't insert s16 gprs as we don't have 16 bit GPR registers, we need to
teach RBS to assign them to the FPR bank so our selector works.

llvm-svn: 356282
2019-03-15 18:00:01 +00:00
Jessica Paquette
5fc1e87953 [AArch64][GlobalISel] Add isel support for G_UADDO on s32s and s64s
This adds instruction selection support for G_UADDO on s32s and s64s.

Also
- Add an instruction selection test
- Update the arm64-xaluo.ll test to show that we generate the correct assembly

Differential Revision: https://reviews.llvm.org/D58734

llvm-svn: 356214
2019-03-14 22:54:29 +00:00
Amara Emerson
6da5735cc0 [AArch64][GlobalISel] Implement selection for G_UNMERGE of vectors to vectors.
This re-uses the previous support for extract vector elt to extract the
subvectors.

Differential Revision: https://reviews.llvm.org/D59390

llvm-svn: 356213
2019-03-14 22:48:18 +00:00
Amara Emerson
90869d8494 [AArch64][GlobalISel] Add some support for G_CONCAT_VECTORS.
Handles concatenating 2 x v2s32 and 2 x v4s16

Differential Revision: https://reviews.llvm.org/D59390

llvm-svn: 356212
2019-03-14 22:48:15 +00:00
Jessica Paquette
5e305f80d3 [GlobalISel][AArch64] Add partial selection support for G_INSERT_VECTOR_ELT
This adds support for inserting elements into packed vectors. It also adds
two tests: one for selection, and one for regbank select.

Unpacked vectors will come in a follow-up.

Differential Revision: https://reviews.llvm.org/D59325

llvm-svn: 356182
2019-03-14 18:01:30 +00:00
Jessica Paquette
9a354720e0 [AArch64][GlobalISel] Gardening: Simplify subregister copy in selectBuildVector
NFC. Some more preliminary factoring for G_INSERT_VECTOR_ELT.

Also better code-reuse, etc., etc.

Differential Revision: https://reviews.llvm.org/D59323

llvm-svn: 356107
2019-03-13 23:29:54 +00:00
Jessica Paquette
4117eab6b5 [GlobalISel][AArch64] Gardening: Factor out vector inserts
Factor out the vector insert code in `selectBuildVector`. Replace part of it
with `emitScalarToVector`, since it was pretty much equivalent.

This will make implementing G_INSERT_VECTOR_ELT easier.

Differential Revision: https://reviews.llvm.org/D59322

llvm-svn: 356106
2019-03-13 23:22:23 +00:00
Jessica Paquette
fcc568af0c [GlobalISel][AArch64] Gardening: Factor out code to find lane indices
Some more refactoring for G_INSERT_VECTOR_ELT.

Factor out the code used to find a lane index from `selectExtractElt`. Put it
into a more general-purpose `getConstantValueForReg` function.

This will be shared with the code for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59324

llvm-svn: 356101
2019-03-13 21:19:29 +00:00
Jessica Paquette
9202e8eb2a Recommit "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT"
After r355865, we should be able to safely select G_EXTRACT_VECTOR_ELT without
running into any problematic intrinsics.

Also add a fix for lane copies, which don't support index 0.

llvm-svn: 355871
2019-03-11 22:18:01 +00:00
Jessica Paquette
1d2b488ef7 [GlobalISel][AArch64] Always fall back on aarch64.neon.addp.*
Overloaded intrinsics aren't necessarily safe for instruction selection. One
such intrinsic is aarch64.neon.addp.*.

This is a temporary workaround to ensure that we always fall back on that
intrinsic. Eventually this will be replaced with a proper solution.

https://bugs.llvm.org/show_bug.cgi?id=40968

Differential Revision: https://reviews.llvm.org/D59062

llvm-svn: 355865
2019-03-11 20:51:17 +00:00
Nikita Popov
b4de4b44fe [SDAG][AArch64] Legalize VECREDUCE
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.

Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.

This also includes a few more changes to make this work somewhat
reasonably:

 * Add support for expanding VECREDUCE in SDAG. Usually
   experimental.vector.reduce is expanded prior to codegen, but if the
   target does have native vector reduce, it may of course still be
   necessary to expand due to legalization issues. This uses a shuffle
   reduction if possible, followed by a naive scalar reduction.
 * Allow the result type of integer VECREDUCE to be larger than the
   vector element type. For example we need to be able to reduce a v8i8
   into an (nominally) i32 result type on AArch64.
 * Use the vector operand type rather than the scalar result type to
   determine the action, so we can control exactly which vector types are
   supported. Also change the legalize vector op code to handle
   operations that only have vector operands, but no vector results, as
   is the case for VECREDUCE.
 * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE),
   explicitly specify for which vector types the reductions are supported.

This does not handle anything related to VECREDUCE_STRICT_*.

Differential Revision: https://reviews.llvm.org/D58015

llvm-svn: 355860
2019-03-11 20:22:13 +00:00
Stanislav Mekhanoshin
9260748488 Use bitset for assembler predicates
AMDGPU target run out of Subtarget feature flags hitting the limit of 64.
AssemblerPredicates uses at most uint64_t for their representation.
At the same time CodeGen has exhausted this a long time ago and switched
to a FeatureBitset with the current limit of 192 bits.

This patch completes transition to the bitset for feature bits extending
it to asm matcher and MC code emitter.

Differential Revision: https://reviews.llvm.org/D59002

llvm-svn: 355839
2019-03-11 17:04:35 +00:00
Amara Emerson
17778b627c [AArch64][GlobalISel] Fix i1 arguments not being zero-extended as required by ABI.
Fixes PR41001.

llvm-svn: 355745
2019-03-08 22:17:00 +00:00
Mitch Phillips
5c6e533e01 [HWASan] Save + print registers when tag mismatch occurs in AArch64.
Summary:
This change change the instrumentation to allow users to view the registers at the point at which tag mismatch occured. Most of the heavy lifting is done in the runtime library, where we save the registers to the stack and emit unwind information. This allows us to reduce the overhead, as very little additional work needs to be done in each __hwasan_check instance.

In this implementation, the fast path of __hwasan_check is unmodified. There are an additional 4 instructions (16B) emitted in the slow path in every __hwasan_check instance. This may increase binary size somewhat, but as most of the work is done in the runtime library, it's manageable.

The failure trace now contains a list of registers at the point of which the failure occured, in a format similar to that of Android's tombstones. It currently has the following format:

Registers where the failure occurred (pc 0x0055555561b4):
    x0  0000000000000014  x1  0000007ffffff6c0  x2  1100007ffffff6d0  x3  12000056ffffe025
    x4  0000007fff800000  x5  0000000000000014  x6  0000007fff800000  x7  0000000000000001
    x8  12000056ffffe020  x9  0200007700000000  x10 0200007700000000  x11 0000000000000000
    x12 0000007fffffdde0  x13 0000000000000000  x14 02b65b01f7a97490  x15 0000000000000000
    x16 0000007fb77376b8  x17 0000000000000012  x18 0000007fb7ed6000  x19 0000005555556078
    x20 0000007ffffff768  x21 0000007ffffff778  x22 0000000000000001  x23 0000000000000000
    x24 0000000000000000  x25 0000000000000000  x26 0000000000000000  x27 0000000000000000
    x28 0000000000000000  x29 0000007ffffff6f0  x30 00000055555561b4

... and prints after the dump of memory tags around the buggy address.

Every register is saved exactly as it was at the point where the tag mismatch occurs, with the exception of x16/x17. These registers are used in the tag mismatch calculation as scratch registers during __hwasan_check, and cannot be saved without affecting the fast path. As these registers are designated as scratch registers for linking, there should be no important information in them that could aid in debugging.

Reviewers: pcc, eugenis

Reviewed By: pcc, eugenis

Subscribers: srhines, kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, hiraditya, jdoerfert, llvm-commits, #sanitizers

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D58857

llvm-svn: 355738
2019-03-08 21:22:35 +00:00
Abderrazek Zaafrani
e9a93d92e1 [AArch64] Improve FP16 instruction selection for vector round and vector conver from half instructions
https://reviews.llvm.org/D58855

llvm-svn: 355545
2019-03-06 20:30:06 +00:00
Amara Emerson
db1da90c33 [AArch64] Remove a stray test from the AArch64 directory.
llvm-svn: 355534
2019-03-06 18:54:07 +00:00
Jessica Paquette
1b3a1b13fd Revert "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT"
This broke test-suite::aarch64_neon_intrinsics.test

Reverting while I look into it.

Example failure:
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-quick/builds/17740

llvm-svn: 355408
2019-03-05 15:47:00 +00:00
Jessica Paquette
dd5c010238 [GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT
This adds instruction selection support for G_EXTRACT_VECTOR_ELT for cases
where the index is defined by a G_CONSTANT.

It also factos out the lane copy opcode selection part into its own function,
`getLaneCopyOpcode`. This is used by both `selectUnmergeValues` and
`selectExtractElt`.

Differential Revision: https://reviews.llvm.org/D58469

llvm-svn: 355344
2019-03-04 22:35:32 +00:00
Jessica Paquette
aff4683104 [GlobalISel][AArch64] Legalize vector G_SELECT
Just scalarize it, and add a test showing it works.

Differential Revision: https://reviews.llvm.org/D58747

llvm-svn: 355339
2019-03-04 21:12:46 +00:00
Amara Emerson
5792128251 Re-commit r355104: "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1."
The code to materialize a mask from a constant pool load tried to use a 128 bit
LDR to load a 64 bit constant pool entry, which was 8 byte aligned. This resulted
in a link failure in the NEON tests in the test suite since the LDR address was
unaligned. This change fixes that to instead emit a 64 bit LDR if the entry is
64 bit, before converting back to a 128 bit register for the TBL.

llvm-svn: 355326
2019-03-04 19:16:00 +00:00
Jonas Hahnfeld
a58dcdaa5e [AArch64/ARM] Fix two compiler warnings in InstructionSelector, NFCI
1) GCC complains that KnownValid is set but not used.
2) In ARMInstructionSelector::selectGlobal() the code is mixing "enumeral
   and non-enumeral type in conditional expression". Solve this by casting
   to unsigned which is the final type anyway.

Differential Revision: https://reviews.llvm.org/D58834

llvm-svn: 355304
2019-03-04 08:51:32 +00:00
Eli Friedman
e8f9092ebb [AArch64] [Windows] Don't skip constructing UnwindHelp.
In certain cases, the first non-frame-setup instruction in a function is
a branch.  For example, it could be a cbz on an argument.  Make sure we
correctly allocate the UnwindHelp, and find an appropriate register to
use to initialize it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40184

Differential Revision: https://reviews.llvm.org/D58752

llvm-svn: 355136
2019-02-28 20:38:45 +00:00
Abderrazek Zaafrani
3581c4535e [AArch64] Improve FP16 vector convert from short instructions.
https://reviews.llvm.org/D58563

llvm-svn: 355134
2019-02-28 20:21:46 +00:00
Amara Emerson
09b6b37f1b Revert "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1."
Seems to break some neon intrinsics tests.

llvm-svn: 355115
2019-02-28 18:47:29 +00:00
Amara Emerson
5268203d9c [AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1.
This extends the existing support for shufflevector to handle cases like
<2 x float>, which we can implement by concating the vectors and using a TBL1.

Differential Revision: https://reviews.llvm.org/D58684

llvm-svn: 355104
2019-02-28 16:43:11 +00:00
Abderrazek Zaafrani
7c74725f8a [AArch64] Generate FP16 vector compare instructions.
https://reviews.llvm.org/D58561

llvm-svn: 355050
2019-02-28 00:31:38 +00:00
Amara Emerson
1e2d6950a1 [AArch64][GlobalISel] Refactor selectBuildVector to use MachineIRBuilder. NFC.
This is a preparatory change as I want to use emitScalarToVector() elsewhere,
and in general we want to transition to MIRBuilder instead of using BuildMI
directly.

Differential Revision: https://reviews.llvm.org/D58528

llvm-svn: 354807
2019-02-25 18:52:54 +00:00
Luke Cheeseman
6af96f2e37 [AArch64] Add support for Cortex-A76 and Cortex-A76AE
- Add LLVM backend support for Cortex-A76 and Cortex-A76AE
- Documentation can be found at
  https://developer.arm.com/products/processors/cortex-a/cortex-a76

llvm-svn: 354788
2019-02-25 15:08:27 +00:00
Amara Emerson
0697bcf2e4 Re-land "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR""
Thanks to Richard Trieu for pointing out that the failures were due to a
use-after-free of an ArrayRef.

llvm-svn: 354616
2019-02-21 20:20:16 +00:00
David Spickett
886462b5ef [AArch64] Print instruction before atomic semantic annotations
Commit r353303 added annotations when acquire semantics
were dropped from an instruction.

printAnnotation was called before printInstruction.
So if you didn't set a separate comment output stream
you got <comment><instr> instead of <instr><comment>
as expected.

To fix this move the new printAnnotation to after
the instruction is printed.

Differential Revision: https://reviews.llvm.org/D58059

llvm-svn: 354565
2019-02-21 10:42:49 +00:00
Amara Emerson
d0b15a3f1e Revert "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR"
This reverts r354521 because it broke the bots, but passes on Darwin somehow.

llvm-svn: 354532
2019-02-21 00:31:13 +00:00
Amara Emerson
117293b921 [AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR
This change makes some basic type combinations for G_SHUFFLE_VECTOR legal, and
implements them with a very pessimistic TBL2 instruction in the selector.

For TBL2, support is also needed to generate constant pool entries and load from
them in order to materialize the mask register.

Currently supports <2 x s64> and <4 x s32> result types.

Differential Revision: https://reviews.llvm.org/D58466

llvm-svn: 354521
2019-02-20 22:11:39 +00:00
Jessica Paquette
f7efb97beb [GlobalISel][AArch64] Legalize + select some llvm.ctlz.* intrinsics
Legalize/select llvm.ctlz.*

Add select-ctlz to show that we actually select them. Update arm64-clrsb.ll and
arm64-vclz.ll to show that we perform valid transformations in optimized builds,
and document where GISel can improve.

Differential Revision: https://reviews.llvm.org/D58155

llvm-svn: 354299
2019-02-18 23:33:24 +00:00
Matt Arsenault
3ca7deb4b1 GlobalISel: Add alignment to LegalityQuery MMOs
This allows targets to specify the minimum alignment required for the
load/store.

llvm-svn: 354071
2019-02-14 22:41:09 +00:00
Petr Hosek
469dd5010d [AArch64] Support reserving arbitrary general purpose registers
This is a follow up to D48580 and D48581 which allows reserving
arbitrary general purpose registers with the exception of registers
with special purpose (X8, X16-X18, X29, X30) and registers used by LLVM
(X0, X19). This change also generalizes some of the existing logic to
rely entirely on values generated from tablegen.

Differential Revision: https://reviews.llvm.org/D56305

llvm-svn: 353957
2019-02-13 17:28:47 +00:00
Nikita Popov
fad5d2671a [AArch64] Expand v8i8 cttz (PR39729)
Fix for https://bugs.llvm.org/show_bug.cgi?id=39729.

Rather than adding just a case for v8i8 I'm setting cttz to expand
for all vector types.

Differential Revision: https://reviews.llvm.org/D58008

llvm-svn: 353872
2019-02-12 18:55:53 +00:00
Jessica Paquette
bc25529f39 [GlobalISel][AArch64] Select llvm.bswap* for non-vector types
This teaches the IRTranslator to emit G_BSWAP when it runs into
Intrinsic::bswap. This allows us to select G_BSWAP for non-vector types in
AArch64.

Add a select-bswap.mir test, and add global isel checks to a couple existing
tests in test/CodeGen/AArch64.

This doesn't handle every bswap case, since some of these rely on known bits
stuff. This just lets us handle the naive case.

Differential Revision: https://reviews.llvm.org/D58081

llvm-svn: 353861
2019-02-12 17:28:17 +00:00
Jessica Paquette
c562e11af8 [AArch64][GlobalISel] Add isel support for a couple vector exts/truncs
Add support for

- v4s16 <-> v4s32
- v2s64 <-> v2s32

And update tests that use them to show that we generate the correct
instructions.

Differential Revision: https://reviews.llvm.org/D57832

llvm-svn: 353732
2019-02-11 18:56:39 +00:00
Jessica Paquette
5a40815c0d [GlobalISel][AArch64] Select G_FFLOOR
This teaches the legalizer about G_FFLOOR, and lets us select G_FFLOOR in
AArch64.

It updates the existing floating point tests, and adds a select-floor.mir test.

Differential Revision: https://reviews.llvm.org/D57486

llvm-svn: 353722
2019-02-11 17:22:58 +00:00
Benjamin Kramer
f0a0f80db1 Move some classes into anonymous namespaces. NFC.
llvm-svn: 353710
2019-02-11 15:16:21 +00:00
Eli Friedman
b87d675297 [AArch64] Fix condition for "high-vector" DUP optimizations.
AArch64 NEON has a bunch of instructions with a "2" suffix that extract
the top half of the source vectors, instead of the bottom half.  We have
some DAGCombines to try to take advantage of that.  However, they
assumed that any EXTRACT_VECTOR was extracting the high half of the
vector in question.

This issue has apparently existed since the AArch64 backend was merged.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40632 .

Differential Revision: https://reviews.llvm.org/D57862

llvm-svn: 353486
2019-02-08 00:23:35 +00:00
Matt Arsenault
e739762e43 GlobalISel: Implement narrowScalar for shift main type
This is pretty much directly ported from SelectionDAG. Doesn't include
the shift by non-constant but known bits version, since there isn't a
globalisel version of computeKnownBits yet.

This shows a disadvantage of targets not specifically which type
should be used for the shift amount. If type 0 is legalized before
type 1, the operations on the shift amount type use the wider type
(which are also less likely to legalize). This can be avoided by
targets specifying legalization actions on type 1 earlier than for
type 0.

llvm-svn: 353455
2019-02-07 19:37:44 +00:00
Tim Northover
77f59ae9f5 AArch64: implement copy for paired GPR registers.
When doing 128-bit atomics using CASP we might need to copy a GPRPair to a
different register, but that was unimplemented up to now.

llvm-svn: 353383
2019-02-07 10:35:34 +00:00
Tim Northover
009fdea45c AArch64: enforce even/odd register pairs for CASP instructions.
ARMv8.1a CASP instructions need the first of the pair to be an even register
(otherwise the encoding is unallocated). We enforced this during assembly, but
not CodeGen before.

llvm-svn: 353308
2019-02-06 15:26:35 +00:00
Tim Northover
f20f21ec5e AArch64: annotate atomics with dropped acquire semantics when printing.
A quirk of the v8.1a spec is that when the writeback regiser for an atomic
read-modify-write instruction is wzr/xzr, the instruction no longer enforces
acquire ordering. However, it's still written with the misleading 'a' mnemonic.

So this adds an annotation when disassembling such instructions, mentioning the
change.

llvm-svn: 353303
2019-02-06 15:07:59 +00:00
Aditya Nandakumar
bba2c76cb2 [NFC][GlobalISel]: Add a convenience method to MachineInstrBuilder to simplify getOperand(i).getReg()
https://reviews.llvm.org/D57608

It's a common pattern in GISel to have a MachineInstrBuilder from which we get various regs
(commonly MIB->getOperand(0).getReg()). This adds a helper method and the above can be
replaced with MIB.getReg(0).

llvm-svn: 353223
2019-02-05 22:14:40 +00:00
Oliver Stannard
a390364af8 [AArch64][Outliner] Don't outline BTI instructions
We can't outline BTI instructions, because they need to be the very first
instruction executed after an indirect call or branch. If we outline them, then
an indirect call might go to the branch to the outlined function, which will
fault.

Differential revision: https://reviews.llvm.org/D57753

llvm-svn: 353190
2019-02-05 17:21:57 +00:00
Matt Arsenault
eaac595e10 AArch64/GlobalISel: Don't clamp from 2 to 2
This is equivalent to clampMaxNumElements, but saves a check.

llvm-svn: 353188
2019-02-05 16:57:18 +00:00
Florian Hahn
ad99b95580 [CGP] Add support for sinking operands to their users, if they are free.
This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.

I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.

The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.

Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.

Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.

This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.

As for performance, we have internal code that uses intrinsics and can
be speed up by 10% by this change.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D57377

llvm-svn: 353152
2019-02-05 10:27:40 +00:00
Andrea Di Biagio
2820258583 [AsmPrinter] Remove hidden flag -print-schedule.
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).

Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".

These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.

When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.

Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.

Differential Revision: https://reviews.llvm.org/D57244

llvm-svn: 353043
2019-02-04 12:51:26 +00:00
Mandeep Singh Grang
553513010b [AArch64] Fix unused variable [NFC]
llvm-svn: 352940
2019-02-01 23:42:34 +00:00
Mandeep Singh Grang
9cf3ca95c2 [COFF, ARM64] Fix localaddress to handle stack realignment and variable size objects
Summary: This fixes using the correct stack registers for SEH when stack realignment is needed or when variable size objects are present.

Reviewers: rnk, efriedma, ssijaric, TomTan

Reviewed By: rnk, efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D57183

llvm-svn: 352923
2019-02-01 21:41:33 +00:00
James Y Knight
d34c1cbe9e [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

llvm-svn: 352913
2019-02-01 20:44:47 +00:00
James Y Knight
c8b30de05f [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
James Y Knight
31a2057127 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

llvm-svn: 352909
2019-02-01 20:43:25 +00:00
Adhemerval Zanella
36b7b3c0fa [AArch64] Optimize floating point materialization
This patch changes isFPImmLegal to return if the value can be enconded
as the immediate operand of a logical instruction besides checking if
for immediate field for fmov.

This optimizes some floating point materization, inclusive values
used on isinf lowering.

Reviewed By: rengolin, efriedma, evandro

Differential Revision: https://reviews.llvm.org/D57044

llvm-svn: 352866
2019-02-01 12:26:06 +00:00
James Y Knight
846be29e5e [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.

Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352827
2019-02-01 02:28:03 +00:00
James Y Knight
06da6dcca4 Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it."
This reverts commit f47d6b38c7a61d50db4566b02719de05492dcef1 (r352791).

Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.

llvm-svn: 352800
2019-01-31 21:51:58 +00:00
James Y Knight
fa51e33345 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352791
2019-01-31 20:35:56 +00:00
Sjoerd Meijer
068d715728 [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS
And instead just generate a libcall. My motivating example on ARM was a simple:
  
  shl i64 %A, %B

for which the code bloat is quite significant. For other targets that also
accept __int128/i128 such as AArch64 and X86, it is also beneficial for these
cases to generate a libcall when optimising for minsize. On these 64-bit targets,
the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS
lowering operation action is not set to custom/expand.

Differential Revision: https://reviews.llvm.org/D57386

llvm-svn: 352736
2019-01-31 08:07:30 +00:00
Matt Arsenault
1067bf29d3 GlobalISel: Fix creating MMOs with align 0
llvm-svn: 352712
2019-01-31 01:38:47 +00:00
Jessica Paquette
156cec86e5 [GlobalISel][AArch64] Select G_FEXP
This teaches the legalizer to handle G_FEXP in AArch64. As a result, it also
allows us to select G_FEXP.

It...

- Updates the legalizer-info tests
- Adds a test for legalizing exp
- Updates the existing fp tests to show that we can now select G_FEXP

https://reviews.llvm.org/D57483

llvm-svn: 352692
2019-01-30 23:46:15 +00:00
Jessica Paquette
d46e179ece [GlobalISel][AArch64] Select G_FABS
This adds instruction selection support for G_FABS in AArch64. It also updates
the existing basic FP tests, adds a selection test for G_FABS.

https://reviews.llvm.org/D57418

llvm-svn: 352684
2019-01-30 22:54:21 +00:00
Jessica Paquette
d5349f419b [GlobalISel][AArch64] Add instruction selection support for @llvm.log2
This teaches GlobalISel to emit a RTLib call for @llvm.log2 when it encounters
it.

It updates the existing floating point tests to show that we don't fall back on
the intrinsic, and select the correct instructions. It also adds a legalizer
test for G_FLOG2.

https://reviews.llvm.org/D57357

llvm-svn: 352673
2019-01-30 21:16:04 +00:00
Jessica Paquette
d03d1c2ace [GlobalISel][AArch64] Add instruction selection support for @llvm.sqrt
This teaches the legalizer about G_FSQRT in AArch64. Also adds a legalizer
test for G_FSQRT, a selection test for it, and updates existing floating point
tests.

https://reviews.llvm.org/D57361

llvm-svn: 352671
2019-01-30 21:03:52 +00:00
Amara Emerson
478ae74dcd [AArch64][GlobalISel] Unmerge into scalars from a vector should use FPR bank.
This currently shows up as a selection fallback since the dest regs were given
GPR banks but the source was a vector FPR reg.

Differential Revision: https://reviews.llvm.org/D57408

llvm-svn: 352545
2019-01-29 21:19:33 +00:00
Martin Storsjo
30a71b482a [COFF, ARM64] Don't put jump table into a separate COFF section for EK_LabelDifference32
Windows ARM64 has PIC relocation model and uses jump table kind
EK_LabelDifference32. This produces jump table entry as
".word LBB123 - LJTI1_2" which represents the distance between the block
and jump table.

A new relocation type (IMAGE_REL_ARM64_REL32) is needed to do the fixup
correctly if they are in different COFF section.

This change saves the jump table to the same COFF section as the
associated code. An ideal fix could be utilizing IMAGE_REL_ARM64_REL32
relocation type.

Patch by Tom Tan!

Differential Revision: https://reviews.llvm.org/D57277

llvm-svn: 352465
2019-01-29 09:36:48 +00:00
Reid Kleckner
18f71dad93 [AArch64] Include AArch64GenCallingConv.inc once
Summary:
Avoids duplicating generated static helpers for calling convention
analysis.

This also means you can modify AArch64CallingConv.td without recompiling
the AArch64ISelLowering.cpp monolith, so it provides faster incremental
rebuilds.

Saves 12K in llc.exe, but adds a new object file, which is large.

Reviewers: efriedma, t.p.northover

Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56948

llvm-svn: 352430
2019-01-28 21:28:40 +00:00
Jessica Paquette
91e5ed536e [GlobalISel][AArch64] Add legalization for G_FLOG
This adds support for legalizing G_FLOG into a RTLib call.

It adds a legalizer test, and updates the existing floating point tests.

https://reviews.llvm.org/D57347

llvm-svn: 352429
2019-01-28 21:27:23 +00:00
Jessica Paquette
ff99f81513 [GlobalISel][AArch64] Add instruction selection support for @llvm.log10
This adds instruction selection support for @llvm.log10 in AArch64. It teaches
GISel to lower it to a library call, updates the relevant tests, and adds a
legalizer test for log10.

https://reviews.llvm.org/D57341

llvm-svn: 352418
2019-01-28 19:53:14 +00:00
Francis Visoiu Mistrih
2379ffe16f [AArch64] Add 'apple-latest' CPU alias
The 'apple-latest' alias is supposed to provide a CPU that contains the
latest Apple processor model supported by LLVM.

This is supposed to be used by tools like lldb to provide a target that
supports most of the CPU features.

For now, this is mapped to Cyclone.

Differential Revision: https://reviews.llvm.org/D56384

llvm-svn: 352412
2019-01-28 19:27:33 +00:00
Jessica Paquette
207e020957 [GlobalISel][AArch64] Add instruction selection support for G_FCOS and G_FSIN
This contains all of the legalizer changes from D57197 necessary to select
G_FCOS and G_FSIN. It also updates several existing IR tests in
test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN
instructions.

https://reviews.llvm.org/D57197
3/3

llvm-svn: 352402
2019-01-28 18:34:18 +00:00
Amara Emerson
c61a90aead [AArch64][GlobalISel] Teach RBS about G_FNEG default mapping.
llvm-svn: 352340
2019-01-28 03:21:14 +00:00
Amara Emerson
5ebce2e19a [AArch64][GlobalISel] Add some missing vector support for FP arithmetic ops.
Moved the fneg lowering legalization test from AArch64 to X86, as we want to
specify that it's already legal.

llvm-svn: 352338
2019-01-28 02:28:22 +00:00
Amara Emerson
ba9fd8068d [AArch64][GlobalISel] Add some vector support for fp <-> int conversions.
Some unrelated, but benign, test changes as well due to the test update script.

llvm-svn: 352337
2019-01-28 02:27:59 +00:00
Jessica Paquette
00aa216e32 [GlobalISel][AArch64][NFC] Fix incorrect comment in selectUnmergeValues
s/scalar/vector/

llvm-svn: 352243
2019-01-25 21:28:27 +00:00
Simon Pilgrim
504bf36a53 Fix gcc -Wparentheses warning. NFCI.
llvm-svn: 352193
2019-01-25 11:38:40 +00:00
Matt Arsenault
f3ebfd83c4 GlobalISel: Add convenience mutatations to scalarize
llvm-svn: 352143
2019-01-25 00:51:00 +00:00
Benjamin Kramer
1d47eb5cbb [GlobalISel][AArch64] Avoid unused variable warning for variable only used in assert
llvm-svn: 352133
2019-01-24 23:45:07 +00:00
Benjamin Kramer
fe37caf3d2 [GlobalISel][AArch64] Avoid unused function warnings in Release builds
llvm-svn: 352129
2019-01-24 23:39:47 +00:00
Jessica Paquette
52154c0f89 Suppress unused capture warning in CheckCopy
Werror bots didn't like the lambda + assert thing in my previous commit.

Capture everything to suppress the error.

Example failure here:
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/29393

llvm-svn: 352124
2019-01-24 22:51:31 +00:00
Jessica Paquette
af0b0f857f [GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceil
This patch adds support for vector @llvm.ceil intrinsics when full 16 bit
floating point support isn't available.

To do this, this patch...

- Implements basic isel for G_UNMERGE_VALUES
- Teaches the legalizer about 16 bit floats
- Teaches AArch64RegisterBankInfo to respect floating point registers on
  G_BUILD_VECTOR and G_UNMERGE_VALUES
- Teaches selectCopy about 16-bit floating point vectors

It also adds

- A legalizer test for the 16-bit vector ceil which verifies that we create a
  G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
- An instruction selection test which makes sure we lower to G_FCEIL when
  full fp16 is supported
- A test for selecting G_UNMERGE_VALUES

And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types
work as expected.

https://reviews.llvm.org/D56682

llvm-svn: 352113
2019-01-24 22:00:41 +00:00
Benjamin Kramer
ca8adadeee [AArch64] Fix out of bounds strlen
CFIInst is not zero-terminated. This is one of more annoying functional
differences between StringRef and ArrayRef.

Found by asan.

llvm-svn: 351955
2019-01-23 14:51:21 +00:00
Kristof Beyls
1b3bd06827 [SLH] AArch64: correctly pick temporary register to mask SP
As part of speculation hardening, the stack pointer gets masked with the
taint register (X16) before a function call or before a function return.
Since there are no instructions that can directly mask writing to the
stack pointer, the stack pointer must first be transferred to another
register, where it can be masked, before that value is transferred back
to the stack pointer.
Before, that temporary register was always picked to be x17, since the
ABI allows clobbering x17 on any function call, resulting in the
following instruction pattern being inserted before function calls and
returns/tail calls:

mov x17, sp
and x17, x17, x16
mov sp, x17
However, x17 can be live in those locations, for example when the call
is an indirect call, using x17 as the target address (blr x17).

To fix this, this patch looks for an available register just before the
call or terminator instruction and uses that.

In the rare case when no register turns out to be available (this
situation is only encountered twice across the whole test-suite), just
insert a full speculation barrier at the start of the basic block where
this occurs.

Differential Revision: https://reviews.llvm.org/D56717

llvm-svn: 351930
2019-01-23 08:18:39 +00:00
Peter Collingbourne
2818e607ab hwasan: Move memory access checks into small outlined functions on aarch64.
Each hwasan check requires emitting a small piece of code like this:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses

The problem with this is that these code blocks typically bloat code
size significantly.

An obvious solution is to outline these blocks of code. In fact, this
has already been implemented under the -hwasan-instrument-with-calls
flag. However, as currently implemented this has a number of problems:
- The functions use the same calling convention as regular C functions.
  This means that the backend must spill all temporary registers as
  required by the platform's C calling convention, even though the
  check only needs two registers on the hot path.
- The functions take the address to be checked in a fixed register,
  which increases register pressure.
Both of these factors can diminish the code size effect and increase
the performance hit of -hwasan-instrument-with-calls.

The solution that this patch implements is to involve the aarch64
backend in outlining the checks. An intrinsic and pseudo-instruction
are created to represent a hwasan check. The pseudo-instruction
is register allocated like any other instruction, and we allow the
register allocator to select almost any register for the address to
check. A particular combination of (register selection, type of check)
triggers the creation in the backend of a function to handle the check
for specifically that pair. The resulting functions are deduplicated by
the linker. The pseudo-instruction (really the function) is specified
to preserve all registers except for the registers that the AAPCS
specifies may be clobbered by a call.

To measure the code size and performance effect of this change, I
took a number of measurements using Chromium for Android on aarch64,
comparing a browser with inlined checks (the baseline) against a
browser with outlined checks.

Code size: Size of .text decreases from 243897420 to 171619972 bytes,
or a 30% decrease.

Performance: Using Chromium's blink_perf.layout microbenchmarks I
measured a median performance regression of 6.24%.

The fact that a perf/size tradeoff is evident here suggests that
we might want to make the new behaviour conditional on -Os/-Oz.
But for now I've enabled it unconditionally, my reasoning being that
hwasan users typically expect a relatively large perf hit, and ~6%
isn't really adding much. We may want to revisit this decision in
the future, though.

I also tried experimenting with varying the number of registers
selectable by the hwasan check pseudo-instruction (which would result
in fewer variants being created), on the hypothesis that creating
fewer variants of the function would expose another perf/size tradeoff
by reducing icache pressure from the check functions at the cost of
register pressure. Although I did observe a code size increase with
fewer registers, I did not observe a strong correlation between the
number of registers and the performance of the resulting browser on the
microbenchmarks, so I conclude that we might as well use ~all registers
to get the maximum code size improvement. My results are below:

Regs | .text size | Perf hit
-----+------------+---------
~all | 171619972  | 6.24%
  16 | 171765192  | 7.03%
   8 | 172917788  | 5.82%
   4 | 177054016  | 6.89%

Differential Revision: https://reviews.llvm.org/D56954

llvm-svn: 351920
2019-01-23 02:20:10 +00:00
Matt Arsenault
cfaee823da GlobalISel: Allow shift amount to be a different type
For AMDGPU the shift amount is never 64-bit, and
this needs to use a 32-bit shift.

X86 uses i8, but seemed to be hacking around this before.

llvm-svn: 351882
2019-01-22 21:42:11 +00:00
Matt Arsenault
fd94cae16d Reapply "IR: Add fp operations to atomicrmw"
This reapplies commits r351778 and r351782 with
RISCV test fixes.

llvm-svn: 351850
2019-01-22 18:18:02 +00:00
Chandler Carruth
3764443332 Revert r351778: IR: Add fp operations to atomicrmw
This broke the RISCV build, and even with that fixed, one of the RISCV
tests behaves surprisingly differently with asserts than without,
leaving there no clear test pattern to use. Generally it seems bad for
hte IR to differ substantially due to asserts (as in, an alloca is used
with asserts that isn't needed without!) and nothing I did simply would
fix it so I'm reverting back to green.

This also required reverting the RISCV build fix in r351782.

llvm-svn: 351796
2019-01-22 10:29:58 +00:00
Matt Arsenault
44582e29c8 IR: Add fp operations to atomicrmw
Add just fadd/fsub for now.

llvm-svn: 351778
2019-01-22 03:32:36 +00:00
Eli Friedman
1d6f130191 [AArch64] Add patterns for zext/sext of shift amount.
Not sure this is the best fix, but it saves an instruction for certain
constructs involving variable shifts.

Differential Revision: https://reviews.llvm.org/D55572

llvm-svn: 351768
2019-01-22 00:21:35 +00:00
Chandler Carruth
ae65e281f3 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Sanjin Sijaric
19c7db09aa Fix the buildbot failure introduced by r351404
EXPENSIVE_CHECKS buildbots are failing due to r351404.

Add x1 as live in to the funclet basic block for SEH funclets, as well as
-verify-machineinstrs to the test case that triggered the failure.

llvm-svn: 351472
2019-01-17 20:24:14 +00:00
Matt Arsenault
afd1f8cb4f Allow FP types for atomicrmw xchg
llvm-svn: 351427
2019-01-17 10:49:01 +00:00
Sanjin Sijaric
10c2d07dfd [SEH] [ARM64] Retrieve the frame pointer from SEH funclets
The Windows ARM64 runtime passes the establisher frame to funclets as the first
argument.

llvm-svn: 351404
2019-01-17 00:24:38 +00:00
Mandeep Singh Grang
9f9869fa2f [COFF, ARM64] Implement support for SEH extensions __try/__except/__finally
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.

We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.

Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric

Reviewed By: rnk, efriedma

Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53540

llvm-svn: 351370
2019-01-16 19:52:59 +00:00
Aditya Nandakumar
c969a48f8c [GISel]: Add support for CSEing continuously during GISel passes.
https://reviews.llvm.org/D52803

This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.

llvm-svn: 351283
2019-01-16 00:40:37 +00:00
Evandro Menezes
ed44540ada [AArch64] Adjust the feature set for Exynos
Enable the fusion of arithmetic and logic instructions for Exynos M4.

llvm-svn: 351149
2019-01-15 01:53:49 +00:00
Eli Friedman
d0fe48e3c6 [AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 .
Otherwise, with D56544, the intrinsic will be expanded to an integer
csel, which is probably not what the user expected.  This matches the
general convention of using "v1" types to represent scalar integer
operations in vector registers.

While I'm here, also add some error checking so we don't generate
illegal ABS nodes.

Differential Revision: https://reviews.llvm.org/D56616

llvm-svn: 351141
2019-01-15 00:15:24 +00:00
Evandro Menezes
7744c8a2b5 [AArch64] Add new target feature to fuse arithmetic and logic operations
This feature enables the fusion of some arithmetic and logic instructions
together.

Differential revision: https://reviews.llvm.org/D56572

llvm-svn: 351139
2019-01-14 23:54:36 +00:00
Evandro Menezes
949fd90c8b [AArch64] Improve Exynos predicates
Expand the predicate using shifted arithmetic and logic instructions to also
consider the respective not shifted instructions.

llvm-svn: 350976
2019-01-11 22:39:47 +00:00
Evandro Menezes
85b71933f5 [AArch64] Add pipeline model for Exynos M4
Add the scheduling and cost model for Exynos M4.

llvm-svn: 350960
2019-01-11 19:36:25 +00:00
Evandro Menezes
1df9d07d1f [AArch64] Create feature set for Exynos M4
Complete the feature set for Exynos M4 and update test cases.

llvm-svn: 350953
2019-01-11 18:54:25 +00:00
Bryan Chan
7440d37913 [AArch64] Fix operation actions for FP16 vector intrinsics
Summary:
This patch changes the legalization action for some half-precision floating-
point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops
are not supported in hardware for half-precision vectors, but promotion is
not always possible (for v8f16 operands). Changing the action to Expand fixes
an assertion failure in the legalizer when the frontend produces such ops.
In addition, a quick microbenchmark shows that, in the v4f16 case,
expanding introduces fewer spills and is therefore slightly faster than
promoting.

Reviewers: t.p.northover, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56296

llvm-svn: 350825
2019-01-10 15:02:37 +00:00
Mandeep Singh Grang
9bf821bd2f [AArch64] Emit the correct MCExpr relocations specifiers like VK_ABS_G0, etc
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.

Reviewers: rnk, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56037

llvm-svn: 350798
2019-01-10 04:59:44 +00:00
Kristof Beyls
e117265e0e Initial AArch64 SLH implementation.
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.

Differential Revision: https://reviews.llvm.org/D55929

llvm-svn: 350729
2019-01-09 15:13:34 +00:00
Diogo N. Sampaio
dfa50c3062 [AArch64] Move feature predctrl to predres
Follow up patch of rL350385, for adding predres
command line option. This patch renames the
feature as to keep it aligned with the option
passed by/to clang

Differential Revision: https://reviews.llvm.org/D56484

llvm-svn: 350702
2019-01-09 11:24:15 +00:00
Matt Arsenault
ad57c9218e GlobalISel: Implement fewerElements for implicit_def
llvm-svn: 350697
2019-01-09 07:51:52 +00:00
Evandro Menezes
2cf1366fcb [AArch64] Adjust the cost model for Exynos
Improve the modeling of ALU instructions.

llvm-svn: 350663
2019-01-08 22:29:58 +00:00
Tim Northover
19ac7a1fe6 AArch64: avoid splitting vector truncating stores.
We have code to split vector splats (of zero and non-zero) for performance
reasons, but it ignores the fact that a store might be truncating.

Actually, truncating stores are formed for vNi8 and vNi16 types. Since the
truncation is from a legal type, the size of the store is always <= 64-bits and
so they don't actually benefit from being split up anyway, so this patch just
disables that transformation.

llvm-svn: 350620
2019-01-08 13:30:27 +00:00
Mandeep Singh Grang
ab3dbc4187 [MC] [AArch64] Support resolving signed fixups for :abs_g0_s: etc.
Summary: This patch is a follow-up to D55896.

Reviewers: efriedma, mstorsjo

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56029

llvm-svn: 350606
2019-01-08 04:48:00 +00:00
Evandro Menezes
30be00653a [AArch64] Adjust the cost model for Exynos M3
Improve the modeling of ASIMD loads and stores.

llvm-svn: 350434
2019-01-04 21:02:25 +00:00
Evandro Menezes
41c27199d9 [AArch64] Add new scheduling predicates
Add new scheduling predicates to identify the ASIMD loads and stores using the post indexed addressing mode.

llvm-svn: 350332
2019-01-03 17:28:09 +00:00
Martin Storsjo
9c2f92fb12 [AArch64] Accept "sve" as arch feature in assembler
Differential Revision: https://reviews.llvm.org/D56128

llvm-svn: 350174
2018-12-31 10:22:04 +00:00
Martin Storsjo
4ebb5f7268 [AArch64] Implement the .arch_extension directive
Differential Revision: https://reviews.llvm.org/D56131

llvm-svn: 350169
2018-12-30 21:06:32 +00:00
Diogo N. Sampaio
4f295fed38 [AArch64] Add command-line option for SB
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.

This patch also moves to FeatureSB the old FeatureSpecRestrict.

Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman	

Differential Revision: https://reviews.llvm.org/D55921

llvm-svn: 350126
2018-12-28 17:14:58 +00:00
Jessica Paquette
481e7777a1 [GlobalISel][AArch64] Add support for widening G_FCEIL
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.

This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.

llvm-svn: 349927
2018-12-21 17:05:26 +00:00
Evandro Menezes
36af2563a9 [AArch64] Refactor Exynos predicate (NFC)
Change order of conditions in predicate.

llvm-svn: 349918
2018-12-21 15:51:34 +00:00
Simon Pilgrim
0a3fdb6a75 [AArch64] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349908
2018-12-21 15:05:10 +00:00
Luke Cheeseman
e503aa9de4 [Dwarf/AArch64] Return address signing B key dwarf support
- When signing return addresses with -msign-return-address=<scope>{+<key>},
  either the A key instructions or the B key instructions can be used. To
  correctly authenticate the return address, the unwinder/debugger must know
  which key was used to sign the return address.
- When and exception is thrown or a break point reached, it may be necessary to
  unwind the stack. To accomplish this, the unwinder/debugger must be able to
  first authenticate an the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
  inclusion of a 'B' character. Functions that are signed using the B key
  variant of the instructions should have and FDE whose associated CIE has a 'B'
  in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
  high level language into assembly and then, as a second step, into an object
  file. To achieve this, I have introduced a new assembly directive
  '.cfi_b_key_frame ', that tells the assembler the current frame uses return
  address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
  augmentation string.

Differential Revision: https://reviews.llvm.org/D51798

llvm-svn: 349895
2018-12-21 10:45:08 +00:00
Jessica Paquette
be246a61d1 [GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode
If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end
up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback.

Add it to the switch, and add a test verifying that this happens.

llvm-svn: 349822
2018-12-20 21:14:15 +00:00