1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

3034 Commits

Author SHA1 Message Date
Sander de Smalen
836ce1e34b [AArch64][SVE] Asm: Support for predicated FP operations (FP immediate)
This patch completes support for the following floating point
instructions that take FP immediates:
  FADD*  (addition)
  FSUB   (subtract)
  FSUBR  (subtract reverse form)
  FMUL*  (multiplication)
  FMAX*  (maximum)
  FMAXNM (maximum number)
  FMIN   (maximum)
  FMINNM (maximum number)

All operations are predicated and take a FP immediate operand,
e.g.

  fadd z0.h, p0/m, z0.h, #0.5
  fmin z0.s, p0/m, z0.s, #1.0
        ^___________^ (tied)

* Instructions added in a previous patch.

llvm-svn: 337272
2018-07-17 12:36:08 +00:00
Sander de Smalen
48bd15f31f [AArch64][SVE] Asm: Support for predicated FP operations.
This patch adds support for the following floating point
instructions:
  FABD   (absolute difference)
  FADD   (addition)
  FSUB   (subtract)
  FSUBR  (subtract reverse form)
  FDIV   (divide)
  FDIVR  (divide reverse form)
  FMAX   (maximum)
  FMAXNM (maximum number)
  FMIN   (minimum)
  FMINNM (minimum number)
  FSCALE (adjust exponent)
  FMULX  (multiply extended)

All operations are predicated and binary form, e.g.

  fadd z0.h, p0/m, z0.h, z1.h
        ^___________^ (tied)

Supporting 16, 32 and 64-bit FP elements.

llvm-svn: 337259
2018-07-17 09:48:57 +00:00
Sander de Smalen
8520f23387 [AArch64][SVE] Asm: Support for SPLICE instruction.
The SPLICE instruction splices two vectors into one vector using a
predicate. It copies the active elements from the first vector, and
then fills the remaining elements with the low-numbered elements from
the second vector.

The instruction has the following form, e.g.

  splice z0.b, p0, z0.b, z1.b

for 8-bit elements. It also supports 16, 32 and
64-bit elements.

llvm-svn: 337253
2018-07-17 08:52:45 +00:00
Sander de Smalen
e4d871e600 [AArch64][SVE] Asm: Support for EXT instruction.
This patch adds an instruction that allows extracting
a vector from a pair of vectors, given an immediate index
that describes the element position to extract from.

The instruction has the following assembly:
  ext z0.b, z0.b, z1.b, #imm

where #imm is an immediate between 0 and 255.

llvm-svn: 337251
2018-07-17 08:39:48 +00:00
Roman Lebedev
66aa54677c [X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern
Summary:

[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

But the IR-optimal patter does not lower efficiently, so we want to undo it..

This handles the simple pattern.
There is a second pattern with predicate and constants inverted.

NOTE: we do not check uses here. we always do the transform.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49266

llvm-svn: 337166
2018-07-16 12:44:10 +00:00
Sjoerd Meijer
4f353db6e2 [AArch64] Armv8.4-A: LDAPR & STLR with immediate offset instructions (cont'd)
Follow up of rL336913: fix base class description. Thanks to Ahmed Bougacha
for pointing this out.

Differential Revision: https://reviews.llvm.org/D49284

llvm-svn: 337009
2018-07-13 15:25:42 +00:00
Joel Galenson
9249622410 [cfi-verify] Support AArch64.
This patch adds support for AArch64 to cfi-verify.

This required three changes to cfi-verify.  First, it generalizes checking if an instruction is a trap by adding a new isTrap flag to TableGen (and defining it for x86 and AArch64).  Second, the code that ensures that the operand register is not clobbered between the CFI check and the indirect call needs to allow a single dereference (in x86 this happens as part of the jump instruction).  Third, we needed to ensure that return instructions are not counted as indirect branches.  Technically, returns are indirect branches and can be covered by CFI, but LLVM's forward-edge CFI does not protect them, and x86 does not consider them, so we keep that behavior.

In addition, we had to improve AArch64's code to evaluate the branch target of a MCInst to handle calls where the destination is not the first operand (which it often is not).

Differential Revision: https://reviews.llvm.org/D48836

llvm-svn: 337007
2018-07-13 15:19:33 +00:00
Sander de Smalen
70386770c5 [AArch64][SVE] Asm: Vector Unpack Low/High instructions.
This patch adds support for the following unpack instructions:
  
- PUNPKLO, PUNPKHI   Unpack elements from low/high half and
                     place into elements of twice their size.

  e.g. punpklo p0.h, p0.b

- UUNPKLO, UUNPKHI   Unpack elements from low/high half and 
  SUNPKLO, SUNPKHI   place into elements of twice their size
                     after zero- or sign-extending the values.

  e.g. uunpklo z0.h, z0.b

llvm-svn: 336982
2018-07-13 09:25:43 +00:00
Sander de Smalen
52edf6c6bd [AArch64][SVE] Asm: Support for insert element (INSR) instructions.
Insert general purpose register into shifted vector, e.g.
  insr    z0.s, w0
  insr    z0.d, x0

Insert SIMD&FP scalar register into shifted vector, e.g.
  insr    z0.b, b0
  insr    z0.h, h0
  insr    z0.s, s0
  insr    z0.d, d0

llvm-svn: 336979
2018-07-13 08:51:57 +00:00
Sjoerd Meijer
f1cde2c9d6 [AArch64] Armv8.4-A: LDAPR & STLR with immediate offset instructions
These instructions are added to AArch64 only.

llvm-svn: 336913
2018-07-12 14:57:59 +00:00
Sander de Smalen
9a4aa02652 [AArch64][SVE] Asm: Support for COMPACT instruction.
The compact instruction shuffles active elements of vector
into lowest numbered elements and sets remaining elements
to zero. 

e.g.
  compact z0.s, p0, z1.s

llvm-svn: 336789
2018-07-11 11:22:26 +00:00
Sander de Smalen
e8509a8efa [AArch64][SVE] Asm: Support for LAST(A|B) and CLAST(A|B) instructions.
The LASTB and LASTA instructions extract the last active element,
or element after the last active, from the source vector.

The added variants are:

  Scalar:
  last(a|b)  w0, p0, z0.b
  last(a|b)  w0, p0, z0.h
  last(a|b)  w0, p0, z0.s
  last(a|b)  x0, p0, z0.d

  SIMD & FP Scalar:
  last(a|b)  b0, p0, z0.b
  last(a|b)  h0, p0, z0.h
  last(a|b)  s0, p0, z0.s
  last(a|b)  d0, p0, z0.d

The CLASTB and CLASTA conditionally extract the last or element after
the last active element from the source vector.

The added variants are:

  Scalar:
  clast(a|b)  w0, p0, w0, z0.b
  clast(a|b)  w0, p0, w0, z0.h
  clast(a|b)  w0, p0, w0, z0.s
  clast(a|b)  x0, p0, x0, z0.d

  SIMD & FP Scalar:
  clast(a|b)  b0, p0, b0, z0.b
  clast(a|b)  h0, p0, h0, z0.h
  clast(a|b)  s0, p0, s0, z0.s
  clast(a|b)  d0, p0, d0, z0.d

  Vector:
  clast(a|b)  z0.b, p0, z0.b, z1.b
  clast(a|b)  z0.h, p0, z0.h, z1.h
  clast(a|b)  z0.s, p0, z0.s, z1.s
  clast(a|b)  z0.d, p0, z0.d, z1.d

Please refer to the architecture specification for more details on
the semantics of the added instructions.

llvm-svn: 336783
2018-07-11 10:08:00 +00:00
Sander de Smalen
d48e3148d9 [AArch64][SVE] Asm: Support for predicated unary operations.
This patch adds support for the following instructions:
  CLS  (Count Leading Sign bits)
  CLZ  (Count Leading Zeros)
  CNT  (Count non-zero bits)
  CNOT (Logically invert boolean condition in vector)
  NOT  (Bitwise invert vector)
  FABS (Floating-point absolute value)
  FNEG (Floating-point negate)

All operations are predicated and unary, e.g.
  clz  z0.s, p0/m, z1.s

- CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32
  and 64 bit elements.

- FABS and FNEG have variants for 16, 32 and 64 bit elements.

llvm-svn: 336677
2018-07-10 14:05:55 +00:00
Sander de Smalen
4ccc83e13c [AArch64][SVE] Asm: Support for CNT(B|H|W|D) and CNTP instructions.
This patch adds support for the following instructions:

  CNTB CNTH - Determine the number of active elements implied by
  CNTW CNTD   the named predicate constant, multiplied by an
              immediate, e.g.

                cnth x0, vl8, #16

  CNTP      - Count active predicate elements, e.g.
                cntp  x0, p0, p1.b

              counts the number of active elements in p1, predicated
              by p0, and stores the result in x0.

llvm-svn: 336552
2018-07-09 15:22:08 +00:00
Sander de Smalen
1161a60f75 [AArch64][SVE] Asm: Support for remaining shift instructions.
This patch completes support for shifts, which include:
- LSL   - Logical Shift Left
- LSLR  - Logical Shift Left, Reversed form
- LSR   - Logical Shift Right
- LSRR  - Logical Shift Right, Reversed form
- ASR   - Arithmetic Shift Right
- ASRR  - Arithmetic Shift Right, Reversed form
- ASRD  - Arithmetic Shift Right for Divide

In the following variants:

- Predicated shift by immediate - ASR, LSL, LSR, ASRD
  e.g.
    asr z0.h, p0/m, z0.h, #1

  (active lanes of z0 shifted by #1)

- Unpredicated shift by immediate - ASR, LSL*, LSR*
  e.g.
    asr z0.h, z1.h, #1

  (all lanes of z1 shifted by #1, stored in z0)

- Predicated shift by vector - ASR, LSL*, LSR*
  e.g.
    asr z0.h, p0/m, z0.h, z1.h

  (active lanes of z0 shifted by z1, stored in z0)

- Predicated shift by vector, reversed form - ASRR, LSLR, LSRR
  e.g.
    lslr z0.h, p0/m, z0.h, z1.h

  (active lanes of z1 shifted by z0, stored in z0)

- Predicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, p0/m, z0.h, z1.d

  (active lanes of z0 shifted by wide elements of vector z1)

- Unpredicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, z1.h, z2.d

  (all lanes of z1 shifted by wide elements of z2, stored in z0)

*Variants added in previous patches.

llvm-svn: 336547
2018-07-09 13:23:41 +00:00
Sander de Smalen
96c8306f03 [AArch64][SVE] Asm: Support for TBL instruction.
Support for SVE's TBL instruction for programmable table
lookup/permute using vector of element indices, e.g.

  tbl  z0.d, { z1.d }, z2.d

stores elements from z1, indexed by elements from z2, into z0.

llvm-svn: 336544
2018-07-09 12:32:56 +00:00
Sander de Smalen
1db514b528 [AArch64][SVE] Asm: Support for ADR instruction.
Supporting various addressing modes:
- adr z0.s, [z0.s, z0.s]
- adr z0.s, [z0.s, z0.s, lsl #<shift>]
- adr z0.d, [z0.d, z0.d]
- adr z0.d, [z0.d, z0.d, lsl #<shift>]
- adr z0.d, [z0.d, z0.d, uxtw #<shift>]
- adr z0.d, [z0.d, z0.d, sxtw #<shift>]

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D48870

llvm-svn: 336533
2018-07-09 09:58:24 +00:00
Sander de Smalen
cbce5f6883 [AArch64][SVE] Asm: Support for UZP and TRN instructions.
This patch adds support for:
  UZP1  Concatenate even elements from two vectors
  UZP2  Concatenate  odd elements from two vectors
  TRN1  Interleave  even elements from two vectors
  TRN2  Interleave   odd elements from two vectors

With variants for both data and predicate vectors, e.g.
  uzp1    z0.b, z1.b, z2.b
  trn2    p0.s, p1.s, p2.s

llvm-svn: 336531
2018-07-09 09:12:17 +00:00
Yvan Roux
27968d5797 [MachineOutliner] Assert that Liveness tracking is accurate (NFC)
The checking is done deeper inside MachineBasicBlock, but this will
hopefully help to find issues when porting the machine outliner to a
target where Liveness tracking is broken (like ARM).

Differential Revision: https://reviews.llvm.org/D49023

llvm-svn: 336481
2018-07-07 08:02:19 +00:00
Sjoerd Meijer
fc7fc1b734 [AArch64] Armv8.4-A: TLB support
This adds:
- outer shareable TLB Maintenance instructions, and
- TLB range maintenance instructions.

llvm-svn: 336434
2018-07-06 13:00:16 +00:00
Sjoerd Meijer
757ee882e7 Recommit: [AArch64] Armv8.4-A: Flag manipulation instructions
Now with the asm operand definition included.

llvm-svn: 336432
2018-07-06 12:32:33 +00:00
Sjoerd Meijer
5c16b3f6e6 Revert [AArch64] Armv8.4-A: Flag manipulation instructions
It's causing build errors.

llvm-svn: 336422
2018-07-06 08:39:43 +00:00
Sjoerd Meijer
9988992bb1 [AArch64] Armv8.4-A: Flag manipulation instructions
These instructions are added to AArch64 only.

Differential Revision: https://reviews.llvm.org/D48926

llvm-svn: 336421
2018-07-06 08:12:20 +00:00
Sjoerd Meijer
c3b59a654a [AArch64][ARM] Armv8.4-A: Trace synchronization barrier instruction
This adds the Armv8.4-A Trace synchronization barrier (TSB) instruction.

Differential Revision: https://reviews.llvm.org/D48918

llvm-svn: 336418
2018-07-06 08:03:12 +00:00
Sander de Smalen
99adcebf9f This is a recommit of r336322, previously reverted in r336324 due to
a deficiency in TableGen that has been addressed in r336334.

[AArch64][SVE] Asm: Support for predicated FP rounding instructions.

This patch also adds instructions for predicated FP square-root and
reciprocal exponent.

The added instructions are:
- FRINTI  Round to integral value (current FPCR rounding mode)
- FRINTX  Round to integral value (current FPCR rounding mode, signalling inexact)
- FRINTA  Round to integral value (to nearest, with ties away from zero)
- FRINTN  Round to integral value (to nearest, with ties to even)
- FRINTZ  Round to integral value (toward zero)
- FRINTM  Round to integral value (toward minus Infinity)
- FRINTP  Round to integral value (toward plus Infinity)
- FSQRT   Floating-point square root
- FRECPX  Floating-point reciprocal exponent

llvm-svn: 336387
2018-07-05 20:21:21 +00:00
Simon Pilgrim
10c6245d1c Try to fix -Wimplicit-fallthrough warning. NFCI.
llvm-svn: 336331
2018-07-05 09:48:01 +00:00
Sander de Smalen
a846479f25 Reverting r336322 for now, as it causes an assert failure
in TableGen, for which there is already a patch in Phabricator
(D48937) that needs to be committed first.

llvm-svn: 336324
2018-07-05 08:52:03 +00:00
Sander de Smalen
d90e3449c5 [AArch64][SVE] Asm: Support for predicated FP rounding instructions.
This patch also adds instructions for predicated FP square-root and
reciprocal exponent.

The added instructions are:
- FRINTI  Round to integral value (current FPCR rounding mode)
- FRINTX  Round to integral value (current FPCR rounding mode, signalling inexact)
- FRINTA  Round to integral value (to nearest, with ties away from zero)
- FRINTN  Round to integral value (to nearest, with ties to even)
- FRINTZ  Round to integral value (toward zero)
- FRINTM  Round to integral value (toward minus Infinity)
- FRINTP  Round to integral value (toward plus Infinity)
- FSQRT   Floating-point square root
- FRECPX  Floating-point reciprocal exponent

llvm-svn: 336322
2018-07-05 08:38:30 +00:00
Sander de Smalen
a44c2b47fb [AArch64][SVE] Asm: Support for signed/unsigned MIN/MAX/ABD
This patch implements the following varieties:

- Unpredicated signed max,   e.g. smax z0.h, z1.h, #-128
- Unpredicated signed min,   e.g. smin z0.h, z1.h, #-128

- Unpredicated unsigned max, e.g. umax z0.h, z1.h, #255
- Unpredicated unsigned min, e.g. umin z0.h, z1.h, #255

- Predicated signed max,     e.g. smax z0.h, p0/m, z0.h, z1.h
- Predicated signed min,     e.g. smin z0.h, p0/m, z0.h, z1.h
- Predicated signed abd,     e.g. sabd z0.h, p0/m, z0.h, z1.h

- Predicated unsigned max,   e.g. umax z0.h, p0/m, z0.h, z1.h
- Predicated unsigned min,   e.g. umin z0.h, p0/m, z0.h, z1.h
- Predicated unsigned abd,   e.g. uabd z0.h, p0/m, z0.h, z1.h

llvm-svn: 336317
2018-07-05 07:54:10 +00:00
Yvan Roux
af85981149 [MachineOutliner] Fix typo in getOutliningCandidateInfo function name
getOutlininingCandidateInfo -> getOutliningCandidateInfo

Differential Revision: https://reviews.llvm.org/D48867

llvm-svn: 336285
2018-07-04 15:37:08 +00:00
Sander de Smalen
561fb22163 [AArch64][SVE] Asm: Support for reversed subtract (SUBR) instruction.
This patch adds both a vector and an immediate form, e.g.                  
                                                                           
- Vector form:                                                             
                                                                           
    subr z0.h, p0/m, z0.h, z1.h                                            
                                                                           
  subtract active elements of z0 from z1, and store the result in z0.      
                                                                           
- Immediate form:                                                          
                                                                           
    subr z0.h, z0.h, #255                                                  
                                                                           
  subtract elements of z0, and store the result in z0.

llvm-svn: 336274
2018-07-04 14:05:33 +00:00
Sander de Smalen
cbd4b2fcb6 [AArch64][SVE] Asm: Support for instructions to set/read FFR.
Includes instructions to read the First-Faulting Register (FFR):
- RDFFR (unpredicated)
    rdffr   p0.b
- RDFFR (predicated)
    rdffr   p0.b, p0/z
- RDFFRS (predicated, sets condition flags)
    rdffr   p0.b, p0/z

Includes instructions to set/write the FFR:
- SETFFR (no arguments, sets the FFR to all true)
    setffr
- WRFFR  (unpredicated)
    wrffr   p0.b

llvm-svn: 336267
2018-07-04 12:58:46 +00:00
Sander de Smalen
9918de7ecc [AArch64][SVE] Asm: Support for FP conversion instructions.
The variants added are:

- fcvt   (FP convert precision)
- scvtf  (signed int -> FP) 
- ucvtf  (unsigned int -> FP) 
- fcvtzs (FP -> signed int (round to zero))
- fcvtzu (FP -> unsigned int (round to zero))

For example:
  fcvt   z0.h, p0/m, z0.s  (single- to half-precision FP) 
  scvtf  z0.h, p0/m, z0.s  (32-bit int to half-precision FP) 
  ucvtf  z0.h, p0/m, z0.s  (32-bit unsigned int to half-precision FP) 
  fcvtzs z0.s, p0/m, z0.h  (half-precision FP to 32-bit int)
  fcvtzu z0.s, p0/m, z0.h  (half-precision FP to 32-bit unsigned int)

llvm-svn: 336265
2018-07-04 12:13:17 +00:00
Sander de Smalen
813960f161 [AArch64][SVE] Asm: Support for SVE condition code aliases
SVE overloads the AArch64 PSTATE condition flags and introduces
a set of condition code aliases for the assembler. The 
details are described in section 2.2 of the architecture
reference manual supplement for SVE.

In short:

  SVE alias =>  AArch64 name
  --------------------------
  NONE      => EQ
  ANY       => NE
  NLAST     => HS
  LAST      => LO
  FIRST     => MI
  NFRST     => PL
  PMORE     => HI
  PLAST     => LS
  TCONT     => GE
  TSTOP     => LT

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48869

llvm-svn: 336245
2018-07-04 08:50:49 +00:00
Fangrui Song
1b8bb0dc71 [AArch64] Make function parameter names in declarations match those of definitions
llvm-svn: 336222
2018-07-03 19:07:53 +00:00
Sander de Smalen
991b8fa3b2 [AArch64][SVE] Asm: Support for FP Complex ADD/MLA.
The variants added in this patch are:

- Predicated Complex floating point ADD with rotate, e.g.

   fcadd   z0.h, p0/m, z0.h, z1.h, #90

- Predicated Complex floating point MLA with rotate, e.g.

   fcmla   z0.h, p0/m, z1.h, z2.h, #180

- Unpredicated Complex floating point MLA with rotate (indexed operand), e.g.

   fcmla   z0.h, p0/m, z1.h, z2.h[0], #180

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48824

llvm-svn: 336210
2018-07-03 16:01:27 +00:00
Amara Emerson
78a301606f [AArch64][GlobalISel] Fix fallbacks introduced in r336120 due to unselectable stores.
r336120 resulted in falling back to SelectionDAG more often due to the G_STORE
MMOs not matching the vreg size. This fixes that by explicitly any-extending the
value.

llvm-svn: 336209
2018-07-03 15:59:26 +00:00
Sander de Smalen
2fbefa55ad [AArch64][SVE] Asm: Support for FMUL (indexed)
Unpredicated FP-multiply of SVE vector with a vector-element given by
vector[index], for example:

  fmul z0.s, z1.s, z2.s[0]

which performs an unpredicated FP-multiply of all 32-bit elements in
'z1' with the first element from 'z2'.

This patch adds restricted register classes for SVE vectors:
  ZPR_3b (only z0..z7 are allowed)  - for indexed vector of 16/32-bit elements.
  ZPR_4b (only z0..z15 are allowed) - for indexed vector of 64-bit elements.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48823

llvm-svn: 336205
2018-07-03 15:31:04 +00:00
Sander de Smalen
d14899902f [AArch64][SVE] Asm: Support for predicated unary operations.
The patch includes support for the following instructions:

       ABS z0.h, p0/m, z0.h
       NEG z0.h, p0/m, z0.h

  (S|U)XTB z0.h, p0/m, z0.h
  (S|U)XTB z0.s, p0/m, z0.s
  (S|U)XTB z0.d, p0/m, z0.d

  (S|U)XTH z0.s, p0/m, z0.s
  (S|U)XTH z0.d, p0/m, z0.d

  (S|U)XTW z0.d, p0/m, z0.d

llvm-svn: 336204
2018-07-03 14:57:48 +00:00
Sjoerd Meijer
765bfbdcbd [AArch64] Armv8.4-A: system registers
This adds the following system registers:
- RAS registers,
- MPAM registers,
- Activitiy monitor registers,
- Trace Extension registers,
- Timing insensitivity of data processing instructions,
- Enhanced Support for Nested Virtualization.

Differential Revision: https://reviews.llvm.org/D48871

llvm-svn: 336193
2018-07-03 12:09:20 +00:00
Sander de Smalen
bc76d6bc5b [AArch64][SVE] Asm: Support for saturing ADD/SUB instructions.
The variants added are:
    signed Saturating ADD/SUB (immediate)  e.g. sqadd z0.h, z0.h, #42
  unsigned Saturating ADD/SUB (immediate)  e.g. uqadd z0.h, z0.h, #42
    signed Saturating ADD/SUB (vectors)    e.g. sqadd z0.h, z0.h, z1.h
  unsigned Saturating ADD/SUB (vectors)    e.g. uqadd z0.h, z0.h, z1.h

llvm-svn: 336186
2018-07-03 09:48:22 +00:00
Sander de Smalen
3392be5c6f [AArch64][SVE] Asm: Support for vector element FP compare.
Contains the following variants:

- Compare with (elements from) other vector
  instructions: fcmeq, fcmgt, fcmge, fcmne, fcmuo.
  aliases: fcmle, fcmlt.

  e.g. fcmle   p0.h, p0/z, z0.h, z1.h => fcmge p0.h, p0/z, z1.h, z0.h

- Compare absolute values with (absolute values from) other vector.
  instructions: facge, facgt.
  aliases: facle, faclt.

  e.g. facle   p0.h, p0/z, z0.h, z1.h => facge   p0.h, p0/z, z1.h, z0.h

- Compare vector elements with #0.0
  instructions: fcmeq, fcmgt, fcmge, fcmle, fcmlt, fcmne.

  e.g. fcmle   p0.h, p0/z, z0.h, #0.0

llvm-svn: 336182
2018-07-03 09:07:23 +00:00
Amara Emerson
f59955d9a1 [AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin.
We currently don't any-extend vararg parameters before storing them to the stack
locations on Darwin. However, SelectionDAG however does this, and so user code
is in the wild which inadvertently relies on this extension. This can manifest
in cases where the value stored is (int)0, but the actual parameter is interpreted
by va_arg as a pointer, and so not extending to 64 bits causes the callee to
load additional undefined bits.

llvm-svn: 336120
2018-07-02 16:39:09 +00:00
Sander de Smalen
7edb4ddf5e [AArch64][SVE] Asm: Support for (SQ)INCP/DECP (scalar, vector)
Increments/decrements the result with the number of active bits
from the predicate.

The inc/dec variants added are:
- incp   x0, p0.h     (scalar)
- incp   z0.h, p0     (vector)

The unsigned saturating inc/dec variants added are:
- uqincp x0, p0.h     (scalar)
- uqincp w0, p0.h     (scalar, 32bit)
- uqincp z0.h, p0     (vector)

The signed saturating inc/dec variants added are:
- sqincp x0, p0.h     (scalar)
- sqincp x0, p0.h, w0 (scalar, 32bit)
- sqincp z0.h, p0     (vector)

llvm-svn: 336091
2018-07-02 10:08:36 +00:00
Sander de Smalen
997a0cf82e [AArch64][SVE] Asm: Support for (saturating) vector INC/DEC instructions.
Increment/decrement vector by multiple of predicate constraint
element count.

The variants added by this patch are:
 - INCH, INCW, INC 

and (saturating):
 - SQINCH, SQINCW, SQINCD
 - UQINCH, UQINCW, UQINCW
 - SQDECH, SQINCW, SQINCD
 - UQDECH, UQINCW, UQINCW

For example:
  incw z0.s, all, mul #4

llvm-svn: 336090
2018-07-02 09:31:11 +00:00
Sander de Smalen
7d47585b61 [AArch64][SVE] Asm: Support for vector element compares (immediate).
Compare vector elements with a signed/unsigned immediate, e.g.
  cmpgt   p0.s, p0/z, z0.s, #-16
  cmphi   p0.s, p0/z, z0.s, #127

llvm-svn: 336081
2018-07-02 08:20:59 +00:00
Sander de Smalen
703f486b92 Reapply r334980 and r334983.
These patches were previously reverted as they led to 
buildbot time-outs caused by large switch statement in
printAliasInstr when using UBSan and O3.  The issue has
been addressed with a workaround (r335525).

llvm-svn: 336079
2018-07-02 07:34:52 +00:00
Sjoerd Meijer
d2e7279f78 [AArch64] Armv8.4-A: Virtualization system registers
This adds the Secure EL2 extension.

Differential Revision: https://reviews.llvm.org/D48711

llvm-svn: 335962
2018-06-29 11:03:15 +00:00
Sjoerd Meijer
e768b5aa55 [ARM][AArch64] Armv8.4-A Enablement
Initial patch adding assembly support for Armv8.4-A.

Besides adding v8.4 as a supported architecture to the usual places, this also
adds target features for the different crypto algorithms. Armv8.4-A introduced
new crypto algorithms, made them optional, and allows different combinations:

- none of the v8.4 crypto functions are supported, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- the v8.4 SHA512 and SHA3 support is implemented, in this case the Armv8.0
  SHA1 and SHA2 instructions must also be implemented.
- the v8.4 SM3 and SM4 support is implemented, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- all of the v8.4 crypto functions are supported, in this case the Armv8.0 SHA1
  and SHA2 instructions must also be implemented.

The v8.4 crypto instructions are added to AArch64 only, and not AArch32,
and are made optional extensions to Armv8.2-A.

The user-facing Clang options will map on these new target features, their
naming will be compatible with GCC and added in follow-up patches.

The Armv8.4-A instruction sets can be downloaded here:
https://developer.arm.com/products/architecture/a-profile/exploration-tools

Differential Revision: https://reviews.llvm.org/D48625

llvm-svn: 335953
2018-06-29 08:43:19 +00:00
Jessica Paquette
848840c089 [MachineOutliner] Define MachineOutliner support in TargetOptions
Targets should be able to define whether or not they support the outliner
without the outliner being added to the pass pipeline. Before this, the
outliner pass would be added, and ask the target whether or not it supports the
outliner.

After this, it's possible to query the target in TargetPassConfig, before the
outliner pass is created. This ensures that passing -enable-machine-outliner
will not modify the pass pipeline of any target that does not support it.

https://reviews.llvm.org/D48683

llvm-svn: 335887
2018-06-28 17:45:43 +00:00
Matthias Braun
719039ac2c SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.

Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.

rdar://41530228

DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
2018-06-28 17:00:45 +00:00
Daniel Sanders
4be0167b82 [globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64
Now that we have the ability to legalize based on MMO's. Add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.

Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.

llvm-svn: 335767
2018-06-27 19:03:21 +00:00
Jessica Paquette
62e08924a6 [MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across
It isn't safe to outline sequences of instructions where x16/x17/nzcv live
across the sequence.

This teaches the outliner to check whether or not a specific canidate has
x16/x17/nzcv live across it and discard the candidate in the case that that is
true.

https://bugs.llvm.org/show_bug.cgi?id=37573
https://reviews.llvm.org/D47655

llvm-svn: 335758
2018-06-27 17:43:27 +00:00
Luke Geeson
fade465f79 [AArch64] Reverting FP16 vcvth_n_s64_f16 to fix
llvm-svn: 335737
2018-06-27 14:34:40 +00:00
Adhemerval Zanella
65794251d1 [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extended the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectores. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjust the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Luke Geeson
34b88ee5ba [AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns
llvm-svn: 335715
2018-06-27 09:20:13 +00:00
Simon Pilgrim
d34cfc19bf [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

llvm-svn: 335329
2018-06-22 09:45:31 +00:00
Sirish Pande
8aef1e988f Revert "[AArch64] Coalesce Copy Zero during instruction selection"
This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7.

Original Commit:

Author: Haicheng Wu <haicheng@codeaurora.org>
Date:   Sun Feb 18 13:51:33 2018 +0000

    [AArch64] Coalesce Copy Zero during instruction selection

    Add special case for copy of zero to avoid a double copy.

    Differential Revision: https://reviews.llvm.org/D36104

Author's intention is to remove a BB that has one mov instruction. In
order to do that, d8f571050 pessmizes MachineSinking by introducing a
copy, such that mov instruction is NOT moved to the BB. Optimization
downstream gets rid of the BB with only mov instruction. This works well
if we have only one fall through branch as there is only one "extra"
mov instruction.

If we have multiple fall throughs, we will have a lot of redundant movs.
In such a case, it's better to have this BB which has one mov instruction.

This is causing degradation in jpeg, fft and other codebases. I believe
if we want to remove a BB with only one branch instruction, we should not
pessimize Machine Sinking at all, and find some other solution.

llvm-svn: 335251
2018-06-21 16:05:24 +00:00
Tim Northover
e031e390d0 [AArch64] Implement FLT_ROUNDS macro.
Very similar to ARM implementation, just maps to an MRS.

Should fix PR25191.

Patch by Michael Brase.

llvm-svn: 335118
2018-06-20 12:09:01 +00:00
Vlad Tsyrklevich
8000cbcff4 Revert r334980 and 334983
This reverts commits r334980 and r334983 because they were causing build
timeouts on the x86_64-linux-ubsan bot.

llvm-svn: 335085
2018-06-20 00:02:32 +00:00
Jessica Paquette
0a78d09ccb [MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.

This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.

llvm-svn: 335076
2018-06-19 21:14:48 +00:00
Sander de Smalen
14aa992de5 [AArch64][SVE] Asm: Fix predicate pattern diagnostics.
This patch uses the DiagnosticPredicate for SVE predicate patterns
to improve their diagnostics, now giving a 'invalid operand' diagnostic
if the type is not an immediate or one of the expected pattern
labels.

Reviewers: samparker, SjoerdMeijer, javed.absar, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48220

llvm-svn: 334983
2018-06-18 21:03:02 +00:00
Sander de Smalen
2192e35631 [AArch64][SVE] Asm: Support for saturating INC/DEC (32bit scalar) instructions.
The variants added by this patch are:
- SQINC     signed increment, e.g. sqinc x0, w0, all, mul #4
- SQDEC     signed decrement, e.g. sqdec x0, w0, all, mul #4
- UQINC   unsigned increment, e.g. uqinc w0, all, mul #4
- UQDEC   unsigned decrement, e.g. uqdec w0, all, mul #4
 
This patch includes asmparser changes to parse a GPR64 as a GPR32 in
order to satisfy the constraint check:
  x0 == GPR64(w0)
in:
  sqinc x0, w0, all, mul #4
         ^___^ (must match)

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47716

llvm-svn: 334980
2018-06-18 20:50:33 +00:00
Sander de Smalen
b9e05bec1f [AArch64][SVE] Asm: Support for saturating INC/DEC (64bit scalar) instructions.
Summary:
The variants added by this patch are:
- SQINC  (signed increment)
- UQINC  (unsigned increment)
- SQDEC  (signed decrement)
- UQDEC  (unsigned decrement)

For example:
  uqincw  x0, all, mul #4

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Differential Revision: https://reviews.llvm.org/D47715

llvm-svn: 334948
2018-06-18 14:47:52 +00:00
Sander de Smalen
6864576067 [AArch64][SVE] Asm: Support for vector element compares.
This patch adds instructions for comparing elements from two vectors, e.g.
  cmpgt p0.s, p0/z, z0.s, z1.s

and also adds support for comparing to a 64-bit wide element vector, e.g.
  cmpgt p0.s, p0/z, z0.s, z1.d

The patch also contains aliases for certain comparisons, e.g.:
  cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s
  cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s
  cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s
  cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s

llvm-svn: 334931
2018-06-18 10:59:19 +00:00
Sander de Smalen
013bed785d [AArch64][SVE] Asm: Support for bitwise operations on predicate vectors.
This patch adds support for instructions performing bitwise operations
on predicate vectors, including AND, BIC, EOR, NAND, NOR, ORN, ORR, and
their status flag setting variants ANDS, BICS, EORS, NANDS, ORNS, ORRS.

This patch also adds several aliases:

  orr  p0.b, p1/z, p1.b, p1.b  => mov  p0.b, p1.b
  orrs p0.b, p1/z, p1.b, p1.b  => movs p0.b, p1.b

  and  p0.b, p1/z, p2.b, p2.b  => mov  p0.b, p1/z, p2.b
  ands p0.b, p1/z, p2.b, p2.b  => movs p0.b, p1/z, p2.b

  eor  p0.b, p1/z, p2.b, p1.b  => not  p0.b, p1/z, p2.b
  eors p0.b, p1/z, p2.b, p1.b  => nots p0.b, p1/z, p2.b

llvm-svn: 334906
2018-06-17 10:48:21 +00:00
Sander de Smalen
1afe6f4625 [AArch64][SVE] Asm: Support for SEL (vector/predicate) instructions.
Support for SVE's predicated select instructions to select elements
from either vector, both in a data-vector and a predicate-vector
variant.

llvm-svn: 334905
2018-06-17 10:11:04 +00:00
Sander de Smalen
60effea7eb [AArch64][SVE] Asm: Support for CPY SIMD/FP and GPR instructions.
Predicated splat/copy of SIMD/FP register or general purpose
register to SVE vector, along with MOV-aliases.

llvm-svn: 334842
2018-06-15 16:39:46 +00:00
Sander de Smalen
f3cfe53db1 [AArch64][SVE] Asm: Support for INC/DEC (scalar) instructions.
Increment/decrement scalar register by (scaled) element count given by
predicate pattern, e.g. 'incw x0, all, mul #4'.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47713

llvm-svn: 334838
2018-06-15 15:47:44 +00:00
Sander de Smalen
dda2100a0f [AArch64][SVE] Asm: Support for FADD, FMUL and FMAX immediate instructions.
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47712

llvm-svn: 334831
2018-06-15 13:57:51 +00:00
Sander de Smalen
00d2fd13c5 [AArch64][SVE] Asm: Add parsing/printing support for exact FP immediates.
Some instructions require of a limited set of FP immediates as operands,
for example '#0.5 or #1.0' for SVE's FADD instruction.

This patch adds support for parsing and printing such FP immediates as
exact values (e.g. #0.499999 is not accepted for #0.5).

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47711

llvm-svn: 334826
2018-06-15 13:11:49 +00:00
Clement Courbet
e3e2fa9c0a [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles.
Summary:
For targets I'm not familiar with, I've automatically made the "default to 1 for each resource" behaviour explicit in the td files.
For more obvious cases, I've ventured a fix.

Some notes:
 - Exynos is especially fishy.
 - AArch64SchedThunderX2T99.td had some truncated entries. If I understand correctly, the person who wrote that interpreted the ResourceCycle as a range. I made the decision to use the upper/lower bound for consistency with the 'Latency' value. I'm sure there is a better choice.
 - The change to X86ScheduleBtVer2.td is an NFC, it just makes values more explicit.

Also see PR37310.

Reviewers: RKSimon, craig.topper, javed.absar

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 334586
2018-06-13 09:41:49 +00:00
Petr Hosek
4239933fa7 [AArch64] Support reserving x20 register
Register x20 is a callee-saved register which may be used for other
purposes in certain contexts, for example to hold special variables
within the kernel. This change adds support for reserving this register
both to frontend and backend to make this register usable for these
purposes.

Differential Revision: https://reviews.llvm.org/D46552

llvm-svn: 334531
2018-06-12 20:00:50 +00:00
Luke Geeson
577f7ff74f [AArch64] Audit on rL333879 to fix FP16 64bit bitpatterns
llvm-svn: 334488
2018-06-12 09:35:20 +00:00
Clement Courbet
8a83e8aa4e [ExynosM1][Sched] Fix resource usage in scheduling model.
This is part of https://reviews.llvm.org/D46356.

llvm-svn: 334391
2018-06-11 07:33:08 +00:00
Evandro Menezes
5e33852f89 [AArch64, ARM] Add support for Samsung Exynos M4
Create a separate feature set for Exynos M4 and add test cases.

llvm-svn: 334115
2018-06-06 18:56:00 +00:00
Peter Smith
7d816e3012 [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.

Differential Revision: https://reviews.llvm.org/D44928

llvm-svn: 334078
2018-06-06 09:40:06 +00:00
Jessica Paquette
a6b0671eeb [MachineOutliner] NFC - Move intermediate data structures to MachineOutliner.h
This is setting up to fix bug 37573 cleanly.

This moves data structures that are technically both used in some way by the
target and the general-purpose outlining algorithm into MachineOutliner.h. In
particular, the `Candidate` class is of importance.

Before, the outliner passed the locations of `Candidates` to the target, which
would then make some decisions about the prospective outlined function. This
change allows us to just pass `Candidates` along to the target. This will allow
the target to discard `Candidates` that would be considered unsafe before cost
calculation. Thus, we will be able to remove the unsafe candidates described in
the bug without resorting to torching the entire prospective function.

Also, as a side-effect, it makes the outliner a bit cleaner.

https://bugs.llvm.org/show_bug.cgi?id=37573

llvm-svn: 333952
2018-06-04 21:14:16 +00:00
Nicolai Haehnle
9a21ccd3ad TableGen: Streamline the semantics of NAME
Summary:
The new rules are straightforward. The main rules to keep in mind
are:

1. NAME is an implicit template argument of class and multiclass,
   and will be substituted by the name of the instantiating def/defm.

2. The name of a def/defm in a multiclass must contain a reference
   to NAME. If such a reference is not present, it is automatically
   prepended.

And for some additional subtleties, consider these:

3. defm with no name generates a unique name but has no special
   behavior otherwise.

4. def with no name generates an anonymous record, whose name is
   unique but undefined. In particular, the name won't contain a
   reference to NAME.

Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.

The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.

Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.

Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.

Change-Id: I694095231565b30f563e6fd0417b41ee01a12589

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47430

llvm-svn: 333900
2018-06-04 14:26:05 +00:00
Luke Geeson
053f862513 [AArch64] Audit on rL333634 to fix FP16 Disasm BitPatterns
llvm-svn: 333879
2018-06-04 09:41:32 +00:00
Sander de Smalen
ae2d9690a8 [AArch64][SVE] Fix range for DUP immediates (16bit elts)
For immediates used in DUP instructions that have the range
-128 to 127, or a multiple of 256 in the range -32768 to 32512,
one could argue that when the result element size is 16bits (.h),
the value can be considered both signed and unsigned.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47619

llvm-svn: 333873
2018-06-04 07:24:23 +00:00
Sander de Smalen
215847cbb3 [AArch64][SVE] Asm: Print indexed element 0 as FPR.
Print the first indexed element as a FP register, for example:

  mov z0.d, z1.d[0]

Is now printed as:

  mov z0.d, d1

Next to printing, this patch also adds aliases to parse 'mov z0.d, d1'.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47571

llvm-svn: 333872
2018-06-04 07:07:35 +00:00
Sander de Smalen
71703eae7a [AArch64][SVE] Asm: Support for indexed DUP instructions.
Unpredicated copy of indexed SVE element to SVE vector,
along with MOV-aliases.

For example:

  dup     z0.h, z1.h[0]

duplicates the first 16-bit element from z1 to all elements in
the result vector z0.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47570

llvm-svn: 333871
2018-06-04 06:40:55 +00:00
Sander de Smalen
0279feb6ed [AArch64][SVE] Asm: Support for FCPY immediate instructions.
Predicated copy of floating-point immediate value to SVE vector,
along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47518

llvm-svn: 333869
2018-06-04 05:58:06 +00:00
Sander de Smalen
e927a1a732 [AArch64][SVE] Asm: Support for CPY immediate instructions
Predicated copy of possibly shifted immediate value into SVE
vector, along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47517

llvm-svn: 333868
2018-06-04 05:40:46 +00:00
Amara Emerson
a09ac7f5a7 [AArch64][GlobalISel] Zero-extend s1 values when returning.
Before we were relying on the any extend of the s1 to s32, but
for AAPCS we need to zero-extend it to at least s8.

Fixes PR36719

Differential Revision: https://reviews.llvm.org/D47425

llvm-svn: 333747
2018-06-01 13:20:32 +00:00
Sander de Smalen
0ce24bfafa [AArch64][SVE] Asm: Support for FDUP_ZI (copy fp immediate) instruction.
Unpredicated copy of floating-point immediate value into SVE vector,
along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47482

llvm-svn: 333744
2018-06-01 12:54:46 +00:00
Sander de Smalen
4aece32ce2 [AArch64][SVE] Asm: Support for DUPM (masked immediate) instruction.
Unpredicated copy of repeating immediate pattern to SVE vector, along
with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47328

llvm-svn: 333731
2018-06-01 07:25:46 +00:00
Francis Visoiu Mistrih
546bf16c10 [MC] Fallback on DWARF when generating compact unwind on AArch64
Instead of asserting when using the def_cfa directive with a register
different from fp, fallback on DWARF.

Easily triggered with:

.cfi_def_cfa x1, 32;

rdar://40249694

Differential Revision: https://reviews.llvm.org/D47593

llvm-svn: 333667
2018-05-31 16:33:26 +00:00
Luke Geeson
1fa16feed0 [AArch64] Reverted rL333427 fixing Clang UnitTest Failure
llvm-svn: 333634
2018-05-31 08:27:53 +00:00
Roman Tereshin
041719be2f [GlobalISel][AArch64] LegalizerInfo verifier: Fixing bugs exposed by LegalizerInfo::verify(...)
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333618
2018-05-31 01:56:05 +00:00
Roman Tereshin
f1db1e6ccc [GlobalISel][AArch64] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call w/o fixing bugs
This is to make it clear what kind of bugs the LegalizerInfo::verifier
is able to catch and test its output

Reviewers: aemerson, qcolombet

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D46338

llvm-svn: 333597
2018-05-30 22:10:04 +00:00
Tim Northover
a00817e988 AArch64: print correct annotation for ADRP addresses.
The immediate on an ADRP MCInst needs to be multiplied by 0x1000 to obtain the
actual PC-offset that will be calculated.

llvm-svn: 333525
2018-05-30 09:54:59 +00:00
Sander de Smalen
df7e091147 [AArch64][AsmParser] Fix segfault on illegal fpimm.
Floating point immediate combining a negative sign and
a hexadecimal number, e.g. #-0x0  caused the compiler to crash.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47483

llvm-svn: 333524
2018-05-30 09:54:19 +00:00
Evandro Menezes
45428262f7 [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy
As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change
makes the inlining of `memset()` and `memcpy()` more aggressive when
compiling for speed.  The tuning remains the same when optimizing for size.

Patch by: Sebastian Pop <s.pop@samsung.com>
          Evandro Menezes <e.menezes@samsung.com>

Differential revision: https://reviews.llvm.org/D45098

llvm-svn: 333429
2018-05-29 15:58:50 +00:00
Amara Emerson
128e26e6c5 Revert "[AArch64] added FP16 vcvth intrinsic support"
This reverts commit r333410 due to bot failures.

llvm-svn: 333427
2018-05-29 15:34:22 +00:00
Sander de Smalen
4f02f1750c [AArch64][SVE] Asm: Support for predicated LSL/LSR (vectors)
Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47365

llvm-svn: 333422
2018-05-29 14:40:24 +00:00
Sander de Smalen
5077aac217 [AArch64][SVE] Asm: Support for AND, ORR, EOR and BIC instructions.
This patch addresses the following variants:
  - bitmask immediate,         e.g. 'and z0.d, z0.d, #0x6'.
  - unpredicated data vectors, e.g. 'and z0.d, z1.d, z2.d'.
  - predicated data vectors,   e.g. 'and z0.d, p0/m, z0.d, z1.d'.

And also several aliases, such as: 
  - ORN, alias of ORR.
  - EON, alias of EOR.
  - BIC, alias of AND (immediate variant)
  - MOV, alias of ORR (if unpredicated and source register operands are the same)

Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47363

llvm-svn: 333414
2018-05-29 13:08:43 +00:00
Luke Geeson
30260ba412 [AArch64] added FP16 vcvth intrinsic support
Summary: Change-Id: I0df845749c7689dfc99150ba7c19c7d0dadbd705

Reviewers: javed.absar, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: llvm-commits, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46311

llvm-svn: 333410
2018-05-29 11:40:33 +00:00
Sander de Smalen
89371f799d [AArch64][SVE] Asm: Support for ADD (immediate) instructions.
This patch adds addsub_imm8_opt_lsl_(i8|i16|i32|i64) operands
that are unsigned values in the range 0 to 255. For element widths of
16 bits or higher it may also be a signed multiple of 256 in the
range 0 to 65280.

Note: This also does some refactoring to reuse convenience function
getShiftedVal<shift>(), and now allows AArch64 scalar 'ADD #-4096' to be
accepted to be mapped to SUB #4096.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47310

llvm-svn: 333408
2018-05-29 10:39:49 +00:00
Sander de Smalen
1dcbd3929f Fix ubsan errors introduced by r333263 re. left-shifting negative values.
llvm-svn: 333270
2018-05-25 11:41:04 +00:00
Sander de Smalen
9b5c7781f7 [AArch64][SVE] Asm: Support for DUP (immediate) instructions.
Unpredicated copy of optionally-shifted immediate to SVE vector,
along with MOV-aliases.

This patch contains parsing and printing support for
cpy_imm8_opt_lsl_(i8|i16|i32|i64). This operand allows a signed value in
the range -128 to +127. For element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512.
For element-width of 8 bits a range of -128 to 255 is accepted, since a copy
of a byte can be considered either signed/unsigned.

Note: This patch renames tryParseAddSubImm() -> tryParseImmWithOptionalShift()
and moves the behaviour of trying to shift a plain immediate by an allowed
shift-value to its addImmWithOptionalShiftOperands() method, so that the
parsing itself is generic and allows immediates from multiple shifted operands.
This is done because an immediate can be divisible by both shifted operands.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47309

llvm-svn: 333263
2018-05-25 09:47:52 +00:00
Eli Friedman
381e806df7 [AArch64] Improve orr+movk sequences for MOVi64imm.
The existing code has three different ways to try to lower a 64-bit
immediate to the sequence ORR+MOVK.  The result is messy: it misses
some possible sequences, and the order of the checks means we sometimes
emit two MOVKs when we only need one.

Instead, just use a simple loop to try all possible two-instruction
ORR+MOVK sequences.

Differential Revision: https://reviews.llvm.org/D47176

llvm-svn: 333218
2018-05-24 19:38:23 +00:00
Geoff Berry
33474b29b1 [AArch64] Take advantage of variable shift/rotate amount implicit mod operation.
Summary:
Optimize code generated for variable shifts/rotates by taking advantage
of the implicit and/mod done on the variable shift amount register.

Resolves bug 27582 and bug 37421.

Reviewers: t.p.northover, qcolombet, MatzeB, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46844

llvm-svn: 333214
2018-05-24 18:29:42 +00:00
Chad Rosier
2e93b5ba53 [CodeGen][AArch64] Use RegUnits to track register aliases. (NFC)
Use RegUnits to track register aliases in AArch64RedundantCopyElimination.

Differential Revision: https://reviews.llvm.org/D47269

llvm-svn: 333107
2018-05-23 17:49:38 +00:00
Alex Bradbury
4baebba9fe [AArch64] Use addAliasForDirective to support data directives
The AArch64 asm parser currently has custom parsing logic for .hword, .word, 
and .xword. Rather than use this custom logic, we can just use 
addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue.

Differential Revision: https://reviews.llvm.org/D47000

llvm-svn: 333077
2018-05-23 11:17:20 +00:00
Eli Friedman
e9e2822fc3 Delete unused variable from r333015.
(The assertion suppressed the unused variable warning on
Release+Asserts builds, so I didn't notice.)

llvm-svn: 333018
2018-05-22 19:38:07 +00:00
Eli Friedman
f156ca7ef2 [MachineOutliner] Add "thunk" outlining for AArch64.
When we're outlining a sequence that ends in a call, we can save up to
three instructions in the outlined function by turning the call into
a tail-call. I refer to this as thunk outlining because the resulting
outlined function looks like a thunk; suggestions welcome for a better
name.

In addition to making the outlined function shorter, thunk outlining
allows outlining calls which would otherwise be illegal to outline:
we don't need to save/restore LR, so we don't need to prove anything
about the stack access patterns of the callee.

To make this work effectively, I also added
MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner
code; this allows treating an arbitrary instruction as a terminator in
the suffix tree.

Differential Revision: https://reviews.llvm.org/D47173

llvm-svn: 333015
2018-05-22 19:11:06 +00:00
Roman Lebedev
b563ad8227 [DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.

This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`), and we need to make sure that they are generated.

Differential Revision: https://reviews.llvm.org/D46528

llvm-svn: 332904
2018-05-21 21:41:02 +00:00
Peter Collingbourne
799b49b89d MC: Separate creating a generic object writer from creating a target object writer. NFCI.
With this we gain a little flexibility in how the generic object
writer is created.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47045

llvm-svn: 332868
2018-05-21 19:20:29 +00:00
Peter Collingbourne
caacbd9d09 MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI.
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47035

llvm-svn: 332857
2018-05-21 17:57:19 +00:00
Peter Collingbourne
a2edc5eab3 Support: Simplify endian stream interface. NFCI.
Provide some free functions to reduce verbosity of endian-writing
a single value, and replace the endianness template parameter with
a field.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47032

llvm-svn: 332757
2018-05-18 19:46:24 +00:00
Peter Collingbourne
7e80473026 MC: Change the streamer ctors to take an object writer instead of a stream. NFCI.
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47050

llvm-svn: 332749
2018-05-18 18:26:45 +00:00
Clement Courbet
b982f02652 [ExynosM3] Fix scheduling info.
Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 332713
2018-05-18 13:10:41 +00:00
Eli Friedman
8fa755c873 [MachineOutliner] Count savings from outlining in bytes.
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.

Differential Revision: https://reviews.llvm.org/D46921

llvm-svn: 332685
2018-05-18 01:52:16 +00:00
Sander de Smalen
bb257dd74f [AArch64][SVE] Asm: Support for structured ST2, ST3 and ST4 (scalar+scalar) store instructions.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46680

llvm-svn: 332584
2018-05-17 09:05:41 +00:00
Eli Friedman
237d12abb5 [MachineOutliner] Don't outline instructions that modify SP.
This breaks the code which saves and restores LR, so we can't outline
without doing something more complicated for stack adjustment.

Found by inspection; we get lucky in most cases because getMemOpInfo
only handles STRWpost, not any other pre/post-increment forms. But it
hits a couple of artificial testcases in the tree.

Differential Revision: https://reviews.llvm.org/D46920

llvm-svn: 332529
2018-05-16 21:20:16 +00:00
Eli Friedman
e346f11594 [MachineOutliner] Don't save/restore LR for tail calls.
The cost computation assumes we do this correctly, but the actual
lowering was wrong.

Differential Revision: https://reviews.llvm.org/D46923

llvm-svn: 332514
2018-05-16 19:49:01 +00:00
Sander de Smalen
2e093c9ca4 [AArch64][SVE] Improve diagnostics for vectors with incorrect element-size.
For regular SVE vector operands, this patch introduces a more
sensible diagnostic when the vector has a wrong suffix (e.g. z0.s vs z0.b).

For example:
  add z0.s, z1.s, z2.b      -> invalid element width
               ^_____^
               mismatch

For the vector-with-shift/extend (e.g. z0.s, uxtw #2) this patch takes
a slightly different approach and instead returns a 'invalid operand'
if the element size is not as expected. This is because the diagnostics
are more specificied to suggest using the right shift/extend suffix. This
is a trade-off not to introduce more operand classes and still provide
useful diagnostics for LD1 and PRF instructions.

For example:
  ld1w z1.s, p0/z, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw)'
  ld1w z1.d, p0/z, [x0, z0.s] -> invalid operand
          ^________________^
               mismatch

For gather prefetches, both 'z0.s' and 'z0.d' would be allowed:
  prfw #0, p0, [x0, z0.s]   -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw) #2'
  prfw #0, p0, [x0, z0.d]   -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'

Without this change, the diagnostic would unnecessarily suggest a
different element size:
  prfw #0, p0, [x0, z0.s]   -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'

Reviewers: SjoerdMeijer, aemerson, fhahn, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46688

llvm-svn: 332483
2018-05-16 15:45:17 +00:00
Sirish Pande
f2ba0203b0 [AArch64] Gangup loads and stores for pairing.
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.

Differential Revision https://reviews.llvm.org/D46477

llvm-svn: 332482
2018-05-16 15:36:52 +00:00
Sander de Smalen
3667e06eb4 [AArch64][SVE] Asm: Support for gather PRF prefetch instructions
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46686

llvm-svn: 332472
2018-05-16 14:16:01 +00:00
Amara Emerson
ba73b15d7a [GlobalISel][IRTranslator] Split aggregates during IR translation.
We currently handle all aggregates by creating one large LLT, and letting the
legalizer deal with splitting them up. However using this approach means that
we can't support big endian code correctly.

This patch changes the way that the IRTranslator deals with aggregate values,
by splitting them up into their constituent element values. To do this, parts
of the translator need to be modified to deal with multiple VRegs for a single
Value.

A new Value to VReg mapper is introduced to help keep compile time under
control, currently there is no measurable impact on CTMark despite the extra
code being generated in some cases.

Patch is based on the original work of Tim Northover.

Differential Revision: https://reviews.llvm.org/D46018

llvm-svn: 332449
2018-05-16 10:32:02 +00:00
Peter Smith
c69d5b5cbb [AArch64] Support "S" inline assembler constraint
This patch re-introduces the "S" inline assembler constraint. This matches
an absolute symbolic address or a label reference. The primary use case is

asm("adrp %0, %1\n\t"
    "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var));

I say re-introduces as it seems like "S" was implemented in the original
AArch64 backend, but it looks like it wasn't carried forward to the merged
backend. The original implementation had A and L modifiers that could be
used to print ":lo12:" to the string. It looks like gcc doesn't use these
and :lo12: is expected to be written in the inline assembly string so I've
not implemented A and L. Clang already supports the S modifier.

Fixes PR37180

Differential Revision: https://reviews.llvm.org/D46745

llvm-svn: 332444
2018-05-16 09:33:25 +00:00
Sander de Smalen
aaacb56f16 [AArch64][SVE] Asm: Support for structured LD2, LD3 and LD4 (scalar+scalar) load instructions.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46679

llvm-svn: 332442
2018-05-16 09:16:20 +00:00
Sander de Smalen
a95b850aca [AArch64][SVE] Asm: Support for contiguous PRF prefetch instructions.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46682

llvm-svn: 332433
2018-05-16 07:50:09 +00:00
Evandro Menezes
32ec7fd07b [AArch64] Improve single vector lane unscaled stores
When storing the 0th lane of a vector, use a simpler and usually more
efficient scalar store instead.  In this case, also using the unscaled
offset.

Differential revision: https://reviews.llvm.org/D46762

llvm-svn: 332394
2018-05-15 20:41:12 +00:00
Evandro Menezes
fca3a5cdd4 [AArch64] Improve single vector lane stores
When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.

Differential revision: https://reviews.llvm.org/D46655

llvm-svn: 332251
2018-05-14 15:26:35 +00:00
Nicola Zaghen
9667127c14 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Sander de Smalen
5f693d0b90 [AArch64][SVE] Extend parsing of Prefetch operation for SVE.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46681

llvm-svn: 332234
2018-05-14 11:54:41 +00:00
Geoff Berry
84b578a23c [AArch64] Fix performPostLD1Combine to check for constant lane index.
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction.  The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:

- an assert when lowering the LD1LANE ISD node since it assumes an
  constant operand

- an assert in isel if the lane index value depends on the
  post-incremented base register

Both of these issues are avoided by simply checking that the lane index
is a constant.

Fixes bug 35822.

Reviewers: t.p.northover, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46591

llvm-svn: 332103
2018-05-11 16:25:06 +00:00
Haicheng Wu
cc71215e80 [CGP] Split large data structres to sink more GEPs
Accessing the members of a large data structures needs a lot of GEPs which
usually have large offsets due to the size of the underlying data structure. If
the offsets are too large to fit into the r+i addressing mode, these GEPs cannot
be sunk to their users' blocks and many extra registers are needed then to carry
the values of these GEPs.

This patch tries to split a large data struct starting from %base like the
following.

Before:
BB0:
  %base     =

BB1:
  %gep0     = gep %base, off0
  %gep1     = gep %base, off1
  %gep2     = gep %base, off2

BB2:
  %load1    = load %gep0
  %load2    = load %gep1
  %load3    = load %gep2

After:
BB0:
  %base     =
  %new_base = gep %base, off0

BB1:
  %new_gep0 = %new_base
  %new_gep1 = gep %new_base, off1 - off0
  %new_gep2 = gep %new_base, off2 - off0

BB2:
  %load1    = load i32, i32* %new_gep0
  %load2    = load i32, i32* %new_gep1
  %load3    = load i32, i32* %new_gep2

In the above example, the struct is split into two parts. The first part still
starts from %base and the second part starts from %new_base. After the
splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to
BB2 and folded into their users.

The algorithm to split data structure is simple and very similar to the work of
merging SExts. First, it collects GEPs that have large offsets when iterating
the blocks. Second, it splits the underlying data structures and updates the
collected GEPs to use smaller offsets.

Differential Revision: https://reviews.llvm.org/D42759

llvm-svn: 332015
2018-05-10 18:27:36 +00:00
Adhemerval Zanella
2e1b0642c7 [AArch64] Improve cost of vector division by constant
With custom lowering for vector MULLH{S,U}, it is now profitable to
vectorize a divide by constant loop for the custom types (v16i8, v8i16,
and v4i32).  The cost if based on TargetLowering::Build{S,U}DIV which
uses a multiply by constant plus adjustment to express a divide by
constant.

Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by
Instruction::AShr.

llvm-svn: 331873
2018-05-09 12:48:22 +00:00
Daniel Sanders
7932a6b01b Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.

llvm-svn: 331846
2018-05-09 05:00:17 +00:00
Shiva Chen
208c23a5a2 [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.

This patch has no new test case. I have run regression test and there is
no difference in regression test.

Differential Revision: https://reviews.llvm.org/D45342

Patch by Hsiangkai Wang.

llvm-svn: 331844
2018-05-09 02:42:00 +00:00
Daniel Sanders
7e100ec90e [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Reviewed By: aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 331816
2018-05-08 22:26:39 +00:00
Sander de Smalen
dc5259e58b [AArch64][SVE] Asm: Support for LD1R load-and-replicate scalar instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46251

llvm-svn: 331758
2018-05-08 10:46:55 +00:00
Sander de Smalen
bb0cf5cecf [AArch64] Disallow vector operand if FPR128 Q register is required.
Patch https://reviews.llvm.org/D41445 changed the behaviour of 'isReg()'
to also return 'true' if the parsed register operand is a vector
register. Code in the AsmMatcher checks if a register is a subclass of the
expected register class. However, even though both parsed registers map
to the same physical register, the 'v' register is of kind 'NeonVector',
where 'q' is of type Scalar, where isSubclass() does not distinguish
between the two cases.

The solution is to use an AsmOperand instead of the register directly,
and use the PredicateMethod to distinguish the two operands.

This fixes for example:
  ldr v0, [x0]    // 'v0' is an invalid operand for this instruction
  ldr q0, [x0]    // valid

Reviewers: aemerson, Gerolf, SjoerdMeijer, javed.absar

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D46310

llvm-svn: 331755
2018-05-08 10:01:04 +00:00
Daniel Sanders
96623363e9 [globalisel] Update GlobalISel emitter to match new representation of extending loads
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch changes the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bit. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch that adds combiner helpers for
for this will follow.

The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.

Depends on D45540

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar

Reviewed By: rtereshin

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45541

llvm-svn: 331601
2018-05-05 20:53:24 +00:00
Craig Topper
ac4f504c06 Fix a bunch of places where operator-> was used directly on the return from dyn_cast.
Inspired by r331508, I did a grep and found these.

Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa.

llvm-svn: 331577
2018-05-05 01:57:00 +00:00
Michael Berg
eab596452c Fast Math Flag mapping into SDNode
Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.

Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng

Differential Revision: https://reviews.llvm.org/D45710

llvm-svn: 331547
2018-05-04 18:48:20 +00:00
Adhemerval Zanella
78143c1a3b [AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32
This patch adds a custom lowering for ISD::MULH{S,U} used on divide by
constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV).

New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL
can be correctly lowered to smull2 and umull2.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46009

llvm-svn: 331522
2018-05-04 14:33:55 +00:00
Martin Storsjo
f77528e6b4 [COFF, ARM64] Hook up a few remaining relocations
Differential Revision: https://reviews.llvm.org/D46355

llvm-svn: 331384
2018-05-02 18:24:37 +00:00
Sander de Smalen
04ee355bf3 [AArch64][SVE] Asm: Support for LDR/STR fill and spill instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D46270

llvm-svn: 331352
2018-05-02 13:32:39 +00:00
Sander de Smalen
617d4ec81a [AArch64][SVE] Asm: Support for scatter ST1 store instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46248

llvm-svn: 331349
2018-05-02 13:00:30 +00:00
Sander de Smalen
e969c96428 [AArch64][SVE] Asm: Support for non-temporal, contiguous LDNT1/STNT1 load/store instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D46269

llvm-svn: 331343
2018-05-02 11:48:49 +00:00
Sander de Smalen
86aaf3b716 [AArch64][SVE] Asm: Support for LD1RQ load-and-replicate quad-word vector instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46250

llvm-svn: 331339
2018-05-02 08:49:08 +00:00
Adrian Prantl
076a6683eb Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Sander de Smalen
82922eefcd [AArch64][SVE] Asm: Support for contiguous ST1 (scalar+scalar) store instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46121

llvm-svn: 331260
2018-05-01 13:36:03 +00:00
Daniel Sanders
794bd52155 Fix infinite loop after r331115
There are two separate fixes here:
* The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction.
* The target should not be requesting lowering of non-extending loads.

llvm-svn: 331201
2018-04-30 17:20:01 +00:00
Sander de Smalen
37a50e1a7a [AArch64][SVE] Asm: Improve diagnostics for gather loads.
This patch extends the 'isSVEVectorRegWithShiftExtend' function to 
improve diagnostics for SVE's gather load (scalar + vector) addressing 
modes. Instead of always suggesting the 'unscaled' addressing mode, 
the use of DiagnosticPredicate enables a more specific error message
in the context where the scaling is incorrect. For example:

  ld1h z0.d, p0/z, [x0, z0.d, lsl #2]
                                   ^ 
           shift amount should be '1'

Instead of suggesting the packed, unscaled addressing mode:
  expected 'z[0..31].d, (uxtw|sxtw)'

the assembler now suggests using the proper scaling:
  expected 'z[0..31].d, (lsl|uxtw|sxtw) #1'

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46124

llvm-svn: 331162
2018-04-30 07:24:38 +00:00