mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
47807 Commits

Author SHA1 Message Date
Amaury Sechet
f6e6ba0be1 [ARM] Remove code handling ADDC/ADDE/SUBC/SUBE
Summary: This code is now dead as the ARM backend uses ADDCARRY/SUBCARRY/SETCCCARRY.

Reviewers: rogfer01, efriedma, rengolin, javed.absar

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D47413

llvm-svn: 333544
2018-05-30 13:45:43 +00:00
Krzysztof Parzyszek
8c44cdee95 [Hexagon] Use vector align-left when shift amount fits in 3 bits
This saves an instruction because for align-right the shift amount
would need to be put in a register first.

llvm-svn: 333543
2018-05-30 13:45:34 +00:00
Simon Dardis
72cd920805 [mips] Correct the definition of CTC2/CFC2
llvm-svn: 333542
2018-05-30 13:21:13 +00:00
Simon Dardis
4df07ba4b8 [mips] Correct the predicates of microMIPS compact branch instructions
llvm-svn: 333541
2018-05-30 13:16:17 +00:00
Simon Dardis
b2fa48fc56 [mips] Sink PredicateControl further down the class hierarchy.
Previously PredicateControl was in some cases a member of <X>Inst classes
for some X (DSP, EVA), or sat in a more irregular place in the hierarchy
for any given instruction.

This patch moves PredicateControl down to the root so that it is consistently
available. It then corrects the base class of microMIPS instructions to use
EncodingPredicates instead of the general Predicates field of Instruction.

Reviewers: smaksimovic, abeserminji, atanasyan

Differential Revision: https://reviews.llvm.org/D47526

llvm-svn: 333536
2018-05-30 12:40:53 +00:00
Simon Dardis
59a7022fac [mips] Correct the predicates of arithmetic and logic instructions.
As part of this effort, duplicate and correct the predicates of some
aliases. Also disable code generation of some short form instructions
for FastISel, as it would otherwise reject them.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D47075

llvm-svn: 333530
2018-05-30 11:33:35 +00:00
Tim Northover
a00817e988 AArch64: print correct annotation for ADRP addresses.
The immediate on an ADRP MCInst needs to be multiplied by 0x1000 to obtain the
actual PC-offset that will be calculated.

llvm-svn: 333525
2018-05-30 09:54:59 +00:00
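The scaling described in this commit can be sketched as follows. This is a minimal Python illustration rather than the LLVM code: `adrp_target` is a hypothetical helper, and it assumes the architectural ADRP behaviour of adding the page-scaled immediate to the PC with its low 12 bits cleared.

```python
PAGE_SIZE = 0x1000  # ADRP counts in 4 KiB pages
MASK64 = 0xFFFFFFFFFFFFFFFF


def adrp_target(pc: int, imm: int) -> int:
    """Return the address an ADRP at `pc` yields for page immediate `imm`.

    Hypothetical helper for illustration. `imm` is the signed page count
    carried by the instruction, not a byte offset.
    """
    page_base = pc & ~(PAGE_SIZE - 1)  # clear the low 12 bits of the PC
    return (page_base + imm * PAGE_SIZE) & MASK64


print(hex(adrp_target(0x400123, 2)))
```

Printing the raw immediate instead of `imm * 0x1000` would understate the byte distance by a factor of 4096, which is the kind of annotation error the commit message describes.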
Sander de Smalen
df7e091147 [AArch64][AsmParser] Fix segfault on illegal fpimm.
A floating-point immediate combining a negative sign and
a hexadecimal number, e.g. #-0x0, caused the compiler to crash.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47483

llvm-svn: 333524
2018-05-30 09:54:19 +00:00
Daniel Cederman
51f7d78f33 [Sparc] Treat %fxx registers with value type Other as single precision
They get type Other when used in the clobber list in inline assembly.
This fixes tests fp128.ll and float.ll that failed after r333512.

llvm-svn: 333523
2018-05-30 09:52:18 +00:00
Daniel Cederman
05452859f5 [Sparc] Select correct register class for FP register constraints
Summary: The fX version of floating-point registers only supports
single precision. We need to map the name to dX for doubles and qX
for long doubles if we want getRegForInlineAsmConstraint() to be
able to pick the correct register class.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D47258

llvm-svn: 333512
2018-05-30 06:07:55 +00:00
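The name mapping described in this commit might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the patch: `fp_constraint_register` is a hypothetical name, the value types are plain strings, and the sketch assumes the usual SPARC layout in which dK overlays the pair starting at f(2K) and qK overlays the group starting at f(4K).

```python
def fp_constraint_register(name: str, value_type: str) -> str:
    """Map an 'fN' constraint name to a register name that fits value_type.

    Hypothetical helper. Assumes dK aliases f(2K)/f(2K+1) and qK aliases
    f(4K)..f(4K+3), so only suitably aligned indices can be widened.
    """
    index = int(name[1:])
    if value_type == "f32":
        return f"f{index}"
    if value_type == "f64" and index % 2 == 0:
        return f"d{index // 2}"
    if value_type == "f128" and index % 4 == 0:
        return f"q{index // 4}"
    raise ValueError(f"{name} cannot hold a value of type {value_type}")


print(fp_constraint_register("f4", "f64"))
print(fp_constraint_register("f4", "f128"))
```

Raising on a misaligned index is a choice made for this sketch only; the commit message does not say how the real code treats that case.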
Craig Topper
8162712d19 [X86] Add unmasked AVX512VNNI intrinsics. Use a select in IR instead.
A future patch will remove the old masked intrinsics.

llvm-svn: 333508
2018-05-30 05:25:59 +00:00
Shiva Chen
dd2cc38f1b [RISCV] Support resolving fixup_riscv_call and add to MCFixupKindInfo table
Resolve fixup_riscv_call in the assembler when linker relaxation is disabled
and the function and call site are within the same compile unit.

Also add a static_assert after the Infos array declaration
to avoid missing any new fixup in MCFixupKindInfo in the future.

Differential Revision: https://reviews.llvm.org/D47126

llvm-svn: 333487
2018-05-30 01:16:36 +00:00
Craig Topper
9b9c57ff8d [X86] Remove some of the extractelts from the new MOVSS+FMA patterns.
We only need the extractelt that corresponds to the register we're trying to insert back into. We can't guarantee the others haven't been optimized out depending on how those operands were produced.

So instead just look for an FR32/FR64 input and emit a COPY_TO_REGCLASS to VR128 in the output pattern. This matches what we do for ADD/SUB/MUL/DIV.

llvm-svn: 333473
2018-05-29 22:52:09 +00:00
Craig Topper
b8ef8fe7eb [X86] Use VR128X instead of VR128 in EVEX instruction patterns.
llvm-svn: 333464
2018-05-29 20:46:27 +00:00
Craig Topper
43b9a1ec95 [X86] Rename the operands in the recently introduced MOVSS+FMA patterns so that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction.
The order should be controlled in the input pattern.

llvm-svn: 333463
2018-05-29 20:46:26 +00:00
Craig Topper
30d519d924 [X86] Fix a potential crash that can occur after r333419.
The code could issue a truncate from a smaller type to a larger type. We need to extend in that case instead.

llvm-svn: 333460
2018-05-29 20:04:10 +00:00
Matt Arsenault
36d73b27ac AMDGPU: Fix typo in option description
llvm-svn: 333457
2018-05-29 19:35:46 +00:00
Matt Arsenault
18ea83d8c3 AMDGPU: Round up kernel argument allocation size
AFAIK the driver's allocation will actually have to round this
up anyway. It is useful to track the rounded up size, so that
the end of the kernel segment is known to be dereferencable so
a wider s_load_dword can be used for a short argument at the end
of the segment.

llvm-svn: 333456
2018-05-29 19:35:00 +00:00
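The rounding this commit describes is the usual align-up computation. The sketch below assumes a 4-byte (dword) granularity, which the message implies through its mention of `s_load_dword` but does not state outright; `align_to` and `DWORD_BYTES` are names chosen for this illustration.

```python
DWORD_BYTES = 4  # assumed rounding granularity, not taken from the patch


def align_to(size: int, alignment: int) -> int:
    """Round `size` up to the next multiple of `alignment`."""
    return (size + alignment - 1) // alignment * alignment


# A kernel whose arguments occupy 6 bytes ends with a 2-byte argument.
# After rounding, the whole final dword is known to be part of the segment.
print(align_to(6, DWORD_BYTES))
```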
Sameer AbuAsal
d114bbf2bb [RISCV] Add peepholes for Global Address lowering patterns
Summary:
  Base and offset are always separated when a GlobalAddress node is lowered
  (rL332641) as an optimization to reduce instruction count. However, this
  optimization is not profitable if the Global Address ends up being used in only
  one instruction.

  This patch adds peephole optimizations that merge an offset of
  an address calculation into the LUI %hi and ADDI %lo of the lowering sequence.

  The peephole handles three patterns:

 1) ADDI (ADDI (LUI %hi(global)) %lo(global)), offset
     --->
      ADDI (LUI %hi(global + offset)) %lo(global + offset).

   This generates:
   lui a0, hi (global + offset)
   addi a0, a0, lo (global + offset)

   Instead of

   lui a0, hi (global)
   addi a0, lo (global)
   addi a0, offset

   This pattern is for cases when the offset is small enough to fit in the
   immediate field of ADDI (less than 12 bits).

 2) ADD ((ADDI (LUI %hi(global)) %lo(global)), (LUI hi_offset))
     --->
      offset = hi_offset << 12
      ADDI (LUI %hi(global + offset)) %lo(global + offset)

   Which generates the ASM:

   lui  a0, hi(global + offset)
   addi a0, lo(global + offset)

   Instead of:

   lui  a0, hi(global)
   addi a0, lo(global)
   lui a1, (offset)
   add a0, a0, a1

   This pattern is for cases when the offset doesn't fit in an immediate field
   of ADDI but the lower 12 bits are all zeros.

 3) ADD ((ADDI (LUI %hi(global)) %lo(global)), (ADDI lo_offset, (LUI hi_offset)))
     --->
        offset = offhi20<<12 + offlo12
        ADDI (LUI %hi(global + offset)) %lo(global + offset)

   Which generates the ASM:

   lui  a1, %hi(global + offset)
   addi a1, %lo(global + offset)

   Instead of:

   lui  a0, hi(global)
   addi a0, lo(global)
   lui a1, (offhi20)
   addi a1, (offlo12)
   add a0, a0, a1

   This pattern is for cases when the offset doesn't fit in an immediate field
   of ADDI and both the lower 12 bits and high 20 bits are non zero.

    Reviewers: asb

    Reviewed By: asb

    Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos,
  niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang

llvm-svn: 333455
2018-05-29 19:34:54 +00:00
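The arithmetic behind folding an offset into the %hi/%lo pair can be checked with a small model. This is a sketch assuming RV32 and the standard split, where %lo is the sign-extended low 12 bits and %hi compensates by adding 0x800 before shifting. All function names here are hypothetical and nothing is taken from the patch itself.

```python
MASK32 = 0xFFFFFFFF


def hi20(addr: int) -> int:
    """Upper 20 bits for LUI, rounded so the signed %lo part lands on addr."""
    return ((addr + 0x800) >> 12) & 0xFFFFF


def lo12(addr: int) -> int:
    """Low 12 bits of addr interpreted as a signed ADDI immediate."""
    low = addr & 0xFFF
    return low - 0x1000 if low >= 0x800 else low


def lui_addi(addr: int) -> int:
    """Value produced by LUI %hi(addr) followed by ADDI %lo(addr)."""
    return ((hi20(addr) << 12) + lo12(addr)) & MASK32


def unfolded(global_addr: int, offset: int) -> int:
    """Materialize the global first, then add the offset separately."""
    return (lui_addi(global_addr) + offset) & MASK32


def folded(global_addr: int, offset: int) -> int:
    """Materialize global + offset directly with one LUI/ADDI pair."""
    return lui_addi((global_addr + offset) & MASK32)


print(hex(folded(0x12345FFC, 8)), hex(unfolded(0x12345FFC, 8)))
```

Both forms compute the same address. The folded form simply needs fewer instructions, which is why the peephole pays off when the address has a single use.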
Konstantin Zhuravlyov
56fe8b5762 AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as
it is set by CP

Differential Revision: https://reviews.llvm.org/D47392

llvm-svn: 333451
2018-05-29 19:09:13 +00:00
Eli Friedman
2685aaa0dc [ARM] Enable SETCCCARRY lowering for Thumb1.
We've had Thumb1 support for ARMISD::SUBE for a while now, so this just
works.  Reduces codesize a bit for 64-bit integer comparisons.

Differential Revision: https://reviews.llvm.org/D47387

llvm-svn: 333445
2018-05-29 18:17:16 +00:00
Matt Arsenault
27a95e7d2c AMDGPU: Pass function directly instead of MachineFunction
These functions just query the underlying IR function,
so pass it directly.

llvm-svn: 333442
2018-05-29 17:42:50 +00:00
Matt Arsenault
16caab0970 AMDGPU: Add nuw to add off of kernarg ptr
llvm-svn: 333441
2018-05-29 17:42:38 +00:00
Matt Arsenault
0b0d5df0ba DAG: Remove redundant version of getRegisterTypeForCallingConv
There seems to be no real reason to have these separate copies.
The existing implementations just copy each other for x86.
For Mips there is a subtle difference, which is just a bug
since the result changes depending on the context in which each one was called.
Dropping this version, all tests pass. If I try to merge them
to match the removed version, a test fails.

llvm-svn: 333440
2018-05-29 17:42:26 +00:00
Tom Stellard
ea6f6c40ea AMDGPU: Split R600 MCInst lowering into its own class
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47307

llvm-svn: 333439
2018-05-29 17:41:59 +00:00
Evandro Menezes
45428262f7 [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy
As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change
makes the inlining of `memset()` and `memcpy()` more aggressive when
compiling for speed.  The tuning remains the same when optimizing for size.

Patch by: Sebastian Pop <s.pop@samsung.com>
          Evandro Menezes <e.menezes@samsung.com>

Differential revision: https://reviews.llvm.org/D45098

llvm-svn: 333429
2018-05-29 15:58:50 +00:00
Simon Atanasyan
5a9ddc59d3 [mips] Process numeric register name in the .set assignment directive
Currently the LLVM assembler cannot process the following code and generates
an error, although GNU tools support the .set assignment directive with a
numeric register name.

```
.set r4, $4

test.s:1:11: error: invalid token in expression
  .set r4, $4
           ^
```

This patch teaches the assembler to handle such directives correctly.
Unfortunately a numeric register name cannot be represented as an
expression. That's why we have to maintain a separate `StringMap`
in the `MipsAsmParser` to keep the mapping between alias names and
register numbers.

Differential revision: https://reviews.llvm.org/D47464

llvm-svn: 333428
2018-05-29 15:58:06 +00:00
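The separate alias table mentioned in this commit can be pictured with the sketch below. It is only an analogy in Python: a `dict` stands in for the `StringMap`, the regular expression and the function name `record_register_alias` are inventions for this example, and the 0 to 31 register range is an assumption about numeric MIPS register names.

```python
import re

# Matches lines such as ".set r4, $4": an identifier, a comma, then "$<number>".
SET_NUMERIC_REG = re.compile(r"^\s*\.set\s+([A-Za-z_.][\w.]*)\s*,\s*\$(\d+)\s*$")


def record_register_alias(line: str, aliases: dict) -> bool:
    """Store alias -> register number if `line` is a numeric-register .set.

    Hypothetical helper. Returns False when the line has another shape or
    the register number falls outside the assumed 0..31 range.
    """
    match = SET_NUMERIC_REG.match(line)
    if match is None:
        return False
    name, number = match.group(1), int(match.group(2))
    if not 0 <= number <= 31:
        return False
    aliases[name] = number
    return True


aliases = {}
record_register_alias(".set r4, $4", aliases)
print(aliases)
```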
Amara Emerson
128e26e6c5 Revert "[AArch64] added FP16 vcvth intrinsic support"
This reverts commit r333410 due to bot failures.

llvm-svn: 333427
2018-05-29 15:34:22 +00:00
Sander de Smalen
4f02f1750c [AArch64][SVE] Asm: Support for predicated LSL/LSR (vectors)
Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47365

llvm-svn: 333422
2018-05-29 14:40:24 +00:00
Jonas Devlieghere
9550ad611d [CodeView] Add prefix to CodeView registers.
Adds CVReg to CodeView register names to prevent a duplicate symbol with
CR3 defined in termios.h, as suggested by Zachary on the mailing list.

http://lists.llvm.org/pipermail/llvm-dev/2018-May/123372.html

Differential revision: https://reviews.llvm.org/D47478

rdar://39863705

llvm-svn: 333421
2018-05-29 14:35:34 +00:00
Alexander Ivchenko
94fafebe3a [X86] Scalar mask and scalar move optimizations
1. Introduction of mask scalar TableGen patterns.
2. Introduction of new scalar move TableGen patterns
   and refactoring of existing ones.
3. Folding of pattern created by introducing scalar
   masking in Clang header files.

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D47012

llvm-svn: 333419
2018-05-29 14:27:11 +00:00
Lei Huang
843ab43bc4 [PowerPC] Fix the incorrect iterator inside peephole
Instruction selection can insert nodes into the underlying list after the root
node, so iterating will miss them. We should NOT assume that the root node
is the last element in the DAG node list.

Patch by: steven.zhang (Qing Shan Zhang)

Differential Revision: https://reviews.llvm.org/D47437

llvm-svn: 333415
2018-05-29 13:38:56 +00:00
Sander de Smalen
5077aac217 [AArch64][SVE] Asm: Support for AND, ORR, EOR and BIC instructions.
This patch addresses the following variants:
  - bitmask immediate,         e.g. 'and z0.d, z0.d, #0x6'.
  - unpredicated data vectors, e.g. 'and z0.d, z1.d, z2.d'.
  - predicated data vectors,   e.g. 'and z0.d, p0/m, z0.d, z1.d'.

And also several aliases, such as: 
  - ORN, alias of ORR.
  - EON, alias of EOR.
  - BIC, alias of AND (immediate variant)
  - MOV, alias of ORR (if unpredicated and source register operands are the same)

Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47363

llvm-svn: 333414
2018-05-29 13:08:43 +00:00
Luke Geeson
30260ba412 [AArch64] added FP16 vcvth intrinsic support
Summary: Change-Id: I0df845749c7689dfc99150ba7c19c7d0dadbd705

Reviewers: javed.absar, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: llvm-commits, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46311

llvm-svn: 333410
2018-05-29 11:40:33 +00:00
Simon Atanasyan
3da2f12435 [mips] Emit R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_LO16 / HI16 relocations
Emit R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_LO16 and
R_MICROMIPS_GPREL16/R_MICROMIPS_SUB/R_MICROMIPS_HI16 chains of
relocations for %lo(%neg(%gp_rel())) and %hi(%neg(%gp_rel()))
expressions in case of microMIPS.

Differential Revision: http://reviews.llvm.org/D47220

llvm-svn: 333409
2018-05-29 11:33:54 +00:00
Sander de Smalen
89371f799d [AArch64][SVE] Asm: Support for ADD (immediate) instructions.
This patch adds addsub_imm8_opt_lsl_(i8|i16|i32|i64) operands
that are unsigned values in the range 0 to 255. For element widths of
16 bits or higher it may also be a signed multiple of 256 in the
range 0 to 65280.

Note: This also does some refactoring to reuse the convenience function
getShiftedVal<shift>(), and now allows AArch64 scalar 'ADD #-4096' to be
accepted and mapped to 'SUB #4096'.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47310

llvm-svn: 333408
2018-05-29 10:39:49 +00:00
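The operand ranges stated in this commit can be expressed as a small validity check. This is a sketch, not the parser code: `encode_add_imm` is a hypothetical name, and the returned `(imm8, shift)` pair is an illustrative encoding of "8-bit value, optionally shifted left by 8".

```python
def encode_add_imm(value: int, elem_bits: int):
    """Return (imm8, shift) if `value` fits the ranges in the commit, else None.

    Hypothetical helper: 0..255 unshifted for any element width, or a
    multiple of 256 in 0..65280 for element widths of 16 bits or more.
    """
    if 0 <= value <= 255:
        return (value, 0)
    if elem_bits >= 16 and value % 256 == 0 and 0 <= value <= 65280:
        return (value // 256, 8)
    return None


print(encode_add_imm(255, 8), encode_add_imm(256, 16), encode_add_imm(256, 8))
```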
Simon Atanasyan
d8b9e05b39 [mips] Emit R_MICROMIPS_HIGHER / R_MICROMIPS_HIGHEST relocations
Emit R_MICROMIPS_HIGHER / R_MICROMIPS_HIGHEST relocations for %higher()
and %highest() expressions in case of microMIPS. These relocations do
exactly the same things as R_MIPS_HIGHER / R_MIPS_HIGHEST, but for
consistency it's better to write microMIPS variants.

Differential Revision: http://reviews.llvm.org/D47219

llvm-svn: 333407
2018-05-29 10:27:44 +00:00
Simon Dardis
93850ffd07 [mips] Correct the predicates for a number of instructions.
Previously, their listed predicates were overridden at the scope level.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D46947

llvm-svn: 333405
2018-05-29 09:56:19 +00:00
Simon Atanasyan
af58a9cf89 [mips] Cleanup the code to reduce diff with the upcoming patches. NFC
llvm-svn: 333404
2018-05-29 09:51:33 +00:00
Simon Atanasyan
694bc5993b [mips] Escape else-after-return. NFC
llvm-svn: 333403
2018-05-29 09:51:28 +00:00
Simon Atanasyan
558745488f [mips] Stop parsing a .set assignment if the first argument is not an identifier
Before this fix, the following code triggered two error messages, the
second of which is useless:

  test.s:1:9: error: expected identifier after .set
    .set  123, $a0
          ^
  test-set.s:1:9: error: unexpected token, expected comma
    .set  123, $a0
          ^

llvm-svn: 333402
2018-05-29 09:51:22 +00:00
Tim Renouf
583f69aaff [AMDGPU] Fixed build warning
Summary:
V2: Use cast instead of extra if.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47426

Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a
llvm-svn: 333397
2018-05-29 08:15:37 +00:00
Craig Topper
b3a1ef6ac0 [X86] Disable a DAG combine to allow packed AVX512DQ instructions to be consistently used for i64->float/double conversions.
Summary: We already get this right if the i64 didn't come from a load.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47439

llvm-svn: 333393
2018-05-29 06:22:45 +00:00
Clement Courbet
e67347d398 [X86][Sched] Add InstRW for CLC on Intel after SNB.
Summary:
After SNB, Intel CPUs can rename CF independently of other EFLAGS,
so the renamer can zero it for free. Note that STC still consumes resources.

To reproduce: `$ llvm-exegesis -mode=uops -opcode-name=CLC`

On SNB:
```
---
key:
  opcode_name:     CLC
  mode:            uops
  config:          ''
cpu_name:        sandybridge
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: '3', value: 0.0014, debug_string: SBPort0 }
  - { key: '4', value: 0.0013, debug_string: SBPort1 }
  - { key: '5', value: 0.0003, debug_string: SBPort4 }
  - { key: '6', value: 0.0029, debug_string: SBPort5 }
  - { key: '10', value: 0.0003, debug_string: SBPort23 }
error:           ''
info:            'instruction is serial, repeating a random one.
Snippet:
CLC
'
...
```

On HSW:
```
---
key:
  opcode_name:     CLC
  mode:            uops
  config:          ''
cpu_name:        haswell
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: '3', value: 0.001, debug_string: HWPort0 }
  - { key: '4', value: 0.0009, debug_string: HWPort1 }
  - { key: '5', value: 0.0004, debug_string: HWPort2 }
  - { key: '6', value: 0.0006, debug_string: HWPort3 }
  - { key: '7', value: 0.0002, debug_string: HWPort4 }
  - { key: '8', value: 0.0012, debug_string: HWPort5 }
  - { key: '9', value: 0.0022, debug_string: HWPort6 }
  - { key: '10', value: 0.0001, debug_string: HWPort7 }
error:           ''
info:            'instruction is serial, repeating a random one.
Snippet:
CLC
'
...

```

Reviewers: craig.topper, RKSimon

Subscribers: gchatelet, llvm-commits

Differential Revision: https://reviews.llvm.org/D47362

llvm-svn: 333392
2018-05-29 06:19:39 +00:00
Craig Topper
1e3f8dcb86 [X86] Remove masked vpermi2var/vpermt2var intrinsics and autoupgrade.
We have unmasked intrinsics now and wrap them with a select. This is a net reduction of 36 intrinsics from before the unmasked intrinsics were added.

llvm-svn: 333388
2018-05-29 05:22:05 +00:00
Craig Topper
830987e346 [X86] Add unmasked vpermi2var intrinsics so we can use explicit select instructions for masking in clang.
This will allow us to remove the 3 different flavors of masked intrinsics. I'm leaving the actual intrinsic removal for another patch.

llvm-svn: 333386
2018-05-29 03:26:30 +00:00
Craig Topper
78483ed93b [X86] Converge X86ISD::VPERMV3 and X86ISD::VPERMIV3 to a single opcode.
These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node.

This is a step towards reducing the number of intrinsics needed.

A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial.

llvm-svn: 333383
2018-05-28 19:33:11 +00:00
Craig Topper
fdbb937205 [X86] Fix typo in comment. NFC
llvm-svn: 333382
2018-05-28 19:33:06 +00:00
Farhana Aleen
7eee05ce7a [AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by default.
Summary: The bug reported at https://bugs.freedesktop.org/show_bug.cgi?id=105464 was found
         to be resolved by some other fixes.

Author: FarhanaAleen
llvm-svn: 333380
2018-05-28 18:15:11 +00:00
Lei Huang
8ee6a72524 [Power9]Legalize and emit code for HW/Byte vector extract and convert to QP
Implement patterns to extract HWord and Byte vector elements and convert to
quad-precision.

Differential Revision: https://reviews.llvm.org/D46774

llvm-svn: 333377
2018-05-28 16:43:29 +00:00
Zaara Syeda
891074f69a [PowerPC] Set isAsmParserOnly=1 for X-form TLS loads/stores
The X-form TLS load/store instructions added for optimizing the initial-exec
sequence in https://reviews.llvm.org/rL327635 fail to assemble. llvm-mc fails
with the error: invalid operand for instruction. This patch adds these
instructions into a block with isAsmParserOnly, similar to how ADD8TLS_ is
currently handled.

Differential Revision: https://reviews.llvm.org/D47382

llvm-svn: 333374
2018-05-28 15:27:58 +00:00
Daniel Cederman
58eb55fce6 [Sparc] Add .uahalf and .uaword directives
Summary:
Adding these makes it easier to assemble the output from GCC which
generates a lot of .uahalf and .uaword directives.

GAS treats .uahalf and .half the same unless the --enforce-aligned-data
flag is used. I could not find a similar flag for LLVM so it seems that
.half does not have any alignment requirement and is treated the same as
.uahalf should be. If that would change later on then the tests in
sparc-directives.s would fail due to bad alignment.

Reviewers: jyknight, asb

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D47319

llvm-svn: 333372
2018-05-28 12:42:55 +00:00
Craig Topper
6aee8a0262 [X86] Stop forcing X86VPermi2X node index operand to match destination type to make masking pattern matching easier. Add extra patterns with bitcasts instead.
This basically reverts r280696 in favor of using extra patterns as mentioned as an alternative in that commit message. For now I've only added the cases we have test cases for, but it should be easy to add more in the future.

This will help to convert VPERMI2PS/VPERMT2PS intrinsics to use a single ISD node opcode. And hopefully allow some intrinsics to be removed.

llvm-svn: 333365
2018-05-28 05:37:25 +00:00
Tim Renouf
6e81c6c470 [AMDGPU] Fixed WWM bug in block otherwise entirely in WQM
Summary:
For a block with WQM on entry and exit and containing no exact mode
code, but containing some WWM code, the WQM pass forgot to process the
block at all and so did not insert code to enter and leave WWM.

This commit fixes that.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47027

Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6
llvm-svn: 333362
2018-05-27 17:26:11 +00:00
Simon Pilgrim
30c4ee488f [X86] Don't hardcode scheduler class
Also fixes BEXTRI instruction to use WriteBEXTR class, which was missed when the class was added.

llvm-svn: 333360
2018-05-27 14:54:18 +00:00
David Green
b30bc64322 Revert 333358 as it's failing on some builders.
I'm guessing the tests rely on the ARM backend being built.

llvm-svn: 333359
2018-05-27 12:54:33 +00:00
David Green
f772da6436 [UnrollAndJam] Add a new Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

for i..
  ForeBlocks(i)
  for j..
    SubLoopBlocks(i, j)
  AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

for i... i+=2
  ForeBlocks(i)
  ForeBlocks(i+1)
  for j..
    SubLoopBlocks(i, j)
    SubLoopBlocks(i+1, j)
  AftBlocks(i)
  AftBlocks(i+1)
Remainder

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 333358
2018-05-27 12:11:21 +00:00
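The transformation in this commit message can be mimicked on a toy loop nest. The sketch below is an illustration in Python, not the pass itself: it records which blocks run in which order, uses an unroll factor of 2 as in the message, and the function names are made up. It does not model the safety checks (dependence analysis) the pass has to perform.

```python
def original(n: int, m: int):
    """Record the block order of the untransformed loop nest."""
    trace = []
    for i in range(n):
        trace.append(("fore", i))
        for j in range(m):
            trace.append(("sub", i, j))
        trace.append(("aft", i))
    return trace


def unroll_and_jam(n: int, m: int):
    """Unroll the outer loop by 2 and fuse the two inner loop copies."""
    trace = []
    i = 0
    while i + 1 < n:
        trace.append(("fore", i))
        trace.append(("fore", i + 1))
        for j in range(m):
            trace.append(("sub", i, j))
            trace.append(("sub", i + 1, j))
        trace.append(("aft", i))
        trace.append(("aft", i + 1))
        i += 2
    # Remainder: leftover outer iteration when n is odd.
    while i < n:
        trace.append(("fore", i))
        for j in range(m):
            trace.append(("sub", i, j))
        trace.append(("aft", i))
        i += 1
    return trace


print(sorted(original(5, 3)) == sorted(unroll_and_jam(5, 3)))
```

The two traces contain the same work in a different order, which is why the pass must prove that reordering ForeBlocks(i+1) ahead of SubLoopBlocks(i, j) and AftBlocks(i) is safe.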
Eric Christopher
868932d048 Remove boolean argument from isSuitableForBSS.
The argument was used as an additional negative condition and can be
expressed in the if conditional without needing to pass it down.
Update bss commentary around main use.

llvm-svn: 333357
2018-05-27 11:39:34 +00:00
Eric Christopher
3e4f4eca52 Cleanups for getKindForGlobal:
 - Clarify block comment
 - Make Function/GlobalVariable split more explicit.
 - Move locals closer to uses.

llvm-svn: 333356
2018-05-27 11:23:20 +00:00
Craig Topper
c37925ce27 [X86] Remove masking from avx512ifma intrinsics. Use a select instead.
This allows us to avoid having mask and maskz variants, reducing the count from 12 intrinsics to 6.

llvm-svn: 333346
2018-05-26 18:55:19 +00:00
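The idea of wrapping an unmasked operation in a select, instead of providing mask and maskz variants, can be sketched lane by lane. This is a plain Python model with made-up names: `unmasked_op` is a stand-in (lane-wise addition) and not the real IFMA computation.

```python
def select(mask: int, on_true, on_false):
    """Pick on_true[lane] where bit `lane` of mask is set, else on_false[lane]."""
    return [t if (mask >> lane) & 1 else f
            for lane, (t, f) in enumerate(zip(on_true, on_false))]


def unmasked_op(a, b):
    """Stand-in for an unmasked intrinsic; simple lane-wise addition here."""
    return [x + y for x, y in zip(a, b)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
passthru = [-1, -1, -1, -1]

# "mask" flavour: inactive lanes keep the pass-through value.
merged = select(0b0101, unmasked_op(a, b), passthru)
# "maskz" flavour: inactive lanes become zero.
zeroed = select(0b0101, unmasked_op(a, b), [0] * 4)
print(merged, zeroed)
```

One unmasked operation plus a select covers both flavours, which matches the halving of the intrinsic count mentioned in the message.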
George Burgess IV
547d3b9eb0 Replace AA's uses of uint64_t with LocationSize; NFC.
The uint64_ts that we pass around AA to represent MemoryLocation sizes
are logically an Optional<uint64_t>. In D44748, we want to add an extra
'imprecise' bit to this Optional<uint64_t> to represent whether a given
MemoryLocation size is an upper-bound or an exact size. For more context
on why, please see D44748.

That patch is quite large, but reviewers seem to be OK with the
approach. In D45581 (my first attempt to split 'noise' out of D44748),
reames asked that I land a precursor that is solely replacing uint64_t
with LocationSize, which starts out as `using LocationSize = uint64_t;`.
He also gave me the OK to submit this rename without further review.

llvm-svn: 333314
2018-05-25 21:16:58 +00:00
Mark Searles
38d5d10a14 [AMDGPU][Waitcnt] Remove obsolete waitcnt option
With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it.

Differential Revision: https://reviews.llvm.org/D47378

llvm-svn: 333303
2018-05-25 20:24:08 +00:00
Stanislav Mekhanoshin
9cfeb10772 [AMDGPU] Fixed test failure with AMDGPUPerfHint
We must not keep an iterator into a map while the map is modified;
doing so leads to a broken map.

llvm-svn: 333298
2018-05-25 18:46:58 +00:00
Reid Kleckner
8697c9c142 Fix -Winconsistent-missing-overrides in AMDGPU code
llvm-svn: 333291
2018-05-25 17:46:24 +00:00
Stanislav Mekhanoshin
c29c4efaa7 [AMDGPU] Add perf hints to functions
This is an adoption of the HSAIL perfhint pass. Two types of hints are produced:

1. Function is memory bound.
2. Kernel can use wave limiter.

Currently these hints are used in the scheduler. If a function is suspected
to be memory bound we allow occupancy to decrease to 4 waves in the course
of scheduling.

Differential Revision: https://reviews.llvm.org/D46992

llvm-svn: 333289
2018-05-25 17:25:12 +00:00
Simon Dardis
7995ddd2b6 [mips] Fix the definitions of lwp, swp
Rather than using a regpair operand of these instructions, use two separate
operands and a custom converter to handle the implicit second register operand.

Additionally, remove the microMIPS32R6 definition as it's redundant.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D47255

llvm-svn: 333288
2018-05-25 16:15:48 +00:00
Krzysztof Parzyszek
977520b723 [Hexagon] Fix packing source vectors in shufflevector selection
When the shuffle mask selected a subvector of the second input vector,
and aligning of the source was performed, the shuffle mask was updated
incorrectly, resulting in an ICE further in the selection process.

llvm-svn: 333279
2018-05-25 14:53:14 +00:00
Simon Pilgrim
9761cc5cf6 [X86][SNB] Fix differences between vex/non-vex XMM vector moves (PR37286)
As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves

Another chapter in the never ending crusade to remove useless InstRW overrides from the x86 scheduler models......

llvm-svn: 333271
2018-05-25 12:18:11 +00:00
Sander de Smalen
1dcbd3929f Fix ubsan errors introduced by r333263 re. left-shifting negative values.
llvm-svn: 333270
2018-05-25 11:41:04 +00:00
Sander de Smalen
9b5c7781f7 [AArch64][SVE] Asm: Support for DUP (immediate) instructions.
Unpredicated copy of optionally-shifted immediate to SVE vector,
along with MOV-aliases.

This patch contains parsing and printing support for
cpy_imm8_opt_lsl_(i8|i16|i32|i64). This operand allows a signed value in
the range -128 to +127. For element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512.
For an element width of 8 bits a range of -128 to 255 is accepted, since a copy
of a byte can be considered either signed or unsigned.

Note: This patch renames tryParseAddSubImm() -> tryParseImmWithOptionalShift()
and moves the behaviour of trying to shift a plain immediate by an allowed
shift-value to its addImmWithOptionalShiftOperands() method, so that the
parsing itself is generic and allows immediates from multiple shifted operands.
This is done because an immediate can be divisible by both shifted operands.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47309

llvm-svn: 333263
2018-05-25 09:47:52 +00:00
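The signed ranges given in this commit can be captured in a short check. This is a sketch rather than the parser code: `encode_cpy_imm` is a hypothetical name, the `(imm8, shift)` result is an illustrative encoding, and preferring the unshifted form when both forms fit is a choice made here for simplicity.

```python
def encode_cpy_imm(value: int, elem_bits: int):
    """Return (signed imm8, shift) if `value` fits the stated ranges, else None.

    Hypothetical helper following the ranges in the commit message.
    """
    if elem_bits == 8:
        if -128 <= value <= 255:
            # Bytes above 127 share their bit pattern with a negative value.
            return (value - 256 if value > 127 else value, 0)
        return None
    if -128 <= value <= 127:
        return (value, 0)
    if value % 256 == 0 and -32768 <= value <= 32512:
        return (value // 256, 8)
    return None


print(encode_cpy_imm(255, 8), encode_cpy_imm(-32768, 16), encode_cpy_imm(255, 16))
```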
Jonas Paulsson
adc691d8dc [SystemZ] Bugfix in combineSTORE().
Remember to check if store is truncating before calling
combineTruncateExtract().

Review: Ulrich Weigand
llvm-svn: 333262
2018-05-25 09:01:23 +00:00
Tim Renouf
6428f13232 [AMDGPU] Fixed incorrect break from loop
Summary:
Lower control flow did not correctly handle the case that a loop break
in if/else was on a condition that was not guaranteed to be masked by
exec. The first test kernel shows an example of this going wrong; after
exiting the loop, exec is all ones, even if it was not before the loop.

The fix is for lowering of if-break and else-break to insert an
S_AND_B64 to mask the break condition with exec. This commit also
includes the optimization of not inserting that S_AND_B64 if it is
obviously not needed because the break condition is the result of a
V_CMP in the same basic block.

V2: Addressed some review comments.
V3: Test fixes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44046

Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c
llvm-svn: 333258
2018-05-25 07:55:04 +00:00
Gabor Buella
311b0dd621 [x86] invpcid LLVM intrinsic
Re-add the feature flag for invpcid, which was removed in r294561.
Add an intrinsic, which always uses a 32 bit integer as first argument,
while the instruction actually uses a 64 bit register in 64 bit mode
for the INVPCID_TYPE argument.

Reviewers: craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D47141

llvm-svn: 333255
2018-05-25 06:32:05 +00:00
Tom Stellard
eee57c60ef AMDGPU: Remove AMDGPUMCInstLower.h
Summary:
The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp,
so we don't need a header file.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47264

llvm-svn: 333254
2018-05-25 04:57:02 +00:00
Tom Stellard
094abf2283 AMDGPU: Split R600 AsmPrinter code into its own class
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47245

llvm-svn: 333219
2018-05-24 20:02:01 +00:00
Eli Friedman
381e806df7 [AArch64] Improve orr+movk sequences for MOVi64imm.
The existing code has three different ways to try to lower a 64-bit
immediate to the sequence ORR+MOVK.  The result is messy: it misses
some possible sequences, and the order of the checks means we sometimes
emit two MOVKs when we only need one.

Instead, just use a simple loop to try all possible two-instruction
ORR+MOVK sequences.
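
A minimal sketch of such a search, not the patch's actual code. It assumes three candidate fillers for the chunk that MOVK will overwrite (all zeros, all ones, or a copy of the chunk 32 bits away); the helper names `is_logical_imm64` and `find_orr_movk` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def is_logical_imm64(value):
    """Return True if value is encodable as an AArch64 64-bit bitmask immediate."""
    if value == 0 or value == MASK64:
        return False
    for size in (2, 4, 8, 16, 32, 64):
        mask = (1 << size) - 1
        elem = value & mask
        # The element must repeat across the whole 64-bit value.
        if any(((value >> s) & mask) != elem for s in range(0, 64, size)):
            continue
        # Some rotation of the element must be a contiguous run of low ones.
        for r in range(size):
            rot = ((elem >> r) | (elem << (size - r))) & mask
            if rot & (rot + 1) == 0:
                return True
    return False


def find_orr_movk(imm):
    """Search for (orr_imm, movk_chunk, shift) so that ORR followed by MOVK builds imm."""
    rotated = ((imm << 32) | (imm >> 32)) & MASK64
    for shift in (0, 16, 32, 48):
        chunk_mask = 0xFFFF << shift
        candidates = (
            imm & ~chunk_mask,                             # chunk replaced by zeros
            imm | chunk_mask,                              # chunk replaced by ones
            (imm & ~chunk_mask) | (rotated & chunk_mask),  # chunk copied from 32 bits away
        )
        for orr_imm in candidates:
            if is_logical_imm64(orr_imm):
                return orr_imm, (imm >> shift) & 0xFFFF, shift
    return None
```

For 0xAAAAAAAA1234AAAA the sketch returns an ORR of 0xAAAAAAAAAAAAAAAA followed by a MOVK of 0x1234 at shift 16.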

Differential Revision: https://reviews.llvm.org/D47176

llvm-svn: 333218
2018-05-24 19:38:23 +00:00
Geoff Berry
33474b29b1 [AArch64] Take advantage of variable shift/rotate amount implicit mod operation.
Summary:
Optimize code generated for variable shifts/rotates by taking advantage
of the implicit and/mod done on the variable shift amount register.
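
A small model of the idea, assuming a 64-bit variable shift that reads its amount modulo 64. The function names are illustrative and this is not the patch's code.

```python
MASK64 = (1 << 64) - 1


def lslv64(value, amount):
    """Model a 64-bit variable left shift whose amount is taken modulo 64."""
    return (value << (amount % 64)) & MASK64


def shift_with_explicit_mask(value, amount):
    """Source-level pattern that masks the amount before shifting."""
    return lslv64(value, amount & 63)
```

Both functions agree for every amount, so the explicit `& 63` adds nothing and can be dropped.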

Resolves bug 27582 and bug 37421.

Reviewers: t.p.northover, qcolombet, MatzeB, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46844

llvm-svn: 333214
2018-05-24 18:29:42 +00:00
Simon Pilgrim
fb8b3278fa [X86][SSE] Pull out (AND (XOR X, -1), Y) matching into a helper function. NFC.
llvm-svn: 333201
2018-05-24 16:16:42 +00:00
Simon Pilgrim
624afd356c Fix unused variable warnings. NFCI.
llvm-svn: 333195
2018-05-24 15:34:50 +00:00
Simon Pilgrim
f9c0aecf43 [X86][SSE] Pull out OR(AND(~MASK,X),AND(MASK,Y)) matching into a helper function. NFC.
First stage towards matching more variants of the bitselect pattern for combineLogicBlendIntoPBLENDV (PR37549)

llvm-svn: 333191
2018-05-24 15:12:48 +00:00
Simon Pilgrim
cf784a1a98 [X86][BtVer2] Added Jaguar cpu cycle counter to permit llvm-exegesis latency testing
Ideally we'd be able to test a CPU by using __builtin_readcyclecounter()/RDTSC instead (PR37193) if a model/cycle-counter is not specified.

NOTE: Jaguar PMCs don't give good coverage of resource pipes specified in the model (at the macro-vs-micro-op levels) but we should be able to cover at least a few resources.
llvm-svn: 333190
2018-05-24 14:54:32 +00:00
Simon Atanasyan
b05e850f6d [mips] Remove duplicated code from the expandLoadInst. NFC
llvm-svn: 333164
2018-05-24 07:36:18 +00:00
Simon Atanasyan
1e2bc551fd [mips] Remove redundant argument from expandLoadInst/expandStoreInst. NFC
llvm-svn: 333163
2018-05-24 07:36:11 +00:00
Simon Atanasyan
8630b238fa [mips] Add precondition asserts to the expandLoadInst/expandStoreInst. NFC
llvm-svn: 333162
2018-05-24 07:36:06 +00:00
Simon Atanasyan
5b2855c72c [mips] Cleanup the code a bit. NFC
llvm-svn: 333161
2018-05-24 07:36:00 +00:00
Shiva Chen
c0b05a0956 [RISCV] Support linker relax function call from auipc and jalr to jal
To do this:
1. Add fixup_riscv_relax fixup types which will eventually be
   converted to R_RISCV_RELAX relocation types.

2. Insert R_RISCV_RELAX relocation types on auipc function call
   expressions when linker relaxation is enabled.
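
As a rough illustration of the decision a relaxing linker faces, the sketch below checks whether a call offset fits JAL's 21-bit signed, 2-byte-aligned immediate. The function names are hypothetical, and a real linker must also account for offsets changing as code shrinks.

```python
def fits_jal(offset):
    """True if a PC-relative byte offset fits JAL's 21-bit signed, even immediate."""
    return offset % 2 == 0 and -(1 << 20) <= offset < (1 << 20)


def relax_call(offset, relax_enabled):
    """Pick the call sequence a relaxing linker could use for a given offset."""
    if relax_enabled and fits_jal(offset):
        return ["jal"]            # single 4-byte instruction
    return ["auipc", "jalr"]      # original 8-byte pair
```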

Differential Revision: https://reviews.llvm.org/D44886

llvm-svn: 333158
2018-05-24 06:21:23 +00:00
Tom Stellard
4b5731b4b0 AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP
Summary:
We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp
is gone.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47181

llvm-svn: 333153
2018-05-24 05:28:34 +00:00
Lei Huang
861e8a0eb7 [PowerPC] Remove the match pattern in the definition of LXSDX/STXSDX
The match pattern in the definition of LXSDX is xoaddr, so the Pseudo
instruction XFLOADf64 never gets selected. XFLOADf64 expands to LXSDX/LFDX post
RA based on the register pressure. To avoid ambiguity, we need to remove the
select pattern for LXSDX, same as what was done for LXSD. STXSDX also has
the same issue.

Patch by Qing Shan Zhang (steven.zhang).

Differential Revision: https://reviews.llvm.org/D47178

llvm-svn: 333150
2018-05-24 03:20:28 +00:00
Mandeep Singh Grang
0d47a2dcc0 [RISCV] Lower the tail pseudoinstruction
This patch lowers the tail pseudoinstruction. This has been modeled after ARM's
tail call opt.

llvm-svn: 333137
2018-05-23 22:44:08 +00:00
Sameer AbuAsal
078a9b9c6c [RISCV] Set CostPerUse for registers
Summary:
 Set CostPerUse higher for registers that are not used in the compressed
 instruction set. This will influence the greedy register allocator to reduce
 the use of registers that can't be encoded in 16 bit instructions. This
 affects register allocation even when the compressed instruction set isn't
 targeted, but we see no major negative codegen impact.

Reviewers: asb

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang

Differential Revision: https://reviews.llvm.org/D47039

llvm-svn: 333132
2018-05-23 21:34:30 +00:00
Lei Huang
ad6f9fcebb [Power9]Legalize and emit code for W vector extract and convert to QP
Implement patterns to extract [Un]signed Word vector element and convert to
quad-precision.

Differential Revision: https://reviews.llvm.org/D46536

llvm-svn: 333115
2018-05-23 19:31:54 +00:00
Lei Huang
ee7bf6cb03 [Power9]Legalize and emit code for DW vector extract and convert to QP
Implement patterns to extract [Un]signed DWord vector element and convert to
quad-precision.

Differential Revision: https://reviews.llvm.org/D46333

llvm-svn: 333112
2018-05-23 18:36:51 +00:00
Chad Rosier
2e93b5ba53 [CodeGen][AArch64] Use RegUnits to track register aliases. (NFC)
Use RegUnits to track register aliases in AArch64RedundantCopyElimination.

Differential Revision: https://reviews.llvm.org/D47269

llvm-svn: 333107
2018-05-23 17:49:38 +00:00
Petar Jovanovic
beb5e46405 Silence warnings introduced with r333093
r333093 introduced several warnings (-Wlogical-not-parentheses,
-Wbool-compare).
Adding parentheses in MipsSEInstrInfo::isCopyInstr() to silence it.

llvm-svn: 333097
2018-05-23 16:27:51 +00:00
Petar Jovanovic
8c311a61bc [X86][MIPS][ARM] New machine instruction property 'isMoveReg'
This property is needed in order to follow the movement of values between
registers. It is used in TII to implement a method that returns true if a
simple copy-like instruction is recognized, along with the source and
destination machine operands.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D45204

llvm-svn: 333093
2018-05-23 15:28:28 +00:00
Nicola Zaghen
80a94d5002 Remove DEBUG macro.
Now that the LLVM_DEBUG() macro landed on the various sub-projects
the DEBUG macro can be removed.
Also change the new uses of DEBUG to LLVM_DEBUG.

Differential Revision: https://reviews.llvm.org/D46952

llvm-svn: 333091
2018-05-23 15:09:29 +00:00
Alex Bradbury
ebc93587cb [RISCV] Add symbol diff relocation support for RISC-V
For RISC-V it is desirable to have relaxation happen in the linker once 
addresses are known, and as such the size between two instructions/byte 
sequences in a section could change.

For most assembler expressions, this is fine, as the absolute address results 
in the expression being converted to a fixup, and finally relocations. 
However, for expressions such as .quad .L2-.L1, the assembler folds this down 
to a constant once fragments are laid out, under the assumption that the 
difference can no longer change, although in the case of linker relaxation the 
differences can change at link time, so the constant is incorrect. One place 
where this commonly appears is in debug information, where the size of a 
function expression is in a form similar to the above.

This patch extends the assembler to allow an AsmBackend to declare that it 
does not want the assembler to fold down this expression, and instead generate 
a pair of relocations that allow the linker to carry out the calculation. In 
this case, the expression is not folded, but when it comes to emitting a 
fixup, the generic FK_Data_* fixups are converted into a pair, one for the 
addition half, one for the subtraction, and this is passed to the relocation 
generating methods as usual. I have named these FK_Data_Add_* and 
FK_Data_Sub_* to indicate which half these are for.

For RISC-V, which supports this via e.g. the R_RISCV_ADD64, R_RISCV_SUB64 pair 
of relocations, these are also set to always emit relocations relative to 
local symbols rather than section offsets. This is to deal with the fact that 
if relocations were calculated on e.g. .text+8 and .text+4, the result 12 
would be stored rather than 4 as both addends are added in the linker.
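
The effect of deferring the subtraction to the linker can be sketched as follows. The addresses are made up, and the two helpers only model the add/subtract behaviour of an ADD64/SUB64-style pair applied to a zero-initialised field.

```python
def folded_constant(sym_addr, end_label, start_label):
    """Assembler-time folding: the difference is frozen at layout time."""
    return sym_addr[end_label] - sym_addr[start_label]


def apply_add_sub_pair(sym_addr, end_label, start_label):
    """Link-time evaluation of an add/subtract relocation pair on a zeroed field."""
    field = 0
    field += sym_addr[end_label]     # ADD64-style relocation: add the symbol value
    field -= sym_addr[start_label]   # SUB64-style relocation: subtract the symbol value
    return field & ((1 << 64) - 1)


# Hypothetical layout before and after the linker relaxes away 4 bytes.
before = {".L1": 0x100, ".L2": 0x118}
after = {".L1": 0x100, ".L2": 0x114}

stale = folded_constant(before, ".L2", ".L1")     # value baked in by the assembler
fresh = apply_add_sub_pair(after, ".L2", ".L1")   # value computed at link time
```

Here `stale` no longer matches the final layout, while `fresh` reflects the relaxed addresses.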

Differential Revision: https://reviews.llvm.org/D45181
Patch by Simon Cook.

llvm-svn: 333079
2018-05-23 12:36:18 +00:00
Alex Bradbury
f6582865f4 [Sparc] Use addAliasForDirective to support data directives
The Sparc asm parser currently has custom parsing logic for .half, .word, 
.nword and .xword. Rather than use this custom logic, we can just use 
addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue.

https://reviews.llvm.org/D47003

llvm-svn: 333078
2018-05-23 11:20:28 +00:00
Alex Bradbury
4baebba9fe [AArch64] Use addAliasForDirective to support data directives
The AArch64 asm parser currently has custom parsing logic for .hword, .word, 
and .xword. Rather than use this custom logic, we can just use 
addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue.

Differential Revision: https://reviews.llvm.org/D47000

llvm-svn: 333077
2018-05-23 11:17:20 +00:00
Alex Bradbury
261ef87b82 [RISCV] Correctly report sizes for builtin fixups
This is a different approach to fixing the problem described in D46746. 
RISCVAsmBackend currently depends on the getSize helper function returning the 
number of bytes a fixup may change (note: some other backends have a similar 
helper named getFixupNumKindBytes). As noted in that review, this doesn't 
return the correct size for FK_Data_1, FK_Data_2, or FK_Data_8 meaning that 
too few bytes will be written in the case of FK_Data_8, and there's the 
potential of writing outside the Data array for the smaller fixups.

D46746 extends getSize to recognise some of the builtin fixup types. Rather 
than having a function that needs to be kept up to date as new builtin or 
target-specific fixups are added, we can calculate an appropriate bound on the
number of bytes that might be touched using Info.TargetSize and 
Info.TargetOffset.
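
One way to derive such a bound, shown as a sketch: round the highest bit a fixup can touch up to whole bytes. The parameter names mirror Info.TargetSize and Info.TargetOffset (both in bits); the function itself is illustrative.

```python
def fixup_num_bytes(target_size_bits, target_offset_bits):
    """Upper bound on bytes touched: round (offset + size) bits up to whole bytes."""
    return (target_size_bits + target_offset_bits + 7) // 8
```

With this bound, 8-, 16- and 64-bit data fixups at offset 0 give 1, 2 and 8 bytes, and a 20-bit field at bit offset 12 gives 4 bytes.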

Differential Revision: https://reviews.llvm.org/D46965

llvm-svn: 333076
2018-05-23 10:53:56 +00:00
Daniel Cederman
271c0cb6c0 [Sparc] Add mnemonic aliases for flush, stb, stba, sth, and stha
Reviewers: jyknight

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D47140

llvm-svn: 333068
2018-05-23 08:26:49 +00:00
Roman Tereshin
2d9d4134d1 [GlobalISel][ARM] Adding HPR and QPR regclasses to FPRB regbank
Also bringing ARMRegisterBankInfo::getRegBankFromRegClass
implementation up to speed with the *.td-definition.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D43982

llvm-svn: 333056
2018-05-23 02:59:31 +00:00
Matt Arsenault
b6457804d0 AMDGPU: Fix v2f16 fneg/fabs pattern
The integer operation conversion for some reason only happens
if the source is a bitcast from an integer, which happens to
always be the situation when the result is loaded. Add
an additional pattern for when the source operation is really
an FP operation.

llvm-svn: 333019
2018-05-22 20:13:34 +00:00
Eli Friedman
e9e2822fc3 Delete unused variable from r333015.
(The assertion suppressed the unused variable warning on
Release+Asserts builds, so I didn't notice.)

llvm-svn: 333018
2018-05-22 19:38:07 +00:00
Tom Stellard
6955081e45 AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering
Summary: This is always false for R600.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47180

llvm-svn: 333016
2018-05-22 19:37:55 +00:00
Eli Friedman
f156ca7ef2 [MachineOutliner] Add "thunk" outlining for AArch64.
When we're outlining a sequence that ends in a call, we can save up to
three instructions in the outlined function by turning the call into
a tail-call. I refer to this as thunk outlining because the resulting
outlined function looks like a thunk; suggestions welcome for a better
name.

In addition to making the outlined function shorter, thunk outlining
allows outlining calls which would otherwise be illegal to outline:
we don't need to save/restore LR, so we don't need to prove anything
about the stack access patterns of the callee.

To make this work effectively, I also added
MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner
code; this allows treating an arbitrary instruction as a terminator in
the suffix tree.

Differential Revision: https://reviews.llvm.org/D47173

llvm-svn: 333015
2018-05-22 19:11:06 +00:00
Krzysztof Parzyszek
309f570ea8 [Hexagon] Add patterns for accumulating HVX compares
llvm-svn: 333009
2018-05-22 18:27:02 +00:00
Aleksandar Beserminji
6d75c08f47 [mips] Merge MipsLongBranch and MipsHazardSchedule passes
MipsLongBranchPass and MipsHazardSchedule passes are joined to one pass
because of mutual conflict. When MipsHazardSchedule inserts 'nop's, it
potentially breaks some jumps, so they have to be expanded to long
branches. When some branch is expanded to long branch, it potentially
creates a hazard situation, which should be fixed by adding nops.
The new pass is called MipsBranchExpansion. It combines these two passes
and runs them alternately until one of them reports no changes were made.
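
The alternation can be pictured as a small fixed-point loop. This is a sketch with toy stand-ins for the two transformations, not the pass's real control flow; every name below is hypothetical.

```python
def run_branch_expansion(state, expand_long_branches, insert_hazard_nops):
    """Alternate two transformations until one of them reports no change."""
    iterations = 0
    while True:
        iterations += 1
        if not expand_long_branches(state):
            break
        if not insert_hazard_nops(state):
            break
    return iterations


def toy_expand(state):
    """Toy model: expanding a branch may create one new hazard."""
    if state["pending_branches"] == 0:
        return False
    state["pending_branches"] -= 1
    state["pending_hazards"] += 1
    return True


def toy_nops(state):
    """Toy model: inserting nops clears all pending hazards."""
    if state["pending_hazards"] == 0:
        return False
    state["pending_hazards"] = 0
    return True
```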

Differential Revision: https://reviews.llvm.org/D46641

llvm-svn: 332977
2018-05-22 13:24:38 +00:00
Simon Dardis
e3c6da9e00 [mips] Correct the predicates of the cache and pref instructions
Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D46949

llvm-svn: 332970
2018-05-22 10:55:05 +00:00
Simon Pilgrim
c7f0b62bae [TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput
This enables us to detect more fast path sdiv cases under cost analysis.

This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.

Found while working on D46276

Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.
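
The distinction between the two constant kinds can be sketched like this. The classification labels are illustrative and are not the TTI enum names.

```python
def is_pow2(x):
    """True for positive powers of two."""
    return x > 0 and (x & (x - 1)) == 0


def classify_constant_vector(elements):
    """Classify a constant vector by uniformity and whether every lane is a power of two."""
    uniform = all(e == elements[0] for e in elements)
    pow2 = all(is_pow2(e) for e in elements)
    if uniform:
        return "uniform-pow2" if pow2 else "uniform"
    return "non-uniform-pow2" if pow2 else "non-uniform"
```

A divisor such as <2, 4, 8, 16> falls into the non-uniform power-of-two class, where each lane can still be handled with shifts.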

Differential Revision: https://reviews.llvm.org/D46637

llvm-svn: 332969
2018-05-22 10:40:09 +00:00
Matt Arsenault
ee27f88e14 AMDGPU: Make v2i16/v2f16 legal on VI
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.

Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.

llvm-svn: 332953
2018-05-22 06:32:10 +00:00
Dan Gohman
2df2ccde1c [WebAssembly] Fix fast-isel lowering illegal argument and return types.
For both argument and return types, promote illegal types like i24 to i32,
and if a type can't be easily promoted, clear out the signature before
bailing out, to avoid leaving it in a partially complete state.

Fixes PR37546.

llvm-svn: 332947
2018-05-22 04:58:36 +00:00
Tom Stellard
6f27d8c6b3 AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction
and register definitions, which are huge so we only want to include
them where needed.

This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.

I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46272

llvm-svn: 332930
2018-05-22 02:03:23 +00:00
Craig Topper
9204675e71 [X86] Remove 128/256-bit cvtdq2ps, cvtudq2ps, cvtqq2pd, cvtuqq2pd intrinsics.
These can all be implemented with sitofp/uitofp instructions.

llvm-svn: 332916
2018-05-21 23:15:00 +00:00
Roman Lebedev
b563ad8227 [DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.

This is PR37104 (https://bugs.llvm.org/show_bug.cgi?id=37104).

PR6773 (https://bugs.llvm.org/show_bug.cgi?id=6773) will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`), and we need to make sure that they are generated.
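
The two forms of a masked merge compute the same value, which a quick sketch can confirm. The function names borrow the `@out`/`@in` labels above; the mapping of each form to those labels is an assumption.

```python
def masked_merge_out(x, y, m):
    """Unfolded form: two ANDs and an OR (and/andn or a bit-select)."""
    return (x & m) | (y & ~m)


def masked_merge_in(x, y, m):
    """Folded form: XOR, AND, XOR."""
    return ((x ^ y) & m) ^ y
```

Both take bits of `x` where `m` is set and bits of `y` elsewhere, so unfolding the second form into the first does not change the result.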

Differential Revision: https://reviews.llvm.org/D46528

llvm-svn: 332904
2018-05-21 21:41:02 +00:00
Reid Kleckner
0f1c09ea4a [X86] Simplify some X86 address mode folding code, NFCI
This code should really do exactly the same thing for 32-bit x86 and
64-bit small code models, with the exception that RIP-relative
addressing can't use base and index registers.

llvm-svn: 332893
2018-05-21 21:03:19 +00:00
Craig Topper
7333222486 [X86] Remove masking from vpternlog intrinsics. Use a select in IR instead.
This removes 6 intrinsics since we no longer need separate mask and maskz intrinsics.
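
A bit-level model of the idea: compute the unmasked ternary-logic result, then apply the mask with an ordinary per-element select. The helper names are hypothetical; the truth-table indexing follows the instruction's documented `(a << 2) | (b << 1) | c` convention.

```python
def ternlog_bit(a, b, c, imm8):
    """Look up one result bit in the 8-entry truth table encoded by imm8."""
    return (imm8 >> ((a << 2) | (b << 1) | c)) & 1


def ternlog(a, b, c, imm8, width=32):
    """Apply the truth table bitwise across a width-bit element."""
    result = 0
    for i in range(width):
        bit = ternlog_bit((a >> i) & 1, (b >> i) & 1, (c >> i) & 1, imm8)
        result |= bit << i
    return result


def masked_ternlog(a_vec, b_vec, c_vec, imm8, mask, passthru):
    """Unmasked operation followed by a per-element select on the mask bits."""
    full = [ternlog(a, b, c, imm8) for a, b, c in zip(a_vec, b_vec, c_vec)]
    return [r if (mask >> i) & 1 else p
            for i, (r, p) in enumerate(zip(full, passthru))]
```

Merge-masking passes the old destination as `passthru`, while zero-masking passes zeros, so one unmasked operation plus a select covers both cases.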

Differential Revision: https://reviews.llvm.org/D47124

llvm-svn: 332890
2018-05-21 20:58:09 +00:00
Peter Collingbourne
7f9b89f0e2 CodeGen: Add a dwo output file argument to addPassesToEmitFile and hook it up to dwo output.
Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47089

llvm-svn: 332881
2018-05-21 20:16:41 +00:00
Peter Collingbourne
799b49b89d MC: Separate creating a generic object writer from creating a target object writer. NFCI.
With this we gain a little flexibility in how the generic object
writer is created.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47045

llvm-svn: 332868
2018-05-21 19:20:29 +00:00
Peter Collingbourne
492891ef83 Fix ubsan bounds check failure.
llvm-svn: 332866
2018-05-21 19:09:47 +00:00
Stanislav Mekhanoshin
7e50efd55e [AMDGPU] Add divergence analysis as a dependency for ISel
AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage
but does not list it in pass dependencies which may lead to
crash.

Differential Revision: https://reviews.llvm.org/D47151

llvm-svn: 332862
2018-05-21 18:18:52 +00:00
Peter Collingbourne
caacbd9d09 MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI.
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47035

llvm-svn: 332857
2018-05-21 17:57:19 +00:00
Tom Stellard
7c3584c4d4 AMDGPU/GlobalISel: Address post-commit review comments for r332379
MCRegisterInfo::getPhysRegSize() will be deprecated.

llvm-svn: 332856
2018-05-21 17:49:31 +00:00
Andrea Di Biagio
d65d55551b [X86][BtVer2] Add a 'J' prefix to the PRF/RCU defs. NFC
This is to keep the Jaguar model's naming convention. Processor resources all
have a 'J' prefix in the BtVer2 scheduling model.

llvm-svn: 332851
2018-05-21 16:30:26 +00:00
Lama Saba
1bc8b49d1c [X86] - Avoid SFB pass - fix bug in updating the offsets for newly created copies
Change-Id: I169ab6fe7e187727c0298c2a1e2868a683f3e688
llvm-svn: 332849
2018-05-21 16:23:16 +00:00
Simon Pilgrim
7479a1b6ec [X86][SSE] Add an assert to ensure that rotation amount is converted to a scale
Missed in rL332832 where we added SSE v4i32 rotations for PR37426.

llvm-svn: 332844
2018-05-21 15:17:23 +00:00
Tim Northover
080f8761dc ARM: be conservative when asked load/store alignment of weird type.
Chances are we'll be asked again after type legalization, but before that point
it's better to claim misaligned accesses aren't allowed than to assert.

llvm-svn: 332840
2018-05-21 12:43:54 +00:00
Aleksandar Beserminji
363263c8be [mips] Revert Merge MipsLongBranch and MipsHazardSchedule passes
Revert this patch due to a buildbot failure.

Differential Revision: https://reviews.llvm.org/D46641

llvm-svn: 332837
2018-05-21 11:38:52 +00:00
Eric Christopher
8dbb727709 Fix up a few grammar issues.
llvm-svn: 332835
2018-05-21 10:27:36 +00:00
Aleksandar Beserminji
56d716f1af [mips] Merge MipsLongBranch and MipsHazardSchedule passes
MipsLongBranchPass and MipsHazardSchedule passes are joined to one pass
because of mutual conflict. When MipsHazardSchedule inserts 'nop's, it
potentially breaks some jumps, so they have to be expanded to long
branches. When some branch is expanded to long branch, it potentially
creates a hazard situation, which should be fixed by adding nops.
The new pass is called MipsBranchExpansion. It combines these two passes
and runs them alternately until one of them reports no changes were made.

Differential Revision: https://reviews.llvm.org/D46641

llvm-svn: 332834
2018-05-21 10:20:02 +00:00
Simon Pilgrim
8df0d84f3d [X86][SSE] Support v4i32 rotations (PR37426)
As suggested by Fabian on PR37426, we can use PMULUDQ to perform v4i32 vector rotations as the upper 32bits of the multiply will contain the 'wrapped' bits of the rotation.

v8i16/v16i8 rotations would be straightforward to add to lowerRotate in the future - ideally we'd mostly share code with the vector shifts lowering.
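
The arithmetic behind the multiply trick, sketched for a single 32-bit lane: multiplying by 2^amount yields a 64-bit product whose low half is the shifted value and whose high half holds the bits that wrapped around. The function name is illustrative.

```python
def rotl32_via_mul(x, amount):
    """Rotate a 32-bit value left using one widening multiply."""
    amount &= 31
    product = (x & 0xFFFFFFFF) * (1 << amount)  # 32x32 -> 64-bit unsigned multiply
    lo = product & 0xFFFFFFFF                   # bits shifted left within the lane
    hi = product >> 32                          # bits that wrapped past bit 31
    return lo | hi
```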

Differential Revision: https://reviews.llvm.org/D46954

llvm-svn: 332832
2018-05-21 09:45:59 +00:00
Craig Topper
c9ae8654e1 [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead.
Someday maybe we'll use selects for all intrinsics.

llvm-svn: 332824
2018-05-20 23:34:04 +00:00
Simon Dardis
69c7283859 [mips] Add microMIPSR6 ll/sc instructions.
Previously the compiler was using the microMIPSR3 variants, incorrectly.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D46948

llvm-svn: 332820
2018-05-20 17:21:00 +00:00
Simon Pilgrim
01deae48b5 Fix MSVC unused variable warning. NFCI.
AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance.

llvm-svn: 332807
2018-05-19 12:46:02 +00:00
Matt Arsenault
1314df1d33 AMDGPU: Add pass to optimize reqd_work_group_size
Eliminate loads from the dispatch packet when they will have
a known value.

Also pattern match the code used by the library to handle partial
workgroup dispatches, which isn't necessary if reqd_work_group_size
is used.

llvm-svn: 332771
2018-05-18 21:35:00 +00:00
Peter Collingbourne
a2edc5eab3 Support: Simplify endian stream interface. NFCI.
Provide some free functions to reduce verbosity of endian-writing
a single value, and replace the endianness template parameter with
a field.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47032

llvm-svn: 332757
2018-05-18 19:46:24 +00:00
Konstantin Zhuravlyov
7a4589dcbc AMDGPU/NFC: Set symbol's type that is coming from an argument in
EmitAMDGPUSymbolType, instead of hard-coding it to STT_AMDGPU_HSA_KERNEL.

llvm-svn: 332753
2018-05-18 18:41:37 +00:00
Peter Collingbourne
7e80473026 MC: Change the streamer ctors to take an object writer instead of a stream. NFCI.
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47050

llvm-svn: 332749
2018-05-18 18:26:45 +00:00
Brendon Cahoon
ee2880e215 [Hexagon] Generate post-increment for floating point types
The code that generates post-increments for Hexagon considered
integer values only. This patch adds support to generate them for
floating point values, f32 and f64.

Differential Revision: https://reviews.llvm.org/D47036

llvm-svn: 332748
2018-05-18 18:14:44 +00:00
Simon Pilgrim
aea0653ccc [X86] Add GPR<->XMM Schedule Tags
BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMM->GPR delays (see rL332737)

The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)

llvm-svn: 332745
2018-05-18 17:58:36 +00:00
Craig Topper
a493ff59ea [X86] Directly legalize v16i16/v8i16 vselect to vXi8 vselect to use VPBLENDVB
The intrinsic legalization for masked truncate uses ISD::TRUNCATE which can be constant folded by getNode. This prevents getVectorMaskingNode from seeing the ISD::TRUNCATE special case where it should emit X86ISD::SELECT instead of ISD::VSELECT. This causes a vselect with a v16i1 or v8i1 condition to be emitted during vector legalization, but vector legalization doesn't revisit nodes it creates. DAG combine will then promote this condition to match the result type. Then op legalization will try to legalize it, but the custom lowering hook returns SDValue(), and op legalization doesn't have an Expand for VSELECT because it expects vector legalization to have taken care of it. So the operation sticks around and fails in isel.

This patch adds a custom legalization hook to morph it to a vXi8 vselect instead.

This also simplifies the normal vXi16 vselect handling because vector legalization was normally expanding to AND/ANDN/OR and DAG combine was turning that into VPBLENDVB. So we can skip a step by doing it directly.

Fixes PR37499

Differential Revision: https://reviews.llvm.org/D47025

llvm-svn: 332743
2018-05-18 17:48:06 +00:00
Simon Pilgrim
77523e8ad2 [X86][BtVer2] Improve simulation of (V)PINSR values
Include the 6cy delay transferring from the GPR to FPU.

llvm-svn: 332737
2018-05-18 17:09:41 +00:00
Simon Pilgrim
21da060b07 [X86][BtVer2] Partial vector stores (inc MMX) have a 2cy latency
llvm-svn: 332722
2018-05-18 14:22:22 +00:00
Simon Pilgrim
ccd78bd868 [X86][SSE] Ensure vector partial load/stores use the WriteVecLoad/WriteVecStore scheduler classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc.

Fixes BtVer2/SLM which have different behaviours for GPR stores.

llvm-svn: 332718
2018-05-18 14:08:01 +00:00
Simon Pilgrim
5c1dda1cb8 [X86][AVX] VEXTRACTF128mr store is a WriteFStoreX not WriteFStore
llvm-svn: 332715
2018-05-18 13:17:51 +00:00
Simon Pilgrim
d2ffd6b059 [X86][SSE] Ensure float load/stores use the WriteFLoad/WriteFStore scheduler classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPS/MOVLPD/MOVLPS etc.

Fixes BtVer2/SLM which have different behaviours for GPR stores.

llvm-svn: 332714
2018-05-18 13:13:59 +00:00
Clement Courbet
b982f02652 [ExynosM3] Fix scheduling info.
Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 332713
2018-05-18 13:10:41 +00:00
Simon Pilgrim
2c4eac25a1 [X86][ZnVer1] Cleanup more single match instregexs
llvm-svn: 332712
2018-05-18 13:05:26 +00:00
Jonas Paulsson
277ca5d6e5 [SystemZ] Fix commit message of previous commit.
Sorry, the commit comment for r332703 is completely broken.
My mind slipped - the right description would be:

In SystemZDAGToDAGISel::Select(), in the handling for SELECT_CCMASK:

Check if UpdateNodeOperands() returns a different SDNode and in that
case call ReplaceNode.

Review: Ulrich Weigand.
llvm-svn: 332706
2018-05-18 12:07:16 +00:00
Alexander Ivchenko
9efb6203d0 [X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part)
This patch aims to match the changes introduced in gcc by
https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The
IBT feature definition is removed, with the IBT instructions
being freely available on all X86 targets. The shadow stack
instructions are also being made freely available, and the
use of all these CET instructions is controlled by the module
flags derived from the -fcf-protection clang option. The hasSHSTK
option remains since clang uses it to determine availability of
shadow stack instruction intrinsics, but it is no longer directly used.

Comes with a clang patch (D46881).

Patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D46882

llvm-svn: 332705
2018-05-18 11:58:25 +00:00