mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00
Commit Graph

60088 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
2faf2dbdbb [AMDGPU] Mark sin/cos load folding as modifying the function.
When the load value is folded into the sin/cos operation, the
AMDGPU library call simplifier could still mark the function
as unmodified. Instead, ensure that the early-return path returns
whether the load was folded into the sin/cos call.
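A hedged sketch of the shape of the fix (the helper names below are hypothetical, not the actual AMDGPULibCalls code):

```
// Hypothetical sketch: propagate whether the argument load was folded
// instead of unconditionally reporting "no change" on the early return.
bool Changed = tryFoldLoadIntoSinCos(CI); // hypothetical helper
if (!canCombineSinCos(CI))                // hypothetical early-return test
  return Changed;                         // previously: return false
```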

Authored by MJDSys

Differential Revision: https://reviews.llvm.org/D91401
2020-11-13 14:49:33 -08:00
Sam Clegg
372af5a9de [WebAssembly] Move GlobalTLSAddress handling to WebAssemblyISelLowering. NFC
I'm not sure why it was added to DAGToDAG originally, but it seems
to make sense alongside the non-TLS version: LowerGlobalAddress.

Differential Revision: https://reviews.llvm.org/D91432
2020-11-13 14:35:51 -08:00
Heejin Ahn
fb835643a1 [WebAssembly] Rename atomic.notify and *.atomic.wait
- atomic.notify -> memory.atomic.notify
- i32.atomic.wait -> memory.atomic.wait32
- i64.atomic.wait -> memory.atomic.wait64

See https://github.com/WebAssembly/threads/pull/149.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D91447
2020-11-13 12:04:48 -08:00
Baptiste Saleil
f21e542c3b [PowerPC] Add paired vector load and store builtins and intrinsics
This patch adds the Clang builtins and LLVM intrinsics to load and store vector pairs.

Differential Revision: https://reviews.llvm.org/D90799
2020-11-13 12:35:10 -06:00
Jessica Paquette
ea446fa791 [AArch64][GlobalISel] Select G_SELECT cc, t, (G_SUB 0, x) -> CSNEG t, x, cc
When we see

```
%sub = G_SUB 0, %x
%select = G_SELECT %cc, %t, %sub
```

Fold away the G_SUB by producing

```
%select = CSNEG %t, %x, cc
```

Simple IR example: https://godbolt.org/z/K8TEnh

This is valid on both sides of the select, but for now, just handle one side.
It may make more sense to handle swapping sides during post-legalizer lowering.

Differential Revision: https://reviews.llvm.org/D90723
2020-11-13 10:12:51 -08:00
Jessica Paquette
ea9106c312 [AArch64][GlobalISel] NFC: Use CmpInst::isUnsigned instead of static helper
Reducing some code duplication.

We had a helper for checking if a predicate is unsigned. Remove that and use
the existing function in Instructions.cpp.

Differential Revision: https://reviews.llvm.org/D91288
2020-11-13 09:35:42 -08:00
Wouter van Oortmerssen
fcaec2bb33 [WebAssembly] Added R_WASM_FUNCTION_OFFSET_I64 for use with DWARF DW_AT_low_pc
Needed for wasm64, see discussion in https://reviews.llvm.org/D91203

Differential Revision: https://reviews.llvm.org/D91395
2020-11-13 09:32:31 -08:00
Jessica Paquette
18f4a04bc7 [GlobalISel] Add matchers for specific constants and a matcher for negations
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.

Add

- `m_SpecificICst`, which returns true when a specific constant value is matched.
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns true when a register is negated.

Also update a few places which use idioms related to the new matchers.
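As an illustration, a combine using the new matchers might look like this (a sketch: the double-negation fold itself is hypothetical, while `mi_match`, `m_Neg`, and `m_Reg` are the real matcher API):

```
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
using namespace llvm;
using namespace llvm::MIPatternMatch;

// Sketch: recognize (G_SUB 0, (G_SUB 0, x)) and return x in MatchInfo.
bool matchDoubleNegation(Register Reg, const MachineRegisterInfo &MRI,
                         Register &MatchInfo) {
  return mi_match(Reg, MRI, m_Neg(m_Neg(m_Reg(MatchInfo))));
}
```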

Differential Revision: https://reviews.llvm.org/D91397
2020-11-13 09:24:54 -08:00
Matt Arsenault
b3aa28b9d4 AMDGPU: Factor out large flat offset splitting 2020-11-13 11:22:13 -05:00
Sam Clegg
258639659d [WebAssembly] Add new relocation type for TLS data symbols
These relocations represent offsets from the __tls_base symbol.

Previously we were just using normal MEMORY_ADDR relocations and relying
on the linker to select a segment offset rather than an absolute value in
Symbol::getVirtualAddress().  Using an explicit relocation type allows
us to clearly distinguish absolute from relative relocations based
on the relocation information alone.

One place this is useful is being able to reject absolute relocations in
the PIC case while still accepting TLS relocations.

Differential Revision: https://reviews.llvm.org/D91276
2020-11-13 07:59:29 -08:00
Matt Arsenault
5020d4daa4 AMDGPU: Refactor getBaseWithOffsetUsingSplitOR usage 2020-11-13 10:58:17 -05:00
Kerry McLaughlin
c26a89a1b4 [SVE][CodeGen] Improve codegen of scalable masked scatters
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.
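A simplified sketch of the base/index splitting idea (not the exact refineUniformBase code):

```
// Sketch: if the base is unset and the index is `add X, splat(Y)`, peel the
// splatted scalar Y out as a uniform base and keep X as the per-lane index.
if (BasePtr.isUndef() && Index.getOpcode() == ISD::ADD &&
    Index.getOperand(1).getOpcode() == ISD::SPLAT_VECTOR) {
  BasePtr = Index.getOperand(1).getOperand(0); // the splatted scalar
  Index = Index.getOperand(0);
}
```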

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90942
2020-11-13 11:19:36 +00:00
Simon Pilgrim
16ce8e35dd Fix MSVC signed/unsigned comparison warning. NFCI. 2020-11-13 10:20:48 +00:00
Kazushi (Jam) Marukawa
c5d772e8d0 [VE] Add vst intrinsic instructions
Add vst intrinsic instructions and a regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91406
2020-11-13 19:11:57 +09:00
Jay Foad
a68bf7cb45 [AMDGPU] One more use of the new export target names. NFC. 2020-11-13 09:44:09 +00:00
serge-sans-paille
82b6e6053d llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These functions store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Craig Topper
c2ee7a175e [X86] Use EVT::getIntegerVT instead of MVT::getIntegerVT where the type can be i2 or i4.
This was a mistake introduced in D91294. I'm not sure how to
exercise this with the existing code, but I hit it while trying
some follow-up experiments.
2020-11-12 21:48:45 -08:00
Craig Topper
da8115ff1d [X86] When storing v1i1/v2i1/v4i1 to memory, make sure we store zeros in the rest of the byte
We can't store garbage in the unused bits. It's possible that something like zextload from i1/i2/i4 is created to read the memory. Those zextloads would be legalized assuming the extra bits are 0.
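Conceptually the stored byte has to be assembled with known-zero padding, along these lines (a rough sketch, not the exact lowerStore code):

```
// Rough sketch for a v4i1 store: bitcast the mask to an i4, zero-extend it
// to a full byte so the top bits are provably 0, then store the byte.
LLVMContext &Ctx = *DAG.getContext();
SDValue Bits = DAG.getBitcast(EVT::getIntegerVT(Ctx, 4), StoredVal);
SDValue Byte = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i8, Bits);
Chain = DAG.getStore(Chain, DL, Byte, Ptr, MachinePointerInfo());
```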

I'm not sure that the code in lowerStore is executed for the v1i1/v2i1/v4i1 case. It looks like the DAG combine in combineStore may have converted them to v8i1 first. And I think we're missing some cases to avoid going to the stack in the first place. But I don't have time to investigate those things at the moment so I wanted to focus on the correctness issue.

Should fix PR48147.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D91294
2020-11-12 21:28:18 -08:00
Stanislav Mekhanoshin
4dbdbbe753 [AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
2020-11-12 15:23:37 -08:00
Jessica Paquette
2e5d95bf74 [AArch64][GlobalISel] Select CSINC and CSINV for G_SELECT with constants
Select the following:

- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc, 0, -1 -> CSINV zreg, zreg, cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINV t, zreg, cc

(IR example: https://godbolt.org/z/YfPna9)

These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.

Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.

```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
          (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```

So we have to manually select these for now.

This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.

Differential Revision: https://reviews.llvm.org/D90701
2020-11-12 14:44:01 -08:00
Kazushi (Jam) Marukawa
aa1acddbb3 [VE] Support vld intrinsics
Add intrinsics for vector load instructions. Also add a regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91332
2020-11-13 07:34:42 +09:00
Stanislav Mekhanoshin
f12ec97f64 [AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
2020-11-12 13:38:56 -08:00
Jay Foad
7dc590a5ce [AMDGPU] Fix scheduling of exp pos4
Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.

Differential Revision: https://reviews.llvm.org/D91290
2020-11-12 19:57:14 +00:00
Jay Foad
d376fdebe3 [AMDGPU] Define and use names for export targets. NFC.
Differential Revision: https://reviews.llvm.org/D91289
2020-11-12 19:57:14 +00:00
Craig Topper
fbc8fa8586 [MSP430] Remove unused MVT::Glue output from MSP430ISD::SELECT_CC nodes.
Follow-up to a similar patch on RISCV: 637f19c36b323cc3ab597f6ef138db53be395949.

Nothing reads this Glue value that I could see. The SDNode def in
the td file does not have the SDNPOutGlue flag so I don't think
this glue would get properly propagated to MachineSDNodes if it
was used.
2020-11-12 10:34:01 -08:00
Craig Topper
03edfee10d [RISCV] Don't include CodeGen layer files in MC layer
- Use MCRegister instead of Register in MC layer.
- Move some enums from RISCVInstrInfo.h to RISCVBaseInfo.h to be with other TSFlags bits.

Differential Revision: https://reviews.llvm.org/D91114
2020-11-12 07:45:38 -08:00
Craig Topper
5ab383a3c0 [RISCV] Add an ANDI to shift amount of FSL/FSR instructions
The fshl and fshr intrinsics are defined to modulo their shift amount by the bitwidth of one of their inputs. The FSR/FSL instructions read one extra bit from the shift amount. If that bit is set the inputs are swapped. In order to preserve the semantics of the llvm intrinsics we need to make sure that the extra bit isn't set. DAG combine or instcombine may have removed any mask that was originally present.

We could be smarter here and try to use computeKnownBits to check if the bit is known zero, but I wanted to start with correctness.
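In SelectionDAG terms the guard amounts to masking the amount, roughly like this (illustrative only, not the exact RISCV lowering, which inserts the ANDI during isel):

```
// Illustrative: AND the funnel-shift amount with bitwidth-1. This clears the
// extra bit FSL/FSR read while preserving amount % bitwidth, so the llvm
// fshl/fshr semantics are kept.
SDValue Amt = Op.getOperand(2);
SDValue Mask = DAG.getConstant(VT.getSizeInBits() - 1, DL, VT);
SDValue SafeAmt = DAG.getNode(ISD::AND, DL, VT, Amt, Mask);
```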

Differential Revision: https://reviews.llvm.org/D90905
2020-11-12 07:33:40 -08:00
David Green
9d0284e0e8 [ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP
Of course there was something missing, in this case a check that the def
of the count register we are adding to a t2DoLoopStartTP would dominate
the insertion point.

In the future, when we remove some of these COPYs in between, the
t2DoLoopStartTP will always become the last instruction in the block,
preventing this from happening. In the meantime we need to check they
are created in a sensible order.
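The added guard is conceptually this (a simplified sketch):

```
// Simplified sketch: only form the t2DoLoopStartTP if the definition of the
// count register dominates the point where the new instruction is inserted.
MachineInstr *Def = MRI->getVRegDef(CountReg);
if (!Def || !DT->dominates(Def, &*InsertPt))
  return false; // would otherwise create a use before its def
```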

Differential Revision: https://reviews.llvm.org/D91287
2020-11-12 13:47:46 +00:00
Kazushi (Jam) Marukawa
8fbbc2d925 [VE] Change the default type of v64 register class
Change the default type of v64 register class from v512i32 to v256f64.
Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91301
2020-11-12 19:07:07 +09:00
David Sherwood
187863ee97 [SVE] Deal with SVE tuple call arguments correctly when running out of registers
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.

I've fixed this by ensuring that:

1. When we don't have enough registers to allocate the whole block
   we mark any remaining SVE registers temporarily as allocated
   (see the sketch after this list).
2. We temporarily remove the InConsecutiveRegs flags from the last
   tuple part argument and reinvoke the autogenerated calling
   convention handler. Doing this prevents the code from entering
   an infinite recursion and, in combination with 1), ensures we
   switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
   then deallocate any SVE registers we marked as allocated in 1).
   We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
   LowerFormalArguments functions to detect the start of a tuple,
   which involves allocating a single stack object and doing the
   correct number of legal loads and stores.
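For step 1, the temporary reservation might look roughly like this (a hypothetical sketch, not the exact calling-convention code):

```
// Hypothetical sketch of step 1: once the tuple no longer fits, claim the
// remaining Z registers so every later part also takes the indirect path.
// The registers are freed again in step 3.
static const MCPhysReg ZRegList[] = {AArch64::Z0, AArch64::Z1, AArch64::Z2,
                                     AArch64::Z3, AArch64::Z4, AArch64::Z5,
                                     AArch64::Z6, AArch64::Z7};
for (MCPhysReg Reg : ZRegList)
  State.AllocateReg(Reg);
```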

Differential Revision: https://reviews.llvm.org/D90219
2020-11-12 08:41:50 +00:00
Amara Emerson
bda2529bef [AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB. 2020-11-11 22:46:53 -08:00
Baptiste Saleil
c452241c8f [PowerPC] Accumulator/Unprimed Accumulator register copy, spill and restore
This patch adds support for accumulator/unprimed accumulator
register copy, spill and restore for MMA.

Authored By: Baptiste Saleil

Reviewed By: #powerpc, bsaleil, amyk

Differential Revision: https://reviews.llvm.org/D90616
2020-11-11 16:23:45 -06:00
Jessica Paquette
3ff178e54c [AArch64][GlobalISel] Mark G_FCONSTANT as legal when there is full fp16 support
When there is full fp16 support, there is no reason to widen 16-bit
G_FCONSTANTs to 32 bits. Mark them as legal in this case.
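The legalization rule is roughly of this shape (a sketch using the standard LegalizerInfo builder API, not the verbatim AArch64 rule):

```
// Sketch: s16 G_FCONSTANT is legal only with full fp16 support; otherwise
// narrow fp constants get widened to s32 as before.
getActionDefinitionsBuilder(TargetOpcode::G_FCONSTANT)
    .legalFor({s32, s64})
    .legalIf([=](const LegalityQuery &Query) {
      return HasFP16 && Query.Types[0] == s16;
    })
    .minScalar(0, HasFP16 ? s16 : s32);
```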

Also, we currently import a pattern for materializing a 16-bit 0.0.
Add a testcase showing we select it.

(All other 16-bit G_FCONSTANTs are not yet selected.)

Differential Revision: https://reviews.llvm.org/D89164
2020-11-11 13:25:11 -08:00
Craig Topper
74aa09eae4 [RISCV] Remove traces of Glue from RISCVISD::SELECT_CC
We were creating RISCVISD::SELECT_CC nodes with Glue output that was never being used, and the tablegen SDNode had the SDNPInGlue flag instead of the SDNPOutGlue flag.

Since we don't seem to need the Glue just get rid of it from both places.

Differential Revision: https://reviews.llvm.org/D91199
2020-11-11 09:30:48 -08:00
Jessica Paquette
41541293b2 [AArch64][GlobalISel] Select arith extended add/sub in manual selection code
The manual selection code for add/sub was not checking if it was possible to
fold in shifts + extends (the *rx opcode variants).

As a result, we could never select things like

```
cmp x1, w0, uxtw #2
```

because we don't import any patterns for compares.

This adds support for the arithmetic shifted register forms and updates tests
for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`.

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91207
2020-11-11 09:26:03 -08:00
Jessica Paquette
ea9b12d4bc [AArch64][GlobalISel] Select negative arithmetic immediates in manual selector
Previously, we only handled negative arithmetic immediates in the imported
selector code.

Since we don't import code for, say, compares, we were missing opportunities
for things like

```
%cst:gpr(s64) = G_CONSTANT i64 -10
%cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst
->
%adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv
%cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv
```

Instead, we would have to materialize the constant and emit a SUBS.
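Conceptually the selector now does something like this (hypothetical names; a sketch rather than the actual emit helpers):

```
// Hypothetical sketch: a negative RHS constant that fits the arithmetic
// immediate encoding when negated lets us flip ADDS <-> SUBS.
if (auto Cst = getIConstantVRegVal(RHS, MRI)) {
  int64_t C = Cst->getSExtValue();
  if (C < 0 && isLegalArithImmed(-C))   // isLegalArithImmed: assumed helper
    return emitADDS(Dst, LHS, -C, MIB); // illustrative signature
}
```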

This adds support for selection like above for SUB, SUBS, ADD, and ADDS.

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91108
2020-11-11 09:20:05 -08:00
Jay Foad
0665c5dea6 [AMDGPU] Separate out real exp instructions by subtarget. NFC.
Differential Revision: https://reviews.llvm.org/D91247
2020-11-11 17:13:40 +00:00
Jay Foad
dfb3b2257d [AMDGPU] Split exp instructions out into their own tablegen file. NFC.
Differential Revision: https://reviews.llvm.org/D91246
2020-11-11 17:13:40 +00:00
Jay Foad
8325244191 [AMDGPU] Make use of SIInstrInfo::isEXP. NFC. 2020-11-11 17:01:20 +00:00
Jay Foad
6dcb8b5cd3 Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access""
This reverts commit 8b08fa0103c8d8e624b19fad5a5006e7a783ecb7.

The underlying problems were fixed by D90607.
2020-11-11 14:40:14 +00:00
Caroline Concatto
bc83bf6558 [AArch64]Add memory op cost model for SVE
This patch adds/fixes the memory op cost model for SVE with fixed-width
vectors.

Differential Revision: https://reviews.llvm.org/D90950
2020-11-11 12:49:19 +00:00
Simon Pilgrim
a67d0946d3 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them.
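The helper's semantics are small enough to sketch exactly; note that `operator&` on KnownBits computes the known bits of a bitwise AND, which is not the same thing:

```
// Minimal sketch of what commonBits computes: a bit is known only if it is
// known, with the same value, in both inputs.
llvm::KnownBits commonBitsSketch(const llvm::KnownBits &A,
                                 const llvm::KnownBits &B) {
  llvm::KnownBits R(A.getBitWidth());
  R.Zero = A.Zero & B.Zero; // known zero in both -> known zero
  R.One = A.One & B.One;    // known one in both  -> known one
  return R;
}
```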
2020-11-11 12:15:54 +00:00
Kerry McLaughlin
859b066624 [SVE][CodeGen] Lower scalable masked scatters
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)

Changes included in this patch:
 - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
    Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
 - Added the getCanonicalIndexType function to convert redundant addressing
   modes (e.g. scaling is redundant when accessing bytes)
 - Tests with 32 & 64-bit scaled & unscaled offsets

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90941
2020-11-11 11:50:22 +00:00
Kerry McLaughlin
e6fbb8fc55 [SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).

This is the first in a series of patches which add support for scalable masked scatters.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90939
2020-11-11 10:58:24 +00:00
Sam Parker
d182f68a32 [NFC][ARM] Replace lambda with any_of 2020-11-11 10:02:55 +00:00
Amara Emerson
8ba538f8a2 [AArch64][GlobalISel] Port some AArch64 target specific MUL combines from SDAG.
These do things like turn a multiply by a pow-2+1 constant into a shift and an add,
which is a common pattern that pops up, and is universally better than expensive
madd instructions with a constant.
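A worked example of the pow-2+1 case: multiplying by 9 becomes a shift plus an add, which AArch64 can fold into a single shifted-operand ADD:

```
#include <cstdint>

// x * 9, with 9 == (1 << 3) + 1, becomes (x << 3) + x. AArch64 can emit
// "add w0, w0, w0, lsl #3" instead of materializing 9 and using mul/madd.
uint32_t mulByNine(uint32_t X) { return (X << 3) + X; }
```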

I've added check lines to an existing codegen test since the code being ported
is almost identical. However, the mul-by-negative-pow2-constant tests don't
generate the same code because we're still missing some generic G_MUL combines.

Differential Revision: https://reviews.llvm.org/D91125
2020-11-10 22:21:13 -08:00
Gaurav Jain
c4730a384c [NFC] Use [MC]Register for x86 target
Differential Revision: https://reviews.llvm.org/D91161
2020-11-10 15:49:39 -08:00
Kazushi (Jam) Marukawa
51f907845b [VE] Implement FoldImmediate
Implement FoldImmediate for integer arithmetic operations only.
Also add regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91150
2020-11-11 08:08:32 +09:00
Pirama Arumuga Nainar
8ada0737df [ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt.
Previously we used setRegClass to rgpr, which may expand the register
domain if the result was already in a constrained class (tcgpr in the
above PR).
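The difference between the two calls, as an illustrative sketch (the surrounding control flow is invented):

```
// Illustrative: constrainRegClass returns null rather than widening an
// already-constrained class (such as tcgpr from the PR), so the fold can
// bail out; setRegClass would have overwritten the class unconditionally.
if (!MRI->constrainRegClass(Reg, &ARM::rGPRRegClass))
  return false;
```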

Differential Revision: https://reviews.llvm.org/D91192
2020-11-10 13:38:11 -08:00
Stanislav Mekhanoshin
a006cc60e7 [AMDGPU] Set default op_sel_hi on accvgpr read/write
These are opsel opcodes whose op_sel bits are actually ignored.
As such, op_sel_hi needs to be set to its default of 1 even though
these bits are ignored. This is a compatibility change.

Differential Revision: https://reviews.llvm.org/D91202
2020-11-10 13:07:29 -08:00