1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

242 Commits

Author SHA1 Message Date
Changpeng Fang
fc2b5c3fe3 AMDGPU/SI: Define S_GETREG Intrinsic
Summary:
 Define s_getreg intrinsic to generate s_getreg instruction to read
hardware registers.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17892

llvm-svn: 263124
2016-03-10 16:47:15 +00:00
Tom Stellard
d319e18a36 SelectionDAG: Fix a crash on inline asm when output register supports multiple types
Summary:
The code in SelectionDAG did not handle the case where the
register type and output types were different, but had the same size.

Reviewers: arsenm, echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17940

llvm-svn: 263022
2016-03-09 16:02:52 +00:00
Sam Kolton
96f1d9ee4d [AMDGPU] Assembler: Support DPP instructions.
Supprot DPP syntax as used in SP3 (except several operands syntax).
Added dpp-specific operands in td-files.
Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter.
Support for VOP2 DPP instructions in td-files.
Some tests for DPP instructions.

ToDo:
  - VOP2bInst:
    - vcc is considered as operand
    - AsmMatcher doesn't apply mnemonic aliases when parsing operands
  - v_mac_f32
  - v_nop
  - disable instructions with 64-bit operands
  - change dpp_ctrl assembler representation to conform sp3

Review: http://reviews.llvm.org/D17804
llvm-svn: 263008
2016-03-09 12:29:31 +00:00
Matt Arsenault
d89dec289d AMDGPU: Match more med3 integer patterns
llvm-svn: 262864
2016-03-07 21:54:48 +00:00
Matthias Braun
d7e6a2dcfd RegisterCoalescer: Remap subregister lanemasks before exchanging operands
Rematerializing and merging into a bigger register class at the same
time, requires the subregister range lanemasks getting remapped to the
new register class.

This fixes http://llvm.org/PR26805

llvm-svn: 262768
2016-03-05 04:36:13 +00:00
Tom Stellard
c656bfbad2 AMDGPU/SI: Add support for spiling SGPRs to scratch buffer
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

llvm-svn: 262732
2016-03-04 18:31:18 +00:00
Nikolay Haustov
3529c0cbe0 AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics
These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the
GL_ARB_shader_image_load_store extension.

Initial change by Nicolai H.hnle

Differential Revision: http://reviews.llvm.org/D17401

llvm-svn: 262701
2016-03-04 10:39:50 +00:00
Matt Arsenault
ceef9d2175 DAGCombiner: Make sure an integer is being truncated
llvm-svn: 262446
2016-03-02 01:36:51 +00:00
Matt Arsenault
cf419f3ccc DAGCombiner: Turn truncate of a bitcasted vector to an extract
On AMDGPU where operations i64 operations are often bitcasted to v2i32
and back, this pattern shows up regularly where it breaks some
expected combines on i64, such as load width reducing.

This fixes some test failures in a future commit when i64 loads
are changed to promote.

llvm-svn: 262397
2016-03-01 21:31:53 +00:00
Matt Arsenault
807567a0a9 DAGCombiner: Turn extract of bitcasted integer into truncate
This reduces the number of bitcast nodes and generally cleans up the
DAG when bitcasting between integers and vectors everywhere.

llvm-svn: 262358
2016-03-01 18:01:37 +00:00
Changpeng Fang
929a348e60 AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics
Summary:
  This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics,
which are new since VI.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17614

llvm-svn: 262356
2016-03-01 17:51:23 +00:00
Matt Arsenault
8238b5c22c AMDGPU: Set HasExtractBitInsn
This currently does not have the control over the bitwidth,
and there are missing optimizations to reduce the integer to
32-bit if it can be.

But in most situations we do want the sinking to occur.

llvm-svn: 262296
2016-03-01 04:58:17 +00:00
Matt Arsenault
cd69621a21 AMDGPU: More bits of frame index are known to be zero
The maximum private allocation for the whole GPU is 4G,
so the maximum possible index for a single workitem is the
maximum size divided by the smallest granularity for a dispatch.

This increases the number of known zero high bits, which
enables more offset folding. The maximum private size per
workitem with this is 128M but may be smaller still.

llvm-svn: 262153
2016-02-27 20:26:57 +00:00
Matt Arsenault
71c5d5fa5f DAGCombiner: Don't unnecessarily swap operands in ReassociateOps
In the case where op = add, y = base_ptr, and x = offset, this
transform:

(op y, (op x, c1)) -> (op (op x, y), c1)

breaks the canonical form of add by putting the base pointer in the
second operand and the offset in the first.

This fix is important for the R600 target, because for some address
spaces the base pointer and the offset are stored in separate register
classes. The old pattern caused the ISel code for matching addressing
modes to put the base pointer and offset in the wrong register classes,
which required no-trivial code transformations to fix.

llvm-svn: 262148
2016-02-27 19:57:45 +00:00
Matt Arsenault
a8505608a7 DAGCombiner: Relax sqrt NaN folding check
This is OK for +0 since compares to +/-0 give the same result.

llvm-svn: 262125
2016-02-27 09:38:05 +00:00
Matt Arsenault
7522feaa7e AMDGPU: Add s_sleep intrinsic
llvm-svn: 262120
2016-02-27 08:53:52 +00:00
Matt Arsenault
1bcd37150d AMDGPU: Implement readcyclecounter
This matches the behavior of the HSAIL clock instruction.
s_realmemtime is used if the subtarget supports it, and falls
back to s_memtime if not.

Also introduces new intrinsics for each of s_memtime / s_memrealtime.

llvm-svn: 262119
2016-02-27 08:53:46 +00:00
Nikolay Haustov
2aef2157de [AMDGPU] Assembler: Basic support for MIMG
Add parsing and printing of image operands. Matches legacy sp3 assembler.
Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last.
Update SITargetLowering for new order.
Add basic MC test.
Update CodeGen tests.

Review: http://reviews.llvm.org/D17574
llvm-svn: 261995
2016-02-26 09:51:05 +00:00
Matthias Braun
4033884b97 MachineCopyPropagation: Catch copies of the form A<-B;A<-B
Differential Revision: http://reviews.llvm.org/D17475

llvm-svn: 261966
2016-02-26 03:18:55 +00:00
Matt Arsenault
fe67a001f9 AMDGPU: Add failing testcase for register coalescer
llvm-svn: 261592
2016-02-22 23:45:42 +00:00
Matt Arsenault
c0dde45a41 AMDGPU: Fix alignments in test
I don't think this test was intending to test unaligned load/store.
Change it to use the natural alignment to avoid regressing.

Also adds missing SI checks.

llvm-svn: 261571
2016-02-22 21:04:23 +00:00
Matt Arsenault
34ccf25c19 AMDGPU/R600: Implement allowsMisalignedMemoryAccess
This avoids some test regressions in a future commit
when unaligned operations are expanded when they
have custom lowering.

llvm-svn: 261570
2016-02-22 21:04:16 +00:00
Tom Stellard
060bccc1f3 AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointer
Summary:
Instead of trying to replace SMRD instructions with a VGPR base pointer
with an equivalent MUBUF instruction, we now copy the base pointer to
SGPRs using v_readfirstlane.

This is safe to do, because any load selected as an SMRD instruction
has been proven to have a uniform base pointer, so each thread in the
wave will have the same pointer value in VGPRs.

This will fix some errors on VI from trying to replace SMRD instructions
with addr64-enabled MUBUF instructions that don't exist.

Reviewers: arsenm, cfang, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17305

llvm-svn: 261385
2016-02-20 00:37:25 +00:00
Tom Stellard
db5af50c55 AMDGPU/SI: Fix s_waitcnt insertion for flat instructions
Summary:
This was broken in r260694 which swapped the address and data operands
for flat store instructions.  The code in SIInsertWaits assumes
that the data operand always comes before the address operand, so
we need to add a special case for flat.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17366

llvm-svn: 261330
2016-02-19 15:33:13 +00:00
Nicolai Haehnle
9352856846 AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsics
Summary:
These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa
for the GL_ARB_shader_image_load_store extension.

IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has
a legacy name and pretends not to read memory.

Differential Revision: http://reviews.llvm.org/D17276

llvm-svn: 261224
2016-02-18 16:44:18 +00:00
Matt Arsenault
d964164e4a AMDGPU: Prepare for reducing private element size.
Tests for the new scalarize all private access options will be
included with a future commit.

The only functional change is to make the split/scalarize behavior
for private access of > 4 element vectors to be consistent
with the flat/global handling. This makes the spilling worse
in the two changed tests.

llvm-svn: 260804
2016-02-13 04:18:53 +00:00
Tom Stellard
04e6c06525 AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsic
This intrinsic will be used to expose dpp functionality to higher-level
languages. It will map to the dpp version of v_mov_b32.

llvm-svn: 260792
2016-02-13 02:09:49 +00:00
Matt Arsenault
c77e92f437 AMDGPU: Add intrinsics for sin/cos
These provide direct access to the hardware instruction without
the unit version required like llvm.sin/llvm.cos lowering requires.

llvm-svn: 260782
2016-02-13 01:19:56 +00:00
Matt Arsenault
4ff4c396c1 AMDGPU: Rename intrinsic to better match instruction name
Also fixes missing f32 test.

llvm-svn: 260780
2016-02-13 01:03:00 +00:00
Tom Stellard
9943755afb AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
Reviewers: arsenm

Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16603

llvm-svn: 260765
2016-02-12 23:45:29 +00:00
Tom Stellard
10d903c4f3 [AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD assembler
Historically, AMD internal sp3 assembler has flat_store* addr, data
format. To match existing code and to enable reuse, change LLVM
definitions to match.  Also update MC and CodeGen tests.

Differential Revision: http://reviews.llvm.org/D16927

Patch by: Nikolay Haustov

llvm-svn: 260694
2016-02-12 17:57:54 +00:00
Changpeng Fang
7cf99f3396 AMDGPU/SI: Annotate Loops with Constant Condition in SIAnnotateControlFlow pass.
Summary:
  It is possible that the loop condition can be a boolean constant (infinite loop,
for example). So we sould handle constant condition in annotating a loop. This
patch adds this functionality to support annotating constant condition.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D15093

llvm-svn: 260692
2016-02-12 17:11:04 +00:00
Matt Arsenault
37f2de7107 AMDGPU: Set flat_scratch from flat_scratch_init reg
This was hardcoded to the static private size, but this
would be missing the offset and additional size for someday
when we have dynamic sizing.

Also stops always initializing flat_scratch even when unused.

In the future we should stop emitting this unless flat instructions
are used to access private memory. For example this will initialize
it almost always on VI because flat is used for global access.

llvm-svn: 260658
2016-02-12 06:31:30 +00:00
Matt Arsenault
628b2818b6 AMDGPU: Set element_size in private resource descriptor
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.

llvm-svn: 260651
2016-02-12 02:40:47 +00:00
Nicolai Haehnle
498b8e6d32 AMDGPU: Quick fix for extreme slowness in spill-scavenge-offset.ll test
Summary: Also, some cosmetic fixes.

Reviewers: arsenm, tstellarAMD

Subscribers: qcolombet, llvm-commits

Differential Revision: http://reviews.llvm.org/D17161

llvm-svn: 260625
2016-02-12 00:05:34 +00:00
Tom Stellard
7b646abe2d AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs
Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or in the case of samplers,
floating-point calculations.  When this happens, we need to use
v_readfirstlane to copy these values back to sgprs.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17102

llvm-svn: 260599
2016-02-11 21:45:07 +00:00
Matt Arsenault
36dc1c179e AMDGPU: Fix constant bus use check with subregisters
If the two operands to an instruction were both
subregisters of the same super register, it would incorrectly
think this counted as the same constant bus use.

This fixes the verifier error in fmin_legacy.ll which
was missing -verify-machineinstrs.

llvm-svn: 260495
2016-02-11 06:15:39 +00:00
Matt Arsenault
e676b40286 AMDGPU: Remove some old intrinsic uses from tests
llvm-svn: 260493
2016-02-11 06:02:01 +00:00
Nicolai Haehnle
e72d316342 AMDGPU: Release the scavenged offset register during VGPR spill
Summary:
This fixes a crash where subsequent spills would be unable to scavenge
a register. In particular, it fixes a crash in piglit's
spec@glsl-1.50@execution@geometry@max-input-components (the test still
has a shader that fails to compile because of too many SGPR spills, but
at least it doesn't crash any more).

This is a candidate for the release branch.

Reviewers: arsenm, tstellarAMD

Subscribers: qcolombet, arsenm

Differential Revision: http://reviews.llvm.org/D16558

llvm-svn: 260427
2016-02-10 20:13:58 +00:00
Matt Arsenault
eed0ad4e3e AMDGPU: Remove bfi and bfm intrinsics
Nothing is using them.

llvm-svn: 260123
2016-02-08 19:06:01 +00:00
Matt Arsenault
34d57039a9 SelectionDAG: Lower some range metadata to AssertZext
If a range has a lower bound of 0, add an AssertZext from the
nearest floor power of two.

This allows operations with some workitem intrinsics with known
maximum ranges to use fast 24-bit multiplies.

llvm-svn: 260109
2016-02-08 16:28:19 +00:00
Matt Arsenault
8009cfb6b5 AMDGPU: Account for LDS alignment
The current situation isn't great, because the amount of padding
requires is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.

Another problem is the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.

llvm-svn: 259912
2016-02-05 19:47:29 +00:00
Matt Arsenault
3c264fc8c0 AMDGPU: Preserve alignments on new created globals
Also switch to internal linkage, and include the name of the function in
the name.

llvm-svn: 259911
2016-02-05 19:47:23 +00:00
Jonas Paulsson
99afcac2ad [ScheduleDAGInstrs::buildSchedGraph()] Handling of memory dependecies rewritten.
Recommited, after some fixing with test cases.

Updated test cases:
test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
test/CodeGen/AArch64/tailcall_misched_graph.ll

Temporarily disabled test cases:
test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll
test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated)
test/CodeGen/PowerPC/vsx-fma-m.ll
test/CodeGen/PowerPC/vsx-fma-sp.ll

http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.

llvm-svn: 259673
2016-02-03 17:52:29 +00:00
Matt Arsenault
b7a70ed17f AMDGPU: Do not promote allocas with non-inbounds GEPs
If we can't assume the pointer value isn't within the bounds
of the object, it seems risky to try to replace the pointer
calculations.

llvm-svn: 259573
2016-02-02 21:16:12 +00:00
Matt Arsenault
1eab1a7019 AMDGPU: Handle promoting memmove
Also add missing tests for the others.

llvm-svn: 259558
2016-02-02 20:28:10 +00:00
Matt Arsenault
48d83980e8 AMDGPU: Skip promote alloca with no optimizations
llvm-svn: 259551
2016-02-02 19:32:42 +00:00
Matt Arsenault
201441fa82 AMDGPU: Whitelist handled intrinsics
We shouldn't crash on unhandled intrinsics.
Also simplify failure handling in loop.

llvm-svn: 259546
2016-02-02 19:18:53 +00:00
Matt Arsenault
aef62a4730 AMDGPU: Use inbounds when calculating workitem offset
When promoting allocas to LDS, we know we are indexing
into a specific area just created, and the calculation
will also never overflow.

Also emit some of the muls as nsw nuw, because instcombine
infers this already from the range metadata. I think
putting this on the other adds and muls might be OK too,
but I'm not 100% sure.

llvm-svn: 259545
2016-02-02 19:18:48 +00:00
Oliver Stannard
a96193f77c Refactor backend diagnostics for unsupported features
Re-commit of r258951 after fixing layering violation.

The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

llvm-svn: 259498
2016-02-02 13:52:43 +00:00