1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
Commit Graph

2628 Commits

Author SHA1 Message Date
Matt Arsenault
0d5ed42d84 AMDGPU: Use more custom insert/extract_vector_elt lowering
Apply to i8 vectors.

llvm-svn: 334044
2018-06-05 19:52:46 +00:00
David Blaikie
93054c1e87 Move Analysis/Utils/Local.h back to Transforms
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)

llvm-svn: 333954
2018-06-04 21:23:21 +00:00
Stanislav Mekhanoshin
4301179049 [AMDGPU] Small refactoring in the scheduler
After last changes some code can be simplified.

Differential Revision: https://reviews.llvm.org/D47661

llvm-svn: 333934
2018-06-04 17:57:40 +00:00
Stanislav Mekhanoshin
d0dbf4d085 [AMDGPU] Factored out common part of GCNRPTracker::reset()
Differential Revision: https://reviews.llvm.org/D47664

llvm-svn: 333931
2018-06-04 17:21:54 +00:00
Mark Searles
77bed3607d [AMDGPU][Waitcnt] Fix handling of flat instrs
On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.

Differential Revision: https://reviews.llvm.org/D46616

llvm-svn: 333926
2018-06-04 16:51:59 +00:00
Nicolai Haehnle
5a12112c99 AMDGPU: Make various NamedOperands upper case
Summary:
Avoid name clashes with the corresponding bit fields in the instruction
encoding.

Change-Id: Id1644e703e976e78f7af93788d9f44cb48c3251f

Reviewers: arsenm, rampitec, kzhuravl

Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47433

llvm-svn: 333905
2018-06-04 14:45:20 +00:00
Nicolai Haehnle
9a21ccd3ad TableGen: Streamline the semantics of NAME
Summary:
The new rules are straightforward. The main rules to keep in mind
are:

1. NAME is an implicit template argument of class and multiclass,
   and will be substituted by the name of the instantiating def/defm.

2. The name of a def/defm in a multiclass must contain a reference
   to NAME. If such a reference is not present, it is automatically
   prepended.

And for some additional subtleties, consider these:

3. defm with no name generates a unique name but has no special
   behavior otherwise.

4. def with no name generates an anonymous record, whose name is
   unique but undefined. In particular, the name won't contain a
   reference to NAME.

Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.

The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.

Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.

Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.

Change-Id: I694095231565b30f563e6fd0417b41ee01a12589

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47430

llvm-svn: 333900
2018-06-04 14:26:05 +00:00
Amaury Sechet
72c691d9ab Set ADDE/ADDC/SUBE/SUBC to expand by default
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.

Target that uses these opcodes are changed in order to ensure their behavior doesn't change.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D47422

llvm-svn: 333748
2018-06-01 13:21:33 +00:00
Tom Stellard
5c29ac423b AMDGPU/R600: Move intrinsics to IntrinsicsAMDGPU.td
Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47487

llvm-svn: 333720
2018-06-01 02:19:46 +00:00
Stanislav Mekhanoshin
1d773448c3 [AMDGPU] Construct memory clauses before RA
Memory clauses are formed into bundles in presence of xnack.
Their source operands are marked as early-clobber.

This allows to allocate distinct source and destination registers
within a clause and prevent breaking the clause with s_nop in the
hazard recognizer.

Clauses are undone before post-RA scheduler to allow some rescheduling,
which will not break the clause since artificial edges are created in
the dag to keep memory operations together. Yet this allows a better
ILP in some cases.

Differential Revision: https://reviews.llvm.org/D47511

llvm-svn: 333691
2018-05-31 20:13:51 +00:00
Roman Tereshin
8a0592aae0 [GlobalISel][AMDGPU] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call for AMDGPU
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333664
2018-05-31 16:16:48 +00:00
Stanislav Mekhanoshin
1a2c74c831 [AMDGPU] Track occupancy in MFI
Keep track of achieved occupancy in SIMachineFunctionInfo.
At the moment we have a lot of duplicated or even missed code to
query and maintain occupancy info. Record it in the MFI and
query in a single call. Interfaces:

- getOccupancy() - returns current recorded achieved occupancy.
- getMinAllowedOccupancy() - returns lesser of the achieved occupancy
and the lowest occupancy we are ready to tolerate. For example if
a kernel is memory bound we are ready to tolerate 4 waves.
- limitOccupancy() - record occupancy level if we have to lower it.
- increaseOccupancy() - record occupancy if scheduler managed to
increase the occupancy.

MFI takes care of integrating different checks affecting occupancy,
including LDS use and waves-per-eu attribute. Note that scheduler
starts with not yet known register pressure, so has to record either
limit or increase in occupancy after it is done. Later passes can
just query a resulting value.

New interface is used in the active scheduler and NFC wrt its work.
Changes are also made to experimental schedulers to use it and record
an occupancy after they are done. Before the change waves-per-eu was
ignored by experimental schedulers and tolerance window for memory
bound kernels was not used.

Differential Revision: https://reviews.llvm.org/D47509

llvm-svn: 333629
2018-05-31 05:36:04 +00:00
Jan Vesely
0ce62c1f6a AMDGPU/R600: Make sure functions are cacheline aligned
v2: use "ensureAlignment"
    make functions cache line aligned
Fixes GPU hangs since r333219:
"AMDGPU: Split R600 AsmPrinter code into its own class"

Differential Revision: https://reviews.llvm.org/D47516

llvm-svn: 333622
2018-05-31 04:08:08 +00:00
Tom Stellard
d68071370f AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47359

llvm-svn: 333605
2018-05-30 22:55:35 +00:00
Mark Searles
a5207411ca [AMDGPU][Waitcnt] Fix build error: unused variable 'SWaitInst'
https://reviews.llvm.org/rL333556 caused a buildbot failure.

See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio

/Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
    auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(),

The unused variable was for debugging purposes; removing that piece of code
to fix the build.

llvm-svn: 333559
2018-05-30 16:27:57 +00:00
Matt Arsenault
635c1ae9bc AMDGPU: Use better alignment for kernarg lowering
This was just emitting loads with the ABI alignment
for the raw type. The true alignment is often better,
especially when an illegal vector type was scalarized.
The better alignment allows using a scalar load
more often.

llvm-svn: 333558
2018-05-30 16:17:51 +00:00
Mark Searles
87f5640efb [AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks
In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence
for a loop. Previously, that kicked if greater than 2 passes over a loop, which
doesn't account for loop with many bottom blocks. So, increase the threshold to
(n+1), where n is the number of bottom blocks. This gives the pass an
opportunity to consider the contribution of each bottom block, to the overall
loop, before the forced convergence potentially kicks in.

Differential Revision: https://reviews.llvm.org/D47488

llvm-svn: 333556
2018-05-30 15:47:45 +00:00
Matt Arsenault
36d73b27ac AMDGPU: Fix typo in option description
llvm-svn: 333457
2018-05-29 19:35:46 +00:00
Matt Arsenault
18ea83d8c3 AMDGPU: Round up kernel argument allocation size
AFAIK the driver's allocation will actually have to round this
up anyway. It is useful to track the rounded up size, so that
the end of the kernel segment is known to be dereferencable so
a wider s_load_dword can be used for a short argument at the end
of the segment.

llvm-svn: 333456
2018-05-29 19:35:00 +00:00
Konstantin Zhuravlyov
56fe8b5762 AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as
it is set by CP

Differential Revision: https://reviews.llvm.org/D47392

llvm-svn: 333451
2018-05-29 19:09:13 +00:00
Matt Arsenault
27a95e7d2c AMDGPU: Pass function directly instead of MachineFunction
These functions just query the underlying IR function,
so pass it directly.

llvm-svn: 333442
2018-05-29 17:42:50 +00:00
Matt Arsenault
16caab0970 AMDGPU: Add nuw to add off of kernarg ptr
llvm-svn: 333441
2018-05-29 17:42:38 +00:00
Tom Stellard
ea6f6c40ea AMDGPU: Split R600 MCInst lowering into its own class
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47307

llvm-svn: 333439
2018-05-29 17:41:59 +00:00
Tim Renouf
583f69aaff [AMDGPU] Fixed build warning
Summary:
V2: Use cast instead of extra if.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47426

Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a
llvm-svn: 333397
2018-05-29 08:15:37 +00:00
Farhana Aleen
7eee05ce7a [AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by default.
Summary: Bug reported here https://bugs.freedesktop.org/show_bug.cgi?id=105464 found
         to be resolved by some other fixes.

Author: FarhanaAleen
llvm-svn: 333380
2018-05-28 18:15:11 +00:00
Tim Renouf
6e81c6c470 [AMDGPU] Fixed WWM bug in block otherwise entirely in WQM
Summary:
For a block with WQM on entry and exit and containing no exact mode
code, but containing some WWM code, the WQM pass forgot to process the
block at all and so did not insert code to enter and leave WWM.

This commit fixes that.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47027

Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6
llvm-svn: 333362
2018-05-27 17:26:11 +00:00
Mark Searles
38d5d10a14 [AMDGPU][Waitcnt] Remove obsolete waitcnt option
With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it.

Differential Revision: https://reviews.llvm.org/D47378

llvm-svn: 333303
2018-05-25 20:24:08 +00:00
Stanislav Mekhanoshin
9cfeb10772 [AMDGPU] Fixed test failure with AMDGPUPerfHint
We shall not keep iterator to a map while map is modified,
this leads to a broken map.

llvm-svn: 333298
2018-05-25 18:46:58 +00:00
Reid Kleckner
8697c9c142 Fix -Winconsistent-missing-overrides in AMDGPU code
llvm-svn: 333291
2018-05-25 17:46:24 +00:00
Stanislav Mekhanoshin
c29c4efaa7 [AMDGPU] Add perf hints to functions
This is adoption of HSAIL perfhint pass. Two types of hints are produced:

1. Function is memory bound.
2. Kernel can use wave limiter.

Currently these hints are used in the scheduler. If a function is suspected
to be memory bound we allow occupancy to decrease to 4 waves in the course
of scheduling.

Differential Revision: https://reviews.llvm.org/D46992

llvm-svn: 333289
2018-05-25 17:25:12 +00:00
Tim Renouf
6428f13232 [AMDGPU] Fixed incorrect break from loop
Summary:
Lower control flow did not correctly handle the case that a loop break
in if/else was on a condition that was not guaranteed to be masked by
exec. The first test kernel shows an example of this going wrong; after
exiting the loop, exec is all ones, even if it was not before the loop.

The fix is for lowering of if-break and else-break to insert an
S_AND_B64 to mask the break condition with exec. This commit also
includes the optimization of not inserting that S_AND_B64 if it is
obviously not needed because the break condition is the result of a
V_CMP in the same basic block.

V2: Addressed some review comments.
V3: Test fixes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44046

Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c
llvm-svn: 333258
2018-05-25 07:55:04 +00:00
Tom Stellard
eee57c60ef AMDGPU: Remove AMDGPUMCInstLower.h
Summary:
The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp,
so we don't need a header file.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47264

llvm-svn: 333254
2018-05-25 04:57:02 +00:00
Tom Stellard
094abf2283 AMDGPU: Split R600 AsmPrinter code into its own class
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47245

llvm-svn: 333219
2018-05-24 20:02:01 +00:00
Tom Stellard
4b5731b4b0 AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP
Summary:
We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp
is gone.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47181

llvm-svn: 333153
2018-05-24 05:28:34 +00:00
Matt Arsenault
b6457804d0 AMDGPU: Fix v2f16 fneg/fabs pattern
The integer operation convertion for some reason only happens
if the source is a bitcast from an integer, which happens to
always be the situation when the result is loaded. Add
an additional pattern for when the source operation is really
an FP operation.

llvm-svn: 333019
2018-05-22 20:13:34 +00:00
Tom Stellard
6955081e45 AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering
Summary: This is always false for R600.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47180

llvm-svn: 333016
2018-05-22 19:37:55 +00:00
Matt Arsenault
ee27f88e14 AMDGPU: Make v2i16/v2f16 legal on VI
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.

Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.

llvm-svn: 332953
2018-05-22 06:32:10 +00:00
Tom Stellard
6f27d8c6b3 AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction
and register defintions, which are huge so we only want to include
them where needed.

This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.

I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46272

llvm-svn: 332930
2018-05-22 02:03:23 +00:00
Peter Collingbourne
799b49b89d MC: Separate creating a generic object writer from creating a target object writer. NFCI.
With this we gain a little flexibility in how the generic object
writer is created.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47045

llvm-svn: 332868
2018-05-21 19:20:29 +00:00
Stanislav Mekhanoshin
7e50efd55e [AMDGPU] Add divergence analysis as a dependency for ISel
AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage
but does not list it in pass dependencies which may lead to
crash.

Differential Revision: https://reviews.llvm.org/D47151

llvm-svn: 332862
2018-05-21 18:18:52 +00:00
Peter Collingbourne
caacbd9d09 MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI.
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47035

llvm-svn: 332857
2018-05-21 17:57:19 +00:00
Tom Stellard
7c3584c4d4 AMDGPU/GlobalISel: Address post-commit review comments for r332379
MCRegisterInfo::getPhysRegSize() will be deprecated.

llvm-svn: 332856
2018-05-21 17:49:31 +00:00
Simon Pilgrim
01deae48b5 Fix MSVC unused variable warning. NFCI.
AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance.

llvm-svn: 332807
2018-05-19 12:46:02 +00:00
Matt Arsenault
1314df1d33 AMDGPU: Add pass to optimize reqd_work_group_size
Eliminate loads from the dispatch packet when they will have
a known value.

Also pattern match the code used by the library to handle partial
workgroup dispatches, which isn't necessary if reqd_work_group_size
is used.

llvm-svn: 332771
2018-05-18 21:35:00 +00:00
Peter Collingbourne
a2edc5eab3 Support: Simplify endian stream interface. NFCI.
Provide some free functions to reduce verbosity of endian-writing
a single value, and replace the endianness template parameter with
a field.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47032

llvm-svn: 332757
2018-05-18 19:46:24 +00:00
Konstantin Zhuravlyov
7a4589dcbc AMDGPU/NFC: Set symbol's type that is coming from an argument in
EmitAMDGPUSymbolType, instead of hard-coding it to STT_AMDGPU_HSA_KERNEL.

llvm-svn: 332753
2018-05-18 18:41:37 +00:00
Peter Collingbourne
7e80473026 MC: Change the streamer ctors to take an object writer instead of a stream. NFCI.
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47050

llvm-svn: 332749
2018-05-18 18:26:45 +00:00
Changpeng Fang
8a7b8795fb AMDGPU/SI: Don't promote alloca to vector for atomic load/store
Summary:
  Don't promote alloca to vector for atomic load/store

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D46085

llvm-svn: 332673
2018-05-17 21:49:44 +00:00
Changpeng Fang
50e5462dc9 AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops.
Summary:
  The current StructurizeCFG pass only works for CFG with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks
to one exit block for the Structurizer to work. However, infinite loop is another kind of special "exit", and if we don't handle it, the case of multiple exits will prevent the structurizer from working.

In this work, for each infinite loop, we add a dummy edge to the "return" block, and thus the AMDGPUUnifyDivergentExitNodes pass will work with infinite loops.
This will make CFG with infinite loops be structurized.

Reviewer:
  nhaehnle

Differential Revision:
  https://reviews.llvm.org/D46340

llvm-svn: 332625
2018-05-17 16:45:01 +00:00
Konstantin Zhuravlyov
deea84a15a AMDGPU : Recalculate SGPRs when trap handler is supported
Differential Revision: https://reviews.llvm.org/D29911

llvm-svn: 332523
2018-05-16 20:47:48 +00:00