1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00
Commit Graph

217 Commits

Author SHA1 Message Date
Matt Arsenault
d4bebc0dfd AMDGPU: Simplify isSchedulingBoundary
llvm-svn: 274953
2016-07-09 01:13:51 +00:00
Duncan P. N. Exon Smith
5493a067e1 AMDGPU: Remove implicit iterator conversions, NFC
Remove remaining implicit conversions from MachineInstrBundleIterator to
MachineInstr* from the AMDGPU backend.  In most cases, I made them less
attractive by preferring MachineInstr& or using a ranged-based for loop.

Once all the backends are fixed I'll make the operator explicit so that
this doesn't bitrot back.

llvm-svn: 274906
2016-07-08 19:16:05 +00:00
Matt Arsenault
2792ae2900 AMDGPU: Fix folding SGPRs into madak/madmk src0
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.

r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.

llvm-svn: 274564
2016-07-05 17:09:01 +00:00
Duncan P. N. Exon Smith
193410d6d7 CodeGen: Use MachineInstr& in TargetInstrInfo, NFC
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr.  This is a
general API improvement.

Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other.  Instead I've done everything as a block and just
updated what was necessary.

This is mostly mechanical fixes: adding and removing `*` and `&`
operators.  The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency.  Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy.  I couldn't run tests
for AVR since llc doesn't link with it turned on.

llvm-svn: 274189
2016-06-30 00:01:54 +00:00
Matt Arsenault
102857f009 AMDGPU: Remove unused function
llvm-svn: 274033
2016-06-28 16:59:49 +00:00
Matt Arsenault
8603948f83 AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

llvm-svn: 273652
2016-06-24 06:30:11 +00:00
Matt Arsenault
efcd632db2 AMDGPU: readlane/writelane do not read exec
llvm-svn: 273525
2016-06-23 01:26:16 +00:00
NAKAMURA Takumi
24b157d37a Reformat blank lines.
llvm-svn: 273131
2016-06-20 01:05:15 +00:00
NAKAMURA Takumi
01c29da292 Untabify.
llvm-svn: 273129
2016-06-20 00:37:41 +00:00
Changpeng Fang
a6cbf1d88c AMDGPU/SI: Propagate the Kill flag in storeRegToStackSlot and eliminateFrameIndex
Reviewers: arsenm, tstellarAMD

Differential Revision:  http://reviews.llvm.org/21438

llvm-svn: 272958
2016-06-16 21:20:47 +00:00
Tom Stellard
98822b77c0 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4 byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Re-commit this after fixing a bug where we were trying to use a
reference to a Triple object that had already been destroyed.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272705
2016-06-14 20:29:59 +00:00
Tom Stellard
de6d6f0171 Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables"
This reverts commit r272675.

llvm-svn: 272677
2016-06-14 15:16:35 +00:00
Tom Stellard
b56b2a98e2 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4 byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272675
2016-06-14 15:11:01 +00:00
Marek Olsak
f30857fef7 AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1

This makes one particular compute shader 8x faster.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21136

llvm-svn: 272556
2016-06-13 16:05:57 +00:00
Matt Arsenault
f138b4af09 AMDGPU: Fix post-RA verifier errors with trackLivenessAfterRegAlloc
The condition reg of the cndmask_b64 expansion can't be killed by
the first one, and the implicit super register implicit def is needed.

llvm-svn: 272554
2016-06-13 15:53:52 +00:00
Benjamin Kramer
e80783f62f Pass DebugLoc and SDLoc by const ref.
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.

llvm-svn: 272512
2016-06-12 15:39:02 +00:00
Matt Arsenault
93cec24ce4 AMDGPU: Add function for getting instruction size
llvm-svn: 271936
2016-06-06 20:10:33 +00:00
Matt Arsenault
bc59009ab9 AMDGPU: Handle flat in getMemOpBaseRegImmOfs
It can still report the base register, and the uses give up when it
fails.

llvm-svn: 271575
2016-06-02 20:05:20 +00:00
Matt Arsenault
dffa97a6fa AMDGPU: Fix incorrectly setting kill flag when copying register tuples
This fixes some verifier errors when trackLivenessAfterRegAlloc is
enabled.

llvm-svn: 271446
2016-06-02 00:04:30 +00:00
Matt Arsenault
06895e862f AMDGPU: Fix verifier error when spilling SGPRs
The current SGPR spilling test does not stress this
because it is using s_buffer_load instructions to
increase SGPR pressure and spill, but their output
operands have the same SReg_32_XM0 constraint. This fixes
an error when the SReg_32 output from most instructions
is spilled.

llvm-svn: 270301
2016-05-21 00:53:42 +00:00
Matt Arsenault
c34a7d2258 AMDGPU: Handle cbranch vccz/vccnz
llvm-svn: 270297
2016-05-21 00:29:40 +00:00
Matt Arsenault
5438a4669d AMDGPU: Implement ReverseBranchCondition
llvm-svn: 270296
2016-05-21 00:29:34 +00:00
Matt Arsenault
a197a65904 AMDGPU: Implement AnalyzeBranch
Original patch by Tom Stellard

llvm-svn: 270295
2016-05-21 00:29:27 +00:00
Matt Arsenault
4449ad7408 AMDGPU: Remove verifier check for scc live ins
We only really need this to be true for SIFixSGPRCopies.
I'm not sure there's any way this could happen before that point.

Fixes a case where MachineCSE could introduce a cross block
scc use.

llvm-svn: 269391
2016-05-13 04:15:48 +00:00
Tom Stellard
097fcbcf91 AMDGPU/SI: Fix bug in SIInstrInfo::insertWaitStates() uncovered by r268260
We can't use MI->getDebugLoc() when MI is an iterator that could be
MBB.end().

llvm-svn: 268265
2016-05-02 18:02:24 +00:00
Tom Stellard
51b37329c1 AMDGPU/SI: Enable the post-ra scheduler
Summary:
This includes a hazard recognizer implementation to replace some of
the hazard handling we had during frame index elimination.

Reviewers: arsenm

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18602

llvm-svn: 268143
2016-04-30 00:23:06 +00:00
Tom Stellard
33134ca52e AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructions
Summary:
These instructions can add an immediate offset to the address, like other
ds instructions.

Reviewers: arsenm

Subscribers: arsenm, scchan

Differential Revision: http://reviews.llvm.org/D19233

llvm-svn: 268043
2016-04-29 14:34:26 +00:00
Etienne Bergeron
37376d525a Fix incorrect redundant expression in target AMDGPU.
Summary:
The expression is detected as a redundant expression.
Turn out, this is probably a bug.

```
/home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression]
  if (isSMRD(*FirstLdSt) && isSMRD(*FirstLdSt)) {
```

Reviewers: rnk, tstellarAMD

Subscribers: arsenm, cfe-commits

Differential Revision: http://reviews.llvm.org/D19460

llvm-svn: 267415
2016-04-25 15:06:33 +00:00
Nicolai Haehnle
f54e57a212 AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.

Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.

This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19191

llvm-svn: 267102
2016-04-22 04:04:08 +00:00
Nicolai Haehnle
e4cebbe0dc AMDGPU: Guard VOPC instructions against incorrect commute
Summary:
The added testcase, which triggered this, was derived from a shader-db case
via bugpoint. A separate question is why scalar branching wasn't used.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19208

llvm-svn: 266825
2016-04-19 21:58:22 +00:00
Jun Bum Lim
ad7ab4cf46 [MachineScheduler]Add support for store clustering
Perform store clustering just like load clustering. This change add
StoreClusterMutation in machine-scheduler. To control StoreClusterMutation,
added enableClusterStores() in TargetInstrInfo.h. This is enabled only on
AArch64 for now.

This change also add support for unscaled stores which were not handled in
getMemOpBaseRegImmOfs().

llvm-svn: 266437
2016-04-15 14:58:38 +00:00
Matt Arsenault
2dfb6d03c5 AMDGPU: Run SIFoldOperands after PeepholeOptimizer
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.

shader-db stats:

Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)

Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)

Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)

llvm-svn: 266378
2016-04-14 21:58:24 +00:00
Tom Stellard
7c8ab79409 AMDGPU/SI: Fix spilling of 96-bit registers
Summary:
It seems like this was broken in r252327.  I thought we had test cases
for this, but it's really hard to tirgger spills of this exact register
size since they aren't used very much.

Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19021

llvm-svn: 266152
2016-04-12 23:57:30 +00:00
Tom Stellard
b255b03243 AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates
Summary: This makes it possible to insert nops at the end of blocks.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18549

llvm-svn: 265678
2016-04-07 14:47:07 +00:00
Nicolai Haehnle
69b2d0adeb AMDGPU: Add a shader calling convention
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.

Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Differential Revision: http://reviews.llvm.org/D18559

llvm-svn: 265589
2016-04-06 19:40:20 +00:00
Matthias Braun
8623ebcaee RegisterScavenger: Take a reference as enterBasicBlock() argument.
Make it obvious that the argument cannot be nullptr.
Remove an unnecessary nullptr check in initRegState.

llvm-svn: 265511
2016-04-06 02:47:09 +00:00
Tom Stellard
fe96362048 AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together.  The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.

This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.

shader-db stats:

2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18451

llvm-svn: 264589
2016-03-28 16:10:13 +00:00
Nicolai Haehnle
94ebbdf753 AMDGPU: Add SIWholeQuadMode pass
Summary:
Whole quad mode is already enabled for pixel shaders that compute
derivatives, but it must be suspended for instructions that cause a
shader to have side effects (i.e. stores and atomics).

This pass addresses the issue by storing the real (initial) live mask
in a register, masking EXEC before instructions that require exact
execution and (re-)enabling WQM where required.

This pass is run before register coalescing so that we can use
machine SSA for analysis.

The changes in this patch expose a problem with the second machine
scheduling pass: target independent instructions like COPY implicitly
use EXEC when they operate on VGPRs, but this fact is not encoded in
the MIR. This can lead to miscompilation because instructions are
moved past changes to EXEC.

This patch fixes the problem by adding use-implicit operands to
target independent instructions. Some general codegen passes are
relaxed to work with such implicit use operands.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18162

llvm-svn: 263982
2016-03-21 20:28:33 +00:00
Michel Danzer
fe529a11c9 AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormat
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 263626
2016-03-16 09:10:35 +00:00
Nikolay Haustov
941b85256d [AMDGPU] Assembler: change v_madmk operands to have same order as mad.
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.

Differential Revision: http://reviews.llvm.org/D17984

llvm-svn: 263212
2016-03-11 09:27:25 +00:00
Chad Rosier
689669fdb4 [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.
http://reviews.llvm.org/D17967

llvm-svn: 263021
2016-03-09 16:00:35 +00:00
Tom Stellard
c656bfbad2 AMDGPU/SI: Add support for spiling SGPRs to scratch buffer
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

llvm-svn: 262732
2016-03-04 18:31:18 +00:00
Matt Arsenault
7fe831ea78 AMDGPU: Simplify boolean conditional return statements
Patch by Richard Thomson

llvm-svn: 262536
2016-03-02 23:00:21 +00:00
Matt Arsenault
f18c3f7466 AMDGPU: Cleanup suggested in bug 23960
llvm-svn: 262456
2016-03-02 04:05:14 +00:00
Changpeng Fang
929a348e60 AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics
Summary:
  This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics,
which are new since VI.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17614

llvm-svn: 262356
2016-03-01 17:51:23 +00:00
Tom Stellard
060bccc1f3 AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointer
Summary:
Instead of trying to replace SMRD instructions with a VGPR base pointer
with an equivalent MUBUF instruction, we now copy the base pointer to
SGPRs using v_readfirstlane.

This is safe to do, because any load selected as an SMRD instruction
has been proven to have a uniform base pointer, so each thread in the
wave will have the same pointer value in VGPRs.

This will fix some errors on VI from trying to replace SMRD instructions
with addr64-enabled MUBUF instructions that don't exist.

Reviewers: arsenm, cfang, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17305

llvm-svn: 261385
2016-02-20 00:37:25 +00:00
Tom Stellard
5d2e8e7ab0 [AMDGPU] Rename $dst operand to $vdst for VOP instructions.
Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler.

Reviewers: vpykhtin, tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, nhaustov

Projects: #llvm-amdgpu-spb

Differential Revision: http://reviews.llvm.org/D16920

llvm-svn: 260986
2016-02-16 18:14:56 +00:00
Tom Stellard
9943755afb AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
Reviewers: arsenm

Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16603

llvm-svn: 260765
2016-02-12 23:45:29 +00:00
Matt Arsenault
628b2818b6 AMDGPU: Set element_size in private resource descriptor
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.

llvm-svn: 260651
2016-02-12 02:40:47 +00:00
Tom Stellard
7b646abe2d AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs
Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or in the case of samplers,
floating-point calculations.  When this happens, we need to use
v_readfirstlane to copy these values back to sgprs.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17102

llvm-svn: 260599
2016-02-11 21:45:07 +00:00
Tom Stellard
4c0cead548 AMDGPU/SI: When splitting SMRD instructions, add its users to VALU worklist
Summary:
When we split SMRD instructions into two MUBUFs we were adding the users
of the newly created MUBUFs to the VALU worklist.  However, the only
users these instructions had was the REG_SEQUENCE that was inserted
by splitSMRD when the original SMRD instruction was split.

We need to make sure to add the users of the original SMRD to the VALU
worklist before it is split.

I have a test case, but it requires one other bug fix, so it will be
added in a later commt.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17101

llvm-svn: 260588
2016-02-11 21:14:34 +00:00
Matt Arsenault
36dc1c179e AMDGPU: Fix constant bus use check with subregisters
If the two operands to an instruction were both
subregisters of the same super register, it would incorrectly
think this counted as the same constant bus use.

This fixes the verifier error in fmin_legacy.ll which
was missing -verify-machineinstrs.

llvm-svn: 260495
2016-02-11 06:15:39 +00:00
Tom Stellard
350a5c65b3 AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16862

llvm-svn: 259900
2016-02-05 18:44:57 +00:00
Tom Stellard
4b06827ba4 AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cpp
Summary:
Also delete all the stub functions that are identical to the
implementations in TargetInstrInfo.cpp.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16609

llvm-svn: 259054
2016-01-28 16:04:37 +00:00
Nicolai Haehnle
a2b45f7350 AMDGPU/SI: Add SI Machine Scheduler
Summary:
It is off by default, but can be used
with --misched=si

Patch by: Axel Davy

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D11885

llvm-svn: 257609
2016-01-13 16:10:10 +00:00
Nicolai Haehnle
8f90c2a416 AMDGPU/SI: Fold operands with sub-registers
Summary:
Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.

Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.

Some tests are updated, note that the fsub.ll test explicitly checks that
the move is elided.

With the IR generated by current Mesa, the changes are obviously relatively
minor:

7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)

Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)

Reviewers: tstellarAMD, arsenm, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15875

llvm-svn: 257074
2016-01-07 17:10:29 +00:00
Nicolai Haehnle
50527c616a AMDGPU/SI: use S_MOV_B64 for larger copies in copyPhysReg
Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15629

llvm-svn: 256073
2015-12-19 01:36:26 +00:00
Nicolai Haehnle
d44890b802 AMDGPU: fix overlapping copies in copyPhysReg
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.

Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)

Together with r255906, this fixes a VM fault in Unreal Elemental Demo.

While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15622

llvm-svn: 256072
2015-12-19 01:16:06 +00:00
Changpeng Fang
555ffaefab AMDGPU/SI: Test commit
Summary: This is just my first commit. Test!

    Reviewers: none

    Subscribers: none

    Differential Revision: none

llvm-svn: 256022
2015-12-18 20:04:28 +00:00
Changpeng Fang
5a4421c8c1 Revert "AMDGPU/SI: Test commit"
This reverts commit a493cb636e0152ad28210934a47c6c44b1437193.

llvm-svn: 256021
2015-12-18 20:04:26 +00:00
Changpeng Fang
aa7bdc66d8 AMDGPU/SI: Test commit
Summary: This is just my first commit. Test!

Reviewers: none

Subscribers: none

Differential Revision: none

llvm-svn: 256020
2015-12-18 19:57:41 +00:00
Nicolai Haehnle
5b0b47c0c0 AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex
Summary:
The method insertNOPs expected the number of wait states to be passed as
parameter, while eliminateFrameIndex passed the immediate argument for the
S_NOP, leading to an off-by-one error. Rename the method to make the
meaning of its parameter clearer. The number of 4 / 5 wait states (which
is what the method has always _tried_ to do according to the comment) is
correct according to the hardware docs.

I stumbled upon this while trying to track down the cause of
https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed,
this patch unfortunately does not fix that bug...

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15542

llvm-svn: 255906
2015-12-17 16:46:42 +00:00
Tom Stellard
d2a657e08d AMDGPU/SI: Emit constant arrays in the .text section
Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15257

llvm-svn: 255204
2015-12-10 02:13:01 +00:00
Matt Arsenault
fe33d5e59a AMDGPU: Optimize VOP2 operand legalization
Don't use commuteInstruction, and don't commute if
doing so will not improve legality. Skip the more
complex checks for literal operands and constant bus restrictions,
which are not a concern for VOP2 instructions because src1
does not accept SGPRs or constants and few implicitly
read vcc.

This gets called quite a few times and the
attempts at commuting are a significant fraction
of the time spent in SIFixSGPRCopies, so it's
somewhat worthwhile to optimize. With this patch and others
leading up to it, this reduces the compile time of SIFixSGPRCopies
on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.

llvm-svn: 254452
2015-12-01 19:57:17 +00:00
Matt Arsenault
08cdb00306 AMDGPU: Rework how private buffer passed for HSA
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed
and use them directly.

If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the progloue.

This also only selectively enables all of the input registers
which are really required instead of always enabling them.

llvm-svn: 254331
2015-11-30 21:16:03 +00:00
Matt Arsenault
d22762f29f AMDGPU: Rename enums to be consistent with HSA code object terminology
llvm-svn: 254330
2015-11-30 21:15:57 +00:00
Matt Arsenault
c200b4cb4e AMDGPU: Remove SIPrepareScratchRegs
It does not work because of emergency stack slots.
This pass was supposed to eliminate dummy registers for the
spill instructions, but the register scavenger can introduce
more during PrologEpilogInserter, so some would end up
left behind if they were needed.

The potential for spilling the scratch resource descriptor
and offset register makes doing something like this
overly complicated. Reserve registers to use for the resource
descriptor and use them directly in eliminateFrameIndex.

Also removes creating another scratch resource descriptor
when directly selecting scratch MUBUF instructions.

The choice of which registers are reserved is temporary.
For now it attempts to pick the next available registers
after the user and system SGPRs.

llvm-svn: 254329
2015-11-30 21:15:53 +00:00
Marek Olsak
d73d332555 AMDGPU/SI: select S_ABS_I32 when possible (v2)
v2: added more tests, moved the SALU->VALU conversion to a separate function

It looks like it's not possible to get subregisters in the S_ABS lowering
code, and I don't feel like guessing without testing what the correct code
would look like.

llvm-svn: 254095
2015-11-25 21:22:45 +00:00
Matt Arsenault
c445aba1ed AMDGPU: Create emergency stack slots during frame lowering
Test has a bogus verifier error which will be fixed by later commits.

llvm-svn: 252327
2015-11-06 18:17:45 +00:00
Matt Arsenault
9773b5b96d AMDGPU: Remove unused scratch resource operands
The SGPR spill pseudos don't actually use them.

llvm-svn: 252324
2015-11-06 18:07:53 +00:00
Matt Arsenault
a7afaaa71a AMDGPU: Fix hardcoded alignment of spill.
Instead of forcing 4 alignment when spilled, set register class
alignments.

llvm-svn: 252322
2015-11-06 17:54:47 +00:00
Matt Arsenault
56d60d3741 AMDGPU: Also track whether SGPRs were spilled
llvm-svn: 252145
2015-11-05 05:27:10 +00:00
Matt Arsenault
b6bed238b4 AMDGPU: Fix assert when legalizing atomic operands
The operand layout is slightly different for the atomic
opcodes from the usual MUBUF loads and stores.

This should only fix it on SI/CI. VI is still broken
because it still emits the addr64 replacement.

llvm-svn: 252140
2015-11-05 02:46:56 +00:00
Matt Arsenault
62d416ff43 AMDGPU: Make findUsedSGPR more readable
Add more comments etc.

llvm-svn: 251996
2015-11-03 22:30:15 +00:00
Matt Arsenault
d1baf0fb57 AMDGPU: Simplify VOP3 operand legalization.
This was checking for a variety of situations that should
never happen. This saves a tiny bit of compile time.

We should not be selecting instructions with invalid operands in the
first place. Most of the time for registers copys are inserted
to the correct operand register class.

For VOP3, since all operand types are supported and literal
constants never are, we just need to verify the constant bus
requirements (all immediates should be legal inline ones).

The only possibly tricky case to maybe worry about is if when
legalizing operands in moveToVALU with s_add_i32 and similar
instructions. If the original s_add_i32 had a literal constant
and we need to replace it with v_add_i32_e64 we would have an
unsupported literal operand.  However, I don't think we should worry
about that because SIFoldOperands should handle folding literal
constant operands into the SALU instructions based on the uses.
At SIFoldOperands time, the legality and profitability of
operand types is a bit different.

llvm-svn: 250951
2015-10-21 21:51:02 +00:00
Matt Arsenault
a710403d18 AMDGPU: Fix not checking implicit operands in verifyInstruction
When verifying constant bus restrictions, this wasn't catching
uses in implicit operands.

llvm-svn: 250948
2015-10-21 21:15:01 +00:00
Matt Arsenault
aa9e5394b5 AMDGPU: Add MachineInstr overloads for instruction format tests
llvm-svn: 250797
2015-10-20 04:35:43 +00:00
Matt Arsenault
28c28a361a AMDGPU: Use explicit register size indirect pseudos
This stops using an unknown reg class operand.

Currently build_vector selection has a broken looking check
where it tries to use a VGPR reg class and an SGPR one if it
sees an SGPR use.

With the source operand has an explicit VGPR class,
illegal copies will be inserted that SIFixSGPRCopies will take care
of normally later, which will allow removing the weird check
of build_vector users. Without this, when removed v_movrels_b32 would
still be emitted even though all of the values were only stored in
SGPRs.

llvm-svn: 249494
2015-10-07 00:42:51 +00:00
Matt Arsenault
93549f5707 AMDGPU/SI: Add verifier check for exec reads
Make sure we aren't accidentally not setting
these in the instruction definitions.

llvm-svn: 249170
2015-10-02 18:58:37 +00:00
Marek Olsak
074d497be0 AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is set
to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.

This is a candidate for stable llvm 3.7.

Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 248858
2015-09-29 23:37:32 +00:00
Matt Arsenault
9ca4652ae6 AMDGPU: Factor switch into separate function
llvm-svn: 248742
2015-09-28 20:54:57 +00:00
Matt Arsenault
0376c2dc85 AMDGPU: Fix splitting x16 SMRD loads
When used recursively, this would set the kill flag
on the intermediate step from first splitting
x16 to x8.

llvm-svn: 248741
2015-09-28 20:54:52 +00:00
Matt Arsenault
f3f42b5b21 AMDGPU: Fix moving SMRD loads with literal offsets on CI
llvm-svn: 248740
2015-09-28 20:54:46 +00:00
Matt Arsenault
28211ed700 AMDGPU: Fix splitting SMRD with large offset
The splitting of > 4 dword SMRD instructions
if using an offset in an SGPR instead of an immediate
was not setting the destination register,
resulting an an instruction missing an operand
which would assert later.

Test will be included in a following commit
which fixes a related issue.

llvm-svn: 248739
2015-09-28 20:54:42 +00:00
Andrew Kaylor
8d27e2d077 Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)

Differential Revision: http://reviews.llvm.org/D11370

llvm-svn: 248735
2015-09-28 20:33:22 +00:00
Matt Arsenault
04fdcb1cfc AMDGPU: Construct new buffer instruction when moving SMRD
It's easier to understand creating a full instruction
than the current situation where sometimes a new
instruction is created and sometimes it is awkwardly
mutated in place.

llvm-svn: 248627
2015-09-25 22:21:19 +00:00
Matt Arsenault
a22e195f0c AMDGPU: Re-justify workaround and fix worked around problem
When buffer resource descriptors were built, the upper two components
of the descriptor were first composed into a 64-bit register because
legalizeOperands assumed all operands had the same register class.
Fix that problem, but keep the workaround. I'm not sure anything
actually is actually emitting such a REG_SEQUENCE now.

If multiple resource descriptors are set up with different base
pointers, this is copied with a single s_mov_b64. We probably
should fix this better by recognizing a pair of s_mov_b32 later,
but for now delete the dead code.

llvm-svn: 248585
2015-09-25 17:08:42 +00:00
Matt Arsenault
fc8f81bb42 AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources
This avoids needting to re-legalize the new REG_SEQUENCE.

llvm-svn: 248584
2015-09-25 17:08:40 +00:00
Matt Arsenault
fecfb71096 AMDGPU: Return after instruction is processed.
llvm-svn: 248476
2015-09-24 07:51:28 +00:00
Matt Arsenault
8c0e36fede AMDGPU: Remove another unnecessary check from commuteInstruction
llvm-svn: 248475
2015-09-24 07:51:25 +00:00
Matt Arsenault
3b9edaf5a4 AMDGPU: Reduce number of copies emitted
Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.

This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.

llvm-svn: 248467
2015-09-24 07:16:37 +00:00
Matt Arsenault
3e7d50438f AMDGPU: Remove unnecessary check
If the instruction doesn't have enough operands, it
either shouldn't be marked as isCommutable or is malformed.

llvm-svn: 248242
2015-09-22 04:17:45 +00:00
Matt Arsenault
e27d2bced7 AMDGPU/SI: Fix more cases of losing exec operands
llvm-svn: 247230
2015-09-10 01:23:28 +00:00
Matt Arsenault
bcf29c51fa AMDGPU: Extract full 64-bit subregister and use subregs
Instead of extracting both 32-bit components from the 128-bit
register. This produces fewer copies and is easier for
the copy peephole optimizer to understand and see the actual uses
as extracts from a reg_sequence.

This avoids needing to handle subregister composing in the
PeepholeOptimizer's ValueTracker for this case.

llvm-svn: 247162
2015-09-09 17:03:29 +00:00
Matt Arsenault
bcaa9eed06 AMDGPU: Fix adding redundant implicit operands
These are already added during the MachineInstr construction,
so this was adding the implicit registers twice.

llvm-svn: 246525
2015-09-01 02:02:21 +00:00
Matt Arsenault
89301152c2 AMDGPU: Set mem operands for spill instructions
llvm-svn: 246357
2015-08-29 06:48:57 +00:00
Matt Arsenault
55f6e54b27 AMDGPU: Fix dropping mem operands when moving to VALU
Without a memory operand, mayLoad or mayStore instructions
are treated as hasUnorderedMemRef, which results in much worse
scheduling.

We really should have a verifier check that any
non-side effecting mayLoad or mayStore has a memory operand.
There are a few instructions (interp and images) which I'm
not sure what / where to add these.

llvm-svn: 246356
2015-08-29 06:48:46 +00:00
Matt Arsenault
ec9bd6fd7e AMDGPU: Delete dead code
There is no context where s_mov_b64 is emitted
and could potentially be moved to the VALU.
It is currently only emitted for materializing
immediates, which can't be dependent on vector sources.

The immediate splitting is already done when selecting
constants. I'm not sure what contexts if any the register
splitting would have been used before.

Also clean up using s_mov_b64 in place of v_mov_b64_pseudo,
although this isn't required and just skips the extra step
of eliminating the copy from the SReg_64.

llvm-svn: 246080
2015-08-26 20:48:08 +00:00
Matt Arsenault
951daff52a AMDGPU: Don't reprocess instructions when splitting i64 bcnt
llvm-svn: 246079
2015-08-26 20:48:04 +00:00
Matt Arsenault
a7c6405427 AMDGPU: Fix not moving users of s_bfe_i64 to VALU
This wouldn't propagate to users of the original BFE
and would hit a verifier error.

llvm-svn: 246078
2015-08-26 20:47:58 +00:00
Matt Arsenault
c3af5ff7c8 AMDGPU: Don't create intermediate SALU instructions
When splitting 64-bit operations, create the correct
VALU instructions immediately.

This was splitting things like s_or_b64 into the two
s_or_b32s and then pushing the new instructions
onto the worklist. There's no reason we need
to do this intermediate step.

llvm-svn: 246077
2015-08-26 20:47:50 +00:00
Benjamin Kramer
69a3fdb314 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Matt Arsenault
b0ba266970 AMDGPU/SI: Remove VCCReg
llvm-svn: 244380
2015-08-08 00:41:48 +00:00
Matt Arsenault
5da7c5df39 AMDGPU/SI: Remove EXECReg
For the same reasons as the other physical registers.

llvm-svn: 244062
2015-08-05 16:42:57 +00:00
Alex Lorenz
ba9313c06c AMDGPU/SI: Add implicit register operands in the correct order.
This commit fixes a bug in the class 'SIInstrInfo' where the implicit register
machine operands were added to a machine instruction in an incorrect order -
the implicit uses were added before the implicit defs.

I found this bug while working on moving the implicit register operand
verification code from the MIR parser to the machine verifier.

This commit also makes the method 'addImplicitDefUseOperands' in the machine
instruction class public so that it can be reused in the 'SIInstrInfo' class.

Reviewers: Matt Arsenault

Differential Revision: http://reviews.llvm.org/D11689

llvm-svn: 243799
2015-07-31 23:30:09 +00:00
Tom Stellard
99ac371609 AMDGPU/SI: Simplify moveSMRDToVALU()
Summary:
Replace the switch on instruction opcode with a switch on register size.
This way we don't need to update the switch statement when we add new
SMRD variants.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11601

llvm-svn: 243652
2015-07-30 16:20:42 +00:00
Tom Stellard
c3c2e45667 AMDGPU/SI: Remove isTriviallyReMaterializable() function from SIInstrInfo
Summary:
This function is never called.  isReallyTriviallyReMaterializable() is
the function that should be implemented instead.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11620

llvm-svn: 243651
2015-07-30 16:20:40 +00:00
Matt Arsenault
988dd980cf AMDGPU/SI: Fix read2 merging into a super register.
If the read2 produced was supposed to be writing into a
super register, it would use the wrong subregister indices.
Fix this by inserting copies, so we only ever write to a vreg_64.
Run the register coalescer again to clean this up, although this
isn't ideal and often does result in an extra move.

Also remove the assert that offset1 > offset0.

There isn't a real reason to not allow this other than a minor
convenience in the compiler, and it doesn't seem worth the effort
of avoiding it.

llvm-svn: 242174
2015-07-14 17:57:36 +00:00
Tom Stellard
24371a62d2 AMDGPU/SI: Select mad patterns to v_mac_f32
The two-address instruction pass will convert these back to v_mad_f32
if necessary.

Differential Revision: http://reviews.llvm.org/D11060

llvm-svn: 242038
2015-07-13 15:47:57 +00:00
Tom Stellard
8d7c9eb6f3 AMDGPU/SI: Fix crash on physical registers in SIInstrInfo::isOperandLegal()
No test case for this.  I ran into it while working on some improvements
to SIShrinkInstructions.cpp.

llvm-svn: 241816
2015-07-09 16:30:27 +00:00
Tom Stellard
a50ac5923b AMDPGU/SI: Use correct resource descriptors for VI on HSA
Summary: We need to set MTYPE = 2 for VI shaders when targeting the HSA runtime.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D10777

llvm-svn: 240841
2015-06-26 21:58:42 +00:00
Marek Olsak
6aff2cf35d AMDGPU: really don't commute REV opcodes if the target variant doesn't exist
If pseudoToMCOpcode failed, we would return the original opcode, so operands
would be swapped, but the instruction would remain the same.
It resulted in LSHLREV a, b ---> LSHLREV b, a.

This fixes Glamor text rendering and
piglit/arb_sample_shading-builtin-gl-sample-mask on VI.

This is a candidate for stable branches.

v2: the test was simplified by Tom Stellard
llvm-svn: 240824
2015-06-26 20:29:10 +00:00
Eric Christopher
0b2dfae3ba Fix "the the" in comments.
llvm-svn: 240112
2015-06-19 01:53:21 +00:00
Sanjoy Das
ce0590cf7a [TargetInstrInfo] Rename getLdStBaseRegImmOfs and implement for x86.
Summary:

TargetInstrInfo::getLdStBaseRegImmOfs to
TargetInstrInfo::getMemOpBaseRegImmOfs and implement for x86.  The
implementation only handles a few easy cases now and will be made more
sophisticated in the future.

This is NFCI: the only user of `getLdStBaseRegImmOfs` (now
`getmemOpBaseRegImmOfs`) is `LoadClusterMotion` and `LoadClusterMotion`
is disabled for x86.

Reviewers: reames, ab, MatzeB, atrick

Reviewed By: MatzeB, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10199

llvm-svn: 239741
2015-06-15 18:44:14 +00:00
Tom Stellard
3f1708598e R600 -> AMDGPU rename
llvm-svn: 239657
2015-06-13 03:28:10 +00:00
Tom Stellard
39f7e52397 Revert "AMDGPU: Add core backend files for R600/SI codegen v6"
This reverts commit 4ea70107c5e51230e9e60f0bf58a0f74aa4885ea.

llvm-svn: 160303
2012-07-16 18:19:53 +00:00
Tom Stellard
9f326179fc AMDGPU: Add core backend files for R600/SI codegen v6
llvm-svn: 160270
2012-07-16 14:17:08 +00:00