1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00
Commit Graph

129 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
ac301b1911 [AMDGPU] Merge M0 initializations
Merges equivalent initializations of M0 and hoists them into a common
dominator block. Technically the same code can be used with any
register, physical or virtual.

Differential Revision: https://reviews.llvm.org/D32279

llvm-svn: 301228
2017-04-24 19:37:54 +00:00
Krzysztof Parzyszek
ce1e95e40d Move size and alignment information of regclass to TargetRegisterInfo
1. RegisterClass::getSize() is split into two functions:
   - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
   - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
2. RegisterClass::getAlignment() is replaced by:
   - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;

This will allow making those values depend on subtarget features in the
future.

Differential Revision: https://reviews.llvm.org/D31783

llvm-svn: 301221
2017-04-24 18:55:33 +00:00
Matt Arsenault
8f5e36a0ef Fix typo
llvm-svn: 300597
2017-04-18 20:59:46 +00:00
Stanislav Mekhanoshin
f25d7baa1b [AMDGPU] added SIInstrInfo::getAddNoCarry() helper
Addressed rest of post submit comments from D31993.

Differential Revision: https://reviews.llvm.org/D32057

llvm-svn: 300288
2017-04-14 00:33:44 +00:00
Stanislav Mekhanoshin
14ca3863d3 Revert "Correct register pressure calculation in presence of subregs"
This reverts commit r296009. It broke one out of tree target and also
does not account for all partial lines added or removed when calculating
PressureDiff.

llvm-svn: 296182
2017-02-24 21:56:16 +00:00
Stanislav Mekhanoshin
cfe4b7cb52 Correct register pressure calculation in presence of subregs
If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.

Differential Revision: https://reviews.llvm.org/D29835

llvm-svn: 296009
2017-02-23 20:19:44 +00:00
Matt Arsenault
85a1bec778 AMDGPU: Don't use stack space for SGPR->VGPR spills
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
after.

I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.

The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.

llvm-svn: 295753
2017-02-21 19:12:08 +00:00
Matt Arsenault
a207e31c14 AMDGPU: Merge initial gfx9 support
llvm-svn: 295554
2017-02-18 18:29:53 +00:00
Stanislav Mekhanoshin
99ec2d0f0b [AMDGPU] Override PSet for M0
This change returns empty PSet list for M0 register. Otherwise its
PSet as defined by tablegen is SReg_32. This results in incorrect
register pressure calculation every time an instruction uses M0.
Such uses count as SReg_32 PSet and inadequately increase pressure
on SGPRs.

Differential Revision: https://reviews.llvm.org/D29798

llvm-svn: 294691
2017-02-10 02:07:58 +00:00
Stanislav Mekhanoshin
07d39af714 [AMDGPU] Implement register pressure callbacks
Implement getRegPressureLimit and getRegPressureSetLimit callbacks in
SIRegisterInfo.

This makes standard converge scheduler to behave almost the same as
GCNScheduler, sometime slightly better sometimes a bit worse.
In gerenal that is also possible to switch GCNScheduler to use these
callbacks instead of getMaxWaves(), which also makes GCNScheduler
slightly better on some tests and slightly worse on another. A big
win is behavior with converge scheduler.

Note, these are used not only by scheduling, but in places like
MachineLICM.

Differential Revision: https://reviews.llvm.org/D29700

llvm-svn: 294518
2017-02-08 21:22:03 +00:00
Konstantin Zhuravlyov
9fccb13d0b [AMDGPU] Move register related queries to subtarget class
Differential Revision: https://reviews.llvm.org/D29318

llvm-svn: 294440
2017-02-08 13:02:33 +00:00
Tom Stellard
bbf29e433b AMDGPU add support for spilling to a user sgpr pointed buffers
Summary:
This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1].

Patch By: Dave Airlie

Reviewers: nhaehnle, arsenm, tstellarAMD

Reviewed By: arsenm

Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25428

llvm-svn: 293000
2017-01-25 01:25:13 +00:00
Stanislav Mekhanoshin
3a97f30b01 [AMDGPU] Do not allow register coalescer to create big superregs
Limit register coalescer by not allowing it to artificially increase
size of registers beyond dword. Such super-registers are in fact
register sequences and not distinct HW registers.

With more super-regs we would need to allocate adjacent registers
and constraint regalloc more than needed. Moreover, our super
registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2,
VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates registers
allocation even more, resulting in excessive spilling.

Differential Revision: https://reviews.llvm.org/D28782

llvm-svn: 292413
2017-01-18 17:30:05 +00:00
Diana Picus
971b3bbda9 [CodeGen] Rename MachineInstrBuilder::addOperand. NFC
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.

See https://reviews.llvm.org/D28057 for the whole discussion.

Differential Revision: https://reviews.llvm.org/D28556

llvm-svn: 291891
2017-01-13 09:58:52 +00:00
Krzysztof Parzyszek
b6cc44c368 Extract LaneBitmask into a separate type
Specifically avoid implicit conversions from/to integral types to
avoid potential errors when changing the underlying type. For example,
a typical initialization of a "full" mask was "LaneMask = ~0u", which
would result in a value of 0x00000000FFFFFFFF if the type was extended
to uint64_t.

Differential Revision: https://reviews.llvm.org/D27454

llvm-svn: 289820
2016-12-15 14:36:06 +00:00
Matt Arsenault
c2c2a10170 AMDGPU: Fix handling of 16-bit immediates
Since 32-bit instructions with 32-bit input immediate behavior
are used to materialize 16-bit constants in 32-bit registers
for 16-bit instructions, determining the legality based
on the size is incorrect. Change operands to have the size
specified in the type.

Also adds a workaround for a disassembler bug that
produces an immediate MCOperand for an operand that
is supposed to be OPERAND_REGISTER.

The assembler appears to accept out of bounds immediates and
truncates them, but this seems to be an issue for 32-bit
already.

llvm-svn: 289306
2016-12-10 00:39:12 +00:00
Marek Olsak
191936bbf2 AMDGPU/SI: Don't reserve XNACK when it's disabled
Summary:
This frees 2 additional scalar registers.

These are results from all of my 3 patches combined:

  Polaris:
    Spilled SGPRs: 2231 -> 1517 (-32.00 %)

  Tonga:
    Spilled SGPRs: 3829 -> 2608 (-31.89 %)
    Spilled VGPRs: 100 -> 84 (-16.00 %)

  Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader
  limited to 64 VGPRs.

Reviewers: tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27151

llvm-svn: 289262
2016-12-09 19:49:54 +00:00
Marek Olsak
bb1829874b AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects
Summary: This frees 2 scalar registers.

Reviewers: tstellarAMD

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27150

llvm-svn: 289261
2016-12-09 19:49:48 +00:00
Marek Olsak
ed98e90e56 AMDGPU/SI: Allow using SGPRs 96-101 on VI
Summary:
There is no point in setting SGPRS=104, because VI allocates SGPRs
in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs
for general purposes.

Reviewers: tstellarAMD

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27149

llvm-svn: 289260
2016-12-09 19:49:40 +00:00
Stanislav Mekhanoshin
e735965677 [AMDGPU] Fix number of reserved SGPRs on CI to reflect flat scratch use
Differential Revision: https://reviews.llvm.org/D27225

llvm-svn: 289095
2016-12-08 20:07:23 +00:00
Nicolai Haehnle
2a19d502fb AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg
Summary:
Without the fix to isFrameOffsetLegal to consider the instruction's
immediate offset, the new test case hits the corresponding assertion in
resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a
different base register.

With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of
places because frame base registers are added where they're not needed.
This is addressed by properly implementing needsFrameBaseReg, which also
helps to avoid unnecessary zero frame indices in a bunch of other places.

Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test

Reviewers: arsenm, tstellarAMD

Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D27344

llvm-svn: 289048
2016-12-08 14:08:02 +00:00
Saleem Abdulrasool
3523c1f6d4 AMDGPU: remove a couple of unused variables
lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::spillSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const':
	lib/Target/AMDGPU/SIRegisterInfo.cpp:572:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable]
	   const TargetRegisterClass *SubRC = nullptr;
	                              ^
	lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::restoreSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const':
	lib/Target/AMDGPU/SIRegisterInfo.cpp:723:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable]
	   const TargetRegisterClass *SubRC = nullptr;
	                              ^

The variable was assigned to, but never used.  The functions called did not
mutate state.  Simplify the logic and remove the variable.  Identified by gcc
5.4.0.

llvm-svn: 288601
2016-12-03 22:25:21 +00:00
Matt Arsenault
f24fcd4ad7 AMDGPU: Use wider scalar spills for SGPR spilling
Since the spill is for the whole wave, these
don't have the swizzling problems that vector stores do
and a single 4-byte allocation is enough to spill a 64 element
register. This should reduce the number of spill instructions and
put all the spills for a register in the same cacheline.

This should save allocated private size, but for now it doesn't.
The extra slots are allocated for each component, but never used
because the frame layout is essentially finalized before frame
indices are replaced. For always using the scalar store path,
this should probably be moved into processFunctionBeforeFrameFinalized.

llvm-svn: 288445
2016-12-02 00:54:45 +00:00
Matt Arsenault
3c5076a42d AMDGPU: Materialize frame index before add
It isn't generally safe to fold the frame index
directly into the operand since it will possibly
not be an inline immediate after it is expanded.

This surprisingly seems to produce better code, since
the FI doesn't prevent folding other immediate operands.

llvm-svn: 288185
2016-11-29 19:20:48 +00:00
Marek Olsak
30b976334f AMDGPU/SI: Add back reverted SGPR spilling code, but disable it
suggested as a better solution by Matt

llvm-svn: 287942
2016-11-25 17:37:09 +00:00
Marek Olsak
9d8f0b805a Revert "AMDGPU: Implement SGPR spilling with scalar stores"
This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e.

llvm-svn: 287936
2016-11-25 16:03:34 +00:00
Marek Olsak
35ac58863e Revert "AMDGPU: Fix MMO when splitting spill"
This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1.

llvm-svn: 287935
2016-11-25 16:03:27 +00:00
Marek Olsak
663483a60f Revert "AMDGPU: Fix adding extra implicit def of register"
This reverts commit e834ce5976567575621901fb967b8018b9916d71.

llvm-svn: 287934
2016-11-25 16:03:22 +00:00
Marek Olsak
56e27f3dc2 Revert "AMDGPU: Fix not setting kill flag on temp reg when spilling"
This reverts commit 057bbbe4ae170247ba37f08f2e70ef185267d1bb.

llvm-svn: 287933
2016-11-25 16:03:19 +00:00
Marek Olsak
c530d56272 Revert "AMDGPU: Make m0 unallocatable"
This reverts commit 124ad83dae04514f943902446520c859adee0e96.

llvm-svn: 287932
2016-11-25 16:03:15 +00:00
Marek Olsak
2cef424fef Revert "AMDGPU: Remove m0 spilling code"
This reverts commit f18de36554eb22416f8ba58e094e0272523a4301.

llvm-svn: 287931
2016-11-25 16:03:06 +00:00
Marek Olsak
55154098e4 Revert "AMDGPU: Preserve m0 value when spilling"
This reverts commit a5a179ffd94fd4136df461ec76fb30f04afa87ce.

llvm-svn: 287930
2016-11-25 16:03:02 +00:00
Matt Arsenault
4a06c5b78a AMDGPU: Preserve m0 value when spilling
llvm-svn: 287844
2016-11-24 00:26:50 +00:00
Matt Arsenault
dac54cd124 TRI: Add hook to pass scavenger during frame elimination
The scavenger was not passed if requiresFrameIndexScavenging was
enabled. I need to be able to test for the availability of an
unallocatable register here, so I can't create a virtual register for
it.

It might be better to just always use the scavenger and stop
creating virtual registers.

llvm-svn: 287843
2016-11-24 00:26:47 +00:00
Matt Arsenault
eb4e4ccc03 AMDGPU: Remove m0 spilling code
Since m0 isn't allocatable it should never be spilled anymore.

llvm-svn: 287842
2016-11-24 00:26:44 +00:00
Matt Arsenault
9a257a9a17 AMDGPU: Make m0 unallocatable
m0 may need to be written for spill code, so
we don't want general code uses relying on the
value stored in it.

This introduces a few code quality regressions where copies
from m0 are not coalesced into copies of a copy of m0.

llvm-svn: 287841
2016-11-24 00:26:40 +00:00
Matt Arsenault
8316e1ce66 AMDGPU: Fix not setting kill flag on temp reg when spilling
llvm-svn: 287808
2016-11-23 21:00:12 +00:00
Matt Arsenault
4fa66c3653 AMDGPU: Fix adding extra implicit def of register
In the scalar case, there's no reason to add an additional
def of the same register.

llvm-svn: 287807
2016-11-23 21:00:10 +00:00
Matt Arsenault
dae776c6dc AMDGPU: Fix MMO when splitting spill
The size and offset were wrong. The size of the object was
being used for the size of the access, when here it is really
being split into 4-byte accesses. The underlying object size
is set in the MachinePointerInfo, which also didn't have the
offset set.

llvm-svn: 287806
2016-11-23 20:52:53 +00:00
Simon Pilgrim
a971e6cc43 Fix spelling mistakes in AMDGPU target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287333
2016-11-18 11:04:02 +00:00
Tom Stellard
22310389fc AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass
Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies with of registers with immediate values with v_mov/s_mov
   instructions.

The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.

This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23408

llvm-svn: 287131
2016-11-16 18:42:17 +00:00
Matt Arsenault
96a200f6e7 AMDGPU: Implement SGPR spilling with scalar stores
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.

This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.

The cache also needs to be flushed before wave termination
if a scalar store is used.

llvm-svn: 286766
2016-11-13 18:20:54 +00:00
Matt Arsenault
963aa9b692 AMDGPU: Try to fix (non-clang?) bot builds
llvm-svn: 286120
2016-11-07 16:52:50 +00:00
Matt Arsenault
c8bd3705e6 AMDGPU: Refactor copyPhysReg
Separate the subregister splitting logic to re-use later.

llvm-svn: 286118
2016-11-07 16:39:22 +00:00
Matt Arsenault
3085819547 AMDGPU: Stop creating unused virtual registers
These are only used in the spill to VMEM path. Move them to
the one use.

llvm-svn: 285756
2016-11-01 21:58:07 +00:00
Matt Arsenault
0536acc73b AMDGPU: Fix using incorrect private resource with no allocation
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.

llvm-svn: 285435
2016-10-28 19:43:31 +00:00
Matt Arsenault
f048150fc7 Reapply "AMDGPU: Don't use offen if it is 0"
This reverts r283003

llvm-svn: 285203
2016-10-26 15:08:16 +00:00
Nicolai Haehnle
1c18014661 AMDGPU: Fix use-after-frees
Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25312

llvm-svn: 284215
2016-10-14 09:03:04 +00:00
Matthias Braun
15228aeb1d AMDGPU: Do not re-use tmpreg in spill/restore lowering
The register scavenging code does not support multiple definitions of
the same vreg.

Differential Revision: https://reviews.llvm.org/D25220

llvm-svn: 283369
2016-10-05 20:02:51 +00:00
Matt Arsenault
6919c78505 AMDGPU: Factor SGPR spilling into separate functions
llvm-svn: 283175
2016-10-04 01:14:56 +00:00