mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00

1029 Commits

Author SHA1 Message Date
Mark Searles
2ff29f20d2 [AMDGPU] Require waitcnt before barrier for all targets; adjust tests.
Differential Revision: https://reviews.llvm.org/D33576

llvm-svn: 304217
2017-05-30 16:22:43 +00:00
Konstantin Zhuravlyov
d5fb323bee Resubmit r303859 with test fixed.
[AMDGPU] add intrinsic for s_getpc

Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

Patch by Tim Corringham

llvm-svn: 304031
2017-05-26 20:38:26 +00:00
Tom Stellard
69bcb42fb7 AMDGPU/GlobalISel: Mark 32-bit float constants as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33212

llvm-svn: 304003
2017-05-26 16:40:03 +00:00
Matthias Braun
120c5b7053 CodeGen: Rename DEBUG_TYPE to match passnames
Rename the DEBUG_TYPE to match the names of corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE instead of repeating the passname where possible.

llvm-svn: 303921
2017-05-25 21:26:32 +00:00
Nico Weber
3d129c51cd Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.
llvm-svn: 303902
2017-05-25 19:19:29 +00:00
Tim Corringham
ea6f7a370f [AMDGPU] add intrinsic for s_getpc
Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D32862

llvm-svn: 303859
2017-05-25 14:04:14 +00:00
Simon Pilgrim
04596189c2 [AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045)
This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045

Differential Revision: https://reviews.llvm.org/D33451

llvm-svn: 303691
2017-05-23 21:27:15 +00:00
Changpeng Fang
a2949e55f0 AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca
Summary:
  Promoting Alloca to Vector and Promoting Alloca to LDS are two independent ways of handling an Alloca and should not affect each other.
As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage
related checking and places it after the calling convention checking.

Reviewer:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33139

llvm-svn: 303684
2017-05-23 20:25:41 +00:00
Stanislav Mekhanoshin
6d04b87725 [AMDGPU] Combine and (srl) into shl (bfe)
Perform DAG combine:
and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb
Where nb is a number of trailing zeroes in mask.

It replaces two instructions with two others, and BFE is generally a more
expensive one. However, this is only done if we are selecting a byte
or word at an aligned boundary which results in a proper SDWA
operand pattern. It is only done if SDWA is supported.
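
The combine above can be checked numerically. The following is a minimal sketch, not the DAG combine itself: `bfe`, `trailing_zeros`, `and_srl` and `shl_bfe` are hypothetical helper names, values are modelled as 32-bit unsigned integers, and the field width is taken as the bit length of `mask >> nb`, which assumes the shifted mask is a contiguous run of ones.

```python
MASK32 = 0xFFFFFFFF


def bfe(x, offset, width):
    """Unsigned bitfield extract: `width` bits of x starting at bit `offset`."""
    return (x >> offset) & ((1 << width) - 1)


def trailing_zeros(mask):
    """Number of trailing zero bits in a non-zero mask."""
    return (mask & -mask).bit_length() - 1


def and_srl(x, c, mask):
    """Original pattern: and (srl x, c), mask."""
    return ((x & MASK32) >> c) & mask


def shl_bfe(x, c, mask):
    """Rewritten pattern: shl (bfe x, nb + c, width), nb.

    Assumption: mask >> nb is a contiguous run of ones, so its bit
    length is the width of the extracted field.
    """
    nb = trailing_zeros(mask)
    width = (mask >> nb).bit_length()
    return (bfe(x & MASK32, nb + c, width) << nb) & MASK32
```

For example, with `x = 0x12345678`, `c = 8` and `mask = 0xFF00`, both forms select the byte `0x34` and leave it at bit 8.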

TODO: improve SDWA pass to actually convert this pattern. It is not
done now because we have an immediate in the instruction, which has
to be moved into a VGPR.

Differential Revision: https://reviews.llvm.org/D33455

llvm-svn: 303681
2017-05-23 19:54:48 +00:00
Stanislav Mekhanoshin
683af00312 [AMDGPU] Convert shl (add) into add (shl)
shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1)
This allows folding a constant into an address in some cases, as
well as eliminating a second shift if the expression is used as
an address and the second shift is a result of a GEP.
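
A small sketch of why the rewrite is value-preserving, assuming 32-bit wrapping arithmetic. The function names are illustrative only and do not come from the patch.

```python
MASK32 = 0xFFFFFFFF


def shl_of_add(x, c2, c1):
    """Original form: shl (add x, c2), c1 in 32-bit arithmetic."""
    return (((x + c2) & MASK32) << c1) & MASK32


def add_of_shl(x, c2, c1):
    """Rewritten form: add (shl x, c1), (c2 << c1) in 32-bit arithmetic."""
    return (((x << c1) & MASK32) + ((c2 << c1) & MASK32)) & MASK32


def shl_of_or(x, c2, c1):
    """Original form with `or` in place of `add`."""
    return (((x | c2) & MASK32) << c1) & MASK32


def or_of_shl(x, c2, c1):
    """Rewritten form with `or` in place of `add`."""
    return ((x << c1) & MASK32) | ((c2 << c1) & MASK32)
```

Because a left shift distributes over both addition modulo 2^32 and bitwise or, the two forms agree even when the addition wraps, and `c2 << c1` is a constant that can be folded ahead of time.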

Differential Revision: https://reviews.llvm.org/D33432

llvm-svn: 303641
2017-05-23 15:59:58 +00:00
Stanislav Mekhanoshin
edf30ecc75 [AMDGPU] Narrow lshl from 64 to 32 bit if possible
Turn expensive 64 bit shift into 32 bit if shift does not overflow int:
shl (ext x) => zext (shl x)
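
As a rough illustration of the "does not overflow" condition, the sketch below compares the 64-bit and 32-bit shifts on a zero-extended value. The helper names are hypothetical, and `can_narrow` inspects a concrete value, whereas a compiler would have to prove the same property from what it knows about the operand's bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shl64_of_zext(x32, c):
    """shl (zext x), c computed in 64 bits."""
    return ((x32 & MASK32) << c) & MASK64


def zext_of_shl32(x32, c):
    """zext (shl x, c) with the shift computed in 32 bits."""
    return ((x32 & MASK32) << c) & MASK32


def can_narrow(x32, c):
    """True when the top c bits of x32 are zero, so nothing is shifted out of 32 bits."""
    return 0 <= c < 32 and ((x32 & MASK32) >> (32 - c)) == 0
```

When `can_narrow` holds the two results are identical. When it does not, the 32-bit shift drops bits that the 64-bit shift would have kept.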

Differential Revision: https://reviews.llvm.org/D33367

llvm-svn: 303569
2017-05-22 16:58:10 +00:00
Matthias Braun
d0b8ae859d Fix typo in test
llvm-svn: 303436
2017-05-19 17:25:20 +00:00
Matthias Braun
78b1b99fdb LiveIntervalAnalysis: Fix missing case in pruneSubRegValues()
pruneSubRegValues() needs to remove subregister ranges starting at
instructions that later get removed by eraseInstrs(). It failed to check
one case in which eraseInstrs() would remove an instruction.

Fixes http://llvm.org/PR32688

llvm-svn: 303396
2017-05-19 00:18:03 +00:00
Sam Kolton
6e32c9563b [AMDGPU] SDWA operands should not intersect with potential MIs
Summary:
There should be no intersection between SDWA operands and potential MIs. E.g.:
```
v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0
v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0
v_add_u32 v3, v4, v2
```
In that example it is possible that we would fold 2nd instruction into 3rd (v_add_u32_sdwa) and then try to fold 1st instruction into 2nd (that was already destroyed). So if SDWAOperand is also a potential MI then do not apply it.

Reviewers: vpykhtin, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D32804

llvm-svn: 303347
2017-05-18 12:12:03 +00:00
Matt Arsenault
ab4fb8ba2f AMDGPU: Start defining a calling convention
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values don't either.

llvm-svn: 303308
2017-05-17 21:56:25 +00:00
Matt Arsenault
99a25d4af1 AMDGPU: Remove old intrinsic uses
llvm-svn: 303305
2017-05-17 21:38:21 +00:00
Matt Arsenault
13c6f6f3f1 AMDGPU: Make better use of op_sel with high components
Handle more general swizzles.

llvm-svn: 303296
2017-05-17 20:30:58 +00:00
Matt Arsenault
0ece5089f7 AMDGPU: Try to use op_sel when selecting packed instructions
Avoids instructions to pack a vector when the source is really
a scalar being broadcast.

Also be smarter and look for per-component fneg.

Doesn't yet handle scalar from upper half of register
or other swizzles.

llvm-svn: 303291
2017-05-17 20:00:00 +00:00
Matt Arsenault
b41d61b11a AMDGPU: Fix min3/max3 combines for f16/i16
Fix missing instruction definitions for min3/max3.

llvm-svn: 303284
2017-05-17 19:25:06 +00:00
Nirav Dave
3633380341 Elide stores which are overwritten without being observed.
Summary:
In SelectionDAG, when a store is immediately chained to another store
to the same address, elide the first store as it has no observable
effects. This causes small improvements when dealing with intrinsics
lowered to stores.
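
The idea can be modelled on a flat list of stores. This is a simplified sketch, not the SelectionDAG code: the tuple layout and the name `elide_overwritten_stores` are assumptions, all stores are treated as having the same width, and "immediately chained" is modelled as adjacency in the list.

```python
def elide_overwritten_stores(stores):
    """Drop each non-volatile store that is directly followed by a store to the same address.

    `stores` is a list of (address, value, is_volatile) tuples in chain order.
    """
    kept = []
    for index, (address, value, is_volatile) in enumerate(stores):
        following = stores[index + 1] if index + 1 < len(stores) else None
        overwritten = following is not None and following[0] == address
        if overwritten and not is_volatile:
            # Nothing can observe this value before it is replaced.
            continue
        kept.append((address, value, is_volatile))
    return kept
```

Marking the first store volatile keeps it in the output, which is how the adjusted tests mentioned below avoid being optimized away.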

Test notes:

* Many testcases overwrite store addresses multiple times and needed
  minor changes, mainly making stores volatile to prevent the
  optimization from optimizing the test away.

* Many X86 test cases optimized out instructions associated with
  va_start.

* Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has
  dependencies to check and can probably be removed and potentially
  replaced with another test.

Reviewers: rnk, john.brawn

Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33206

llvm-svn: 303198
2017-05-16 19:43:56 +00:00
Dmitry Preobrazhensky
5a5f736ba9 [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64
See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D33123

llvm-svn: 303070
2017-05-15 14:28:23 +00:00
Changpeng Fang
c5587a9cbd AMDGPU/SI: Don't promote to vector if the load/store is volatile.
Summary:
  We should not change volatile loads/stores in promoting alloca to vector.

Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33107

llvm-svn: 302943
2017-05-12 20:31:12 +00:00
Tom Stellard
31c4d52ea9 AMDGPU: Add lit.local.cfg to disable global-isel tests when global-isel is disabled
This should fix bots broken by r302919.

llvm-svn: 302928
2017-05-12 17:59:30 +00:00
Tom Stellard
e65dcab676 AMDGPU/GlobalISel: Mark 32-bit integer constants as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33115

llvm-svn: 302919
2017-05-12 16:46:46 +00:00
Matt Arsenault
4f44d0e3e0 AMDGPU: Remove tfe bit from flat instruction definitions
We don't use it and it was removed in gfx9, and the encoding
bit repurposed.

Additionally actually using it requires changing the output register
class, which wasn't done anyway.

llvm-svn: 302814
2017-05-11 17:38:33 +00:00
Matt Arsenault
fd9981d4ab AMDGPU: Pull fneg out of extract_vector_elt
This allows folding source modifiers in more f16 cases.
Makes it easier to select per-component packed neg modifiers.

llvm-svn: 302813
2017-05-11 17:26:25 +00:00
Dmitry Preobrazhensky
299fc6910a [AMDGPU][MC] Corrected v_madak/madmk to avoid printing "_e32" in disassembler output
See bug 32927: https://bugs.llvm.org//show_bug.cgi?id=32927

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D32913

llvm-svn: 302648
2017-05-10 13:00:28 +00:00
Kannan Narayanan
e4592ed923 [AMDGPU] In the new waitcnt insertion pass, use getHeader
instead of getTopBlock to find the loop header.

Differential Revision: https://reviews.llvm.org/D32831

llvm-svn: 302290
2017-05-05 21:10:17 +00:00
Matthias Braun
46d58287e3 MIParser/MIRPrinter: Compute block successors if not explicitly specified
- MIParser: If the successor list is not specified successors will be
  added based on basic block operands in the block and possible
  fallthrough.

- MIRPrinter: Adds a new `simplify-mir` option, with that option set:
  Skip printing of block successor lists in cases where the
  parser is guaranteed to reconstruct it. This means we still print the
  list if some successor cannot be determined (happens for example for
  jump tables), if the successor order changes, or if the branch probabilities
  are unequal.
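
The reconstruction rule described for the parser can be sketched on a toy block list. The data layout and the name `infer_successors` are hypothetical; each block here is a `(name, branch_targets, can_fall_through)` tuple rather than a real machine basic block.

```python
def infer_successors(blocks):
    """Infer successor lists from branch targets plus possible fallthrough.

    `blocks` is an ordered list of (name, branch_targets, can_fall_through).
    Returns a dict mapping each block name to its inferred successors.
    """
    result = {}
    for index, (name, targets, can_fall_through) in enumerate(blocks):
        successors = []
        # Every block referenced as an operand becomes a successor once.
        for target in targets:
            if target not in successors:
                successors.append(target)
        # A block that can fall through also reaches the next block in layout order.
        if can_fall_through and index + 1 < len(blocks):
            next_name = blocks[index + 1][0]
            if next_name not in successors:
                successors.append(next_name)
        result[name] = successors
    return result
```

A printer can omit an explicit successor list only when a rule like this would reproduce it exactly, which is why jump tables, a changed successor order, and unequal branch probabilities are listed as exceptions.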

Differential Revision: https://reviews.llvm.org/D31262

llvm-svn: 302289
2017-05-05 21:09:30 +00:00
Konstantin Zhuravlyov
5ce6b9a574 AMDGPU/AMDHSA: Set COMPUTE_PGM_RSRC2:LDS_SIZE to 0
This field is populated by the CP

Differential Revision: https://reviews.llvm.org/D32619

llvm-svn: 302277
2017-05-05 20:13:55 +00:00
Marek Olsak
372ac26def AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D32645

llvm-svn: 302200
2017-05-04 22:25:20 +00:00
Chad Rosier
1f7103004e [DAGCombine] Transform (fadd A, (fmul B, -2.0)) -> (fsub A, (fadd B, B)).
Differential Revision: http://reviews.llvm.org/D32596
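
A quick numeric check of the rewrite, using Python floats (IEEE double precision) as a stand-in for the types the combine applies to. The function names are illustrative only.

```python
def fadd_fmul(a, b):
    """Original form: fadd A, (fmul B, -2.0)."""
    return a + (b * -2.0)


def fsub_fadd(a, b):
    """Rewritten form: fsub A, (fadd B, B)."""
    return a - (b + b)
```

Multiplying by 2 and adding a value to itself both produce the same exactly doubled result, so the two forms agree on finite inputs such as the ones below.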

llvm-svn: 302153
2017-05-04 14:14:44 +00:00
Matt Arsenault
a855907ec4 AMDGPU: Don't promote alloca to LDS for leaf functions
LDS use in leaf functions is not currently handled.

llvm-svn: 301958
2017-05-02 18:33:18 +00:00
Matt Arsenault
ca84f60073 AMDGPU: Make intrinsics speculatable
llvm-svn: 301937
2017-05-02 16:57:44 +00:00
Matt Arsenault
e7ea202b71 AMDGPU: Fix copies from physical registers in SIFixSGPRCopies
This would assert when there were multiple defs of
a physical register.

We just need to move all of the users of it.

llvm-svn: 301730
2017-04-29 01:26:34 +00:00
Marek Olsak
d1c320b134 AMDGPU: Add new amdgcn.init.exec intrinsics
v2: More tests, bug fixes, cosmetic changes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D31762

llvm-svn: 301677
2017-04-28 20:21:58 +00:00
Konstantin Zhuravlyov
568217ba62 AMDGPU: Fix ValueKind code object metadata for images
Differential Revision: https://reviews.llvm.org/D32504

llvm-svn: 301360
2017-04-25 20:38:26 +00:00
Matt Arsenault
f19e85c475 Revert "StructurizeCFG: Directly invert cmp instructions"
This reverts commit r300732. This breaks a few tests.
I think the problem is related to adding more uses of
the condition that don't yet exist at this point.

llvm-svn: 301242
2017-04-24 20:25:01 +00:00
Matt Arsenault
b7fb34ab2b AMDGPU: Select scratch mubuf offsets when pointer is a constant
In call sequence setups, there may not be a frame index base
and the pointer is a constant offset from the frame
pointer / scratch wave offset register.

llvm-svn: 301230
2017-04-24 19:40:59 +00:00
Stanislav Mekhanoshin
ac301b1911 [AMDGPU] Merge M0 initializations
Merges equivalent initializations of M0 and hoists them into a common
dominator block. Technically the same code can be used with any
register, physical or virtual.

Differential Revision: https://reviews.llvm.org/D32279

llvm-svn: 301228
2017-04-24 19:37:54 +00:00
Yaxun Liu
c5f291f5da CodeGen: Add a hook for getFenceOperandTy
Currently the operand type for ATOMIC_FENCE assumes value type of a pointer in address space 0.
This is fine for most targets. However for amdgcn target, the size of pointer in address space 0
depends on triple environment. For amdgiz environment, it is 64 bit but for other environments it is
32 bit. On the other hand, amdgcn target expects 32 bit fence operands independent of the target
triple environment. Therefore a hook is needed in target lowering for getting the fence operand type.

This patch has no effect on targets other than amdgcn.

Differential Revision: https://reviews.llvm.org/D32186

llvm-svn: 301215
2017-04-24 18:26:27 +00:00
Matt Arsenault
237f3db4bd AMDGPU: Move trap lowering to DAG
Fixes traps in any block besides the entry block,
and fixes depending on a live-in physical register
by using a virtual register copy.

Also happens to stop emitting a nop in the case
debug trap is not supported.

llvm-svn: 301206
2017-04-24 17:49:13 +00:00
Nicolai Haehnle
b70d32840d AMDGPU: Move v_readlane lane select from VGPR to SGPR
Summary:
Fix a compiler bug when the lane select happens to end up in a VGPR.

Clarify the semantics of the corresponding intrinsic to be that of
the corresponding GLSL: the lane select must be uniform across a
wave front, otherwise results are undefined.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D32343

llvm-svn: 301197
2017-04-24 17:17:36 +00:00
Nicolai Haehnle
6702dd6007 AMDGPU: Fix crash when scheduling non-memory SMRD instructions
Summary: Fixes piglit spec/arb_shader_clock/execution/*

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D32345

llvm-svn: 301191
2017-04-24 16:53:52 +00:00
David Blaikie
adc641f6a1 Fix test from polluting the source tree
(though this seems like a "does this not crash" test - which isn't very
good. Should be fixed)

llvm-svn: 301071
2017-04-22 07:53:40 +00:00
Konstantin Zhuravlyov
9a6a14283a AMDGPU: Temporarily disable packed inlinable literals (v2f16, v2i16)
Differential Revision: https://reviews.llvm.org/D32361

llvm-svn: 301028
2017-04-21 19:45:22 +00:00
Konstantin Zhuravlyov
c2d91f0d4e AMDGPU: Fix S_PACK_HH_B32_B16
- We really ought to zero out lower 16 bits
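
A sketch of the packing this note appears to refer to. It assumes, from the instruction name rather than from the patch, that the result takes its low half from the high 16 bits of the first source and its high half from the high 16 bits of the second source. The function name is illustrative.

```python
MASK32 = 0xFFFFFFFF


def s_pack_hh_b32_b16(src0, src1):
    """Pack the high halves of two 32-bit values into one 32-bit result.

    Assumed semantics: result[15:0] = src0[31:16], result[31:16] = src1[31:16].
    """
    low_half = (src0 & MASK32) >> 16
    # Clear the lower 16 bits of src1 so they cannot leak into the low half.
    high_half = src1 & 0xFFFF0000
    return high_half | low_half
```

Without the mask on `src1`, its stale low bits would be or-ed into the result: packing `0xAAAA1111` and `0xBBBB5555` would give `0xBBBBFFFF` instead of `0xBBBBAAAA`.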

Differential Revision: https://reviews.llvm.org/D32356

llvm-svn: 301026
2017-04-21 19:35:05 +00:00
Yaxun Liu
4552eabc60 [AMDGPU] Handle SI_MASKED_UNREACHABLE in instruction emitter
SI_MASKED_UNREACHABLE does not have machine instruction encoding.
It needs special handling in AMDGPUAsmPrinter::EmitInstruction like some
other pseudo instructions.

This patch fixes compilation failure of RadeonRays.

Differential Revision: https://reviews.llvm.org/D32364

llvm-svn: 301025
2017-04-21 19:32:02 +00:00
Konstantin Zhuravlyov
079d4d7f9f AMDGPU: Do not lower fast unsafe div for safe, f32, with fp32 denormals
Differential Revision: https://reviews.llvm.org/D32085

llvm-svn: 301023
2017-04-21 19:25:33 +00:00
Yaxun Liu
0c1bf45146 CodeGen: Let frame index value type match alloca addr space
Recently alloca address space has been added to data layout. Due to this
change, the pointer returned by alloca may have a different size than a pointer in
address space 0.

However, currently the value type of frame index is assumed to be of the
same size as pointer in address space 0.

This patch fixes that.

Most targets assume alloca returning pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.

AMDGCN target with amdgiz environment requires this change since it
assumes alloca returning pointer to addr space 5 and its size is 32,
which is different from the size of pointer in addr space 0 which is 64.

Differential Revision: https://reviews.llvm.org/D32021

llvm-svn: 300864
2017-04-20 18:15:34 +00:00