1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

16 Commits

Author SHA1 Message Date
Matt Arsenault
d7e3e9c393 AMDGPU: Select VOP3 form of sub
The VOP3 form should always be the preferred selection form to be
shrunk later.

The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 359899
2019-05-03 15:37:07 +00:00
Alexander Timofeev
11308327e5 [AMDGPU] Divergence driven instruction selection. Part 1.
Summary: This change is the first part of the AMDGPU target description
    change. The aim of it is the effective splitting the vector and scalar
    flows at the selection stage. Selection uses predicate functions based
    on the framework implemented earlier - https://reviews.llvm.org/D35267

    Differential revision: https://reviews.llvm.org/D52019

    Reviewers: rampitec

llvm-svn: 342719
2018-09-21 10:31:22 +00:00
Matt Arsenault
2b0231f519 AMDGPU: Add pass to lower kernel arguments to loads
This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitary types. Since
all kernel arguments are passed in memory, we just want the
raw types.

I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem alltogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.

I'm not sure the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.

Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed  on them as the equivalent !range
metadata is not valid on pointer  typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.

I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.

llvm-svn: 335650
2018-06-26 19:10:00 +00:00
Matt Arsenault
515e95accf AMDGPU: Use gfx9 carry-less add/sub instructions
llvm-svn: 319491
2017-11-30 22:51:26 +00:00
Matt Arsenault
e4ccb8ba03 AMDGPU: Replace i64 add/sub lowering
Use VOP3 add/addc like usual.

This has some tradeoffs. Inline immediates fold
a little better, but other constants are worse off.
SIShrinkInstructions could be made smarter to handle
these cases.

This allows us to avoid selecting scalar adds where we
need to track the carry in scc and replace its users.
This makes it easier to use the carryless VALU adds.

llvm-svn: 318340
2017-11-15 21:51:43 +00:00
Alexander Timofeev
71cf453d98 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307097
2017-07-04 17:32:00 +00:00
NAKAMURA Takumi
e6c7524092 Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"
It broke a testcase.

  Failing Tests (1):
      LLVM :: CodeGen/AMDGPU/alignbit-pat.ll

llvm-svn: 307054
2017-07-04 02:14:18 +00:00
Alexander Timofeev
ef2bf78d2f [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307026
2017-07-03 14:54:11 +00:00
Simon Pilgrim
04596189c2 [AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045)
This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045

Differential Revision: https://reviews.llvm.org/D33451

llvm-svn: 303691
2017-05-23 21:27:15 +00:00
Matt Arsenault
dd9ab77318 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Tom Stellard
fca8e2011d AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 286464
2016-11-10 16:02:37 +00:00
Tom Stellard
8e72cd5271 Revert "AMDGPU: Add VI i16 support"
This reverts commit r285939 and r285948.  These broke some conformance tests.

llvm-svn: 285995
2016-11-04 13:06:34 +00:00
Tom Stellard
1eb5b9fee5 AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 285939
2016-11-03 17:13:50 +00:00
Matt Arsenault
14dbe93675 AMDGPU: Improve load/store of illegal types.
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.

We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.

llvm-svn: 274394
2016-07-01 22:47:50 +00:00
Matt Arsenault
1adcdc6e96 AMDGPU: Add sdst operand to VOP2b instructions
The VOP3 encoding of these allows any SGPR pair for the i1
output, but this was forced before to always use vcc.
This doesn't yet try to use this, but does add the operand
to the definitions so the main change is adding vcc to the
output of the VOP2 encoding.

llvm-svn: 246358
2015-08-29 07:16:50 +00:00
Tom Stellard
3f1708598e R600 -> AMDGPU rename
llvm-svn: 239657
2015-06-13 03:28:10 +00:00