1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

180 Commits

Author SHA1 Message Date
Matt Arsenault
ef8518e8b1 AMDGPU: Allow some control flow intrinsics to be CSEd
These clean up some unnecessary or instructions in
cases with complex loops.

In the original testcase I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this only one is emitted or sometimes a copy.

llvm-svn: 281786
2016-09-16 22:11:18 +00:00
Matt Arsenault
9249507611 AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.

llvm-svn: 281488
2016-09-14 15:19:03 +00:00
Valery Pykhtin
6c39a922ef [AMDGPU] Refactor MUBUF/MTBUF instructions
Differential revision: https://reviews.llvm.org/D24295

llvm-svn: 281137
2016-09-10 13:09:16 +00:00
Matt Arsenault
e16ffe6fc0 AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndex
llvm-svn: 281128
2016-09-10 01:20:33 +00:00
Matt Arsenault
05929eb705 AMDGPU: Fix scheduling info for spill pseudos
These defaulted to Write32Bit. I don't think this actually matters
since these don't exist during scheduling.

llvm-svn: 281127
2016-09-10 01:20:28 +00:00
Matt Arsenault
33e593fd71 AMDGPU: Fix immediate folding logic when shrinking instructions
If the literal is being folded into src0, it doesn't matter
if it's an SGPR because it's being replaced with the literal.

Also fixes initially selecting 32-bit versions of some instructions
which also confused commuting.

llvm-svn: 281117
2016-09-09 23:32:53 +00:00
Sam Kolton
94d1ca9e9c AMDGPU] Assembler: better support for immediate literals in assembler.
Summary:
Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals.
E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least.

With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction).
Here are rules how we convert literals:
    - We parsed fp literal:
        - Instruction expects 64-bit operand:
            - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5)
                - then we do nothing this literal
            - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5)
                - report error
            - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant
                - If instruction expect fp operand type (f64)
                    - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5)
                        - If so then do nothing
                    - Else (e.g. v_fract_f64 v[0:1], 3.1415)
                        - report warning that low 32 bits will be set to zeroes and precision will be lost
                        - set low 32 bits of literal to zeroes
                - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5)
                    - report error as it is unclear how to encode this literal
        - Instruction expects 32-bit operand:
            - Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow
            - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5)
                - do nothing
                - Else report error
            - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0)
    - Parsed binary literal:
        - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35)
            - do nothing
        - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35)
            - report error
        - Else, literal is not-inlinable and we are not required to inline it
            - Are high 32 bit of literal zeroes or same as sign bit (32 bit)
                - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef)
            - Else
                - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0)

For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types:
'''
enum OperandType {
    OPERAND_REG_IMM32_INT,
    OPERAND_REG_IMM32_FP,
    OPERAND_REG_INLINE_C_INT,
    OPERAND_REG_INLINE_C_FP,
}
'''

This is not working yet:
    - Several tests are failing
    - Problems with predicate methods for inline immediates
    - LLVM generated assembler parts try to select e64 encoding before e32.
More changes are required for several AsmOperands.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, artem.tamazov

Differential Revision: https://reviews.llvm.org/D22922

llvm-svn: 281050
2016-09-09 14:44:04 +00:00
Valery Pykhtin
616e586d42 [AMDGPU] Refactor FLAT TD instructions
Differential revision: https://reviews.llvm.org/D24072

llvm-svn: 280655
2016-09-05 11:22:51 +00:00
Matt Arsenault
ecba57a1e7 AMDGPU: Set sizes of spill pseudos
llvm-svn: 280595
2016-09-03 17:25:44 +00:00
Nicolai Haehnle
472bdb08df AMDGPU: Fix an interaction between WQM and polygon stippling
Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.

The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.

Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23131

llvm-svn: 280589
2016-09-03 12:26:32 +00:00
Matt Arsenault
7f6114dde5 AMDGPU: Fix spilling of m0
readlane/writelane do not support using m0 as the output/input.
Constrain the register class of spill vregs to try to avoid this,
but also handle spilling of the physreg when necessary by inserting
an additional copy to a normal SGPR.

llvm-svn: 280584
2016-09-03 06:57:55 +00:00
Changpeng Fang
bd40598ced AMDGPU/SI: MIMG TD Refactoring.
Summary:
 Created a new td file MIMGInstructions.td which contains all definitions
of MIMG related instructions.

Reviewed by:
  kzhuravl, vpykhtin

Differential Revision:
  http://reviews.llvm.org/D24106

llvm-svn: 280385
2016-09-01 17:54:54 +00:00
Valery Pykhtin
73620e7ed6 [AMDGPU] Scalar Memory instructions TD refactoring
Differential revision: https://reviews.llvm.org/D23996

llvm-svn: 280349
2016-09-01 09:56:47 +00:00
Valery Pykhtin
7ff35e6fee [AMDGPU] Refactor SOP instructions TD files.
Differential revision: https://reviews.llvm.org/D23617

llvm-svn: 280101
2016-08-30 15:20:31 +00:00
Matt Arsenault
5c1096e49d AMDGPU: Remove unneeded implicit exec uses/defs
SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec.
SI_IF_BREAK and SI_ELSE_BREAK do not read it either.

llvm-svn: 279909
2016-08-27 03:00:51 +00:00
Matt Arsenault
d3892e17d5 AMDGPU: Select mulhi 24-bit instructions
llvm-svn: 279902
2016-08-27 01:32:27 +00:00
Matt Arsenault
af7cdf6a46 AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

llvm-svn: 279901
2016-08-27 01:00:37 +00:00
Matt Arsenault
9b698596fe AMDGPU: Fix sched type for branches
llvm-svn: 279900
2016-08-27 00:51:02 +00:00
Matt Arsenault
09098bc2b3 AMDGPU: Remove register operand from si_mask_branch
It isn't used for anything, and is also misleading since
it could be spilled at the end of the block, so it can't be relied
on. There ends up being a verifier error about using an undefined
register since the spill kills the register.

llvm-svn: 279899
2016-08-27 00:42:21 +00:00
Changpeng Fang
351c66ab91 AMDGCN/SI: Implement readlane/readfirstlane intrinsics
Summary:
  This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.

Reviewed by:
  arsenm and tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D22489

llvm-svn: 279660
2016-08-24 20:35:23 +00:00
Wei Ding
73862a15f7 AMDGPU : Add V_SAD_U32 instruction pattern.
Differential Revision: http://reviews.llvm.org/D23069

llvm-svn: 279629
2016-08-24 14:59:47 +00:00
Matt Arsenault
321978a22d AMDGPU: Split SILowerControlFlow into two pieces
Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464
2016-08-22 19:33:16 +00:00
Michael Kuperstein
c89e7c5616 [SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.

Differential Revision: https://reviews.llvm.org/D23597

llvm-svn: 279129
2016-08-18 20:08:15 +00:00
Wei Ding
ea4d7271dc AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type.
Differential Revision: http://reviews.llvm.org/D23689

llvm-svn: 279126
2016-08-18 19:51:14 +00:00
Valery Pykhtin
ef1d950dec [AMDGPU] add s_incperflevel/s_decperflevel intrinsics.
Differential revision: https://reviews.llvm.org/D23666

llvm-svn: 279106
2016-08-18 18:06:20 +00:00
Wei Ding
f99a9aad71 AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32
Differential Revision: http://reviews.llvm.org/D23336

llvm-svn: 278403
2016-08-11 20:34:48 +00:00
Wei Ding
afed1ab282 AMDGPU : Add LLVM intrinsics for SAD related instructions.
Differential Revision: http://reviews.llvm.org/D23133

llvm-svn: 278354
2016-08-11 16:33:53 +00:00
Changpeng Fang
179532ade9 AMDGPU/SI: Implement amdgcn image intrinsics with sampler
Summary:
  This patch define and implement amdgcn image intrinsics with sampler.

    1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty,
       and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name
       to have three suffixes to overload each of these three types;

    2. D128 as well as two other flags are implied in the three types, for example,
       if you use v8i32 as resource type, then r128 is 0!

    3. don't expose TFE flag, and other flags are exposed in the instruction order:
       unrm, glc, slc, lwe and da.

Differential Revision: http://reviews.llvm.org/D22838

Reviewed by:
  arsenm and tstellarAMD

llvm-svn: 278291
2016-08-10 21:15:30 +00:00
Matt Arsenault
cb12ba3447 AMDGPU: s_setpc_b64 should be an indirect branch
llvm-svn: 278278
2016-08-10 19:20:02 +00:00
Matt Arsenault
806c7ea5a9 AMDGPU: Set sizes on control flow pseudos
llvm-svn: 278276
2016-08-10 19:11:51 +00:00
Matt Arsenault
5267e59706 AMDGPU: Change insertion point of si_mask_branch
Insert before the skip branch if one is created.
This is a somewhat more natural placement relative
to the skip branches, and makes it possible to implement
analyzeBranch for skip blocks.

The test changes are mostly due to a quirk where
the block label is not emitted if there is a terminator
that is not also a branch.

llvm-svn: 278273
2016-08-10 19:11:42 +00:00
Nicolai Haehnle
1212f85a97 AMDGPU: Stay in WQM for non-intrinsic stores
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.

For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.

This is a candidate for the 3.9 release branch.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22675

llvm-svn: 277504
2016-08-02 19:31:14 +00:00
Valery Pykhtin
448b27ff70 [AMDGPU] refactor DS instruction definitions. NFC.
Differential revision: https://reviews.llvm.org/D22522

llvm-svn: 277344
2016-08-01 14:21:30 +00:00
Matt Arsenault
fc3cf0b620 AMDGPU: Set s_setpc_b64 as a terminator
llvm-svn: 277259
2016-07-30 01:40:34 +00:00
Wei Ding
ce5d57c9f9 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

llvm-svn: 276998
2016-07-28 16:42:13 +00:00
Nicolai Haehnle
ca079d3c9f AMDGPU: add execfix flag to SI_ELSE
Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic block)

The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction

s_and_b64 exec, exec, s[...]

which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:

s_or_savexec_b64 dst, src

s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow

s_xor_b64 exec, exec, dst

Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.

Finally: It also occured to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit

s_or_saveexec_b64 dst, src

...

s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst

and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions

s_and_b64 vcc, exec, vcc

before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.

In any case, this current patch could help in the short term with the whole
ExecModified business.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22846

llvm-svn: 276972
2016-07-28 11:39:24 +00:00
Matt Arsenault
3d9fa5a776 AMDGPU: Use implicit_def for selecting anyext
llvm-svn: 276819
2016-07-26 23:06:33 +00:00
Matt Arsenault
d60a4f902f AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

llvm-svn: 276764
2016-07-26 16:45:45 +00:00
Matt Arsenault
bd4ed9f420 AMDGPU: Delete more dead code
Remove dead code from r600 intrinsic removal.
Remove unset members, rename StackSize to be less ambiguous.

llvm-svn: 276436
2016-07-22 17:01:25 +00:00
Matt Arsenault
b7536203de AMDGPU: Fix i1 fp_to_int
R600's i1 fp_to_uint selected but was incorrect according to
what instcombine constant folds to.

llvm-svn: 276435
2016-07-22 17:01:21 +00:00
Matt Arsenault
86ee9f5013 AMDGPU: Only use legal inline immediates with kill pseudo
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.

These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.

llvm-svn: 275988
2016-07-19 16:27:56 +00:00
Matt Arsenault
93828aab9b AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934
2016-07-19 00:35:03 +00:00
Matt Arsenault
f3c657912f AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
llvm-svn: 275871
2016-07-18 18:35:05 +00:00
Matt Arsenault
924a52e452 AMDGPU/R600: Replace barrier intrinsics
llvm-svn: 275870
2016-07-18 18:34:59 +00:00
Matt Arsenault
23ec7caeb7 AMDGPU: Follow up to r275203
I meant to squash this into it.

llvm-svn: 275220
2016-07-12 21:41:32 +00:00
Wei Ding
3dc39df64c AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8
Differential Revision: http://reviews.llvm.org/D22239

llvm-svn: 275197
2016-07-12 18:02:14 +00:00
Nicolai Haehnle
67daedb5d4 AMDGPU: Unify MOVRELSOffset and MOVRELDOffset
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it mislead
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22217

llvm-svn: 275160
2016-07-12 08:12:16 +00:00
Matt Arsenault
de814992a0 AMDGPU: Cleanup pseudoinstructions
llvm-svn: 275133
2016-07-12 00:23:17 +00:00
Matt Arsenault
b860f0d916 AMDGPU: Fix missing scc def on control flow pseudos
These are all expanded to instructions that include an scc def.

llvm-svn: 275132
2016-07-12 00:08:14 +00:00
Matt Arsenault
64f6653f73 Revert "AMDGPU: Remove unused control flow intrinsic"
llvm-svn: 274978
2016-07-09 17:18:39 +00:00