1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 21:13:02 +02:00
Commit Graph

768 Commits

Author SHA1 Message Date
Matt Arsenault
babc0fbd04 R600/SI: Move all fabs / fneg handling to patterns
llvm-svn: 215749
2014-08-15 18:42:22 +00:00
Matt Arsenault
134b10c2db R600/SI: Use source modifiers for f64 fneg
llvm-svn: 215748
2014-08-15 18:42:18 +00:00
Matt Arsenault
4061bd037d R600/SI: Use source modifier for f64 fabs
llvm-svn: 215747
2014-08-15 18:42:15 +00:00
Matt Arsenault
ebb5a25a24 R600/SI: Fix offset folding in some cases with shifted pointers.
Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
is only done if the add has one use. If the resulting constant
add can be folded into an addressing mode, force this to happen
for the pointer operand.

This ends up happening a lot because of how LDS objects are allocated.
Since the globals are allocated next to each other, acessing the first
element of the second object is directly indexed by a shifted pointer.

llvm-svn: 215739
2014-08-15 17:49:05 +00:00
Matt Arsenault
014e16538b R600/SI: Add intrinsic for ldexp
llvm-svn: 215734
2014-08-15 17:30:25 +00:00
Matt Arsenault
85044b8440 R600/SI: Implement isLegalAddressingMode
The default assumes that a 16-bit signed offset is used.
LDS instruction use a 16-bit unsigned offset, so it wasn't
being used in some cases where it was assumed a negative offset
could be used.

More should be done here, but first isLegalAddressingMode needs
to gain an addressing mode argument. For now, copy most of the rest
of the default implementation with the immediate offset change.

llvm-svn: 215732
2014-08-15 17:17:07 +00:00
Matt Arsenault
16534528fc R600: Correctly set the src value offset for scalarized kernel args
This for some reason fixes v1i64 kernel arguments on pre-SI. This
currently breaks some other cases in the kernel-args.ll test for R600,
but I'm not particularly confident in the new output. VTX_READ_* are not
used for some of the scalarized cases, and the code reading from the
constant buffer doesn't make much sense to me.

llvm-svn: 215564
2014-08-13 18:14:11 +00:00
Hal Finkel
ac8c24afbf Fix classof for ISD::INTRINSIC_W_CHAIN and INTRINSIC_VOID
Unfortunately, our use of the SDNode class hierarchy for INTRINSIC_W_CHAIN and
INTRINSIC_VOID nodes is somewhat broken right now. These nodes sometimes are
used for memory intrinsics (those with MachineMemOperands), and sometimes not.
When not, the nodes are not created as instances of MemIntrinsicSDNode, but
rather created as some other subclass of SDNode using DAG::getNode. When they
are memory intrinsics, they are created using DAG::getMemIntrinsicNode as
instances of MemIntrinsicSDNode. MemIntrinsicSDNode is a subclass of
MemSDNode, but prior to r214452, we had a non-self-consistent setup whereby
MemIntrinsicSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would
return true but MemSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID
would return false. In r214452, MemSDNode::classof was changed to return true
for INTRINSIC_W_CHAIN and INTRINSIC_VOID, which is now self-consistent. The
problem is that neither the pre-r214452 logic and the post-r214452 logic are
really right. The truth is that not all INTRINSIC_W_CHAIN and INTRINSIC_VOID
nodes are instances of MemIntrinsicSDNode (or MemSDNode for that matter), and
the return value from classof needs to reflect that. This was broken before
r214452 (because MemIntrinsicSDNode::classof always returned true), and was
broken afterward (because MemSDNode::classof also always returned true), and
will now be correct.

The minimal solution is to grab one of the SubclassData bits (there is one left
for MemIntrinsicSDNode nodes) and use it to store whether or not a particular
INTRINSIC_W_CHAIN or INTRINSIC_VOID is really an instance of
MemIntrinsicSDNode or not. Doing this allows both MemIntrinsicSDNode::classof
and MemSDNode::classof to return the correct answer for the underlying object
for both the memory-intrinsic and non-memory-intrinsic cases.

This fixes the problem that r214452 created in the SelectionDAGDumper (thanks
to Matt Arsenault for pointing it out).

Because PowerPC does not implement getTgtMemIntrinsic, this change breaks
test/CodeGen/PowerPC/unal-altivec-wint.ll. I've XFAILed it for now, and will
fix it in a follow-up commit.

llvm-svn: 215511
2014-08-13 01:15:37 +00:00
Jan Vesely
c9798145af R600: Use optimized 24bit path in udivrem
v2: drop enum keyword
    use correct extension mode
    don't bother computing the sign in unsinged case

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215462
2014-08-12 17:31:20 +00:00
Jan Vesely
73bab311bb R600: Use i24 optimized path for SREM
v2: add tests
    rename LowerSDIV24 to LowerSDIVREM24
    handle the rem part in this function

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215460
2014-08-12 17:31:17 +00:00
Tom Stellard
12111b0735 R600/SI: Add a ComplexPattern for selecting MUBUF _OFFSET variant
This saves us from having to copy a 64-bit 0 value into VGPRs for
BUFFER_* instruction which only have a 12-bit immediate offset.

llvm-svn: 215399
2014-08-11 22:18:17 +00:00
Tom Stellard
553839dddc R600/SI: Add check for low 32 bits of encoding to mubuf tests
There are no variable values like registers encoded in the low 32 bits of MUBUF
instructions, so it is relatively easy to check these bits, and it will
help prevent us from introducing encoding bugs.

llvm-svn: 215397
2014-08-11 22:18:11 +00:00
Tom Stellard
c6b168f62f R600/SI: Clear lds bit on MUBUF instructions used for private stores
This bit was left uninitialized, which was causing some random failures
of piglit tests.

NOTE: This is a candidate for the 3.5 branch.
llvm-svn: 215396
2014-08-11 22:18:09 +00:00
Tom Stellard
12ad2df47b R600/SI: Fix broken test
llvm-svn: 215395
2014-08-11 22:18:05 +00:00
Tom Stellard
bcebb7fec6 R600/SI: Custom lower CONCAT_VECTORS
This will lower them using register copies rather than loads and stores
to the stack.

llvm-svn: 215270
2014-08-09 01:06:56 +00:00
Tom Stellard
b8bff2be44 R600/SI: Update concat_vectors.ll to check for scratch usage
These tests were using SI-NOT: MOVREL to make sure concat vectors
weren't being lowered to stack loads and stores, but we are using
scratch buffers for the stack now instead of registers, so we need
to add an additional SI-NOT check for scratch buffers.

With this change I was able to uncover one broken test which will
be fixed in a future commit.

llvm-svn: 215269
2014-08-09 01:06:53 +00:00
Matt Arsenault
feb78a527b R600: Cleanup fadd and fsub tests
llvm-svn: 214991
2014-08-06 20:27:55 +00:00
Matt Arsenault
3005d01057 R600: Increase nearby load scheduling threshold.
This partially fixes weird looking load scheduling
in memcpy test. The load clustering doesn't seem
particularly smart, but this method seems to be partially
deprecated so it might not be worth trying to fix.

llvm-svn: 214943
2014-08-06 00:29:49 +00:00
Matt Arsenault
68712599d9 R600/SI: Implement areLoadsFromSameBasePtr
This currently has a noticable effect on the kernel argument loads.
LDS and global loads are more problematic, I think because of how copies
are currently inserted to ensure that the address is a VGPR.

llvm-svn: 214942
2014-08-06 00:29:43 +00:00
Tom Stellard
e624513d2d R600/SI: Update MUBUF assembly string to match AMD proprietary compiler
llvm-svn: 214866
2014-08-05 14:48:12 +00:00
Tom Stellard
e589f0e5f7 R600/SI: Avoid generating REGISTER_LOAD instructions.
SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.

llvm-svn: 214865
2014-08-05 14:40:52 +00:00
Matt Arsenault
90c2681a28 R600/SI: Fix extra whitespace in asm str
This slipped in in r214467, so something like

V_MOV_B32_e32  v0, ... is now printed with 2 spaces
between the instruction name and first operand.

llvm-svn: 214660
2014-08-03 05:27:14 +00:00
Matt Arsenault
7ac48e4d02 R600: Cleanup fneg tests
llvm-svn: 214612
2014-08-02 02:26:51 +00:00
Tom Stellard
2e31693e97 Revert "R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp"
This reverts commit r214566.

I did not mean to commit this yet.

llvm-svn: 214572
2014-08-01 21:55:50 +00:00
Tom Stellard
150fd6c318 R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp
SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.

llvm-svn: 214566
2014-08-01 21:50:47 +00:00
Matt Arsenault
27877bcbc0 R600: Cleanup test
Remove -CHECKs, use multiple prefixes, name values,
also test the @llvm.fabs version

llvm-svn: 214525
2014-08-01 17:00:29 +00:00
Tom Stellard
827479f1ca R600/SI: Do abs/neg folding with ComplexPatterns
Abs/neg folding has moved out of foldOperands and into the instruction
selection phase using complex patterns.  As a consequence of this
change, we now prefer to select the 64-bit encoding for most
instructions and the modifier operands have been dropped from
integer VOP3 instructions.

llvm-svn: 214467
2014-08-01 00:32:39 +00:00
Tom Stellard
313fcff563 R600/SI: Fold immediates when shrinking instructions
This will prevent us from using extra MOV instructions once we prefer
selecting 64-bit instructions.

llvm-svn: 214464
2014-08-01 00:32:33 +00:00
Tom Stellard
3ac3ae86a9 R600/SI: Fix incorrect commute operation in shrink instructions pass
We were commuting the instruction by still shrinking it using the
original opcode.

NOTE: This is a candidate for the 3.5 branch.
llvm-svn: 214463
2014-08-01 00:32:28 +00:00
Jan Vesely
4439cb6485 R600: Modernize work item intrinsics test
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 214451
2014-07-31 22:11:03 +00:00
Matt Arsenault
0633beedfb R600: Modernize test
llvm-svn: 214108
2014-07-28 18:06:08 +00:00
Matt Arsenault
1c1d6d00fc R600/SI: Implement getOptimalMemOpType
The default guess uses i32. This needs an address space argument
to really do the right thing in all cases.

llvm-svn: 214104
2014-07-28 17:49:26 +00:00
Matt Arsenault
76c7b7a591 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.

llvm-svn: 214055
2014-07-27 17:46:40 +00:00
Matt Arsenault
257404a154 R600/SI: Fix broken test.
There was no check prefix for the instruction lines.
Match what is emitted though, although I'm pretty sure it is
incorrect.

llvm-svn: 214035
2014-07-26 21:21:42 +00:00
Chandler Carruth
581bf74660 [SDAG] When performing post-legalize DAG combining, run the legalizer
over each node in the worklist prior to combining.

This allows the combiner to produce new nodes which need to go back
through legalization. This is particularly useful when generating
operands to target specific nodes in a post-legalize DAG combine where
the operands are significantly easier to express as pre-legalized
operations. My immediate use case will be PSHUFB formation where we need
to build a constant shuffle mask with a build_vector node.

This also refactors the relevant functionality in the legalizer to
support this, and updates relevant tests. I've spoken to the R600 folks
and these changes look like improvements to them. The avx512 change
needs to be investigated, I suspect there is a disagreement between the
legalizer and the DAG combiner there, but it seems a minor issue so
leaving it to be re-evaluated after this patch.

Differential Revision: http://reviews.llvm.org/D4564

llvm-svn: 214020
2014-07-26 05:49:40 +00:00
Chandler Carruth
8182e043d9 [SDAG] Introduce a combined set to the DAG combiner which tracks nodes
which have successfully round-tripped through the combine phase, and use
this to ensure all operands to DAG nodes are visited by the combiner,
even if they are only added during the combine phase.

This is critical to have the combiner reach nodes that are *introduced*
during combining. Previously these would sometimes be visited and
sometimes not be visited based on whether they happened to end up on the
worklist or not. Now we always run them through the combiner.

This fixes quite a few bad codegen test cases lurking in the suite while
also being more principled. Among these, the TLS codegeneration is
particularly exciting for programs that have this in the critical path
like TSan-instrumented binaries (although I think they engineer to use
a different TLS that is faster anyways).

I've tried to check for compile-time regressions here by running llc
over a merged (but not LTO-ed) clang bitcode file and observed at most
a 3% slowdown in llc. Given that this is essentially a worst case (none
of opt or clang are running at this phase) I think this is tolerable.
The actual LTO case should be even less costly, and the cost in normal
compilation should be negligible.

With this combining logic, it is possible to re-legalize as we combine
which is necessary to implement PSHUFB formation on x86 as
a post-legalize DAG combine (my ultimate goal).

Differential Revision: http://reviews.llvm.org/D4638

llvm-svn: 213898
2014-07-24 22:15:28 +00:00
Matt Arsenault
cfdaa492e4 R600: Add FMA instructions for Evergreen
llvm-svn: 213882
2014-07-24 17:41:01 +00:00
Matt Arsenault
0a2084b376 R600: Match rcp node on pre-SI
llvm-svn: 213844
2014-07-24 06:59:24 +00:00
Matt Arsenault
34661ee593 R600: Fix LowerSDIV24
Use ComputeNumSignBits instead of checking for i8 / i16 which only
worked when AMDIL was lying about having legal i8 / i16.

If an integer is known to fit in 24-bits, we can
do division faster with float ops.

llvm-svn: 213843
2014-07-24 06:59:20 +00:00
Chandler Carruth
908e62868c [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate
insertions.

The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. My measurements of
codegen time shows slight improvement. The memory utilization is
unsurprisingly dominated by other factors (the IR and DAG itself
I suspect).

This change results in subtle, frustrating churn in the particular order
in which DAG combines are applied which causes a number of minor
regressions where we fail to match a pattern previously matched by
accident. AFAICT, all of these should be using AddToWorklist to directly
or should be written in a less brittle way. None of the changes seem
drastically bad, and a few of the changes seem distinctly better.

A major change required to make this work is to significantly harden the
way in which the DAG combiner handle nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.

I think the x86 changes are really minor and uninteresting, but the
avx512 change at least is hiding a "regression" (despite the test case
being just noise, not testing some performance invariant) that might be
looked into. Not sure if any of the others impact specific "important"
code paths, but they didn't look terribly interesting to me, or the
changes were really minor. The consensus in review is to fix any
regressions that show up after the fact here.

Thanks to the other reviewers for checking the output on other
architectures. There is a specific regression on ARM that Tim already
has a fix prepped to commit.

Differential Revision: http://reviews.llvm.org/D4616

llvm-svn: 213727
2014-07-23 07:08:53 +00:00
Tom Stellard
7c4b3a94b6 R600/SI: Add instruction shrinking pass
This pass converts 64-bit instructions to 32-bit when possible.

llvm-svn: 213561
2014-07-21 16:55:33 +00:00
Tom Stellard
08b253cba1 R600/SI: Clean up some of the unused REGISTER_{LOAD,STORE} code
There are a few more cleanups to do, but I ran into some problems
with ext loads and trunc stores, when I tried to change some of the
vector loads and stores from custom to legal, so I wasn't able to
get rid of everything.

llvm-svn: 213552
2014-07-21 15:45:06 +00:00
Tom Stellard
ed0ccca70d R600/SI: Use scratch memory for large private arrays
llvm-svn: 213551
2014-07-21 15:45:01 +00:00
Tom Stellard
5bfbb25d6b R600/SI: Store constant initializer data in constant memory
This implements a solution for constant initializers suggested
by Vadim Girlin, where we store the data after the shader code
and then use the S_GETPC instruction to compute its address.

This saves use the trouble of creating a new buffer for constant data
and then having to pass the pointer to the kernel via user SGPRs or the
input buffer.

llvm-svn: 213530
2014-07-21 14:01:14 +00:00
Tom Stellard
7f35eb40ab R600/SI: Use VALU for i1 XOR
llvm-svn: 213528
2014-07-21 14:01:10 +00:00
Matt Arsenault
a8f2abe691 R600: Add missing test for concat_vectors
llvm-svn: 213473
2014-07-20 07:13:17 +00:00
Matt Arsenault
2d097d5e02 R600/SI: Remove dead code and add missing tests.
This probably was killed by some generic DAGCombiner
improvements in checking the TargetBooleanContents instead
of just 1.

llvm-svn: 213471
2014-07-20 06:11:02 +00:00
Matt Arsenault
840d57e330 R600/SI: implement range reduction for sin/cos
These instructions can only take a limited input range, and return
the constant value 1 out of range. We should do range reduction to
be able to process arbitrary values. Use a FRACT instruction after
normalization to achieve this. Also add a test for constant folding
with the lowered code with unsafe-fp-math enabled.

v2: use DAG lowering instead of intrinsic, adapt test
v3: calculate constant, fold pattern into instruction definition
v4: misc style fixes, add sin-fold testcase, cosmetics

Patch by Grigori Goronzy

llvm-svn: 213458
2014-07-19 18:44:39 +00:00
Tim Northover
854fe649af R600: support fpext/fptrunc operations to and from f16.
llvm-svn: 213376
2014-07-18 13:01:37 +00:00
Tim Northover
e4c93c0798 CodeGen: soften f16 type by default instead of marking legal.
Actual support for softening f16 operations is still limited, and can be added
when it's needed.  But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.

Longer term, we probably want something between Soften and Promote semantics
for most targets, it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.

llvm-svn: 213372
2014-07-18 12:41:46 +00:00
Tim Northover
62ad6904d9 R600: rename misleading fp16 test.
This test is actually going in the opposite direction to what the
filename and function name suggested.

llvm-svn: 213358
2014-07-18 08:43:30 +00:00
Tim Northover
de7867151d R600: support f16 -> f64 conversion intrinsic.
Unfortunately, we don't seem to have a direct truncation, but the
extension can be legally split into two operations so we should
support that.

llvm-svn: 213357
2014-07-18 08:43:24 +00:00
Tim Northover
eae1f1c8cc CodeGen: extend f16 conversions to permit types > float.
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.

During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.

Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.

llvm-svn: 213248
2014-07-17 10:51:23 +00:00
Matt Arsenault
15eb0d54b0 R600/SI: Allow using f32 rcp / rsq when denormals not handled.
These are precise enough to use for OpenCL unless denormals
are handled.

llvm-svn: 213107
2014-07-15 23:50:10 +00:00
Matt Arsenault
c093eee935 R600/SI: Fix select on i1
llvm-svn: 213096
2014-07-15 21:44:37 +00:00
Matt Arsenault
1ceb5e82c1 R600/SI: Implement less wrong f32 fdiv
Assuming single precision denormals and accurate sqrt/div are not
reported, this passes the OpenCL conformance test.

llvm-svn: 213089
2014-07-15 20:18:31 +00:00
Jan Vesely
aa9875787e R600: Implement zero undef variants of ctlz/cttz
v2: use ffbh/l if available
v3: Rebase on top of Matt's SI patches

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 213072
2014-07-15 15:51:09 +00:00
Matt Arsenault
211ccabffb R600: Add dag combine for copy of an illegal type.
This helps avoid redundant instructions to unpack, and repack
the vectors. Ideally we could recognize that pattern and eliminate
it. Currently v4i8 and other small element type vectors are scalarized,
so this has the added bonus of avoiding that.

llvm-svn: 213031
2014-07-15 02:06:31 +00:00
Matt Arsenault
24911cb984 R600: Add denormal handling subtarget features.
llvm-svn: 213018
2014-07-14 23:40:49 +00:00
Matt Arsenault
62262e12fa R600/SI: Default to no single precision denormals.
llvm-svn: 213017
2014-07-14 23:40:43 +00:00
Matt Arsenault
f6ce1be587 R600: Run more tests with promote alloca disabled.
Re-run tests changed in r211110 to test both paths.
Also fix broken check line.

llvm-svn: 212895
2014-07-13 02:46:17 +00:00
Matt Arsenault
fd221a07bc R600: Run private-memory test with and without alloca promote
The unpromoted path still needs to be tested since we can't
always promote to using LDS.

llvm-svn: 212894
2014-07-13 02:18:06 +00:00
Matt Arsenault
31e0179e8a R600: Add missing tests for some intrinsics
llvm-svn: 212870
2014-07-12 00:36:19 +00:00
Marek Olsak
7757b563f1 R600/SI: Use i32 vectors for resources and samplers
This affects new intrinsics only.

What surprises me is that v32i8 still works.

llvm-svn: 212831
2014-07-11 17:11:52 +00:00
Marek Olsak
6db1789f95 R600/SI: add sample and image intrinsics exposing all instruction fields
We need the intrinsics with offsets, so why not just add them all.
The R128 parameter will also be useful for reducing SGPR usage.
GL_ARB_image_load_store also adds some image GLSL modifiers like "coherent",
so Mesa will probably translate those to slc, glc, etc.

When LLVM 3.5 is released, I'll switch Mesa to these new intrinsics.

llvm-svn: 212830
2014-07-11 17:11:46 +00:00
Jan Vesely
f405b95fd6 R600: Implement float to long/ulong
Use alg. from LegalizeDAG.cpp
Move Expand setting to SIISellowering

v2: Extend existing tests instead of creating new ones
v3: use separate LowerFPTOSINT function
v4: use TargetLowering::expandFP_TO_SINT
    add comment about using FP_TO_SINT for uints

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 212773
2014-07-10 22:40:21 +00:00
Matt Arsenault
49a604853d Revert "Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.""
Don't try to convert the select condition type.

llvm-svn: 212750
2014-07-10 18:21:04 +00:00
NAKAMURA Takumi
ea850a0edc Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine."
This caused miscompilation on, at least, x86-64. SExt(i1 cond) confused other optimizations.

llvm-svn: 212708
2014-07-10 11:37:28 +00:00
Matt Arsenault
834bcebaa6 R600/SI: Add support for llvm.convert.{to|from}.fp16
llvm-svn: 212676
2014-07-10 03:22:20 +00:00
Matt Arsenault
479a1f90e1 Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.
Do this if the truncate is free and the select is legal.

llvm-svn: 212640
2014-07-09 19:12:07 +00:00
Matt Arsenault
f39a57f581 R600: Fix mishandling of load / store chains.
Fixes various bugs with reordering loads and stores.
Scalarized vector loads weren't collecting the chains
at all.

llvm-svn: 212473
2014-07-07 18:34:45 +00:00
Tom Stellard
209c137768 R600: Promote i64 loads to v2i32
llvm-svn: 212216
2014-07-02 20:53:54 +00:00
Matt Arsenault
000617e7b3 Revert "Temporary hack to try cleaning extra .s file from bots."
llvm-svn: 211967
2014-06-27 23:11:26 +00:00
Matt Arsenault
20ec742a75 Temporary hack to try cleaning extra .s file from bots.
llvm-svn: 211963
2014-06-27 21:43:50 +00:00
David Blaikie
06bae5c2fe Fix test so it doesn't try to write out temporary files into the test tree.
llvm-svn: 211916
2014-06-27 17:45:43 +00:00
Matt Arsenault
128df7aaf1 R600: Don't crash on unhandled instruction in promote alloca
llvm-svn: 211906
2014-06-27 16:52:49 +00:00
Matt Arsenault
19ebc28a85 R600: Add some testcases for promote alloca pass.
More complicated GEPs are skipped. Add some tests to
actually stress this skipping.

llvm-svn: 211859
2014-06-27 03:55:55 +00:00
Matt Arsenault
c2d6bb970a R600/SI: Add FP mode bits to binary.
The default rounding mode to initialize the mode register needs
to be reported to the runtime. Fill in other bits a kernel
may be interested in setting for future use.

llvm-svn: 211791
2014-06-26 17:22:30 +00:00
Matt Arsenault
435a7c1256 R600: Fix vector FMA
llvm-svn: 211757
2014-06-26 01:28:05 +00:00
Tom Stellard
840992bb71 R600: Promote i64 stores to v2i32
Now we need only one 64-bit pattern for stores.

llvm-svn: 211643
2014-06-24 23:33:04 +00:00
Matt Arsenault
37d6d91b5b R600: Fix inconsistency in rsq instructions.
R600 was using a clamped version of rsq, but SI was not. Add a
new rsq_clamped intrinsic and use them consistently.

It's unclear to me from the documentation what behavior
the R600 instructions have, so I assume they have the legacy behavior
described by the SI documents. For R600, use RECIPSQRT_IEEE
for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also
has RECIPSQRT_FF, which I'm not sure how it fits in here.

llvm-svn: 211637
2014-06-24 22:13:39 +00:00
Matt Arsenault
0cbe5b2357 R600/SI: Fix div_scale intrinsic.
The operand that must match one of the others does matter,
and implement selecting for it.

llvm-svn: 211523
2014-06-23 18:28:28 +00:00
Matt Arsenault
68285ece4f R600: Move add/sub with overflow out of AMDILISelLowering
Add more tests for these.

llvm-svn: 211517
2014-06-23 18:00:49 +00:00
Matt Arsenault
b319678001 R600/SI: Handle i64 sub.
We can handle it the same way as add

llvm-svn: 211514
2014-06-23 18:00:38 +00:00
Jan Vesely
fd4ac4d9f5 R600: Add udivrem test
v2: move < %s to the end of the line
    space after ;
    add v4i32 test

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211476
2014-06-22 21:42:58 +00:00
Tom Stellard
ae7faa387d R600/SI: Add patterns for ctpop inside a branch
llvm-svn: 211378
2014-06-20 17:06:11 +00:00
Tom Stellard
2d56aec1cd R600/SI: Add a pattern for f32 ftrunc
llvm-svn: 211377
2014-06-20 17:06:09 +00:00
Tom Stellard
0c64d4a322 R600: Expand vector flog2
llvm-svn: 211376
2014-06-20 17:06:07 +00:00
Tom Stellard
f2ad4b6bb8 R600: Expand vector fexp2
llvm-svn: 211375
2014-06-20 17:06:05 +00:00
Tom Stellard
d4aa49ad5e R600/SI: Add a VALU pattern for i64 xor
llvm-svn: 211373
2014-06-20 17:05:57 +00:00
Matt Arsenault
47b0791075 R600: Add a few tests I forgot to add.
These belong with r210827

llvm-svn: 211253
2014-06-19 04:24:43 +00:00
Matt Arsenault
b82983ef6a R600/SI: Add intrinsics for various math instructions.
These will be used for custom lowering and for library
implementations of various math functions, so it's useful
to expose these as builtins.

llvm-svn: 211247
2014-06-19 01:19:19 +00:00
Matt Arsenault
c86884a54a R600: Handle fnearbyint
The difference from rint isn't really relevant here,
so treat them as equivalent. OpenCL doesn't have nearbyint,
so this is sort of pointless other than for completeness.

llvm-svn: 211229
2014-06-18 22:03:45 +00:00
Marek Olsak
d80d0ca951 R600/SI: add gather4 and getlod intrinsics (v3)
This contains all the previous patches + getlod support on top of it.
It doesn't use SDNodes anymore, so it's quite small.
It also adds v16i8 to SReg_128, which is used for the sampler descriptor.

Reviewed-by: Tom Stellard
llvm-svn: 211228
2014-06-18 22:00:29 +00:00
Jan Vesely
3b8464bc4e R600: Expand vector fceil
Move fp64 fceil tests to fceil64.ll

v2: rebase

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211194
2014-06-18 17:57:29 +00:00
Matt Arsenault
a46ba4c9d1 R600/SI: Add intrinsics for brev instructions
llvm-svn: 211187
2014-06-18 17:13:57 +00:00
Matt Arsenault
48848ba546 R600/SI: Prettier operand printing for 64-bit ops.
Copy what is done for 32-bit already so the order is about the same.

llvm-svn: 211186
2014-06-18 17:13:51 +00:00
Matt Arsenault
068d030935 R600: Implement f64 ftrunc, ffloor and fceil.
CI has instructions for these, so this fixes them for older hardware.

llvm-svn: 211183
2014-06-18 17:05:30 +00:00
Matt Arsenault
77f7e6fc35 R600: Custom lower f64 frint for pre-CI
llvm-svn: 211182
2014-06-18 17:05:26 +00:00
Jan Vesely
79097fa008 R600: Implement 64bit SRA
v2: Use capitalized variable name

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211159
2014-06-18 12:27:17 +00:00
Jan Vesely
e8b0cf21f9 R600: Implement 64bit SRL
v2: use C++ style comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211158
2014-06-18 12:27:15 +00:00
Jan Vesely
945809b390 R600: Implement 64bit SHL
v2: Use c++ style comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211157
2014-06-18 12:27:13 +00:00
Matt Arsenault
508925b13a R600/SI: Match cttz_zero_undef
llvm-svn: 211116
2014-06-17 17:36:27 +00:00
Matt Arsenault
71fb43e88e R600/SI: Match ctlz_zero_undef
llvm-svn: 211115
2014-06-17 17:36:24 +00:00
Tom Stellard
a529beed9c R600: Use LDS and vectors for private memory
llvm-svn: 211110
2014-06-17 16:53:14 +00:00
Tom Stellard
6987d06184 SelectionDAG: Expand i64 = FP_TO_SINT i32
llvm-svn: 211108
2014-06-17 16:53:07 +00:00
Matt Arsenault
fafd3cb5a2 R600: Add a rotr testcase I forgot to add
llvm-svn: 211002
2014-06-15 21:09:00 +00:00
Matt Arsenault
a88eef222c R600: Remove a few more things from AMDILISelLowering
Try to keep all the setOperationActions for integer ops
together.

llvm-svn: 211001
2014-06-15 21:08:58 +00:00
Matt Arsenault
1f47d520f5 R600: Fix assert on vector sdiv
llvm-svn: 211000
2014-06-15 21:08:54 +00:00
Matt Arsenault
6f5ac69231 R600: Report that integer division is expensive.
Divides by weird constants now emit much better code.

llvm-svn: 210995
2014-06-15 19:48:16 +00:00
NAKAMURA Takumi
a6ff4c4e16 Don't expect tests always crashing. Add "REQUIRES:asserts".
llvm-svn: 210983
2014-06-15 01:01:11 +00:00
Matt Arsenault
5f7306c2c6 R600: Add failing testcases.
These are reduced from assert in the
OpenCV CvtColor8u.BGR5652GRAY test.

llvm-svn: 210969
2014-06-14 04:26:09 +00:00
Matt Arsenault
b2c8575d08 R600: Fix asserts related to constant initializers
This would assert if a constant address space was extern
and therefore didn't have an initializer. If the initializer
was undef, it would hit the unreachable unhandled initializer case.

An extern global should never really occur since we don't have
machine linking, but bugpoint likes to remove initializers.

llvm-svn: 210967
2014-06-14 04:26:05 +00:00
Tim Northover
b9ec29d7c5 IR: add "cmpxchg weak" variant to support permitted failure.
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.

As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.

At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.

By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.

Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.

Summary for out of tree users:
------------------------------

+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.

llvm-svn: 210903
2014-06-13 14:24:07 +00:00
Matt Arsenault
e8c6185eba R600/SI: Fix selection error on i64 rotl / rotr.
Evergreen is still broken due to missing shl_parts.

llvm-svn: 210885
2014-06-13 04:00:30 +00:00
Matt Arsenault
e19ddbd0dc R600: Mostly remove remaining AMDIL intrinsics.
Delete all unused ones, and add new AMDGPU named intrinsics for
the ones that are. Handle the old AMDIL names for comptability (although
remove their GCCBuiltin names) and add tests since there weren't any
for these before.

llvm-svn: 210827
2014-06-12 21:15:44 +00:00
Tom Stellard
5f887c9493 Revert "SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors"
This reverts commit r210540, adds a testcase for the regression it
caused, and marks the R600 test it was supposed to fix as XFAIL.

llvm-svn: 210792
2014-06-12 16:04:47 +00:00
Matt Arsenault
2c82883c7a R600/SI: Use a register set to -1 for data0 on ds_inc*/ds_dec*
There is not such thing as a 0-data ds instruction, and the data
operand needs to be a vgpr set to something meaningful.

llvm-svn: 210756
2014-06-12 08:21:54 +00:00
Matt Arsenault
b6b9bb5978 R600/SI: Fix bitcast between v2i32 and f64
This is the same problem fixed in r210664 for more types.

The test passes without this fix. For some reason
I'm only hitting this when creating selects lowered
to v2i32 selects.

llvm-svn: 210692
2014-06-11 19:31:13 +00:00
Matt Arsenault
33d239c3a3 R600/SI: Add common 64-bit LDS atomics
llvm-svn: 210680
2014-06-11 18:08:54 +00:00
Matt Arsenault
ffa54309ef R600/SI: Add 32-bit LDS atomic cmpxchg
llvm-svn: 210678
2014-06-11 18:08:48 +00:00
Matt Arsenault
5784f403fd R600/SI: Use LDS atomic inc / dec
llvm-svn: 210677
2014-06-11 18:08:45 +00:00
Matt Arsenault
3067074b27 R600/SI: Add other LDS atomic operations
llvm-svn: 210676
2014-06-11 18:08:42 +00:00
Matt Arsenault
e633c2eeb6 R600/SI: Fix backwards names for local atomic instructions.
The manual lists them as *_RTN_U32, not *_U32_RTN, which is more
consistent with how every other sized instruction is named.

llvm-svn: 210674
2014-06-11 18:08:37 +00:00
Matt Arsenault
174782d2e9 R600/SI: Refactor local atomics.
Use patterns that will also match the immediate offset to
match the normal read / writes.

llvm-svn: 210673
2014-06-11 18:08:34 +00:00
Matt Arsenault
a75d166beb R600/SI: Use v_cvt_f32_ubyte* instructions
This eliminates extra extract instructions when loading an i8 vector to
a float vector.

llvm-svn: 210666
2014-06-11 17:50:44 +00:00
Matt Arsenault
52e8d9b98d R600/SI: Fix selection failure on scalar_to_vector
There seem to be only 2 places that produce these,
and it's kind of tricky to hit them.

Also fixes failure to bitcast between i64 and v2f32,
although this for some reason wasn't actually broken in the
simple bitcast testcase, but did in the scalar_to_vector one.

llvm-svn: 210664
2014-06-11 17:40:32 +00:00
Matt Arsenault
4f96643a42 R600: Use BCNT_INT for evergreen
llvm-svn: 210569
2014-06-10 19:18:28 +00:00
Matt Arsenault
6387e9a3dc R600/SI: Implement i64 ctpop
llvm-svn: 210568
2014-06-10 19:18:24 +00:00
Matt Arsenault
8407076508 R600/SI: Use bcnt instruction for ctpop
llvm-svn: 210567
2014-06-10 19:18:21 +00:00
Matt Arsenault
d30b483e1a R600: Handle fcopysign
llvm-svn: 210564
2014-06-10 19:00:20 +00:00
Matt Arsenault
5bfef73e00 R600/SI: Handle sign_extend and zero_extend to i64 with patterns.
llvm-svn: 210563
2014-06-10 18:54:59 +00:00
Tom Stellard
aab1db4cd9 SelectionDAG: Expand SELECT_CC to SELECT + SETCC
This consolidates code from the Hexagon, R600, and XCore targets.

No functionality change intended.

llvm-svn: 210539
2014-06-10 16:01:22 +00:00
Alp Toker
03b6e12fae Reduce verbiage of lit.local.cfg files
We can just split targets_to_build in one place and make it immutable.

llvm-svn: 210496
2014-06-09 22:42:55 +00:00
Matt Arsenault
c9f3bd4d6c R600/SI: Keep 64-bit not on SALU
llvm-svn: 210476
2014-06-09 16:36:31 +00:00
Matt Arsenault
a34a3c834c R600: Fix selection failure for vector bswap
llvm-svn: 210475
2014-06-09 16:20:25 +00:00
Matt Arsenault
232bf82e54 R600: Add more and testcases
llvm-svn: 210453
2014-06-09 08:36:53 +00:00
Rafael Espindola
e5f71f18e0 Allow aliases to be unnamed_addr.
Alias with unnamed_addr were in a strange state. It is stored in GlobalValue,
the language reference talks about "unnamed_addr aliases" but the verifier
was rejecting them.

It seems natural to allow unnamed_addr in aliases:

* It is a property of how it is accessed, not of the data itself.
* It is perfectly possible to write code that depends on the address
of an alias.

This patch then makes unname_addr legal for aliases. One side effect is that
the syntax changes for a corner case: In globals, unnamed_addr is now printed
before the address space.

llvm-svn: 210302
2014-06-06 01:20:28 +00:00
Matt Arsenault
700a8a731b R600: Fix test. Using wrong check prefix.
llvm-svn: 210244
2014-06-05 08:00:36 +00:00
Matt Arsenault
9e400b2e26 R600/SI: Match rsq instructions
llvm-svn: 210226
2014-06-05 00:15:55 +00:00
Matt Arsenault
ff3cea9ab5 R600/SI: Fix [s|u]int_to_fp for i1
llvm-svn: 209971
2014-05-31 06:47:42 +00:00
Matt Arsenault
bfc007dbb5 R600: Try to convert BFE back to standard bit ops when possible.
This allows existing DAG combines to work on them, and then
we can re-match to BFE if necessary during instruction selection.

llvm-svn: 209462
2014-05-22 18:09:12 +00:00
Matt Arsenault
90d0fd2ea0 R600: Add dag combine for BFE
llvm-svn: 209461
2014-05-22 18:09:07 +00:00
Matt Arsenault
4ab9246e99 R600: Implement ComputeNumSignBitsForTargetNode for BFE
llvm-svn: 209460
2014-05-22 18:09:03 +00:00
Matt Arsenault
c7d0679684 R600: Expand mul24 for GPUs without it
llvm-svn: 209458
2014-05-22 18:00:24 +00:00
Matt Arsenault
fcb6cf68ee R600: Expand mad24 for GPUs without it
llvm-svn: 209457
2014-05-22 18:00:20 +00:00
Matt Arsenault
e43426533f R600: Add intrinsics for mad24
llvm-svn: 209456
2014-05-22 18:00:15 +00:00
Matt Arsenault
cec6ae49e8 R600/SI: Match fp_to_uint / uint_to_fp for f64
llvm-svn: 209388
2014-05-22 03:20:30 +00:00
Matt Arsenault
094f9f1e9c R600: Partially fix constant initializers for structs and vectors.
This should extend the current workaround to work with structs
that only contain legal, scalar types.

llvm-svn: 209331
2014-05-21 22:42:42 +00:00
Matt Arsenault
dc960fcadc R600: Add failing testcases for constant initializers.
Constant initializers involving illegal types hit an assertion.

Patch by: Jan Vesely <jan.vesely@rutgers.edu>

llvm-svn: 209330
2014-05-21 22:42:38 +00:00
Tom Stellard
2022c1eb1b R600/SI: Promote f32 SELECT to i32
llvm-svn: 209024
2014-05-16 20:56:41 +00:00
Tom Stellard
77051e93a5 R600/SI: Only use SALU instructions for 64-bit add in a block of CF depth 0
llvm-svn: 208886
2014-05-15 14:41:54 +00:00
Tom Stellard
efb8470c62 R600/SI: Use VALU instructions for i1 ops
llvm-svn: 208885
2014-05-15 14:41:50 +00:00
Jay Foad
e0eac700cb Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
Matt Arsenault
102b7be363 R600/SI: Try to fix BFE operands when moving to VALU
This was broken by r208479

llvm-svn: 208740
2014-05-13 23:45:50 +00:00
Matt Arsenault
c2251d492b R600: Add mul24 intrinsics
llvm-svn: 208604
2014-05-12 17:49:57 +00:00
Matt Arsenault
43171f4aad Make SimplifyDemandedBits understand BUILD_PAIR
llvm-svn: 208598
2014-05-12 17:14:48 +00:00
Vincent Lejeune
03352d8b38 R600/SI: Fold fabs/fneg into src input modifier
llvm-svn: 208480
2014-05-10 19:18:39 +00:00
Vincent Lejeune
840594f1e6 R600/SI: Prettier display of input modifiers
llvm-svn: 208479
2014-05-10 19:18:33 +00:00
Tom Stellard
1a986ede50 R600/SI: Teach SIInstrInfo::moveToVALU() how to move S_LOAD_*_IMM instructions
llvm-svn: 208432
2014-05-09 16:42:22 +00:00
Tom Stellard
550d79da42 R600/SI: Fix SMRD pattern for offsets > 32 bits
We were dropping the high bits of 64-bit immediate offsets.

llvm-svn: 208431
2014-05-09 16:42:21 +00:00
Tom Stellard
9562e8a6ba R600: Expand i64 SELECT_CC
llvm-svn: 208430
2014-05-09 16:42:19 +00:00
Tom Stellard
83d3208148 R600: Move MIN/MAX matching from LowerOperation() to PerformDAGCombine()
llvm-svn: 208429
2014-05-09 16:42:16 +00:00
Tom Stellard
1291d8f8c2 R600: Expand i64 ISD:SUB
llvm-svn: 208005
2014-05-05 21:47:15 +00:00
Tom Stellard
18ca382db4 R600: Expand vector sin and cos.
v2: move code to AMDGPUISelLowering.cpp
    squash with tests (both EG and SI)

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 207845
2014-05-02 15:41:47 +00:00
Tom Stellard
05e86018ff R600: Expand TruncStore i64 -> {i16,i8}
llvm-svn: 207844
2014-05-02 15:41:46 +00:00
Matt Arsenault
293918660a R600/SI: Fix verifier error with pseudo store instructions.
Use i32 instead of specifying SReg_32. When this is
the pseudo INDIRECT_BASE_ADDR, this would give a bogus
verifier error.

llvm-svn: 207770
2014-05-01 16:37:52 +00:00
Tom Stellard
690d34daa4 R600/SI: Use VALU instructions for copying i1 values
We can't use SALU instructions for this since they ignore the EXEC mask
and are always executed.

This fixes several OpenCV tests.

llvm-svn: 207661
2014-04-30 15:31:33 +00:00
Tom Stellard
5a08396499 R600/SI: Teach moveToVALU how to handle some SMRD instructions
llvm-svn: 207660
2014-04-30 15:31:29 +00:00
Tom Stellard
c58951c37c R600/SI: Custom lower SI_IF and SI_ELSE to avoid machine verifier errors
SI_IF and SI_ELSE are terminators which also produce a value.  For
these instructions ISel always inserts a COPY to move their value
to another basic block.  This COPY ends up between SI_(IF|ELSE)
and the S_BRANCH* instruction at the end of the block.

This breaks MachineBasicBlock::getFirstTerminator() and also the
machine verifier which assumes that terminators are grouped together at
the end of blocks.

To solve this we coalesce the copy away right after ISel to make sure
there are no instructions in between terminators at the end of blocks.

llvm-svn: 207591
2014-04-29 23:12:53 +00:00
Tom Stellard
26a1c5b403 R600/SI: Only select SALU instructions in the entry or exit block
SALU instructions ignore control flow, so it is not always safe to use
them within branches.  This is a partial solution to this problem
until we can come up with something better.

llvm-svn: 207590
2014-04-29 23:12:48 +00:00
Tom Stellard
9112e301ba R600: optimize the UDIVREM 64 algorithm
This is a squash of several optimization commits:
 - calculate DIV_Lo and DIV_Hi separately
 - use BFE_U32 if we are operating on 32bit values
 - use precomputed constants instead of shifting in UDVIREM
 - skip the first 32 iterations of udivrem

v2: Check whether BFE is supported before using it

Patch by: Jan Vesely

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207589
2014-04-29 23:12:46 +00:00
Matt Arsenault
e34c30a3c0 R600: Add a test that used to be broken that I forgot to add
llvm-svn: 207017
2014-04-23 19:45:05 +00:00
Matt Arsenault
f022fe68e4 R600: Emit error instead of unreachable on function call
llvm-svn: 206904
2014-04-22 16:42:00 +00:00
Matt Arsenault
01a0b32658 R600: Make sign_extend_inreg legal.
Don't know why I didn't just do this in the first place.

llvm-svn: 206862
2014-04-22 03:49:30 +00:00
Matt Arsenault
6b6f53eaec R600/SI: Try to use scalar BFE.
Use scalar BFE with constant shift and offset when possible.
This is complicated by the fact that the scalar version packs
the two operands of the vector version into one.

llvm-svn: 206558
2014-04-18 05:19:26 +00:00
Matt Arsenault
42cf57d738 R600/SI: Match sign_extend_inreg to s_sext_i32_i8 and s_sext_i32_i16
llvm-svn: 206547
2014-04-18 01:53:18 +00:00
Tom Stellard
59f91bb185 R600/SI: Use SReg_64 instead of VSrc_64 when selecting BUILD_PAIR
llvm-svn: 206541
2014-04-18 00:36:21 +00:00
Tom Stellard
50135a875d R600/SI: Stop using i128 as the resource descriptor type
Having i128 as a legal type complicates the legalization phase.  v4i32
is already a legal type, so we will use that instead.

This fixes several piglit tests.

llvm-svn: 206500
2014-04-17 21:00:11 +00:00
Matt Arsenault
628cc59d6b R600/SI: f64 frint is legal on CI
llvm-svn: 206475
2014-04-17 17:06:37 +00:00
Matt Arsenault
adccea7f1a R600/SI: Fix zext from i1 to i64
llvm-svn: 206437
2014-04-17 02:03:08 +00:00
Matt Arsenault
f7ba7017b0 R600: Extend r600 sign_extend_inreg tests for EG
Patch by: Jan Vesely <jan.vesely@rutgers.edu>

llvm-svn: 206349
2014-04-16 01:41:34 +00:00
Matt Arsenault
f045301bf1 R600/SI: Print more immediates in hex format
Print in decimal for inline immediates, and hex otherwise. Use hex
always for offsets in addressing offsets.

This approximately matches what the shader compiler does.

llvm-svn: 206335
2014-04-15 22:32:49 +00:00
Matt Arsenault
a43cbe5951 R600/SI: Fix loads of i1
llvm-svn: 206330
2014-04-15 22:28:39 +00:00
Tom Stellard
9ea60803c4 SelectionDAG: Use helper function to improve legalization of ISD::MUL
The TargetLowering::expandMUL() helper contains lowering code extracted
from the DAGTypeLegalizer and allows the SelectionDAGLegalizer to expand more
ISD::MUL patterns without having to use a library call.

llvm-svn: 206037
2014-04-11 16:12:01 +00:00
Matt Arsenault
ffd08a2504 R600/SI: Match not instruction.
llvm-svn: 205837
2014-04-09 07:16:16 +00:00
Tom Stellard
5e0d95bd87 R600/SI: Handle INSERT_SUBREG in SIFixSGPRCopies
llvm-svn: 205732
2014-04-07 19:45:45 +00:00
Tom Stellard
557024a30d R600: Match 24-bit arithmetic patterns in a Target DAGCombine
Moving these patterns from TableGen files to PerformDAGCombine()
should allow us to generate better code by eliminating unnecessary
shifts and extensions earlier.

This also fixes a bug where the MAD pattern was calling
SimplifyDemandedBits with a 24-bit mask on the first operand
even when the full pattern wasn't being matched.  This occasionally
resulted in some instructions being incorrectly deleted from the
program.

v2:
  - Fix bug with 64-bit mul

llvm-svn: 205731
2014-04-07 19:45:41 +00:00
Tom Stellard
981d7d5d7e R600: Correct opcode for BFE_INT
Acording to AMD documentation, the correct opcode for
BFE_INT is 0x5, not 0x4

Fixes Arithm/Absdiff.Mat/3 OpenCV test

Patch by: Bruno Jiménez

llvm-svn: 205562
2014-04-03 20:19:29 +00:00
Tom Stellard
76577a21a1 R600/SI: Lower 64-bit immediates using REG_SEQUENCE
llvm-svn: 205561
2014-04-03 20:19:27 +00:00
Tom Stellard
7fed4dd0dd TargetLibraryInfo: Disable memcpy and memset on R600
There are no implementations of these for R600.

llvm-svn: 205455
2014-04-02 19:53:29 +00:00
Matt Arsenault
df61dc156f Fix missing RUN line in test
llvm-svn: 205341
2014-04-01 18:34:13 +00:00
Matt Arsenault
0062eb7871 Make isSetCCEquivalent respect the TargetBooleanContents
llvm-svn: 205336
2014-04-01 18:13:26 +00:00
Matt Arsenault
c36c1df67d R600: Compute masked bits for min and max
llvm-svn: 205242
2014-03-31 19:35:33 +00:00
Matt Arsenault
0d30a17857 R600: Add BFE, BFI, and BFM intrinsics to help with writing tests.
llvm-svn: 205236
2014-03-31 18:21:18 +00:00
Tom Stellard
c6c05561d5 R600/SI: Lower i64 SELECT by bitcasting to a vector type
This allows allows us to replace ISD::EXTRACT_ELEMENT, which is lowered
using shifts, with ISD::EXTRACT_VECTOR_ELT, which is a no-op.

llvm-svn: 205187
2014-03-31 14:01:55 +00:00
Matt Arsenault
7f99777a74 R600: Implement isZExtFree.
This allows 64-bit operations that are truncated to be reduced
to 32-bit ones.

llvm-svn: 204946
2014-03-27 17:23:31 +00:00
Matt Arsenault
e42a0c31f3 R600/SI: Fix unreachable with a sext_in_reg to an illegal type.
llvm-svn: 204945
2014-03-27 17:23:24 +00:00
Matt Arsenault
97718f1b49 R600: Add a testcase for sext_in_reg I missed.
This sext_inreg i32 in i64 case was already handled, but not enabled.

llvm-svn: 204840
2014-03-26 18:31:06 +00:00
Matt Arsenault
a88c889ce0 R600: Add failing testcase for <3 x i32> stores.
This is supposed to have the same store size and alignment as <4 x i32>,
but currently is split into a 64-bit and 32-bit store.

llvm-svn: 204729
2014-03-25 16:50:55 +00:00
Matt Arsenault
94cdf74a4b R600/SI: Fix extra mov from legalizing 64-bit SALU ops.
Check the register class of each operand individually
to avoid an extra copy to a vgpr.

llvm-svn: 204662
2014-03-24 20:08:13 +00:00
Matt Arsenault
3436234471 R600/SI: Sub-optimial fix for 64-bit immediates with SALU ops.
No longer asserts, but now you get moves loading legal immediates
into the split 32-bit operations.

llvm-svn: 204661
2014-03-24 20:08:09 +00:00
Matt Arsenault
ed12a24627 R600/SI: Fix 64-bit bit ops that require the VALU.
Try to match scalar and first like the other instructions.
Expand 64-bit ands to a pair of 32-bit ands since that is not
available on the VALU.

llvm-svn: 204660
2014-03-24 20:08:05 +00:00
Matt Arsenault
7ae7f52221 R600: Implement isNarrowingProfitable.
llvm-svn: 204658
2014-03-24 19:43:31 +00:00
Matt Arsenault
e063f39ed3 R600/SI: Fix 64-bit private loads.
llvm-svn: 204630
2014-03-24 17:50:46 +00:00
Matt Arsenault
f0af6362fd R600/SI: Move instruction patterns to scalar versions.
Some of them also had the pattern on both, so this removes the
duplication.

llvm-svn: 204492
2014-03-21 18:01:18 +00:00
Tom Stellard
e5e3293278 R600/SI: Handle MUBUF instructions in SIInstrInfo::moveToVALU()
llvm-svn: 204476
2014-03-21 15:51:57 +00:00
Tom Stellard
8078855521 R600/SI: Handle S_MOV_B64 in SIInstrInfo::moveToVALU()
llvm-svn: 204475
2014-03-21 15:51:54 +00:00
Matt Arsenault
a604a1a412 R600/SI: Add support for 64-bit LDS writes
llvm-svn: 204274
2014-03-19 22:19:54 +00:00
Matt Arsenault
35f86bd433 R600/SI: Add support for 64-bit LDS loads.
v2:
  -Use correct opcode for DS_READ_64

llvm-svn: 204273
2014-03-19 22:19:52 +00:00
Matt Arsenault
38344ebbaf R600/SI: Match i16 immediate offset of LDS instructions.
llvm-svn: 204272
2014-03-19 22:19:49 +00:00
Matt Arsenault
194b9e9539 R600/SI: Fix test checking wrong instruction operand.
The source and destination happen to be the same register.

llvm-svn: 204271
2014-03-19 22:19:45 +00:00
Matt Arsenault
45311f1864 R600/SI: Don't display the GDS bit.
It isn't actually used now, and probably never will be, plus it makes
tests less annoying. I also think SC prints GDS instructions as a
separate instruction name.

llvm-svn: 204270
2014-03-19 22:19:43 +00:00
NAKAMURA Takumi
a0deabb112 CodeGen/R600/v_cndmask.ll: Relax an expression to unbreak msvcrt.
V_CNDMASK_B32_e64 v0, v0, -1.#QNAN0e+00, s[2:3], 0, 0, 0, 0

FIXME: We really need to implement our formatter...
llvm-svn: 204118
2014-03-18 06:17:22 +00:00
Kevin Enderby
b8221f3c03 Making a guess to fix the test case with r204056 to get the build bot working.
llvm-svn: 204073
2014-03-17 19:00:03 +00:00
Matt Arsenault
553297669c R600: Match sign_extend_inreg to BFE instructions
llvm-svn: 204072
2014-03-17 18:58:11 +00:00
Tom Stellard
6f60ceca31 R600/SI: Fix implementation of isInlineConstant() used by the verifier
The type of the immediates should not matter as long as the encoding is
equivalent to the encoding of one of the legal inline constants.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 204056
2014-03-17 17:03:52 +00:00
Tom Stellard
6b4e505e41 R600/SI: Use correct dest register class for V_READFIRSTLANE_B32
This instructions writes to an 32-bit SGPR.  This change required adding
the 32-bit VCC_LO and VCC_HI registers, because the full VCC register
is 64 bits.

This fixes verifier errors on several of the indirect addressing piglit
tests.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 204055
2014-03-17 17:03:51 +00:00
Tom Stellard
c33b600343 R600: LDS instructions shouldn't implicitly define OQAP
LDS instructions are pseudo instructions which model
the OQAP defs and uses within a single instruction.

This fixes a hang in the opencv MedianFilter tests.

llvm-svn: 203818
2014-03-13 17:13:04 +00:00
Matt Arsenault
469ede65b2 R600: Fix trunc store from i64 to i1
llvm-svn: 203695
2014-03-12 18:45:52 +00:00
Tom Stellard
bee4678d48 R600/SI: Using SGPRs is illegal for instructions that read carry-out from VCC
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 203281
2014-03-07 20:12:39 +00:00
Tom Stellard
230af572ff R600/SI: Custom lower i1 stores
These are sometimes created by the shrink to boolean optimization in the
globalopt pass.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 203280
2014-03-07 20:12:33 +00:00
Matt Arsenault
8140d7d370 R600: Fix extloads from i8 / i16 to i64.
This appears to only be working for global loads. Private
and local break for other reasons.

llvm-svn: 203135
2014-03-06 17:34:12 +00:00
Matt Arsenault
f68a94e609 R600/SI: Expand selects on vectors.
llvm-svn: 203134
2014-03-06 17:34:03 +00:00
Matt Arsenault
394a9d104d R600: Add failing control flow tests.
Simple cases hit a variety of problems at -O0.

llvm-svn: 202601
2014-03-01 21:45:41 +00:00
Tom Stellard
6280afdecd R600/SI: Expand all v16[if]32 operations
llvm-svn: 202543
2014-02-28 21:36:37 +00:00
Michel Danzer
8edacce1de R600/SI: Optimize SI_KILL for constant operands
If the SI_KILL operand is constant, we can either clear the exec mask if
the operand is negative, or do nothing otherwise.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 202337
2014-02-27 01:47:09 +00:00
Michel Danzer
0ddce64f7c R600/SI: Allow SI_KILL for geometry shaders
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 202336
2014-02-27 01:47:02 +00:00
Tom Stellard
3dafad8efc R600/SI: Custom select 64-bit ADD
llvm-svn: 202194
2014-02-25 21:36:18 +00:00
Matt Arsenault
a3de4dc001 R600/SI - Add new CI arithmetic instructions.
Does not yet include larger part required
to match v_mad_i64_i32 / v_mad_u64_u32.

llvm-svn: 202077
2014-02-24 21:01:28 +00:00
Quentin Colombet
5c6ea83f97 [CodeGenPrepare] Fix the check of the legality of an instruction.
The API expects an ISD opcode, not an IR opcode.
Fixes a regression for R600.

Related to <rdar://problem/15519855>.

llvm-svn: 201923
2014-02-22 01:06:41 +00:00
Nico Rieck
f3b62a4af6 Fix more broken CHECK lines
llvm-svn: 201493
2014-02-16 13:28:39 +00:00
Quentin Colombet
5700bbac29 [CodeGenPrepare][AddressingModeMatcher] Give up on type promotion if the
transformation does not bring any immediate benefits and introduce an illegal
operation. 

llvm-svn: 201439
2014-02-14 22:23:22 +00:00
Tom Stellard
a3a801780f TargetLowering: n * r where n > 2 should be an illegal addressing mode
llvm-svn: 201433
2014-02-14 21:10:34 +00:00
Tom Stellard
988925aeae R600/SI: Expand all v8[if]32 operations
llvm-svn: 201371
2014-02-13 23:34:15 +00:00
Tom Stellard
309a624102 R600/SI: Add a pattern for i32 anyext
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 201370
2014-02-13 23:34:13 +00:00
Tom Stellard
4b0c3551df R600/SI: Completely Disable TypeRewriter on compute
llvm-svn: 201369
2014-02-13 23:34:12 +00:00
Tom Stellard
4447febe55 R600/SI: Split global vector loads with more than 4 elements
llvm-svn: 201368
2014-02-13 23:34:10 +00:00
Tom Stellard
de306d4a6d R600/SI: Add ShaderType attribute to some tests
llvm-svn: 201367
2014-02-13 23:34:07 +00:00
Matt Arsenault
cc13cc04ab R600/SI: Fix assertion on infinite loops.
This isn't the most useful case to fix in the real world,
but bugpoint runs into this.

llvm-svn: 201177
2014-02-11 21:12:38 +00:00
Tom Stellard
2712f90019 R600/SI: Initialize M0 and emit S_WQM_B64 whenever DS instructions are used
DS instructions that access local memory can only uses addresses that
are less than or equal to the value of M0.  When M0 is uninitialized,
then we experience undefined behavior.

This patch also changes the behavior to emit S_WQM_B64 on pixel shaders
no matter what kind of DS instruction is used.

llvm-svn: 201097
2014-02-10 16:58:30 +00:00
Matt Arsenault
accd6717ba R600/SI: Add failing test for 3 x i64 vectors.
Stores of <4 x i64> do work (although they do expand to 4 stores
instead of 2), but 3 x i64 vectors fail to select.

llvm-svn: 200989
2014-02-07 20:29:40 +00:00
Tom Stellard
1906c48d55 R600/SI: Add a MUBUF store pattern for Reg+Imm offsets
llvm-svn: 200935
2014-02-06 18:36:41 +00:00
Tom Stellard
c690406420 R600/SI: Add a MUBUF store pattern for Imm offsets
llvm-svn: 200934
2014-02-06 18:36:39 +00:00
Tom Stellard
2e3a1cc4d8 R600/SI: Add a MUBUF load pattern for Reg+Imm offsets
llvm-svn: 200933
2014-02-06 18:36:38 +00:00
Tom Stellard
879ab71511 R600/SI: Use immediates offsets for SMRD instructions whenever possible
There was a problem with the old pattern, so we were copying some
larger immediates into registers when we could have been encoding
them in the instruction.

llvm-svn: 200932
2014-02-06 18:36:34 +00:00
Michel Danzer
ed3052cc54 R600/SI: Add pattern for zero-extending i1 to i32
Fixes opencl-example if_* tests with radeonsi.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74469

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200830
2014-02-05 09:48:05 +00:00
Tom Stellard
279daf2506 R600/SI: Custom lower i64 ISD::SELECT
llvm-svn: 200774
2014-02-04 17:18:40 +00:00
Tom Stellard
f4a180e50b R600: Enable vector fpow.
The OpenCL specs say: "The vector versions of the math functions operate
component-wise. The description is per-component."

Patch by: Jan Vesely

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 200773
2014-02-04 17:18:37 +00:00
Michel Danzer
292dfb1151 R600/SI: Fix fneg for 0.0
V_ADD_F32 with source modifier does not produce -0.0 for this. Just
manipulate the sign bit directly instead.

Also add a pattern for (fneg (fabs ...)).

Fixes a bunch of bit encoding piglit tests with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200743
2014-02-04 07:12:38 +00:00