1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

439 Commits

Author SHA1 Message Date
Matt Arsenault
a3de4dc001 R600/SI - Add new CI arithmetic instructions.
Does not yet include larger part required
to match v_mad_i64_i32 / v_mad_u64_u32.

llvm-svn: 202077
2014-02-24 21:01:28 +00:00
Quentin Colombet
5c6ea83f97 [CodeGenPrepare] Fix the check of the legality of an instruction.
The API expects an ISD opcode, not an IR opcode.
Fixes a regression for R600.

Related to <rdar://problem/15519855>.

llvm-svn: 201923
2014-02-22 01:06:41 +00:00
Nico Rieck
f3b62a4af6 Fix more broken CHECK lines
llvm-svn: 201493
2014-02-16 13:28:39 +00:00
Quentin Colombet
5700bbac29 [CodeGenPrepare][AddressingModeMatcher] Give up on type promotion if the
transformation does not bring any immediate benefits and introduce an illegal
operation. 

llvm-svn: 201439
2014-02-14 22:23:22 +00:00
Tom Stellard
a3a801780f TargetLowering: n * r where n > 2 should be an illegal addressing mode
llvm-svn: 201433
2014-02-14 21:10:34 +00:00
Tom Stellard
988925aeae R600/SI: Expand all v8[if]32 operations
llvm-svn: 201371
2014-02-13 23:34:15 +00:00
Tom Stellard
309a624102 R600/SI: Add a pattern for i32 anyext
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 201370
2014-02-13 23:34:13 +00:00
Tom Stellard
4b0c3551df R600/SI: Completely Disable TypeRewriter on compute
llvm-svn: 201369
2014-02-13 23:34:12 +00:00
Tom Stellard
4447febe55 R600/SI: Split global vector loads with more than 4 elements
llvm-svn: 201368
2014-02-13 23:34:10 +00:00
Tom Stellard
de306d4a6d R600/SI: Add ShaderType attribute to some tests
llvm-svn: 201367
2014-02-13 23:34:07 +00:00
Matt Arsenault
cc13cc04ab R600/SI: Fix assertion on infinite loops.
This isn't the most useful case to fix in the real world,
but bugpoint runs into this.

llvm-svn: 201177
2014-02-11 21:12:38 +00:00
Tom Stellard
2712f90019 R600/SI: Initialize M0 and emit S_WQM_B64 whenever DS instructions are used
DS instructions that access local memory can only uses addresses that
are less than or equal to the value of M0.  When M0 is uninitialized,
then we experience undefined behavior.

This patch also changes the behavior to emit S_WQM_B64 on pixel shaders
no matter what kind of DS instruction is used.

llvm-svn: 201097
2014-02-10 16:58:30 +00:00
Matt Arsenault
accd6717ba R600/SI: Add failing test for 3 x i64 vectors.
Stores of <4 x i64> do work (although they do expand to 4 stores
instead of 2), but 3 x i64 vectors fail to select.

llvm-svn: 200989
2014-02-07 20:29:40 +00:00
Tom Stellard
1906c48d55 R600/SI: Add a MUBUF store pattern for Reg+Imm offsets
llvm-svn: 200935
2014-02-06 18:36:41 +00:00
Tom Stellard
c690406420 R600/SI: Add a MUBUF store pattern for Imm offsets
llvm-svn: 200934
2014-02-06 18:36:39 +00:00
Tom Stellard
2e3a1cc4d8 R600/SI: Add a MUBUF load pattern for Reg+Imm offsets
llvm-svn: 200933
2014-02-06 18:36:38 +00:00
Tom Stellard
879ab71511 R600/SI: Use immediates offsets for SMRD instructions whenever possible
There was a problem with the old pattern, so we were copying some
larger immediates into registers when we could have been encoding
them in the instruction.

llvm-svn: 200932
2014-02-06 18:36:34 +00:00
Michel Danzer
ed3052cc54 R600/SI: Add pattern for zero-extending i1 to i32
Fixes opencl-example if_* tests with radeonsi.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74469

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200830
2014-02-05 09:48:05 +00:00
Tom Stellard
279daf2506 R600/SI: Custom lower i64 ISD::SELECT
llvm-svn: 200774
2014-02-04 17:18:40 +00:00
Tom Stellard
f4a180e50b R600: Enable vector fpow.
The OpenCL specs say: "The vector versions of the math functions operate
component-wise. The description is per-component."

Patch by: Jan Vesely

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 200773
2014-02-04 17:18:37 +00:00
Michel Danzer
292dfb1151 R600/SI: Fix fneg for 0.0
V_ADD_F32 with source modifier does not produce -0.0 for this. Just
manipulate the sign bit directly instead.

Also add a pattern for (fneg (fabs ...)).

Fixes a bunch of bit encoding piglit tests with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200743
2014-02-04 07:12:38 +00:00
Matt Arsenault
f89553a645 Add some xfailed R600 tests for 64-bit private accesses.
llvm-svn: 200620
2014-02-02 00:13:12 +00:00
Matt Arsenault
4198d962c0 R600/SI: Fix insertelement with dynamic indices.
This didn't work for any integer vectors, and didn't
work with some sizes of float vectors. This should now
work with all sizes of float and i32 vectors.

llvm-svn: 200619
2014-02-02 00:05:35 +00:00
Michel Danzer
71542b5f92 R600/SI: Add pattern for truncating i32 to i1
Fixes half a dozen piglit tests with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200283
2014-01-28 03:01:16 +00:00
Michel Danzer
65a5397c22 R600/SI: Add intrinsic for BUFFER_LOAD_DWORD* instructions
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200196
2014-01-27 07:20:51 +00:00
Michel Danzer
36dd8ac577 R600/SI: Add intrinsic for S_SENDMSG instruction
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200195
2014-01-27 07:20:44 +00:00
Tom Stellard
e8c59f575b R600: Disable the BFE pattern
This pattern uses an SDNodeXForm, which isn't being emitted for some
reason.  I can get it to work by attaching the PatLeaf that has the
XForm to the argument in the output pattern, but this results in an
immediate being used in a register operand, which the backend can't
handle yet.

llvm-svn: 199918
2014-01-23 18:49:33 +00:00
Tom Stellard
25fa3e2b1d R600: Correctly handle vertex fetch clauses the precede ENDIFs
The control flow finalizer would sometimes use an ALU_POP_AFTER
instruction before the vetex fetch clause instead of using a POP
instruction after it.

llvm-svn: 199917
2014-01-23 18:49:31 +00:00
Tom Stellard
ab9b18423b R600: Unconditionally unroll loops that contain GEPs with alloca pointers
Implement the getUnrollingPreferences() function for
AMDGPUTargetTransformInfo so that loops that do address calculations
on pointers derived from alloca are unconditionally unrolled.

Unrolling these loops makes it more likely that SROA will be able to
eliminate the allocas, which is a big win for R600 since memory
allocated by alloca (private memory) is really slow.

llvm-svn: 199916
2014-01-23 18:49:28 +00:00
Tom Stellard
6f13c22a7a R600: Recommit 199842: Add work-around for the CF stack entry HW bug
The unit test is now disabled on non-asserts builds.

The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries module 8 is either 7 or 0)

We choose to be conservative and always apply the work-around when the
number of sub-enries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.

reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199905
2014-01-23 16:18:02 +00:00
Tom Stellard
d5181ee67d Revert "R600: Add work-around for the CF stack entry HW bug"
This reverts commit 35b8331cad6eb512a2506adbc394201181da94ba.

The -debug-only flag for llc doesn't appear to be available in
all build configurations.

llvm-svn: 199845
2014-01-22 22:20:54 +00:00
Tom Stellard
cd874ab98c R600: Add work-around for the CF stack entry HW bug
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries module 8 is either 7 or 0)

We choose to be conservative and always apply the work-around when the
number of sub-enries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.

reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199842
2014-01-22 21:55:46 +00:00
Tom Stellard
ae477cc774 R600: Refactor stack size calculation
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199840
2014-01-22 21:55:43 +00:00
Tom Stellard
19af07fe92 R600: MOVA is vector only
llvm-svn: 199827
2014-01-22 19:24:24 +00:00
Tom Stellard
0971c460b5 R600: Take alignment into account when calculating the stack offset
llvm-svn: 199826
2014-01-22 19:24:23 +00:00
Tom Stellard
d424fe57e4 R600: Add support for global addresses with constant initializers
llvm-svn: 199825
2014-01-22 19:24:21 +00:00
Tom Stellard
452996a15e R600: Begin private memory at the second GPR.
This way private memory does not over-write work group information
stored in GPRs 0 and 1.

llvm-svn: 199824
2014-01-22 19:24:19 +00:00
Tom Stellard
369c33de20 R600/SI: Add support for i8 and i16 private loads/stores
llvm-svn: 199823
2014-01-22 19:24:14 +00:00
Benjamin Kramer
002aed9cb3 Fix broken CHECK lines.
llvm-svn: 199016
2014-01-11 21:06:00 +00:00
Tom Stellard
b39ac07c09 R600: Allow ftrunc
v2: Add ftrunc->TRUNC pattern instead of replacing int_AMDGPU_trunc
v3: move ftrunc pattern next to TRUNC definition, it's available since R600

Patch By: Jan Vesely

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 197783
2013-12-20 05:11:55 +00:00
NAKAMURA Takumi
e67a3fe0ef Add REQUIRES:asserts to 3 tests in llvm/test/CodeGen/R600 added in r192212.
They are failing in assertions.

llvm-svn: 197669
2013-12-19 10:41:12 +00:00
Matt Arsenault
e64331a159 R600/SI: Make private pointers be 32-bit.
Different sized address spaces should theoretically work
most of the time now, and since 64-bit add is currently
disabled, using more 32-bit pointers fixes some cases.

llvm-svn: 197659
2013-12-19 05:32:55 +00:00
Matt Arsenault
329326f031 R600/SI: Minor improvements to test.
Use CHECK-LABEL, add an i64 version, check store instructions.

llvm-svn: 197293
2013-12-14 00:38:04 +00:00
Matt Arsenault
0a974aca94 R600/SI: Add i64 cmp tests
llvm-svn: 196960
2013-12-10 21:11:55 +00:00
Vincent Lejeune
73c6e97c15 R600: Fix an infinite loop when trying to reorganize export/tex vector input
llvm-svn: 196923
2013-12-10 14:43:31 +00:00
Vincent Lejeune
cd5d9a9849 R600: Fix input modifiers lost for Cayman
llvm-svn: 196922
2013-12-10 14:43:27 +00:00
Vincent Lejeune
8f22bc4540 Add a RequireStructuredCFG Field to TargetMachine.
llvm-svn: 196634
2013-12-07 01:49:19 +00:00
Matt Arsenault
6f14dd54b4 R600/SI: Add comments for number of used registers.
llvm-svn: 196467
2013-12-05 05:15:35 +00:00
Vincent Lejeune
26780e84f1 R600: Workaround for cayman loop bug
llvm-svn: 196121
2013-12-02 17:29:37 +00:00
Tom Stellard
95624c101d R600: Expand vector FABS
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195881
2013-11-27 21:23:39 +00:00
Tom Stellard
eac3acc854 R600/SI: Implement spilling of SGPRs v5
SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions.

v2:
  - Fix encoding of Lane Mask
  - Use correct register flags, so we don't overwrite the low dword
    when restoring multi-dword registers.

v3:
  - Register spilling seems to hang the GPU, so replace all shaders
    that need spilling with a dummy shader.

v4:
  - Fix *LANE definitions
  - Change destination reg class for 32-bit SMRD instructions

v5:
  - Remove small optimization that was crashing Serious Sam 3.

https://bugs.freedesktop.org/show_bug.cgi?id=68224
https://bugs.freedesktop.org/show_bug.cgi?id=71285

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195880
2013-11-27 21:23:35 +00:00
Tom Stellard
d386cdf4d0 R600/SI: Use SGPR_32 register class for 32-bit SMRD outputs
Writing to the M0 register from an SMRD instruction hangs the GPU, so
we need to use the SGPR_32 register class, which does not include M0.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195879
2013-11-27 21:23:29 +00:00
Tom Stellard
0a14ce13e1 R600: Add support for ISD::FROUND
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195878
2013-11-27 21:23:20 +00:00
Tom Stellard
5da7926d0a R600/SI: Fixing handling of condition codes
We were ignoring the ordered/onordered bits and also the signed/unsigned
bits of condition codes when lowering the DAG to MachineInstrs.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195514
2013-11-22 23:07:58 +00:00
Tom Stellard
c2f05239d7 SelectionDAG: Optimize expansion of vec_type = BITCAST scalar_type
The legalizer can now do this type of expansion for more
type combinations without loading and storing to and
from the stack.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195398
2013-11-22 00:41:05 +00:00
Matt Arsenault
be108f1643 R600/SI: Fix moveToVALU when the first operand is VSrc.
Moving into a VSrc doesn't always work, since it could be
replaced with an SGPR later.

llvm-svn: 195042
2013-11-18 20:09:55 +00:00
Matt Arsenault
cdea5c8fe0 R600/SI: Fix multiple SGPR reads when using VCC.
No other SGPR operands are allowed, so if VCC is
used, move the other to a VGPR.

llvm-svn: 195041
2013-11-18 20:09:50 +00:00
Matt Arsenault
485f69c9cf R600/SI: Implement add i64, but do not yet enable.
Test doesn't actually check the output. I need
to fix add i64 being matched for the addressing
calculations.

llvm-svn: 195040
2013-11-18 20:09:47 +00:00
Matt Arsenault
62a8d8b89a R600/SI: Move patterns to match add / sub to scalar instructions
llvm-svn: 195034
2013-11-18 20:09:29 +00:00
Tom Stellard
84bb236e61 R600: Enable the IR structurizer by default
llvm-svn: 195031
2013-11-18 19:43:44 +00:00
Tom Stellard
f1b1fa4727 R600: Fix a crash in the AMDILCFGStrucurizer
The ifPatternMatch() function was not correctly reporting the number
of matches in some cases.

llvm-svn: 195030
2013-11-18 19:43:38 +00:00
Tom Stellard
d0cdc72805 R600/SI: Fix illegal VGPR->SGPR copy inside of loop
llvm-svn: 195026
2013-11-18 18:50:20 +00:00
Tom Stellard
47634da2de R600/SI: Fix another case of illegal VGPR->SGPR copy
llvm-svn: 195025
2013-11-18 18:50:15 +00:00
Matt Arsenault
ae406d5aa1 Use right address space pointer size
llvm-svn: 194940
2013-11-17 00:06:39 +00:00
Matt Arsenault
3f72b0ae69 Fix assert on unaligned access to global with different address space size.
llvm-svn: 194934
2013-11-16 20:50:54 +00:00
Matt Arsenault
82257ae18e Fix codegen for null different sized pointer.
llvm-svn: 194932
2013-11-16 20:24:41 +00:00
Vincent Lejeune
2a45033d9c R600: Make dot_4 instructions predicable
llvm-svn: 194927
2013-11-16 16:24:41 +00:00
Tom Stellard
01fa6ad95f R600/SI: Add VReg_96 register class to SIRegisterInfo::hasVGPRs()
This fixes a crash with GNOME settings manager.

llvm-svn: 194836
2013-11-15 18:26:45 +00:00
Matt Arsenault
084675c776 Add target hook to prevent folding some bitcasted loads.
This is to avoid this transformation in some cases:
fold (conv (load x)) -> (load (conv*)x)

On architectures that don't natively support some vector
loads efficiently casting the load to a smaller vector of
larger types and loading is more efficient.

Patch by Micah Villmow.

llvm-svn: 194783
2013-11-15 04:42:23 +00:00
Tom Stellard
43da22dc72 R600: Fix scheduling of instructions that use the LDS output queue
The LDS output queue is accessed via the OQAP register.  The OQAP
register cannot be live across clauses, so if value is written to the
output queue, it must be retrieved before the end of the clause.
With the machine scheduler, we cannot statisfy this constraint, because
it lacks proper alias analysis and it will mark some LDS accesses as
having a chain dependency on vertex fetches.  Since vertex fetches
require a new clauses, the dependency may end up spiltting OQAP uses and
defs so the end up in different clauses.  See the lds-output-queue.ll
test for a more detailed explanation.

To work around this issue, we now combine the LDS read and the OQAP
copy into one instruction and expand it after register allocation.

This patch also adds some checks to the EmitClauseMarker pass, so that
it doesn't end a clause with a value still in the output queue and
removes AR.X and OQAP handling from the scheduler (AR.X uses and defs
were already being expanded post-RA, so the scheduler will never see
them).

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 194755
2013-11-15 00:12:45 +00:00
Matt Arsenault
41b059ed44 R600/SI: Add testcase for problem I ran into
with the older version of the moveToVALU changes.

llvm-svn: 194682
2013-11-14 07:57:29 +00:00
Tom Stellard
c38302be13 R600/SI: Add support for private address space load/store
Private address space is emulated using the register file with
MOVRELS and MOVRELD instructions.

llvm-svn: 194626
2013-11-13 23:36:50 +00:00
Tom Stellard
90e4344bae R600/SI: Prefer SALU instructions for bit shift operations
All shift operations will be selected as SALU instructions and then
if necessary lowered to VALU instructions in the SIFixSGPRCopies pass.

This allows us to do more operations on the SALU which will improve
performance and is also required for implementing private memory
using indirect addressing, since the private memory pointers must stay
in the scalar registers.

This patch includes some fixes from Matt Arsenault.

llvm-svn: 194625
2013-11-13 23:36:37 +00:00
Matt Arsenault
9c10e82e9e R600: Fix selection failure on EXTLOAD
llvm-svn: 194547
2013-11-13 02:39:07 +00:00
Matt Arsenault
70be5dff43 R600/SI: Change formatting of printed registers.
Print the range of registers used with a single letter prefix.
This better matches what the shader compiler produces and
is overall less obnoxious than concatenating all of the
subregister names together.

Instead of SGPR0, it will print s0. Instead of SGPR0_SGPR1,
it will print s[0:1] and so on.

There doesn't appear to be a straightforward way
to get the actual register info in the InstPrinter,
so this parses the generated name to print with the
new syntax.

The required test changes are pretty nasty, and register
matching regexes are now worse. Since there isn't a way to
add to a variable in FileCheck, some of the tests now don't
check the exact number of registers used, but I don't think that
will be a real problem.

llvm-svn: 194443
2013-11-12 02:35:51 +00:00
Matt Arsenault
7970dec395 R600/SI: Add test that fails due to requiring i64 mul for pointers
llvm-svn: 194433
2013-11-11 23:31:02 +00:00
Vincent Lejeune
54d9c8726b R600: Use function inputs to represent data stored in gpr
llvm-svn: 194425
2013-11-11 22:10:24 +00:00
Vincent Lejeune
5f1f106136 R600: Fix LowerUDIVREM
llvm-svn: 194153
2013-11-06 17:36:04 +00:00
Matt Arsenault
4bf5f8fbee Fix CodeGen for unaligned loads with address spaces
llvm-svn: 193721
2013-10-30 23:30:05 +00:00
Tom Stellard
e058534e9a R600: Custom lower f32 = uint_to_fp i64
llvm-svn: 193701
2013-10-30 17:22:05 +00:00
Tom Stellard
bf6d714576 R600/SI: Add compute support for CI v2
v2:
  - Fix LDS size calculation

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 193621
2013-10-29 16:37:28 +00:00
Tom Stellard
6e8144c932 R600: Expand vector FSQRT ops
llvm-svn: 193620
2013-10-29 16:37:20 +00:00
Tom Stellard
7df5f52e81 R600/SI: fix MIMG writemask adjustement
This fixes piglit:
- shaders/glsl-fs-texture2d-masked
- shaders/glsl-fs-texture2d-masked-4

Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 193222
2013-10-23 02:53:47 +00:00
Tom Stellard
2b6ff7e802 R600: Fix handling of vector kernel arguments
The SelectionDAGBuilder was promoting vector kernel arguments to legal
types, but this won't work for R600 and SI since kernel arguments are
stored in memory and can't be promoted.  In order to handle vector
arguments correctly we need to look at the original types from the LLVM IR
function.

llvm-svn: 193215
2013-10-23 00:44:32 +00:00
Tom Stellard
a1dbb396e5 R600/SI: Add support for i64 bitwise or
llvm-svn: 193213
2013-10-23 00:44:19 +00:00
Tom Stellard
914bfa633e R600/SI: Use S_LOAD_DWORD instructions for v8i32 and v16i32
llvm-svn: 193212
2013-10-23 00:44:12 +00:00
Tom Stellard
5908e906e2 R600: Simplify handling of private address space
The AMDGPUIndirectAddressing pass was previously responsible for
lowering private loads and stores to indirect addressing instructions.
However, this pass was buggy and way too complicated.  The only
advantage it had over the new simplified code was that it saved one
instruction per direct write to private memory.  This optimization
likely has a minimal impact on performance, and we may be able
to duplicate it using some other transformation.

For the private address space, we now:
1. Lower private loads/store to Register(Load|Store) instructions
2. Reserve part of the register file as 'private memory'
3. After regalloc lower the Register(Load|Store) instructions to
   MOV instructions that use indirect addressing.

llvm-svn: 193179
2013-10-22 18:19:10 +00:00
Matt Arsenault
aad9d96b2b Fix CodeGen for vectors of pointers with address spaces.
llvm-svn: 193112
2013-10-21 20:03:58 +00:00
Matt Arsenault
dad904931b Fix CodeGen for different size address space GEPs
llvm-svn: 193111
2013-10-21 20:03:54 +00:00
Tom Stellard
d5d95a9800 R600: Fix a crash in the AMDILCFGStructurizer
We were calling llvm_unreachable() when failing to optimize the
branch into if case.  However, it is still possible for us
to structurize the CFG by duplicating blocks even if this optimization
fails.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 192813
2013-10-16 17:06:02 +00:00
Vincent Lejeune
7594bd2071 R600: improve dump of S_WAITCNT
llvm-svn: 192557
2013-10-13 17:56:28 +00:00
Vincent Lejeune
316b632e03 R600: Use masked read sel for texture instructions
llvm-svn: 192554
2013-10-13 17:56:10 +00:00
Vincent Lejeune
b337ac16bc R600: fix swizzle export
llvm-svn: 192553
2013-10-13 17:56:04 +00:00
Matt Arsenault
289accc07f R600: Add scalar i32 add test
llvm-svn: 192501
2013-10-11 21:03:41 +00:00
Matt Arsenault
d5c3e13cc5 Use CHECK-LABEL
llvm-svn: 192500
2013-10-11 21:03:39 +00:00
Matt Arsenault
6f45619203 R600: Fix trunc i64 to i32 on SI
llvm-svn: 192375
2013-10-10 18:04:16 +00:00
Tom Stellard
582bb030db R600/SI: Use -verify-machineinstrs for most tests
We can't enable the verifier for tests with SI_IF and SI_ELSE, because
these instructions are always followed by a COPY which copies their
result to the next basic block.  This violates the machine verifier's
rule that non-terminators can not folow terminators.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 192366
2013-10-10 17:11:46 +00:00
Matt Arsenault
ed8ec1a52a Add some xfaild R600 tests.
These are bugs to fix later.

llvm-svn: 192212
2013-10-08 18:06:36 +00:00
Vincent Lejeune
c7c1075d49 R600: add a pass that merges clauses.
llvm-svn: 191790
2013-10-01 19:32:58 +00:00
Vincent Lejeune
0321b7798e R600: Put PRED_X instruction in its own clause
llvm-svn: 191789
2013-10-01 19:32:49 +00:00
Vincent Lejeune
e0ac07a3cb R600: Enable -verify-machineinstrs in some tests.
llvm-svn: 191788
2013-10-01 19:32:38 +00:00
Manman Ren
ad317a135a TBAA: update tbaa format from scalar format to struct-path aware format.
llvm-svn: 191690
2013-09-30 18:17:55 +00:00
Manman Ren
799fd39420 TBAA: remove !tbaa from testing cases when they are not needed.
llvm-svn: 191689
2013-09-30 18:17:35 +00:00
Tom Stellard
1cb4ba2a4d R600: Fix handling of NAN in comparison instructions
We were completely ignoring the unorder/ordered attributes of condition
codes and also incorrectly lowering seto and setuo.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 191603
2013-09-28 02:50:50 +00:00
Vincent Lejeune
439c29a29d R600: Move code handling literal folding into R600ISelLowering.
llvm-svn: 190644
2013-09-12 23:44:53 +00:00
Vincent Lejeune
82c06999cd R600: Move fabs/fneg/sel folding logic into PostProcessIsel
This move makes possible to correctly handle multiples instructions
from a single pattern.

llvm-svn: 190643
2013-09-12 23:44:44 +00:00
Tom Stellard
6a507da088 R600/SI: expose TBUFFER_STORE_FORMAT_* for OpenGL transform feedback
For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.

The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.

The maximum number of input SGPRs is bumped to 17.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190575
2013-09-12 02:55:14 +00:00
Aaron Watry
e4512c5eff R600: Add support for LDS atomic subtract
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190200
2013-09-06 20:17:42 +00:00
Matt Arsenault
f658af2617 Teach CodeGenPrepare about address spaces
llvm-svn: 190112
2013-09-06 00:18:43 +00:00
Matt Arsenault
071be273be R600: Fix i64 to i32 trunc on SI
llvm-svn: 190091
2013-09-05 19:41:10 +00:00
Tom Stellard
ce0432a0c3 R600: Add support for local memory atomic add
llvm-svn: 190080
2013-09-05 18:38:09 +00:00
Tom Stellard
6c1db18560 R600: Expand SELECT nodes rather than custom lowering them
llvm-svn: 190079
2013-09-05 18:38:03 +00:00
Tom Stellard
8f7c5a681a R600: Fix incorrect LDS size calculation
GlobalAdderss nodes that appeared in more than one basic block were
being counted twice.

llvm-svn: 190078
2013-09-05 18:37:57 +00:00
Tom Stellard
d2fff2dd99 R600/SI: Don't emit S_WQM_B64 instruction for compute shaders
llvm-svn: 190077
2013-09-05 18:37:52 +00:00
Vincent Lejeune
4fd20e35e6 R600: Use shared op optimization when checking cycle compatibility
llvm-svn: 189981
2013-09-04 19:53:54 +00:00
Vincent Lejeune
4a8c23c168 R600: Non vector only instruction can be scheduled on trans unit
llvm-svn: 189980
2013-09-04 19:53:46 +00:00
Vincent Lejeune
95def9718e R600: Remove fmul.v4f32.ll test which is redundant with fmul.ll
llvm-svn: 189978
2013-09-04 19:53:22 +00:00
Michel Danzer
084b7ef3da R600/SI: Enable local-memory-two-objects lit test
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 189334
2013-08-27 10:28:26 +00:00
Tom Stellard
f7fd8102dd SelectionDAG: Remove unnecessary uses of TargetLowering::getPointerTy()
If we have a binary operation like ISD:ADD, we can set the result type
equal to the result type of one of its operands rather than using
TargetLowering::getPointerTy().

Also, any use of DAG.getIntPtrConstant(C) as an operand for a binary
operation can be replaced with:
DAG.getConstant(C, OtherOperand.getValueType());

llvm-svn: 189227
2013-08-26 15:06:10 +00:00
Tom Stellard
471cae1398 R600: Add support for vector local memory loads
llvm-svn: 189226
2013-08-26 15:06:04 +00:00
Tom Stellard
951bdd0d80 R600: Add support for i8 and i16 local memory loads
llvm-svn: 189225
2013-08-26 15:05:59 +00:00
Tom Stellard
dec9289d7b SelectionDAG: Use correct pointer size when splitting vector stores
llvm-svn: 189224
2013-08-26 15:05:55 +00:00
Tom Stellard
38c07cc5d7 R600: Add support for i8 and i16 local memory stores
llvm-svn: 189223
2013-08-26 15:05:49 +00:00
Tom Stellard
743d74f1b3 R600: Add support for v4i32 and v2i32 local stores
llvm-svn: 189222
2013-08-26 15:05:44 +00:00
Tom Stellard
1287fd01c3 SelectionDAG: Use correct pointer size when lowering function arguments v2
This adds minimal support to the SelectionDAG for handling address spaces
with different pointer sizes.  The SelectionDAG should now correctly
lower pointer function arguments to the correct size as well as generate
the correct code when lowering getelementptr.

This patch also updates the R600 DataLayout to use 32-bit pointers for
the local address space.

v2:
  - Add more helper functions to TargetLoweringBase
  - Use CHECK-LABEL for tests

llvm-svn: 189221
2013-08-26 15:05:36 +00:00
Bill Wendling
fa2b06a7e7 Update to remove the no-frame-pointer-elim-non-leaf flag if it was set to 'false'.
llvm-svn: 189068
2013-08-22 21:28:54 +00:00
Tom Stellard
d70d216860 R600/SI: Fix another case of illegal VGPR to SGPR copy
This fixes a crash in Unigine Tropics.

https://bugs.freedesktop.org/show_bug.cgi?id=68389

llvm-svn: 189057
2013-08-22 20:21:02 +00:00
Tom Stellard
721e3acccd SelectionDAG: Make sure stores are always added to the LegalizedNodes list
When truncated vector stores were being custom lowered in
VectorLegalizer::LegalizeOp(), the old (illegal) and new (legal) node pair
was not being added to LegalizedNodes list.  Instead of the legalized
result being passed to VectorLegalizer::TranslateLegalizeResult(),
the result was being passed back into VectorLegalizer::LegalizeOp(),
which ended up adding a (new, new) pair to the list instead.

This was causing an assertion failure when a custom lowered truncated
vector store was the last instruction a basic block and the VectorLegalizer
was unable to find it in the LegalizedNodes list when updating the
DAG root.

llvm-svn: 188953
2013-08-21 22:42:58 +00:00
Tom Stellard
ad43e88afa R600: Expand vector FRINT ops
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 188598
2013-08-16 23:51:33 +00:00
Tom Stellard
e42573d2cc R600: Expand vector FFLOOR ops
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 188597
2013-08-16 23:51:29 +00:00
Tom Stellard
0721bae8ba R600: Expand vector float operations for both SI and R600
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 188596
2013-08-16 23:51:24 +00:00
Michel Danzer
65d5ad5728 R600/SI: Add pattern for xor of i1
Fixes two recent piglit regressions with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188559
2013-08-16 16:19:31 +00:00
Michel Danzer
acc130ec54 R600/SI: Fix broken encoding of DS_WRITE_B32
The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
it to corrupt the encoding of that by clobbering the first operand with
the second one.

Undo that damage and only apply the SMRD logic to that.

Fixes some derivates related piglit regressions with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188558
2013-08-16 16:19:24 +00:00
Tom Stellard
77968acef1 Revert "R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions"
This reverts commit a6a39ced095c2f453624ce62c4aead25db41a18f.
This is the wrong version of this fix.

llvm-svn: 188523
2013-08-16 01:18:43 +00:00
Tom Stellard
25dbdabc12 R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions
The SIInsertWaits pass was overwriting the first operand (gds bit) of
DS_WRITE_B32 with the second operand (value to write).  This meant that
any time the value to write was stored in an odd number VGPR, the gds
bit would be set causing the instruction to write to GDS instead of LDS.

llvm-svn: 188522
2013-08-16 01:12:20 +00:00
Tom Stellard
284558892e R600: Add support for global vector loads with element types less than 32-bits
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188521
2013-08-16 01:12:16 +00:00
Tom Stellard
c42a38e3ad R600: Add support for global vector stores with elements less than 32-bits
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188520
2013-08-16 01:12:11 +00:00
Tom Stellard
8d9a460dad R600: Add support for i16 and i8 global stores
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188519
2013-08-16 01:12:06 +00:00
Tom Stellard
f0f0f6e071 R600: Add support for v4i32 stores on Cayman
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188518
2013-08-16 01:12:00 +00:00
Tom Stellard
9da87c6553 R600: Enable folding of inline literals into REQ_SEQUENCE instructions
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188517
2013-08-16 01:11:55 +00:00
Tom Stellard
8061257aaf R600: Change the RAT instruction assembly names so they match the docs
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188515
2013-08-16 01:11:46 +00:00
Daniel Dunbar
a496d61c01 [tests] Cleanup initialization of test suffixes.
- Instead of setting the suffixes in a bunch of places, just set one master
   list in the top-level config. We now only modify the suffix list in a few
   suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).

 - Aside from removing the need for a bunch of lit.local.cfg files, this enables
   4 tests that were inadvertently being skipped (one in
   Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
   CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
   XFAILED).

 - This commit also fixes a bunch of config files to use config.root instead of
   older copy-pasted code.

llvm-svn: 188513
2013-08-16 00:37:11 +00:00
Tom Stellard
0f3c885b1a R600/SI: Improve legalization of vector operations
This should fix hangs in the OpenCL piglit tests.

llvm-svn: 188431
2013-08-14 23:25:00 +00:00
Tom Stellard
20e208af7d R600/SI: Replace v1i32 type with i32 in imageload and sample intrinsics
llvm-svn: 188430
2013-08-14 23:24:53 +00:00
Tom Stellard
d7b0828247 R600/SI: Convert v16i8 resource descriptors to i128
Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.

This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:

https://bugs.freedesktop.org/show_bug.cgi?id=66805

llvm-svn: 188429
2013-08-14 23:24:45 +00:00
Tom Stellard
257882f15c R600/SI: Use i8 types for resource descriptors in tests
We switched from i32 to i8 types a while ago and the tests were never
updated.

llvm-svn: 188428
2013-08-14 23:24:37 +00:00
Tom Stellard
649e8ff0ee R600/SI: Lower BUILD_VECTOR to REG_SEQUENCE v2
Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG
instructions should make it easier for the register allocator to coalasce
unnecessary copies.

v2:
  - Use an SGPR register class if all the operands of BUILD_VECTOR are
    SGPRs.

llvm-svn: 188427
2013-08-14 23:24:32 +00:00
Tom Stellard
599374cf06 R600/SI: Assign a register class to the $vaddr operand for MIMG instructions
The previous code declared the operand as unknown:$vaddr, which made
it possible for scalar registers to be used instead of vector registers.

llvm-svn: 188425
2013-08-14 23:24:17 +00:00
Tom Stellard
6287532c80 R600/SI: Handle MSAA texture targets
Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188421
2013-08-14 22:22:14 +00:00
Tom Stellard
5d16e4f78e R600/SI: Allow conversion between v32i8 and v8i32
Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188420
2013-08-14 22:22:09 +00:00