1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00
Commit Graph

246 Commits

Author SHA1 Message Date
Marek Olsak
319154a0b6 R600/SI: Expand fract to floor, then only select V_FRACT on CI
V_FRACT is buggy on SI.

R600-specific code is left intact.

v2: drop the multiclass, use complex VOP3 patterns
llvm-svn: 233075
2015-03-24 13:40:08 +00:00
Matt Arsenault
d59f9a3d0d R600/SI: Remove v_sub_f64 pseudo
The expansion code does the same thing. Since
the operands were not defined with the correct
types, this has the side effect of fixing operand
folding since the expanded pseudo would never use
SGPRs or inline immediates.

llvm-svn: 230072
2015-02-20 22:10:45 +00:00
Matt Arsenault
82f523f0fa R600: Use new fmad node.
This enables a few useful combines that used to only
use fma.

Also since v_mad_f32 apparently does not support denormals,
disable the existing cases that are custom handled if they are
requested.

llvm-svn: 230071
2015-02-20 22:10:41 +00:00
Matt Arsenault
9c682e9039 R600/SI: Fix implicit vcc operand to v_div_fmas_*
This should allow finally fixing the f64 fdiv implementation.

Test is disabled for VI since there seems to be a problem with one
of the buffer load instructions on it.

llvm-svn: 229236
2015-02-14 04:22:00 +00:00
Tom Stellard
e523dc6621 R600/SI: Make more store operations legal
v2i32, i32, trunc i32 to i16, and truc i32 to i8 stores are legal for
all address spaces.  We had marked them as custom in order to lower
them for the private address space, but this is no longer necessary.

This enables lowering of misaligned stores of these types in the
DAGLegalizer.

llvm-svn: 228189
2015-02-04 20:49:51 +00:00
Tom Stellard
5ea00f06af R600: Don't promote i64 stores to v2i32 during DAG legalization
We take care of this during instruction selection now.  This
fixes a potential infinite loop when lowering misaligned stores.

llvm-svn: 228188
2015-02-04 20:49:49 +00:00
Eric Christopher
ce1de59ee6 Reuse a bunch of cached subtargets and remove getSubtarget calls
without a Function argument.

llvm-svn: 227638
2015-01-30 23:24:40 +00:00
Eric Christopher
aacfef65cf Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

llvm-svn: 227113
2015-01-26 19:03:15 +00:00
Tom Stellard
27630189de R600/SI: Move i64 -> v2i32 load promotion into AMDGPUDAGToDAGISel::Select()
We used to do this promotion during DAG legalization, but this
caused an infinite loop in ExpandUnalignedLoad() because it assumed
that i64 loads were legal if i64 was a legal type.

It also seems better to report i64 loads as legal, since they actually
are and we were just promoting them to simplify our tablegen files.

llvm-svn: 226945
2015-01-23 22:05:45 +00:00
Jan Vesely
042d995634 R600: Try to use lower types for 64bit division if possible
v2: add and enable tests for SI

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226881
2015-01-22 23:42:43 +00:00
Jan Vesely
ef787763d5 R600: Simplify LowerUDIVREM
optimizations can handle removing the Hi part operations.
The generated code is identical for R600, ~10% icount reduction for SI

v2: rebase

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226879
2015-01-22 23:42:39 +00:00
Matt Arsenault
3c5cd0f801 R600/SI: Custom lower fround
This fixes it for SI. It also removes the pattern
used previously for Evergreen for f32. I'm not sure
if the the new R600 output is better or not, but it uses
1 fewer instructions if BFI is available.

llvm-svn: 226682
2015-01-21 18:18:25 +00:00
Matt Arsenault
22a9f67443 Implement new way of expanding extloads.
Now that the source and destination types can be specified,
allow doing an expansion that doesn't use an EXTLOAD of the
result type. Try to do a legal extload to an intermediate type
and extend that if possible.

This generalizes the special case custom lowering of extloads
R600 has been using to work around this problem.

This also happens to fix a bug that would incorrectly use more
aligned loads than should be used.

llvm-svn: 225925
2015-01-14 01:35:17 +00:00
Matt Arsenault
fb153e33c0 R600: Implement getRecipEstimate
This requires a new hook to prevent expanding sqrt in terms
of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are
all the same rate, so this expansion would just double the number
of instructions and cycles.

llvm-svn: 225828
2015-01-13 20:53:23 +00:00
Matt Arsenault
4cc5aca737 R600: Implement getRsqrtEstimate
Only do for f32 since I'm unclear on both what this is expecting
for the refinement steps in terms of accuracy, and what
f64 instruction actually provides.

llvm-svn: 225827
2015-01-13 20:53:18 +00:00
Matt Arsenault
ef34769c29 R600: Make cttz / ctlz cheap to speculate
Speculating things is generally good. SI+ has instructions for these
for 32-bit values. This is still probably better even with the expansion
for 64-bit values, although it is odd that this callback doesn't have
the size as a parameter.

llvm-svn: 225822
2015-01-13 19:46:48 +00:00
Ahmed Bougacha
4150499cd1 [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532

llvm-svn: 225421
2015-01-08 00:51:32 +00:00
Matt Arsenault
08086327f3 R600/SI: Add class intrinsic
llvm-svn: 225305
2015-01-06 23:00:37 +00:00
Matt Arsenault
47c2c0f05a R600: Remove outdated comment
llvm-svn: 224648
2014-12-19 23:29:13 +00:00
Matt Arsenault
a3a9080dce R600/SI: Only form min/max with 1 use.
If the condition is used for something else, this increases
the number of instructions.

llvm-svn: 224646
2014-12-19 23:15:30 +00:00
Matt Arsenault
c1a6f36235 R600: Fix min/max matching problems with unordered compares
The returned operand needs to be permuted for the unordered
compares. Also fix incorrectly producing fmin_legacy / fmax_legacy
for f64, which don't exist.

llvm-svn: 224094
2014-12-12 02:30:37 +00:00
Matt Arsenault
b0274833dd Add target hook for whether it is profitable to reduce load widths
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.

llvm-svn: 224084
2014-12-12 00:00:24 +00:00
Marek Olsak
15875571f6 R600/SI: Update instruction conversions for VI
There are 3 changes:
- Convert 32-bit S_LSHL/LSHR/ASHR to their V_*REV variants for VI
- Lower RSQ_CLAMP for VI
- Don't generate MIN/MAX_LEGACY on VI

llvm-svn: 223604
2014-12-07 12:19:03 +00:00
Matt Arsenault
796e0c24e7 R600/SI: Use ZeroOrNegativeOneBooleanContent
This sort of doesn't matter since the setcc type is i1, but
this previously was using the default UndefinedBooleanContent. This
makes it more consistent with R600. This enables more optimizations
which typically give up on UndefinedBooleanContent. For example,
there is already a special case target DAG combine for
setcc + sext which can be eliminated in favor of what the generic
DAG combiner can do if it assumes boolean values are sign extended.
Since -1 is an inline immediate, using it is basically free and the
backend already uses it when a boolean value is needed in a wider type.

llvm-svn: 222850
2014-11-26 21:23:15 +00:00
Matt Arsenault
417f5ceb20 R600: Fix assert on copy of an i1 on pre-SI
i1 is not a legal type on Evergreen, so this combine proceeded
and tried to produce a bitcast between i1 and i8.

llvm-svn: 222630
2014-11-23 02:57:52 +00:00
Matt Arsenault
1298bf6e9c R600: Permute operands when selecting legacy min/max
This gets the correct NaN behavior based on the compare type
the hardware uses. This now passes the new piglit test I have
for this on SI.

Add stricter tests for the operand order.

llvm-svn: 222079
2014-11-15 05:02:57 +00:00
Tom Stellard
d8a0a4cc2b R600: Fix 64-bit integer division
This fixes a failure in one of the oclconform tests.

Patch by: Jan Vesely

llvm-svn: 222073
2014-11-15 01:07:57 +00:00
Tom Stellard
573a5f6172 R600: Factor i64 UDIVREM lowering into its own fuction
This is so it could potentially be used by SI.  However, the current
implementation does not always produce correct results, so the
IntegerDivisionPass is being used instead.

llvm-svn: 222072
2014-11-15 01:07:53 +00:00
Matt Arsenault
633ce0ecd7 R600/SI: Combine min3/max3 instructions
llvm-svn: 222032
2014-11-14 20:08:52 +00:00
Matt Arsenault
346bdd92b1 R600/SI: Match integer min / max instructions
llvm-svn: 222015
2014-11-14 18:30:06 +00:00
Matt Arsenault
08233590d4 R600/SI: Fix fmin_legacy / fmax_legacy matching for SI
select_cc is expanded on SI, so this was never matched.

llvm-svn: 221941
2014-11-13 23:03:09 +00:00
Aditya Nandakumar
4d9c1ff994 We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed.
llvm-svn: 221926
2014-11-13 21:29:21 +00:00
Matt Arsenault
8f277a520f R600: Error on initializer for LDS.
Also give a proper error for other address spaces.

llvm-svn: 221917
2014-11-13 19:56:13 +00:00
Aditya Nandakumar
b93fb292df This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively
llvm-svn: 221878
2014-11-13 09:26:31 +00:00
Matt Arsenault
2257f6b589 Add minnum / maxnum codegen
llvm-svn: 220342
2014-10-21 23:01:01 +00:00
Matt Arsenault
98d33a4281 R600/SI: Add missing parameter to div_fmas intrinsic
llvm-svn: 220338
2014-10-21 22:20:55 +00:00
Matt Arsenault
75125bd463 R600: Fix nonsensical implementation of computeKnownBits for BFE
This was resulting in invalid simplifications of sdiv

llvm-svn: 219953
2014-10-16 20:07:40 +00:00
Matt Arsenault
c79fc2137a R600: Remove dead function
llvm-svn: 219879
2014-10-16 00:08:09 +00:00
Matt Arsenault
302179d41a R600: Remove unnecessary part of computeKnownBitsForTargetNode
Zero-width BFEs are combined away already, so there's no point in
handling them.

llvm-svn: 219868
2014-10-15 23:37:49 +00:00
Matt Arsenault
03564ece92 Move variable down to use
llvm-svn: 219867
2014-10-15 23:37:42 +00:00
Matt Arsenault
1d906ecdac R600: Fix miscompiles when BFE has multiple uses
SimplifyDemandedBits would break the other uses of the operand.

llvm-svn: 219819
2014-10-15 17:58:34 +00:00
Matt Arsenault
cb725dcde9 R600: Use existing variable
llvm-svn: 219778
2014-10-15 05:07:00 +00:00
Matt Arsenault
04b9e0240c R600: Remove outdated comment
llvm-svn: 219777
2014-10-15 05:06:57 +00:00
Matt Arsenault
c421684bad R600/SI: Custom lower f64 -> i64 conversions
llvm-svn: 219038
2014-10-03 23:54:56 +00:00
Matt Arsenault
7b24655980 R600: Custom lower [s|u]int_to_fp for i64 -> f64
llvm-svn: 219037
2014-10-03 23:54:41 +00:00
Matt Arsenault
2456242394 R600/SI: Fix ftrunc f64 conformance failures.
Re-add the tests since they were deleted at some point

llvm-svn: 219036
2014-10-03 23:54:27 +00:00
Matt Arsenault
c05b3987ac R600/SI: Add a note about the order of the operands to div_scale
llvm-svn: 218534
2014-09-26 17:55:09 +00:00
Tom Stellard
8c1a958993 R600: Don't set BypassSlowDiv for 64-bit division
BypassSlowDiv is used by codegen prepare to insert a run-time
check to see if the operands to a 64-bit division are really 32-bit
values and if they are it will do 32-bit division instead.

This is not useful for R600, which has predicated control flow since
both the 32-bit and 64-bit paths will be executed in most cases.  It
also increases code size which can lead to more instruction cache
misses.

llvm-svn: 218252
2014-09-22 15:35:32 +00:00
Tom Stellard
14b4dc8502 R600/SI: Use ISD::MUL instead of ISD::UMULO when lowering division
ISD::MUL and ISD:UMULO are the same except that UMULO sets an overflow
bit.  Since we aren't using the overflow bit, we should use ISD::MUL.

llvm-svn: 218251
2014-09-22 15:35:30 +00:00
Matt Arsenault
114b472f2c R600: Better fix for bug 20982
Just do the left shift as unsigned to avoid the UB.

llvm-svn: 218092
2014-09-19 00:42:06 +00:00