1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00
Commit Graph

56 Commits

Author SHA1 Message Date
Evandro Menezes
7a30a0c01d [DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself.  However, the original code did not properly consider this condition
if returned by a target.  This patch addresses the issues to allow a target
to compute the series on its own.

Differential revision: https://reviews.llvm.org/D22975

llvm-svn: 286523
2016-11-10 23:31:06 +00:00
Konstantin Zhuravlyov
f9b60c4ef0 [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32
This will prevent following regression when enabling i16 support (D18049):

test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll

Differential Revision: https://reviews.llvm.org/D25802

llvm-svn: 285716
2016-11-01 17:49:33 +00:00
Tom Stellard
f13abceda7 AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.

The only way to make this work as a target independent would be to add logic
to target's TargetLowering construction to mark theses nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot.  I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.

Reviewers: bogner, ab, t.p.northover, arsenm

Subscribers: wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25999

llvm-svn: 285704
2016-11-01 16:31:48 +00:00
Sanjay Patel
0e75d706d6 [Target] remove TargetRecip class; 2nd try
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284746
2016-10-20 16:55:45 +00:00
Sanjay Patel
f5a046ca77 revert r284495: [Target] remove TargetRecip class
There's something wrong with the StringRef usage while parsing the attribute string.

llvm-svn: 284513
2016-10-18 18:36:49 +00:00
Sanjay Patel
399c026728 [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284495
2016-10-18 17:05:05 +00:00
Tom Stellard
d9bf037744 AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument.  The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.

This consolidates all the logic AMDGPU uses for deducing memory types into a single
function.  This will make it much easier to support different ABIs in the future.

Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24614

llvm-svn: 281781
2016-09-16 21:53:00 +00:00
Matt Arsenault
9249507611 AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.

llvm-svn: 281488
2016-09-14 15:19:03 +00:00
Jan Vesely
3b720aa486 AMDGPU/R600: Remove MergeVectorStores from legalization
This is handled by DAGCombiner in a more generic way

Differential Revision: https://reviews.llvm.org/D23970

llvm-svn: 280019
2016-08-29 22:05:06 +00:00
Matt Arsenault
d3892e17d5 AMDGPU: Select mulhi 24-bit instructions
llvm-svn: 279902
2016-08-27 01:32:27 +00:00
Nikolai Bozhenov
2540ce6c57 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
Wei Ding
ce5d57c9f9 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

llvm-svn: 276998
2016-07-28 16:42:13 +00:00
Matt Arsenault
d60a4f902f AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

llvm-svn: 276764
2016-07-26 16:45:45 +00:00
Matt Arsenault
842a619c4a AMDGPU: Delete dead code
llvm-svn: 276675
2016-07-25 19:06:25 +00:00
Matt Arsenault
bc2c9c63b9 AMDGPU: Delete dead code
This has been dead since r269479

llvm-svn: 276518
2016-07-23 07:07:14 +00:00
Matt Arsenault
86ee9f5013 AMDGPU: Only use legal inline immediates with kill pseudo
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.

These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.

llvm-svn: 275988
2016-07-19 16:27:56 +00:00
Matt Arsenault
f3c657912f AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
llvm-svn: 275871
2016-07-18 18:35:05 +00:00
Matt Arsenault
a21ee6967c AMDGPU: Remove dead code
llvm-svn: 275369
2016-07-14 05:23:08 +00:00
Matt Arsenault
fcc3666c3b AMDGPU: Expand unaligned accesses early
Due to visit order problems, in the case of an unaligned copy
the legalized DAG fails to eliminate extra instructions introduced
by the expansion of both unaligned parts.

llvm-svn: 274397
2016-07-01 22:55:55 +00:00
Matt Arsenault
14dbe93675 AMDGPU: Improve load/store of illegal types.
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.

We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.

llvm-svn: 274394
2016-07-01 22:47:50 +00:00
Matt Arsenault
8603948f83 AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

llvm-svn: 273652
2016-06-24 06:30:11 +00:00
Matt Arsenault
66eec46266 AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

llvm-svn: 273467
2016-06-22 20:15:28 +00:00
Jan Vesely
c245f58037 AMDGPU: Add implicitarg.ptr intrinsic.
Points to the start of implicit arguments (appended after explicit arguments)

Differential Revision: http://reviews.llvm.org/D20297

llvm-svn: 273317
2016-06-21 20:46:20 +00:00
Tom Stellard
98822b77c0 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4 byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Re-commit this after fixing a bug where we were trying to use a
reference to a Triple object that had already been destroyed.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272705
2016-06-14 20:29:59 +00:00
Tom Stellard
de6d6f0171 Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables"
This reverts commit r272675.

llvm-svn: 272677
2016-06-14 15:16:35 +00:00
Tom Stellard
b56b2a98e2 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4 byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272675
2016-06-14 15:11:01 +00:00
Benjamin Kramer
e80783f62f Pass DebugLoc and SDLoc by const ref.
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.

llvm-svn: 272512
2016-06-12 15:39:02 +00:00
Matt Arsenault
6e839f3775 AMDGPU: Remove custom load/store scalarization
llvm-svn: 266385
2016-04-14 23:31:26 +00:00
Matt Arsenault
f100af354c AMDGPU: Add atomic_inc + atomic_dec intrinsics
These are different than atomicrmw add 1 because they have
an additional input value to clamp the result.

llvm-svn: 266074
2016-04-12 14:05:04 +00:00
Tom Stellard
07e93d8663 AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2}
Summary:
Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+.

32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.

Patch by: Vedran Miletić

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: jvesely, scchan, kanarayan, arsenm

Differential Revision: http://reviews.llvm.org/D17280

llvm-svn: 265170
2016-04-01 18:27:37 +00:00
Matt Arsenault
0c86db8f5c AMDGPU: R600 code splitting cleanup
Move a few functions only used by R600 to R600 specific code,
fix header macros to stop using R600, mark classes as final.

llvm-svn: 263204
2016-03-11 08:00:27 +00:00
Matt Arsenault
e24a001b35 AMDGPU: Move function only used by R600
llvm-svn: 262853
2016-03-07 21:10:13 +00:00
Matt Arsenault
4ff4c396c1 AMDGPU: Rename intrinsic to better match instruction name
Also fixes missing f32 test.

llvm-svn: 260780
2016-02-13 01:03:00 +00:00
Matt Arsenault
cdd789021d AMDGPU: Split R600 and SI store lowering
These were only sharing some somewhat incorrect
logic for when to scalarize or split vectors.

llvm-svn: 260490
2016-02-11 05:32:46 +00:00
Matt Arsenault
2c8154f002 AMDGPU: Split R600 and SI load lowering
These weren't actually sharing anything in the common
LowerLOAD.

llvm-svn: 260398
2016-02-10 18:21:39 +00:00
Matt Arsenault
28a667546a AMDGPU: Match some med3 patterns
llvm-svn: 259089
2016-01-28 20:53:42 +00:00
Matt Arsenault
fe8ee22547 AMDGPU: Remove more unused intrinsics
Replace tests with lrp with basic IR expansion

llvm-svn: 258612
2016-01-23 05:42:38 +00:00
Matt Arsenault
c463ea6063 AMDGPU: Remove abs intrinsic
llvm-svn: 258343
2016-01-20 20:58:29 +00:00
Matt Arsenault
348623d27f AMDGPU: Reduce 64-bit SRAs
llvm-svn: 258096
2016-01-18 22:09:04 +00:00
Matt Arsenault
862bf93c73 AMDGPU: Split 64-bit and of constant up
This breaks the tests that were meant for testing
64-bit inline immediates, so move those to shl where
they won't be broken up.

This should be repeated for the other related bit ops.

llvm-svn: 258095
2016-01-18 22:01:13 +00:00
Matt Arsenault
e1a6e6ae7f AMDGPU: Reduce 64-bit lshr by constant to 32-bit
64-bit shifts are very slow on some subtargets.

llvm-svn: 258090
2016-01-18 21:43:36 +00:00
Marek Olsak
2b92f6022f AMDGPU/SI: Add support for non-void functions
Summary:
Return values can be stored in SGPRs (i32) and VGPRs (f32).

This will be used by functions which expect some bytecode or other binary to
be appended at the end. It allows defining in which registers the return
values will be stored.

v2: don't do this for compute shaders

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16033

llvm-svn: 257621
2016-01-13 17:23:04 +00:00
Matt Arsenault
e2033b4ea1 AMDGPU: Implement {{s|u}}int_to_fp i64 -> f32
The old lowering for uint_to_fp failed opencl conformance.
It might be OK for fast math mode, but I'm not sure.

llvm-svn: 257393
2016-01-11 22:01:48 +00:00
Matt Arsenault
b88ff2e112 AMDGPU: Pattern match ffbh pattern to instruction.
The hardware instruction's output on 0 is -1 rather than 32.
Eliminate a test and select to -1. This removes an extra instruction
from the compatability function with HSAIL's firstbit instruction.

llvm-svn: 257352
2016-01-11 17:02:00 +00:00
Matt Arsenault
9caa943926 AMDGPU: Custom lower i64 ctlz
llvm-svn: 257348
2016-01-11 16:50:29 +00:00
Matt Arsenault
3ca8d64e75 AMDGPU: Use generic bitreverse intrinsic
Also fix bug in vector legalization for bitreverse.

llvm-svn: 255512
2015-12-14 17:25:38 +00:00
Matt Arsenault
85dd075020 DAGCombiner: Combine extract_vector_elt from build_vector
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.

InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.

llvm-svn: 250129
2015-10-12 23:59:50 +00:00
Matt Arsenault
9dd8b4ce6c AMDGPU: Produce error on dynamic_stackalloc
llvm-svn: 246048
2015-08-26 18:37:13 +00:00
Simon Pilgrim
3d6f76a00a [AMDGPU] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the AMDGPU implementation
D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect.

Differential Revision: http://reviews.llvm.org/D12007

llvm-svn: 244960
2015-08-13 21:40:02 +00:00
Matt Arsenault
27b16f9fd2 AMDGPU: Fix return type of getImplicitParameterOffset.
Patch by Zoltan Gilian <zoltan.gilian@gmail.com>

llvm-svn: 243459
2015-07-28 18:09:55 +00:00