1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 21:13:02 +02:00
Commit Graph

234 Commits

Author SHA1 Message Date
Justin Lebar
5e057b68bf [NVPTX] Support global variables of integer type larger than i64.
Reviewers: tra, majnemer

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28825

llvm-svn: 292316
2017-01-18 00:29:53 +00:00
Justin Lebar
9ce9617cca [NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.
Summary:
This change also lets us use max.{s,u}16.  There's a vague warning in a
test about this maybe being less efficient, but I could not come up with
a case where the resulting SASS (sm_35 or sm_60) was different with or
without max.{s,u}16.  It's true that nvcc seems to emit only
max.{s,u}32, but even ptxas 7.0 seems to have no problem generating
efficient SASS from max.{s,u}16 (the casts up to i32 and back down to
i16 seem to be implicit and nops, happening via register aliasing).

In the absence of evidence, better to have fewer special cases, emit
more straightforward code, etc.  In particular, if a new GPU has 16-bit
min/max instructions, we want to be able to use them.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28732

llvm-svn: 292304
2017-01-18 00:09:01 +00:00
Justin Lebar
2af1dc95bf [NVPTX] Lower integer absolute value idiom to abs instruction.
Summary: Previously we lowered it literally, to shifts and xors.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28722

llvm-svn: 292303
2017-01-18 00:08:44 +00:00
Justin Lebar
8dc6f0597a [NVPTX] Improve lowering of llvm.ctpop.
Summary:
Avoid an unnecessary conversion operation when using the result of
ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction
we run returns an i32.

(Previously if we used the value as an i32, we'd do an unnecessary
zext+trunc.)

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28721

llvm-svn: 292302
2017-01-18 00:08:27 +00:00
Justin Lebar
5831e57771 [NVPTX] Add lowering for llvm.bitreverse.
Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28720

llvm-svn: 292301
2017-01-18 00:08:10 +00:00
Justin Lebar
c602aa9145 [NVPTX] Fix function names in ctlz.ll test. Test-only change.
Looks like a copy/paste mistake, all the functions in ctlz.ll were named
"ctpop".

llvm-svn: 292300
2017-01-18 00:07:52 +00:00
Justin Lebar
19b8b1b408 [NVPTX] Improve lowering of llvm.ctlz.
Summary:
* Disable "ctlz speculation", which inserts a branch on every ctlz(x) which
  has defined behavior on x == 0 to check whether x is, in fact zero.

* Add DAG patterns that avoid re-truncating or re-expanding the result
  of the 16- and 64-bit ctz instructions.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28719

llvm-svn: 292299
2017-01-18 00:07:35 +00:00
Justin Lebar
41e4dfb4fc [NVPTX] Add fptosi tests to convert-fp.ll.
These seem to have been left off by accident.

llvm-svn: 292071
2017-01-15 16:55:54 +00:00
Justin Lebar
2166b8131c [NVPTX] Add codegen tests for llvm.fma.
llvm-svn: 292070
2017-01-15 16:55:37 +00:00
Justin Lebar
28196eb709 [NVPTX] Modernize intrinics.ll test.
llvm-svn: 292069
2017-01-15 16:54:57 +00:00
Justin Lebar
b814633873 [NVPTX] Let there be One True Way to set NVVMReflect params.
Summary:
Previously there were three ways to inform the NVVMReflect pass whether
you wanted to flush denormals to zero:

  * An LLVM command-line option
  * Parameters to the NVVMReflect constructor
  * Metadata on the module itself.

This change removes the first two, leaving only the third.

The motivation for this change, aside from simplifying things, is that
we want LLVM to be aware of whether it's operating in FTZ mode, so other
passes can use this information.  Ideally we'd have a target-generic
piece of metadata on the module.  This change moves us in that
direction.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28700

llvm-svn: 292068
2017-01-15 16:54:35 +00:00
Artem Belevich
f89a861f79 [NVPTX] Added support for half-precision floating point.
Only scalar half-precision operations are supported at the moment.

- Adds general support for 'half' type in NVPTX.
- fp16 math operations are supported on sm_53+ GPUs only
  (can be disabled with --nvptx-no-f16-math).
- Type conversions to/from fp16 are supported on all GPU variants.
- On GPU variants that do not have full fp16 support (or if it's disabled),
  fp16 operations are promoted to fp32 and results are converted back
  to fp16 for storage.

Differential Revision: https://reviews.llvm.org/D28540

llvm-svn: 291956
2017-01-13 20:56:17 +00:00
Artem Belevich
f0d4a0534b [NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.
Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32
instruction even when unsafe FP math was not allowed.

Clang-generated IR is not affected by this as it uses precise sin/cos
from CUDA's libdevice when unsafe math is disabled.

Differential Revision: https://reviews.llvm.org/D28619

llvm-svn: 291936
2017-01-13 18:48:13 +00:00
Justin Lebar
e3fb38059f [TM] Restore default TargetOptions in TargetMachine::resetTargetOptions.
Summary:
Previously if you had

 * a function with the fast-math-enabled attr, followed by
 * a function without the fast-math attr,

the second function would inherit the first function's fast-math-ness.

This means that mixing fast-math and non-fast-math functions in a module
was completely broken unless you explicitly annotated every
non-fast-math function with "unsafe-fp-math"="false".  This appears to
have been broken since r176986 (March 2013), when the resetTargetOptions
function was introduced.

This patch tests the correct behavior as best we can.  I don't think I
can test FPDenormalMode and NoTrappingFPMath, because they aren't used
in any backends during function lowering.  Surprisingly, I also can't
find any uses at all of LessPreciseFPMAD affecting generated code.

The NVPTX/fast-math.ll test changes are an expected result of fixing
this bug.  When FMA is disabled, we emit add as "add.rn.f32", which
prevents fma combining.  Before this patch, fast-math was enabled in all
functions following the one which explicitly enabled it on itself, so we
were emitting plain "add.f32" where we should have generated
"add.rn.f32".

Reviewers: mkuper

Subscribers: hfinkel, majnemer, jholewinski, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28507

llvm-svn: 291618
2017-01-10 23:43:04 +00:00
Justin Lebar
4dd3b74fa6 [NVPTX] Add CHECK-LABEL where appropriate to fast-math.ll test.
Also fix up whitespace.

Test-only change.

llvm-svn: 291617
2017-01-10 23:42:46 +00:00
David Majnemer
f0ed7ae0f8 [SelectionDAG] Correctly transform range metadata to AssertZExt
We used the logBase2 of the high instead of the ceilLogBase2 resulting
in the wrong result for certain values.  For example, it resulted in an
i1 AssertZExt when the exclusive portion of the range was 3.

llvm-svn: 291196
2017-01-06 00:11:46 +00:00
David Majnemer
8347bff07a [NVVMIntrRange] Only set range metadata if none is already present
The range metadata inserted by NVVMIntrRange is pessimistic, range
metadata already present could be more precise.

llvm-svn: 290294
2016-12-22 00:51:59 +00:00
Adrian Prantl
cf7e846d11 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
2016-12-20 02:09:43 +00:00
Adrian Prantl
0ab6669d6d Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Adrian Prantl
2345112c5b [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Adrian Prantl
daf4fef1f9 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot berakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Adrian Prantl
0eee52640f [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Justin Lebar
5f4293f5f4 Whitespace cleanup in test/CodeGen/NVPTX/annotations.ll.
llvm-svn: 289730
2016-12-14 22:32:55 +00:00
Justin Lebar
8506de7ef9 [NVPTX] Support .maxnreg annotation.
Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D27638

llvm-svn: 289729
2016-12-14 22:32:50 +00:00
Justin Lebar
63fe7a59a1 [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass.  We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.

Reviewers: tra

Subscribers: llvm-commits, jholewinski, jingyue, mgorny

Differential Revision: https://reviews.llvm.org/D26165

llvm-svn: 285642
2016-10-31 21:51:42 +00:00
Justin Lebar
50ae991af5 [NVPTX] Compute 'rem' using the result of 'div', if possible.
Summary:
In isel, transform

  Num % Den

into

  Num - (Num / Den) * Den

if the result of Num / Den is already available.

Reviewers: tra

Subscribers: hfinkel, llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26090

llvm-svn: 285461
2016-10-28 21:44:00 +00:00
Artem Belevich
ed0bd7024b [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.
These are only available on sm_60+ GPUs.

Differential Revision: https://reviews.llvm.org/D24943

llvm-svn: 282607
2016-09-28 17:25:38 +00:00
NAKAMURA Takumi
675f769e36 llvm/test/CodeGen/NVPTX/zero-cs.ll: Relax an expression to match in -Asserts.
LLVM ERROR: Cannot select: 0x3607bf0: i32 = ExternalSymbol'__powidf2'

llvm-svn: 282053
2016-09-21 04:43:11 +00:00
Jacques Pienaar
92806722a6 [NVPTX] Check if callsite is defined when computing argument allignment
Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before.

Reviewers: eliben, jpienaar

Subscribers: jlebar, vchuravy, cfe-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D9168

llvm-svn: 282045
2016-09-21 01:57:57 +00:00
Peter Collingbourne
a9fbb2813b DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.

Fixes PR30362. A program for upgrading test cases is attached to that
bug.

Differential Revision: http://reviews.llvm.org/D20147

llvm-svn: 281284
2016-09-13 01:12:59 +00:00
Justin Lebar
1c4f697044 [NVPTX] Use ldg for explicitly invariant loads.
Summary:
With this change (plus some changes to prevent !invariant from being
clobbered within llvm), clang will be able to model the __ldg CUDA
builtin as an invariant load, rather than as a target-specific llvm
intrinsic.  This will let the optimizer play with these loads --
specifically, we should be able to vectorize them in the load-store
vectorizer.

Reviewers: tra

Subscribers: jholewinski, hfinkel, llvm-commits, chandlerc

Differential Revision: https://reviews.llvm.org/D23477

llvm-svn: 281152
2016-09-11 01:39:04 +00:00
Justin Lebar
720b561453 [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.
Summary:
Previously these only worked via NVPTX-specific intrinsics.

This change will allow us to convert these target-specific intrinsics
into the general LLVM versions, allowing existing LLVM passes to reason
about their behavior.

It also gets us some minor codegen improvements as-is, from situations
where we canonicalize code into one of these llvm intrinsics.

Reviewers: majnemer

Subscribers: llvm-commits, jholewinski, tra

Differential Revision: https://reviews.llvm.org/D24300

llvm-svn: 281092
2016-09-09 21:07:26 +00:00
Justin Lebar
e64219b8ab [NVPTX] Switch nvptx-use-infer-addrspace to true.
Summary:
This switches us to use a different, more powerful algorithm for address
space inference.  I've tested this locally and it seems to work great.
Once we're more confident in it, we can remove the old pass altogether.

Reviewers: jingyue

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D23694

llvm-svn: 279317
2016-08-19 20:46:45 +00:00
Artem Belevich
a2d7608fe7 [NVPTX] Use untyped (.b) integer registers in PTX.
This bring LLVM-generated PTX closer to what nvcc generates and avoids
triggering issues in ptxas.

For instance, ptxas does not accept .s16 (or .u16) registers as operands
for .fp16 instructions.

Differential Revision: https://reviews.llvm.org/D23460

llvm-svn: 278568
2016-08-12 22:02:19 +00:00
Artem Belevich
633861baf7 [NVPTX] remove unnecessary named metadata update that happens to break debug info.
Also added test case to verify IR changes done by NVPTXGenericToNVVM pass.

Differential Revision: https://reviews.llvm.org/D22837

llvm-svn: 277520
2016-08-02 20:58:24 +00:00
Justin Lebar
8d43f89f13 Fix NVPTX/call-with-alloca-buffer.ll after r276777.
r276777 makes InstSimplify stronger, letting it see through some
unnecessary addrspace casts.

llvm-svn: 276786
2016-07-26 18:28:33 +00:00
Justin Lebar
63ae2eb95c [NVPTX] Enable the load-store vectorizer on nvptx.
Reviewers: tra

Subscribers: jholewinski, arsenm, asbirlea

Differential Revision: https://reviews.llvm.org/D22592

llvm-svn: 276196
2016-07-20 22:11:36 +00:00
Artem Belevich
5fd5640c49 [NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC.
After r276153 the pass applies to both kernels and regular functions.

Differential Revision: https://reviews.llvm.org/D22583

llvm-svn: 276189
2016-07-20 21:44:07 +00:00
Artem Belevich
7eddb3f699 [NVPTX] deal with all aggregate return types.
Fixes a crash in llvm_unreachable when a function has array return type.

Differential Revision: https://reviews.llvm.org/D22524

llvm-svn: 276154
2016-07-20 18:39:52 +00:00
Artem Belevich
6da91c5086 [NVPTX] Improve lowering of byval args of device functions.
Avoid unnecessary spills of byval arguments of device functions to
local space on SASS level and subsequent pointer conversion to generic
address space that follows. Instead, make a local copy in IR, provide
a way to access arguments directly, and let LLVM optimize the copy away
when possible.

Differential Review: https://reviews.llvm.org/D21421

llvm-svn: 276153
2016-07-20 18:39:47 +00:00
Artem Belevich
c6852a2c64 [NVPTX] Make sure we adjust alignment at all call sites
.. including calls from kernel functions that were
ignored by mistake before.

llvm-svn: 275920
2016-07-18 21:58:48 +00:00
Artem Belevich
b6f334a5b3 [NVPTX] Force minimum alignment of 4 for byval arguments of device-side functions.
Taking address of a byval variable in PTX is legal, but currently runs
into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.

The change is a no-op on SASS level for sm_3x where ptxas already aligns
local copy by at least 4.

Differential Revision: https://reviews.llvm.org/D22428

llvm-svn: 275893
2016-07-18 19:54:56 +00:00
Justin Bogner
b7a198f7fd NVPTX: Remove the legacy ptx intrinsics
- Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but
  not all of these registers were already accessible via the nvvm
  name.
- Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0.

There's a fair amount of code motion here, but it's all very
mechanical.

llvm-svn: 274769
2016-07-07 16:40:17 +00:00
Justin Lebar
110a699867 [NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.
Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D22068

llvm-svn: 274674
2016-07-06 21:06:10 +00:00
Justin Bogner
18054ad909 NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0
Everywhere where cuda.syncthreads or __syncthreads is used, use the
properly namespaced nvvm.barrier0 instead.

llvm-svn: 274664
2016-07-06 20:02:45 +00:00
Artem Belevich
92c43be169 Revert r273313 "[NVPTX] Improve lowering of byval args of device functions."
The change causes llvm crash in some unoptimized builds.

llvm-svn: 274163
2016-06-29 20:51:15 +00:00
Justin Holewinski
7a9f76e68a Only emit extension for zeroext/signext arguments if type is < 32 bits
Reviewers: jingyue, jlebar

Subscribers: jholewinski

Differential Revision: http://reviews.llvm.org/D21756

llvm-svn: 273922
2016-06-27 20:22:22 +00:00
Artem Belevich
daaa866808 [NVPTX] Improve lowering of byval args of device functions.
Avoid unnecessary spills of such vars to local space on SASS level and
pointer space conversion.

Instead, make a local copy with appropriate addrspacecasts and let
LLVM optimize them away when possible.

This allows loading value of the argument using [symbol+offset]
instead of converting argument to general space pointer and using it
for indexing (which also implicitly converts param space pointer to
local space one on SASS level and triggers copying of argument into
local space in the process).

This reduces call overhead, uses less registers and reduces overall
SASS size by 2-4%.

Differential Review: http://reviews.llvm.org/D21421

llvm-svn: 273313
2016-06-21 20:30:26 +00:00
Justin Lebar
05e13f8eda [NVPTX] Add intrinsics for shfl instructions.
Summary:
Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers.  Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D21160

llvm-svn: 272298
2016-06-09 20:04:08 +00:00
Artem Belevich
f14e0f0a96 [NVPTX] Added NVVMIntrRange pass
NVVMIntrRange adds !range metadata to calls of NVVM intrinsics
that return values within known limited range.

This allows LLVM to generate optimal code for indexing arrays
based on tid/ctaid which is a frequently used pattern in CUDA code.

Differential Revision: http://reviews.llvm.org/D20644

llvm-svn: 270872
2016-05-26 17:02:56 +00:00