1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 03:23:01 +02:00
Commit Graph

178 Commits

Author SHA1 Message Date
Jingyue Wu
2526430a5c [NVPTX] Adds a new address space inference pass.
Summary:
The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable
to convert the address space of a pointer induction variable. This patch adds a
new pass called NVPTXInferAddressSpaces that overcomes that limitation using a
fixed-point data-flow analysis (see the file header comments for details).

The new pass is experimental and not enabled by default. Users can turn
it on by setting the -nvptx-use-infer-addrspace flag of llc.

Reviewers: jholewinski, tra, jlebar

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D17965

llvm-svn: 263916
2016-03-20 20:59:20 +00:00
Justin Lebar
881bba8e1c [NVPTX] Use different, convergent MIs for convergent calls.
Summary:
Calls sometimes need to be convergent.  This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.

Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs.  But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.

Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.

Reviewers: jholewinski

Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits

Differential Revision: http://reviews.llvm.org/D17423

llvm-svn: 262373
2016-03-01 19:24:03 +00:00
Justin Lebar
3251647c03 Don't tail-duplicate blocks that contain convergent instructions.
Summary:
Convergent instrs shouldn't be made control-dependent on other values,
but this is basically the whole point of tail duplication.  So just bail
if we see a convergent instruction.

Reviewers: iteratee

Subscribers: jholewinski, jhen, hfinkel, tra, jingyue, llvm-commits

Differential Revision: http://reviews.llvm.org/D17320

llvm-svn: 261540
2016-02-22 17:50:52 +00:00
Justin Lebar
ab72baa618 [NVPTX] Test that MachineSink won't sink across llvm.cuda.syncthreads.
Summary:
The syncthreads MI is modeled as mayread/maywrite -- convergence doesn't
even come into play here.  Nonetheless this property is highly implicit
in the tablegen files, so a test seems appropriate.

Reviewers: jingyue

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D17319

llvm-svn: 261114
2016-02-17 17:46:52 +00:00
Justin Lebar
f89d876f55 [NVPTX] Annotate call machine instructions as calls.
Summary:
Otherwise we'll try to do unsafe optimizations on these MIs, such as
sinking loads below calls.

(I suspect that this is not the only bug in the NVPTX instruction
tablegen files; I need to comb through them.)

Reviewers: jholewinski, tra

Subscribers: jingyue, jhen, llvm-commits

Differential Revision: http://reviews.llvm.org/D17315

llvm-svn: 261113
2016-02-17 17:46:50 +00:00
NAKAMURA Takumi
2c16d4bde8 llvm/test/CodeGen/NVPTX/debug-file-loc.ll: Tweak expressions for dos path.
llvm-svn: 260623
2016-02-11 23:59:43 +00:00
Artem Belevich
672363ee16 [NVPTX] emit .file directives for files referenced by subprograms.
.. so .loc directives referring to those files work correctly.

Differential Revision: http://reviews.llvm.org/D17086

llvm-svn: 260557
2016-02-11 18:21:47 +00:00
Jingyue Wu
bb54579422 [NVPTX] Disable performance optimizations when OptLevel==None
Reviewers: jholewinski, tra, eliben

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D16874

llvm-svn: 259749
2016-02-04 04:15:36 +00:00
Justin Lebar
e107deb822 [CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.
Summary: Previously we'd just silently skip these.

Reviewers: tra, jholewinski

Subscribers: llvm-commits, jhen, echristo,

Differential Revision: http://reviews.llvm.org/D16739

llvm-svn: 259279
2016-01-30 01:07:38 +00:00
Justin Lebar
67acdea900 [CUDA] Die gracefully when trying to output an LLVM alias.
Summary:
Previously, we would just output "foo = bar" in the assembly, and then
ptxas would choke.  Now we die before emitting any invalid code.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, jhen, tra

Differential Revision: http://reviews.llvm.org/D16490

llvm-svn: 258638
2016-01-23 21:12:20 +00:00
Jingyue Wu
90a1a65026 [NVPTX] expand mul_lohi to mul_lo and mul_hi
Summary: Fixes PR26186.

Reviewers: grosser, jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D16479

llvm-svn: 258536
2016-01-22 19:47:26 +00:00
Artem Belevich
b31d5d18b5 [NVPTX] Do not emit .hidden or .protected directives as they are not allowed by PTX.
llvm-svn: 257961
2016-01-15 23:57:53 +00:00
Rafael Espindola
9cb8841b77 Have a single way for creating unique value names.
We had two code paths. One would create names like "foo.1" and the other
names like "foo1".

For globals it is important to use "foo.1" to help C++ name demangling.
For locals there is no strong reason to go one way or the other so I
kept the most common mangling (foo1).

llvm-svn: 253804
2015-11-22 00:16:24 +00:00
Pete Cooper
b753649d63 Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Pete Cooper
aca4c5cdc6 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Bjarke Hammersholt Roune
28d5088cb1 [NVPTX] Let NVPTX backend detect integer min and max patterns.
Summary:
Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support.


Reviewers: jholewinski, meheff, jingyue

Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D12377

llvm-svn: 246107
2015-08-26 23:22:02 +00:00
Jingyue Wu
2549889c79 [NVPTX] Allow undef value as global initializer
Summary:
__shared__ variable may now emit undef value as initializer, do not
throw error on that.

Test Plan: test/CodeGen/NVPTX/global-addrspace.ll

Patch by Xuetian Weng

Reviewers: jholewinski, tra, jingyue

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D12242

llvm-svn: 245785
2015-08-22 05:40:26 +00:00
Mark Heffernan
030525f5bf Use 32-bit divides instead of 64-bit divides where possible.
For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor
fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason
is that many index computations are carried out in 64-bits but never actually exceed the
capacity of a 32-bit word.

llvm-svn: 244684
2015-08-11 22:16:34 +00:00
Jingyue Wu
41c8aefa76 SelectionDAG: Prefer to combine multiplication with less uses for fma
Summary:
For example:

  s6 = s0*s5;
  s2 = s6*s6 + s6;
  ...
  s4 = s6*s3;

We notice that it is possible for s2 is folded to fma (s0, s5, fmul (s6 s6)).
This only happens when Aggressive is true, otherwise hasOneUse() check
already prevents from folding the multiplication with more uses.

Test Plan: test/CodeGen/NVPTX/fma-assoc.ll

Patch by Xuetian Weng

Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm

Subscribers: arsenm, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D11855

llvm-svn: 244649
2015-08-11 19:21:46 +00:00
Jonathan Roelofs
3ac128b6b7 Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.

llvm-svn: 244481
2015-08-10 19:01:27 +00:00
Bjarke Hammersholt Roune
f7c0d778e2 [NVPTX] Use LDG for pointer induction variables.
More specifically, make NVPTXISelDAGToDAG able to emit cached loads (LDG) for pointer induction variables.

Also fix latent bug where LDG was not restricted to kernel functions. I believe that this could not be triggered so far since we do not currently infer that a pointer is global outside a kernel function, and only loads of global pointers are considered for cached loads.

llvm-svn: 244166
2015-08-05 23:11:57 +00:00
Jingyue Wu
2c2fc26fb5 [NVPTX] allow register copy between float and int
Summary:
Fixes PR24303. With Bruno's WIP (D11197) on PeepholeOptimizer, across-class
register copying (e.g. i32 to f32) becomes possible. Enhance
NVPTXInstrInfo::copyPhysReg to handle these cases.

Reviewers: jholewinski

Subscribers: eliben, jholewinski, llvm-commits, bruno

Differential Revision: http://reviews.llvm.org/D11622

llvm-svn: 243839
2015-08-01 18:02:12 +00:00
Jingyue Wu
d1717a8a10 [NVPTX] convert pointers in byval kernel arguments to global
Summary:
For example, in

  struct S {
    int *x;
    int *y;
  };
  __global__ void foo(S s) {
    int *b = s.y;
    // use b
  }

"b" is guaranteed to point to global. NVPTX should emit ld.global/st.global for
accessing "b".

Reviewers: jholewinski

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11505

llvm-svn: 243790
2015-07-31 21:44:14 +00:00
Jingyue Wu
e5794c4c5a Roll forward r242871
r242871 missed one place that should be guarded with isPhysicalReg. This patch
fixes that.

llvm-svn: 243555
2015-07-29 18:59:09 +00:00
Jingyue Wu
96bbc40a55 Temporarily revert r242871
PR24299

llvm-svn: 243522
2015-07-29 15:26:11 +00:00
Jingyue Wu
45a0757122 [BranchFolding] do not iterate the aliases of virtual registers
Summary:
MCRegAliasIterator only works for physical registers. So, do not run it
on virtual registers.

With this issue fixed, we can resurrect the BranchFolding pass in NVPTX
backend.

Reviewers: jholewinski, bkramer

Subscribers: henryhu, meheff, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11174

llvm-svn: 242871
2015-07-22 04:16:52 +00:00
Jingyue Wu
b0d6c8585a [NVPTX] make load on global readonly memory to use ldg
Summary:
[NVPTX] make load on global readonly memory to use ldg

Summary:
As describe in [1], ld.global.nc may be used to load memory by nvcc when
__restrict__ is used and compiler can detect whether read-only data cache
is safe to use.

This patch will try to check whether ldg is safe to use and use them to
replace ld.global when possible. This change can improve the performance
by 18~29% on affected kernels (ratt*_kernel and rwdot*_kernel) in 
S3D benchmark of shoc [2]. 

Patched by Xuetian Weng. 

[1] http://docs.nvidia.com/cuda/kepler-tuning-guide/#read-only-data-cache
[2] https://github.com/vetter/shoc

Test Plan: test/CodeGen/NVPTX/load-with-non-coherent-cache.ll

Reviewers: jholewinski, jingyue

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D11314

llvm-svn: 242713
2015-07-20 21:28:54 +00:00
Eli Bendersky
d69a59088c Use inbounds GEPs for memcpy and memset lowering
Follow-up on discussion in http://reviews.llvm.org/D11220

llvm-svn: 242542
2015-07-17 16:42:33 +00:00
Eli Bendersky
901c8f80fc Correct lowering of memmove in NVPTX
This fixes https://llvm.org/bugs/show_bug.cgi?id=24056

Also a bit of refactoring along the way.

Differential Revision: http://reviews.llvm.org/D11220

llvm-svn: 242413
2015-07-16 16:27:19 +00:00
Eli Bendersky
711d658342 Actually support volatile memcpys in NVPTX lowering
Differential Revision: http://reviews.llvm.org/D11091

llvm-svn: 241914
2015-07-10 15:40:33 +00:00
Jingyue Wu
7084152bd9 [NVPTX] declare no vector registers
Summary:
Without this patch, LoopVectorizer in certain cases (see loop-vectorize.ll)
produces code with complex control flow which hurts later optimizations. Since
NVPTX doesn't have vector registers in LLVM's sense
(NVPTXTTI::getRegisterBitWidth(true) == 32), we for now declare no vector
registers to effectively disable loop vectorization.

Reviewers: jholewinski

Subscribers: jingyue, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11089

llvm-svn: 241884
2015-07-10 04:31:56 +00:00
Eli Bendersky
374b7e43da Add tests for the NVPTXLowerAggrCopies pass.
Note: not testing memmove lowering for now, as it's broken
[see https://llvm.org/bugs/show_bug.cgi?id=24056]

llvm-svn: 241736
2015-07-08 21:29:28 +00:00
Jingyue Wu
5f36b4cd05 [NVPTX] expand extload/truncstore for vectors of floats
Summary:
According to PTX ISA:

For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type.

So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of.

As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like
ld.v2.f32 {%fd3, %fd4}, [%rd2]

This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated.

Patched by Gang Hu. 

Test Plan: Test case attached.

Reviewers: jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D10876

llvm-svn: 241191
2015-07-01 21:32:42 +00:00
Jingyue Wu
bf15de754a [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass
Summary:
Offset of frame index is calculated by NVPTXPrologEpilogPass. Before
that the correct offset of stack objects cannot be obtained, which
leads to wrong offset if there are more than 2 frame objects. This patch
move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index
is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check
VRFrame register instead, and try to remove the VRFrame if there
is no usage after NVPTXPeephole pass.

Patched by Xuetian Weng. 

Test Plan:
Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the
offset calculation based on SP and SPL.

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10853

llvm-svn: 241185
2015-07-01 20:08:06 +00:00
Jingyue Wu
20fd92cbbc [NVPTX] Fix issue introduced in D10321
Summary:
Really check if %SP is not used in other places, instead of checking only exact
one non-dbg use.

Patched by Xuetian Weng. 

Test Plan:
@foo4 in test/CodeGen/NVPTX/local-stack-frame.ll, create a case that
SP will appear twice.

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, sfantao, jholewinski

Differential Revision: http://reviews.llvm.org/D10844

llvm-svn: 241099
2015-06-30 18:59:19 +00:00
Samuel Antao
067b1d340e Force relocation mode to be default, regardless of what is passed to the backend.
llvm-svn: 241081
2015-06-30 17:18:00 +00:00
Jingyue Wu
e89c324de3 [NVPTX] noop when kernel pointers are already global
Summary:
Some front ends make kernel pointers global already. In that case,
handlePointerParams does nothing.

Test Plan: more tests in lower-kernel-ptr-arg.ll

Reviewers: grosser

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10779

llvm-svn: 240849
2015-06-26 22:35:43 +00:00
Jingyue Wu
7ff7eb51d2 Add NVPTXPeephole pass to reduce unnecessary address cast
Summary:
This patch first change the register that holds local address for stack
frame to %SPL. Then the new NVPTXPeephole pass will try to scan the
following pattern

   %vreg0<def> = LEA_ADDRi64 <fi#0>, 4
   %vreg1<def> = cvta_to_local %vreg0

and transform it into

   %vreg1<def> = LEA_ADDRi64 %VRFrameLocal, 4

Patched by Xuetian Weng

Test Plan: test/CodeGen/NVPTX/local-stack-frame.ll

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: eliben, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10549

llvm-svn: 240587
2015-06-24 20:20:16 +00:00
Artem Belevich
10eb05b756 [NVPTX] Added missing test case for llvm.nvvm.sqrt.f NVPTX intrinsic
Differential Revision: http://reviews.llvm.org/D10663

llvm-svn: 240437
2015-06-23 18:22:17 +00:00
Jingyue Wu
34412b317e Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address
Summary:
This is done by first adding two additional instructions to convert the
alloca returned address to local and convert it back to generic. Then
replace all uses of alloca instruction with the converted generic
address. Then we can rely NVPTXFavorNonGenericAddrSpace pass to combine
the generic addresscast and the corresponding Load, Store, Bitcast, GEP
Instruction together.

Patched by Xuetian Weng (xweng@google.com). 

Test Plan: test/CodeGen/NVPTX/lower-alloca.ll

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: meheff, broune, eliben, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10483

llvm-svn: 239964
2015-06-17 22:31:02 +00:00
Philip Reames
fc6ddd62bf Reapply 239795 - [InstCombine] Propagate non-null facts to call parameters
The original change broke clang side tests.  I will be submitting those momentarily.  This change includes post commit feedback on the original change from from Pete Cooper.

Original Submission comments:
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.

Differential Revision: http://reviews.llvm.org/D9132

llvm-svn: 239849
2015-06-16 20:24:25 +00:00
Philip Reames
2aad4769d2 Revert 239795
I forgot to update some clang test cases.  I'll fix and resubmit tomorrow.

llvm-svn: 239800
2015-06-16 01:20:53 +00:00
Philip Reames
54716a6f5b [InstCombine] Propagate non-null facts to call parameters
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.

Differential Revision: http://reviews.llvm.org/D9132

llvm-svn: 239795
2015-06-16 00:43:54 +00:00
Jingyue Wu
2406860399 [NVPTX] fix a crash bug in NVPTXFavorNonGenericAddrSpaces
Summary:
We used to assume V->RAUW only modifies the operand list of V's user.
However, if V and V's user are Constants, RAUW may replace and invalidate V's
user entirely.

This patch fixes the above issue by letting the caller replace the
operand instead of calling RAUW on Constants.

Test Plan: @nested_const_expr and @rauw in access-non-generic.ll

Reviewers: broune, jholewinski

Reviewed By: broune, jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10345

llvm-svn: 239435
2015-06-09 21:50:32 +00:00
Samuel Antao
1e229d55ce The constant initialization for globals in NVPTX is generated as an
array of bytes. The generation of this byte arrays was expecting 
the host to be little endian, which prevents big endian hosts to be 
used in the generation of the PTX code. This patch fixes the 
problem by changing the way the bytes are extracted so that it 
works for either little and big endian.

llvm-svn: 239412
2015-06-09 16:29:34 +00:00
Jingyue Wu
c74965ca17 [NVPTX] run SROA after NVPTXFavorNonGenericAddrSpaces
Summary:
This cleans up most allocas NVPTXLowerKernelArgs emits for byval
parameters.

Test Plan: makes bug21465.ll more stronger to verify no redundant local load/store.

Reviewers: eliben, jholewinski

Reviewed By: eliben, jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10322

llvm-svn: 239368
2015-06-09 00:05:56 +00:00
Jingyue Wu
efaa872a9c [NVPTX] roll forward r239082
NVPTXISelDAGToDAG translates "addrspacecast to param" to
NVPTX::nvvm_ptr_gen_to_param

Added an llc test in bug21465.

llvm-svn: 239100
2015-06-04 21:28:26 +00:00
Jingyue Wu
aaaab32848 Revert r239082
llc crashed for NVPTX backend

llvm-svn: 239094
2015-06-04 21:07:08 +00:00
Jingyue Wu
98b56fca22 [NVPTX] kernel pointer arguments point to the global address space
Summary:
With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a
pointer in the global address space. This change, along with
NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.*
and st.global.* for accessing kernel pointer arguments.

Minor changes:
1. refactor: extract function convertToPointerInAddrSpace
2. fix a bug in the test case in bug21465.ll

Test Plan: lower-kernel-ptr-arg.ll

Reviewers: eliben, meheff, jholewinski

Reviewed By: jholewinski

Subscribers: wengxt, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10154

llvm-svn: 239082
2015-06-04 20:19:38 +00:00
Jingyue Wu
40e406369e [NVPTXFavorNonGenericAddrSpaces] recursively trace into GEP and BitCast
Summary:
This patch allows NVPTXFavorNonGenericAddrSpaces to remove addrspacecast
from longer chains consisting of GEPs and BitCasts. For example, it can
now optimize

  %0 = addrspacecast [10 x float] addrspace(3)* @a to [10 x float]*
  %1 = gep [10 x float]* %0, i64 0, i64 %i
  %2 = bitcast float* %1 to i32*
  %3 = load i32* %2 ; emits ld.u32

to

  %0 = gep [10 x float] addrspace(3)* @a, i64 0, i64 %i
  %1 = bitcast float addrspace(3)* %0 to i32 addrspace(3)*
  %3 = load i32 addrspace(3)* %1 ; emits ld.shared.f32

Test Plan: @ld_int_from_global_float in access-non-generic.ll

Reviewers: broune, eliben, jholewinski, meheff

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10074

llvm-svn: 238574
2015-05-29 17:00:27 +00:00