llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 21:13:02 +02:00

Arpith Chacko Jacob 82b0fc2a51 [NVPTX] Add intrinsics to support named barriers.	2017-01-28 16:38:15 +00:00

Support for barrier synchronization between a subset of threads in a CTA through one of sixteen explicitly specified barriers. These intrinsics are not directly exposed in CUDA but are critical for forthcoming support of OpenMP on NVPTX GPUs.

The intrinsics allow the synchronization of an arbitrary (multiple of 32) number of threads in a CTA at one of 16 distinct barriers. The two intrinsics added are as follows:

call void @llvm.nvvm.barrier.n(i32 10)
waits for all threads in a CTA to arrive at named barrier #10.

call void @llvm.nvvm.barrier(i32 15, i32 992)
waits for 992 threads in a CTA to arrive at barrier #15.

Detailed description of these intrinsics are available in the PTX manual.
http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions

Reviewers: hfinkel, jlebar

Differential Revision: https://reviews.llvm.org/D17657

llvm-svn: 293384
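
The two intrinsics named in the commit message above can be placed in a small IR module to see how they are declared and called. This is a minimal sketch, not the contents of named-barriers.ll: the function name @sync_example and the target triple are illustrative assumptions, and the intrinsic names are those given in this 2017 commit, so other LLVM revisions may spell them differently.

```llvm
; Sketch only: @sync_example and the triple below are assumptions for illustration.
target triple = "nvptx64-nvidia-cuda"

; Declarations matching the call signatures shown in the commit message.
declare void @llvm.nvvm.barrier.n(i32)
declare void @llvm.nvvm.barrier(i32, i32)

define void @sync_example() {
  ; Wait for all threads in the CTA to arrive at named barrier #10.
  call void @llvm.nvvm.barrier.n(i32 10)
  ; Wait for 992 threads (a multiple of 32) to arrive at barrier #15.
  call void @llvm.nvvm.barrier(i32 15, i32 992)
  ret void
}
```

Assuming an llc build from a revision that includes this commit, running something like `llc -march=nvptx64 -mcpu=sm_20` on the module should emit the barrier synchronization instruction that the PTX manual documents as `bar.sync`, once with a barrier number alone and once with a barrier number and a thread count.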
..
access-non-generic.ll	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.	2016-10-31 21:51:42 +00:00
add-128bit.ll
addrspacecast-gvar.ll
addrspacecast.ll	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.	2016-10-31 21:51:42 +00:00
aggr-param.ll
aggregate-return.ll	[NVPTX] deal with all aggregate return types.	2016-07-20 18:39:52 +00:00
alias.ll	[CUDA] Die gracefully when trying to output an LLVM alias.	2016-01-23 21:12:20 +00:00
annotations.ll	Whitespace cleanup in test/CodeGen/NVPTX/annotations.ll.	2016-12-14 22:32:55 +00:00
arg-lowering.ll
arithmetic-fp-sm20.ll
arithmetic-int.ll	[NVPTX] expand mul_lohi to mul_lo and mul_hi	2016-01-22 19:47:26 +00:00
atomics-with-scope.ll	[NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.	2016-09-28 17:25:38 +00:00
atomics.ll
bfe.ll
branch-fold.ll	Roll forward r242871	2015-07-29 18:59:09 +00:00
bug17709.ll
bug21465.ll	[NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC.	2016-07-20 21:44:07 +00:00
bug22246.ll
bug22322.ll	[NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.	2016-09-09 21:07:26 +00:00
bug26185-2.ll	[NVPTX] Fix sign/zero-extending ldg/ldu instruction selection	2016-05-02 18:12:02 +00:00
bug26185.ll	[NVPTX] Handle ldg created from sign-/zero-extended load	2016-04-05 12:38:01 +00:00
bypass-div.ll	Use 32-bit divides instead of 64-bit divides where possible.	2015-08-11 22:16:34 +00:00
call-with-alloca-buffer.ll	Fix NVPTX/call-with-alloca-buffer.ll after r276777.	2016-07-26 18:28:33 +00:00
callchain.ll
calling-conv.ll
combine-min-max.ll	[NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.	2017-01-18 00:09:01 +00:00
compare-int.ll
constant-vectors.ll
convergent-mir-call.ll	[NVPTX] Use different, convergent MIs for convergent calls.	2016-03-01 19:24:03 +00:00
convert-fp.ll	[NVPTX] Add fptosi tests to convert-fp.ll.	2017-01-15 16:55:54 +00:00
convert-int-sm20.ll
ctlz.ll	[NVPTX] Fix function names in ctlz.ll test. Test-only change.	2017-01-18 00:07:52 +00:00
ctpop.ll
cttz.ll
debug-file-loc.ll	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram.	2016-04-15 15:57:41 +00:00
disable-opt.ll	[NVPTX] Disable performance optimizations when OptLevel==None	2016-02-04 04:15:36 +00:00
div-ri.ll
divrem-combine.ll	[NVPTX] Compute 'rem' using the result of 'div', if possible.	2016-10-28 21:44:00 +00:00
envreg.ll
extloadv.ll
f16-instructions.ll	[NVPTX] Fix lowering of fp16 ISD::FNEG.	2017-01-19 00:14:45 +00:00
fast-math.ll	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.	2017-01-13 18:48:13 +00:00
fcos-no-fast-math.ll	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.	2017-01-13 18:48:13 +00:00
fma-assoc.ll	SelectionDAG: Prefer to combine multiplication with less uses for fma	2015-08-11 19:21:46 +00:00
fma-disable.ll
fma.ll
fp16.ll
fp-contract.ll
fp-literals.ll
fsin-no-fast-math.ll	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.	2017-01-13 18:48:13 +00:00
function-align.ll
generic-to-nvvm-ir.ll	[IR] Remove the DIExpression field from DIGlobalVariable.	2016-12-20 02:09:43 +00:00
generic-to-nvvm.ll
global-addrspace.ll	[NVPTX] Allow undef value as global initializer	2015-08-22 05:40:26 +00:00
global-ctor-empty.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-ctor.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-dtor.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-ordering.ll
global-variable-big.ll	[NVPTX] Support global variables of integer type larger than i64.	2017-01-18 00:29:53 +00:00
global-visibility.ll	[NVPTX] Do not emit .hidden or .protected directives as they are not allowed by PTX.	2016-01-15 23:57:53 +00:00
globals_init.ll
globals_lowering.ll
gvar-init.ll
half.ll	[NVPTX] Added support for half-precision floating point.	2017-01-13 20:56:17 +00:00
i1-global.ll
i1-int-to-fp.ll
i1-param.ll
i8-param.ll
idioms.ll	[NVPTX] Lower integer absolute value idiom to abs instruction.	2017-01-18 00:08:44 +00:00
imad.ll
implicit-def.ll
inline-asm.ll
intrin-nocapture.ll
intrinsic-old.ll	[NVVMIntrRange] Only set range metadata if none is already present	2016-12-22 00:51:59 +00:00
intrinsics.ll	Fix some broken CHECK lines.	2017-01-22 20:28:56 +00:00
isspacep.ll
ld-addrspace.ll
ld-generic.ll
ldg-invariant.ll	[NVPTX] Use ldg for explicitly invariant loads.	2016-09-11 01:39:04 +00:00
ldparam-v4.ll
ldu-i8.ll
ldu-ldg.ll
ldu-reg-plus-offset.ll
lit.local.cfg
load-sext-i1.ll
load-with-non-coherent-cache.ll	[NVPTX] Use LDG for pointer induction variables.	2015-08-05 23:11:57 +00:00
LoadStoreVectorizer.ll	[NVPTX] Enable the load-store vectorizer on nvptx.	2016-07-20 22:11:36 +00:00
local-stack-frame.ll
loop-vectorize.ll
lower-aggr-copies.ll	Revert "Change memcpy/memset/memmove to have dest and source alignments."	2015-11-19 05:56:52 +00:00
lower-alloca.ll	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.	2016-10-31 21:51:42 +00:00
lower-kernel-ptr-arg.ll	[NVPTX] Improve lowering of byval args of device functions.	2016-07-20 18:39:47 +00:00
machine-sink.ll
MachineSink-call.ll	[NVPTX] Annotate call machine instructions as calls.	2016-02-17 17:46:50 +00:00
MachineSink-convergent.ll	NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0	2016-07-06 20:02:45 +00:00
managed.ll
math-intrins.ll	[NVPTX] Add codegen tests for llvm.fma.	2017-01-15 16:55:37 +00:00
misaligned-vector-ldst.ll
module-inline-asm.ll
mulwide.ll
named-barriers.ll	[NVPTX] Add intrinsics to support named barriers.	2017-01-28 16:38:15 +00:00
noduplicate-syncthreads.ll	NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0	2016-07-06 20:02:45 +00:00
nounroll.ll
nvcl-param-align.ll
nvvm-reflect-module-flag.ll	[NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect.	2016-04-01 01:09:07 +00:00
nvvm-reflect.ll	[NVPTX] Let there be One True Way to set NVVMReflect params.	2017-01-15 16:54:35 +00:00
param-align.ll	[NVPTX] Make sure we adjust alignment at all call sites	2016-07-18 21:58:48 +00:00
pr13291-i1-store.ll
pr16278.ll
pr17529.ll
refl1.ll
reg-copy.ll	[NVPTX] allow register copy between float and int	2015-08-01 18:02:12 +00:00
reg-types.ll	[NVPTX] Use untyped (.b) integer registers in PTX.	2016-08-12 22:02:19 +00:00
rotate.ll
rsqrt.ll
sched1.ll
sched2.ll
sext-in-reg.ll
sext-params.ll
shfl.ll	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.	2016-10-31 21:51:42 +00:00
shift-parts.ll
simple-call.ll
sm-version-20.ll
sm-version-21.ll
sm-version-30.ll
sm-version-32.ll
sm-version-35.ll
sm-version-37.ll
sm-version-50.ll
sm-version-52.ll
sm-version-53.ll
sm-version-60.ll	[NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.	2016-07-06 21:06:10 +00:00
sm-version-61.ll	[NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.	2016-07-06 21:06:10 +00:00
sm-version-62.ll	[NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.	2016-07-06 21:06:10 +00:00
speculative-execution-divergent-target.ll	Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target.	2016-04-15 01:20:52 +00:00
st-addrspace.ll
st-generic.ll
surf-read-cuda.ll
surf-read.ll
surf-write-cuda.ll
surf-write.ll
symbol-naming.ll	Have a single way for creating unique value names.	2015-11-22 00:16:24 +00:00
TailDuplication-convergent.ll	NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0	2016-07-06 20:02:45 +00:00
tex-read-cuda.ll
tex-read.ll
texsurf-queries.ll
tid-range.ll	[SelectionDAG] Correctly transform range metadata to AssertZExt	2017-01-06 00:11:46 +00:00
tuple-literal.ll
vec8.ll
vec-param-load.ll
vector-args.ll
vector-call.ll	Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI	2015-08-10 19:01:27 +00:00
vector-compare.ll
vector-global.ll
vector-loads.ll
vector-select.ll
vector-stores.ll
weak-global.ll
weak-linkage.ll
zero-cs.ll	llvm/test/CodeGen/NVPTX/zero-cs.ll: Relax an expression to match in -Asserts.	2016-09-21 04:43:11 +00:00
zeroext-32bit.ll	Only emit extension for zeroext/signext arguments if type is < 32 bits	2016-06-27 20:22:22 +00:00