1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00
Commit Graph

15534 Commits

Author SHA1 Message Date
Adrian Prantl
fb3abba237 [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.

Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.

Motivation
----------

Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.

We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.

Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.

http://reviews.llvm.org/D19034
<rdar://problem/25256815>

llvm-svn: 266446
2016-04-15 15:57:41 +00:00
NAKAMURA Takumi
b397b15fe8 llvm/test/CodeGen/AArch64/arm64-csldst-mmo.ll requires +Asserts.
llvm-svn: 266443
2016-04-15 15:37:27 +00:00
Geoff Berry
56afefb9c8 [AArch64] Add MMOs to callee-save load/store instructions.
Summary:
Without MMOs, the callee-save load/store instructions were treated as
volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer.

Reviewers: t.p.northover, mcrosier

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17661

llvm-svn: 266439
2016-04-15 15:16:19 +00:00
Nirav Dave
a36a5aeca3 Fix typing on generated LXV2DX/STXV2DX instructions
[PPC] Previously when casting generic loads to LXV2DX/ST instructions we
would leave the original load return type in place allowing for an
assertion failure when we merge two equivalent LXV2DX nodes with
different types.

This fixes PR27350.

Reviewers: nemanjai

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19133

llvm-svn: 266438
2016-04-15 15:01:38 +00:00
Jun Bum Lim
ad7ab4cf46 [MachineScheduler]Add support for store clustering
Perform store clustering just like load clustering. This change add
StoreClusterMutation in machine-scheduler. To control StoreClusterMutation,
added enableClusterStores() in TargetInstrInfo.h. This is enabled only on
AArch64 for now.

This change also add support for unscaled stores which were not handled in
getMemOpBaseRegImmOfs().

llvm-svn: 266437
2016-04-15 14:58:38 +00:00
Nicolai Haehnle
730a184595 AMDGPU/SI: Fix regression with no-return atomics
Summary:
In the added test-case, the atomic instruction feeds into a non-machine
CopyToReg node which hasn't been selected yet, so guard against
non-machine opcodes here.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19043

llvm-svn: 266433
2016-04-15 14:42:36 +00:00
Justin Lebar
d4f9b5931f Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target.
llvm-svn: 266403
2016-04-15 01:20:52 +00:00
Matt Arsenault
18048c7ee0 AMDGPU: Include LDS size in printed comment
llvm-svn: 266382
2016-04-14 22:11:51 +00:00
Matt Arsenault
2dfb6d03c5 AMDGPU: Run SIFoldOperands after PeepholeOptimizer
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.

shader-db stats:

Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)

Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)

Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)

llvm-svn: 266378
2016-04-14 21:58:24 +00:00
Matt Arsenault
e73cb153a7 AMDGPU: Fold bitcasts of scalar constants to vectors
This cleans up some messes since the individual scalar components
can be CSEed.

llvm-svn: 266376
2016-04-14 21:58:07 +00:00
Tom Stellard
383448325b AMDGPU: Add skeleton GlobalIsel implementation
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.

Reviewers: arsenm, qcolombet

Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19077

llvm-svn: 266356
2016-04-14 19:09:28 +00:00
Jacques Pienaar
7870fedd7e [lanai] Add custom lowering for SRL_PARTS i32.
llvm-svn: 266349
2016-04-14 17:59:22 +00:00
Nicolai Haehnle
bbd04b8c49 [DivergenceAnalysis] Treat PHI with incoming undef as constant
Summary:
If a PHI has an incoming undef, we can pretend that it is equal to one
non-undef, non-self incoming value.

This is particularly relevant in combination with the StructurizeCFG
pass, which introduces PHI nodes with undefs. Previously, this lead to
branch conditions that were uniform before StructurizeCFG to become
non-uniform afterwards, which confused the SIAnnotateControlFlow
pass.

This fixes a crash when Mesa radeonsi compiles a shader from
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex

Reviewers: arsenm, tstellarAMD, jingyue

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19013

llvm-svn: 266347
2016-04-14 17:42:47 +00:00
Nicolai Haehnle
014104e476 AMDGPU: Remove SIFixSGPRLiveRanges pass
Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like

  def %vreg0:SGPR_32
  ...
if-block:
  ..
  def %vreg1:SGPR_32
  ...
else-block:
  ...
  use %vreg0:SGPR_32
  ...

and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.

However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.

In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.

Reviewers: arsenm, tstellarAMD

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19041

llvm-svn: 266345
2016-04-14 17:42:29 +00:00
Tim Northover
d6721e4e7e AArch64: expand cmpxchg after regalloc at -O0.
FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.

I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.

It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.

Should fix PR25526.

llvm-svn: 266339
2016-04-14 17:03:29 +00:00
Jacques Pienaar
77c93881e7 [lanai] Add areMemAccessesTriviallyDisjoint, getMemOpBaseRegImmOfs and getMemOpBaseRegImmOfsWidth.
Summary: Add getMemOpBaseRegImmOfsWidth to enable determining independence during MiSched.

Reviewers: eliben, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18903

llvm-svn: 266338
2016-04-14 16:47:42 +00:00
Tom Stellard
c0b7282ebc AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.

This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.

Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.

Reviewers: mareko, arsenm, tstellarAMD, nhaehnle

Subscribers: FireBurn, kerberizer, llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18340

Patch By: Bas Nieuwenhuizen

llvm-svn: 266337
2016-04-14 16:27:07 +00:00
Tom Stellard
b9654f5566 AMDGPU/SI: Use the correct scratch wave offset register for shaders.
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,

The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Reviewers: mareko, arsenm, nhaehnle, tstellarAMD

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18941

Patch By: Bas Nieuwenhuizen

llvm-svn: 266336
2016-04-14 16:27:03 +00:00
Simon Dardis
46f0e9cf63 Summary:
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.

This patch was previous committed as r266055 as seemed to have caused some spurious
test failures. They did not reappear after further local testing.

llvm-svn: 266301
2016-04-14 13:43:17 +00:00
Vasileios Kalintiris
1ea0fc57ba [mips] Remove duplicate tests and add missing prefixes for *-LABEL checks. NFC.
Summary:
The only difference between the removed tests and the pre-existing
ones, is the materialization of the zero constant, which shouldn't
matter for these cases.

Reviewers: dsanders, sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D18693

llvm-svn: 266285
2016-04-14 09:13:13 +00:00
Adam Nemet
aa40b369a6 Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
This reverts commit r266086.

It breaks the LTO build of gcc in SPEC2000.

llvm-svn: 266282
2016-04-14 08:47:17 +00:00
David Majnemer
d2ed420815 [CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NAN
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.

It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM.  Notably, ARMv7 NEON's vmin has the
-NAN semantics.

N.B.  Of course, it is only safe to do this if the intrinsic call is
marked nnan.

llvm-svn: 266279
2016-04-14 07:13:24 +00:00
Matt Arsenault
489a8fbeea AMDGPU: Implement canonicalize
Also add generic DAG node for it.

llvm-svn: 266272
2016-04-14 01:42:16 +00:00
Sanjay Patel
4fd6cc9cca [ppc] add tests to show potential andc optimization
llvm-svn: 266261
2016-04-13 23:23:30 +00:00
Tim Northover
3d26dcef22 ARM: override cost function to re-enable ConstantHoisting (& fix it).
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:

  + ConstantHoisting was modifying switch statements. This is simply invalid,
    the cases must remain integer constants no matter the notional cost.
  + ConstantHoisting was mangling alloca instructions in the entry block. These
    should be handled by FrameLowering, so constants actually have a cost of 0.
    Worse, the resulting bitcasts meant they became dynamic allocas.

rdar://25707382

llvm-svn: 266260
2016-04-13 23:08:27 +00:00
Matthias Braun
1f20820c09 ARM: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.

Differential Revision: http://reviews.llvm.org/D18901

llvm-svn: 266253
2016-04-13 21:43:25 +00:00
Matthias Braun
fc08c62acd X86: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.

Differential Revision: http://reviews.llvm.org/D18902

llvm-svn: 266252
2016-04-13 21:43:21 +00:00
Matthias Braun
40d6ea2b7b AArch64: Use a callee save registers for swiftself parameters
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.

Differential Revision: http://reviews.llvm.org/D19007

llvm-svn: 266251
2016-04-13 21:43:16 +00:00
Sanjay Patel
d32f647225 [x86] add tests to show potential BMI optimization
llvm-svn: 266243
2016-04-13 20:40:43 +00:00
Evandro Menezes
076242a2eb [AArch64] Disable LDP/STP for quads
Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs
of regular LDR/STR.

Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>.

llvm-svn: 266223
2016-04-13 18:31:45 +00:00
Nirav Dave
abf596136a Cleanup Store Merging in UseAA case
This patch fixes a bug (PR26827) when using anti-aliasing in store
merging. This sets the chain users of the component stores to point to
the new store instead of the component stores chain parent.

Reviewers: jyknight

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18909

llvm-svn: 266217
2016-04-13 17:27:26 +00:00
Tim Northover
f9f3caeceb AArch64: don't create instructions that write to xzr/wzr twice.
These are unpredictable even on AArch64.

Patch by Yichao Yu.

llvm-svn: 266206
2016-04-13 16:25:39 +00:00
Artem Tamazov
ead8b434de [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)git status
Tests added along with implemented feature.
Note that there is a small leftover of unecessary MI sheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.

TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.

Differential Revision: http://reviews.llvm.org/D17825

llvm-svn: 266205
2016-04-13 16:18:41 +00:00
Zoran Jovanovic
9cd420ea3a [mips] Fix emitAtomicCmpSwapPartword to handle 64 bit pointers correctly
Differential Revision: http://reviews.llvm.org/D18995

llvm-svn: 266204
2016-04-13 16:02:25 +00:00
Vasileios Kalintiris
93f6871925 [mips] Sign-extend i32 values truncated from previously zero-extended i32 values.
Summary:
This is a special case for MIPS64 because the architecture requires
properly 32-bit sign-extended values in the register containers.

Additionaly, we merge consecutive trunc + AssertZExt nodes in order
to avoid unnecessary sign-extensions when the extension comes from a
type smaller than i32.

Reviewers: dsanders

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D18893

llvm-svn: 266203
2016-04-13 15:07:45 +00:00
Simon Pilgrim
07bfcfd5f1 [X86][SSE] Regenerated vector integer absolute tests
llvm-svn: 266194
2016-04-13 12:40:22 +00:00
Simon Pilgrim
8a8bff24e5 Added missing autogeneration note
llvm-svn: 266185
2016-04-13 09:28:44 +00:00
Zlatko Buljan
b4097ad285 [mips][microMIPS] Add CodeGen support for DIV, MOD, DIVU, MODU, DDIV, DMOD, DDIVU and DMODU instructions
Differential Revision: http://reviews.llvm.org/D17137

This patch was reverted after the revertion of dependant patch http://reviews.llvm.org/D17068.
There was the problem with test-suite failure.
The problem is hopefully solved with dependant patch so this patch is commited again.

llvm-svn: 266179
2016-04-13 08:02:26 +00:00
Hrvoje Varga
acf02ee748 [mips][microMIPS] Fix for "Cannot copy registers" assertion
Differential Revision: http://reviews.llvm.org/D17068

This changes contains fix for failing test-suite. So, this patch should hopefully work now.

llvm-svn: 266171
2016-04-13 06:17:21 +00:00
Wei Mi
a92f4c62f1 Recommit r265547, and r265610,r265639,r265657 on top of it, plus
two fixes with one about error verify-regalloc reported, and
another about live range update of phi after rematerialization.

r265547:
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.

analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.

To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.

Patches on top of r265547:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"

Differential Revision: http://reviews.llvm.org/D15302
Differential Revision: http://reviews.llvm.org/D18934
Differential Revision: http://reviews.llvm.org/D18935
Differential Revision: http://reviews.llvm.org/D18936

llvm-svn: 266162
2016-04-13 03:08:27 +00:00
Matt Arsenault
677106a8c7 AMDGPU: Add test for m0 initialization in basic loop
Initialization of m0 is emitted for each LDS operation, so
every block with LDS usage ends up with one. MachineLICM
used to fail to hoist this out of the loop, so every loop
iteration with LDS usage in it would re-initialize it.

This seems to be fixed now, so add a test to make sure that
it stays this way.

llvm-svn: 266156
2016-04-13 00:39:52 +00:00
Justin Bogner
a980977c67 CodeGen: Clear the MFI's save and restore point after PrologEpilogInserter
This state is no longer useful and not guaranteed to be valid in later
codegen passes. For example, see the added test, which would print a
savepoint of %bb.-1 without this change, and crashes with a
use-after-free error under ASan if you apply the recycling allocator
patch from llvm.org/PR26808.

llvm-svn: 266150
2016-04-12 23:21:53 +00:00
Nicolai Haehnle
30a743add7 AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)

As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.

Also DAG-ify (and fix) some tests that I noticed intermittent failures in
while developing this patch.

Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18292

llvm-svn: 266126
2016-04-12 21:18:10 +00:00
Derek Schuff
887911441a [WebAssembly] Fix debug info in reg-stackify.ll test
It lacked a CU and thus became invalid with r266102

llvm-svn: 266114
2016-04-12 20:12:05 +00:00
Tom Stellard
ec17b58f39 AMDGPU/SI: Insert wait states required after v_readfirstlane on SI
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation  fixes a hang with the piglit
test:

spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra

Reviewers: arsenm, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18988

llvm-svn: 266105
2016-04-12 18:40:43 +00:00
Matt Arsenault
0740a1bead AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32
This helps clean up some of the mess when expanding unaligned 64-bit
loads when changed to be promote to v2i32, and fixes situations
where or x, 0 was emitted after splitting 64-bit ors during moveToVALU.

I think this could be a generic combine but I'm not sure.

llvm-svn: 266104
2016-04-12 18:24:38 +00:00
Nicolai Haehnle
be6e455946 AMDGPU/SI: Fix a mis-compilation of multi-level breaks
Summary:
Under certain circumstances, multi-level breaks (or what is understood by
the control flow passes as such) could be miscompiled in a way that causes
infinite loops, by emitting incorrect control flow intrinsics.

This fixes a hang in
dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18967

llvm-svn: 266088
2016-04-12 16:10:38 +00:00
Artur Pilipenko
d9a8d9a1af Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmittion of 263158 change.

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 266086
2016-04-12 15:58:04 +00:00
Geoff Berry
6cb5fc80e8 [ScheduleDAGInstrs] Handle instructions with multiple MMOs
Summary:
In getUnderlyingObjectsForInstr(): Don't give up on instructions with
multiple MMOs, instead look through all the MMOs and if they all meet
the conservative criteria previously used for single MMO instructions,
then return all of the underlying objects derived from the MMOs.

The change to ScheduleDAGInstrs::buildSchedGraph() is needed to avoid
the case where multiple underlying objects are present and are related
in such a way that successive iterations of the loop end up adding a
dependency from an instruction to itself.

Reviewers: atrick, hfinkel

Subscribers: MatzeB, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18093

llvm-svn: 266084
2016-04-12 15:50:19 +00:00
Than McIntosh
b927171cd9 Test commit, NFC.
Adds a blank line.

llvm-svn: 266082
2016-04-12 15:35:05 +00:00