1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 11:42:57 +01:00
Commit Graph

130057 Commits

Author SHA1 Message Date
Craig Topper
31a58dd264 [X86] AND, OR, and XOR of vectors are always legal no need to set them legal explicitly.
llvm-svn: 266412
2016-04-15 06:20:14 +00:00
Craig Topper
61886cf59d [X86] Combine an if and else block that had the same set of calls to setOperationAction that only varied in Legal/Custom. Use the ternary operator on that argument instead. NFC
llvm-svn: 266410
2016-04-15 04:57:09 +00:00
Davide Italiano
b9715288e8 Revert "[LTO] Add a new splitCodeGen() API which takes a TargetMachineFactory."
This reverts commits r266390 and r266396 as they broke some bots.

llvm-svn: 266408
2016-04-15 02:07:03 +00:00
Justin Lebar
4b4ad39f81 [NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5.
Summary:
Calls on NVPTX are unusually expensive (for one thing, lots of state
needs to be saved to memory, which is slow), so make the inlininer much
more aggressive.

Reviewers: chandlerc

Subscribers: jholewinski, llvm-commits, tra

Differential Revision: http://reviews.llvm.org/D18561

llvm-svn: 266406
2016-04-15 01:38:50 +00:00
Justin Lebar
5824577459 [TTI] Add getInliningThresholdMultiplier.
Summary:
InlineCost's threshold is multiplied by this value.  This lets us adjust
the inlining threshold up or down on a per-target basis.  For example,
we might want to increase the threshold on targets where calls are
unusually expensive.

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18560

llvm-svn: 266405
2016-04-15 01:38:48 +00:00
Justin Lebar
9ca74f58f5 [ifcnv] Don't duplicate blocks that contain convergent instructions.
It's unsafe to duplicate blocks that contain convergent instructions
during ifcnv.  See the patch for details.

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D17518

llvm-svn: 266404
2016-04-15 01:38:41 +00:00
Justin Lebar
d4f9b5931f Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target.
llvm-svn: 266403
2016-04-15 01:20:52 +00:00
Justin Lebar
cad5d11faa [PM] Add a SpeculativeExecution pass for targets with divergent branches.
Summary:
This IR pass is helpful for GPUs, and other targets with divergent
branches.  It's a nop on targets without divergent branches.

Reviewers: chandlerc

Subscribers: llvm-commits, jingyue, rnk, joker.eph, tra

Differential Revision: http://reviews.llvm.org/D18626

llvm-svn: 266399
2016-04-15 00:32:12 +00:00
Justin Lebar
c6bd85bac2 [Speculation] Add a SpeculativeExecution mode where the pass does nothing unless TTI::hasBranchDivergence() is true.
Summary:
This lets us add this pass to the IR pass manager unconditionally; it
will simply not do anything on targets without branch divergence.

Reviewers: tra

Subscribers: llvm-commits, jingyue, rnk, chandlerc

Differential Revision: http://reviews.llvm.org/D18625

llvm-svn: 266398
2016-04-15 00:32:09 +00:00
Davide Italiano
919373bcbd [ParallelCG] Attempt to placate MSVC.
llvm-svn: 266396
2016-04-15 00:25:19 +00:00
Hans Wennborg
839aebe443 Option parser: class for consuming a joined arg in addition to all remaining args
llvm-svn: 266394
2016-04-15 00:23:30 +00:00
Hans Wennborg
4a59486df8 OptionParsingTest.cpp: reorder EXPECT_EQs to put expectation on the left. NFC.
This provides for better error messages from the framework when the expected
and actual values don't match.

llvm-svn: 266393
2016-04-15 00:23:15 +00:00
Davide Italiano
eb343bd288 [LTO] Add a new splitCodeGen() API which takes a TargetMachineFactory.
This will be used in lld to avoid creating TargetMachine in two
different places. See D18999 for a more detailed discussion.

Differential Revision:  http://reviews.llvm.org/D19139

llvm-svn: 266390
2016-04-15 00:07:28 +00:00
Vedant Kumar
8b67388584 [test] Require 'asserts' for a test which uses -debug-only
Without this line, bots which run check-all on Release compilers will
break.

llvm-svn: 266386
2016-04-14 23:32:40 +00:00
Matt Arsenault
6e839f3775 AMDGPU: Remove custom load/store scalarization
llvm-svn: 266385
2016-04-14 23:31:26 +00:00
Matt Arsenault
18048c7ee0 AMDGPU: Include LDS size in printed comment
llvm-svn: 266382
2016-04-14 22:11:51 +00:00
Michael Kuperstein
80e3983b83 [AliasSetTracker] Correctly handle changing the size of an entry
If the size of an AST entry changes, we also need to make sure we perform
necessary alias set merges, as the new size may overlap pointers in other sets.
We happen to run into this with memset, because memset allows an entry for a
i8* pointer to have a decidedly non-i8 size.

This fixes PR27262.

Differential Revision: http://reviews.llvm.org/D18939

llvm-svn: 266381
2016-04-14 22:00:11 +00:00
Mehdi Amini
609e1c0360 Nuke getGlobalContext() from LLVM (but the C API)
The only use for getGlobalContext() is in the C API.
Let's just move the static global here and nuke the C++ API.

Differential Revision: http://reviews.llvm.org/D19094

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266380
2016-04-14 21:59:18 +00:00
Mehdi Amini
ea195a382e Remove every uses of getGlobalContext() in LLVM (but the C API)
At the same time, fixes InstructionsTest::CastInst unittest: yes
you can leave the IR in an invalid state and exit when you don't
destroy the context (like the global one), no longer now.

This is the first part of http://reviews.llvm.org/D19094

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266379
2016-04-14 21:59:01 +00:00
Matt Arsenault
2dfb6d03c5 AMDGPU: Run SIFoldOperands after PeepholeOptimizer
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.

shader-db stats:

Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)

Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)

Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)

llvm-svn: 266378
2016-04-14 21:58:24 +00:00
Matt Arsenault
61abb9daf9 AMDGPU: Directly emit m0 initialization with s_mov_b32
Currently what comes out of instruction selection is a
register initialized to -1, and then copied to m0.
MachineCSE doesn't consider copies, but we want these
to be CSEed. This isn't much of a problem currently,
because SIFoldOperands is run immediately after.

This avoids regressions when SIFoldOperands is run later
from leaving all copies to m0.

llvm-svn: 266377
2016-04-14 21:58:15 +00:00
Matt Arsenault
e73cb153a7 AMDGPU: Fold bitcasts of scalar constants to vectors
This cleans up some messes since the individual scalar components
can be CSEed.

llvm-svn: 266376
2016-04-14 21:58:07 +00:00
Geoff Berry
f26cedc3ed [ScheduleDAGInstrs] Re-factor for based on review feedback. NFC.
Summary:
Re-factor some code to improve clarity and style based on review
comments from http://reviews.llvm.org/D18093.

Reviewers: MatzeB, mcrosier

Subscribers: MatzeB, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19128

llvm-svn: 266372
2016-04-14 21:31:07 +00:00
Renato Golin
bf4b959200 [ARM] Adding IEEE-754 SIMD detection to loop vectorizer
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.

This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).

For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.

This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).

As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.

Fixes PR16275.

llvm-svn: 266363
2016-04-14 20:42:18 +00:00
Sanjay Patel
1d8eaf1ecc [InstCombine] remove constant by inverting compare + logic (PR27105)
https://llvm.org/bugs/show_bug.cgi?id=27105

We can check if all bits outside of a constant mask are set with a 
single constant.

As noted in the bug report, although this form should be considered the
canonical IR, backends may want to transform this into an 'andn' / 'andc' 
comparison against zero because that could be a single machine instruction.

Differential Revision: http://reviews.llvm.org/D18842

llvm-svn: 266362
2016-04-14 20:17:40 +00:00
Dehao Chen
dab7c40296 Fix null pointer access for discriminator assignment.
Summary: This fixes the buildbot failure.

Reviewers: dnovillo, davidxl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19129

llvm-svn: 266360
2016-04-14 19:46:38 +00:00
Tom Stellard
383448325b AMDGPU: Add skeleton GlobalIsel implementation
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.

Reviewers: arsenm, qcolombet

Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19077

llvm-svn: 266356
2016-04-14 19:09:28 +00:00
Dehao Chen
5e2f05be2a Update discriminator assignment algorithm to handle nested call correctly.
Summary: Add discriminator for nested call correctly.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19127

llvm-svn: 266354
2016-04-14 18:37:18 +00:00
Reid Kleckner
2c00bf1830 Sink DI metadata usage out of MachineInstr.h and MachineInstrBuilder.h
MachineInstr.h and MachineInstrBuilder.h are very popular headers,
widely included across all LLVM backends. It turns out that there only a
handful of TUs that actually care about DI operands on MachineInstrs.

After this change, touching DebugInfoMetadata.h and rebuilding llc only
needs 112 actions instead of 542.

llvm-svn: 266351
2016-04-14 18:29:59 +00:00
Davide Italiano
8301c52067 [ValueMapper] Range-loopify to improve readability. NFC.
llvm-svn: 266350
2016-04-14 18:07:32 +00:00
Jacques Pienaar
7870fedd7e [lanai] Add custom lowering for SRL_PARTS i32.
llvm-svn: 266349
2016-04-14 17:59:22 +00:00
Tom Stellard
63209a899d [GlobalISel] Move GISelAccessor class into public headers
Reviewers: qcolombet

Subscribers: joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19120

llvm-svn: 266348
2016-04-14 17:45:38 +00:00
Nicolai Haehnle
bbd04b8c49 [DivergenceAnalysis] Treat PHI with incoming undef as constant
Summary:
If a PHI has an incoming undef, we can pretend that it is equal to one
non-undef, non-self incoming value.

This is particularly relevant in combination with the StructurizeCFG
pass, which introduces PHI nodes with undefs. Previously, this lead to
branch conditions that were uniform before StructurizeCFG to become
non-uniform afterwards, which confused the SIAnnotateControlFlow
pass.

This fixes a crash when Mesa radeonsi compiles a shader from
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex

Reviewers: arsenm, tstellarAMD, jingyue

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19013

llvm-svn: 266347
2016-04-14 17:42:47 +00:00
Nicolai Haehnle
25eef7cc0f [StructurizeCFG] Annotate branches that were treated as uniform
Summary:
This fully solves the problem where the StructurizeCFG pass does not
consider the same branches as uniform as the SIAnnotateControlFlow pass.
The patch in D19013 helps with this problem, but is not sufficient
(and, interestingly, causes a "regression" with one of the existing
test cases).

No tests included here, because tests in D19013 already cover this.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19018

llvm-svn: 266346
2016-04-14 17:42:35 +00:00
Nicolai Haehnle
014104e476 AMDGPU: Remove SIFixSGPRLiveRanges pass
Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like

  def %vreg0:SGPR_32
  ...
if-block:
  ..
  def %vreg1:SGPR_32
  ...
else-block:
  ...
  use %vreg0:SGPR_32
  ...

and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.

However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.

In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.

Reviewers: arsenm, tstellarAMD

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19041

llvm-svn: 266345
2016-04-14 17:42:29 +00:00
Nicolai Haehnle
141bf141e2 AMDGPU: change a redundant if () to an assert(). NFC
Summary:
I've been carrying this change around with me for a while, because the if ()
managed to confuse me while following the code. All callers ensure that the
assertion holds.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19042

llvm-svn: 266344
2016-04-14 17:42:18 +00:00
Tom Stellard
c2eabf0b58 [GlobalISel] Coding style and whitespace fixes
Reviewers: qcolombet

Subscribers: joker.eph, llvm-commits, vkalintiris

Differential Revision: http://reviews.llvm.org/D19119

llvm-svn: 266342
2016-04-14 17:23:33 +00:00
Tim Northover
d6721e4e7e AArch64: expand cmpxchg after regalloc at -O0.
FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.

I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.

It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.

Should fix PR25526.

llvm-svn: 266339
2016-04-14 17:03:29 +00:00
Jacques Pienaar
77c93881e7 [lanai] Add areMemAccessesTriviallyDisjoint, getMemOpBaseRegImmOfs and getMemOpBaseRegImmOfsWidth.
Summary: Add getMemOpBaseRegImmOfsWidth to enable determining independence during MiSched.

Reviewers: eliben, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18903

llvm-svn: 266338
2016-04-14 16:47:42 +00:00
Tom Stellard
c0b7282ebc AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.

This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.

Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.

Reviewers: mareko, arsenm, tstellarAMD, nhaehnle

Subscribers: FireBurn, kerberizer, llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18340

Patch By: Bas Nieuwenhuizen

llvm-svn: 266337
2016-04-14 16:27:07 +00:00
Tom Stellard
b9654f5566 AMDGPU/SI: Use the correct scratch wave offset register for shaders.
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,

The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Reviewers: mareko, arsenm, nhaehnle, tstellarAMD

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18941

Patch By: Bas Nieuwenhuizen

llvm-svn: 266336
2016-04-14 16:27:03 +00:00
Betul Buyukkurt
f4233962c1 [PGO] Do not attach VP metadata if value count at site is 0 [NFC]
llvm-svn: 266335
2016-04-14 16:25:45 +00:00
Silviu Baranga
83a7a9c1e0 [SCEV][LAA] Add tests for SCEV expression transformations performed during LAA
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.

Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.

The additional checking also acts as white-box testing for the getAsAddRec method.

Reviewers: anemet, sanjoy

Subscribers: sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18792

llvm-svn: 266334
2016-04-14 16:08:45 +00:00
Simon Dardis
46f0e9cf63 Summary:
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.

This patch was previous committed as r266055 as seemed to have caused some spurious
test failures. They did not reappear after further local testing.

llvm-svn: 266301
2016-04-14 13:43:17 +00:00
Igor Kudrin
dde32a28b6 [Coverage] Update testing methods to support more than two files
Differential Revision: http://reviews.llvm.org/D18757

llvm-svn: 266289
2016-04-14 10:43:37 +00:00
Vasileios Kalintiris
1ea0fc57ba [mips] Remove duplicate tests and add missing prefixes for *-LABEL checks. NFC.
Summary:
The only difference between the removed tests and the pre-existing
ones, is the materialization of the zero constant, which shouldn't
matter for these cases.

Reviewers: dsanders, sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D18693

llvm-svn: 266285
2016-04-14 09:13:13 +00:00
Igor Kudrin
0f6ff5a6b3 [Coverage] Avoid unnecessary copying of std::vector
Approved by: Justin Bogner <mail@justinbogner.com>

Differential Revision: http://reviews.llvm.org/D18756

llvm-svn: 266284
2016-04-14 09:10:00 +00:00
Adam Nemet
aa40b369a6 Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
This reverts commit r266086.

It breaks the LTO build of gcc in SPEC2000.

llvm-svn: 266282
2016-04-14 08:47:17 +00:00
Mehdi Amini
341c55e737 ThinLTO: linkonce compile-time optimization, do not bother when there is only one input file
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266281
2016-04-14 08:46:22 +00:00
David Majnemer
d2ed420815 [CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NAN
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.

It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM.  Notably, ARMv7 NEON's vmin has the
-NAN semantics.

N.B.  Of course, it is only safe to do this if the intrinsic call is
marked nnan.

llvm-svn: 266279
2016-04-14 07:13:24 +00:00