1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00
Commit Graph

79 Commits

Author SHA1 Message Date
Chandler Carruth
89da465927 [multiversion] Thread a function argument through all the callers of the
getTTI method used to get an actual TTI object.

No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.

The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.

llvm-svn: 227730
2015-02-01 12:01:35 +00:00
Chandler Carruth
e1550cbb3c [PM] Port SimplifyCFG to the new pass manager.
This should be sufficient to replace the initial (minor) function pass
pipeline in Clang with the new pass manager. I'll probably add an (off
by default) flag to do that just to ensure we can get extra testing.

llvm-svn: 227726
2015-02-01 11:34:21 +00:00
Chandler Carruth
b2d6052871 [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

llvm-svn: 227669
2015-01-31 03:43:40 +00:00
Chandler Carruth
c140bae640 [PM] Split the AssumptionTracker immutable pass into two separate APIs:
a cache of assumptions for a single function, and an immutable pass that
manages those caches.

The motivation for this change is two fold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating an separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.

Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.

For now, this adds boiler plate to the various passes, but this kind of
boiler plate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.

llvm-svn: 225131
2015-01-04 12:03:27 +00:00
Jingyue Wu
908e4ab376 [SimplifyCFG] threshold for folding branches with common destination
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.

The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:

  __global__ void foo(int a, int b, int c, int d, int e, int n,
                      const int *input, int *output) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
    *output = sum;
  }

The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:

  sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];

Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.

We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!

Test Plan: Added one test case to check the threshold is in effect

Reviewers: nadav, eliben, meheff, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D5529

llvm-svn: 218711
2014-09-30 22:23:38 +00:00
Hal Finkel
f8bb9b78cf Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.)
This change, which allows @llvm.assume to be used from within computeKnownBits
(and other associated functions in ValueTracking), adds some (optional)
parameters to computeKnownBits and friends. These functions now (optionally)
take a "context" instruction pointer, an AssumptionTracker pointer, and also a
DomTree pointer, and most of the changes are just to pass this new information
when it is easily available from InstSimplify, InstCombine, etc.

As explained below, the significant conceptual change is that known properties
of a value might depend on the control-flow location of the use (because we
care that the @llvm.assume dominates the use because assumptions have
control-flow dependencies). This means that, when we ask if bits are known in a
value, we might get different answers for different uses.

The significant changes are all in ValueTracking. Two main changes: First, as
with the rest of the code, new parameters need to be passed around. To make
this easier, I grouped them into a structure, and I made internal static
versions of the relevant functions that take this structure as a parameter. The
new code does as you might expect, it looks for @llvm.assume calls that make
use of the value we're trying to learn something about (often indirectly),
attempts to pattern match that expression, and uses the result if successful.
By making use of the AssumptionTracker, the process of finding @llvm.assume
calls is not expensive.

Part of the structure being passed around inside ValueTracking is a set of
already-considered @llvm.assume calls. This is to prevent a query using, for
example, the assume(a == b), to recurse on itself. The context and DT params
are used to find applicable assumptions. An assumption needs to dominate the
context instruction, or come after it deterministically. In this latter case we
only handle the specific case where both the assumption and the context
instruction are in the same block, and we need to exclude assumptions from
being used to simplify their own ephemeral values (those which contribute only
to the assumption) because otherwise the assumption would prove its feeding
comparison trivial and would be removed.

This commit adds the plumbing and the logic for a simple masked-bit propagation
(just enough to write a regression test). Future commits add more patterns
(and, correspondingly, more regression tests).

llvm-svn: 217342
2014-09-07 18:57:58 +00:00
Rafael Espindola
98710599c1 Remove 'using std::errro_code' from lib.
llvm-svn: 210871
2014-06-13 02:24:39 +00:00
Rafael Espindola
e0e308ff6d Don't use 'using std::error_code' in include/llvm.
This should make sure that most new uses use the std prefix.

llvm-svn: 210835
2014-06-12 21:46:39 +00:00
Craig Topper
c0a2a29f4e [C++] Use 'nullptr'. Transforms edition.
llvm-svn: 207196
2014-04-25 05:29:35 +00:00
Chandler Carruth
6f9ba6a633 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Transforms/...
edition.

This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.

Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.

llvm-svn: 206844
2014-04-22 02:55:47 +00:00
Craig Topper
a3683ec835 [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 202953
2014-03-05 09:10:37 +00:00
Chandler Carruth
075812f27c [Modules] Move CFG.h to the IR library as it defines graph traits over
IR types.

llvm-svn: 202827
2014-03-04 11:45:46 +00:00
Rafael Espindola
32da4bdd4b Make DataLayout a plain object, not a pass.
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM
don't don't handle passes to also use DataLayout.

llvm-svn: 202168
2014-02-25 17:30:31 +00:00
Rafael Espindola
1f7e9d4bed Rename a few more DataLayout variables.
llvm-svn: 201833
2014-02-21 01:53:35 +00:00
Paul Robinson
189e175394 Disable most IR-level transform passes on functions marked 'optnone'.
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.

llvm-svn: 200892
2014-02-06 00:07:05 +00:00
Peter Collingbourne
723e9b89ec Reapply r188119 now that the bug it exposed is fixed.
llvm-svn: 188217
2013-08-12 22:38:43 +00:00
Arnold Schwaighofer
3525aff8f6 Revert r188119 "Kill some duplicated code for removing unreachable BBs."
It is breaking builbots with libgmalloc enabled on Mac OS X.

$ cd llvm ; mkdir release ; cd release
$ ../configure --enable-optimized —prefix=$PWD/install
$ make
$ make check
$ Release+Asserts/bin/llvm-lit -v --param use_gmalloc=1 --param \
  gmalloc_path=/usr/lib/libgmalloc.dylib \
  ../test/Instrumentation/DataFlowSanitizer/args-unreachable-bb.ll

llvm-svn: 188142
2013-08-10 20:16:06 +00:00
Peter Collingbourne
0b56f9dd44 Kill some duplicated code for removing unreachable BBs.
This moves removeUnreachableBlocksFromFn from SimplifyCFGPass.cpp
to Utils/Local.cpp and uses it to replace the implementation of
llvm::removeUnreachableBlocks, which appears to do a strict subset
of what removeUnreachableBlocksFromFn does.

Differential Revision: http://llvm-reviews.chandlerc.com/D1334

llvm-svn: 188119
2013-08-09 22:47:24 +00:00
Tom Stellard
e4e3be6f50 Factor FlattenCFG out from SimplifyCFG
Patch by: Mei Ye

llvm-svn: 187764
2013-08-06 02:43:45 +00:00
Tom Stellard
8e98bf332b SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches.  The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.

Patch by: Mei Ye

llvm-svn: 187278
2013-07-27 00:01:07 +00:00
Chandler Carruth
f8bff2bea0 Make SimplifyCFG simply depend upon TargetTransformInfo and pass it
through as a reference rather than a pointer. There is always *some*
implementation of this available, so this simplifies code by not having
to test for whether it is available or not.

Further, it turns out there were piles of places where SimplifyCFG was
recursing and not passing down either TD or TTI. These are fixed to be
more pedantically consistent even though I don't have any particular
cases where it would matter.

llvm-svn: 171691
2013-01-07 03:53:25 +00:00
Chandler Carruth
3c0f5d4efb Move TargetTransformInfo to live under the Analysis library. This no
longer would violate any dependency layering and it is in fact an
analysis. =]

llvm-svn: 171686
2013-01-07 03:08:10 +00:00
Chandler Carruth
4c1f3c24db Move all of the header files which are involved in modelling the LLVM IR
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.

There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.

The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.

I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).

I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.

llvm-svn: 171366
2013-01-02 11:36:10 +00:00
Evgeniy Stepanov
5ecec98c2c Optimize tree walking in markAliveBlocks.
Check whether a BB is known as reachable before adding it to the worklist.
This way BB's with multiple predecessors are added to the list no more than
once.

llvm-svn: 170335
2012-12-17 14:28:00 +00:00
Chandler Carruth
a490793037 Use the new script to sort the includes of every file under lib.
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.

Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]

llvm-svn: 169131
2012-12-03 16:50:05 +00:00
Hans Wennborg
40eb1b4055 Use TargetTransformInfo to control switch-to-lookup table transformation
When the switch-to-lookup tables transform landed in SimplifyCFG, it
was pointed out that this could be inappropriate for some targets.
Since there was no way at the time for the pass to know anything about
the target, an awkward reverse-transform was added in CodeGenPrepare
that turned lookup tables back into switches for some targets.

This patch uses the new TargetTransformInfo to determine if a
switch should be transformed, and removes
CodeGenPrepare::ConvertLoadToSwitch.

llvm-svn: 167011
2012-10-30 11:23:25 +00:00
Micah Villmow
bb1a25cd67 Move TargetData to DataLayout.
llvm-svn: 165402
2012-10-08 16:38:25 +00:00
Jim Grosbach
0a077abacb Update function names to conform to guidelines.
No functional change.

llvm-svn: 163279
2012-09-06 00:59:08 +00:00
Nadav Rotem
bd2b55bc74 Clean whitespaces.
llvm-svn: 160668
2012-07-24 10:51:42 +00:00
Nuno Lopes
97c381ea93 fix PR13339 (remove the predecessor from the unwind BB when removing an invoke)
llvm-svn: 160325
2012-07-16 22:49:40 +00:00
Nuno Lopes
e967ebe7bb fix the regression I introduced in r159385 (it's necessary to update PHI nodes in unwind BB
llvm-svn: 159534
2012-07-02 16:14:47 +00:00
Nuno Lopes
66896bbd47 make simplifyCFG erase invokes to readonly/readnone functions
llvm-svn: 159385
2012-06-28 22:32:27 +00:00
Nuno Lopes
165c99b53d improve optimization of invoke instructions:
- simplifycfg:  invoke undef/null -> unreachable
 - instcombine:  invoke new  -> invoke expect(0, 0)  (an arbitrary NOOP intrinsic;  only done if the allocated memory is unused, of course)
 - verifier:  allow invoke of intrinsics  (to make the previous step work)

llvm-svn: 159146
2012-06-25 17:11:47 +00:00
Jay Foad
c826df8fb7 Convert CallInst and InvokeInst APIs to use ArrayRef.
llvm-svn: 135265
2011-07-15 08:37:34 +00:00
Devang Patel
73e16acee8 Preserve line number information while converting Invoke into a Call.
llvm-svn: 132505
2011-06-02 22:46:58 +00:00
Frits van Bommel
df58ece78a Add a parameter to ConstantFoldTerminator() that callers can use to ask it to also clean up the condition of any conditional terminator it folds to be unconditional, if that turns the condition into dead code. This just means it calls RecursivelyDeleteTriviallyDeadInstructions() in strategic spots. It defaults to the old behavior.
I also changed -simplifycfg, -jump-threading and -codegenprepare to use this to produce slightly better code without any extra cleanup passes (AFAICT this was the only place in -simplifycfg where now-dead conditions of replaced terminators weren't being cleaned up). The only other user of this function is -sccp, but I didn't read that thoroughly enough to figure out whether it might be holding pointers to instructions that could be deleted by this.

llvm-svn: 131855
2011-05-22 16:24:18 +00:00
Devang Patel
cc15dfc21a Simplify cfg inserts a call to trap when unreachable code is detected. Assign DebugLoc to this new trap instruction.
llvm-svn: 130315
2011-04-27 17:59:27 +00:00
Jay Foad
53632b7c03 Remove PHINode::reserveOperandSpace(). Instead, add a parameter to
PHINode::Create() giving the (known or expected) number of operands.

llvm-svn: 128537
2011-03-30 11:28:46 +00:00
Jay Foad
dc5a008237 (Almost) always call reserveOperandSpace() on newly created PHINodes.
llvm-svn: 128535
2011-03-30 11:19:20 +00:00
Owen Anderson
46990c17f7 Get rid of static constructors for pass registration. Instead, every pass exposes an initializeMyPassFunction(), which
must be called in the pass's constructor.  This function uses static dependency declarations to recursively initialize
the pass's dependencies.

Clients that only create passes through the createFooPass() APIs will require no changes.  Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.

I have tested this with all standard configurations of clang and llvm-gcc on Darwin.  It is possible that there are problems
with the static dependencies that will only be visible with non-standard options.  If you encounter any crash in pass
registration/creation, please send the testcase to me directly.

llvm-svn: 116820
2010-10-19 17:21:58 +00:00
Owen Anderson
69cbf2e8b7 Now with fewer extraneous semicolons!
llvm-svn: 115996
2010-10-07 22:25:06 +00:00
Dan Gohman
d04a608a73 Teach SimplifyCFG how to simplify indirectbr instructions.
- Eliminate redundant successors.
 - Convert an indirectbr with one successor into a direct branch.

Also, generalize SimplifyCFG to be able to be run on a function entry block.
It knows quite a few simplifications which are applicable to the entry
block, and it only needs a few checks to avoid trouble with the entry block.

llvm-svn: 111060
2010-08-14 00:29:42 +00:00
Owen Anderson
f2fea95f2f Reapply r110396, with fixes to appease the Linux buildbot gods.
llvm-svn: 110460
2010-08-06 18:33:48 +00:00
Owen Anderson
aadd8a89ca Revert r110396 to fix buildbots.
llvm-svn: 110410
2010-08-06 00:23:35 +00:00
Owen Anderson
b9762c07cb Don't use PassInfo* as a type identifier for passes. Instead, use the address of the static
ID member as the sole unique type identifier.  Clean up APIs related to this change.

llvm-svn: 110396
2010-08-05 23:42:04 +00:00
Owen Anderson
f8addbb0a1 Fix batch of converting RegisterPass<> to INTIALIZE_PASS().
llvm-svn: 109045
2010-07-21 22:09:45 +00:00
Benjamin Kramer
109451b5e0 SimplifyCFG: don't turn volatile stores to null/undef into unreachable. Fixes PR7369.
llvm-svn: 105914
2010-06-13 14:35:54 +00:00
Chris Lattner
e74d980a02 make simplifycfg insert an llvm.trap before the 'unreachable' it introduces
when it detects undefined behavior.  llvm.trap generally codegens into some
thing really small (e.g. a 2 byte ud2 instruction on x86) and debugging this
sort of thing is "nontrivial".  For example, we now compile:

void foo() { *(int*)0 = 42; }

into:

_foo:
	pushl	%ebp
	movl	%esp, %ebp
	ud2

Some may even claim that this is a security hole, though that seems dubious
to me.  This addresses rdar://7958343 - Optimizing away null dereference 
potentially allows arbitrary code execution

llvm-svn: 103356
2010-05-08 22:15:59 +00:00
Gabor Greif
624a31f2eb Finally land the InvokeInst operand reordering.
I have audited all getOperandNo calls now, fixing
hidden assumptions. CallSite related uglyness will
be eliminated successively.

Note this patch has a long and griveous history,
for all the back-and-forths have a look at
CallSite.h's log.

llvm-svn: 99399
2010-03-24 13:21:49 +00:00
Gabor Greif
ef64581628 backing out r99170 because it still fails on clang-x86_64-darwin10-fnt
llvm-svn: 99171
2010-03-22 09:11:00 +00:00