1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 20:43:44 +02:00
Commit Graph

10530 Commits

Author SHA1 Message Date
Eric Christopher
8b4b95131b Add comparison operators for DIDescriptors to fix c++98 fallout
of operator bool change.

Also convert a variable in DebugIR.

llvm-svn: 186544
2013-07-17 23:25:22 +00:00
Nadav Rotem
22f3c7e5fb Fix a comment.
llvm-svn: 186541
2013-07-17 22:41:16 +00:00
Stephen Lin
a7fd9682cd Restore r181216, which was partially reverted in r182499.
llvm-svn: 186533
2013-07-17 20:06:03 +00:00
Nadav Rotem
990f24a7d2 Add a micro optimization to catch cases where the PtrA equals PtrB.
llvm-svn: 186531
2013-07-17 19:52:25 +00:00
Hal Finkel
cae8df776e Fix comparisons of alloca alignment in inliner merging
Duncan pointed out a mistake in my fix in r186425 when only one of the allocas
being compared had the target-default alignment. This is essentially his
suggested solution. Thanks!

llvm-svn: 186510
2013-07-17 14:32:41 +00:00
Craig Topper
f263d3bfb8 Mark a method 'const' and another 'static'.
llvm-svn: 186485
2013-07-17 03:54:53 +00:00
Craig Topper
2bf5d5a2f0 Make a few more static string pointers constant.
llvm-svn: 186484
2013-07-17 03:43:10 +00:00
Nadav Rotem
ae8b6de415 SLPVectorizer: Accelerate the isConsecutive check by replacing the subtraction of the two values with a simple SCEV expression that adds the offset to one of the pointers that we compare.
llvm-svn: 186479
2013-07-17 00:48:31 +00:00
Nadav Rotem
adfe58a7ad flip the scev minus direction to simplify the code.
llvm-svn: 186466
2013-07-16 22:57:06 +00:00
Nadav Rotem
633bd23118 SLPVectorizer: Improve the compile time of isConsecutive by adding a simple constant-gep check before using SCEV.
This check does not always work because not all of the GEPs use a constant offset, but it happens often enough to reduce the number of times we use SCEV.

llvm-svn: 186465
2013-07-16 22:51:07 +00:00
Rafael Espindola
2a9326a78f Add a wrapper for open.
This centralizes the handling of O_BINARY and opens the way for hiding more
differences (like how open behaves with directories).

llvm-svn: 186447
2013-07-16 19:44:17 +00:00
Peter Collingbourne
82d932d6c2 Make SpecialCaseList match full strings, as documented, using anchors.
Differential Revision: http://llvm-reviews.chandlerc.com/D1149

llvm-svn: 186431
2013-07-16 17:56:07 +00:00
Hal Finkel
35292d605d When the inliner merges allocas, it must keep the larger alignment
For safety, the inliner cannot decrease the allignment on an alloca when
merging it with another.

I've included two variants of the test case for this: one with DataLayout
available, and one without. When DataLayout is not available, if only one of
the allocas uses the default alignment (getAlignment() == 0), then they cannot
be safely merged.

llvm-svn: 186425
2013-07-16 17:10:55 +00:00
Nadav Rotem
ebe95f88ed SLPVectorizer: Reduce the compile time of the consecutive store lookup.
Process groups of stores in chunks of 16.

llvm-svn: 186420
2013-07-16 15:25:17 +00:00
Craig Topper
b8260534f6 Add 'const' qualifiers to static const char* variables.
llvm-svn: 186371
2013-07-16 01:17:10 +00:00
Nadav Rotem
f1e0aebca1 PR16628: Fix a bug in the code that merges compares.
Compares return i1 but they compare different types.

llvm-svn: 186359
2013-07-15 22:52:48 +00:00
Stephen Lin
b764d0591b Remove trailing whitespace
llvm-svn: 186333
2013-07-15 17:55:02 +00:00
Chandler Carruth
255d06c032 Revert r186316 while I track down an ASan failure and an assert from
a bot.

This reverts the commit which introduced a new implementation of the
fancy SROA pass designed to reduce its overhead. I'll skip the huge
commit log here, refer to r186316 if you're looking for how this all
works and why it works that way.

llvm-svn: 186332
2013-07-15 17:36:21 +00:00
Chandler Carruth
05be8f3230 Reimplement SROA yet again. Same fundamental principle, but a totally
different core implementation strategy.

Previously, SROA would build a relatively elaborate partitioning of an
alloca, associate uses with each partition, and then rewrite the uses of
each partition in an attempt to break apart the alloca into chunks that
could be promoted. This was very wasteful in terms of memory and compile
time because regardless of how complex the alloca or how much we're able
to do in breaking it up, all of the datastructure work to analyze the
partitioning was done up front.

The new implementation attempts to form partitions of the alloca lazily
and on the fly, rewriting the uses that make up that partition as it
goes. This has a few significant effects:
1) Much simpler data structures are used throughout.
2) No more double walk of the recursive use graph of the alloca, only
   walk it once.
3) No more complex algorithms for associating a particular use with
   a particular partition.
4) PHI and Select speculation is simplified and happens lazily.
5) More precise information is available about a specific use of the
   alloca, removing the need for some side datastructures.

Ultimately, I think this is a much better implementation. It removes
about 300 lines of code, but arguably removes more like 500 considering
that some code grew in the process of being factored apart and cleaned
up for this all to work.

I've re-used as much of the old implementation as possible, which
includes the lion's share of code in the form of the rewriting logic.
The interesting new logic centers around how the uses of a partition are
sorted, and split into actual partitions.

Each instruction using a pointer derived from the alloca gets
a 'Partition' entry. This name is totally wrong, but I'll do a rename in
a follow-up commit as there is already enough churn here. The entry
describes the offset range accessed and the nature of the access. Once
we have all of these entries we sort them in a very specific way:
increasing order of begin offset, followed by whether they are
splittable uses (memcpy, etc), followed by the end offset or whatever.
Sorting by splittability is important as it simplifies the collection of
uses into a partition.

Once we have these uses sorted, we walk from the beginning to the end
building up a range of uses that form a partition of the alloca.
Overlapping unsplittable uses are merged into a single partition while
splittable uses are broken apart and carried from one partition to the
next. A partition is also introduced to bridge splittable uses between
the unsplittable regions when necessary.

I've looked at the performance PRs fairly closely. PR15471 no longer
will even load (the module is invalid). Not sure what is up there.
PR15412 improves by between 5% and 10%, however it is nearly impossible
to know what is holding it up as SROA (the entire pass) takes less time
than reading the IR for that test case. The analysis takes the same time
as running mem2reg on the final allocas. I suspect (without much
evidence) that the new implementation will scale much better however,
and it is just the small nature of the test cases that makes the changes
small and noisy. Either way, it is still simpler and cleaner I think.

llvm-svn: 186316
2013-07-15 10:30:19 +00:00
Craig Topper
642bfedd2e Add 'const' qualifier to some arrays.
llvm-svn: 186312
2013-07-15 08:02:13 +00:00
Craig Topper
4e9457fd7d Use llvm::array_lengthof to replace sizeof(array)/sizeof(array[0]).
llvm-svn: 186301
2013-07-15 04:27:47 +00:00
Nadav Rotem
37c18f5cda SLPVectorizer: change the order in which we search for vectorization candidates. Do stores first and PHIs second.
llvm-svn: 186277
2013-07-14 06:15:46 +00:00
Craig Topper
58fa7a9b4a Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size.
llvm-svn: 186274
2013-07-14 04:42:23 +00:00
Arnold Schwaighofer
970f54281c LoopVectorizer: Disallow reductions whose header phi is used outside the loop
If an outside loop user of the reduction value uses the header phi node we
cannot just reduce the vectorized phi value in the vector code epilog because
we would loose VF-1 reductions.

lp:
  p = phi (0, lv)
  lv = lv + 1
  ...
  brcond , lp, outside

outside:
  usr = add 0, p

(Say the loop iterates two times, the value of p coming out of the loop is one).

We cannot just transform this to:

vlp:
  p = phi (<0,0>, lv)
  lv = lv + <1,1>
  ..
  brcond , lp, outside

outside:
  p_reduced = p[0] + [1];
  usr = add 0, p_reduced

(Because the original loop iterated two times the vectorized loop would iterate
one time, but p_reduced ends up being zero instead of one).

We would have to execute VF-1 iterations in the scalar remainder loop in such
cases. For now, just disable vectorization.

PR16522

llvm-svn: 186256
2013-07-13 19:09:29 +00:00
Andrew Trick
651c624842 LoopVectorize fix: LoopInfo must be valid when invoking utils like SCEVExpander.
In general, one should always complete CFG modifications first, update
CFG-based analyses, like Dominatores and LoopInfo, then generate
instruction sequences.

LoopVectorizer was creating a new loop, calling SCEVExpander to
generate checks, then updating LoopInfo. I just changed the order.

llvm-svn: 186241
2013-07-13 06:20:06 +00:00
Nick Lewycky
1897fc3733 Add a microoptimization for urem.
llvm-svn: 186235
2013-07-13 01:16:47 +00:00
Joey Gouly
04b7300444 Fix a crash in EvaluateInDifferentElementOrder where it would generate an
undef vector of the wrong type.

LGTM'd by Nick Lewycky on IRC.

llvm-svn: 186224
2013-07-12 23:08:06 +00:00
Andrew Trick
b79ae09045 LFTR improvement to avoid truncation.
This is a reimplemntation of the patch originally in r186107.

llvm-svn: 186215
2013-07-12 22:08:48 +00:00
Andrew Trick
064fb30cab Cleanup LFTR logic.
llvm-svn: 186214
2013-07-12 22:08:44 +00:00
Andrew Trick
aacd582de2 Cleanup: rename a variable to make the logic easier to follow.
llvm-svn: 186213
2013-07-12 22:08:41 +00:00
Arnold Schwaighofer
b9c37551bc TargetTransformInfo: address calculation parameter for gather/scather
Address calculation for gather/scather in vectorized code can incur a
significant cost making vectorization unbeneficial. Add infrastructure to add
cost.
Tests and cost model for targets will be in follow-up commits.

radar://14351991

llvm-svn: 186187
2013-07-12 19:16:02 +00:00
Chandler Carruth
4163f3fe85 Revert "indvars: Improve LFTR by eliminating truncation when comparing
against a constant."

This reverts commit r186107. It didn't handle wrapping arithmetic in the
loop correctly and thus caused the following C program to count from
0 to UINT64_MAX instead of from 0 to 255 as intended:

  #include <stdio.h>
  int main() {
    unsigned char first = 0, last = 255;
    do { printf("%d\n", first); } while (first++ != last);
  }

Full test case and instructions to reproduce with just the -indvars pass
sent to the original review thread rather than to r186107's commit.

llvm-svn: 186152
2013-07-12 11:18:55 +00:00
Nadav Rotem
ee62470368 SLPVectorizer: Sink and enable CSE for ExtractElements.
llvm-svn: 186145
2013-07-12 06:09:24 +00:00
Nadav Rotem
1e6246b38c SLPVectorize: Replace the code that checks for vectorization candidates in successor blocks with code that scans PHINodes.
Before we could vectorize PHINodes scanning successors was a good way of finding candidates. Now we can vectorize the phinodes which is simpler.

llvm-svn: 186139
2013-07-12 00:04:18 +00:00
Nadav Rotem
cd9c4e430f Remove an argument that we dont use anymore.
llvm-svn: 186116
2013-07-11 20:56:13 +00:00
Andrew Trick
fe577c9f45 indvars: Improve LFTR by eliminating truncation when comparing against a constant.
Patch by Michele Scandale!

Adds a special handling of the case where, during the loop exit
condition rewriting, the exit value is a constant of bitwidth lower
than the type of the induction variable: instead of introducing a
trunc operation in order to match correctly the operand types, it
allows to convert the constant value to an equivalent constant,
depending on the initial value of the induction variable and the trip
count, in order have an equivalent comparison between the induction
variable and the new constant.

llvm-svn: 186107
2013-07-11 17:08:59 +00:00
Benjamin Kramer
2880b5c20b Don't use a potentially expensive shift if all we want is one set bit.
No functionality change.

llvm-svn: 186095
2013-07-11 16:05:50 +00:00
Arnold Schwaighofer
a8667081e1 LoopVectorize: Vectorize all accesses in address space zero with unit stride
We can vectorize them because in the case where we wrap in the address space the
unvectorized code would have had to access a pointer value of zero which is
undefined behavior in address space zero according to the LLVM IR semantics.
(Thank you Duncan, for pointing this out to me).

Fixes PR16592.

llvm-svn: 186088
2013-07-11 15:21:55 +00:00
Duncan Sands
447d97b223 TryToSimplifyUncondBranchFromEmptyBlock was checking that any common
predecessors of the two blocks it is attempting to merge supply the
same incoming values to any phi in the successor block.  This change
allows merging in the case where there is one or more incoming values
that are undef.  The undef values are rewritten to match the non-undef
value that flows from the other edge.  Patch by Mark Lacey.

llvm-svn: 186069
2013-07-11 08:28:20 +00:00
Nadav Rotem
965af9cb85 Fix a warning.
llvm-svn: 186064
2013-07-11 05:39:02 +00:00
Nadav Rotem
8e1d89128f SLPVectorizer: refactor the code that places extracts. Place the code that decides where to put extracts in the build-tree phase. This allows us to take the cost of the extracts into account.
llvm-svn: 186058
2013-07-11 04:54:05 +00:00
Michael Gottesman
722cf9dc9b Teach TailRecursionElimination to handle certain cases of nocapture escaping allocas.
Without the changes introduced into this patch, if TRE saw any allocas at all,
TRE would not perform TRE *or* mark callsites with the tail marker.

Because TRE runs after mem2reg, this inadequacy is not a death sentence. But
given a callsite A without escaping alloca argument, A may not be able to have
the tail marker placed on it due to a separate callsite B having a write-back
parameter passed in via an argument with the nocapture attribute.

Assume that B is the only other callsite besides A and B only has nocapture
escaping alloca arguments (*NOTE* B may have other arguments that are not passed
allocas). In this case not marking A with the tail marker is unnecessarily
conservative since:

  1. By assumption A has no escaping alloca arguments itself so it can not
     access the caller's stack via its arguments.

  2. Since all of B's escaping alloca arguments are passed as parameters with
     the nocapture attribute, we know that B does not stash said escaping
     allocas in a manner that outlives B itself and thus could be accessed
     indirectly by A.

With the changes introduced by this patch:

  1. If we see any escaping allocas passed as a capturing argument, we do
     nothing and bail early.

  2. If we do not see any escaping allocas passed as captured arguments but we
     do see escaping allocas passed as nocapture arguments:

       i. We do not perform TRE to avoid PR962 since the code generator produces
          significantly worse code for the dynamic allocas that would be created
          by the TRE algorithm.

       ii. If we do not return twice, mark call sites without escaping allocas
           with the tail marker. *NOTE* This excludes functions with escaping
           nocapture allocas.

  3. If we do not see any escaping allocas at all (whether captured or not):

       i. If we do not have usage of setjmp, mark all callsites with the tail
          marker.

       ii. If there are no dynamic/variable sized allocas in the function,
           attempt to perform TRE on all callsites in the function.

Based off of a patch by Nick Lewycky.

rdar://14324281.

llvm-svn: 186057
2013-07-11 04:40:01 +00:00
Michael Gottesman
55fe2fe0ba [objc-arc] Changed 'mode: c++' => 'C++' at Nick Lewycky's suggestion. Also removed unnecessary mode: c++ lines from .cpp files.
llvm-svn: 186026
2013-07-10 18:49:00 +00:00
Peter Collingbourne
16d8086f72 Implement categories for special case lists.
A special case list can now specify categories for specific globals,
which can be used to instruct an instrumentation pass to treat certain
functions or global variables in a specific way, such as by omitting
certain aspects of instrumentation while keeping others, or informing
the instrumentation pass that a specific uninstrumentable function
has certain semantics, thus allowing the pass to instrument callers
according to those semantics.

For example, AddressSanitizer now uses the "init" category instead of
global-init prefixes for globals whose initializers should not be
instrumented, but which in all other respects should be instrumented.

The motivating use case is DataFlowSanitizer, which will have a
number of different categories for uninstrumentable functions, such
as "functional" which specifies that a function has pure functional
semantics, or "discard" which indicates that a function's return
value should not be labelled.

Differential Revision: http://llvm-reviews.chandlerc.com/D1092

llvm-svn: 185978
2013-07-09 22:03:17 +00:00
Peter Collingbourne
1d4db968ba Introduce a SpecialCaseList ctor which takes a MemoryBuffer to make
it more unit testable, and fix memory leak in the other ctor.

Differential Revision: http://llvm-reviews.chandlerc.com/D1090

llvm-svn: 185976
2013-07-09 22:03:09 +00:00
Peter Collingbourne
2321a4030c Rename BlackList class to SpecialCaseList and move it to Transforms/Utils.
Differential Revision: http://llvm-reviews.chandlerc.com/D1089

llvm-svn: 185975
2013-07-09 22:02:49 +00:00
Nadav Rotem
417c1a3150 Fix PR16571, which is a bug in the code that checks that all of the types in the bundle are uniform.
llvm-svn: 185970
2013-07-09 21:38:08 +00:00
Nadav Rotem
07b212b07b Set the default insert point to the first instruction, and not to end()
llvm-svn: 185953
2013-07-09 17:55:36 +00:00
David Majnemer
f8f57aad0a InstCombine: Fix typo in comment for visitICmpInstWithInstAndIntCst
llvm-svn: 185916
2013-07-09 09:24:35 +00:00
David Majnemer
3bb8099e6d InstCombine: variations on 0xffffffff - x >= 4
The following transforms are valid if -C is a power of 2:
(icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
(icmp ult (xor X, C), -C) -> (icmp uge X, C)

These are nice, they get rid of the xor.

llvm-svn: 185915
2013-07-09 09:20:58 +00:00