1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

983 Commits

Author SHA1 Message Date
George Burgess IV
5e569e7990 [CFLAA] Add interprocedural function summaries.
This patch adds function summaries, so that we don't need to recompute
various properties about function parameters/return values at each
callsite of a function. It also adds many interprocedural tests for
CFLAA.

Patch by Jia Chen.

Differential Revision: http://reviews.llvm.org/D21475#inline-182390

llvm-svn: 273219
2016-06-20 23:10:56 +00:00
Simon Pilgrim
894b8d816e [X86][SSE] Add cost model for BSWAP of vectors
The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization.

Differential Revision: http://reviews.llvm.org/D21521

llvm-svn: 273217
2016-06-20 23:08:21 +00:00
Sanjoy Das
8576aee95c [SCEV] Fix incorrect trip count computation
The way we elide max expressions when computing trip counts is incorrect
-- it breaks cases like this:

```
static int wrapping_add(int a, int b) {
  return (int)((unsigned)a + (unsigned)b);
}

void test() {
  volatile int end_buf = 2147483548; // INT_MIN - 100
  int end = end_buf;

  unsigned counter = 0;
  for (int start = wrapping_add(end,  200); start < end; start++)
    counter++;

  print(counter);
}
```

Note: the `NoWrap` variable that was being tested has little to do with
the values flowing into the max expression; it is a property of the
induction variable.

test/Transforms/LoopUnroll/nsw-tripcount.ll was added to solely test
functionality I'm reverting in this change, so I've deleted the test
fully.

llvm-svn: 273079
2016-06-18 04:38:31 +00:00
George Burgess IV
8ffcde5567 [CFLAA] Tag arguments as escaped instead of unknown.
This patch also includes some refactoring.

Prior to this patch, we tagged all CFLAA attributes as unknown. This is
suboptimal, since it meant that any Value used as an argument would be
considered to alias any other Value that existed.

Now that we have the machinery to tag sets below the set for an
arbitrary value with attributes, it's okay to be less conservative with
arguments. (Specifically, we still tag the set under an argument with
unknown).

Patch by Jia Chen.

Differential Revision: http://reviews.llvm.org/D21262

llvm-svn: 272690
2016-06-14 18:12:28 +00:00
Simon Pilgrim
2b98b76820 [CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets
To account for the fast PSHUFB implementation now available

llvm-svn: 272484
2016-06-11 19:23:02 +00:00
Michael Kuperstein
57fc4a3484 [X86] Add costs for SSE zext/sext to v4i64 to TTI
The costs are somewhat hand-wavy, but should be much closer to the truth
than what we get from BasicTTI.

Differential Revision: http://reviews.llvm.org/D21156

llvm-svn: 272406
2016-06-10 17:01:05 +00:00
George Burgess IV
f70bbc0aad [CFLAA] Handle global/arg attrs more sanely.
Prior to this patch, we used argument/global stratified attributes in
order to note that a value could have come from either dereferencing a
global/arg, or from the assignment from a global/arg.

Now, AttrUnknown is placed on sets when we see a dereference, instead of
the global/arg attributes. This allows us to be more aggressive in the
future when we see global/arg attributes without AttrUnknown.

Patch by Jia Chen.

Differential Revision: http://reviews.llvm.org/D21110

llvm-svn: 272335
2016-06-09 23:15:04 +00:00
Sanjoy Das
7ac8ad9906 Be wary of abnormal exits from loop when exploiting UB
We can safely rely on a NoWrap add recurrence causing UB down the road
only if we know the loop does not have a exit expressed in a way that is
opaque to ScalarEvolution (e.g. by a function call that conditionally
calls exit(0)).

I believe with this change PR28012 is fixed.

Note: I had to change some llvm-lit tests in LoopReroll, since it looks
like they were depending on this incorrect behavior.

llvm-svn: 272237
2016-06-09 01:13:59 +00:00
Sanjoy Das
c35e5710c9 [SCEV] Track no-abnormal-exits instead of no-throw calls
Absence of may-unwind calls is not enough to guarantee that a
UB-generating use of an add-rec poison in the loop latch will actually
cause UB.  We also need to guard against calls that terminate the thread
or infinite loop themselves.

This partially addresses PR28012.

llvm-svn: 272181
2016-06-08 17:48:42 +00:00
Sanjoy Das
45bd5cf143 Teach isGuarantdToTransferExecToSuccessor about debug info intrinsics
Calls to `@llvm.dbg.*` can be assumed to terminate.

llvm-svn: 272180
2016-06-08 17:48:36 +00:00
Sanjoy Das
748f06abba Fix a bug in SCEV's poison value propagation
The worklist algorithm introduced in rL271151 didn't check to see if the
direct users of the post-inc add recurrence propagates poison.  This
change fixes the problem and makes the code structure more obvious.

Note for release managers: correctness wise, this bug wasn't a
regression introduced by rL271151 -- the behavior of SCEV around
post-inc add recurrences was strictly improved (in terms of correctness)
in rL271151.

llvm-svn: 272179
2016-06-08 17:48:31 +00:00
George Burgess IV
62da20254e [CFLAA] Add AttrEscaped, remove bit twiddling functions.
This patch does a few things:

- Unifies AttrAll and AttrUnknown (since they were used for more or less
  the same purpose anyway).

- Introduces AttrEscaped, an attribute that notes that a value escapes
  our analysis for a given set, but not that an unknown value flows into
  said set.

- Removes functions that take bit indices, since we also had functions
  that took bitsets, and the use of both (with similar names) was
  unclear and bug-prone.

Patch by Jia Chen.

Differential Revision: http://reviews.llvm.org/D21000

llvm-svn: 272040
2016-06-07 18:35:37 +00:00
Andrey Turetskiy
cf1b3836fd [LAA] Improve non-wrapping pointer detection by handling loop-invariant case.
This fixes PR26314. This patch adds new helper “isNoWrap” with detection of
loop-invariant pointer case.

Patch by Roman Shirokiy.

Ref: https://llvm.org/bugs/show_bug.cgi?id=26314

Differential Revision: http://reviews.llvm.org/D17268

llvm-svn: 272014
2016-06-07 14:55:27 +00:00
Easwaran Raman
60d682daa9 Reapply r271728 after adding move cobstructor for ProfileSummaryInfo
llvm-svn: 271745
2016-06-03 22:54:26 +00:00
Easwaran Raman
0670f91a65 Revert r271728 as it breaks Windows build
llvm-svn: 271738
2016-06-03 21:14:26 +00:00
Easwaran Raman
553eb9ed8a Analysis pass to access profile summary info
Differential Revision: http://reviews.llvm.org/D20648

llvm-svn: 271728
2016-06-03 20:37:19 +00:00
Daniel Berlin
dc788a2a11 Revert "Claim NoAlias if two GEPs index different fields of the same struct"
This reverts commit 2d5d6493f43eb68493a3852b8c226ac9fafdc7eb.

llvm-svn: 271422
2016-06-01 18:55:32 +00:00
George Burgess IV
135c07fd0f [CFLAA] Recognize builtin allocation functions.
This patch extends CFLAA to recognize allocation functions such as
malloc, free, etc, so we can treat them more aggressively.

Patch by Jia Chen.

Differential Revision: http://reviews.llvm.org/D20776

llvm-svn: 271421
2016-06-01 18:39:54 +00:00
Daniel Berlin
5aefd2250a Claim NoAlias if two GEPs index different fields of the same struct
Patch by Taewook Oh

Summary: Patch for Bug 27478. Make BasicAliasAnalysis claims NoAlias if two GEPs index different fields of the same structure.

Reviewers: hfinkel, dberlin

Subscribers: dberlin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20665

llvm-svn: 271415
2016-06-01 18:12:01 +00:00
Sanjoy Das
1a00547e1a Reduce dependence on pointee types when deducing dereferenceability
Summary:
Change some of the internal interfaces in Loads.cpp to keep track of the
number of bytes we're trying to prove dereferenceable using an explicit
`Size` parameter.

Before this, the `Size` parameter was implicitly inferred from the
pointee type of the pointer whose dereferenceability we were trying to
prove, causing us to be conservative around bitcasts. This was
unfortunate since bitcast instructions are no-ops and should never
break optimizations.  With an explicit `Size` parameter, we're more
precise (as shown in the test cases), and the code is simpler.

We should eventually move towards a `DerefQuery` struct that groups
together a base pointer, an offset, a size and an alignment; but this
patch is a first step.

Reviewers: apilipenko, dblaikie, hfinkel, reames

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20764

llvm-svn: 271406
2016-06-01 16:47:45 +00:00
George Burgess IV
471fe02144 [CFLAA] Don't link GEP pointers to GEP indices.
Code like the following is considered broken, and doesn't need to be
supported by our AA magicks:

void getFoo(int *P) {
  int *PAlias = (int *)((char *)NULL + (uintptr_t)P);
}

This patch makes CFLAA drop support for code like this.

Patch by Jia Chen.

Differential Revision: http://reviews.llvm.org/D20775

llvm-svn: 271322
2016-05-31 19:55:05 +00:00
Sanjoy Das
5e3a215bf7 [SCEV] See through op.with.overflow intrinsics (re-apply)
Summary:
This change teaches SCEV to see reduce `(extractvalue
0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if
possible).

This was first checked in at r265912 but reverted in r265950 because it
exposed some issues around how SCEV handled post-inc add recurrences.
Those issues have now been fixed.

Reviewers: atrick, regehr

Subscribers: mcrosier, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18684

llvm-svn: 271152
2016-05-29 00:34:42 +00:00
Sanjoy Das
9baaae9344 [SCEV] Don't always add no-wrap flags to post-inc add recs
Fixes PR27315.

The post-inc version of an add recurrence needs to "follow the same
rules" as a normal add or subtract expression.  Otherwise we miscompile
programs like

```
int main() {
  int a = 0;
  unsigned a_u = 0;
  volatile long last_value;
  do {
    a_u += 3;
    last_value = (long) ((int) a_u);
    if (will_add_overflow(a, 3)) {
      // Leave, and don't actually do the increment, so no UB.
      printf("last_value = %ld\n", last_value);
      exit(0);
    }
    a += 3;
  } while (a != 46);
  return 0;
}
```

This patch changes SCEV to put no-wrap flags on post-inc add recurrences
only when the poison from a potential overflow will go ahead to cause
undefined behavior.

To avoid regressing performance too much, I've assumed infinite loops
without side effects is undefined behavior to prove poison<->UB
equivalence in more cases.  This isn't ideal, but is not new to LLVM as
a whole, and far better than the situation I'm trying to fix.

llvm-svn: 271151
2016-05-29 00:32:17 +00:00
Sanjoy Das
ad0baa641e [ValueTracking] ICmp instructions propagate poison
This is a stripped down version of D19211, leaving out the questionable
"branching in poison is UB" bit.

llvm-svn: 271150
2016-05-29 00:31:18 +00:00
Michael Kuperstein
9672c87c77 [BasicAA] Extend inbound GEP negative offset logic to GlobalVariables
r270777 improved the precision of alloca vs. inbounbds GEP alias queries: if
we have (a) an inbounds GEP and (b) a pointer based on an alloca, and the
beginning of the object the GEP points to would have a negative offset with
respect to the alloca, then the GEP can not alias pointer (b).

This makes the same logic fire when (b) is based on a GlobalVariable instead
of an alloca.

Differential Revision: http://reviews.llvm.org/D20652

llvm-svn: 270893
2016-05-26 19:30:49 +00:00
Michael Kuperstein
f4506f07fc [BasicAA] Improve precision of alloca vs. inbounds GEP alias queries
If a we have (a) a GEP and (b) a pointer based on an alloca, and the
beginning of the object the GEP points would have a negative offset with
repsect to the alloca, then the GEP can not alias pointer (b).

For example, consider code like:

struct { int f0, int f1, ...} foo;
...
foo alloca;
foo *random = bar(alloca);
int *f0 = &alloca.f0
int *f1 = &random->f1;

Which is lowered, approximately, to:
%alloca = alloca %struct.foo
%random = call %struct.foo* @random(%struct.foo* %alloca)
%f0 = getelementptr inbounds %struct, %struct.foo* %alloca, i32 0, i32 0
%f1 = getelementptr inbounds %struct, %struct.foo* %random, i32 0, i32 1

Assume %f1 and %f0 alias. Then %f1 would point into the object allocated
by %alloca. Since the %f1 GEP is inbounds, that means %random must also
point into the same object. But since %f0 points to the beginning of %alloca,
the highest %f1 can be is (%alloca + 3). This means %random can not be higher
than (%alloca - 1), and so is not inbounds, a contradiction.

Differential Revision: http://reviews.llvm.org/D20495

llvm-svn: 270777
2016-05-25 22:23:08 +00:00
Oleg Ranevskyy
34bf60ca68 [SCEV] No-wrap flags are not propagated when folding "{S,+,X}+T ==> {S+T,+,X}"
Summary:
**Description**

This makes `WidenIV::widenIVUse` (IndVarSimplify.cpp) fail to widen narrow IV uses in some cases. The latter affects IndVarSimplify which may not eliminate narrow IV's when there actually exists such a possibility, thereby producing ineffective code.

When `WidenIV::widenIVUse` gets a NarrowUse such as `{(-2 + %inc.lcssa),+,1}<nsw><%for.body3>`, it first tries to get a wide recurrence for it via the `getWideRecurrence` call.
`getWideRecurrence` returns recurrence like this: `{(sext i32 (-2 + %inc.lcssa) to i64),+,1}<nsw><%for.body3>`.

Then a wide use operation is generated by `cloneIVUser`. The generated wide use is evaluated to `{(-2 + (sext i32 %inc.lcssa to i64))<nsw>,+,1}<nsw><%for.body3>`, which is different from the `getWideRecurrence` result. `cloneIVUser` sees the difference and returns nullptr.

This patch also fixes the broken LLVM tests by adding missing <nsw> entries introduced by the correction.

**Minimal reproducer:**
```
int foo(int a, int b, int c);
int baz();

void bar()
{
   int arr[20];
   int i = 0;

   for (i = 0; i < 4; ++i)
     arr[i] = baz();

   for (; i < 20; ++i)
     arr[i] = foo(arr[i - 4], arr[i - 3], arr[i - 2]);
}
```

**Clang command line:**
```
clang++ -mllvm -debug -S -emit-llvm -O3 --target=aarch64-linux-elf test.cpp -o test.ir
```

**Expected result:**
The ` -mllvm -debug` log shows that all the IV's for the second `for` loop have been eliminated.

Reviewers: sanjoy

Subscribers: atrick, asl, aemerson, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D20058

llvm-svn: 270695
2016-05-25 13:01:33 +00:00
Simon Pilgrim
de7240c8f6 [CostModel][X86][XOP] Added XOP costmodel for BITREVERSE
Now that we have a nice fast VPPERM solution. Added framework for future intrinsic costs as well.

llvm-svn: 270537
2016-05-24 08:17:50 +00:00
Matthew Simpson
1384b0c2ff [LAA] Check independence of strided accesses before forward case
This patch changes the order in which we attempt to prove the independence of
strided accesses. We previously did this after we knew the dependence distance
was positive. With this change, we check for independence before handling the
negative distance case. The patch prevents LAA from reporting forward
dependences for independent strided accesses.

This change was requested in the review of D19984.

llvm-svn: 270072
2016-05-19 15:37:19 +00:00
Sanjoy Das
92221742de [SCEV] Be more aggressive in proving NUW
... for AddRec's in loops for which SCEV is unable to compute a max
tripcount.  This is the NUW variant of r269211 and fixes PR27691.

(Note: PR27691 is not a correct or stability bug, it was created to
track a pending task).

llvm-svn: 269790
2016-05-17 17:51:14 +00:00
Simon Pilgrim
9e2a539638 [CostModel][X86] Tidied up checks
llvm-svn: 269770
2016-05-17 14:43:41 +00:00
Simon Pilgrim
49b8573f5f [CostModel][X86] Added scalar bitreverse tests
llvm-svn: 269594
2016-05-15 17:40:48 +00:00
Adam Nemet
a71d31d72a [LAA] Include MaxSafeDepDistBytes in the analysis print-out
llvm-svn: 269508
2016-05-13 22:49:13 +00:00
Sanjoy Das
3f12ce1fd4 [SCEVExpander] Fix a failed cast<> assertion
SCEVExpander::replaceCongruentIVs assumes the backedge value of an
SCEV-analysable PHI to always be an instruction, when this is not
necessarily true.  For now address this by bailing out of the
optimization if the backedge value of the PHI is a non-Instruction.

llvm-svn: 269213
2016-05-11 17:41:41 +00:00
Sanjoy Das
cb6b35a484 [SCEVExpander] Don't break SSA in replaceCongruentIVs
`SCEVExpander::replaceCongruentIVs` bypasses `hoistIVInc` if both the
original and the isomorphic increments are PHI nodes.  Doing this can
break SSA if the isomorphic increment is not dominated by the original
increment.  Get rid of the bypass, and let `hoistIVInc` do the right
thing.

Fixes PR27232 (compile time crash/hang).

llvm-svn: 269212
2016-05-11 17:41:34 +00:00
Sanjoy Das
8c40c5bf03 [SCEV] Be more aggressive around proving no-wrap
... for AddRec's in loops for which SCEV is unable to compute a max
tripcount.  This is not a problem for "normal" loops[0] that don't have
guards or assumes, but helps in cases where we have guards or assumes in
the loop that can be used to constrain incoming values over the backedge.

This partially fixes PR27691 (we still don't handle the NUW case).

[0]: for "normal" loops, in the cases where we'd be able to prove
no-wrap via isKnownPredicate, we'd also be able to compute a max
tripcount.

llvm-svn: 269211
2016-05-11 17:41:26 +00:00
Vedant Kumar
641c55ebd0 [BasicAA] Compare GEP indices based on value (Fix PR27418)
Equivalent GEP indices with different types are treated as different
indices altogether, leading to an incorrect AA result. Fix the issue
by comparing indices based on their values.

Thanks to Mikael Holmén for reporting the issue!

Differential Revision: http://reviews.llvm.org/D19935

llvm-svn: 269197
2016-05-11 15:45:43 +00:00
Sanjoy Das
cf2951e884 [BasicAA] Guard intrinsics don't write to memory
Summary:
The idea is very close to what we do for assume intrinsics: we mark the
guard intrinsics as writing to arbitrary memory to maintain control
dependence, but under the covers we teach AA that they do not mod any
particular memory location.

Reviewers: chandlerc, hfinkel, gbiv, reames

Subscribers: george.burgess.iv, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19575

llvm-svn: 269007
2016-05-10 02:35:41 +00:00
Sanjoy Das
5e39b474b8 [SCEV] Use guards to prove predicates
We can use calls to @llvm.experimental.guard to prove predicates,
relying on the fact that in all locations domianted by a call to
@llvm.experimental.guard the predicate it is guarding is known to be
true.

llvm-svn: 268997
2016-05-10 00:31:49 +00:00
Simon Pilgrim
01729bd834 [X86][SSE] Improve cost model for i64 vector comparisons on pre-SSE42 targets
As discussed on PR24888, until SSE42 we don't have access to PCMPGTQ for v2i64 comparisons, but the cost models don't reflect this, resulting in over-optimistic vectorizaton.

This patch adds SSE2 'base level' costs that match what a typical target is capable of and only reduces the v2i64 costs at SSE42.

Technically SSE41 provides a PCMPEQQ v2i64 equality test, but as getCmpSelInstrCost doesn't give us a way to discriminate between comparison test types we can't easily make use of this, otherwise we could split the cost of integer equality and greater-than tests to give better costings of each.

Differential Revision: http://reviews.llvm.org/D20057

llvm-svn: 268972
2016-05-09 21:14:38 +00:00
Matt Arsenault
006429b341 DivergenceAnalysis: Fix crash with no return blocks
The post dominator tree does not have a root node in this case.

llvm-svn: 268933
2016-05-09 16:57:08 +00:00
Simon Pilgrim
f8175826a4 [CostModel][X86] Extended comparison instruction cost model tests to include SSE2/SSE3/SSSE3/SSE41/SSE42 targets
llvm-svn: 268877
2016-05-08 15:24:53 +00:00
Simon Pilgrim
d0b4e9facb [CostModel][X86] Split BSWAP/BITREVERSE cost tests from CTPOP/CTLZ/CTTZ 'bit count' cost tests
llvm-svn: 268859
2016-05-07 16:34:16 +00:00
Simon Pilgrim
2ec2e6178f [CostModel][X86] Tweak 'SSE2-only' test CPU as it was only disabling SSE41 not SSE3/SSSE3 etc.
llvm-svn: 268763
2016-05-06 17:50:07 +00:00
Simon Pilgrim
2fab54a682 [CostModel][X86] Added ctlz/cttz undef-zero costmodel tests
llvm-svn: 268761
2016-05-06 17:48:35 +00:00
Simon Pilgrim
55d3667833 [CostModel][X86] Added costmodel tests for vector ctpop/ctlz/cttz/bitreverse/bswap
llvm-svn: 268738
2016-05-06 14:38:14 +00:00
Xinliang David Li
b6de0a1327 [PM] port Branch Frequency Analaysis pass to new PM
llvm-svn: 268687
2016-05-05 21:13:27 +00:00
Xinliang David Li
497e1a3160 [PM] Port Branch Probability Analysis pass to the new pass manager.
Differential Revision: http://reviews.llvm.org/D19839

llvm-svn: 268601
2016-05-05 02:59:57 +00:00
Sanjoy Das
822075f912 [SCEV] Tweak the output format and content of -analyze
In the "LoopDispositions:" section:

 - Instead of printing out a list, print out a "dictionary" to make it
   obvious by inspection which disposition is for which loop.  This is
   just a cosmetic change.

 - Print dispositions for parent _and_ sibling loops.  I will use this
   to write a test case.

llvm-svn: 268405
2016-05-03 17:49:57 +00:00
Nicolai Haehnle
634941ba37 AMDGPU: llvm.SI.fs.constant is a source of divergence
Summary:
This intrinsic is used to get flat-shaded fragment shader inputs. Those are
uniform across a primitive, but a fragment shader wave may process pixels from
multiple primitives (as indicated by the prim_mask), and so that's where
divergence can arise.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19747

llvm-svn: 268259
2016-05-02 17:37:01 +00:00