1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00
Commit Graph

8101 Commits

Author SHA1 Message Date
Piotr Padlewski
bda08e96a0 [MemDep] Handle gep with zeros for invariant.group
Summary:
gep 0, 0 is equivalent to bitcast. LLVM canonicalizes it
to getelementptr because it make SROA can then handle it.

Simple case like

    void g(A &a) {
        z(a);
        if (glob)
            a.foo();
    }
    void testG() {
        A a;
        g(a);
    }

was not devirtualized with -fstrict-vtable-pointers because luck of
handling for gep 0 in Memory Dependence Analysis

Reviewers: dberlin, nlewycky, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28126

llvm-svn: 290763
2016-12-30 18:45:07 +00:00
Michael Kuperstein
173ab48710 [LICM] When promoting scalars, allow inserting stores to thread-local allocas.
This is similar to the allocfn case - if an alloca is not captured, then it's
necessarily thread-local.

Differential Revision: https://reviews.llvm.org/D28170

llvm-svn: 290738
2016-12-30 01:03:17 +00:00
Dehao Chen
6e6c58680c Use continuous boosting factor for complete unroll.
Summary:
The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.

The patch also simplified the code by reducing one parameter in UP.

The patch only affects code-gen of two speccpu2006 benchmark:

445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.

Reviewers: mzolotukhin, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26989

llvm-svn: 290737
2016-12-30 00:50:28 +00:00
Daniel Berlin
aede96b453 NewGVN: Fix PR 31491 by ensuring that we touch the right instructions. Change to one based numbering so we can assert we don't cause the same bug again.
llvm-svn: 290724
2016-12-29 22:15:12 +00:00
Craig Topper
412163fcd4 [InstCombine] Fix some of the AVX-512 scalar arithmetic test cases to do a better job of testing what they intended to test.
The accidentally had trivially dead code. Also needed to adjust the rounding mode to not CUR_DIRECTION so the intrinsics don't get converted to native operations before going through SimplifyDemandedVectorElts.

llvm-svn: 290702
2016-12-29 02:29:04 +00:00
Chandler Carruth
176c94be9c [PM] Introduce a devirtualization iteration layer for the new PM.
This is an orthogonal and separated layer instead of being embedded
inside the pass manager. While it adds a small amount of complexity, it
is fairly minimal and the composability and control seems worth the
cost.

The logic for this ends up being nicely isolated and targeted. It should
be easy to experiment with different iteration strategies wrapped around
the CGSCC bottom-up walk using this kind of facility.

The mechanism used to track devirtualization is the simplest one I came
up with. I think it handles most of the cases the existing iteration
machinery handles, but I haven't done a *very* in depth analysis. It
does however match the basic intended semantics, and we can tweak or
tune its exact behavior incrementally as necessary. One thing that we
may want to revisit is freshly building the value handle set on each
iteration. While I don't think this will be a significant cost (it is
strictly fewer value handles but more churn of value handes than the old
call graph), it is conceivable that we'll want a somewhat more clever
tracking mechanism. My hope is to layer that on as a follow up patch
with data supporting any implementation complexity it adds.

This code also provides for a basic count heuristic: if the number of
indirect calls decreases and the number of direct calls increases for
a given function in the SCC, we assume devirtualization is responsible.
This matches the heuristics currently used in the legacy pass manager.

Differential Revision: https://reviews.llvm.org/D23114

llvm-svn: 290665
2016-12-28 11:07:33 +00:00
Chandler Carruth
90f7376682 [PM] Teach the CGSCC's CG update utility to more carefully invalidate
analyses when we're about to break apart an SCC.

We can't wait until after breaking apart the SCC to invalidate things:
1) Which SCC do we then invalidate? All of them?
2) Even if we invalidate all of them, a newly created SCC may not have
   a proxy that will convey the invalidation to functions!

Previously we only invalidated one of the SCCs and too late. This led to
stale analyses remaining in the cache. And because the caching strategy
actually works, they would get used and chaos would ensue.

Doing invalidation early is somewhat pessimizing though if we *know*
that the SCC structure won't change. So it turns out that the design to
make the mutation API force the caller to know the *kind* of mutation in
advance was indeed 100% correct and we didn't do enough of it. So this
change also splits two cases of switching a call edge to a ref edge into
two separate APIs so that callers can clearly test for this and take the
easy path without invalidating when appropriate. This is particularly
important in this case as we expect most inlines to be between functions
in separate SCCs and so the common case is that we don't have to so
aggressively invalidate analyses.

The LCG API change in turn needed some basic cleanups and better testing
in its unittest. No interesting functionality changed there other than
more coverage of the returned sequence of SCCs.

While this seems like an obvious improvement over the current state, I'd
like to revisit the core concept of invalidating within the CG-update
layer at all. I'm wondering if we would be better served forcing the
callers to handle the invalidation beforehand in the cases that they
can handle it. An interesting example is when we want to teach the
inliner to *update and preserve* analyses. But we can cross that bridge
when we get there.

With this patch, the new pass manager an build all of the LLVM test
suite at -O3 and everything passes. =D I haven't bootstrapped yet and
I'm sure there are still plenty of bugs, but this gives a nice baseline
so I'm going to increasingly focus on fleshing out the missing
functionality, especially the bits that are just turned off right now in
order to let us establish this baseline.

llvm-svn: 290664
2016-12-28 10:34:50 +00:00
Chandler Carruth
69a65a83e5 [PM] Teach the inliner's call graph update to handle inserting new edges
when they are call edges at the leaf but may (transitively) be reached
via ref edges.

It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.

Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.

llvm-svn: 290649
2016-12-28 03:13:12 +00:00
Michael Kuperstein
04dff9bc7a [InstCombine] Canonicalize insert splat sequences into an insert + shuffle
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.

This fixes PR31286.

Differential Revision: https://reviews.llvm.org/D27992

llvm-svn: 290641
2016-12-28 00:18:08 +00:00
Bryant Wong
aa5f57bf44 [MemCpyOpt] Don't sink LoadInst below possible clobber.
Differential Revision: https://reviews.llvm.org/D26811

llvm-svn: 290611
2016-12-27 17:58:12 +00:00
Chandler Carruth
3a1d9fe91a [PM] Turn on the new PM's inliner in addition to the current one for
most of the inliner test cases.

The inliner involves a bunch of interesting code and tends to be where
most of the issues I've seen experimenting with the new PM lie. All of
these test cases pass, but I'd like to keep some more thorough coverage
here so doing a fairly blanket enabling.

There are a handful of interesting tests I've not enabled yet because
they're focused on the always inliner, or on functionality that doesn't
(yet) exist in the inliner.

llvm-svn: 290592
2016-12-27 07:18:43 +00:00
Chandler Carruth
fba79f0538 [PM] Add one of the features left out of the initial inliner patch:
skipping indirectly recursive inline chains.

To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.

This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.

llvm-svn: 290590
2016-12-27 06:46:20 +00:00
Chandler Carruth
44c5a10c53 [PM] Wire up another test to the new pass manager.
Nothing really interesting here, but I had to improve the test to use
variables rather than hard coding value names as we happen to end up
with different value names in the new PM.

llvm-svn: 290589
2016-12-27 06:46:16 +00:00
George Burgess IV
1ca7f8d821 [Analysis] Ignore nobuiltin on allocsize function calls.
We currently ignore the `allocsize` attribute on functions calls with
the `nobuiltin` attribute when trying to lower `@llvm.objectsize`. We
shouldn't care about `nobuiltin` here: `allocsize` is explicitly added
by the user, not inferred based on a function's symbol.

llvm-svn: 290588
2016-12-27 06:32:14 +00:00
Craig Topper
404ccdcfc2 [InstCombine][X86] Add DemandedElts support for 512-bit PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

This builds on r290554 which added supported for 128 and 256-bit.

llvm-svn: 290582
2016-12-27 05:30:09 +00:00
Chandler Carruth
56ddbe594d [PM] Teach the inliner in the new PM to merge attributes after inlining.
Also enable the new PM in the attributes test case which caught this
issue.

llvm-svn: 290572
2016-12-27 03:39:54 +00:00
Chandler Carruth
a375a1b982 [Inliner] Modernize all of the inliner tests that were using grep.
This mostly involved converting from grep to FileCheck and tidying up
the IR used.

In one case (invoke_test-3.ll) the test had become completely pointless
as we use 'resume' rather than 'unwind' now, and even then it did not
occur at the end of the line.

llvm-svn: 290570
2016-12-27 02:47:37 +00:00
Craig Topper
9b6bc7d9d3 [AVX-512][InstCombine] Teach InstCombine to turn masked scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.

llvm-svn: 290566
2016-12-27 01:56:30 +00:00
Craig Topper
74c185fe10 [InstCombine][AVX-512] Add masked scalar add/sub/mul/div intrinsic test cases that don't have a CUR_DIRECTION rounding mode.
The CUR_DIRECTION case will be optimized in a future commit so this provides coverage for the other cases.

llvm-svn: 290565
2016-12-27 01:56:27 +00:00
Chandler Carruth
2f1d94e61e [PM] Move the collection of call sites to a more appropriate place
inside of `InlineFunction`. Prior to this, call instructions are
specifically being rewritten and replaced within the inlined region,
invalidating some of the call sites.

Several of these regions are using the same technique to walk the
inlined region so this seems clearly safe up to this point.

I've also added a short circuit to the scan for call sites based on what
other code is doing.

With this, the most common crash I've found in the new inliner code is
fixed. I've turned it on for another test case that covers this
scenario.

I'll make my way through most of the other inliner test cases
just to get some easy coverage next.

llvm-svn: 290562
2016-12-27 01:24:50 +00:00
Craig Topper
e05e26a237 [AVX-512][InstCombine] Teach InstCombine to turn packed add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
llvm-svn: 290559
2016-12-27 00:23:16 +00:00
Chandler Carruth
34293dff3c [PM] Teach the always inliner in the new pass manager to support
removing fully-dead comdats without removing dead entries in comdats
with live members.

This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.

I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).

llvm-svn: 290557
2016-12-26 23:43:27 +00:00
Simon Pilgrim
7e05f66baa [InstCombine][X86] Add DemandedElts support for PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

Differential Revision: https://reviews.llvm.org/D28119

llvm-svn: 290554
2016-12-26 23:28:17 +00:00
Daniel Berlin
b7b850ccf1 Don't use our own incorrect version of isTriviallyDeadInstruction in NewGVN. Fixes PR/31472
llvm-svn: 290549
2016-12-26 18:44:36 +00:00
Davide Italiano
cd290494fd [NewGVN] Change test to reflect difference between GVN and NewGVN.
The current GVN algorithm folds unconditional branches to, it claims,
expose more PRE oportunities. The folding, if really needed,
(which is not sure, as it's not really proved it improves analysis)
can be done by an earlier cleanup pass instead of GVN itself.
Ack'ed/SGTM'd by Daniel Berlin.

Differential Revision:  https://reviews.llvm.org/D28117

llvm-svn: 290546
2016-12-26 18:10:09 +00:00
Bryant Wong
6160c40a1c [InstCombiner] Simplify lib calls to round{,f}
Differential Revision: https://reviews.llvm.org/D28110

llvm-svn: 290542
2016-12-26 14:29:29 +00:00
Chandler Carruth
e6c6ee9be7 Test the different scenarios of GlobalDCE and comdats more
systematically and document in the test what all is going on.

This replaces the PR-named test that was the only coverage for GlobalDCE
and comdats previously. I wrote this because I wasn't certain how
comdat DCE was supposed to work and wanted to step through what
GlobalDCE did to fully understand it. After talking to folks and reading
the code and really staring at things it all makes sense but it seemed
good to help write down some of this in a more explicit and fully
covering test case.

For example, it seemed like a bug that GlobalDCE didn't consider comdat
participation of ifuncs. Specifically it seemed like an accident because
testing didn't really cover that case. But in fact, ifuncs specifically
cannot participate in a comdat despite having that API. The new test
case covers this and explicitly documents that DCE gets to fire here
even though there are comdats involved.

Also, we didn't have any positive tests for the challenging cases such
as usage cycles between comdat participants that might make them seem
alive except that there is no external edge into the cycle.

llvm-svn: 290537
2016-12-26 08:54:01 +00:00
Craig Topper
75e23d22b5 [AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.

I'll do the same thing for packed add/sub/mul/div in a future patch.

Reviewers: delena, RKSimon, zvi, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27879

llvm-svn: 290535
2016-12-26 06:33:19 +00:00
Craig Topper
e3cd4f35c0 [AVX-512][InstCombine] Teach InstCombine to converted masked vpermv intrinsics into shufflevector instructions
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.

We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.

Reviewers: zvi, delena, spatel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27825

llvm-svn: 290530
2016-12-25 23:58:57 +00:00
Bryant Wong
b0749ef47d [AliasAnalysis] Teach BasicAA about memcpy.
Differential Revision: https://reviews.llvm.org/D27034

llvm-svn: 290526
2016-12-25 22:42:27 +00:00
Daniel Berlin
cf6a330da2 Value number stores and memory states so we can detect when memory states are equivalent (IE store of same value to memory).
Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28084

llvm-svn: 290525
2016-12-25 22:23:49 +00:00
Simon Pilgrim
29ce1efcdf [InstCombine][X86] Add tests showing missed opportunities to simplify PMULUDQ/PMULDQ inputs.
PMULUDQ/PMULDQ - only the even elements (0, 2, 4, 6) of the vXi32 inputs are required.

llvm-svn: 290502
2016-12-24 17:30:19 +00:00
Chandler Carruth
286fbb87b8 [PM] Remove a bunch of junk that snuck in when I failed at manipulating
my editor to close and commit the patch. Sorry for the noise.

llvm-svn: 290460
2016-12-23 23:39:31 +00:00
Chandler Carruth
1221aba464 [PM] Teach the always inlining test case to be much more strict about
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.

Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.

Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?

Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.

llvm-svn: 290457
2016-12-23 23:33:35 +00:00
Chandler Carruth
5f8ab3d7d6 [PM] Clean up test case and comments a bit. NFC.
llvm-svn: 290456
2016-12-23 23:33:32 +00:00
Davide Italiano
8b7b8bb5e6 [LICM] Work around LICM needs to maintain state across loops.
The pass creates some state which expects to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this not ideal design because calling skipLoop() will result in
this state not being cleaned up at times and an assertion firing
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it's not ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor, see the review for a complete discussion.

Differential Revision:  https://reviews.llvm.org/D25848

llvm-svn: 290427
2016-12-23 13:12:50 +00:00
Chandler Carruth
7ce1098e2b Fix some DOS-style line endings that I suspect snuck in from one of the
frustrating Subversion clients that fails to do line ending translation
of text files.

llvm-svn: 290404
2016-12-23 02:02:26 +00:00
Evgeniy Stepanov
410a503700 [cfi] Emit jump tables as a function-level inline asm.
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.

The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.

This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.

Differential Revision: https://reviews.llvm.org/D28012

llvm-svn: 290384
2016-12-22 22:22:35 +00:00
Davide Italiano
3893354b16 [NewGVN] Add the pass to PassRegistry.def.
We need to hook up here to get it working with the new PM.
Add a test while here (and remove a typo).

llvm-svn: 290350
2016-12-22 16:35:02 +00:00
Davide Italiano
bd0298b1e1 [GVN] Initial check-in of a new global value numbering algorithm.
The code have been developed by Daniel Berlin over the years, and
the new implementation goal is that of addressing shortcomings of
the current GVN infrastructure, i.e. long compile time for large
testcases, lack of phi predication, no load/store value numbering
etc...

The current code just implements the "core" GVN algorithm, although
other pieces (load coercion, phi handling, predicate system) are
already implemented in a branch out of tree. Once the core is stable,
we'll start adding pieces on top of the base framework.
The test currently living in test/Transform/NewGVN are a copy
of the ones in GVN, with proper `XFAIL` (missing features in NewGVN).
A flag will be added in a future commit to enable NewGVN, so that
interested parties can exercise this code easily.

Differential Revision:  https://reviews.llvm.org/D26224

llvm-svn: 290346
2016-12-22 16:03:48 +00:00
Adrian Prantl
1f3fc31b75 Renumber testcase metadata nodes after r290153.
This patch renumbers the metadata nodes in debug info testcases after
https://reviews.llvm.org/D26769. This is a separate patch because it
causes so much churn. This was implemented with a python script that
pipes the testcases through llvm-as - | llvm-dis - and then goes
through the original and new output side-by side to insert all
comments at a close-enough location.

Differential Revision: https://reviews.llvm.org/D27765

llvm-svn: 290292
2016-12-22 00:45:21 +00:00
Adrian Prantl
6adadd607c Legalize metadata in legacy testcases
llvm-svn: 290288
2016-12-21 23:38:17 +00:00
Adrian Prantl
957c27b43e Legalize metadata in legacy testcases
llvm-svn: 290287
2016-12-21 23:36:06 +00:00
Adrian Prantl
f54db8e4ad Legalize metadata in legacy testcases
llvm-svn: 290286
2016-12-21 23:30:35 +00:00
David Majnemer
b66f37d6ae Revert "[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp"
This reverts commit r289813, it caused PR31449.

llvm-svn: 290266
2016-12-21 19:21:59 +00:00
Adam Nemet
ffa0068101 [LDist] Match behavior between invoking via optimization pipeline or opt -loop-distribute
In r267672, where the loop distribution pragma was introduced, I tried
it hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when ran via the optimization pipeline).

As MichaelZ has discovered this has the unintended consequence of
breaking a very common developer work-flow to reproduce compilations
using opt: First you print the pass pipeline of clang
with -debug-pass=Arguments and then invoking opt with the returned
arguments.

clang -debug-pass will include -loop-distribute but the pass is invoked
with default=off so nothing happens unless the loop carries the pragma.
While through opt (default=on) we will try to distribute all loops.

This changes opt's default to off as well to match clang.  The tests are
modified to explicitly enable the transformation.

llvm-svn: 290235
2016-12-21 04:07:40 +00:00
George Burgess IV
8761f680f4 [Analysis] Centralize objectsize lowering logic.
We're currently doing nearly the same thing for @llvm.objectsize in
three different places: two of them are missing checks for overflow,
and one of them could subtly break if InstCombine gets much smarter
about removing alloc sites. Seems like a good idea to not do that.

llvm-svn: 290214
2016-12-20 23:46:36 +00:00
Chandler Carruth
74eb1e2070 [PM] Provide an initial, minimal port of the inliner to the new pass manager.
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.

Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
  this at all. Active discussion and investigation is going on to remove
  it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
  Why? Because it adds what I suspect is inappropriate coupling for
  little or no benefit. We will have an outer iteration system that
  tracks devirtualization including that from function passes and
  iterates already. We should improve that rather than approximate it
  here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
  reason at all.

The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.

A summary of the different things happening here:

1) Adding the usual new PM class and rigging.

2) Fixing minor underlying assumptions in the inline cost analysis or
   inline logic that don't generally hold in the new PM world.

3) Adding the core pass logic which is in essence a loop over the calls
   in the nodes in the call graph. This is a bit duplicated from the old
   inliner, but only a handful of lines could realistically be shared.
   (I tried at first, and it really didn't help anything.) All told,
   this is only about 100 lines of code, and most of that is the
   mechanics of wiring up analyses from the new PM world.

4) Updating the LazyCallGraph (in the new PM) based on the *newly
   inlined* calls and references. This is very minimal because we cannot
   form cycles.

5) When inlining removes the last use of a function, eagerly nuking the
   body of the function so that any "one use remaining" inline cost
   heuristics are immediately refined, and queuing these functions to be
   completely deleted once inlining is complete and the call graph
   updated to reflect that they have become dead.

6) After all the inlining for a particular function, updating the
   LazyCallGraph and the CGSCC pass manager to reflect the
   function-local simplifications that are done immediately and
   internally by the inline utilties. These are the exact same
   fundamental set of CG updates done by arbitrary function passes.

7) Adding a bunch of test cases to specifically target CGSCC and other
   subtle aspects in the new PM world.

Many thanks to the careful review from Easwaran and Sanjoy and others!

Differential Revision: https://reviews.llvm.org/D24226

llvm-svn: 290161
2016-12-20 03:15:32 +00:00
Adrian Prantl
cf7e846d11 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
2016-12-20 02:09:43 +00:00
Sanjay Patel
11ebb18e12 [InstCombine] use commutative matcher for pattern with commutative operators
This is a case that was missed in:
https://reviews.llvm.org/rL290067
...and it would regress if we fix operand complexity (PR28296).

llvm-svn: 290127
2016-12-19 18:35:37 +00:00
Sanjay Patel
8dd6f085af [InstCombine] add folds for icmp (umin|umax X, Y), X
This is a follow-up to:
https://reviews.llvm.org/rL289855 (https://reviews.llvm.org/D27531)
https://reviews.llvm.org/rL290111

llvm-svn: 290118
2016-12-19 17:32:37 +00:00
Florian Hahn
f1e8664904 [LoopVersioning] Require loop-simplify form for loop versioning.
Summary:
Requiring loop-simplify form for loop versioning ensures that the
runtime check block always dominates the exit block.
    
This patch closes #30958 (https://llvm.org/bugs/show_bug.cgi?id=30958).

Reviewers: silviu.baranga, hfinkel, anemet, ashutosh.nema

Subscribers: ashutosh.nema, mzolotukhin, efriedma, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D27469

llvm-svn: 290116
2016-12-19 17:13:37 +00:00
Sanjay Patel
534830670e [InstCombine] add folds for icmp (smax X, Y), X
This is a follow-up to:
https://reviews.llvm.org/rL289855 (D27531)

llvm-svn: 290111
2016-12-19 16:28:53 +00:00
Daniel Jasper
162ffcacd6 Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Sanjay Patel
f685720721 [InstCombine] use commutative matchers for patterns with commutative operators
Background/motivation - I was circling back around to:
https://llvm.org/bugs/show_bug.cgi?id=28296

I made a simple patch for that and noticed some regressions, so added test cases for
those with rL281055, and this is hopefully the minimal fix for just those cases.

But as you can see from the surrounding untouched folds, we are missing commuted patterns
all over the place, and of course there are no regression tests to cover any of those cases.

We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but 
then we still wouldn't have test coverage, and we'd still miss some fraction of commuted 
patterns because they require adjustments to the match order.

I'm aware of the concern about the potential compile-time performance impact of adding 
matches like this (currently being discussed on llvm-dev), but I don't think there's any
evidence yet to suggest that handling commutative pattern matching more thoroughly is not
a worthwhile goal of InstCombine.

Differential Revision: https://reviews.llvm.org/D24419

llvm-svn: 290067
2016-12-18 18:49:48 +00:00
Evgeniy Stepanov
eaffeadc52 Revert "[GVNHoist] Move GVNHoist to function simplification part of pipeline."
This reverts r289696, which caused TSan perf regression.

See PR31382.

llvm-svn: 290030
2016-12-17 01:53:15 +00:00
Michael Kuperstein
37502008ed Preserve loop metadata when folding branches to a common destination.
Differential Revision: https://reviews.llvm.org/D27830

llvm-svn: 289992
2016-12-16 21:23:59 +00:00
Jun Bum Lim
4f195c2484 [CodeGenPrep] Skip merging empty case blocks
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly :

Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289988
2016-12-16 20:38:39 +00:00
Adrian Prantl
0ab6669d6d Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Matthew Simpson
7fb206b6f3 Reapply "[LV] Enable vectorization of loops with conditional stores by default"
This patch reapplies r289863. The original patch was reverted because it
exposed a bug causing the loop vectorizer to crash in the Python runtime on
PPC. The underlying issue was fixed with r289958.

llvm-svn: 289975
2016-12-16 19:12:02 +00:00
Sanjoy Das
f115142e03 Fix CodeGenPrepare::stripInvariantGroupMetadata
`dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs.  The
only reason it worked at all is that `getMetadataID` returns something
unrelated -- it returns the subclass ID of the receiver (which is used
in `dyn_cast` etc.).  That does not numerically match
`LLVMContext::MD_invariant_group` and ends up dropping `invariant_group`
along with every other metadata that does not numerically match
`LLVMContext::MD_invariant_group`.

llvm-svn: 289973
2016-12-16 18:52:33 +00:00
Jun Bum Lim
6a3faa2a6d Revert "[CodeGenPrep] Skip merging empty case blocks"
This reverts commit r289951.

llvm-svn: 289960
2016-12-16 17:06:14 +00:00
Sanjay Patel
af961c3332 [InstCombine] auto-generate checks; NFC
llvm-svn: 289959
2016-12-16 16:58:54 +00:00
Matthew Simpson
add3b6cc60 [LV] Don't attempt to type-shrink scalarized instructions
After r288909, instructions feeding predicated instructions may be scalarized
if profitable. Since these instructions will remain scalar, we shouldn't
attempt to type-shrink them. We should only truncate vector types to their
minimal bit widths. This bug was exposed by enabling the vectorization of loops
containing conditional stores by default.

llvm-svn: 289958
2016-12-16 16:52:35 +00:00
Jun Bum Lim
ab29427ce1 [CodeGenPrep] Skip merging empty case blocks
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block:

Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289951
2016-12-16 16:03:31 +00:00
Chandler Carruth
9362cf3865 Revert r289863: [LV] Enable vectorization of loops with conditional
stores by default

This uncovers a crasher in the loop vectorizer on PPC when building the
Python runtime. I'll send the testcase to the review thread for the
original commit.

llvm-svn: 289934
2016-12-16 11:31:39 +00:00
Adrian Prantl
2345112c5b [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Adrian Prantl
daf4fef1f9 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot berakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Adrian Prantl
0eee52640f [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Peter Collingbourne
dec168cd58 IPO: Introduce ThinLTOBitcodeWriter pass.
This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.

All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.

Differential Revision: https://reviews.llvm.org/D27324

llvm-svn: 289899
2016-12-16 00:26:30 +00:00
Davide Italiano
73e0a445d1 [SimplifyLibCalls] Add a test to make sure we lower fls(0) correctly.
llvm-svn: 289895
2016-12-15 23:48:07 +00:00
Davide Italiano
1b19bf8526 [SimplifyLibCalls] Lower fls() to llvm.ctlz().
Differential Revision:  https://reviews.llvm.org/D14590

llvm-svn: 289894
2016-12-15 23:45:11 +00:00
Matthew Simpson
765604b11b [LV] Enable vectorization of loops with conditional stores by default
This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".

Differential Revision: https://reviews.llvm.org/D27814

llvm-svn: 289863
2016-12-15 20:11:05 +00:00
Sanjay Patel
c02483f504 [InstCombine] add folds for icmp (smin X, Y), X
Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns. 
This patch won't solve the example that was attached to that thread, so something else still needs fixing.

The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.

Corresponding changes for smax/umin/umax to follow.

Differential Revision: https://reviews.llvm.org/D27531

llvm-svn: 289855
2016-12-15 19:13:37 +00:00
Teresa Johnson
4afc73f0ba [ThinLTO] Ensure callees get hot threshold when first seen on cold path
This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.

Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.

llvm-svn: 289843
2016-12-15 18:21:01 +00:00
Alexey Bataev
abd41e4fd6 [TEST] Initial commit of tests for minmax horizontal reductions.
llvm-svn: 289817
2016-12-15 13:21:29 +00:00
Ehsan Amiri
790f008233 [InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp
A number of new patterns for simplifying and/xor of icmp:

(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.

(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
   %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.

llvm-svn: 289813
2016-12-15 12:25:13 +00:00
Craig Topper
589041bbd2 [AVX-512][InstCombine] Add masked scalar FMA intrinsics to SimplifyDemandedVectorElts.
llvm-svn: 289759
2016-12-15 03:49:45 +00:00
Hal Finkel
f224db75d2 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Hal Finkel
502475d4f3 Make processing @llvm.assume more efficient by using operand bundles
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).

Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.

At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.

Differential Revision: https://reviews.llvm.org/D27259

llvm-svn: 289755
2016-12-15 02:53:42 +00:00
Dehao Chen
4c1d21ec8c Only sets profile summary when it was not preset.
Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass.

Reviewers: eraman, davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D27733

llvm-svn: 289725
2016-12-14 22:06:49 +00:00
Geoff Berry
e0fc91243e [GVNHoist] Move GVNHoist to function simplification part of pipeline.
Summary:
Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline.  The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.

Performance results on AArch64 kryo:
Improvements:
  Benchmarks/CoyoteBench/fftbench  -24.952%
  spec2006/bzip2                    -4.071%
  internal bmark                    -3.177%
  Benchmarks/PAQ8p/paq8p            -1.754%
  spec2000/perlbmk                  -1.328%
  spec2006/h264ref                  -1.140%

Regressions:
  internal bmark                    +1.818%
  Benchmarks/mafft/pairlocalalign   +1.084%

Reviewers: sebpop, dberlin, hiraditya

Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27722

llvm-svn: 289696
2016-12-14 19:38:22 +00:00
Craig Topper
a7a943094c [X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar floating point to integer conversion intrinsics.
llvm-svn: 289639
2016-12-14 07:46:12 +00:00
Craig Topper
402d6bf13b [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle masked scalar add/sub/mul/div/max/min intrinsics better.
Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking.

llvm-svn: 289636
2016-12-14 06:06:58 +00:00
Anna Thomas
d7d47f4ccb [IRCE] Avoid loop optimizations on pre and post loops
Summary:
This patch will add loop metadata on the pre and post loops generated by IRCE.
Currently, we have metadata for disabling optimizations such as vectorization,
unrolling, loop distribution and LICM versioning (and confirmed that these
optimizations check for the metadata before proceeding with the transformation).

The pre and post loops generated by IRCE need not go through loop opts (since
these are slow paths).

Added two test cases as well.

Reviewers: sanjoy, reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26806

llvm-svn: 289588
2016-12-13 21:05:21 +00:00
Michael Kuperstein
2c0f3dbd6f [LV] Don't vectorize when we have a small static bound on trip count
We currently check if the exact trip count is known and is smaller than the
"tiny loop" bound. We should be checking the maximum bound on the trip count
instead.

Differential Revision: https://reviews.llvm.org/D27690

llvm-svn: 289583
2016-12-13 20:38:18 +00:00
Rong Xu
e788cd0e3c Fix the test cases committed in r289521.
llvm-svn: 289556
2016-12-13 17:34:29 +00:00
David Callahan
86eb682529 [ADCE] Add code to remove dead branches
Summary:
This is last in of a series of patches to evolve ADCE.cpp to support
removing of unnecessary control flow.

This patch adds the code to update the control and data flow graphs
to remove the dead control flow.

Also update unit tests to test the capability to remove dead,
may-be-infinite loop which is enabled by the switch
-adce-remove-loops.

Previous patches:

D23824 [ADCE] Add handling of PHI nodes when removing control flow
D23559 [ADCE] Add control dependence computation
D23225 [ADCE] Modify data structures to support removing control flow
D23065 [ADCE] Refactor anticipating new functionality (NFC)
D23102 [ADCE] Refactoring for new functionality (NFC)

Reviewers: dberlin, majnemer, nadav, mehdi_amini

Subscribers: llvm-commits, david2050, freik, twoh

Differential Revision: https://reviews.llvm.org/D24918

llvm-svn: 289548
2016-12-13 16:42:18 +00:00
Craig Topper
e3edc0aa09 [X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar intrinsics correctly.
Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed.

Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support.

llvm-svn: 289523
2016-12-13 07:45:45 +00:00
NAKAMURA Takumi
649a346af7 llvm/test/Transforms/PGOProfile/noreturncall.ll REQUIRES asserts due to -debug-only.
llvm-svn: 289522
2016-12-13 07:04:03 +00:00
Rong Xu
e6163512da [PGO] Fix insane counts due to nonreturn calls
Summary:
Since we don't break BBs for function calls. We might get some insane counts 
(wrap of unsigned) in the presence of noreturn calls.

This patch sets these counts to zero instead of the wrapped number.

Reviewers: davidxl

Subscribers: xur, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D27602

llvm-svn: 289521
2016-12-13 06:41:14 +00:00
Matthew Simpson
1d43cda909 [SLP] Fix sign-extends for type-shrinking
This patch ensures the correct minimum bit width during type-shrinking.
Previously when type-shrinking, we always sign-extended values back to their
original width. However, if we are going to sign-extend, and the sign bit is
unknown, we have to increase the minimum bit width by one bit so the
sign-extend will fill the upper bits correctly. If the sign bit is known to be
zero, we can perform a zero-extend instead. This should fix PR31243.

Reference: https://llvm.org/bugs/show_bug.cgi?id=31243
Differential Revision: https://reviews.llvm.org/D27466

llvm-svn: 289470
2016-12-12 21:11:04 +00:00
Reid Kleckner
a3cd1e4795 Revert "[SCEVExpand] do not hoist divisions by zero (PR30935)"
Reverts r289412. It caused an OOB PHI operand access in instcombine when
ASan is enabled. Reduction in progress.

Also reverts "[SCEVExpander] Add a test case related to r289412"

llvm-svn: 289453
2016-12-12 18:52:32 +00:00
Sanjay Patel
efaf5285a5 remove stale FIXME note from test; NFC
llvm-svn: 289445
2016-12-12 16:20:21 +00:00
Sanjay Patel
1d3228abc5 [InstCombine] fix bug when offsetting case values of a switch (PR31260)
We could truncate the condition and then try to fold the add into the
original condition value causing wrong case constants to be used.

Move the offset transform ahead of the truncate transform and return
after each transform, so there's no chance of getting confused values.

Fix for:
https://llvm.org/bugs/show_bug.cgi?id=31260

llvm-svn: 289442
2016-12-12 16:13:52 +00:00
Sanjay Patel
6b6a37e603 [InstCombine] add test to show PR31260 miscompile; NFC
llvm-svn: 289437
2016-12-12 15:28:44 +00:00
Sebastian Pop
9402602249 [SCEVExpand] do not hoist divisions by zero (PR30935)
SCEVExpand computes the insertion point for the components of a SCEV to be code
generated.  When it comes to generating code for a division, SCEVexpand would
not be able to check (at compilation time) all the conditions necessary to avoid
a division by zero.  The patch disables hoisting of expressions containing
divisions by anything other than non-zero constants in order to avoid hoisting
these expressions past conditions that should hold before doing the division.

The patch passes check-all on x86_64-linux.

Differential Revision: https://reviews.llvm.org/D27216

llvm-svn: 289412
2016-12-12 02:52:51 +00:00
Craig Topper
0827c06fc5 [InstCombine][XOP] The instructions for the scalar frcz intrinsics are defined to put 0 in the upper bits, not pass bits through like other intrinsics. So we should return a zero vector instead.
llvm-svn: 289411
2016-12-11 22:32:38 +00:00
Sanjoy Das
a0e8011216 [Verifier] Add verification for TBAA metadata
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.

Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:

 - That by the time an struct access tuple `(base-type, offset)` is
   "reduced" to a scalar base type, the offset is `0`.  For instance, in
   C++ you can't start from, say `("struct-a", 16)`, and end up with
   `("int", 4)` -- by the time the base type is `"int"`, the offset
   better be zero.  In particular, a variant of this invariant is needed
   for `llvm::getMostGenericTBAA` to be correct.

 - That there are no cycles in a struct path.

 - That struct type nodes have their offsets listed in an ascending
   order.

 - That when generating the struct access path, you eventually reach the
   access type listed in the tbaa tag node.

Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D26438

llvm-svn: 289402
2016-12-11 20:07:15 +00:00
Sanjay Patel
c88c7809de [Constants] don't die processing non-ConstantInt GEP indices in isGEPWithNoNotionalOverIndexing() (PR31262)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=31262

llvm-svn: 289401
2016-12-11 20:07:02 +00:00
Craig Topper
dedc8326dd [X86][InstCombine] Add support for scalar FMA intrinsics to SimplifyDemandedVectorElts.
This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them.

llvm-svn: 289377
2016-12-11 08:54:52 +00:00
Craig Topper
c6fc79f217 [X86][InstCombine] Add the test cases for r289370, r289371, and r289372.
I forgot to add the new files before commiting.

llvm-svn: 289374
2016-12-11 08:00:51 +00:00
Craig Topper
e978214668 [AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls to match 128 and 256-bit.
llvm-svn: 289354
2016-12-11 01:59:36 +00:00
Craig Topper
f2fbdfebf3 [X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a shufflevector if the indices are constant.
llvm-svn: 289348
2016-12-11 00:23:50 +00:00
Davide Italiano
55a8a72b10 [SCCP] Make the test added in r289175 more meaningful.
Add a comment while here.

llvm-svn: 289182
2016-12-09 03:49:20 +00:00
Davide Italiano
c10f9279fa [SCCP] Teach the pass about mul %x 0 even if %x is overdefined.
The motivating example is:

extern int patatino;
int goo() {
    int x = 0;
    for (int i = 0; i < 1000000; ++i) {
        x *= patatino;
    }
    return x;
}

Currently SCCP will not realize that this function returns always zero,
therefore will try to unroll and vectorize the loop at -O3 producing an
awful lot of (useless) code. With this change, it will just produce:

0000000000000000 <g>:
   xor    %eax,%eax
   retq

llvm-svn: 289175
2016-12-09 03:08:42 +00:00
Peter Collingbourne
876ea64788 WholeProgramDevirt: Teach the pass to handle structs of arrays.
This will become necessary in some cases once D22296 lands.

llvm-svn: 289165
2016-12-09 01:10:11 +00:00
Peter Collingbourne
bb7c0a1390 Make WholeProgramDevirt understand ConstStruct vtables.
Based on a patch by LemonBoy!

Differential Revision: https://reviews.llvm.org/D26581

llvm-svn: 289162
2016-12-09 00:33:27 +00:00
Sanjay Patel
d1d40c15d9 [InstCombine] add tests for umin+icmp; NFC
llvm-svn: 289157
2016-12-08 23:44:58 +00:00
Sanjay Patel
ea303585a1 [InstCombine] add tests for umax+icmp; NFC
llvm-svn: 289156
2016-12-08 23:36:57 +00:00
Zia Ansari
df9164118d [InstSimplify] Add "X / 1.0" to SimplifyFDivInst.
Differential Revision: https://reviews.llvm.org/D27587

llvm-svn: 289153
2016-12-08 23:27:40 +00:00
Sanjay Patel
8fa56c0769 [InstCombine] add tests for smax+icmp; NFC
llvm-svn: 289151
2016-12-08 23:16:06 +00:00
Davide Italiano
b16c6ac588 [SCCP] Make sure SCCP and ConstantFolding agree on undef >> a.
Currently SCCP folds the value to -1, while ConstantProp folds to
0. This changes SCCP to do what ConstantFolding does.

llvm-svn: 289147
2016-12-08 22:28:53 +00:00
Sanjay Patel
a17eff85b8 [InstSimplify] add fdiv x/1.0 test and update checks; NFC
llvm-svn: 289098
2016-12-08 20:23:56 +00:00
Peter Collingbourne
a2d4395226 IR, X86: Understand !absolute_symbol metadata on global variables.
Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.

As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html

Differential Revision: https://reviews.llvm.org/D25878

llvm-svn: 289087
2016-12-08 19:01:00 +00:00
Alexey Bataev
3aadace193 [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.

Differential Revision: https://reviews.llvm.org/D27215

llvm-svn: 289043
2016-12-08 11:57:51 +00:00
Evgeniy Stepanov
f4b13bac03 CFI-icall on Thumb
Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb.
Use b.w branch instruction.
Use .thumb_function and .thumb_set for proper arm/thumb interwork. This way jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect CFI check math, because the address of the jumptable start also has that bit set.

This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv54, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes) but that needs unwinding instructions, etc.

Differential Revision: https://reviews.llvm.org/D27499

llvm-svn: 289008
2016-12-08 00:32:26 +00:00
Davide Italiano
e638f91de2 [BDCE] Skip metadata while replacing uses.
The fix committed in r288851 doesn't cover all the cases.
In particular, if we have an instruction with side effects
which has a no non-dbg use not depending on the bits, we still
perform RAUW destroying the dbg.value's first argument.
Prevent metadata from being replaced here to avoid the issue.

Differential Revision:  https://reviews.llvm.org/D27534

llvm-svn: 288987
2016-12-07 21:47:32 +00:00
Matt Arsenault
61527cb46b InstCombine: Fold bitcast of vector to FP scalar
llvm-svn: 288978
2016-12-07 20:56:11 +00:00
Eli Friedman
781ad0c5f8 [GVNHoist] Invalidate MemDep when an instruction is moved.
See also r279907.

Fixes https://llvm.org/bugs/show_bug.cgi?id=30991 .

Differential Revision: https://reviews.llvm.org/D27493

llvm-svn: 288968
2016-12-07 19:55:59 +00:00
Sanjay Patel
3a0c404a96 [InstCombine] add tests for smin+icmp; NFC
The tests that already work are folded in InstSimplify, so those
tests should be redundant and we can remove them if they don't
seem worthwhile for completeness.

llvm-svn: 288957
2016-12-07 18:56:55 +00:00
Matthew Simpson
aa3b094d0c [LV] Scalarize operands of predicated instructions
This patch attempts to scalarize the operand expressions of predicated
instructions if they were conditionally executed in the original loop. After
scalarization, the expressions will be sunk inside the blocks created for the
predicated instructions. The transformation essentially performs
un-if-conversion on the operands.

The cost model has been updated to determine if scalarization is profitable. It
compares the cost of a vectorized instruction, assuming it will be
if-converted, to the cost of the scalarized instruction, assuming that the
instructions corresponding to each vector lane will be sunk inside a predicated
block, possibly avoiding execution. If it's more profitable to scalarize the
entire expression tree feeding the predicated instruction, the expression will
be scalarized; otherwise, it will be vectorized. We only consider the cost of
the entire expression to accurately estimate the cost of the required
insertelement and extractelement instructions.

Differential Revision: https://reviews.llvm.org/D26083

llvm-svn: 288909
2016-12-07 15:03:32 +00:00
Andrea Di Biagio
a466d9dd9e When GVN removes a redundant load, it should not modify the debug location of the dominating load.
In the case of a fully redundant load LI dominated by an equivalent load V, GVN
should always preserve the original debug location of V. Otherwise, we risk to
introduce an incorrect stepping.
If V has debug info, then clearly it should not be modified. If V has a null
debugloc, then it is still potentially incorrect to propagate LI's debugloc
because LI may not post-dominate V.

Differential Revision: https://reviews.llvm.org/D27468

llvm-svn: 288903
2016-12-07 12:31:36 +00:00
Peter Collingbourne
04abb3931e LowerTypeTests: Add a test that covers "unsatisfiable" type metadata.
llvm-svn: 288881
2016-12-07 03:04:34 +00:00
Sanjay Patel
a12bf57077 [InstSimplify] fixed (?) to not mutate icmps
As Eli noted in the post-commit thread for r288833, the use of
swapOperands() may not be allowed in InstSimplify, so I'm 
removing those calls here pending further review. 

The swap mutates the icmp, and there doesn't appear to be precedent
for instruction mutation in InstSimplify.

I didn't actually have any tests for those cases, so I'm adding
a few here. 

llvm-svn: 288855
2016-12-06 22:09:52 +00:00
Davide Italiano
7c4753662d [BDCE/DebugInfo] Preserve llvm.dbg.value's argument.
BDCE has two phases:
1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so,
replaces all its uses with the constant zero.
2. Then, it asks SimplifyDemandedBits again if the instruction is really dead
(no side effects etc..) and if so, eliminates it.

Now, in 1) if all the bits of an instruction are dead, we may end up replacing a dbg use:
  %call = tail call i32 (...) @g() #4, !dbg !15
  tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17
->
  %call = tail call i32 (...) @g() #4, !dbg !15
  tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17

but not eliminating the call because it may have arbitrary side effects.
In other words, we lose some debug informations.
This patch fixes the problem making sure that BDCE does nothing with the instruction if
it has side effects and no non-dbg uses.

Differential Revision:  https://reviews.llvm.org/D27471

llvm-svn: 288851
2016-12-06 21:52:47 +00:00
Sanjay Patel
cadc4eed3a [InstSimplify] add folds for and-of-icmps with same operands
All of these (and a few more) are already handled by InstCombine,
but we shouldn't have to wait until then to simplify these because
they're cheap to deal with here in InstSimplify.

This is the 'and' sibling of the earlier 'or' patch:
https://reviews.llvm.org/rL288833

llvm-svn: 288841
2016-12-06 19:05:46 +00:00
Sanjay Patel
5b580504bd [InstSimplify] add tests for and-of-icmps; NFC
llvm-svn: 288837
2016-12-06 18:46:54 +00:00
Sanjay Patel
b173c42cbd [InstSimplify] add folds for or-of-icmps with same operands
All of these (and a few more) are already handled by InstCombine,
but we shouldn't have to wait until then to simplify these because
they're cheap to deal with here in InstSimplify.

llvm-svn: 288833
2016-12-06 18:09:37 +00:00
Sanjay Patel
513158aa27 [InstSimplify] add tests for or-of-icmps; NFC
llvm-svn: 288830
2016-12-06 17:49:10 +00:00
Simon Pilgrim
d14b61c0c1 [SLPVectorizer][X86] Tests to show missed buildvector sitofp/fptosi vectorizations
e.g.
buildvector(sitofp(i32), sitofp(i32), sitofp(i32), sitofp(i32)) --> sitofp(buildvector(i32, i32, i32, i32))

llvm-svn: 288807
2016-12-06 13:29:55 +00:00
Keno Fischer
62fcb22ba0 [LAA] Prevent invalid IR for loop-invariant bound in loop body
Summary:
If LAA expands a bound that is loop invariant, but not hoisted out
of the loop body, it used to use that value anyway, causing a
non-domination error, because the memcheck block is of course not
dominated by the scalar loop body. Detect this situation and expand
the SCEV expression instead.

Fixes PR31251

Reviewers: anemet
Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D27397

llvm-svn: 288705
2016-12-05 21:25:03 +00:00
Adrian Prantl
57907269da [DIExpression] Introduce a dedicated DW_OP_LLVM_fragment operation
so we can stop using DW_OP_bit_piece with the wrong semantics.

The entire back story can be found here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html

The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's
offset field to mean the offset into the source variable rather than
the offset into the location at the top the DWARF expression stack. In
order to be able to fix this in a subsequent patch, this patch
introduces a dedicated DW_OP_LLVM_fragment operation with the
semantics that we used to apply to DW_OP_bit_piece, which is what we
actually need while inside of LLVM. This patch is complete with a
bitcode upgrade for expressions using the old format. It does not yet
fix the DWARF backend to use DW_OP_bit_piece correctly.

Implementation note: We discussed several options for implementing
this, including reserving a dedicated field in DIExpression for the
fragment size and offset, but using an custom operator at the end of
the expression works just fine and is more efficient because we then
only pay for it when we need it.

Differential Revision: https://reviews.llvm.org/D27361
rdar://problem/29335809

llvm-svn: 288683
2016-12-05 18:04:47 +00:00
Sanjay Patel
1aee656fa9 [InstCombine] change select type to eliminate bitcasts
This solves a secondary problem seen in PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137#c6

This is similar to the bitwise logic op fold added with:
https://reviews.llvm.org/rL287707

And like that patch, I'm artificially restricting the
transform from vector <-> scalar types until we're sure
that the backend can handle that. 

llvm-svn: 288584
2016-12-03 15:25:16 +00:00
Guozhi Wei
27571a5d37 [ppc] Correctly compute the cost of loading 32/64 bit memory into VSR
VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial.

Differential Revision: https://reviews.llvm.org/D26713

llvm-svn: 288560
2016-12-03 00:41:43 +00:00
Rong Xu
33fafefbc5 [PGO] Fix PGO use ICE when there are unreachable BBs
For -O0 there might be unreachable BBs, which breaks the assumption that all the
BBs have an auxiliary data structure.  In this patch, we add another interface
called findBBInfo() so that a nullptr can be returned for the unreachable BBs
(and the callers can ignore those BBs).

This fixes the bug reported
https://llvm.org/bugs/show_bug.cgi?id=31209

Differential Revision: https://reviews.llvm.org/D27280

llvm-svn: 288528
2016-12-02 19:10:29 +00:00
Matt Arsenault
ffb70acf5d AMDGPU: Implement isCheapAddrSpaceCast
llvm-svn: 288523
2016-12-02 18:12:53 +00:00
Simon Pilgrim
05718aa6c9 [InstCombine] Add vector urem tests
Demonstrate missed opportunity for urem -> and combine for powerof2 or zero non-uniform constant dividers

llvm-svn: 288510
2016-12-02 17:16:21 +00:00
Simon Pilgrim
4306301707 [InstCombine] Regenerate vector srem tests
llvm-svn: 288509
2016-12-02 17:12:56 +00:00
Renato Golin
78b2b93486 Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."
This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's
builtins (twice: once in r288412 and once in r288497). We should investigate
this offline.

llvm-svn: 288508
2016-12-02 16:56:26 +00:00
Alexey Bataev
b767847ade [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.

Differential Revision: https://reviews.llvm.org/D27215

llvm-svn: 288497
2016-12-02 12:20:22 +00:00
Simon Pilgrim
2752efeba9 [SLPVectorizer][X86] Add tests for vectorization of buildvector of scalar fp-ops (PR6246)
llvm-svn: 288492
2016-12-02 10:54:46 +00:00
Artem Belevich
18e8ce875f Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."
This reverts r288412 which causes severe compile-time regression.

llvm-svn: 288431
2016-12-01 22:52:15 +00:00
Philip Reames
460755f4a6 [PR29121] Don't fold if it would produce atomic vector loads or stores
The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal.

Differential Revision: https://reviews.llvm.org/D24365

llvm-svn: 288415
2016-12-01 20:17:06 +00:00
Alexey Bataev
ec86a0f4e9 [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.

Differential Revision: https://reviews.llvm.org/D27215

llvm-svn: 288412
2016-12-01 20:06:53 +00:00
Alexey Bataev
7135b5ad24 [SLP] Fixed cost model for horizontal reduction.
Currently when cost of scalar operations is evaluated the vector type is
used for scalar operations. Patch fixes this issue and fixes evaluation
of the vector operations cost.
Several test showed that vector cost model is too optimistic. It
allowed vectorization of 8 or less add/fadd operations, though scalar
code is faster. Actually, only for 16 or more operations vector code
provides better performance.

Differential Revision: https://reviews.llvm.org/D26277

llvm-svn: 288398
2016-12-01 18:42:42 +00:00
Adam Nemet
c8b56276e7 [GVN, OptDiag] Print the interesting instructions involved in missed load-elimination
[recommitting after the fix in r288307]

This includes the intervening store and the load/store that we're trying
to forward from in the optimization remark for the missed load
elimination.

This is hooked up under a new mode in ORE that allows for compile-time
budget for a bit more analysis to print more insightful messages.  This
mode is currently enabled for -fsave-optimization-record (-Rpass is
trickier since it is controlled in the front-end).

With this we can now print the red remark in http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446

Differential Revision: https://reviews.llvm.org/D26490

llvm-svn: 288381
2016-12-01 17:34:50 +00:00
Adam Nemet
05eccda0d4 [GVN, OptDiag] Include the value that is forwarded in load elimination
[recommitting after the fix in r288307]

This requires some changes to the opt-diag API.  Hal and I have
discussed this at the Dev Meeting and came up with a streaming delimiter
(setExtraArgs) to solve this.

Arguments after this delimiter are only included in the optimization
records and not in the remarks printed in the compiler output.  (Note,
how in the test the content of the YAML file changes but the remarks on
the compiler output don't.)

This implements the green GVN message with a bug fix at line
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446

The fix is that now we properly include the constant value in the
message: "load of type i32 eliminated in favor of 7"

Differential Revision: https://reviews.llvm.org/D26489

llvm-svn: 288380
2016-12-01 17:34:44 +00:00
Alexey Bataev
68437dbc6c [SLP] Additional tests with the cost of vector operations.
llvm-svn: 288377
2016-12-01 17:26:54 +00:00
Alexey Bataev
ef66b3c144 Revert "[SLP] Additional tests with the cost of vector operations."
This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e.

llvm-svn: 288371
2016-12-01 16:45:04 +00:00