mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

1286 Commits

Author SHA1 Message Date
Chandler Carruth
6ac2421fa0 [PM] Clean up a bunch of the doxygen / API docs on the InstCombiner pass
prior to refactoring it.

llvm-svn: 226594
2015-01-20 19:27:58 +00:00
Chandler Carruth
8951826a61 [PM] Move the LoopInfo analysis pointer into the InstCombiner class
along with the other analyses.

The most obvious reason why is because eventually I need to separate out
the pass layer from the rest of the instcombiner. However, it is also
probably a compile time win as every query through the pass manager
layer is pretty slow these days.

llvm-svn: 226550
2015-01-20 08:35:24 +00:00
Chandler Carruth
c47432114d [PM] Split the LoopInfo object apart from the legacy pass, creating
a LoopInfoWrapperPass to wire the object up to the legacy pass manager.

This switches all the clients of LoopInfo over and paves the way to port
LoopInfo to the new pass manager. No functionality change is intended
with this iteration.

llvm-svn: 226373
2015-01-17 14:16:18 +00:00
Chandler Carruth
88fd126216 [PM] Separate the TargetLibraryInfo object from the immutable pass.
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.

Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass managers -- a wrapper pass in the old pass manager
emulates the separation between an analysis's result and the pass
itself that is intrinsic to the new pass manager.

llvm-svn: 226157
2015-01-15 10:41:28 +00:00
NAKAMURA Takumi
0068e08baa Update libdeps since TLI was moved from Target to Analysis in r226078.
llvm-svn: 226126
2015-01-15 05:21:00 +00:00
Chandler Carruth
49a7633378 [PM] Move TargetLibraryInfo into the Analysis library.
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.

This is in preparation for porting this analysis to the new pass
manager.

No functionality changed, and updates inbound for Clang and Polly.

llvm-svn: 226078
2015-01-15 02:16:27 +00:00
David Majnemer
b89ab3fd88 InstCombine: Don't turn A-B<0 into A<B if A-B has other uses
This fixes PR22226.

llvm-svn: 226023
2015-01-14 19:26:56 +00:00
Matt Arsenault
4b66850c40 Fix fcmp + fabs instcombines when using the intrinsic
This was only handling the libcall. This is another example
of why only the intrinsic should ever be used when it exists.

llvm-svn: 225465
2015-01-08 20:09:34 +00:00
David Majnemer
eb61de555d Analysis: Reformulate WillNotOverflowUnsignedAdd for reusability
WillNotOverflowUnsignedAdd's smarts will live in ValueTracking as
computeOverflowForUnsignedAdd.  It now returns a tri-state result:
never overflows, always overflows and sometimes overflows.

llvm-svn: 225329
2015-01-07 00:39:50 +00:00
David Majnemer
d02481ebf3 InstCombine: Just a small tidy-up
llvm-svn: 225328
2015-01-07 00:39:42 +00:00
Matt Arsenault
416be52fbf Convert fcmp with 0.0 from casted integers to icmp
This is already handled in general when it is known the conversion
can't lose bits, i.e., when smaller integer types are cast into wider
floating point types.

This pattern happens somewhat often in GPU programs that cast
workitem intrinsics to float, which are often compared with 0.

Specifically handle the special case of compares with zero which
should also be known to not lose information. I had a more general
version of this which allows equality compares if the casted float is
exactly representable in the integer, but I'm not 100% confident that
is always correct.

Also fold cases that aren't integers to true / false.
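
For instance, a minimal IR sketch of the zero-compare case (names
invented for illustration):

    define i1 @is_zero(i32 %id) {
      %f = uitofp i32 %id to float
      %c = fcmp oeq float %f, 0.0
      ret i1 %c
    }

where the fcmp can be rewritten as the integer compare:

      %c = icmp eq i32 %id, 0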

llvm-svn: 225265
2015-01-06 15:50:59 +00:00
David Majnemer
82fa22459b InstCombine: Bitcast call arguments from/to pointer/integer type
Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing
arguments and return values when DataLayout says it is safe to do so.
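
A hedged sketch in 2015-era (typed-pointer) IR, assuming DataLayout
reports 64-bit pointers; names invented:

    declare i8* @callee(i8*)

    define i64 @caller(i64 %x) {
      %r = call i64 bitcast (i8* (i8*)* @callee to i64 (i64)*)(i64 %x)
      ret i64 %r
    }

which can be rewritten to call @callee directly:

      %p = inttoptr i64 %x to i8*
      %q = call i8* @callee(i8* %p)
      %r = ptrtoint i8* %q to i64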

llvm-svn: 225254
2015-01-06 08:41:31 +00:00
Chandler Carruth
c140bae640 [PM] Split the AssumptionTracker immutable pass into two separate APIs:
a cache of assumptions for a single function, and an immutable pass that
manages those caches.

The motivation for this change is two fold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating a separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.

Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.

For now, this adds boilerplate to the various passes, but this kind of
boilerplate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.

llvm-svn: 225131
2015-01-04 12:03:27 +00:00
David Majnemer
04c9e5e52d InstCombine: match can find ConstantExprs, don't assume we have a Value
We assumed the output of a match was a Value; this would cause us to
assert because we would fail a cast<>.  Instead, use a helper in the
Operator family to hide the distinction between Value and Constant.

This fixes PR22087.

llvm-svn: 225127
2015-01-04 07:36:02 +00:00
David Majnemer
78198d7245 InstCombine: Detect when llvm.umul.with.overflow always overflows
We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero
and LHSKnownOne * RHSKnownOne overflow.
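
A minimal IR sketch (names invented): both operands have bit 16 known
one, so even the smallest possible product, 2^16 * 2^16 = 2^32, wraps
i32, and the overflow bit folds to true:

    declare { i32, i1 } @llvm.umul.with.overflow.i32(i32, i32)

    define i1 @always_overflows(i32 %a, i32 %b) {
      %x = or i32 %a, 65536
      %y = or i32 %b, 65536
      %m = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
      %ov = extractvalue { i32, i1 } %m, 1
      ret i1 %ov
    }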

llvm-svn: 225077
2015-01-02 07:29:47 +00:00
David Majnemer
a7058e95b3 Analysis: Reformulate WillNotOverflowUnsignedMul for reusability
WillNotOverflowUnsignedMul's smarts will live in ValueTracking as
computeOverflowForUnsignedMul.  It now returns a tri-state result:
never overflows, always overflows and sometimes overflows.

llvm-svn: 225076
2015-01-02 07:29:43 +00:00
Sanjay Patel
657e61ccbf InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X
Some day the backend may handle instruction-level fast math flags and make
this transform unnecessary, but it's still better practice to use the canonical
representation of fneg when possible (use a -0.0).

This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ).
See also http://reviews.llvm.org/D6723.

Differential Revision: http://reviews.llvm.org/D6731

llvm-svn: 225050
2014-12-31 22:14:05 +00:00
David Majnemer
83939d3744 InstCombine: try to transform A-B < 0 into A < B
We are allowed to move the 'B' to the right hand side if we can prove
there is no signed overflow and if the comparison itself is signed.
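
A minimal IR sketch (names invented), with nsw guaranteeing no signed
overflow and the subtraction having no other uses:

    %sub = sub nsw i32 %a, %b
    %cmp = icmp slt i32 %sub, 0

    ; becomes
    %cmp = icmp slt i32 %a, %b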

llvm-svn: 225034
2014-12-31 04:21:41 +00:00
Philip Reames
4527f27ca2 Carry facts about nullness and undef across GC relocation
This change implements four basic optimizations:

    If a relocated value isn't used, it doesn't need to be relocated.
    If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector-specific; I don't know of one for which it doesn't work, though.)
    If the value being relocated is undef, the relocation is meaningless.
    If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.)

I outlined other planned work in comments.

Differential Revision: http://reviews.llvm.org/D6600

llvm-svn: 224968
2014-12-29 23:27:30 +00:00
Philip Reames
a4f427e6e7 Loading from null is valid outside of addrspace 0
This patch fixes a miscompile where we assumed that loading from null is undefined and thus could assume it doesn't happen.  This transform is perfectly legal in address space 0, but is not necessarily legal in other address spaces.

We really should introduce a hook to control this property on a per-target, per-address-space basis.  We may be losing valuable optimizations in some address spaces by being too conservative.

Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me.

llvm-svn: 224961
2014-12-29 22:46:21 +00:00
David Majnemer
c57dbde8fa InstCombine: Infer nuw for multiplies
A multiply cannot unsigned-wrap if the two operands have, between them,
a number of leading zero bits at least equal to the bitwidth.
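
A minimal IR sketch (names invented): each operand is masked to 16
bits, so together they carry 16 + 16 = 32 leading zero bits and the
i32 product cannot wrap:

    %x = and i32 %a, 65535
    %y = and i32 %b, 65535
    %m = mul i32 %x, %y

    ; becomes
    %m = mul nuw i32 %x, %y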

llvm-svn: 224849
2014-12-26 09:50:35 +00:00
David Majnemer
ce5bd510cb InstCombine: Infer nsw for multiplies
We already utilize this logic for reducing overflow intrinsics, it makes
sense to reuse it for normal multiplies as well.

llvm-svn: 224847
2014-12-26 09:10:14 +00:00
David Majnemer
3da9d34415 InstCombine: Squash an icmp+select into bitwise arithmetic
(X & INT_MIN) == 0 ? X ^ INT_MIN : X  into  X | INT_MIN
(X & INT_MIN) != 0 ? X ^ INT_MIN : X  into  X & INT_MAX

This fixes PR21993.

llvm-svn: 224676
2014-12-20 04:45:35 +00:00
Bruno Cardoso Lopes
b25873afd1 Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. Also, fix the code to return the modified switch when only
the truncation is performed.

This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224588
2014-12-19 17:12:35 +00:00
Sanjay Patel
b1bb7100db use -0.0 when creating an fneg instruction
Backends recognize (-0.0 - X) as the canonical form for fneg
and produce better code. E.g., ppc64 with 0.0:

   lis r2, ha16(LCPI0_0)
   lfs f0, lo16(LCPI0_0)(r2)
   fsubs f1, f0, f1
   blr

vs. -0.0:

   fneg f1, f1
   blr

Differential Revision: http://reviews.llvm.org/D6723

llvm-svn: 224583
2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes
ba090bc5c2 Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr"
Reverts commit r224574 to appease buildbots:

The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

llvm-svn: 224576
2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes
afb4b15ab3 [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224574
2014-12-19 14:23:15 +00:00
Sanjay Patel
f3ae90d3db fix formatting; NFC
llvm-svn: 224542
2014-12-18 21:11:09 +00:00
Erik Eckstein
042c032147 Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.
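
A minimal IR sketch (names invented): both operands of the signed
multiply are masked to 8 bits, so overflow is impossible and the
intrinsic strength reduces to a plain multiply with the overflow bit
known false:

    declare { i32, i1 } @llvm.smul.with.overflow.i32(i32, i32)

    %x = and i32 %a, 255
    %y = and i32 %b, 255
    %s = call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %x, i32 %y)

    ; becomes
    %m = mul nsw i32 %x, %y   ; overflow bit known false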

llvm-svn: 224417
2014-12-17 07:29:19 +00:00
Steven Wu
17876438f0 More code format fix from r224133, NFC
llvm-svn: 224140
2014-12-12 18:48:37 +00:00
Steven Wu
498e8ef334 Restructure code from r224097. NFC
llvm-svn: 224133
2014-12-12 17:21:54 +00:00
Steven Wu
896d3dd47b Fix another infinite loop in InstCombine
Summary:
InstCombine infinite-loops for the testcase added.
This is because InstCombine generates instructions that it can itself
optimize. Fix by not optimizing frem if the optimized type is the same
as the original type.
rdar://problem/19150820

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6634

llvm-svn: 224097
2014-12-12 04:34:07 +00:00
Andrea Di Biagio
6186490ec7 [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
This patch teaches the instruction combiner how to fold a call to 'insertqi' if
the 'length' field (3rd operand) is set to zero, or if the sum of the
'length' field and the 'bit index' (4th operand) is bigger than 64.

From the AMD64 Architecture Programmer's Manual:
1. If the sum of the bit index + length field is greater than 64, then the
   results are undefined;
2. A value of zero in the field length is defined as a length of 64.

This patch improves the existing combining logic for intrinsic 'insertqi'
adding extra checks to address both point 1. and point 2.

Differential Revision: http://reviews.llvm.org/D6583

llvm-svn: 224054
2014-12-11 20:44:59 +00:00
Erik Eckstein
4078937e34 Refactor creation of overflow result tuples in InstCombineCalls.
Extract the creation of overflow result tuples into a separate function. NFC.

llvm-svn: 224006
2014-12-11 08:02:30 +00:00
Chandler Carruth
2100e2da37 Revert r223764 which taught instcombine about integer-based element extraction
patterns.

This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely
also on ARM and PPC. I really don't know how, but reverting now that I've
confirmed this is actually the culprit. I have a reproduction as well and so
should be able to restore this shortly.

This reverts commit r223764.

Original commit log follows:
Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

llvm-svn: 223813
2014-12-09 19:21:16 +00:00
Duncan P. N. Exon Smith
3d57886267 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  If you help me compile it on
Darwin, I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
Chandler Carruth
318b867199 Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

Differential Revision: http://reviews.llvm.org/D6548

llvm-svn: 223764
2014-12-09 08:55:32 +00:00
Simon Pilgrim
0b4002e32c [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

Since it's just a one-liner, I've added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.
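
A minimal IR sketch of the first fold (names invented):

    %bx = call i32 @llvm.bswap.i32(i32 %x)
    %by = call i32 @llvm.bswap.i32(i32 %y)
    %r = and i32 %bx, %by

    ; becomes
    %a = and i32 %x, %y
    %r = call i32 @llvm.bswap.i32(i32 %a)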

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Erik Eckstein
b61d06dbaf InstCombine: simplify signed range checks
Try to convert two compares of a signed range check into a single unsigned compare.
Examples:
(icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
(icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n
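
With a concrete constant, a minimal IR sketch (names invented):

    %lo = icmp sge i32 %x, 0
    %hi = icmp slt i32 %x, 100
    %in = and i1 %lo, %hi

    ; becomes
    %in = icmp ult i32 %x, 100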

llvm-svn: 223224
2014-12-03 10:39:15 +00:00
Philip Reames
02104421ff [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder
This is the third patch in a small series.  It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085).  The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.  

With this change, gc.statepoints should be functionally complete.  The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.

I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated.  The statepoint lowering code is split into its own files and anyone not working on the statepoint support itself should be able to ignore it.

During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straightforward, and matches closely the implementations of the patchpoint intrinsics.  Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints.  Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack.  The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.

In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator.  In principle, we shouldn't need to eagerly spill at all.  The register allocator should do any spilling required and the statepoint should simply record that fact.  Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stopgap measure.

Reviewed by: atrick, ributzka

llvm-svn: 223137
2014-12-02 18:50:36 +00:00
David Majnemer
87a17a0975 InstCombine: FoldOrOfICmps harder
We may be in a situation where the icmps might not be near each other in
a tree of or instructions.  Try to dig out related compare instructions
and see if they combine.

N.B.  This won't fire on deep trees of compares because rewriting the
tree might end up creating a net increase of IR.  We may have to resort
to something more sophisticated if this is a real problem.

llvm-svn: 222928
2014-11-28 19:58:29 +00:00
Ankur Garg
0bf088f50d Removed extra line from a comment to test first commit. NFC.
llvm-svn: 222916
2014-11-28 10:38:18 +00:00
David Majnemer
9698d3487d InstCombine: Restore optimizations lost in r210006
This restores our ability to optimize:
(X & C) == 0 ? X ^ C : X  into  X | C
(X & C) != 0 ? X ^ C : X  into  X & ~C

llvm-svn: 222871
2014-11-27 07:25:21 +00:00
David Majnemer
7e9b94486e Revert "Added inst combine transforms for single bit tests from Chris's note"
This reverts commit r210006, it miscompiled libapr which is used in who
knows how many projects.

A test has been added to ensure that we don't regress again.

I'll work on a rewrite of what the optimization was trying to do later.

llvm-svn: 222856
2014-11-26 23:00:38 +00:00
Chandler Carruth
6264bc6537 [InstCombine] Change LLVM to canonicalize toward the value type being
stored rather than the pointer type.

This change is analogous to r220138 which changed the canonicalization
for loads. The rationale is the same: memory does not have a type,
operations (and thus the values they produce) have a type. We should
match that type as closely as possible rather than reading some form of
semantics into the pointer type.

With this change, loads and stores should no longer be made with
nonsensical types for the values that they load and store. This is
particularly important when trying to match specific loaded and stored
types in the process of doing other instcombines, which is what led me
down this twisty maze of miscanonicalization.

I've put quite some effort into looking through IR to find places where
LLVM's optimizer was being unreasonably conservative in the face of
mismatched load and store types, however it is possible (let's say,
likely!) I have missed some. If you see regressions here, or from
r220138, the likely cause is some part of LLVM failing to cope with load
and store types differing. Test cases appreciated, it is important that
we root all of these out of LLVM.

llvm-svn: 222748
2014-11-25 10:09:51 +00:00
Chandler Carruth
7feb19d89c Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite
clearly only exactly equal width ptrtoint and inttoptr casts are no-op
casts; it says so right there in the LangRef. Make the code agree.

Original log from r220277:
Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 222739
2014-11-25 08:20:27 +00:00
Matt Arsenault
454e837bd2 Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisons
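
One way to express the canonicalization in IR (names invented): invert
the unordered predicate and swap the select arms, which is semantically
exact:

    %c = fcmp ult float %a, %b
    %min = select i1 %c, float %a, float %b

    ; becomes
    %c = fcmp oge float %a, %b
    %min = select i1 %c, float %b, float %a
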
llvm-svn: 222705
2014-11-24 23:15:18 +00:00
David Majnemer
291966cd3b InstCombine: Don't create an unused instruction
We would create an instruction but not insert it.
Leaving the unused instruction uninserted would lead to a
verification failure.

This fixes PR21653.

llvm-svn: 222659
2014-11-24 16:41:13 +00:00
David Majnemer
2445c6caf5 InstCombine: Don't assume DataLayout is always available
We tried to get the result of DataLayout::getLargestLegalIntTypeSize but
we didn't have a DataLayout.  This resulted in opt crashing.

This fixes PR21651.

llvm-svn: 222645
2014-11-24 07:26:20 +00:00
David Majnemer
0b413925f3 InstCombine: Propagate exact for (sdiv X, Pow2) -> (udiv X, Pow2)
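
A minimal IR sketch (names invented), assuming %x is known
non-negative so the signed and unsigned divisions agree:

    %x = and i32 %a, 2147483647
    %d = sdiv exact i32 %x, 8

    ; becomes
    %d = udiv exact i32 %x, 8
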
llvm-svn: 222625
2014-11-22 20:00:41 +00:00