1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

1957 Commits

Author SHA1 Message Date
Sanjay Patel
6e95a2af4a [InstCombine] use canEvaluateShiftedShift() to handle the lshr case (NFCI)
We need just a couple of logic tweaks to consolidate the shl and lshr cases.

This is step 5 of refactoring to solve PR26760:
https://llvm.org/bugs/show_bug.cgi?id=26760

llvm-svn: 265965
2016-04-11 17:11:55 +00:00
Sanjay Patel
c3b4340d01 [InstCombine] don't try to shift an illegal amount (PR26760)
This is the straightforward fix for PR26760:
https://llvm.org/bugs/show_bug.cgi?id=26760

But we still need to make some changes to generalize this helper function
and then send the lshr case into here.

llvm-svn: 265960
2016-04-11 16:50:32 +00:00
Sanjay Patel
9d615dcbe9 [InstCombine] rename variables in shifted-shift helper function (NFCI)
This is step 3 of refactoring to solve PR26760:
https://llvm.org/bugs/show_bug.cgi?id=26760

llvm-svn: 265954
2016-04-11 16:11:07 +00:00
Sanjay Patel
2bc03c53fa [InstCombine] add helper function for shift-shift optimization (NFCI)
This is step 2 of refactoring to solve PR26760:
https://llvm.org/bugs/show_bug.cgi?id=26760

llvm-svn: 265951
2016-04-11 15:43:41 +00:00
David Majnemer
08e9ca8730 [InstCombine] Fix miscompile in FoldSPFofSPF
We had a select of a cast of a select but attempted to replace the outer
select with the inner select dispite their incompatible types.

Patch by Anton Korobeynikov!

This fixes PR27236.

llvm-svn: 265805
2016-04-08 16:51:49 +00:00
David Majnemer
8f75cadbc7 [InstCombine] Add a peephole for redundant assumes
Two or more identical assumes are occasionally next to each other in a
basic block.
While our generic machinery will turn a redundant assume into a no-op,
it is not super cheap.
We can perform a simpler check to achieve the same result for this case.

llvm-svn: 265801
2016-04-08 16:37:12 +00:00
Sanjoy Das
b20d278ebd Don't IPO over functions that can be de-refined
Summary:
Fixes PR26774.

If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".

Motivation:

I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard.  So transforming:

```
void f(unsigned x) {
  unsigned t = 5 / x;
  (void)t;
}
```

to

```
void f(unsigned x) { }
```

is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).

Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM.  For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).

Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have.  This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.

For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store.  As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal.  The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal.  Such a
refined variant will look like it is `readonly`.  However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.

Note: this is not just a problem with atomics or with linking
differently optimized object files.  See PR26774 for more realistic
examples that involved neither.

This patch:

This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time.  It then changes a set of IPO passes to bail out if they see
such a function.

Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18634

llvm-svn: 265762
2016-04-08 00:48:30 +00:00
David Majnemer
0cf2c938ab [InstCombine] Don't sink an instr after a catchswitch
A catchswitch is a terminator, instructions cannot be inserted after it.

llvm-svn: 265158
2016-04-01 17:28:17 +00:00
Matt Arsenault
2a2b3fa835 AMDGPU: Add frexp_exp intrinsic
llvm-svn: 264944
2016-03-30 22:28:52 +00:00
Matt Arsenault
c182eae9e4 AMDGPU: Constant folding for frexp_mant
llvm-svn: 264943
2016-03-30 22:28:26 +00:00
Junmo Park
f9c43d451f Minor code cleanup. NFC.
llvm-svn: 264124
2016-03-23 01:38:35 +00:00
David Majnemer
2669789037 [InstCombine] Don't insert instructions before a catch switch
CatchSwitches are not splittable, we cannot insert casts, etc. before
them.

This fixes PR26992.

llvm-svn: 263874
2016-03-19 04:39:52 +00:00
Guozhi Wei
9cfdcc479b [InstCombine] Combine A->B->A BitCast
This patch enhances InstCombine to handle following case:

        A  ->  B    bitcast
        PHI
        B  ->  A    bitcast

llvm-svn: 263734
2016-03-17 18:47:20 +00:00
Bjorn Steinbrink
a31b1fe444 Also handle the new Rust pers fn to isCatchAll()
llvm-svn: 263585
2016-03-15 20:57:07 +00:00
Justin Lebar
7e369d0dca [attrs] Handle convergent CallSites.
Summary:
Previously we had a notion of convergent functions but not of convergent
calls.  This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.

Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent.  As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.

Originally landed as r261544, then reverted in r261544 for (incidental)
build breakage.  Re-landed here with no changes.

Reviewers: chandlerc, jingyue

Subscribers: llvm-commits, tra, jhen, hfinkel

Differential Revision: http://reviews.llvm.org/D17739

llvm-svn: 263481
2016-03-14 20:18:54 +00:00
Mehdi Amini
12d624a26a Remove PreserveNames template parameter from IRBuilder
This reapplies r263258, which was reverted in r263321 because
of issues on Clang side.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263393
2016-03-13 21:05:13 +00:00
Sanjay Patel
60bcb48851 [x86, InstCombine] delete x86 SSE2 masked store with zero mask
This follows up on the related AVX instruction transforms, but this
one is too strange to do anything more with. Intel's behavioral
description of this instruction in its Software Developer's Manual
is tragi-comic.

llvm-svn: 263340
2016-03-12 15:16:59 +00:00
Eric Christopher
559b0be562 Temporarily revert:
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date:   Fri Mar 11 17:15:50 2016 +0000

    Remove PreserveNames template parameter from IRBuilder

    Summary:
    Following r263086, we are now relying on a flag on the Context to
    discard Value names in release builds.

    Reviewers: chandlerc

    Subscribers: mzolotukhin, llvm-commits

    Differential Revision: http://reviews.llvm.org/D18023

    From: Mehdi Amini <mehdi.amini@apple.com>

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
    91177308-0d34-0410-b5e6-96231b3b80d8

until we can figure out what to do about clang and Release build testing.

This reverts commit 263258.

llvm-svn: 263321
2016-03-12 01:47:22 +00:00
Mehdi Amini
1f82b794e4 Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.

Reviewers: chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18023

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
2016-03-11 17:15:50 +00:00
Chandler Carruth
6150530377 [PM] Make the AnalysisManager parameter to run methods a reference.
This was originally a pointer to support pass managers which didn't use
AnalysisManagers. However, that doesn't realistically come up much and
the complexity of supporting it doesn't really make sense.

In fact, *many* parts of the pass manager were just assuming the pointer
was never null already. This at least makes it much more explicit and
clear.

llvm-svn: 263219
2016-03-11 11:05:24 +00:00
Benjamin Kramer
af02270708 [InstCombine] Use Twines to generate names.
Since the names are used in a loop this does more work in debug builds. In
release builds value names are generally discarded so we don't have to do
the concatenation at all. It's also simpler code, no functional change
intended.

llvm-svn: 263215
2016-03-11 10:20:56 +00:00
Philip Reames
64fa288117 [ValueTracking] Extract isKnownPositive [NFCI]
Extract out a generic interface from a recently landed patch and document a TODO in case compile time becomes a problem.

llvm-svn: 263062
2016-03-09 21:31:47 +00:00
Philip Reames
a1cf015229 [InstCombine] (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0)
When checking whether an smin is positive, we can move the comparison to one of the inputs if the other is known positive. If the known positive one is the min, then the other can't be negative. If the other is the min, then we compute the min.

Differential Revision: http://reviews.llvm.org/D17873

llvm-svn: 263059
2016-03-09 21:05:07 +00:00
Matthias Braun
3757169d38 InstCombine: Restrict computeKnownBits() on all Values to OptLevel > 2
As part of r251146 InstCombine was extended to call computeKnownBits on
every value in the function to determine whether it happens to be
constant. This increases typical compiletime by 1-3% (5% in irgen+opt
time) in my measurements. On the other hand this case did not trigger
once in the whole llvm-testsuite.

This patch introduces the notion of ExpensiveCombines which are only
enabled for OptLevel > 2. I removed the check in InstructionSimplify as
that is called from various places where the OptLevel is not known but
given the rarity of the situation I think a check in InstCombine is
enough.

Differential Revision: http://reviews.llvm.org/D16835

llvm-svn: 263047
2016-03-09 18:47:11 +00:00
Petar Jovanovic
d649b976d0 Reland r262337 "calculate builtin_object_size if arg is a removable pointer"
Original commit message:
 calculate builtin_object_size if argument is a removable pointer

 This patch fixes calculating correct value for builtin_object_size function
 when pointer is used only in builtin_object_size function call and never
 after that.

 Patch by Strahinja Petrovic.

 Differential Revision: http://reviews.llvm.org/D17337

Reland the original change with a small modification (first do a null check
and then do the cast) to satisfy ubsan.

llvm-svn: 263011
2016-03-09 14:12:47 +00:00
Junmo Park
4c71496a1c Revert "[InstCombine] Combine A->B->A BitCast"
This reverts commit r262670 due to compile failure.

llvm-svn: 262916
2016-03-08 07:09:46 +00:00
Guozhi Wei
a7fc8e012e [InstCombine] Combine A->B->A BitCast
This patch enhances InstCombine to handle following case:

        A  ->  B    bitcast
        PHI
        B  ->  A    bitcast

llvm-svn: 262670
2016-03-03 23:21:38 +00:00
Sanjay Patel
fe4b2c4bc6 [InstCombine] transform bitcasted bitwise logic ops with constants (PR26702)
Given that we're not actually reducing the instruction count in the included
regression tests, I think we would call this a canonicalization step.

The motivation comes from the example in PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702

If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable
example of:

define <4 x i32> @is_negative(<4 x i32> %x) {
  %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
  %not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
  %bc = bitcast <4 x i32> %not to <2 x i64>
  %notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
  %bc2 = bitcast <2 x i64> %notnot to <4 x i32>
  ret <4 x i32> %bc2
}

Simplifies to the expected:

define <4 x i32> @is_negative(<4 x i32> %x) {
  %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
  ret <4 x i32> %lobit
}

Differential Revision: http://reviews.llvm.org/D17583

llvm-svn: 262645
2016-03-03 19:19:04 +00:00
Amaury Sechet
09dedb75e8 Explode store of arrays in instcombine
Summary: This is the last step toward supporting aggregate memory access in instcombine. This explodes stores of arrays into a serie of stores for each element, allowing them to be optimized.

Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17828

llvm-svn: 262530
2016-03-02 22:36:45 +00:00
Amaury Sechet
7169395ec4 Unpack array of all sizes in InstCombine
Summary: This is another step toward improving fca support. This unpack load of array in a series of load to array's elements.

Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15890

llvm-svn: 262521
2016-03-02 21:28:30 +00:00
Sanjay Patel
afbc4f1551 revert r262424 because there's a *clang test* for AArch64 that checks -O3 asm output
that is broken by this change

llvm-svn: 262440
2016-03-02 01:04:09 +00:00
Sanjay Patel
7266aa16e2 [InstCombine] convert 'isPositive' and 'isNegative' vector comparisons to shifts (PR26701)
As noted in the code comment, I don't think we can do the same transform that we do for
*scalar* integers comparisons to *vector* integers comparisons because it might pessimize
the general case. 

Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT
for integer vectors.

But we should now recognize all the variants of this construct and produce the optimal code
for the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
 

llvm-svn: 262424
2016-03-01 23:55:18 +00:00
Dehao Chen
9581e2d337 Perform InstructioinCombiningPass before SampleProfile pass.
Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17742

llvm-svn: 262419
2016-03-01 22:53:02 +00:00
Owen Anderson
999a9f171c Fix an issue where fast math flags were dropped during scalarization.
Most portions of InstCombine properly propagate fast math flags, but
apparently the vector scalarization section was overlooked.

llvm-svn: 262376
2016-03-01 19:35:52 +00:00
Petar Jovanovic
0302dd0999 Revert "calculate builtin_object_size if argument is a removable pointer"
Revert r262337 as "check-llvm ubsan" step failed on
sanitizer-x86_64-linux-fast buildbot.

llvm-svn: 262349
2016-03-01 16:50:08 +00:00
Petar Jovanovic
a1fb751763 calculate builtin_object_size if argument is a removable pointer
This patch fixes calculating correct value for builtin_object_size function
when pointer is used only in builtin_object_size function call and never
after that.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D17337

llvm-svn: 262337
2016-03-01 14:39:55 +00:00
Sanjay Patel
aee22b5eed [x86, InstCombine] transform more x86 masked loads to LLVM intrinsics
Continuation of:
http://reviews.llvm.org/rL262269

llvm-svn: 262273
2016-02-29 23:59:00 +00:00
Sanjay Patel
4d045a2e91 [x86, InstCombine] transform x86 AVX masked loads to LLVM intrinsics
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392
http://reviews.llvm.org/rL260145

is that customers using the AVX intrinsics in C will benefit from combines when
the load mask is constant:

__m128 mload_zeros(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(0));
}

__m128 mload_fakeones(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(1));
}

__m128 mload_ones(float *f) {
  return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000));
}

__m128 mload_oneset(float *f) {
  return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0));
}

...so none of the above will actually generate a masked load for optimized code.

This is the masked load counterpart to:
http://reviews.llvm.org/rL262064

llvm-svn: 262269
2016-02-29 23:16:48 +00:00
Reid Kleckner
43aeb743c9 [InstCombine] Be more conservative about removing stackrestore
We ended up removing a save/restore pair around an inalloca call,
leading to a miscompile in Chromium.

llvm-svn: 262095
2016-02-27 00:53:54 +00:00
Sanjay Patel
014ad4d329 [x86, InstCombine] transform x86 AVX2 masked stores to LLVM intrinsics
Replicate everything for integers...because x86.

Continuation of:
http://reviews.llvm.org/rL262064

llvm-svn: 262077
2016-02-26 21:51:44 +00:00
Sanjay Patel
d5185a0950 [x86, InstCombine] transform x86 AVX masked stores to LLVM intrinsics
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392
http://reviews.llvm.org/rL260145

is that customers using the AVX intrinsics in C will benefit from combines when
the store mask is constant:

void mstore_zero_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set1_epi32(0), v);
}

void mstore_fake_ones_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set1_epi32(1), v);
}

void mstore_ones_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set1_epi32(0x80000000), v);
}

void mstore_one_set_elt_mask(float *f, __m128 v) {
  _mm_maskstore_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0), v);
}

...so none of the above will actually generate a masked store for optimized code.

Differential Revision: http://reviews.llvm.org/D17485

llvm-svn: 262064
2016-02-26 21:04:14 +00:00
Sanjay Patel
97801c676a [InstCombine] enable optimization of casted vector xor instructions
This is part of the payoff for the refactoring in:
http://reviews.llvm.org/rL261649
http://reviews.llvm.org/rL261707

In addition to removing a pile of duplicated code, the xor case was
missing the optimization for vector types because it checked
"SrcTy->isIntegerTy()" rather than "SrcTy->isIntOrIntVectorTy()"
like 'and' and 'or' were already doing.

This solves part of:
https://llvm.org/bugs/show_bug.cgi?id=26702

llvm-svn: 261750
2016-02-24 17:00:34 +00:00
Artur Pilipenko
5746dd289e NFC. Move isDereferenceable to Loads.h/cpp
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.   

Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D16180

llvm-svn: 261736
2016-02-24 12:49:04 +00:00
Sanjay Patel
4f966611aa [InstCombine] refactor visitOr() to use foldCastedBitwiseLogic()
Note: The 'and' case in foldCastedBitwiseLogic() is inheriting one extra
check from the nearly identical 'or' case:
  if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))

But I'm not sure how to expose that difference in a regression test. 
Without that check, the 'or' path will infinite loop on:
test/Transforms/InstCombine/zext-or-icmp.ll
because the zext-or-icmp fold is attempting a reverse transform.

The refactoring should extend to the 'xor' case next to solve part of
PR26702.

llvm-svn: 261707
2016-02-23 23:56:23 +00:00
Sanjay Patel
e4ec3d519e [InstCombine] improve readability ; NFCI
Less indenting, named local variables, more descriptive names.

llvm-svn: 261659
2016-02-23 17:41:34 +00:00
Sanjay Patel
4e682fcfe8 [InstCombine] less indenting; NFC
llvm-svn: 261652
2016-02-23 16:59:21 +00:00
Sanjay Patel
70dd21aa9d [InstCombine] add helper function to foldCastedBitwiseLogic() ; NFCI
This is a straight cut and paste of the existing code and is intended to
be the first step in solving part of PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702

We should be able to reuse most of this and delete the nearly identical 
existing code in visitOr(). Then, we can enhance visitXor() to use the
same code too.

llvm-svn: 261649
2016-02-23 16:36:07 +00:00
Justin Lebar
f84464712c Revert "[attrs] Handle convergent CallSites."
This reverts r261544, which was causing a test failure in
Transforms/FunctionAttrs/readattrs.ll.

llvm-svn: 261549
2016-02-22 18:24:43 +00:00
Justin Lebar
ca379cda9f [attrs] Handle convergent CallSites.
Summary:
Previously we had a notion of convergent functions but not of convergent
calls.  This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.

Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent.  As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.

Reviewers: chandlerc, jingyue

Subscribers: hfinkel, jhen, tra, llvm-commits

Differential Revision: http://reviews.llvm.org/D17317

llvm-svn: 261544
2016-02-22 17:51:35 +00:00
Sanjay Patel
ff9ee24191 fix inaccurate comment; NFC
llvm-svn: 261484
2016-02-21 17:33:31 +00:00
Sanjay Patel
67805931c5 [InstCombine] add getNegativeIsTrueBoolVec() helper function; NFC
Originally part of:
http://reviews.llvm.org/D17485

We need this when simplifying masked memory ops too.

llvm-svn: 261483
2016-02-21 17:29:33 +00:00
Simon Pilgrim
f781386f81 [InstCombine] SSE/SSE2 (u)comiss/(u)comisd comparison intrinsics only use the lowest vector element
llvm-svn: 261460
2016-02-20 23:17:35 +00:00
Chandler Carruth
3d4a43dca8 [AA] Preserve the AA results wrapper pass as well as BasicAA in a few
more places to prevent gratuitous re-"runs" of these passes.

The passes themselves don't do any work when run, but we keep spending
time scheduling and running these needlessly when we really don't need
to do so.

This is the first patch towards fixing the really horrible loop pass
pipeline fragmentation pointed out by Sanjoy in PR24804.

llvm-svn: 261302
2016-02-19 03:12:14 +00:00
Richard Trieu
5a759985de Remove uses of builtin comma operator.
Cleanup for upcoming Clang warning -Wcomma.  No functionality change intended.

llvm-svn: 261270
2016-02-18 22:09:30 +00:00
Amaury Sechet
ea7abf7b10 NFC: Fix formating
llvm-svn: 261156
2016-02-17 21:21:29 +00:00
Amaury Sechet
14d2c58ecf Fix load alignement when unpacking aggregates structs
Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it.

Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17326

llvm-svn: 261139
2016-02-17 19:21:28 +00:00
David Majnemer
5886331bc4 [InstCombine] Don't aggressively replace xor with icmp
For some cases, InstCombine replaces the sequence of xor/sub instruction
followed by cmp instruction into a single cmp instruction.

However, this replacement may result suboptimal result especially when
the xor/sub has more than one use, as discussed in
bug 26465 (https://llvm.org/bugs/show_bug.cgi?id=26465).

This patch make the replacement happen only when xor/sub has only one
use.

Differential Revision: http://reviews.llvm.org/D16915

Patch by Taewook Oh!

llvm-svn: 260695
2016-02-12 18:12:38 +00:00
Quentin Colombet
b60e3f65de Re-apply r238452, the bug was in clang and was fixed in r260567.
Original commit message:
[InstCombine] Fold IntToPtr and PtrToInt into preceding loads.

Currently we only fold a BitCast into a Load when the BitCast is its
only user.

Do the same for any no-op cast.

Patch by Philip Pfaffe!

Differential Revision: http://reviews.llvm.org/D9152

llvm-svn: 260612
2016-02-11 22:30:41 +00:00
Pete Cooper
e68afcf1b2 Set load alignment on aggregate loads.
When optimizing a extractvalue(load), we generate a load from the
aggregate type.  This load didn't have alignment set and so would
get the alignment of the type.  This breaks when the type is packed
and so the alignment should be lower.

For example, loading { int, int } would give us alignment of 4, but
the original load from this type may have an alignment of 1 if packed.

Reviewed by David Majnemer

Differential revision: http://reviews.llvm.org/D17158

llvm-svn: 260587
2016-02-11 21:10:40 +00:00
Jun Bum Lim
2bc62cf930 Fixed typo in r260530
llvm-svn: 260541
2016-02-11 16:46:13 +00:00
Jun Bum Lim
555cbf018b [InstCombine] Simplify a known nonzero incoming value of PHI
Summary:
When a PHI is used only to be compared with zero, it is possible to replace an
incoming value with any non-zero constant if the incoming value can be proved as
a known nonzero value. For example, in below code, we can replace the incoming value %v with
any non-zero constant based on the fact that the PHI is only used to be compared with zero
and %v is a known non-zero value:
  %v = select %cond, 1, 2
  %p = phi [%v, BB] ...
  %c = icmp eq, %p, 0

Reviewers: mcrosier, jmolloy, sanjoy

Subscribers: hfinkel, mcrosier, majnemer, llvm-commits, haicheng, bmakam, mssimpso, gberry

Differential Revision: http://reviews.llvm.org/D16240

llvm-svn: 260530
2016-02-11 15:50:07 +00:00
Artur Pilipenko
f8c51ed6a0 Don't propagate dereferenceable attribute through gc.relocate in InstCombine
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D16143

llvm-svn: 260509
2016-02-11 11:22:46 +00:00
Philip Reames
0e7cbf4f20 [InstCombine][GC] Handle gc.relocations of vector type
We introduced gc.relocates of vector-of-pointer types a couple of weeks back.  Somehow, I missed updating the InstCombine rule to account for this.  If we hit this code path with a vector-of-pointers gc.relocate, we'd crash on a cast<PointerType>.

I also took the chance to do a bit of code style cleanup.

llvm-svn: 260279
2016-02-09 21:09:22 +00:00
Quentin Colombet
091258f7a2 [InstCombine] Revert r238452: Fold IntToPtr and PtrToInt into preceding loads.
According to git bisect, this is the root cause of a miscompile for Regex in
libLLVMSupport. I am still working on reducing a test case.
The actual bug may be elsewhere and this commit just exposed it.

Anyway, at the moment, to reproduce, follow these steps:
1. Build clang and libLTO in release mode.
2. Create a new build directory <stage2> and cd into it.
3. Use clang and libLTO from #1 to build llvm-extract in Release mode + asserts
   using -O2 -flto
4. Run llvm-extract  -ralias '.*bar' -S test/Other/extract-alias.ll

Result:
program doesn't contain global named '.*bar'!

Expected result:
@a0a0bar = alias void ()* @bar
@a0bar = alias void ()* @bar

declare void @bar()

Note: In step #3, if you don't use lto or asserts, the miscompile disappears.
llvm-svn: 259674
2016-02-03 18:04:13 +00:00
Eugene Zelenko
0ebce618ad Fix Clang-tidy readability-redundant-control-flow warnings; other minor fixes.
Differential revision: http://reviews.llvm.org/D16793

llvm-svn: 259539
2016-02-02 18:20:45 +00:00
Sanjay Patel
374a6c4ff8 function names start with a lowercase letter; NFC
llvm-svn: 259425
2016-02-01 22:23:39 +00:00
Sanjay Patel
14ae72b119 [InstCombine] simplify masked scatter/gather intrinsics with zero masks
A masked scatter with a zero mask means there's no store.
A masked gather with a zero mask means the passthru arg is returned.

This is a continuation of:
http://reviews.llvm.org/rL259369
http://reviews.llvm.org/rL259392

llvm-svn: 259421
2016-02-01 22:10:26 +00:00
Sanjay Patel
0a594deff4 [InstCombine] simplify masked store intrinsics with all ones or zeros masks
A masked store with a zero mask means there's no store.
A masked store with an allOnes mask means it's a normal vector store.

This is a continuation of:
http://reviews.llvm.org/rL259369

llvm-svn: 259392
2016-02-01 19:39:52 +00:00
David Majnemer
f3d73f0449 [InstCombine] Don't transform (X+INT_MAX)>=(Y+INT_MAX) -> (X<=Y)
This miscompile came about because we tried to use a transform which was
only appropriate for xor operators when addition was present.

This fixes PR26407.

llvm-svn: 259375
2016-02-01 17:37:56 +00:00
Sanjay Patel
834c52c879 [InstCombine] simplify masked load intrinsics with all ones or zeros masks
A masked load with a zero mask means there's no load.
A masked load with an allOnes mask means it's a normal vector load.

Differential Revision: http://reviews.llvm.org/D16691

llvm-svn: 259369
2016-02-01 17:00:10 +00:00
Sanjay Patel
4c5a2daa39 add helper function for minnum/maxnum ; NFC
llvm-svn: 259326
2016-01-31 16:35:23 +00:00
Sanjay Patel
433350d1d4 use range-based for loop; NFC
llvm-svn: 259325
2016-01-31 16:34:48 +00:00
Sanjay Patel
5b8745669b fix formatting; NFC
llvm-svn: 259324
2016-01-31 16:34:11 +00:00
Sanjay Patel
9cba4166c0 simplify; NFC
llvm-svn: 259323
2016-01-31 16:33:33 +00:00
Matt Arsenault
d68c0fe0a6 InstCombine: fabs(x) * fabs(x) -> x * x
llvm-svn: 259295
2016-01-30 05:02:00 +00:00
Matthias Braun
882ae69776 Avoid overly large SmallPtrSet/SmallSet
These sets perform linear searching in small mode so it is never a good
idea to use SmallSize/N bigger than 32.

llvm-svn: 259283
2016-01-30 01:24:31 +00:00
Sanjay Patel
08d75b7ee9 function names start with a lower case letter ; NFC
llvm-svn: 259264
2016-01-29 23:27:03 +00:00
Sanjay Patel
bde9b8602b fix formatting; NFC
llvm-svn: 259262
2016-01-29 23:14:58 +00:00
Sanjay Patel
ce82873e36 [InstCombine] avoid an insertelement transformation that induces the opposite extractelement fold (PR26354)
We would infinite loop because we created a shufflevector that was wider than
needed and then failed to combine that with the insertelement. When subsequently
visiting the extractelement from that shuffle, we see that it's unnecessary,
delete it, and trigger another visit to the insertelement.

llvm-svn: 259236
2016-01-29 20:21:02 +00:00
Sanjay Patel
9fa4a25ba4 less indenting; NFCI
llvm-svn: 259002
2016-01-28 00:03:16 +00:00
Chris Bieneman
1b8d4f74aa Remove autoconf support
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html

"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi

Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark

Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16471

llvm-svn: 258861
2016-01-26 21:29:08 +00:00
David Majnemer
0fce247968 [InstCombine, SCCP] Consolidate code used to remove instructions
InstCombine and SCCP both want to remove dead code in a very particular
way but using identical means to do so.  Share the code between the two.

No functionality change is intended.

llvm-svn: 258653
2016-01-24 05:26:18 +00:00
Matt Arsenault
7a5e15697d AMDGPU: Rename intrinsics to use amdgcn prefix
The intrinsic target prefix should match the target name
as it appears in the triple.

This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatability for now.

llvm-svn: 258557
2016-01-22 21:30:34 +00:00
Eduard Burtescu
cfc72ec986 [opaque pointer types] [NFC] FindAvailableLoadedValue: take LoadInst instead of just the pointer.
Reviewers: mjacob, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16422

llvm-svn: 258477
2016-01-22 01:51:51 +00:00
Sanjay Patel
7980a4a5f4 don't repeat function names in comments; NFC
llvm-svn: 258360
2016-01-20 22:24:38 +00:00
Sanjay Patel
a1fa737ea0 80-cols; NFC
llvm-svn: 258323
2016-01-20 16:41:43 +00:00
Sanjay Patel
76380d0013 remove outdated comment; NFC
llvm-svn: 258147
2016-01-19 17:29:22 +00:00
Eduard Burtescu
c55147fcdc [opaque pointer types] [NFC] GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.

GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.

Reviewers: mjacob, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16275

llvm-svn: 258145
2016-01-19 17:28:00 +00:00
Sanjay Patel
5b6411a86b combine clauses with same output ; NFCI
llvm-svn: 258062
2016-01-18 19:17:58 +00:00
Sanjay Patel
4f9ef4b7f0 use m_OneUse ; NFCI
llvm-svn: 258059
2016-01-18 18:36:38 +00:00
Sanjay Patel
08bcf5f0bd fix variable names, typos ; NFC
llvm-svn: 258058
2016-01-18 18:28:09 +00:00
Sanjay Patel
5dcccbe4e7 fix typo; NFC
llvm-svn: 258057
2016-01-18 17:50:23 +00:00
Manuel Jacob
51a8af4316 [opaque pointer types] [breaking-change] [NFC] SimplifyGEPInst: take the source element type of the GEP as an argument.
Patch by Eduard Burtescu.

Reviewers: dblaikie, mjacob

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16281

llvm-svn: 258024
2016-01-17 22:46:43 +00:00
Manuel Jacob
e6438acb66 GlobalValue: use getValueType() instead of getType()->getPointerElementType().
Reviewers: mjacob

Subscribers: jholewinski, arsenm, dsanders, dblaikie

Patch by Eduard Burtescu.

Differential Revision: http://reviews.llvm.org/D16260

llvm-svn: 257999
2016-01-16 20:30:46 +00:00
Silviu Baranga
777f975cab Re-commit r257064, after it was reverted in r257340.
This contains a fix for the issue that caused the revert:
we no longer assume that we can insert instructions after the
instruction that produces the base pointer. We previously
assumed that this would be ok, because the instruction produces
a value and therefore is not a terminator. This is false for invoke
instructions. We will now insert these new instruction directly
at the location of the users.

Original commit message:

[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs

Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.

InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257897
2016-01-15 15:52:05 +00:00
Artur Pilipenko
4681033ea8 Change isSafeToLoadUnconditionally arguments order. Separated from http://reviews.llvm.org/D10920.
llvm-svn: 257894
2016-01-15 15:27:46 +00:00
James Molloy
7697faf6db [InstCombine] Rewrite bswap/bitreverse handling completely.
There are several requirements that ended up with this design;
  1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early.
  2. Bitreversals and byteswaps are very related in their matching logic.
  3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses.
  4. Bswaps are best matched early in InstCombine.

The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals.

We can then extend the matching logic in one place only.

llvm-svn: 257875
2016-01-15 09:20:19 +00:00
Sanjay Patel
eb6cf93f57 function names start with a lower case letter ; NFC
llvm-svn: 257496
2016-01-12 18:03:37 +00:00
Silviu Baranga
90360019af Revert r257164 - it has caused spec2k6 failures in LTO mode
llvm-svn: 257340
2016-01-11 16:19:38 +00:00
NAKAMURA Takumi
a28b505b5e InstCombineCompares.cpp: Fix a warning. [-Wbraced-scalar-init]
llvm-svn: 257167
2016-01-08 12:50:03 +00:00
Silviu Baranga
93f7373429 Re-commit r257064, this time with a fixed assert
In setInsertionPoint if the value is not a PHI, Instruction or
Argument it should be a Constant, not a ConstantExpr.

Original commit message:

[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs

Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.

InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257164
2016-01-08 11:11:04 +00:00
Sanjay Patel
898b29bc66 [InstCombine] insert a new shuffle in a safe place (PR25999)
Limit this transform to a basic block and guard against PHIs.
Hopefully, this fixes the remaining failures in PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999

llvm-svn: 257133
2016-01-08 01:39:16 +00:00
Silviu Baranga
aa39f9d643 Revert r257064. It caused failures in some sanitizer tests.
llvm-svn: 257069
2016-01-07 15:46:43 +00:00
Silviu Baranga
2819dfa8d7 Fix build after r257064: we should be returning false, not nullptr
llvm-svn: 257067
2016-01-07 15:09:22 +00:00
Silviu Baranga
b0c35664c0 [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.

InstCombine was already doing this when comparing two GEPs if the
base pointers were the same. However, in the case where we have
complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs,
conversions to or from integers, etc) the value of the original
base pointer will be hidden to the optimizer and this transformation
will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the
relevant uses of GEPs to GEPs with a common base pointer. The
GEP comparison will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257064
2016-01-07 14:56:08 +00:00
Sanjay Patel
b7b8b14e32 fix typo; NFC
llvm-svn: 256883
2016-01-06 00:23:12 +00:00
Sanjay Patel
2e13cb5de7 [InstCombine] insert a new shuffle before its uses (PR26015)
Although this solves the test case in PR26015:
https://llvm.org/bugs/show_bug.cgi?id=26015

And may solve PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999

...I suspect this is not the best solution. I think we want to insert the new shuffle
just ahead of the earliest ExtractElementInst that we're replacing, but I don't know 
how that should be implemented.

Differential Revision: http://reviews.llvm.org/D15878

llvm-svn: 256857
2016-01-05 19:09:47 +00:00
Manuel Jacob
102d481261 [Statepoints] Refactor GCRelocateOperands into an intrinsic wrapper. NFC.
Summary:
This commit renames GCRelocateOperands to GCRelocateInst and makes it an
intrinsic wrapper, similar to e.g. MemCpyInst.  Also, all users of
GCRelocateOperands were changed to use the new intrinsic wrapper instead.

Reviewers: sanjoy, reames

Subscribers: reames, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15762

llvm-svn: 256811
2016-01-05 04:03:00 +00:00
Chen Li
10e521338c [InstructionCombining] prepareICWorklistFromFunction halts in infinite loop with instructions of token type
Summary: This patch fixes a bug in prepareICWorklistFromFunction, where the loop becomes infinite with instructions of token type. The patch checks if the instruction is token type, and if so it updates EndInst with the current instruction.

Reviewers: reames, majnemer

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15859

llvm-svn: 256792
2016-01-04 23:28:57 +00:00
Sanjay Patel
5343e4bc32 fix formatting; NFC
llvm-svn: 256645
2015-12-30 18:31:30 +00:00
Sanjay Patel
b4b4a9aeb1 [InstCombine] transform more extract/insert pairs into shuffles (PR2109)
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2

And x86 SSE output improves from:

movq	(%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa	%xmm1, %xmm2
shufps	$229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps	$132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps	$32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps	$36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

To the almost optimal:

movhpd	(%rdi), %xmm0

Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases. 
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.

Differential Revision: http://reviews.llvm.org/D15096

llvm-svn: 256394
2015-12-24 21:17:56 +00:00
David Majnemer
d697901638 [OperandBundles] Have InstCombine play nice with operand bundles
Don't assume a call's use corresponds to an argument operand, it might
correspond to a bundle operand.

llvm-svn: 256327
2015-12-23 09:58:41 +00:00
Craig Topper
03267ab1e3 [InstCombine] Fix indentation. NFC.
llvm-svn: 256131
2015-12-21 01:02:28 +00:00
Philip Reames
46cd55f309 [InstCombine] Extend peephole DSE to handle unordered atomics
This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine.

Key points:
 * We only remove unordered or simple stores.
 * The loads producing values consumed by dead stores don't influence whether the store is dead.

Differential Revision: http://reviews.llvm.org/D15354

llvm-svn: 255932
2015-12-17 22:19:27 +00:00
Weiming Zhao
763894f5cd [InstCombine] Adding "\n" to debug output. NFC.
Summary:
[InstCombine] Adding '\n' to debug output. NFC.

Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>

Reviewers: apazos, majnemer, weimingz

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15403

llvm-svn: 255920
2015-12-17 19:53:41 +00:00
NAKAMURA Takumi
b4d83263c5 InstCombineLoadStoreAlloca.cpp: Avoid instantiating Twine.
llvm-svn: 255637
2015-12-15 09:37:31 +00:00
Mehdi Amini
b29b50a9dd Instcombine: destructor loads of structs that do not contains padding
For non padded structs, we can just proceed and deaggregate them.
We don't want ot do this when there is padding in the struct as to not
lose information about this padding (the subsequents passes would then
try hard to preserve the padding, which is undesirable).

Also update extractvalue.ll and cast.ll so that they use structs with padding.

Remove the FIXME in the extractvalue of laod case as the non padded case is
handled when processing the load, and we don't want to do it on the padded
case.

Patch by: Amaury SECHET <deadalnix@gmail.com>

Differential Revision: http://reviews.llvm.org/D14483

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255600
2015-12-15 01:44:07 +00:00
Sanjay Patel
8312ed978c getParent() ^ 3 == getModule() ; NFCI
llvm-svn: 255511
2015-12-14 17:24:23 +00:00
Sanjay Patel
0876eb09e0 [InstCombine] fold trunc ([lshr] (bitcast vector) ) --> extractelement (PR25543)
This is a fix for PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The idea is to take the existing fold of:
bitcast ( trunc ( lshr ( bitcast X))) --> extractelement (bitcast X)
( http://reviews.llvm.org/rL112232 )

And break it into less specific transforms so we'll catch more cases such as
the example in the bug report:
bitcast ( trunc ( lshr ( bitcast X))) -->
bitcast ( extractelement (bitcast X)) -->
extractelement (bitcast X)

Enabling patches for this change:
http://reviews.llvm.org/rL255399 (combine bitcasts)
http://reviews.llvm.org/rL255433 (canonicalize extractelement(bitcast X))

Differential Revision: http://reviews.llvm.org/D15392

llvm-svn: 255504
2015-12-14 16:16:54 +00:00
Sanjay Patel
3f624d4650 [InstCombine] canonicalize (bitcast (extractelement X)) --> (extractelement(bitcast X))
This change was discussed in D15392. It allows us to remove the fold that was added
in:
http://reviews.llvm.org/r255261

...and it will allow us to generalize this fold:
http://reviews.llvm.org/rL112232

while preserving the order of bitcast + extract that it produces and testing shows
is better handled by the backend.

Note that the existing check for "isVectorTy()" wasn't strong enough in general
and specifically because: x86_mmx. It's not a vector, but it's not vectorizable
either. So here we check VectorType::isValidElementType() directly before 
proceeding with the transform.

llvm-svn: 255433
2015-12-12 16:44:48 +00:00
James Molloy
d8003c7bf6 [InstCombine] Make MatchBSwap also match bit reversals
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.

llvm-svn: 255334
2015-12-11 10:04:51 +00:00
Sanjay Patel
25a4b4195f [InstCombine] fold bitcasts around an extractelement (3rd try)
This is a redo of r255137 (reverted at r255227) which was a redo of 
r255124 (reverted at r255126) with a fixed check for a scalar source 
type and an added test for the failure that caused the revert.

Original commit message:

Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255261
2015-12-10 17:09:28 +00:00
Akira Hatanaka
a1488717da Revert r255137.
This commit broke apple's internal bot.

llvm-svn: 255227
2015-12-10 08:00:52 +00:00
Sanjay Patel
de6f59d487 [InstCombine] fold bitcasts around an extractelement (2nd try)
This is a redo of r255124 (reverted at r255126) with an added check for a
scalar destination type and an added test for the failure seen in Clang's
test/CodeGen/vector.c. The extra test shows a different missing optimization.

Original commit message:

Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255137
2015-12-09 18:57:16 +00:00
Mehdi Amini
de04fa6b68 Revert "[InstCombine] fold bitcasts around an extractelement"
This reverts commit r255124.

Broke http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/4193/steps/test/logs/stdio

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255126
2015-12-09 16:31:39 +00:00
Sanjay Patel
8a5018320c [InstCombine] fold bitcasts around an extractelement
Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255124
2015-12-09 16:17:20 +00:00
Sanjoy Das
16ad4f2471 [InstCombine] Call getCmpPredicateForMinMax only with a valid SPF
Summary:
There are `SelectPatternFlavor`s that don't represent min or max idioms,
and we should not be passing those to `getCmpPredicateForMinMax`.

Fixes PR25745.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15249

llvm-svn: 254869
2015-12-05 23:44:22 +00:00
David Majnemer
56dee65385 Move EH-specific helper functions to a more appropriate place
No functionality change is intended.

llvm-svn: 254562
2015-12-02 23:06:39 +00:00
David Majnemer
df4ee5c023 Do (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1 rather than (A == C1 || A == C2) -> (A | (C1 ^ C2)) == C2 when C1 ^ C2 is a power of 2.
Differential Revision: http://reviews.llvm.org/D14223

Patch by Amaury SECHET!

llvm-svn: 254518
2015-12-02 16:15:07 +00:00
Akira Hatanaka
a676a66323 [AttributeSet] Overload AttributeSet::addAttribute to reduce compile
time.

The new overloaded function is used when an attribute is added to a
large number of slots of an AttributeSet (for example, to function
parameters). This is much faster than calling AttributeSet::addAttribute
once per slot, because AttributeSet::getImpl (which calls
FoldingSet::FIndNodeOrInsertPos) is called only once per function
instead of once per slot.

With this commit, clang compiles a file which used to take over 22
minutes in just 13 seconds.

rdar://problem/23581000

Differential Revision: http://reviews.llvm.org/D15085

llvm-svn: 254491
2015-12-02 06:58:49 +00:00
Sanjay Patel
1fcfdccb6c fix typos in comments; NFC
llvm-svn: 254266
2015-11-29 22:09:34 +00:00
Sanjoy Das
dcc5bddb02 [OperandBundles] Extract duplicated code into a helper function, NFC
llvm-svn: 254047
2015-11-25 00:42:24 +00:00
Sanjoy Das
d16b4e5c5e [InstCombine] Don't drop operand bundles
Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14857

llvm-svn: 254046
2015-11-25 00:42:19 +00:00
Sanjay Patel
cca965412e [InstCombine] fix propagation of fast-math-flags
Noticed while working on D4583:
http://reviews.llvm.org/D4583

llvm-svn: 253997
2015-11-24 17:51:20 +00:00
Sanjay Patel
bdc3aab1eb use ternary ops; NFC
llvm-svn: 253787
2015-11-21 16:51:19 +00:00
Sanjay Patel
04acedc718 remove unnecessary temp variables; NFC
llvm-svn: 253786
2015-11-21 16:37:09 +00:00
Sanjay Patel
85d5af7b49 fix typo; NFC
llvm-svn: 253785
2015-11-21 16:16:29 +00:00
Pete Cooper
b753649d63 Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Pete Cooper
aca4c5cdc6 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Sanjay Patel
a5866e67e6 [InstCombine] refactor optimizeIntToFloatBitCast() ; NFCI
The logic for handling the pattern without a shift is identical
to the logic for handling the pattern with a shift if you set 
the shift amount to zero for the former.

This should make it easier to see that we probably don't even need
optimizeIntToFloatBitCast(). 

If we call something like foldVecTruncToExtElt() from visitTrunc(),
we'll solve PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

llvm-svn: 253403
2015-11-18 00:00:04 +00:00
Andrew Kaylor
459ce58049 [EH] Keep filter clauses for types that have been caught.
The instruction combiner previously removed types from filter clauses in Landing Pad instructions if the type had previously been seen in a catch clause.  This is incorrect and prevents unexpected exception handlers from rethrowing the caught type.

Differential Revision: http://reviews.llvm.org/D14669

llvm-svn: 253370
2015-11-17 20:13:04 +00:00
Sanjay Patel
4fcb285de4 fix typos; NFC
llvm-svn: 253359
2015-11-17 18:46:56 +00:00
Sanjay Patel
bee424376a use local variables; NFCI
llvm-svn: 253356
2015-11-17 18:37:23 +00:00
Sanjay Patel
b04539d10d function names start with a lower case letter; NFC
llvm-svn: 253348
2015-11-17 17:24:08 +00:00
Sanjay Patel
36f22e5dda use range-based for loop; NFCI
llvm-svn: 253256
2015-11-16 22:16:52 +00:00
Elena Demikhovsky
d43b8f3050 Fixed GEP visitor in the InstCombine pass.
The current implementation of GEP visitor in InstCombine fails with assertion on Vector GEP with mix of scalar and vector types, like this:

getelementptr double, double* %a, <8 x i32> %i
(It fails to create a "sext" from <8 x i32> to <8 x i64>)

I fixed it and added some tests.

Differential Revision: http://reviews.llvm.org/D14485

llvm-svn: 253162
2015-11-15 08:19:35 +00:00
James Molloy
c1250c50be [InstCombine] Add trivial folding (bitreverse (bitreverse x)) -> x
There are plenty more instcombines we could probably do with bitreverse, but this seems like a very obvious and trivial starting point and was brought up by Hal in his review.

llvm-svn: 252879
2015-11-12 12:39:41 +00:00
David Majnemer
d5f26284b9 [InstCombine] Teach FoldPHIArgZextsIntoPHI about EHPads
FoldPHIArgZextsIntoPHI cannot insert an instruction after the PHI if
there is an EHPad in the BB.  Doing so would result in an instruction
inserted after a terminator.

llvm-svn: 252377
2015-11-07 00:52:53 +00:00
David Majnemer
48f3ee66bd [InstCombine] Don't insert an instruction after a terminator
We tried to insert a cast of a phi in a block whose terminator is an
EHPad.  This is invalid.  Do not attempt the transform in these
circumstances.

llvm-svn: 252370
2015-11-06 23:59:23 +00:00
David Majnemer
9ffc9c11c7 [InstCombine] Don't RAUW tokens with undef
Let SimplifyCFG remove unreachable BBs which define token instructions.

llvm-svn: 252343
2015-11-06 21:26:32 +00:00
Eugene Zelenko
7c062ccc18 Fix some Clang-tidy modernize warnings, other minor fixes.
Fixed warnings are: modernize-use-override, modernize-use-nullptr and modernize-redundant-void-arg.

Differential revision: http://reviews.llvm.org/D14312

llvm-svn: 252087
2015-11-04 22:32:32 +00:00
Fiona Glaser
abcc6a8ee2 InstCombine: fix sinking of convergent calls
llvm-svn: 251991
2015-11-03 22:23:39 +00:00
Sanjay Patel
7f2d34165d don't repeat function names in comments; NFC
llvm-svn: 251846
2015-11-02 22:34:55 +00:00
Artur Pilipenko
0dd1f670a9 Preserve load alignment and dereferenceable metadata during some transformations
Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D13953

llvm-svn: 251809
2015-11-02 17:53:51 +00:00
Silviu Baranga
8fa79f5d2b [InstCombine] Teach instcombine not to create extra PHI nodes when folding GEPs
Summary:
InstCombine tries to transform GEP(PHI(GEP1, GEP2, ..)) into GEP(GEP(PHI(...))
when possible. However, this may leave the old PHI node around. Even if we
do end up folding the GEPs, having an extra PHI node might not be beneficial.

This change makes the transformation more conservative. We now only do this if
the PHI has only one use, and can therefore be removed after the transformation.

Reviewers: jmolloy, majnemer

Subscribers: mcrosier, mssimpso, llvm-commits

Differential Revision: http://reviews.llvm.org/D13887

llvm-svn: 251281
2015-10-26 10:25:05 +00:00
Hal Finkel
1a64f66683 Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify
First, the motivation: LLVM currently does not realize that:

  ((2072 >> (L == 0)) >> 7) & 1 == 0

where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the
lowest-order bit is always zero. There are obviously several ways to go about
fixing this, but the generic solution pursued in this patch is to teach
computeKnownBits something about shifts by a non-constant amount. Previously,
we would give up completely on these. Instead, in cases where we know something
about the low-order bits of the shift-amount operand, we can combine (and
together) the associated restrictions for all shift amounts consistent with
that knowledge. As a further generalization, I refactored all of the logic for
all three kinds of shifts to have this capability. This works well in the above
case, for example, because the dynamic shift amount can only be 0 or 1, and
thus we can say a lot about the known bits of the result.

This brings us to the second part of this change: Even when we know all of the
bits of a value via computeKnownBits, nothing used to constant-fold the result.
This introduces the necessary code into InstCombine and InstSimplify. I've
added it into both because:

  1. InstCombine won't automatically pick up the associated logic in
     InstSimplify (InstCombine uses InstSimplify, but not via the API that
     passes in the original instruction).

  2. Putting the logic in InstCombine allows the resulting simplifications to become
     part of the iterative worklist

  3. Putting the logic in InstSimplify allows the resulting simplifications to be
     used by everywhere else that calls SimplifyInstruction (inlining, unrolling,
     and many others).

And this requires a small change to our definition of an ephemeral value so
that we don't break the rest case from r246696 (where the icmp feeding the
@llvm.assume, is also feeding a br). Under the old definition, the icmp would
not be considered ephemeral (because it is used by the br), but this causes the
assume to remove itself (in addition to simplifying the branch structure), and
it seems more-useful to prevent that from happening.

llvm-svn: 251146
2015-10-23 20:37:08 +00:00
Craig Topper
d5d965afcc Use ArrayRef instead of pointer and size. NFC
llvm-svn: 251029
2015-10-22 16:35:56 +00:00
Michael Liao
5efcb99302 [InstCombine] Optimize icmp of inc/dec at RHS
Allow LLVM to optimize the sequence like the following:

  %inc = add nsw i32 %i, 1
  %cmp = icmp slt %n, %inc

into:

  %cmp = icmp sle i32 %n, %i

The case is not handled previously due to the complexity of compuation of %n.
Hence, LLVM cannot swap operands of icmp accordingly.

llvm-svn: 250746
2015-10-19 22:08:14 +00:00
Simon Pilgrim
2df3161736 [InstCombine] SSE4A constant folding and conversion to shuffles.
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:

1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling

Differential Revision: http://reviews.llvm.org/D13348

llvm-svn: 250609
2015-10-17 11:40:05 +00:00
Duncan P. N. Exon Smith
279616c29f InstCombine: Remove ilist iterator implicit conversions, NFC
Stop relying on implicit conversions of ilist iterators in
LLVMInstCombine.  No functionality change intended.

llvm-svn: 250183
2015-10-13 16:59:33 +00:00
Simon Pilgrim
5a441530c3 [InstCombine][SSE4A] Remove broken INSERTQI range combining optimization
As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index.

The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization.

llvm-svn: 250160
2015-10-13 14:48:54 +00:00
Simon Pilgrim
91bf14e39c [InstCombine][X86][XOP] Combine XOP integer vector comparisons to native IR
We now have lowering support for XOP PCOM/PCOMU instructions.

llvm-svn: 249977
2015-10-11 14:38:34 +00:00
Sanjay Patel
0bf5df4240 [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886)
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886

Without this IR transform, the backend (x86 at least) was producing inefficient code.

This patch is making 2 assumptions:

    1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
    2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.

Differential Revision: http://reviews.llvm.org/D13076

llvm-svn: 249702
2015-10-08 17:09:31 +00:00
Hans Wennborg
b27c6dcee4 InstCombine: Fold comparisons between unguessable allocas and other pointers
This will allow us to optimize code such as:

  int f(int *p) {
    int x;
    return p == &x;
  }

as well as:

  int *allocate(void);
  int f() {
    int x;
    int *p = allocate();
    return p == &x;
  }

The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, comparison should return true even though p is an invalid
pointer.

This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.

Differential Revision: http://reviews.llvm.org/D13358

llvm-svn: 249490
2015-10-07 00:20:07 +00:00
Hans Wennborg
7d1f4ff326 Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups.
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D13321

llvm-svn: 249482
2015-10-06 23:24:35 +00:00
Joseph Tremoulet
d1c89447ca [WinEH] Recognize CoreCLR personality function
Summary:
 - Add CoreCLR to if/else ladders and switches as appropriate.
 - Rename isMSVCEHPersonality to isFuncletEHPersonality to better
   reflect what it captures.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13449

llvm-svn: 249455
2015-10-06 20:28:16 +00:00
Andrea Di Biagio
e88e150aaa [InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922)
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.

Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.

This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.

Fixes PR24922.

Differential Revision: http://reviews.llvm.org/D13219

llvm-svn: 249390
2015-10-06 10:34:53 +00:00
Piotr Padlewski
2354bd7d63 inariant.group handling in GVN
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.

http://reviews.llvm.org/D12992

llvm-svn: 249196
2015-10-02 22:12:22 +00:00
Arnaud A. de Grandmaison
3d6b7ea719 [InstCombine] Remove trivially empty lifetime start/end ranges.
Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:

    void foo(char *, char *);
    void bar(int Size, bool flag) {
      for (int i = 0; i < Size; ++i) {
        char text[1];
        char buff[1];
        if (flag)
          foo(text, buff); // BBFoo
      }
    }

the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.

This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).

This fixes PR24598: excessive compile time after r234581.

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13305

llvm-svn: 249018
2015-10-01 14:54:31 +00:00
Andrea Di Biagio
71ee17fea4 [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant.
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.

Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.

We may end up with byte shuffles with constant masks as the result of inlining.

Differential Revision: http://reviews.llvm.org/D13252

llvm-svn: 248913
2015-09-30 16:44:39 +00:00
Simon Pilgrim
95228d060e [InstCombine] Improve Vector Demanded Bits Through Bitcasts
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.

This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector).

This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.

Differential Revision: http://reviews.llvm.org/D12935

llvm-svn: 248784
2015-09-29 08:19:11 +00:00
Sanjay Patel
c39648a72a [InstCombine] fold zexts and constants into a phi (PR24766)
This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766

We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:

#include <stdbool.h>

bool switchy(char x1, char x2, char condition) {
   bool conditionMet = false;
   switch (condition) {
   case 0: conditionMet = (x1 == x2); break;
   case 1: conditionMet = (x1 <= x2); break;
   }
   return conditionMet;
}

bool switchy2(char x1, char x2, char condition) {
   switch (condition) {
   case 0: return (x1 == x2);
   case 1: return (x1 <= x2);
   }
  return false;
}

As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi 
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it, 
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.

Differential Revision: http://reviews.llvm.org/D12866

llvm-svn: 248689
2015-09-27 20:34:31 +00:00
Sanjay Patel
61ecb7fad2 [InstCombine] match De Morgan's Law hidden by zext ops (PR22723)
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723

My first attempt at this was to change what I thought was the root problem:

xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32

...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!

My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I 
abandoned that idea.

I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):

bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb	%sil, %dil
xorb	$1, %dil
movb	%dil, %al
retq

Differential Revision: http://reviews.llvm.org/D12705

llvm-svn: 248634
2015-09-25 23:21:38 +00:00
Charlie Turner
4305b8ca2a [InstCombine] Recognize another bswap idiom.
Summary:
The byte-swap recognizer can now notice that this

```
uint32_t bswap(uint32_t x)
{
  x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
  x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;
  return x;
}
```
    
is a bswap. Fixes PR23863.

Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin

Subscribers: majnemer, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D12637

llvm-svn: 248482
2015-09-24 10:24:58 +00:00
Akira Hatanaka
e7bdc0a349 [InstCombine] Preserve metadata when merging loads that are phi
arguments.

Make sure InstCombiner::FoldPHIArgLoadIntoPHI doesn't drop the following
metadata:

MD_tbaa
MD_alias_scope
MD_noalias
MD_invariant_load
MD_nonnull
MD_range

rdar://problem/17617709

Differential Revision: http://reviews.llvm.org/D12710

llvm-svn: 248419
2015-09-23 18:40:57 +00:00
Simon Pilgrim
838c618b88 [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.

LLVM counterpart to D12835

Differential Revision: http://reviews.llvm.org/D13002

llvm-svn: 248368
2015-09-23 08:48:33 +00:00
Sanjay Patel
22afdec520 add ShouldChangeType() variant that takes bitwidths
This is more efficient for cases like D12965 where we already have widths.

llvm-svn: 248170
2015-09-21 16:09:37 +00:00
Sanjay Patel
ba0b11252c don't repeat function names in comments; NFC
llvm-svn: 248166
2015-09-21 15:33:26 +00:00
Simon Pilgrim
9664e19104 [InstCombine] Use SimplifyDemandedVectorEltsLow helper function. NFCI.
Use the SimplifyDemandedVectorEltsLow helper function introduced in D12680.

llvm-svn: 248089
2015-09-19 11:41:53 +00:00
David Majnemer
906a17fb34 [InstCombine] FoldICmpCstShrCst failed for ashr when comparing against -1
(icmp eq (ashr C1, %V) -1) may have multiple answers if C1 is not a
power of two and has the sign bit set.

This fixes PR24873.

llvm-svn: 248074
2015-09-19 00:48:31 +00:00
David Majnemer
d15737b6a9 [InstCombine] FoldICmpCstShrCst didn't handle icmps of -1 in the ashr case correctly
llvm-svn: 248073
2015-09-19 00:48:26 +00:00
Larisse Voufo
d25582bd2b Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan. (Complete version of r247497. See D12886)
llvm-svn: 248022
2015-09-18 19:14:35 +00:00
Simon Pilgrim
76acca71c9 [InstCombine] Added vector demanded bits support for SSE4A EXTRQ/INSERTQ instructions
The SSE4A instructions EXTRQ/INSERTQ only use the lower 64-bits (or less) for many of their input vector operands and all of them have undefined upper 64-bits results.

Differential Revision: http://reviews.llvm.org/D12680

llvm-svn: 247934
2015-09-17 20:32:45 +00:00
Sanjoy Das
cbeb36a4cc [InstCombine] Optimize icmp slt signum(x), 1 --> icmp slt x, 1
Summary:
`signum(x)` is sometimes implemented as `(x >> 63) | (-x >>> 63)` (for
an `i64` `x`).  This change adds a matcher for that pattern, and an
instcombine rule to optimize `signum(x) s< 1`.

Later, we can also consider optimizing:

  icmp slt signum(x), 0 --> icmp slt x, 0
  icmp sle signum(x), 1 --> true

etc.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12703

llvm-svn: 247846
2015-09-16 20:41:29 +00:00
Larisse Voufo
33950c83d4 Revert "Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan." for preliminary community discussion (See. D12886)
llvm-svn: 247716
2015-09-15 19:14:05 +00:00
Arch D. Robison
f5010c4ba7 Broaden optimization of fcmp ([us]itofp x, constant) by instcombine.
The patch extends the optimization to cases where the constant's
magnitude is so small or large that the rounding of the conversion
is irrelevant.  The "so small" case includes negative zero.

Differential review: http://reviews.llvm.org/D11210

llvm-svn: 247708
2015-09-15 17:51:59 +00:00
Chen Li
7a880213bc [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG. It also adds assertions in isKnownNonNull() and isKnownNonNullFromDominatingCondition() to make sure the value checked is pointer type (as defined in LLVM document). These assertions might trip failures in things which are not  covered under llvm/test, but fixes should be pretty obvious. 

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12779

llvm-svn: 247587
2015-09-14 18:10:43 +00:00
Simon Pilgrim
2793fdef03 Fixed unused variable warning.
llvm-svn: 247505
2015-09-12 14:00:17 +00:00
Simon Pilgrim
4f410cc25c [InstCombine] CVTPH2PS Vector Demanded Elements + Constant Folding
Improved InstCombine support for CVTPH2PS (F16C half 2 float conversion):

<4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>) - only uses the bottom 4 i16 elements for the conversion.

Added constant folding support.

Differential Revision: http://reviews.llvm.org/D12731

llvm-svn: 247504
2015-09-12 13:39:53 +00:00
Larisse Voufo
758e77398a Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan.
llvm-svn: 247497
2015-09-12 01:41:55 +00:00
Bruce Mitchener
9ad7a63fa9 Fix typos.
Summary: This fixes a variety of typos in docs, code and headers.

Subscribers: jholewinski, sanjoy, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12626

llvm-svn: 247495
2015-09-12 01:17:08 +00:00
Sanjay Patel
ca619ceee9 typo; NFC
llvm-svn: 247454
2015-09-11 19:29:18 +00:00
Mehdi Amini
2a8d4fb24b Revert "[InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite"
This reverts commit r247356.

Breaks test/Transforms/InstCombine/pr8547.ll with:

Wrong types for attribute: byval inalloca nest noalias nocapture nonnull readnone readonly sret dereferenceable(1) dereferenceable_or_null(1)
  %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str, i64 0, i64 0), i32 nonnull %conv2) #0
LLVM ERROR: Broken function found, compilation aborted!

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247371
2015-09-11 01:33:48 +00:00
Chen Li
b3a586f07f [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12779

llvm-svn: 247356
2015-09-10 23:04:49 +00:00
Chen Li
040b0ed807 [InstCombineCalls] Use isKnownNonNullAt() to check nullness of gc.relocate return value
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of gc.relocate return value. In this way it can handle cases where the relocated value does not have nonnull attribute but has a dominating null check from the CFG.

Reviewers: reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D12772

llvm-svn: 247353
2015-09-10 22:35:41 +00:00
Jakub Kuderski
1c398c712b There is a trunc(lshr (zext A), Cst) optimization in InstCombineCasts that
removes cast by performing the lshr on smaller types. However, currently there
is no trunc(lshr (sext A), Cst) variant.
This patch add such optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.

Differential Revision: http://reviews.llvm.org/D12520

llvm-svn: 247271
2015-09-10 11:31:20 +00:00
David Majnemer
3926c488cf Revert trunc(lshr (sext A), Cst) to ashr A, Cst
This reverts commit r246997, it introduced a regression (PR24763).

llvm-svn: 247180
2015-09-09 20:20:08 +00:00
Chandler Carruth
d7003090ac [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.

This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:

- FunctionAAResults is a type-erasing alias analysis results aggregation
  interface to walk a single query across a range of results from
  different alias analyses. Currently this is function-specific as we
  always assume that aliasing queries are *within* a function.

- AAResultBase is a CRTP utility providing stub implementations of
  various parts of the alias analysis result concept, notably in several
  cases in terms of other more general parts of the interface. This can
  be used to implement only a narrow part of the interface rather than
  the entire interface. This isn't really ideal, this logic should be
  hoisted into FunctionAAResults as currently it will cause
  a significant amount of redundant work, but it faithfully models the
  behavior of the prior infrastructure.

- All the alias analysis passes are ported to be wrapper passes for the
  legacy PM and new-style analysis passes for the new PM with a shared
  result object. In some cases (most notably CFL), this is an extremely
  naive approach that we should revisit when we can specialize for the
  new pass manager.

- BasicAA has been restructured to reflect that it is much more
  fundamentally a function analysis because it uses dominator trees and
  loop info that need to be constructed for each function.

All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.

The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.

This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.

Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.

One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.

Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.

Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.

Differential Revision: http://reviews.llvm.org/D12080

llvm-svn: 247167
2015-09-09 17:55:00 +00:00
Sanjay Patel
b8699722a6 don't repeat function names in comments; NFC
llvm-svn: 247154
2015-09-09 15:24:36 +00:00
Sanjay Patel
e55928d8ce function names start with a lower case letter; NFC
llvm-svn: 247150
2015-09-09 14:54:29 +00:00
Sanjay Patel
36b333fb18 don't repeat function names in comments; NFC
llvm-svn: 247148
2015-09-09 14:34:26 +00:00
Sanjay Patel
d43d80fe58 refactor matches for De Morgan's Laws; NFCI
llvm-svn: 247061
2015-09-08 20:14:13 +00:00
Sanjay Patel
f932a72849 remove function names from comments; NFC
llvm-svn: 247043
2015-09-08 18:24:36 +00:00
Jakub Kuderski
92dba72884 There is a trunc(lshr (zext A), Cst) optimization in InstCombineCasts that
removes cast by performing the lshr on smaller types. However, currently there
is no trunc(lshr (sext A), Cst) variant.
This patch add such optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.

Differential Revision: http://reviews.llvm.org/D12520

llvm-svn: 246997
2015-09-08 10:03:17 +00:00
David Majnemer
5e6c715cd2 [InstCombine] Don't divide by zero when evaluating a potential transform
Trivial multiplication by zero may survive the worklist.  We tried to
reassociate the multiplication with a division instruction, causing us
to divide by zero; bail out instead.

This fixes PR24726.

llvm-svn: 246939
2015-09-06 06:49:59 +00:00
David Majnemer
5cca152072 [InstCombine] Don't assume m_Mul gives back an Instruction
This fixes PR24713.

llvm-svn: 246933
2015-09-05 20:44:56 +00:00
Sanjoy Das
b13b002480 [InstCombine] Fix PR24605.
PR24605 is caused due to an incorrect insert point in instcombine's IR
builder.  When simplifying

  %t = add X Y
  ...
  %m = icmp ... %t

the replacement for %t should be placed before %t, not before %m, as
there could be a use of %t between %t and %m.

llvm-svn: 246315
2015-08-28 19:09:31 +00:00
Sanjoy Das
ead6e3fe61 Re-apply r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0"
The original checkin was buggy, this change has a fix.

Original commit message:

[InstCombine] Transform A & (L - 1) u< L --> L != 0

Summary:

This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potentiall payoffs:

 1. It may make the `icmp` fold away or become loop invariant.
 2. It may make the `A & (L - 1)` computation dead.

This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.

Reviewers: reames, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12210

llvm-svn: 245753
2015-08-21 22:22:37 +00:00
NAKAMURA Takumi
9d82bd7d88 Revert r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0"
It caused miscompilation in clang.

llvm-svn: 245678
2015-08-21 07:46:07 +00:00
Sanjoy Das
d7d7f13bd6 [InstCombine] Transform A & (L - 1) u< L --> L != 0
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potentiall payoffs:

 1. It may make the `icmp` fold away or become loop invariant.
 2. It may make the `A & (L - 1)` computation dead.

This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.

Reviewers: reames, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12210

llvm-svn: 245635
2015-08-20 22:31:55 +00:00
Adrian Prantl
578302805a Rename Instruction::dropUnknownMetadata() to dropUnknownNonDebugMetadata()
and make it always preserve debug locations, since all callers wanted this
behavior anyway.

This is addressing a post-commit review feedback for r245589.

NFC (inside the LLVM tree).

llvm-svn: 245622
2015-08-20 22:00:30 +00:00
Adrian Prantl
bbd441b7ad Fix a bug that caused SimplifyCFG to drop DebugLocs.
Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all
metadata in KnownSet, but the condition for DebugLocs was inverted.

Most users of dropUnknownMetadata() actually worked around this by not
adding LLVMContext::MD_dbg to their list of KnowIDs.
This is now made explicit.

llvm-svn: 245589
2015-08-20 18:24:02 +00:00
Balaram Makam
9086980655 Optimize bitwise even/odd test (-x&1 -> x&1) to not use negation.
Summary: We know that -x & 1 is equivalent to x & 1, avoid using negation for testing if a negative integer is even or odd.

Reviewers: majnemer

Subscribers: junbuml, mssimpso, gberry, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D12156

llvm-svn: 245569
2015-08-20 15:35:00 +00:00
Sanjay Patel
1163b7b428 use minSize wrapper; NFCI
These were missed when other uses were switched over:
http://llvm.org/viewvc/llvm-project?view=revision&revision=243994

llvm-svn: 245311
2015-08-18 16:44:23 +00:00
David Majnemer
fc4172a950 Revert "[InstCombinePHI] Partial simplification of identity operations."
This reverts commit r244887, it caused PR24470.

llvm-svn: 245194
2015-08-17 03:11:26 +00:00
David Majnemer
d32daec92c [InstCombine] Replace an and+icmp with a trunc+icmp
Bitwise arithmetic can obscure a simple sign-test.  If replacing the
mask with a truncate is preferable if the type is legal because it
permits us to rephrase the comparison more explicitly.

llvm-svn: 245171
2015-08-16 07:09:17 +00:00
Nick Lewycky
465237af32 Fix a crash where a utility function wasn't aware of fcmp vectors and created a value with the wrong type. Fixes PR24458!
llvm-svn: 245119
2015-08-14 22:46:49 +00:00
Charlie Turner
ed8f5c45cb [InstCombinePHI] Partial simplification of identity operations.
Consider this code:

BB:
  %i = phi i32 [ 0, %if.then ], [ %c, %if.else ]
  %add = add nsw i32 %i, %b
  ...

In this common case the add can be moved to the %if.else basic block, because
adding zero is an identity operation. If we go though %if.then branch it's
always a win, because add is not executed; if not, the number of instructions
stays the same.

This pattern applies also to other instructions like sub, shl, shr, ashr | 0,
mul, sdiv, div | 1.

Patch by Jakub Kuderski!

llvm-svn: 244887
2015-08-13 12:38:58 +00:00
Simon Pilgrim
6b29a29fa2 [InstCombine] SSE/AVX vector shifts demanded shift amount bits
Most SSE/AVX (non-constant) vector shift instructions only use the lower 64-bits of the 128-bit shift amount vector operand, this patch calls SimplifyDemandedVectorElts to optimize for this.

I had to refactor some of my recent InstCombiner work on the vector shifts to avoid quite a bit of duplicate code, it means that SimplifyX86immshift now (re)decodes the type of shift.

Differential Revision: http://reviews.llvm.org/D11938

llvm-svn: 244872
2015-08-13 07:39:03 +00:00
Simon Pilgrim
9c9b4332ca unused variable warning fix.
llvm-svn: 244725
2015-08-12 08:23:36 +00:00
Simon Pilgrim
45d6ddee89 [InstCombine] Move SSE/AVX vector blend folding to instcombiner
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).

InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).

I also moved all the relevant combine tests into InstCombine/blend_x86.ll

Differential Revision: http://reviews.llvm.org/D11934

llvm-svn: 244723
2015-08-12 08:08:56 +00:00
Sanjoy Das
2078fb4d89 Fix PR24354.
`InstCombiner::OptimizeOverflowCheck` was asserting an
invariant (operands to binary operations are ordered by decreasing
complexity) that wasn't really an invariant.  Fix this by instead having
`InstCombiner::OptimizeOverflowCheck` establish the invariant if it does
not hold.

llvm-svn: 244676
2015-08-11 21:33:55 +00:00
James Molloy
ecd6525b24 Add support for floating-point minnum and maxnum
The select pattern recognition in ValueTracking (as used by InstCombine
and SelectionDAGBuilder) only knew about integer patterns. This teaches
it about minimum and maximum operations.

matchSelectPattern() has been extended to return a struct containing the
existing Flavor and a new enum defining the pattern's behavior when
given one NaN operand.

C minnum() is defined to return the non-NaN operand in this case, but
the idiomatic C "a < b ? a : b" would return the NaN operand.

ARM and AArch64 at least have different instructions for these different cases.

llvm-svn: 244580
2015-08-11 09:12:57 +00:00
Simon Pilgrim
65266a8e22 [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.

Differential Revision: http://reviews.llvm.org/D11886

llvm-svn: 244495
2015-08-10 20:21:15 +00:00
Benjamin Kramer
69a3fdb314 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
David Majnemer
eabf812a59 [InstCombine] Don't try to sink EH pad instructions
Found by inspection, this change should not effect the existing
landingpad behavior.

llvm-svn: 244391
2015-08-08 03:51:49 +00:00
Simon Pilgrim
c31bcce147 [InstCombine] Fix SSE2/AVX2 vector logical shift by constant
This patch fixes the sse2/avx2 vector shift by constant instcombine call to correctly deal with the fact that the shift amount is formed from the entire lower 64-bit and not just the lowest element as it currently assumes.

e.g.

%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)

In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15) - giving a zero result.

In addition, this review also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821).

Differential Revision: http://reviews.llvm.org/D11760

llvm-svn: 244341
2015-08-07 18:22:50 +00:00
Pete Cooper
598f1f2fd1 Convert a bunch of loops to foreach. NFC.
After r244074, we now have a successors() method to iterate over
all the successors of a TerminatorInst.  This commit changes a bunch
of eligible loops to use it.

llvm-svn: 244260
2015-08-06 20:22:46 +00:00
Simon Pilgrim
ca06990ca7 Fixed line endings.
llvm-svn: 244021
2015-08-05 08:18:00 +00:00
Simon Pilgrim
828e3c79fa [InstCombine] Moved SSE vector shift constant folding into its own helper function. NFCI.
This will make some upcoming bugfixes + improvements easier to manage.

llvm-svn: 243962
2015-08-04 07:49:58 +00:00
Sanjay Patel
c11822e0e8 fix formatting; NFC
llvm-svn: 243424
2015-07-28 15:38:43 +00:00
Simon Pilgrim
eddfa36b82 Fixed signed/unsigned comparison warning.
llvm-svn: 243306
2015-07-27 19:07:15 +00:00
Simon Pilgrim
e713c640a4 [InstCombine][X86][SSE] Replace sign/zero extension intrinsics with native IR
Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code.

Differential Revision: http://reviews.llvm.org/D11503

llvm-svn: 243303
2015-07-27 18:52:15 +00:00
Simon Pilgrim
80ca3df4ed [InstCombine][SSE4A] Standardized references to Length/Width and Index/Start to match AMD docs. NFCI.
llvm-svn: 243226
2015-07-25 20:41:00 +00:00
David Majnemer
c31a55656e [InstCombine] Generalize sub of selects optimization to all BinaryOperators
This exposes further optimization opportunities if the selects are
correlated.

llvm-svn: 242235
2015-07-14 22:39:23 +00:00
David Majnemer
31685d95b1 [InstSimplify] Teach InstSimplify how to simplify extractelement
llvm-svn: 242008
2015-07-13 01:15:53 +00:00
David Majnemer
bbc2fd617f [InstSimplify] Teach InstSimplify how to simplify extractvalue
llvm-svn: 242007
2015-07-13 01:15:46 +00:00
Bjorn Steinbrink
4de6eb776a [InstCombine] Actually combine AA metadata when replacing one load with another
Fixes PR24083

llvm-svn: 241955
2015-07-10 22:30:17 +00:00
Benjamin Kramer
90b7dad7cf [InstSimplify] Fold away ord/uno fcmps when nnan is present.
This is important to fold away the slow case of complex multiplies
emitted by clang.

llvm-svn: 241911
2015-07-10 14:02:02 +00:00
Bjorn Steinbrink
ecb205fc74 [InstCombine] Employ AliasAnalysis in FindAvailableLoadedValue
llvm-svn: 241887
2015-07-10 06:55:49 +00:00
Bjorn Steinbrink
1a4f8db997 [InstCombine] Properly combine metadata when replacing a load with another
Not doing this can lead to misoptimizations down the line, e.g. because
of range metadata on the replacing load excluding values that are valid
for the load that is being replaced.

llvm-svn: 241886
2015-07-10 06:55:44 +00:00
Jingyue Wu
35a6e27706 [InstCombine] call SimplifyICmpInst with correct context
Summary:
Fixes PR23809. Without passing the context to SimplifyICmpInst, we would
use the assume to prove that the condition feeding the assume is
trivially true (see isValidAssumeForContext in ValueTracking.cpp),
causing the removal of the assume which may be useful for later
optimizations.

Test Plan: pr23800.ll

Reviewers: hfinkel, majnemer

Reviewed By: hfinkel

Subscribers: henryhu, llvm-commits, wengxt, broune, meheff, eliben

Differential Revision: http://reviews.llvm.org/D10695

llvm-svn: 240683
2015-06-25 20:14:47 +00:00
Sanjay Patel
f6f1f61067 fix typo; NFC
llvm-svn: 240480
2015-06-23 23:26:22 +00:00
Sanjay Patel
a134bf81d5 don't repeat function names in comments; NFC
llvm-svn: 240478
2015-06-23 23:05:08 +00:00
Alexander Kornienko
f993659b8f Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
David Majnemer
9f136ce139 [InstCombine] Optimize subtract of selects into a select of a sub
This came up when examining some code generated by clang's IRGen for
certain member pointers.

llvm-svn: 240369
2015-06-23 02:49:24 +00:00
Alexander Kornienko
40cb19d802 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
David Majnemer
c8b1f095a3 Move the personality function from LandingPadInst to Function
The personality routine currently lives in the LandingPadInst.

This isn't desirable because:
- All LandingPadInsts in the same function must have the same
  personality routine.  This means that each LandingPadInst beyond the
  first has an operand which produces no additional information.

- There is ongoing work to introduce EH IR constructs other than
  LandingPadInst.  Moving the personality routine off of any one
  particular Instruction and onto the parent function seems a lot better
  than have N different places a personality function can sneak onto an
  exceptional function.

Differential Revision: http://reviews.llvm.org/D10429

llvm-svn: 239940
2015-06-17 20:52:32 +00:00
Philip Reames
fc6ddd62bf Reapply 239795 - [InstCombine] Propagate non-null facts to call parameters
The original change broke clang side tests.  I will be submitting those momentarily.  This change includes post commit feedback on the original change from from Pete Cooper.

Original Submission comments:
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.

Differential Revision: http://reviews.llvm.org/D9132

llvm-svn: 239849
2015-06-16 20:24:25 +00:00
Philip Reames
2aad4769d2 Revert 239795
I forgot to update some clang test cases.  I'll fix and resubmit tomorrow.

llvm-svn: 239800
2015-06-16 01:20:53 +00:00
Philip Reames
54716a6f5b [InstCombine] Propagate non-null facts to call parameters
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.

Differential Revision: http://reviews.llvm.org/D9132

llvm-svn: 239795
2015-06-16 00:43:54 +00:00
David Majnemer
55a3d56f02 [InstCombine, InstSimplify] Move xforms from Combine to Simplify
There were several SelectInst combines that always returned an existing
instruction instead of modifying an old one or creating a new one.
These are prime candidates for moving to InstSimplify.

llvm-svn: 239229
2015-06-06 22:40:21 +00:00
David Majnemer
aa3c1f7077 [InstCombine] Don't miscompile select to poison
If we have (select a, b, c), it is sometimes valid to simplify this to a
single select operand.  However, doing so is only valid if the
computation doesn't inject poison into the computation.

It might be helpful to consider the following example:
  (select (icmp ne %i, INT_MAX), (add nsw %i, 1), INT_MIN)

The select is equivalent to (add %i, 1) but not (add nsw %i, 1).

Self hosting on x86_64 revealed that this occurs very, very rarely so
bailing out is hopefully pretty reasonable.

llvm-svn: 239215
2015-06-06 02:30:43 +00:00
Renato Golin
cdb4b5a579 Revert "[InstCombine] Rephrase fix to SimplifyWithOpReplaced"
This reverts commit r239141. This commit was an attempt to reintroduce
a previous patch that broke many self-hosting bots with clang timeouts,
but it still has slowdown issues, at least  on ARM, increasing the
compilation time (stage 2, clang's) by 5x.

llvm-svn: 239175
2015-06-05 18:24:12 +00:00
Sanjoy Das
3e4e55c096 [InstCombine][NFC] Add a `break;` statement.
This change is NFC because both the ``break;`` and the fall through end
up returning immediately. However, this helps clarify intent and also
ensures correctness in case more ``case`` blocks are added later.

llvm-svn: 239172
2015-06-05 18:04:46 +00:00
Sanjoy Das
71de44f239 [InstCombine] Fix PR23751.
PR23751 was caused by a missing ``break;`` in r234388.

llvm-svn: 239171
2015-06-05 18:04:42 +00:00
David Majnemer
523f1fa033 [InstCombine] Rephrase fix to SimplifyWithOpReplaced
I don't have the IR which is causing the build bot breakage but I can
postulate as to why they are timing out:
1. SimplifyWithOpReplaced was stripping flags from the simplified value.
2. visitSelectInstWithICmp was overriding SimplifyWithOpReplaced because
   it's simplification wasn't correct.
3. InstCombine would revisit the add instruction and note that it can
   rederive the flags.
4. By modifying the value, we chose to revisit instructions which reuse
   the value.  One of the instructions is the original select, causing
   LLVM to never reach fixpoint.

Instead, strip the flags only when we are sure we are going to perform
the simplification.

llvm-svn: 239141
2015-06-05 09:57:57 +00:00
Daniel Jasper
85b6ed5297 Revert "[InstCombine] Don't miscompile safe increment idiom"
This is breaking a lot of build bots and is causing very long-running
compiles (infinite loops)?

Likely, we shouldn't return nullptr?

llvm-svn: 239139
2015-06-05 09:31:20 +00:00
David Majnemer
7e0cc86e96 [InstCombine] Don't miscompile safe increment idiom
We cleverly handle cases where computation done in one argument of a select
instruction is suitable for the other operand, thus obviating the need
of the select and the comparison.  However, the other operand cannot
have flags.

This fixes PR23757.

llvm-svn: 239115
2015-06-04 23:11:30 +00:00
Benjamin Kramer
0e31955b32 Replace push_back(Constructor(foo)) with emplace_back(foo) for non-trivial types
If the type isn't trivially moveable emplace can skip a potentially
expensive move. It also saves a couple of characters.


Call sites were found with the ASTMatcher + some semi-automated cleanup.

memberCallExpr(
    argumentCountIs(1), callee(methodDecl(hasName("push_back"))),
    on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))),
    hasArgument(0, bindTemporaryExpr(
                       hasType(recordDecl(hasNonTrivialDestructor())),
                       has(constructExpr()))),
    unless(isInTemplateInstantiation()))

No functional change intended.

llvm-svn: 238602
2015-05-29 19:43:39 +00:00
David Majnemer
514ab73614 [InstCombine] Fold IntToPtr and PtrToInt into preceding loads.
Currently we only fold a BitCast into a Load when the BitCast is its
only user.

Do the same for any no-op cast.

Differential Revision: http://reviews.llvm.org/D9152

llvm-svn: 238452
2015-05-28 18:39:17 +00:00
David Majnemer
8a231f0374 [InstCombine] Don't eagerly propagate nsw for A*B+A*C => A*(B+C)
InstCombine transforms A *nsw B +nsw A *nsw C to A *nsw (B + C).
This is incorrect -- e.g. if A = -1, B = 1, C = INT_SMAX. Then
nothing in the LHS overflows, but the multiplication in RHS overflows.

We need to first make sure that we won't multiple by INT_SMAX + 1.

Test case `add_of_mul` contributed by Sanjoy Das.

This fixes PR23635.

Differential Revision: http://reviews.llvm.org/D9629

llvm-svn: 238066
2015-05-22 23:02:11 +00:00
David Majnemer
e7a303ac2b [InstSimplify] Handle some overflow intrinsics in InstSimplify
This change does a few things:
- Move some InstCombine transforms to InstSimplify
- Run SimplifyCall from within InstCombine::visitCallInst
- Teach InstSimplify to fold [us]mul_with_overflow(X, undef) to 0.

llvm-svn: 237995
2015-05-22 03:56:46 +00:00
David Majnemer
b6938a4929 [InstCombine] X - 0 is equal to X, not undef
A refactoring made @llvm.ssub.with.overflow.i32(i32 %X, i32 0) transform
into undef instead of %X.

This fixes PR23624.

llvm-svn: 237968
2015-05-21 23:04:21 +00:00
James Molloy
e993a7db93 Reapply r237539 with a fix for the Chromium build.
Make sure if we're truncating a constant that would then be sign extended
that the sign extension of the truncated constant is the same as the
original constant.

> Canonicalize min/max expressions correctly.
>
> This patch introduces a canonical form for min/max idioms where one operand
> is extended or truncated. This often happens when the other operand is a
> constant. For example:
>
> %1 = icmp slt i32 %a, i32 0
> %2 = sext i32 %a to i64
> %3 = select i1 %1, i64 %2, i64 0
>
> Would now be canonicalized into:
>
> %1 = icmp slt i32 %a, i32 0
> %2 = select i1 %1, i32 %a, i32 0
> %3 = sext i32 %2 to i64
>
> This builds upon a patch posted by David Majenemer
> (https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
> passively stopped instcombine from ruining canonical patterns. This
> patch additionally actively makes instcombine canonicalize too.
>
> Canonicalization of expressions involving a change in type from int->fp
> or fp->int are not yet implemented.

llvm-svn: 237821
2015-05-20 18:41:25 +00:00
Hans Wennborg
224f420df5 Revert r237539: "Reapply r237520 with another fix for infinite looping"
This caused PR23583.

llvm-svn: 237739
2015-05-19 23:06:30 +00:00
David Blaikie
0be3b52a8f Simplify IRBuilder::CreateCall* by using ArrayRef+initializer_list/braced init only
llvm-svn: 237624
2015-05-18 22:13:54 +00:00
James Molloy
928c38a114 Reapply r237520 with another fix for infinite looping
SimplifyDemandedBits was "simplifying" a constant by removing just sign bits.
This caused a canonicalization race between different parts of instcombine.

Fix and regression test added - third time lucky?

llvm-svn: 237539
2015-05-17 08:27:27 +00:00
James Molloy
56795ba031 Revert commits r237521 and r237520.
The AArch64 LNT bot is unhappy - I've found that the problem is in
SimpliftDemandedBits, but that's going to require another code review
so reverting in the meantime.

llvm-svn: 237528
2015-05-16 21:27:14 +00:00
James Molloy
c45c32fb55 Reapply r237453 with a fix for the test timeouts.
The test timeouts were due to instcombine fighting itself. Regression test added.
Original log message:

Canonicalize min/max expressions correctly.

This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:

  %1 = icmp slt i32 %a, i32 0
    %2 = sext i32 %a to i64
      %3 = select i1 %1, i64 %2, i64 0

Would now be canonicalized into:

  %1 = icmp slt i32 %a, i32 0
    %2 = select i1 %1, i32 %a, i32 0
      %3 = sext i32 %2 to i64

This builds upon a patch posted by David Majenemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.

Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.

llvm-svn: 237520
2015-05-16 13:10:45 +00:00
James Molloy
d400196c9a Revert "Canonicalize min/max expressions correctly."
This reverts r237453 - it was causing timeouts on some bots. Reverting
while I investigate (it's probably InstCombine fighting itself...)

llvm-svn: 237458
2015-05-15 17:45:09 +00:00
James Molloy
d7cc5c99ef Canonicalize min/max expressions correctly.
This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:

  %1 = icmp slt i32 %a, i32 0
  %2 = sext i32 %a to i64
  %3 = select i1 %1, i64 %2, i64 0

Would now be canonicalized into:

  %1 = icmp slt i32 %a, i32 0
  %2 = select i1 %1, i32 %a, i32 0
  %3 = sext i32 %2 to i64

This builds upon a patch posted by David Majenemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.

Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.

llvm-svn: 237453
2015-05-15 16:10:59 +00:00
Jingyue Wu
55f6400e38 [ValueTracking] refactor: extract method haveNoCommonBitsSet
Summary:
Extract method haveNoCommonBitsSet so that we don't have to duplicate this logic in
InstCombine and SeparateConstOffsetFromGEP.

This patch also makes SeparateConstOffsetFromGEP more precise by passing
DominatorTree to computeKnownBits.

Test Plan: value-tracking-domtree.ll that tests ValueTracking indeed leverages dominating conditions

Reviewers: broune, meheff, majnemer

Reviewed By: majnemer

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9734

llvm-svn: 237407
2015-05-14 23:53:19 +00:00
Pete Cooper
cd94898d6b Convert PHI getIncomingValue() to foreach over incoming_values(). NFC.
We already had a method to iterate over all the incoming values of a PHI.  This just changes all eligible code to use it.

Ineligible code included anything which cared about the index, or was also trying to get the i'th incoming BB.

llvm-svn: 237169
2015-05-12 20:05:31 +00:00
Sanjoy Das
1f4a69a79d [RewriteStatepointsForGC] Fix a bug on creating gc_relocate for pointer to vector of pointers
Summary:
In RewriteStatepointsForGC pass, we create a gc_relocate intrinsic for
each relocated pointer, and the gc_relocate has the same type with the
pointer. During the creation of gc_relocate intrinsic, llvm requires to
mangle its type. However, llvm does not support mangling of all possible
types. RewriteStatepointsForGC will hit an assertion failure when it
tries to create a gc_relocate for pointer to vector of pointers because
mangling for vector of pointers is not supported.

This patch changes the way RewriteStatepointsForGC pass creates
gc_relocate. For each relocated pointer, we erase the type of pointers
and create an unified gc_relocate of type i8 addrspace(1)*. Then a
bitcast is inserted to convert the gc_relocate to the correct type. In
this way, gc_relocate does not need to deal with different types of
pointers and the unsupported type mangling is no longer a problem. This
change would also ease further merge when LLVM erases types of pointers
and introduces an unified pointer type.

Some minor changes are also introduced to gc_relocate related part in
InstCombineCalls, CodeGenPrepare, and Verifier accordingly.

Patch by Chen Li!

Reviewers: reames, AndyAyers, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9592

llvm-svn: 237009
2015-05-11 18:49:34 +00:00
James Molloy
1208541226 Rip min/max pattern matching out of InstCombine and into
ValueTracking.

This matching functionality is useful in more than just InstCombine, so
make it available in ValueTracking.

NFC.

llvm-svn: 236998
2015-05-11 14:42:20 +00:00
Hal Finkel
bb5d93c15a [InstCombine/PowerPC] Fix single-precision QPX load/store replacement
The QPX single-precision load/store intrinsics have implied
truncation/extension from/to the declared value type of <4 x double> to the
memory type of <4 x float>. When we can prove the alignment of the pointer
argument, and thus replace the intrinsic with a regular load or store, we need
to load or store the correct data type (<4 x float>) instead of (<4 x double>).

llvm-svn: 236973
2015-05-11 06:37:03 +00:00
David Majnemer
5580620741 [InstCombine] Canonicalize single element array store
Use the element type instead of the aggregate type.

Differential Revision: http://reviews.llvm.org/D9591

llvm-svn: 236969
2015-05-11 05:04:27 +00:00
David Majnemer
9f376fec1c [InstCombine] Canonicalize single element array load
Use the element type instead of the aggregate type.

Differential Revision: http://reviews.llvm.org/D9596

llvm-svn: 236968
2015-05-11 05:04:22 +00:00
Mehdi Amini
7da0eff84e Update InstCombine to transform aggregate loads into scalar loads.
Summary:
One step further getting aggregate loads and store being optimized
properly. This will only handle struct with one element at this point.

Test Plan: Added unit tests for the new supported cases.

Reviewers: chandlerc, joker-eph, joker.eph, majnemer

Reviewed By: majnemer

Subscribers: pete, llvm-commits

Differential Revision: http://reviews.llvm.org/D8339

Patch by Amaury Sechet.

From: Amaury Sechet <amaury@fb.com>
llvm-svn: 236695
2015-05-07 05:52:40 +00:00
Pete Cooper
762f70e897 Change typeIncompatible to return an AttrBuilder instead of new-ing an AttributeSet.
This makes use of the new API which can remove attributes from a set given a builder.

This is much faster than creating a temporary set and reduces llc time by about 0.3% which was all spent creating temporary attributes sets on the context.

llvm-svn: 236668
2015-05-06 23:19:56 +00:00
Sanjoy Das
197092fa7d [Statepoint] Clean up Statepoint.h: accessor names.
Use getFoo() as accessors consistently and some other naming changes.

llvm-svn: 236564
2015-05-06 02:36:26 +00:00
David Blaikie
b888632efe [opaque pointer type] Track explicit GEP pointee type through in-memory IR
llvm-svn: 236510
2015-05-05 18:03:48 +00:00
Matthias Braun
bfbcc9d6b2 InstCombineSimplifyDemanded: Remove nsw/nuw flags when optimizing demanded bits
When optimizing demanded bits of the operands of an Add we have to
remove the nsw/nuw flags as we have no guarantee anymore that we don't
wrap.  This is legal here because the top bit is not demanded.  In fact
this operaion was already performed but missed in the case of an Add
with a constant on the right side.  To fix this this patch refactors the
code to unify the code paths in SimplifyDemandedUseBits() handling of
Add/Sub:

- The transformation of Add->Or is removed from the simplify demand
  code because the equivalent transformation exists in
  InstCombiner::visitAdd()
- KnownOnes/KnownZero are not adjusted for Add x, C anymore as
  computeKnownBits() already performs these computations.
- The simplification of the operands is unified. In this new version
  constant on the right side of a Sub are shrunk now as I could not find
  a reason why not to do so.
- The special case for clearing nsw/nuw in ShrinkDemandedConstant() is
  not necessary anymore as the caller does that already.

Differential Revision: http://reviews.llvm.org/D9415

llvm-svn: 236269
2015-04-30 22:05:30 +00:00
Matthias Braun
dd934add12 InstCombine: Move Sub->Xor rule from SimplifyDemanded to InstCombine
The rule that turns a sub to xor if the LHS is 2^n-1 and the remaining bits
are known zero, does not use the demanded bits at all: Move it to the
normal InstCombine code path.

Differential Revision: http://reviews.llvm.org/D9417

llvm-svn: 236268
2015-04-30 22:04:26 +00:00
Sanjoy Das
dd8fba973b [InstCombine] Add new rule for MIN(MAX(~A, ~B), ~C) et. al.
Summary:
Optimizing these well are especially interesting for IRCE since it
"clamps" values by generating this sort of pattern through SCEV
expressions.

Depends on D9352.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9353

llvm-svn: 236203
2015-04-30 04:56:04 +00:00
Sanjoy Das
668a7c9deb [InstCombine] Add a new formula for SMIN.
Summary:
After this change `MatchSelectPattern` recognizes the following form
of SMIN:

  Y >s C ? ~Y : ~C == ~Y <s ~C ? ~Y : ~C = SMIN(~Y, ~C)

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9352

llvm-svn: 236202
2015-04-30 04:56:00 +00:00
Sanjay Patel
6e05431540 [x86] instcombine more cases of insertps into a shufflevector
This is a follow-on to D8833 (insertps optimization when the zero mask is not used).

In this patch, we check for the case where the zmask is used, but both input vectors
to the insertps intrinsic are the same operand or the zmask overrides the destination
lane. This lets us replace the 2nd shuffle input operand with the zero vector.

Differential Revision: http://reviews.llvm.org/D9257

llvm-svn: 235810
2015-04-25 20:55:25 +00:00
Philip Reames
3f4453de96 Move Value.isDereferenceablePointer to ValueTracking [NFC]
Move isDereferenceablePointer function to Analysis. This function recursively tracks dereferencability over a chain of values like other functions in ValueTracking.

This refactoring is motivated by further changes to support dereferenceable_or_null attribute (http://reviews.llvm.org/D8650). isDereferenceablePointer will be extended to perform context-sensitive analysis and IR is not a good place to have such functionality.

Patch by: Artur Pilipenko <apilipenko@azulsystems.com>
Differential Revision: reviews.llvm.org/D9075

llvm-svn: 235611
2015-04-23 17:36:48 +00:00
David Majnemer
c54f703dad [InstCombine] Use a more targeted fix instead of r235544
Only clear out the NSW/NUW flags if we are optimizing 'add'/'sub' while
taking advantage that the sign bit is not set.  We do this optimization
to further shrink the mask but shrinking the mask isn't NSW/NUW
preserving in this case.

llvm-svn: 235558
2015-04-22 22:42:05 +00:00
David Majnemer
61d5d8dd8e [InstCombine] Clear out nsw/nuw if we modify computation in the chain
An nsw/nuw operation relies on the values feeding into it to not
overflow if 'poison' is not to be produced.  This means that
optimizations which make modifications to the bottom of a chain (like
SimplifyDemandedBits) must strip out nsw/nuw if they cannot ensure that
they will be preserved.

This fixes PR23309.

llvm-svn: 235544
2015-04-22 20:59:28 +00:00
Wei Mi
24e5246dd6 Limiting gep merging to fix the performance problem described in
https://llvm.org/bugs/show_bug.cgi?id=23163.

Gep merging sometimes behaves like a reverse CSE/LICM optimization,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.

The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.

Differential Revision: http://reviews.llvm.org/D8911

llvm-svn: 235455
2015-04-21 23:02:15 +00:00
Wei Mi
6a1df2cb86 Revert r235451 since it is attached to a wrong Differential Revision. Sorry.
llvm-svn: 235453
2015-04-21 22:56:09 +00:00
Wei Mi
9c45f921b6 Limiting gep merging to fix the performance problem described in
https://llvm.org/bugs/show_bug.cgi?id=23163.

Gep merging sometimes behaves like a reverse CSE/LICM optimizations,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.

The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.

Differential Revision: http://reviews.llvm.org/D9007

llvm-svn: 235451
2015-04-21 22:37:09 +00:00
Benjamin Kramer
18e9411446 [InstCombine] Create zero constants on demand.
No functional change intended.

llvm-svn: 235257
2015-04-18 16:52:08 +00:00
David Majnemer
fa43478f01 [InstCombine] (mul nsw 1, INT_MIN) != (shl nsw 1, 31)
Multiplying INT_MIN by 1 doesn't trigger nsw.  However, shifting 1 into
the sign bit *does* trigger nsw.

llvm-svn: 235250
2015-04-18 04:41:30 +00:00
Sanjay Patel
f808782564 [X86, SSE] instcombine common cases of insertps intrinsics into shuffles
This is very similar to D8486 / r232852 (vperm2). If we treat insertps intrinsics
as shufflevectors, we can optimize them better.

I've left all but the full zero case of the zero mask variants out of this patch. 
I don't think those can be converted into a single shuffle in all cases, but I'd
be happy to be proven wrong as I was for vperm2f128.

Either way, we'd need to support whatever sequence we come up with for those cases
in the backend before converting them here.

Differential Revision: http://reviews.llvm.org/D8833

llvm-svn: 235124
2015-04-16 17:52:13 +00:00
Nick Lewycky
faeb8d6bf0 GCC complains thusly: "attributes at the beginning of statement are ignored [-Werror=attributes]". Very well then! NFC
llvm-svn: 234788
2015-04-13 20:03:08 +00:00
Nick Lewycky
e8c4d44b5f Subtraction is not commutative. Fixes PR23212!
llvm-svn: 234780
2015-04-13 19:17:37 +00:00
Sanjoy Das
68c711fb56 [InstCombine][CodeGenPrep] Create llvm.uadd.with.overflow in CGP.
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep.  Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.

Depends on D8888.

Reviewers: majnemer, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8889

llvm-svn: 234638
2015-04-10 21:07:09 +00:00
Benjamin Kramer
024e0f290e [CallSite] Make construction from Value* (or Instruction*) explicit.
CallSite roughly behaves as a common base CallInst and InvokeInst. Bring
the behavior closer to that model by making upcasts explicit. Downcasts
remain implicit and work as before.

Following dyn_cast as a mental model checking whether a Value *V isa
CallSite now looks like this: 
  if (auto CS = CallSite(V)) // think dyn_cast
instead of:
  if (CallSite CS = V)

This is an extra token but I think it is slightly clearer. Making the
ctor explicit has the advantage of not accidentally creating nullptr
CallSites, e.g. when you pass a Value * to a function taking a CallSite
argument.

llvm-svn: 234601
2015-04-10 14:50:08 +00:00
Sanjoy Das
4afe7378c6 [InstCombine] Refactor out OptimizeOverflowCheck. NFCI.
Summary:
This patch adds an enum `OverflowCheckFlavor` and a function
`OptimizeOverflowCheck`.  This will allow InstCombine to optimize
overflow checks without directly introducing an intermediate call to the
`llvm.$op.with.overflow` instrinsics.

This specific change is a refactoring and does not intend to change
behavior.

Reviewers: majnemer, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8888

llvm-svn: 234388
2015-04-08 04:27:22 +00:00
David Blaikie
968a03f8cd [opaque pointer type] More GEP IRBuilder API migrations...
llvm-svn: 234058
2015-04-03 21:33:42 +00:00
David Majnemer
2243ffe08d [InstCombine] Use DataLayout to determine vector element width
InstCombine didn't realize that it needs to use DataLayout to determine
how wide pointers are.  This lead to assertion failures.

This fixes PR23113.

llvm-svn: 234046
2015-04-03 20:18:40 +00:00
David Blaikie
39f56d0f3c [opaque pointer type] Change GetElementPtrInst::getIndexedType to take the pointee type
This pushes the use of PointerType::getElementType up into several
callers - I'll essentially just have to keep pushing that up the stack
until I can eliminate every call to it...

llvm-svn: 233604
2015-03-30 21:41:43 +00:00
Duncan P. N. Exon Smith
7f2c33ec62 Transforms: Use the new DebugLoc API, NFC
Update lib/Analysis and lib/Transforms to use the new `DebugLoc` API.

llvm-svn: 233587
2015-03-30 19:49:49 +00:00
David Blaikie
614465c9a9 Constrain the type of a parameter now that callers without this constraint have been removed.
llvm-svn: 233419
2015-03-27 20:56:11 +00:00
David Blaikie
b57edabf0c Recommit r233116 better: Remove a redundant instcombine involving bitcasts of geps of bitcasts
This just didn't need to be here at all, but the assertion I tried to
add wasn't appropriate either - the circumstance isn't impossible, it's
just not important to deal with it here - the gep-rooted version of this
instcombine will handle this case, we don't need to duplicate it for the
case where the gep happens to be used in a bitcast.

llvm-svn: 233404
2015-03-27 20:13:55 +00:00
Benjamin Kramer
ab138a6078 InstCombine: fold (A << C) == (B << C) --> ((A^B) & (~0U >> C)) == 0
Anding and comparing with zero can be done in a single instruction on
most archs so this is a bit cheaper.

llvm-svn: 233291
2015-03-26 17:12:06 +00:00
David Blaikie
27c176e356 Opaque Pointer Types: GEP API migrations to specify the gep type explicitly
The changes to InstCombine (& SCEV) do seem a bit silly - it doesn't make
anything obviously better to have the caller access the pointers element
type (the thing I'm trying to remove) than the GEP itself, but it's a
helpful migration step. This will allow me to more obviously lock down
GEP (& Load, etc) API usage, then fix all the code that accesses pointer
element types except the places that need to be removed (most of the
InstCombines) anyway - at which point I'll need to just remove all that
code because it won't be meaningful anymore (there will be no pointer
types, so no bitcasts to combine)

SCEV looks like it'll need some restructuring - we'll have to do a bit
more work for GEP canonicalization, since it'll depend on how it's used
if we can even manage to canonicalize it to a non-ugly GEP. I guess we
can do some fun stuff like voting (do 2 out of 3 load from the GEP with
a certain type that gives a pretty GEP? Does every typed use of the GEP
use either a specific type or a generic type (i8*, etc)?)

llvm-svn: 233131
2015-03-24 23:34:31 +00:00
Sanjay Patel
1a17022bda optimize the AVX2 (integer) version of vperm2 into a shuffle
...because this is what happens when an instruction
set puts its underwear on after its pants.

This is an extension of r232852, r233100, and 233110:
http://llvm.org/viewvc/llvm-project?view=revision&revision=232852
http://llvm.org/viewvc/llvm-project?view=revision&revision=233100
http://llvm.org/viewvc/llvm-project?view=revision&revision=233110

llvm-svn: 233127
2015-03-24 22:39:29 +00:00
David Blaikie
0dc1532dd9 Opaque Pointer Types: GEP API migrations to specify the gep type explicitly
The changes to InstCombine do seem a bit silly - it doesn't make
anything obviously better to have the caller access the pointers element
type (the thing I'm trying to remove) than the GEP itself, but it's a
helpful migration step. This will allow me to more obviously lock down
GEP (& Load, etc) API usage, then fix all the code that accesses pointer
element types except the places that need to be removed (most of the
InstCombines) anyway - at which point I'll need to just remove all that
code because it won't be meaningful anymore (there will be no pointer
types, so no bitcasts to combine)

llvm-svn: 233126
2015-03-24 22:38:16 +00:00
David Blaikie
8b46a1bcb2 Revert "Remove an InstCombine that seems to have become redundant."
Assertion fires in compiler-rt. Guess it does fire..

This reverts commit r233116.

llvm-svn: 233121
2015-03-24 21:50:35 +00:00
David Blaikie
0914ef9c6b Remove an InstCombine that seems to have become redundant.
Assert that this doesn't fire - I'll remove all of this later, but just
leaving it in for a while in case this is firing & we just don't have
test coverage.

llvm-svn: 233116
2015-03-24 21:31:31 +00:00
Sanjay Patel
c077161b46 [X86, AVX] instcombine vperm2 intrinsics with zero inputs into shuffles
This is the IR optimizer follow-on patch for D8563: the x86 backend patch
that converts this kind of shuffle back into a vperm2.

This is also a continuation of the transform that started in D8486. 
In that patch, Andrea suggested that we could convert vperm2 intrinsics that
use zero masks into a single shuffle. 

This is an implementation of that suggestion.

Differential Revision: http://reviews.llvm.org/D8567

llvm-svn: 233110
2015-03-24 20:36:42 +00:00
Benjamin Kramer
6a9aa608f1 Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.
llvm-svn: 232998
2015-03-23 19:32:43 +00:00
Sanjay Patel
cf8a53f502 [X86, AVX] instcombine common cases of vperm2* intrinsics into shuffles
vperm2* intrinsics are just shuffles. 
In a few special cases, they're not even shuffles.

Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:

1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders
   really angry (and so I have regrets about some patches from last week).

2. Doing mask conversion logic in header files is hard to write and 
   subsequently read.

There are a couple of TODOs in this patch to complete this optimization.

Differential Revision: http://reviews.llvm.org/D8486

llvm-svn: 232852
2015-03-20 21:47:56 +00:00
Daniel Jasper
7e5f7305cf [InstCombine] Don't fold a GEP into itself through a PHI node
This can only occur (I think) through the back-edge of the loop.

However, folding a GEP into itself means that the value of the previous
iteration needs to be stored in the meantime, thus requiring an
additional register variable to be live, but not actually achieving
anything (the gep still needs to be executed once per loop iteration).

The attached test case is derived from:
  typedef unsigned uint32;
  typedef unsigned char uint8;
  inline uint8 *f(uint32 value, uint8 *target) {
    while (value >= 0x80) {
      value >>= 7;
      ++target;
    }
    ++target;
    return target;
  }
  uint8 *g(uint32 b, uint8 *target) {
    target = f(b, f(42, target));
    return target;
  }

What happens is that the GEP stored in incptr2 is folded into itself
through the loop's back-edge and the phi-node stored in loopptr,
effectively incrementing the ptr by "2" in each iteration instead of "1".

In this case, it is actually increasing the number of GEPs required as
the GEP before the loop can't be folded away anymore. For comparison:

With this patch:
  define i8* @test4(i32 %value, i8* %buffer) {
  entry:
    %cmp = icmp ugt i32 %value, 127
    br i1 %cmp, label %loop.header, label %exit

  loop.header:                                      ; preds = %entry
    br label %loop.body

  loop.body:                                        ; preds = %loop.body, %loop.header
    %buffer.pn = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
    %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
    %loopptr = getelementptr inbounds i8, i8* %buffer.pn, i64 1
    %shr = lshr i32 %newval, 7
    %cmp2 = icmp ugt i32 %newval, 16383
    br i1 %cmp2, label %loop.body, label %loop.exit

  loop.exit:                                        ; preds = %loop.body
    br label %exit

  exit:                                             ; preds = %loop.exit, %entry
    %0 = phi i8* [ %loopptr, %loop.exit ], [ %buffer, %entry ]
    %incptr3 = getelementptr inbounds i8, i8* %0, i64 2
    ret i8* %incptr3
  }

Without this patch:
  define i8* @test4(i32 %value, i8* %buffer) {
  entry:
    %incptr = getelementptr inbounds i8, i8* %buffer, i64 1
    %cmp = icmp ugt i32 %value, 127
    br i1 %cmp, label %loop.header, label %exit

  loop.header:                                      ; preds = %entry
    br label %loop.body

  loop.body:                                        ; preds = %loop.body, %loop.header
    %0 = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
    %loopptr = phi i8* [ %incptr, %loop.header ], [ %incptr2, %loop.body ]
    %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
    %shr = lshr i32 %newval, 7
    %incptr2 = getelementptr inbounds i8, i8* %0, i64 2
    %cmp2 = icmp ugt i32 %newval, 16383
    br i1 %cmp2, label %loop.body, label %loop.exit

  loop.exit:                                        ; preds = %loop.body
    br label %exit

  exit:                                             ; preds = %loop.exit, %entry
    %ptr2 = phi i8* [ %incptr2, %loop.exit ], [ %incptr, %entry ]
    %incptr3 = getelementptr inbounds i8, i8* %ptr2, i64 1
    ret i8* %incptr3
  }

Review: http://reviews.llvm.org/D8245
llvm-svn: 232718
2015-03-19 11:05:08 +00:00
Sanjoy Das
1f60e8293a [ConstantRange] Split makeICmpRegion in two.
Summary:
This change splits `makeICmpRegion` into `makeAllowedICmpRegion` and
`makeSatisfyingICmpRegion` with slightly different contracts.  The first
one is useful for determining what values some expression //may// take,
given that a certain `icmp` evaluates to true.  The second one is useful
for determining what values are guaranteed to //satisfy// a given
`icmp`.

Reviewers: nlewycky

Reviewed By: nlewycky

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8345

llvm-svn: 232575
2015-03-18 00:41:24 +00:00
David Blaikie
441e94fa44 [opaque pointer type] IRBuilder gep migration progress
llvm-svn: 232294
2015-03-15 01:03:19 +00:00
Mehdi Amini
459bdb4f65 Update InstCombine to transform aggregate stores into scalar stores.
Summary: This is a first step toward getting proper support for aggregate loads and stores.

Test Plan: Added unittests

Reviewers: reames, chandlerc

Reviewed By: chandlerc

Subscribers: majnemer, joker.eph, chandlerc, llvm-commits

Differential Revision: http://reviews.llvm.org/D7780

Patch by Amaury Sechet

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 232284
2015-03-14 22:19:33 +00:00
David Blaikie
6b8c7476b9 [opaque pointer type] more gep API migration
llvm-svn: 232274
2015-03-14 19:53:33 +00:00
David Blaikie
7279c7f872 [opaque pointer type] more gep API migrations
Adding nullptr to all the IRBuilder stuff because it's the first thing
that fails to build when testing without the back-compat functions, so
I'll keep having to re-add these locally for each chunk of migration I
do. Might as well check them in to save me the churn. Eventually I'll
have to migrate these too, but I'm going breadth-first.

llvm-svn: 232270
2015-03-14 19:24:04 +00:00
David Blaikie
2db52814fd [opaque pointer type] Start migrating GEP creation to explicitly specify the pointee type
I'm just going to migrate these in a pretty ad-hoc & incremental way -
providing the backwards compatible API for now, then locally removing
it, fixing a few callers, adding it back in and commiting those callers.
Rinse, repeat.

The assertions should ensure that if I get this wrong we'll find out
about it and not just have one giant patch to revert, recommit, revert,
recommit, etc.

llvm-svn: 232240
2015-03-14 01:53:18 +00:00
Duncan P. N. Exon Smith
e5ae5021db instcombine: alloca: Canonicalize scalar allocation array size
As a follow-up to r232200, add an `-instcombine` to canonicalize scalar
allocations to `i32 1`.  Since r232200, `iX 1` (for X != 32) are only
created by RAUWs, so this shouldn't fire too often.  Nevertheless, it's
a cheap check and a nice cleanup.

llvm-svn: 232202
2015-03-13 19:42:09 +00:00
Duncan P. N. Exon Smith
a5cb5caa92 instcombine: alloca: Limit array size type promotion
Move type promotion of the size of the array allocation to the end of
`simplifyAllocaArraySize()`.  This avoids promoting the type of the
array size if it's a `ConstantInt`, since the next -instcombine
iteration will drop it to a scalar allocation anyway.  Similarly, this
avoids promoting the type if it's an `UndefValue`, in which case the
alloca gets RAUW'ed.

This is NFC when considered over the lifetime of -instcombine, since
it's just reducing the number of iterations needed to reach fixed point.

llvm-svn: 232201
2015-03-13 19:34:55 +00:00
Duncan P. N. Exon Smith
4326380356 AsmWriter: Write alloca array size explicitly (and -instcombine fixup)
Write the `alloca` array size explicitly when it's non-canonical.
Previously, if the array size was `iX 1` (where X is not 32), the type
would mutate to `i32` when round-tripping through assembly.

The testcase I added fails in `verify-uselistorder` (as well as
`FileCheck`), since the use-lists for `i32 1` and `i64 1` change.
(Manman Ren came across this when running `verify-uselistorder` on some
non-trivial, optimized code as part of PR5680.)

The type mutation started with r104911, which allowed array sizes to be
something other than an `i32`.  Starting with r204945, we
"canonicalized" to `i64` on 64-bit platforms -- and then on every
round-trip through assembly, mutated back to `i32`.

I bundled a fixup for `-instcombine` to avoid r204945 on scalar
allocations.  (There wasn't a clean way to sequence this into two
commits, since the assembly change on its own caused testcase churn, and
the `-instcombine` change can't be tested without the assembly changes.)

An obvious alternative fix -- change `AllocaInst::AllocaInst()`,
`AsmWriter` and `LLParser` to treat `intptr_t` as the canonical type for
scalar allocations -- was rejected out of hand, since this required
teaching them each about the data layout.

A follow-up commit will add an `-instcombine` to canonicalize the scalar
allocation array size to `i32 1` rather than leaving `iX 1` alone.

rdar://problem/20075773

llvm-svn: 232200
2015-03-13 19:30:44 +00:00
Duncan P. N. Exon Smith
eb33647b74 instcombine: alloca: Remove nesting in simplifyAllocaArraySize(), NFC
llvm-svn: 232199
2015-03-13 19:26:33 +00:00
Duncan P. N. Exon Smith
56d9d5dfc6 instcombine: alloca: Split out simplifyAllocaArraySize(), NFC
Follow-up commits will change some of the logic here.  Splitting into a
separate function simplifies the logic by allowing early returns instead
of deeper nesting.

llvm-svn: 232197
2015-03-13 19:22:03 +00:00
David Majnemer
613caa1eaa InstCombine: Don't fold call bitcast into args if callee is byval
This fixes a bug reported here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150309/265341.html

llvm-svn: 231948
2015-03-11 18:03:05 +00:00
Philip Reames
d1be160c96 If a conditional branch jumps to the same target, remove the condition
Given that large parts of inst combine is restricted to instructions which have one use, getting rid of a use on the condition can help the effectiveness of the optimizer. Also, it allows the condition to potentially be deleted by instcombine rather than waiting for another pass.

I noticed this completely by accident in another test case. It's not anything that actually came from a real workload.

p.s. We should probably do the same thing for switch instructions.

Differential Revision: http://reviews.llvm.org/D8220

llvm-svn: 231881
2015-03-10 22:52:37 +00:00
Owen Anderson
fa465e39c1 Fix a crash in InstCombine where we could try to truncate a switch comparison to zero width.
llvm-svn: 231761
2015-03-10 06:51:39 +00:00
Owen Anderson
c22a907f78 Fix an infinite loop in InstCombine when an instruction with no users and side effects can be constant folded.
ReplaceInstUsesWith needs to return nullptr when the input has no users,
because in that case it does not mutate the program.  Otherwise, we can
get stuck in an infinite loop of repeatedly attempting to constant fold
and instruction with no users.

llvm-svn: 231755
2015-03-10 05:13:47 +00:00
Mehdi Amini
f88efe5f8a DataLayout is mandatory, update the API to reflect it with references.
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.

This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.

I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.

I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.

Test Plan:

Reviewers: echristo

Subscribers: llvm-commits

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
2015-03-10 02:37:25 +00:00
David Blaikie
350b9cbf65 Simplify expressions involving boolean constants with clang-tidy
Patch by Richard (legalize at xmission dot com).

Differential Revision: http://reviews.llvm.org/D8154

llvm-svn: 231617
2015-03-09 01:57:13 +00:00
Michael Kuperstein
92d61decb9 [InstCombine] Fix an assertion when fmul has a ConstantExpr operand
isNormalFp and isFiniteNonZeroFp should not assume vector operands can not be constant expressions.

Patch by Pawel Jurek <pawel.jurek@intel.com>
Differential Revision: http://reviews.llvm.org/D8053

llvm-svn: 231359
2015-03-05 08:38:57 +00:00
Mehdi Amini
29ebc2d39f Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() will never returns nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
2015-03-04 18:43:29 +00:00
David Majnemer
0a7829af7e InstCombine: Ensure select condition types are identical before merging
Selection conditions may be vectors or scalars.  Make sure InstCombine
doesn't indiscriminately assume that a select which is value dependent
on another select have identical select condition types.

This fixes PR22773.

llvm-svn: 231156
2015-03-03 22:40:36 +00:00
Hal Finkel
5cc8afebfd [InstCombine/PowerPC] Convert aligned QPX load/store intrinsics into loads/stores
InstCombine has long had logic to convert aligned Altivec load/store intrinsics
into regular loads and stores. This mirrors that functionality for QPX vector
load/store intrinsics.

llvm-svn: 230660
2015-02-26 18:56:03 +00:00
JF Bastien
0f477f42f6 InstCombine: extract instead of shuffle when performing vector/array type punning
Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.

Test Plan: make check

Reviewers: jvoung, chandlerc

Subscribers: llvm-commits
llvm-svn: 230560
2015-02-25 22:30:51 +00:00
Charles Davis
f135d6973b [IC] Turn non-null MD on pointer loads to range MD on integer loads.
Summary:
This change fixes the FIXME that you recently added when you committed
(a modified version of) my patch.  When `InstCombine` combines a load and
store of an pointer to those of an equivalently-sized integer, it currently
drops any `!nonnull` metadata that might be present.  This change replaces
`!nonnull` metadata with `!range !{ 1, -1 }` metadata instead.

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7621

llvm-svn: 230462
2015-02-25 05:10:25 +00:00
Sanjoy Das
de271b53ef New instcombine rule: max(~a,~b) -> ~min(a, b)
This case is interesting because ScalarEvolutionExpander lowers min(a,
b) as ~max(~a,~b).  I think the profitability heuristics can be made
more clever/aggressive, but this is a start.

Differential Revision: http://reviews.llvm.org/D7821

llvm-svn: 230285
2015-02-24 00:08:41 +00:00
Mehdi Amini
1b3cca0d56 InstSimplify: simplify 0 / X if nnan and nsz
From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 230238
2015-02-23 18:30:25 +00:00
Hal Finkel
09a36680d7 [InstCombine] Remove unnecessary variable indexing into single-element arrays
This change addresses a deficiency pointed out in PR22629. To copy from the bug
report:

[from the bug report]

Consider this code:

int f(int x) {
  int a[] = {12};
  return a[x];
}

GCC knows to optimize this to

movl     $12, %eax
ret

The code generated by recent Clang at -O3 is:

movslq   %edi, %rax
movl     .L_ZZ1fiE1a(,%rax,4), %eax
retq

.L_ZZ1fiE1a:
  .long    12                      # 0xc

[end from the bug report]

This definitely seems worth fixing. I've also seen this kind of code before (as
the base case of generic vector wrapper templates with one element).

The general idea is to look at the GEP feeding a load or a store, which has
some variable as its first non-zero index, and determine if that index must be
zero (or else an out-of-bounds access would occur). We can do this for allocas
and globals with constant initializers where we know the maximum size of the
underlying object. When we find such a GEP, we create a new one for the memory
access with that first variable index replaced with a constant zero.

Even if we can't eliminate the memory access (and sometimes we can't), it is
still useful because it removes unnecessary indexing calculations.

llvm-svn: 229959
2015-02-20 03:05:53 +00:00
Akira Hatanaka
913155d842 [InstCombine] Do not insert a GEP instruction before a landingpad instruction.
InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the
insertion point, which caused the verifier to complain when a GEP was inserted
before a landingpad instruction. This commit fixes it to use getFirstInsertionPt
instead.

rdar://problem/19394964

llvm-svn: 229619
2015-02-18 03:30:11 +00:00
Mehdi Amini
ce11b626e7 InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val))
Fixes radar 15486701.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229437
2015-02-16 21:47:54 +00:00
Ramkumar Ramachandra
af4f23c6ae InstCombine: propagate deref via new addDereferenceableAttr
The "dereferenceable" attribute cannot be added via .addAttribute(),
since it also expects a size in bytes. AttrBuilder#addAttribute or
AttributeSet#addAttribute is wrapped by classes Function, InvokeInst,
and CallInst. Add corresponding wrappers to
AttrBuilder#addDereferenceableAttr.

Having done this, propagate the dereferenceable attribute via
gc.relocate, adding a test to exercise it. Note that -datalayout is
required during execution over and above -instcombine, because
InstCombine only optionally requires DataLayoutPass.

Differential Revision: http://reviews.llvm.org/D7510

llvm-svn: 229265
2015-02-14 19:37:54 +00:00
Duncan P. N. Exon Smith
a0ef805386 Transforms: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

llvm-svn: 229202
2015-02-14 01:11:29 +00:00
Philip Reames
45adc98469 [InstCombine] When canonicalizing gep indices, prefer zext when possible
If we know that the sign bit of a value being sign extended is zero, we can use a zero extension instead.  This is motivated by the fact that zero extensions are generally cheaper on x86 (and most other architectures?).  We already apply a similar transform in DAGCombine, this just extends that to the IR level.

This comes up when we eagerly canonicalize gep indices to the width of a machine register (i64 on x86_64). To do so, we insert sign extensions (sext) to promote smaller types. 

Differential Revision: http://reviews.llvm.org/D7255

llvm-svn: 229189
2015-02-14 00:05:36 +00:00
Andrea Di Biagio
0c23f4cf6c [InstCombine] Fix regression introduced at r227197.
This patch fixes a problem I accidentally introduced in an instruction combine
on select instructions added at r227197. That revision taught the instruction
combiner how to fold a cttz/ctlz followed by a icmp plus select into a single
cttz/ctlz with flag 'is_zero_undef' cleared.

However, the new rule added at r227197 would have produced wrong results in the
case where a cttz/ctlz with flag 'is_zero_undef' cleared was follwed by a
zero-extend or truncate. In that case, the folded instruction would have
been inserted in a wrong location thus leaving the CFG in an inconsistent
state.

This patch fixes the problem and add two reproducible test cases to
existing test 'InstCombine/select-cmp-cttz-ctlz.ll'.

llvm-svn: 229124
2015-02-13 16:33:34 +00:00