1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 21:42:54 +02:00
Commit Graph

2025 Commits

Author SHA1 Message Date
Simon Pilgrim
750ce6b529 [InstCombine] Added SSE41 roundss/roundsd demanded vector elements tests
llvm-svn: 261468
2016-02-21 12:40:39 +00:00
Simon Pilgrim
f781386f81 [InstCombine] SSE/SSE2 (u)comiss/(u)comisd comparison intrinsics only use the lowest vector element
llvm-svn: 261460
2016-02-20 23:17:35 +00:00
Simon Pilgrim
9535e505f8 [InstCombine] Added SSE/SSE2 comparison intrinsics demanded vector elements tests
llvm-svn: 261454
2016-02-20 22:41:31 +00:00
Simon Pilgrim
fd170b86ec [InstCombine] Added some SSE/SSE2 demanded vector elements tests
llvm-svn: 261451
2016-02-20 21:44:48 +00:00
Reid Kleckner
66e4d3276a [IR] Straighten out bundle overload of IRBuilder::CreateCall
IRBuilder has two ways of putting bundle operands on calls: the default
operand bundle, and an overload of CreateCall that takes an operand
bundle list.

Previously, this overload used a default argument of None. This made it
impossible to distinguish between the case were the caller doesn't care
about bundles, and the case where the caller explicitly wants no
bundles. We behaved as if they wanted the latter behavior rather than
the former, which led to problems with simplifylibcalls and WinEH.

This change fixes it by making the parameter non-optional, so we can
distinguish these two cases.

llvm-svn: 261258
2016-02-18 20:57:41 +00:00
Amaury Sechet
14d2c58ecf Fix load alignement when unpacking aggregates structs
Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it.

Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17326

llvm-svn: 261139
2016-02-17 19:21:28 +00:00
David Majnemer
5886331bc4 [InstCombine] Don't aggressively replace xor with icmp
For some cases, InstCombine replaces the sequence of xor/sub instruction
followed by cmp instruction into a single cmp instruction.

However, this replacement may result suboptimal result especially when
the xor/sub has more than one use, as discussed in
bug 26465 (https://llvm.org/bugs/show_bug.cgi?id=26465).

This patch make the replacement happen only when xor/sub has only one
use.

Differential Revision: http://reviews.llvm.org/D16915

Patch by Taewook Oh!

llvm-svn: 260695
2016-02-12 18:12:38 +00:00
Quentin Colombet
b60e3f65de Re-apply r238452, the bug was in clang and was fixed in r260567.
Original commit message:
[InstCombine] Fold IntToPtr and PtrToInt into preceding loads.

Currently we only fold a BitCast into a Load when the BitCast is its
only user.

Do the same for any no-op cast.

Patch by Philip Pfaffe!

Differential Revision: http://reviews.llvm.org/D9152

llvm-svn: 260612
2016-02-11 22:30:41 +00:00
Pete Cooper
e68afcf1b2 Set load alignment on aggregate loads.
When optimizing a extractvalue(load), we generate a load from the
aggregate type.  This load didn't have alignment set and so would
get the alignment of the type.  This breaks when the type is packed
and so the alignment should be lower.

For example, loading { int, int } would give us alignment of 4, but
the original load from this type may have an alignment of 1 if packed.

Reviewed by David Majnemer

Differential revision: http://reviews.llvm.org/D17158

llvm-svn: 260587
2016-02-11 21:10:40 +00:00
Jun Bum Lim
555cbf018b [InstCombine] Simplify a known nonzero incoming value of PHI
Summary:
When a PHI is used only to be compared with zero, it is possible to replace an
incoming value with any non-zero constant if the incoming value can be proved as
a known nonzero value. For example, in below code, we can replace the incoming value %v with
any non-zero constant based on the fact that the PHI is only used to be compared with zero
and %v is a known non-zero value:
  %v = select %cond, 1, 2
  %p = phi [%v, BB] ...
  %c = icmp eq, %p, 0

Reviewers: mcrosier, jmolloy, sanjoy

Subscribers: hfinkel, mcrosier, majnemer, llvm-commits, haicheng, bmakam, mssimpso, gberry

Differential Revision: http://reviews.llvm.org/D16240

llvm-svn: 260530
2016-02-11 15:50:07 +00:00
Artur Pilipenko
f8c51ed6a0 Don't propagate dereferenceable attribute through gc.relocate in InstCombine
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D16143

llvm-svn: 260509
2016-02-11 11:22:46 +00:00
Philip Reames
0e7cbf4f20 [InstCombine][GC] Handle gc.relocations of vector type
We introduced gc.relocates of vector-of-pointer types a couple of weeks back.  Somehow, I missed updating the InstCombine rule to account for this.  If we hit this code path with a vector-of-pointers gc.relocate, we'd crash on a cast<PointerType>.

I also took the chance to do a bit of code style cleanup.

llvm-svn: 260279
2016-02-09 21:09:22 +00:00
Quentin Colombet
091258f7a2 [InstCombine] Revert r238452: Fold IntToPtr and PtrToInt into preceding loads.
According to git bisect, this is the root cause of a miscompile for Regex in
libLLVMSupport. I am still working on reducing a test case.
The actual bug may be elsewhere and this commit just exposed it.

Anyway, at the moment, to reproduce, follow these steps:
1. Build clang and libLTO in release mode.
2. Create a new build directory <stage2> and cd into it.
3. Use clang and libLTO from #1 to build llvm-extract in Release mode + asserts
   using -O2 -flto
4. Run llvm-extract  -ralias '.*bar' -S test/Other/extract-alias.ll

Result:
program doesn't contain global named '.*bar'!

Expected result:
@a0a0bar = alias void ()* @bar
@a0bar = alias void ()* @bar

declare void @bar()

Note: In step #3, if you don't use lto or asserts, the miscompile disappears.
llvm-svn: 259674
2016-02-03 18:04:13 +00:00
Sanjay Patel
14ae72b119 [InstCombine] simplify masked scatter/gather intrinsics with zero masks
A masked scatter with a zero mask means there's no store.
A masked gather with a zero mask means the passthru arg is returned.

This is a continuation of:
http://reviews.llvm.org/rL259369
http://reviews.llvm.org/rL259392

llvm-svn: 259421
2016-02-01 22:10:26 +00:00
Sanjay Patel
0a594deff4 [InstCombine] simplify masked store intrinsics with all ones or zeros masks
A masked store with a zero mask means there's no store.
A masked store with an allOnes mask means it's a normal vector store.

This is a continuation of:
http://reviews.llvm.org/rL259369

llvm-svn: 259392
2016-02-01 19:39:52 +00:00
Sanjay Patel
0bb1987650 fix broken check lines
Without the colon, it doesn't mean anything!

llvm-svn: 259377
2016-02-01 17:46:18 +00:00
David Majnemer
f3d73f0449 [InstCombine] Don't transform (X+INT_MAX)>=(Y+INT_MAX) -> (X<=Y)
This miscompile came about because we tried to use a transform which was
only appropriate for xor operators when addition was present.

This fixes PR26407.

llvm-svn: 259375
2016-02-01 17:37:56 +00:00
Jun Bum Lim
b95afc3d46 [ValueTracking] Improve isKnownNonZero for PHI of non-zero constants
It is clear that a PHI is a non-zero if all incoming values are non-zero constants.

llvm-svn: 259370
2016-02-01 17:03:07 +00:00
Sanjay Patel
834c52c879 [InstCombine] simplify masked load intrinsics with all ones or zeros masks
A masked load with a zero mask means there's no load.
A masked load with an allOnes mask means it's a normal vector load.

Differential Revision: http://reviews.llvm.org/D16691

llvm-svn: 259369
2016-02-01 17:00:10 +00:00
Matt Arsenault
d68c0fe0a6 InstCombine: fabs(x) * fabs(x) -> x * x
llvm-svn: 259295
2016-01-30 05:02:00 +00:00
Sanjay Patel
ce82873e36 [InstCombine] avoid an insertelement transformation that induces the opposite extractelement fold (PR26354)
We would infinite loop because we created a shufflevector that was wider than
needed and then failed to combine that with the insertelement. When subsequently
visiting the extractelement from that shuffle, we see that it's unnecessary,
delete it, and trigger another visit to the insertelement.

llvm-svn: 259236
2016-01-29 20:21:02 +00:00
Sanjay Patel
a2f277ac24 add masked intrinsic tests to show missed opportunities
llvm-svn: 259083
2016-01-28 19:54:20 +00:00
Sanjay Patel
906306d436 [LibCallSimplifier] fold memset(malloc(x), 0, x) --> calloc(1, x)
This is a step towards solving PR25892:
https://llvm.org/bugs/show_bug.cgi?id=25892

It won't handle the reported case. As noted by the 'TODO' comments in the patch, 
we need to relax the hasOneUse() constraint and also match patterns that include
memset_chk() and the llvm.memset() intrinsic in addition to memset().

Differential Revision: http://reviews.llvm.org/D16337

llvm-svn: 258816
2016-01-26 16:17:24 +00:00
Matt Arsenault
7a5e15697d AMDGPU: Rename intrinsics to use amdgcn prefix
The intrinsic target prefix should match the target name
as it appears in the triple.

This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatability for now.

llvm-svn: 258557
2016-01-22 21:30:34 +00:00
Sanjay Patel
1087b8fb2a [LibCallSimplifier] don't get fooled by a fake fmin()
This is similar to the bug/fix:
https://llvm.org/bugs/show_bug.cgi?id=26211
http://reviews.llvm.org/rL258325

The fmin() test case reveals another bug caused by sloppy
code duplication. It will crash without this patch because
fp128 is a valid floating-point type, but we would think
that we had matched a function that used doubles.

The new helper function can be used to replace similar
checks that are used in several other places in this file.

llvm-svn: 258428
2016-01-21 20:19:54 +00:00
Sanjay Patel
3635b71b45 [LibCallSimplifier] don't get fooled by a fake sqrt()
The test case will crash without this patch because the subsequent call to
hasUnsafeAlgebra() assumes that the call instruction is an FPMathOperator
(ie, returns an FP type).

This part of the function signature check was omitted for the sqrt() case, 
but seems to be in place for all other transforms.

Before:
http://reviews.llvm.org/rL257400
...we would have needlessly continued execution in optimizeSqrt(), but the
bug was harmless because we'd eventually fail some other check and return
without damage.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=26211

Differential Revision: http://reviews.llvm.org/D16198

llvm-svn: 258325
2016-01-20 17:41:14 +00:00
Sanjay Patel
691c821001 add tests to show missing memset/malloc optimizations (PR25892)
llvm-svn: 258218
2016-01-19 23:07:10 +00:00
Sanjay Patel
a2ab3d6165 [LibCallSimplifier] use instruction-level fast-math-flags to shrink calls
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555

llvm-svn: 258158
2016-01-19 18:38:52 +00:00
Sanjay Patel
a46637dede [LibCallSimplifier] use instruction-level fast-math-flags to transform pow(x, [small integer]) calls
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555

As with D15937, the intent of the patch is to preserve the current behavior of the transform
except that we use the pow call's 'fast' attribute as a trigger rather than a function-level
attribute.

The TODO comment notes a potential follow-on patch that would propagate FMF to the new
instructions.

Differential Revision: http://reviews.llvm.org/D16122

llvm-svn: 258153
2016-01-19 18:15:12 +00:00
Artur Pilipenko
bb5abf9eb3 Push isDereferenceableAndAlignedPointer down into isSafeToLoadUnconditionally
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D16226

llvm-svn: 258010
2016-01-17 12:35:29 +00:00
Silviu Baranga
777f975cab Re-commit r257064, after it was reverted in r257340.
This contains a fix for the issue that caused the revert:
we no longer assume that we can insert instructions after the
instruction that produces the base pointer. We previously
assumed that this would be ok, because the instruction produces
a value and therefore is not a terminator. This is false for invoke
instructions. We will now insert these new instruction directly
at the location of the users.

Original commit message:

[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs

Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.

InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257897
2016-01-15 15:52:05 +00:00
James Molloy
7697faf6db [InstCombine] Rewrite bswap/bitreverse handling completely.
There are several requirements that ended up with this design;
  1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early.
  2. Bitreversals and byteswaps are very related in their matching logic.
  3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses.
  4. Bswaps are best matched early in InstCombine.

The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals.

We can then extend the matching logic in one place only.

llvm-svn: 257875
2016-01-15 09:20:19 +00:00
Sanjay Patel
489a46e98d [LibCallSimplifier] use instruction-level fast-math-flags to transform pow(x, 0.5) calls
Also, propagate the FMF to the newly created sqrt() call.

llvm-svn: 257503
2016-01-12 19:06:35 +00:00
Sanjay Patel
91e6a8ee15 [LibCallSimplifier] use instruction-level fast-math-flags to transform pow(exp(x)) calls
See also:
http://reviews.llvm.org/rL255555
http://reviews.llvm.org/rL256871
http://reviews.llvm.org/rL256964
http://reviews.llvm.org/rL257400
http://reviews.llvm.org/rL257404
http://reviews.llvm.org/rL257414

llvm-svn: 257491
2016-01-12 17:30:37 +00:00
Sanjay Patel
930be29a19 consolidate exp/exp2 tests
The transform is identical, so keep the tests together and save some overhead.

llvm-svn: 257484
2016-01-12 17:00:38 +00:00
Sanjay Patel
c16495bc10 Add/edit tests to include instruction-level FMF on calls
Prepatory patch before changing LibCallSimplifier to use the FMF.
Also, tighten the CHECK lines and give the tests more meaningful names.
Similar changes to:
http://reviews.llvm.org/rL257414

llvm-svn: 257481
2016-01-12 16:50:17 +00:00
Sanjay Patel
42e7daf81c [LibCallSimplifier] use instruction-level fast-math-flags to transform log calls
Also, add tests to verify that we're checking 'fast' on both calls of each transform pair,
tighten the CHECK lines, and give the tests more meaningful names.

This is a continuation of:
http://reviews.llvm.org/rL255555
http://reviews.llvm.org/rL256871
http://reviews.llvm.org/rL256964
http://reviews.llvm.org/rL257400
http://reviews.llvm.org/rL257404

llvm-svn: 257414
2016-01-11 23:31:48 +00:00
Sanjay Patel
dfd0791d6d [LibCallSimplifier] don't allow sqrt transform unless all ops are unsafe
Fix the FIXME added with:
http://reviews.llvm.org/rL257400

llvm-svn: 257404
2016-01-11 22:50:36 +00:00
Sanjay Patel
9ac7e74796 [LibCallSimplifier] use instruction-level fast-math-flags to transform sqrt calls
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555

The intent of the patch is to preserve the current behavior of the transform except
that we use the sqrt instruction's 'fast' attribute as a trigger rather than the
function-level attribute.

But this raises a bug noted by the new FIXME comment.

In order to do this transform:
sqrt((x * x) * y) ---> fabs(x) * sqrt(y)

...we need all of the sqrt, the first fmul, and the second fmul to be 'fast'. 
If any of those ops is strict, we should bail out.

Differential Revision: http://reviews.llvm.org/D15937

llvm-svn: 257400
2016-01-11 22:34:19 +00:00
Silviu Baranga
90360019af Revert r257164 - it has caused spec2k6 failures in LTO mode
llvm-svn: 257340
2016-01-11 16:19:38 +00:00
Silviu Baranga
93f7373429 Re-commit r257064, this time with a fixed assert
In setInsertionPoint if the value is not a PHI, Instruction or
Argument it should be a Constant, not a ConstantExpr.

Original commit message:

[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs

Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.

InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257164
2016-01-08 11:11:04 +00:00
Sanjay Patel
898b29bc66 [InstCombine] insert a new shuffle in a safe place (PR25999)
Limit this transform to a basic block and guard against PHIs.
Hopefully, this fixes the remaining failures in PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999

llvm-svn: 257133
2016-01-08 01:39:16 +00:00
David Majnemer
bda025cd91 Add test for r256912
I forgot to add this with the rest of r256912.

llvm-svn: 257088
2016-01-07 19:27:16 +00:00
Silviu Baranga
aa39f9d643 Revert r257064. It caused failures in some sanitizer tests.
llvm-svn: 257069
2016-01-07 15:46:43 +00:00
Silviu Baranga
b0c35664c0 [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.

InstCombine was already doing this when comparing two GEPs if the
base pointers were the same. However, in the case where we have
complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs,
conversions to or from integers, etc) the value of the original
base pointer will be hidden to the optimizer and this transformation
will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the
relevant uses of GEPs to GEPs with a common base pointer. The
GEP comparison will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257064
2016-01-07 14:56:08 +00:00
Sanjay Patel
20d1d5e75f [LibCallSimplifier] use instruction-level fast-math-flags for tan/atan transform
llvm-svn: 256964
2016-01-06 19:23:35 +00:00
Sanjay Patel
2273c0c2a2 [LibCallSimplfier] use instruction-level fast-math-flags for fmin/fmax transforms
llvm-svn: 256871
2016-01-05 20:46:19 +00:00
Sanjay Patel
2e13cb5de7 [InstCombine] insert a new shuffle before its uses (PR26015)
Although this solves the test case in PR26015:
https://llvm.org/bugs/show_bug.cgi?id=26015

And may solve PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999

...I suspect this is not the best solution. I think we want to insert the new shuffle
just ahead of the earliest ExtractElementInst that we're replacing, but I don't know 
how that should be implemented.

Differential Revision: http://reviews.llvm.org/D15878

llvm-svn: 256857
2016-01-05 19:09:47 +00:00
Chen Li
10e521338c [InstructionCombining] prepareICWorklistFromFunction halts in infinite loop with instructions of token type
Summary: This patch fixes a bug in prepareICWorklistFromFunction, where the loop becomes infinite with instructions of token type. The patch checks if the instruction is token type, and if so it updates EndInst with the current instruction.

Reviewers: reames, majnemer

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15859

llvm-svn: 256792
2016-01-04 23:28:57 +00:00
Sanjay Patel
2d3c7242d3 [LibCallSimplifier] propagate FMF when shrinking binary calls
llvm-svn: 256682
2015-12-31 23:40:59 +00:00
Sanjay Patel
9333af147c [LibCallSimplifier] propagate FMF when shrinking unary calls
llvm-svn: 256679
2015-12-31 21:52:31 +00:00
Sanjay Patel
bc5190f0cb change function names to avoid accidentally matching the substring
llvm-svn: 256678
2015-12-31 21:25:25 +00:00
Sanjay Patel
3ea18b95b7 add 'fast' attribute to calls to show that the flag isn't being propagated
llvm-svn: 256677
2015-12-31 21:12:19 +00:00
Chandler Carruth
8beb86a806 [attrs] Extract the pure inference of function attributes into
a standalone pass.

There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.

In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.

The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.

Differential Revision: http://reviews.llvm.org/D15676

llvm-svn: 256466
2015-12-27 08:41:34 +00:00
Chen Li
c60ad3e1fe [gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. 

Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob

Subscribers: reames, mjacob, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15662

llvm-svn: 256443
2015-12-26 07:54:32 +00:00
Sanjay Patel
b4b4a9aeb1 [InstCombine] transform more extract/insert pairs into shuffles (PR2109)
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2

And x86 SSE output improves from:

movq	(%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa	%xmm1, %xmm2
shufps	$229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps	$132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps	$32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps	$36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

To the almost optimal:

movhpd	(%rdi), %xmm0

Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases. 
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.

Differential Revision: http://reviews.llvm.org/D15096

llvm-svn: 256394
2015-12-24 21:17:56 +00:00
David Majnemer
d697901638 [OperandBundles] Have InstCombine play nice with operand bundles
Don't assume a call's use corresponds to an argument operand, it might
correspond to a bundle operand.

llvm-svn: 256327
2015-12-23 09:58:41 +00:00
Philip Reames
46cd55f309 [InstCombine] Extend peephole DSE to handle unordered atomics
This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine.

Key points:
 * We only remove unordered or simple stores.
 * The loads producing values consumed by dead stores don't influence whether the store is dead.

Differential Revision: http://reviews.llvm.org/D15354

llvm-svn: 255932
2015-12-17 22:19:27 +00:00
Nicolai Hahnle
2aeb81a126 AMDGPU: mark ldexp LibCalls as unavailable
Summary:
The LibCallSimplifier will turn llvm.exp2.* intrinsics into ldexp* libcalls
which do not make sense with the AMDGPU backend.

In the long run, we'll want an llvm.ldexp.* intrinsic to properly make use of
this optimization, but this works around the problem for now.

See also: http://reviews.llvm.org/D14327 (suggested llvm.ldexp.* implementation)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709

Reviewers: arsenm, tstellarAMD

Differential Revision: http://reviews.llvm.org/D14990

llvm-svn: 255658
2015-12-15 17:24:15 +00:00
Mehdi Amini
b29b50a9dd Instcombine: destructor loads of structs that do not contains padding
For non padded structs, we can just proceed and deaggregate them.
We don't want ot do this when there is padding in the struct as to not
lose information about this padding (the subsequents passes would then
try hard to preserve the padding, which is undesirable).

Also update extractvalue.ll and cast.ll so that they use structs with padding.

Remove the FIXME in the extractvalue of laod case as the non padded case is
handled when processing the load, and we don't want to do it on the padded
case.

Patch by: Amaury SECHET <deadalnix@gmail.com>

Differential Revision: http://reviews.llvm.org/D14483

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255600
2015-12-15 01:44:07 +00:00
Sanjay Patel
14a74b66f7 add fast-math-flags to 'call' instructions (PR21290)
This patch adds optional fast-math-flags (the same that apply to fmul/fadd/fsub/fdiv/frem/fcmp)
to call instructions in IR. Follow-up patches would use these flags in LibCallSimplifier, add 
support to clang, and extend FMF to the DAG for calls.

Motivating example:

%y = fmul fast float %x, %x
%z = tail call float @sqrtf(float %y)

We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a function-wide
attribute for unsafe-math, but we really want to trigger on the instructions themselves:

%z = tail call fast float @sqrtf(float %y)

because in an LTO build it's possible that calls with fast semantics have been inlined into a
function with non-fast semantics.

The code changes and tests are based on the recent commits that added "notail":
http://reviews.llvm.org/rL252368

and added FMF to fcmp:
http://reviews.llvm.org/rL241901

Differential Revision: http://reviews.llvm.org/D14707

llvm-svn: 255555
2015-12-14 21:59:03 +00:00
Sanjay Patel
0876eb09e0 [InstCombine] fold trunc ([lshr] (bitcast vector) ) --> extractelement (PR25543)
This is a fix for PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The idea is to take the existing fold of:
bitcast ( trunc ( lshr ( bitcast X))) --> extractelement (bitcast X)
( http://reviews.llvm.org/rL112232 )

And break it into less specific transforms so we'll catch more cases such as
the example in the bug report:
bitcast ( trunc ( lshr ( bitcast X))) -->
bitcast ( extractelement (bitcast X)) -->
extractelement (bitcast X)

Enabling patches for this change:
http://reviews.llvm.org/rL255399 (combine bitcasts)
http://reviews.llvm.org/rL255433 (canonicalize extractelement(bitcast X))

Differential Revision: http://reviews.llvm.org/D15392

llvm-svn: 255504
2015-12-14 16:16:54 +00:00
Sanjay Patel
3f624d4650 [InstCombine] canonicalize (bitcast (extractelement X)) --> (extractelement(bitcast X))
This change was discussed in D15392. It allows us to remove the fold that was added
in:
http://reviews.llvm.org/r255261

...and it will allow us to generalize this fold:
http://reviews.llvm.org/rL112232

while preserving the order of bitcast + extract that it produces and testing shows
is better handled by the backend.

Note that the existing check for "isVectorTy()" wasn't strong enough in general
and specifically because: x86_mmx. It's not a vector, but it's not vectorizable
either. So here we check VectorType::isValidElementType() directly before 
proceeding with the transform.

llvm-svn: 255433
2015-12-12 16:44:48 +00:00
David Majnemer
bf189bdcd7 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands;  catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
Sanjay Patel
ceecde00d5 [InstCombine] allow any pair of bitcasts to be combined
This change is discussed in D15392 and should allow us to effectively
revert:
http://llvm.org/viewvc/llvm-project?view=revision&revision=255261
if we canonicalize bitcasts ahead of extracts.

It should be safe to convert any pair of bitcasts into a single bitcast, 
however, it was mentioned here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html
that we're not allowed to bitcast from an x86_mmx to some other types, but I'm 
not seeing any failures from that, and we have regression tests in CodeGen/X86
that appear to cover all of those cases. 

Some day we'll get to remove that MMX wart from LLVM IR completely?

Differential Revision: http://reviews.llvm.org/D15468

llvm-svn: 255399
2015-12-12 00:33:36 +00:00
Sanjay Patel
6151d885e6 use FileCheck for better checking
llvm-svn: 255394
2015-12-12 00:01:10 +00:00
Sanjay Patel
5d71fe5292 Add tests for bitcast-bitcast sequences for all scalar/vector permutations
As noted in http://reviews.llvm.org/D15392 , we should be able to improve this.

llvm-svn: 255370
2015-12-11 20:26:30 +00:00
James Molloy
d8003c7bf6 [InstCombine] Make MatchBSwap also match bit reversals
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.

llvm-svn: 255334
2015-12-11 10:04:51 +00:00
Sanjay Patel
25a4b4195f [InstCombine] fold bitcasts around an extractelement (3rd try)
This is a redo of r255137 (reverted at r255227) which was a redo of 
r255124 (reverted at r255126) with a fixed check for a scalar source 
type and an added test for the failure that caused the revert.

Original commit message:

Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255261
2015-12-10 17:09:28 +00:00
Akira Hatanaka
a1488717da Revert r255137.
This commit broke apple's internal bot.

llvm-svn: 255227
2015-12-10 08:00:52 +00:00
Sanjay Patel
de6f59d487 [InstCombine] fold bitcasts around an extractelement (2nd try)
This is a redo of r255124 (reverted at r255126) with an added check for a
scalar destination type and an added test for the failure seen in Clang's
test/CodeGen/vector.c. The extra test shows a different missing optimization.

Original commit message:

Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255137
2015-12-09 18:57:16 +00:00
Mehdi Amini
de04fa6b68 Revert "[InstCombine] fold bitcasts around an extractelement"
This reverts commit r255124.

Broke http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/4193/steps/test/logs/stdio

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255126
2015-12-09 16:31:39 +00:00
Sanjay Patel
8a5018320c [InstCombine] fold bitcasts around an extractelement
Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255124
2015-12-09 16:17:20 +00:00
Sanjoy Das
16ad4f2471 [InstCombine] Call getCmpPredicateForMinMax only with a valid SPF
Summary:
There are `SelectPatternFlavor`s that don't represent min or max idioms,
and we should not be passing those to `getCmpPredicateForMinMax`.

Fixes PR25745.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15249

llvm-svn: 254869
2015-12-05 23:44:22 +00:00
Weiming Zhao
84bd343622 [SimplifyLibCalls] Optimization for pow(x, n) where n is some constant
Summary:
    In order to avoid calling pow function we generate repeated fmul when n is a
    positive or negative whole number.
    
    For each exponent we pre-compute Addition Chains in order to minimize the no.
    of fmuls.
    Refer: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html
    
    We pre-compute addition chains for exponents upto 32 (which results in a max of
    7 fmuls).

    For eg:
    4 = 2+2
    5 = 2+3
    6 = 3+3 and so on
    
    Hence,
    pow(x, 4.0) ==> y = fmul x, x
                    x = fmul y, y
                    ret x

    For negative exponents, we simply compute the reciprocal of the final result.
    
    Note: This transformation is only enabled under fast-math.
    
    Patch by Mandeep Singh Grang <mgrang@codeaurora.org>

Reviewers: weimingz, majnemer, escha, davide, scanon, joerg

Subscribers: probinson, escha, llvm-commits

Differential Revision: http://reviews.llvm.org/D13994

llvm-svn: 254776
2015-12-04 22:00:47 +00:00
David Majnemer
dc587eeed6 [Analysis] Become aware of MSVC's new/delete functions
The compiler can take advantage of the allocation/deallocation
function's properties.  We knew how to do this for Itanium but had no
support for MSVC-style functions.

llvm-svn: 254656
2015-12-03 22:45:19 +00:00
David Majnemer
df4ee5c023 Do (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1 rather than (A == C1 || A == C2) -> (A | (C1 ^ C2)) == C2 when C1 ^ C2 is a power of 2.
Differential Revision: http://reviews.llvm.org/D14223

Patch by Amaury SECHET!

llvm-svn: 254518
2015-12-02 16:15:07 +00:00
Sanjay Patel
f9793efc69 [InstCombine] add tests to show potential vector IR shuffle transforms
llvm-svn: 254342
2015-11-30 22:39:36 +00:00
Davide Italiano
a4b22c0406 [SimplifyLibCalls] Remove useless bits of this tests.
llvm-svn: 254318
2015-11-30 19:38:35 +00:00
Davide Italiano
0f427b7147 [SimplifyLibCalls] Transform log(exp2(y)) to y*log(2) under fast-math.
llvm-svn: 254317
2015-11-30 19:36:35 +00:00
Davide Italiano
ae7cdf685f [SimplifyLibCalls] Don't crash if the function doesn't have a name.
llvm-svn: 254265
2015-11-29 21:58:56 +00:00
Davide Italiano
85963c8ad6 [SimplifyLibCalls] Tranform log(pow(x, y)) -> y*log(x).
This one is enabled only under -ffast-math. There are cases where the
difference between the value computed and the correct value is huge
even for ffast-math, e.g. as Steven pointed out:

x = -1, y = -4
log(pow(-1), 4) = 0
4*log(-1) = NaN

I checked what GCC does and apparently they do the same optimization
(which result in the dramatic difference). Future work might try to
make this (slightly) less worse.

Differential Revision:	http://reviews.llvm.org/D14400

llvm-svn: 254263
2015-11-29 20:58:04 +00:00
Benjamin Kramer
a5c875d940 [SimplifyLibCalls] Don't depend on a called function having a name, it might be an indirect call.
Fixes the crasher in PR25651 and related crashers using the same pattern.

llvm-svn: 254145
2015-11-26 09:51:17 +00:00
Sanjoy Das
d16b4e5c5e [InstCombine] Don't drop operand bundles
Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14857

llvm-svn: 254046
2015-11-25 00:42:19 +00:00
Sanjay Patel
cca965412e [InstCombine] fix propagation of fast-math-flags
Noticed while working on D4583:
http://reviews.llvm.org/D4583

llvm-svn: 253997
2015-11-24 17:51:20 +00:00
Rafael Espindola
9cb8841b77 Have a single way for creating unique value names.
We had two code paths. One would create names like "foo.1" and the other
names like "foo1".

For globals it is important to use "foo.1" to help C++ name demangling.
For locals there is no strong reason to go one way or the other so I
kept the most common mangling (foo1).

llvm-svn: 253804
2015-11-22 00:16:24 +00:00
Sanjay Patel
58d25e69b7 move a single test case to where most other instcombine shuffle bug test cases exist
llvm-svn: 253784
2015-11-21 16:12:58 +00:00
Sanjay Patel
c0f869e525 [InstCombine] add tests to show missing trunc optimizations
llvm-svn: 253609
2015-11-19 22:11:52 +00:00
Sanjay Patel
c933d364c7 [InstCombine] add tests to show missing bitcast optimizations
llvm-svn: 253602
2015-11-19 21:32:25 +00:00
Pete Cooper
b753649d63 Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Davide Italiano
4a84641b2a [SimplifyLibCalls] New trick: pow(x, 0.5) -> sqrt(x) under -ffast-math.
Differential Revision:	http://reviews.llvm.org/D14466

llvm-svn: 253521
2015-11-18 23:21:32 +00:00
Pete Cooper
aca4c5cdc6 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Andrew Kaylor
459ce58049 [EH] Keep filter clauses for types that have been caught.
The instruction combiner previously removed types from filter clauses in Landing Pad instructions if the type had previously been seen in a catch clause.  This is incorrect and prevents unexpected exception handlers from rethrowing the caught type.

Differential Revision: http://reviews.llvm.org/D14669

llvm-svn: 253370
2015-11-17 20:13:04 +00:00
Elena Demikhovsky
d43b8f3050 Fixed GEP visitor in the InstCombine pass.
The current implementation of GEP visitor in InstCombine fails with assertion on Vector GEP with mix of scalar and vector types, like this:

getelementptr double, double* %a, <8 x i32> %i
(It fails to create a "sext" from <8 x i32> to <8 x i64>)

I fixed it and added some tests.

Differential Revision: http://reviews.llvm.org/D14485

llvm-svn: 253162
2015-11-15 08:19:35 +00:00
James Molloy
c1250c50be [InstCombine] Add trivial folding (bitreverse (bitreverse x)) -> x
There are plenty more instcombines we could probably do with bitreverse, but this seems like a very obvious and trivial starting point and was brought up by Hal in his review.

llvm-svn: 252879
2015-11-12 12:39:41 +00:00
David Majnemer
d5f26284b9 [InstCombine] Teach FoldPHIArgZextsIntoPHI about EHPads
FoldPHIArgZextsIntoPHI cannot insert an instruction after the PHI if
there is an EHPad in the BB.  Doing so would result in an instruction
inserted after a terminator.

llvm-svn: 252377
2015-11-07 00:52:53 +00:00
David Majnemer
48f3ee66bd [InstCombine] Don't insert an instruction after a terminator
We tried to insert a cast of a phi in a block whose terminator is an
EHPad.  This is invalid.  Do not attempt the transform in these
circumstances.

llvm-svn: 252370
2015-11-06 23:59:23 +00:00
David Majnemer
9ffc9c11c7 [InstCombine] Don't RAUW tokens with undef
Let SimplifyCFG remove unreachable BBs which define token instructions.

llvm-svn: 252343
2015-11-06 21:26:32 +00:00
Peter Collingbourne
5b721561aa DI: Reverse direction of subprogram -> function edge.
Previously, subprograms contained a metadata reference to the function they
described. Because most clients need to get or set a subprogram for a given
function rather than the other way around, this created unneeded inefficiency.

For example, many passes needed to call the function llvm::makeSubprogramMap()
to build a mapping from functions to subprograms, and the IR linker needed to
fix up function references in a way that caused quadratic complexity in the IR
linking phase of LTO.

This change reverses the direction of the edge by storing the subprogram as
function-level metadata and removing DISubprogram's function field.

Since this is an IR change, a bitcode upgrade has been provided.

Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is
attached to the PR.

Differential Revision: http://reviews.llvm.org/D14265

llvm-svn: 252219
2015-11-05 22:03:56 +00:00
Davide Italiano
c3b20ee04f [SimplifyLibCalls] New transformation: tan(atan(x)) -> x
This is enabled only under -ffast-math.
So, instead of emitting:
  4007b0:       50                      push   %rax
  4007b1:       e8 8a fd ff ff          callq  400540 <atanf@plt>
  4007b6:       58                      pop    %rax
  4007b7:       e9 94 fd ff ff          jmpq   400550 <tanf@plt>
  4007bc:       0f 1f 40 00             nopl   0x0(%rax)

for:
float mytan(float x) {
  return tanf(atanf(x));
}
we emit a single retq.

Differential Revision:	 http://reviews.llvm.org/D14302

llvm-svn: 252098
2015-11-04 23:36:56 +00:00
Davide Italiano
063a880856 [SimplifyLibCalls] Add a new transformation: pow(exp(x), y) -> exp(x*y)
This one is enabled only under -ffast-math (due to rounding/overflows)
but allows us to emit shorter code.

Before (on FreeBSD x86-64):
4007f0:       50                      push   %rax
4007f1:       f2 0f 11 0c 24          movsd  %xmm1,(%rsp)
4007f6:       e8 75 fd ff ff          callq  400570 <exp2@plt>
4007fb:       f2 0f 10 0c 24          movsd  (%rsp),%xmm1
400800:       58                      pop    %rax
400801:       e9 7a fd ff ff          jmpq   400580 <pow@plt>
400806:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
40080d:       00 00 00

After:
4007b0:       f2 0f 59 c1             mulsd  %xmm1,%xmm0
4007b4:       e9 87 fd ff ff          jmpq   400540 <exp2@plt>
4007b9:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

Differential Revision:	http://reviews.llvm.org/D14045

llvm-svn: 251976
2015-11-03 20:32:23 +00:00
Tim Northover
58717c5330 TvOS: add missing support for some libcalls.
llvm-svn: 251811
2015-11-02 18:00:00 +00:00
Artur Pilipenko
0dd1f670a9 Preserve load alignment and dereferenceable metadata during some transformations
Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D13953

llvm-svn: 251809
2015-11-02 17:53:51 +00:00
Davide Italiano
572717843f [SimplifyLibCalls] Add test to ensure transform is not executed if fast-math
attribute is not present.

During my refactor in r251595 I changed the behavior of optimizeSqrt(),
skipping the transformation if the function wasn't marked with unsafe-fp-math
attribute. This fixed a bug, as confirmed by Sanjay (before the optimization
was silently executed anyway), although it wasn't my primary aim.
This commit adds a test to ensure the code doesn't break again.

Reported by: Marcello Maggioni
Discussed with: Sanjay Patel

llvm-svn: 251747
2015-10-31 20:59:32 +00:00
Silviu Baranga
8fa79f5d2b [InstCombine] Teach instcombine not to create extra PHI nodes when folding GEPs
Summary:
InstCombine tries to transform GEP(PHI(GEP1, GEP2, ..)) into GEP(GEP(PHI(...))
when possible. However, this may leave the old PHI node around. Even if we
do end up folding the GEPs, having an extra PHI node might not be beneficial.

This change makes the transformation more conservative. We now only do this if
the PHI has only one use, and can therefore be removed after the transformation.

Reviewers: jmolloy, majnemer

Subscribers: mcrosier, mssimpso, llvm-commits

Differential Revision: http://reviews.llvm.org/D13887

llvm-svn: 251281
2015-10-26 10:25:05 +00:00
Hal Finkel
1a64f66683 Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify
First, the motivation: LLVM currently does not realize that:

  ((2072 >> (L == 0)) >> 7) & 1 == 0

where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the
lowest-order bit is always zero. There are obviously several ways to go about
fixing this, but the generic solution pursued in this patch is to teach
computeKnownBits something about shifts by a non-constant amount. Previously,
we would give up completely on these. Instead, in cases where we know something
about the low-order bits of the shift-amount operand, we can combine (and
together) the associated restrictions for all shift amounts consistent with
that knowledge. As a further generalization, I refactored all of the logic for
all three kinds of shifts to have this capability. This works well in the above
case, for example, because the dynamic shift amount can only be 0 or 1, and
thus we can say a lot about the known bits of the result.

This brings us to the second part of this change: Even when we know all of the
bits of a value via computeKnownBits, nothing used to constant-fold the result.
This introduces the necessary code into InstCombine and InstSimplify. I've
added it into both because:

  1. InstCombine won't automatically pick up the associated logic in
     InstSimplify (InstCombine uses InstSimplify, but not via the API that
     passes in the original instruction).

  2. Putting the logic in InstCombine allows the resulting simplifications to become
     part of the iterative worklist

  3. Putting the logic in InstSimplify allows the resulting simplifications to be
     used by everywhere else that calls SimplifyInstruction (inlining, unrolling,
     and many others).

And this requires a small change to our definition of an ephemeral value so
that we don't break the rest case from r246696 (where the icmp feeding the
@llvm.assume, is also feeding a br). Under the old definition, the icmp would
not be considered ephemeral (because it is used by the br), but this causes the
assume to remove itself (in addition to simplifying the branch structure), and
it seems more-useful to prevent that from happening.

llvm-svn: 251146
2015-10-23 20:37:08 +00:00
Michael Liao
9a38e95740 [InstCombine] Revise the test case to match full sequene
llvm-svn: 250950
2015-10-21 21:50:58 +00:00
Michael Liao
5efcb99302 [InstCombine] Optimize icmp of inc/dec at RHS
Allow LLVM to optimize the sequence like the following:

  %inc = add nsw i32 %i, 1
  %cmp = icmp slt %n, %inc

into:

  %cmp = icmp sle i32 %n, %i

The case is not handled previously due to the complexity of compuation of %n.
Hence, LLVM cannot swap operands of icmp accordingly.

llvm-svn: 250746
2015-10-19 22:08:14 +00:00
Simon Pilgrim
2df3161736 [InstCombine] SSE4A constant folding and conversion to shuffles.
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:

1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling

Differential Revision: http://reviews.llvm.org/D13348

llvm-svn: 250609
2015-10-17 11:40:05 +00:00
Philip Reames
1bb1df8eb7 Tighten known bits for ctpop based on zero input bits
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.

The basic idea here is that input bits that are known zero reduce the maximum count that the intrinsic could return. We know that the number of bits required to represent a particular count is at most log2(N)+1.

Differential Revision: http://reviews.llvm.org/D13253

llvm-svn: 250338
2015-10-14 22:42:12 +00:00
Simon Pilgrim
5a441530c3 [InstCombine][SSE4A] Remove broken INSERTQI range combining optimization
As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index.

The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization.

llvm-svn: 250160
2015-10-13 14:48:54 +00:00
Simon Pilgrim
977c9904f0 [InstCombine] Tidied up SSE4A tests.
First stage of bugfix discussed in D13348

llvm-svn: 250121
2015-10-12 23:07:06 +00:00
Simon Pilgrim
91bf14e39c [InstCombine][X86][XOP] Combine XOP integer vector comparisons to native IR
We now have lowering support for XOP PCOM/PCOMU instructions.

llvm-svn: 249977
2015-10-11 14:38:34 +00:00
Sanjay Patel
0bf5df4240 [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886)
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886

Without this IR transform, the backend (x86 at least) was producing inefficient code.

This patch is making 2 assumptions:

    1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
    2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.

Differential Revision: http://reviews.llvm.org/D13076

llvm-svn: 249702
2015-10-08 17:09:31 +00:00
Sanjay Patel
172d0b6bb7 [ValueTracking] teach computeKnownBits that a fabs() clears sign bits
This was requested in D13076: if we're going to canonicalize to fabs(), ValueTracking
should know that fabs() clears sign bits.

In this patch (as in D13076), we're not handling vectors yet even though computeKnownBits'
fabs() case itself should be vector-ready via the splat in this patch. 
Fixing this will require follow-on patches to correct other logic that uses 'getScalarType'.

Differential Revision: http://reviews.llvm.org/D13222

llvm-svn: 249701
2015-10-08 16:56:55 +00:00
Artur Pilipenko
8b3cc14fdc Teach computeKnownBits to use new align attribute/metadata
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D13470

llvm-svn: 249557
2015-10-07 16:01:18 +00:00
Hans Wennborg
b27c6dcee4 InstCombine: Fold comparisons between unguessable allocas and other pointers
This will allow us to optimize code such as:

  int f(int *p) {
    int x;
    return p == &x;
  }

as well as:

  int *allocate(void);
  int f() {
    int x;
    int *p = allocate();
    return p == &x;
  }

The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, comparison should return true even though p is an invalid
pointer.

This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.

Differential Revision: http://reviews.llvm.org/D13358

llvm-svn: 249490
2015-10-07 00:20:07 +00:00
Philip Reames
ca99cf7d94 Extend known bits to understand @llvm.bswap
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.

When writing tests, I was surprised to find that instsimplify apparently doesn't know how to collapse bit test sequences based purely on known bits. This required me to split my tests across both instsimplify and instcombine.

Differential Revision: http://reviews.llvm.org/D13250

llvm-svn: 249453
2015-10-06 20:20:45 +00:00
Andrea Di Biagio
e88e150aaa [InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922)
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.

Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.

This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.

Fixes PR24922.

Differential Revision: http://reviews.llvm.org/D13219

llvm-svn: 249390
2015-10-06 10:34:53 +00:00
Bruno Cardoso Lopes
f80e20287d [SimplifyLibCalls] Fix instruction misplacement in string/memory libcall optimization
When trying to optimize fortified library functions use the right
location to insert new instructions in order to preserve correct
def-use order.

This fixes an issue where a misplaced instruction definition would
happen to be *after* one of its use after a RAUW, forming invalid IR.
This behavior was introduced by r227250.

Differential Revision: http://reviews.llvm.org/D13301

rdar://problem/22802369

llvm-svn: 249092
2015-10-01 22:43:53 +00:00
Arnaud A. de Grandmaison
3d6b7ea719 [InstCombine] Remove trivially empty lifetime start/end ranges.
Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:

    void foo(char *, char *);
    void bar(int Size, bool flag) {
      for (int i = 0; i < Size; ++i) {
        char text[1];
        char buff[1];
        if (flag)
          foo(text, buff); // BBFoo
      }
    }

the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.

This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).

This fixes PR24598: excessive compile time after r234581.

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13305

llvm-svn: 249018
2015-10-01 14:54:31 +00:00
Andrea Di Biagio
71ee17fea4 [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant.
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.

Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.

We may end up with byte shuffles with constant masks as the result of inlining.

Differential Revision: http://reviews.llvm.org/D13252

llvm-svn: 248913
2015-09-30 16:44:39 +00:00
Jeroen Ketema
b9ecf8a3ee [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Simon Pilgrim
95228d060e [InstCombine] Improve Vector Demanded Bits Through Bitcasts
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.

This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector).

This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.

Differential Revision: http://reviews.llvm.org/D12935

llvm-svn: 248784
2015-09-29 08:19:11 +00:00
Sanjay Patel
c39648a72a [InstCombine] fold zexts and constants into a phi (PR24766)
This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766

We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:

#include <stdbool.h>

bool switchy(char x1, char x2, char condition) {
   bool conditionMet = false;
   switch (condition) {
   case 0: conditionMet = (x1 == x2); break;
   case 1: conditionMet = (x1 <= x2); break;
   }
   return conditionMet;
}

bool switchy2(char x1, char x2, char condition) {
   switch (condition) {
   case 0: return (x1 == x2);
   case 1: return (x1 <= x2);
   }
  return false;
}

As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi 
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it, 
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.

Differential Revision: http://reviews.llvm.org/D12866

llvm-svn: 248689
2015-09-27 20:34:31 +00:00
Simon Pilgrim
a06843cf1a [InstCombine] Removed unnecessary meta attributes.
llvm-svn: 248672
2015-09-26 17:49:04 +00:00
Chen Li
11db69b00f [Bug 24848] Use range metadata to constant fold comparisons between two values
Summary:
This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.

If both operands of a comparison have range metadata, they should be used to constant fold the comparison.

Reviewers: sanjoy, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13177

llvm-svn: 248650
2015-09-26 03:26:47 +00:00
Sanjay Patel
61ecb7fad2 [InstCombine] match De Morgan's Law hidden by zext ops (PR22723)
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723

My first attempt at this was to change what I thought was the root problem:

xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32

...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!

My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I 
abandoned that idea.

I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):

bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb	%sil, %dil
xorb	$1, %dil
movb	%dil, %al
retq

Differential Revision: http://reviews.llvm.org/D12705

llvm-svn: 248634
2015-09-25 23:21:38 +00:00
Charlie Turner
4305b8ca2a [InstCombine] Recognize another bswap idiom.
Summary:
The byte-swap recognizer can now notice that this

```
uint32_t bswap(uint32_t x)
{
  x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
  x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;
  return x;
}
```
    
is a bswap. Fixes PR23863.

Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin

Subscribers: majnemer, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D12637

llvm-svn: 248482
2015-09-24 10:24:58 +00:00
Akira Hatanaka
e7bdc0a349 [InstCombine] Preserve metadata when merging loads that are phi
arguments.

Make sure InstCombiner::FoldPHIArgLoadIntoPHI doesn't drop the following
metadata:

MD_tbaa
MD_alias_scope
MD_noalias
MD_invariant_load
MD_nonnull
MD_range

rdar://problem/17617709

Differential Revision: http://reviews.llvm.org/D12710

llvm-svn: 248419
2015-09-23 18:40:57 +00:00
Chen Li
7331c90075 [Bug 24848] Use range metadata to constant fold comparisons with constant values
Summary:
This is the first part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.

When range metadata is provided, it should be used to constant fold comparisons with constant values.

Reviewers: sanjoy, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12988

llvm-svn: 248402
2015-09-23 17:58:44 +00:00
David Majnemer
906a17fb34 [InstCombine] FoldICmpCstShrCst failed for ashr when comparing against -1
(icmp eq (ashr C1, %V) -1) may have multiple answers if C1 is not a
power of two and has the sign bit set.

This fixes PR24873.

llvm-svn: 248074
2015-09-19 00:48:31 +00:00
Simon Pilgrim
76acca71c9 [InstCombine] Added vector demanded bits support for SSE4A EXTRQ/INSERTQ instructions
The SSE4A instructions EXTRQ/INSERTQ only use the lower 64-bits (or less) for many of their input vector operands and all of them have undefined upper 64-bits results.

Differential Revision: http://reviews.llvm.org/D12680

llvm-svn: 247934
2015-09-17 20:32:45 +00:00
Sanjoy Das
cbeb36a4cc [InstCombine] Optimize icmp slt signum(x), 1 --> icmp slt x, 1
Summary:
`signum(x)` is sometimes implemented as `(x >> 63) | (-x >>> 63)` (for
an `i64` `x`).  This change adds a matcher for that pattern, and an
instcombine rule to optimize `signum(x) s< 1`.

Later, we can also consider optimizing:

  icmp slt signum(x), 0 --> icmp slt x, 0
  icmp sle signum(x), 1 --> true

etc.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12703

llvm-svn: 247846
2015-09-16 20:41:29 +00:00
Arch D. Robison
f5010c4ba7 Broaden optimization of fcmp ([us]itofp x, constant) by instcombine.
The patch extends the optimization to cases where the constant's
magnitude is so small or large that the rounding of the conversion
is irrelevant.  The "so small" case includes negative zero.

Differential review: http://reviews.llvm.org/D11210

llvm-svn: 247708
2015-09-15 17:51:59 +00:00
Chen Li
7a880213bc [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG. It also adds assertions in isKnownNonNull() and isKnownNonNullFromDominatingCondition() to make sure the value checked is pointer type (as defined in LLVM document). These assertions might trip failures in things which are not  covered under llvm/test, but fixes should be pretty obvious. 

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12779

llvm-svn: 247587
2015-09-14 18:10:43 +00:00
Simon Pilgrim
4f410cc25c [InstCombine] CVTPH2PS Vector Demanded Elements + Constant Folding
Improved InstCombine support for CVTPH2PS (F16C half 2 float conversion):

<4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>) - only uses the bottom 4 i16 elements for the conversion.

Added constant folding support.

Differential Revision: http://reviews.llvm.org/D12731

llvm-svn: 247504
2015-09-12 13:39:53 +00:00
David Blaikie
65b92c4f37 [opaque pointer type] Add textual IR support for explicit type parameter for global aliases
update.py:
import fileinput
import sys
import re

alias_match_prefix = r"(.*(?:=|:|^)\s*(?:external |)(?:(?:private|internal|linkonce|linkonce_odr|weak|weak_odr|common|appending|extern_weak|available_externally) )?(?:default |hidden |protected )?(?:dllimport |dllexport )?(?:unnamed_addr |)(?:thread_local(?:\([a-z]*\))? )?alias"
plain = re.compile(alias_match_prefix + r" (.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|addrspacecast|\[\[[a-zA-Z]|\{\{).*$)")
cast  = re.compile(alias_match_prefix + r") ((?:bitcast|inttoptr|addrspacecast)\s*\(.* to (.*?)(| addrspace\(\d+\) *)\*\)\s*(?:;.*)?$)")
gep   = re.compile(alias_match_prefix + r") ((?:getelementptr)\s*(?:inbounds)?\s*\((?P<type>.*), (?P=type)(?:\s*addrspace\(\d+\)\s*)?\* .*\)\s*(?:;.*)?$)")

def conv(line):
  m = re.match(cast, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(gep, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(plain, line)
  if m:
    return m.group(1) + ", " + m.group(2) + m.group(3) + "*" + m.group(4) + "\n"
  return line

for line in sys.stdin:
  sys.stdout.write(conv(line))

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

llvm-svn: 247378
2015-09-11 03:22:04 +00:00
Mehdi Amini
2a8d4fb24b Revert "[InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite"
This reverts commit r247356.

Breaks test/Transforms/InstCombine/pr8547.ll with:

Wrong types for attribute: byval inalloca nest noalias nocapture nonnull readnone readonly sret dereferenceable(1) dereferenceable_or_null(1)
  %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str, i64 0, i64 0), i32 nonnull %conv2) #0
LLVM ERROR: Broken function found, compilation aborted!

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247371
2015-09-11 01:33:48 +00:00
Chen Li
b3a586f07f [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12779

llvm-svn: 247356
2015-09-10 23:04:49 +00:00
Chen Li
040b0ed807 [InstCombineCalls] Use isKnownNonNullAt() to check nullness of gc.relocate return value
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of gc.relocate return value. In this way it can handle cases where the relocated value does not have nonnull attribute but has a dominating null check from the CFG.

Reviewers: reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D12772

llvm-svn: 247353
2015-09-10 22:35:41 +00:00
Jakub Kuderski
1c398c712b There is a trunc(lshr (zext A), Cst) optimization in InstCombineCasts that
removes cast by performing the lshr on smaller types. However, currently there
is no trunc(lshr (sext A), Cst) variant.
This patch add such optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.

Differential Revision: http://reviews.llvm.org/D12520

llvm-svn: 247271
2015-09-10 11:31:20 +00:00
David Majnemer
3926c488cf Revert trunc(lshr (sext A), Cst) to ashr A, Cst
This reverts commit r246997, it introduced a regression (PR24763).

llvm-svn: 247180
2015-09-09 20:20:08 +00:00
Benjamin Kramer
2c58ab02a8 Merge or combine tests and convert to FileCheck.
- Move tests only exercising instsimplify to instsimplify's apint-or.ll
- Actually test the CHECK lines in instsimplify's apint-or.ll
- Merge the remaining tests in apint-or1.ll and apint-or2.ll, use FileCheck

llvm-svn: 247045
2015-09-08 18:36:56 +00:00
Sanjay Patel
044ebbbd9b add tests for De Morgan instcombines based on PR22723
llvm-svn: 247040
2015-09-08 18:13:03 +00:00
Sanjay Patel
4714c9c846 fix typos, remove noise; NFCI
llvm-svn: 247035
2015-09-08 17:58:22 +00:00
Jakub Kuderski
92dba72884 There is a trunc(lshr (zext A), Cst) optimization in InstCombineCasts that
removes cast by performing the lshr on smaller types. However, currently there
is no trunc(lshr (sext A), Cst) variant.
This patch add such optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.

Differential Revision: http://reviews.llvm.org/D12520

llvm-svn: 246997
2015-09-08 10:03:17 +00:00
Sanjay Patel
3529fac16b add missing regression tests for De Morgan's Law transform in InstCombine
llvm-svn: 246973
2015-09-07 19:00:38 +00:00
David Majnemer
5e6c715cd2 [InstCombine] Don't divide by zero when evaluating a potential transform
Trivial multiplication by zero may survive the worklist.  We tried to
reassociate the multiplication with a division instruction, causing us
to divide by zero; bail out instead.

This fixes PR24726.

llvm-svn: 246939
2015-09-06 06:49:59 +00:00
David Majnemer
5cca152072 [InstCombine] Don't assume m_Mul gives back an Instruction
This fixes PR24713.

llvm-svn: 246933
2015-09-05 20:44:56 +00:00