1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 14:02:52 +02:00
Commit Graph

2131 Commits

Author SHA1 Message Date
Sanjay Patel
4b3c5fd7a2 minimize regression tests and update checks
llvm-svn: 274046
2016-06-28 18:33:10 +00:00
Artur Pilipenko
e2ddc2857d Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 274043
2016-06-28 18:27:25 +00:00
Sanjay Patel
88dfc37dab [InstCombine] shrink type of sdiv if dividend is sexted and constant divisor is small enough (PR28153)
This should fix PR28153:
https://llvm.org/bugs/show_bug.cgi?id=28153

Differential Revision: http://reviews.llvm.org/D21769

llvm-svn: 273951
2016-06-27 22:27:11 +00:00
Sanjay Patel
26d02081c6 add tests for PR28153
llvm-svn: 273936
2016-06-27 20:28:59 +00:00
Sanjay Patel
7b6c294043 [InstCombine] use m_APInt for div --> ashr fold
The APInt matcher works with splat vectors, so we get this fold for vectors too.

llvm-svn: 273897
2016-06-27 17:25:57 +00:00
Artur Pilipenko
f9a0655273 Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures.
llvm-svn: 273895
2016-06-27 16:54:33 +00:00
Artur Pilipenko
5d29d9eab5 Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 273892
2016-06-27 16:29:26 +00:00
Igor Breger
859963f633 [ConstantFolding] Fix bitcast vector of i1.
Differential Revision: http://reviews.llvm.org/D21735

llvm-svn: 273845
2016-06-27 06:42:54 +00:00
Sanjay Patel
17b343e65b add tests for potential select transforms
llvm-svn: 273833
2016-06-26 23:44:21 +00:00
Sanjay Patel
7be14224bc update tests to use FileCheck
llvm-svn: 273784
2016-06-25 17:39:10 +00:00
Reid Kleckner
c875c8fffd Revert "InstCombine rule to fold trunc when value available"
This reverts commit r273608.

Broke building code with sanitizers, where apparently these kinds of
loads, casts, and truncations are common:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/24502
http://crbug.com/623099

llvm-svn: 273703
2016-06-24 18:42:58 +00:00
Sanjay Patel
4ac2926595 [InstCombine] consolidate commutation variants of matchSelectFromAndOr() in one place; NFCI
By putting all the possible commutations together, we simplify the code.
Note that this is NFCI, but I'm adding tests that actually exercise each
commutation pattern because we don't have this anywhere else.

llvm-svn: 273702
2016-06-24 18:26:02 +00:00
Anna Thomas
7e0ac12473 InstCombine rule to fold trunc when value available
Summary:
This instcombine rule folds away trunc operations that have value available from a prior load or store.
This kind of code can be generated as a result of GVN widening the load or from source code as well.

Reviewers: reames, majnemer, sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D21246

llvm-svn: 273608
2016-06-23 20:22:22 +00:00
Sanjay Patel
1977275dcd [InstSimplify] analyze (optionally casted) icmps to eliminate obviously false logic (PR27869)
By moving this transform to InstSimplify from InstCombine, we sidestep the problem/question
raised by PR27869:
https://llvm.org/bugs/show_bug.cgi?id=27869
...where InstCombine turns an icmp+zext into a shift causing us to miss the fold.

Credit to David Majnemer for a draft patch of the changes to InstructionSimplify.cpp.

Differential Revision: http://reviews.llvm.org/D21512

llvm-svn: 273200
2016-06-20 20:59:59 +00:00
Matt Arsenault
2da7aad0af InstCombine: Don't strip convergent from intrinsic callsites
Specific instances of intrinsic calls may want to be convergent, such
as certain register reads but the intrinsic declaration is not.

llvm-svn: 273188
2016-06-20 19:04:44 +00:00
Sanjay Patel
1a3ac185b2 [InstCombine] consolidate some icmp+logic tests and improve checks
llvm-svn: 273186
2016-06-20 18:40:37 +00:00
Sanjay Patel
93d8c284d1 [InstCombine] update to use FileCheck with autogenerated exact checking
llvm-svn: 273180
2016-06-20 18:23:40 +00:00
Sanjay Patel
c7bc4ff700 [InstCombine] update to use FileCheck with autogenerated exact checking
llvm-svn: 273173
2016-06-20 17:56:13 +00:00
Sanjay Patel
df106547a0 [InstCombine] regenerate checks
llvm-svn: 273170
2016-06-20 17:48:48 +00:00
Matt Arsenault
c2d726b822 Add looping testcase that broke in r272987
llvm-svn: 273081
2016-06-18 05:15:58 +00:00
Matt Arsenault
b3e867c3ad Revert "Revert "Revert "InstCombine: Reduce trunc (shl x, K) width."""
This seems to be causing an infinite loop / crash in instcombine
on some bots.

llvm-svn: 273069
2016-06-17 23:36:38 +00:00
Matt Arsenault
f5834050fa Revert "Revert "InstCombine: Reduce trunc (shl x, K) width.""
Reapply r272987. Condition should be in terms of the destination type,
and the flags should not be copied.

llvm-svn: 273045
2016-06-17 20:33:53 +00:00
Sanjay Patel
981e90bafa [InstCombine] allow more than one use for vector bitcast folding with selects
The motivating example for this transform is similar to D20774 where bitcasts interfere
with a single cmp/select sequence, but in this case we have 2 uses of each bitcast to 
produce min and max ops:

define void @minmax_bc_store(<4 x float> %a, <4 x float> %b, <4 x float>* %ptr1, <4 x float>* %ptr2) {
  %cmp = fcmp olt <4 x float> %a, %b
  %bc1 = bitcast <4 x float> %a to <4 x i32>
  %bc2 = bitcast <4 x float> %b to <4 x i32>
  %sel1 = select <4 x i1> %cmp, <4 x i32> %bc1, <4 x i32> %bc2
  %sel2 = select <4 x i1> %cmp, <4 x i32> %bc2, <4 x i32> %bc1
  %bc3 = bitcast <4 x float>* %ptr1 to <4 x i32>*
  store <4 x i32> %sel1, <4 x i32>* %bc3
  %bc4 = bitcast <4 x float>* %ptr2 to <4 x i32>*
  store <4 x i32> %sel2, <4 x i32>* %bc4
  ret void
}

With this patch, we move the selects up to use the input args which allows getting rid of
all of the bitcasts:

define void @minmax_bc_store(<4 x float> %a, <4 x float> %b, <4 x float>* %ptr1, <4 x float>* %ptr2) {
  %cmp = fcmp olt <4 x float> %a, %b
  %sel1.v = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
  %sel2.v = select <4 x i1> %cmp, <4 x float> %b, <4 x float> %a
  store <4 x float> %sel1.v, <4 x float>* %ptr1, align 16
  store <4 x float> %sel2.v, <4 x float>* %ptr2, align 16
  ret void
}

The asm for x86 SSE then improves from:

movaps  %xmm0, %xmm2
cmpltps %xmm1, %xmm2
movaps  %xmm2, %xmm3
andnps  %xmm1, %xmm3
movaps  %xmm2, %xmm4
andnps  %xmm0, %xmm4
andps %xmm2, %xmm0
orps  %xmm3, %xmm0
andps %xmm1, %xmm2
orps  %xmm4, %xmm2
movaps  %xmm0, (%rdi)
movaps  %xmm2, (%rsi)

To:

movaps  %xmm0, %xmm2
minps %xmm1, %xmm2
maxps %xmm0, %xmm1
movaps  %xmm2, (%rdi)
movaps  %xmm1, (%rsi)

The TODO comments show that we're limiting this transform only to vectors and only to bitcasts
because we need to improve other transforms or risk creating worse codegen.

Differential Revision: http://reviews.llvm.org/D21190

llvm-svn: 273011
2016-06-17 16:46:50 +00:00
Matt Arsenault
6251ace8d8 Revert "InstCombine: Reduce trunc (shl x, K) width."
This reverts commit r272987.

This might be causing crashes on some bots.

llvm-svn: 272990
2016-06-17 06:28:53 +00:00
Matt Arsenault
123f827fb9 InstCombine: Reduce trunc (shl x, K) width.
llvm-svn: 272987
2016-06-17 04:43:22 +00:00
Eli Friedman
39d15bed50 [InstCombine] Don't widen metadata on store-to-load forwarding
The original check for load CSE or store-to-load forwarding is wrong
when the forwarded stored value happened to be a load.

Ref https://github.com/JuliaLang/julia/issues/16894

Differential Revision: http://reviews.llvm.org/D21271

Patch by Yichao Yu!

llvm-svn: 272868
2016-06-16 02:33:42 +00:00
David Majnemer
89fe9b6eda [TargetLibraryInfo] Teach isValidProtoForLibFunc about tan
We would fail to validate the type of the tan function which would cause
downstream users of isValidProtoForLibFunc to assert.

This fixes PR28143.

llvm-svn: 272802
2016-06-15 16:47:23 +00:00
David Majnemer
c6df3d773b Remove the ScalarReplAggregates pass
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM.  LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.

Differential Revision: http://reviews.llvm.org/D21316

llvm-svn: 272737
2016-06-15 00:19:09 +00:00
Simon Pilgrim
3b8db9c327 [InstCombine][AVX2] Add support for simplifying AVX2 per-element shifts to native shifts
Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit).

If the shift amount is constant we can sometimes convert these instructions to native shifts:

1 - if all shift amounts are in range then the conversion is trivial.
2 - out of range arithmetic shifts can be clamped to the (bitwidth - 1) (a legal shift amount) before conversion.
3 - logical shifts just return zero if all elements have out of range shift amounts.

In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts.

Differential Revision: http://reviews.llvm.org/D19675

llvm-svn: 271996
2016-06-07 10:27:15 +00:00
Simon Pilgrim
6a9d532b78 [InstCombine][SSE] Add MOVMSK constant folding (PR27982)
This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions.

The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs.

Differential Revision: http://reviews.llvm.org/D20998

llvm-svn: 271990
2016-06-07 08:18:35 +00:00
Michael Kuperstein
247caf8b05 [InstCombine] scalarizePHI should not assume the code it sees has been CSE'd
scalarizePHI only looked for phis that have exactly two uses - the "latch"
use, and an extract. Unfortunately, we can not assume all equivalent extracts
are CSE'd, since InstCombine itself may create an extract which is a duplicate
of an existing one. This extends it to handle several distinct extracts from
the same index.

This should fix at least some of the  performance regressions from PR27988.

Differential Revision: http://reviews.llvm.org/D20983

llvm-svn: 271961
2016-06-06 23:38:33 +00:00
Sanjay Patel
6ec19288fa [InstCombine] limit icmp transform to ConstantInt (PR28011)
In r271810 ( http://reviews.llvm.org/rL271810 ), I loosened the check
above this to work for any Constant rather than ConstantInt. AFAICT, 
that part makes sense if we can determine that the shrunken/extended 
constant remained equal. But it doesn't make sense for this later 
transform where we assume that the constant DID change. 

This could assert for a ConstantExpr:
https://llvm.org/bugs/show_bug.cgi?id=28011

And it could be wrong for a vector as shown in the added regression test.

llvm-svn: 271908
2016-06-06 16:56:57 +00:00
Sanjay Patel
810d7e88a4 regenerate checks
llvm-svn: 271904
2016-06-06 16:03:06 +00:00
Sanjay Patel
95a34fb288 regenerate checks
llvm-svn: 271903
2016-06-06 15:55:00 +00:00
Sanjoy Das
7a0f514448 Add safety check to InstCombiner::commonIRemTransforms
Since FoldOpIntoPhi speculates the binary operation to potentially each
of the predecessors of the PHI node (pulling it out of arbitrary control
dependence in the process), we can FoldOpIntoPhi only if we know the
operation doesn't have UB.

This also brings up an interesting profitability question -- the way it
is written today, commonIRemTransforms will hoist out work from
dynamically dead code into code that will execute at runtime.  Perhaps
that isn't the best canonicalization?

Fixes PR27968.

llvm-svn: 271857
2016-06-05 21:17:04 +00:00
Sanjoy Das
feb879a718 Add test case for InstCombiner::commonIRemTransforms; NFC
The PHI case in commonIRemTransforms was untested; add a trivial test
case.

llvm-svn: 271856
2016-06-05 21:17:00 +00:00
Sanjay Patel
1709e61793 fix checks
update_test_checks.py got confused matching the variable names. 

llvm-svn: 271844
2016-06-05 17:54:56 +00:00
Sanjay Patel
57c134b91a [InstCombine] allow vector icmp bool transforms
llvm-svn: 271843
2016-06-05 17:49:45 +00:00
Sanjay Patel
b11050c641 add tests to show missing vector transforms
llvm-svn: 271842
2016-06-05 17:32:58 +00:00
Sanjay Patel
b1c8f8ce55 regenerate checks
llvm-svn: 271841
2016-06-05 17:29:45 +00:00
Sanjay Patel
605061c514 update test to use FileCheck
llvm-svn: 271840
2016-06-05 17:13:09 +00:00
Sanjay Patel
24556cdcec update test to use FileCheck
llvm-svn: 271838
2016-06-05 16:41:20 +00:00
Sanjay Patel
7c1912623f update test to FileCheck
llvm-svn: 271837
2016-06-05 16:29:15 +00:00
Sanjay Patel
dad6c47e6b [InstCombine] allow vector constants for cast+icmp fold
This is step 1 of unknown towards fixing PR28001:
https://llvm.org/bugs/show_bug.cgi?id=28001

llvm-svn: 271810
2016-06-04 22:04:05 +00:00
Sanjay Patel
a7b7945972 [InstCombine] add test for missing vector optimization
llvm-svn: 271808
2016-06-04 21:41:25 +00:00
Sanjay Patel
b44cf32fc9 [InstCombine] add test for missing vector optimization
llvm-svn: 271806
2016-06-04 21:20:03 +00:00
Sanjay Patel
9fa8579461 [InstCombine] minimize test case and use FileCheck
llvm-svn: 271805
2016-06-04 21:04:59 +00:00
Simon Pilgrim
c995ee7b75 [InstCombine][MMX] Extend SimplifyDemandedUseBits MOVMSK support to MMX
Add the MMX implementation to the SimplifyDemandedUseBits SSE/AVX MOVMSK support added in D19614

Requires a minor tweak as llvm.x86.mmx.pmovmskb takes a x86_mmx argument - so we have to be explicit about the implied v8i8 vector type.

llvm-svn: 271789
2016-06-04 13:42:46 +00:00
Sanjay Patel
500fe712ac [InstCombine] look through bitcasts to find selects
There was concern that creating bitcasts for the simpler potential select pattern:

define <2 x i64> @vecBitcastOp1(<4 x i1> %cmp, <2 x i64> %a) {
  %a2 = add <2 x i64> %a, %a
  %sext = sext <4 x i1> %cmp to <4 x i32>
  %bc = bitcast <4 x i32> %sext to <2 x i64>
  %and = and <2 x i64> %a2, %bc
  ret <2 x i64> %and
}

might lead to worse code for some targets, so this patch is matching the larger
patterns seen in the test cases.

The motivating example for this patch is this IR produced via SSE intrinsics in C:

define <2 x i64> @gibson(<2 x i64> %a, <2 x i64> %b) {
  %t0 = bitcast <2 x i64> %a to <4 x i32>
  %t1 = bitcast <2 x i64> %b to <4 x i32>
  %cmp = icmp sgt <4 x i32> %t0, %t1
  %sext = sext <4 x i1> %cmp to <4 x i32>
  %t2 = bitcast <4 x i32> %sext to <2 x i64>
  %and = and <2 x i64> %t2, %a
  %neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
  %neg2 = bitcast <4 x i32> %neg to <2 x i64>
  %and2 = and <2 x i64> %neg2, %b
  %or = or <2 x i64> %and, %and2
  ret <2 x i64> %or
}

For an AVX target, this is currently:

vpcmpgtd  %xmm1, %xmm0, %xmm2
vpand     %xmm0, %xmm2, %xmm0
vpandn    %xmm1, %xmm2, %xmm1
vpor      %xmm1, %xmm0, %xmm0
retq

With this patch, it becomes:

vpmaxsd   %xmm1, %xmm0, %xmm0

Differential Revision: http://reviews.llvm.org/D20774

llvm-svn: 271676
2016-06-03 14:42:07 +00:00
Sanjay Patel
009edf13a8 [InstCombine] change tests to show a more obvious transform possibility
The original tests were intended to show a missing transform that would
be solved by D20774:
http://reviews.llvm.org/D20774

But it's not clear that the transform for the simpler tests is a win for
all targets. Make the tests show a larger pattern that should be a win
regardless of the cost of bitcast instructions.

llvm-svn: 271603
2016-06-02 22:45:49 +00:00