1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

3789 Commits

Author SHA1 Message Date
Nadav Rotem
1e6246b38c SLPVectorize: Replace the code that checks for vectorization candidates in successor blocks with code that scans PHINodes.
Before we could vectorize PHINodes scanning successors was a good way of finding candidates. Now we can vectorize the phinodes which is simpler.

llvm-svn: 186139
2013-07-12 00:04:18 +00:00
Andrew Trick
fe577c9f45 indvars: Improve LFTR by eliminating truncation when comparing against a constant.
Patch by Michele Scandale!

Adds a special handling of the case where, during the loop exit
condition rewriting, the exit value is a constant of bitwidth lower
than the type of the induction variable: instead of introducing a
trunc operation in order to match correctly the operand types, it
allows to convert the constant value to an equivalent constant,
depending on the initial value of the induction variable and the trip
count, in order have an equivalent comparison between the induction
variable and the new constant.

llvm-svn: 186107
2013-07-11 17:08:59 +00:00
Arnold Schwaighofer
a8667081e1 LoopVectorize: Vectorize all accesses in address space zero with unit stride
We can vectorize them because in the case where we wrap in the address space the
unvectorized code would have had to access a pointer value of zero which is
undefined behavior in address space zero according to the LLVM IR semantics.
(Thank you Duncan, for pointing this out to me).

Fixes PR16592.

llvm-svn: 186088
2013-07-11 15:21:55 +00:00
Duncan Sands
447d97b223 TryToSimplifyUncondBranchFromEmptyBlock was checking that any common
predecessors of the two blocks it is attempting to merge supply the
same incoming values to any phi in the successor block.  This change
allows merging in the case where there is one or more incoming values
that are undef.  The undef values are rewritten to match the non-undef
value that flows from the other edge.  Patch by Mark Lacey.

llvm-svn: 186069
2013-07-11 08:28:20 +00:00
Nadav Rotem
ee3b8a1b49 Consolidate more lit tests.
llvm-svn: 186063
2013-07-11 05:15:11 +00:00
Nadav Rotem
7bfc1b28ea Consolidate some of the lit tests.
llvm-svn: 186062
2013-07-11 05:11:33 +00:00
Nadav Rotem
b4313e229a Consolidate some of the lit tests.
llvm-svn: 186060
2013-07-11 05:01:50 +00:00
Michael Gottesman
722cf9dc9b Teach TailRecursionElimination to handle certain cases of nocapture escaping allocas.
Without the changes introduced into this patch, if TRE saw any allocas at all,
TRE would not perform TRE *or* mark callsites with the tail marker.

Because TRE runs after mem2reg, this inadequacy is not a death sentence. But
given a callsite A without escaping alloca argument, A may not be able to have
the tail marker placed on it due to a separate callsite B having a write-back
parameter passed in via an argument with the nocapture attribute.

Assume that B is the only other callsite besides A and B only has nocapture
escaping alloca arguments (*NOTE* B may have other arguments that are not passed
allocas). In this case not marking A with the tail marker is unnecessarily
conservative since:

  1. By assumption A has no escaping alloca arguments itself so it can not
     access the caller's stack via its arguments.

  2. Since all of B's escaping alloca arguments are passed as parameters with
     the nocapture attribute, we know that B does not stash said escaping
     allocas in a manner that outlives B itself and thus could be accessed
     indirectly by A.

With the changes introduced by this patch:

  1. If we see any escaping allocas passed as a capturing argument, we do
     nothing and bail early.

  2. If we do not see any escaping allocas passed as captured arguments but we
     do see escaping allocas passed as nocapture arguments:

       i. We do not perform TRE to avoid PR962 since the code generator produces
          significantly worse code for the dynamic allocas that would be created
          by the TRE algorithm.

       ii. If we do not return twice, mark call sites without escaping allocas
           with the tail marker. *NOTE* This excludes functions with escaping
           nocapture allocas.

  3. If we do not see any escaping allocas at all (whether captured or not):

       i. If we do not have usage of setjmp, mark all callsites with the tail
          marker.

       ii. If there are no dynamic/variable sized allocas in the function,
           attempt to perform TRE on all callsites in the function.

Based off of a patch by Nick Lewycky.

rdar://14324281.

llvm-svn: 186057
2013-07-11 04:40:01 +00:00
David Majnemer
0a8d4ca7e9 InstSimplify: X >> X -> 0
llvm-svn: 185973
2013-07-09 22:01:22 +00:00
Nadav Rotem
417c1a3150 Fix PR16571, which is a bug in the code that checks that all of the types in the bundle are uniform.
llvm-svn: 185970
2013-07-09 21:38:08 +00:00
David Majnemer
a36d20b589 ValueTracking: Fix bugs in isKnownToBeAPowerOfTwo
(add nsw x, (and x, y)) isn't a power of two if x is zero, it's zero
(add nsw x, (xor x, y)) isn't a power of two if y has bits set that aren't set in x

llvm-svn: 185954
2013-07-09 18:11:10 +00:00
David Majnemer
3bb8099e6d InstCombine: variations on 0xffffffff - x >= 4
The following transforms are valid if -C is a power of 2:
(icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
(icmp ult (xor X, C), -C) -> (icmp uge X, C)

These are nice, they get rid of the xor.

llvm-svn: 185915
2013-07-09 09:20:58 +00:00
David Majnemer
ebf98e0163 InstCombine: X & -C != -C -> X <= u ~C
Tests were added in r185910 somehow.

llvm-svn: 185912
2013-07-09 08:09:32 +00:00
David Majnemer
90d0b32c9e Commit r185909 was a misapplied patch, fix it
llvm-svn: 185910
2013-07-09 07:58:32 +00:00
David Majnemer
969e1f9c9f InstCombine: add more transforms
C1-X <u C2 -> (X|(C2-1)) == C1
C1-X >u C2 -> (X|C2) == C1
X-C1 <u C2 -> (X & -C2) == C1
X-C1 >u C2 -> (X & ~C2) == C1

llvm-svn: 185909
2013-07-09 07:50:59 +00:00
David Majnemer
c84fdc2727 InstCombine: Fold X-C1 <u 2 -> (X & -2) == C1
Back in r179493 we determined that two transforms collided with each
other.  The fix back then was to reorder the transforms so that the
preferred transform would give it a try and then we would try the
secondary transform.  However, it was noted that the best approach would
canonicalize one transform into the other, removing the collision and
allowing us to optimize IR given to us in that form.

llvm-svn: 185808
2013-07-08 11:53:08 +00:00
Michael Gottesman
fb9cc0dd34 [objc-arc] Committed test for r185770 as per dblaikie's suggestion.
llvm-svn: 185782
2013-07-08 02:13:47 +00:00
Nick Lewycky
7a38a8d88f Eliminate trivial redundant loads across nocapture+readonly calls to uncaptured
pointer arguments.

llvm-svn: 185776
2013-07-07 10:15:16 +00:00
Nadav Rotem
883fb8ad80 SLPVectorizer: Implement DCE as part of vectorization.
This is a complete re-write if the bottom-up vectorization class.
Before this commit we scanned the instruction tree 3 times. First in search of merge points for the trees. Second, for estimating the cost. And finally for vectorization.
There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design.
In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree.

llvm-svn: 185774
2013-07-07 06:57:07 +00:00
Michael Gottesman
80ca18cc8d [objc-arc] Remove the alias analysis part of r185764.
Upon further reflection, the alias analysis part of r185764 is not a safe
change.

llvm-svn: 185770
2013-07-07 04:18:03 +00:00
Michael Gottesman
92317b03c4 [objc-arc] Teach the ARC optimizer that objc_sync_enter/objc_sync_exit do not modify the ref count of an objc object and additionally are inert for modref purposes.
llvm-svn: 185769
2013-07-07 01:52:55 +00:00
David Majnemer
f803d12937 InstCombine: typo in or_icmp_eq_B_0_icmp_ult_A_B test
llvm-svn: 185737
2013-07-06 00:54:07 +00:00
Nick Lewycky
7b093a1c2f Extend 'readonly' and 'readnone' to work on function arguments as well as
functions. Make the function attributes pass add it to known library functions
and when it can deduce it.

llvm-svn: 185735
2013-07-06 00:29:58 +00:00
Michael Gottesman
afc8c8ccf1 [TRE] Combined another test into basic.ll
llvm-svn: 185729
2013-07-05 22:24:06 +00:00
Michael Gottesman
fc7a204960 [TRE] Merged several tests into the the test basic.ll.
llvm-svn: 185723
2013-07-05 20:45:13 +00:00
David Majnemer
2c72ce123d InstCombine: (icmp eq B, 0) | (icmp ult A, B) -> (icmp ule A, B-1)
This transform allows us to turn IR that looks like:
  %1 = icmp eq i64 %b, 0
  %2 = icmp ult i64 %a, %b
  %3 = or i1 %1, %2
  ret i1 %3

into:
  %0 = add i64 %b, -1
  %1 = icmp uge i64 %0, %a
  ret i1 %1

which means we go from lowering:
        cmpq    %rsi, %rdi
        setb    %cl
        testq   %rsi, %rsi
        sete    %al
        orb     %cl, %al
        ret

to lowering:
        decq    %rsi
        cmpq    %rdi, %rsi
        setae   %al
        ret

llvm-svn: 185677
2013-07-05 00:31:17 +00:00
David Majnemer
bd4154c43f InstCombine: Reimplementation of visitUDivOperand
This transform was originally added in r185257 but later removed in
r185415.  The original transform would create instructions speculatively
and then discard them if the speculation was proved incorrect.  This has
been replaced with a scheme that splits the transform into two parts:
preflight and fold.  While we preflight, we build up fold actions that
inform the folding stage on how to act.

llvm-svn: 185667
2013-07-04 21:17:49 +00:00
Benjamin Kramer
bcf3a616e4 SimplifyCFG: Teach switch generation some patterns that instcombine forms.
This allows us to create switches even if instcombine has munged two of the
incombing compares into one and some bit twiddling. This was motivated by enum
compares that are common in clang.

llvm-svn: 185632
2013-07-04 14:22:02 +00:00
Michael Gottesman
5165cfc3e1 Change the gettimeofday test to only test on a posix platform.
llvm-svn: 185503
2013-07-03 04:15:22 +00:00
Michael Gottesman
5a59133d77 Added support in FunctionAttrs for adding relevant function/argument attributes for the posix call gettimeofday.
This implies annotating it as nounwind and its arguments as nocapture. To be
conservative, we do not annotate the arguments with noalias since some platforms
do not have restrict on the declaration for gettimeofday.

llvm-svn: 185502
2013-07-03 04:00:54 +00:00
Hal Finkel
142a9fc3fb Revert r185257 (InstCombine: Be more agressive optimizing 'udiv' instrs with 'select' denoms)
I'm reverting this commit because:

 1. As discussed during review, it needs to be rewritten (to avoid creating and
then deleting instructions).

 2. This is causing optimizer crashes. Specifically, I'm seeing things like
this:

    While deleting: i1 %
    Use still stuck around after Def is destroyed:  <badref> = select i1 <badref>, i32 0, i32 1
    opt: /src/llvm-trunk/lib/IR/Value.cpp:79: virtual llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.

   I'd guess that these will go away once we're no longer creating/deleting
instructions here, but just in case, I'm adding a regression test.

Because the code is bring rewritten, I've just XFAIL'd the original regression test. Original commit message:

	InstCombine: Be more agressive optimizing 'udiv' instrs with 'select' denoms

	Real world code sometimes has the denominator of a 'udiv' be a
	'select'.  LLVM can handle such cases but only when the 'select'
	operands are symmetric in structure (both select operands are a constant
	power of two or a left shift, etc.).  This falls apart if we are dealt a
	'udiv' where the code is not symetric or if the select operands lead us
	to more select instructions.

	Instead, we should treat the LHS and each select operand as a distinct
	divide operation and try to optimize them independently.  If we can
	to simplify each operation, then we can replace the 'udiv' with, say, a
	'lshr' that has a new select with a bunch of new operands for the
	select.

llvm-svn: 185415
2013-07-02 05:21:11 +00:00
Arnold Schwaighofer
26461b04c7 LoopVectorize: Math functions only read rounding mode
Math functions are mark as readonly because they read the floating point
rounding mode. Because we don't vectorize loops that would contain function
calls that set the rounding mode it is safe to ignore this memory read.

llvm-svn: 185299
2013-07-01 00:54:44 +00:00
Stephen Lin
d9ddb58848 DeadArgumentElimination: keep return value on functions that have a live argument with the 'returned' attribute (rather than generate invalid IR); however, if both can be eliminated, both will be
llvm-svn: 185290
2013-06-30 20:26:21 +00:00
Benjamin Kramer
dc3c6d36e0 ConstantFold: Check that truncating the other side is safe under a sext when trying to remove a sext from a compare.
Fixes PR16462.

llvm-svn: 185284
2013-06-30 13:47:43 +00:00
David Majnemer
52e3b875dd ValueTracking: Teach isKnownToBeAPowerOfTwo about (ADD X, (XOR X, Y)) where X is a power of two
This allows us to simplify urem instructions involving the add+xor to
turn into simpler math.

llvm-svn: 185272
2013-06-29 23:44:53 +00:00
Benjamin Kramer
57adb0eb90 InstCombine: Also turn selects fed by an and into arithmetic when the types don't match.
Inserting a zext or trunc is sufficient. This pattern is somewhat common in
LLVM's pointer mangling code.

llvm-svn: 185270
2013-06-29 21:17:04 +00:00
David Majnemer
10db94fa84 InstCombine: FoldGEPICmp shouldn't change sign of base pointer comparison
Changing the sign when comparing the base pointer would introduce all
sorts of unexpected things like:
  %gep.i = getelementptr inbounds [1 x i8]* %a, i32 0, i32 0
  %gep2.i = getelementptr inbounds [1 x i8]* %b, i32 0, i32 0
  %cmp.i = icmp ult i8* %gep.i, %gep2.i
  %cmp.i1 = icmp ult [1 x i8]* %a, %b
  %cmp = icmp ne i1 %cmp.i, %cmp.i1
  ret i1 %cmp

into:
  %cmp.i = icmp slt [1 x i8]* %a, %b
  %cmp.i1 = icmp ult [1 x i8]* %a, %b
  %cmp = xor i1 %cmp.i, %cmp.i1
  ret i1 %cmp

By preserving the original sign, we now get:
  ret i1 false

This fixes PR16483.

llvm-svn: 185259
2013-06-29 10:28:04 +00:00
David Majnemer
5a0bcf7c2c InstCombine: Be more agressive optimizing 'udiv' instrs with 'select' denoms
Real world code sometimes has the denominator of a 'udiv' be a
'select'.  LLVM can handle such cases but only when the 'select'
operands are symmetric in structure (both select operands are a constant
power of two or a left shift, etc.).  This falls apart if we are dealt a
'udiv' where the code is not symetric or if the select operands lead us
to more select instructions.

Instead, we should treat the LHS and each select operand as a distinct
divide operation and try to optimize them independently.  If we can
to simplify each operation, then we can replace the 'udiv' with, say, a
'lshr' that has a new select with a bunch of new operands for the
select.

llvm-svn: 185257
2013-06-29 08:40:07 +00:00
David Majnemer
da449ec54b InstCombine: Optimize (1 << X) Pred CstP2 to X Pred Log2(CstP2)
We may, after other optimizations, find ourselves with IR that looks
like:

  %shl = shl i32 1, %y
  %cmp = icmp ult i32 %shl, 32

Instead, we should just compare the shift count:

  %cmp = icmp ult i32 %y, 5

llvm-svn: 185242
2013-06-28 23:42:03 +00:00
Nadav Rotem
12c3b510fa SLP Vectorizer: Add support for trees with external users.
To support this we have to insert 'extractelement' instructions to pick the right lane.
We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated.

llvm-svn: 185230
2013-06-28 22:07:09 +00:00
Daniel Malea
f5547eb2d8 Adding tests for DebugIR pass
- lit tests verify that each line of input LLVM IR gets a !dbg node and a
  corresponding entry of metadata that contains the line number 
- unit tests verify that DebugIR works as advertised in the interface
- refactored some useful IR generation functionality from the MCJIT unit tests
  so it can be reused

llvm-svn: 185212
2013-06-28 20:37:20 +00:00
Manman Ren
5bedd08922 Debug Info: clean up usage of Verify.
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.

Also update testing cases to make them conform to the format of DI classes.

llvm-svn: 185135
2013-06-28 05:43:10 +00:00
Matt Arsenault
39fab64303 Convert tests to FileCheck
llvm-svn: 185124
2013-06-28 01:29:35 +00:00
Arnold Schwaighofer
d6aee045b3 LoopVectorize: Preserve debug location info
radar://14169017

llvm-svn: 185122
2013-06-28 00:38:54 +00:00
Arnold Schwaighofer
c0e3a07c99 LoopVectorize: Cache edge masks created during if-conversion
Otherwise, we end up with an exponential IR blowup.
Fixes PR16472.

llvm-svn: 185097
2013-06-27 20:31:06 +00:00
Arnold Schwaighofer
ccd78deec7 LoopVectorize: Use vectorized loop invariant gep index anchored in loop
Use vectorized instruction instead of original instruction anchored in the
original loop.

Fixes PR16452 and t2075.c of PR16455.

llvm-svn: 185081
2013-06-27 15:11:55 +00:00
Manman Ren
989245d300 Update testing case to make DI nodes have the correct format.
llvm-svn: 185061
2013-06-27 06:40:18 +00:00
Arnold Schwaighofer
5f879365c2 Fix spelling.
llvm-svn: 185052
2013-06-27 01:01:11 +00:00
Arnold Schwaighofer
18efca433e LoopVectorize: Don't store a reversed value in the vectorized value map
When we store values for reversed induction stores we must not store the
reversed value in the vectorized value map. Another instruction might use this
value.

This fixes 3 test cases of PR16455.

llvm-svn: 185051
2013-06-27 00:45:41 +00:00
Michael Gottesman
fe055b3806 Added support for the Builtin attribute.
The Builtin attribute is an attribute that can be placed on function call site that signal that even though a function is declared as being a builtin,

rdar://problem/13727199

llvm-svn: 185049
2013-06-27 00:25:01 +00:00