Commit Graph

5392 Commits

Author SHA1 Message Date
Benjamin Kramer
facc0a564b [SimplifyLibCalls] Turn memchr(const, C, const) into a bitfield check.
strchr("123!", C) != nullptr is a common pattern to check if C is one
of 1, 2, 3 or !. If the largest element of the string is smaller than
the target's register size we can easily create a bitfield and just
do a simple test for set membership.

int foo(char C) { return strchr("123!", C) != nullptr; } now becomes

	cmpl	$64, %edi ## range check
	sbbb	%al, %al
	movabsq	$0xE000200000001, %rcx
	btq	%rdi, %rcx ## bit test
	sbbb	%cl, %cl
	andb	%al, %cl ## and the two conditions
	andb	$1, %cl
	movzbl	%cl, %eax ## returning an int
	ret

(imho the backend should expand this into a series of branches, but
that's a different story)

The code is currently limited to bit fields that fit in a register, so
usually 64 or 32 bits. Sadly, this misses anything using alpha chars
or {}. This could be fixed by just emitting an i128 bit field, but that
can generate really ugly code so we have to find a better way. To some
degree this is also recreating switch lowering logic, but we can't
simply emit a switch instruction and thus change the CFG within
instcombine.
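
For illustration (not part of the original commit), a C++ sketch of the membership test the transform emits for this example; the mask matches the movabsq constant above, and bit 0 is set because strchr also matches the terminating '\0':

  #include <cstdint>

  // Bits set: 0 ('\0'), 33 ('!'), 49..51 ('1'..'3').
  static bool inSet(unsigned char C) {
    const uint64_t Mask = 0x000E000200000001ULL;
    return C < 64 && ((Mask >> C) & 1); // range check + bit test
  }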

llvm-svn: 232902
2015-03-21 21:09:33 +00:00
Benjamin Kramer
6c7bd16fc0 SimplifyLibCalls: Add basic optimization of memchr calls.
This is just memchr(x, y, 0) -> nullptr and constant folding.
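
For illustration (hypothetical calls, not from the patch), the two cases this covers:

  #include <cstring>

  // A zero-length search can never match, so it folds to a null pointer.
  const void *zeroLen(const void *Buf) { return memchr(Buf, 'x', 0); }

  // A call with all-constant arguments is constant folded away.
  const void *allConst() { return memchr("hello", 'l', 5); } // -> "hello" + 2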

llvm-svn: 232896
2015-03-21 15:36:21 +00:00
David Majnemer
a558926cc0 MemoryDependenceAnalysis: Don't miscompile atomics
r216771 introduced a change to MemoryDependenceAnalysis that allowed it
to reason about acquire/release operations.  However, this change does
not ensure that the acquire/release operations pair up.  Unfortunately,
this leads to miscompiles, as we won't treat an acquire load as properly
affecting memory.  This largely reverts r216771.

This fixes PR22708.
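
A hedged sketch of the kind of pattern at risk (hypothetical code, not from the bug report): if the acquire load is not treated as memory-affecting, the read of Data below could be moved or merged across it:

  #include <atomic>

  int Data;                 // written by another thread before Flag is set
  std::atomic<int> Flag;

  int reader() {
    while (!Flag.load(std::memory_order_acquire))
      ; // spin until the writer publishes Data
    return Data; // must not be hoisted above the acquire load
  }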

llvm-svn: 232889
2015-03-21 06:19:17 +00:00
Sanjay Patel
cf8a53f502 [X86, AVX] instcombine common cases of vperm2* intrinsics into shuffles
vperm2* intrinsics are just shuffles. 
In a few special cases, they're not even shuffles.

Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:

1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders
   really angry (and so I have regrets about some patches from last week).

2. Doing mask conversion logic in header files is hard to write and 
   subsequently read.

There are a couple of TODOs in this patch to complete this optimization.
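
For illustration (one concrete case, an assumption not spelled out in the commit): with immediate 0x20, vperm2f128 takes the low 128 bits of each source, which is exactly a shuffle:

  #include <immintrin.h>

  // Equivalent to shufflevector <8 x float> a, b, <0,1,2,3,8,9,10,11>.
  __m256 lowHalves(__m256 a, __m256 b) {
    return _mm256_permute2f128_ps(a, b, 0x20);
  }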

Differential Revision: http://reviews.llvm.org/D8486

llvm-svn: 232852
2015-03-20 21:47:56 +00:00
Wei Mi
2edc60752a Correctly estimate SROA savings for store operands in inline cost analysis.
When estimating SROA savings, we want to see if an address is derived
off an alloca in the caller. For store instructions, operand 1 is the
address operand, but the current code uses operand 0.  Use
getPointerOperand for loads and stores to fix this.
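
A minimal sketch of the shape of the fix (assuming the usual LLVM headers; this is not the actual inline-cost code):

  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // For a store, operand 0 is the stored value and operand 1 the address;
  // getPointerOperand() returns the right operand for both instructions.
  static Value *getAddressOperand(Instruction *I) {
    if (auto *LI = dyn_cast<LoadInst>(I))
      return LI->getPointerOperand(); // operand 0
    if (auto *SI = dyn_cast<StoreInst>(I))
      return SI->getPointerOperand(); // operand 1
    return nullptr;
  }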

Patch by Easwaran Raman.
http://reviews.llvm.org/D8425

llvm-svn: 232827
2015-03-20 18:33:12 +00:00
Peter Collingbourne
1254323d85 LowerBitSets: Avoid reusing byte set addresses.
Each use of the byte array uses a different alias. This makes the
backend less likely to reuse previously computed byte array addresses,
improving the security of the CFI mechanism based on this pass.

Differential Revision: http://reviews.llvm.org/D8455

llvm-svn: 232770
2015-03-19 22:02:10 +00:00
Daniel Jasper
7e5f7305cf [InstCombine] Don't fold a GEP into itself through a PHI node
This can only occur (I think) through the back-edge of the loop.

However, folding a GEP into itself means that the value of the previous
iteration needs to be stored in the meantime, thus requiring an
additional register variable to be live, but not actually achieving
anything (the gep still needs to be executed once per loop iteration).

The attached test case is derived from:
  typedef unsigned uint32;
  typedef unsigned char uint8;
  inline uint8 *f(uint32 value, uint8 *target) {
    while (value >= 0x80) {
      value >>= 7;
      ++target;
    }
    ++target;
    return target;
  }
  uint8 *g(uint32 b, uint8 *target) {
    target = f(b, f(42, target));
    return target;
  }

What happens is that the GEP stored in incptr2 is folded into itself
through the loop's back-edge and the phi-node stored in loopptr,
effectively incrementing the ptr by "2" in each iteration instead of "1".

In this case, it is actually increasing the number of GEPs required as
the GEP before the loop can't be folded away anymore. For comparison:

With this patch:
  define i8* @test4(i32 %value, i8* %buffer) {
  entry:
    %cmp = icmp ugt i32 %value, 127
    br i1 %cmp, label %loop.header, label %exit

  loop.header:                                      ; preds = %entry
    br label %loop.body

  loop.body:                                        ; preds = %loop.body, %loop.header
    %buffer.pn = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
    %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
    %loopptr = getelementptr inbounds i8, i8* %buffer.pn, i64 1
    %shr = lshr i32 %newval, 7
    %cmp2 = icmp ugt i32 %newval, 16383
    br i1 %cmp2, label %loop.body, label %loop.exit

  loop.exit:                                        ; preds = %loop.body
    br label %exit

  exit:                                             ; preds = %loop.exit, %entry
    %0 = phi i8* [ %loopptr, %loop.exit ], [ %buffer, %entry ]
    %incptr3 = getelementptr inbounds i8, i8* %0, i64 2
    ret i8* %incptr3
  }

Without this patch:
  define i8* @test4(i32 %value, i8* %buffer) {
  entry:
    %incptr = getelementptr inbounds i8, i8* %buffer, i64 1
    %cmp = icmp ugt i32 %value, 127
    br i1 %cmp, label %loop.header, label %exit

  loop.header:                                      ; preds = %entry
    br label %loop.body

  loop.body:                                        ; preds = %loop.body, %loop.header
    %0 = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
    %loopptr = phi i8* [ %incptr, %loop.header ], [ %incptr2, %loop.body ]
    %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
    %shr = lshr i32 %newval, 7
    %incptr2 = getelementptr inbounds i8, i8* %0, i64 2
    %cmp2 = icmp ugt i32 %newval, 16383
    br i1 %cmp2, label %loop.body, label %loop.exit

  loop.exit:                                        ; preds = %loop.body
    br label %exit

  exit:                                             ; preds = %loop.exit, %entry
    %ptr2 = phi i8* [ %incptr2, %loop.exit ], [ %incptr, %entry ]
    %incptr3 = getelementptr inbounds i8, i8* %ptr2, i64 1
    ret i8* %incptr3
  }

Review: http://reviews.llvm.org/D8245
llvm-svn: 232718
2015-03-19 11:05:08 +00:00
Michael Zolotukhin
32106188c4 TLI: Add addVectorizableFunctionsFromVecLib.
Also, add several entries to the vectorizable functions table, and
corresponding tests. The table isn't complete; it'll be populated later.

Review: http://reviews.llvm.org/D8131
llvm-svn: 232531
2015-03-17 19:50:55 +00:00
Michael Zolotukhin
16e5c0ceec TTI: Honour cost model for estimating cost of vector-intrinsic and calls.
Review: http://reviews.llvm.org/D8096
llvm-svn: 232528
2015-03-17 19:37:28 +00:00
Michael Liao
a26d21b8dd [SwitchLowering] Remove incoming values in the reverse order
- To prevent invalidating *successive* indices.
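
A hedged sketch of the pattern (PN and ToRemove are hypothetical names):

  // Erasing back-to-front keeps the not-yet-visited indices valid;
  // front-to-back removal would shift every later index down by one.
  for (unsigned i = ToRemove.size(); i-- != 0;)
    PN->removeIncomingValue(ToRemove[i], /*DeletePHIIfEmpty=*/false);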
 

llvm-svn: 232510
2015-03-17 18:03:10 +00:00
Sanjoy Das
08c30ff27c [IRCE] Re-commit tests cases.
Re-commit the test cases added in r232444.  These now use
-irce-print-changed-loops and -irce-print-range-checks so they run
correctly on a no-asserts build of llvm.

llvm-svn: 232452
2015-03-17 01:40:24 +00:00
Sanjoy Das
3ba48d59a9 [IRCE] Delete two tests.
I accidentally checked in two tests that used -debug-only -- these fail
on a release LLVM build.  Temporarily delete these from the repo to keep
the bots green while I fix this locally.

llvm-svn: 232446
2015-03-17 00:54:50 +00:00
Sanjoy Das
59e77f5c4e [IRCE] Support half-range checks.
This change teaches IRCE to recognize "half" range checks.  Half range
checks are range checks that only test whether the index is `slt` some
positive integer ("length") or whether it is `sge` `0`.

The range solver does not try to be clever / aggressive about solving
half-range checks -- it transforms "I < L" to "0 <= I < L" and "0 <= I"
to "0 <= I < INT_SMAX".  This is safe, but not always optimal.

llvm-svn: 232444
2015-03-17 00:42:13 +00:00
Justin Bogner
99971234a2 GCOV: Make the exit block placement from r223193 optional
By default we want our gcov emission to stay 4.2 compatible, which
means we need to continue to emit the exit block last by default. We add
an option to emit it before the body for users that need it.

llvm-svn: 232438
2015-03-16 23:52:03 +00:00
Peter Collingbourne
cf293cd018 LowerBitSets: do not use private aliases at all on Darwin.
LLVM currently turns these into linker-private symbols, which can be dead
stripped by the Darwin linker.

llvm-svn: 232435
2015-03-16 23:36:24 +00:00
Duncan P. N. Exon Smith
9b4abdcdd5 DebugInfo: Fix testcases that fail -verify-debug-info=true
As part of PR22777, fix testcases that fail the debug info verifier.
The changes fall into the following categories:

  - Empty `filename:` fields in `MDFile`s.  Compile units and some types
    require non-empty filenames.  A number of testcases have empty
    filenames, probably due to hand-reduction of testcases.
  - Not-quite empty arrays: `!{i32 0}`.  This used to be equivalent in
    the debug info schema to `!{}`.  They cause problems for
    `!MDSubroutineType`'s `types:` array, since it requires all operands
    to be valid types.  (Note that `!{null}` is the correct type array
    for functions that take no arguments and return `void`.)
  - Significantly bitrotted testcases.  Nodes got left behind a few
    upgrades ago because of missing or invalid tags.

llvm-svn: 232415
2015-03-16 21:10:12 +00:00
Michael Gottesman
1ce7630734 [objc-arc] Make the ARC optimizer more conservative by forcing it to be non-safe in both directions, but mitigate the problem by noting that we just care if there was a further use.
The problem here is the infamous one-direction known safe. I was
hesitant to turn it off before because of the potential for regressions
without an actual bug from users hitting the problem. This is that bug ;).

The main performance impact of having known safe in both directions is
that oftentimes it is very difficult to find two releases without a use
in-between them since we are so conservative with determining potential
uses. The one direction known safe gets around that problem by taking
advantage of many situations where we have two retains in a row,
allowing us to avoid that problem. That being said, the one direction
known safe is unsafe. Consider the following situation:

retain(x)
retain(x)
call(x)
call(x)
release(x)

Then we know the following about the reference count of x:

// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
retain(x)
// rc(x) == N+2
call A(x)
call B(x)
// rc(x) >= 1 (since we can not release a deallocated pointer).
release(x)
// rc(x) >= 0

That is all the information that we can know statically. That means that
we know that A(x), B(x) together can release (x) at most N+1 times. Let's
say that we remove the inner retain, release pair.

// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
call A(x)
call B(x)
// rc(x) >= 1
release(x)
// rc(x) >= 0

We knew before that A(x), B(x) could release x up to N+1 times, meaning
that rc(x) may be zero at the release(x). That is not safe. On the other
hand, consider the following situation where we have a must use after
release(x), i.e. x must be kept alive past the release(x). Then we
know that:
know that:

// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
retain(x)
// rc(x) == N+2
call A(x)
call B(x)
// rc(x) >= 2 (since we know that we are going to release x and that that release can not be the last use of x).
release(x)
// rc(x) >= 1 (since we can not deallocate the pointer, as we have a must use after x).
…
// rc(x) >= 1
use(x)

Thus we know that statically the calls to A(x), B(x) can together only
release rc(x) N times. Thus if we remove the inner retain, release pair:

// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
call A(x)
call B(x)
// rc(x) >= 1
…
// rc(x) >= 1
use(x)

We are still safe unless the final … contains unbalanced retains and
releases, which would have caused the program to blow up anyway even
before optimization occurred. The simplest form of must use is an
additional release that has not been paired up with any retain (if we
had paired the release with a retain and removed it we would not have
the additional use). This fits nicely into the ARC framework since
basically what you do is say that given any nested releases regardless
of what is in between, the inner release is known safe. This enables us to get
back the lost performance.

<rdar://problem/19023795>

llvm-svn: 232351
2015-03-16 07:02:36 +00:00
Duncan P. N. Exon Smith
9f52f307c2 Verifier: Check debug info intrinsic arguments
Verify that debug info intrinsic arguments are valid.  (These checks
will not recurse through the full debug info graph, so they don't need
to be cordoned off in `DebugInfoVerifier`.)

With those checks in place, changing the `DbgIntrinsicInst` accessors to
downcast to `MDLocalVariable` and `MDExpression` is natural (added isa
specializations in `Metadata.h` to support this).

Added tests to `test/Verifier` for the new -verify checks, and fixed the
debug info in all the in-tree tests.

If you have out-of-tree testcases that have started to fail to -verify,
hopefully the verify checks are helpful.  The most likely problem is
that the expression argument is `!{}` (instead of `!MDExpression()`).

llvm-svn: 232296
2015-03-15 01:21:30 +00:00
Mehdi Amini
459bdb4f65 Update InstCombine to transform aggregate stores into scalar stores.
Summary: This is a first step toward getting proper support for aggregate loads and stores.

Test Plan: Added unittests

Reviewers: reames, chandlerc

Reviewed By: chandlerc

Subscribers: majnemer, joker.eph, chandlerc, llvm-commits

Differential Revision: http://reviews.llvm.org/D7780

Patch by Amaury Sechet

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 232284
2015-03-14 22:19:33 +00:00
Ahmed Bougacha
b8ea46fe37 Add a bunch of missing CHECK colons in tests. NFC.
Some wouldn't pass; most are fixed here, the rest will be fixed separately.

llvm-svn: 232239
2015-03-14 01:43:57 +00:00
Peter Collingbourne
03b0d472bb LowerBitSets: Do not export symbols for bit set referenced globals on Darwin.
The linker on that platform may re-order symbols or strip dead symbols, which
will break bit set checks. Avoid this by hiding the symbols from the linker.

llvm-svn: 232235
2015-03-14 00:00:49 +00:00
Robert Lougher
103950c1eb Reapply "[Reassociate] Add initial support for vector instructions."
This reapplies the patch previously committed at revision 232190.  This was
reverted at revision 232196 as it caused test failures in tests that did not
expect operands to be commuted.  I have made the tests more resilient to
reassociation in revision 232206.

llvm-svn: 232209
2015-03-13 20:53:01 +00:00
Duncan P. N. Exon Smith
e5ae5021db instcombine: alloca: Canonicalize scalar allocation array size
As a follow-up to r232200, add an `-instcombine` to canonicalize scalar
allocations to `i32 1`.  Since r232200, `iX 1` (for X != 32) are only
created by RAUWs, so this shouldn't fire too often.  Nevertheless, it's
a cheap check and a nice cleanup.

llvm-svn: 232202
2015-03-13 19:42:09 +00:00
Duncan P. N. Exon Smith
4326380356 AsmWriter: Write alloca array size explicitly (and -instcombine fixup)
Write the `alloca` array size explicitly when it's non-canonical.
Previously, if the array size was `iX 1` (where X is not 32), the type
would mutate to `i32` when round-tripping through assembly.

The testcase I added fails in `verify-uselistorder` (as well as
`FileCheck`), since the use-lists for `i32 1` and `i64 1` change.
(Manman Ren came across this when running `verify-uselistorder` on some
non-trivial, optimized code as part of PR5680.)

The type mutation started with r104911, which allowed array sizes to be
something other than an `i32`.  Starting with r204945, we
"canonicalized" to `i64` on 64-bit platforms -- and then on every
round-trip through assembly, mutated back to `i32`.

I bundled a fixup for `-instcombine` to avoid r204945 on scalar
allocations.  (There wasn't a clean way to sequence this into two
commits, since the assembly change on its own caused testcase churn, and
the `-instcombine` change can't be tested without the assembly changes.)

An obvious alternative fix -- change `AllocaInst::AllocaInst()`,
`AsmWriter` and `LLParser` to treat `intptr_t` as the canonical type for
scalar allocations -- was rejected out of hand, since this required
teaching them each about the data layout.

A follow-up commit will add an `-instcombine` to canonicalize the scalar
allocation array size to `i32 1` rather than leaving `iX 1` alone.

rdar://problem/20075773

llvm-svn: 232200
2015-03-13 19:30:44 +00:00
Robert Lougher
c9db4beacb Revert: "[Reassociate] Add initial support for vector instructions."
This reverts revision 232190 due to buildbot failure reported on clang-hexagon-elf
for test arm64_vtst.c.  To be investigated.

llvm-svn: 232196
2015-03-13 19:20:46 +00:00
Robert Lougher
f99d4427a8 [Reassociate] Add initial support for vector instructions.
This patch adds initial support for vector instructions to the reassociation
pass. It enables most parts of the pass to work with vectors but to keep the
size of the patch small, optimization of Xor trees, canonicalization of
negative constants and converting shifts to muls, etc., have been left out.
These will be handled in later patches.

The patch is based on an initial patch by Chad Rosier.

Differential Revision: http://reviews.llvm.org/D7566

llvm-svn: 232190
2015-03-13 18:33:27 +00:00
David Blaikie
3ea2df7c7b [opaque pointer type] Add textual IR support for explicit type parameter to gep operator
Similar to gep (r230786) and load (r230794) changes.

Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.

(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)

import fileinput
import sys
import re

rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

def conv(match):
  line = match.group(1)
  line += match.group(4)
  line += ", "
  line += match.group(2)
  return line

line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
  sys.stdout.write(line[off:match.start()])
  sys.stdout.write(conv(match))
  off = match.end()
sys.stdout.write(line[off:])

llvm-svn: 232184
2015-03-13 18:20:45 +00:00
David Majnemer
f2479f3c48 ConstantFold: Fix big shift constant folding
Constant folding for shift IR instructions ignores all bits above 32 of
the second argument (the shift amount).
Because of that, some undef results are not recognized and APInt can
raise an assertion failure if the second argument has more than 64 bits.

Patch by Paweł Bylica!

Differential Revision: http://reviews.llvm.org/D7701

llvm-svn: 232176
2015-03-13 16:39:46 +00:00
Kevin Qin
5fb79c9e03 Reapply 'Run LICM pass after loop unrolling pass.'
It was first committed in r231630 and reverted in r231635.

The function pass InstructionSimplifier is inserted as a barrier to
make sure the loop unroll pass won't interfere with the LICM pass.

llvm-svn: 232011
2015-03-12 05:36:01 +00:00
David Majnemer
613caa1eaa InstCombine: Don't fold call bitcast into args if callee is byval
This fixes a bug reported here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150309/265341.html

llvm-svn: 231948
2015-03-11 18:03:05 +00:00
Sanjay Patel
f8b6a29222 Inliner should not add callgraph edges for intrinsic calls (PR22857)
The CallGraphNode function "addCalledFunction()" asserts that edges are not to intrinsics.

This patch makes sure that the Inliner does not add such an edge to the callgraph.

Fix for clang crash by assertion: https://llvm.org/bugs/show_bug.cgi?id=22857

Differential Revision: http://reviews.llvm.org/D8231

llvm-svn: 231927
2015-03-11 15:12:32 +00:00
Benjamin Kramer
3452551bc8 Prefer pipes over temporary files in a feeble attempt to stabilize this test on windows.
llvm-svn: 231923
2015-03-11 13:55:41 +00:00
Philip Reames
d1be160c96 If a conditional branch jumps to the same target, remove the condition
Given that large parts of instcombine are restricted to instructions which have one use, getting rid of a use on the condition can help the effectiveness of the optimizer. Also, it allows the condition to potentially be deleted by instcombine rather than waiting for another pass.

I noticed this completely by accident in another test case. It's not anything that actually came from a real workload.

p.s. We should probably do the same thing for switch instructions.

Differential Revision: http://reviews.llvm.org/D8220

llvm-svn: 231881
2015-03-10 22:52:37 +00:00
Philip Reames
6b5f658b6c Infer known bits from dominating conditions
This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.

Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems.  In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. It does not attempt full dataflow.  Even with that restriction, it handles many interesting cases:
 * Early exits from functions
 * Early exits from loops (for context instructions in the loop and after the check)
 * Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)

Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.

This patch implements two approaches to the problem that need further benchmarking.  Approach 1 is to directly walk the dominator tree looking for interesting conditions.  Approach 2 is to inspect other uses of the value being queried for interesting comparisons.  From initial benchmarking, it appears that Approach 2 is faster than Approach 1, but this needs to be further validated.  
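
A hypothetical example of the pattern being exploited (not from the patch):

  // The false edge of `X & 8` dominates the final return, so bit 3 of X
  // is known zero there and the second `X & 8` folds to 0.
  int f(int X) {
    if (X & 8)
      return 0; // early exit
    return X & 8; // foldable to `return 0`
  }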

Differential Revision: http://reviews.llvm.org/D7708

llvm-svn: 231879
2015-03-10 22:43:20 +00:00
Owen Anderson
fa465e39c1 Fix a crash in InstCombine where we could try to truncate a switch comparison to zero width.
llvm-svn: 231761
2015-03-10 06:51:39 +00:00
Owen Anderson
c22a907f78 Fix an infinite loop in InstCombine when an instruction with no users and side effects can be constant folded.
ReplaceInstUsesWith needs to return nullptr when the input has no users,
because in that case it does not mutate the program.  Otherwise, we can
get stuck in an infinite loop of repeatedly attempting to constant fold
an instruction with no users.

llvm-svn: 231755
2015-03-10 05:13:47 +00:00
Kevin Qin
4ba876c24d Revert r231630 - Run LICM pass after loop unrolling pass.
As it broke llvm bootstrap.

llvm-svn: 231635
2015-03-09 07:26:37 +00:00
Kevin Qin
fdbff51f47 [AArch64] Enable partial & runtime unrolling on cortex-a57
The inner one of a pair of nested loops is more likely to be hot,
and the runtime check can be promoted out (from patch 0001), so the
overhead is lower and we can try a doubled threshold to unroll more loops.

llvm-svn: 231632
2015-03-09 06:14:28 +00:00
Kevin Qin
0f6643694d Introduce runtime unrolling disable metadata and use it to mark the scalar loop from vectorization.
Runtime unrolling is an expensive optimization which can bring benefit
only if the loop is hot and the iteration count is relatively large.
For some loops, we know they are not worth runtime unrolling.
The scalar loop from vectorization is one such case.

llvm-svn: 231631
2015-03-09 06:14:18 +00:00
Kevin Qin
92a0be0434 Run LICM pass after loop unrolling pass.
Runtime unrolling will introduce a runtime check in the loop prologue.
If the unrolled loop is an inner loop, then the prologue will be inside
the outer loop. The LICM pass can help to promote the runtime check out
if the checked value is loop invariant.

llvm-svn: 231630
2015-03-09 06:14:07 +00:00
Mehdi Amini
201b59d62e InstCombine: fix fold "fcmp x, undef" to account for NaN
Summary:
See the two test cases.

; Can fold fcmp with undef on one side by choosing NaN for the undef

; Can fold fcmp with undef on both sides
;   fcmp u_pred undef, undef -> true
;   fcmp o_pred undef, undef -> false
; because whatever you choose for the first undef
; you can choose NaN for the other undef

Reviewers: hfinkel, chandlerc, majnemer

Reviewed By: majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D7617

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231626
2015-03-09 03:20:25 +00:00
Owen Anderson
e54101bbb5 Teach DataLayout to infer a plausible alignment for things even when nothing is specified by the user.
llvm-svn: 231613
2015-03-08 21:53:59 +00:00
Olivier Sallenave
09fc2f3b93 Do not restrict interleaved unrolling to small loops, depending on the target.
llvm-svn: 231528
2015-03-06 23:12:04 +00:00
Karthik Bhat
78ffabf782 Add a new pass "Loop Interchange"
This pass interchanges loops to provide a more cache-friendly memory access.

For example, given a loop like:
  for(int i=0;i<N;i++)
    for(int j=0;j<N;j++)
      A[j][i] = A[j][i]+B[j][i];

it is interchanged to:
  for(int j=0;j<N;j++)
    for(int i=0;i<N;i++)
      A[j][i] = A[j][i]+B[j][i];

This pass is currently disabled by default.

To give a brief introduction it consists of 3 stages-

LoopInterchangeLegality: Checks the legality of loop interchange based on the dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitability. This will evolve over time.
LoopInterchangeTransform: Does the actual transform.

LNT performance tests show improvement in the Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver benchmarks.

TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve the loop selection algorithm to select the best loop for interchange. Currently the innermost loop is selected.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.

A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499

llvm-svn: 231458
2015-03-06 10:11:25 +00:00
Michael Gottesman
18520dd445 [objc-arc] Remove annotations code.
It will always be in the history if it is needed again. Now it is just dead
code.

llvm-svn: 231435
2015-03-06 00:34:29 +00:00
Nadav Rotem
f095e52b93 Teach ComputeNumSignBits about signed remainder.
This optimization is a continuation of r231140, which reasoned about signed division.
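
A hypothetical illustration of what the extra sign-bit knowledge enables:

  // x % 8 is always in [-7, 7], so at least 28 of the 32 result bits are
  // copies of the sign bit; the range check below then folds to true.
  bool fitsInByte(int x) {
    int r = x % 8;
    return r >= -128 && r < 128; // always true
  }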

llvm-svn: 231433
2015-03-06 00:23:58 +00:00
Philip Reames
442f27a805 [RewriteStatepointsForGC] Yet more test cases for relocation
At this point, we should have decent coverage of the involved code.  I've got a few more test cases to clean up and submit, but what's here is already reasonable.

I've got a collection of liveness tests which will be posted for review along with a decent liveness algorithm in the next few days.  Once those are in, the code in this file should be well tested and I can start renaming things without risk of serious breakage.  

llvm-svn: 231414
2015-03-05 22:28:06 +00:00
Philip Reames
bfe478d542 [RewriteStatepointsForGC] Add additional tests around relocation
These are focused around the actual relocation rewriting itself, not the rest of the infrastructure.

llvm-svn: 231399
2015-03-05 19:52:13 +00:00
Michael Kuperstein
92d61decb9 [InstCombine] Fix an assertion when fmul has a ConstantExpr operand
isNormalFp and isFiniteNonZeroFp should not assume vector operands can not be constant expressions.

Patch by Pawel Jurek <pawel.jurek@intel.com>
Differential Revision: http://reviews.llvm.org/D8053

llvm-svn: 231359
2015-03-05 08:38:57 +00:00
Mehdi Amini
29ebc2d39f Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() will never return nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
2015-03-04 18:43:29 +00:00
Philip Reames
59aed68d01 [RewriteStatepointsForGC] Fix a relocation bug w.r.t values defined by invoke instructions
The RewriteStatepointsForGC pass emits an alloca for each GC pointer which will be relocated. It then inserts stores after the def and all relocations, and inserts loads before each use as well. In the end, mem2reg is used to update the IR with relocations in SSA form.

However, there is a problem with inserting stores for values defined by invoke instructions. The code didn't expect a def to be a terminator instruction, and inserting instructions after these terminators resulted in malformed IR.

This patch fixes this problem by handling invoke instructions as a special case. If the def is an invoke instruction, the store will be inserted at the beginning of the normal destination block. Since the return value of an invoke instruction does not dominate the unwind destination block, no action is needed there.

Patch by: Chen Li
Differential Revision: http://reviews.llvm.org/D7923

llvm-svn: 231183
2015-03-04 00:13:52 +00:00
David Majnemer
0a7829af7e InstCombine: Ensure select condition types are identical before merging
Selection conditions may be vectors or scalars.  Make sure InstCombine
doesn't indiscriminately assume that a select which is value dependent
on another select has identical select condition types.

This fixes PR22773.

llvm-svn: 231156
2015-03-03 22:40:36 +00:00
Nadav Rotem
0f7a38b97d Teach ComputeNumSignBits about signed divisions.
http://reviews.llvm.org/D8028
rdar://20023136

llvm-svn: 231140
2015-03-03 21:39:02 +00:00
Duncan P. N. Exon Smith
8d1b74869c DebugInfo: Move new hierarchy into place
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464.  I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned.  Let me know if I'm wrong :).

The code changes are fairly mechanical:

  - Bumped the "Debug Info Version".
  - `DIBuilder` now creates the appropriate subclass of `MDNode`.
  - Subclasses of DIDescriptor now expect to hold their "MD"
    counterparts (e.g., `DIBasicType` expects `MDBasicType`).
  - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
    for printing comments.
  - Big update to LangRef to describe the nodes in the new hierarchy.
    Feel free to make it better.

Testcase changes are enormous.  There's an accompanying clang commit on
its way.

If you have out-of-tree debug info testcases, I just broke your build.

  - `upgrade-specialized-nodes.sh` is attached to PR22564.  I used it to
    update all the IR testcases.
  - Unfortunately I failed to find a way to script the updates to CHECK
    lines, so I updated all of these by hand.  This was fairly painful,
    since the old CHECKs are difficult to reason about.  That's one of
    the benefits of the new hierarchy.

This work isn't quite finished, BTW.  The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro).  Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers.  I
also expect to make a few schema changes now that it's easier to reason
about everything.

llvm-svn: 231082
2015-03-03 17:24:31 +00:00
Peter Collingbourne
fca689c3d4 LowerBitSets: Use byte arrays instead of bit sets to represent in-memory bit sets.
By loading from indexed offsets into a byte array and applying a mask, a
program can test bits from the bit set with a relatively short instruction
sequence. For example, suppose we have 15 bit sets to lay out:

A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits),
F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits),
L (4 bits), M (3 bits), N (2 bits), O (1 bit)

These bits can be laid out in a 16-byte array like this:

      Byte Offset
    0123456789ABCDEF
Bit
  7 HHHHHHHHHIIIIIII
  6 GGGGGGGGGGJJJJJJ
  5 FFFFFFFFFFFKKKKK
  4 EEEEEEEEEEEELLLL
  3 DDDDDDDDDDDDDMMM
  2 CCCCCCCCCCCCCCNN
  1 BBBBBBBBBBBBBBBO
  0 AAAAAAAAAAAAAAAA

For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to
test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done
in 1-2 machine instructions on x86, or 4-6 instructions on ARM.

This uses the LPT multiprocessor scheduling algorithm to lay out the bits
efficiently.

Saves ~450KB of instructions in a recent build of Chromium.

Differential Revision: http://reviews.llvm.org/D7954

llvm-svn: 231043
2015-03-03 00:49:28 +00:00
Benjamin Kramer
11218d5663 LoopIdiom: Give globals for memset_pattern16 private linkage.
There's really no reason to have them have entries in the symbol table
anymore. Old versions of ld64 had some bugs in this area but those have
been fixed long ago.

llvm-svn: 231041
2015-03-03 00:17:09 +00:00
Sanjoy Das
3584fa135a Revert some changes that were made to fix PR20680.
This re-lands change r230921.  r230921 was reverted because it broke a
clang test; a checkin fixing the clang test will be committed shortly.

Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533.  SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow.  There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.

Revert "IndVarSimplify: Allow LFTR to fire more often"

This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).

Revert "IndVarSimplify: Don't let LFTR compare against a poison value"

This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).

Reviewers: majnemer, atrick, spatel

Differential Revision: http://reviews.llvm.org/D7979

llvm-svn: 231018
2015-03-02 21:41:07 +00:00
NAKAMURA Takumi
233b254c9b Revert r230921, "Revert some changes that were made to fix PR20680.", for now.
It caused a failure on clang/test/Misc/backend-optimization-failure.cpp.

llvm-svn: 230929
2015-03-02 01:14:03 +00:00
Sanjoy Das
b11e512365 Revert some changes that were made to fix PR20680.
Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533.  SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow.  There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.

Revert "IndVarSimplify: Allow LFTR to fire more often"

This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).

Revert "IndVarSimplify: Don't let LFTR compare against a poison value"

This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).

Reviewers: majnemer, atrick, spatel

Differential Revision: http://reviews.llvm.org/D7979

llvm-svn: 230921
2015-03-01 23:36:26 +00:00
Duncan P. N. Exon Smith
76af7b2bdf DebugInfo: Convert DW_OP_piece => DW_OP_bit_piece
r228631 stopped using `DW_OP_piece` inside `DIExpression`s in the IR,
but it apparently missed updating these testcases.  Caught by verifier
checks for `MDExpression` while working on moving the new hierarchy into
place.

llvm-svn: 230882
2015-02-28 23:57:16 +00:00
Duncan P. N. Exon Smith
4156ef43f1 Fix line endings on Transforms/Inline/inline_dbg_declare.ll
llvm-svn: 230870
2015-02-28 21:38:32 +00:00
Benjamin Kramer
1a8767d1c0 TRE: Just erase dead BBs and tweak the iteration loop not to increment the deleted BB iterator.
Leaving empty blocks around just opens up a can of bugs like PR22704. Deleting
them early also slightly simplifies code.

Thanks to Sanjay for the IR test case.

llvm-svn: 230856
2015-02-28 16:47:27 +00:00
Philip Reames
7a2e8f33b1 [RewriteStatepointsForGC] Fix another order of iteration bug
It turns out the naming of inserted phis and selects is sensitive to the order in which two sets are iterated.  We need to nail this down to avoid non-deterministic output and possible test failures.

The modified test is the one I first noticed something odd in.  The change makes it stricter about reporting the error.  With the test change, but without the code change, the test fails roughly 1 run in 5.  With the code change, I've run ~30 runs without error.

Long term, the right fix here is to adjust the naming scheme.  I'm checking in this hack to avoid any possible non-determinism in the tests over the weekend.  Just because I only noticed one case doesn't mean it's actually the only case.  I hope to get to the right change Monday.

std->llvm data structure changes bugfix change #3

llvm-svn: 230835
2015-02-28 01:52:09 +00:00
Philip Reames
937676755d [RewriteStatepointsForGC] Add tests for the base pointer identification algorithm
These tests cover the 'base object' identification and rewriting portion of RewriteStatepointsForGC.  These aren't completely exhaustive, but they've proven to be reasonably effective over time at finding regressions.

In the process of porting these tests over, I found my first "cleanup per llvm code style standards" bug.  We were relying on the order of iteration when testing the base pointers found for a derived pointer.  When we switched from std::set to DenseSet, this stopped being a safe assumption.  I'm suspecting I'm going to find more of those.  In particular, I'm now really wondering about the main iteration loop for this algorithm.  I need to go take a closer look at the assumptions there.

I'm not really happy with the fact these are testing what is essentially debug output (i.e. enabled via command line flags).  Suggestions for how to structure this better are very welcome.  

llvm-svn: 230818
2015-02-28 00:20:48 +00:00
David Blaikie
ab043ff680 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie
0d99339102 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
the same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Sanjoy Das
b7c27748cc IRCE: add a test case for r230619.
llvm-svn: 230680
2015-02-26 20:14:32 +00:00
Hal Finkel
5cc8afebfd [InstCombine/PowerPC] Convert aligned QPX load/store intrinsics into loads/stores
InstCombine has long had logic to convert aligned Altivec load/store intrinsics
into regular loads and stores. This mirrors that functionality for QPX vector
load/store intrinsics.

llvm-svn: 230660
2015-02-26 18:56:03 +00:00
Hal Finkel
13407e3fb6 [InstCombine] Add a test for altivec load/store intrinsic simplification
InstCombine has logic to convert aligned Altivec load/store intrinsics into
regular loads and stores. Unfortunately, there seems to be no regression test
covering this behavior. Adding one...

llvm-svn: 230632
2015-02-26 14:22:41 +00:00
Sanjoy Das
622e5ac37c IRCE: generalize to handle loops with decreasing induction variables.
IRCE can now split the iteration space for loops like:

   for (i = n; i >= 0; i--)
     a[i + k] = 42; // bounds check on access

llvm-svn: 230618
2015-02-26 08:19:31 +00:00
Ramkumar Ramachandra
d6ef954184 PlaceSafepoints: use IRBuilder helpers
Use the IRBuilder helpers for gc.statepoint and gc.result, instead of
coding the construction by hand. Note that the gc.statepoint IRBuilder
handles only CallInst, not InvokeInst; retain that part of hand-coding.

Differential Revision: http://reviews.llvm.org/D7518

llvm-svn: 230591
2015-02-26 00:35:56 +00:00
Sanjay Patel
7b4252051a only propagate equality comparisons of FP values that we are certain are non-zero
This is a follow-on to r227491 which tightens the check for propagating FP
values. If a non-constant value happens to be a zero, we would hit the same
bug as before.

Bug noted and patch suggested by Eli Friedman.

llvm-svn: 230564
2015-02-25 22:46:08 +00:00
JF Bastien
0f477f42f6 InstCombine: extract instead of shuffle when performing vector/array type punning
Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.

Test Plan: make check

Reviewers: jvoung, chandlerc

Subscribers: llvm-commits
llvm-svn: 230560
2015-02-25 22:30:51 +00:00
Peter Collingbourne
e60f3e06fc LowerBitSets: Align referenced globals.
This change aligns globals to the next highest power of 2 bytes, up to a
maximum of 128. This makes it more likely that we will be able to compress
bit sets with a greater alignment. In many more cases, we can now take
advantage of a new optimization also introduced in this patch that removes
bit set checks if the bit set is all ones.

The 128 byte maximum was found to provide the best tradeoff between instruction
overhead and data overhead in a recent build of Chromium. It allows us to
remove ~2.4MB of instructions at the cost of ~250KB of data.

Differential Revision: http://reviews.llvm.org/D7873

llvm-svn: 230540
2015-02-25 20:42:41 +00:00
Sanjoy Das
60e1014097 Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
(The change was landed in r230280 and caused the regression PR22674.
This version contains a fix and a test-case for PR22674).
    
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
    
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
    
Apart from the attached test case, another (more realistic)
manifestation of the bug can be seen in
Transforms/IndVarSimplify/pr20680.ll.

Differential Revision: http://reviews.llvm.org/D7778

llvm-svn: 230533
2015-02-25 20:02:59 +00:00
Sanjay Patel
5a3bbd8851 Fix really obscure bug in CannotBeNegativeZero() (PR22688)
With a diabolically crafted test case, we could recurse
through this code and return true instead of false.

The larger engineering crime is the use of magic numbers. 
Added FIXME comments for those.

llvm-svn: 230515
2015-02-25 18:00:15 +00:00
Charles Davis
f135d6973b [IC] Turn non-null MD on pointer loads to range MD on integer loads.
Summary:
This change fixes the FIXME that you recently added when you committed
(a modified version of) my patch.  When `InstCombine` combines a load and
store of an pointer to those of an equivalently-sized integer, it currently
drops any `!nonnull` metadata that might be present.  This change replaces
`!nonnull` metadata with `!range !{ 1, -1 }` metadata instead.

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7621

llvm-svn: 230462
2015-02-25 05:10:25 +00:00
Peter Collingbourne
b005cb0cfc LowerBitSets: Introduce global layout builder.
The builder is based on a layout algorithm that tries to keep members of
small bit sets together. The new layout compresses Chromium's bit sets to
around 15% of their original size.

Differential Revision: http://reviews.llvm.org/D7796

llvm-svn: 230394
2015-02-24 23:17:02 +00:00
Hans Wennborg
fb5af71543 Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap"
This caused PR22674, failing this assert:

Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed.

llvm-svn: 230341
2015-02-24 16:19:29 +00:00
Sanjoy Das
de271b53ef New instcombine rule: max(~a,~b) -> ~min(a, b)
This case is interesting because ScalarEvolutionExpander lowers min(a,
b) as ~max(~a,~b).  I think the profitability heuristics can be made
more clever/aggressive, but this is a start.
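
A hypothetical source-level equivalent of the rewrite:

  // ~x == -x - 1 reverses the signed order, so max(~a, ~b) == ~min(a, b).
  int notMin(int a, int b) {
    int na = ~a, nb = ~b;
    return na > nb ? na : nb; // can be rewritten as ~(a < b ? a : b)
  }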

Differential Revision: http://reviews.llvm.org/D7821

llvm-svn: 230285
2015-02-24 00:08:41 +00:00
Sanjoy Das
9dddcf7f33 Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.

This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.

Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.

NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279.  This commit
message is the correct one.

Differential Revision: http://reviews.llvm.org/D7778

llvm-svn: 230280
2015-02-23 23:22:58 +00:00
Sanjoy Das
e5ca05754e Revert 230275.
230275 got committed with an incorrect commit message due to a mixup
on my side.  Will re-land in a few moments with the correct commit
message.

llvm-svn: 230279
2015-02-23 23:13:22 +00:00
Sanjoy Das
421baaa45f Fix bug 22641
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal.  {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.

Differential Revision: http://reviews.llvm.org/D7808

llvm-svn: 230275
2015-02-23 22:55:13 +00:00
Chad Rosier
a95dc74f57 Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity.
This patch adds the isProfitableToHoist API.  For AArch64, we want to prevent a
fmul from being hoisted in cases where it is more profitable to form a
fmsub/fmadd.
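
A hypothetical example of the case being protected:

  // Hoisting `a * b` above the select would leave a lone fmul plus an
  // fadd/fsub; kept in place, each arm can fuse into fmadd/fmsub.
  double f(bool c, double a, double b, double d) {
    return c ? a * b + d : a * b - d;
  }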

Phabricator Review: http://reviews.llvm.org/D7299
Patch by Lawrence Hu <lawrence@codeaurora.org>

llvm-svn: 230241
2015-02-23 19:15:16 +00:00
Mehdi Amini
1b3cca0d56 InstSimplify: simplify 0 / X if nnan and nsz
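
A hypothetical illustration of why both flags are needed:

  // nnan lets us assume x is not NaN (0.0 / NaN would be NaN), and nsz
  // makes the sign of zero irrelevant (0.0 / -2.0 would be -0.0), so the
  // division simplifies to 0.0.
  float zeroDiv(float x) {
    return 0.0f / x; // under nnan+nsz: -> 0.0f
  }
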
From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 230238
2015-02-23 18:30:25 +00:00
Sanjoy Das
20241e74ac IRCE: use SCEVs instead of llvm::Value's for intermediate
calculations.  Semantically non-functional change.

This gets rid of some of the SCEV -> Value -> SCEV round tripping and
the Construct(SMin|SMax)Of and MaybeSimplify helper routines.

llvm-svn: 230150
2015-02-21 22:07:32 +00:00
Philip Reames
fe3f4cccba [PlaceSafepoints] Adjust enablement logic to default to off and be GC configurable per GC
Previously, this pass ran over every function in the Module if added to the pass order.  With this change, it runs only over those with a GC attribute where the GC explicitly opts in.  A GC can also choose which of entry safepoint polls, backedge safepoint polls, and call safepoints it wants.  I hope to get these exposed as checks on the GCStrategy at some point, but for now, the checks are manual string comparisons.

llvm-svn: 230097
2015-02-21 00:09:09 +00:00
Benjamin Kramer
787373e72c LoopRotate: When reconstructing loop simplify form don't split edges from indirectbrs.
Yet another chapter in the endless story. While this looks like we leave
the loop in a non-canonical state, this replicates the logic in
LoopSimplify, so it doesn't diverge from the canonical form in any way.

PR21968

llvm-svn: 230058
2015-02-20 20:49:25 +00:00
Peter Collingbourne
68aaa34960 Introduce bitset metadata format and bitset lowering pass.
This patch introduces a new mechanism that allows IR modules to co-operatively
build pointer sets corresponding to addresses within a given set of
globals. One particular use case for this is to allow a C++ program to
efficiently verify (at each call site) that a vtable pointer is in the set
of valid vtable pointers for the class or its derived classes. One way of
doing this is for a toolchain component to build, for each class, a bit set
that maps to the memory region allocated for the vtables, such that each 1
bit in the bit set maps to a valid vtable for that class, and lay out the
vtables next to each other, to minimize the total size of the bit sets.

The patch introduces a metadata format for representing pointer sets, an
'@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals
and builds the bitsets, and documents the new feature.

Differential Revision: http://reviews.llvm.org/D7288

llvm-svn: 230054
2015-02-20 20:30:47 +00:00
Philip Reames
3c4461b7c6 Bugfix for 229954
Before calling Function::getGC to test for enablement, we need to make sure there's actually a GC at all via Function::hasGC.  Otherwise, we'd crash on functions without a GC.  Thankfully, this only mattered if you manually scheduled the pass, but still, oops. :(

llvm-svn: 230040
2015-02-20 18:56:14 +00:00
Hal Finkel
09a36680d7 [InstCombine] Remove unnecessary variable indexing into single-element arrays
This change addresses a deficiency pointed out in PR22629. To copy from the bug
report:

[from the bug report]

Consider this code:

int f(int x) {
  int a[] = {12};
  return a[x];
}

GCC knows to optimize this to

movl     $12, %eax
ret

The code generated by recent Clang at -O3 is:

movslq   %edi, %rax
movl     .L_ZZ1fiE1a(,%rax,4), %eax
retq

.L_ZZ1fiE1a:
  .long    12                      # 0xc

[end from the bug report]

This definitely seems worth fixing. I've also seen this kind of code before (as
the base case of generic vector wrapper templates with one element).

The general idea is to look at the GEP feeding a load or a store, which has
some variable as its first non-zero index, and determine if that index must be
zero (or else an out-of-bounds access would occur). We can do this for allocas
and globals with constant initializers where we know the maximum size of the
underlying object. When we find such a GEP, we create a new one for the memory
access with that first variable index replaced with a constant zero.

Even if we can't eliminate the memory access (and sometimes we can't), it is
still useful because it removes unnecessary indexing calculations.

llvm-svn: 229959
2015-02-20 03:05:53 +00:00
Philip Reames
d1cf30e1b4 Adjust enablement of RewriteStatepointsForGC
When back-merging the changes in 229945 I noticed that I forgot to mark the test cases with the appropriate GC.  We want the rewriting to be off by default (even when manually added to the pass order), not on by default.  To keep the current tests working, mark them as using the statepoint-example GC and whitelist that GC.

Longer term, we need a better selection mechanism here for both actual usage and testing.  As I migrate more tests to the in-tree version of this pass, I will probably need to update the enable/disable logic as well.

llvm-svn: 229954
2015-02-20 02:34:49 +00:00
Philip Reames
623c8019b3 Add a pass for constructing gc.statepoint sequences w/explicit relocations
This patch consists of a single pass whose only purpose is to visit previous inserted gc.statepoints which do not have gc.relocates inserted yet, and insert them. This can be used either immediately after IR generation to perform 'early safepoint insertion' or late in the pass order to perform 'late insertion'.

This patch is setting the stage for work to continue in tree.  In particular, there are known naming and style violations in the current patch.  I'll try to get those resolved over the next week or so.  As I touch each area to make style changes, I need to make sure we have adequate testing in place.  As part of the cleanup, I will be cleaning up a collection of test cases we have out of tree and submitting them upstream. The tests included in this change are very basic and mostly to provide examples of usage.

The pass has several main subproblems it needs to address:
- First, it has to identify any live pointers. In the current code, the use of address spaces to distinguish pointers to GC managed objects is hard coded, but this will become parametrizable in the near future.  Note that the current change doesn't actually contain a useful liveness analysis.  It was separated into a followup change as the code wasn't ready to be shared.  Instead, the current implementation just considers any dominating def of appropriate pointer type to be live.
- Second, it has to identify base pointers for each live pointer. This is a fairly straightforward data flow algorithm.
- Third, the information in the previous steps is used to actually introduce rewrites. Rather than trying to do this by hand, we simply re-purpose the code behind Mem2Reg to do this for us.

llvm-svn: 229945
2015-02-20 01:06:44 +00:00
Michael Gottesman
802d429449 [objc-arc-contract] We cannot move retains over instructions which cannot conservatively be proven not to decrement the retain's RCIdentity.
I also cleaned up the code to make it more understandable for mere mortals.

<rdar://problem/19853758>

llvm-svn: 229937
2015-02-20 00:02:49 +00:00
Ahmed Bougacha
5f490e6f09 [ARM] Re-re-apply VLD1/VST1 base-update combine.
This re-applies r223862, r224198, r224203, and r224754, which were
reverted in r228129 because they exposed Clang misalignment problems
when self-hosting.

The combine caused the crashes because we turned ISD::LOAD/STORE nodes
to ARMISD::VLD1/VST1_UPD nodes.  When selecting addressing modes, we
were very lax for the former, and only emitted the alignment operand
(as in "[r1:128]") when it was larger than the standard alignment of
the memory type.

However, for ARMISD nodes, we just used the MMO alignment, no matter
what.  In our case, we turned ISD nodes to ARMISD nodes, and this
caused the alignment operands to start being emitted.

And that's how we exposed alignment problems that were ignored before
(but I believe would have been caught with SCTLR.A==1?).

To fix this, we can just mirror the hack done for ISD nodes:  only
take into account the MMO alignment when the access is overaligned.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

rdar://19717869, rdar://14062261.

llvm-svn: 229932
2015-02-19 23:52:41 +00:00
Rafael Espindola
75192238cd Avoid conversion to float when creating ConstantDataArray/ConstantDataVector.
Patch by Raoux, Thomas F!

llvm-svn: 229864
2015-02-19 16:08:20 +00:00
Igor Laevsky
571e6552fb Add few simple tests to check statepoint placement for invoke instructions.
Differential Revision: http://reviews.llvm.org/D7535

llvm-svn: 229842
2015-02-19 11:39:04 +00:00
Chandler Carruth
105d2fa5e8 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, I know, this is confusing, but that's how the patterns are
written).

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

llvm-svn: 229835
2015-02-19 10:36:19 +00:00
Sanjoy Das
48dc5bb962 Partial fix for bug 22589
Don't spend the entire iteration space in the scalar loop prologue if
computing the trip count overflows.  This change also gets rid of the
backedge check in the prologue loop and the extra check for
overflowing trip-count.

Differential Revision: http://reviews.llvm.org/D7715

llvm-svn: 229731
2015-02-18 19:32:25 +00:00
Elena Demikhovsky
2900aabad0 Minor fix after 229495.
Removed metadata and function attributes from the test.

llvm-svn: 229647
2015-02-18 08:09:28 +00:00
Adam Nemet
691acc83fa [LoopAccesses] Modify test to also check symbolic strides with memchecks
See the comment in the code.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229627
2015-02-18 03:43:32 +00:00
Akira Hatanaka
913155d842 [InstCombine] Do not insert a GEP instruction before a landingpad instruction.
InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the
insertion point, which caused the verifier to complain when a GEP was inserted
before a landingpad instruction. This commit fixes it to use getFirstInsertionPt
instead.

rdar://problem/19394964

llvm-svn: 229619
2015-02-18 03:30:11 +00:00
Hal Finkel
c078e835f2 [BDCE] Don't forget uses of root instructions seen before the instruction itself
When visiting the initial list of "root" instructions (those which must always
be alive), for those that are integer-valued (such as invokes returning an
integer), we mark their bits as (initially) all dead (we might, obviously, find
uses of those bits later, but all bits are assumed dead until proven
otherwise). Don't do so, however, if we've already seen a use of those bits by
another root instruction (such as a store).

Fixes a miscompile of the sanitizer unit tests on x86_64.

Also, add a debug line for visiting the root instructions, and remove a debug
line which tried to print instructions being removed (printing dead
instructions is dangerous, and can sometimes crash).

llvm-svn: 229618
2015-02-18 03:12:28 +00:00
Elena Demikhovsky
b1743d071c Fixed a bug in store sinking.
The problem was in the store-sink barrier check.

The store-sink barrier should be checked for ModRef (read-write) mode.

http://llvm.org/bugs/show_bug.cgi?id=22613

llvm-svn: 229495
2015-02-17 13:10:05 +00:00
Hal Finkel
c9890f4fe1 [BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.

Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).

The motivation for this is a case like:

int __attribute__((const)) foo(int i);
int bar(int x) {
  x |= (4 & foo(5));
  x |= (8 & foo(3));
  x |= (16 & foo(2));
  x |= (32 & foo(1));
  x |= (64 & foo(0));
  x |= (128 & foo(4));
  return x >> 4;
}

As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).

I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).

I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark

llvm-svn: 229462
2015-02-17 01:36:59 +00:00
Mehdi Amini
ce11b626e7 InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val))
Fixes radar 15486701.
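One shape of the folded pattern, sketched in C++. Which exact cases fold depends on whether the FP type can represent the integer range exactly, so treat this as an illustration rather than a list of what this change handles:

  int roundtrip(int x) {
    // (fp_to_sint (sint_to_fp x)): double represents every 32-bit int
    // exactly, so the round trip is the identity and folds to x.
    return static_cast<int>(static_cast<double>(x));
  }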

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229437
2015-02-16 21:47:54 +00:00
Mehdi Amini
1468b597ea Tests: reformat sitofp.ll and use FileCheck
From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229436
2015-02-16 21:47:50 +00:00
James Molloy
317bb7473f [LoopReroll] Relax some assumptions a little.
We won't find a root with index zero in any loop that we are able to reroll.
However, we may find one in a non-rerollable loop, so bail gracefully instead
of failing hard.

llvm-svn: 229406
2015-02-16 17:02:00 +00:00
James Molloy
382c1caece [LoopReroll] Don't crash on dead code
If a PHI has no users, don't crash; bail gracefully. This shouldn't
happen often, but we can make no guarantees that previous passes didn't leave
dead code around.

llvm-svn: 229405
2015-02-16 17:01:52 +00:00
David Majnemer
2b452a1df4 IR: Properly return nullptr when getAggregateElement is out-of-bounds
We didn't properly handle the out-of-bounds case for
ConstantAggregateZero and UndefValue.  This would manifest as a crash
when the constant folder was asked to fold a load of a constant global
whose struct type has no operands.

This fixes PR22595.

llvm-svn: 229352
2015-02-16 04:02:09 +00:00
David Blaikie
9f95d71989 FileCheck-ize a test to make it easier to migrate to typeless pointers
llvm-svn: 229278
2015-02-15 04:14:00 +00:00
David Blaikie
d4624d6acc Update a test to make it easier to migrate to untyped pointers
llvm-svn: 229277
2015-02-15 04:13:58 +00:00
David Blaikie
ddfea60e4d Update a test to use FileCheck so it's easier to migrate to future typeless pointer changes
llvm-svn: 229276
2015-02-15 04:13:57 +00:00
David Blaikie
d5022c864c Reformat test case to be easier to migrate to typeless pointers.
llvm-svn: 229275
2015-02-15 04:13:53 +00:00
Ramkumar Ramachandra
af4f23c6ae InstCombine: propagate deref via new addDereferenceableAttr
The "dereferenceable" attribute cannot be added via .addAttribute(),
since it also expects a size in bytes. AttrBuilder#addAttribute or
AttributeSet#addAttribute is wrapped by classes Function, InvokeInst,
and CallInst. Add corresponding wrappers for
AttrBuilder#addDereferenceableAttr.

Having done this, propagate the dereferenceable attribute via
gc.relocate, adding a test to exercise it. Note that -datalayout is
required during execution over and above -instcombine, because
InstCombine only optionally requires DataLayoutPass.

Differential Revision: http://reviews.llvm.org/D7510

llvm-svn: 229265
2015-02-14 19:37:54 +00:00
Philip Reames
45adc98469 [InstCombine] When canonicalizing gep indices, prefer zext when possible
If we know that the sign bit of a value being sign extended is zero, we can use a zero extension instead.  This is motivated by the fact that zero extensions are generally cheaper on x86 (and most other architectures?).  We already apply a similar transform in DAGCombine; this just extends that to the IR level.

This comes up when we eagerly canonicalize gep indices to the width of a machine register (i64 on x86_64). To do so, we insert sign extensions (sext) to promote smaller types. 
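A source-level sketch of the pattern (hypothetical function; the mask is what makes the sign bit provably zero):

  int at(const int *a, int i) {
    // The masked index has a known-clear sign bit, so the promotion of
    // the GEP index to 64 bits can use zext instead of the usual sext.
    return a[i & 0xFF];
  }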

Differential Revision: http://reviews.llvm.org/D7255

llvm-svn: 229189
2015-02-14 00:05:36 +00:00
Andrea Di Biagio
0c23f4cf6c [InstCombine] Fix regression introduced at r227197.
This patch fixes a problem I accidentally introduced in an instruction combine
on select instructions added at r227197. That revision taught the instruction
combiner how to fold a cttz/ctlz followed by a icmp plus select into a single
cttz/ctlz with flag 'is_zero_undef' cleared.

However, the new rule added at r227197 would have produced wrong results in the
case where a cttz/ctlz with flag 'is_zero_undef' cleared was followed by a
zero-extend or truncate. In that case, the folded instruction would have
been inserted in the wrong location, thus leaving the CFG in an inconsistent
state.

This patch fixes the problem and adds two reproducible test cases to
existing test 'InstCombine/select-cmp-cttz-ctlz.ll'.

llvm-svn: 229124
2015-02-13 16:33:34 +00:00
Andrea Di Biagio
7791371c4e [CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to speculate calls to cttz/ctlz.
SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are
'cheap' for the target. Therefore, some of the logic in CodeGenPrepare
that was originally added at revision 224899 can now be removed.

This patch is basically a non-functional change. It removes the duplicated
logic in CodeGenPrepare and converts all the existing target specific tests
for cttz/ctlz into SimplifyCFG tests.

Differential Revision: http://reviews.llvm.org/D7608

llvm-svn: 229105
2015-02-13 14:15:48 +00:00
James Molloy
88288c494b [SimplifyCFG] Add test for r229099
Add extra test that was accidentally not staged.

llvm-svn: 229101
2015-02-13 11:08:40 +00:00
Chandler Carruth
5b9fde4431 [unroll] Concede defeat and disable the unroll analyzer for now.
The issues with the new unroll analyzer are more fundamental than code
cleanup, algorithm, or data structure changes. I've sent an email to the
original commit thread with details and a proposal for how to redesign
things. I'm disabling this for now so that we don't spend time
debugging issues with it in its current state.

llvm-svn: 229064
2015-02-13 05:31:46 +00:00
Michael Liao
1e3950b179 [InstCombine] Fix a bug when combining icmp from ptrtoint
- First, there's a crash when we try to combine two pointers into `icmp`
  directly by creating a `bitcast`, which is invalid if the two pointers are
  from different address spaces.

- It's not always appropriate to cast one pointer to another if they are from
  different address spaces, as that is not a no-op cast. Instead, we only
  combine `icmp` from `ptrtoint` if the two pointers are in the same address
  space.

llvm-svn: 229063
2015-02-13 04:51:26 +00:00
Chandler Carruth
cbd5e9d8c2 [IC] Fix a bug with the instcombine canonicalization of loads and
propagation of metadata.

We were propagating !nonnull metadata even when the newly formed load is
no longer of a pointer type. This is clearly broken and results in LLVM
failing the verifier and aborting. This patch just restricts the
propagation of !nonnull metadata to when we actually have a pointer
type.

This bug report and the initial version of this patch was provided by
Charles Davis! Many thanks for finding this!

We still need to add logic to round-trip the metadata correctly if we
combine from pointer types to integer types and then back by using range
metadata for the integer type loads. But this is the minimal and safe
version of the patch, which is important so we can backport it into 3.6.

llvm-svn: 229029
2015-02-13 02:30:01 +00:00
Olivier Sallenave
cfd4c9724c Check interleaving without relying on debug output.
llvm-svn: 229027
2015-02-13 02:13:57 +00:00
Michael Zolotukhin
4701b74ea8 Testcase for r228988.
llvm-svn: 228995
2015-02-13 00:35:45 +00:00
NAKAMURA Takumi
b02669f0d4 llvm/test/Transforms/LoopVectorize/PowerPC/small-loop-rdx.ll REQUIRES +Asserts due to -debug.
llvm-svn: 228989
2015-02-13 00:21:34 +00:00
Olivier Sallenave
7043291b9c Change max interleave factor to 12 for POWER7 and POWER8.
llvm-svn: 228973
2015-02-12 22:57:58 +00:00
Bjorn Steinbrink
5647f7ac6b Fix a crash in the assumption cache when inlining indirect function calls
Summary:
Instances of the AssumptionCache are per function, so we can't re-use
the same AssumptionCache instance when recursing in the CallAnalyzer to
analyze a different function. Instead we have to pass the
AssumptionCacheTracker to the CallAnalyzer so it can get the right
AssumptionCache on demand.

Reviewers: hfinkel

Subscribers: llvm-commits, hans

Differential Revision: http://reviews.llvm.org/D7533

llvm-svn: 228957
2015-02-12 21:04:22 +00:00
Benjamin Kramer
f3253754cf Update test case.
llvm-svn: 228956
2015-02-12 20:40:19 +00:00
Benjamin Kramer
3b1b5b8725 InstCombine: Allow folding of xor into icmp by changing the predicate for vectors
The loop vectorizer can create this pattern.
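For reference, the scalar shape of the fold; this change handles the vector analogue that the vectorizer emits (sketch only):

  bool not_less(int x, int y) {
    // (icmp slt x, y) xor true  ==>  icmp sge x, y
    return (x < y) ^ true;
  }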

llvm-svn: 228954
2015-02-12 20:26:46 +00:00
Michael Zolotukhin
edeaac9321 Add a testcase for r228432.
llvm-svn: 228951
2015-02-12 19:57:24 +00:00
James Molloy
a6fbcd3b92 [LoopRerolling] Be more forgiving with instruction order.
We can't solve the full subgraph isomorphism problem. But we can
allow obvious cases, where for example two instructions of different
types are out of order. Due to them having different types/opcodes,
there is no ambiguity.

llvm-svn: 228931
2015-02-12 15:54:14 +00:00
Andrea Di Biagio
7ca0db442c [TTI] Teach the cost heuristic how to query TLI to check if a zext/trunc is 'free' for the target.
Now that SimplifyCFG uses TTI for the cost heuristic, we can teach BasicTTIImpl
how to query TLI in order to get a more accurate cost for truncates and
zero-extends.

Before this patch, the basic cost heuristic in TargetTransformInfoImplCRTPBase
would have conservatively returned a 'default' TCC_Basic for all zero-extends,
and TCC_Free for truncates on native types.

This patch improves the heuristic so that we query TLI (if available) to get
more accurate answers. If TLI is available, then methods 'isZExtFree' and
'isTruncateFree' can be used to check if a zext/trunc is free for the target.

Added more test cases to SimplifyCFG/X86/speculate-cttz-ctlz.ll.
With this change, SimplifyCFG is now able to speculate a 'cheap' cttz/ctlz
immediately followed by a free zext/trunc.

Differential Revision: http://reviews.llvm.org/D7585

llvm-svn: 228923
2015-02-12 14:17:24 +00:00
Chandler Carruth
2af75e99bb [slp] Fix a nasty bug in the SLP vectorizer that Joerg pointed out.
Apparently some code finally started to tickle this after my
canonicalization changes to instcombine.

The bug stems from trying to form a vector type out of scalars that
aren't compatible at all. In this example, from x86_mmx values. The code
in the vectorizer that checks for reasonable types was checking for
aggregates or vectors, but there are lots of other types that should
just never reach the vectorizer.

Debugging this was made more confusing by the lie in an assert in
VectorType::get() -- it isn't that the types are *primitive*. The types
must be integer, pointer, or floating point types. No other types are
allowed.

I've improved the assert and added a helper to the vectorizer to handle
the element type validity checks. It now re-uses the VectorType static
function and then further excludes weird target-specific types that we
probably shouldn't be touching here (x86_fp80 and ppc_fp128). Neither of
these are really reachable anyways (neither 80-bit nor 128-bit things
will get vectorized) but it seems better to just eagerly exclude such
nonsense.

I've added a test case, but while it definitely covers two of the paths
through this code there may be more paths that would benefit from test
coverage. I'm not familiar enough with the SLP vectorizer to synthesize
test cases for all of these, but was able to update the code itself by
inspection.

llvm-svn: 228899
2015-02-12 02:30:56 +00:00
Tim Northover
f976f969cd DeadArgElim: aggregate Return assessment properly.
I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It
actually depends on the index too, which means we need to be careful about how
the results are combined before return. In particular if a single Use returns
Live, that counts for the entire object, at the granularity we're considering.

llvm-svn: 228885
2015-02-11 23:13:11 +00:00
Mehdi Amini
3c8f7ac243 Reassociate: cannot negate a INT_MIN value
Summary:
When trying to canonicalize negative constants out of
multiplication expressions, we need to check that the
constant is not INT_MIN, which cannot be negated.
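A minimal illustration of the hazard (deliberately undefined behavior, for exposition only):

  #include <climits>

  int cannot_negate() {
    // INT_MIN has no positive counterpart in two's complement, so a
    // rewrite that negates the constant must skip C == INT_MIN.
    return -INT_MIN;  // signed overflow: undefined behavior
  }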

Reviewers: mcrosier

Reviewed By: mcrosier

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7286

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 228872
2015-02-11 19:54:44 +00:00
Andrea Di Biagio
70c7608263 [TTI] Improved cost heuristic for cttz/ctlz calls.
This patch is a follow-up of r228826 (see code-review: D7506).

Now that SimplifyCFG uses TargetTransformInfo for cost analysis, we 
have to fix the cost heuristic for intrinsic calls to cttz/ctlz.

This patch defines method 'getIntrinsicCost' in BasicTTIImpl: now, BasicTTIImpl
queries TLI to check if a call to cttz/ctlz is cheap for the target.

Added test cases in Transforms/SimplifyCFG/X86 to verify that on x86,
SimplifyCFG only speculates a call to cttz/ctlz if it is cheap.

Differential Revision: http://reviews.llvm.org/D7554

llvm-svn: 228829
2015-02-11 14:22:18 +00:00
James Molloy
ba8cd33738 [SimplifyCFG] Swap to using TargetTransformInfo for cost
analysis.

We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now so there is a slight test churn.

Test changes:
  * combine-comparisons-by-cse.ll: Removed unneeded branch check.
  * 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
  * coalesce-subregs.ll: Superfluous block check.
  * 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
  * PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
  * select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.

llvm-svn: 228826
2015-02-11 12:15:41 +00:00
James Molloy
c9ce650708 [LoopReroll] Introduce the concept of DAGRootSets.
A DAGRootSet models an induction variable being used in a rerollable
loop. For example:

   x[i*3+0] = y1
   x[i*3+1] = y2
   x[i*3+2] = y3

   Base instruction -> i*3
                    +---+----+
                   /    |     \
               ST[y1]  +1     +2  <-- Roots
                        |      |
                      ST[y2] ST[y3]

There may be multiple DAGRootSets, for example:

   x[i*2+0] = ...   (1)
   x[i*2+1] = ...   (1)
   x[i*2+4] = ...   (2)
   x[i*2+5] = ...   (2)
   x[(i+1234)*2+5678] = ... (3)
   x[(i+1234)*2+5679] = ... (3)

This concept is similar to the "Scale" member used previously, but allows
multiple independent sets of roots based on the same induction variable.

llvm-svn: 228821
2015-02-11 09:19:47 +00:00
Reid Kleckner
f2f6e76b2c Fix invalid LLVM IR in PruneEH tests
llvm-svn: 228786
2015-02-11 02:06:47 +00:00
Reid Kleckner
86643b627c Don't promote asynch EH invokes of nounwind functions to calls
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.

Also add some landingpads to invalid LLVM IR test cases that lack them.

Over-the-shoulder reviewed by David Majnemer.

llvm-svn: 228782
2015-02-11 01:23:16 +00:00
David Majnemer
15b422ab6d EarlyCSE: Add check lines for test added in r228760
llvm-svn: 228761
2015-02-10 23:11:02 +00:00
David Majnemer
530afeca43 EarlyCSE: It isn't safe to CSE across synchronization boundaries
This fixes PR22514.
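A sketch of the kind of pattern at stake (illustrative C++, not the actual PR22514 test):

  #include <atomic>

  std::atomic<bool> ready;
  int g;

  int f() {
    int a = g;
    while (!ready.load(std::memory_order_acquire)) { }
    int b = g;  // the releasing thread may have written g before the
                // acquire succeeded, so CSE'ing b with a is a miscompile
    return a + b;
  }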

llvm-svn: 228760
2015-02-10 23:09:43 +00:00
Tim Northover
0eba2eb04d DeadArgElim: arguments affect all returned sub-values by default.
Unless we meet an insertvalue on a path from some value to a return, that value
will be live if *any* of the return's components are live, so all of those
components must be added to the MaybeLiveUses.

Previously we were deleting arguments if sub-value 0 turned out to be dead.

llvm-svn: 228731
2015-02-10 19:49:18 +00:00
Michael Zolotukhin
5d9638624f Add a test case for new unrolling heuristics.
The heuristics were added in r228265 and r228434.

llvm-svn: 228713
2015-02-10 17:54:54 +00:00
Chandler Carruth
3f645fb28b Revert r228556: InstCombine: propagate nonNull through assume
This commit isn't using the correct context, and is transforming calls
that are operands to loads rather than calls that are operands to an
icmp feeding into an assume. I've replied on the original review thread
with a very reduced test case and some thoughts on how to rework this.

llvm-svn: 228677
2015-02-10 08:07:32 +00:00
Ramkumar Ramachandra
054737a0c9 PlaceSafepoints: modernize gc.result.* -> gc.result
Differential Revision: http://reviews.llvm.org/D7516

llvm-svn: 228625
2015-02-09 23:00:40 +00:00
Philip Reames
35415b0c25 Introduce more tests for PlaceSafepoints
These test the two optimizations for backedge insertion that are currently implemented, and the split-backedge flag, which is currently off by default.

llvm-svn: 228617
2015-02-09 22:10:15 +00:00
Philip Reames
e9662c9d04 Minor test cleanup
a) add gc attribute
b) remove unused param

llvm-svn: 228612
2015-02-09 21:50:31 +00:00
Philip Reames
4ae49786df Add basic tests for PlaceSafepoints
This is just adding really simple tests which should have been part of the original submission.  When doing so, I discovered that I'd mistakenly removed required pieces when preparing the patch for upstream submission.  I fixed two such bugs in this submission.

llvm-svn: 228610
2015-02-09 21:48:05 +00:00
Tim Northover
e34e125ece DeadArgElim: fix mismatch in accounting of array return types.
Some parts of DeadArgElim were only considering the individual fields
of StructTypes separately, but others (where insertvalue &
extractvalue instructions occur) also looked into ArrayTypes.

This one is an actual bug; the mismatch can lead to an argument being
considered used by a return sub-value that isn't being tracked (and
hence is dead by default). It then gets incorrectly eliminated.

llvm-svn: 228559
2015-02-09 01:21:00 +00:00
Tim Northover
3d150e22ef DeadArgElim: assess uses of entire return value aggregate.
Previously, a non-extractvalue use of an aggregate return value meant
the entire return was considered live (the algorithm gave up
entirely). This was correct, but conservative. It's better to actually
look at that Use, making the analysis results apply to all sub-values
under consideration.

E.g.

  %val = call { i32, i32 } @whatever()
  [...]
  ret { i32, i32 } %val

The return is using the entire aggregate (sub-values 0 and 1). We can
still simplify @whatever if we can prove that this return is itself
unused.

Also unifies the logic slightly between aggregate and non-aggregate
cases.

llvm-svn: 228558
2015-02-09 01:20:53 +00:00
Ramkumar Ramachandra
570fa73130 InstCombine: propagate nonNull through assume
Make assume (load (call|invoke) != null) set nonNull return attribute
for the call and invoke. Also include tests.

Differential Revision: http://reviews.llvm.org/D7107

llvm-svn: 228556
2015-02-09 01:13:13 +00:00
Bjorn Steinbrink
a6a56743c3 Correctly combine alias.scope metadata by a union instead of intersecting
Summary:
The alias.scope metadata represents sets of things an instruction might
alias with. When generically combining the metadata from two
instructions the result must be the union of the original sets, because
the new instruction might alias with anything any of the original
instructions aliased with.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7490

llvm-svn: 228525
2015-02-08 17:07:14 +00:00
Benjamin Kramer
3aeb5530c5 ValueTracking: Make isBytewiseValue simpler and more powerful at the same time.
Turns out there is a simpler way of checking that all bytes in a word are equal
than binary decomposition.
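One simple formulation of such a check, as a sketch (the exact code in ValueTracking may differ): a word is bytewise-uniform iff it equals itself rotated by eight bits.

  #include <cstdint>

  bool allBytesEqual(uint32_t x) {
    return x == ((x << 8) | (x >> 24));  // rotate left by 8
  }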

llvm-svn: 228503
2015-02-07 19:29:02 +00:00
Bjorn Steinbrink
7de3907fdc Properly update AA metadata when performing call slot optimization
Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7482

llvm-svn: 228500
2015-02-07 17:54:36 +00:00
Matthias Braun
7549566067 InstCombine: Combine select sequences into a single select
Normalize
select(C0, select(C1, a, b), b) -> select((C0 & C1), a, b)
select(C0, a, select(C1, a, b)) -> select((C0 | C1), a, b)

This normal form may enable further combines on the And/Or and shortens
paths for the values. Many targets prefer the other but can go back
easily in CodeGen.
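In source terms, the first form corresponds to nested conditionals; a sketch, assuming c0 and c1 are already 0/1 flags:

  int pick(int c0, int c1, int a, int b) {
    // select(C0, select(C1, a, b), b) ==> select(C0 & C1, a, b)
    return c0 ? (c1 ? a : b) : b;  // normalizes to (c0 & c1) ? a : b
  }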

Differential Revision: http://reviews.llvm.org/D7399

llvm-svn: 228409
2015-02-06 17:49:36 +00:00
Michael Kuperstein
8ed19d08b2 Teach isDereferenceablePointer() to look through bitcast constant expressions.
This fixes a LICM regression due to the new load+store pair canonicalization.

Differential Revision: http://reviews.llvm.org/D7411

llvm-svn: 228284
2015-02-05 09:15:37 +00:00
Cameron Esfahani
a75b0eb54b Value soft float calls as more expensive in the inliner.
Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation.  By default, it's not an expensive operation.  This keeps the default behavior the same as before.  The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point.

Reviewers: chandlerc, echristo

Reviewed By: echristo

Subscribers: t.p.northover, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D6936

llvm-svn: 228263
2015-02-05 02:09:33 +00:00
Tom Stellard
e903257d57 StructurizeCFG: Use a reverse post-order traversal
We were previously doing a post-order traversal and operating on the
list in reverse; however, this would occasionally cause backedges for
loops to be visited before some of the other blocks in the loop.

We now use a reverse post-order traversal, which avoids this issue.

The reverse post-order traversal is not completely ideal, so we need
to manually fix up the list to ensure that inner loop backedges are
visited before outer loop backedges.

llvm-svn: 228186
2015-02-04 20:49:44 +00:00
Renato Golin
f8fd9bab9d Reverting VLD1/VST1 base-updating/post-incrementing combining
This reverts patches 223862, 224198, 224203, and 224754, which were all
related to the vector load/store combining and were reverted/reapplied
a few times due to the same alignment problems we're seeing now.

Further tests, mainly self-hosting Clang, will be needed to reapply this
patch in the future.

llvm-svn: 228129
2015-02-04 10:11:59 +00:00
Daniel Berlin
2d2eb452e9 Allow PRE to insert no-cost phi nodes
llvm-svn: 228024
2015-02-03 20:37:08 +00:00
Jingyue Wu
4e99b65428 Add straight-line strength reduction to LLVM
Summary:
Straight-line strength reduction (SLSR) is implemented in GCC but not yet in
LLVM. It has proven to effectively simplify statements derived from an unrolled
loop, and can potentially benefit many other cases too. For example,

LLVM unrolls

  #pragma unroll
  for (int i = 0; i < 3; ++i) {
    sum += foo((b + i) * s);
  }

into

  sum += foo(b * s);
  sum += foo((b + 1) * s);
  sum += foo((b + 2) * s);

However, no optimizations yet reduce the internal redundancy of the three
expressions:

  b * s
  (b + 1) * s
  (b + 2) * s

With SLSR, LLVM can optimize these three expressions into:

  t1 = b * s
  t2 = t1 + s
  t3 = t2 + s

This commit is only an initial step towards implementing a series of such
optimizations. I will implement more (see TODO in the file commentary) in the
near future. This optimization is enabled for the NVPTX backend for now.
However, I am more than happy to push it to the standard optimization pipeline
after more thorough performance tests.

Test Plan: test/StraightLineStrengthReduce/slsr.ll

Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick

Reviewed By: jholewinski, atrick

Subscribers: karthikthecool, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7310

llvm-svn: 228016
2015-02-03 19:37:06 +00:00
Erik Eckstein
b53691cdaf Fix: SLPVectorizer crashes with assertion when vectorizing a cmp instruction.
The commit r225977 uncovered this bug. The problem was that the vectorizer tried to
read the second operand of an already deleted instruction.
The bug didn't show up before r225977 because the freed memory still contained a non-null pointer.
With r225977 deletion of instructions is delayed and the read operand pointer is always null.

llvm-svn: 227800
2015-02-02 12:45:34 +00:00
Chandler Carruth
e1550cbb3c [PM] Port SimplifyCFG to the new pass manager.
This should be sufficient to replace the initial (minor) function pass
pipeline in Clang with the new pass manager. I'll probably add an (off
by default) flag to do that just to ensure we can get extra testing.

llvm-svn: 227726
2015-02-01 11:34:21 +00:00
Chandler Carruth
b4f6fbea29 [PM] Port EarlyCSE to the new pass manager.
I've added RUN lines both to the basic test for EarlyCSE and the
target-specific test, as this serves as a nice test that the TTI layer
in the new pass manager is in fact working well.

llvm-svn: 227725
2015-02-01 10:51:23 +00:00
Adrian Prantl
94fa62f69f Inliner: Use replaceDbgDeclareForAlloca() instead of splicing the
instruction and generalize it to optionally dereference the variable.
Follow-up to r227544.

llvm-svn: 227604
2015-01-30 19:37:48 +00:00
Hao Liu
7f59f4df87 Move the target specific test case arbitrary-induction-step.ll to test/Transforms/LoopVectorize/AArch64 folder.
llvm-svn: 227561
2015-01-30 07:33:31 +00:00
Hao Liu
dd2f874770 [LoopVectorize] Induction variables: support arbitrary constant step.
Previously, only -1 and +1 step values are supported for induction variables. This patch extends LV to support
arbitrary constant steps.
Initial patch by Alexey Volkov. Some bug fixes are added in the following version.

Differential Revision: http://reviews.llvm.org/D6051 and http://reviews.llvm.org/D7193

llvm-svn: 227557
2015-01-30 05:02:21 +00:00
Adrian Prantl
4ac268c18b Fix PR22386. The inliner moves static allocas to the entry basic block
so we need to move the dbg.declare intrinsics that describe them, too.

llvm-svn: 227544
2015-01-30 01:55:25 +00:00
Sanjay Patel
1040ef834a [GVN] don't propagate equality comparisons of FP zero (PR22376)
In http://reviews.llvm.org/D6911, we allowed GVN to propagate FP equalities
to allow some simple value range optimizations. But that introduced a bug
when comparing to -0.0 or 0.0: these compare equal even though they are not
bitwise identical.
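A concrete illustration of the hazard (standard C++; prints -inf rather than +inf):

  #include <cstdio>

  int main() {
    double x = -0.0;
    if (x == 0.0)                    // true: -0.0 compares equal to 0.0
      std::printf("%f\n", 1.0 / x);  // -inf; substituting the constant
                                     // 0.0 for x would give +inf instead
    return 0;
  }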

This patch disallows propagating zero constants in equality comparisons. 
Fixes: http://llvm.org/bugs/show_bug.cgi?id=22376

Differential Revision: http://reviews.llvm.org/D7257

llvm-svn: 227491
2015-01-29 20:51:49 +00:00
Philip Reames
fe460d2612 Teach SplitBlockPredecessors how to handle landingpad blocks.
Patch by: Igor Laevsky <igor@azulsystems.com>

"Currently SplitBlockPredecessors generates incorrect code in case if basic block we are going to split has a landingpad. Also seems like it is fairly common case among it's users to conditionally call either SplitBlockPredecessors or SplitLandingPadPredecessors. Because of this I think it is reasonable to add this condition directly into SplitBlockPredecessors."

Differential Revision: http://reviews.llvm.org/D7157

llvm-svn: 227390
2015-01-28 23:06:47 +00:00
Michael Kuperstein
139e0bbb66 [X86] Reduce some 32-bit imuls into lea + shl
Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well.
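For example (illustrative; the exact code generated depends on the target and flags):

  int by40(int x) {
    // 40 = 5 * 8: an lea computes x + 4*x, then a shl by 3 multiplies
    // by 8, avoiding the imul.
    return x * 40;
  }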

Differential Revision: http://reviews.llvm.org/D7196

llvm-svn: 227308
2015-01-28 14:08:22 +00:00
Elena Demikhovsky
e46025656d Fold fcmp in cases where value is provably non-negative. By Arch Robison.
This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where:
 - the predicate is olt or uge
 - the first operand is provably not less than zero
 - the second operand is zero
The motivation for handling these cases is optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(x*x+y*y).
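The Julia idiom above, in plain C++: sqrt's result is never ordered-less-than zero (it is -0.0, +0.0, positive, or NaN), so under the new rule the domain check folds to false (predicate olt, provably non-negative first operand, zero second operand):

  #include <cmath>

  double dist(double x, double y) {
    double r = std::sqrt(x * x + y * y);
    return (r < 0.0) ? 0.0 : r;  // r < 0.0 is always false: r is -0.0,
                                 // +0.0, positive, or NaN
  }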

http://reviews.llvm.org/D6972

llvm-svn: 227298
2015-01-28 08:03:58 +00:00
Reid Kleckner
031930cb0b Move EH personality type classification to Analysis/LibCallSemantics.h
Summary:
Also add enum types for __C_specific_handler and _CxxFrameHandler3 for
which we know a few things.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7214

llvm-svn: 227284
2015-01-28 01:17:38 +00:00
Ahmed Bougacha
63ac678e11 [SimplifyLibCalls] Don't confuse strcpy_chk for stpcpy_chk.
This was introduced in a faulty refactoring (r225640, mea culpa):
the tests weren't testing the return values, so, for both
__strcpy_chk and __stpcpy_chk, we would return the end of the
buffer (matching stpcpy) instead of the beginning (for strcpy).
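For reference, the difference the tests now pin down, using the C library functions directly (stpcpy is POSIX):

  #include <string.h>

  char buf[16];
  char *c1(const char *s) { return strcpy(buf, s); }  // returns buf
  char *c2(const char *s) { return stpcpy(buf, s); }  // returns
      // buf + strlen(s), a pointer to the terminating NUL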

The root cause was the prefix "__" being ignored when comparing,
which made us always pick LibFunc::stpcpy_chk.
Pass the LibFunc::Func directly to avoid this kind of error.
Also, make the testcases as explicit as possible to prevent this.

The now-useful testcases expose another, entangled, stpcpy problem,
with the further simplification.  This was introduced in a
refactoring (r225640) to match the original behavior.

However, this leads to problems when successive simplifications
generate several similar instructions, none of which are removed
by the custom replaceAllUsesWith.

For instance, InstCombine (the main user) doesn't erase the
instruction in its custom RAUW.  When trying to simplify, say,
__stpcpy_chk:
- first, an stpcpy is created (fortified simplifier),
- second, a memcpy is created (normal simplifier), but the
  stpcpy call isn't removed.
- third, InstCombine later revisits the instructions,
  and simplifies the first stpcpy to a memcpy.  We now have
  two memcpys.

llvm-svn: 227250
2015-01-27 21:52:16 +00:00
Sanjoy Das
a8dd5128cf Teach IRCE to look at branch weights when recognizing range checks
Splitting a loop to make range checks redundant is profitable only if
the range check "never" fails. Make this fact a part of recognizing a
range check -- a branch is a range check only if it is expected to
pass (via branch_weights metadata).

Differential Revision: http://reviews.llvm.org/D7192

llvm-svn: 227249
2015-01-27 21:38:12 +00:00
Andrea Di Biagio
a698cd7c36 [InstCombine] Teach how to fold a select into a cttz/ctlz with the 'is_zero_undef' flag.
This patch teaches the Instruction Combiner how to fold a cttz/ctlz followed by
a icmp plus select into a single cttz/ctlz with flag 'is_zero_undef' cleared.

Added test InstCombine/select-cmp-cttz-ctlz.ll.

llvm-svn: 227197
2015-01-27 15:58:14 +00:00
David Majnemer
532fcd01e2 LoopRotate: Don't walk the uses of a Constant
LoopRotate wanted to avoid live range interference by looking at the
uses of a Value in the loop latch and seeing if any lay outside of the
loop.  We would wrongly perform this operation on Constants.

This fixes PR22337.

llvm-svn: 227171
2015-01-27 06:21:43 +00:00
Chad Rosier
f89b8ccace Commoning of target specific load/store intrinsics in Early CSE.
Phabricator revision: http://reviews.llvm.org/D7121
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!

llvm-svn: 227149
2015-01-26 22:51:15 +00:00
Philip Reames
2b71db1121 Add test cases for PRE w/volatile loads
These tests check that the combination of 227110 (cross block query inst) and 227112 (volatile load semantics) works together properly to allow PRE in cases where a loop contains a volatile access.

llvm-svn: 227146
2015-01-26 22:40:44 +00:00
Hans Wennborg
c53389f42a SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable
The range check would get optimized away later, but we might as well not emit
them in the first place.
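A source pattern that produces such a switch (sketch; the builtin marks the default unreachable):

  int width(int tag) {
    switch (tag) {
    case 0: return 3;
    case 1: return 5;
    case 2: return 4;
    case 3: return 7;
    default: __builtin_unreachable();  // default can't be reached, so a
                                       // lookup table needs no range check
    }
  }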

http://reviews.llvm.org/D6471

llvm-svn: 227126
2015-01-26 19:52:34 +00:00
Hans Wennborg
c27e49ae48 SimplifyCFG: don't remove unreachable default switch destinations
An unreachable default destination can be exploited by other optimizations and
allows for more efficient lowering. Both the SDag switch lowering and
LowerSwitch can exploit unreachable defaults.

Also make TurnSwitchRangeICmp handle switches with unreachable default.
This is kind of separate change, but it cannot be tested without the change
above, and I don't want to land the change above without this since that would
regress other tests.

Differential Revision: http://reviews.llvm.org/D6471

llvm-svn: 227125
2015-01-26 19:52:32 +00:00
Philip Reames
fe255a8e29 Refine memory dependence's notion of volatile semantics
According to my reading of the LangRef, volatiles are only ordered with respect to other volatiles. It is entirely legal and profitable to forward unrelated loads over the volatile load. This patch implements this for GVN by refining the transition rules MemoryDependenceAnalysis uses when encountering a volatile.

The added test cases show where the extra flexibility is profitable for local dependence optimizations. I have a related change (227110) which will extend this to non-local dependence (i.e. PRE), but that's essentially orthogonal to the semantic change in this patch. I have tested the two together and can confirm that PRE works over a volatile load with both changes.  I will be submitting a PRE w/volatiles test case separately in the near future.
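A sketch of the newly allowed forwarding (illustrative):

  volatile int flag;
  int g;

  int f() {
    int a = g;
    int v = flag;  // volatile load: ordered only against other volatiles,
                   // and it does not modify memory...
    int b = g;     // ...so GVN may forward a's value to b
    return a + b + v;
  }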

Differential Revision: http://reviews.llvm.org/D6901

llvm-svn: 227112
2015-01-26 18:54:27 +00:00
Philip Reames
5b68eeb27b Pass QueryInst down through non-local dependency calculation
This change is mostly motivated by exposing information about the original query instruction to the actual scanning work in getPointerDependencyFrom when used by GVN PRE. In a follow up change, I will use this to be more precise with regards to the semantics of volatile instructions encountered in the scan of a basic block.

Worth noting is that this change (despite appearing quite simple) is not semantically preserving. By providing more information to the helper routine, we allow some optimizations to kick in that weren't previously able to (when called from this code path). In particular, we see that treatment of !invariant.load becomes more precise. In theory, we might see a difference with an ordered/atomic instruction as well, but I'm having a hard time actually finding a test case which shows that.

Test wise, I've included new tests for !invariant.load which illustrate this difference. I've also included some updated TBAA tests which highlight that this change isn't needed for that optimization to kick in - it's handled inside alias analysis itself. 

Eventually, it would be nice to factor the !invariant.load handling inside alias analysis as well.

Differential Revision: http://reviews.llvm.org/D6895

llvm-svn: 227110
2015-01-26 18:39:52 +00:00
Erik Eckstein
351784e75f SLPVectorizer: fix wrong scheduling of atomic load/stores.
This fixes PR22306.

llvm-svn: 227077
2015-01-26 09:07:04 +00:00
Chandler Carruth
0c0e20141c [PM] Port LowerExpectIntrinsic to the new pass manager.
This just lifts the logic into a static helper function, sinks the
legacy pass to be a trivial wrapper of that helper function, and adds
a trivial wrapper for the new PM as well. Not much to see here.

I switched a test case to run in both modes, but we have to strip the
dead prototypes separately as that pass isn't in the new pass manager
(yet).

llvm-svn: 226999
2015-01-24 11:13:02 +00:00
Chandler Carruth
3f06e001ed [PM] Port instcombine to the new pass manager!
This is exciting as this is a much more involved port. This is
a complex, existing transformation pass. All of the core logic is shared
between both old and new pass managers. Only the access to the analyses
is separate because the actual techniques are separate. This also uses
a bunch of different and interesting analyses and is the first time
where we need to use an analysis across an IR layer.

This also paves the way to expose instcombine utility functions. I've
got a static function that implements the core pass logic over
a function which might be mildly interesting, but more interesting is
likely exposing a routine which just uses instructions *already in* the
worklist and combines until empty.

I've switched one of my favorite instcombine tests to run with both as
well to make sure this keeps working.

llvm-svn: 226987
2015-01-24 04:19:17 +00:00
Hans Wennborg
c2c8ff3fab LowerSwitch: replace unreachable default with popular case destination
SimplifyCFG currently does this transformation, but I'm planning to remove that
to allow other passes, such as this one, to exploit the unreachable default.

This patch takes care to keep track of what case values are unreachable even
after the transformation, allowing for more efficient lowering.

Differential Revision: http://reviews.llvm.org/D6697

llvm-svn: 226934
2015-01-23 20:43:51 +00:00
Reid Kleckner
ab481aa774 Revert "Don't remove a landing pad if the invoke requires a table entry."
This reverts commit r176827.

Björn Steinbrink pointed out that this didn't actually fix the bug
(PR15555) it was attempting to fix.

With this reverted, we can now remove landingpad cleanups that
immediately resume unwinding, converting the invoke to a call.

llvm-svn: 226850
2015-01-22 19:29:46 +00:00
Sanjoy Das
4e36ac5e45 Fix crashes in IRCE caused by mismatched types
There are places where the inductive range check elimination pass
depends on two llvm::Values or llvm::SCEVs to be of the same
llvm::Type when they do not need to be. This patch relaxes those
restrictions (by bailing out of the optimization if the types
mismatch), and adds test cases to trigger those paths.

These issues were found by bootstrapping clang with IRCE running in
the -O3 pass ordering.

Differential Revision: http://reviews.llvm.org/D7082

llvm-svn: 226793
2015-01-22 08:29:18 +00:00
Elena Demikhovsky
837f1e39e9 Fixed a bug in masked load/store in reversed loop.
Added a test.

The bug was submitted to bugzilla:
http://llvm.org/bugs/show_bug.cgi?id=22225

llvm-svn: 226791
2015-01-22 08:20:06 +00:00
Chandler Carruth
001ed18423 [canonicalize] Teach InstCombine to canonicalize loads which are only
ever stored to always use a legal integer type if one is available.

Regardless of whether this particular type is good or bad, it ensures we
don't get weird differences in generated code (and resulting
performance) from "equivalent" patterns that happen to end up using
a slightly different type.

After some discussion on llvmdev it seems everyone generally likes this
canonicalization. However, there may be some parts of LLVM that handle
it poorly and need to be fixed. I have at least verified that this
doesn't impede GVN and instcombine's store-to-load forwarding powers in
any obvious cases. Subtle cases are exactly what we need to flush out if
they remain.

Also note that this IR pattern should already be hitting LLVM from Clang
at least because it is exactly the IR which would be produced if you
used memcpy to copy a pointer or floating point between memory instead
of a variable.
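The memcpy case mentioned above, sketched as a hypothetical helper:

  #include <cstring>

  // Copying a float through memory produces a load that only feeds a
  // store; the canonicalization rewrites that pair to use i32.
  void copy_float(float *dst, const float *src) {
    std::memcpy(dst, src, sizeof(float));
  }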

llvm-svn: 226781
2015-01-22 05:08:12 +00:00
David Blaikie
5ab2c04898 DebugInfo: Use distinct inlinedAt MDLocations to avoid separate inlined calls being coalesced
When two calls from the same MDLocation are inlined they currently get
treated as one inlined function call (creating difficulty debugging,
duplicate variables, etc).

Clang worked around this by including column information on inline calls
which doesn't address LTO inlining or calls to the same function from
the same line and column (such as through a macro). It also didn't
address ctor and member function calls.

By making the inlinedAt locations distinct, every call site has an
explicitly distinct location that cannot be coalesced with any other
call.

This can produce linearly (2x in the worst case where every call is
inlined and the call instruction has a non-call instruction at the same
location) more debug locations. Any increase beyond that is in cases
where the Clang workaround was insufficient and the new scheme is
creating necessary distinct nodes that were being erroneously coalesced
previously.

After this change to LLVM, the incomplete workarounds in Clang can be removed. That
should reduce the number of debug locations (in a build without column
info, the default on Darwin, not the default on Linux) by not creating
pseudo-distinct locations for every call to an inline function.

(oh, and I made the inlined-at chain rebuilding iterative instead of
recursive because I was having trouble wrapping my head around it the
way it was - open to discussion on the right design for that function
(including going back to a recursive solution))

llvm-svn: 226736
2015-01-21 22:57:29 +00:00
David Majnemer
f4834fe3ed InstCombine: Don't strip bitcasts off of callsites marked 'thunk'
The return type of a thunk is meaningless, we just want the arguments
and return value to be forwarded.

llvm-svn: 226708
2015-01-21 22:32:04 +00:00
Alexander Potapenko
127e7cced0 Use a smaller pragma unroll threshold to reduce test execution time.
When opt is compiled with AddressSanitizer it takes more than 30 seconds
to unroll the loop in unroll_1M().

llvm-svn: 226660
2015-01-21 13:52:02 +00:00
Karthik Bhat
8fee134b62 Fix Operandreorder logic in SLPVectorizer to generate longer vectorizable chain.
This patch fixes 2 issues in reorderInputsAccordingToOpcode
1) AllSameOpcodeLeft and AllSameOpcodeRight were being calculated incorrectly, resulting in code not being vectorized in a few cases.
2) Adds logic to reorder operands if we get a longer chain of consecutive loads, enabling vectorization. Handled the same for cases where we have AltOpcode.
Thanks Michael for inputs and review.
Review: http://reviews.llvm.org/D6677

llvm-svn: 226547
2015-01-20 06:11:00 +00:00
Mehdi Amini
a1e86a9849 Fix Reassociate handling of constant in presence of undef float
http://reviews.llvm.org/D6993

llvm-svn: 226245
2015-01-16 03:00:58 +00:00
Sanjoy Das
8ce28789d0 Add a new pass "inductive range check elimination"
IRCE eliminates range checks of the form

  0 <= A * I + B < Length

by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment.  As an
example, IRCE will convert

  len = < known positive >
  for (i = 0; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }

to

  len = < known positive >
  limit = smin(n, len)
  // no first segment
  for (i = 0; i < limit; i++) {
    if (0 <= i && i < len) { // this check is fully redundant
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }
  for (i = limit; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }


IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).

Currently IRCE does not do any profitability analysis.  That is a
TODO.

Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline.  Having said that, I will love
to get feedback and general input from people interested in trying
this out.

This pass was originally r226201.  It was reverted because it used C++
features not supported by MSVC 2012.

Differential Revision: http://reviews.llvm.org/D6693

llvm-svn: 226238
2015-01-16 01:03:22 +00:00
Sanjoy Das
618e939258 Revert r226201 (Add a new pass "inductive range check elimination")
The change used C++11 features not supported by MSVC 2012.  I will fix
the change to use things supported by MSVC 2012 and recommit shortly.

llvm-svn: 226216
2015-01-15 22:18:10 +00:00
Sanjoy Das
a7eb1a0b3d Add a new pass "inductive range check elimination"
IRCE eliminates range checks of the form

  0 <= A * I + B < Length

by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment.  As an
example, IRCE will convert

  len = < known positive >
  for (i = 0; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }

to

  len = < known positive >
  limit = smin(n, len)
  // no first segment
  for (i = 0; i < limit; i++) {
    if (0 <= i && i < len) { // this check is fully redundant
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }
  for (i = limit; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }


IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).

Currently IRCE does not do any profitability analysis.  That is a
TODO.

Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline.  Having said that, I will love
to get feedback and general input from people interested in trying
this out.

Differential Revision: http://reviews.llvm.org/D6693

llvm-svn: 226201
2015-01-15 20:45:46 +00:00
Sanjoy Das
4b5a4b2975 Fix PR22222
The bug was introduced in r225282. r225282 assumed that sub X, Y is
the same as add X, -Y. This is not correct if we are going to upgrade
the sub to sub nuw. This change fixes the issue by making the
optimization ignore sub instructions.
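A worked illustration of why the identity breaks once wrap flags are involved (hypothetical values):

  #include <cstdint>

  uint32_t demo() {
    uint32_t x = 5, y = 3;
    // x - y == 2 with no unsigned wrap, but the rewritten form
    // x + (-y) == 5 + 0xFFFFFFFD wraps around, so the "nuw" flag
    // cannot be justified by reasoning about the add.
    return x + (0u - y);
  }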

Differential Revision: http://reviews.llvm.org/D6979

llvm-svn: 226075
2015-01-15 01:46:09 +00:00
Richard Smith
cd37c22fb7 For PR21145: recognise a builtin call to a known deallocation function even if
it's defined in the current module. Clang generates this situation for the
C++14 sized deallocation functions, because it generates a weak definition in
case one isn't provided by the C++ runtime library.

llvm-svn: 226069
2015-01-15 01:00:33 +00:00
Ramkumar Ramachandra
3cbd3c16ec [GC] CodeGenPrep transform: simplify offsetable relocate
The transform is somewhat involved, but the basic idea is simple: find
derived pointers that have been offset from the base pointer using gep
and replace the relocate of the derived pointer with a gep to the
relocated base pointer (with the same offset).

llvm-svn: 226060
2015-01-14 23:27:07 +00:00
Duncan P. N. Exon Smith
4a5feedcaa IR: Move MDLocation into place
This commit moves `MDLocation`, finishing off PR21433.  There's an
accompanying clang commit for frontend testcases.  I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.

This changes the schema for `DebugLoc` and `DILocation` from:

    !{i32 3, i32 7, !7, !8}

to:

    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)

Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.

llvm-svn: 226048
2015-01-14 22:27:36 +00:00
David Majnemer
b89ab3fd88 InstCombine: Don't take A-B<0 into A<B if A-B has other uses
This fixes PR22226.

llvm-svn: 226023
2015-01-14 19:26:56 +00:00
Ahmed Bougacha
284d7745d1 [SimplifyLibCalls] Don't try to simplify indirect calls.
It turns out, all callsites of the simplifier are guarded by a check for
CallInst::getCalledFunction (i.e., to make sure the callee is direct).

This check wasn't done when trying to further optimize a simplified fortified
libcall, introduced by a refactoring in r225640.

Fix that, add a testcase, and document the requirement.

llvm-svn: 225895
2015-01-14 00:55:05 +00:00
Sanjay Patel
ed6b379845 GVN: propagate equalities for floating point compares
Allow optimizations based on FP comparison values in the same way
as integers. 
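
For example (an illustrative case; the pass still has to be careful
around NaN and signed zero):

  %cmp = fcmp oeq double %x, 2.0
  br i1 %cmp, label %if.then, label %if.end

In %if.then, %x is known to be exactly 2.0, so its uses there can be
replaced with the constant and folded.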

This resolves PR17713:
http://llvm.org/bugs/show_bug.cgi?id=17713

Differential Revision: http://reviews.llvm.org/D6911

llvm-svn: 225660
2015-01-12 19:29:48 +00:00
Hal Finkel
c563f95905 [PowerPC] Readjust the loop unrolling threshold
Now that the way that the partial unrolling threshold for small loops is used
to compute the unrolling factor has been corrected, a slightly smaller threshold
is preferable. This is expected; other targets may need to re-tune as well.

llvm-svn: 225566
2015-01-10 00:31:10 +00:00
Hal Finkel
977fb5e0c3 [LoopUnroll] Fix the partial unrolling threshold for small loop sizes
When we compute the size of a loop, we include the branch on the backedge and
the comparison feeding the conditional branch. Under normal circumstances,
these don't get replicated with the rest of the loop body when we unroll. This
led to the somewhat surprising behavior that really small loops would not get
unrolled enough -- they could be unrolled more and the resulting loop would be
below the threshold, because we were assuming they'd take
(LoopSize * UnrollingFactor) instructions after unrolling, instead of
(((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation.
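
For example, with a hypothetical loop of size 6 and an unrolling
factor of 4, the old estimate was 6 * 4 = 24 instructions, while the
corrected estimate is ((6 - 2) * 4) + 2 = 18, making the loop more
likely to fit under the threshold.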

llvm-svn: 225565
2015-01-10 00:30:55 +00:00
Hans Wennborg
5ee44a1865 SimplifyCFG: check uses of constant-foldable instrs in switch destinations (PR20210)
The previous code assumed that such instructions could not have any uses
outside CaseDest, with the motivation that the instruction could not
dominate CommonDest because CommonDest has phi nodes in it. That simply
isn't true; e.g., CommonDest could have an edge back to itself.

llvm-svn: 225552
2015-01-09 22:13:31 +00:00
Tim Northover
ef7d18507b Re-reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before
doing Load PRE"

It's not really expected to stick around: last time it provoked a weird LTO
build failure that I can't reproduce now, and the bot logs are long gone. I'll
re-revert it if the failures recur.

Original description: Perform Scalar PRE on gep indices that feed loads before
doing Load PRE.

llvm-svn: 225536
2015-01-09 19:19:56 +00:00
Hal Finkel
556331a037 [PowerPC] Enable late partial unrolling on the POWER7
The P7 benefits from not having really-small loops so that we either have
multiple dispatch groups in the loop and/or the ability to form more-full
dispatch groups during scheduling. Setting the partial unrolling threshold to
44 seems good, empirically, for the P7. Compared to using no late partial
unrolling, this yields the following test-suite speedups:

SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding
	-66.3253% +/- 24.1975%
SingleSource/Benchmarks/Misc-C++/oopack_v1p8
	-44.0169% +/- 29.4881%
SingleSource/Benchmarks/Misc/pi
	-27.8351% +/- 12.2712%
SingleSource/Benchmarks/Stanford/Bubblesort
	-30.9898% +/- 22.4647%

I've speculatively added a similar setting for the P8. Also, I've noticed that
the unroller does not quite calculate the unrolling factor correctly for really
tiny loops because it neglects to account for the fact that not every loop body
replicant contains an ending branch and counter increment. I'll fix that later.

llvm-svn: 225522
2015-01-09 15:51:16 +00:00
Duncan P. N. Exon Smith
bc9ee9160a IR: Add 'distinct' MDNodes to bitcode and assembly
Propagate whether `MDNode`s are 'distinct' through the other types of IR
(assembly and bitcode).  This adds the `distinct` keyword to assembly.
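
For example, in the new assembly syntax:

  !0 = !{}            ; uniqued: another !{} refers to the same node
  !1 = distinct !{}   ; never uniqued, even though it looks identical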

Currently, no one actually calls `MDNode::getDistinct()`, so these nodes
only get created for:

  - self-references, which are never uniqued, and
  - nodes whose operands are replaced that hit a uniquing collision.

The concept of distinct nodes is still not quite first-class, since
distinct-ness doesn't yet survive across `MapMetadata()`.

Part of PR22111.

llvm-svn: 225474
2015-01-08 22:38:29 +00:00
Matt Arsenault
4b66850c40 Fix fcmp + fabs instcombines when using the intrinsic
This was only handling the libcall. This is another example
of why only the intrinsic should ever be used when it exists.

llvm-svn: 225465
2015-01-08 20:09:34 +00:00
Matt Arsenault
0d86cea633 Fix using wrong intrinsic in test
This is a leftover from renaming the intrinsic.
It's surprising the unknown llvm. intrinsic wasn't rejected.

llvm-svn: 225304
2015-01-06 23:00:33 +00:00
Rafael Espindola
20dc6c7571 Change the .ll syntax for comdats and add syntactic sugar.
In order to make comdats always explicit in the IR, we decided to make
the syntax a bit more compact for the case of a GlobalObject in a
comdat with the same name.

Just dropping the $name causes problems for

@foo = global i32 0, comdat
$bar = comdat ...

and

declare void @foo() comdat
$bar = comdat ...

So the syntax is changed to

@g1 = global i32 0, comdat($c1)
@g2 = global i32 0, comdat

and

declare void @foo() comdat($c1)
declare void @foo() comdat

llvm-svn: 225302
2015-01-06 22:55:16 +00:00
Sanjoy Das
d42d2637e6 This patch teaches IndVarSimplify to add nuw and nsw to certain kinds
of operations that provably don't overflow. For example, we can prove
%civ.inc below does not sign-overflow. With this change,
IndVarSimplify changes %civ.inc to an add nsw.

  define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
   entry:
    %length = load i32* %length_ptr, !range !0
    %len.sub.1 = sub i32 %length, 1
    %upper = icmp slt i32 %init, %len.sub.1
    br i1 %upper, label %loop, label %exit
  
   loop:
    %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
    %civ.inc = add i32 %civ, 1
    %cmp = icmp slt i32 %civ.inc, %length
    br i1 %cmp, label %latch, label %break
  
   latch:
    store i32 0, i32* %array
    %check = icmp slt i32 %civ.inc, %len.sub.1
    br i1 %check, label %loop, label %break
  
   break:
    ret i32 %civ.inc
  
   exit:
    ret i32 42
  }

Differential Revision: http://reviews.llvm.org/D6748

llvm-svn: 225282
2015-01-06 19:02:56 +00:00
Matt Arsenault
416be52fbf Convert fcmp with 0.0 from casted integers to icmp
This is already handled in general when it is known the
conversion can't lose bits with smaller integer types
casted into wider floating point types.

This pattern happens somewhat often in GPU programs that cast
workitem intrinsics to float, which are often compared with 0.

Specifically handle the special case of compares with zero which
should also be known to not lose information. I had a more general
version of this which allows equality compares if the casted float is
exactly representable in the integer, but I'm not 100% confident that
is always correct.
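
A minimal sketch of the zero-compare case (hypothetical IR):

  %f = sitofp i32 %x to float
  %cmp = fcmp oeq float %f, 0.0
  ; no nonzero integer converts to 0.0, so this becomes:
  %cmp = icmp eq i32 %x, 0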

Also fold cases that aren't integers to true / false.

llvm-svn: 225265
2015-01-06 15:50:59 +00:00
David Majnemer
82fa22459b InstCombine: Bitcast call arguments from/to pointer/integer type
Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing
arguments and return values when DataLayout says it is safe to do so.
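
A rough sketch of the kind of call this targets (hypothetical
function @f):

  %r = call i8* bitcast (i64 (i64)* @f to i8* (i8*)*)(i8* %p)

  ; when DataLayout says pointers and i64 are interchangeable:
  %pi = ptrtoint i8* %p to i64
  %ri = call i64 @f(i64 %pi)
  %r = inttoptr i64 %ri to i8*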

llvm-svn: 225254
2015-01-06 08:41:31 +00:00
Michael Kuperstein
6483dfa9b1 Fix broken test from r225159.
llvm-svn: 225164
2015-01-05 12:34:01 +00:00
Jiangning Liu
3fc6a7e69d Fixed a bug in the memory dependence checking module of loop vectorization. The following loop should not be vectorized with the current algorithm.
{code}
// loop body
   ... = a[i]      (1)
   ... = a[i+1]    (2)
   ...
   a[i+1] = ...    (3)
   a[i] = ...      (4)
{code}

The algorithm tries to collect memory access candidates from
AliasSetTracker, and then checks memory dependences against one
another. The memory accesses are unique in AliasSetTracker, and a
single memory access in AliasSetTracker may map to multiple entries in
AccessAnalysis, which could cover both 'read' and 'write'. Originally
the algorithm only checked the 'write' entry in Accesses if only
'write' exists. This is incorrect; the consequence is that it ignored
all read accesses, so some RAW and WAR dependences were missed.

For the case given above, if we ignore the two reads, the dependence
between (1) and (3) could not be captured, and the loop would be
incorrectly vectorized.

The fix simply inserts a new loop to find all entries in Accesses.
Since it will skip most other memory accesses by checking the Value
pointer at the very beginning of the loop, it should not increase
compile time visibly.

llvm-svn: 225159
2015-01-05 10:08:58 +00:00
Chandler Carruth
2015da7ed2 [SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, an
assert out of the new pre-splitting in SROA.

This fix makes the code do what was originally intended -- when we have
a store of a load both dealing in the same alloca, we force them to both
be pre-split with identical offsets. This is really quite hard to do
because we can keep discovering problems as we go along. We have to
track every load over the current alloca which for any reason becomes
invalid for pre-splitting, and go back to remove all stores of those
loads. I've included a couple of test cases derived from PR22093 that
cover the different ways this can happen. While that PR only really
triggered the first of these two, it's the same fundamental issue.

The other challenge here is documented in a FIXME now. We end up being
quite a bit more aggressive for pre-splitting when loads and stores
don't refer to the same alloca. This aggressiveness comes at the cost of
introducing potentially redundant loads. It isn't clear that this is the
right balance. It might be considerably better to require that we only
do pre-splitting when we can presplit every load and store involved in
the entire operation. That would give more consistent if conservative
results. Unfortunately, it requires a non-trivial change to the actual
pre-splitting operation in order to correctly handle cases where we end
up pre-splitting stores out-of-order. And it isn't 100% clear that this
is the right direction, although I'm starting to suspect that it is.

llvm-svn: 225149
2015-01-05 04:17:53 +00:00
David Majnemer
04c9e5e52d InstCombine: match can find ConstantExprs, don't assume we have a Value
We assumed the output of a match was a Value; this would cause us to
assert because we would fail a cast<>.  Instead, use a helper in the
Operator family to hide the distinction between Value and Constant.

This fixes PR22087.

llvm-svn: 225127
2015-01-04 07:36:02 +00:00
David Majnemer
8ed72d8999 ValueTracking: ComputeNumSignBits should tolerate misshapen phi nodes
PHI nodes can have zero operands in the middle of a transform.  It is
expected that utilities in Analysis don't freak out when this happens.

Note that it is considered invalid to allow these misshapen phi nodes to
make it to another pass.

This fixes PR22086.

llvm-svn: 225126
2015-01-04 07:06:53 +00:00
David Majnemer
78198d7245 InstCombine: Detect when llvm.umul.with.overflow always overflows
We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero
and LHSKnownOne * RHSKnownOne overflow (the former products use upper
bounds on the operands' values, the latter lower bounds).

llvm-svn: 225077
2015-01-02 07:29:47 +00:00
Chandler Carruth
0f3b8d1a20 [SROA] Teach SROA to be more aggressive in splitting now that we have
a pre-splitting pass over loads and stores.

Historically, splitting could cause enough problems that I hamstrung the
entire process with a requirement that splittable integer loads and
stores must cover the entire alloca. All smaller loads and stores were
unsplittable to prevent chaos from ensuing. With the new pre-splitting
logic that does load/store pair splitting I introduced in r225061, we
can now very nicely handle arbitrarily splittable loads and stores. In
order to fully benefit from these smarts, we need to mark all of the
integer loads and stores as splittable.

However, we don't actually want to rewrite partitions with all integer
loads and stores marked as splittable. This will fail to extract scalar
integers from aggregates, which is kind of the point of SROA. =] In
order to resolve this, what we really want to do is only do
pre-splitting on the alloca slices with integer loads and stores fully
splittable. This allows us to uncover all non-integer uses of the alloca
that would benefit from a split in an integer load or store (and where
introducing the split is safe because it is just memory transfer from
a load to a store). Once done, we make all the non-whole-alloca integer
loads and stores unsplittable just as they have historically been,
repartition and rewrite.

The result is that when there are integer loads and stores anywhere
within an alloca (such as from a memcpy of a sub-object of a larger
object), we can split them up if there are non-integer components to the
aggregate hiding beneath. I've added the challenging test cases to
demonstrate how this is able to promote to scalars even a case where we
have even *partially* overlapping loads and stores.

This restores the single-store behavior for small arrays of i8s which is
really nice. I've restored both the little endian testing and big endian
testing for these exactly as they were prior to r225061. It also forced
me to be more aggressive in an alignment test to actually defeat SROA.
=] Without the added volatiles there, we actually split up the weird i16
loads and produce nice double allocas with better alignment.

This also uncovered a number of bugs where we failed to handle
splittable load and store slices which didn't have a beginning offset of
zero. Those fixes are included, and without them the existing test cases
explode in glorious fireworks. =]

I've kept support for leaving whole-alloca integer loads and stores as
splittable even for the purpose of rewriting, but I think that's likely
no longer needed. With the new pre-splitting, we might be able to remove
all the splitting support for loads and stores from the rewriter. Not
doing that in this patch to try to isolate any performance regressions
that causes in an easy to find and revert chunk.

llvm-svn: 225074
2015-01-02 03:55:54 +00:00
Chandler Carruth
88543d8f23 [SROA] Add a test case for r225068 / PR22080.
llvm-svn: 225070
2015-01-02 00:34:29 +00:00
Chandler Carruth
e46230af0c [SROA] Teach SROA how to much more intelligently handle split loads and
stores.

When there are accesses to an entire alloca with an integer
load or store as well as accesses to small pieces of the alloca, SROA
splits up the large integer accesses. In order to do that, it uses bit
math to merge the small accesses into large integers. While this is
effective, it produces insane IR that can cause significant problems in
the rest of the optimizer:

- It can cause load and store mismatches with GVN on the non-alloca side
  where we end up loading an i64 (or some such) rather than loading
  specific elements that are stored.
- We can't always get rid of the integer bit math, which is why we can't
  always fix the loads and stores to work well with GVN.
- This is especially bad when we have operations that mix poorly with
  integer bit math such as floating point operations.
- It will block things like the vectorizer which might be able to handle
  the scalar stores that underlie the aggregate.

At the same time, we can't just directly split up these loads and stores
in all cases. If there is actual integer arithmetic involved on the
values, then using integer bit math is actually the perfect lowering
because we can often combine it heavily with the surrounding math.

The solution this patch provides is to find places where SROA is
partitioning aggregates into small elements, and look for splittable
loads and stores that it can split all the way to some other adjacent
load and store. These are uniformly the cases where failing to split the
loads and stores hurts the optimizer that I have seen, and I've looked
extensively at the code produced both from more and less aggressive
approaches to this problem.

However, it is quite tricky to actually do this in SROA. We may have
loads and stores to the same alloca, or other complex patterns that are
hard to handle. This complexity leads to the somewhat subtle algorithm
implemented here. We have to do this entire process as a separate pass
over the partitioning of the alloca, and split up all of the loads prior
to splitting the stores so that we can handle safely the cases of
overlapping, including partially overlapping, loads and stores to the
same alloca. We also have to reconstitute the post-split slice
configuration so we can avoid iterating again over all the alloca uses
(the slow part of SROA). But we also have to ensure that when we split
up loads and stores to *other* allocas, we *do* re-iterate over them in
SROA to adapt to the more refined partitioning now required.

With this, I actually think we can fix a long-standing TODO in SROA
where I avoided splitting as many loads and stores as probably should be
splittable. This limitation historically mitigated the fallout of all
the bad things mentioned above. Now that we have more intelligent
handling, I plan to remove the FIXME and more aggressively mark integer
loads and stores as splittable. I'll do that in a follow-up patch to
help with bisecting any fallout.

The net result of this change should be more fine-grained and accurate
scalars being formed out of aggregates. At the very least, Clang now
generates perfect code for this high-level test case using
std::complex<float>:

  #include <complex>

  void g1(std::complex<float> &x, float a, float b) {
    x += std::complex<float>(a, b);
  }
  void g2(std::complex<float> &x, float a, float b) {
    x -= std::complex<float>(a, b);
  }

  void foo(const std::complex<float> &x, float a, float b,
           std::complex<float> &x1, std::complex<float> &x2) {
    std::complex<float> l1 = x;
    g1(l1, a, b);
    std::complex<float> l2 = x;
    g2(l2, a, b);
    x1 = l1;
    x2 = l2;
  }

This code isn't just hypothetical either. It was reduced out of the hot
inner loops of essentially every part of the Eigen math library when
using std::complex<float>. Those loops would consistently and
pervasively hop between the floating point unit and the integer unit due
to bit math extraction and insertion of floating point values that were
"stored" in a 64-bit integer register around the loop backedge.

So far, this change has passed a bootstrap and some other testing with
no issues. That doesn't mean there won't be any, though, so I'll be
prepared to help with any fallout. If you see performance swings in
particular, please let me know. I'm very curious what the impact of
this change will be. Stay tuned for the follow-up to also split more
integer loads and stores.

llvm-svn: 225061
2015-01-01 11:54:38 +00:00
Sanjay Patel
657e61ccbf InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X
Some day the backend may handle instruction-level fast math flags and make
this transform unnecessary, but it's still better practice to use the canonical
representation of fneg when possible (use a -0.0).

This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ).
See also http://reviews.llvm.org/D6723.

Differential Revision: http://reviews.llvm.org/D6731

llvm-svn: 225050
2014-12-31 22:14:05 +00:00
David Majnemer
83939d3744 InstCombine: try to transform A-B < 0 into A < B
We are allowed to move the 'B' to the right hand side if we can prove
there is no signed overflow and if the comparison itself is signed.

llvm-svn: 225034
2014-12-31 04:21:41 +00:00
Philip Reames
4527f27ca2 Carry facts about nullness and undef across GC relocation
This change implements four basic optimizations:

- If a relocated value isn't used, it doesn't need to be relocated.
- If the value being relocated is null, relocation doesn't change that.
  (Technically, this might be collector specific. I don't know of one
  for which it doesn't work, though.)
- If the value being relocated is undef, the relocation is meaningless.
- If the value being relocated was known nonnull, the relocated pointer
  also isn't null (since it points to the same source language object).

I outlined other planned work in comments.

Differential Revision: http://reviews.llvm.org/D6600

llvm-svn: 224968
2014-12-29 23:27:30 +00:00
Philip Reames
10eda0ad7d Refine the notion of MayThrow in LICM to include a header specific version
In LICM, we have a check for an instruction which is guaranteed to
execute and thus can't introduce any new faults if moved to the
preheader. To handle a function which might unconditionally throw when
first called, we check for any potentially throwing call in the loop
and give up.

This is unfortunate when the potentially throwing condition is down a
rare path. It prevents essentially all LICM of potentially faulting
instructions where the faulting condition is checked outside the loop.
It also greatly diminishes the utility of loop unswitching since
control dependent instructions - which are now likely in the loop's
header block - will not be lifted by subsequent LICM runs.

define void @nothrow_header(i64 %x, i64 %y, i1 %cond) {
; CHECK-LABEL: nothrow_header
; CHECK-LABEL: entry
; CHECK: %div = udiv i64 %x, %y
; CHECK-LABEL: loop
; CHECK: call void @use(i64 %div)
entry:
  br label %loop
loop: ; preds = %entry, %for.inc
  %div = udiv i64 %x, %y
  br i1 %cond, label %loop-if, label %exit
loop-if:
  call void @use(i64 %div)
  br label %loop
exit:
  ret void
}

The current patch really only helps with non-memory instructions
(i.e. divs, etc.) since the maythrow call down the rare path will be
considered to alias an otherwise hoistable load. The one exception is
that it does kick in for loads which are known to be invariant without
regard to other possible stores, i.e. those marked with either
!invariant.load metadata or TBAA 'is constant memory' metadata.

Differential Revision: http://reviews.llvm.org/D6725

llvm-svn: 224965
2014-12-29 23:00:57 +00:00
Philip Reames
a4f427e6e7 Loading from null is valid outside of addrspace 0
This patch fixes a miscompile where we were assuming that loading from
null is undefined and thus we could assume it doesn't happen. This
transform is perfectly legal in address space 0, but is not
necessarily legal in other address spaces.

We really should introduce a hook to control this property on a per
target, per address space basis. We may be losing valuable
optimizations in some address spaces by being too conservative.

Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me.

llvm-svn: 224961
2014-12-29 22:46:21 +00:00
David Majnemer
c57dbde8fa InstCombine: Infer nuw for multiplies
A multiply cannot unsigned wrap if the two operands together have at
least bitwidth leading zero bits.
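
For instance, on i32: if known bits show the left operand is below
2^12 (at least 20 leading zeros) and the right operand is below 2^20
(at least 12 leading zeros), that is 32 leading zero bits in total, so
the product is at most (2^12 - 1) * (2^20 - 1) < 2^32 and the multiply
can be marked nuw.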

llvm-svn: 224849
2014-12-26 09:50:35 +00:00
David Majnemer
ce5bd510cb InstCombine: Infer nsw for multiplies
We already utilize this logic for reducing overflow intrinsics, it makes
sense to reuse it for normal multiplies as well.

llvm-svn: 224847
2014-12-26 09:10:14 +00:00
Michael Kuperstein
180649bb38 [ValueTracking] Move GlobalAlias handling to be after the max depth check in computeKnownBits()
GlobalAlias handling used to be after GlobalValue handling, which
meant it was, in practice, dead code. r220165 moved GlobalAlias
handling to be before GlobalValue handling, but also moved it to be
before the max depth check, causing an assert due to a recursion depth
limit violation.

This moves GlobalAlias handling forward to where it's safe, and
changes the GlobalValue handling to only look at GlobalObjects.

Differential Revision: http://reviews.llvm.org/D6758

llvm-svn: 224765
2014-12-23 11:33:41 +00:00
Michael Liao
e7760832e1 [SimplifyCFG] Revise common code sinking
- Fix the case where more than one common instruction derived from the
  same operand cannot be sunk. When a pair of values has more than one
  derived value in both branches, only one derived value could be sunk.
- Replace the BB1 -> (BB2, PN) map with a joint value map, i.e. a map
  of (BB1, BB2) -> PN, which tracks common ops more accurately.

llvm-svn: 224757
2014-12-23 08:26:55 +00:00
Bruno Cardoso Lopes
c8d20ce475 [LCSSA] Handle PHI insertion in disjoint loops
Take two disjoint Loops L1 and L2.

LoopSimplify fails to simplify some loops (e.g. when indirect branches
are involved). In such situations, it can happen that an exit for L1 is
the header of L2. Thus, when we create PHIs in one of such exits we are
also inserting PHIs in L2 header.

This could break LCSSA form for L2 because these inserted PHIs can also
have uses in L2 exits, which are never handled in the current
implementation. Provide a fix for this corner case and test that we
don't assert/crash on that.

Differential Revision: http://reviews.llvm.org/D6624

rdar://problem/19166231

llvm-svn: 224740
2014-12-22 22:35:46 +00:00
David Majnemer
ee1d92da94 This should have been part of r224676.
llvm-svn: 224677
2014-12-20 04:48:34 +00:00
David Majnemer
3da9d34415 InstCombine: Squash an icmp+select into bitwise arithmetic
(X & INT_MIN) == 0 ? X ^ INT_MIN : X  into  X | INT_MIN
(X & INT_MIN) != 0 ? X ^ INT_MIN : X  into  X & INT_MAX

This fixes PR21993.

llvm-svn: 224676
2014-12-20 04:45:35 +00:00
David Majnemer
e2cf8f21e0 InstSimplify: Optimize away pointless comparisons
(X & INT_MIN) ? X & INT_MAX : X  into  X & INT_MAX
(X & INT_MIN) ? X : X & INT_MAX  into  X
(X & INT_MIN) ? X | INT_MIN : X  into  X
(X & INT_MIN) ? X : X | INT_MIN  into  X | INT_MIN

llvm-svn: 224669
2014-12-20 03:04:38 +00:00
Bruno Cardoso Lopes
b25873afd1 Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. Also fix the code to return the modified switch when only
the truncation is performed.

This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224588
2014-12-19 17:12:35 +00:00
Sanjay Patel
b1bb7100db use -0.0 when creating an fneg instruction
Backends recognize (-0.0 - X) as the canonical form for fneg
and produce better code. Eg, ppc64 with 0.0:

   lis r2, ha16(LCPI0_0)
   lfs f0, lo16(LCPI0_0)(r2)
   fsubs f1, f0, f1
   blr

vs. -0.0:

   fneg f1, f1
   blr

Differential Revision: http://reviews.llvm.org/D6723

llvm-svn: 224583
2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes
ba090bc5c2 Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr"
Reverts commit r224574 to appease buildbots:

The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

llvm-svn: 224576
2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes
afb4b15ab3 [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224574
2014-12-19 14:23:15 +00:00
David Majnemer
273acc7601 ConstantFold: Shifting undef by zero results in undef
llvm-svn: 224553
2014-12-18 23:54:43 +00:00
Suyog Sarda
ea83428380 Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads,
and vectorizes it." 

This was re-ordering floating point data types, resulting in mismatches in output.

llvm-svn: 224424
2014-12-17 10:34:27 +00:00
Elena Demikhovsky
537a8ee728 Added 5 more tests related to sink store revision 224247
- by Ella Bolshinsky

http://reviews.llvm.org/D6420

llvm-svn: 224418
2014-12-17 08:12:59 +00:00
Erik Eckstein
042c032147 Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.
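
An illustrative sketch, for when analysis proves %a - %b cannot
overflow:

  %res = call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
  ; can be reduced to
  %sub = sub nsw i32 %a, %b
  ; with the i1 overflow bit folded to false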

llvm-svn: 224417
2014-12-17 07:29:19 +00:00
David Majnemer
fe299df41a InstSimplify: shl nsw/nuw undef, %V -> undef
We can always choose a value for undef which might cause %V to shift
out an important bit except for one case, when %V is zero.

However, shl behaves like an identity function when the right hand side
is zero.

llvm-svn: 224405
2014-12-17 01:54:33 +00:00