1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00
Commit Graph

11247 Commits

Author SHA1 Message Date
Owen Anderson
8750294bae Fix more instances of dropped fast math flags when optimizing FADD instructions. All found by inspection (aka grep).
llvm-svn: 199528
2014-01-18 00:48:14 +00:00
Kostya Serebryany
88b5111b60 [asan] extend asan-coverage (still experimental).
- add a mode for collecting per-block coverage (-asan-coverage=2).
   So far the implementation is naive (all blocks are instrumented),
   the performance overhead on top of asan could be as high as 30%.
 - Make sure the one-time calls to __sanitizer_cov are moved to function buttom,
   which in turn required to copy the original debug info into the call insn.

Here is the performance data on SPEC 2006
(train data, comparing asan with asan-coverage={0,1,2}):

                             asan+cov0     asan+cov1      diff 0-1    asan+cov2       diff 0-2      diff 1-2
       400.perlbench,        65.60,        65.80,         1.00,        76.20,         1.16,         1.16
           401.bzip2,        65.10,        65.50,         1.01,        75.90,         1.17,         1.16
             403.gcc,         1.64,         1.69,         1.03,         2.04,         1.24,         1.21
             429.mcf,        21.90,        22.60,         1.03,        23.20,         1.06,         1.03
           445.gobmk,       166.00,       169.00,         1.02,       205.00,         1.23,         1.21
           456.hmmer,        88.30,        87.90,         1.00,        91.00,         1.03,         1.04
           458.sjeng,       210.00,       222.00,         1.06,       258.00,         1.23,         1.16
      462.libquantum,         1.73,         1.75,         1.01,         2.11,         1.22,         1.21
         464.h264ref,       147.00,       152.00,         1.03,       160.00,         1.09,         1.05
         471.omnetpp,       115.00,       116.00,         1.01,       140.00,         1.22,         1.21
           473.astar,       133.00,       131.00,         0.98,       142.00,         1.07,         1.08
       483.xalancbmk,       118.00,       120.00,         1.02,       154.00,         1.31,         1.28
            433.milc,        19.80,        20.00,         1.01,        20.10,         1.02,         1.01
            444.namd,        16.20,        16.20,         1.00,        17.60,         1.09,         1.09
          447.dealII,        41.80,        42.20,         1.01,        43.50,         1.04,         1.03
          450.soplex,         7.51,         7.82,         1.04,         8.25,         1.10,         1.05
          453.povray,        14.00,        14.40,         1.03,        15.80,         1.13,         1.10
             470.lbm,        33.30,        34.10,         1.02,        34.10,         1.02,         1.00
         482.sphinx3,        12.40,        12.30,         0.99,        13.00,         1.05,         1.06

llvm-svn: 199488
2014-01-17 11:00:30 +00:00
Quentin Colombet
b42dbc5117 [opt][PassInfo] Allow opt to run passes that need target machine.
When registering a pass, a pass can now specify a second construct that takes as
argument a pointer to TargetMachine.
The PassInfo class has been updated to reflect that possibility.
If such a constructor exists opt will use it instead of the default constructor
when instantiating the pass.

Since such IR passes are supposed to be rare, no specific support has been
added to this commit to allow an easy registration of such a pass.
In other words, for such pass, the initialization function has to be
hand-written (see CodeGenPrepare for instance).

Now, codegenprepare can be tested using opt:
opt -codegenprepare -mtriple=mytriple input.ll

llvm-svn: 199430
2014-01-16 21:44:34 +00:00
Owen Anderson
9c1a615059 Fix two cases where we could lose fast math flags when optimizing FADD expressions.
llvm-svn: 199427
2014-01-16 21:26:02 +00:00
Owen Anderson
dbdd830886 Fix an instance where we would drop fast math flags when performing an fdiv to reciprocal multiply transformation.
llvm-svn: 199425
2014-01-16 21:07:52 +00:00
Owen Anderson
2c40c9a6c0 Fix a bug in InstCombine where we failed to preserve fast math flags when optimizing an FMUL expression.
llvm-svn: 199424
2014-01-16 20:59:41 +00:00
Owen Anderson
a218b5b798 Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which LLVM expresses as (fsub -0.0, X).
llvm-svn: 199420
2014-01-16 20:36:42 +00:00
Evgeniy Stepanov
5b1a672532 [asan] Remove -fsanitize-address-zero-base-shadow command line
flag from clang, and disable zero-base shadow support on all platforms
where it is not the default behavior.

- It is completely unused, as far as we know.
- It is ABI-incompatible with non-zero-base shadow, which means all
objects in a process must be built with the same setting. Failing to
do so results in a segmentation fault at runtime.
- It introduces a backward dependency of compiler-rt on user code,
which is uncommon and complicates testing.

This is the LLVM part of a larger change.

llvm-svn: 199371
2014-01-16 10:19:12 +00:00
Hans Wennborg
efa9ef0e63 Switch-to-lookup tables: set threshold to 3 cases
There has been an old FIXME to find the right cut-off for when it's worth
analyzing and potentially transforming a switch to a lookup table.

The switches always have two or more cases. I could not measure any speed-up
by transforming a switch with two cases. A switch with three cases gets a nice
speed-up, and I couldn't measure any compile-time regression, so I think this
is the right threshold.

In a Clang self-host, this causes 480 new switches to be transformed,
and reduces the final binary size with 8 KB.

llvm-svn: 199294
2014-01-15 05:00:27 +00:00
Arnold Schwaighofer
9fb94754bd LoopVectorize: Only strip casts from integer types when replacing symbolic
strides

Fixes PR18480.

llvm-svn: 199291
2014-01-15 03:35:46 +00:00
Matt Arsenault
babc737d7b Do pointer cast simplifications on addrspacecast
llvm-svn: 199254
2014-01-14 20:00:45 +00:00
Matt Arsenault
a5adc47c53 Remove a check for an illegal condition.
Bitcasts can't be between address spaces anymore.

llvm-svn: 199253
2014-01-14 19:56:57 +00:00
Matt Arsenault
50ba8b89a7 Make nocapture analysis work with addrspacecast
llvm-svn: 199246
2014-01-14 19:11:52 +00:00
Duncan P. N. Exon Smith
bb847bd59e Reapply "LTO: add API to set strategy for -internalize"
Reapply r199191, reverted in r199197 because it carelessly broke
Other/link-opts.ll.  The problem was that calling
createInternalizePass("main") would select
createInternalizePass(bool("main")) instead of
createInternalizePass(ArrayRef<const char *>("main")).  This commit
fixes the bug.

The original commit message follows.

Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.

This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker.  This puts the onus on the
linker to decide whether (and what) to internalize.

In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.

This patch enables three strategies:

- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
  visibility.

LTO_INTERNALIZE_FULL should be used when linking an executable.

Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized.  E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.

lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().

<rdar://problem/14334895>

llvm-svn: 199244
2014-01-14 18:52:17 +00:00
Nico Rieck
964a13bb4e Decouple dllexport/dllimport from linkage
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199218
2014-01-14 15:22:47 +00:00
Nico Rieck
e8a579c6bc Revert "Decouple dllexport/dllimport from linkage"
Revert this for now until I fix an issue in Clang with it.

This reverts commit r199204.

llvm-svn: 199207
2014-01-14 12:38:32 +00:00
Nico Rieck
6203d44313 Decouple dllexport/dllimport from linkage
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199204
2014-01-14 11:55:03 +00:00
NAKAMURA Takumi
068c8352f7 Revert r199191, "LTO: add API to set strategy for -internalize"
Please update also Other/link-opts.ll, in next time.

llvm-svn: 199197
2014-01-14 09:40:18 +00:00
Duncan P. N. Exon Smith
95dadb39e4 LTO: add API to set strategy for -internalize
Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.

This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker.  This puts the onus on the
linker to decide whether (and what) to internalize.

In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.

This patch enables three strategies:

- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
  visibility.

LTO_INTERNALIZE_FULL should be used when linking an executable.

Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized.  E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.

lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().

<rdar://problem/14334895>

llvm-svn: 199191
2014-01-14 06:37:26 +00:00
Chandler Carruth
98adff6224 [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.

llvm-svn: 199104
2014-01-13 13:07:17 +00:00
Chandler Carruth
59e885531a [PM] Pull the generic graph algorithms and data structures for dominator
trees into the Support library.

These are all expressed in terms of the generic GraphTraits and CFG,
with no reliance on any concrete IR types. Putting them in support
clarifies that and makes the fact that the static analyzer in Clang uses
them much more sane. When moving the Dominators.h file into the IR
library I claimed that this was the right home for it but not something
I planned to work on. Oops.

So why am I doing this? It happens to be one step toward breaking the
requirement that IR verification can only be performed from inside of
a pass context, which completely blocks the implementation of
verification for the new pass manager infrastructure. Fixing it will
also allow removing the concept of the "preverify" step (WTF???) and
allow the verifier to cleanly flag functions which fail verification in
a way that precludes even computing dominance information. Currently,
that results in a fatal error even when you ask the verifier to not
fatally error. It's awesome like that.

The yak shaving will continue...

llvm-svn: 199095
2014-01-13 10:52:56 +00:00
Chandler Carruth
ee051af6e2 [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082
2014-01-13 09:26:24 +00:00
Chandler Carruth
03b6c941a3 Re-sort #include lines again, prior to moving headers around.
llvm-svn: 199080
2014-01-13 08:04:33 +00:00
Hans Wennborg
f5c5f6e123 Switch-to-lookup tables: Don't require a result for the default
case when the lookup table doesn't have any holes.

This means we can build a lookup table for switches like this:

  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: exit(1);
  }

The default case doesn't yield a constant result here, but that doesn't matter,
since a default result is only necessary for filling holes in the lookup table,
and this table doesn't have any holes.

This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
off the resulting clang binary.

llvm-svn: 199025
2014-01-12 00:44:41 +00:00
Arnold Schwaighofer
15e9d90974 LoopVectorizer: Enable strided memory accesses versioning per default
I saw no compile or execution time regressions on x86_64 -mavx -O3.

radar://13075509

llvm-svn: 199015
2014-01-11 20:40:34 +00:00
NAKAMURA Takumi
fbff75f61d LoopVectorize.cpp: Appease MSC16.
Excuse me, I hope msc16 builders would be fine till its end day.
Introduce nullptr then. ;)

llvm-svn: 199001
2014-01-11 09:59:27 +00:00
Diego Novillo
f47aa4d47f Extend and simplify the sample profile input file.
1- Use the line_iterator class to read profile files.

2- Allow comments in profile file. Lines starting with '#'
   are completely ignored while reading the profile.

3- Add parsing support for discriminators and indirect call samples.

   Our external profiler can emit more profile information that we are
   currently not handling. This patch does not add new functionality to
   support this information, but it allows profile files to provide it.

   I will add actual support later on (for at least one of these
   features, I need support for DWARF discriminators in Clang).

   A sample line may contain the following additional information:

   Discriminator. This is used if the sampled program was compiled with
   DWARF discriminator support
   (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
   is currently only emitted by GCC and we just ignore it.

   Potential call targets and samples. If present, this line contains a
   call instruction. This models both direct and indirect calls. Each
   called target is listed together with the number of samples. For
   example,

                    130: 7  foo:3  bar:2  baz:7

   The above means that at relative line offset 130 there is a call
   instruction that calls one of foo(), bar() and baz(). With baz()
   being the relatively more frequent call target.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2355

4- Simplify format of profile input file.

   This implements earlier suggestions to simplify the format of the
   sample profile file. The symbol table is not necessary and function
   profiles do not need to know the number of samples in advance.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2419

llvm-svn: 198973
2014-01-10 23:23:51 +00:00
Diego Novillo
9e8454b3fe Propagation of profile samples through the CFG.
This adds a propagation heuristic to convert instruction samples
into branch weights. It implements a similar heuristic to the one
implemented by Dehao Chen on GCC.

The propagation proceeds in 3 phases:

1- Assignment of block weights. All the basic blocks in the function
   are initial assigned the same weight as their most frequently
   executed instruction.

2- Creation of equivalence classes. Since samples may be missing from
   blocks, we can fill in the gaps by setting the weights of all the
   blocks in the same equivalence class to the same weight. To compute
   the concept of equivalence, we use dominance and loop information.
   Two blocks B1 and B2 are in the same equivalence class if B1
   dominates B2, B2 post-dominates B1 and both are in the same loop.

3- Propagation of block weights into edges. This uses a simple
   propagation heuristic. The following rules are applied to every
   block B in the CFG:

   - If B has a single predecessor/successor, then the weight
     of that edge is the weight of the block.

   - If all the edges are known except one, and the weight of the
     block is already known, the weight of the unknown edge will
     be the weight of the block minus the sum of all the known
     edges. If the sum of all the known edges is larger than B's weight,
     we set the unknown edge weight to zero.

   - If there is a self-referential edge, and the weight of the block is
     known, the weight for that edge is set to the weight of the block
     minus the weight of the other incoming edges to that block (if
     known).

Since this propagation is not guaranteed to finalize for every CFG, we
only allow it to proceed for a limited number of iterations (controlled
by -sample-profile-max-propagate-iterations). It currently uses the same
GCC default of 100.

Before propagation starts, the pass builds (for each block) a list of
unique predecessors and successors. This is necessary to handle
identical edges in multiway branches. Since we visit all blocks and all
edges of the CFG, it is cleaner to build these lists once at the start
of the pass.

Finally, the patch fixes the computation of relative line locations.
The profiler emits lines relative to the function header. To discover
it, we traverse the compilation unit looking for the subprogram
corresponding to the function. The line number of that subprogram is the
line where the function begins. That becomes line zero for all the
relative locations.

llvm-svn: 198972
2014-01-10 23:23:46 +00:00
Arnold Schwaighofer
702d83d3d8 LoopVectorizer: Handle strided memory accesses by versioning
for (i = 0; i < N; ++i)
   A[i * Stride1] += B[i * Stride2];

We take loops like this and check that the symbolic strides 'Strided1/2' are one
and drop to the scalar loop if they are not.

This is currently disabled by default and hidden behind the flag
'enable-mem-access-versioning'.

radar://13075509

llvm-svn: 198950
2014-01-10 18:20:32 +00:00
Chandler Carruth
53468087f3 Put the functionality for printing a value to a raw_ostream as an
operand into the Value interface just like the core print method is.
That gives a more conistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.

This removes the 'Writer.h' header which contained only a single function
declaration.

llvm-svn: 198836
2014-01-09 02:29:41 +00:00
Hao Liu
8c08e05c81 Fix a bug about generating undef operand when optimising shuffle vector and insert element in instruction combine.
llvm-svn: 198730
2014-01-08 03:06:15 +00:00
Chandler Carruth
7aa902a488 Move the LLVM IR asm writer header files into the IR directory, as they
are part of the core IR library in order to support dumping and other
basic functionality.

Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left their -- printing has been
in the core IR library for quite some time.

Update all of the #includes to match.

All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.

llvm-svn: 198688
2014-01-07 12:34:26 +00:00
Chandler Carruth
87f14b4eec Re-sort all of the includes with ./utils/sort_includes.py so that
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.

Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.

llvm-svn: 198685
2014-01-07 11:48:04 +00:00
Andrew Trick
bb6ce38639 Reapply r198654 "indvars: sink truncates outside the loop."
This doesn't seem to have actually broken anything. It was paranoia
on my part. Trying again now that bots are more stable.

This is a follow up of the r198338 commit that added truncates for
lcssa phi nodes. Sinking the truncates below the phis cleans up the
loop and simplifies subsequent analysis within the indvars pass.

llvm-svn: 198678
2014-01-07 06:59:12 +00:00
Andrew Trick
6d854ef50f Revert "indvars: sink truncates outside the loop."
This reverts commit r198654.

One of the bots reported a SciMark failure.

llvm-svn: 198659
2014-01-07 01:50:58 +00:00
Andrew Trick
7621f7c6a3 indvars: sink truncates outside the loop.
This is a follow up of the r198338 commit that added truncates for
lcssa phi nodes. Sinking the truncates below the phis cleans up the
loop and simplifies subsequent analysis within the indvars pass.

llvm-svn: 198654
2014-01-07 01:02:55 +00:00
Andrew Trick
7236fefab6 80 col. comment.
llvm-svn: 198653
2014-01-07 01:02:52 +00:00
Andrew Trick
12dfc32452 Reapply r198478 "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things."
Now with a fix for PR18384: ValueHandleBase::ValueIsDeleted.

We need to invalidate SCEV's loop info when we delete a block, even if no values are hoisted.

llvm-svn: 198631
2014-01-06 19:43:14 +00:00
Alp Toker
b20c031b7a Add missed cleanup from r198456
All other uses of this macro in LLVM/clang have been moved to the function
definition so follow suite (and the usage advice) here too for consistency.

llvm-svn: 198516
2014-01-04 22:47:48 +00:00
Alp Toker
2d17611e90 Revert "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things."
This commit was the source of crasher PR18384:

While deleting: label %for.cond127
An asserting value handle still pointed to this value!
UNREACHABLE executed at llvm/lib/IR/Value.cpp:671!

Reverting to get the builders green, feel free to re-land after fixing up.
(Renato has a handy isolated repro if you need it.)

This reverts commit r198478.

llvm-svn: 198503
2014-01-04 17:00:45 +00:00
Andrew Trick
45ef495b91 Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things.
getSCEV for an ashr instruction creates an intermediate zext
expression when it truncates its operand.

The operand is initially inside the loop, so the narrow zext
expression has a non-loop-invariant loop disposition.

LoopSimplify then runs on an outer loop, hoists the ashr operand, and
properly invalidate the SCEVs that are mapped to value.

The SCEV expression for the ashr is now an AddRec with the hoisted
value as the now loop-invariant start value.

The LoopDisposition of this wide value was properly invalidated during
LoopSimplify.

However, if we later get the ashr SCEV again, we again try to create
the intermediate zext expression. We get the same SCEV that we did
earlier, and it is still cached because it was never mapped to a
Value. When we try to create a new AddRec we abort because we're using
the old non-loop-invariant LoopDisposition.

I don't have a solution for this other than to clear LoopDisposition
when LoopSimplify hoists things.

I think the long-term strategy should be to perform LoopSimplify on
all loops before computing SCEV and before running any loop opts on
individual loops. It's possible we may want to rerun LoopSimplify on
individual loops, but it should rarely do anything, so rarely require
invalidating SCEV.

llvm-svn: 198478
2014-01-04 05:52:49 +00:00
Nico Weber
7e53ec0698 Add a LLVM_DUMP_METHOD macro.
The motivation is to mark dump methods as used in debug builds so that they can
be called from lldb, but to not do so in release builds so that they can be
dead-stripped.

There's lots of potential follow-up work suggested in the thread
"Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev,
but everyone seems to agreen on this subset.

Macro name chosen by fair coin toss.

llvm-svn: 198456
2014-01-03 22:53:37 +00:00
David Peixotto
2028917754 Fix loop rerolling pass failure with non-consant loop lower bound
The loop rerolling pass was failing with an assertion failure from a
failed cast on loops like this:

  void foo(int *A, int *B, int m, int n) {
    for (int i = m; i < n; i+=4) {
      A[i+0] = B[i+0] * 4;
      A[i+1] = B[i+1] * 4;
      A[i+2] = B[i+2] * 4;
      A[i+3] = B[i+3] * 4;
    }
  }

The code was casting the SCEV-expanded code for the new
induction variable to a phi-node. When the loop had a non-constant
lower bound, the SCEV expander would end the code expansion with an
add insted of a phi node and the cast would fail.

It looks like the cast to a phi node was only needed to get the
induction variable value coming from the backedge to compute the end
of loop condition. This patch changes the loop reroller to compare
the induction variable to the number of times the backedge is taken
instead of the iteration count of the loop. In other words, we stop
the loop when the current value of the induction variable ==
IterationCount-1. Previously, the comparison was comparing the
induction variable value from the next iteration == IterationCount.

This problem only seems to occur on 32-bit targets. For some reason,
the loop is not rerolled on 64-bit targets.

PR18290

llvm-svn: 198425
2014-01-03 17:20:01 +00:00
Hal Finkel
df8016f76f Disable compare sinking in CodeGenPrepare when multiple condition registers are available
As noted in the comment above CodeGenPrepare::OptimizeInst, which aggressively
sinks compares to reduce pressure on the condition register(s), for targets
such as PowerPC with multiple condition registers, this may not be the right
thing to do. This adds an HasMultipleConditionRegisters boolean to TLI, and
CodeGenPrepare::OptimizeInst is skipped when HasMultipleConditionRegisters is
true.

This functionality will be used by the PowerPC backend in an upcoming commit.
Especially when the PowerPC backend starts tracking individual condition
register bits as separate allocatable entities (which will happen in this
upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is
significantly suboptimial.

llvm-svn: 198354
2014-01-02 21:13:43 +00:00
Andrew Trick
9bdab3f1b3 indvars: cleanup the IV visitor. It does more than gather sext/zext info.
llvm-svn: 198353
2014-01-02 21:12:11 +00:00
Matt Arsenault
e28f607079 Delete unread globals through addrspacecast
llvm-svn: 198346
2014-01-02 20:01:43 +00:00
Matt Arsenault
090fe5a92a Fix addrspacecast with metadata globals
llvm-svn: 198345
2014-01-02 19:53:49 +00:00
Andrew Trick
5f76ab650f indvars: insert truncate at loop boundary to avoid redundant IVs.
When widening an IV to remove s/zext, we generally try to eliminate
the original narrow IV. However, LCSSA phi nodes outside the loop were
still using the original IV. Clean this up more aggressively to avoid
redundancy in generated code.

llvm-svn: 198338
2014-01-02 19:29:38 +00:00
Nico Weber
10bf32e628 Set LLVM_EXPORTED_SYMBOL_FILE in CMakeLists whose corresponding Makefiles do so.
(unittests/ExecutionEngine/JIT/CMakeLists.txt is still missing for now, since
it handles export files in a strange way: It generates a .exports file from a
.def file instead of the other way round.)

llvm-svn: 198183
2013-12-29 23:06:49 +00:00
Alexander Potapenko
7da398bcae [ASan] Fix the test for __asan_gen_ globals and actually fix http://llvm.org/bugs/show_bug.cgi?id=17976
by setting the correct linkage (as stated in the bug).

llvm-svn: 198018
2013-12-25 16:46:27 +00:00
Alexander Potapenko
53694d2efb [ASan] Make sure none of the __asan_gen_ global strings end up in the symbol table, add a test.
This should fix http://llvm.org/bugs/show_bug.cgi?id=17976
Another test checking for the global variables' locations and prefixes on Darwin will be committed separately.

llvm-svn: 198017
2013-12-25 14:22:15 +00:00
Andrew Trick
e7f9f5556d Add support to indvars for optimizing sadd.with.overflow.
Split sadd.with.overflow into add + sadd.with.overflow to allow
analysis and optimization. This should ideally be done after
InstCombine, which can perform code motion (eventually indvars should
run after all canonical instcombines). We want ISEL to recombine the
add and the check, at least on x86.

This is currently under an option for reducing live induction
variables: -liv-reduce. The next step is reducing liveness of IVs that
are live out of the overflow check paths. Once the related
optimizations are fully developed, reviewed and tested, I do expect
this to become default.

llvm-svn: 197926
2013-12-23 23:31:49 +00:00
Richard Sandiford
f367c783a7 Fix Scalarizer insertion point when replacing PHIs with insertelements
If the Scalarizer scalarized a vector PHI but could not scalarize
all uses of it, it would insert a series of insertelements to reconstruct
the vector PHI value from the scalar ones.  The problem was that it would
emit these insertelements immediately after the PHI, even if there were
other PHIs after it.

llvm-svn: 197909
2013-12-23 14:51:56 +00:00
Richard Sandiford
27fc4a21a8 Fix Scalarizer handling of vector GEPs with multiple index operands
The old code only worked for one index operand.  Also handle "inbounds".

llvm-svn: 197908
2013-12-23 14:45:00 +00:00
Kostya Serebryany
a148c8c9ed [asan] don't unpoison redzones on function exit in use-after-return mode.
Summary:
Before this change the instrumented code before Ret instructions looked like:
  <Unpoison Frame Redzones>
  if (Frame != OriginalFrame) // I.e. Frame is fake
     <Poison Complete Frame>

Now the instrumented code looks like:
  if (Frame != OriginalFrame) // I.e. Frame is fake
     <Poison Complete Frame>
  else
     <Unpoison Frame Redzones>

Reviewers: eugenis

Reviewed By: eugenis

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2458

llvm-svn: 197907
2013-12-23 14:15:08 +00:00
Kostya Serebryany
911683de1d [asan] produce fewer stores when poisoning stack shadow
llvm-svn: 197904
2013-12-23 09:24:36 +00:00
Justin Bogner
3b4e34606e Transforms: Don't create bad weights when eliminating dead cases
If we happen to eliminate every case in a switch that has branch
weights, we currently try to create metadata for the one remaining
branch, triggering an assert. Instead, we need to check that the
metadata we're trying to create is sensible.

llvm-svn: 197791
2013-12-20 08:21:30 +00:00
Kay Tiong Khoo
86f36f1147 Stay classy (and legal) LLVM. Remove links to 3rd party SMT solver whose links may not be permanent.
llvm-svn: 197713
2013-12-19 18:35:54 +00:00
Kay Tiong Khoo
304e305b5c Improved fix for PR17827 (instcombine of shift/and/compare).
This change fixes the case of arithmetic shift right - do not attempt to fold that case.
This change also relaxes the conditions when attempting to fold the logical shift right and shift left cases.

No additional IR-level test cases included at this time. See http://llvm.org/bugs/show_bug.cgi?id=17827 for proofs that these are correct transformations.

llvm-svn: 197705
2013-12-19 18:07:17 +00:00
Evgeniy Stepanov
301154310a [dfsan] Simplify code after r197677.
llvm-svn: 197679
2013-12-19 14:37:03 +00:00
Evgeniy Stepanov
0cd4eea1b6 Add an explicit insert point argument to SplitBlockAndInsertIfThen.
Currently SplitBlockAndInsertIfThen requires that branch condition is an
Instruction itself, which is very inconvenient, because it is sometimes an
Operator, or even a Constant.

llvm-svn: 197677
2013-12-19 13:29:56 +00:00
Arnold Schwaighofer
e4d65aae7d LoopVectorizer: Don't if-convert constant expressions that can trap
A phi node operand or an instruction operand could be a constant expression that
can trap (division). Check that we don't vectorize such cases.

PR16729
radar://15653590

llvm-svn: 197449
2013-12-17 01:11:01 +00:00
Yi Jiang
67f2e8e3f8 Enable double to float shrinking optimizations for binary functions like 'fmin/fmax'. Fix radar:15283121
llvm-svn: 197434
2013-12-16 22:42:40 +00:00
Hal Finkel
89ba3023da Fix a use-after-free error in GlobalOpt CleanupConstantGlobalUsers
GlobalOpt's CleanupConstantGlobalUsers function uses a worklist array to manage
constant users to be visited. The pointers in this array need to be weak
handles because when we delete a constant array, we may also be holding a
pointer to one of its elements (or an element of one of its elements if we're
dealing with an array of arrays) in the worklist.

Fixes PR17347.

llvm-svn: 197178
2013-12-12 20:45:24 +00:00
Hal Finkel
4492bf3e5d Initialize the barrier pass llvm::initializeIPO
The barrier pass is a temporary hack, and should go away soon. Nevertheless, if
we don't initialize it, then opt will not understand -barrier, and this will
break bugpoint (because when it dumps the passes from the default pass manager
-barrier will be there).

llvm-svn: 197177
2013-12-12 20:45:08 +00:00
Yi Jiang
f648b7ec82 Resubmit r196544: Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) ―> __exp10(x)
llvm-svn: 197109
2013-12-12 01:55:04 +00:00
NAKAMURA Takumi
54fa39136d Prune redundant dependencies in LLVMBuild.txt.
llvm-svn: 196988
2013-12-11 00:30:57 +00:00
Reid Kleckner
29793e54b1 [asan] Fix the coverage.cc test broken by r196939
It was failing because ASan was adding all of the following to one
function:
- dynamic alloca
- stack realignment
- inline asm

This patch avoids making the static alloca dynamic when coverage is
used.

ASan should probably not be inserting empty inline asm blobs to inhibit
duplicate tail elimination.

llvm-svn: 196973
2013-12-10 21:49:28 +00:00
NAKAMURA Takumi
3fda77b6c7 Add proper dependencies to LLVMBuild.txt in llvm/lib.
I'll prune redundant deps in LLVMBuild.txt, later.

llvm-svn: 196881
2013-12-10 05:39:34 +00:00
NAKAMURA Takumi
b2c60b7ca7 Whitespaces.
llvm-svn: 196880
2013-12-10 05:39:12 +00:00
Justin Bogner
42f9d0cf93 Transforms: Don't create bad branch weights when folding a switch
This avoids creating branch weight metadata of length one when we fold
cases into the default of a switch instruction, which was triggering
an assert.

llvm-svn: 196845
2013-12-10 00:13:41 +00:00
Manman Ren
4fcc808139 Revert 196544 due to internal bot failures.
llvm-svn: 196732
2013-12-08 20:28:33 +00:00
Mark Seaborn
72517bc221 Fix inlining to not lose the "cleanup" clause from landingpads
This fixes PR17872.  This bug can lead to C++ destructors not being
called when they should be, when an exception is thrown.

llvm-svn: 196711
2013-12-08 00:51:21 +00:00
Mark Seaborn
2d856cb007 Fix inlining to not produce duplicate landingpad clauses
Before this change, inlining one "invoke" into an outer "invoke" call
site can lead to the outer landingpad's catch/filter clauses being
copied multiple times into the resulting landingpad.  This happens:

 * when the inlined function contains multiple "resume" instructions,
   because forwardResume() copies the clauses but is called multiple
   times;

 * when the inlined function contains a "resume" and a "call", because
   HandleCallsInBlockInlinedThroughInvoke() copies the clauses but is
   redundant with forwardResume().

Fix this by deduplicating the code.

This problem doesn't lead to any incorrect execution; it's only
untidy.

This change will make fixing PR17872 a little easier.

llvm-svn: 196710
2013-12-08 00:50:58 +00:00
Jakub Staszak
11e1c882f7 Don't #include heavy Dominators.h file in LoopInfo.h. This change reduces
overall time of LLVM compilation by ~1%.

llvm-svn: 196667
2013-12-07 21:20:17 +00:00
Matt Arsenault
db406f2a95 Fix assert with copy from global through addrspacecast
llvm-svn: 196638
2013-12-07 02:58:45 +00:00
Duncan P. N. Exon Smith
de77610c42 Don't use isNullValue to evaluate ConstantExpr
ConstantExpr can evaluate to false even when isNullValue gives false.

Fixes PR18143.

llvm-svn: 196611
2013-12-06 21:48:36 +00:00
Kostya Serebryany
f0897919b4 [asan] fix ndebug build with strict warnings (-Wunused-variable)
llvm-svn: 196574
2013-12-06 09:26:09 +00:00
Kostya Serebryany
fc32a3e5d2 [asan] rewrite asan's stack frame layout
Summary:
Rewrite asan's stack frame layout.
First, most of the stack layout logic is moved into a separte file
to make it more testable and (potentially) useful for other projects.
Second, make the frames more compact by using adaptive redzones
(smaller for small objects, larger for large objects).
Third, try to minimized gaps due to large alignments (this is hypothetical since
today we don't see many stack vars aligned by more than 32).

The frames indeed become more compact, but I'll still need to run more benchmarks
before committing, but I am sking for review now to get early feedback.

This change will be accompanied by a trivial change in compiler-rt tests
to match the new frame sizes.

Reviewers: samsonov, dvyukov

Reviewed By: samsonov

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2324

llvm-svn: 196568
2013-12-06 09:00:17 +00:00
Yi Jiang
0bd0569be5 Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) ―> __exp10(x)
llvm-svn: 196544
2013-12-05 22:42:50 +00:00
Renato Golin
a4d4a4c44f Add #pragma vectorize enable/disable to LLVM
The intended behaviour is to force vectorization on the presence
of the flag (either turn on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure the all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.

This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.

The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.

Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.

llvm-svn: 196537
2013-12-05 21:20:02 +00:00
Michael Gottesman
f2e7969964 Change std::deque => std::vector. No functionality change.
There is no reason to use std::deque here over std::vector. Thus given the
performance differences inbetween the two it makes sense to change deque to
vector.

llvm-svn: 196524
2013-12-05 18:42:12 +00:00
Rafael Espindola
2ad993fb14 Fix non-deterministic behavior.
We use CSEBlocks to initialize a worklist:

SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end());

so it must have a deterministic order.

llvm-svn: 196520
2013-12-05 18:28:01 +00:00
Arnold Schwaighofer
120880c780 SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use
We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (they also are present in a vectorized tuple). This
meant we would keep a value 'alive' as a scalar and vectorized causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.

Fixes PR18129.

radar://15582184

llvm-svn: 196508
2013-12-05 15:14:40 +00:00
Kostya Serebryany
eb57b3e248 [tsan] fix PR18146: sometimes a variable written into vptr could have an integer type (after other optimizations)
llvm-svn: 196507
2013-12-05 15:03:02 +00:00
Alp Toker
e845f8af67 Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Yuchen Wu
9559c6af5d llvm-cov: Replace size() with empty() in bool check.
llvm-svn: 196400
2013-12-04 19:18:23 +00:00
Daniel Jasper
ca41e63412 Un-revert r196358: "llvm-cov: Added support for function checksums."
And add the proper fix.

llvm-svn: 196367
2013-12-04 08:57:17 +00:00
Daniel Jasper
a7dd8af910 Revert r196358: "llvm-cov: Added support for function checksums."
This currently breaks clang/test/CodeGen/code-coverage.c. The root cause
is that the newly introduced access to Funcs[j] is out of bounds.

llvm-svn: 196365
2013-12-04 08:23:33 +00:00
Yuchen Wu
b1a23c9951 llvm-cov: Added support for function checksums.
The function checksums are hashed from the concatenation of the function
name and line number.

llvm-svn: 196358
2013-12-04 06:00:17 +00:00
Yunzhong Gao
05c1966c8c Teach the internalize pass to skip dllexported symbols because they could be
referenced in a way that even the linker does not see.

Differential Revision: http://llvm-reviews.chandlerc.com/D2280

llvm-svn: 196300
2013-12-03 18:05:14 +00:00
Kay Tiong Khoo
dd9c210766 Use local variable for repeated use rather than 'get' method. No functional change intended.
llvm-svn: 196164
2013-12-02 22:23:32 +00:00
Kay Tiong Khoo
40a987d7ca Move variables to where they are used and give them better names. No functional change intended.
llvm-svn: 196163
2013-12-02 22:20:40 +00:00
Kay Tiong Khoo
14489a85ab Rename variables to be consistent (CST -> Cst). No functional change intended.
llvm-svn: 196161
2013-12-02 22:11:56 +00:00
Mark Seaborn
b08ae3c322 InlineFunction.cpp: Remove a return value that is always false
Remove some associated dead code.

This cleanup is associated with PR17872.

llvm-svn: 196147
2013-12-02 20:50:59 +00:00
Kay Tiong Khoo
5257afa264 Conservative fix for PR17827 - don't optimize a shift + and + compare sequence where the shift is logical unless the comparison is unsigned
llvm-svn: 196129
2013-12-02 18:43:59 +00:00
Kostya Serebryany
e71e08d007 [tsan] fix instrumentation of vector vptr updates (https://code.google.com/p/thread-sanitizer/issues/detail?id=43)
llvm-svn: 196079
2013-12-02 08:07:15 +00:00
Bill Wendling
0f97d98496 Use accessor methods instead.
llvm-svn: 196006
2013-12-01 03:40:42 +00:00
Bill Wendling
178e2b5358 Use 'unsigned char' to get this past gcc error message:
error: invalid conversion from 'unsigned char' to '{anonymous}::Sequence'

llvm-svn: 196004
2013-12-01 03:36:07 +00:00
Stephen Canon
d8aaca93a6 Rein in overzealous InstCombine of fptrunc(OP(fpextend, fpextend)).
llvm-svn: 195934
2013-11-28 21:38:05 +00:00
Nadav Rotem
dc01e91cf5 PR1860 - We can't save a list of ExtractElement instructions to CSE because some of these instructions
may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE.

llvm-svn: 195791
2013-11-26 22:24:25 +00:00
Arnold Schwaighofer
d0c05d2c84 LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to a i32
value.

Fixes PR18049.

llvm-svn: 195787
2013-11-26 22:11:23 +00:00
Diego Novillo
0929297da9 Refactor some code in SampleProfile.cpp
I'm adding new functionality in the sample profiler. This will
require more data to be kept around for each function, so I moved
the structure SampleProfile that we keep for each function into
a separate class.

There are no functional changes in this patch. It simply provides
a new home where to place all the new data that I need to propagate
weights through edges.

There are some other name and minor edits throughout.

llvm-svn: 195780
2013-11-26 20:37:33 +00:00
Nadav Rotem
643eb4c26e PR18060 - When we RAUW values with ExtractElement instructions in some cases
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions make sure
that all of the RAUWed instructions are the same.

llvm-svn: 195773
2013-11-26 17:29:19 +00:00
Stepan Dyatkovskiy
83455f2b60 PR17925 bugfix.
Short description.

This issue is about case of treating pointers as integers.
We treat pointers as different if they references different address space.
At the same time, we treat pointers equal to integers (with machine address
width). It was a point of false-positive. Consider next case on 32bit machine:

void foo0(i32 addrespace(1)* %p)
void foo1(i32 addrespace(2)* %p)
void foo2(i32 %p)

foo0 != foo1, while
foo1 == foo2 and foo0 == foo2.

As you can see it breaks transitivity. That means that result depends on order
of how functions are presented in module. Next order causes merging of foo0
and foo1: foo2, foo0, foo1
First foo0 will be merged with foo2, foo0 will be erased. Second foo1 will be
merged with foo2.
Depending on order, things could be merged we don't expect to.

The fix:
Forbid to treat any pointer as integer, except for those, who belong to address space 0.

llvm-svn: 195769
2013-11-26 16:11:03 +00:00
Chandler Carruth
5be5f8d16c [PM] Split the CallGraph out from the ModulePass which creates the
CallGraph.

This makes the CallGraph a totally generic analysis object that is the
container for the graph data structure and the primary interface for
querying and manipulating it. The pass logic is separated into its own
class. For compatibility reasons, the pass provides wrapper methods for
most of the methods on CallGraph -- they all just forward.

This will allow the new pass manager infrastructure to provide its own
analysis pass that constructs the same CallGraph object and makes it
available. The idea is that in the new pass manager, the analysis pass's
'run' method returns a concrete analysis 'result'. Here, that result is
a 'CallGraph'. The 'run' method will typically do only minimal work,
deferring much of the work into the implementation of the result object
in order to be lazy about computing things, but when (like DomTree)
there is *some* up-front computation, the analysis does it prior to
handing the result back to the querying pass.

I know some of this is fairly ugly. I'm happy to change it around if
folks can suggest a cleaner interim state, but there is going to be some
amount of unavoidable ugliness during the transition period. The good
thing is that this is very limited and will naturally go away when the
old pass infrastructure goes away. It won't hang around to bother us
later.

Next up is the initial new-PM-style call graph analysis. =]

llvm-svn: 195722
2013-11-26 04:19:30 +00:00
Chandler Carruth
a1094eb135 Migrate metadata information from scalar to vector instructions during
SLP vectorization. Based on the code in BBVectorizer.

Fixes PR17741.

Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my
driving of clang-format. =]

llvm-svn: 195528
2013-11-23 00:48:34 +00:00
Yuchen Wu
43df88959d llvm-cov: Split entry blocks in GCNOProfiling.cpp.
gcov expects every function to contain an entry block that
unconditionally branches into the next block. clang does not implement
basic blocks in this manner, so gcov did not output correct branch info
if the entry block branched to multiple blocks.

This change splits every function's entry block into an empty block and
a block with the rest of the instructions. The instrumentation code will
take care of the rest.

llvm-svn: 195513
2013-11-22 23:07:45 +00:00
Manman Ren
7590fee969 Debug Info: move StripDebugInfo from StripSymbols.cpp to DebugInfo.cpp.
We can share the implementation between StripSymbols and dropping debug info
for metadata versions that do not match.

Also update the comments to match the implementation. A follow-on patch will
drop the "Debug Info Version" module flag in StripDebugInfo.

llvm-svn: 195505
2013-11-22 22:06:31 +00:00
Matt Arsenault
35acaad8c7 StructurizeCFG: Fix verification failure with some loops.
If the beginning of the loop was also the entry block
of the function, branches were inserted to the entry block
which isn't allowed. If this occurs, create a new dummy
function entry block that branches to the start of the loop.

llvm-svn: 195493
2013-11-22 19:24:39 +00:00
Matt Arsenault
9afcbf3562 StructurizeCFG: Fix inverting a branch on an argument
llvm-svn: 195492
2013-11-22 19:24:37 +00:00
Rafael Espindola
524e7f35de Add a fixed version of r195470 back.
The fix is simply to use CurI instead of I when handling aliases to
avoid accessing a invalid iterator.

original message:

Convert linkonce* to weak* instead of strong.

Also refactor the logic into a helper function. This is an important improve
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps in a
comdat, making the linker happy.

llvm-svn: 195477
2013-11-22 17:58:12 +00:00
Rafael Espindola
749aa1e00d Revert "Convert linkonce* to weak* instead of strong."
This reverts commit r195470.
Debugging failure in some bots.

llvm-svn: 195472
2013-11-22 17:09:34 +00:00
Richard Sandiford
82ac8f6b68 Add a Scalarizer pass.
llvm-svn: 195471
2013-11-22 16:58:05 +00:00
Rafael Espindola
2ac1404bee Convert linkonce* to weak* instead of strong.
Also refactor the logic into a helper function. This is an important improvement
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps in a
comdat, making the linker happy.

llvm-svn: 195470
2013-11-22 16:14:30 +00:00
Arnold Schwaighofer
3fa9376236 SLPVectorizer: Fix whitespace errors.
llvm-svn: 195468
2013-11-22 15:47:17 +00:00
Yi Jiang
74286d427a SLP Vectorizer: Extract cost will only be added once even if the scalar has multiple external uses.
llvm-svn: 195406
2013-11-22 01:57:02 +00:00
Peter Collingbourne
89b5505b6b Introduce two command-line flags for the instrumentation pass to control whether the labels of pointers should be ignored in load and store instructions
The new command line flags are -dfsan-ignore-pointer-label-on-store and -dfsan-ignore-pointer-label-on-load. Their default value matches the current labelling scheme.

Additionally, the function __dfsan_union_load is marked as readonly.

Patch by Lorenzo Martignoni!

Differential Revision: http://llvm-reviews.chandlerc.com/D2187

llvm-svn: 195382
2013-11-21 23:20:54 +00:00
Evgeniy Stepanov
405470ce24 [msan] Propagate condition origin in select instruction.
llvm-svn: 195349
2013-11-21 12:00:24 +00:00
Yuchen Wu
5480edb76f llvm-cov: Don't assume FileChecksum was generated.
For cases where emitProfileArcs() was called but emitProfileNotes() was
not, set the CfgChecksum to 0.

llvm-svn: 195311
2013-11-21 04:53:39 +00:00
Yuchen Wu
d218a85f8c llvm-cov: Fixed some bugs related to file checksum.
Added call to update CfgChecksum. Made FileChecksum a vector, separate
for each source file.

llvm-svn: 195309
2013-11-21 04:01:05 +00:00
Yuchen Wu
734fa40b2a llvm-cov: Added file checksum to gcno and gcda files.
Instead of permanently outputting "MVLL" as the file checksum, clang
will create gcno and gcda checksums by hashing the destination block
numbers of every arc. This allows for llvm-cov to check if the two gcov
files are synchronized.

Regenerated the test files so they contain the checksum. Also added
negative test to ensure error when the checksums don't match.

llvm-svn: 195191
2013-11-20 04:15:05 +00:00
Arnold Schwaighofer
242935ec8c SLPVectorizer: Fix stale for Value pointer array
We are slicing an array of Value pointers and process those slices in a loop.
The problem is that we might invalidate a later slice by vectorizing a former
slice.

Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can
tell.

The test case will only fail when running with libgmalloc.

radar://15498655

llvm-svn: 195162
2013-11-19 22:20:20 +00:00
Arnold Schwaighofer
3149313505 SLPVectorizer: Fix whitespace errors
llvm-svn: 195161
2013-11-19 22:20:18 +00:00
Chandler Carruth
4d8e469cd3 Fix an issue where SROA computed different results based on the relative
order of slices of the alloca which have exactly the same size and other
properties. This was found by a perniciously unstable sort
implementation used to flush out buggy uses of the algorithm.

The fundamental idea is that findCommonType should return the best
common type it can find across all of the slices in the range. There
were two bugs here previously:

1) We would accept an integer type smaller than a byte-width multiple,
   and if there were different bit-width integer types, we would accept
   the first one. This caused an actual failure in the testcase updated
   here when the sort order changed.
2) If we found a bad combination of types or a non-load, non-store use
   before an integer typed load or store we would bail, but if we found
   the integere typed load or store, we would use it. The correct
   behavior is to always use an integer typed operation which covers the
   partition if one exists.

While a clever debugging sort algorithm found problem #1 in our existing
test cases, I have no useful test case ideas for #2. I spotted in by
inspection when looking at this code.

llvm-svn: 195118
2013-11-19 09:03:18 +00:00
Michael Ilseman
0f69f39907 Add support for software expansion of 64-bit integer division instructions.
Patch by Dmitri Shtilman!

llvm-svn: 195116
2013-11-19 06:54:19 +00:00
Adrian Prantl
afeac86924 Debug info: Let LowerDbgDeclare perfom the dbg.declare -> dbg.value
lowering only for load/stores to scalar allocas. The resulting values
confuse the backend and don't add anything because we can describe
array-allocas with a dbg.declare intrinsic just fine.

rdar://problem/15464571

llvm-svn: 195052
2013-11-18 23:04:38 +00:00
Alexey Samsonov
cbf7462c74 [ASan] Fix PR17867 - make sure ASan doesn't crash if use-after-scope and use-after-return are combined.
llvm-svn: 195014
2013-11-18 14:53:55 +00:00
Arnold Schwaighofer
e4280ec4dd LoopVectorizer: Extend the induction variable to a larger type
In some case the loop exit count computation can overflow. Extend the type to
prevent most of those cases.

The problem is loops like:
int main ()
{
  int a = 1;
  char b = 0;
  lbl:
    a &= 4;
    b--;
    if (b) goto lbl;
  return a;
}

The backedge count is 255. The induction variable type is i8. If we add one to
255 to get the exit count we overflow to zero.

To work around this issue we extend the type of the induction variable to i32 in
the case of i8 and i16.

PR17532

llvm-svn: 195008
2013-11-18 13:14:32 +00:00
NAKAMURA Takumi
413002f21c Utils/LoopUnroll.cpp: Tweak (StringRef)OldName to be valid until it is used, since r194601.
eraseFromParent() invalidates OldName.

llvm-svn: 194970
2013-11-17 18:05:34 +00:00
Hal Finkel
2557302085 Add a loop rerolling flag to the PassManagerBuilder
This adds a boolean member variable to the PassManagerBuilder to control loop
rerolling (just like we have for unrolling and the various vectorization
options). This is necessary for control by the frontend. Loop rerolling remains
disabled by default at all optimization levels.

llvm-svn: 194966
2013-11-17 16:02:50 +00:00
Hal Finkel
f09058ab9e Add the cold attribute to error-reporting call sites
Generally speaking, control flow paths with error reporting calls are cold.
So far, error reporting calls are calls to perror and calls to fprintf,
fwrite, etc. with stderr as the stream. This can be extended in the future.

The primary motivation is to improve block placement (the cold attribute
affects the static branch prediction heuristics).

llvm-svn: 194943
2013-11-17 02:06:35 +00:00
Hal Finkel
5d880eed19 Fix ndebug-build unused variable in loop rerolling
llvm-svn: 194941
2013-11-17 01:21:54 +00:00
Hal Finkel
cc70e01f05 Add a loop rerolling pass
This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The
transformation aims to take loops like this:

for (int i = 0; i < 3200; i += 5) {
  a[i]     += alpha * b[i];
  a[i + 1] += alpha * b[i + 1];
  a[i + 2] += alpha * b[i + 2];
  a[i + 3] += alpha * b[i + 3];
  a[i + 4] += alpha * b[i + 4];
}

and turn them into this:

for (int i = 0; i < 3200; ++i) {
  a[i] += alpha * b[i];
}

and loops like this:

for (int i = 0; i < 500; ++i) {
  x[3*i] = foo(0);
  x[3*i+1] = foo(0);
  x[3*i+2] = foo(0);
}

and turn them into this:

for (int i = 0; i < 1500; ++i) {
  x[i] = foo(0);
}

There are two motivations for this transformation:

  1. Code-size reduction (especially relevant, obviously, when compiling for
code size).

  2. Providing greater choice to the loop vectorizer (and generic unroller) to
choose the unrolling factor (and a better ability to vectorize). The loop
vectorizer can take vector lengths and register pressure into account when
choosing an unrolling factor, for example, and a pre-unrolled loop limits that
choice. This is especially problematic if the manual unrolling was optimized
for a machine different from the current target.

The current implementation is limited to single basic-block loops only. The
rerolling recognition should work regardless of how the loop iterations are
intermixed within the loop body (subject to dependency and side-effect
constraints), but the significant restriction is that the order of the
instructions in each iteration must be identical. This seems sufficient to
capture all current use cases.

This pass is not currently enabled by default at any optimization level.

llvm-svn: 194939
2013-11-16 23:59:05 +00:00
Hal Finkel
79b1387151 Apply the InstCombine fptrunc sqrt optimization to llvm.sqrt
InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls:

  (fptrunc (sqrt (fpext x))) -> (sqrtf x)

but does not apply the same optimization to llvm.sqrt. This is a problem
because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt in
fast-math mode, and because this optimization is being applied to sqrt and not
applied to llvm.sqrt, sometimes the fast-math code is slower.

This change makes InstCombine apply this optimization to llvm.sqrt as well.

This fixes the specific problem in PR17758, although the same underlying issue
(optimizations applied to libcalls are not applied to intrinsics) exists for
other optimizations in SimplifyLibCalls.

llvm-svn: 194935
2013-11-16 21:29:08 +00:00
Benjamin Kramer
0519e29d1b InstCombine: fold (A >> C) == (B >> C) --> (A^B) < (1 << C) for constant Cs.
This is common in bitfield code.

llvm-svn: 194925
2013-11-16 16:00:48 +00:00
Arnold Schwaighofer
01b6f1cc9a LoopVectorizer: Use abi alignment for accesses with no alignment
When we vectorize a scalar access with no alignment specified, we have to set
the target's abi alignment of the scalar access on the vectorized access.
Using the same alignment of zero would be wrong because most targets will have a
bigger abi alignment for vector types.

This probably fixes PR17878.

llvm-svn: 194876
2013-11-15 23:09:33 +00:00
Manman Ren
2189fb16b9 ArgumentPromotion: correctly transfer TBAA tags and alignments.
We used to use std::map<IndicesVector, LoadInst*> for OriginalLoads, and when we
try to promote two arguments, they will both write to OriginalLoads causing
created loads for the two arguments to have the same original load. And the same
tbaa tag and alignment will be put to the created loads for the two arguments.

The fix is to use std::map<std::pair<Argument*, IndicesVector>, LoadInst*>
for OriginalLoads, so each Argument will write to different parts of the map.

PR17906

llvm-svn: 194846
2013-11-15 20:41:15 +00:00
Kostya Serebryany
3b2f8ea060 [asan] use GlobalValue::PrivateLinkage for coverage guard to save quite a bit of code size
llvm-svn: 194800
2013-11-15 09:52:05 +00:00
Bob Wilson
5f54393e59 Reapply "[asan] Poor man's coverage that works with ASan"
I was able to successfully run a bootstrapped LTO build of clang with
r194701, so this change does not seem to be the cause of our failing
buildbots.

llvm-svn: 194789
2013-11-15 07:16:09 +00:00
Matt Arsenault
2c9ec5c652 Add instcombine visitor for addrspacecast
llvm-svn: 194786
2013-11-15 05:45:08 +00:00
Bob Wilson
1eb0da26bc Revert "[asan] Poor man's coverage that works with ASan"
This reverts commit 194701. Apple's bootstrapped LTO builds have been failing,
and this change (along with compiler-rt 194702-194704) is the only thing on
the blamelist.  I will either reappy these changes or help debug the problem,
depending on whether this fixes the buildbots.

llvm-svn: 194780
2013-11-15 03:28:22 +00:00
Kostya Serebryany
8bd4b917a6 [asan] Poor man's coverage that works with ASan
llvm-svn: 194701
2013-11-14 13:27:41 +00:00
Evgeniy Stepanov
01ee97e26c [msan] Fast path optimization for wrap-indirect-calls feature of MemorySanitizer.
Indirect call wrapping helps MSanDR (dynamic instrumentation companion tool
for MSan) to catch all cases where execution leaves a compiler-instrumented
module by allowing the tool to rewrite targets of indirect calls.

This change is an optimization that skips wrapping for calls when target is
inside the current module. This relies on the linker providing symbols at the
begin and end of the module code (or code + data, does not really matter).
Gold linker provides such symbols by default. GNU (BFD) linker needs a link
flag: -Wl,--defsym=__executable_start=0.

More info:
https://code.google.com/p/memory-sanitizer/wiki/MSanDR#Native_exec

llvm-svn: 194697
2013-11-14 12:29:04 +00:00
Jakub Staszak
5e8ab0ddcf Use StringRef instead of std::string
llvm-svn: 194601
2013-11-13 20:09:11 +00:00
Alexey Samsonov
89a2c4d5df Fix -Wdelete-non-virtual-dtor warnings by making SampleProfile methods non-virtual
llvm-svn: 194568
2013-11-13 13:09:39 +00:00
Diego Novillo
7b4e2dda6b SampleProfileLoader pass. Initial setup.
This adds a new scalar pass that reads a file with samples generated
by 'perf' during runtime. The samples read from the profile are
incorporated and emmited as IR metadata reflecting that profile.

The profile file is assumed to have been generated by an external
profile source. The profile information is converted into IR metadata,
which is later used by the analysis routines to estimate block
frequencies, edge weights and other related data.

External profile information files have no fixed format, each profiler
is free to define its own. This includes both the on-disk representation
of the profile and the kind of profile information stored in the file.
A common kind of profile is based on sampling (e.g., perf), which
essentially counts how many times each line of the program has been
executed during the run.

The SampleProfileLoader pass is organized as a scalar transformation.
On startup, it reads the file given in -sample-profile-file to
determine what kind of profile it contains.  This file is assumed to
contain profile information for the whole application. The profile
data in the file is read and incorporated into the internal state of
the corresponding profiler.

To facilitate testing, I've organized the profilers to support two file
formats: text and native. The native format is whatever on-disk
representation the profiler wants to support, I think this will mostly
be bitcode files, but it could be anything the profiler wants to
support. To do this, every profiler must implement the
SampleProfile::loadNative() function.

The text format is mostly meant for debugging. Records are separated by
newlines, but each profiler is free to interpret records as it sees fit.
Profilers must implement the SampleProfile::loadText() function.

Finally, the pass will call SampleProfile::emitAnnotations() for each
function in the current translation unit. This function needs to
translate the loaded profile into IR metadata, which the analyzer will
later be able to use.

This patch implements the first steps towards the above design. I've
implemented a sample-based flat profiler. The format of the profile is
fairly simplistic. Each sampled function contains a list of relative
line locations (from the start of the function) together with a count
representing how many samples were collected at that line during
execution. I generate this profile using perf and a separate converter
tool.

Currently, I have only implemented a text format for these profiles. I
am interested in initial feedback to the whole approach before I send
the other parts of the implementation for review.

This patch implements:

- The SampleProfileLoader pass.
- The base ExternalProfile class with the core interface.
- A SampleProfile sub-class using the above interface. The profiler
  generates branch weight metadata on every branch instructions that
  matches the profiles.
- A text loader class to assist the implementation of
  SampleProfile::loadText().
- Basic unit tests for the pass.

Additionally, the patch uses profile information to compute branch
weights based on instruction samples.

This patch converts instruction samples into branch weights. It
does a fairly simplistic conversion:

Given a multi-way branch instruction, it calculates the weight of
each branch based on the maximum sample count gathered from each
target basic block.

Note that this assignment of branch weights is somewhat lossy and can be
misleading. If a basic block has more than one incoming branch, all the
incoming branches will get the same weight. In reality, it may be that
only one of them is the most heavily taken branch.

I will adjust this assignment in subsequent patches.

llvm-svn: 194566
2013-11-13 12:22:21 +00:00
Nadav Rotem
a2f08b8300 Update the docs to match the function name.
llvm-svn: 194537
2013-11-13 01:12:01 +00:00
Nadav Rotem
e15585e8ff Fold (iszero(A&K1) | iszero(A&K2)) -> (A&(K1|K2)) != (K1|K2) if we know that K1 and K2 are 'one-hot' (only one bit is on).
llvm-svn: 194525
2013-11-12 22:38:59 +00:00
Nadav Rotem
8fbd606127 FoldBranchToCommonDest merges branches into a single branch with or/and of the condition. It has a heuristics for estimating when some of the dependencies are processed by out-of-order processors. This patch adds another rule to the heuristics that says that if the "BonusInstruction" that we speculatively execute is used by the condition of the second branch then it is okay to hoist it. This change exposes more opportunities for other passes to transform the code. It does not matter that much that we if-convert the code because the selectiondag builder splits or/and branches into multiple branches when profitable.
llvm-svn: 194524
2013-11-12 22:37:16 +00:00
Rafael Espindola
318b459092 Corruptly merge constants with explicit and implicit alignments.
Constant merge can merge a constant with implicit alignment with one that has
explicit alignment. Before this change it was assuming that the explicit
alignment was higher than the implicit one, causing the result to be under
aligned in some cases.

Fixes pr17815.

Patch by Chris Smowton!

llvm-svn: 194506
2013-11-12 20:21:43 +00:00
Benjamin Kramer
d93d22eb86 SimplifyCFG: Use existing constant folding logic when forming switch tables.
Both simpler and more powerful than the hand-rolled folding logic.

llvm-svn: 194475
2013-11-12 12:24:36 +00:00
Shuxin Yang
ccc49e9a1b Correct a glitch in r194424 which may invalidate iterator.
llvm-svn: 194457
2013-11-12 08:33:03 +00:00
Yuchen Wu
12b9b5086e llvm-cov: Added call to update run/program counts.
Also updated test files that were generated from this change.

llvm-svn: 194453
2013-11-12 04:59:08 +00:00
Shuxin Yang
fc370d42f7 Fix PR17952.
The symptom is that an assertion is triggered. The assertion was added by
me to detect the situation when value is propagated from dead blocks.
(We can certainly get rid of assertion; it is safe to do so, because propagating
 value from dead block to alive join node is certainly ok.)

  The root cause of this bug is : edge-splitting is conducted on the fly,
the edge being split could be a dead edge, therefore the block that 
split the critial edge needs to be flagged "dead" as well.

  There are 3 ways to fix this bug:
  1) Get rid of the assertion as I mentioned eariler 
  2) When an dead edge is split, flag the inserted block "dead".
  3) proactively split the critical edges connecting dead and live blocks when
     new dead blocks are revealed.

  This fix go for 3) with additional 2 LOC.

  Testing case was added by Rafael the other day.

llvm-svn: 194424
2013-11-11 22:00:23 +00:00
Renato Golin
ed3b88828a Move debug message in vectorizer
No functional change, just better reporting.

llvm-svn: 194388
2013-11-11 16:27:35 +00:00
Evgeniy Stepanov
32b834b198 [msan] Propagate origin for insertvalue, extractvalue.
llvm-svn: 194374
2013-11-11 13:37:10 +00:00
Bill Wendling
74b6463d35 Revert "Resurrect r191017 " GVN proceeds in the presence of dead code" plus a fix to PR17307 & 17308."
This causes PR17852.

This reverts commit d93e8a06b2ca09ab18f390cd514b7443e2e571f7.

Conflicts:
	test/Transforms/GVN/cond_br2.ll

llvm-svn: 194348
2013-11-10 07:34:34 +00:00
Matt Arsenault
e3debcae62 Use type form of getIntPtrType.
This should be inconsequential and is work
towards removing the default address space
arguments.

llvm-svn: 194347
2013-11-10 04:46:57 +00:00
Nadav Rotem
537b5180e9 SimplifyCFG has a heuristics for out-of-order processors that decides when it is worthwhile to merge branches. It tries to estimate if the operands of the instruction that we want to hoist are ready. This commit marks function arguments as 'ready' because they require no calculation. This boosts libquantum and a few other workloads from the testsuite.
llvm-svn: 194346
2013-11-10 04:13:31 +00:00
Matt Arsenault
4fcb58f7fb Teach MergeFunctions about address spaces
llvm-svn: 194342
2013-11-10 01:44:37 +00:00
Hal Finkel
b485e0c406 Remove dead code from LoopUnswitch
LoopUnswitch's code simplification routine has logic to convert conditional
branches into unconditional branches, after unswitching makes the condition
constant, and then remove any blocks that renders dead. Unfortunately, this
code is dead, currently broken, and furthermore, has never been alive (at least
as far back at 2006).

No functionality change intended.

llvm-svn: 194277
2013-11-08 19:58:21 +00:00
Michael Gottesman
04ff2468f0 [objc-arc] Convert the one directional retain/release relation assert to a conditional check + fail.
Due to the previously added overflow checks, we can have a retain/release
relation that is one directional. This occurs specifically when we run into an
additive overflow causing us to drop state in only one direction. If that
occurs, we should bail and not optimize that retain/release instead of
asserting.

Apologies for the size of the testcase. It is necessary to cause the additive
cfg overflow to trigger.

rdar://15377890

llvm-svn: 194083
2013-11-05 16:02:40 +00:00
Hal Finkel
26ff1b7453 Add a runtime unrolling parameter to the LoopUnroll pass constructor
As with the other loop unrolling parameters (the unrolling threshold, partial
unrolling, etc.) runtime unrolling can now also be controlled via the
constructor. This will be necessary for moving non-trivial unrolling late in
the pass manager (after loop vectorization).

No functionality change intended.

llvm-svn: 194027
2013-11-05 00:08:03 +00:00
Shuxin Yang
08d2dc8405 Remove dead code
llvm-svn: 194017
2013-11-04 21:44:01 +00:00
Benjamin Kramer
9eaaead296 SLPVectorizer: Use properlyDominates to satisfy the irreflexivity of a strict weak ordering.
STL debug mode checks this.

llvm-svn: 194015
2013-11-04 21:34:55 +00:00
Matt Arsenault
1f521e921d Scalarize select vector arguments when extracted.
When the elements are extracted from a select on vectors
or a vector select, do the select on the extracted scalars
from the input if there is only one use.

llvm-svn: 194013
2013-11-04 20:36:06 +00:00
Benjamin Kramer
15ebc47438 SLPVectorizer: Add a missing pair of parens. No functionality change.
llvm-svn: 193958
2013-11-03 12:54:32 +00:00
Benjamin Kramer
f45bcf5480 SLPVectorizer: When CSEing generated gathers only scan blocks containing them.
Instead of doing a RPO traversal of the whole function remember the blocks
containing gathers (typically <= 2) and scan them in dominator-first order.

The actual CSE is still quadratic, but I'm not confident that adding a
scoped hash table here is worth it as we're only looking at the generated
instructions and not arbitrary code.

llvm-svn: 193956
2013-11-03 12:27:52 +00:00
David Majnemer
2bbbbaaeee Revert "Inliner: Handle readonly attribute per argument when adding memcpy"
This reverts commit r193356, it caused PR17781.

A reduced test case covering this regression has been added to the test suite.

llvm-svn: 193955
2013-11-03 12:22:13 +00:00
David Majnemer
fbb2338856 Spell "Actual" correctly
llvm-svn: 193954
2013-11-03 11:09:39 +00:00
Bob Wilson
eb7943100b Convert calls to __sinpi and __cospi into __sincospi_stret
This adds an SimplifyLibCalls case which converts the special __sinpi and
__cospi (float & double variants) into a __sincospi_stret where appropriate to
remove duplicated work.

Patch by Tim Northover

llvm-svn: 193943
2013-11-03 06:48:38 +00:00
Benjamin Kramer
ae919396b6 SLPVectorizer: Remove duplicated function.
llvm-svn: 193927
2013-11-02 14:46:27 +00:00
Benjamin Kramer
abc7baa1dc LoopVectorize: Remove quadratic behavior the local CSE.
Doing this with a hash map doesn't change behavior and avoids calling
isIdenticalTo O(n^2) times. This should probably eventually move into a utility
class shared with EarlyCSE and the limited CSE in the SLPVectorizer.

llvm-svn: 193926
2013-11-02 13:39:00 +00:00
Arnold Schwaighofer
fed0c4f8e8 LoopVectorizer: Move cse code into its own function
llvm-svn: 193895
2013-11-01 23:28:54 +00:00
Arnold Schwaighofer
fba1c74b67 LoopVectorizer: Perform redundancy elimination on induction variables
When the loop vectorizer was part of the SCC inliner pass manager gvn would
run after the loop vectorizer followed by instcombine. This way redundancy
(multiple uses) were removed and instcombine could perform scalarization on the
induction variables. Having moved the loop vectorizer to later we no longer run
any form of redundancy elimination before we perform instcombine. This caused
vectorized induction variables to survive that did not before.

On a recent iMac this helps linpack back from 6000Mflops to 7000Mflops.

This should also help lpbench and paq8p.

I ran a Release (without Asserts) build over the test-suite and did not see any
negative impact on compile time.

radar://15339680

llvm-svn: 193891
2013-11-01 22:18:19 +00:00
Benjamin Kramer
3045156cee LoopVectorize: Look for consecutive acces in GEPs with trailing zero indices
If we have a pointer to a single-element struct we can still build wide loads
and stores to it (if there is no padding).

llvm-svn: 193860
2013-11-01 14:09:50 +00:00
Arnold Schwaighofer
5d7be45165 LoopVectorizer: If dependency checks fail try runtime checks
When a dependence check fails we can still try to vectorize loops with runtime
array bounds checks.

This helps linpack to vectorize a loop in dgefa. And we are back to 2x of the
scalar performance on a corei7-avx.

radar://15339680

llvm-svn: 193853
2013-11-01 03:05:07 +00:00
Arnold Schwaighofer
fe8e481ef6 LoopVectorizer: Clear all member data structures in RuntimeCheck.reset()
Clear all data structures when resetting the RuntimeCheck data structure.

No test case. This was exposed by an upcomming change.

llvm-svn: 193852
2013-11-01 03:05:04 +00:00
Manman Ren
bba5655c39 Do not convert "call asm" to "invoke asm" in Inliner.
Given that backend does not handle "invoke asm" correctly ("invoke asm" will be
handled by SelectionDAGBuilder::visitInlineAsm, which does not have the right
setup for LPadToCallSiteMap) and we already made the assumption that inline asm
does not throw in InstCombiner::visitCallSite, we are going to make the same
assumption in Inliner to make sure we don't convert "call asm" to "invoke asm".

If it becomes necessary to add support for "invoke asm" later on, we will need
to modify the backend as well as remove the assumptions that inline asm does
not throw.

Fix rdar://15317907

llvm-svn: 193808
2013-10-31 21:56:03 +00:00
Rafael Espindola
24353f2de2 Use LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN instead of the "dso list".
There are two ways one could implement hiding of linkonce_odr symbols in LTO:
* LLVM tells the linker which symbols can be hidden if not used from native
  files.
* The linker tells LLVM which symbols are not used from other object files,
  but will be put in the dso symbol table if present.

GOLD's API is the second option. It was implemented almost 1:1 in llvm by
passing the list down to internalize.

LLVM already had partial support for the first option. It is also very similar
to how ld64 handles hiding these symbols when *not* doing LTO.

This patch then
* removes the APIs for the DSO list.
* marks LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN all linkonce_odr unnamed_addr
  global values and other linkonce_odr whose address is not used.
* makes the gold plugin responsible for handling the API mismatch.

llvm-svn: 193800
2013-10-31 20:51:58 +00:00
Rafael Espindola
afc61d382c Merge CallGraph and BasicCallGraph.
llvm-svn: 193734
2013-10-31 03:03:55 +00:00
Matt Arsenault
7af7ffc005 Teach scalarrepl about address spaces
llvm-svn: 193720
2013-10-30 22:54:58 +00:00
Matt Arsenault
f608b07c6f Fix GVN creating bitcast between address spaces
llvm-svn: 193710
2013-10-30 19:05:41 +00:00
Arnold Schwaighofer
fe80e563da ARM cost model: Account for zero cost scalar SROA instructions
By vectorizing a series of srl, or, ... instructions we have obfuscated the
intention so much that the backend does not know how to fold this code away.

radar://15336950

llvm-svn: 193573
2013-10-29 01:33:53 +00:00
Arnold Schwaighofer
6f22639253 SLPVectorizer: Use vector type for vectorized memory operations
No test case, because with the current cost model we don't see a difference.
An upcoming ARM memory cost model change will expose and test this bug.

radar://15332579

llvm-svn: 193572
2013-10-29 01:33:50 +00:00
Shuxin Yang
eb29e658a0 Revert r193251 : Use address-taken to disambiguate global variable and indirect memops.
llvm-svn: 193489
2013-10-27 03:08:44 +00:00
Wan Xiaofei
f3100f24fa Quick look-up for block in loop.
This patch implements quick look-up for block in loop by maintaining a hash set for blocks.
It improves the efficiency of loop analysis a lot, the biggest improvement could be 5-6%(458.sjeng).
Below are the compilation time for our benchmark in llc before & after the patch.

Benchmark	llc - trunk		llc - patched	
401.bzip2	0.339081	100.00%	0.329657	102.86%
403.gcc		19.853966	100.00%	19.605466	101.27%
429.mcf		0.049823	100.00%	0.048451	102.83%
433.milc	0.514898	100.00%	0.510217	100.92%
444.namd	1.109328	100.00%	1.103481	100.53%
445.gobmk	4.988028	100.00%	4.929114	101.20%
456.hmmer	0.843871	100.00%	0.825865	102.18%
458.sjeng	0.754238	100.00%	0.714095	105.62%
464.h264ref	2.9668		100.00%	2.90612		102.09%
471.omnetpp	4.556533	100.00%	4.511886	100.99%
bitmnp01	0.038168	100.00%	0.0357		106.91%
idctrn01	0.037745	100.00%	0.037332	101.11%
libquake2	3.78689		100.00%	3.76209		100.66%
libquake_	2.251525	100.00%	2.234104	100.78%
linpack		0.033159	100.00%	0.032788	101.13%
matrix01	0.045319	100.00%	0.043497	104.19%
nbench		0.333161	100.00%	0.329799	101.02%
tblook01	0.017863	100.00%	0.017666	101.12%
ttsprk01	0.054337	100.00%	0.053057	102.41%

Reviewer	: Andrew Trick <atrick@apple.com>, Hal Finkel <hfinkel@anl.gov>
Approver	: Andrew Trick <atrick@apple.com>
Test		: Pass make check-all & llvm test-suite

llvm-svn: 193460
2013-10-26 03:08:02 +00:00
Andrew Trick
741b7a0cfc Fix SCEVExpander: don't try to expand quadratic recurrences outside a loop.
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)

When SCEV expands a recurrence outside of a loop it attempts to scale
by the stride of the recurrence. Chained recurrences don't work that
way. We could compute binomial coefficients, but would hve to
guarantee that the chained AddRec's are in a perfectly reduced form.

llvm-svn: 193438
2013-10-25 21:35:56 +00:00
Rafael Espindola
37e88054ad Handle calls and invokes in GlobalStatus.
This patch teaches GlobalStatus to analyze a call that uses the global value as
a callee, not as an argument.

With this change internalize call handle the common use of linkonce_odr
functions. This reduces the number of linkonce_odr functions in a LTO build of
clang (checked with the emit-llvm gold plugin option) from 1730 to 60.

llvm-svn: 193436
2013-10-25 21:29:52 +00:00
Hal Finkel
d554c99b37 LoopVectorizer: Don't attempt to vectorize extractelement instructions
The loop vectorizer does not currently understand how to vectorize
extractelement instructions. The existing check, which excluded all
vector-valued instructions, did not catch extractelement instructions because
it checked only the return value. As a result, vectorization would proceed,
producing illegal instructions like this:

  %58 = extractelement <2 x i32> %15, i32 0
  %59 = extractelement i32 %58, i32 0

where the second extractelement is illegal because its first operand is not a vector.

llvm-svn: 193434
2013-10-25 20:40:15 +00:00
Tom Stellard
3863b94253 Inliner: Handle readonly attribute per argument when adding memcpy
Patch by: Vincent Lejeune

llvm-svn: 193356
2013-10-24 16:38:33 +00:00
Renato Golin
ae79e04f36 Mark vector loops as already vectorized
Make sure we mark all loops (scalar and vector) when vectorizing,
so that we don't try to vectorize them anymore. Also, set unroll
to 1, since this is what we check for on early exit.

llvm-svn: 193349
2013-10-24 14:50:51 +00:00
Nuno Lopes
09c3fc8dac fix PR17635: false positive with packed structures
LLVM optimizers may widen accesses to packed structures that overflow the structure itself, but should be in bounds up to the alignment of the object

llvm-svn: 193317
2013-10-24 09:17:24 +00:00
Juergen Ributzka
db5b9c734f Fix a bug in LinearFunctionTestReplace that created invalid loop exit checks.
Reviewed by Andy

llvm-svn: 193303
2013-10-24 05:29:56 +00:00
Andrew Trick
780bd06488 Clarify comments in genLoopLimit.
llvm-svn: 193292
2013-10-24 00:43:38 +00:00
Yuchen Wu
88a414eb08 Fixed comment typo in GCOVProfiling.cpp
llvm-svn: 193268
2013-10-23 20:35:00 +00:00
Shuxin Yang
45a453cafe Use address-taken to disambiguate global variable and indirect memops.
Major steps include:
 1). introduces a not-addr-taken bit-field in GlobalVariable
 2). GlobalOpt pass sets "not-address-taken" if it proves a global varirable 
    dosen't have its address taken.
 3). AA use this info for disambiguation. 

llvm-svn: 193251
2013-10-23 17:28:19 +00:00
Eric Christopher
4762bba338 Fix spelling, grammar, and match naming convention for test files.
llvm-svn: 193130
2013-10-21 23:14:06 +00:00
Tom Stellard
6f3b22d6ce SimplifyCFG: Don't duplicate calls to functions marked noduplicate v2
v2:
  - Use CI->cannotDuplicate()

llvm-svn: 193115
2013-10-21 20:07:30 +00:00
Matt Arsenault
fcca6dd732 Use more type helper functions
llvm-svn: 193109
2013-10-21 19:43:56 +00:00
Matt Arsenault
8faf1f4444 Teach SimplifyCFG about address spaces
llvm-svn: 193104
2013-10-21 18:55:08 +00:00
Rafael Espindola
34304975c5 Optimize more linkonce_odr values during LTO.
When a linkonce_odr value that is on the dso list is not unnamed_addr
we can still look to see if anything is actually using its address. If
not, it is safe to hide it.

This patch implements that by moving GlobalStatus to Transforms/Utils
and using it in Internalize.

llvm-svn: 193090
2013-10-21 17:14:55 +00:00
Michael Gottesman
80054d1ccd Fix the predecessor removal logic in r193045.
Additionally some small comment/stylistic fixes are included as well.

llvm-svn: 193068
2013-10-21 05:20:11 +00:00
Bill Wendling
ab46af5515 Don't eliminate a partially redundant load if it's in a landing pad.
A landing pad can be jumped to only by the unwind edge of an invoke
instruction. If we eliminate a partially redundant load in a landing pad, it
will create a basic block that violates this constraint. It then leads to other
problems down the line if it tries to merge that basic block with the landing
pad. Avoid this by not eliminating the load in a landing pad.

PR17621

llvm-svn: 193064
2013-10-21 04:09:17 +00:00
Michael Gottesman
f056facc24 Teach simplify-cfg how to correctly create covered lookup tables for switches on iN with N >= 3.
One optimization simplify-cfg performs is the converting of switches to
lookup tables if the switch has > 4 cases. This is done by:

1. Finding the max/min case value and calculating the switch case range.
2. Create a lookup table basic block.
3. Perform a check in the switch's BB to see if the input value is in
the switch's case range. If the input value satisfies said predicate
branch to the lookup table BB, otherwise branch to the switch's default
destination BB using the default value as the result.

The conditional check consists of subtracting the min case value of the
table from any input iN value and then ensuring that said value is
unsigned less than the size of the lookup table represented as an iN
value.

If the lookup table is a covered lookup table, the size of the table will be N
which is 0 as an iN value. Thus the comparison will be an `icmp ult` of an iN
value against 0 which is always false yielding the incorrect result.

This patch fixes this problem by recognizing if we have a covered lookup table
and if we do, unconditionally jumps to the lookup table BB since the covering
property of the lookup table implies no input values could not be handled by
said BB.

rdar://15268442

llvm-svn: 193045
2013-10-20 07:04:37 +00:00
Bill Wendling
fda67cdf91 Perform an intelligent splice of the predecessor with the single successor.
If the predecessor's being spliced into a landing pad, then we need the PHIs to
come first and the rest of the predecessor's code to come *after* the landing
pad instruction.

llvm-svn: 193035
2013-10-19 11:27:12 +00:00
Nadav Rotem
fd357159bc Mark some command line flags as hidden
llvm-svn: 193013
2013-10-18 23:38:13 +00:00
Rafael Espindola
e040b98ded Rename fields of GlobalStatus to match the coding style.
llvm-svn: 192910
2013-10-17 18:18:52 +00:00
Rafael Espindola
4e9662d96e rename SafeToDestroyConstant to isSafeToDestroyConstant and clang-format.
llvm-svn: 192907
2013-10-17 18:06:32 +00:00
Rafael Espindola
04822e2dd6 Simplify the interface of AnalyzeGlobal a bit and rename to analyzeGlobal.
No functionality change.

llvm-svn: 192906
2013-10-17 18:00:25 +00:00
Evgeniy Stepanov
bf4a5e9d07 [msan] Use zero-extension in shadow cast by default.
Switch to sign-extension in r192575 caused 7% perf loss on 482.sphinx3.

llvm-svn: 192882
2013-10-17 10:53:50 +00:00
Dmitry Vyukov
012cdf1364 tsan: implement no_sanitize_thread attribute
If a function has no_sanitize_thread attribute,
do not instrument memory accesses in it.

llvm-svn: 192871
2013-10-17 07:20:06 +00:00
Arnold Schwaighofer
789187ee86 SLPVectorizer: Don't vectorize volatile memory operations
radar://15231682

Reapply r192799,
  http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang/builds/8226
showed that the bot is still broken even with this out.

llvm-svn: 192820
2013-10-16 17:52:40 +00:00
Arnold Schwaighofer
7097263371 Revert "SLPVectorizer: Don't vectorize volatile memory operations"
This speculatively reverts commit 192799. It might have broken a linux buildbot.

llvm-svn: 192816
2013-10-16 17:19:40 +00:00
Arnold Schwaighofer
eebda9d6cf SLPVectorizer: Don't vectorize volatile memory operations
radar://15231682

llvm-svn: 192799
2013-10-16 16:09:00 +00:00
Kostya Serebryany
2ae1ed1c8f [asan] Optimize accesses to global arrays with constant index
Summary:
Given a global array G[N], which is declared in this CU and has static initializer
avoid instrumenting accesses like G[i], where 'i' is a constant and 0<=i<N.
Also add a bit of stats.

This eliminates ~1% of instrumentations on SPEC2006
and also partially helps when asan is being run together with coverage.

Reviewers: samsonov

Reviewed By: samsonov

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1947

llvm-svn: 192794
2013-10-16 14:06:14 +00:00
Benjamin Kramer
a00487e169 LoopVectorize: Properly reflect PODness in comments.
llvm-svn: 192717
2013-10-15 16:19:54 +00:00
Craig Topper
037594e792 Remove x86_sse42_crc32_64_8 intrinsic. It has no functional difference from x86_sse42_crc32_32_8 and was not mapped to a clang builtin. I'm not even sure why this form of the instruction is even called out explicitly in the docs. Also add AutoUpgrade support to convert it into the other intrinsic with appropriate trunc and zext.
llvm-svn: 192672
2013-10-15 05:20:47 +00:00
Rafael Espindola
d9b08f1296 Remove lib/Transforms/Instrumentation/ProfilingUtils.*
They were leftover from the old profiling support.

Patch by Alastair Murray.

llvm-svn: 192605
2013-10-14 16:46:46 +00:00
Chris Lattner
18bf2f7f32 Basic blocks typically have few predecessors. Use a SmallDenseMap to
avoid a heap allocation when this is the case.

llvm-svn: 192602
2013-10-14 16:05:55 +00:00
Evgeniy Stepanov
bd5e5ef8a1 [msan] Instrument x86.*_cvt* intrinsics.
Currently MSan checks that arguments of *cvt* intrinsics are fully initialized.
That's too much to ask: some of them only operate on lower half, or even
quarter, of the input register.

llvm-svn: 192599
2013-10-14 15:16:25 +00:00
Evgeniy Stepanov
57dafd2630 [msan] Fix handling of scalar select of vectors.
llvm-svn: 192575
2013-10-14 09:52:09 +00:00
Arnold Schwaighofer
38ec37faba SLPVectorizer: Sort PHINodes based on their opcode
Before this patch we relied on the order of phi nodes when we looked for phi
nodes of the same type. This could prevent vectorization of cases where there
was a phi node of a second type in between phi nodes of some type.

This is important for vectorization of an internal graphics kernel. On the test
suite + external on x86_64 (and on a run on armv7s) it showed no impact on
either performance or compile time.

radar://15024459

llvm-svn: 192537
2013-10-12 18:56:27 +00:00
Tobias Grosser
bc154d94d0 LoopVectorize: Add missing INITIALIZE_PASS_DEPENDENCY macros
Contributed-by:  Peter Zotov  <whitequark@whitequark.org>
llvm-svn: 192536
2013-10-12 18:29:15 +00:00
Renato Golin
ec7fe56cfa Better info when debugging vectorizer
llvm-svn: 192460
2013-10-11 16:14:39 +00:00
Shuxin Yang
708f0cc2e4 Fix a bug in Dead Argument Elimination.
If a function seen at compile time is not necessarily the one linked to
the binary being built, it is illegal to change the actual arguments
passing to it. 

  e.g. 
   --------------------------
   void foo(int lol) {
     // foo() has linkage satisifying isWeakForLinker()
     // "lol" is not used at all.
   }

   void bar(int lo2) {
      // xform to foo(undef) is illegal, as compiler dose not know which
      // instance of foo() will be linked to the the binary being built.
      foo(lol2); 
   }
  -----------------------------

  Such functions can be captured by isWeakForLinker(). NOTE that
mayBeOverridden() is insufficient for this purpose as it dosen't include
linkage types like AvailableExternallyLinkage and LinkOnceODRLinkage.
Take link_odr* as an example, it indicates a set of *EQUIVALENT* globals
that can be merged at link-time. However, the semantic of 
*EQUIVALENT*-functions includes parameters. Changing parameters breaks
the assumption.

  Thank John McCall for help, especially for the explanation of subtle
difference between linkage types.

  rdar://11546243

llvm-svn: 192302
2013-10-09 17:21:44 +00:00
Arnold Schwaighofer
bfe48b104a LoopVectorize: External uses must use the last value in a reduction cycle
Otherwise, we don't perform operations that would have been performed on
the scalar version.

Fixes PR17498.

llvm-svn: 192133
2013-10-07 21:05:43 +00:00
Alexey Samsonov
a04533615d Revert r191834 until we measure the effect of this benchmarks and maybe find a better way to fix it
llvm-svn: 192121
2013-10-07 19:03:24 +00:00
Hal Finkel
83df3058ed UpdatePHINodes in BasicBlockUtils should not crash on duplicate predecessors
UpdatePHINodes has an optimization to reuse an existing PHI node, where it
first deletes all of its entries and then replaces them. Unfortunately, in the
case where we had duplicate predecessors (which are allowed so long as the
associated PHI entries have the same value), the loop removing the existing PHI
entries from the to-be-reused PHI would assert (if that PHI was not the one
which had the duplicates).

llvm-svn: 192001
2013-10-04 23:41:05 +00:00
Arnold Schwaighofer
6fab174840 SLPVectorizer: Sort inputs to commutative binary operations
Sort the operands of the other entries in the current vectorization root
according to the first entry's operands opcodes.

%conv0 = uitofp ...
%load0 = load float ...

= fmul %conv0, %load0
= fmul %load0, %conv1
= fmul %load0, %conv2

Make sure that we recursively vectorize <%conv0, %conv1, %conv2> and <%load0,
%load0, %load0>.

This makes it more likely to obtain vectorizable trees. We have to be careful
when we sort that we don't destroy 'good' existing ordering implied by source
order.

radar://15080067

llvm-svn: 191977
2013-10-04 20:39:16 +00:00
Owen Anderson
5e2eba8c18 Pull fptrunc's upwards through selects when one of the select's selectands was a constant. This has a number of benefits, including producing small immediates (easier to materialize, smaller constant pools) as well as being more likely to allow the fptrunc to fuse with a preceding instruction (truncating selects are unusual).
llvm-svn: 191929
2013-10-03 21:08:05 +00:00
Rafael Espindola
a88c415234 Optimize linkonce_odr unnamed_addr functions during LTO.
Generalize the API so we can distinguish symbols that are needed just for a DSO
symbol table from those that are used from some native .o.

The symbols that are only wanted for the dso symbol table can be dropped if
llvm can prove every other dso has a copy (linkonce_odr) and the address is not
important (unnamed_addr).

llvm-svn: 191922
2013-10-03 18:29:09 +00:00
Matt Arsenault
c8e9a77ae3 Make gep i8* X, -(ptrtoint Y) transform work with address spaces
llvm-svn: 191920
2013-10-03 18:15:57 +00:00
Matt Arsenault
2c180f66a9 Don't use runtime bounds check between address spaces.
Don't vectorize with a runtime check if it requires a
comparison between pointers with different address spaces.
The values can't be assumed to be directly comparable.
Previously it would create an illegal bitcast.

llvm-svn: 191862
2013-10-02 22:38:17 +00:00
Yi Jiang
09195de8f3 Apply slp vectorization on fully-vectorizable tree of height 2
llvm-svn: 191852
2013-10-02 20:20:39 +00:00
Matt Arsenault
15633246b6 Fix debug printing spacing.
Fix missing newlines, missing and extra spaces in printed messages.

llvm-svn: 191851
2013-10-02 20:04:29 +00:00
Matt Arsenault
26cc78f548 Fix comment grammar and capitalization.
llvm-svn: 191850
2013-10-02 20:04:26 +00:00
Benjamin Kramer
4458ae839d SLPVectorizer: Make store chain finding more aggressive with GetUnderlyingObject.
This recursively strips all GEPs like the existing code. It also handles bitcasts and
other operations that do not change the pointer value.

llvm-svn: 191847
2013-10-02 19:06:06 +00:00
Tom Stellard
b9cbfd3959 StructurizeCFG: Add dependency on LowerSwitch pass
Switch instructions were crashing the StructurizeCFG pass, and it's
probably easier anyway if we don't need to handle them in this pass.

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 191841
2013-10-02 17:04:59 +00:00
Chandler Carruth
ee12d58370 Remove the very substantial, largely unmaintained legacy PGO
infrastructure.

This was essentially work toward PGO based on a design that had several
flaws, partially dating from a time when LLVM had a different
architecture, and with an effort to modernize it abandoned without being
completed. Since then, it has bitrotted for several years further. The
result is nearly unusable, and isn't helping any of the modern PGO
efforts. Instead, it is getting in the way, adding confusion about PGO
in LLVM and distracting everyone with maintenance on essentially dead
code. Removing it paves the way for modern efforts around PGO.

Among other effects, this removes the last of the runtime libraries from
LLVM. Those are being developed in the separate 'compiler-rt' project
now, with somewhat different licensing specifically more approriate for
runtimes.

llvm-svn: 191835
2013-10-02 15:42:23 +00:00
Alexey Samsonov
3291ccfed7 Remove "localize global" optimization
Summary:
As discussed in http://llvm-reviews.chandlerc.com/D1754,
this optimization isn't really valid for C, and fires too rarely anyway.

Reviewers: rafael, nicholas

Reviewed By: nicholas

CC: rnk, llvm-commits, nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D1769

llvm-svn: 191834
2013-10-02 15:31:34 +00:00
Matt Arsenault
e06d79aa83 Don't merge tiny functions.
It's silly to merge functions like these:

define void @foo(i32 %x) {
  ret void
}

define void @bar(i32 %x) {
  ret void
}

to get

define void @bar(i32) {
  tail call void @foo(i32 %0)
  ret void
}

llvm-svn: 191786
2013-10-01 18:05:30 +00:00
Rafael Espindola
a279462828 Remove several unused variables.
Patch by Alp Toker.

llvm-svn: 191757
2013-10-01 13:32:03 +00:00
Matt Arsenault
a3e171a6c8 Fix code duplication
llvm-svn: 191716
2013-10-01 00:01:14 +00:00
Matt Arsenault
bc0b70cd8d Use right address space size in InstCombineCompares
The test's output doesn't change, but this ensures
this is actually hit with a different address space.

llvm-svn: 191701
2013-09-30 21:11:01 +00:00
Matt Arsenault
27f5203bba Constant fold ptrtoint + compare with address spaces
llvm-svn: 191699
2013-09-30 21:06:18 +00:00
Benjamin Kramer
0b73a10aeb BoundsChecking: Fix refacto.
llvm-svn: 191676
2013-09-30 15:52:50 +00:00
Benjamin Kramer
7b5eaaacfd Convert manual insert point restores to the new RAII object.
llvm-svn: 191675
2013-09-30 15:40:17 +00:00
Benjamin Kramer
ae9249f56f InstCombine: Replace manual fast math flag copying with the new IRBuilder RAII helper.
Defines away the issue where cast<Instruction> would fail because constant
folding happened. Also slightly cleaner.

llvm-svn: 191674
2013-09-30 15:39:59 +00:00