mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 22:12:47 +01:00

11638 Commits

Author SHA1 Message Date
Reid Kleckner
5013d7ca2d Fix PR7272 in -tailcallelim instead of the inliner
The -tailcallelim pass should be checking if byval or inalloca args can
be captured before marking calls as tail calls.  This was the real root
cause of PR7272.

With a better fix in place, revert the inliner change from r105255.  The
test case it introduced still passes and has been moved to
test/Transforms/Inline/byval-tail-call.ll.

Reviewers: chandlerc

Differential Revision: http://reviews.llvm.org/D3403

llvm-svn: 206789
2014-04-21 20:48:47 +00:00
David Blaikie
057cb4407d Simplify expression that was explicitly naming an operator overload in a call.
llvm-svn: 206788
2014-04-21 20:43:51 +00:00
David Blaikie
f04826c7d5 Use unique_ptr to handle ownership of GCOVFunctions in GCOVProfiler.
llvm-svn: 206786
2014-04-21 20:41:55 +00:00
Chandler Carruth
e07407deea [Modules] Sink all the DEBUG_TYPE defines for InstCombine out of the
header files and into the cpp files.

These files will require more touches as the header files actually use
DEBUG(). Eventually, I'll have to introduce a matched #define and #undef
of DEBUG_TYPE for the header files, but that comes as step N of many to
clean all of this up.

llvm-svn: 206777
2014-04-21 19:51:41 +00:00
Evgeniy Stepanov
b8b4d1d879 [msan] Enable out-of-line instrumentation for large functions by default.
llvm-svn: 206759
2014-04-21 15:04:05 +00:00
Kostya Serebryany
0ca459b956 [asan] add a run-time flag detect_container_overflow=true/false
llvm-svn: 206756
2014-04-21 14:35:00 +00:00
Kostya Serebryany
8369b857f6 [asan] instead of inserting inline instrumentation around memset/memcpy/memmove, replace the intrinsic with __asan_memset/etc. This makes the memset/etc handling more complete and consistent with what we do in msan. It may slow down some cases (when the intrinsic was actually inlined) and speed up other cases (when it was not inlined)
llvm-svn: 206746
2014-04-21 11:50:42 +00:00
Kostya Serebryany
25679a433d [asan] temporarily disable generating __asan_loadN/__asan_storeN
llvm-svn: 206741
2014-04-21 10:28:13 +00:00
Kostya Serebryany
0405013a8c [asan] insert __asan_loadN/__asan_storeN as out-lined asan checks, llvm part
llvm-svn: 206734
2014-04-21 07:10:43 +00:00
Alp Toker
faee7c31dd Remove some empty statements
Cleanup only.

llvm-svn: 206710
2014-04-19 23:56:35 +00:00
Nick Lewycky
bd9ff641e7 Check whether functions have any lines associated before emitting coverage info for them. This isn't just a size/time saving, gcov may crash on these.
llvm-svn: 206671
2014-04-18 23:32:28 +00:00
Evgeniy Stepanov
de38078fd6 [msan] Add -msan-instrumentation-with-call-threshold.
This flag replaces inline instrumentation for checks and origin stores with
calls into MSan runtime library. This is a workaround for PR17409.

Disabled by default.

llvm-svn: 206585
2014-04-18 12:17:20 +00:00
Kostya Serebryany
2b02920109 [asan] one more workaround for PR17409: don't do BB-level coverage instrumentation if there are more than N (=1500) basic blocks. This makes ASanCoverage work on libjpeg_turbo/jchuff.c used by Chrome, which has 1824 BBs
llvm-svn: 206564
2014-04-18 08:02:42 +00:00
Duncan P. N. Exon Smith
7063f2846a PMBuilder: Expose an option to disable tail calls
Adds API to allow frontends to disable tail calls in PassManagerBuilder.

<rdar://problem/16050591>

llvm-svn: 206542
2014-04-18 01:05:15 +00:00
Diego Novillo
45811c5ea3 Fix bug 19437 - Only add discriminators for DWARF 4 and above.
Summary:
This prevents the discriminator generation pass from triggering if
the DWARF version being used in the module is prior to 4.

Reviewers: echristo, dblaikie

CC: llvm-commits

Differential Revision: http://reviews.llvm.org/D3413

llvm-svn: 206507
2014-04-17 22:33:50 +00:00
Nuno Lopes
4a36b584a3 remove some dead code
lib/Analysis/IPA/InlineCost.cpp         |   18 ------------------
 lib/Analysis/RegionPass.cpp             |    1 -
 lib/Analysis/TypeBasedAliasAnalysis.cpp |    1 -
 lib/Transforms/Scalar/LoopUnswitch.cpp  |   21 ---------------------
 lib/Transforms/Utils/LCSSA.cpp          |    2 --
 lib/Transforms/Utils/LoopSimplify.cpp   |    6 ------
 utils/TableGen/AsmWriterEmitter.cpp     |   13 -------------
 utils/TableGen/DFAPacketizerEmitter.cpp |    7 -------
 utils/TableGen/IntrinsicEmitter.cpp     |    2 --
 9 files changed, 71 deletions(-)

llvm-svn: 206506
2014-04-17 22:26:44 +00:00
NAKAMURA Takumi
51c35adf06 Inliner::OptimizationRemark: Fix crash in clang/test/Frontend/optimization-remark.c on some hosts, including --vg.
DebugLoc in Callsite would not live after Inliner. It should be copied before Inliner.

llvm-svn: 206459
2014-04-17 12:22:14 +00:00
Kostya Serebryany
9bec638044 [asan] add two new hidden compile-time flags for asan: asan-instrumentation-with-call-threshold and asan-memory-access-callback-prefix. This is part of the workaround for PR17409 (instrument huge functions with callbacks instead of inlined code). These flags will also help us experiment with kasan (kernel-asan) and clang
llvm-svn: 206383
2014-04-16 12:12:19 +00:00
Julien Lerouge
dd5842a2e5 Add lifetime markers for allocas created to hold byval arguments, make them
appear in the InlineFunctionInfo.

llvm-svn: 206308
2014-04-15 18:06:46 +00:00
Julien Lerouge
9ecdd0ce5b Split byval argument initialization so the memcpy(s) are injected at the
beginning of the first new block after inlining.

llvm-svn: 206307
2014-04-15 18:01:54 +00:00
Duncan P. N. Exon Smith
571c11d959 LTO: Add more loop simplification passes to LTO
Similar to r202051, add missing loop simplification passes to the LTO
optimization pipeline.

Patch by Rafael Espindola.

llvm-svn: 206306
2014-04-15 17:48:15 +00:00
Duncan P. N. Exon Smith
58154f2238 verify-di: Implement DebugInfoVerifier
Implement DebugInfoVerifier, which steals verification relying on
DebugInfoFinder from Verifier.

  - Adds LegacyDebugInfoVerifierPassPass, a ModulePass which wraps
    DebugInfoVerifier.  Uses -verify-di command-line flag.

  - Change verifyModule() to invoke DebugInfoVerifier as well as
    Verifier.

  - Add a call to createDebugInfoVerifierPass() wherever there was a
    call to createVerifierPass().

This implementation as a module pass should sidestep efficiency issues,
allowing us to turn debug info verification back on.

<rdar://problem/15500563>

llvm-svn: 206300
2014-04-15 16:27:38 +00:00
Alexey Bataev
135cfee77c D3348 - [BUG] "Rotate Loop" pass kills "llvm.vectorizer.enable" metadata
llvm-svn: 206266
2014-04-15 09:37:30 +00:00
Matt Arsenault
2a6aada789 Revert "Revert r206045, "Fix shift by constants for vector.""
Fix cases where the Value itself is used, and not the constant value.

llvm-svn: 206214
2014-04-14 21:50:37 +00:00
NAKAMURA Takumi
1a21e608ca Whitespace.
llvm-svn: 206154
2014-04-14 07:03:13 +00:00
NAKAMURA Takumi
c6fb0494ea Revert r206045, "Fix shift by constants for vector."
It broke some builders, at least, i686.

llvm-svn: 206153
2014-04-14 07:02:57 +00:00
Serge Pavlov
c145d314a1 Use APInt arithmetic, fixed typo. Thanks to Benjamin Kramer for noticing that.
llvm-svn: 206144
2014-04-14 02:20:19 +00:00
Serge Pavlov
816d014c52 Recognize test for overflow in integer multiplication.
If multiplication involves zero-extended arguments and the result is
compared as in the patterns:

    %mul32 = trunc i64 %mul64 to i32
    %zext = zext i32 %mul32 to i64
    %overflow = icmp ne i64 %mul64, %zext
or
    %overflow = icmp ugt i64 %mul64 , 0xffffffff

then the multiplication may be replaced by call to umul.with.overflow.
This change fixes PR4917 and PR4918.

Differential Revision: http://llvm-reviews.chandlerc.com/D2814

llvm-svn: 206137
2014-04-13 18:23:41 +00:00
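The two overflow tests named in 816d014c52 above can be checked against each other in a few lines. This is a minimal sketch that models the IR with plain Python integers; the function names are hypothetical and only mirror the patterns quoted in the commit message, assuming both multiplicands are 32-bit values zero-extended to 64 bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def overflow_via_trunc_zext(a, b):
    """Pattern 1: trunc to i32, zext back to i64, compare with the i64 product."""
    mul64 = (a * b) & MASK64
    mul32 = mul64 & MASK32   # trunc i64 -> i32
    zext = mul32             # zext i32 -> i64 (value unchanged)
    return mul64 != zext


def overflow_via_compare(a, b):
    """Pattern 2: icmp ugt i64 %mul64, 0xffffffff."""
    mul64 = (a * b) & MASK64
    return mul64 > MASK32


def umul_with_overflow(a, b):
    """Model of a 32-bit unsigned multiply returning (result, overflow bit)."""
    full = a * b
    return full & MASK32, full > MASK32
```

For any pair of 32-bit inputs the three functions report the same overflow bit, which is the property the replacement relies on.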
Matt Arsenault
c399a3f659 Fix shift by constants for vector.
ashr <N x iM>, <N x iM> M -> undef

llvm-svn: 206045
2014-04-11 17:57:53 +00:00
David Blaikie
1573e6e09f Implement depth_first and inverse_depth_first range factory functions.
Also updated as many loops as I could find using df_begin/idf_begin -
strangely I found no uses of idf_begin. Is that just used out of tree?

Also a few places couldn't use df_begin because either they used the
member functions of the depth first iterators or had specific ordering
constraints (I added a comment in the latter case).

Based on a patch by Jim Grosbach. (Jim - you just had iterator_range<T>
where you needed iterator_range<idf_iterator<T>>)

llvm-svn: 206016
2014-04-11 01:50:01 +00:00
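The idea behind the range factories in 1573e6e09f above is to hand the caller something it can iterate directly instead of a begin/end iterator pair. The following is only an analogy in Python, not the LLVM API: `depth_first` and `inverse_depth_first` here are hypothetical generators over a graph stored as a dictionary of successor lists.

```python
def depth_first(root, successors):
    """Yield nodes reachable from root in depth-first preorder."""
    visited = {root}
    stack = [(root, iter(successors(root)))]
    yield root
    while stack:
        _, children = stack[-1]
        for child in children:
            if child not in visited:
                visited.add(child)
                yield child
                stack.append((child, iter(successors(child))))
                break
        else:
            # All children of the top node have been handled.
            stack.pop()


def inverse_depth_first(root, graph):
    """Depth-first walk over predecessor edges, starting from root."""
    preds = {node: [] for node in graph}
    for node, succs in graph.items():
        for succ in succs:
            preds[succ].append(node)
    return depth_first(root, lambda n: preds[n])


# Diamond-shaped control-flow graph used only for illustration.
cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
forward = list(depth_first("entry", lambda n: cfg[n]))
backward = list(inverse_depth_first("exit", cfg))
```

A caller can then write a single `for` loop over the returned object, which is the convenience the commit adds on the C++ side.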
Arnold Schwaighofer
c65ae6074a Reapply "SLPVectorizer: Ignore users that are insertelements we can reschedule them"
This commit reapplies 205018. After 205855 we should correctly vectorize
intrinsics.

llvm-svn: 205965
2014-04-10 13:41:35 +00:00
Alp Toker
111bd28e59 Fix some doc and comment typos
llvm-svn: 205899
2014-04-09 14:47:27 +00:00
Arnold Schwaighofer
1a503c9322 SLPVectorizer: Only vectorize intrinsics whose operands are widened equally
The vectorizer only knows how to vectorize intrinsics by widening all operands by
the same factor.

Patch by Tyler Nowicki!

llvm-svn: 205855
2014-04-09 14:20:47 +00:00
Diego Novillo
224c7b79fe Add support for optimization reports.
Summary:
This patch adds backend support for -Rpass=, which indicates the name
of the optimization pass that should emit remarks stating when it
made a transformation to the code.

Pass names are taken from their DEBUG_NAME definitions.

When emitting an optimization report diagnostic, the lack of debug
information causes the diagnostic to use "<unknown>:0:0" as the
location string.

This is the back end counterpart for

http://llvm-reviews.chandlerc.com/D3226

Reviewers: qcolombet

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D3227

llvm-svn: 205774
2014-04-08 16:42:34 +00:00
Eric Christopher
23b79cb873 Add NDEBUG markers around debug only function.
llvm-svn: 205706
2014-04-07 12:46:30 +00:00
Eric Christopher
06a9cfdefa Add debug location information to the vectorizer debug statements.
Patch by Zinovy Nis.

llvm-svn: 205705
2014-04-07 12:32:17 +00:00
David Blaikie
e0b9857e92 Fixing typo.
Differential Revision: http://reviews.llvm.org/D3154

llvm-svn: 205674
2014-04-05 20:30:31 +00:00
Eli Bendersky
be453afe99 Fix PR19270 - type mismatch caused by invalid optimization.
Patch by Jingyue Wu.

llvm-svn: 205547
2014-04-03 17:51:58 +00:00
Juergen Ributzka
da301c01ab Revert "[Constant Hoisting] Lazily compute the idom and cache the result."
This code is no longer useful, because we only compute and use the
IDom once. There is no benefit in caching it anymore.

llvm-svn: 205498
2014-04-03 01:38:47 +00:00
Duncan P. N. Exon Smith
7f2af7c18b Revert "Reapply "LTO: add API to set strategy for -internalize""
This reverts commit r199244.

Conflicts:
	include/llvm-c/lto.h
	include/llvm/LTO/LTOCodeGenerator.h
	lib/LTO/LTOCodeGenerator.cpp

llvm-svn: 205471
2014-04-02 22:05:57 +00:00
Tim Northover
466b3a39e1 SLPVectorizer: compare entire intrinsic for SLP compatibility.
Some Intrinsics are overloaded to the extent that return type equality (all
that's been checked up to now) does not guarantee that the arguments are the
same. In these cases SLP vectorizer should not recurse into the operands, which
can be achieved by comparing them as "Function *" rather than simply the ID.

llvm-svn: 205424
2014-04-02 14:39:02 +00:00
Hal Finkel
5a327230eb [LoopVectorizer] Count dependencies of consecutive pointers as uniforms
For the purpose of calculating the cost of the loop at various vectorization
factors, we need to count dependencies of consecutive pointers as uniforms
(which means that the VF = 1 cost is used for all overall VF values).

For example, the TSVC benchmark function s173 has:
  ...
  %3 = add nsw i64 %indvars.iv, 16000
  %arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3
  ...
and we must realize that the add will be a scalar in order to correctly deduce
it to be profitable to vectorize this on PowerPC with VSX enabled. In fact, all
dependencies of a consecutive pointer must be a scalar (uniform), and so we
simply need to add all consecutive pointers to the worklist that currently
collects uniforms.

Fixes PR19296.

llvm-svn: 205387
2014-04-02 02:34:49 +00:00
Hal Finkel
b3f2a21eed Add some additional fields to TTI::UnrollingPreferences
In preparation for an upcoming commit implementing unrolling preferences for
x86, this adds additional fields to the UnrollingPreferences structure:

 - PartialThreshold and PartialOptSizeThreshold - Like Threshold and
   OptSizeThreshold, but used when not fully unrolling. These are necessary
   because we need different thresholds for full unrolling from those used when
   partially unrolling (the full unrolling thresholds are generally going to be
   larger).

 - MaxCount - A cap on the unrolling factor when partially unrolling. This can
   be used by a target to prevent the unrolled loop from exceeding some
   resource limit independent of the loop size (such as number of branches).

There should be no functionality change for any in-tree targets.

llvm-svn: 205347
2014-04-01 18:50:30 +00:00
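How the fields added in b3f2a21eed above might be consulted can be sketched as follows. The field names come from the commit message; the default values, `pick_threshold`, and `cap_partial_count` are assumptions made for illustration and do not reflect the actual unroller code.

```python
from dataclasses import dataclass


@dataclass
class UnrollingPreferences:
    # Default values are illustrative placeholders, not LLVM's defaults.
    Threshold: int = 300
    OptSizeThreshold: int = 100
    PartialThreshold: int = 150
    PartialOptSizeThreshold: int = 50
    MaxCount: int = 8


def pick_threshold(prefs, fully_unrolling, opt_for_size):
    """Select the size threshold that applies to this unrolling attempt."""
    if fully_unrolling:
        return prefs.OptSizeThreshold if opt_for_size else prefs.Threshold
    return (prefs.PartialOptSizeThreshold if opt_for_size
            else prefs.PartialThreshold)


def cap_partial_count(prefs, requested_count):
    """Apply the MaxCount cap to a partial unrolling factor."""
    return min(requested_count, prefs.MaxCount)
```

Keeping the partial thresholds separate lets a target allow generous full unrolling while staying conservative when only part of the loop is unrolled.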
Hal Finkel
dc0e116444 Move partial/runtime unrolling late in the pipeline
The generic (concatenation) loop unroller is currently placed early in the
standard optimization pipeline. This is a good place to perform full unrolling,
but not the right place to perform partial/runtime unrolling. However, most
targets don't enable partial/runtime unrolling, so this never mattered.

However, even some x86 cores benefit from partial/runtime unrolling of very
small loops, and follow-up commits will enable this. First, we need to move
partial/runtime unrolling late in the optimization pipeline (importantly, this
is after SLP and loop vectorization, as vectorization can drastically change
the size of a loop), while keeping the full unrolling where it is now. This
change does just that.

llvm-svn: 205264
2014-03-31 23:23:51 +00:00
Arnold Schwaighofer
219f6a43e0 Revert "SLPVectorizer: Ignore users that are insertelements we can reschedule them"
This reverts commit r205018.

Conflicts:
	lib/Transforms/Vectorize/SLPVectorizer.cpp
	test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

This is breaking libclc build.

llvm-svn: 205260
2014-03-31 23:05:56 +00:00
Rafael Espindola
18c992ab85 Add a missing break.
Patch by Tobias Güntner.

I tried to write a test, but the only difference is the Changed value that
gets returned. It can be tested with "opt -debug-pass=Executions -functionattrs",
but that doesn't seem worth it.

llvm-svn: 205121
2014-03-30 03:26:17 +00:00
Tim Northover
2f13163a84 ARM64: initial backend import
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.

Everything will be easier with the target in-tree though, hence this
commit.

llvm-svn: 205090
2014-03-29 10:18:08 +00:00
Arnold Schwaighofer
bf6c68c0be SLPVectorizer: Take credit for free extractelement instructions
Extract element instructions that will be removed when vectorizing lower the
cost.

Patch by Arch D. Robison!

llvm-svn: 205020
2014-03-28 17:21:32 +00:00
Arnold Schwaighofer
ffb5e31163 SLPVectorizer: Fix typos
Patch by Arch D. Robison!

llvm-svn: 205019
2014-03-28 17:21:27 +00:00
Arnold Schwaighofer
8510d16f52 SLPVectorizer: Ignore users that are insertelements we can reschedule them
Patch by Arch D. Robison!

llvm-svn: 205018
2014-03-28 17:21:22 +00:00
Erik Verbruggen
11e61b79e5 Revert "InstCombine: merge constants in both operands of icmp."
This reverts commit r204912, and follow-up commit r204948.

This introduced a performance regression, and the fix is not completely
clear yet.

llvm-svn: 205010
2014-03-28 14:50:57 +00:00
Erik Verbruggen
06cf5cbf74 Revert "GVN: merge overflow intrinsics with non-overflow instructions."
This reverts commit r203553, and follow-up commits r203558 and r203574.

I will follow this up on the mailing list to do it in a way that won't
cause subtle PRE bugs.

llvm-svn: 205009
2014-03-28 14:42:34 +00:00
Adrian Prantl
3324d55193 C++11: convert verbose loops to range-based loops.
llvm-svn: 204981
2014-03-27 23:30:04 +00:00
Reid Kleckner
c826dce075 InstCombine: Don't combine constants on unsigned icmps
Fixes a miscompile introduced in r204912.  It would miscompile code like
(unsigned)(a + -49) <= 5U.  The transform would turn this into
(unsigned)a < 55U, which would return true for values in [0, 49], when
it should not.

llvm-svn: 204948
2014-03-27 17:49:27 +00:00
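The miscompile described in c826dce075 above is easy to reproduce with modular arithmetic. This is a small sketch that models 32-bit unsigned values with a mask; the function names are hypothetical.

```python
def to_u32(x):
    """Reinterpret a Python integer as a 32-bit unsigned value."""
    return x & 0xFFFFFFFF


def original(a):
    """(unsigned)(a + -49) <= 5U"""
    return to_u32(a - 49) <= 5


def miscompiled(a):
    """(unsigned)a < 55U, the form produced by the faulty transform."""
    return to_u32(a) < 55


# Inputs on which the two forms disagree, over a small sampled range.
mismatches = [a for a in range(-100, 200) if original(a) != miscompiled(a)]
```

Over this sampled range the two forms disagree for `a` from 0 through 48: the original expression is true only for 49 through 54, while the rewritten one is true for everything from 0 through 54.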
Rafael Espindola
5c8926deed Prevent alias from pointing to weak aliases.
This adds back r204781.

Original message:

Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias2 are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204934
2014-03-27 15:26:56 +00:00
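The rule introduced by 5c8926deed above can be pictured as a simple verifier check. The classes and `verify_alias` below are a hypothetical model, not LLVM's Verifier; the sketch assumes the check walks the aliasee chain and rejects any weak alias it reaches.

```python
from dataclasses import dataclass


@dataclass
class Function:
    name: str


@dataclass
class Alias:
    name: str
    aliasee: object      # a Function or another Alias
    weak: bool = False


def verify_alias(alias):
    """Raise ValueError if the alias resolves through a weak alias."""
    target = alias.aliasee
    while isinstance(target, Alias):
        if target.weak:
            raise ValueError(
                f"alias {alias.name} points to weak alias {target.name}")
        target = target.aliasee
    return target


my_func = Function("my_func")
my_alias = Alias("my_alias", my_func, weak=True)
my_alias2 = Alias("my_alias2", my_alias)
```

Calling `verify_alias(my_alias2)` raises, mirroring the commit's choice to turn a silent miscompile into an error, while `verify_alias(my_alias)` is accepted because a weak alias may still point directly at a function.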
Erik Verbruggen
5e4efd4306 InstCombine: merge constants in both operands of icmp.
Transform:
    icmp X+Cst2, Cst
into:
    icmp X, Cst-Cst2
when Cst-Cst2 does not overflow, and the add has nsw.

llvm-svn: 204912
2014-03-27 11:16:05 +00:00
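The fold in 5e4efd4306 above is sound for signed comparisons under the two stated conditions, which can be confirmed exhaustively at a small bit width. The sketch below checks the signed less-than predicate only; `check_fold` and `fits` are hypothetical helpers, and the narrow width is chosen just to keep the search small.

```python
def fits(value, bits):
    """True if value is representable as a signed integer of this width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def check_fold(bits=5):
    """Check icmp slt (X + Cst2), Cst  ==  icmp slt X, (Cst - Cst2)."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    checked = 0
    for x in range(lo, hi):
        for cst2 in range(lo, hi):
            if not fits(x + cst2, bits):
                continue  # the add would violate nsw, so skip this case
            for cst in range(lo, hi):
                if not fits(cst - cst2, bits):
                    continue  # Cst - Cst2 overflows, fold not applied
                assert (x + cst2 < cst) == (x < cst - cst2)
                checked += 1
    return checked
```

The same reasoning does not carry over to unsigned predicates, which is the case the follow-up fix in r204948 excluded.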
Nick Lewycky
a6e0e1eae1 Treat lifetime.start'd memory like we treat freshly alloca'd memory. Patch by Björn Steinbrink!
llvm-svn: 204876
2014-03-26 23:45:15 +00:00
Reid Kleckner
509530b2ae CloneFunction: Clone all attributes, including the CC
Summary:
Tested with a unit test because we don't appear to have any transforms
that use this other than ASan, I think.

Fixes PR17935.

Reviewers: nicholas

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D3194

llvm-svn: 204866
2014-03-26 22:26:35 +00:00
Rafael Espindola
63a8ff6883 Revert "Prevent alias from pointing to weak aliases."
This reverts commit r204781.

I will follow up with the msan folks to see what they
were trying to do with aliases to weak aliases.

llvm-svn: 204784
2014-03-26 06:14:40 +00:00
Rafael Espindola
c9179b8b50 Prevent alias from pointing to weak aliases.
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias2 are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204781
2014-03-26 04:48:47 +00:00
Juergen Ributzka
822c198051 [Constant Hoisting] Make the constant candidate map local to the collectConstantCandidates method.
llvm-svn: 204758
2014-03-25 21:21:10 +00:00
Richard Osborne
fd123c2caf [InstCombine] Don't fold bitcast into store if it would need addrspacecast
Summary:
Previously the code didn't check if the before and after types for the
store were pointers to different address spaces. This resulted in
instcombine using a bitcast to convert between pointers to different
address spaces, causing an assertion due to the invalid cast.

It is not appropriate to use addrspacecast in this case because it is
not guaranteed to be a no-op cast. Instead bail out and do not do the
transformation.

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D3117

llvm-svn: 204733
2014-03-25 17:21:41 +00:00
Richard Osborne
db5d56840b Reuse earlier variables to make it clear the types involved in the cast.
No functionality change.

llvm-svn: 204732
2014-03-25 17:21:35 +00:00
Evgeniy Stepanov
ad64faed33 [msan] More precise instrumentation of select IR.
Some bits of select result may be initialized even if select condition
is not.

https://code.google.com/p/memory-sanitizer/issues/detail?id=50

llvm-svn: 204716
2014-03-25 13:08:34 +00:00
Andrew Trick
16d04697fd SLP vectorizer: Don't hoist vector extracts of phis.
Extracts coming from phis were being hoisted, while all others were
sunk to their uses. This was inconsistent and didn't seem to serve a
purpose. Changing all extracts to be sunk to uses is a prerequisite
for adding block frequency to the SLP vectorizer's cost model.

I benchmarked the change in isolation (without block frequency). I
only saw noise on x86 and some potentially significant improvements on
ARM. No major regressions is good enough for me.

llvm-svn: 204699
2014-03-25 02:18:47 +00:00
Nuno Lopes
79d18a66ec remove a bunch of unused private methods
found with a smarter version of -Wunused-member-function that I'm playing with.
Apologies in advance if I removed someone's WIP code.

 include/llvm/CodeGen/MachineSSAUpdater.h            |    1 
 include/llvm/IR/DebugInfo.h                         |    3 
 lib/CodeGen/MachineSSAUpdater.cpp                   |   10 --
 lib/CodeGen/PostRASchedulerList.cpp                 |    1 
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp    |   10 --
 lib/IR/DebugInfo.cpp                                |   12 --
 lib/MC/MCAsmStreamer.cpp                            |    2 
 lib/Support/YAMLParser.cpp                          |   39 ---------
 lib/TableGen/TGParser.cpp                           |   16 ---
 lib/TableGen/TGParser.h                             |    1 
 lib/Target/AArch64/AArch64TargetTransformInfo.cpp   |    9 --
 lib/Target/ARM/ARMCodeEmitter.cpp                   |   12 --
 lib/Target/ARM/ARMFastISel.cpp                      |   84 --------------------
 lib/Target/Mips/MipsCodeEmitter.cpp                 |   11 --
 lib/Target/Mips/MipsConstantIslandPass.cpp          |   12 --
 lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp              |   21 -----
 lib/Target/NVPTX/NVPTXISelDAGToDAG.h                |    2 
 lib/Target/PowerPC/PPCFastISel.cpp                  |    1 
 lib/Transforms/Instrumentation/AddressSanitizer.cpp |    2 
 lib/Transforms/Instrumentation/BoundsChecking.cpp   |    2 
 lib/Transforms/Instrumentation/MemorySanitizer.cpp  |    1 
 lib/Transforms/Scalar/LoopIdiomRecognize.cpp        |    8 -
 lib/Transforms/Scalar/SCCP.cpp                      |    1 
 utils/TableGen/CodeEmitterGen.cpp                   |    2 
 24 files changed, 2 insertions(+), 261 deletions(-)

llvm-svn: 204560
2014-03-23 17:09:26 +00:00
Lang Hames
75ea6aebf8 Revert r204076 for now - it caused significant regressions in a number of
benchmarks.

<rdar://problem/16368461>

llvm-svn: 204558
2014-03-23 04:22:31 +00:00
Juergen Ributzka
9a985a07a3 [Constant Hoisting] Erase dead cast instructions.
The cleanup code that removes dead cast instructions only removed them from the
basic block, but didn't delete them. This fix erases them now too.

llvm-svn: 204538
2014-03-22 01:49:30 +00:00
Juergen Ributzka
a05de79cbb [Constant Hoisting] Fix multiple entries for the same basic block in PHI nodes.
A PHI node usually has only one value/basic block pair per incoming basic block.
In the case of a switch statement it is possible that a following PHI node may
have more than one such pair per incoming basic block. E.g.:
%0 = phi i64 [ 123456, %case2 ], [ 654321, %Entry ], [ 654321, %Entry ]
This is valid and the verifier doesn't complain, because both values are the
same.

Constant hoisting materializes the constant for each operand separately and the
value is still the same, but the variable names have changed. As a result the
verifier can't recognize anymore that they are the same value and complains.

This fix adds special update code for PHI node in constant hoisting to prevent
this corner case.

This fixes <rdar://problem/16394449>

llvm-svn: 204537
2014-03-22 01:49:27 +00:00
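The PHI corner case from a05de79cbb above can be modeled with value names and block names as strings. This is a hypothetical sketch, not the pass's code: `phi_is_valid` stands in for the verifier rule, and `hoist_phi_operands` shows one way to reuse a single materialized constant per incoming block.

```python
def phi_is_valid(incoming):
    """Duplicate incoming blocks are allowed only with identical values."""
    seen = {}
    for value, block in incoming:
        if block in seen and seen[block] != value:
            return False
        seen[block] = value
    return True


def hoist_phi_operands(incoming):
    """Materialize one constant per incoming block and reuse it for repeats."""
    materialized = {}
    result = []
    for value, block in incoming:
        if block not in materialized:
            materialized[block] = f"%const{len(materialized) + 1}"
        result.append((materialized[block], block))
    return result


before = [("123456", "case2"), ("654321", "Entry"), ("654321", "Entry")]
# Naive hoisting gives every operand its own name, even for repeated blocks.
naive = [("%const1", "case2"), ("%const2", "Entry"), ("%const3", "Entry")]
fixed = hoist_phi_operands(before)
```

The naive result fails the check because the two `%Entry` operands now carry different names, while the per-block reuse keeps them identical.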
Arnaud A. de Grandmaison
c417a5d328 Remove some dead assignments found by scan-build
llvm-svn: 204526
2014-03-21 21:54:46 +00:00
Tom Stellard
fe1239b8cb Sink: Don't sink static allocas from the entry block
CodeGen treats allocas outside the entry block as dynamically sized
stack objects.

llvm-svn: 204473
2014-03-21 15:51:51 +00:00
Juergen Ributzka
4470e9c92d [Constant Hoisting] Make the constant materialization cost operand dependent
Extend the target hook to take also the operand index into account when
calculating the cost of the constant materialization.

Related to <rdar://problem/16381500>

llvm-svn: 204435
2014-03-21 06:04:45 +00:00
Juergen Ributzka
a65ce371c7 [Constant Hoisting] Lazily compute the idom and cache the result.
Related to <rdar://problem/16381500>

llvm-svn: 204434
2014-03-21 06:04:39 +00:00
Juergen Ributzka
b52c2c678c [Constant Hoisting] Change the algorithm to only track constants for instructions.
Originally the algorithm would search for expensive constants and track their
users, which could be instructions and constant expressions. This change only
tracks the constants for instructions, but constant expressions are indirectly
covered too. If an operand is a constant expression, then we look through the
expression to find any expensive constants.

The algorithm now keeps track of the instruction and the operand index where the
constant is used. This allows more precise hoisting of constant materialization
code for PHI instructions, because we only hoist to the basic block of the
incoming operand. Before we had to find the idom of all PHI operands and hoist
the materialization code there.

This also makes updating of instructions easier. Before we had to keep track of
the original constant, find it in the instructions, and then replace it. Now we
can just simply update the operand.

Related to <rdar://problem/16381500>

llvm-svn: 204433
2014-03-21 06:04:36 +00:00
Juergen Ributzka
2e77fbe182 [Constant Hoisting] Fix capitalization of function names.
llvm-svn: 204432
2014-03-21 06:04:33 +00:00
Juergen Ributzka
60d9807b0e [Constant Hoisting] Replace the MapVector with a separate Map and Vector to keep track of constant candidates.
This simplifies working with the constant candidates and removes the tight
coupling between the map and the vector.

Related to <rdar://problem/16381500>

llvm-svn: 204431
2014-03-21 06:04:30 +00:00
Juergen Ributzka
c55e0f3fc7 Revert "[Constant Hoisting] Extend coverage of the constant hoisting pass."
I will break this up into smaller pieces for review and recommit.

llvm-svn: 204393
2014-03-20 20:17:13 +00:00
Juergen Ributzka
7dae5f7baa [Constant Hoisting] Extend coverage of the constant hoisting pass.
This commit extends the coverage of the constant hoisting pass, adds additional
debug output and updates the function names according to the style guide.

Related to <rdar://problem/16381500>

llvm-svn: 204389
2014-03-20 19:55:52 +00:00
Mark Seaborn
ade468f2c3 Remove LowerInvoke's obsolete "-enable-correct-eh-support" option
This option caused LowerInvoke to generate code using SJLJ-based
exception handling, but there is no code left that interprets the
jmp_buf stack that the resulting code maintained (llvm.sjljeh.jblist).
This option has been obsolete for a while, and replaced by
SjLjEHPrepare.

This leaves the default behaviour of LowerInvoke, which is to convert
invokes to calls.

Differential Revision: http://llvm-reviews.chandlerc.com/D3136

llvm-svn: 204388
2014-03-20 19:54:47 +00:00
Alexander Potapenko
0eb130e34f [ASan] Do not instrument globals from the llvm.metadata section.
Fixes https://code.google.com/p/address-sanitizer/issues/detail?id=279.

llvm-svn: 204331
2014-03-20 10:48:34 +00:00
Evgeniy Stepanov
17d50b69f6 Set debug info for instructions inserted in SplitBlockAndInsertIfThen.
llvm-svn: 204230
2014-03-19 12:56:38 +00:00
Duncan P. N. Exon Smith
6fc0b70a7b Fix use_iterator crash in ObjCArc from r203364
The use_iterator redesign in r203364 introduced an increment past the
end of a range in -objc-arc-contract.  Added an explicit check for the
end of the range.

<rdar://problem/16333235>

llvm-svn: 204195
2014-03-18 22:32:43 +00:00
Chandler Carruth
536fb8893d [LV] While I'm here, use range based for loops which are so much cleaner
for this kind of walk.

llvm-svn: 204188
2014-03-18 22:00:32 +00:00
Chandler Carruth
9fa85c489d [LV] The actual change I intended to commit in r204148. Sorry for the
noise.

Original commit log:
Replace some dead code with an assert. When I first ported this pass
from a loop pass to a function pass I did so in the naive, recursive
way. It doesn't actually work, we need a worklist instead. When
I switched to the worklist I didn't delete the naive recursion. That
recursion was also buggy because it was dead and never really exercised.

llvm-svn: 204187
2014-03-18 21:58:38 +00:00
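The worklist approach mentioned in 9fa85c489d above replaces recursion over the loop nest with an explicit list of pending loops. The following is a generic sketch of that pattern; the `Loop` class and `collect_loops` are hypothetical and do not reproduce the vectorizer's actual traversal.

```python
class Loop:
    """Minimal stand-in for a loop with nested subloops."""

    def __init__(self, name, subloops=()):
        self.name = name
        self.subloops = list(subloops)


def collect_loops(top_level_loops):
    """Visit every loop in the nest using a worklist instead of recursion."""
    worklist = list(top_level_loops)
    visited = []
    while worklist:
        loop = worklist.pop()
        visited.append(loop.name)
        # Nested loops are queued rather than visited by a recursive call.
        worklist.extend(loop.subloops)
    return visited


inner = Loop("inner")
middle = Loop("middle", [inner])
outer = Loop("outer", [middle])
other = Loop("other")
names = collect_loops([outer, other])
```

Every loop is processed exactly once, and there is no second recursive path left to go stale, which is the kind of dead code the commit replaces with an assert.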
Chandler Carruth
4ac3e74751 [LV] Replace some dead code with an assert. When I first ported this
pass from a loop pass to a function pass I did so in the naive,
recursive way. It doesn't actually work, we need a worklist instead.
When I switched to the worklist I didn't delete the naive recursion.
That recursion was also buggy because it was dead and never really
exercised.

llvm-svn: 204184
2014-03-18 21:51:46 +00:00
Evgeniy Stepanov
4e42dcfe00 [msan] Origin tracking with history.
LLVM part of MSan implementation of advanced origin tracking,
when we record not only creation point, but all locations where
an uninitialized value was stored to memory, too.

llvm-svn: 204151
2014-03-18 13:30:56 +00:00
Diego Novillo
119221ccbc Tolerate unmangled names in sample profiles.
Summary:
The compiler does not always generate linkage names. If a function
has been inlined and its body elided, its linkage name may not be
generated.

When the binary executes, the profiler will use its unmangled name
when attributing samples. This results in unmangled names in the
input profile.

We are currently failing hard when this happens. However, in this case
all that happens is that we fail to attribute samples to the inlined
function. While this means fewer optimization opportunities, it should
not cause a compilation failure.

This patch accepts all valid function names, regardless of whether
they were mangled or not.

Reviewers: chandlerc

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D3087

llvm-svn: 204142
2014-03-18 12:03:12 +00:00
Evgeniy Stepanov
cfd1cf2b01 [msan] Kill -msan-store-clean-origin flag.
Not only is it slower than the alternative, but also subtly broken.
This commit does not change the default behavior.

llvm-svn: 204131
2014-03-18 09:47:06 +00:00
Alon Mishne
70ba46ff38 [C++11] Change DebugInfoFinder to use range-based loops
Also changes the iterators to return actual DI type over MDNode.

llvm-svn: 204130
2014-03-18 09:41:07 +00:00
Evgeniy Stepanov
7ad8a1f5a2 [msan] Remove unused code.
llvm-svn: 204125
2014-03-18 08:29:42 +00:00
Dan Gohman
b0339af0e1 Use range metadata instead of introducing selects.
When GlobalOpt has determined that a GlobalVariable only ever has two values,
it would convert the GlobalVariable to a boolean, and introduce SelectInsts
at every load, to choose between the two possible values. These SelectInsts
introduce overhead and other unpleasantness.

This patch makes GlobalOpt just add range metadata to loads from such
GlobalVariables instead. This enables the same main optimization (as seen in
test/Transforms/GlobalOpt/integer-bool.ll), without introducing selects.

The main downside is that it doesn't get the memory savings of shrinking such
GlobalVariables, but this is expected to be negligible.

llvm-svn: 204076
2014-03-17 19:57:04 +00:00
Eli Bendersky
631277f3dd Consistent use of the noduplicate attribute.
The "noduplicate" attribute of call instructions is sometimes queried directly
and sometimes through the cannotDuplicate() predicate. This patch streamlines
all queries to use the cannotDuplicate() predicate. It also adds this predicate
to InvokeInst, to mirror what CallInst has.

llvm-svn: 204049
2014-03-17 16:19:07 +00:00
David Blaikie
60ddd2b93c Remove named Twine.
While technically correct, we generally disallow any instance of named
Twines due to their subtlety.

llvm-svn: 204016
2014-03-16 01:36:18 +00:00
Arnaud A. de Grandmaison
4544f80f7c Remove some dead assignments found by scan-build
llvm-svn: 204013
2014-03-15 22:13:15 +00:00
Benjamin Kramer
8e0892b4f3 LSR: Compress a pair (and get rid of the DenseMapInfo for it).
Also convert a horrible hash function to use our hashing infrastructure.
No functionality change.

llvm-svn: 204008
2014-03-15 17:17:48 +00:00
NAKAMURA Takumi
2d8e94aadc SampleProfile.cpp: Fix take #2. The issue was abuse of StringRef here.
llvm-svn: 203996
2014-03-15 01:56:17 +00:00
NAKAMURA Takumi
8cb5938310 SampleProfile.cpp: Quick fix to r203976 about abuse of Twine. The life of Twine was too short.
FIXME: DiagnosticInfoSampleProfile should not hold Twine&.
llvm-svn: 203990
2014-03-15 00:10:12 +00:00
Diego Novillo
6368420655 Re-format SampleProfile.cpp with clang-format. No functional changes.
llvm-svn: 203977
2014-03-14 22:07:18 +00:00
Diego Novillo
a9a26c6236 Use DiagnosticInfo facility.
Summary:
The sample profiler pass emits several error messages. Instead of
just aborting the compiler with report_fatal_error, we can emit
better messages using DiagnosticInfo.

This adds a new sub-class of DiagnosticInfo to handle the sample
profiler.

Reviewers: chandlerc, qcolombet

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D3086

llvm-svn: 203976
2014-03-14 21:58:59 +00:00
Alexander Potapenko
a22163b83f [ASan] Fix https://code.google.com/p/address-sanitizer/issues/detail?id=274
by ignoring globals from __TEXT,__cstring,cstring_literals during instrumentation.
Add a regression test.

llvm-svn: 203916
2014-03-14 10:41:49 +00:00
Stepan Dyatkovskiy
2789f696ba MergeFunctions, cmpType: fixed variable names from XXTy1 and XXTy2 to XXTyL and XXTyR.
llvm-svn: 203907
2014-03-14 08:48:52 +00:00
Stepan Dyatkovskiy
0504c3145a MergeFunctions, cmpType: Fixed comments wrapping.
llvm-svn: 203905
2014-03-14 08:17:19 +00:00
Owen Anderson
3a006737fe Fix a bug in InstCombine where we would incorrectly attempt to construct a
bitcast between pointers of two different address spaces if they happened to have
the same pointer size.

llvm-svn: 203862
2014-03-13 22:51:43 +00:00
Evgeniy Stepanov
04442bc559 [msan] Fix handling of byval arguments in VarArg calls.
llvm-svn: 203794
2014-03-13 13:17:11 +00:00
Stepan Dyatkovskiy
47660351ae First patch of a patch series that improves MergeFunctions running time from O(N*N) to
O(N*log(N)). The idea is to introduce a total ordering on the set of functions.
That allows building a binary tree and performing function look-up in O(log(N)) time.

This patch description:
Introduced a total ordering among Type instances. It is an improvement on the existing
isEquivalentType.
0. Coerce pointer of 0 address space to integer.
1. If left and right types are equal (the same Type* value), return 0 (means equal).
2. If types are of different kinds (different type IDs), return the result of comparing the
type IDs, treating them as numbers.
3. If types are vectors or integers, return the result of comparing their
pointers (cast to numbers).
4. Check whether the type ID belongs to the following group:
* Void 
* Float 
* Double 
* X86_FP80 
* FP128 
* PPC_FP128 
* Label 
* Metadata 
If so, return 0.
5. If left and right are pointers, return result of address space
comparison (numbers comparison).
6. If types are complex, then both LEFT and RIGHT are expanded and their element types are
checked in the same way. If we get Res != 0 at some stage, return it. Otherwise return 0.
7. For all other cases put llvm_unreachable.
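
The ordering steps above can be sketched as a three-way comparison. This is a
minimal illustration, not the MergeFunctions code: Kind, Ty and cmpTy are
hypothetical stand-ins for llvm::Type and the real comparison, step 0 (pointer
coercion) is not modelled, and comparing element counts before the elements
themselves is an assumption of the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for llvm::Type. Every name in this sketch is hypothetical.
enum class Kind { Void, Float, Double, Integer, Pointer, Struct };

struct Ty {
  Kind kind;
  unsigned addrSpace;            // used by Pointer only
  std::vector<const Ty *> elems; // used by Struct only
  explicit Ty(Kind k, unsigned as = 0, std::vector<const Ty *> e = {})
      : kind(k), addrSpace(as), elems(std::move(e)) {}
};

// Three-way comparison in the style of strcmp: <0, 0 or >0.
int cmpTy(const Ty *L, const Ty *R) {
  if (L == R)
    return 0; // step 1: same object
  if (L->kind != R->kind)
    return L->kind < R->kind ? -1 : 1; // step 2: order by type ID
  switch (L->kind) {
  case Kind::Integer: { // step 3: distinct objects, order by address
    auto l = reinterpret_cast<std::uintptr_t>(L);
    auto r = reinterpret_cast<std::uintptr_t>(R);
    return l < r ? -1 : 1;
  }
  case Kind::Void:
  case Kind::Float:
  case Kind::Double:
    return 0; // step 4: the kind alone identifies the type
  case Kind::Pointer: // step 5: order by address space
    if (L->addrSpace != R->addrSpace)
      return L->addrSpace < R->addrSpace ? -1 : 1;
    return 0;
  case Kind::Struct: // step 6: element count (assumed), then each element
    if (L->elems.size() != R->elems.size())
      return L->elems.size() < R->elems.size() ? -1 : 1;
    for (std::size_t i = 0; i < L->elems.size(); ++i)
      if (int res = cmpTy(L->elems[i], R->elems[i]))
        return res;
    return 0;
  }
  return 0; // step 7: not reached for the kinds modelled here
}
```

A comparison of this shape is what lets functions be kept in an ordered tree,
since cmpTy(a, b) and cmpTy(b, a) always have opposite signs.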

llvm-svn: 203788
2014-03-13 11:54:50 +00:00
Mark Seaborn
91085966b7 Fix typo in comment: "inwoke" -> "invoke"
llvm-svn: 203739
2014-03-13 00:04:17 +00:00
Raul E. Silvera
1c39640e2d Resubmit "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..."
This reverts commit 86cb795388643710dab34941ddcb5a9470ac39d8.
The problems previously found have been resolved through other CLs.

llvm-svn: 203707
2014-03-12 20:21:50 +00:00
Hans Wennborg
a2aafbb7b5 Allow switch-to-lookup table for tables with holes by adding bitmask check
This allows us to generate table lookups for code such as:

  unsigned test(unsigned x) {
    switch (x) {
      case 100: return 0;
      case 101: return 1;
      case 103: return 2;
      case 105: return 3;
      case 107: return 4;
      case 109: return 5;
      case 110: return 6;
      default: return f(x);
    }
  }

Since cases 102, 104, etc. do not map to constants, the lookup table has holes
in those positions. We therefore guard the table lookup with a bitmask check.
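
As a rough illustration, the guarded lookup could be written by hand as below.
This is only a sketch of the idea, not the code SimplifyCFG emits: fallback()
is a hypothetical stand-in for f(x), and the table layout and mask constant
are worked out for this example.

```cpp
#include <cstdint>

// Hypothetical stand-in for the default case f(x) in the example above.
unsigned fallback(unsigned x) { return x + 1000; }

// Hand-written equivalent of a lookup table with holes.
unsigned test_lowered(unsigned x) {
  // Entries for cases 100..110; the holes (102, 104, 106, 108) hold a dummy 0.
  static const unsigned table[11] = {0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 6};
  // Bit i is set when case (100 + i) exists: bits 0, 1, 3, 5, 7, 9 and 10.
  const std::uint32_t mask = 0x6AB;
  unsigned idx = x - 100; // wraps to a huge value when x < 100
  if (idx < 11 && ((mask >> idx) & 1u))
    return table[idx]; // in range and not a hole
  return fallback(x);  // out of range, or a hole in the table
}
```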

Patch by Jasper Neumann!

llvm-svn: 203694
2014-03-12 18:35:40 +00:00
Evan Cheng
f2d3d2bf92 Revert r203488 and r203520.
llvm-svn: 203687
2014-03-12 18:09:37 +00:00
Eli Bendersky
3af4500090 Revive SizeOptLevel-explaining comments that were dropped in r203669
llvm-svn: 203675
2014-03-12 16:44:17 +00:00
Eli Bendersky
fa2b4f20f2 Move duplicated code into a helper function (exposed through overload).
There's a bit of duplicated "magic" code in opt.cpp and Clang's CodeGen that
computes the inliner threshold from opt level and size opt level.

This patch moves the code to a function that lives alongside the inliner itself,
providing a convenient overload to the inliner creation.

A separate patch can be committed to Clang to use this once it's committed to
LLVM. Standalone tools that use the inlining pass can also avoid duplicating
this code and fearing it will go out of sync.

Note: this patch also restructures the conditional logic of the computation to
be cleaner.

llvm-svn: 203669
2014-03-12 16:12:36 +00:00
Alon Mishne
00d720ff32 Cloning a function now also clones its debug metadata if 'ModuleLevelChanges' is true.
llvm-svn: 203662
2014-03-12 14:42:51 +00:00
Erik Verbruggen
11cc704d2c Fix crash in PRE.
After r203553 overflow intrinsics and their non-intrinsic (normal)
instructions get hashed to the same value. This patch prevents PRE from
moving an instruction into a predecessor block, and trying to add a phi
node that gets two different types (the intrinsic result and the
non-intrinsic result), resulting in a failing assert.

llvm-svn: 203574
2014-03-11 15:07:32 +00:00
Tim Northover
68c567a38a IR: add a second ordering operand to cmpxchg for failure
The syntax for "cmpxchg" should now look something like:

	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
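
For comparison, C++11 exposes the same pair of orderings on
std::atomic::compare_exchange_strong, where LLVM's "monotonic" corresponds to
std::memory_order_relaxed. The helper name below is hypothetical; the sketch
mirrors the IR example above rather than anything in this patch.

```cpp
#include <atomic>

// Returns true if the value at addr was 42 and has been replaced by 3.
bool swap42for3(std::atomic<int> &addr) {
  int expected = 42;
  return addr.compare_exchange_strong(
      expected, 3,
      std::memory_order_acquire,  // ordering when the exchange happens
      std::memory_order_relaxed); // ordering when no exchange takes place
}
```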

rdar://problem/15996804

llvm-svn: 203559
2014-03-11 10:48:52 +00:00
Erik Verbruggen
c2bf18261b GVN: fix hashing of extractvalue.
My last commit did not add the indexes to the hashed value for
extractvalue. Adding that back in.

llvm-svn: 203558
2014-03-11 10:21:30 +00:00
Erik Verbruggen
638ff95018 GVN: merge overflow intrinsics with non-overflow instructions.
When an overflow intrinsic is followed by a non-overflow instruction,
replace the latter with an extract. For example:

  %sadd = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
  %sadd3 = add i32 %a, %b

Here the add statement will be replaced by an extract.

When an overflow intrinsic follows a non-overflow instruction, a clone
of the intrinsic is inserted before the normal instruction, which makes
it the same as the previous case. Subsequent runs of GVN can then clean
up the duplicate instructions and insert the extract.

This fixes PR8817.

llvm-svn: 203553
2014-03-11 09:36:48 +00:00
Duncan P. N. Exon Smith
f9624311ce Cleanup whitespace
llvm-svn: 203529
2014-03-11 02:44:45 +00:00
Evan Cheng
9a155c5f78 Follow up to r203488. Code clean up to eliminate a lot of copy+paste.
llvm-svn: 203520
2014-03-11 00:24:20 +00:00
Diego Novillo
dd37be24ca Use discriminator information in sample profiles.
Summary:
When the sample profiles include discriminator information,
use the discriminator values to distinguish instruction weights
in different basic blocks.

This modifies the BodySamples mapping to map <line, discriminator> pairs
to weights. Instructions on the same line but in different blocks will
use different discriminator values. This, in turn, means that the blocks
may have different weights.
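
A minimal sketch of such a mapping, assuming a std::map keyed on the pair (the
container the pass really uses is not stated here, and BodySampleMap and
weightAt are hypothetical names):

```cpp
#include <map>
#include <utility>

// (line offset, discriminator) -> sample count. Names are hypothetical.
using BodySampleMap = std::map<std::pair<int, unsigned>, unsigned>;

// Returns the recorded weight, or 0 when the profile has no entry.
unsigned weightAt(const BodySampleMap &samples, int line, unsigned disc) {
  auto it = samples.find({line, disc});
  return it == samples.end() ? 0 : it->second;
}
```

Two instructions on line 3 with discriminators 0 and 1 then look up different
entries and so can carry different weights.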

Other changes in this patch:

- Add tests for positive values of line offset, discriminator and samples.
- Change data types from uint32_t to unsigned and int and do additional
  validation.

Reviewers: chandlerc

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2857

llvm-svn: 203508
2014-03-10 22:41:28 +00:00
Benjamin Kramer
108d24886e MemCpyOpt: When merging memsets also merge the trivial case of two memsets with the same destination.
The testcase is from PR19092, but I think the bug described there is actually a clang issue.

llvm-svn: 203489
2014-03-10 21:05:13 +00:00
Evan Cheng
b0fdca31bc For functions with ARM target specific calling convention, when simplify-libcall
optimizes a call to an llvm intrinsic to something that involves a call to a C
library call, make sure it sets the right calling convention on the call.

e.g.
extern double pow(double, double);
double t(double x) {
  return pow(10, x);
}

Compiles to something like this for AAPCS-VFP:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
  %0 = call double @llvm.pow.f64(double 1.000000e+01, double %x)
  ret double %0
}

declare double @llvm.pow.f64(double, double) #1

Simplify libcall (part of instcombine) will turn the above into:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
  %__exp10 = call double @__exp10(double %x) #1
  ret double %__exp10
}

declare double @__exp10(double)

The pre-instcombine code works because calls to LLVM builtins are special.
Instruction selection will choose the right calling convention for the call.
However, the code after instcombine is wrong. The call to __exp10 will use
the C calling convention.

I can think of 3 options to fix this.

1. Make "C" calling convention just work since the target should know what CC
   is being used.

   This doesn't work because each function can use different CC with the "pcs"
   attribute.

2. Have Clang add the right CC keyword on the calls to LLVM builtin.

   This will work but it doesn't match the LLVM IR specification which states
   these are "Standard C Library Intrinsics".

3. Fix simplify libcall so the resulting calls to the C routines will have the
   proper CC keyword. e.g.
   %__exp10 = call arm_aapcs_vfpcc double @__exp10(double %x) #1

   This works and is the solution I implemented here.

Both solutions #2 and #3 would work. After carefully considering the pros and
cons, I decided to implement #3 for the following reasons.

1. It doesn't change the "spec" of the intrinsics.
2. It's a self-contained fix.

There are a couple of potential downsides.
1. There could be other places in the optimizer that are broken in the same way
   and are not addressed by this.
2. There could be other calling conventions that need to be propagated by
   simplify-libcall and are not handled.

But for now, this is the fix that I'm most comfortable with.

llvm-svn: 203488
2014-03-10 20:49:45 +00:00
Benjamin Kramer
488ab03435 SimplifyCFG: Simplify the weight scaling algorithm.
No change in functionality.

llvm-svn: 203413
2014-03-09 14:42:55 +00:00
Ahmed Charles
e4b10534bd Fix build break.
llvm-svn: 203366
2014-03-09 03:50:36 +00:00
Chandler Carruth
fad39ebe19 [C++11] Add range based accessors for the Use-Def chain of a Value.
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
   detail
2) Change it to actually be a *Use* iterator rather than a *User*
   iterator.
3) Add an adaptor which is a User iterator that always looks through the
   Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
   they wanted a use_iterator (and to explicitly dig out the User when
   needed), or a user_iterator which makes the Use itself totally
   opaque.
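
The difference between the two ranges can be shown with a toy model. None of
the types below are the LLVM classes: Use, User and Value are simplified
stand-ins, and users() materializes a vector where the real change uses an
iterator adaptor.

```cpp
#include <vector>

struct User; // toy stand-ins, not the LLVM classes

// One edge of the use list: which user refers to the value, and in which
// operand slot.
struct Use {
  User *user;
  unsigned operandNo;
};

struct User {
  int id;
};

struct Value {
  std::vector<Use> useList;

  // Range over the Use edges themselves.
  const std::vector<Use> &uses() const { return useList; }

  // Range over the Users, looking through each Use to its User.
  std::vector<User *> users() const {
    std::vector<User *> out;
    for (const Use &u : useList)
      out.push_back(u.user);
    return out;
  }
};
```

A caller that needs the operand slot writes for (const Use &U : v.uses()),
while one that only cares who uses the value writes for (User *U : v.users()).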

Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lines of code.

The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.

However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]

llvm-svn: 203364
2014-03-09 03:16:01 +00:00
Benjamin Kramer
aaa10dc26a [C++11] Revert uses of lambdas with array_pod_sort.
Looks like GCC implements the lambda->function pointer conversion differently.

llvm-svn: 203294
2014-03-07 21:52:38 +00:00
Benjamin Kramer
f042a6ba0a [C++11] Convert sort predicates into lambdas.
No functionality change.

llvm-svn: 203288
2014-03-07 21:35:39 +00:00
Tim Northover
b74aa030d9 InstCombine: form shuffles from wider range of insert/extractelements
Sequences of insertelement/extractelements are sometimes used to build
vectors; this code tries to put them back together into shuffles, but
could only produce completely uniform shuffle types (<N x T> from two
<N x T> sources).

This should allow shuffles with different numbers of elements on the
input and output sides as well.

llvm-svn: 203229
2014-03-07 10:24:44 +00:00
Ahmed Charles
52ce0c101e Replace OwningPtr<T> with std::unique_ptr<T>.
This compiles with no changes to clang/lld/lldb with MSVC and includes
overloads to various functions which are used by those projects and llvm
which have OwningPtr's as parameters. This should allow out of tree
projects some time to move. There are also no changes to libs/Target,
which should help out of tree targets have time to move, if necessary.

llvm-svn: 203083
2014-03-06 05:51:42 +00:00
Chandler Carruth
a48d15a676 [Layering] Move InstVisitor.h into the IR library as it is pretty
obviously coupled to the IR.

llvm-svn: 203064
2014-03-06 03:23:41 +00:00
Chandler Carruth
0873afae39 [Layering] Move DebugInfo.h into the IR library where its implementation
already lives.

llvm-svn: 203046
2014-03-06 00:46:21 +00:00
Chandler Carruth
2b135c4e9f [Layering] Move DIBuilder.h into the IR library where its implementation
already lives.

llvm-svn: 203038
2014-03-06 00:22:06 +00:00
Arnold Schwaighofer
adebac793b LoopVectorizer: Preserve fast-math flags
Fixes PR19045.

llvm-svn: 203008
2014-03-05 21:10:47 +00:00
Chandler Carruth
797ae6fd0d [Layering] Move DebugLoc.h into the IR library. The implementation
already lived there and it is where it belongs -- this is the in-memory
debug location representation.

This is just cleanup -- Modules can actually cope with this, but that
doesn't make it right. After chatting with folks that have out-of-tree
stuff, going ahead and moving the rest of the headers seems preferable.

llvm-svn: 202960
2014-03-05 10:30:38 +00:00
Chandler Carruth
0e2a8390e0 [C++11] Make this interface accept const Use pointers and use override
to ensure we don't mess up any of the overrides. Necessary for cleaning
up the Value use iterators and enabling range-based traversing of use
lists.

llvm-svn: 202958
2014-03-05 10:21:48 +00:00
Ahmed Charles
4a96a15754 [C++11] Replace OwningPtr::take() with OwningPtr::release().
llvm-svn: 202957
2014-03-05 10:19:29 +00:00
Craig Topper
a3683ec835 [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 202953
2014-03-05 09:10:37 +00:00
Chandler Carruth
436597fe00 [Modules] Move the ConstantRange class into the IR library. This is
a bit surprising, as the class is almost entirely abstracted away from
any particular IR, however it encodes the comparison predicates which
mutate ranges as ICmp predicate codes. This is reasonable as they're
used for both instructions and constants. Thus, it belongs in the IR
library with instructions and constants.

llvm-svn: 202838
2014-03-04 12:24:34 +00:00
Chandler Carruth
4b66708834 [Modules] Move the PredIteratorCache into the IR library -- it is
hardcoded to use IR BasicBlocks.

llvm-svn: 202835
2014-03-04 12:09:19 +00:00
Chandler Carruth
248195469c [Modules] Move the NoFolder into the IR library as it creates
instructions.

llvm-svn: 202834
2014-03-04 12:05:47 +00:00
Chandler Carruth
b4f244209e [Modules] Move the TargetFolder into the Analysis library. Historically,
this would have been required because of the use of DataLayout, but that
has moved into the IR proper. It is still required because this folder
uses the constant folding in the analysis library (which uses the
datalayout) as the more aggressive basis of its folder.

llvm-svn: 202832
2014-03-04 11:59:06 +00:00
Chandler Carruth
075812f27c [Modules] Move CFG.h to the IR library as it defines graph traits over
IR types.

llvm-svn: 202827
2014-03-04 11:45:46 +00:00
Chandler Carruth
63713e9f95 [Modules] Move ValueMap to the IR library. While this class does not
directly care about the Value class (it is templated so that the key can
be any arbitrary Value subclass), it is in fact concretely tied to the
Value class through the ValueHandle's CallbackVH interface which relies
on the key type being some Value subclass to establish the value handle
chain.

Ironically, the unittest is already in the right library.

llvm-svn: 202824
2014-03-04 11:26:31 +00:00
Chandler Carruth
649f6270aa [Modules] Move ValueHandle into the IR library where Value itself lives.
Move the test for this class into the IR unittests as well.

This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.

llvm-svn: 202821
2014-03-04 11:17:44 +00:00
Chandler Carruth
d0657fe39f [Modules] Move the LLVM IR pattern match header into the IR library, it
obviously is coupled to the IR.

llvm-svn: 202818
2014-03-04 11:08:18 +00:00
Chandler Carruth
cfb81122cc [Modules] Move CallSite into the IR library where it belongs. It is
abstracting between a CallInst and an InvokeInst, both of which are IR
concepts.

llvm-svn: 202816
2014-03-04 11:01:28 +00:00
Chandler Carruth
0bf5689f06 [Modules] Move GetElementPtrTypeIterator into the IR library. As its
name might indicate, it is an iterator over the types in an instruction
in the IR.... You see where this is going.

Another step of modularizing the support library.

llvm-svn: 202815
2014-03-04 10:40:04 +00:00
Chandler Carruth
d7b36fdea7 [Modules] Move InstIterator out of the Support library, where it had no
business.

This header includes Function and BasicBlock and directly uses the
interfaces of both classes. It has to do with the IR, it even has that
in the name. =] Put it in the library it belongs to.

This is one step toward making LLVM's Support library survive a C++
modules bootstrap.

llvm-svn: 202814
2014-03-04 10:30:26 +00:00
Chandler Carruth
cd48c56575 [cleanup] Re-sort all the includes with utils/sort_includes.py.
llvm-svn: 202811
2014-03-04 10:07:28 +00:00
Diego Novillo
2ccb22f509 Pass to emit DWARF path discriminators.
DWARF discriminators are used to distinguish multiple control flow paths
on the same source location. When this happens, instructions across
basic block boundaries will share the same debug location.

This pass detects this situation and creates a new lexical scope to one
of the two instructions. This lexical scope is a child scope of the
original and contains a new discriminator value. This discriminator is
then picked up from MCObjectStreamer::EmitDwarfLocDirective to be
written on the object file.

This fixes http://llvm.org/bugs/show_bug.cgi?id=18270.

llvm-svn: 202752
2014-03-03 20:06:11 +00:00
Benjamin Kramer
6b03dd4034 [C++11] Use std::tie to simplify compare operators.
No functionality change.
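
The idiom looks roughly like this. Key is a hypothetical type chosen for
illustration, not one of the classes touched by the commit.

```cpp
#include <tuple>

// Hypothetical key type used only for illustration.
struct Key {
  int line;
  unsigned discriminator;

  // Lexicographic order: by line first, then by discriminator.
  bool operator<(const Key &o) const {
    return std::tie(line, discriminator) < std::tie(o.line, o.discriminator);
  }
};
```

std::tie builds tuples of references, so nothing is copied and the tuple's
operator< supplies the field-by-field comparison.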

llvm-svn: 202751
2014-03-03 19:58:30 +00:00
Benjamin Kramer
ed651fd956 [C++11] Remove a leftover std::function instance.
It's not needed anymore.

llvm-svn: 202748
2014-03-03 19:49:02 +00:00
Chandler Carruth
dfdccab32b [C++11] Remove the completely unnecessary requirement on SetVector's
remove_if that its predicate is adaptable. We don't actually need this,
we can write a generic adapter for any predicate.

This lets us remove some very wrong std::function usages. We should
never be using std::function for predicates to algorithms. This incurs
an *indirect* call overhead for every evaluation of the predicate, and
makes it very hard to inline through.
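
A sketch of the alternative, assuming a plain std::vector rather than
SetVector: the predicate type is a template parameter, so a lambda is called
directly and the compiler is free to inline it. removeIf is a hypothetical
helper, not SetVector's actual interface.

```cpp
#include <algorithm>
#include <vector>

// Removes every element for which pred returns true. Returns true if
// anything was removed. Pred is deduced, so no std::function is involved.
template <typename T, typename Pred>
bool removeIf(std::vector<T> &v, Pred pred) {
  auto it = std::remove_if(v.begin(), v.end(), pred);
  if (it == v.end())
    return false;
  v.erase(it, v.end());
  return true;
}
```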

llvm-svn: 202742
2014-03-03 19:28:52 +00:00
Evgeniy Stepanov
218cb7e60b [msan] Handle X86 SIMD bitshift intrinsics.
llvm-svn: 202712
2014-03-03 13:47:42 +00:00
Tobias Grosser
8d3b137235 [C++11] Add a basic block range view for RegionInfo
This also switches the users in LLVM to ensure this functionality is tested.

llvm-svn: 202705
2014-03-03 13:00:39 +00:00
Chandler Carruth
a7b2a4e865 [C++11] Add two range adaptor views to User: operands and
operand_values. The first provides a range view over operand Use
objects, and the second provides a range view over the Value*s being
used by those operands.

The naming is "STL-style" rather than "LLVM-style" because we have
historically named iterator methods STL-style, and range methods seem to
have far more in common with their iterator counterparts than with
"normal" APIs. Feel free to bikeshed on this one if you want, I'm happy
to change these around if people feel strongly.

I've switched code in SROA and LCG to exercise these mostly to ensure
they work correctly -- we don't really have an easy way to unittest this
and they're trivial.

llvm-svn: 202687
2014-03-03 10:42:58 +00:00
Benjamin Kramer
3ac154a395 [C++11] Replace llvm::tie with std::tie.
The old implementation is no longer needed in C++11.

llvm-svn: 202644
2014-03-02 13:30:33 +00:00
Benjamin Kramer
e4eb1b495f [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Craig Topper
b0056a4ca7 Switch all uses of LLVM_OVERRIDE to just use 'override' directly.
llvm-svn: 202621
2014-03-02 09:09:27 +00:00
Chandler Carruth
db906c8499 [C++11] Switch all uses of the llvm_move macro to use std::move
directly, and remove the macro.

llvm-svn: 202612
2014-03-02 04:08:41 +00:00
Benjamin Kramer
803ba41365 Now that we have C++11, turn simple functors into lambdas and remove a ton of boilerplate.
No intended functionality change.

llvm-svn: 202588
2014-03-01 11:47:00 +00:00
Reid Kleckner
4698f3f991 Reflow isProfitableToMakeFastCC
llvm-svn: 202555
2014-02-28 22:50:08 +00:00
Kostya Serebryany
1d5ff228f0 [asan] fix a pair of silly typos
llvm-svn: 202391
2014-02-27 13:13:59 +00:00
Kostya Serebryany
71b9995519 [asan] disable asan-detect-invalid-pointer-pair (was enabled by mistake)
llvm-svn: 202390
2014-02-27 12:56:20 +00:00
Kostya Serebryany
089f21bdde [asan] *experimental* implementation of invalid-pointer-pair detector (finds when two unrelated pointers are compared or subtracted). This implementation has both false positives and false negatives and is not tuned for performance. A bug report for a proper implementation will follow.
llvm-svn: 202389
2014-02-27 12:45:36 +00:00
Reid Kleckner
995e719494 GlobalOpt: Apply fastcc to internal x86_thiscallcc functions
We should apply fastcc whenever profitable.  We can expand this list,
but there are lots of conventions with performance implications that we
don't want to change.

Differential Revision: http://llvm-reviews.chandlerc.com/D2705

llvm-svn: 202293
2014-02-26 19:57:30 +00:00
Andrew Trick
850c9f4adf Fix PR18165: LSR must avoid scaling factors that exceed the limit on truncated use.
Patch by Michael Zolotukhin!

llvm-svn: 202273
2014-02-26 16:31:56 +00:00
Chandler Carruth
73312313f4 [SROA] Use the correct index integer size in GEPs through non-default
address spaces.

This isn't really a correctness issue (the values are truncated) but it's
much cleaner.

Patch by Matt Arsenault!

llvm-svn: 202252
2014-02-26 10:08:16 +00:00
Chandler Carruth
d24c86e0dd [SROA] Teach SROA how to handle pointers from address spaces other than
the default.

Based on the patch by Matt Arsenault, D1764!

I switched one place to use the more direct pointer type to compute the
desired address space, and I reworked the memcpy rewriting section to
reflect significant refactorings that this patch helped inspire.

Thanks to several of the folks who helped review and improve the patch
as well.

llvm-svn: 202247
2014-02-26 08:25:02 +00:00
Chandler Carruth
33730334ba [SROA] Split the alignment computation completely for the memcpy rewriting
to work independently for the slice side and the other side.

This allows us to only compute the minimum of the two when we actually
rewrite to a memcpy that needs to take the minimum, and preserve higher
alignment for one side or the other when rewriting to loads and stores.

This fix was inspired by seeing the result of some refactoring that
makes addrspace handling better.

llvm-svn: 202242
2014-02-26 07:29:54 +00:00
Chandler Carruth
724a260ac5 [SROA] The original refactoring inspired by the addrspace patch in
D1764, which in turn set off the other refactorings to make
'getSliceAlign()' a sensible thing.

There are two possible inputs to the required alignment of a memory
transfer intrinsic: the alignment constraints of the source and the
destination. If we are *only* introducing a (potentially new) offset
onto one side of the transfer, we don't need to consider the alignment
constraints of the other side. Use this to simplify the logic feeding
into alignment computation for unsplit transfers.

Also, hoist the clamp of the magical zero alignment for these intrinsics
to the more customary one alignment early. This lets several other
conditions melt away.

No functionality changed. There is a further improvement this exposes
which *will* change functionality, but that's arriving in a separate
patch.

llvm-svn: 202232
2014-02-26 05:33:36 +00:00
Chandler Carruth
b93e3941c1 [SROA] Yet another slight refactoring that simplifies an API in the
rewriting logic: don't pass custom offsets for the adjusted pointer to
the new alloca.

We always passed NewBeginOffset here. Sometimes we spelled it
BeginOffset, but only when they were in fact equal. What's worse, the API
is set up so that you can't reasonably call it with anything else -- it
assumes that you're passing it an offset relative to the *original*
alloca that happens to fall within the new one. That's the whole point
of NewBeginOffset, it's the clamped beginning offset.

No functionality changed.

llvm-svn: 202231
2014-02-26 05:12:43 +00:00
Chandler Carruth
836ce7bd11 [SROA] Simplify the computing of alignment: we only ever need the
alignment of the slice being rewritten, not any arbitrary offset.

Every caller is really just trying to compute the alignment for the
whole slice, never for some arbitrary alignment. They are also just
passing a type when they have one to see if we can skip an explicit
alignment in the IR by using the type's alignment. This makes for a much
simpler interface.

Another refactoring inspired by the addrspace patch for SROA, although
only loosely related.

llvm-svn: 202230
2014-02-26 05:02:19 +00:00
Chandler Carruth
c8cbd02c0c [SROA] Use NewOffsetBegin in the unsplit case for memset merely for
consistency with memcpy rewriting, and fix a latent bug in the alignment
management for memset.

The alignment issue is that getAdjustedAllocaPtr is computing the
*relative* offset into the new alloca, but the alignment isn't being set
to the relative offset, it was using the absolute offset which is
into the old alloca.

I don't think it's possible to write a test case that actually reaches
this code where the resulting alignment would be observably different,
but the intent was clearly to use the relative offset within the new
alloca.

llvm-svn: 202229
2014-02-26 04:45:24 +00:00
Chandler Carruth
4eab6cfb07 [SROA] Use the members for New{Begin,End}Offset in the rewrite helpers
rather than passing them as arguments.

While I generally prefer actual arguments, in this case the readability
loss is substantial. By using members we avoid repeatedly calculating
the offsets, and once we're using members it is useful to ensure that
those names *always* refer to the original-alloca-relative new offset
for a rewritten slice.

No functionality changed. Follow-up refactoring, all toward getting the
address space patch merged.

llvm-svn: 202228
2014-02-26 04:25:04 +00:00
Chandler Carruth
2894b88fb9 [SROA] Compute the New{Begin,End}Offset values once for each alloca
slice being rewritten.

We had the same code scattered across most of the visits. Instead,
compute the new offsets and the slice size once when we start to visit
a particular slice, and use the member variables from then on. This
reduces quite a bit of code duplication.

No functionality changed. Refactoring inspired to make it easier to
apply the address space patch to SROA.

llvm-svn: 202227
2014-02-26 04:20:00 +00:00
Chandler Carruth
62c5338f7a [SROA] Fix PR18615 with some long overdue simplifications to the bounds
checking in SROA.

The primary change is to just rely on uge for checking that the offset
is within the allocation size. This removes the explicit checks against
isNegative which were terribly error prone (including the reversed logic
that led to PR18615) and prevented us from supporting stack allocations
larger than half the address space.... Ok, so maybe the latter isn't
*common* but it's a silly restriction to have.
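
The reason a single unsigned comparison is enough can be shown on fixed-width
integers. This is an illustration only: SROA works on arbitrary-precision
offsets, and offsetOutOfBounds is a hypothetical helper.

```cpp
#include <cstdint>

// A negative signed offset reinterpreted as unsigned becomes a huge value,
// so one unsigned >= ("uge") comparison rejects both negative offsets and
// offsets at or past the end of the allocation.
bool offsetOutOfBounds(std::int64_t offset, std::uint64_t allocSize) {
  return static_cast<std::uint64_t>(offset) >= allocSize;
}
```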

Also, we used to try to support a PHI node which loaded from before the
start of the allocation if any of the loaded bytes were within the
allocation. This doesn't make any sense, we have never really supported
loading or storing *before* the allocation starts. The simplified logic
just doesn't care.

We continue to allow loading past the end of the allocation in part to
support cases where there is a PHI and some loads are larger than others
and the larger ones reach past the end of the allocation. We could solve
this a different and more conservative way, but I'm still somewhat
paranoid about this.

llvm-svn: 202224
2014-02-26 03:14:14 +00:00
Chandler Carruth
e79993509f [reassociate] Switch two std::sort calls into std::stable_sort calls as
their inputs come from std::stable_sort and they are not total orders.

I'm not a huge fan of this, but the really bad std::stable_sort is right
at the beginning of Reassociate. After we commit to stable-sort based
consistent respect of source order, the downstream sorts shouldn't undo
that unless they have a total order or they are used in an
order-insensitive way. Neither appears to be true for these cases.
I don't have particularly good test cases, but this jumped out by
inspection when looking for output instability in this pass due to
changes in the ordering of std::sort.

llvm-svn: 202196
2014-02-25 21:54:50 +00:00
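Note on r202196: a small sketch of the property the commit relies on. When the comparison is not a total order (here, two entries with the same rank are equivalent), std::stable_sort keeps equivalent entries in their input order, while std::sort makes no such promise. The RankedOp type and sortByRank function are hypothetical illustrations, not Reassociate's data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical (rank, name) pair standing in for a ranked operand.
using RankedOp = std::pair<unsigned, std::string>;

// Sort by rank only. Entries with equal rank compare as equivalent, so the
// comparison is not a total order over distinct entries. std::stable_sort
// guarantees that such entries keep their relative input order.
std::vector<RankedOp> sortByRank(std::vector<RankedOp> ops) {
  std::stable_sort(ops.begin(), ops.end(),
                   [](const RankedOp &a, const RankedOp &b) {
                     return a.first < b.first;
                   });
  return ops;
}
```

With std::sort in place of std::stable_sort, the order of the equal-rank entries could differ between standard library implementations, which is the kind of output instability the commit message describes.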
Chandler Carruth
5a7b0aba14 [SROA] Add an off-by-default *strict* inbounds check to SROA. I had SROA
implemented this way a long time ago and due to the overwhelming bugs
that surfaced, moved to a much more relaxed variant. Richard Smith would
like to understand the magnitude of this problem and it seems fairly
harmless to keep some flag-controlled logic to get the extremely strict
behavior here. I'll remove it if it doesn't prove useful.

llvm-svn: 202193
2014-02-25 21:24:45 +00:00
Rafael Espindola
32da4bdd4b Make DataLayout a plain object, not a pass.
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM
that don't handle passes to also use DataLayout.

llvm-svn: 202168
2014-02-25 17:30:31 +00:00
Rafael Espindola
ea1d1e568d Factor out calls to AA.getDataLayout().
llvm-svn: 202157
2014-02-25 15:52:19 +00:00
Rafael Espindola
9869474f57 Make a few more DataLayout variables const.
llvm-svn: 202155
2014-02-25 14:24:11 +00:00
Chandler Carruth
f4a944dda1 [SROA] Use the original load name with the SROA-prefixed IRB rather than
just "load". This helps avoid pointless de-duping with order-sensitive
numbers as we already have unique names from the original load. It also
makes the resulting IR quite a bit easier to read.

llvm-svn: 202140
2014-02-25 11:21:48 +00:00
Chandler Carruth
d54b53e176 [SROA] Thread the ability to add a pointer-specific name prefix through
the pointer adjustment code. This is the primary code path that creates
totally new instructions in SROA and being able to lump them based on
the pointer value's name for which they were created causes
*significantly* fewer name collisions and general noise in the debug
output. This is particularly significant because it is making it much
harder to track down instability in the output of SROA, as name
de-duplication is a totally harmless form of instability that gets in
the way of seeing real problems.

The new fancy naming scheme tries to dig out the root "pre-SROA" name
for pointer values and associate that all the way through the pointer
formation instructions. Digging out the root is important to prevent the
multiple iterative rounds of SROA from just layering too much cruft on
top of cruft here. We already track the layers of SROAs iteration in the
alloca name prefix. We don't need to duplicate it here.

Should have no functionality change, and shouldn't have any really
measurable impact on NDEBUG builds, as most of the complex logic is
debug-only.

llvm-svn: 202139
2014-02-25 11:19:56 +00:00
Chandler Carruth
ea27d3f4fc [SROA] Rather than copying the logic for building a name prefix into the
PHI-pointer builder, just copy the builder and clobber the obvious
fields.

llvm-svn: 202136
2014-02-25 11:12:04 +00:00
Chandler Carruth
4ced299134 [SROA] Simplify some of the logic to dig out the old pointer value by
using OldPtr more heavily. Lots of this code was written before the
rewriter had an OldPtr member setup ahead of time. There are already
asserts in place that should ensure this doesn't change any
functionality.

llvm-svn: 202135
2014-02-25 11:08:02 +00:00
Chandler Carruth
f7d0635448 [SROA] Adjust to new clang-format style.
llvm-svn: 202134
2014-02-25 11:07:58 +00:00
Chandler Carruth
11e572d7b0 [SROA] Fix a *glaring* bug in r202091: you have to actually *write*
the break statement, not just think it to yourself....

No idea how this worked at all, much less survived most bots, my
bootstrap, and some bot bootstraps!

The Polly one didn't survive, and this was filed as PR18959. I don't
have a reduced test case and honestly I'm not seeing the need. What we
probably need here are better asserts / debug-build behavior in
SmallPtrSet so that this madness doesn't make it so far.

llvm-svn: 202129
2014-02-25 09:45:27 +00:00
Alexey Samsonov
92c41baf88 Silence GCC warning
llvm-svn: 202119
2014-02-25 07:56:00 +00:00
Alp Toker
f3e1a22860 Fix typos
llvm-svn: 202107
2014-02-25 04:21:15 +00:00
Chandler Carruth
2dab15dfbc [SROA] Add a debugging tool which shuffles the slices sequence prior to
sorting it. This helps uncover latent reliance on the original ordering
which aren't guaranteed to be preserved by std::sort (but often are),
and which are based on the use-def chain orderings which also aren't
(technically) guaranteed.

Only available in C++11 debug builds, and behind a flag to prevent noise
at the moment, but this is generally useful so figured I'd put it in the
tree rather than keeping it out-of-tree.

llvm-svn: 202106
2014-02-25 03:59:29 +00:00
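Note on r202106: a sketch of the general debugging technique, not of SROA's implementation. Shuffling the input before sorting removes any accidental dependence on the incoming order; if the comparison fully determines the result, every shuffle yields the same output. The function name and the explicit seed parameter are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: shuffle with a caller-chosen seed, then sort.
// For a comparison that is a total order (plain < on int here), the result
// must not depend on the seed. If a later consumer sees different results
// for different seeds, it was relying on the pre-sort ordering.
std::vector<int> shuffleThenSort(std::vector<int> values, unsigned seed) {
  std::mt19937 rng(seed);
  std::shuffle(values.begin(), values.end(), rng);
  std::sort(values.begin(), values.end());
  return values;
}
```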
Chandler Carruth
e33cfcb4e8 [SROA] Use a more direct way of determining whether we are processing
the destination operand or source operand of a memmove.

It so happens that it was impossible for SROA to try to rewrite
self-memmove where the operands are *identical*, because either such
a thing is volatile (and we don't rewrite) or it is non-volatile, and we
don't even register it as a use of the alloca.

However, making the 'IsDest' test *rely* on this subtle fact is... Very
confusing for the reader. We should use the direct and readily available
test of the Use* which gives us concrete information about which operand
is being rewritten.

No functionality changed, I hope! ;]

llvm-svn: 202103
2014-02-25 03:50:14 +00:00
Chandler Carruth
2a5f3cfadc [SROA] Fix another instability in SROA with respect to the slice
ordering.

The fundamental problem that we're hitting here is that the use-def
chain ordering is *itself* not a stable thing to be relying on in the
rewriting for SROA. Further, we use a non-stable sort over the slices to
arrange them based on the section of the alloca they're operating on.
With a debugging STL implementation (or different implementations in
stage2 and stage3) this can cause stage2 != stage3.

The specific aspect of this problem fixed in this commit deals with the
rewriting and load-speculation around PHIs and Selects. This, like many
other aspects of the use-rewriting in SROA, is really part of the
"strong SSA-formation" that is done by SROA where it works very hard to
canonicalize loads and stores in *just* the right way to satisfy the
needs of mem2reg[1]. When we have a select (or a PHI) with 2 uses of the
same alloca, we test that loads downstream of the select are
speculatable around it twice. If only one of the operands to the select
needs to be rewritten, then if we get lucky we rewrite that one first
and the select is immediately speculatable. This can cause the order of
operand visitation, and thus the order of slices to be rewritten, to
change an alloca from promotable to non-promotable and vice versa.

The fix is to defer all of the speculation until *after* the rewrite
phase is done. Once we've rewritten everything, we can accurately test
for whether speculation will work (once, instead of twice!) and the
order ceases to matter.

This also happens to simplify the other subtlety of speculation -- we
need to *not* speculate anything unless the result of speculating will
make the alloca fully promotable by mem2reg. I had a previous attempt at
simplifying this, but it was still pretty horrible.

There is actually already a *really* nice test case for this in
basictest.ll, but on multiple STL implementations and inputs, we just
got "lucky". Fortunately, the test case is very small and we can
essentially build it in exactly the opposite way to get reasonable
coverage in both directions even from normal STL implementations.

llvm-svn: 202092
2014-02-25 00:07:09 +00:00
Rafael Espindola
6c834371d9 Make some DataLayout pointers const.
No functionality change. Just reduces the noise of an upcoming patch.

llvm-svn: 202087
2014-02-24 23:12:18 +00:00
Arnold Schwaighofer
c68a727215 SLPVectorizer: Try vectorizing 'splat' stores
Vectorize sequential stores of a broadcasted value.
5% on eon.

radar://16124699

llvm-svn: 202067
2014-02-24 19:52:29 +00:00
Rafael Espindola
d89ca7eab7 Replace the F_Binary flag with a F_Text one.
After this I will set the default back to F_None. The advantage is that
before this patch forgetting to set F_Binary would corrupt a file on windows.
Forgetting to set F_Text produces one that cannot be read in notepad, which
is a better failure mode :-)

llvm-svn: 202052
2014-02-24 18:20:12 +00:00
Arnold Schwaighofer
8cd1e62020 LTO: Add the loop vectorizer to the LTO pipeline.
During the LTO phase LICM will move loop invariant global variables out of loops
(informed by GlobalModRef). This makes more loops countable presenting
opportunity for the loop vectorizer.

Adding the loop vectorizer improves some TSVC benchmarks and twolf/ref dataset
(5%) on x86-64.

radar://15970632

llvm-svn: 202051
2014-02-24 18:19:31 +00:00
Rafael Espindola
c3ff946af1 Don't make F_None the default.
This will make it easier to switch the default to being binary files.

llvm-svn: 202042
2014-02-24 15:07:20 +00:00
Kostya Serebryany
e1bb3de3a9 [asan] simplify the code that computes the shadow offset; get rid of two internal flags that allowed overriding it. The tests pass, but this change might still break asan on some platform not covered by tests. If you see this, please submit a fix with a test.
llvm-svn: 202033
2014-02-24 13:40:24 +00:00
Logan Chien
6cc287e13e Include <cctype> for isdigit().
llvm-svn: 201930
2014-02-22 06:34:10 +00:00
Quentin Colombet
fc711dd23c [CodeGenPrepare] Move CodeGenPrepare into lib/CodeGen.
CodeGenPrepare uses TargetLowering extensively, which is part of libLLVMCodeGen.
This is a layer violation which would eventually introduce a dependence on
CodeGen in ScalarOpts.

Move CodeGenPrepare into libLLVMCodeGen to avoid that.

Follow-up of <rdar://problem/15519855>

llvm-svn: 201912
2014-02-22 00:07:45 +00:00
Rafael Espindola
4803b77df5 Rename a few more DataLayout variables from TD to DL.
llvm-svn: 201870
2014-02-21 18:34:28 +00:00
Rafael Espindola
1f7e9d4bed Rename a few more DataLayout variables.
llvm-svn: 201833
2014-02-21 01:53:35 +00:00
Rafael Espindola
83f8550fb2 Rename many DataLayout variables from TD to DL.
I am really sorry for the noise, but the current state where some parts of the
code use TD (from the old name: TargetData) and other parts use DL makes it
hard to write a patch that changes where those variables come from and how
they are passed along.

llvm-svn: 201827
2014-02-21 00:06:31 +00:00
Nick Lewycky
4f83ef47dd Make sure that value handle users see the transformation of an indirect call to a direct call. This is important for the CallGraph iteration. Patch by Björn Steinbrink!
llvm-svn: 201822
2014-02-20 23:00:15 +00:00
Rafael Espindola
aea6192f20 Add back r201608, r201622, r201624 and r201625
r201608 made llvm correctly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.

They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.

llvm-svn: 201700
2014-02-19 17:23:20 +00:00
Rafael Espindola
5bdf119d0c This reverts commit r201625 and r201624.
Since r201608 got reverted, it is not safe to use private linkage in these cases
until it is committed back.

llvm-svn: 201688
2014-02-19 15:49:46 +00:00
Tim Northover
1b102abe53 X86 CodeGenPrep: sink shufflevectors before shifts
On x86, shifting a vector by a scalar is significantly cheaper than shifting a
vector by another fully general vector. Unfortunately, because SelectionDAG
operates on just one basic block at a time, the shufflevector instruction that
reveals whether the right-hand side of a shift *is* really a scalar is often
not visible to CodeGen when it's needed.

This adds another handler to CodeGenPrepare, to sink any useful shufflevector
instructions down to the basic block where they're used, predicated on a target
hook (since on other architectures, doing so will often just introduce extra
real work).

rdar://problem/16063505

llvm-svn: 201655
2014-02-19 10:02:43 +00:00
Rafael Espindola
e5f011f1b1 Now that llvm always does the right thing with private, use it.
llvm-svn: 201625
2014-02-19 02:08:39 +00:00
Rafael Espindola
d85e4eb0f5 Rename some member variables from TD to DL.
TargetData was renamed DataLayout back in r165242.

llvm-svn: 201581
2014-02-18 15:33:12 +00:00
Tim Northover
83bbdcb246 GlobalMerge: move "-global-merge" option to the pass itself.
It's rather odd to have the flag enabling and disabling this pass only affect a
single target.

llvm-svn: 201559
2014-02-18 11:17:29 +00:00
Gerolf Hoflehner
ec56f33316 fix for null VectorizedValue assertion in the SLP Vectorizer (in function vectorizeTree()). radar://16064178
llvm-svn: 201501
2014-02-17 03:06:16 +00:00
Gerolf Hoflehner
283b0694b1 fixed typo in comment as my test commit
llvm-svn: 201486
2014-02-16 10:43:25 +00:00
Quentin Colombet
5700bbac29 [CodeGenPrepare][AddressingModeMatcher] Give up on type promotion if the
transformation does not bring any immediate benefits and introduces an illegal
operation.

llvm-svn: 201439
2014-02-14 22:23:22 +00:00
Rafael Espindola
cd84fe8173 Trivial cleanup: reuse existing variable.
Extracted while trying to understand http://llvm-reviews.chandlerc.com/D1764.

Patch by Matt Arsenault.

llvm-svn: 201425
2014-02-14 19:02:01 +00:00
Matt Arsenault
7594b13bbb Do more addrspacecast transforms that happen for bitcast.
Makes addrspacecast (gep) do addrspacecast (gep) instead.

llvm-svn: 201376
2014-02-14 00:49:12 +00:00
Benjamin Kramer
7455fc3975 InstCombine: Replace custom constant folding code with ConstantExpr.
llvm-svn: 201352
2014-02-13 18:23:24 +00:00
Benjamin Kramer
57d9ecca57 Reduce code duplication resulting from the ConstantVector/ConstantDataVector split.
No intended functionality change.

llvm-svn: 201344
2014-02-13 16:48:38 +00:00
Reid Kleckner
72c9e73170 GlobalOpt: Aliases don't have sections, don't copy them when replacing
As defined in LangRef, aliases do not have sections.  However, LLVM's
GlobalAlias class inherits from GlobalValue, which means we can read and
set its section.  We should probably ban that as a separate change,
since it doesn't make much sense for an alias to have a section that
differs from its aliasee.

Fixes PR18757, where the section was being lost on the global in code
from Clang like:

extern "C" {
__attribute__((used, section("CUSTOM"))) static int in_custom_section;
}

Reviewers: rafael.espindola

Differential Revision: http://llvm-reviews.chandlerc.com/D2758

llvm-svn: 201286
2014-02-13 02:18:36 +00:00
Owen Anderson
5dc9f991c9 Remove a very old instcombine where we would turn sequences of selects into
logical operations on the i1's driving them.  This is a bad idea for every
target I can think of (confirmed with micro tests on all of: x86-64, ARM,
AArch64, Mips, and PowerPC) because it forces the i1 to be materialized into
a general purpose register, whereas consuming it directly into a select generally
allows it to exist only transiently in a predicate or flags register.

Chandler ran a set of performance tests with this change, and reported no
measurable change on x86-64.

llvm-svn: 201275
2014-02-12 23:54:07 +00:00
Andrea Di Biagio
594ea331ef [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.

The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.

With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, on X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.

This patch applies the following changes:
 - The cost model computation now takes into account non-uniform constants;
 - The cost of vector shift instructions has been improved in
   X86TargetTransformInfo analysis pass;
 - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
   between non-uniform and uniform constant operands.

Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.

llvm-svn: 201272
2014-02-12 23:43:47 +00:00
Benjamin Kramer
e435e87a6a InstCombine: Teach icmp merging about the equivalence of bit tests and UGE/ULT with a power of 2.
This happens in bitfield code. While there reorganize the existing code
a bit.

llvm-svn: 201176
2014-02-11 21:09:03 +00:00
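Note on r201176: the commit message does not list the exact patterns it merges, so the following only illustrates the underlying equivalence, under the assumption of 32-bit unsigned values. Testing that all bits at or above position k are clear is the same as an unsigned less-than against 2^k. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bit-test form: no bit at position k or above is set.
bool highBitsClear(uint32_t x, unsigned k) {
  assert(k < 32 && "shift amount must be smaller than the bit width");
  return (x & ~((uint32_t{1} << k) - 1)) == 0;
}

// Comparison form: x is unsigned-less-than 2^k (ULT; UGE is its negation).
bool belowPow2(uint32_t x, unsigned k) {
  assert(k < 32 && "shift amount must be smaller than the bit width");
  return x < (uint32_t{1} << k);
}

// Checks that both forms agree for every 16-bit input at a given k.
bool formsAgree(unsigned k) {
  for (uint32_t x = 0; x <= 0xFFFF; ++x)
    if (highBitsClear(x, k) != belowPow2(x, k))
      return false;
  return true;
}
```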
Chandler Carruth
aa1d9ed9b0 [LPM] Switch LICM to actively use LCSSA in addition to preserving it.
Fixes PR18753 and PR18782.

This is necessary for LICM to preserve LCSSA correctly and efficiently.
There is still some active discussion about whether we should be using
LCSSA, but we can't just immediately stop using it and we *need* LICM to
preserve it while we are using it. We can restore the old SSAUpdater
driven code if and when there is a serious effort to remove the reliance
on LCSSA from all of the loop passes.

However, this also serves as a great example of why LCSSA is very nice
to have. This change significantly simplifies the process of sinking
instructions for LICM, and makes it quite a bit less expensive.

It wouldn't even be as complex as it is except that I had to start the
process of removing the big recursive LCSSA formation hammer in order to
switch even this much of the re-forming code to asserting that LCSSA was
preserved. I'll fully remove that next just to tidy things up until the
LCSSA debate settles one way or the other.

llvm-svn: 201148
2014-02-11 12:52:27 +00:00
Quentin Colombet
826c5ca154 [CodeGenPrepare] Undo changes that happened for the profitability check.
The addressing mode matcher checks at some point the profitability of folding an
instruction into the addressing mode. When the instruction to be folded has
several uses, it checks that the instruction can be folded in each use.
To do so, it creates a new matcher for each use and checks if the instruction is
in the list of the matched instructions of this new matcher.

The new matchers may promote some instructions and this has to be undone to keep
the state of the original matcher consistent.

A test case will follow.

<rdar://problem/16020230>

llvm-svn: 201121
2014-02-11 01:59:02 +00:00
Chandler Carruth
90e016d8fc [LPM] A terribly simple fix to a terribly complex bug: PR18773.
The crux of the issue is that LCSSA doesn't preserve stateful alias
analyses. Before r200067, LICM didn't cause LCSSA to run in the LTO pass
manager, where LICM runs essentially without any of the other loop
passes. As a consequence the globalmodref-aa pass run before that loop
pass manager was able to survive the loop pass manager and be used by
DSE to eliminate stores in the function called from the loop body in
Adobe-C++/loop_unroll (and similar patterns in other benchmarks).

When LICM was taught to preserve LCSSA it had to require it as well.
This caused it to be run in the loop pass manager and because it did not
preserve AA, the stateful AA was lost. Most of LLVM's AA isn't stateful
and so this didn't manifest in most cases. Also, in most cases LCSSA was
already running, and so there was no interesting change.

The real kicker is that LCSSA by its definition (injecting PHI nodes
only) trivially preserves AA! All we need to do is mark it, and then
everything goes back to working as intended. It probably was blocking
some other weird cases of stateful AA but the only one I have is
a 1000-line IR test case from loop_unroll, so I don't really have a good
test case here.

Hopefully this fixes the regressions on performance that have been seen
since that revision.

llvm-svn: 201104
2014-02-10 19:39:35 +00:00
Benjamin Kramer
4779ebf069 Make succ_iterator a real random access iterator and clean up a couple of users.
llvm-svn: 201088
2014-02-10 14:17:42 +00:00
Kostya Serebryany
668ea393ac [asan] support for FreeBSD, LLVM part. patch by Viktor Kutuzov
llvm-svn: 201067
2014-02-10 07:37:04 +00:00
Arnold Schwaighofer
0bd2bb3092 LoopVectorizer: Keep track of conditional store basic blocks
Before conditional store vectorization/unrolling we had only one
vectorized/unrolled basic block. After adding support for conditional store
vectorization this will not only be one block but multiple basic blocks. The
last block would have the back-edge. I updated the code to use a vector of basic
blocks instead of a single basic block and fixed the users to use the last entry
in this vector. But, I forgot to add the basic blocks to this vector!

Fixes PR18724.

llvm-svn: 201028
2014-02-08 20:41:13 +00:00
Juergen Ributzka
a44e3756e3 [Constant Hoisting] Fix insertion point for constant materialization.
The bitcast instruction during constant materialization was not placed correctly
in the presence of phi nodes. This commit fixes the insertion point to be in the
idom instead.

This fixes PR18768

llvm-svn: 201009
2014-02-08 00:20:49 +00:00
Juergen Ributzka
5435f3e6f6 [Constant Hoisting] Don't update the use list while traversing it - DOH!
This fix first traverses the whole use list of the constant expression and
keeps track of the instructions that need to be updated. Then perform the
fixup afterwards.

llvm-svn: 201008
2014-02-08 00:20:45 +00:00
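Note on r201008: a toy model of the two-phase pattern the message describes, collecting first and fixing up afterwards. The std::list of integer ids stands in for a use list and is purely an assumption for illustration; it is not LLVM's Use/User machinery.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical model: `uses` holds ids of instructions that use a constant.
// Rewriting a user removes it from the list, which would invalidate an
// iterator that is walking the list at the same time. So phase 1 only
// traverses and records, and phase 2 performs the mutations.
std::vector<int> rewriteUsers(std::list<int> &uses) {
  std::vector<int> worklist(uses.begin(), uses.end());  // phase 1: read only
  std::vector<int> rewritten;
  for (int user : worklist) {                           // phase 2: mutate
    uses.remove(user);
    rewritten.push_back(user);
  }
  return rewritten;
}
```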
Quentin Colombet
f0d12dd9ee [CodeGenPrepare] Move away sign extensions that get in the way of addressing
mode.

Basically the idea is to transform code like this:
%idx = add nsw i32 %a, 1
%sextidx = sext i32 %idx to i64
%gep = gep i8* %myArray, i64 %sextidx
load i8* %gep

Into:
%sexta = sext i32 %a to i64
%idx = add nsw i64 %sexta, 1
%gep = gep i8* %myArray, i64 %idx
load i8* %gep

That way the computation can be folded into the addressing mode.

This transformation is done as part of the addressing mode matcher.
If the matching fails (not profitable, addressing mode not legal, etc.), the
matcher will revert the related promotions.

<rdar://problem/15519855>

llvm-svn: 200947
2014-02-06 21:44:56 +00:00
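Note on r200947: a sketch of the arithmetic fact behind the transformation, written in plain C++ rather than IR. When the 32-bit add cannot wrap (the nsw flag in the IR above), extending after the add and extending before the add give the same 64-bit value. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors: %idx = add nsw i32 %a, 1 ; %sextidx = sext i32 %idx to i64
// Signed overflow is undefined in C++, which matches the nsw assumption,
// so callers must not pass INT32_MAX.
int64_t extendAfterAdd(int32_t a) {
  assert(a != INT32_MAX && "the add must not wrap (nsw)");
  return static_cast<int64_t>(a + 1);
}

// Mirrors: %sexta = sext i32 %a to i64 ; %idx = add nsw i64 %sexta, 1
int64_t extendBeforeAdd(int32_t a) {
  return static_cast<int64_t>(a) + 1;
}
```

The second form keeps the add in 64 bits, which is what allows it to be folded into an addressing mode as the commit message explains.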
Nick Lewycky
03b9ed1b7b A memcpy out of a fresh alloca is a no-op, delete it. Patch by Patrick Walton!
llvm-svn: 200907
2014-02-06 06:29:19 +00:00
Manman Ren
91c0933df0 Set default of inlinecold-threshold to 225.
225 is the default value of inline-threshold. This change will make sure
we have the same inlining behavior as prior to r200886.

As Chandler points out, even though we don't have code in our testing
suite that uses cold attribute, there are larger applications that do
use cold attribute.

r200886 + this commit intend to keep the same behavior as prior to r200886.
We can later on tune the inlinecold-threshold.

The main purpose of r200886 is to help performance of instrumentation based
PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
For instrumentation based PGO, we try to increase inlining of hot functions and
reduce inlining of cold functions by setting inlinecold-threshold.

Another option suggested by Chandler is to use a boolean flag that controls
if we should use OptSizeThreshold for cold functions. The default value
of the boolean flag should not change the current behavior. But it gives us
less freedom in controlling inlining of cold functions.

llvm-svn: 200898
2014-02-06 01:59:22 +00:00
Paul Robinson
189e175394 Disable most IR-level transform passes on functions marked 'optnone'.
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.

llvm-svn: 200892
2014-02-06 00:07:05 +00:00
Manman Ren
b78e9a1411 Inliner uses a smaller inline threshold for callees with cold attribute.
Added command line option inlinecold-threshold to set threshold for inlining
functions with cold attribute. Listen to the cold attribute when it would
decrease the inline threshold.

llvm-svn: 200886
2014-02-05 22:53:44 +00:00
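Note on r200886: a sketch of the selection rule stated in the message, where the cold threshold applies only when it would decrease the inline threshold. The function and parameter names are hypothetical and the real inliner logic may consider more inputs than shown here.

```cpp
#include <cassert>

// Hypothetical helper: choose the threshold for one call site.
// The cold threshold is honored only for callees marked cold, and only
// when it is lower than the threshold that would otherwise be used.
int pickInlineThreshold(int currentThreshold, int coldThreshold,
                        bool calleeIsCold) {
  if (calleeIsCold && coldThreshold < currentThreshold)
    return coldThreshold;
  return currentThreshold;
}
```

Setting the cold threshold equal to the default (225 in r200898) makes the first branch unreachable for the default configuration, which is how that follow-up restores the earlier behavior.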
Benjamin Kramer
700474a946 SimplifyLibCalls: Push TLI through the exp2->ldexp transform.
For the odd case of platforms with exp2 available but not ldexp.

llvm-svn: 200795
2014-02-04 20:27:23 +00:00
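Note on r200795: a sketch of the identity behind the exp2->ldexp transform, assuming an integer exponent. For integer n, 2^n can be computed as ldexp(1.0, n), which only adjusts the floating-point exponent. The function name is hypothetical; the commit itself concerns checking library availability through TLI, which is not modeled here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: computes 2^n for an integer n via ldexp, which
// scales 1.0 by a power of two. This is the value exp2 is expected to
// return for the same integer argument.
double pow2ViaLdexp(int n) {
  return std::ldexp(1.0, n);
}
```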
Duncan P. N. Exon Smith
7024ad6965 cleanup: scc_iterator consumers should use isAtEnd
No functional change.  Updated loops from:

    for (I = scc_begin(), E = scc_end(); I != E; ++I)

to:

    for (I = scc_begin(); !I.isAtEnd(); ++I)

for teh win.

llvm-svn: 200789
2014-02-04 19:19:07 +00:00
Tim Northover
d6fb863f04 OS X: the correct function is __sincospif_stret, not __sincospi_stretf
rdar://problem/13729466

llvm-svn: 200771
2014-02-04 16:28:20 +00:00
Kai Nacke
a3477b4ff6 Add strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls
Add the missing transformation strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls
and remove the ToDo comment.

Reviewer: Duncan P. N. Exon Smith
llvm-svn: 200736
2014-02-04 05:55:16 +00:00
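Note on r200736: a short check of the identity being used. Searching for the terminating NUL with strchr returns a pointer to the terminator, which is exactly p + strlen(p). The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true when strchr(p, 0) and p + strlen(p) point at
// the same byte, i.e. the string's terminating NUL.
bool strchrNulMatchesStrlen(const char *p) {
  assert(p != nullptr && "both functions require a valid C string");
  return std::strchr(p, 0) == p + std::strlen(p);
}
```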
Nick Lewycky
df5396144d Self-memcpy-elision and memcpy of constant byte to memset transforms don't care how many bytes you were trying to transfer. Sink that safety test after those transforms. Noticed by inspection.
llvm-svn: 200726
2014-02-04 00:18:54 +00:00
Reid Kleckner
8fe10af69d inalloca: Don't remove dead arguments in the presence of inalloca args
It disturbs the layout of the parameters in memory and registers,
leading to problems in the backend.

The plan for optimizing internal inalloca functions going forward is to
essentially SROA the argument memory and demote any captured arguments
(things that aren't trivially written by a load or store) to an indirect
pointer to a static alloca.

llvm-svn: 200717
2014-02-03 20:42:49 +00:00
Duncan P. N. Exon Smith
4f1b28340d Lower llvm.expect intrinsic correctly for i1
LowerExpectIntrinsic previously only understood the idiom of an expect
intrinsic followed by a comparison with zero. For llvm.expect.i1, the
comparison would be stripped by the early-cse pass.

Patch by Daniel Micay.

llvm-svn: 200664
2014-02-02 22:43:55 +00:00
Arnold Schwaighofer
8a0e82c2bc LoopVectorizer: Enable unrolling of conditional stores and the load/store
unrolling heuristic per default

Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed
up some benchmarks while not causing any interesting regressions.

llvm-svn: 200621
2014-02-02 03:12:34 +00:00
Chandler Carruth
a93c365f31 [LPM] Apply a really big hammer to fix PR18688 by recursively reforming
LCSSA when we promote to SSA registers inside of LICM.

Currently, this is actually necessary. The promotion logic in LICM uses
SSAUpdater which doesn't understand how to place LCSSA PHI nodes.
Teaching it to do so would be a very significant undertaking. It may be
worthwhile and I've left a FIXME about this in the code as well as
starting a thread on llvmdev to try to figure out the right long-term
solution.

For now, the PR needs to be fixed. Short of using the promotion
SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes,
I don't see a cleaner or cheaper way of achieving this. Fortunately,
LCSSA is relatively lazy and sparse -- it should only update
instructions which need it. We can also skip the recursive variant when
we don't promote to SSA values.

llvm-svn: 200612
2014-02-01 13:35:14 +00:00
Eli Bendersky
62efb50a57 Remove some unused #includes
llvm-svn: 200611
2014-02-01 13:12:54 +00:00
Reid Kleckner
0421c6aef8 Revert "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..."
This reverts commit r200576.  It broke 32-bit self-host builds by
vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.

llvm-svn: 200602
2014-02-01 01:37:30 +00:00
Chandler Carruth
74c658030d [SLPV] Recognize vectorizable intrinsics during SLP vectorization and
transform accordingly. Based on similar code from Loop vectorization.
Subsequent commits will include vectorization of function calls to
vector intrinsics and form function calls to vector library calls.

Patch by Raul Silvera! (Much delayed due to my not running dcommit)

llvm-svn: 200576
2014-01-31 21:14:40 +00:00
Chandler Carruth
fbc2b60e8a [vectorizer] Tweak the way we do small loop runtime unrolling in the
loop vectorizer to not do so when runtime pointer checks are needed and
share code with the new (not yet enabled) load/store saturation runtime
unrolling. Also ensure that we only consider the runtime checks when the
loop hasn't already been vectorized. If it has, the runtime check cost
has already been paid.

I've fleshed out a test case to cover the scalar unrolling as well as
the vector unrolling and comment clearly why we are or aren't following
the pattern.

llvm-svn: 200530
2014-01-31 10:51:08 +00:00
Bob Wilson
1478ea0cc7 Fix a bug in gcov instrumentation introduced by r195513. <rdar://15930350>
The entry block of a function starts with all the static allocas. The change
in r195513 splits the block before those allocas, which has the effect of
turning them into dynamic allocas. That breaks all sorts of things. Change to
split after the initial allocas, and also add a comment explaining why the
block is split.

llvm-svn: 200515
2014-01-31 05:24:01 +00:00
Chandler Carruth
6ba48b6c38 [LPM] Fix PR18643, another scary place where loop transforms failed to
preserve loop simplify of enclosing loops.

The problem here starts with LoopRotation which ends up cloning code out
of the latch into the new preheader it is building. This can create
a new edge from the preheader into the exit block of the loop which
breaks LoopSimplify form. The code tries to fix this by splitting the
critical edge between the latch and the exit block to get a new exit
block that only the latch dominates. This sadly isn't sufficient.

The exit block may be an exit block for multiple nested loops. When we
clone an edge from the latch of the inner loop to the new preheader
being built in the outer loop, we create an exiting edge from the outer
loop to this exit block. Despite breaking the LoopSimplify form for the
inner loop, this is fine for the outer loop. However, when we split the
edge from the inner loop to the exit block, we create a new block which
is in neither the inner nor outer loop as the new exit block. This is
a predecessor to the old exit block, and so the split itself takes the
outer loop out of LoopSimplify form. We need to split every edge
entering the exit block from inside a loop nested more deeply than the
exit block in order to preserve all of the loop simplify constraints.

Once we try to do that, a problem with splitting critical edges
surfaces. Previously, we tried a very brute-force way to update LoopSimplify
form by re-computing it for all exit blocks. We don't need to do this,
and doing this much will sometimes but not always overlap with the
LoopRotate bug fix. Instead, the code needs to specifically handle the
cases which can start to violate LoopSimplify -- they aren't that
common. We need to see if the destination of the split edge was a loop
exit block in simplified form for the loop of the source of the edge.
For this to be true, all the predecessors need to be in the exact same
loop as the source of the edge being split. If the dest block was
originally in this form, we have to split all of the edges back into
this loop to recover it. The old mechanism of doing this was
conservatively correct because at least *one* of the exiting blocks it
rewrote was the DestBB and so the DestBB's predecessors were fixed. But
this is a much more targeted way of doing it. Making it targeted is
important, because ballooning the set of edges touched prevents
LoopRotate from being able to split edges *it* needs to split to
preserve loop simplify in a coherent way -- the critical edge splitting
would sometimes find some of the edges in need of splitting but not
others.

Many, *many* thanks for help from Nick reducing these test cases
mightily. And helping lots with the analysis here as this one was quite
tricky to track down.

llvm-svn: 200393
2014-01-29 13:16:53 +00:00
Chandler Carruth
ed726e1be7 [LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered"
because of the inside-out run of LoopSimplify in the LoopPassManager and
the fact that LoopSimplify couldn't be "preserved" across two
independent LoopPassManagers.

Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI
node because it thought it was rewriting (via SCEV) the incoming value
to a loop invariant value. While it may well be invariant for the
current loop, it may be rewritten in terms of an enclosing loop's
values. This in and of itself is fine, as the LCSSA PHI node in the
enclosing loop for the inner loop value we're rewriting will have its
own LCSSA PHI node if used outside of the enclosing loop. With me so
far?

Well, the current loop and the enclosing loop may share an exiting
block and exit block, and when they do they also share LCSSA PHI nodes.
In this case, it's not valid to RAUW through the LCSSA PHI node.

Expected crazy test included.

llvm-svn: 200372
2014-01-29 04:40:19 +00:00
Arnold Schwaighofer
5b96c24a7a LoopVectorizer: Don't count the induction variable multiple times
When estimating register pressure, don't count the induction variable multiple
times. It is unlikely to be unrolled. This is currently disabled and hidden
behind a flag ("enable-ind-var-reg-heur").

llvm-svn: 200371
2014-01-29 04:36:12 +00:00
Rafael Espindola
e8856107f0 Fix pr14893.
When simplifycfg moves an instruction, it must drop metadata it doesn't know
is still valid once the preconditions change. In particular, it must drop
the range and tbaa metadata.

The patch implements this with a utility function to drop all metadata not
in a white list.
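
A toy model of the whitelist idea, for illustration only. It does not use the
LLVM API: the Metadata alias and the dropUnknownMetadata helper are
hypothetical names, and the kind kept in the usage example ("dbg") is an
assumption, not necessarily what the patch keeps.

```cpp
#include <map>
#include <set>
#include <string>

// Toy stand-in for an instruction's metadata: kind name -> payload.
// (Hypothetical type, not an LLVM class.)
using Metadata = std::map<std::string, std::string>;

// Hypothetical helper: erase every entry whose kind is not in `keep`.
// Anything the caller did not explicitly list is assumed to be unsafe
// to carry along when the instruction moves.
void dropUnknownMetadata(Metadata &md, const std::set<std::string> &keep) {
  for (auto it = md.begin(); it != md.end();) {
    if (keep.count(it->first))
      ++it;                // known to stay valid: keep it
    else
      it = md.erase(it);   // unknown kind: drop it conservatively
  }
}
```

With a whitelist of {"dbg"}, an entry set containing "dbg", "range" and "tbaa"
is reduced to "dbg" alone, which matches the requirement that range and tbaa
be dropped.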

llvm-svn: 200322
2014-01-28 16:56:46 +00:00
Chandler Carruth
6a45efab46 [vectorizer] Completely disable the block frequency guidance of the loop
vectorizer, placing it behind an off-by-default flag.

It turns out that block frequency isn't what we want at all, here or
elsewhere. This has been I think a nagging feeling for several of us
working with it, but Arnold has given some really nice simple examples
where the results are so comprehensively wrong that they aren't useful.

I'm planning to email the dev list with a summary of why it's not really
useful and a couple of ideas about how to better structure these types
of heuristics.

llvm-svn: 200294
2014-01-28 09:10:41 +00:00
Reid Kleckner
c9ab4a9a3b Update optimization passes to handle inalloca arguments
Summary:
I searched Transforms/ and Analysis/ for 'ByVal' and updated those call
sites to check for inalloca if appropriate.

I added tests for any change that would allow an optimization to fire on
inalloca.

Reviewers: nlewycky

Differential Revision: http://llvm-reviews.chandlerc.com/D2449

llvm-svn: 200281
2014-01-28 02:38:36 +00:00
Chandler Carruth
b19a7319a9 [LPM] Fix PR18616 where the shifts to the loop pass manager to extract
LCSSA from it caused a crasher with the LoopUnroll pass.

This crasher is really nasty. We destroy LCSSA form in a surprising way.
When unrolling a loop into an outer loop, we not only need to restore
LCSSA form for the outer loop, but for all children of the outer loop.
This is somewhat obvious in retrospect, but hey!

While this seems pretty heavy-handed, it's not that bad. Fundamentally,
we only do this when we unroll a loop, which is already a heavyweight
operation. We're unrolling all of these hypothetical inner loops as
well, so their size and complexity is already on the critical path. This
is just adding another pass over them to re-canonicalize.

I have a test case from PR18616 that is great for reproducing this, but
pretty useless to check in as it relies on many 10s of nested empty
loops that get unrolled and deleted in just the right order. =/ What's
worse is that investigating this has exposed another source of failure
that is likely to be even harder to test. I'll try to come up with test
cases for these fixes, but I want to get the fixes into the tree first
as they're causing crashes in the wild.

llvm-svn: 200273
2014-01-28 01:25:38 +00:00
Arnold Schwaighofer
8f596e2047 LoopVectorize: Support conditional stores by scalarizing
The vectorizer takes a loop like this and widens all instructions except for the
store. The stores are scalarized/unrolled and hidden behind an "if" block.

  for (i = 0; i < 128; ++i) {
    if (a[i] < 10)
      a[i] += val;
  }

  for (i = 0; i < 128; i+=2) {
    v = a[i:i+1];
    w = v + val;
    if ((extract v, 0) < 10)
      a[i] = (extract w, 0);
    if ((extract v, 1) < 10)
      a[i+1] = (extract w, 1);
  }

The vectorizer relies on subsequent optimizations to sink instructions into the
conditional block where they are anticipated.
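
The transformation described above can be modelled in plain C++ to check that
the two loop shapes agree. This is a hand-written sketch assuming a
vectorization factor of 2. The names scalarLoop and predicatedLoop are
hypothetical, and the "wide" operations are simulated with two-element arrays
rather than real vector instructions.

```cpp
#include <array>

constexpr int N = 128;

// The original scalar loop from the commit message.
void scalarLoop(std::array<int, N> &a, int val) {
  for (int i = 0; i < N; ++i)
    if (a[i] < 10)
      a[i] += val;
}

// Model of the widened loop: load, compare and add cover both lanes at once,
// while each store stays scalar and is guarded by its own lane's predicate.
void predicatedLoop(std::array<int, N> &a, int val) {
  for (int i = 0; i < N; i += 2) {
    int v[2] = {a[i], a[i + 1]};            // wide load
    bool mask[2] = {v[0] < 10, v[1] < 10};  // wide compare
    int sum[2] = {v[0] + val, v[1] + val};  // wide add
    if (mask[0])
      a[i] = sum[0];                        // scalarized store, lane 0
    if (mask[1])
      a[i + 1] = sum[1];                    // scalarized store, lane 1
  }
}
```

Running both functions on copies of the same array should leave the copies
identical, since the predicate is computed from the loaded value before the
addition.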

The flag "vectorize-num-stores-pred" controls whether and how many stores to
handle this way. Vectorization of conditional stores is disabled by default for
now.

This patch also adds a change to the heuristic when the flag
"enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
loops until load/store ports are saturated. This heuristic uses TTI's
getMaxUnrollFactor as a measure for load/store ports.

I also added a second flag -enable-cond-stores-vec. It will enable vectorization
of conditional stores. But there is no cost model for vectorization of
conditional stores in place yet so this will not do much good at the moment.

rdar://15892953

Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
-vectorize-num-stores-pred=1 (before the BFI change):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
   Applications/siod/siod         2.18%
 Performance improvements:
   mesa                          -4.42%
   libquantum                    -4.15%

 With a patch that slightly changes the register heuristics (by subtracting the
 induction variable on both sides of the register pressure equation, as the
 induction variable is probably not really unrolled):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2  7.73%
   Applications/siod/siod          1.97%

 Performance Improvements:
   libquantum                    -13.05% (we now also unroll quantum_toffoli)
   mesa                           -4.27%

llvm-svn: 200270
2014-01-28 01:01:53 +00:00
Manman Ren
c3f51e8e54 PGO branch weight: keep halving the weights until they can fit into
uint32.

When folding branches to common destination, the updated branch weights
can exceed uint32 by more than factor of 2. We should keep halving the
weights until they can fit into uint32.
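
A minimal sketch of the repeated halving, assuming the weights are first
accumulated in 64 bits. The function name fitWeights is hypothetical and this
is not the code from the patch.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: scale 64-bit branch weights down by repeated halving
// until the largest one fits into 32 bits. Halving all weights together keeps
// their ratios approximately intact.
std::vector<uint32_t> fitWeights(std::vector<uint64_t> weights) {
  const uint64_t limit = std::numeric_limits<uint32_t>::max();
  // One halving is not enough when a weight exceeds the limit by more than
  // a factor of two, so loop until every weight fits.
  while (!weights.empty() &&
         *std::max_element(weights.begin(), weights.end()) > limit) {
    for (uint64_t &w : weights)
      w /= 2;
  }
  return std::vector<uint32_t>(weights.begin(), weights.end());
}
```

For weights 2^34 and 2^33 a single halving still leaves 2^33, which does not
fit. Three halvings are needed, giving 2^31 and 2^30.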

llvm-svn: 200262
2014-01-27 23:39:03 +00:00
Chandler Carruth
f70ef7ae29 [vectorize] Initial version of respecting PGO in the vectorizer: treat
cold loops as-if they were being optimized for size.

Nothing fancy here. Simple test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.

The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:

1) Don't disable the entire pass when the target is lacking vector
   registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
   non-if-converted loops. This is trivial for the unroller but hard for
   the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
   various places that make cost tradeoffs (very likely only the
   unroller makes sense here, and then only when dealing with loops that
   are small enough for unrolling to not completely blow out the LSD).

I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.

One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.

llvm-svn: 200219
2014-01-27 13:11:50 +00:00
Benjamin Kramer
65df2371a8 ConstantHoisting: We can't insert instructions directly in front of a PHI node.
Insert before the terminating instruction of the dominating block instead.

llvm-svn: 200218
2014-01-27 13:11:43 +00:00
Chandler Carruth
88d92716dd [vectorizer] Add an override for the target instruction cost and use it
to stabilize a test that really is trying to test generic behavior and
not a specific target's behavior.

llvm-svn: 200215
2014-01-27 11:41:50 +00:00
Chandler Carruth
eb82628ff7 [vectorizer] Simplify code to use existing helpers on the Function
object and fewer pointless variables.

Also, add a clarifying comment and a FIXME because the code which
disables *all* vectorization if we can't use implicit floating point
instructions just makes no sense at all.

llvm-svn: 200214
2014-01-27 11:27:37 +00:00
Chandler Carruth
d1ecfe35ae [vectorizer] Teach the loop vectorizer's unroller to only unroll by
powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.

Enhance a test case to actually exercise more of the unroll machinery if
using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).
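
One way to restrict a computed unroll factor to powers of two is to round it
down. The sketch below assumes rounding down rather than up, and
floorPowerOfTwo is a hypothetical helper name; the commit message does not say
which direction the patch rounds.

```cpp
#include <cassert>

// Hypothetical helper: largest power of two that is <= n, for n >= 1.
unsigned floorPowerOfTwo(unsigned n) {
  assert(n >= 1 && "unroll factor must be at least 1");
  unsigned p = 1;
  // Comparing against n / 2 avoids overflow when n is close to UINT_MAX.
  while (p <= n / 2)
    p *= 2;
  return p;
}
```

Under this assumption a computed factor of 3 becomes 2, and factors that are
already powers of two are left unchanged.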

While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.

llvm-svn: 200213
2014-01-27 11:12:24 +00:00
Chandler Carruth
bdbe34a1a1 [vectorizer] Add some flags which are useful for conducting experiments
with the unrolling behavior in the loop vectorizer. No functionality
changed at this point.

These are a bit hack-y, but talking with Hal, there doesn't seem to be
a cleaner way to easily experiment with different thresholds here and he
was also interested in them so I wanted to commit them. Suggestions for
improvement are very welcome here.

llvm-svn: 200212
2014-01-27 11:12:19 +00:00
Chandler Carruth
dd6cf9494b [vectorizer] Fix a trivial oversight where we always requested the
number of vector registers rather than toggling between vector and
scalar register number based on VF. I don't have a test case as
I spotted this by inspection and on X86 it only makes a difference if
your target is lacking SSE and thus has *no* vector registers.

If someone wants to add a test case for this for ARM or somewhere else
where this is more significant, that would be awesome.

Also made the variable name a bit more sensible while I'm here.

llvm-svn: 200211
2014-01-27 11:12:14 +00:00
Chandler Carruth
a89deb11ba [vectorizer] Clean up the handling of unvectorized loop unrolling in the
LoopVectorize pass.

The logic here doesn't make much sense. We *only* unrolled if the
unvectorized loop was a reduction loop with a single basic block *and*
small loop body. The reduction part in particular doesn't make much
sense. Instead, if we just fall through to the vectorized unroll logic
it makes the more sensible decision to unroll if there is a vectorized reduction that
could be hacked on by the SLP vectorizer *or* if the loop is small.

This is mostly a cleanup and nothing in the test suite really exercises
this, but I did run benchmarks across this change and saw no really
significant changes.

llvm-svn: 200198
2014-01-27 08:17:58 +00:00
Chandler Carruth
4fb3e5831e [LPM] Conclude my immediate work by making the LoopVectorizer
a FunctionPass. With this change the loop vectorizer no longer is a loop
pass and can readily depend on function analyses. In particular, with
this change we no longer have to form a loop pass manager to run the
loop vectorizer which simplifies the entire pass management of LLVM.

The next step here is to teach the loop vectorizer to leverage profile
information through the profile information providing analysis passes.

llvm-svn: 200074
2014-01-25 10:01:55 +00:00
Chandler Carruth
3998de34a0 [LPM] Make LCSSA a utility with a FunctionPass that applies it to all
the loops in a function, and teach LICM to work in the presence of
LCSSA.

Previously, LCSSA was a loop pass. That made passes requiring it also be
loop passes and unable to depend on function analysis passes easily. It
also caused outer loops to have a different "canonical" form from inner
loops during analysis. Instead, we go into LCSSA form and preserve it
through the loop pass manager run.

Note that this has the same problem as LoopSimplify that prevents
enabling its verification -- loop passes which run at the end of the loop
pass manager and don't preserve these are valid, but the subsequent loop
pass runs of outer loops that do preserve this pass trigger too much
verification and fail because the inner loop no longer verifies.

The other problem this exposed is that LICM was completely unable to
handle LCSSA form. It didn't preserve it and it actually would give up
on moving instructions in many cases when they were used by an LCSSA phi
node. I've taught LICM to support detecting LCSSA-form PHI nodes and to
hoist and sink around them. This may actually let LICM fire
significantly more because we put everything into LCSSA form to rotate
the loop before running LICM. =/ Now LICM should handle that fine and
preserve it correctly. The down side is that LICM has to require LCSSA
in order to preserve it. This is just a fact of life for LCSSA. It's
entirely possible we should completely remove LCSSA from the optimizer.

The test updates are essentially accommodating LCSSA phi nodes in the
output of LICM, and the fact that we now completely sink every
instruction in ashr-crash below the loop bodies prior to unrolling.

With this change, LCSSA is computed only three times in the pass
pipeline. One of them could be removed (and potentially a SCEV run and
a separate LoopPassManager entirely!) if we had a LoopPass variant of
InstCombine that ran InstCombine on the loop body but refused to combine
away LCSSA PHI nodes. Currently, this also prevents loop unrolling from
being in the same loop pass manager as rotate, LICM, and unswitch.

There is one thing that I *really* don't like -- preserving LCSSA in
LICM is quite expensive. We end up having to re-run LCSSA twice for some
loops after LICM runs because LICM can undo LCSSA both in the current
loop and the parent loop. I don't really see good solutions to this
other than to completely move away from LCSSA and using tools like
SSAUpdater instead.

llvm-svn: 200067
2014-01-25 04:07:24 +00:00
Juergen Ributzka
818bab9511 Revert "Revert "Add Constant Hoisting Pass" (r200034)"
This reverts commit r200058 and adds the using directive for
ARMTargetTransformInfo to silence two g++ overload warnings.

llvm-svn: 200062
2014-01-25 02:02:55 +00:00
Hans Wennborg
e89eb1955d Revert "Add Constant Hoisting Pass" (r200034)
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.

We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemented in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.

llvm-svn: 200058
2014-01-25 01:18:18 +00:00
Juergen Ributzka
45b2cea1c9 Add Constant Hoisting Pass
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.

llvm-svn: 200034
2014-01-24 20:18:00 +00:00
Benjamin Kramer
78991033ac InstCombine: Don't try to use aggregate elements of ConstantExprs.
PR18600.

llvm-svn: 200028
2014-01-24 19:02:37 +00:00
Juergen Ributzka
cd77ee7cf2 Revert "Add Constant Hoisting Pass"
This reverts commit r200022 to unbreak the build bots.

llvm-svn: 200024
2014-01-24 18:40:30 +00:00
Juergen Ributzka
fa4fb4d6a4 Add Constant Hoisting Pass
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.

First it scans all instructions for integer constants and calculates their
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.

If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.

When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.

This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast expressions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)

Reviewed by Eric

llvm-svn: 200022
2014-01-24 18:23:08 +00:00
Alp Toker
1c4b33e8e5 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Chandler Carruth
1a313307e7 [LPM] Fix a logic error in LICM spotted by inspection.
We completely skipped promotion in LICM if the loop has a preheader or
dedicated exits, but not *both*. We hoist if there is a preheader, and
sink if there are dedicated exits, but either hoisting or sinking can
move loop invariant code out of the loop!

I have no idea if this has a practical consequence. If anyone has ideas
for a test case, let me know.

llvm-svn: 199966
2014-01-24 02:24:47 +00:00
Chandler Carruth
d8a6468af8 [cleanup] Use the type-based preservation method rather than a string
literal that bakes a pass name and forces parsing it in the pass
manager.

llvm-svn: 199963
2014-01-24 01:59:49 +00:00
Rafael Espindola
adb277286a Remove tail marker when changing an argument to an alloca.
Argument promotion can replace an argument of a call with an alloca. This
requires clearing the tail marker as it is very likely that the callee is now
using an alloca in the caller.

This fixes pr14710.

llvm-svn: 199909
2014-01-23 17:19:42 +00:00
Chandler Carruth
46bbc995de [LPM] Make LoopSimplify no longer a LoopPass and instead both a utility
function and a FunctionPass.

This has many benefits. The motivating use case was to be able to
compute function analysis passes *after* running LoopSimplify (to avoid
invalidating them) and then to run other passes which require
LoopSimplify. Specifically passes like unrolling and vectorization are
critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so
that they can be profile aware. For the LoopVectorize pass the only
things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify
and LCSSA is next on my list.

There are also a bunch of other benefits of doing this:
- It is now very feasible to make more passes *preserve* LoopSimplify
  because they can simply run it after changing a loop. Because
  subsequent passes can assume LoopSimplify is preserved we can reduce
  the runs of this pass to the times when we actually mutate a loop
  structure.
- The new pass manager should be able to more easily support loop passes
  factored in this way.
- We can at long, long last observe that LoopSimplify is preserved
  across SCEV. This *halves* the number of times we run LoopSimplify!!!

Now, getting here wasn't trivial. First off, the interfaces used by
LoopSimplify are all over the map regarding how analyses are updated. We
end up with weird "pass" parameters as a consequence. I'll try to clean
at least some of this up later -- I'll have to have it all clean for the
new pass manager.

Next up I discovered a really frustrating bug. LoopUnroll *claims* to
preserve LoopSimplify. That's actually a lie. But the way the
LoopPassManager ends up running the passes, it always ran LoopSimplify
on the unrolled-into loop, rectifying this oversight before any
verification could kick in and point out that in fact nothing was
preserved. So I've added code to the unroller to *actually* simplify the
surrounding loop when it succeeds at unrolling.

The only functional change in the test suite is that we now catch a case
that was previously missed because SCEV and other loop transforms see
their containing loops as simplified and thus don't miss some
opportunities. One test case has been converted to check that we catch
this case rather than checking that we miss it but at least don't get
the wrong answer.

Note that I have #if-ed out all of the verification logic in
LoopSimplify! This is a temporary workaround while extracting these bits
from the LoopPassManager. Currently, there is no way to have a pass in
the LoopPassManager which preserves LoopSimplify along with one which
does not. The LPM will try to verify on each loop in the nest that
LoopSimplify holds but the now-Function-pass cannot distinguish what
loop is being verified and so must try to verify all of them. The inner
most loop is clearly no longer simplified as there is a pass which
didn't even *attempt* to preserve it. =/ Once I get LCSSA out (and maybe
LoopVectorize and some other fixes) I'll be able to re-enable this check
and catch any places where we are still failing to preserve
LoopSimplify. If this causes problems I can back this out and try to
commit *all* of this at once, but so far this seems to work and allow
much more incremental progress.

llvm-svn: 199884
2014-01-23 11:23:19 +00:00
Matt Arsenault
52e557deb2 Handle an addrspacecast case in memcpyopt
llvm-svn: 199836
2014-01-22 21:53:19 +00:00
Tim Northover
8a4cb5ce31 Loop strength reduce: fix function name.
llvm-svn: 199801
2014-01-22 13:27:00 +00:00
Chandler Carruth
e90b399e43 [SROA] Fix a bug which could cause the common type finding to return
inconsistent results for different orderings of alloca slices. The
fundamental issue is that it is just always a mistake to return early
from this function. There is no effective early exit to leverage. This
patch stops trying to do so and simplifies the code a bit as
a consequence.

Original diagnosis and patch by James Molloy with some name tweaks by me
in part reflecting feedback from Duncan Smith on the mailing list.

llvm-svn: 199771
2014-01-21 23:16:05 +00:00
Owen Anderson
e0205fdcd8 Fix all the remaining lost-fast-math-flags bugs I've been able to find. The most important of these are cases in the generic logic for combining BinaryOperators.
This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd.

llvm-svn: 199629
2014-01-20 07:44:53 +00:00
Benjamin Kramer
813eb189fa InstCombine: Modernize a bunch of cast combines.
Also make them vector-aware.

llvm-svn: 199608
2014-01-19 20:05:13 +00:00
Benjamin Kramer
319cbf6707 InstCombine: Hoist 3 copies of AddOne/SubOne into a header.
llvm-svn: 199605
2014-01-19 16:56:10 +00:00
Benjamin Kramer
47d4c4c113 InstCombine: Replace a hand-rolled version of isKnownToBeAPowerOfTwo with the real thing.
llvm-svn: 199604
2014-01-19 16:48:41 +00:00
Benjamin Kramer
0de38fdc6a InstCombine: Teach most integer add/sub/mul/div combines how to deal with vectors.
llvm-svn: 199602
2014-01-19 15:24:22 +00:00
Benjamin Kramer
b864b5d907 InstCombine: Refactor fmul/fdiv combines to handle vectors.
llvm-svn: 199598
2014-01-19 13:36:27 +00:00
Chandler Carruth
8b7504e0a3 Fix a really nasty SROA bug with how we handled out-of-bounds memcpy
intrinsics.

Reported on the list by Evan with a couple of attempts to fix, but it
took a while to dig down to the root cause. There are two overlapping
bugs here, both centering around the circumstance of discovering
a memcpy operand which is known to be completely outside the bounds of
the alloca.

First, we need to kill the *other* side of the memcpy if it was added to
this alloca. Otherwise we'll factor it into our slicing and try to
rewrite it even though we know for a fact that it is dead. This is made
more tricky because we can visit the sides in either order. So we have
to both kill the other side and skip instructions marked as dead. The
latter really should be goodness in every case, but here is a matter of
correctness.

Second, we need to actually remove the *uses* of the alloca by the
memcpy when queuing it for later deletion. Otherwise it may still be
using the alloca when we go to promote it (if the rewrite re-uses the
existing alloca instruction). Do this by factoring out the
use-clobbering used for nixing a Phi argument and re-using it
across the operands of a to-be-deleted instruction.

llvm-svn: 199590
2014-01-19 12:16:54 +00:00
Arnold Schwaighofer
2c67b7dc58 LoopVectorizer: A reduction that has multiple uses of the reduction value is not
a reduction.

Really. Under certain circumstances (the use list of an instruction has to be
set up right - hence the extra pass in the test case) we would not recognize
when a value in a potential reduction cycle was used multiple times by the
reduction cycle.

Fixes PR18526.
radar://15851149

llvm-svn: 199570
2014-01-19 03:18:31 +00:00
Nick Lewycky
f31f7a5863 Don't refuse to transform constexpr(call(arg, ...)) to call(constexpr(arg), ...) just because the function has multiple return values even if their return types are the same. Patch by Eduard Burtescu!
llvm-svn: 199564
2014-01-18 22:47:12 +00:00
Benjamin Kramer
ace2801d74 InstCombine: Make the (fmul X, -1.0) -> (fsub -0.0, X) transform handle vectors too.
PR18532.

llvm-svn: 199553
2014-01-18 16:43:14 +00:00
Owen Anderson
8750294bae Fix more instances of dropped fast math flags when optimizing FADD instructions. All found by inspection (aka grep).
llvm-svn: 199528
2014-01-18 00:48:14 +00:00
Kostya Serebryany
88b5111b60 [asan] extend asan-coverage (still experimental).
- Add a mode for collecting per-block coverage (-asan-coverage=2).
  So far the implementation is naive (all blocks are instrumented),
  the performance overhead on top of asan could be as high as 30%.
- Make sure the one-time calls to __sanitizer_cov are moved to function bottom,
  which in turn required copying the original debug info into the call insn.

Here is the performance data on SPEC 2006
(train data, comparing asan with asan-coverage={0,1,2}):

                             asan+cov0     asan+cov1      diff 0-1    asan+cov2       diff 0-2      diff 1-2
       400.perlbench,        65.60,        65.80,         1.00,        76.20,         1.16,         1.16
           401.bzip2,        65.10,        65.50,         1.01,        75.90,         1.17,         1.16
             403.gcc,         1.64,         1.69,         1.03,         2.04,         1.24,         1.21
             429.mcf,        21.90,        22.60,         1.03,        23.20,         1.06,         1.03
           445.gobmk,       166.00,       169.00,         1.02,       205.00,         1.23,         1.21
           456.hmmer,        88.30,        87.90,         1.00,        91.00,         1.03,         1.04
           458.sjeng,       210.00,       222.00,         1.06,       258.00,         1.23,         1.16
      462.libquantum,         1.73,         1.75,         1.01,         2.11,         1.22,         1.21
         464.h264ref,       147.00,       152.00,         1.03,       160.00,         1.09,         1.05
         471.omnetpp,       115.00,       116.00,         1.01,       140.00,         1.22,         1.21
           473.astar,       133.00,       131.00,         0.98,       142.00,         1.07,         1.08
       483.xalancbmk,       118.00,       120.00,         1.02,       154.00,         1.31,         1.28
            433.milc,        19.80,        20.00,         1.01,        20.10,         1.02,         1.01
            444.namd,        16.20,        16.20,         1.00,        17.60,         1.09,         1.09
          447.dealII,        41.80,        42.20,         1.01,        43.50,         1.04,         1.03
          450.soplex,         7.51,         7.82,         1.04,         8.25,         1.10,         1.05
          453.povray,        14.00,        14.40,         1.03,        15.80,         1.13,         1.10
             470.lbm,        33.30,        34.10,         1.02,        34.10,         1.02,         1.00
         482.sphinx3,        12.40,        12.30,         0.99,        13.00,         1.05,         1.06

llvm-svn: 199488
2014-01-17 11:00:30 +00:00
Quentin Colombet
b42dbc5117 [opt][PassInfo] Allow opt to run passes that need target machine.
When registering a pass, a pass can now specify a second constructor that takes as
argument a pointer to TargetMachine.
The PassInfo class has been updated to reflect that possibility.
If such a constructor exists opt will use it instead of the default constructor
when instantiating the pass.

Since such IR passes are supposed to be rare, no specific support has been
added to this commit to allow an easy registration of such a pass.
In other words, for such pass, the initialization function has to be
hand-written (see CodeGenPrepare for instance).

Now, codegenprepare can be tested using opt:
opt -codegenprepare -mtriple=mytriple input.ll

llvm-svn: 199430
2014-01-16 21:44:34 +00:00
Owen Anderson
9c1a615059 Fix two cases where we could lose fast math flags when optimizing FADD expressions.
llvm-svn: 199427
2014-01-16 21:26:02 +00:00
Owen Anderson
dbdd830886 Fix an instance where we would drop fast math flags when performing an fdiv to reciprocal multiply transformation.
llvm-svn: 199425
2014-01-16 21:07:52 +00:00
Owen Anderson
2c40c9a6c0 Fix a bug in InstCombine where we failed to preserve fast math flags when optimizing an FMUL expression.
llvm-svn: 199424
2014-01-16 20:59:41 +00:00
Owen Anderson
a218b5b798 Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which LLVM expresses as (fsub -0.0, X).
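
Why the negation is written as a subtraction from -0.0 rather than from 0.0 can be checked with ordinary IEEE doubles. This is a minimal sketch in plain C++, not InstCombine code; the three function names are hypothetical and assume default (non-fast-math) floating-point semantics.

```cpp
#include <cmath>

// The original expression: multiply by -1.0.
double viaFmul(double X) { return X * -1.0; }

// The (fsub -0.0, X) form: agrees with viaFmul, including the sign of zero.
double viaFsub(double X) { return -0.0 - X; }

// Subtracting from +0.0 is not equivalent: for X == +0.0 it yields +0.0,
// whereas X * -1.0 yields -0.0.
double viaNaiveSub(double X) { return 0.0 - X; }
```

With X == +0.0, std::signbit reports a negative zero for the first two functions and a positive zero for the third.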
llvm-svn: 199420
2014-01-16 20:36:42 +00:00
Evgeniy Stepanov
5b1a672532 [asan] Remove -fsanitize-address-zero-base-shadow command line
flag from clang, and disable zero-base shadow support on all platforms
where it is not the default behavior.

- It is completely unused, as far as we know.
- It is ABI-incompatible with non-zero-base shadow, which means all
objects in a process must be built with the same setting. Failing to
do so results in a segmentation fault at runtime.
- It introduces a backward dependency of compiler-rt on user code,
which is uncommon and complicates testing.

This is the LLVM part of a larger change.

llvm-svn: 199371
2014-01-16 10:19:12 +00:00
Hans Wennborg
efa9ef0e63 Switch-to-lookup tables: set threshold to 3 cases
There has been an old FIXME to find the right cut-off for when it's worth
analyzing and potentially transforming a switch to a lookup table.

The switches always have two or more cases. I could not measure any speed-up
by transforming a switch with two cases. A switch with three cases gets a nice
speed-up, and I couldn't measure any compile-time regression, so I think this
is the right threshold.

In a Clang self-host, this causes 480 new switches to be transformed,
and reduces the final binary size by 8 KB.

llvm-svn: 199294
2014-01-15 05:00:27 +00:00
Arnold Schwaighofer
9fb94754bd LoopVectorize: Only strip casts from integer types when replacing symbolic
strides

Fixes PR18480.

llvm-svn: 199291
2014-01-15 03:35:46 +00:00
Matt Arsenault
babc737d7b Do pointer cast simplifications on addrspacecast
llvm-svn: 199254
2014-01-14 20:00:45 +00:00
Matt Arsenault
a5adc47c53 Remove a check for an illegal condition.
Bitcasts can't be between address spaces anymore.

llvm-svn: 199253
2014-01-14 19:56:57 +00:00
Matt Arsenault
50ba8b89a7 Make nocapture analysis work with addrspacecast
llvm-svn: 199246
2014-01-14 19:11:52 +00:00
Duncan P. N. Exon Smith
bb847bd59e Reapply "LTO: add API to set strategy for -internalize"
Reapply r199191, reverted in r199197 because it carelessly broke
Other/link-opts.ll.  The problem was that calling
createInternalizePass("main") would select
createInternalizePass(bool("main")) instead of
createInternalizePass(ArrayRef<const char *>("main")).  This commit
fixes the bug.
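
The overload pitfall described above can be reproduced in isolation. A minimal sketch, assuming two hypothetical overloads named pick() in place of the real createInternalizePass() overloads, and a hypothetical Names struct standing in for ArrayRef<const char *>: a string literal reaches bool through standard conversions only, which outranks the user-defined conversion to the wrapper type.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for ArrayRef<const char *>: implicitly constructible
// from a single element, as the real class is.
struct Names {
  const char *const *Data;
  std::size_t Size;
  Names(const char *const &One) : Data(&One), Size(1) {}
};

std::string pick(bool) { return "bool"; }
std::string pick(Names) { return "names"; }

// Array-to-pointer plus pointer-to-bool is a standard conversion sequence,
// so the bool overload wins over the converting constructor.
std::string callWithLiteral() { return pick("main"); }

// Naming the wrapper type explicitly selects the intended overload.
std::string callExplicit() {
  const char *Name = "main";
  return pick(Names(Name));
}
```

Here callWithLiteral() ends up in the bool overload and callExplicit() in the Names overload.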

The original commit message follows.

Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.

This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker.  This puts the onus on the
linker to decide whether (and what) to internalize.

In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.

This patch enables three strategies:

- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
  visibility.

LTO_INTERNALIZE_FULL should be used when linking an executable.

Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized.  E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.
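
The guidance above could be encoded by a linker roughly as follows. This is a sketch only: the enum is a local stand-in that reuses the three names from this message (it is not the declaration from the real LTO header), and chooseStrategy() with its two parameters is hypothetical.

```cpp
// Local stand-in mirroring the three strategy names in this commit message.
enum InternalizeStrategy {
  LTO_INTERNALIZE_FULL,
  LTO_INTERNALIZE_NONE,
  LTO_INTERNALIZE_HIDDEN
};

// Hypothetical helper: picks a strategy from what the linker is producing.
InternalizeStrategy chooseStrategy(bool RelocatableOutput,
                                   bool KeepPrivateExterns) {
  if (!RelocatableOutput)
    return LTO_INTERNALIZE_FULL;   // linking an executable
  if (KeepPrivateExterns)
    return LTO_INTERNALIZE_NONE;   // ld -r -keep_private_externs
  return LTO_INTERNALIZE_HIDDEN;   // ld -r without -keep_private_externs
}
```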

lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().

<rdar://problem/14334895>

llvm-svn: 199244
2014-01-14 18:52:17 +00:00
Nico Rieck
964a13bb4e Decouple dllexport/dllimport from linkage
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199218
2014-01-14 15:22:47 +00:00
Nico Rieck
e8a579c6bc Revert "Decouple dllexport/dllimport from linkage"
Revert this for now until I fix an issue in Clang with it.

This reverts commit r199204.

llvm-svn: 199207
2014-01-14 12:38:32 +00:00
Nico Rieck
6203d44313 Decouple dllexport/dllimport from linkage
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199204
2014-01-14 11:55:03 +00:00
NAKAMURA Takumi
068c8352f7 Revert r199191, "LTO: add API to set strategy for -internalize"
Please also update Other/link-opts.ll next time.

llvm-svn: 199197
2014-01-14 09:40:18 +00:00
Duncan P. N. Exon Smith
95dadb39e4 LTO: add API to set strategy for -internalize
Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.

This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker.  This puts the onus on the
linker to decide whether (and what) to internalize.

In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.

This patch enables three strategies:

- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
  visibility.

LTO_INTERNALIZE_FULL should be used when linking an executable.

Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized.  E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.

lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().

<rdar://problem/14334895>

llvm-svn: 199191
2014-01-14 06:37:26 +00:00
Chandler Carruth
98adff6224 [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! midway through some other pass's run !!!) to
directly recomputing the domtree.

llvm-svn: 199104
2014-01-13 13:07:17 +00:00
Chandler Carruth
59e885531a [PM] Pull the generic graph algorithms and data structures for dominator
trees into the Support library.

These are all expressed in terms of the generic GraphTraits and CFG,
with no reliance on any concrete IR types. Putting them in support
clarifies that and makes the fact that the static analyzer in Clang uses
them much more sane. When moving the Dominators.h file into the IR
library I claimed that this was the right home for it but not something
I planned to work on. Oops.

So why am I doing this? It happens to be one step toward breaking the
requirement that IR verification can only be performed from inside of
a pass context, which completely blocks the implementation of
verification for the new pass manager infrastructure. Fixing it will
also allow removing the concept of the "preverify" step (WTF???) and
allow the verifier to cleanly flag functions which fail verification in
a way that precludes even computing dominance information. Currently,
that results in a fatal error even when you ask the verifier to not
fatally error. It's awesome like that.

The yak shaving will continue...

llvm-svn: 199095
2014-01-13 10:52:56 +00:00
Chandler Carruth
ee051af6e2 [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that even Clang's CFGBlocks can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082
2014-01-13 09:26:24 +00:00
Chandler Carruth
03b6c941a3 Re-sort #include lines again, prior to moving headers around.
llvm-svn: 199080
2014-01-13 08:04:33 +00:00
Hans Wennborg
f5c5f6e123 Switch-to-lookup tables: Don't require a result for the default
case when the lookup table doesn't have any holes.

This means we can build a lookup table for switches like this:

  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: exit(1);
  }

The default case doesn't yield a constant result here, but that doesn't matter,
since a default result is only necessary for filling holes in the lookup table,
and this table doesn't have any holes.

This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
off the resulting clang binary.
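
For illustration, the switch above and a hand-written approximation of its lookup-table form are shown side by side. This is a sketch at the source level, not the IR the pass emits; the names viaSwitch, viaTable, and Table are hypothetical, and x is assumed to be unsigned.

```cpp
#include <cstdlib>

// The switch from this commit message.
int viaSwitch(unsigned x) {
  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: std::exit(1);
  }
}

// Approximate table form: the four cases are contiguous, so every table slot
// is filled by a real case and no constant default result is needed. The
// default path is only reached through the range check.
int viaTable(unsigned x) {
  static const int Table[4] = {1, 2, 3, 4};
  if (x < 4)
    return Table[x];
  std::exit(1);
}
```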

llvm-svn: 199025
2014-01-12 00:44:41 +00:00
Arnold Schwaighofer
15e9d90974 LoopVectorizer: Enable strided memory accesses versioning per default
I saw no compile or execution time regressions on x86_64 -mavx -O3.

radar://13075509

llvm-svn: 199015
2014-01-11 20:40:34 +00:00
NAKAMURA Takumi
fbff75f61d LoopVectorize.cpp: Appease MSC16.
Excuse me, I hope msc16 builders would be fine till its end day.
Introduce nullptr then. ;)

llvm-svn: 199001
2014-01-11 09:59:27 +00:00
Diego Novillo
f47aa4d47f Extend and simplify the sample profile input file.
1- Use the line_iterator class to read profile files.

2- Allow comments in profile file. Lines starting with '#'
   are completely ignored while reading the profile.

3- Add parsing support for discriminators and indirect call samples.

   Our external profiler can emit more profile information that we are
   currently not handling. This patch does not add new functionality to
   support this information, but it allows profile files to provide it.

   I will add actual support later on (for at least one of these
   features, I need support for DWARF discriminators in Clang).

   A sample line may contain the following additional information:

   Discriminator. This is used if the sampled program was compiled with
   DWARF discriminator support
   (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
   is currently only emitted by GCC and we just ignore it.

   Potential call targets and samples. If present, this line contains a
   call instruction. This models both direct and indirect calls. Each
   called target is listed together with the number of samples. For
   example,

                    130: 7  foo:3  bar:2  baz:7

   The above means that at relative line offset 130 there is a call
   instruction that calls one of foo(), bar() and baz(), with baz()
   being the relatively more frequent call target.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2355

4- Simplify format of profile input file.

   This implements earlier suggestions to simplify the format of the
   sample profile file. The symbol table is not necessary and function
   profiles do not need to know the number of samples in advance.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2419
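
A sample line of the shape shown in item 3 could be read along these lines. This is a minimal sketch and not the reader added by this patch (which uses line_iterator); the SampleLine struct and parseSampleLine() are hypothetical, and only the example line format and '#' comments are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one line such as "130: 7  foo:3  bar:2  baz:7".
struct SampleLine {
  unsigned Offset = 0;          // line offset relative to the function header
  std::uint64_t Samples = 0;    // samples collected at that offset
  std::vector<std::pair<std::string, std::uint64_t>> Targets; // callee:count
};

// Returns false for empty lines, '#' comments, and malformed lines.
bool parseSampleLine(const std::string &Line, SampleLine &Out) {
  if (Line.empty() || Line[0] == '#')
    return false;
  Out.Targets.clear();
  std::istringstream In(Line);
  char Colon = 0;
  if (!(In >> Out.Offset >> Colon >> Out.Samples) || Colon != ':')
    return false;
  std::string Tok;
  while (In >> Tok) {
    std::size_t Sep = Tok.rfind(':');
    if (Sep == std::string::npos || Sep == 0)
      return false;
    std::string Count = Tok.substr(Sep + 1);
    if (Count.empty() ||
        Count.find_first_not_of("0123456789") != std::string::npos)
      return false;
    Out.Targets.emplace_back(Tok.substr(0, Sep), std::stoull(Count));
  }
  return true;
}
```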

llvm-svn: 198973
2014-01-10 23:23:51 +00:00
Diego Novillo
9e8454b3fe Propagation of profile samples through the CFG.
This adds a propagation heuristic to convert instruction samples
into branch weights. It implements a similar heuristic to the one
implemented by Dehao Chen on GCC.

The propagation proceeds in 3 phases:

1- Assignment of block weights. All the basic blocks in the function
   are initially assigned the same weight as their most frequently
   executed instruction.

2- Creation of equivalence classes. Since samples may be missing from
   blocks, we can fill in the gaps by setting the weights of all the
   blocks in the same equivalence class to the same weight. To compute
   the concept of equivalence, we use dominance and loop information.
   Two blocks B1 and B2 are in the same equivalence class if B1
   dominates B2, B2 post-dominates B1 and both are in the same loop.

3- Propagation of block weights into edges. This uses a simple
   propagation heuristic. The following rules are applied to every
   block B in the CFG:

   - If B has a single predecessor/successor, then the weight
     of that edge is the weight of the block.

   - If all the edges are known except one, and the weight of the
     block is already known, the weight of the unknown edge will
     be the weight of the block minus the sum of all the known
     edges. If the sum of all the known edges is larger than B's weight,
     we set the unknown edge weight to zero.

   - If there is a self-referential edge, and the weight of the block is
     known, the weight for that edge is set to the weight of the block
     minus the weight of the other incoming edges to that block (if
     known).
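
The second rule in phase 3 amounts to a clamped subtraction. A minimal sketch, assuming weights are plain 64-bit counters; inferUnknownEdge() is a hypothetical name and not a function from the pass.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Weight for the single unknown edge of a block whose own weight is known:
// block weight minus the sum of the known edges, clamped at zero when the
// known edges already exceed the block weight.
std::uint64_t inferUnknownEdge(std::uint64_t BlockWeight,
                               const std::vector<std::uint64_t> &KnownEdges) {
  std::uint64_t Sum = std::accumulate(KnownEdges.begin(), KnownEdges.end(),
                                      std::uint64_t{0});
  return Sum > BlockWeight ? 0 : BlockWeight - Sum;
}
```

With no known edges the result is the block weight itself, which matches the first rule for a block with a single predecessor or successor.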

Since this propagation is not guaranteed to finalize for every CFG, we
only allow it to proceed for a limited number of iterations (controlled
by -sample-profile-max-propagate-iterations). It currently uses the same
GCC default of 100.

Before propagation starts, the pass builds (for each block) a list of
unique predecessors and successors. This is necessary to handle
identical edges in multiway branches. Since we visit all blocks and all
edges of the CFG, it is cleaner to build these lists once at the start
of the pass.

Finally, the patch fixes the computation of relative line locations.
The profiler emits lines relative to the function header. To discover
it, we traverse the compilation unit looking for the subprogram
corresponding to the function. The line number of that subprogram is the
line where the function begins. That becomes line zero for all the
relative locations.

llvm-svn: 198972
2014-01-10 23:23:46 +00:00
Arnold Schwaighofer
702d83d3d8 LoopVectorizer: Handle strided memory accesses by versioning
for (i = 0; i < N; ++i)
   A[i * Stride1] += B[i * Stride2];

We take loops like this and check that the symbolic strides 'Stride1/2' are one
and drop to the scalar loop if they are not.

This is currently disabled by default and hidden behind the flag
'enable-mem-access-versioning'.
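
At the source level, the versioning can be pictured as below. This is a hand-written sketch of the idea, not the vectorizer's output; addStrided() and its parameter types are hypothetical, and the other runtime checks the vectorizer performs (such as for aliasing) are left out.

```cpp
#include <cstddef>

void addStrided(int *A, const int *B, std::size_t N,
                std::size_t Stride1, std::size_t Stride2) {
  if (Stride1 == 1 && Stride2 == 1) {
    // Versioned loop: both accesses are contiguous, so this copy can be
    // vectorized.
    for (std::size_t i = 0; i < N; ++i)
      A[i] += B[i];
  } else {
    // Fallback: the original scalar loop, taken when either stride is not one.
    for (std::size_t i = 0; i < N; ++i)
      A[i * Stride1] += B[i * Stride2];
  }
}
```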

radar://13075509

llvm-svn: 198950
2014-01-10 18:20:32 +00:00
Chandler Carruth
53468087f3 Put the functionality for printing a value to a raw_ostream as an
operand into the Value interface just like the core print method is.
That gives a more consistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.

This removes the 'Writer.h' header which contained only a single function
declaration.

llvm-svn: 198836
2014-01-09 02:29:41 +00:00
Hao Liu
8c08e05c81 Fix a bug about generating undef operand when optimising shuffle vector and insert element in instruction combine.
llvm-svn: 198730
2014-01-08 03:06:15 +00:00
Chandler Carruth
7aa902a488 Move the LLVM IR asm writer header files into the IR directory, as they
are part of the core IR library in order to support dumping and other
basic functionality.

Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left there -- printing has been
in the core IR library for quite some time.

Update all of the #includes to match.

All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.

llvm-svn: 198688
2014-01-07 12:34:26 +00:00
Chandler Carruth
87f14b4eec Re-sort all of the includes with ./utils/sort_includes.py so that
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.

Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.

llvm-svn: 198685
2014-01-07 11:48:04 +00:00
Andrew Trick
bb6ce38639 Reapply r198654 "indvars: sink truncates outside the loop."
This doesn't seem to have actually broken anything. It was paranoia
on my part. Trying again now that bots are more stable.

This is a follow up of the r198338 commit that added truncates for
lcssa phi nodes. Sinking the truncates below the phis cleans up the
loop and simplifies subsequent analysis within the indvars pass.

llvm-svn: 198678
2014-01-07 06:59:12 +00:00
Andrew Trick
6d854ef50f Revert "indvars: sink truncates outside the loop."
This reverts commit r198654.

One of the bots reported a SciMark failure.

llvm-svn: 198659
2014-01-07 01:50:58 +00:00
Andrew Trick
7621f7c6a3 indvars: sink truncates outside the loop.
This is a follow up of the r198338 commit that added truncates for
lcssa phi nodes. Sinking the truncates below the phis cleans up the
loop and simplifies subsequent analysis within the indvars pass.

llvm-svn: 198654
2014-01-07 01:02:55 +00:00
Andrew Trick
7236fefab6 80 col. comment.
llvm-svn: 198653
2014-01-07 01:02:52 +00:00
Andrew Trick
12dfc32452 Reapply r198478 "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things."
Now with a fix for PR18384: ValueHandleBase::ValueIsDeleted.

We need to invalidate SCEV's loop info when we delete a block, even if no values are hoisted.

llvm-svn: 198631
2014-01-06 19:43:14 +00:00
Alp Toker
b20c031b7a Add missed cleanup from r198456
All other uses of this macro in LLVM/clang have been moved to the function
definition so follow suit (and the usage advice) here too for consistency.

llvm-svn: 198516
2014-01-04 22:47:48 +00:00
Alp Toker
2d17611e90 Revert "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things."
This commit was the source of crasher PR18384:

While deleting: label %for.cond127
An asserting value handle still pointed to this value!
UNREACHABLE executed at llvm/lib/IR/Value.cpp:671!

Reverting to get the builders green, feel free to re-land after fixing up.
(Renato has a handy isolated repro if you need it.)

This reverts commit r198478.

llvm-svn: 198503
2014-01-04 17:00:45 +00:00
Andrew Trick
45ef495b91 Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things.
getSCEV for an ashr instruction creates an intermediate zext
expression when it truncates its operand.

The operand is initially inside the loop, so the narrow zext
expression has a non-loop-invariant loop disposition.

LoopSimplify then runs on an outer loop, hoists the ashr operand, and
properly invalidates the SCEVs that are mapped to the value.

The SCEV expression for the ashr is now an AddRec with the hoisted
value as the now loop-invariant start value.

The LoopDisposition of this wide value was properly invalidated during
LoopSimplify.

However, if we later get the ashr SCEV again, we again try to create
the intermediate zext expression. We get the same SCEV that we did
earlier, and it is still cached because it was never mapped to a
Value. When we try to create a new AddRec we abort because we're using
the old non-loop-invariant LoopDisposition.

I don't have a solution for this other than to clear LoopDisposition
when LoopSimplify hoists things.

I think the long-term strategy should be to perform LoopSimplify on
all loops before computing SCEV and before running any loop opts on
individual loops. It's possible we may want to rerun LoopSimplify on
individual loops, but it should rarely do anything, so rarely require
invalidating SCEV.

llvm-svn: 198478
2014-01-04 05:52:49 +00:00
Nico Weber
7e53ec0698 Add a LLVM_DUMP_METHOD macro.
The motivation is to mark dump methods as used in debug builds so that they can
be called from lldb, but to not do so in release builds so that they can be
dead-stripped.

There's lots of potential follow-up work suggested in the thread
"Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev,
but everyone seems to agree on this subset.

Macro name chosen by fair coin toss.

llvm-svn: 198456
2014-01-03 22:53:37 +00:00
David Peixotto
2028917754 Fix loop rerolling pass failure with non-constant loop lower bound
The loop rerolling pass was failing with an assertion failure from a
failed cast on loops like this:

  void foo(int *A, int *B, int m, int n) {
    for (int i = m; i < n; i+=4) {
      A[i+0] = B[i+0] * 4;
      A[i+1] = B[i+1] * 4;
      A[i+2] = B[i+2] * 4;
      A[i+3] = B[i+3] * 4;
    }
  }

The code was casting the SCEV-expanded code for the new
induction variable to a phi-node. When the loop had a non-constant
lower bound, the SCEV expander would end the code expansion with an
add instead of a phi node and the cast would fail.

It looks like the cast to a phi node was only needed to get the
induction variable value coming from the backedge to compute the end
of loop condition. This patch changes the loop reroller to compare
the induction variable to the number of times the backedge is taken
instead of the iteration count of the loop. In other words, we stop
the loop when the current value of the induction variable ==
IterationCount-1. Previously, the comparison was comparing the
induction variable value from the next iteration == IterationCount.
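
The effect of rerolling, and the new exit test on the current induction value, can be sketched at the source level. This is an illustration only, not the pass's output: fooRerolled() and Iterations are hypothetical names, Iterations counts the iterations of the rerolled loop, and the arithmetic assumes n - m + 3 does not overflow.

```cpp
// The loop from this commit message.
void foo(int *A, const int *B, int m, int n) {
  for (int i = m; i < n; i += 4) {
    A[i + 0] = B[i + 0] * 4;
    A[i + 1] = B[i + 1] * 4;
    A[i + 2] = B[i + 2] * 4;
    A[i + 3] = B[i + 3] * 4;
  }
}

// Hand-written rerolled form: one element per iteration, four times as many
// iterations. The exit test compares the current induction value against
// Iterations - 1 (the number of times the backedge is taken) rather than
// comparing the next iteration's value against Iterations.
void fooRerolled(int *A, const int *B, int m, int n) {
  if (m >= n)
    return;
  int Iterations = 4 * ((n - m + 3) / 4);
  for (int k = 0;; ++k) {
    A[m + k] = B[m + k] * 4;
    if (k == Iterations - 1)
      break;
  }
}
```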

This problem only seems to occur on 32-bit targets. For some reason,
the loop is not rerolled on 64-bit targets.

PR18290

llvm-svn: 198425
2014-01-03 17:20:01 +00:00
Hal Finkel
df8016f76f Disable compare sinking in CodeGenPrepare when multiple condition registers are available
As noted in the comment above CodeGenPrepare::OptimizeInst, which aggressively
sinks compares to reduce pressure on the condition register(s), for targets
such as PowerPC with multiple condition registers, this may not be the right
thing to do. This adds a HasMultipleConditionRegisters boolean to TLI, and
CodeGenPrepare::OptimizeInst is skipped when HasMultipleConditionRegisters is
true.

This functionality will be used by the PowerPC backend in an upcoming commit.
Especially when the PowerPC backend starts tracking individual condition
register bits as separate allocatable entities (which will happen in this
upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is
significantly suboptimal.

llvm-svn: 198354
2014-01-02 21:13:43 +00:00
Andrew Trick
9bdab3f1b3 indvars: cleanup the IV visitor. It does more than gather sext/zext info.
llvm-svn: 198353
2014-01-02 21:12:11 +00:00
Matt Arsenault
e28f607079 Delete unread globals through addrspacecast
llvm-svn: 198346
2014-01-02 20:01:43 +00:00
Matt Arsenault
090fe5a92a Fix addrspacecast with metadata globals
llvm-svn: 198345
2014-01-02 19:53:49 +00:00
Andrew Trick
5f76ab650f indvars: insert truncate at loop boundary to avoid redundant IVs.
When widening an IV to remove s/zext, we generally try to eliminate
the original narrow IV. However, LCSSA phi nodes outside the loop were
still using the original IV. Clean this up more aggressively to avoid
redundancy in generated code.

llvm-svn: 198338
2014-01-02 19:29:38 +00:00
Nico Weber
10bf32e628 Set LLVM_EXPORTED_SYMBOL_FILE in CMakeLists whose corresponding Makefiles do so.
(unittests/ExecutionEngine/JIT/CMakeLists.txt is still missing for now, since
it handles export files in a strange way: It generates a .exports file from a
.def file instead of the other way round.)

llvm-svn: 198183
2013-12-29 23:06:49 +00:00
Alexander Potapenko
7da398bcae [ASan] Fix the test for __asan_gen_ globals and actually fix http://llvm.org/bugs/show_bug.cgi?id=17976
by setting the correct linkage (as stated in the bug).

llvm-svn: 198018
2013-12-25 16:46:27 +00:00
Alexander Potapenko
53694d2efb [ASan] Make sure none of the __asan_gen_ global strings end up in the symbol table, add a test.
This should fix http://llvm.org/bugs/show_bug.cgi?id=17976
Another test checking for the global variables' locations and prefixes on Darwin will be committed separately.

llvm-svn: 198017
2013-12-25 14:22:15 +00:00
Andrew Trick
e7f9f5556d Add support to indvars for optimizing sadd.with.overflow.
Split sadd.with.overflow into add + sadd.with.overflow to allow
analysis and optimization. This should ideally be done after
InstCombine, which can perform code motion (eventually indvars should
run after all canonical instcombines). We want ISEL to recombine the
add and the check, at least on x86.

This is currently under an option for reducing live induction
variables: -liv-reduce. The next step is reducing liveness of IVs that
are live out of the overflow check paths. Once the related
optimizations are fully developed, reviewed and tested, I do expect
this to become default.

llvm-svn: 197926
2013-12-23 23:31:49 +00:00
Richard Sandiford
f367c783a7 Fix Scalarizer insertion point when replacing PHIs with insertelements
If the Scalarizer scalarized a vector PHI but could not scalarize
all uses of it, it would insert a series of insertelements to reconstruct
the vector PHI value from the scalar ones.  The problem was that it would
emit these insertelements immediately after the PHI, even if there were
other PHIs after it.

llvm-svn: 197909
2013-12-23 14:51:56 +00:00
Richard Sandiford
27fc4a21a8 Fix Scalarizer handling of vector GEPs with multiple index operands
The old code only worked for one index operand.  Also handle "inbounds".

llvm-svn: 197908
2013-12-23 14:45:00 +00:00
Kostya Serebryany
a148c8c9ed [asan] don't unpoison redzones on function exit in use-after-return mode.
Summary:
Before this change the instrumented code before Ret instructions looked like:
  <Unpoison Frame Redzones>
  if (Frame != OriginalFrame) // I.e. Frame is fake
     <Poison Complete Frame>

Now the instrumented code looks like:
  if (Frame != OriginalFrame) // I.e. Frame is fake
     <Poison Complete Frame>
  else
     <Unpoison Frame Redzones>

Reviewers: eugenis

Reviewed By: eugenis

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2458

llvm-svn: 197907
2013-12-23 14:15:08 +00:00
Kostya Serebryany
911683de1d [asan] produce fewer stores when poisoning stack shadow
llvm-svn: 197904
2013-12-23 09:24:36 +00:00
Justin Bogner
3b4e34606e Transforms: Don't create bad weights when eliminating dead cases
If we happen to eliminate every case in a switch that has branch
weights, we currently try to create metadata for the one remaining
branch, triggering an assert. Instead, we need to check that the
metadata we're trying to create is sensible.

llvm-svn: 197791
2013-12-20 08:21:30 +00:00
Kay Tiong Khoo
86f36f1147 Stay classy (and legal) LLVM. Remove links to 3rd party SMT solver whose links may not be permanent.
llvm-svn: 197713
2013-12-19 18:35:54 +00:00
Kay Tiong Khoo
304e305b5c Improved fix for PR17827 (instcombine of shift/and/compare).
This change fixes the case of arithmetic shift right - do not attempt to fold that case.
This change also relaxes the conditions when attempting to fold the logical shift right and shift left cases.

No additional IR-level test cases included at this time. See http://llvm.org/bugs/show_bug.cgi?id=17827 for proofs that these are correct transformations.

llvm-svn: 197705
2013-12-19 18:07:17 +00:00