1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00
Commit Graph

11415 Commits

Author SHA1 Message Date
Nick Lewycky
d7c0e22d5f Improve 'tail' call marking in TRE. A bootstrap of clang goes from 375k calls marked tail in the IR to 470k, however this improvement does not carry into an improvement of the call/jmp ratio on x86. The most common pattern is a tail call + br to a block with nothing but a 'ret'.
The number of tail call to loop conversions remains the same (1618 by my count).

The new algorithm does a local scan over the use-def chains to identify local "alloca-derived" values, as well as points where the alloca could escape. Then, a visit over the CFG marks blocks as being before or after the allocas have escaped, and annotates the calls accordingly.

llvm-svn: 208017
2014-05-05 23:59:03 +00:00
Yi Jiang
8e66fa3904 Reapply: Add slp vectorization to LTO passes. The bug it exposed has been fixed by r207983. <radar://16641956>
llvm-svn: 208013
2014-05-05 23:14:46 +00:00
Yi Jiang
86dccb7e97 Always set alignment of vectorized LD/ST in SLP-Vectorizer. <rdar://problem/16812145>
llvm-svn: 207983
2014-05-05 17:59:14 +00:00
Duncan P. N. Exon Smith
a7513de2e4 LTO: -internalize sets visibility to default
Visibility is meaningless when the linkage is local.  Change
`-internalize` to reset the visibility to `default`.

<rdar://problem/16141113>

llvm-svn: 207979
2014-05-05 17:40:44 +00:00
Timur Iskhodzhanov
16ebb61afc [ASan/Win] Fix issue 305 -- don't instrument .CRT initializer/terminator callbacks
See https://code.google.com/p/address-sanitizer/issues/detail?id=305
Reviewed at http://reviews.llvm.org/D3607

llvm-svn: 207968
2014-05-05 14:28:38 +00:00
Benjamin Kramer
8125aa2cd4 LoopUnroll: If we're doing partial unrolling, use the PartialThreshold to limit unrolling.
Otherwise we use the same threshold as for complete unrolling, which is
way too high. This made us unroll any loop smaller than 150 instructions
by 8 times, but only if someone specified -march=core2 or better,
which happens to be the default on darwin.

llvm-svn: 207940
2014-05-04 19:12:38 +00:00
Arnold Schwaighofer
d752a09f8b SLPVectorizer: Bring back the insertelement patch (r205965) with fixes
When can't assume a vectorized tree is rooted in an instruction. The IRBuilder
could have constant folded it. When we rebuild the build_vector (the series of
InsertElement instructions) use the last original InsertElement instruction. The
vectorized tree root is guaranteed to be before it.

Also, we can't assume that the n-th InsertElement inserts the n-th element into
a vector.

This reverts r207746 which reverted the revert of the revert of r205018 or so.

Fixes the test case in PR19621.

llvm-svn: 207939
2014-05-04 17:10:15 +00:00
Benjamin Kramer
fbeb105fa6 SLPVectorizer: Lazily allocate the map for block numbering.
There is no point in creating it if we're not going to vectorize
anything. Creating the map is expensive as it creates large values.
No functionality change.

llvm-svn: 207916
2014-05-03 15:50:37 +00:00
Karthik Bhat
4591b173c7 Vectorize intrinsic math function calls in SLPVectorizer.
This patch adds support to recognize and vectorize intrinsic math functions in SLPVectorizer.
Review: http://reviews.llvm.org/D3560 and http://reviews.llvm.org/D3559

llvm-svn: 207901
2014-05-03 09:59:54 +00:00
Eric Christopher
b8d435f9a6 Clean up constructor logic and member access for LoopVectorizeHints.
There are public functions that mutate various members as well as
another private member already, so make all the members private to
avoid the discontinuity and add accessors for the values. Should
be no functional change.

llvm-svn: 207868
2014-05-02 20:40:04 +00:00
Nico Weber
fdced47a40 Teach GlobalDCE how to remove empty global_ctor entries.
This moves most of GlobalOpt's constructor optimization
code out of GlobalOpt into Transforms/Utils/CDtorUtils.{h,cpp}. The
public interface is a single function OptimizeGlobalCtorsList() that
takes a predicate returning which constructors to remove.

GlobalOpt calls this with a function that statically evaluates all
constructors, just like it did before. This part of the change is
behavior-preserving.

Also add a call to this from GlobalDCE with a filter that removes global
constructors that contain a "ret" instruction and nothing else – this
fixes PR19590.

llvm-svn: 207856
2014-05-02 18:35:25 +00:00
Akira Hatanaka
f64b00f1d7 [GVN] Pass the phi-translated address of a load instead of the untranslated
address to AnalyzeLoadFromClobberingLoad. This fixes a bug in load-PRE where
PRE is applied to a load that is not partially redundant.

<rdar://problem/16638765>.

llvm-svn: 207853
2014-05-02 17:59:17 +00:00
Nick Lewycky
e4f0d18fe0 Fold strlen(expr ? "str1" : "str2") to x ? len1 : len2. This fires about 330 times in a bootstrap of clang.
llvm-svn: 207828
2014-05-02 04:11:45 +00:00
Benjamin Kramer
040af9d6b4 Update and sort CMakeLists.
llvm-svn: 207785
2014-05-01 18:59:11 +00:00
Eli Bendersky
0602e236ae Add an optimization that does CSE in a group of similar GEPs.
This optimization merges the common part of a group of GEPs, so we can compute
each pointer address by adding a simple offset to the common part.

The optimization is currently only enabled for the NVPTX backend, where it has
a large payoff on some benchmarks.

Review: http://reviews.llvm.org/D3462

Patch by Jingyue Wu.

llvm-svn: 207783
2014-05-01 18:38:36 +00:00
Chandler Carruth
fa8592dc61 Revert r205965, which essentially reverts r205018 for the second time.
=[

Turns out that this was the root cause of PR19621. We found a crasher
only recently (likely due to improvements elsewhere in the SLP
vectorizer) but the reduced test case failed all the way back to here.
I've confirmed that reverting this patch both fixes the reduced test
case in PR19621 and the actual source file that led to it, so it seems
to really be rooted here. I've replied to the commit thread with
discussion of my (feeble) attempts to debug this. Didn't make it very
far, so reverting now that we have a good test case so that things can
get back to healthy while the debugging carries on.

llvm-svn: 207746
2014-05-01 11:24:11 +00:00
Gerolf Hoflehner
aec10987f2 Patch for function cloning to inline all blocks whose address is taken
Not all address taken blocks get inlined. The reason is
that a blocks new address is known only when it is cloned. But e.g.
a branch instruction in a different block could need that address earlier
while it gets cloned. The solution is to collect the set of all
blocks that can potentially get inlined and compute a new block address
up front. Then clone and cleanup.

rdar://16427209

llvm-svn: 207713
2014-04-30 22:05:02 +00:00
Yi Jiang
5faccf9d45 Revert r207571 - Add slp vectorization to LTO passes
llvm-svn: 207693
2014-04-30 19:27:24 +00:00
Carlo Kok
3c208bd22a [IPO/MergeFunctions] changes so it doesn't try to bitcast a struct return type but instead recreates it with insert/extract value.
llvm-svn: 207679
2014-04-30 17:53:04 +00:00
Benjamin Kramer
805d786f28 Add a <tuple> include to more files that aren't getting it transitively on MSVC.
llvm-svn: 207617
2014-04-30 07:21:01 +00:00
NAKAMURA Takumi
45cffbdde3 ConstantHoisting.cpp: Add <tuple> for std::tie, since r207593 removed FileSystem.h, it includes <tuple>.
llvm-svn: 207614
2014-04-30 06:44:50 +00:00
Jim Grosbach
c9daf08519 Tidy up.
llvm-svn: 207585
2014-04-29 22:41:58 +00:00
Jim Grosbach
d4614a7a4c Spelling.
llvm-svn: 207584
2014-04-29 22:41:55 +00:00
Rafael Espindola
f399f03ccc Also handle ConstantAggregateZero when optimizing vpermilvar*.
llvm-svn: 207582
2014-04-29 22:20:40 +00:00
Rafael Espindola
b4468db535 Remove tabs.
Sorry, new machine and I forgot to change the editor setting.

llvm-svn: 207578
2014-04-29 21:02:37 +00:00
Rafael Espindola
7372093fcc Two fixes to the vpermilvar optimization.
The instcomine logic to handle vpermilvar's pd and 256 variants was incorrect.
The _256 variants have indexes into the individual 128 bit lanes and in all
cases it also has to mask out unused bits.

llvm-svn: 207577
2014-04-29 20:41:54 +00:00
Diego Novillo
210bb92cec Fix vectorization remarks.
This patch changes the vectorization remarks to also inform when
vectorization is possible but not beneficial.

Added tests to exercise some loop remarks.

llvm-svn: 207574
2014-04-29 20:06:10 +00:00
Yi Jiang
f658582852 Continue slp vectorization even the BB already has vectorized store radar://16641956
llvm-svn: 207572
2014-04-29 19:37:20 +00:00
Yi Jiang
1c75ac2f65 Add slp vectorization to LTO passes
llvm-svn: 207571
2014-04-29 19:35:39 +00:00
Adam Nemet
109f756317 Reapply r207271 without the testcase
PR19608 was filed to find a suitable testcase.

llvm-svn: 207569
2014-04-29 18:25:28 +00:00
Diego Novillo
81a83e3d17 Add optimization remarks to the loop unroller and vectorizer.
Summary:
This calls emitOptimizationRemark from the loop unroller and vectorizer
at the point where they make a positive transformation. For the
vectorizer, it reports vectorization and interleave factors. For the
loop unroller, it reports all the different supported types of
unrolling.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3456

llvm-svn: 207528
2014-04-29 14:27:31 +00:00
Zinovy Nis
26320a9150 [BUG] Fix -Wunused-variable warning in Release mode. Thnx to Kostya Serebryany for pointing.
llvm-svn: 207516
2014-04-29 09:45:08 +00:00
Kostya Serebryany
6630aee422 fix -Wunused-variable warning in Release mode
llvm-svn: 207514
2014-04-29 09:33:02 +00:00
Zinovy Nis
a7f05bffeb [OPENMP][LV][D3423] Respect Hints.Force meta-data for loops in LoopVectorizer
llvm-svn: 207512
2014-04-29 08:55:11 +00:00
Michael Zolotukhin
354ce4fae7 Fix a typo in comment
llvm-svn: 207499
2014-04-29 07:35:33 +00:00
Chandler Carruth
718a254f79 Revert r207271 for now. This commit introduced a test case that ran
clang directly from the LLVM test suite! That doesn't work. I've
followed up on the review thread to try and get a viable solution sorted
out, but trying to get the tree clean here.

llvm-svn: 207462
2014-04-28 23:07:49 +00:00
Hans Wennborg
6405294846 InstCombine: don't drop 'inalloca' in PromoteCastOfAllocation (PR19569)
llvm-svn: 207426
2014-04-28 17:40:03 +00:00
Chandler Carruth
d81a0d614d Fix rampant quadratic behavior in UpdatePHINodes. The operation of
mapping from a basic block to an incoming value, either for removal or
just lookup, is linear in the number of predecessors, and we were doing
this for every entry in the 'Preds' list which is in many cases almost
all of them!

Unfortunately, the fixes are quite ugly. PHI nodes just don't make this
operation easy. The efficient way to fix this is to have a clever
'remove_if' operation on PHI nodes that lets us do a single pass over
all the incoming values of the original PHI node, extracting the ones we
care about. Then we could quickly construct the new phi node from this
list. This would remove the remaining underlying quadratic movement of
unrelated incoming values and the need for silly backwards looping to
"minimize" how often we hit the quadratic case.

This is the last obvious fix for PR19499. It shaves another 20% off the
compile time for me, and while UpdatePHINodes remains in the profile,
most of the time is now stemming from the well known inefficiencies of
LVI and jump threading.

llvm-svn: 207409
2014-04-28 10:37:30 +00:00
Craig Topper
b663bffa27 [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Gerolf Hoflehner
13cf626de6 RecursivelyDeleteTriviallyDeadInstructions() could remove
more than 1 instruction. The caller need to be aware of this
and adjust instruction iterators accordingly.

rdar://16679376

Repaired r207302.

llvm-svn: 207309
2014-04-26 05:58:11 +00:00
Gerolf Hoflehner
c780c786a3 Restore CloneFunction.cpp which got accidently
overwritten by previous backout of r207303

llvm-svn: 207308
2014-04-26 05:43:41 +00:00
Gerolf Hoflehner
99a3e5b9f5 Revert commit r207302 since build failures
have been reported.

llvm-svn: 207303
2014-04-26 02:03:17 +00:00
Gerolf Hoflehner
54bf53d166 RecursivelyDeleteTriviallyDeadInstructions() could remove
more than 1 instruction. The caller need to be aware of this
and adjust instruction iterators accordingly.

rdar://16679376

llvm-svn: 207302
2014-04-26 01:19:16 +00:00
Andrea Di Biagio
815dfb7574 [InstCombine][X86] Teach how to fold calls to SSE2/AVX2 packed logical shift
right intrinsics.

A packed logical shift right with a shift count bigger than or equal to the
element size always produces a zero vector. In all other cases, it can be
safely replaced by a 'lshr' instruction.

llvm-svn: 207299
2014-04-26 01:03:22 +00:00
Adrian Prantl
ca946260a4 Unbreak the gdb buildbot by not lowering dbg.declare intrinsics for arrays.
llvm-svn: 207284
2014-04-25 23:00:25 +00:00
Adam Nemet
c4071d355a [LoopStrengthReduce] Don't trim formula that uses a subset of required registers
Consider this use from the new testcase:

  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
    reg({1000,+,-1}<nw><%for.body>)
    -3003 + reg({3,+,3}<nw><%for.body>)
    -1001 + reg({1,+,1}<nuw><nsw><%for.body>)
    -1000 + reg({0,+,1}<nw><%for.body>)
    -3000 + reg({0,+,3}<nuw><%for.body>)
    reg({-1000,+,1}<nw><%for.body>)
    reg({-3000,+,3}<nsw><%for.body>)

This is the last use we consider for a solution in SolveRecurse, so CurRegs is
a large set.  (CurRegs is the set of registers that are needed by the
previously visited uses in the in-progress solution.)

ReqRegs is {
  {3,+,3}<nw><%for.body>,
  {1,+,1}<nuw><nsw><%for.body>
}

This is the intersection of the regs used by any of the formulas for the
current use and CurRegs.

Now, the code requires a formula to contain *all* these regs (the comment is
simply wrong), otherwise the formula is immediately disqualified.  Obviously,
no formula for this use contains two regs so they will all get disqualified.

The fix modifies the check to allow the formula in this case.  The idea is
that neither of these formulae is introducing any new registers which is the
point of this early pruning as far as I understand.

In terms of set arithmetic, we now allow formulas whose used regs are a subset
of the required regs not just the other way around.

There are few more loops in the test-suite that are now successfully LSRed.  I
have benchmarked those and found very minimal change.

Fixes <rdar://problem/13965777>

llvm-svn: 207271
2014-04-25 21:02:21 +00:00
Adrian Prantl
7566e72bb8 This reapplies r207235 with an additional bugfixes caught by the msan
buildbot - do not insert debug intrinsics before phi nodes.

Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.

Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.

This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source

rdar://problem/16679879
http://reviews.llvm.org/D3374

llvm-svn: 207269
2014-04-25 20:49:25 +00:00
Duncan P. N. Exon Smith
ea68e6a3d5 SCC: Change clients to use const, NFC
It's fishy to be changing the `std::vector<>` owned by the iterator, and
no one actual does it, so I'm going to remove the ability in a
subsequent commit.  First, update the users.

<rdar://problem/14292693>

llvm-svn: 207252
2014-04-25 18:24:50 +00:00
Adrian Prantl
319db7c542 Revert "This reapplies r207130 with an additional testcase+and a missing check for"
This reverts commit 207235 to investigate msan buildbot breakage.

llvm-svn: 207250
2014-04-25 18:18:09 +00:00
Manman Ren
5d62183fae [inline cold threshold] Command line argument for inline threshold will
override the default cold threshold.

When we use command line argument to set the inline threshold, the default
cold threshold will not be used. This is in line with how we use
OptSizeThreshold. When we want a higher threshold for all functions, we
do not have to set both inline threshold and cold threshold.

llvm-svn: 207245
2014-04-25 17:34:55 +00:00