1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
Commit Graph

632 Commits

Author SHA1 Message Date
Zinovy Nis
ce225593e1 [BUG][REFACTOR]
1) Fix for printing debug locations for absolute paths.
2) Location printing is moved into public method DebugLoc::print() to avoid re-inventing the wheel.

Differential Revision: http://reviews.llvm.org/D3513

llvm-svn: 208177
2014-05-07 09:51:22 +00:00
Yi Jiang
86dccb7e97 Always set alignment of vectorized LD/ST in SLP-Vectorizer. <rdar://problem/16812145>
llvm-svn: 207983
2014-05-05 17:59:14 +00:00
Arnold Schwaighofer
d752a09f8b SLPVectorizer: Bring back the insertelement patch (r205965) with fixes
When can't assume a vectorized tree is rooted in an instruction. The IRBuilder
could have constant folded it. When we rebuild the build_vector (the series of
InsertElement instructions) use the last original InsertElement instruction. The
vectorized tree root is guaranteed to be before it.

Also, we can't assume that the n-th InsertElement inserts the n-th element into
a vector.

This reverts r207746 which reverted the revert of the revert of r205018 or so.

Fixes the test case in PR19621.

llvm-svn: 207939
2014-05-04 17:10:15 +00:00
Benjamin Kramer
fbeb105fa6 SLPVectorizer: Lazily allocate the map for block numbering.
There is no point in creating it if we're not going to vectorize
anything. Creating the map is expensive as it creates large values.
No functionality change.

llvm-svn: 207916
2014-05-03 15:50:37 +00:00
Karthik Bhat
4591b173c7 Vectorize intrinsic math function calls in SLPVectorizer.
This patch adds support to recognize and vectorize intrinsic math functions in SLPVectorizer.
Review: http://reviews.llvm.org/D3560 and http://reviews.llvm.org/D3559

llvm-svn: 207901
2014-05-03 09:59:54 +00:00
Eric Christopher
b8d435f9a6 Clean up constructor logic and member access for LoopVectorizeHints.
There are public functions that mutate various members as well as
another private member already, so make all the members private to
avoid the discontinuity and add accessors for the values. Should
be no functional change.

llvm-svn: 207868
2014-05-02 20:40:04 +00:00
Chandler Carruth
fa8592dc61 Revert r205965, which essentially reverts r205018 for the second time.
=[

Turns out that this was the root cause of PR19621. We found a crasher
only recently (likely due to improvements elsewhere in the SLP
vectorizer) but the reduced test case failed all the way back to here.
I've confirmed that reverting this patch both fixes the reduced test
case in PR19621 and the actual source file that led to it, so it seems
to really be rooted here. I've replied to the commit thread with
discussion of my (feeble) attempts to debug this. Didn't make it very
far, so reverting now that we have a good test case so that things can
get back to healthy while the debugging carries on.

llvm-svn: 207746
2014-05-01 11:24:11 +00:00
Benjamin Kramer
805d786f28 Add a <tuple> include to more files that aren't getting it transitively on MSVC.
llvm-svn: 207617
2014-04-30 07:21:01 +00:00
Diego Novillo
210bb92cec Fix vectorization remarks.
This patch changes the vectorization remarks to also inform when
vectorization is possible but not beneficial.

Added tests to exercise some loop remarks.

llvm-svn: 207574
2014-04-29 20:06:10 +00:00
Yi Jiang
f658582852 Continue slp vectorization even the BB already has vectorized store radar://16641956
llvm-svn: 207572
2014-04-29 19:37:20 +00:00
Diego Novillo
81a83e3d17 Add optimization remarks to the loop unroller and vectorizer.
Summary:
This calls emitOptimizationRemark from the loop unroller and vectorizer
at the point where they make a positive transformation. For the
vectorizer, it reports vectorization and interleave factors. For the
loop unroller, it reports all the different supported types of
unrolling.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3456

llvm-svn: 207528
2014-04-29 14:27:31 +00:00
Zinovy Nis
26320a9150 [BUG] Fix -Wunused-variable warning in Release mode. Thnx to Kostya Serebryany for pointing.
llvm-svn: 207516
2014-04-29 09:45:08 +00:00
Kostya Serebryany
6630aee422 fix -Wunused-variable warning in Release mode
llvm-svn: 207514
2014-04-29 09:33:02 +00:00
Zinovy Nis
a7f05bffeb [OPENMP][LV][D3423] Respect Hints.Force meta-data for loops in LoopVectorizer
llvm-svn: 207512
2014-04-29 08:55:11 +00:00
Craig Topper
b663bffa27 [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Craig Topper
c0a2a29f4e [C++] Use 'nullptr'. Transforms edition.
llvm-svn: 207196
2014-04-25 05:29:35 +00:00
Karthik Bhat
a6070a9b75 Allow vectorization of bit intrinsics in BB Vectorizer.
This patch adds support for vectorization of  bit intrinsics such as bswap,ctpop,ctlz,cttz.

llvm-svn: 207174
2014-04-25 03:33:48 +00:00
Karthik Bhat
fd6c53ce06 Allow vectorization of few missed llvm intrinsic calls in BBVectorizor by handling them in isVectorizableIntrinsic function.
llvm-svn: 207085
2014-04-24 07:29:55 +00:00
Alexander Musman
721e76d2fa [LV] Statistics numbers for LoopVectorize introduced: a number of analyzed loops & a number of vectorized loops.
Use -stats to see how many loops were analyzed for possible vectorization and how many of them were actually vectorized.
Patch by Zinovy Nis

Differential Revision: http://reviews.llvm.org/D3438

llvm-svn: 206956
2014-04-23 08:40:37 +00:00
Chandler Carruth
6f9ba6a633 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Transforms/...
edition.

This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.

Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.

llvm-svn: 206844
2014-04-22 02:55:47 +00:00
Alexey Bataev
135cfee77c D3348 - [BUG] "Rotate Loop" pass kills "llvm.vectorizer.enable" metadata
llvm-svn: 206266
2014-04-15 09:37:30 +00:00
Arnold Schwaighofer
c65ae6074a Reapply "SLPVectorizer: Ignore users that are insertelements we can reschedule them"
This commit reapplies 205018. After 205855 we should correctly vectorize
intrinsics.

llvm-svn: 205965
2014-04-10 13:41:35 +00:00
Arnold Schwaighofer
1a503c9322 SLPVectorizer: Only vectorize intrinsics whose operands are widened equally
The vectorizer only knows how to vectorize intrinics by widening all operands by
the same factor.

Patch by Tyler Nowicki!

llvm-svn: 205855
2014-04-09 14:20:47 +00:00
Eric Christopher
23b79cb873 Add NDEBUG markers around debug only function.
llvm-svn: 205706
2014-04-07 12:46:30 +00:00
Eric Christopher
06a9cfdefa Add debug location information to the vectorizer debug statements.
Patch by Zinovy Nis.

llvm-svn: 205705
2014-04-07 12:32:17 +00:00
David Blaikie
e0b9857e92 Fixing typo.
Differential Revision: http://reviews.llvm.org/D3154

llvm-svn: 205674
2014-04-05 20:30:31 +00:00
Tim Northover
466b3a39e1 SLPVectorizer: compare entire intrinsic for SLP compatibility.
Some Intrinsics are overloaded to the extent that return type equality (all
that's been checked up to now) does not guarantee that the arguments are the
same. In these cases SLP vectorizer should not recurse into the operands, which
can be achieved by comparing them as "Function *" rather than simply the ID.

llvm-svn: 205424
2014-04-02 14:39:02 +00:00
Hal Finkel
5a327230eb [LoopVectorizer] Count dependencies of consecutive pointers as uniforms
For the purpose of calculating the cost of the loop at various vectorization
factors, we need to count dependencies of consecutive pointers as uniforms
(which means that the VF = 1 cost is used for all overall VF values).

For example, the TSVC benchmark function s173 has:
  ...
  %3 = add nsw i64 %indvars.iv, 16000
  %arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3
  ...
and we must realize that the add will be a scalar in order to correctly deduce
it to be profitable to vectorize this on PowerPC with VSX enabled. In fact, all
dependencies of a consecutive pointer must be a scalar (uniform), and so we
simply need to add all consecutive pointers to the worklist that currently
detects collects uniforms.

Fixes PR19296.

llvm-svn: 205387
2014-04-02 02:34:49 +00:00
Arnold Schwaighofer
219f6a43e0 Revert "SLPVectorizer: Ignore users that are insertelements we can reschedule them"
This reverts commit r205018.

Conflicts:
	lib/Transforms/Vectorize/SLPVectorizer.cpp
	test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

This is breaking libclc build.

llvm-svn: 205260
2014-03-31 23:05:56 +00:00
Arnold Schwaighofer
bf6c68c0be SLPVectorizer: Take credit for free extractelement instructions
Extract element instructions that will be removed when vectorzing lower the
cost.

Patch by Arch D. Robison!

llvm-svn: 205020
2014-03-28 17:21:32 +00:00
Arnold Schwaighofer
ffb5e31163 SLPVectorizer: Fix typos
Patch by Arch D. Robison!

llvm-svn: 205019
2014-03-28 17:21:27 +00:00
Arnold Schwaighofer
8510d16f52 SLPVectorizer: Ignore users that are insertelements we can reschedule them
Patch by Arch D. Robison!

llvm-svn: 205018
2014-03-28 17:21:22 +00:00
Andrew Trick
16d04697fd SLP vectorizer: Don't hoist vector extracts of phis.
Extracts coming from phis were being hoisted, while all others were
sunk to their uses. This was inconsistent and didn't seem to serve a
purpose. Changing all extracts to be sunk to uses is a prerequisite
for adding block frequency to the SLP vectorizer's cost model.

I benchmarked the change in isolation (without block frequency). I
only saw noise on x86 and some potentially significant improvements on
ARM. No major regressions is good enough for me.

llvm-svn: 204699
2014-03-25 02:18:47 +00:00
Chandler Carruth
536fb8893d [LV] While I'm here, use range based for loops which are so much cleaner
for this kind of walk.

llvm-svn: 204188
2014-03-18 22:00:32 +00:00
Chandler Carruth
9fa85c489d [LV] The actual change I intended to commit in r204148. Sorry for the
noise.

Original commit log:
Replace some dead code with an assert. When I first ported this pass
from a loop pass to a function pass I did so in the naive, recursive
way. It doesn't actually work, we need a worklist instead. When
I switched to the worklist I didn't delete the naive recursion. That
recursion was also buggy because it was dead and never really exercised.

llvm-svn: 204187
2014-03-18 21:58:38 +00:00
Chandler Carruth
4ac3e74751 [LV] Replace some dead code with an assert. When I first ported this
pass from a loop pass to a function pass I did so in the naive,
recursive way. It doesn't actually work, we need a worklist instead.
When I switched to the worklist I didn't delete the naive recursion.
That recursion was also buggy because it was dead and never really
exercised.

llvm-svn: 204184
2014-03-18 21:51:46 +00:00
Raul E. Silvera
1c39640e2d Resubmit "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..."
This reverts commit 86cb795388643710dab34941ddcb5a9470ac39d8.
The problems previously found have been resolved through other CLs.

llvm-svn: 203707
2014-03-12 20:21:50 +00:00
Ahmed Charles
e4b10534bd Fix build break.
llvm-svn: 203366
2014-03-09 03:50:36 +00:00
Chandler Carruth
fad39ebe19 [C++11] Add range based accessors for the Use-Def chain of a Value.
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
   detail
2) Change it to actually be a *Use* iterator rather than a *User*
   iterator.
3) Add an adaptor which is a User iterator that always looks through the
   Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
   they wanted a use_iterator (and to explicitly dig out the User when
   needed), or a user_iterator which makes the Use itself totally
   opaque.

Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.

The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.

However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]

llvm-svn: 203364
2014-03-09 03:16:01 +00:00
Arnold Schwaighofer
adebac793b LoopVectorizer: Preserve fast-math flags
Fixes PR19045.

llvm-svn: 203008
2014-03-05 21:10:47 +00:00
Craig Topper
a3683ec835 [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 202953
2014-03-05 09:10:37 +00:00
Chandler Carruth
649f6270aa [Modules] Move ValueHandle into the IR library where Value itself lives.
Move the test for this class into the IR unittests as well.

This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.

llvm-svn: 202821
2014-03-04 11:17:44 +00:00
Chandler Carruth
d0657fe39f [Modules] Move the LLVM IR pattern match header into the IR library, it
obviously is coupled to the IR.

llvm-svn: 202818
2014-03-04 11:08:18 +00:00
Benjamin Kramer
3ac154a395 [C++11] Replace llvm::tie with std::tie.
The old implementation is no longer needed in C++11.

llvm-svn: 202644
2014-03-02 13:30:33 +00:00
Benjamin Kramer
e4eb1b495f [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Benjamin Kramer
803ba41365 Now that we have C++11, turn simple functors into lambdas and remove a ton of boilerplate.
No intended functionality change.

llvm-svn: 202588
2014-03-01 11:47:00 +00:00
Rafael Espindola
32da4bdd4b Make DataLayout a plain object, not a pass.
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM
don't don't handle passes to also use DataLayout.

llvm-svn: 202168
2014-02-25 17:30:31 +00:00
Rafael Espindola
9869474f57 Make a few more DataLayout variables const.
llvm-svn: 202155
2014-02-25 14:24:11 +00:00
Rafael Espindola
6c834371d9 Make some DataLayout pointers const.
No functionality change. Just reduces the noise of an upcoming patch.

llvm-svn: 202087
2014-02-24 23:12:18 +00:00
Arnold Schwaighofer
c68a727215 SLPVectorizer: Try vectorizing 'splat' stores
Vectorize sequential stores of a broadcasted value.
5% on eon.

radar://16124699

llvm-svn: 202067
2014-02-24 19:52:29 +00:00
Rafael Espindola
83f8550fb2 Rename many DataLayout variables from TD to DL.
I am really sorry for the noise, but the current state where some parts of the
code use TD (from the old name: TargetData) and other parts use DL makes it
hard to write a patch that changes where those variables come from and how
they are passed along.

llvm-svn: 201827
2014-02-21 00:06:31 +00:00
Gerolf Hoflehner
ec56f33316 fix for null VectorizedValue assertion in the SLP Vectorizer (in function vectorizeTree()). radar://16064178
llvm-svn: 201501
2014-02-17 03:06:16 +00:00
Gerolf Hoflehner
283b0694b1 fixed typo in comment as my test commit
llvm-svn: 201486
2014-02-16 10:43:25 +00:00
Benjamin Kramer
57d9ecca57 Reduce code duplication resulting from the ConstantVector/ConstantDataVector split.
No intended functionality change.

llvm-svn: 201344
2014-02-13 16:48:38 +00:00
Andrea Di Biagio
594ea331ef [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.

The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.

With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.

This patch applies the following changes:
 - The cost model computation now takes into account non-uniform constants;
 - The cost of vector shift instructions has been improved in
   X86TargetTransformInfo analysis pass;
 - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
   between non-uniform and uniform constant operands.

Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.

llvm-svn: 201272
2014-02-12 23:43:47 +00:00
Arnold Schwaighofer
0bd2bb3092 LoopVectorizer: Keep track of conditional store basic blocks
Before conditional store vectorization/unrolling we had only one
vectorized/unrolled basic block. After adding support for conditional store
vectorization this will not only be one block but multiple basic blocks. The
last block would have the back-edge. I updated the code to use a vector of basic
blocks instead of a single basic block and fixed the users to use the last entry
in this vector. But, I forgot to add the basic blocks to this vector!

Fixes PR18724.

llvm-svn: 201028
2014-02-08 20:41:13 +00:00
Paul Robinson
189e175394 Disable most IR-level transform passes on functions marked 'optnone'.
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.

llvm-svn: 200892
2014-02-06 00:07:05 +00:00
Arnold Schwaighofer
8a0e82c2bc LoopVectorizer: Enable unrolling of conditional stores and the load/store
unrolling heuristic per default

Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed
up some benchmarks while not causing any interesting regressions.

llvm-svn: 200621
2014-02-02 03:12:34 +00:00
Reid Kleckner
0421c6aef8 Revert "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..."
This reverts commit r200576.  It broke 32-bit self-host builds by
vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.

llvm-svn: 200602
2014-02-01 01:37:30 +00:00
Chandler Carruth
74c658030d [SLPV] Recognize vectorizable intrinsics during SLP vectorization and
transform accordingly. Based on similar code from Loop vectorization.
Subsequent commits will include vectorization of function calls to
vector intrinsics and form function calls to vector library calls.

Patch by Raul Silvera! (Much delayed due to my not running dcommit)

llvm-svn: 200576
2014-01-31 21:14:40 +00:00
Chandler Carruth
fbc2b60e8a [vectorizer] Tweak the way we do small loop runtime unrolling in the
loop vectorizer to not do so when runtime pointer checks are needed and
share code with the new (not yet enabled) load/store saturation runtime
unrolling. Also ensure that we only consider the runtime checks when the
loop hasn't already been vectorized. If it has, the runtime check cost
has already been paid.

I've fleshed out a test case to cover the scalar unrolling as well as
the vector unrolling and comment clearly why we are or aren't following
the pattern.

llvm-svn: 200530
2014-01-31 10:51:08 +00:00
Arnold Schwaighofer
5b96c24a7a LoopVectorizer: Don't count the induction variable multiple times
When estimating register pressure, don't count the induction variable mulitple
times. It is unlikely to be unrolled. This is currently disabled and hidden
behind a flag ("enable-ind-var-reg-heur").

llvm-svn: 200371
2014-01-29 04:36:12 +00:00
Chandler Carruth
6a45efab46 [vectorizer] Completely disable the block frequency guidance of the loop
vectorizer, placing it behind an off-by-default flag.

It turns out that block frequency isn't what we want at all, here or
elsewhere. This has been I think a nagging feeling for several of us
working with it, but Arnold has given some really nice simple examples
where the results are so comprehensively wrong that they aren't useful.

I'm planning to email the dev list with a summary of why its not really
useful and a couple of ideas about how to better structure these types
of heuristics.

llvm-svn: 200294
2014-01-28 09:10:41 +00:00
Arnold Schwaighofer
8f596e2047 LoopVectorize: Support conditional stores by scalarizing
The vectorizer takes a loop like this and widens all instructions except for the
store. The stores are scalarized/unrolled and hidden behind an "if" block.

  for (i = 0; i < 128; ++i) {
    if (a[i] < 10)
      a[i] += val;
  }

  for (i = 0; i < 128; i+=2) {
    v = a[i:i+1];
    v0 = (extract v, 0) + 10;
    v1 = (extract v, 1) + 10;
    if (v0 < 10)
      a[i] = v0;
    if (v1 < 10)
      a[i] = v1;
  }

The vectorizer relies on subsequent optimizations to sink instructions into the
conditional block where they are anticipated.

The flag "vectorize-num-stores-pred" controls whether and how many stores to
handle this way. Vectorization of conditional stores is disabled per default for
now.

This patch also adds a change to the heuristic when the flag
"enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
loops until load/store ports are saturated. This heuristic uses TTI's
getMaxUnrollFactor as a measure for load/store ports.

I also added a second flag -enable-cond-stores-vec. It will enable vectorization
of conditional stores. But there is no cost model for vectorization of
conditional stores in place yet so this will not do good at the moment.

rdar://15892953

Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
-vectorize-num-stores-pred=1 (before the BFI change):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
   Applications/siod/siod         2.18%
 Performance improvements:
   mesa                          -4.42%
   libquantum                    -4.15%

 With a patch that slightly changes the register heuristics (by subtracting the
 induction variable on both sides of the register pressure equation, as the
 induction variable is probably not really unrolled):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2  7.73%
   Applications/siod/siod          1.97%

 Performance Improvements:
   libquantum                    -13.05% (we now also unroll quantum_toffoli)
   mesa                           -4.27%

llvm-svn: 200270
2014-01-28 01:01:53 +00:00
Chandler Carruth
f70ef7ae29 [vectorize] Initial version of respecting PGO in the vectorizer: treat
cold loops as-if they were being optimized for size.

Nothing fancy here. Simply test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.

The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:

1) Don't disable the entire pass when the target is lacking vector
   registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
   non-if-converted loops. This is trivial for the unroller but hard for
   the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
   various places that make cost tradeoffs (very likely only the
   unroller makes sense here, and then only when dealing with loops that
   are small enough for unrolling to not completely blow out the LSD).

I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.

One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.

llvm-svn: 200219
2014-01-27 13:11:50 +00:00
Chandler Carruth
88d92716dd [vectorizer] Add an override for the target instruction cost and use it
to stabilize a test that really is trying to test generic behavior and
not a specific target's behavior.

llvm-svn: 200215
2014-01-27 11:41:50 +00:00
Chandler Carruth
eb82628ff7 [vectorizer] Simplify code to use existing helpers on the Function
object and fewer pointless variables.

Also, add a clarifying comment and a FIXME because the code which
disables *all* vectorization if we can't use implicit floating point
instructions just makes no sense at all.

llvm-svn: 200214
2014-01-27 11:27:37 +00:00
Chandler Carruth
d1ecfe35ae [vectorizer] Teach the loop vectorizer's unroller to only unroll by
powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.

Enhance a test case to actually exercise more of the unroll machinery if
using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).

While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.

llvm-svn: 200213
2014-01-27 11:12:24 +00:00
Chandler Carruth
bdbe34a1a1 [vectorizer] Add some flags which are useful for conducting experiments
with the unrolling behavior in the loop vectorizer. No functionality
changed at this point.

These are a bit hack-y, but talking with Hal, there doesn't seem to be
a cleaner way to easily experiment with different thresholds here and he
was also interested in them so I wanted to commit them. Suggestions for
improvement are very welcome here.

llvm-svn: 200212
2014-01-27 11:12:19 +00:00
Chandler Carruth
dd6cf9494b [vectorizer] Fix a trivial oversight where we always requested the
number of vector registers rather than toggling between vector and
scalar register number based on VF. I don't have a test case as
I spotted this by inspection and on X86 it only makes a difference if
your target is lacking SSE and thus has *no* vector registers.

If someone wants to add a test case for this for ARM or somewhere else
where this is more significant, that would be awesome.

Also made the variable name a bit more sensible while I'm here.

llvm-svn: 200211
2014-01-27 11:12:14 +00:00
Chandler Carruth
a89deb11ba [vectorizer] Clean up the handling of unvectorized loop unrolling in the
LoopVectorize pass.

The logic here doesn't make much sense. We *only* unrolled if the
unvectorized loop was a reduction loop with a single basic block *and*
small loop body. The reduction part in particular doesn't make much
sense. Instead, if we just fall through to the vectorized unroll logic
it makes more sense of unrolling if there is a vectorized reduction that
could be hacked on by the SLP vectorizer *or* if the loop is small.

This is mostly a cleanup and nothing in the test suite really exercises
this, but I did run benchmarks across this change and saw no really
significant changes.

llvm-svn: 200198
2014-01-27 08:17:58 +00:00
Chandler Carruth
4fb3e5831e [LPM] Conclude my immediate work by making the LoopVectorizer
a FunctionPass. With this change the loop vectorizer no longer is a loop
pass and can readily depend on function analyses. In particular, with
this change we no longer have to form a loop pass manager to run the
loop vectorizer which simplifies the entire pass management of LLVM.

The next step here is to teach the loop vectorizer to leverage profile
information through the profile information providing analysis passes.

llvm-svn: 200074
2014-01-25 10:01:55 +00:00
Alp Toker
1c4b33e8e5 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Arnold Schwaighofer
2c67b7dc58 LoopVectorizer: A reduction that has multiple uses of the reduction value is not
a reduction.

Really. Under certain circumstances (the use list of an instruction has to be
set up right - hence the extra pass in the test case) we would not recognize
when a value in a potential reduction cycle was used multiple times by the
reduction cycle.

Fixes PR18526.
radar://15851149

llvm-svn: 199570
2014-01-19 03:18:31 +00:00
Arnold Schwaighofer
9fb94754bd LoopVectorize: Only strip casts from integer types when replacing symbolic
strides

Fixes PR18480.

llvm-svn: 199291
2014-01-15 03:35:46 +00:00
Chandler Carruth
98adff6224 [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.

llvm-svn: 199104
2014-01-13 13:07:17 +00:00
Chandler Carruth
ee051af6e2 [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082
2014-01-13 09:26:24 +00:00
Arnold Schwaighofer
15e9d90974 LoopVectorizer: Enable strided memory accesses versioning per default
I saw no compile or execution time regressions on x86_64 -mavx -O3.

radar://13075509

llvm-svn: 199015
2014-01-11 20:40:34 +00:00
NAKAMURA Takumi
fbff75f61d LoopVectorize.cpp: Appease MSC16.
Excuse me, I hope msc16 builders would be fine till its end day.
Introduce nullptr then. ;)

llvm-svn: 199001
2014-01-11 09:59:27 +00:00
Arnold Schwaighofer
702d83d3d8 LoopVectorizer: Handle strided memory accesses by versioning
for (i = 0; i < N; ++i)
   A[i * Stride1] += B[i * Stride2];

We take loops like this and check that the symbolic strides 'Strided1/2' are one
and drop to the scalar loop if they are not.

This is currently disabled by default and hidden behind the flag
'enable-mem-access-versioning'.

radar://13075509

llvm-svn: 198950
2014-01-10 18:20:32 +00:00
Chandler Carruth
87f14b4eec Re-sort all of the includes with ./utils/sort_includes.py so that
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.

Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.

llvm-svn: 198685
2014-01-07 11:48:04 +00:00
Arnold Schwaighofer
e4d65aae7d LoopVectorizer: Don't if-convert constant expressions that can trap
A phi node operand or an instruction operand could be a constant expression that
can trap (division). Check that we don't vectorize such cases.

PR16729
radar://15653590

llvm-svn: 197449
2013-12-17 01:11:01 +00:00
NAKAMURA Takumi
54fa39136d Prune redundant dependencies in LLVMBuild.txt.
llvm-svn: 196988
2013-12-11 00:30:57 +00:00
NAKAMURA Takumi
b2c60b7ca7 Whitespaces.
llvm-svn: 196880
2013-12-10 05:39:12 +00:00
Jakub Staszak
11e1c882f7 Don't #include heavy Dominators.h file in LoopInfo.h. This change reduces
overall time of LLVM compilation by ~1%.

llvm-svn: 196667
2013-12-07 21:20:17 +00:00
Renato Golin
a4d4a4c44f Add #pragma vectorize enable/disable to LLVM
The intended behaviour is to force vectorization on the presence
of the flag (either turn on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure the all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.

This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.

The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.

Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.

llvm-svn: 196537
2013-12-05 21:20:02 +00:00
Rafael Espindola
2ad993fb14 Fix non-deterministic behavior.
We use CSEBlocks to initialize a worklist:

SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end());

so it must have a deterministic order.

llvm-svn: 196520
2013-12-05 18:28:01 +00:00
Arnold Schwaighofer
120880c780 SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use
We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (they also are present in a vectorized tuple). This
meant we would keep a value 'alive' as a scalar and vectorized causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.

Fixes PR18129.

radar://15582184

llvm-svn: 196508
2013-12-05 15:14:40 +00:00
Alp Toker
e845f8af67 Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Nadav Rotem
dc01e91cf5 PR1860 - We can't save a list of ExtractElement instructions to CSE because some of these instructions
may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE.

llvm-svn: 195791
2013-11-26 22:24:25 +00:00
Arnold Schwaighofer
d0c05d2c84 LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to a i32
value.

Fixes PR18049.

llvm-svn: 195787
2013-11-26 22:11:23 +00:00
Nadav Rotem
643eb4c26e PR18060 - When we RAUW values with ExtractElement instructions in some cases
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions make sure
that all of the RAUWed instructions are the same.

llvm-svn: 195773
2013-11-26 17:29:19 +00:00
Chandler Carruth
a1094eb135 Migrate metadata information from scalar to vector instructions during
SLP vectorization. Based on the code in BBVectorizer.

Fixes PR17741.

Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my
driving of clang-format. =]

llvm-svn: 195528
2013-11-23 00:48:34 +00:00
Arnold Schwaighofer
3fa9376236 SLPVectorizer: Fix whitespace errors.
llvm-svn: 195468
2013-11-22 15:47:17 +00:00
Yi Jiang
74286d427a SLP Vectorizer: Extract cost will only be added once even if the scalar has multiple external uses.
llvm-svn: 195406
2013-11-22 01:57:02 +00:00
Arnold Schwaighofer
242935ec8c SLPVectorizer: Fix stale for Value pointer array
We are slicing an array of Value pointers and process those slices in a loop.
The problem is that we might invalidate a later slice by vectorizing a former
slice.

Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can
tell.

The test case will only fail when running with libgmalloc.

radar://15498655

llvm-svn: 195162
2013-11-19 22:20:20 +00:00
Arnold Schwaighofer
3149313505 SLPVectorizer: Fix whitespace errors
llvm-svn: 195161
2013-11-19 22:20:18 +00:00
Arnold Schwaighofer
e4280ec4dd LoopVectorizer: Extend the induction variable to a larger type
In some case the loop exit count computation can overflow. Extend the type to
prevent most of those cases.

The problem is loops like:
int main ()
{
  int a = 1;
  char b = 0;
  lbl:
    a &= 4;
    b--;
    if (b) goto lbl;
  return a;
}

The backedge count is 255. The induction variable type is i8. If we add one to
255 to get the exit count we overflow to zero.

To work around this issue we extend the type of the induction variable to i32 in
the case of i8 and i16.

PR17532

llvm-svn: 195008
2013-11-18 13:14:32 +00:00
Arnold Schwaighofer
01b6f1cc9a LoopVectorizer: Use abi alignment for accesses with no alignment
When we vectorize a scalar access with no alignment specified, we have to set
the target's abi alignment of the scalar access on the vectorized access.
Using the same alignment of zero would be wrong because most targets will have a
bigger abi alignment for vector types.

This probably fixes PR17878.

llvm-svn: 194876
2013-11-15 23:09:33 +00:00
Renato Golin
ed3b88828a Move debug message in vectorizer
No functional change, just better reporting.

llvm-svn: 194388
2013-11-11 16:27:35 +00:00