1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

489 Commits

Author SHA1 Message Date
Nadav Rotem
a18fab3891 Fix a typo
llvm-svn: 180806
2013-04-30 21:04:51 +00:00
Nadav Rotem
fe6c769d60 LoopVectorizer: Calculate the number of pointers to disambiguate at runtime based on the numbers of reads and writes.
llvm-svn: 180593
2013-04-26 05:08:59 +00:00
Nadav Rotem
d5eaf768a9 LoopVectorizer: No need to generate pointer disambiguation checks between readonly pointers.
llvm-svn: 180570
2013-04-25 19:55:03 +00:00
Arnold Schwaighofer
b2a88cf3ef LoopVectorizer: Change variable name Stride to ConsecutiveStride
This makes it easier to read the code.

No functionality change.

llvm-svn: 180197
2013-04-24 16:16:03 +00:00
Arnold Schwaighofer
ad591145df LoopVectorize: Scalarize padded types
This patch disables memory-instruction vectorization for types that need padding
bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on
x86_64. Because the load/store vectorization is performed by the bit casting to
a packed vector, which has incompatible memory layout due to the lack of padding
bytes, the present vectorizer produces inconsistent result for memory
instructions of those types.
This patch checks an equality of the AllocSize of a scalar type and allocated
size for each vector element, to ensure that there is no padding bytes and the
array can be read/written using vector operations.

Patch by Daisuke Takahashi!

Fixes PR15758.

llvm-svn: 180196
2013-04-24 16:16:01 +00:00
Arnold Schwaighofer
169f004ff2 LoopVectorizer: Bail out if we don't have datalayout we need it
llvm-svn: 180195
2013-04-24 16:15:58 +00:00
Nadav Rotem
1bfb7903e3 LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure that the order in which the elements are scalarized is the same as the original order.
This fixes a miscompilation in FreeBSD's regex library.

llvm-svn: 180121
2013-04-23 17:12:42 +00:00
Pekka Jaaskelainen
d8a8d1d02f Call the potentially costly isAnnotatedParallel() only once.
Made the uniform write test's checks a bit stricter.

llvm-svn: 180119
2013-04-23 16:44:43 +00:00
Pekka Jaaskelainen
1492231ce9 Refuse to (even try to) vectorize loops which have uniform writes,
even if erroneously annotated with the parallel loop metadata.

Fixes Bug 15794: 
"Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata"

llvm-svn: 180081
2013-04-23 08:08:51 +00:00
Eric Christopher
beec5d09da Move C++ code out of the C headers and into either C++ headers
or the C++ files themselves. This enables people to use
just a C compiler to interoperate with LLVM.

llvm-svn: 180063
2013-04-22 22:47:22 +00:00
Nadav Rotem
e567845da4 SLPVectorize: Add support for vectorization of casts.
llvm-svn: 179975
2013-04-21 08:05:59 +00:00
Nadav Rotem
b0e93c71e8 SLPVectorizer: Fix a bug in the code that scans the tree in search of nodes with multiple users.
We did not terminate the switch case and we executed the search routine twice.

llvm-svn: 179974
2013-04-21 07:37:56 +00:00
Nadav Rotem
069e6d9a7f Fix PR15800. Do not try to vectorize vectors and structs.
llvm-svn: 179960
2013-04-20 22:29:43 +00:00
Benjamin Kramer
f048a2579c VecUtils: Clean up uses of dyn_cast.
llvm-svn: 179936
2013-04-20 10:36:17 +00:00
Benjamin Kramer
e587f9b0bb SLPVectorizer: Strength reduce SmallVectors to ArrayRefs.
Avoids a couple of copies and allows more flexibility in the clients.

llvm-svn: 179935
2013-04-20 09:49:10 +00:00
Nadav Rotem
a244928706 SLPVectorizer: Reduce the compile time by eliminating the search for some of the more expensive patterns. After this change will only check basic arithmetic trees that start at cmpinstr.
llvm-svn: 179933
2013-04-20 07:29:34 +00:00
Nadav Rotem
0d6e377312 refactor tryToVectorizePair to a new method that supports vectorization of lists.
llvm-svn: 179932
2013-04-20 07:22:58 +00:00
Nadav Rotem
594ff6b41e Fix an unused variable warning.
llvm-svn: 179931
2013-04-20 06:40:28 +00:00
Nadav Rotem
ed63de18d3 SLPVectorizer: Improve the cost model for loop invariant broadcast values.
llvm-svn: 179930
2013-04-20 06:13:47 +00:00
Nadav Rotem
571d90215a Report the number of stores that were found in the debug message.
llvm-svn: 179929
2013-04-20 05:23:11 +00:00
Nadav Rotem
5195d6b091 Fix the header comment.
llvm-svn: 179928
2013-04-20 05:18:51 +00:00
Nadav Rotem
e5d4f04ae7 Use 64bit arithmetic for calculating distance between pointers.
llvm-svn: 179927
2013-04-20 05:17:47 +00:00
Arnold Schwaighofer
7b05635869 LoopVectorizer: Use matcher from PatternMatch.h for the min/max patterns
Also make some static function class functions to avoid having to mention the
class namespace for enums all the time.

No functionality change intended.

llvm-svn: 179886
2013-04-19 21:03:36 +00:00
Dmitri Gribenko
ad7b3011a6 Fix a -Wdocumentation warning
llvm-svn: 179789
2013-04-18 20:13:04 +00:00
Arnold Schwaighofer
acd551152c LoopVectorizer: Recognize min/max reductions
A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y)
sequence in LLVM. If we see such a sequence we can treat it just as any other
commutative binary instruction and reduce it.

This appears to help bzip2 by about 1.5% on an imac12,2.

radar://12960601

llvm-svn: 179773
2013-04-18 17:22:34 +00:00
Benjamin Kramer
18f31a4d5e LoopVectorize: Use a set to avoid longer cycles in the reduction chain too.
Fixes PR15748.

llvm-svn: 179757
2013-04-18 14:29:13 +00:00
Nadav Rotem
40ad92b46d SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops.
llvm-svn: 179562
2013-04-15 22:00:26 +00:00
Nadav Rotem
7ab2574900 SLPVectorizer: Add support for vectorizing trees that start at compare instructions.
llvm-svn: 179504
2013-04-15 04:25:27 +00:00
Benjamin Kramer
715265e7d7 Miscellaneous cleanups for VecUtils.h
llvm-svn: 179483
2013-04-14 09:33:08 +00:00
Nadav Rotem
c46de9ba5b SLP: Document the scalarization cost method.
llvm-svn: 179479
2013-04-14 07:22:22 +00:00
Nadav Rotem
e380208d4f SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree.
llvm-svn: 179475
2013-04-14 05:15:53 +00:00
Nadav Rotem
433f05f5de SLPVectorizer: add initial support for reduction variable vectorization.
llvm-svn: 179470
2013-04-14 03:22:20 +00:00
Nadav Rotem
51df846152 SLPVectorizer: add support for vectorization of diamond shaped trees. We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from.
llvm-svn: 179414
2013-04-12 21:16:54 +00:00
Nadav Rotem
0b2109a86c Add debug prints.
llvm-svn: 179412
2013-04-12 21:11:14 +00:00
Arnold Schwaighofer
48b1c3e915 LoopVectorizer: integer division is not a reduction operation
Don't classify idiv/udiv as a reduction operation. Integer division is lossy.
For example : (1 / 2) * 4 != 4/2.

Example:

int a[] = { 2, 5, 2, 2}
int x = 80;

for()
  x /= a[i];

Scalar:
  x /= 2 // = 40
  x /= 5 // = 8
  x /= 2 // = 4
  x /= 2 // = 2

Vectorized:

 <80, 1> / <2,5> //= <40,0>
 <40, 0> / <2,2> //= <20,0>

 20*0 = 0

radar://13640654

llvm-svn: 179381
2013-04-12 15:15:19 +00:00
Benjamin Kramer
cf2731d0e0 Rename the C function to create a SLPVectorizerPass to something sane and expose it in the header file.
llvm-svn: 179272
2013-04-11 11:36:36 +00:00
Nadav Rotem
6a6b998435 Make the SLP store-merger less paranoid about function calls. We check for function calls when we check if it is safe to sink instructions.
llvm-svn: 179207
2013-04-10 19:41:36 +00:00
Nadav Rotem
aa6eefd489 We require DataLayout for analyzing the size of stores.
llvm-svn: 179206
2013-04-10 18:57:27 +00:00
Nadav Rotem
96f8f45bd5 Add support for bottom-up SLP vectorization infrastructure.
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:

  1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).

  2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.

  3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.

This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:

void SAXPY(int *x, int *y, int a, int i) {
  x[i]   = a * x[i]   + y[i];
  x[i+1] = a * x[i+1] + y[i+1];
  x[i+2] = a * x[i+2] + y[i+2];
  x[i+3] = a * x[i+3] + y[i+3];
}

llvm-svn: 179117
2013-04-09 19:44:35 +00:00
Arnold Schwaighofer
abd363c1bc LoopVectorizer: Pass OperandValueKind information to the cost model
Pass down the fact that an operand is going to be a vector of constants.

This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my pervious shift cost change
that made all shifts expensive on x86.

radar://13576547

llvm-svn: 178809
2013-04-04 23:26:27 +00:00
Arnold Schwaighofer
3e3105f2f8 LoopVectorize: Invert case when we use a vector cmp value to query select cost
We generate a select with a vectorized condition argument when the condition is
NOT loop invariant. Not the other way around.

llvm-svn: 177098
2013-03-14 18:54:36 +00:00
Hal Finkel
a5dcace09c BBVectorize: Fixup debugging statements
After the recent data-structure improvements, a couple of debugging statements
were broken (printing pointer values).

llvm-svn: 176791
2013-03-10 20:57:42 +00:00
Benjamin Kramer
c6db6a4d69 Remove a source of nondeterminism from the LoopVectorizer.
This made us emit runtime checks in a random order. Hopefully bootstrap
miscompares will go away now.

llvm-svn: 176775
2013-03-09 19:22:40 +00:00
Arnold Schwaighofer
2a2a785543 LoopVectorizer: Ignore all dbg intrinisic
Ignore all DbgIntriniscInfo instructions instead of just DbgValueInst.

llvm-svn: 176769
2013-03-09 16:27:27 +00:00
Arnold Schwaighofer
652df6a5cb LoopVectorizer: Ignore dbg.value instructions
We want vectorization to happen at -g. Ignore calls to the dbg.value intrinsic
and don't transfer them to the vectorized code.

radar://13378964

llvm-svn: 176768
2013-03-09 15:56:34 +00:00
Benjamin Kramer
7b2d670a66 Insert the reduction start value into the first bypass block to preserve domination.
Fixes PR15344.

llvm-svn: 176701
2013-03-08 16:58:37 +00:00
Nadav Rotem
6d803820f8 PR14448 - prevent the loop vectorizer from vectorizing the same loop twice.
The LoopVectorizer often runs multiple times on the same function due to inlining.
When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches.
With this patch, the vectorizer during vectorization puts metadata on scalar loops and marks them as 'already vectorized' so that it knows to ignore them when it sees them a second time.

PR14448.

llvm-svn: 176399
2013-03-02 01:33:49 +00:00
Benjamin Kramer
a462070739 LoopVectorize: Don't hang forever if a PHI only has skipped PHI uses.
Fixes PR15384.

llvm-svn: 176366
2013-03-01 19:07:31 +00:00
Benjamin Kramer
33d5dd0b08 LoopVectorize: Vectorize math builtin calls.
This properly asks TargetLibraryInfo if a call is available and if it is, it
can be translated into the corresponding LLVM builtin. We don't vectorize sqrt()
yet because I'm not sure about the semantics for negative numbers. The other
intrinsic should be exact equivalents to the libm functions.

Differential Revision: http://llvm-reviews.chandlerc.com/D465

llvm-svn: 176188
2013-02-27 15:24:19 +00:00
Renato Golin
d371b2ca14 Allow GlobalValues to vectorize with AliasAnalysis
Storing the load/store instructions with the values
and inspect them using Alias Analysis to make sure
they don't alias, since the GEP pointer operand doesn't
take the offset into account.

Trying hard to not add any extra cost to loads and stores
that don't overlap on global values, AA is *only* calculated
if all of the previous attempts failed.

Using biggest vector register size as the stride for the
vectorization access, as we're being conservative and
the cost model (which calculates the real vectorization
factor) is only run after the legalization phase.

We might re-think this relationship in the future, but
for now, I'd rather be safe than sorry.

llvm-svn: 175818
2013-02-21 22:39:03 +00:00
Hal Finkel
92f63997ce BBVectorize: Fix an invalid reference bug
This fixes PR15289. This bug was introduced (recently) in r175215; collecting
all std::vector references for candidate pairs to delete at once is invalid
because subsequent lookups in the owning DenseMap could invalidate the
references.

bugpoint was able to reduce a useful test case. Unfortunately, because whether
or not this asserts depends on memory layout, this test case will sometimes
appear to produce valid output. Nevertheless, running under valgrind will
reveal the error.

llvm-svn: 175397
2013-02-17 15:59:26 +00:00
Hal Finkel
891df5ece6 BBVectorize: Call a DAG and DAG instead of a tree
Several functions and variable names used the term 'tree' to refer
to what is actually a DAG. Correcting this mistake will, hopefully,
prevent confusion in the future.

No functionality change intended.

llvm-svn: 175278
2013-02-15 17:20:54 +00:00
Hal Finkel
5e320e9019 BBVectorize: Cap the number of candidate pairs in each instruction group
For some basic blocks, it is possible to generate many candidate pairs for
relatively few pairable instructions. When many (tens of thousands) of these pairs
are generated for a single instruction group, the time taken to generate and
rank the different vectorization plans can become quite large. As a result, we now
cap the number of candidate pairs within each instruction group. This is done by
closing out the group once the threshold is reached (set now at 3000 pairs).

Although this will limit the overall compile-time impact, this may not be the best
way to achieve this result. It might be better, for example, to prune excessive
candidate pairs after the fact the prevent the generation of short, but highly-connected
groups. We can experiment with this in the future.

This change reduces the overall compile-time slowdown of the csa.ll test case in
PR15222 to ~5x. If 5x is still considered too large, a lower limit can be
used as the default.

This represents a functionality change, but only for very large inputs
(thus, there is no regression test).

llvm-svn: 175251
2013-02-15 04:28:42 +00:00
Hal Finkel
29b84d5692 BBVectorize: Remove the remaining instances of std::multimap
All instances of std::multimap have now been replaced by
DenseMap<K, std::vector<V> >, and this yields a speedup of 5% on the
csa.ll test case from PR15222.

No functionality change intended.

llvm-svn: 175216
2013-02-14 22:38:04 +00:00
Hal Finkel
b302f08ac1 BBVectorize: Don't store candidate pairs in a std::multimap
This is another commit on the road to removing std::multimap from
BBVectorize. This gives an ~1% speedup on the csa.ll test case
in PR15222.

No functionality change intended.

llvm-svn: 175215
2013-02-14 22:37:09 +00:00
Benjamin Kramer
7e28d6495d LoopVectorize: Simplify code for clarity.
No functionality change.

llvm-svn: 175076
2013-02-13 21:12:29 +00:00
Pekka Jaaskelainen
7e2908d0f3 Metadata for annotating loops as parallel. The first consumer for this
metadata is the loop vectorizer.

See the documentation update for more info.

llvm-svn: 175060
2013-02-13 18:08:57 +00:00
Hal Finkel
5e637e4e5d BBVectorize: Don't over-search when building the dependency map
When building the pairable-instruction dependency map, don't search
past the last pairable instruction. For large blocks that have been
divided into multiple instruction groups, searching past the last
instruction in each group is very wasteful. This gives a 32% speedup
on the csa.ll test case from PR15222 (when using 50 instructions
in each group).

No functionality change intended.

llvm-svn: 174915
2013-02-11 23:02:17 +00:00
Hal Finkel
b67d7ef969 BBVectorize: Omit unnecessary entries in PairableInstUsers
This map is queried only for instructions in pairs of pairable
instructions; so make sure that only pairs of pairable
instructions are added to the map. This gives a 3.5% speedup
on the csa.ll test case from PR15222.

No functionality change intended.

llvm-svn: 174914
2013-02-11 23:02:09 +00:00
Hal Finkel
2e8c1799a0 BBVectorize: Eliminate one more restricted linear search
This eliminates one more linear search over a range of
std::multimap entries. This gives a 22% speedup on the
csa.ll test case from PR15222.

No functionality change intended.

llvm-svn: 174893
2013-02-11 17:19:34 +00:00
Hal Finkel
d37a002ca6 BBVectorize: Remove the linear searches from pair connection searching
This removes the last of the linear searches over ranges of std::multimap
iterators, giving a 7% speedup on the doduc.bc input from PR15222.

No functionality change intended.

llvm-svn: 174859
2013-02-11 05:29:51 +00:00
Hal Finkel
e006b66033 BBVectorize: Avoid linear searches within the load-move set
This is another cleanup aimed at eliminating linear searches
in ranges of std::multimap.

No functionality change intended.

llvm-svn: 174858
2013-02-11 05:29:49 +00:00
Hal Finkel
6672c24cc9 BBVectorize: isa/cast cleanup in getInstructionTypes
Profiling suggests that getInstructionTypes is performance-sensitive,
this cleans up some double-casting in that function in favor of
using dyn_cast.

No functionality change intended.

llvm-svn: 174857
2013-02-11 05:29:48 +00:00
Hal Finkel
fad3c1ec89 BBVectorize: Make the bookkeeping to support full cycle checking less expensive
By itself, this does not have much of an effect, but only because in the default
configuration the full cycle checks are used only for small problem sizes.
This is part of a general cleanup of uses of iteration over std::multimap
ranges only for the purpose of checking membership.

No functionality change intended.

llvm-svn: 174856
2013-02-11 05:29:41 +00:00
Hal Finkel
2e1d39a40e BBVectorize: Use TTI->getAddressComputationCost
This is a follow-up to the cost-model change in r174713 which splits
the cost of a memory operation between the address computation and the
actual memory access. In r174713, this cost is always added to the
memory operation cost, and so BBVectorize will do the same.

Currently, this new cost function is used only by ARM, and I don't
have any ARM test cases for BBVectorize. Assistance in generating some
good ARM test cases for BBVectorize would be greatly appreciated!

llvm-svn: 174743
2013-02-08 21:13:39 +00:00
Jakob Stoklund Olesen
c8bee11d10 Typos.
llvm-svn: 174723
2013-02-08 17:43:32 +00:00
Arnold Schwaighofer
381c4a3e54 ARM cost model: Address computation in vector mem ops not free
Adds a function to target transform info to query for the cost of address
computation. The cost model analysis pass now also queries this interface.
The code in LoopVectorize adds the cost of address computation as part of the
memory instruction cost calculation. Only there, we know whether the instruction
will be scalarized or not.
Increase the penality for inserting in to D registers on swift. This becomes
necessary because we now always assume that address computation has a cost and
three is a closer value to the architecture.

radar://13097204

llvm-svn: 174713
2013-02-08 14:50:48 +00:00
Michael Kuperstein
ed93eafd8f Test Commit
llvm-svn: 174709
2013-02-08 12:58:29 +00:00
Nadav Rotem
dcab6c8ee5 fix 80-col violation and fix the docs.
llvm-svn: 174671
2013-02-07 22:34:07 +00:00
Arnold Schwaighofer
cae839558d Loop Vectorizer: Refactor Memory Cost Computation
We don't want too many classes in a pass and the classes obscure the details. I
was going a little overboard with object modeling here. Replace classes by
generic code that handles both loads and stores.

No functionality change intended.

llvm-svn: 174646
2013-02-07 19:05:21 +00:00
Arnold Schwaighofer
b835fa09f3 Loop Vectorizer: Refactor code to compute vectorized memory instruction cost
Introduce a helper class that computes the cost of memory access instructions.
No functionality change intended.

llvm-svn: 174422
2013-02-05 18:46:41 +00:00
Arnold Schwaighofer
5ee07db17d Loop Vectorizer: Handle pointer stores/loads in getWidestType()
In the loop vectorizer cost model, we used to ignore stores/loads of a pointer
type when computing the widest type within a loop. This meant that if we had
only stores/loads of pointers in a loop we would return a widest type of 8bits
(instead of 32 or 64 bit) and therefore a vector factor that was too big.

Now, if we see a consecutive store/load of pointers we use the size of a pointer
(from data layout).

This problem occured in SingleSource/Benchmarks/Shootout-C++/hash.cpp (reduced
test case is the first test in vector_ptr_load_store.ll).

radar://13139343

llvm-svn: 174377
2013-02-05 15:08:02 +00:00
Pekka Jaaskelainen
67cddcca8c LoopVectorize: convert TinyTripCountVectorThreshold constant
to a command line switch.

llvm-svn: 173837
2013-01-29 21:42:08 +00:00
Benjamin Kramer
5142e76f3a LoopVectorize: Clean up ValueMap a bit and avoid double lookups.
No intended functionality change.

llvm-svn: 173809
2013-01-29 17:31:33 +00:00
Renato Golin
7fe847b274 Vectorization Factor clarification
llvm-svn: 173691
2013-01-28 16:02:45 +00:00
Hal Finkel
115cffae92 BBVectorize: Better use of TTI->getShuffleCost
When flipping the pair of subvectors that form a vector, if the
vector length is 2, we can use the SK_Reverse shuffle kind to get
more-accurate cost information. Also we can use the SK_ExtractSubvector
shuffle kind to get accurate subvector extraction costs.

The current cost model implementations don't yet seem complex enough
for this to make a difference (thus, there are no test cases with this
commit), but it should help in future.

Depending on how the various targets optimize and combine shuffles in
practice, we might be able to get more-accurate costs by combining the
costs of multiple shuffle kinds. For example, the cost of flipping the
subvector pairs could be modeled as two extractions and two subvector
insertions. These changes, however, should probably be motivated
by specific test cases.

llvm-svn: 173621
2013-01-27 20:07:01 +00:00
Hal Finkel
9882e8c083 BBVectorize: Add a additional comment about the cost computation
llvm-svn: 173580
2013-01-26 16:49:04 +00:00
Hal Finkel
2d9bc41033 BBVectorize: Fix anomalous capital letter in comment
llvm-svn: 173579
2013-01-26 16:49:03 +00:00
Nadav Rotem
80f2c07dc3 LoopVectorize: Refactor the code that vectorizes loads/stores to remove duplication.
llvm-svn: 173500
2013-01-25 21:47:42 +00:00
Benjamin Kramer
3f37f1a557 LoopVectorize: Simplify code. No functionality change.
llvm-svn: 173475
2013-01-25 19:43:15 +00:00
Nadav Rotem
1d462abb9a LoopVectorizer: Refactor more code to use the IRBuilder.
llvm-svn: 173471
2013-01-25 19:26:23 +00:00
Nadav Rotem
df96d1dffb Refactor some code to use the IRBuilder.
llvm-svn: 173467
2013-01-25 18:34:09 +00:00
Nadav Rotem
52f5279653 Add support for reverse pointer induction variables. These are loops that contain pointers that count backwards.
For example, this is the hot loop in BZIP:

  do {
    m = *--p;
    *p = ( ... );
  } while (--n);

llvm-svn: 173219
2013-01-23 01:35:00 +00:00
Nadav Rotem
6e8c348e91 Fix a comment. Induction vars dont need to start at zero.
llvm-svn: 173061
2013-01-21 17:59:18 +00:00
Benjamin Kramer
3acd79adc5 LoopVectorize: Fix a C++11 incompatibility.
llvm-svn: 172990
2013-01-20 20:29:52 +00:00
Nadav Rotem
7c9244fdca Fix a build error.
llvm-svn: 172971
2013-01-20 09:39:17 +00:00
Nadav Rotem
9ec02f071a LoopVectorizer: Implement a new heuristics for selecting the unroll factor.
We ignore the cpu frontend and focus on pipeline utilization. We do this because we
don't have a good way to estimate the loop body size at the IR level.

llvm-svn: 172964
2013-01-20 05:24:29 +00:00
Benjamin Kramer
28c812f680 LoopVectorizer: Emit memory checks into their own basic block.
This separates the check for "too few elements to run the vector loop" from the
"memory overlap" check, giving a lot nicer code and allowing to skip the memory
checks when we're not going to execute the vector code anyways. We still leave
the decision of whether to emit the memory checks as branches or setccs, but it
seems to be doing a good job. If ugly code pops up we may want to emit them as
separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso.

Most of this is legwork to allow multiple bypass blocks while updating PHIs,
dominators and loop info.

llvm-svn: 172902
2013-01-19 13:57:58 +00:00
Nadav Rotem
3fc70ae776 LoopVectorizer cost model. Honor the user command line flag that selects the vectorization factor even if the target machine does not have any vector registers.
llvm-svn: 172544
2013-01-15 18:25:16 +00:00
Nadav Rotem
c6cce40085 Fix PR14547. Handle induction variables of small sizes smaller than i32 (i8 and i16).
llvm-svn: 172348
2013-01-13 07:56:29 +00:00
Nadav Rotem
008741a0e0 ARM Cost Model: We need to detect the max bitwidth of types in the loop in order to select the max vectorization factor.
We don't have a detailed analysis on which values are vectorized and which stay scalars in the vectorized loop so we use
another method. We look at reduction variables, loads and stores, which are the only ways to get information in and out
of loop iterations. If the data types are extended and truncated then the cost model will catch the cost of the vector
zext/sext/trunc operations.

llvm-svn: 172178
2013-01-11 07:11:59 +00:00
Nadav Rotem
d5f59a81d9 LoopVectorizer: Fix a bug in the vectorization of BinaryOperators. The BinaryOperator can be folded to an Undef, and we don't want to set NSW flags to undef vals.
PR14878

llvm-svn: 172079
2013-01-10 17:34:39 +00:00
Nadav Rotem
436dc952aa ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor.
llvm-svn: 172010
2013-01-09 22:29:00 +00:00
Nadav Rotem
9c27f36e59 Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM.
llvm-svn: 171928
2013-01-09 01:15:42 +00:00
Nadav Rotem
df0bfd2e95 Code cleanup: refactor the switch statements in the generation of reduction variables into an IR builder call.
llvm-svn: 171871
2013-01-08 17:37:45 +00:00
Nadav Rotem
5bb578a21c Rename the enum members to match the LLVM coding style.
llvm-svn: 171868
2013-01-08 17:23:17 +00:00
Nadav Rotem
4aa065a2a3 LoopVectorizer: Add support for floating point reductions
llvm-svn: 171812
2013-01-07 23:13:00 +00:00
Nadav Rotem
5906222eab LoopVectorizer: When we vectorizer and widen loops we process many elements at once. This is a good thing, except for
small loops. On small loops post-loop that handles scalars (and runs slower) can take more time to execute than the
rest of the loop. This patch disables widening of loops with a small static trip count.

llvm-svn: 171798
2013-01-07 21:54:51 +00:00
Chandler Carruth
c894d24706 Simplify LoopVectorize to require target transform info and rely on it
being present. Make a member of one of the helper classes a reference as
part of this.

Reformatting goodness brought to you by clang-format.

llvm-svn: 171726
2013-01-07 11:12:29 +00:00
Chandler Carruth
cbf30d85b9 Merge the unused header file for LoopVectorizer into the source file.
This makes the loop vectorizer match the pattern followed by roughly all
other passses. =]

Notably, this header file was braken in several regards: it contained
a using namespace directive, global #define's that aren't globaly
appropriate, and global constants defined directly in the header file.

As a side benefit, lots of the types in this file become internal, which
will cause the optimizer to chew on this pass more effectively.

llvm-svn: 171723
2013-01-07 10:44:06 +00:00
Chandler Carruth
3487258579 Switch BBVectorize to directly depend on having a TTI analysis.
This could be simplified further, but Hal has a specific feature for
ignoring TTI, and so I preserved that.

Also, I needed to use it because a number of tests fail when switching
from a null TTI to the NoTTI nonce implementation. That seems suspicious
to me and so may be something that you need to look into Hal. I worked
it by preserving the old behavior for these tests with the flag that
ignores all target info.

llvm-svn: 171722
2013-01-07 10:22:36 +00:00
Chandler Carruth
362e34c603 Fix a slew of indentation and parameter naming style issues. This 80% of
this patch brought to you by the tool clang-format.

I wanted to fix up the names of constructor parameters because they
followed a bit of an anti-pattern by naming initialisms with CamelCase:
'Tti', 'Se', etc. This appears to have been in an attempt to not overlap
with the names of member variables 'TTI', 'SE', etc. However,
constructor arguments can very safely alias members, and in fact that's
the conventional way to pass in members. I've fixed all of these I saw,
along with making some strang abbreviations such as 'Lp' be simpler 'L',
or 'Lgl' be the word 'Legal'.

However, the code I was touching had indentation and formatting somewhat
all over the map. So I ran clang-format and fixed them.

I also fixed a few other formatting or doxygen formatting issues such as
using ///< on trailing comments so they are associated with the correct
entry.

There is still a lot of room for improvement of the formating and
cleanliness of this code. ;] At least a few parts of the coding
standards or common practices in LLVM's code aren't followed, the enum
naming rules jumped out at me. I may mix some of these while I'm here,
but not all of them.

llvm-svn: 171719
2013-01-07 09:57:00 +00:00
Chandler Carruth
7723d75e9e Fix the enumerator names for ShuffleKind to match tho coding standards,
and make its comments doxygen comments.

llvm-svn: 171688
2013-01-07 03:20:02 +00:00
Chandler Carruth
3c0f5d4efb Move TargetTransformInfo to live under the Analysis library. This no
longer would violate any dependency layering and it is in fact an
analysis. =]

llvm-svn: 171686
2013-01-07 03:08:10 +00:00
Chandler Carruth
7e058addc4 Switch the loop vectorizer from VTTI to just use TTI directly.
llvm-svn: 171620
2013-01-05 10:16:02 +00:00
Chandler Carruth
07ee67f2dc Switch the BB vectorizer from the VTTI interface to the simple TTI
interface.

llvm-svn: 171618
2013-01-05 10:05:28 +00:00
Nadav Rotem
836b9a9fda iLoopVectorize: Non commutative operators can be used as reduction variables as long as the reduction chain is used in the LHS.
PR14803.

llvm-svn: 171583
2013-01-05 01:15:47 +00:00
Paul Redmond
6ce33a6ae9 Do not vectorize loops with subtraction reductions
Since subtraction does not commute the loop vectorizer incorrectly vectorizes
reductions such as x = A[i] - x.

Disabling for now.

llvm-svn: 171537
2013-01-04 22:10:16 +00:00
Nadav Rotem
3349017273 Fix a warning
llvm-svn: 171525
2013-01-04 21:08:44 +00:00
Nadav Rotem
cb3562a88e LoopVectorizer:
1. Add code to estimate register pressure.
2. Add code to select the unroll factor based on register pressure.
3. Add bits to TargetTransformInfo to provide the number of registers.

llvm-svn: 171469
2013-01-04 17:48:25 +00:00
Nadav Rotem
29dd0667aa LoopVectorizer: Add support for loop-unrolling during vectorization for increasing the ILP. At the moment this feature is disabled by default and this commit should not cause any functional changes.
llvm-svn: 171436
2013-01-03 00:52:27 +00:00
Nadav Rotem
a7cac72b7d Avoid vectorization when the function has the "noimplicitflot" attribute.
llvm-svn: 171429
2013-01-02 23:54:43 +00:00
Chandler Carruth
4c1f3c24db Move all of the header files which are involved in modelling the LLVM IR
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.

There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.

The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.

I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).

I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.

llvm-svn: 171366
2013-01-02 11:36:10 +00:00
Benjamin Kramer
ddae3440aa Add IRBuilder::CreateVectorSplat and use it to simplify code.
llvm-svn: 171349
2013-01-01 19:55:16 +00:00
Bill Wendling
e0920e4122 Remove the Function::getFnAttributes method in favor of using the AttributeSet
directly.

This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.

llvm-svn: 171253
2012-12-30 10:32:01 +00:00
Nadav Rotem
0e391907a5 LoopVectorizer: Fix a bug in the code that updates the loop exiting block.
LCSSA PHIs may have undef values. The vectorizer updates values that are used by outside users such as PHIs.
The bug happened because undefs are not loop values. This patch handles these PHIs.

PR14725

llvm-svn: 171251
2012-12-30 07:47:00 +00:00
Nadav Rotem
4cad811734 If all of the write objects are identified then we can vectorize the loop even if the read objects are unidentified.
PR14719.

llvm-svn: 171124
2012-12-26 23:30:53 +00:00
Nadav Rotem
90712b89cc LoopVectorizer: Optimize the vectorization of consecutive memory access when the iteration step is -1
llvm-svn: 171114
2012-12-26 19:08:17 +00:00
Hal Finkel
7cfc448749 BBVectorize: Use VTTI to compute costs for intrinsics vectorization
For the time being this includes only some dummy test cases. Once the
generic implementation of the intrinsics cost function does something other
than assuming scalarization in all cases, or some target specializes the
interface, some real test cases can be added.

Also, for consistency, I changed the type of IID from unsigned to Intrinsic::ID
in a few other places.

llvm-svn: 171079
2012-12-26 01:36:57 +00:00
Hal Finkel
f9b3cb9121 LoopVectorize: Enable vectorization of the fmuladd intrinsic
llvm-svn: 171076
2012-12-25 23:21:29 +00:00
Hal Finkel
8299a9e0b2 BBVectorize: Enable vectorization of the fmuladd intrinsic
llvm-svn: 171075
2012-12-25 22:36:08 +00:00
Nadav Rotem
ace51e510e LoopVectorizer: When checking for vectorizable types, also check
the StoreInst operands.

PR14705.

llvm-svn: 171023
2012-12-24 09:14:18 +00:00
Nadav Rotem
309d628c4f LoopVectorizer: Fix an endless loop in the code that looks for reductions.
The bug was in the code that detects PHIs in if-then-else block sequence.

PR14701.

llvm-svn: 171008
2012-12-24 01:22:06 +00:00
Benjamin Kramer
1add5cc571 LoopVectorize: Fix accidentaly inverted condition.
llvm-svn: 171001
2012-12-23 13:21:41 +00:00
Benjamin Kramer
d0a5bcaf53 LoopVectorize: For scalars and void types there is no need to compute vector insert/extract costs.
Fixes an assert during the build of oggenc in the test suite.

llvm-svn: 171000
2012-12-23 13:19:18 +00:00
Nadav Rotem
e237376e62 Loop Vectorizer: Update the cost model of scatter/gather operations and make
them more expensive.

llvm-svn: 170995
2012-12-23 07:23:55 +00:00
Bill Wendling
a16e5519db Change 'AttrVal' to 'AttrKind' to better reflect that it's a kind of attribute instead of the value of the attribute.
llvm-svn: 170972
2012-12-22 00:37:52 +00:00
Roman Divacky
a299a2efa5 Remove duplicate includes.
llvm-svn: 170902
2012-12-21 17:06:44 +00:00
Nadav Rotem
ad3bbdd5b0 Enable if-conversion.
llvm-svn: 170841
2012-12-21 04:47:54 +00:00
Nadav Rotem
ec327f0de1 BB-Vectorizer: Check the cost of the store pointer type
and not the return type, which is void. A number of test
cases fail after adding the assertion in TTImpl.

llvm-svn: 170828
2012-12-21 01:24:36 +00:00
Nadav Rotem
80fefbe978 Fix a bug in the code that checks if we can vectorize loops while using dynamic
memory bound checks.  Before the fix we were able to vectorize this loop from
the Livermore Loops benchmark:

for ( k=1 ; k<n ; k++ )
  x[k] = x[k-1] + y[k];

llvm-svn: 170811
2012-12-21 00:07:35 +00:00
Nadav Rotem
ccffd4527d LoopVectorize: Fix a bug in the scalarization of instructions.
Before if-conversion we could check if a value is loop invariant
if it was declared inside the basic block. Now that loops have
multiple blocks this check is incorrect.

This fixes External/SPEC/CINT95/099_go/099_go

llvm-svn: 170756
2012-12-20 20:24:40 +00:00
Nadav Rotem
759544a715 Loop Vectorizer: turn-off if-conversion.
llvm-svn: 170708
2012-12-20 17:42:53 +00:00
Nadav Rotem
f79dac509d Loop Vectorizer: Enable if-conversion.
llvm-svn: 170632
2012-12-20 02:00:02 +00:00
Nadav Rotem
fb2b09c7a8 whitespace
llvm-svn: 170626
2012-12-20 00:49:56 +00:00
Benjamin Kramer
7cecaab3f7 LoopVectorize: Make iteration over induction variables not depend on pointer values.
MapVector is a bit heavyweight, but I don't see a simpler way. Also the
InductionList is unlikely to be large. This should help 3-stage selfhost
compares (PR14647).

llvm-svn: 170528
2012-12-19 11:09:15 +00:00
Bill Wendling
56d9c4b832 Rename the 'Attributes' class to 'Attribute'. It's going to represent a single attribute in the future.
llvm-svn: 170502
2012-12-19 07:18:57 +00:00
Benjamin Kramer
820b613d80 LoopVectorize: Emit reductions as log2(vectorsize) shuffles + vector ops instead of scalar operations.
For example on x86 with SSE4.2 a <8 x i8> add reduction becomes
	movdqa	%xmm0, %xmm1
	movhlps	%xmm1, %xmm1            ## xmm1 = xmm1[1,1]
	paddw	%xmm0, %xmm1
	pshufd	$1, %xmm1, %xmm0        ## xmm0 = xmm1[1,0,0,0]
	paddw	%xmm1, %xmm0
	phaddw	%xmm0, %xmm0
	pextrb	$0, %xmm0, %edx

instead of
	pextrb	$2, %xmm0, %esi
	pextrb	$0, %xmm0, %edx
	addb	%sil, %dl
	pextrb	$4, %xmm0, %esi
	addb	%dl, %sil
	pextrb	$6, %xmm0, %edx
	addb	%sil, %dl
	pextrb	$8, %xmm0, %esi
	addb	%dl, %sil
	pextrb	$10, %xmm0, %edi
	pextrb	$14, %xmm0, %edx
	addb	%sil, %dil
	pextrb	$12, %xmm0, %esi
	addb	%dil, %sil
	addb	%sil, %dl

llvm-svn: 170439
2012-12-18 18:40:20 +00:00
Nadav Rotem
9e2542c0cb Enable the Loop Vectorizer by default for O2 and O3. Disable if-conversion by default. I plan to revert this patch later today.
llvm-svn: 170157
2012-12-13 23:11:54 +00:00
Nadav Rotem
ca05f9e72b Teach the cost model about the optimization in r169904: Truncation of induction variables costs the same as scalar trunc.
llvm-svn: 170051
2012-12-13 00:21:03 +00:00
Nadav Rotem
f1990abd0b Fix indentation.
llvm-svn: 170005
2012-12-12 19:39:36 +00:00
Nadav Rotem
2c25a05088 LoopVectorizer: Use the "optsize" attribute to decide if we are allowed to increase the function size.
llvm-svn: 170004
2012-12-12 19:29:45 +00:00
Nadav Rotem
8d59a4ff51 Fix the ascii drawing that was ruined when I split the H and CPP
llvm-svn: 169955
2012-12-12 01:33:47 +00:00
Nadav Rotem
1696945e9d fix a typo.
llvm-svn: 169953
2012-12-12 01:31:10 +00:00
Nadav Rotem
edcebe4904 LoopVectorizer: When -Os is used, vectorize only loops that dont require a tail loop. There is no testcase because I dont know of a way to initialize the loop vectorizer pass without adding an additional hidden flag.
llvm-svn: 169950
2012-12-12 01:11:46 +00:00
Nadav Rotem
054379720d PR14574. Fix a bug in the code that calculates the mask the converted PHIs in if-conversion.
llvm-svn: 169916
2012-12-11 21:30:14 +00:00
Nadav Rotem
fb45c4d6b4 Loop Vectorize: optimize the vectorization of trunc(induction_var). The truncation is now done on scalars.
llvm-svn: 169904
2012-12-11 18:58:10 +00:00
Nadav Rotem
0715a221d8 Fix PR14565. Don't if-convert loops that have switch statements in them.
llvm-svn: 169813
2012-12-11 04:55:10 +00:00
Nadav Rotem
417eaafbc4 Split the LoopVectorizer into H and CPP.
llvm-svn: 169771
2012-12-10 21:39:02 +00:00
Nadav Rotem
196fc7cc8c Add support for reverse induction variables. For example:
while (i--)
 sum+=A[i];

llvm-svn: 169752
2012-12-10 19:25:06 +00:00
Paul Redmond
e43761293d LoopVectorize: support vectorizing intrinsic calls
- added function to VectorTargetTransformInfo to query cost of intrinsics
- vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc.

Reviewed by: Nadav

llvm-svn: 169711
2012-12-09 20:42:17 +00:00
Paul Redmond
b778deb83a test commit.
llvm-svn: 169709
2012-12-09 19:46:31 +00:00
Nadav Rotem
4990f6a07b LoopVectorizer: Increase the number of pointers that can be tested at runtime. If we cant prove statically that the pointers are disjoint then we add the runtime check.
llvm-svn: 169334
2012-12-04 23:25:24 +00:00
Nadav Rotem
c58f94b665 Enable if-conversion during vectorization.
llvm-svn: 169331
2012-12-04 22:59:52 +00:00
Nadav Rotem
452993ad1a Fix a bug in vectorization of if-converted reduction variables. If the
reduction variable is not used outside the loop then we ran into an
endless loop. This change checks if we found the original PHI.

llvm-svn: 169324
2012-12-04 22:40:22 +00:00
Nadav Rotem
4f22c83996 Add support for reduction variables when IF-conversion is enabled.
llvm-svn: 169288
2012-12-04 18:17:33 +00:00
Nadav Rotem
aa136f70ba Give scalar if-converted blocks half the score because they are not always executed due to CF.
llvm-svn: 169223
2012-12-04 07:11:52 +00:00
Nadav Rotem
43d200ded1 Add the last part that is needed for vectorization of if-converted code.
Added the code that actually performs the if-conversion during vectorization.

We can now vectorize this code:

for (int i=0; i<n; ++i) {
  unsigned k = 0;

  if (a[i] > b[i])   <------ IF inside the loop.
    k = k * 5 + 3;

  a[i] = k;          <---- K is a phi node that becomes vector-select.
}

llvm-svn: 169217
2012-12-04 06:15:11 +00:00
NAKAMURA Takumi
3ba9b62972 LoopVectorize.cpp: Suppress a warning. [-Wunused-variable]
llvm-svn: 169195
2012-12-04 00:49:34 +00:00
NAKAMURA Takumi
350924d8cc Fix whitespace.
llvm-svn: 169194
2012-12-04 00:49:28 +00:00
Nadav Rotem
7a8cea8699 minor renaming, documentation and cleanups.
llvm-svn: 169175
2012-12-03 22:57:09 +00:00
Nadav Rotem
6da9592bb1 IF-conversion: teach the cost-model how to grade if-converted loops.
llvm-svn: 169171
2012-12-03 22:46:31 +00:00
Nadav Rotem
aaef945ad0 Now that we have a basic if-conversion infrastructure we can rename the
"single basic block loop vectorizer" to "innermost loop vectorizer".

llvm-svn: 169158
2012-12-03 21:33:08 +00:00
Nadav Rotem
ee9155231b Add initial support for IF-conversion. This patch implements the first 1/3,
which is the legality of the if-conversion transformation. The next step is to
implement the cost-model for the if-converted code as well as the
vectorization itself.

llvm-svn: 169152
2012-12-03 21:06:35 +00:00
Chandler Carruth
a490793037 Use the new script to sort the includes of every file under lib.
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.

Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]

llvm-svn: 169131
2012-12-03 16:50:05 +00:00
Nadav Rotem
7931e87e83 minor cleanups
llvm-svn: 169048
2012-11-30 22:37:11 +00:00
Nadav Rotem
afa1d260c0 Remove the use of LPPassManager. We can remove LPM because we dont need to run any additional loop passes on the new vector loop.
llvm-svn: 169016
2012-11-30 17:27:53 +00:00
Nadav Rotem
d7449a0a92 When broadcasting invariant scalars into vectors, place the broadcast code in the preheader.
llvm-svn: 168927
2012-11-29 19:25:41 +00:00
Hal Finkel
e25b9ebee4 BBVectorize: Correctly merge SubclassOptionalData
When two instructions are combined into a vector instruction,
the resulting instruction must have the most-conservative flags.

llvm-svn: 168765
2012-11-28 03:04:10 +00:00
Nadav Rotem
a582cad328 Move the code that uses SCEVs prior to creating the new loops.
llvm-svn: 168601
2012-11-26 19:51:46 +00:00
Nadav Rotem
91f709901b Move the max vector width to a constant parameter. No functionality change.
llvm-svn: 168570
2012-11-25 16:48:08 +00:00
Nadav Rotem
d47e1995cd Fix the document style.
llvm-svn: 168569
2012-11-25 16:39:01 +00:00
Nadav Rotem
eee0c236a5 Refactor the ptr runtime check generation code. No functionality change.
llvm-svn: 168568
2012-11-25 16:27:16 +00:00
Nadav Rotem
e0fcf801ff Rename method. No functionality change.
llvm-svn: 168560
2012-11-25 09:13:57 +00:00
Nadav Rotem
fe91aee744 The induction-pointer work is inspired by a research paper. This commit adds a reference.
llvm-svn: 168559
2012-11-25 09:09:26 +00:00
Nadav Rotem
c973546f75 Add support for pointer induction variables even when there is no integer induction variable.
llvm-svn: 168558
2012-11-25 08:41:35 +00:00
Nadav Rotem
6ff38dc8d2 LoopVectorizer: Add initial support for pointer induction variables (for example: *dst++ = *src++).
At the moment we still require to have an integer induction variable (for example: i++).

llvm-svn: 168231
2012-11-17 00:27:03 +00:00
Nadav Rotem
84e1ff3bfb LoopVectorize: Division reductions generate incorrect code. Remove the part of the code that deals with divs.
Thanks to Paul Redmond for catching this while reviewing the code.

llvm-svn: 168142
2012-11-16 06:51:17 +00:00
Hal Finkel
572a66e798 Replace std::vector -> SmallVector in BBVectorize
For now, this uses 8 on-stack elements. I'll need to do some profiling
to see if this is the best number.

Pointed out by Jakob in post-commit review.

llvm-svn: 167966
2012-11-14 19:53:27 +00:00
Hal Finkel
0d0d4dfb94 Fix the largest offender of determinism in BBVectorize
Iterating over the children of each node in the potential vectorization
plan must happen in a deterministic order (because it affects which children
are erased when two children conflict). There was no need for this data
structure to be a map in the first place, so replacing it with a vector
is a small change.

I believe that this was the last remaining instance if iterating over the
elements of a Dense* container where the iteration order could matter.
There are some remaining iterations over std::*map containers where the order
might matter, but so long as the Value* for instructions in a block increase
with the order of the instructions in the block (or decrease) monotonically,
then this will appear to be deterministic.

llvm-svn: 167942
2012-11-14 18:38:11 +00:00
Nadav Rotem
d74e0efed9 use the getSplat API. Patch by Paul Redmond.
llvm-svn: 167892
2012-11-14 00:02:13 +00:00
Hal Finkel
53c57f3d33 BBVectorize: Remove temporary assert used for debugging
llvm-svn: 167817
2012-11-13 05:54:54 +00:00
Hal Finkel
f33a9ea70d BBVectorize: Don't vectorize vector-manipulation chains
Don't choose a vectorization plan containing only shuffles and
vector inserts/extracts. Due to inperfections in the cost model,
these can lead to infinite recusion.

llvm-svn: 167811
2012-11-13 03:12:40 +00:00
Hal Finkel
47f58fe181 BBVectorize: Only some insert element operand pairs are free.
This fixes another infinite recursion case when using target costs.
We can only replace insert element input chains that are pure (end
with inserting into an undef).

llvm-svn: 167784
2012-11-12 23:55:36 +00:00
Hal Finkel
1c4de5a823 BBVectorize: Use a more sophisticated check for input cost
The old checking code, which assumed that input shuffles and insert-elements
could always be folded (and thus were free) is too simple.
This can only happen in special circumstances.
Using the simple check caused infinite recursion.

llvm-svn: 167750
2012-11-12 21:21:02 +00:00
Hal Finkel
ff11a22f1a BBVectorize: Check the types of compare instructions
The pass would previously assert when trying to compute the cost of
compare instructions with illegal vector types (like struct pointers).

llvm-svn: 167743
2012-11-12 19:41:38 +00:00
Hal Finkel
7cca290894 BBVectorize: Check the input types of shuffles for legality
This fixes a bug where shuffles were being fused such that the
resulting input types were not legal on the target. This would
occur only when both inputs and dependencies were also foldable
operations (such as other shuffles) and there were other connected
pairs in the same block.

llvm-svn: 167731
2012-11-12 14:50:59 +00:00
Nadav Rotem
0f2d71a695 Fix a comment typo and add comments.
llvm-svn: 167684
2012-11-11 05:15:00 +00:00
Nadav Rotem
ee232d62d1 Add support for memory runtime check. When we can, we calculate array bounds.
If the arrays are found to be disjoint then we run the vectorized version of
the loop. If they are not, we run the scalar code.

llvm-svn: 167608
2012-11-09 07:09:44 +00:00
Chandler Carruth
856168a72c Fix sign compare warning. Patch by Mahesha HS.
llvm-svn: 167282
2012-11-02 05:24:00 +00:00
Hal Finkel
ebd97dd2ef BBVectorize: Use target costs for incoming and outgoing values instead of the depth heuristic.
When target cost information is available, compute explicit costs of inserting and
extracting values from vectors. At this point, all costs are estimated using the
target information, and the chain-depth heuristic is not needed. As a result, it is now, by
default, disabled when using target costs.

llvm-svn: 167256
2012-11-01 21:50:12 +00:00
Hal Finkel
5463245540 BBVectorize: Account for internal shuffle costs
When target costs are available, use them to account for the costs of
shuffles on internal edges of the DAG of candidate pairs.

Because the shuffle costs here are currently for only the internal edges,
the current target cost model is trivial, and the chain depth requirement
is still in place, I don't yet have an easy test
case. Nevertheless, by looking at the debug output, it does seem to do the right
think to the effective "size" of each DAG of candidate pairs.

llvm-svn: 167217
2012-11-01 06:26:34 +00:00
Nadav Rotem
0a30b41020 LoopVectorize: Preserve NSW, NUW and IsExact flags.
llvm-svn: 167174
2012-10-31 21:40:39 +00:00
Nadav Rotem
c0528d8ea7 Put the threshold magic number in a variable.
llvm-svn: 167134
2012-10-31 16:22:16 +00:00
Nadav Rotem
6319d757df Remove enum values since they are not used anymore.
llvm-svn: 167131
2012-10-31 16:14:06 +00:00
Hal Finkel
d1fc849359 BBVectorize: Choose pair ordering to minimize shuffles
BBVectorize would, except for loads and stores, always fuse instructions
so that the first instruction (in the current source order) would always
represent the low part of the input vectors and the second instruction
would always represent the high part. This lead to too many shuffles
being produced because sometimes the opposite order produces fewer of them.

With this change, BBVectorize tracks the kind of pair connections that form
the DAG of candidate pairs, and uses that information to reorder the pairs to
avoid excess shuffles. Using this information, a future commit will be able
to add VTTI-based shuffle costs to the pair selection procedure. Importantly,
the number of remaining shuffles can now be estimated during pair selection.

There are some trivial instruction reorderings in the test cases, and one
simple additional test where we certainly want to do a reordering to
avoid an unnecessary shuffle.

llvm-svn: 167122
2012-10-31 15:17:07 +00:00
Nadav Rotem
9ab0e93cc1 LoopVectorize: Do not vectorize loops with tiny constant trip counts.
llvm-svn: 167101
2012-10-31 03:31:07 +00:00
Nadav Rotem
240ead98fd Add support for loops that don't start with Zero.
This is important for loops in the LAPACK test-suite.
These loops start at 1 because they are auto-converted from fortran.

llvm-svn: 167084
2012-10-31 00:45:26 +00:00
Nadav Rotem
e06ea2d50f Add documentation.
llvm-svn: 167055
2012-10-30 22:06:26 +00:00
Hal Finkel
6cfb988397 BBVectorize: Cache fixed-order pairs instead of recomputing pointer info.
Instead of recomputing relative pointer information just prior to fusing,
cache this information (which also needs to be computed during the
candidate-pair selection process). This cuts down on the total number of
SE queries made, and also is a necessary intermediate step on the road toward
including shuffle costs in the pair selection procedure.

No functionality change is intended.

llvm-svn: 167049
2012-10-30 20:17:37 +00:00
Hal Finkel
1c116e9ec0 BBVectorize: Fix a small bug introduced in r167042.
We need to make sure that we take the correct load/store alignment
when the inputs are flipped.

llvm-svn: 167044
2012-10-30 19:47:37 +00:00
Hal Finkel
2d3b9c41d5 BBVectorize: Simplify how input swapping is handled.
Stop propagating the FlipMemInputs variable into the routines that
create the replacement instructions. Instead, just flip the arguments
of those routines. This allows for some associated cleanup (not all
of which is done here). No functionality change is intended.

llvm-svn: 167042
2012-10-30 19:35:29 +00:00
Hal Finkel
a27a64ab3e BBVectorize: Don't make calls to SE when the result is unused.
SE was being called during the instruction-fusion process (when the result
is unreliable, and thus ignored). No functionality change is intended.

llvm-svn: 167037
2012-10-30 18:55:49 +00:00
Nadav Rotem
69e6bca813 LoopVectorize: Add support for write-only loops when the write destination is a single pointer.
Speedup SciMark by 1%

llvm-svn: 167035
2012-10-30 18:36:45 +00:00
Nadav Rotem
4fc2912062 LoopVectorize: Fix a bug in the initialization of reduction variables. AND needs to start at all-one
while XOR, and OR need to start at zero.

llvm-svn: 167032
2012-10-30 18:12:36 +00:00
Nadav Rotem
2ada2db2a2 LoopVectorizer: change debug prints: Print the module identifier when deciding to vectorize. When deciding not to vectorize do not print the called function name because it can be null.
llvm-svn: 166989
2012-10-30 00:40:39 +00:00
Nadav Rotem
0c9445eb5c LoopVectorize: Update and preserve the dominator tree info.
llvm-svn: 166970
2012-10-29 21:52:38 +00:00
Hal Finkel
82db9961eb Update BBVectorize to use the new VTTI instr. cost interfaces.
The monolithic interface for instruction costs has been split into
several functions. This is the corresponding change. No functionality
change is intended.

llvm-svn: 166865
2012-10-27 04:33:48 +00:00
Nadav Rotem
04f3086065 1. Fix a bug in getTypeConversion. When a *simple* type is split, we need to return the type of the split result.
2. Change the maximum vectorization width from 4 to 8.
3. A test for both.

llvm-svn: 166864
2012-10-27 04:11:32 +00:00
Nadav Rotem
133e437c48 Refactor the VectorTargetTransformInfo interface.
Add getCostXXX calls for different families of opcodes, such as casts, arithmetic, cmp, etc.

Port the LoopVectorizer to the new API.

The LoopVectorizer now finds instructions which will remain uniform after vectorization. It uses this information when calculating the cost of these instructions.

llvm-svn: 166836
2012-10-26 23:49:28 +00:00
Hal Finkel
32f63f9091 Use VTTI->getNumberOfParts in BBVectorize.
This change reflects VTTI refactoring; no functionality change intended.

llvm-svn: 166752
2012-10-26 04:28:06 +00:00
Hal Finkel
a47e6ef6e6 Disable generation of pointer vectors by BBVectorize.
Once vector-of-pointer support works, then this can be reverted.

llvm-svn: 166741
2012-10-26 00:05:26 +00:00
Hal Finkel
d26d094306 BBVectorize, when using VTTI, should not form types that will be split.
This is needed so that perl's SHA can be compiled (otherwise
BBVectorize takes far too long to find its fixed point).

I'll try to come up with a reduced test case.

llvm-svn: 166738
2012-10-25 23:47:16 +00:00
Hal Finkel
e2184ac235 Begin incorporating target information into BBVectorize.
This is the first of several steps to incorporate information from the new
TargetTransformInfo infrastructure into BBVectorize. Two things are done here:

 1. Target information is used to determine if it is profitable to fuse two
    instructions. This means that the cost of the vector operation must not
    be more expensive than the cost of the two original operations. Pairs that
    are not profitable are no longer considered (because current cost information
    is incomplete, for intrinsics for example, equal-cost pairs are still
    considered).

 2. The 'cost savings' computed for the profitability check are also used to
    rank the DAGs that represent the potential vectorization plans. Specifically,
    for nodes of non-trivial depth, the cost savings is used as the node
    weight.

The next step will be to incorporate the shuffle costs into the DAG weighting;
this will give the edges of the DAG weights as well. Once that is done, when
target information is available, we should be able to dispense with the
depth heuristic.

llvm-svn: 166716
2012-10-25 21:12:23 +00:00
Nadav Rotem
f73d286571 LoopVectorize: Teach the cost model to query scalar costs as scalar types and not vectors of 1.
llvm-svn: 166715
2012-10-25 21:03:48 +00:00
Nadav Rotem
5635a9350f Add support for additional reduction variables: AND, OR, XOR.
Patch by Paul Redmond <paul.redmond@intel.com>.

llvm-svn: 166649
2012-10-25 00:08:41 +00:00
Nadav Rotem
9d7ba0ef55 Implement a basic cost model for vector and scalar instructions.
llvm-svn: 166642
2012-10-24 23:47:38 +00:00
Nadav Rotem
23bafecedf whitespace
llvm-svn: 166622
2012-10-24 20:58:40 +00:00
Nadav Rotem
05d9e80245 LoopVectorizer: Add a basic cost model which uses the VTTI interface.
llvm-svn: 166620
2012-10-24 20:36:32 +00:00
Micah Villmow
ce5e56a156 Back out r166591, not sure why this made it through since I cancelled the command. Bleh, sorry about this!
llvm-svn: 166596
2012-10-24 17:25:11 +00:00
Micah Villmow
ae5ce80c36 Delete a directory that wasn't supposed to be checked in yet.
llvm-svn: 166591
2012-10-24 17:20:04 +00:00
Nadav Rotem
3deae09579 Use the AliasAnalysis isIdentifiedObj because it also understands mallocs and c++ news.
PR14158.

llvm-svn: 166491
2012-10-23 18:44:18 +00:00
Nadav Rotem
302d4b678a Don't crash if the load/store pointer is not a GEP.
Fix by Shivarama Rao <Shivarama.Rao@amd.com>

llvm-svn: 166427
2012-10-22 18:27:56 +00:00
Hal Finkel
7a55058abc BBVectorize should ignore unreachable blocks.
Unreachable blocks can have invalid instructions. For example,
jump threading can produce self-referential instructions in
unreachable blocks. Also, we should not be spending time
optimizing unreachable code. Fixes PR14133.

llvm-svn: 166423
2012-10-22 18:00:55 +00:00
Nadav Rotem
ea70508da6 Rename a variable.
llvm-svn: 166410
2012-10-22 04:53:05 +00:00
Nadav Rotem
6b56385c1a Vectorizer: optimize the generation of selects. If the condition is uniform, generate a scalar-cond select (i1 as selector).
llvm-svn: 166409
2012-10-22 04:38:00 +00:00
Nadav Rotem
708e5d2fb0 Update the loop vectorizer docs.
llvm-svn: 166408
2012-10-22 03:52:53 +00:00
Anders Carlsson
d04e66ae01 Avoid an extra hash lookup when inserting a value into the widen map.
llvm-svn: 166395
2012-10-21 16:26:35 +00:00
Jakub Staszak
a477da32fc Simplify code. No functionality change.
llvm-svn: 166393
2012-10-21 15:36:03 +00:00
Jakub Staszak
ed5ec60053 Simplify code. No functionality change.
llvm-svn: 166392
2012-10-21 15:29:19 +00:00
Nadav Rotem
380fe201de Fix a bug in the vectorization of wide load/store operations.
We used a SCEV to detect that A[X] is consecutive. We assumed that X was
the induction variable. But X can be any expression that uses the induction
for example: X = i + 2;

llvm-svn: 166388
2012-10-21 06:49:10 +00:00
Nadav Rotem
825cda19d5 Add support for reduction variables that do not start at zero.
This is important for nested-loop reductions such as :

In the innermost loop, the induction variable does not start with zero:

for (i = 0 .. n)
 for (j = 0 .. m)
  sum += ...

llvm-svn: 166387
2012-10-21 05:52:51 +00:00
Nadav Rotem
5ab04af30a Document change. Describe the pass and some papers that inspired the design of the pass.
llvm-svn: 166386
2012-10-21 04:04:25 +00:00
Nadav Rotem
763abacb83 Vectorizer: fix a bug in the classification of induction/reduction phis.
llvm-svn: 166384
2012-10-21 02:38:01 +00:00
Nadav Rotem
2ee8edf34a Fix an infinite loop in the loop-vectorizer.
PR14134.

llvm-svn: 166379
2012-10-20 20:45:01 +00:00
Nadav Rotem
cdd573e703 Vectorize: teach cavVectorizeMemory to distinguish between A[i]+=x and A[B[i]]+=x.
If the pointer is consecutive then it is safe to read and write. If the pointer is non-loop-consecutive then
it is unsafe to vectorize it because we may hit an ordering issue.

llvm-svn: 166371
2012-10-20 08:26:33 +00:00
Nadav Rotem
762317ecc6 Fix a typo
llvm-svn: 166367
2012-10-20 05:03:27 +00:00
Nadav Rotem
4e013454ca Vectorizer: refactor the memory checks to a new function. No functionality change.
llvm-svn: 166366
2012-10-20 04:59:06 +00:00
Nadav Rotem
61a5b018ad LoopVectorize: Keep the IRBuilder on the stack.
llvm-svn: 166354
2012-10-19 23:27:19 +00:00
Nadav Rotem
8fe03aa4c1 Vectorizer: Add support for loop reductions.
For example:

  for (i=0; i<n; i++)
   sum += A[i] +  B[i] + i;

llvm-svn: 166351
2012-10-19 23:05:40 +00:00
Benjamin Kramer
5080891e07 LoopVectorize: Keep the IRBuilder on the stack.
No functionality change.

llvm-svn: 166274
2012-10-19 08:42:02 +00:00
Nadav Rotem
451f76acc3 vectorizer: Add support for reading and writing from the same memory location.
llvm-svn: 166255
2012-10-19 01:24:18 +00:00
Nadav Rotem
fe5d8c8c09 cleanup the comment.
llvm-svn: 166247
2012-10-18 23:21:01 +00:00
Nadav Rotem
ac2f32d807 fix a naming typo
llvm-svn: 166232
2012-10-18 21:45:31 +00:00
Nadav Rotem
b1d0bfd68b Avoid reconstructing the pointer set when searching for duplicated read/write pointers.
llvm-svn: 166205
2012-10-18 18:34:50 +00:00
Nadav Rotem
f1bbc20bb0 When looking for a vector representation of a scalar, do a single lookup. Also, cache the result of the broadcast instruction.
No functionality change.

llvm-svn: 166191
2012-10-18 17:31:49 +00:00
Nadav Rotem
76ff018e87 remove unused variable to fix a warning.
llvm-svn: 166170
2012-10-18 06:09:21 +00:00
Nadav Rotem
67e1c2c770 Remove the use of dominators and AA.
llvm-svn: 166167
2012-10-18 05:33:02 +00:00
Nadav Rotem
fd924ec3c6 Vectorizer: Add support for loops with an unknown count. For example:
for (i=0; i<n; i++){
        a[i] = b[i+1] + c[i+3];
     }

llvm-svn: 166165
2012-10-18 05:29:12 +00:00
NAKAMURA Takumi
8ffefaa778 LoopVectorize.cpp: Fix a warning. [-Wunused-variable]
llvm-svn: 166153
2012-10-17 23:40:15 +00:00