llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 22:42:46 +02:00

Author	SHA1	Message	Date
Matt Arsenault	c32a5a3d46	Fix inserting instructions before last in bundle. The builder inserts from before the insert point, not after, so this would insert before the last instruction in the bundle instead of after it. I'm not sure if this can actually be a problem with any of the current insertions. llvm-svn: 189285	2013-08-26 23:08:37 +00:00
Nadav Rotem	7d4e24e1f4	LoopVectorize: Implement partial loop unrolling when vectorization is not profitable. This patch enables unrolling of loops when vectorization is legal but not profitable. We add a new class InnerLoopUnroller, that extends InnerLoopVectorizer and replaces some of the vector-specific logic with scalars. This patch does not introduce any runtime regressions and improves the following workloads: SingleSource/Benchmarks/Shootout/matrix -22.64% SingleSource/Benchmarks/Shootout-C++/matrix -13.06% External/SPEC/CINT2006/464_h264ref/464_h264ref -3.99% SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -1.95% llvm-svn: 189281	2013-08-26 22:33:26 +00:00
Yi Jiang	b951785e03	test commit. Remove blank line llvm-svn: 189265	2013-08-26 18:57:55 +00:00
Matt Arsenault	49ebc53ae4	Fix unused variable in release build llvm-svn: 189264	2013-08-26 18:38:29 +00:00
Matt Arsenault	0d6c559675	Constify functions llvm-svn: 189234	2013-08-26 17:56:38 +00:00
Matt Arsenault	fe57252c78	Vectorize starting from insertelements building a vector llvm-svn: 189233	2013-08-26 17:56:35 +00:00
Matt Arsenault	c1b8722791	Check if in set on insertion instead of separately llvm-svn: 189179	2013-08-24 19:55:38 +00:00
Chandler Carruth	e6b6740e73	Teach the SLP vectorizer the correct way to check for consecutive access using GEPs. Previously, it used a number of different heuristics for analyzing the GEPs. Several of these were conservatively correct, but failed to fall back to SCEV even when SCEV might have given a reasonable answer. One was simply incorrect in how it was formulated. There was good code already to recursively evaluate the constant offsets in GEPs, look through pointer casts, etc. I gathered this into a form code like the SLP code can use in a previous commit, which allows all of this code to become quite simple. There is some performance (compile time) concern here at first glance as we're directly attempting to walk both pointers' constant GEP chains. However, a couple of thoughts: 1) The very common cases where there is a dynamic pointer, and a second pointer at a constant offset (usually a stride) from it, this code will actually not do any unnecessary work. 2) InstCombine and other passes work very hard to collapse constant GEPs, so it will be rare that we iterate here for a long time. That said, if there remain performance problems here, there are some obvious things that can improve the situation immensely. Doing a vectorizer-pass-wide memoizer for each individual layer of pointer values, their base values, and the constant offset is likely to be able to completely remove redundant work and strictly limit the scaling of the work to scrape these GEPs. Since this optimization was not done on the prior version (which would still benefit from it), I've not done it here. But if folks have benchmarks that slow down it should be straightforward for them to add. I've added a test case, but I'm not really confident of the amount of testing done for different access patterns, strides, and pointer manipulation. llvm-svn: 189007	2013-08-22 12:45:17 +00:00
Matt Arsenault	1b410a4dec	Teach LoopVectorize about address space sizes llvm-svn: 188980	2013-08-22 02:42:55 +00:00
Matt Arsenault	3e8997425a	Use attribute helper function llvm-svn: 188916	2013-08-21 18:54:50 +00:00
Matt Arsenault	68d9c05b51	Fix typo llvm-svn: 188915	2013-08-21 18:54:47 +00:00
Arnold Schwaighofer	276cfe784a	SLPVectorizer: Fix invalid iterator errors Update iterator when the SLP vectorizer changes the instructions in the basic block by restarting the traversal of the basic block. Patch by Yi Jiang! Fixes PR 16899. llvm-svn: 188832	2013-08-20 21:21:45 +00:00
Hal Finkel	8f395a803a	Add a llvm.copysign intrinsic This adds a llvm.copysign intrinsic; we already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well. In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded. In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-valued FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN). llvm-svn: 188728	2013-08-19 23:35:46 +00:00
Joerg Sonnenberger	72a32889f4	PR 16899: Do not modify the basic block using the iterator, but keep the next value. This avoids crashes due to invalidation. Patch by Joey Gouly. llvm-svn: 188605	2013-08-17 11:04:47 +00:00
Matt Arsenault	66eeeddb1d	Fix spelling llvm-svn: 188506	2013-08-15 23:11:03 +00:00
Hal Finkel	fd36621506	BBVectorize: Add initial stores to the write set when tracking uses When computing the use set of a store, we need to add the store to the write set prior to iterating over later instructions. Otherwise, if there is a later aliasing load of that store, that load will not be tagged as a use, and bad things will happen. trackUsesOfI still adds later dependent stores of an instruction to that instruction's write set, but it never sees the original instruction, and so when tracking uses of a store, the store must be added to the write set by the caller. Fixes PR16834. llvm-svn: 188329	2013-08-13 23:34:32 +00:00
Nadav Rotem	bc08e7ce84	Fix PR16797 - Support PHINodes with multiple inputs from the same basic block. Do not generate new vector values for the same entries because we know that the incoming values from the same block must be identical. llvm-svn: 188185	2013-08-12 17:46:44 +00:00
Hal Finkel	bdc7aa32c1	Add ISD::FROUND for libm round() All libm floating-point rounding functions, except for round(), had their own ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm adding ISD::FROUND so that round() can be custom lowered as well. For the most part, this is straightforward. I've added an intrinsic and a matching ISD node just like those for nearbyint() and friends. The SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed fround). This will be used by the PowerPC backend in a follow-up commit. llvm-svn: 187926	2013-08-07 22:49:12 +00:00
Arnold Schwaighofer	af6776a17b	LoopVectorize: Allow vectorization of loops with lifetime markers Patch by Marc Jessome! llvm-svn: 187825	2013-08-06 22:37:52 +00:00
Nadav Rotem	eef986f7a3	SLPVectorizer: Fix PR16777. PHInodes may use multiple extracted values that come from different blocks. Thanks Alexey Samsonov. llvm-svn: 187663	2013-08-02 18:40:24 +00:00
Nadav Rotem	403f78810d	80-col llvm-svn: 187535	2013-07-31 22:17:45 +00:00
Nadav Rotem	fc24bbce9c	SLPVectorizer: update the debug location for the new instructions. llvm-svn: 187363	2013-07-29 18:18:46 +00:00
Nadav Rotem	931f83b2ef	Don't vectorize when the attribute NoImplicitFloat is used. llvm-svn: 187340	2013-07-29 05:13:00 +00:00
Nadav Rotem	b6c365bd70	Update the comment llvm-svn: 187316	2013-07-27 23:28:47 +00:00
Nadav Rotem	9bebda1517	SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. llvm-svn: 187267	2013-07-26 23:07:55 +00:00
Nadav Rotem	965db2159b	SLP Vectorizer: Disable the vectorization of non power of two chains, such as <3 x float>, because we don't have a good cost model for these types. llvm-svn: 187265	2013-07-26 22:53:11 +00:00
Nadav Rotem	383f980dca	When we vectorize across multiple basic blocks we may vectorize PHINodes that create a cycle. We already break the cycle on phi-nodes, but arithmetic operations are still duplicated. This patch adds code that checks if the operation that we are vectorizing was vectorized during the visit of the operands and uses this value if it can. llvm-svn: 186883	2013-07-22 22:18:07 +00:00
Nadav Rotem	53033f1204	Fix an obvious typo in the loop vectorizer where the cost model uses the wrong variable. The variable BlockCost is ignored. We don't have tests for the effect of if-conversion loops because it requires a big test (that includes if-converted loops) and it is difficult to find and balance a loop to do the right thing. llvm-svn: 186845	2013-07-22 17:10:48 +00:00
Nadav Rotem	c15d196f83	Delete unused helper functions. llvm-svn: 186808	2013-07-22 05:19:22 +00:00
Nadav Rotem	cb987ae696	Revert a part of r186420. Don't forbid multiple store chains that merge. llvm-svn: 186786	2013-07-21 06:12:57 +00:00
Nadav Rotem	85fcd9dde0	fix an 80-col line. llvm-svn: 186733	2013-07-19 23:14:01 +00:00
Nadav Rotem	43e3cf61bf	Use LLVM's ADTs that improve the compile time of this pass. llvm-svn: 186732	2013-07-19 23:12:19 +00:00
Nadav Rotem	aead8bb4d6	SLPVectorizer: Improve the compile time of isConsecutive by reordering the conditions that check GEPs and eliminate two of the calls to accumulateConstantOffset. llvm-svn: 186731	2013-07-19 23:11:15 +00:00
Nadav Rotem	d9b6f69b94	Handle constants without going through SCEV. llvm-svn: 186593	2013-07-18 18:34:21 +00:00
Nadav Rotem	c47d289b77	SLPVectorizer: Speedup isConsecutive by manually checking GEPs with multiple indices. This brings the compile time of the SLP-Vectorizer to about 2.5% of OPT for my testcase. llvm-svn: 186592	2013-07-18 18:20:45 +00:00
Nadav Rotem	440c849dd9	SLPVectorizer: Speedup isConsecutive (that checks if two addresses are consecutive in memory) by checking for additional patterns that don't need to go through SCEV. llvm-svn: 186563	2013-07-18 04:33:20 +00:00
Nadav Rotem	22f3c7e5fb	Fix a comment. llvm-svn: 186541	2013-07-17 22:41:16 +00:00
Nadav Rotem	990f24a7d2	Add a micro optimization to catch cases where the PtrA equals PtrB. llvm-svn: 186531	2013-07-17 19:52:25 +00:00
Nadav Rotem	ae8b6de415	SLPVectorizer: Accelerate the isConsecutive check by replacing the subtraction of the two values with a simple SCEV expression that adds the offset to one of the pointers that we compare. llvm-svn: 186479	2013-07-17 00:48:31 +00:00
Nadav Rotem	adfe58a7ad	flip the scev minus direction to simplify the code. llvm-svn: 186466	2013-07-16 22:57:06 +00:00
Nadav Rotem	633bd23118	SLPVectorizer: Improve the compile time of isConsecutive by adding a simple constant-gep check before using SCEV. This check does not always work because not all of the GEPs use a constant offset, but it happens often enough to reduce the number of times we use SCEV. llvm-svn: 186465	2013-07-16 22:51:07 +00:00
Nadav Rotem	ebe95f88ed	SLPVectorizer: Reduce the compile time of the consecutive store lookup. Process groups of stores in chunks of 16. llvm-svn: 186420	2013-07-16 15:25:17 +00:00
Nadav Rotem	f1e0aebca1	PR16628: Fix a bug in the code that merges compares. Compares return i1 but they compare different types. llvm-svn: 186359	2013-07-15 22:52:48 +00:00
Nadav Rotem	37c18f5cda	SLPVectorizer: change the order in which we search for vectorization candidates. Do stores first and PHIs second. llvm-svn: 186277	2013-07-14 06:15:46 +00:00
Craig Topper	58fa7a9b4a	Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. llvm-svn: 186274	2013-07-14 04:42:23 +00:00
Arnold Schwaighofer	970f54281c	LoopVectorizer: Disallow reductions whose header phi is used outside the loop If an outside loop user of the reduction value uses the header phi node we cannot just reduce the vectorized phi value in the vector code epilog because we would lose VF-1 reductions. lp: p = phi (0, lv) lv = lv + 1 ... brcond , lp, outside outside: usr = add 0, p (Say the loop iterates two times, the value of p coming out of the loop is one). We cannot just transform this to: vlp: p = phi (<0,0>, lv) lv = lv + <1,1> .. brcond , lp, outside outside: p_reduced = p[0] + p[1]; usr = add 0, p_reduced (Because the original loop iterated two times the vectorized loop would iterate one time, but p_reduced ends up being zero instead of one). We would have to execute VF-1 iterations in the scalar remainder loop in such cases. For now, just disable vectorization. PR16522 llvm-svn: 186256	2013-07-13 19:09:29 +00:00
Andrew Trick	651c624842	LoopVectorize fix: LoopInfo must be valid when invoking utils like SCEVExpander. In general, one should always complete CFG modifications first, update CFG-based analyses, like Dominators and LoopInfo, then generate instruction sequences. LoopVectorizer was creating a new loop, calling SCEVExpander to generate checks, then updating LoopInfo. I just changed the order. llvm-svn: 186241	2013-07-13 06:20:06 +00:00
Arnold Schwaighofer	b9c37551bc	TargetTransformInfo: address calculation parameter for gather/scatter Address calculation for gather/scatter in vectorized code can incur a significant cost making vectorization unbeneficial. Add infrastructure to add cost. Tests and cost model for targets will be in follow-up commits. radar://14351991 llvm-svn: 186187	2013-07-12 19:16:02 +00:00
Nadav Rotem	ee62470368	SLPVectorizer: Sink and enable CSE for ExtractElements. llvm-svn: 186145	2013-07-12 06:09:24 +00:00
Nadav Rotem	1e6246b38c	SLPVectorize: Replace the code that checks for vectorization candidates in successor blocks with code that scans PHINodes. Before we could vectorize PHINodes scanning successors was a good way of finding candidates. Now we can vectorize the phinodes which is simpler. llvm-svn: 186139	2013-07-12 00:04:18 +00:00
Nadav Rotem	cd9c4e430f	Remove an argument that we don't use anymore. llvm-svn: 186116	2013-07-11 20:56:13 +00:00
Arnold Schwaighofer	a8667081e1	LoopVectorize: Vectorize all accesses in address space zero with unit stride We can vectorize them because in the case where we wrap in the address space the unvectorized code would have had to access a pointer value of zero which is undefined behavior in address space zero according to the LLVM IR semantics. (Thank you Duncan, for pointing this out to me). Fixes PR16592. llvm-svn: 186088	2013-07-11 15:21:55 +00:00
Nadav Rotem	965af9cb85	Fix a warning. llvm-svn: 186064	2013-07-11 05:39:02 +00:00
Nadav Rotem	8e1d89128f	SLPVectorizer: refactor the code that places extracts. Place the code that decides where to put extracts in the build-tree phase. This allows us to take the cost of the extracts into account. llvm-svn: 186058	2013-07-11 04:54:05 +00:00
Nadav Rotem	417c1a3150	Fix PR16571, which is a bug in the code that checks that all of the types in the bundle are uniform. llvm-svn: 185970	2013-07-09 21:38:08 +00:00
Nadav Rotem	07b212b07b	Set the default insert point to the first instruction, and not to end() llvm-svn: 185953	2013-07-09 17:55:36 +00:00
Nadav Rotem	c43699ed15	This patch changes the saved IRBuilder insert point from BasicBlock::iterator to AssertingVH. Commit 185883 fixes a bug in the IRBuilder that should fix the ASan bot. AssertingVH can help in exposing some RAUW problems. Thanks Ben and Alexey! llvm-svn: 185886	2013-07-08 23:31:13 +00:00
Nadav Rotem	709b733114	Clear the builder insert point between tree-vectorization phases. llvm-svn: 185777	2013-07-07 14:57:18 +00:00
Nadav Rotem	883fb8ad80	SLPVectorizer: Implement DCE as part of vectorization. This is a complete rewrite of the bottom-up vectorization class. Before this commit we scanned the instruction tree 3 times. First in search of merge points for the trees. Second, for estimating the cost. And finally for vectorization. There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design. In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree. llvm-svn: 185774	2013-07-07 06:57:07 +00:00
Craig Topper	783617eba7	Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size. llvm-svn: 185606	2013-07-04 01:31:24 +00:00
Arnold Schwaighofer	26461b04c7	LoopVectorize: Math functions only read rounding mode Math functions are marked as readonly because they read the floating point rounding mode. Because we don't vectorize loops that would contain function calls that set the rounding mode it is safe to ignore this memory read. llvm-svn: 185299	2013-07-01 00:54:44 +00:00
Benjamin Kramer	78189f47c1	LoopVectorizer: Pack MemAccessInfo pairs. llvm-svn: 185263	2013-06-29 17:52:08 +00:00
Benjamin Kramer	55f216aff4	Move helper classes into anonymous namespaces. llvm-svn: 185262	2013-06-29 17:02:06 +00:00
Nadav Rotem	201beeb585	We preserve the CFG and some of the analysis passes. llvm-svn: 185251	2013-06-29 05:38:15 +00:00
Nadav Rotem	aed4323517	Update docs. llvm-svn: 185250	2013-06-29 05:37:19 +00:00
Nadav Rotem	12c3b510fa	SLP Vectorizer: Add support for trees with external users. To support this we have to insert 'extractelement' instructions to pick the right lane. We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated. llvm-svn: 185230	2013-06-28 22:07:09 +00:00
Nadav Rotem	803f7c3932	LoopVectorizer: Refactor the code that checks if it is safe to predicate blocks. In this code we keep track of pointers that we are allowed to read from, if they are accessed by non-predicated blocks. We use this list to allow vectorization of conditional loads in predicated blocks because we know that these addresses don't segfault. llvm-svn: 185214	2013-06-28 20:46:27 +00:00
Arnold Schwaighofer	e6284189a7	LoopVectorize: Pull dyn_cast into setDebugLocFromInst llvm-svn: 185168	2013-06-28 17:14:48 +00:00
Arnold Schwaighofer	09766b6b9f	LoopVectorize: Use static function instead of DebugLocSetter class I used the class to safely reset the state of the builder's debug location. I think I have caught all places where we need to set the debug location to a new one. Therefore, we can replace the class by a function that just sets the debug location. llvm-svn: 185165	2013-06-28 16:26:54 +00:00
Arnold Schwaighofer	d6aee045b3	LoopVectorize: Preserve debug location info radar://14169017 llvm-svn: 185122	2013-06-28 00:38:54 +00:00
Arnold Schwaighofer	c0e3a07c99	LoopVectorize: Cache edge masks created during if-conversion Otherwise, we end up with an exponential IR blowup. Fixes PR16472. llvm-svn: 185097	2013-06-27 20:31:06 +00:00
Arnold Schwaighofer	ccd78deec7	LoopVectorize: Use vectorized loop invariant gep index anchored in loop Use vectorized instruction instead of original instruction anchored in the original loop. Fixes PR16452 and t2075.c of PR16455. llvm-svn: 185081	2013-06-27 15:11:55 +00:00
Arnold Schwaighofer	18efca433e	LoopVectorize: Don't store a reversed value in the vectorized value map When we store values for reversed induction stores we must not store the reversed value in the vectorized value map. Another instruction might use this value. This fixes 3 test cases of PR16455. llvm-svn: 185051	2013-06-27 00:45:41 +00:00
Nadav Rotem	195bbbe54b	No need to use a Set when a vector would do. llvm-svn: 185047	2013-06-27 00:14:13 +00:00
Nadav Rotem	897ca82595	SLP: When searching for vectorization opportunities scan the blocks in post-order because we grow chains upwards. llvm-svn: 185041	2013-06-26 23:44:45 +00:00
Nadav Rotem	962b32446e	SLP: Don't erase instructions during vectorization because it prevents the outer loops from iterating over the instructions. llvm-svn: 185040	2013-06-26 23:43:23 +00:00
Nadav Rotem	860aebf69a	Erase all of the instructions that we RAUWed llvm-svn: 184969	2013-06-26 17:16:09 +00:00
Nadav Rotem	e0a5a586b8	Do not add cse-ed instructions into the visited map because we don't want to consider them as a candidate for replacement of instructions to be visited. llvm-svn: 184966	2013-06-26 16:54:53 +00:00
Nadav Rotem	a8fba65221	SLPVectorizer: support slp-vectorization of PHINodes between basic blocks llvm-svn: 184888	2013-06-25 23:04:09 +00:00
Nadav Rotem	8fcb707c24	Fix a typo in the code that collected the costs recursively. llvm-svn: 184827	2013-06-25 05:30:56 +00:00
Nadav Rotem	eff545235c	Rename the variable to fix a warning. Thanks Andy Gibbs. llvm-svn: 184749	2013-06-24 15:59:47 +00:00
Arnold Schwaighofer	0a98597e80	Reapply 184685 after the SetVector iteration order fix. This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg testers. "LoopVectorize: Use the dependence test utility class We now no longer need alias analysis - the cases that alias analysis would handle are now handled as accesses with a large dependence distance. We can now vectorize loops with simple constant dependence distances. for (i = 8; i < 256; ++i) { a[i] = a[i+4] * a[i+8]; } for (i = 8; i < 256; ++i) { a[i] = a[i-4] * a[i-8]; } We would be able to vectorize about 200 more loops (in many cases the cost model instructs us not to) in the test suite now. Results on x86-64 are a wash. I have seen one degradation in ammp. Interestingly, the function in which we now vectorize a loop is never executed so we probably see some instruction cache effects. There is a 2% improvement in h264ref. There is one or the other TSVC loop kernel that speeds up. radar://13681598" llvm-svn: 184724	2013-06-24 12:09:15 +00:00
Arnold Schwaighofer	75b76bf92f	LoopVectorize: Use SetVector for the access set We are creating the runtime checks using this set so we need a deterministic iteration order. llvm-svn: 184723	2013-06-24 12:09:12 +00:00
Arnold Schwaighofer	f022b11b08	Revert "LoopVectorize: Use the dependence test utility class" This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac. We are seeing a stage2 and stage3 miscompare on some dragonegg bots. llvm-svn: 184690	2013-06-24 06:10:41 +00:00
Arnold Schwaighofer	c49cd1a668	LoopVectorize: Use the dependence test utility class We now no longer need alias analysis - the cases that alias analysis would handle are now handled as accesses with a large dependence distance. We can now vectorize loops with simple constant dependence distances. for (i = 8; i < 256; ++i) { a[i] = a[i+4] * a[i+8]; } for (i = 8; i < 256; ++i) { a[i] = a[i-4] * a[i-8]; } We would be able to vectorize about 200 more loops (in many cases the cost model instructs us not to) in the test suite now. Results on x86-64 are a wash. I have seen one degradation in ammp. Interestingly, the function in which we now vectorize a loop is never executed so we probably see some instruction cache effects. There is a 2% improvement in h264ref. There is one or the other TSVC loop kernel that speeds up. radar://13681598 llvm-svn: 184685	2013-06-24 03:55:48 +00:00
Arnold Schwaighofer	f9828b092b	LoopVectorize: Add utility class for checking dependency among accesses This class checks dependences by subtracting two Scalar Evolution access functions allowing us to catch very simple linear dependences. The checker assumes source order in determining whether vectorization is safe. We currently don't reorder accesses. Positive true dependencies need to be a multiple of VF otherwise we impede store-load forwarding. llvm-svn: 184684	2013-06-24 03:55:45 +00:00
Arnold Schwaighofer	67714fedcd	LoopVectorize: Add utility class for building sets of dependent accesses Sets of dependent accesses are built by unioning sets based on underlying objects. This class will be used by the upcoming dependence checker. llvm-svn: 184683	2013-06-24 03:55:44 +00:00
Nadav Rotem	6c2ae14dc5	SLP Vectorizer: Add support for vectorizing parts of the tree. Until now we detected the vectorizable tree and evaluated the cost of the entire tree. With this patch we can decide to trim-out branches of the tree that are not profitable to vectorize. Also, increase the max depth from 6 to 12. In the worst possible case where all of the code is made of diamond-shaped graph this can bring the cost to 2**10, but diamonds are not very common. llvm-svn: 184681	2013-06-24 02:52:43 +00:00
Nadav Rotem	5f8e32a66f	SLP Vectorizer: Fix a bug in the code that does CSE on the generated gather sequences. Make sure that we don't replace and RAUW two sequences if one does not dominate the other. llvm-svn: 184674	2013-06-23 21:57:27 +00:00
Nadav Rotem	8aa1211383	SLP Vectorizer: Erase instructions outside the vectorizeTree method. The RAII builder location guard is saving a reference to instructions, so we can't erase instructions during vectorization. llvm-svn: 184671	2013-06-23 19:38:56 +00:00
Nadav Rotem	03f4c0b02d	SLP Vectorizer: Implement a simple CSE optimization for the gather sequences. llvm-svn: 184660	2013-06-23 06:15:46 +00:00
Nadav Rotem	3dc5b0a65a	SLP Vectorizer: Implement multi-block slp-vectorization. Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks. It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function. I removed the support for extracting values from trees. We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2). llvm-svn: 184647	2013-06-22 21:34:10 +00:00
Nadav Rotem	232096ea37	SLP Vectorizer: do not search for store-chains that are wider than the vector-register size. llvm-svn: 184527	2013-06-21 04:18:13 +00:00
Nadav Rotem	9191086b4b	Clang-format the SLP vectorizer. No functionality change. llvm-svn: 184446	2013-06-20 17:54:36 +00:00
Nadav Rotem	83cdea61cd	SLPVectorization: Add basic support for cross-basic block slp vectorization. We collect gather sequences when we vectorize basic blocks. Gather sequences are excellent hints for vectorization of other basic blocks. llvm-svn: 184444	2013-06-20 17:41:45 +00:00
Nadav Rotem	b87c78377e	Change the debug type to match the debug type that is used by vecutils.cpp. This change makes it easier to filter debug messages. llvm-svn: 184440	2013-06-20 16:38:05 +00:00
Nadav Rotem	a578c6d3ad	SLPVectorizer: handle scalars that are extracted from vectors (using ExtractElementInst). llvm-svn: 184325	2013-06-19 17:33:16 +00:00
Nadav Rotem	79be778ce2	SLPVectorizer: start constructing chains at stores that are not power of two. The type <3 x i8> is common in graphics and we want to be able to vectorize it. This change accelerates bullet by 12% and 471_omnetpp by 5%. llvm-svn: 184317	2013-06-19 15:57:29 +00:00
Nadav Rotem	61c4560eab	SLPVectorizer: vectorize compares and selects. llvm-svn: 184282	2013-06-19 05:49:52 +00:00
Nadav Rotem	9accad7fa1	Document the return value and fix a typo. llvm-svn: 184281	2013-06-19 05:47:33 +00:00
Nadav Rotem	6796716256	Scan the successor blocks and use the PHI nodes as a hint for possible chain roots. llvm-svn: 184201	2013-06-18 15:58:05 +00:00
Nadav Rotem	c7a7b98ec1	Add a return value to make this function more useful. llvm-svn: 184200	2013-06-18 15:57:12 +00:00
Pekka Jaaskelainen	4f4b7ec54b	Fix for a regression caused by the LoopVectorizer when vectorizing loops with memory accesses to non-zero address spaces. It simply dropped the AS info. Fixes PR16306. llvm-svn: 184103	2013-06-17 18:49:06 +00:00
Arnold Schwaighofer	1ddd15e16f	LoopVectorize: Change API call to get the backedge taken count Use ScalarEvolution's getBackedgeTakenCount API instead of getExitCount since that is really what we want to know. Using the more specific getExitCount was safe because we made sure that there is only one exiting block. No functionality change. llvm-svn: 183047	2013-05-31 21:48:56 +00:00
Arnold Schwaighofer	12f1ab46d1	LoopVectorize: PHIs with only outside users should prevent vectorization We check that instructions in the loop don't have outside users (except if they are reduction values). Unfortunately, we skipped this check for if-convertible PHIs. Fixes PR16184. llvm-svn: 183035	2013-05-31 19:53:50 +00:00
NAKAMURA Takumi	0271817249	LoopVectorize.cpp: Fix abuse of StringRef on Twine. Twine captures the pointer of StringRef. llvm-svn: 182820	2013-05-29 03:13:47 +00:00
NAKAMURA Takumi	4514caf74f	Whitespace. llvm-svn: 182819	2013-05-29 03:13:41 +00:00
Paul Redmond	0eb4837b24	Add support for llvm.vectorizer metadata - llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic by making the root of additional loop metadata. - Loop::isAnnotatedParallel now looks for llvm.loop and associated llvm.mem.parallel_loop_access - document llvm.loop and update llvm.mem.parallel_loop_access - add support for llvm.vectorizer.width and llvm.vectorizer.unroll - document llvm.vectorizer.* metadata - add utility class LoopVectorizerHints for getting/setting loop metadata - use llvm.vectorizer.width=1 to indicate already vectorized instead of already_vectorized - update existing tests that used llvm.loop.parallel and llvm.vectorizer.already_vectorized Reviewed by: Nadav Rotem llvm-svn: 182802	2013-05-28 20:00:34 +00:00
Benjamin Kramer	8a9baa49c3	LoopVectorize: LoopSimplify can't canonicalize loops with an indirectbr in it, don't assert on those cases. Fixes PR16139. llvm-svn: 182656	2013-05-24 18:05:35 +00:00
Nadav Rotem	f4096f7321	SLPVectorizer: Change the order in which new instructions are added to the function. We are not working on a DAG and I ran into a number of problems when I enabled the vectorizations of 'diamond-trees' (trees that share leaves). * Improved the numbering API. * Changed the placement of new instructions to the last root. * Fixed a bug with external tree users with non-zero lane. * Fixed a bug in the placement of in-tree users. llvm-svn: 182508	2013-05-22 19:47:32 +00:00
Arnold Schwaighofer	65d9b93021	LoopVectorize: Make Value pointers that could be RAUW'ed a VH The Value pointers we store in the induction variable list can be RAUW'ed by a call to SCEVExpander::expandCodeFor, use a TrackingVH instead. Do the same thing in some other places where we store pointers that could potentially be RAUW'ed. Fixes PR16073. llvm-svn: 182485	2013-05-22 16:54:56 +00:00
Arnold Schwaighofer	4a2fd393c6	LoopVectorize: Handle single edge PHIs We might encounter single edge PHIs - handle them with an identity select. Fixes PR15990. llvm-svn: 182199	2013-05-18 18:38:34 +00:00
Benjamin Kramer	c08526696f	LoopVectorize: Simplify code. No functionality change. llvm-svn: 182100	2013-05-17 14:48:17 +00:00
Arnold Schwaighofer	63eb047996	LoopVectorize: Move call of canHoistAllLoads to canVectorizeWithIfConvert We only want to check this once, not for every conditional block in the loop. No functionality change (except that we don't perform a check redundantly anymore). llvm-svn: 181942	2013-05-15 22:38:14 +00:00
Arnold Schwaighofer	f8c04c27fc	LoopVectorize: Fix comments No functionality change. llvm-svn: 181862	2013-05-15 02:02:45 +00:00
Arnold Schwaighofer	0e2020dbaf	LoopVectorize: Hoist conditional loads if possible InstCombine can be uncooperative to vectorization and sink loads into conditional blocks. This prevents vectorization. Undo this optimization if there are unconditional memory accesses to the same addresses in the loop. radar://13815763 llvm-svn: 181860	2013-05-15 01:44:30 +00:00
Arnold Schwaighofer	5f1c1c6187	LoopVectorize: Handle loops with multiple forward inductions We used to give up if we saw two integer inductions. After this patch, we base further induction variables on the chosen one like we do in the reverse induction and pointer induction case. Fixes PR15720. radar://13851975 llvm-svn: 181746	2013-05-14 00:21:18 +00:00
Duncan Sands	526a10c53f	Suppress GCC compiler warnings in release builds about variables that are only read in asserts. llvm-svn: 181689	2013-05-13 07:50:47 +00:00
Nadav Rotem	a6e8db3ab5	SLPVectorizer: Swap LHS and RHS. No functionality change. llvm-svn: 181684	2013-05-13 05:13:13 +00:00
Nadav Rotem	434624fd18	SLPVectorizer: Fix a bug in the code that generates extracts for values with multiple users. The external user does not have to be in lane #0. We have to save the lane for each scalar so that we know which vector lane to extract. llvm-svn: 181674	2013-05-12 22:58:45 +00:00
Nadav Rotem	298f969650	SLPVectorizer: Clear the map that maps between scalars to vectors after each round of vectorization. Testcase in the next commit. llvm-svn: 181673	2013-05-12 22:55:57 +00:00
Arnold Schwaighofer	c532e0f494	LoopVectorize: Use the widest induction variable type Use the widest induction type encountered for the canonical induction variable. We used to turn the following loop into an empty loop because we used i8 as induction variable type and truncated 1024 to 0 as trip count. int a[1024]; void fail() { int reverse_induction = 1023; unsigned char forward_induction = 0; while ((reverse_induction) >= 0) { forward_induction++; a[reverse_induction] = forward_induction; --reverse_induction; } } radar://13862901 llvm-svn: 181667	2013-05-11 23:04:28 +00:00
Arnold Schwaighofer	e5f236f9a7	LoopVectorize: Use variable instead of repeated function call No functionality change intended. llvm-svn: 181666	2013-05-11 23:04:26 +00:00
Arnold Schwaighofer	062224410c	LoopVectorize: Use IRBuilder interface in more places No functionality change intended. llvm-svn: 181665	2013-05-11 23:04:24 +00:00
Nadav Rotem	062be30d6a	SLPVectorizer: Add support for trees with external users. For example: bar() { int a = A[i]; int b = A[i+1]; B[i] = a; B[i+1] = b; foo(a); <--- a is used outside the vectorized expression. } llvm-svn: 181648	2013-05-10 22:59:33 +00:00
Nadav Rotem	4eab353eed	Add a debug print llvm-svn: 181647	2013-05-10 22:56:18 +00:00
Arnold Schwaighofer	374ad2d113	LoopVectorizer: Don't assert on the absence of induction variables A computable loop exit count does not imply the presence of an induction variable. Scalar evolution can return a value for an infinite loop. Fixes PR15926. llvm-svn: 181495	2013-05-09 00:32:18 +00:00
Arnold Schwaighofer	b00b8ac69a	LoopVectorizer: Improve reduction variable identification The two nested loops were confusing and also conservative in identifying reduction variables. This patch replaces them by a worklist based approach. llvm-svn: 181369	2013-05-07 21:55:37 +00:00
Arnold Schwaighofer	f95f087afb	LoopVectorize: getConsecutiveVector must respect signed arithmetic We were passing an i32 to ConstantInt::get where an i64 was needed and we must also pass the sign if we pass negative numbers. The start index passed to getConsecutiveVector must also be signed. Should fix PR15882. llvm-svn: 181286	2013-05-07 04:37:05 +00:00
Nadav Rotem	9119c025de	Update the comment to mention that we use TTI. llvm-svn: 181178	2013-05-06 03:06:36 +00:00
Benjamin Kramer	06f30fc527	LoopVectorize: Print values instead of pointers in debug output. llvm-svn: 181157	2013-05-05 14:54:52 +00:00
Arnold Schwaighofer	0a0d84c7b7	LoopVectorize: Add support for floating point min/max reductions Add support for min/max reductions when "no-nans-float-math" is enabled. This allows us to assume we have ordered floating point math and treat ordered and unordered predicates equally. radar://13723044 llvm-svn: 181144	2013-05-05 01:54:48 +00:00
Arnold Schwaighofer	0791ec5e44	LoopVectorizer: Cleanup of minimum/maximum pattern match code No need for setting the operands. The pointers are going to be bound by the matcher. radar://13723044 llvm-svn: 181142	2013-05-05 01:54:44 +00:00
Arnold Schwaighofer	0158aa58f2	LoopVectorize: We don't need an identity element for min/max reductions We can just use the initial element that feeds the reduction. max(max(x, y), z) == max(max(x,y), max(x,z)) radar://13723044 llvm-svn: 181141	2013-05-05 01:54:42 +00:00
Dmitri Gribenko	82c92dc3dd	Add ArrayRef constructor from None, and do the cleanups that this constructor enables Patch by Robert Wilhelm. llvm-svn: 181138	2013-05-05 00:40:33 +00:00
Nadav Rotem	97fe0281b4	LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming values. By supporting the vectorization of PHINodes with more than two incoming values we can increase the complexity of nested if statements. We can now vectorize this loop: int foo(int *A, int *B, int n) { for (int i=0; i < n; i++) { int x = 9; if (A[i] > B[i]) { if (A[i] > 19) { x = 3; } else if (B[i] < 4 ) { x = 4; } else { x = 5; } } A[i] = x; } } llvm-svn: 181037	2013-05-03 17:42:55 +00:00
Filip Pizlo	dd62846c56	This patch breaks up Wrap.h so that it does not have to include all of the things, and renames it to CBindingWrapping.h. I also moved CBindingWrapping.h into Support/. This new file just contains the macros for defining different wrap/unwrap methods. The calls to those macros, as well as any custom wrap/unwrap definitions (like for array of Values for example), are put into corresponding C++ headers. Doing this required some #include surgery, since some .cpp files relied on the fact that including Wrap.h implicitly caused the inclusion of a bunch of other things. This also now means that the C++ headers will include their corresponding C API headers; for example Value.h must include llvm-c/Core.h. I think this is harmless, since the C API headers contain just external function declarations and some C types, so I don't believe there should be any nasty dependency issues here. llvm-svn: 180881	2013-05-01 20:59:00 +00:00
Nadav Rotem	a18fab3891	Fix a typo llvm-svn: 180806	2013-04-30 21:04:51 +00:00
Nadav Rotem	fe6c769d60	LoopVectorizer: Calculate the number of pointers to disambiguate at runtime based on the numbers of reads and writes. llvm-svn: 180593	2013-04-26 05:08:59 +00:00
Nadav Rotem	d5eaf768a9	LoopVectorizer: No need to generate pointer disambiguation checks between readonly pointers. llvm-svn: 180570	2013-04-25 19:55:03 +00:00
Arnold Schwaighofer	b2a88cf3ef	LoopVectorizer: Change variable name Stride to ConsecutiveStride This makes it easier to read the code. No functionality change. llvm-svn: 180197	2013-04-24 16:16:03 +00:00
Arnold Schwaighofer	ad591145df	LoopVectorize: Scalarize padded types This patch disables memory-instruction vectorization for types that need padding bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on x86_64. Because the load/store vectorization is performed by bit casting to a packed vector, which has an incompatible memory layout due to the lack of padding bytes, the present vectorizer produces inconsistent results for memory instructions of those types. This patch checks that the AllocSize of a scalar type equals the allocated size for each vector element, to ensure that there are no padding bytes and the array can be read/written using vector operations. Patch by Daisuke Takahashi! Fixes PR15758. llvm-svn: 180196	2013-04-24 16:16:01 +00:00
Arnold Schwaighofer	169f004ff2	LoopVectorizer: Bail out if we don't have datalayout; we need it llvm-svn: 180195	2013-04-24 16:15:58 +00:00
Nadav Rotem	1bfb7903e3	LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure that the order in which the elements are scalarized is the same as the original order. This fixes a miscompilation in FreeBSD's regex library. llvm-svn: 180121	2013-04-23 17:12:42 +00:00
Pekka Jaaskelainen	d8a8d1d02f	Call the potentially costly isAnnotatedParallel() only once. Made the uniform write test's checks a bit stricter. llvm-svn: 180119	2013-04-23 16:44:43 +00:00
Pekka Jaaskelainen	1492231ce9	Refuse to (even try to) vectorize loops which have uniform writes, even if erroneously annotated with the parallel loop metadata. Fixes Bug 15794: "Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata" llvm-svn: 180081	2013-04-23 08:08:51 +00:00
Eric Christopher	beec5d09da	Move C++ code out of the C headers and into either C++ headers or the C++ files themselves. This enables people to use just a C compiler to interoperate with LLVM. llvm-svn: 180063	2013-04-22 22:47:22 +00:00
Nadav Rotem	e567845da4	SLPVectorize: Add support for vectorization of casts. llvm-svn: 179975	2013-04-21 08:05:59 +00:00
Nadav Rotem	b0e93c71e8	SLPVectorizer: Fix a bug in the code that scans the tree in search of nodes with multiple users. We did not terminate the switch case and we executed the search routine twice. llvm-svn: 179974	2013-04-21 07:37:56 +00:00
Nadav Rotem	069e6d9a7f	Fix PR15800. Do not try to vectorize vectors and structs. llvm-svn: 179960	2013-04-20 22:29:43 +00:00
Benjamin Kramer	f048a2579c	VecUtils: Clean up uses of dyn_cast. llvm-svn: 179936	2013-04-20 10:36:17 +00:00
Benjamin Kramer	e587f9b0bb	SLPVectorizer: Strength reduce SmallVectors to ArrayRefs. Avoids a couple of copies and allows more flexibility in the clients. llvm-svn: 179935	2013-04-20 09:49:10 +00:00
Nadav Rotem	a244928706	SLPVectorizer: Reduce the compile time by eliminating the search for some of the more expensive patterns. After this change we will only check basic arithmetic trees that start at cmpinstr. llvm-svn: 179933	2013-04-20 07:29:34 +00:00
Nadav Rotem	0d6e377312	refactor tryToVectorizePair to a new method that supports vectorization of lists. llvm-svn: 179932	2013-04-20 07:22:58 +00:00
Nadav Rotem	594ff6b41e	Fix an unused variable warning. llvm-svn: 179931	2013-04-20 06:40:28 +00:00
Nadav Rotem	ed63de18d3	SLPVectorizer: Improve the cost model for loop invariant broadcast values. llvm-svn: 179930	2013-04-20 06:13:47 +00:00
Nadav Rotem	571d90215a	Report the number of stores that were found in the debug message. llvm-svn: 179929	2013-04-20 05:23:11 +00:00
Nadav Rotem	5195d6b091	Fix the header comment. llvm-svn: 179928	2013-04-20 05:18:51 +00:00
Nadav Rotem	e5d4f04ae7	Use 64bit arithmetic for calculating distance between pointers. llvm-svn: 179927	2013-04-20 05:17:47 +00:00
Arnold Schwaighofer	7b05635869	LoopVectorizer: Use matcher from PatternMatch.h for the min/max patterns Also make some static functions class functions to avoid having to mention the class namespace for enums all the time. No functionality change intended. llvm-svn: 179886	2013-04-19 21:03:36 +00:00
Dmitri Gribenko	ad7b3011a6	Fix a -Wdocumentation warning llvm-svn: 179789	2013-04-18 20:13:04 +00:00
Arnold Schwaighofer	acd551152c	LoopVectorizer: Recognize min/max reductions A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y) sequence in LLVM. If we see such a sequence we can treat it just as any other commutative binary instruction and reduce it. This appears to help bzip2 by about 1.5% on an imac12,2. radar://12960601 llvm-svn: 179773	2013-04-18 17:22:34 +00:00
Benjamin Kramer	18f31a4d5e	LoopVectorize: Use a set to avoid longer cycles in the reduction chain too. Fixes PR15748. llvm-svn: 179757	2013-04-18 14:29:13 +00:00
Nadav Rotem	40ad92b46d	SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops. llvm-svn: 179562	2013-04-15 22:00:26 +00:00
Nadav Rotem	7ab2574900	SLPVectorizer: Add support for vectorizing trees that start at compare instructions. llvm-svn: 179504	2013-04-15 04:25:27 +00:00
Benjamin Kramer	715265e7d7	Miscellaneous cleanups for VecUtils.h llvm-svn: 179483	2013-04-14 09:33:08 +00:00
Nadav Rotem	c46de9ba5b	SLP: Document the scalarization cost method. llvm-svn: 179479	2013-04-14 07:22:22 +00:00
Nadav Rotem	e380208d4f	SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree. llvm-svn: 179475	2013-04-14 05:15:53 +00:00
Nadav Rotem	433f05f5de	SLPVectorizer: add initial support for reduction variable vectorization. llvm-svn: 179470	2013-04-14 03:22:20 +00:00
Nadav Rotem	51df846152	SLPVectorizer: add support for vectorization of diamond shaped trees. We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from. llvm-svn: 179414	2013-04-12 21:16:54 +00:00
Nadav Rotem	0b2109a86c	Add debug prints. llvm-svn: 179412	2013-04-12 21:11:14 +00:00
Arnold Schwaighofer	48b1c3e915	LoopVectorizer: integer division is not a reduction operation Don't classify idiv/udiv as a reduction operation. Integer division is lossy. For example : (1 / 2) * 4 != 4/2. Example: int a[] = { 2, 5, 2, 2} int x = 80; for() x /= a[i]; Scalar: x /= 2 // = 40 x /= 5 // = 8 x /= 2 // = 4 x /= 2 // = 2 Vectorized: <80, 1> / <2,5> //= <40,0> <40, 0> / <2,2> //= <20,0> 20*0 = 0 radar://13640654 llvm-svn: 179381	2013-04-12 15:15:19 +00:00
Benjamin Kramer	cf2731d0e0	Rename the C function to create a SLPVectorizerPass to something sane and expose it in the header file. llvm-svn: 179272	2013-04-11 11:36:36 +00:00
Nadav Rotem	6a6b998435	Make the SLP store-merger less paranoid about function calls. We check for function calls when we check if it is safe to sink instructions. llvm-svn: 179207	2013-04-10 19:41:36 +00:00
Nadav Rotem	aa6eefd489	We require DataLayout for analyzing the size of stores. llvm-svn: 179206	2013-04-10 18:57:27 +00:00
Nadav Rotem	96f8f45bd5	Add support for bottom-up SLP vectorization infrastructure. This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations. The infrastructure has three potential users: 1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]). 2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute. 3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization. This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code: void SAXPY(int *x, int *y, int a, int i) { x[i] = a * x[i] + y[i]; x[i+1] = a * x[i+1] + y[i+1]; x[i+2] = a * x[i+2] + y[i+2]; x[i+3] = a * x[i+3] + y[i+3]; } llvm-svn: 179117	2013-04-09 19:44:35 +00:00
Arnold Schwaighofer	abd363c1bc	LoopVectorizer: Pass OperandValueKind information to the cost model Pass down the fact that an operand is going to be a vector of constants. This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86 back. It had degraded to scalar performance due to my previous shift cost change that made all shifts expensive on x86. radar://13576547 llvm-svn: 178809	2013-04-04 23:26:27 +00:00
Arnold Schwaighofer	3e3105f2f8	LoopVectorize: Invert case when we use a vector cmp value to query select cost We generate a select with a vectorized condition argument when the condition is NOT loop invariant. Not the other way around. llvm-svn: 177098	2013-03-14 18:54:36 +00:00
Hal Finkel	a5dcace09c	BBVectorize: Fixup debugging statements After the recent data-structure improvements, a couple of debugging statements were broken (printing pointer values). llvm-svn: 176791	2013-03-10 20:57:42 +00:00
Benjamin Kramer	c6db6a4d69	Remove a source of nondeterminism from the LoopVectorizer. This made us emit runtime checks in a random order. Hopefully bootstrap miscompares will go away now. llvm-svn: 176775	2013-03-09 19:22:40 +00:00
Arnold Schwaighofer	2a2a785543	LoopVectorizer: Ignore all dbg intrinsics Ignore all DbgInfoIntrinsic instructions instead of just DbgValueInst. llvm-svn: 176769	2013-03-09 16:27:27 +00:00
Arnold Schwaighofer	652df6a5cb	LoopVectorizer: Ignore dbg.value instructions We want vectorization to happen at -g. Ignore calls to the dbg.value intrinsic and don't transfer them to the vectorized code. radar://13378964 llvm-svn: 176768	2013-03-09 15:56:34 +00:00
Benjamin Kramer	7b2d670a66	Insert the reduction start value into the first bypass block to preserve domination. Fixes PR15344. llvm-svn: 176701	2013-03-08 16:58:37 +00:00
Nadav Rotem	6d803820f8	PR14448 - prevent the loop vectorizer from vectorizing the same loop twice. The LoopVectorizer often runs multiple times on the same function due to inlining. When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches. With this patch, the vectorizer during vectorization puts metadata on scalar loops and marks them as 'already vectorized' so that it knows to ignore them when it sees them a second time. PR14448. llvm-svn: 176399	2013-03-02 01:33:49 +00:00
Benjamin Kramer	a462070739	LoopVectorize: Don't hang forever if a PHI only has skipped PHI uses. Fixes PR15384. llvm-svn: 176366	2013-03-01 19:07:31 +00:00
Benjamin Kramer	33d5dd0b08	LoopVectorize: Vectorize math builtin calls. This properly asks TargetLibraryInfo if a call is available and if it is, it can be translated into the corresponding LLVM builtin. We don't vectorize sqrt() yet because I'm not sure about the semantics for negative numbers. The other intrinsic should be exact equivalents to the libm functions. Differential Revision: http://llvm-reviews.chandlerc.com/D465 llvm-svn: 176188	2013-02-27 15:24:19 +00:00
Renato Golin	d371b2ca14	Allow GlobalValues to vectorize with AliasAnalysis Store the load/store instructions with the values and inspect them using Alias Analysis to make sure they don't alias, since the GEP pointer operand doesn't take the offset into account. Trying hard not to add any extra cost to loads and stores that don't overlap on global values, AA is only calculated if all of the previous attempts failed. Using the biggest vector register size as the stride for the vectorization access, as we're being conservative and the cost model (which calculates the real vectorization factor) is only run after the legalization phase. We might re-think this relationship in the future, but for now, I'd rather be safe than sorry. llvm-svn: 175818	2013-02-21 22:39:03 +00:00
Hal Finkel	92f63997ce	BBVectorize: Fix an invalid reference bug This fixes PR15289. This bug was introduced (recently) in r175215; collecting all std::vector references for candidate pairs to delete at once is invalid because subsequent lookups in the owning DenseMap could invalidate the references. bugpoint was able to reduce a useful test case. Unfortunately, because whether or not this asserts depends on memory layout, this test case will sometimes appear to produce valid output. Nevertheless, running under valgrind will reveal the error. llvm-svn: 175397	2013-02-17 15:59:26 +00:00
Hal Finkel	891df5ece6	BBVectorize: Call a DAG a DAG instead of a tree Several functions and variable names used the term 'tree' to refer to what is actually a DAG. Correcting this mistake will, hopefully, prevent confusion in the future. No functionality change intended. llvm-svn: 175278	2013-02-15 17:20:54 +00:00
Hal Finkel	5e320e9019	BBVectorize: Cap the number of candidate pairs in each instruction group For some basic blocks, it is possible to generate many candidate pairs for relatively few pairable instructions. When many (tens of thousands) of these pairs are generated for a single instruction group, the time taken to generate and rank the different vectorization plans can become quite large. As a result, we now cap the number of candidate pairs within each instruction group. This is done by closing out the group once the threshold is reached (set now at 3000 pairs). Although this will limit the overall compile-time impact, this may not be the best way to achieve this result. It might be better, for example, to prune excessive candidate pairs after the fact to prevent the generation of short, but highly-connected groups. We can experiment with this in the future. This change reduces the overall compile-time slowdown of the csa.ll test case in PR15222 to ~5x. If 5x is still considered too large, a lower limit can be used as the default. This represents a functionality change, but only for very large inputs (thus, there is no regression test). llvm-svn: 175251	2013-02-15 04:28:42 +00:00
Hal Finkel	29b84d5692	BBVectorize: Remove the remaining instances of std::multimap All instances of std::multimap have now been replaced by DenseMap<K, std::vector<V> >, and this yields a speedup of 5% on the csa.ll test case from PR15222. No functionality change intended. llvm-svn: 175216	2013-02-14 22:38:04 +00:00
Hal Finkel	b302f08ac1	BBVectorize: Don't store candidate pairs in a std::multimap This is another commit on the road to removing std::multimap from BBVectorize. This gives an ~1% speedup on the csa.ll test case in PR15222. No functionality change intended. llvm-svn: 175215	2013-02-14 22:37:09 +00:00
Benjamin Kramer	7e28d6495d	LoopVectorize: Simplify code for clarity. No functionality change. llvm-svn: 175076	2013-02-13 21:12:29 +00:00
Pekka Jaaskelainen	7e2908d0f3	Metadata for annotating loops as parallel. The first consumer for this metadata is the loop vectorizer. See the documentation update for more info. llvm-svn: 175060	2013-02-13 18:08:57 +00:00
Hal Finkel	5e637e4e5d	BBVectorize: Don't over-search when building the dependency map When building the pairable-instruction dependency map, don't search past the last pairable instruction. For large blocks that have been divided into multiple instruction groups, searching past the last instruction in each group is very wasteful. This gives a 32% speedup on the csa.ll test case from PR15222 (when using 50 instructions in each group). No functionality change intended. llvm-svn: 174915	2013-02-11 23:02:17 +00:00
Hal Finkel	b67d7ef969	BBVectorize: Omit unnecessary entries in PairableInstUsers This map is queried only for instructions in pairs of pairable instructions; so make sure that only pairs of pairable instructions are added to the map. This gives a 3.5% speedup on the csa.ll test case from PR15222. No functionality change intended. llvm-svn: 174914	2013-02-11 23:02:09 +00:00
Hal Finkel	2e8c1799a0	BBVectorize: Eliminate one more restricted linear search This eliminates one more linear search over a range of std::multimap entries. This gives a 22% speedup on the csa.ll test case from PR15222. No functionality change intended. llvm-svn: 174893	2013-02-11 17:19:34 +00:00
Hal Finkel	d37a002ca6	BBVectorize: Remove the linear searches from pair connection searching This removes the last of the linear searches over ranges of std::multimap iterators, giving a 7% speedup on the doduc.bc input from PR15222. No functionality change intended. llvm-svn: 174859	2013-02-11 05:29:51 +00:00
Hal Finkel	e006b66033	BBVectorize: Avoid linear searches within the load-move set This is another cleanup aimed at eliminating linear searches in ranges of std::multimap. No functionality change intended. llvm-svn: 174858	2013-02-11 05:29:49 +00:00
Hal Finkel	6672c24cc9	BBVectorize: isa/cast cleanup in getInstructionTypes Profiling suggests that getInstructionTypes is performance-sensitive, this cleans up some double-casting in that function in favor of using dyn_cast. No functionality change intended. llvm-svn: 174857	2013-02-11 05:29:48 +00:00
Hal Finkel	fad3c1ec89	BBVectorize: Make the bookkeeping to support full cycle checking less expensive By itself, this does not have much of an effect, but only because in the default configuration the full cycle checks are used only for small problem sizes. This is part of a general cleanup of uses of iteration over std::multimap ranges only for the purpose of checking membership. No functionality change intended. llvm-svn: 174856	2013-02-11 05:29:41 +00:00
Hal Finkel	2e1d39a40e	BBVectorize: Use TTI->getAddressComputationCost This is a follow-up to the cost-model change in r174713 which splits the cost of a memory operation between the address computation and the actual memory access. In r174713, this cost is always added to the memory operation cost, and so BBVectorize will do the same. Currently, this new cost function is used only by ARM, and I don't have any ARM test cases for BBVectorize. Assistance in generating some good ARM test cases for BBVectorize would be greatly appreciated! llvm-svn: 174743	2013-02-08 21:13:39 +00:00
Jakob Stoklund Olesen	c8bee11d10	Typos. llvm-svn: 174723	2013-02-08 17:43:32 +00:00
Arnold Schwaighofer	381c4a3e54	ARM cost model: Address computation in vector mem ops not free Adds a function to target transform info to query for the cost of address computation. The cost model analysis pass now also queries this interface. The code in LoopVectorize adds the cost of address computation as part of the memory instruction cost calculation. Only there, we know whether the instruction will be scalarized or not. Increase the penalty for inserting into D registers on swift. This becomes necessary because we now always assume that address computation has a cost and three is a closer value to the architecture. radar://13097204 llvm-svn: 174713	2013-02-08 14:50:48 +00:00
Michael Kuperstein	ed93eafd8f	Test Commit llvm-svn: 174709	2013-02-08 12:58:29 +00:00
Nadav Rotem	dcab6c8ee5	fix 80-col violation and fix the docs. llvm-svn: 174671	2013-02-07 22:34:07 +00:00
Arnold Schwaighofer	cae839558d	Loop Vectorizer: Refactor Memory Cost Computation We don't want too many classes in a pass and the classes obscure the details. I was going a little overboard with object modeling here. Replace classes by generic code that handles both loads and stores. No functionality change intended. llvm-svn: 174646	2013-02-07 19:05:21 +00:00
Arnold Schwaighofer	b835fa09f3	Loop Vectorizer: Refactor code to compute vectorized memory instruction cost Introduce a helper class that computes the cost of memory access instructions. No functionality change intended. llvm-svn: 174422	2013-02-05 18:46:41 +00:00
Arnold Schwaighofer	5ee07db17d	Loop Vectorizer: Handle pointer stores/loads in getWidestType() In the loop vectorizer cost model, we used to ignore stores/loads of a pointer type when computing the widest type within a loop. This meant that if we had only stores/loads of pointers in a loop we would return a widest type of 8 bits (instead of 32 or 64 bits) and therefore a vector factor that was too big. Now, if we see a consecutive store/load of pointers we use the size of a pointer (from data layout). This problem occurred in SingleSource/Benchmarks/Shootout-C++/hash.cpp (reduced test case is the first test in vector_ptr_load_store.ll). radar://13139343 llvm-svn: 174377	2013-02-05 15:08:02 +00:00
Pekka Jaaskelainen	67cddcca8c	LoopVectorize: convert TinyTripCountVectorThreshold constant to a command line switch. llvm-svn: 173837	2013-01-29 21:42:08 +00:00
Benjamin Kramer	5142e76f3a	LoopVectorize: Clean up ValueMap a bit and avoid double lookups. No intended functionality change. llvm-svn: 173809	2013-01-29 17:31:33 +00:00
Renato Golin	7fe847b274	Vectorization Factor clarification llvm-svn: 173691	2013-01-28 16:02:45 +00:00
Hal Finkel	115cffae92	BBVectorize: Better use of TTI->getShuffleCost When flipping the pair of subvectors that form a vector, if the vector length is 2, we can use the SK_Reverse shuffle kind to get more-accurate cost information. Also we can use the SK_ExtractSubvector shuffle kind to get accurate subvector extraction costs. The current cost model implementations don't yet seem complex enough for this to make a difference (thus, there are no test cases with this commit), but it should help in future. Depending on how the various targets optimize and combine shuffles in practice, we might be able to get more-accurate costs by combining the costs of multiple shuffle kinds. For example, the cost of flipping the subvector pairs could be modeled as two extractions and two subvector insertions. These changes, however, should probably be motivated by specific test cases. llvm-svn: 173621	2013-01-27 20:07:01 +00:00
Hal Finkel	9882e8c083	BBVectorize: Add an additional comment about the cost computation llvm-svn: 173580	2013-01-26 16:49:04 +00:00
Hal Finkel	2d9bc41033	BBVectorize: Fix anomalous capital letter in comment llvm-svn: 173579	2013-01-26 16:49:03 +00:00
Nadav Rotem	80f2c07dc3	LoopVectorize: Refactor the code that vectorizes loads/stores to remove duplication. llvm-svn: 173500	2013-01-25 21:47:42 +00:00
Benjamin Kramer	3f37f1a557	LoopVectorize: Simplify code. No functionality change. llvm-svn: 173475	2013-01-25 19:43:15 +00:00
Nadav Rotem	1d462abb9a	LoopVectorizer: Refactor more code to use the IRBuilder. llvm-svn: 173471	2013-01-25 19:26:23 +00:00
Nadav Rotem	df96d1dffb	Refactor some code to use the IRBuilder. llvm-svn: 173467	2013-01-25 18:34:09 +00:00
Nadav Rotem	52f5279653	Add support for reverse pointer induction variables. These are loops that contain pointers that count backwards. For example, this is the hot loop in BZIP: do { m = --p; p = ( ... ); } while (--n); llvm-svn: 173219	2013-01-23 01:35:00 +00:00
Nadav Rotem	6e8c348e91	Fix a comment. Induction vars don't need to start at zero. llvm-svn: 173061	2013-01-21 17:59:18 +00:00
Benjamin Kramer	3acd79adc5	LoopVectorize: Fix a C++11 incompatibility. llvm-svn: 172990	2013-01-20 20:29:52 +00:00
Nadav Rotem	7c9244fdca	Fix a build error. llvm-svn: 172971	2013-01-20 09:39:17 +00:00
Nadav Rotem	9ec02f071a	LoopVectorizer: Implement a new heuristic for selecting the unroll factor. We ignore the cpu frontend and focus on pipeline utilization. We do this because we don't have a good way to estimate the loop body size at the IR level. llvm-svn: 172964	2013-01-20 05:24:29 +00:00
Benjamin Kramer	28c812f680	LoopVectorizer: Emit memory checks into their own basic block. This separates the check for "too few elements to run the vector loop" from the "memory overlap" check, giving a lot nicer code and allowing to skip the memory checks when we're not going to execute the vector code anyways. We still leave the decision of whether to emit the memory checks as branches or setccs, but it seems to be doing a good job. If ugly code pops up we may want to emit them as separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso. Most of this is legwork to allow multiple bypass blocks while updating PHIs, dominators and loop info. llvm-svn: 172902	2013-01-19 13:57:58 +00:00
Nadav Rotem	3fc70ae776	LoopVectorizer cost model. Honor the user command line flag that selects the vectorization factor even if the target machine does not have any vector registers. llvm-svn: 172544	2013-01-15 18:25:16 +00:00
Nadav Rotem	c6cce40085	Fix PR14547. Handle induction variables of sizes smaller than i32 (i8 and i16). llvm-svn: 172348	2013-01-13 07:56:29 +00:00
Nadav Rotem	008741a0e0	ARM Cost Model: We need to detect the max bitwidth of types in the loop in order to select the max vectorization factor. We don't have a detailed analysis on which values are vectorized and which stay scalars in the vectorized loop so we use another method. We look at reduction variables, loads and stores, which are the only ways to get information in and out of loop iterations. If the data types are extended and truncated then the cost model will catch the cost of the vector zext/sext/trunc operations. llvm-svn: 172178	2013-01-11 07:11:59 +00:00
Nadav Rotem	d5f59a81d9	LoopVectorizer: Fix a bug in the vectorization of BinaryOperators. The BinaryOperator can be folded to an Undef, and we don't want to set NSW flags to undef vals. PR14878 llvm-svn: 172079	2013-01-10 17:34:39 +00:00
Nadav Rotem	436dc952aa	ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor. llvm-svn: 172010	2013-01-09 22:29:00 +00:00
Nadav Rotem	9c27f36e59	Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM. llvm-svn: 171928	2013-01-09 01:15:42 +00:00
Nadav Rotem	df0bfd2e95	Code cleanup: refactor the switch statements in the generation of reduction variables into an IR builder call. llvm-svn: 171871	2013-01-08 17:37:45 +00:00
Nadav Rotem	5bb578a21c	Rename the enum members to match the LLVM coding style. llvm-svn: 171868	2013-01-08 17:23:17 +00:00
Nadav Rotem	4aa065a2a3	LoopVectorizer: Add support for floating point reductions llvm-svn: 171812	2013-01-07 23:13:00 +00:00
Nadav Rotem	5906222eab	LoopVectorizer: When we vectorize and widen loops we process many elements at once. This is a good thing, except for small loops. On small loops, the post-loop that handles scalars (and runs slower) can take more time to execute than the rest of the loop. This patch disables widening of loops with a small static trip count. llvm-svn: 171798	2013-01-07 21:54:51 +00:00
Chandler Carruth	c894d24706	Simplify LoopVectorize to require target transform info and rely on it being present. Make a member of one of the helper classes a reference as part of this. Reformatting goodness brought to you by clang-format. llvm-svn: 171726	2013-01-07 11:12:29 +00:00
Chandler Carruth	cbf30d85b9	Merge the unused header file for LoopVectorizer into the source file. This makes the loop vectorizer match the pattern followed by roughly all other passes. =] Notably, this header file was broken in several regards: it contained a using namespace directive, global #define's that aren't globally appropriate, and global constants defined directly in the header file. As a side benefit, lots of the types in this file become internal, which will cause the optimizer to chew on this pass more effectively. llvm-svn: 171723	2013-01-07 10:44:06 +00:00
Chandler Carruth	3487258579	Switch BBVectorize to directly depend on having a TTI analysis. This could be simplified further, but Hal has a specific feature for ignoring TTI, and so I preserved that. Also, I needed to use it because a number of tests fail when switching from a null TTI to the NoTTI nonce implementation. That seems suspicious to me and so may be something that you need to look into, Hal. I worked around it by preserving the old behavior for these tests with the flag that ignores all target info. llvm-svn: 171722	2013-01-07 10:22:36 +00:00
Chandler Carruth	362e34c603	Fix a slew of indentation and parameter naming style issues. 80% of this patch was brought to you by the tool clang-format. I wanted to fix up the names of constructor parameters because they followed a bit of an anti-pattern by naming initialisms with CamelCase: 'Tti', 'Se', etc. This appears to have been in an attempt to not overlap with the names of member variables 'TTI', 'SE', etc. However, constructor arguments can very safely alias members, and in fact that's the conventional way to pass in members. I've fixed all of these I saw, along with making some strange abbreviations such as 'Lp' be the simpler 'L', or 'Lgl' be the word 'Legal'. However, the code I was touching had indentation and formatting somewhat all over the map. So I ran clang-format and fixed them. I also fixed a few other formatting or doxygen formatting issues such as using ///< on trailing comments so they are associated with the correct entry. There is still a lot of room for improvement of the formatting and cleanliness of this code. ;] At least a few parts of the coding standards or common practices in LLVM's code aren't followed, the enum naming rules jumped out at me. I may fix some of these while I'm here, but not all of them. llvm-svn: 171719	2013-01-07 09:57:00 +00:00
Chandler Carruth	7723d75e9e	Fix the enumerator names for ShuffleKind to match the coding standards, and make its comments doxygen comments. llvm-svn: 171688	2013-01-07 03:20:02 +00:00
Chandler Carruth	3c0f5d4efb	Move TargetTransformInfo to live under the Analysis library. This no longer would violate any dependency layering and it is in fact an analysis. =] llvm-svn: 171686	2013-01-07 03:08:10 +00:00
Chandler Carruth	7e058addc4	Switch the loop vectorizer from VTTI to just use TTI directly. llvm-svn: 171620	2013-01-05 10:16:02 +00:00
Chandler Carruth	07ee67f2dc	Switch the BB vectorizer from the VTTI interface to the simple TTI interface. llvm-svn: 171618	2013-01-05 10:05:28 +00:00
Nadav Rotem	836b9a9fda	LoopVectorize: Non-commutative operators can be used as reduction variables as long as the reduction chain is used in the LHS. PR14803. llvm-svn: 171583	2013-01-05 01:15:47 +00:00
Paul Redmond	6ce33a6ae9	Do not vectorize loops with subtraction reductions. Since subtraction does not commute, the loop vectorizer incorrectly vectorizes reductions such as x = A[i] - x. Disabling for now. llvm-svn: 171537	2013-01-04 22:10:16 +00:00
Nadav Rotem	3349017273	Fix a warning llvm-svn: 171525	2013-01-04 21:08:44 +00:00
Nadav Rotem	cb3562a88e	LoopVectorizer: 1. Add code to estimate register pressure. 2. Add code to select the unroll factor based on register pressure. 3. Add bits to TargetTransformInfo to provide the number of registers. llvm-svn: 171469	2013-01-04 17:48:25 +00:00
Nadav Rotem	29dd0667aa	LoopVectorizer: Add support for loop-unrolling during vectorization for increasing the ILP. At the moment this feature is disabled by default and this commit should not cause any functional changes. llvm-svn: 171436	2013-01-03 00:52:27 +00:00
Nadav Rotem	a7cac72b7d	Avoid vectorization when the function has the "noimplicitfloat" attribute. llvm-svn: 171429	2013-01-02 23:54:43 +00:00
Chandler Carruth	4c1f3c24db	Move all of the header files which are involved in modelling the LLVM IR into their new header subdirectory: include/llvm/IR. This matches the directory structure of lib, and begins to correct a long standing point of file layout clutter in LLVM. There are still more header files to move here, but I wanted to handle them in separate commits to make tracking what files make sense at each layer easier. The only really questionable files here are the target intrinsic tablegen files. But that's a battle I'd rather not fight today. I've updated both CMake and Makefile build systems (I think, and my tests think, but I may have missed something). I've also re-sorted the includes throughout the project. I'll be committing updates to Clang, DragonEgg, and Polly momentarily. llvm-svn: 171366	2013-01-02 11:36:10 +00:00
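
Note on commits 6ce33a6ae9 and 836b9a9fda (subtraction reductions): the sketch below is an illustration only, not code from LLVM. It models a vectorized reduction as VF independent lanes that each start from 0 and whose partial results are added to the start value at the end. All function names are hypothetical. Under that assumed model, x = x - A[i] (reduction chain in the LHS) matches the scalar loop, while x = A[i] - x does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference loop: applies `step(x, A[i])` in program order.
template <typename Step>
long reduceScalar(const std::vector<long> &A, long x, Step step) {
  for (long a : A)
    x = step(x, a);
  return x;
}

// Assumed model of a vectorized reduction (hypothetical, for illustration):
// each of the VF lanes starts at 0, handles every VF-th element, and the
// lane partials are added to the start value after the loop.
template <typename Step>
long reduceInLanes(const std::vector<long> &A, long x, std::size_t VF,
                   Step step) {
  std::vector<long> partial(VF, 0);
  for (std::size_t i = 0; i < A.size(); ++i)
    partial[i % VF] = step(partial[i % VF], A[i]);
  for (long p : partial)
    x += p;
  return x;
}

// x = x - A[i]: the accumulator is the left-hand operand.
inline long subLhs(long x, long a) { return x - a; }

// x = A[i] - x: the accumulator is the right-hand operand.
inline long subRhs(long x, long a) { return a - x; }

// Small self-check comparing the lane model against the scalar loops.
inline void checkExamples() {
  const std::vector<long> A = {1, 2, 3, 4};
  // Chain in the LHS: the lane model agrees with the scalar loop.
  assert(reduceScalar(A, 10, subLhs) == reduceInLanes(A, 10, 2, subLhs));
  // Chain in the RHS: the lane model computes a different value.
  assert(reduceScalar(A, 10, subRhs) != reduceInLanes(A, 10, 2, subRhs));
}
```

With A = {1, 2, 3, 4}, a start value of 10 and VF = 2, the LHS form yields 0 in both the scalar loop and the lane model, whereas the RHS form yields 12 in the scalar loop and 14 in the lane model.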

626 Commits