llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00

Author	SHA1	Message	Date
Renato Golin	ae79e04f36	Mark vector loops as already vectorized Make sure we mark all loops (scalar and vector) when vectorizing, so that we don't try to vectorize them anymore. Also, set unroll to 1, since this is what we check for on early exit. llvm-svn: 193349	2013-10-24 14:50:51 +00:00
Matt Arsenault	fcca6dd732	Use more type helper functions llvm-svn: 193109	2013-10-21 19:43:56 +00:00
Arnold Schwaighofer	789187ee86	SLPVectorizer: Don't vectorize volatile memory operations radar://15231682 Reapply r192799, http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang/builds/8226 showed that the bot is still broken even with this out. llvm-svn: 192820	2013-10-16 17:52:40 +00:00
Arnold Schwaighofer	7097263371	Revert "SLPVectorizer: Don't vectorize volatile memory operations" This speculatively reverts commit 192799. It might have broken a linux buildbot. llvm-svn: 192816	2013-10-16 17:19:40 +00:00
Arnold Schwaighofer	eebda9d6cf	SLPVectorizer: Don't vectorize volatile memory operations radar://15231682 llvm-svn: 192799	2013-10-16 16:09:00 +00:00
Benjamin Kramer	a00487e169	LoopVectorize: Properly reflect PODness in comments. llvm-svn: 192717	2013-10-15 16:19:54 +00:00
Arnold Schwaighofer	38ec37faba	SLPVectorizer: Sort PHINodes based on their opcode Before this patch we relied on the order of phi nodes when we looked for phi nodes of the same type. This could prevent vectorization of cases where there was a phi node of a second type in between phi nodes of some type. This is important for vectorization of an internal graphics kernel. On the test suite + external on x86_64 (and on a run on armv7s) it showed no impact on either performance or compile time. radar://15024459 llvm-svn: 192537	2013-10-12 18:56:27 +00:00
Tobias Grosser	bc154d94d0	LoopVectorize: Add missing INITIALIZE_PASS_DEPENDENCY macros Contributed-by: Peter Zotov <whitequark@whitequark.org> llvm-svn: 192536	2013-10-12 18:29:15 +00:00
Renato Golin	ec7fe56cfa	Better info when debugging vectorizer llvm-svn: 192460	2013-10-11 16:14:39 +00:00
Arnold Schwaighofer	bfe48b104a	LoopVectorize: External uses must use the last value in a reduction cycle Otherwise, we don't perform operations that would have been performed on the scalar version. Fixes PR17498. llvm-svn: 192133	2013-10-07 21:05:43 +00:00
Arnold Schwaighofer	6fab174840	SLPVectorizer: Sort inputs to commutative binary operations Sort the operands of the other entries in the current vectorization root according to the first entry's operands opcodes. %conv0 = uitofp ... %load0 = load float ... = fmul %conv0, %load0 = fmul %load0, %conv1 = fmul %load0, %conv2 Make sure that we recursively vectorize <%conv0, %conv1, %conv2> and <%load0, %load0, %load0>. This makes it more likely to obtain vectorizable trees. We have to be careful when we sort that we don't destroy 'good' existing ordering implied by source order. radar://15080067 llvm-svn: 191977	2013-10-04 20:39:16 +00:00
Matt Arsenault	2c180f66a9	Don't use runtime bounds check between address spaces. Don't vectorize with a runtime check if it requires a comparison between pointers with different address spaces. The values can't be assumed to be directly comparable. Previously it would create an illegal bitcast. llvm-svn: 191862	2013-10-02 22:38:17 +00:00
Yi Jiang	09195de8f3	Apply slp vectorization on fully-vectorizable tree of height 2 llvm-svn: 191852	2013-10-02 20:20:39 +00:00
Matt Arsenault	15633246b6	Fix debug printing spacing. Fix missing newlines, missing and extra spaces in printed messages. llvm-svn: 191851	2013-10-02 20:04:29 +00:00
Matt Arsenault	26cc78f548	Fix comment grammar and capitalization. llvm-svn: 191850	2013-10-02 20:04:26 +00:00
Benjamin Kramer	4458ae839d	SLPVectorizer: Make store chain finding more aggressive with GetUnderlyingObject. This recursively strips all GEPs like the existing code. It also handles bitcasts and other operations that do not change the pointer value. llvm-svn: 191847	2013-10-02 19:06:06 +00:00
Rafael Espindola	a279462828	Remove several unused variables. Patch by Alp Toker. llvm-svn: 191757	2013-10-01 13:32:03 +00:00
Matt Arsenault	a3e171a6c8	Fix code duplication llvm-svn: 191716	2013-10-01 00:01:14 +00:00
Benjamin Kramer	7b5eaaacfd	Convert manual insert point restores to the new RAII object. llvm-svn: 191675	2013-09-30 15:40:17 +00:00
Benjamin Kramer	33abdcddb3	IRBuilder: Add RAII objects to reset insertion points or fast math flags. Inspired by the object from the SLPVectorizer. This found a minor bug in the debug loc restoration in the vectorizer where the location of a following instruction was attached instead of the location from the original instruction. llvm-svn: 191673	2013-09-30 15:39:48 +00:00
Robert Wilhelm	198f21deb3	Even more spelling fixes for "instruction". llvm-svn: 191611	2013-09-28 13:42:22 +00:00
Robert Wilhelm	6b36431ffa	Fix spelling intruction -> instruction. llvm-svn: 191610	2013-09-28 11:46:15 +00:00
Matt Arsenault	0dc0668061	Fix SLPVectorizer using wrong address space for load/store llvm-svn: 191564	2013-09-27 21:24:57 +00:00
Justin Bogner	db7f32982e	Transforms: Use getFirstNonPHI to set the insertion point for PHIs We were previously using getFirstInsertionPt to insert PHI instructions when vectorizing, but getFirstInsertionPt also skips past landingpads, causing this to generate invalid IR. We can avoid this issue by using getFirstNonPHI instead. llvm-svn: 191526	2013-09-27 15:30:25 +00:00
Arnold Schwaighofer	9830391a8f	SLPVectorize: Put horizontal reductions feeding a store under separate flag Put them under a separate flag for experimentation. They are more likely to interfere with loop vectorization which happens later in the pass pipeline. llvm-svn: 191371	2013-09-25 14:02:32 +00:00
Yi Jiang	6ba7a7b02c	set the cost of tiny trees to INT_MAX in SLP vectorizer to disable vectorization on them llvm-svn: 191314	2013-09-24 17:26:43 +00:00
Arnold Schwaighofer	b1cea2cfcc	Revert "LoopVectorizer: Only allow vectorization of intrinsics." Revert 191122 - with extra checks we are allowed to vectorize math library function calls. Standard library identifiers are reserved names so functions with external linkage must not override them. However, functions with internal linkage can. Therefore, we can vectorize calls to math library functions with a check for external linkage and matching signature. This matches what we do during SelectionDAG building. llvm-svn: 191206	2013-09-23 14:54:39 +00:00
Arnold Schwaighofer	4f8b0cf48b	SLPVectorizer: Fix multiline comment warning llvm-svn: 191135	2013-09-21 05:37:30 +00:00
Arnold Schwaighofer	c1f8473eb6	Reapply "SLPVectorizer: Handle more horizontal reductions (disabled)"" Reapply r191108 with a fix for a memory corruption error I introduced. Of course, we can't reference the scalars that we replace by vectorizing and then call their eraseFromParent method. I only 'needed' the scalars to get the DebugLoc. Just store the DebugLoc before actually vectorizing instead. As a nice side effect, this also simplifies the interface between BoUpSLP and the HorizontalReduction class to returning a value pointer (the vectorized tree root). radar://14607682 llvm-svn: 191123	2013-09-21 01:06:00 +00:00
Nadav Rotem	a10aa3ffa1	LoopVectorizer: Only allow vectorization of intrinsics. We can't know for sure that the functions 'abs' or 'round' are the functions from libm. rdar://15012650 llvm-svn: 191122	2013-09-21 00:27:05 +00:00
Arnold Schwaighofer	c81e140238	Revert "SLPVectorizer: Handle more horizontal reductions (disabled)" This reverts commit r191108. The horizontal.ll test case fails under libgmalloc. Thanks Shuxin for pointing this out to me. llvm-svn: 191121	2013-09-21 00:06:20 +00:00
Arnold Schwaighofer	db78859615	SLPVectorizer: Handle more horizontal reductions (disabled) Match reductions starting at binary operation feeding into a phi. The code handles trees like r += v1 + v2 + v3 ... and r += v1 r += v2 ... and r *= v1 + v2 + ... We currently only handle associative operations (add, fadd fast). The code can now also handle reductions feeding into stores. a[i] = v1 + v2 + v3 + ... The code is currently disabled behind the flag "-slp-vectorize-hor". The cost model for most architectures is not there yet. I found one opportunity of a horizontal reduction feeding a phi in TSVC (LoopRerolling-flt) and there are several opportunities where reductions feed into stores. radar://14607682 llvm-svn: 191108	2013-09-20 21:18:20 +00:00
Robert Lytton	b41e4ff222	Prevent LoopVectorizer and SLPVectorizer running if the target has no vector registers. XCore target: Add XCoreTargetTransformInfo This is where getNumberOfRegisters() resides, which in turn returns the number of vector registers (=0). llvm-svn: 190936	2013-09-18 12:43:35 +00:00
Craig Topper	194d1e2a5a	Revert accidental commit I had to make to get the test case in PR17268 to still work correctly. llvm-svn: 190917	2013-09-18 04:10:17 +00:00
Craig Topper	5d022196de	Lift alignment restrictions for load/store folding on VINSERTF128/VEXTRACTF128. Fixes PR17268. llvm-svn: 190916	2013-09-18 03:55:53 +00:00
Arnold Schwaighofer	43c2040076	SLPVectorizer: Don't vectorize phi nodes that use invoke values We can't insert an insertelement after an invoke. We would have to split a critical edge. So when we see a phi node that uses an invoke we just give up. radar://14990770 llvm-svn: 190871	2013-09-17 17:03:29 +00:00
Arnold Schwaighofer	11f318e34c	Don't vectorize if there are outside loop users of the induction variable. We would have to compute the pre increment value, either by computing it on every loop iteration or by splitting the edge out of the loop and inserting a computation for it there. For now, just give up vectorizing such loops. Fixes PR17179. llvm-svn: 190790	2013-09-16 16:17:24 +00:00
Eli Friedman	e87f8feb58	Don't assert on invalid loop vectorization hint. llvm-svn: 190450	2013-09-10 23:45:25 +00:00
Benjamin Kramer	06ad7ff792	LoopVectorize: PHI nodes are always at the beginning of a block, no need to scan the whole block. llvm-svn: 190422	2013-09-10 18:46:15 +00:00
Yi Jiang	e1b34bf1fe	In this patch we are trying to do two things: 1) If the width of vectorization list candidate is bigger than vector reg width, we will break it down to fit the vector reg. 2) We do not vectorize the width which is not power of two. The performance result shows it will help some spec benchmarks. mesa improved 6.97% and ammp improved 1.54%. llvm-svn: 189830	2013-09-03 17:26:04 +00:00
Hal Finkel	a22a21165f	Disable unrolling in the loop vectorizer when disabled in the pass manager When unrolling is disabled in the pass manager, the loop vectorizer should also not unroll loops. This will allow the -fno-unroll-loops option in Clang to behave as expected (even for vectorizable loops). The loop vectorizer's -force-vector-unroll option will (continue to) override the pass-manager setting (including -force-vector-unroll=0 to force use of the internal auto-selection logic). In order to test this, I added a flag to opt (-disable-loop-unrolling) to force disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also, this fixes a small bug in opt where the loop vectorizer was enabled only after the pass manager populated the queue of passes (the global_alias.ll test needed a slight update to the RUN line as a result of this fix). llvm-svn: 189499	2013-08-28 18:33:10 +00:00
Nadav Rotem	c8417c3f79	Refactor 'vectorizeLoop' no functionality change. This patch merges LoopVectorize of InnerLoopVectorizer and InnerLoopUnroller by adding checks for VF=1. This helps in erasing the Unroller code that is almost identical to the InnerLoopVectorizer code. llvm-svn: 189391	2013-08-27 18:52:47 +00:00
Matt Arsenault	c32a5a3d46	Fix inserting instructions before last in bundle. The builder inserts from before the insert point, not after, so this would insert before the last instruction in the bundle instead of after it. I'm not sure if this can actually be a problem with any of the current insertions. llvm-svn: 189285	2013-08-26 23:08:37 +00:00
Nadav Rotem	7d4e24e1f4	LoopVectorize: Implement partial loop unrolling when vectorization is not profitable. This patch enables unrolling of loops when vectorization is legal but not profitable. We add a new class InnerLoopUnroller, that extends InnerLoopVectorizer and replaces some of the vector-specific logic with scalars. This patch does not introduce any runtime regressions and improves the following workloads: SingleSource/Benchmarks/Shootout/matrix -22.64% SingleSource/Benchmarks/Shootout-C++/matrix -13.06% External/SPEC/CINT2006/464_h264ref/464_h264ref -3.99% SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -1.95% llvm-svn: 189281	2013-08-26 22:33:26 +00:00
Yi Jiang	b951785e03	test commit. Remove blank line llvm-svn: 189265	2013-08-26 18:57:55 +00:00
Matt Arsenault	49ebc53ae4	Fix unused variable in release build llvm-svn: 189264	2013-08-26 18:38:29 +00:00
Matt Arsenault	0d6c559675	Constify functions llvm-svn: 189234	2013-08-26 17:56:38 +00:00
Matt Arsenault	fe57252c78	Vectorize starting from insertelements building a vector llvm-svn: 189233	2013-08-26 17:56:35 +00:00
Matt Arsenault	c1b8722791	Check if in set on insertion instead of separately llvm-svn: 189179	2013-08-24 19:55:38 +00:00
Chandler Carruth	e6b6740e73	Teach the SLP vectorizer the correct way to check for consecutive access using GEPs. Previously, it used a number of different heuristics for analyzing the GEPs. Several of these were conservatively correct, but failed to fall back to SCEV even when SCEV might have given a reasonable answer. One was simply incorrect in how it was formulated. There was good code already to recursively evaluate the constant offsets in GEPs, look through pointer casts, etc. I gathered this into a form code like the SLP code can use in a previous commit, which allows all of this code to become quite simple. There is some performance (compile time) concern here at first glance as we're directly attempting to walk both pointers constant GEP chains. However, a couple of thoughts: 1) The very common cases where there is a dynamic pointer, and a second pointer at a constant offset (usually a stride) from it, this code will actually not do any unnecessary work. 2) InstCombine and other passes work very hard to collapse constant GEPs, so it will be rare that we iterate here for a long time. That said, if there remain performance problems here, there are some obvious things that can improve the situation immensely. Doing a vectorizer-pass-wide memoizer for each individual layer of pointer values, their base values, and the constant offset is likely to be able to completely remove redundant work and strictly limit the scaling of the work to scrape these GEPs. Since this optimization was not done on the prior version (which would still benefit from it), I've not done it here. But if folks have benchmarks that slow down it should be straight forward for them to add. I've added a test case, but I'm not really confident of the amount of testing done for different access patterns, strides, and pointer manipulation. llvm-svn: 189007	2013-08-22 12:45:17 +00:00
Matt Arsenault	1b410a4dec	Teach LoopVectorize about address space sizes llvm-svn: 188980	2013-08-22 02:42:55 +00:00
Matt Arsenault	3e8997425a	Use attribute helper function llvm-svn: 188916	2013-08-21 18:54:50 +00:00
Matt Arsenault	68d9c05b51	Fix typo llvm-svn: 188915	2013-08-21 18:54:47 +00:00
Arnold Schwaighofer	276cfe784a	SLPVectorizer: Fix invalid iterator errors Update iterator when the SLP vectorizer changes the instructions in the basic block by restarting the traversal of the basic block. Patch by Yi Jiang! Fixes PR 16899. llvm-svn: 188832	2013-08-20 21:21:45 +00:00
Hal Finkel	8f395a803a	Add a llvm.copysign intrinsic This adds a llvm.copysign intrinsic; We already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well. In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded. In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-values FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN). llvm-svn: 188728	2013-08-19 23:35:46 +00:00
Joerg Sonnenberger	72a32889f4	PR 16899: Do not modify the basic block using the iterator, but keep the next value. This avoids crashes due to invalidation. Patch by Joey Gouly. llvm-svn: 188605	2013-08-17 11:04:47 +00:00
Matt Arsenault	66eeeddb1d	Fix spelling llvm-svn: 188506	2013-08-15 23:11:03 +00:00
Hal Finkel	fd36621506	BBVectorize: Add initial stores to the write set when tracking uses When computing the use set of a store, we need to add the store to the write set prior to iterating over later instructions. Otherwise, if there is a later aliasing load of that store, that load will not be tagged as a use, and bad things will happen. trackUsesOfI still adds later dependent stores of an instruction to that instruction's write set, but it never sees the original instruction, and so when tracking uses of a store, the store must be added to the write set by the caller. Fixes PR16834. llvm-svn: 188329	2013-08-13 23:34:32 +00:00
Nadav Rotem	bc08e7ce84	Fix PR16797 - Support PHINodes with multiple inputs from the same basic block. Do not generate new vector values for the same entries because we know that the incoming values from the same block must be identical. llvm-svn: 188185	2013-08-12 17:46:44 +00:00
Hal Finkel	bdc7aa32c1	Add ISD::FROUND for libm round() All libm floating-point rounding functions, except for round(), had their own ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm adding ISD::FROUND so that round() can be custom lowered as well. For the most part, this is straightforward. I've added an intrinsic and a matching ISD node just like those for nearbyint() and friends. The SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed fround). This will be used by the PowerPC backend in a follow-up commit. llvm-svn: 187926	2013-08-07 22:49:12 +00:00
Arnold Schwaighofer	af6776a17b	LoopVectorize: Allow vectorization of loops with lifetime markers Patch by Marc Jessome! llvm-svn: 187825	2013-08-06 22:37:52 +00:00
Nadav Rotem	eef986f7a3	SLPVectorizer: Fix PR16777. PHInodes may use multiple extracted values that come from different blocks. Thanks Alexey Samsonov. llvm-svn: 187663	2013-08-02 18:40:24 +00:00
Nadav Rotem	403f78810d	80-col llvm-svn: 187535	2013-07-31 22:17:45 +00:00
Nadav Rotem	fc24bbce9c	SLPVectorizer: update the debug location for the new instructions. llvm-svn: 187363	2013-07-29 18:18:46 +00:00
Nadav Rotem	931f83b2ef	Don't vectorize when the attribute NoImplicitFloat is used. llvm-svn: 187340	2013-07-29 05:13:00 +00:00
Nadav Rotem	b6c365bd70	Update the comment llvm-svn: 187316	2013-07-27 23:28:47 +00:00
Nadav Rotem	9bebda1517	SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. llvm-svn: 187267	2013-07-26 23:07:55 +00:00
Nadav Rotem	965db2159b	SLP Vectorizer: Disable the vectorization of non power of two chains, such as <3 x float>, because we don't have a good cost model for these types. llvm-svn: 187265	2013-07-26 22:53:11 +00:00
Nadav Rotem	383f980dca	When we vectorize across multiple basic blocks we may vectorize PHINodes that create a cycle. We already break the cycle on phi-nodes, but arithmetic operations are still duplicated. This patch adds code that checks if the operation that we are vectorizing was vectorized during the visit of the operands and uses this value if it can. llvm-svn: 186883	2013-07-22 22:18:07 +00:00
Nadav Rotem	53033f1204	Fix an obvious typo in the loop vectorizer where the cost model uses the wrong variable. The variable BlockCost is ignored. We don't have tests for the effect of if-conversion loops because it requires a big test (that includes if-converted loops) and it is difficult to find and balance a loop to do the right thing. llvm-svn: 186845	2013-07-22 17:10:48 +00:00
Nadav Rotem	c15d196f83	Delete unused helper functions. llvm-svn: 186808	2013-07-22 05:19:22 +00:00
Nadav Rotem	cb987ae696	Revert a part of r186420. Don't forbid multiple store chains that merge. llvm-svn: 186786	2013-07-21 06:12:57 +00:00
Nadav Rotem	85fcd9dde0	fix an 80-col line. llvm-svn: 186733	2013-07-19 23:14:01 +00:00
Nadav Rotem	43e3cf61bf	Use LLVMs ADTs that improve the compile time of this pass. llvm-svn: 186732	2013-07-19 23:12:19 +00:00
Nadav Rotem	aead8bb4d6	SLPVectorizer: Improve the compile time of isConsecutive by reordering the conditions that check GEPs and eliminate two of the calls to accumulateConstantOffset. llvm-svn: 186731	2013-07-19 23:11:15 +00:00
Nadav Rotem	d9b6f69b94	Handle constants without going through SCEV. llvm-svn: 186593	2013-07-18 18:34:21 +00:00
Nadav Rotem	c47d289b77	SLPVectorizer: Speedup isConsecutive by manually checking GEPs with multiple indices. This brings the compile time of the SLP-Vectorizer to about 2.5% of OPT for my testcase. llvm-svn: 186592	2013-07-18 18:20:45 +00:00
Nadav Rotem	440c849dd9	SLPVectorizer: Speedup isConsecutive (that checks if two addresses are consecutive in memory) by checking for additional patterns that don't need to go through SCEV. llvm-svn: 186563	2013-07-18 04:33:20 +00:00
Nadav Rotem	22f3c7e5fb	Fix a comment. llvm-svn: 186541	2013-07-17 22:41:16 +00:00
Nadav Rotem	990f24a7d2	Add a micro optimization to catch cases where the PtrA equals PtrB. llvm-svn: 186531	2013-07-17 19:52:25 +00:00
Nadav Rotem	ae8b6de415	SLPVectorizer: Accelerate the isConsecutive check by replacing the subtraction of the two values with a simple SCEV expression that adds the offset to one of the pointers that we compare. llvm-svn: 186479	2013-07-17 00:48:31 +00:00
Nadav Rotem	adfe58a7ad	flip the scev minus direction to simplify the code. llvm-svn: 186466	2013-07-16 22:57:06 +00:00
Nadav Rotem	633bd23118	SLPVectorizer: Improve the compile time of isConsecutive by adding a simple constant-gep check before using SCEV. This check does not always work because not all of the GEPs use a constant offset, but it happens often enough to reduce the number of times we use SCEV. llvm-svn: 186465	2013-07-16 22:51:07 +00:00
Nadav Rotem	ebe95f88ed	SLPVectorizer: Reduce the compile time of the consecutive store lookup. Process groups of stores in chunks of 16. llvm-svn: 186420	2013-07-16 15:25:17 +00:00
Nadav Rotem	f1e0aebca1	PR16628: Fix a bug in the code that merges compares. Compares return i1 but they compare different types. llvm-svn: 186359	2013-07-15 22:52:48 +00:00
Nadav Rotem	37c18f5cda	SLPVectorizer: change the order in which we search for vectorization candidates. Do stores first and PHIs second. llvm-svn: 186277	2013-07-14 06:15:46 +00:00
Craig Topper	58fa7a9b4a	Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. llvm-svn: 186274	2013-07-14 04:42:23 +00:00
Arnold Schwaighofer	970f54281c	LoopVectorizer: Disallow reductions whose header phi is used outside the loop If an outside loop user of the reduction value uses the header phi node we cannot just reduce the vectorized phi value in the vector code epilog because we would lose VF-1 reductions. lp: p = phi (0, lv) lv = lv + 1 ... brcond , lp, outside outside: usr = add 0, p (Say the loop iterates two times, the value of p coming out of the loop is one). We cannot just transform this to: vlp: p = phi (<0,0>, lv) lv = lv + <1,1> .. brcond , lp, outside outside: p_reduced = p[0] + p[1]; usr = add 0, p_reduced (Because the original loop iterated two times the vectorized loop would iterate one time, but p_reduced ends up being zero instead of one). We would have to execute VF-1 iterations in the scalar remainder loop in such cases. For now, just disable vectorization. PR16522 llvm-svn: 186256	2013-07-13 19:09:29 +00:00
Andrew Trick	651c624842	LoopVectorize fix: LoopInfo must be valid when invoking utils like SCEVExpander. In general, one should always complete CFG modifications first, update CFG-based analyses, like Dominators and LoopInfo, then generate instruction sequences. LoopVectorizer was creating a new loop, calling SCEVExpander to generate checks, then updating LoopInfo. I just changed the order. llvm-svn: 186241	2013-07-13 06:20:06 +00:00
Arnold Schwaighofer	b9c37551bc	TargetTransformInfo: address calculation parameter for gather/scatter Address calculation for gather/scatter in vectorized code can incur a significant cost making vectorization unbeneficial. Add infrastructure to add cost. Tests and cost model for targets will be in follow-up commits. radar://14351991 llvm-svn: 186187	2013-07-12 19:16:02 +00:00
Nadav Rotem	ee62470368	SLPVectorizer: Sink and enable CSE for ExtractElements. llvm-svn: 186145	2013-07-12 06:09:24 +00:00
Nadav Rotem	1e6246b38c	SLPVectorize: Replace the code that checks for vectorization candidates in successor blocks with code that scans PHINodes. Before we could vectorize PHINodes scanning successors was a good way of finding candidates. Now we can vectorize the phinodes which is simpler. llvm-svn: 186139	2013-07-12 00:04:18 +00:00
Nadav Rotem	cd9c4e430f	Remove an argument that we don't use anymore. llvm-svn: 186116	2013-07-11 20:56:13 +00:00
Arnold Schwaighofer	a8667081e1	LoopVectorize: Vectorize all accesses in address space zero with unit stride We can vectorize them because in the case where we wrap in the address space the unvectorized code would have had to access a pointer value of zero which is undefined behavior in address space zero according to the LLVM IR semantics. (Thank you Duncan, for pointing this out to me). Fixes PR16592. llvm-svn: 186088	2013-07-11 15:21:55 +00:00
Nadav Rotem	965af9cb85	Fix a warning. llvm-svn: 186064	2013-07-11 05:39:02 +00:00
Nadav Rotem	8e1d89128f	SLPVectorizer: refactor the code that places extracts. Place the code that decides where to put extracts in the build-tree phase. This allows us to take the cost of the extracts into account. llvm-svn: 186058	2013-07-11 04:54:05 +00:00
Nadav Rotem	417c1a3150	Fix PR16571, which is a bug in the code that checks that all of the types in the bundle are uniform. llvm-svn: 185970	2013-07-09 21:38:08 +00:00
Nadav Rotem	07b212b07b	Set the default insert point to the first instruction, and not to end() llvm-svn: 185953	2013-07-09 17:55:36 +00:00
Nadav Rotem	c43699ed15	This patch changes the saved IRBuilder insert point from BasicBlock::iterator to AssertingVH. Commit 185883 fixes a bug in the IRBuilder that should fix the ASan bot. AssertingVH can help in exposing some RAUW problems. Thanks Ben and Alexey! llvm-svn: 185886	2013-07-08 23:31:13 +00:00
Nadav Rotem	709b733114	Clear the builder insert point between tree-vectorization phases. llvm-svn: 185777	2013-07-07 14:57:18 +00:00
Nadav Rotem	883fb8ad80	SLPVectorizer: Implement DCE as part of vectorization. This is a complete rewrite of the bottom-up vectorization class. Before this commit we scanned the instruction tree 3 times. First in search of merge points for the trees. Second, for estimating the cost. And finally for vectorization. There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design. In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree. llvm-svn: 185774	2013-07-07 06:57:07 +00:00
Craig Topper	783617eba7	Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size. llvm-svn: 185606	2013-07-04 01:31:24 +00:00
Arnold Schwaighofer	26461b04c7	LoopVectorize: Math functions only read rounding mode Math functions are marked as readonly because they read the floating point rounding mode. Because we don't vectorize loops that would contain function calls that set the rounding mode it is safe to ignore this memory read. llvm-svn: 185299	2013-07-01 00:54:44 +00:00
Benjamin Kramer	78189f47c1	LoopVectorizer: Pack MemAccessInfo pairs. llvm-svn: 185263	2013-06-29 17:52:08 +00:00
Benjamin Kramer	55f216aff4	Move helper classes into anonymous namespaces. llvm-svn: 185262	2013-06-29 17:02:06 +00:00
Nadav Rotem	201beeb585	We preserve the CFG and some of the analysis passes. llvm-svn: 185251	2013-06-29 05:38:15 +00:00
Nadav Rotem	aed4323517	Update docs. llvm-svn: 185250	2013-06-29 05:37:19 +00:00
Nadav Rotem	12c3b510fa	SLP Vectorizer: Add support for trees with external users. To support this we have to insert 'extractelement' instructions to pick the right lane. We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated. llvm-svn: 185230	2013-06-28 22:07:09 +00:00
Nadav Rotem	803f7c3932	LoopVectorizer: Refactor the code that checks if it is safe to predicate blocks. In this code we keep track of pointers that we are allowed to read from, if they are accessed by non-predicated blocks. We use this list to allow vectorization of conditional loads in predicated blocks because we know that these addresses don't segfault. llvm-svn: 185214	2013-06-28 20:46:27 +00:00
Arnold Schwaighofer	e6284189a7	LoopVectorize: Pull dyn_cast into setDebugLocFromInst llvm-svn: 185168	2013-06-28 17:14:48 +00:00
Arnold Schwaighofer	09766b6b9f	LoopVectorize: Use static function instead of DebugLocSetter class I used the class to safely reset the state of the builder's debug location. I think I have caught all places where we need to set the debug location to a new one. Therefore, we can replace the class by a function that just sets the debug location. llvm-svn: 185165	2013-06-28 16:26:54 +00:00
Arnold Schwaighofer	d6aee045b3	LoopVectorize: Preserve debug location info radar://14169017 llvm-svn: 185122	2013-06-28 00:38:54 +00:00
Arnold Schwaighofer	c0e3a07c99	LoopVectorize: Cache edge masks created during if-conversion Otherwise, we end up with an exponential IR blowup. Fixes PR16472. llvm-svn: 185097	2013-06-27 20:31:06 +00:00
Arnold Schwaighofer	ccd78deec7	LoopVectorize: Use vectorized loop invariant gep index anchored in loop Use vectorized instruction instead of original instruction anchored in the original loop. Fixes PR16452 and t2075.c of PR16455. llvm-svn: 185081	2013-06-27 15:11:55 +00:00
Arnold Schwaighofer	18efca433e	LoopVectorize: Don't store a reversed value in the vectorized value map When we store values for reversed induction stores we must not store the reversed value in the vectorized value map. Another instruction might use this value. This fixes 3 test cases of PR16455. llvm-svn: 185051	2013-06-27 00:45:41 +00:00
Nadav Rotem	195bbbe54b	No need to use a Set when a vector would do. llvm-svn: 185047	2013-06-27 00:14:13 +00:00
Nadav Rotem	897ca82595	SLP: When searching for vectorization opportunities scan the blocks in post-order because we grow chains upwards. llvm-svn: 185041	2013-06-26 23:44:45 +00:00
Nadav Rotem	962b32446e	SLP: Don't erase instructions during vectorization because it prevents the outer loops from iterating over the instructions. llvm-svn: 185040	2013-06-26 23:43:23 +00:00
Nadav Rotem	860aebf69a	Erase all of the instructions that we RAUWed llvm-svn: 184969	2013-06-26 17:16:09 +00:00
Nadav Rotem	e0a5a586b8	Do not add cse-ed instructions into the visited map because we don't want to consider them as a candidate for replacement of instructions to be visited. llvm-svn: 184966	2013-06-26 16:54:53 +00:00
Nadav Rotem	a8fba65221	SLPVectorizer: support slp-vectorization of PHINodes between basic blocks llvm-svn: 184888	2013-06-25 23:04:09 +00:00
Nadav Rotem	8fcb707c24	Fix a typo in the code that collected the costs recursively. llvm-svn: 184827	2013-06-25 05:30:56 +00:00
Nadav Rotem	eff545235c	Rename the variable to fix a warning. Thanks Andy Gibbs. llvm-svn: 184749	2013-06-24 15:59:47 +00:00
Arnold Schwaighofer	0a98597e80	Reapply 184685 after the SetVector iteration order fix. This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg testers. "LoopVectorize: Use the dependence test utility class We now no longer need alias analysis - the cases that alias analysis would handle are now handled as accesses with a large dependence distance. We can now vectorize loops with simple constant dependence distances. for (i = 8; i < 256; ++i) { a[i] = a[i+4] * a[i+8]; } for (i = 8; i < 256; ++i) { a[i] = a[i-4] * a[i-8]; } We would be able to vectorize about 200 more loops (in many cases the cost model instructs us not to) in the test suite now. Results on x86-64 are a wash. I have seen one degradation in ammp. Interestingly, the function in which we now vectorize a loop is never executed so we probably see some instruction cache effects. There is a 2% improvement in h264ref. There is one or the other TSCV loop kernel that speeds up. radar://13681598" llvm-svn: 184724	2013-06-24 12:09:15 +00:00
Arnold Schwaighofer	75b76bf92f	LoopVectorize: Use SetVector for the access set We are creating the runtime checks using this set so we need a deterministic iteration order. llvm-svn: 184723	2013-06-24 12:09:12 +00:00
Arnold Schwaighofer	f022b11b08	Revert "LoopVectorize: Use the dependence test utility class" This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac. We are seeing a stage2 and stage3 miscompare on some dragonegg bots. llvm-svn: 184690	2013-06-24 06:10:41 +00:00
Arnold Schwaighofer	c49cd1a668	LoopVectorize: Use the dependence test utility class We now no longer need alias analysis - the cases that alias analysis would handle are now handled as accesses with a large dependence distance. We can now vectorize loops with simple constant dependence distances. for (i = 8; i < 256; ++i) { a[i] = a[i+4] * a[i+8]; } for (i = 8; i < 256; ++i) { a[i] = a[i-4] * a[i-8]; } We would be able to vectorize about 200 more loops (in many cases the cost model instructs us not to) in the test suite now. Results on x86-64 are a wash. I have seen one degradation in ammp. Interestingly, the function in which we now vectorize a loop is never executed so we probably see some instruction cache effects. There is a 2% improvement in h264ref. There is one or the other TSCV loop kernel that speeds up. radar://13681598 llvm-svn: 184685	2013-06-24 03:55:48 +00:00
Arnold Schwaighofer	f9828b092b	LoopVectorize: Add utility class for checking dependency among accesses This class checks dependences by subtracting two Scalar Evolution access functions allowing us to catch very simple linear dependences. The checker assumes source order in determining whether vectorization is safe. We currently don't reorder accesses. Positive true dependencies need to be a multiple of VF otherwise we impede store-load forwarding. llvm-svn: 184684	2013-06-24 03:55:45 +00:00
Arnold Schwaighofer	67714fedcd	LoopVectorize: Add utility class for building sets of dependent accesses Sets of dependent accesses are built by unioning sets based on underlying objects. This class will be used by the upcoming dependence checker. llvm-svn: 184683	2013-06-24 03:55:44 +00:00
Nadav Rotem	6c2ae14dc5	SLP Vectorizer: Add support for vectorizing parts of the tree. Until now we detected the vectorizable tree and evaluated the cost of the entire tree. With this patch we can decide to trim-out branches of the tree that are not profitable to vectorize. Also, increase the max depth from 6 to 12. In the worst possible case where all of the code is made of diamond-shaped graph this can bring the cost to 2**10, but diamonds are not very common. llvm-svn: 184681	2013-06-24 02:52:43 +00:00
Nadav Rotem	5f8e32a66f	SLP Vectorizer: Fix a bug in the code that does CSE on the generated gather sequences. Make sure that we don't replace and RAUW two sequences if one does not dominate the other. llvm-svn: 184674	2013-06-23 21:57:27 +00:00
Nadav Rotem	8aa1211383	SLP Vectorizer: Erase instructions outside the vectorizeTree method. The RAII builder location guard is saving a reference to instructions, so we can't erase instructions during vectorization. llvm-svn: 184671	2013-06-23 19:38:56 +00:00
Nadav Rotem	03f4c0b02d	SLP Vectorizer: Implement a simple CSE optimization for the gather sequences. llvm-svn: 184660	2013-06-23 06:15:46 +00:00
Nadav Rotem	3dc5b0a65a	SLP Vectorizer: Implement multi-block slp-vectorization. Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks. It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function. I removed the support for extracting values from trees. We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2). llvm-svn: 184647	2013-06-22 21:34:10 +00:00
Nadav Rotem	232096ea37	SLP Vectorizer: do not search for store-chains that are wider than the vector-register size. llvm-svn: 184527	2013-06-21 04:18:13 +00:00
Nadav Rotem	9191086b4b	Clang-format the SLP vectorizer. No functionality change. llvm-svn: 184446	2013-06-20 17:54:36 +00:00
Nadav Rotem	83cdea61cd	SLPVectorization: Add a basic support for cross-basic block slp vectorization. We collect gather sequences when we vectorize basic blocks. Gather sequences are excellent hints for vectorization of other basic blocks. llvm-svn: 184444	2013-06-20 17:41:45 +00:00
Nadav Rotem	b87c78377e	Change the debug type to match the debug type that is used by vecutils.cpp. This change makes it easier to filter debug messages. llvm-svn: 184440	2013-06-20 16:38:05 +00:00
Nadav Rotem	a578c6d3ad	SLPVectorizer: handle scalars that are extracted from vectors (using ExtractElementInst). llvm-svn: 184325	2013-06-19 17:33:16 +00:00
Nadav Rotem	79be778ce2	SLPVectorizer: start constructing chains at stores that are not power of two. The type <3 x i8> is common in graphics and we want to be able to vectorize it. This change accelerates bullet by 12% and 471_omnetpp by 5%. llvm-svn: 184317	2013-06-19 15:57:29 +00:00
Nadav Rotem	61c4560eab	SLPVectorizer: vectorize compares and selects. llvm-svn: 184282	2013-06-19 05:49:52 +00:00
Nadav Rotem	9accad7fa1	Document the return value and fix a typo. llvm-svn: 184281	2013-06-19 05:47:33 +00:00
Nadav Rotem	6796716256	Scan the successor blocks and use the PHI nodes as a hint for possible chain roots. llvm-svn: 184201	2013-06-18 15:58:05 +00:00
Nadav Rotem	c7a7b98ec1	Add a return value to make this function more useful. llvm-svn: 184200	2013-06-18 15:57:12 +00:00
Pekka Jaaskelainen	4f4b7ec54b	Fix for a regression caused by the LoopVectorizer when vectorizing loops with memory accesses to non-zero address spaces. It simply dropped the AS info. Fixes PR16306. llvm-svn: 184103	2013-06-17 18:49:06 +00:00
Arnold Schwaighofer	1ddd15e16f	LoopVectorize: Change API call to get the backedge taken count Use ScalarEvolution's getBackedgeTakenCount API instead of getExitCount since that is really what we want to know. Using the more specific getExitCount was safe because we made sure that there is only one exiting block. No functionality change. llvm-svn: 183047	2013-05-31 21:48:56 +00:00
Arnold Schwaighofer	12f1ab46d1	LoopVectorize: PHIs with only outside users should prevent vectorization We check that instructions in the loop don't have outside users (except if they are reduction values). Unfortunately, we skipped this check for if-convertible PHIs. Fixes PR16184. llvm-svn: 183035	2013-05-31 19:53:50 +00:00
NAKAMURA Takumi	0271817249	LoopVectorize.cpp: Fix abuse of StringRef on Twine. Twine captures the pointer of StringRef. llvm-svn: 182820	2013-05-29 03:13:47 +00:00
NAKAMURA Takumi	4514caf74f	Whitespace. llvm-svn: 182819	2013-05-29 03:13:41 +00:00
Paul Redmond	0eb4837b24	Add support for llvm.vectorizer metadata - llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic by making the root of additional loop metadata. - Loop::isAnnotatedParallel now looks for llvm.loop and associated llvm.mem.parallel_loop_access - document llvm.loop and update llvm.mem.parallel_loop_access - add support for llvm.vectorizer.width and llvm.vectorizer.unroll - document llvm.vectorizer.* metadata - add utility class LoopVectorizerHints for getting/setting loop metadata - use llvm.vectorizer.width=1 to indicate already vectorized instead of already_vectorized - update existing tests that used llvm.loop.parallel and llvm.vectorizer.already_vectorized Reviewed by: Nadav Rotem llvm-svn: 182802	2013-05-28 20:00:34 +00:00
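
The two loops quoted in the "LoopVectorize: Use the dependence test utility class" commit message (r184685) can be explored with a small emulation. The sketch below is not the LoopVectorizer's implementation and uses no LLVM APIs; makeInput, runScalar and runChunked are hypothetical names introduced here for illustration. It models a vector loop of width vf as "load all lanes, then store all lanes" and compares the result with the scalar loop. Unsigned 32-bit arithmetic is an assumption chosen so that the wrapping products stay well defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Array = std::vector<std::uint32_t>;

// Hypothetical helper: odd start values, so wrapping products never become zero.
Array makeInput() {
  Array a(264); // indices 0..263 cover a[i+8] for i up to 255
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = static_cast<std::uint32_t>(2 * i + 3);
  return a;
}

// Scalar reference: a[i] = a[i+off1] * a[i+off2] for i in [8, 256),
// the loop shape quoted in the commit message.
Array runScalar(Array a, int off1, int off2) {
  for (int i = 8; i < 256; ++i)
    a[i] = a[i + off1] * a[i + off2];
  return a;
}

// Emulated vector loop of width vf: every lane of a chunk is computed from
// memory before any lane of that chunk is stored.
Array runChunked(Array a, int off1, int off2, int vf) {
  assert(vf > 0);
  std::vector<std::uint32_t> lanes(static_cast<std::size_t>(vf));
  for (int i = 8; i < 256; i += vf) {
    const int n = std::min(vf, 256 - i); // a short final chunk acts as the tail
    for (int l = 0; l < n; ++l)
      lanes[l] = a[i + l + off1] * a[i + l + off2];
    for (int l = 0; l < n; ++l)
      a[i + l] = lanes[l];
  }
  return a;
}
```

Under this model, runChunked(makeInput(), 4, 8, 8) matches runScalar(makeInput(), 4, 8), because the loop only reads elements that have not been written yet. For the second loop, runChunked(makeInput(), -4, -8, 4) still matches the scalar result, while a width of 8 does not: lane 4 of the first chunk reads a[8] before lane 0 of the same chunk has stored it.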

568 Commits