llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 03:23:01 +02:00

Author	SHA1	Message	Date
David Blaikie	ab043ff680	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
David Blaikie	0d99339102	[opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float> %x, ... ->getelementptr float, <4 x float> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. update.py: import fileinput import sys import re ibrep = re.compile(r"(^.?[^%\w]getelementptr inbounds )(((?:<\d x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") normrep = re.compile( r"(^.?[^%\w]getelementptr )(((?:<\d* x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll \| xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786	2015-02-27 19:29:02 +00:00
Chandler Carruth	105d2fa5e8	[x86,sdag] Two interrelated changes to the x86 and sdag code. First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as legal so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the hilarious deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. llvm-svn: 229835	2015-02-19 10:36:19 +00:00
Adam Nemet	691acc83fa	[LoopAccesses] Modify test to also check symbolic strides with memchecks See the comment in the code. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229627	2015-02-18 03:43:32 +00:00
David Blaikie	d5022c864c	Reformat test case to be easier to migrate to typeless pointers. llvm-svn: 229275	2015-02-15 04:13:53 +00:00
Olivier Sallenave	cfd4c9724c	Check interleaving without relying on debug output. llvm-svn: 229027	2015-02-13 02:13:57 +00:00
NAKAMURA Takumi	b02669f0d4	llvm/test/Transforms/LoopVectorize/PowerPC/small-loop-rdx.ll REQUIRES +Asserts due to -debug. llvm-svn: 228989	2015-02-13 00:21:34 +00:00
Olivier Sallenave	7043291b9c	Change max interleave factor to 12 for POWER7 and POWER8. llvm-svn: 228973	2015-02-12 22:57:58 +00:00
Benjamin Kramer	f3253754cf	Update test case. llvm-svn: 228956	2015-02-12 20:40:19 +00:00
Hao Liu	7f59f4df87	Move the target specific test case arbitrary-induction-step.ll to test/Transforms/LoopVectorize/AArch64 folder. llvm-svn: 227561	2015-01-30 07:33:31 +00:00
Hao Liu	dd2f874770	[LoopVectorize] Induction variables: support arbitrary constant step. Previously, only -1 and +1 step values are supported for induction variables. This patch extends LV to support arbitrary constant steps. Initial patch by Alexey Volkov. Some bug fixes are added in the following version. Differential Revision: http://reviews.llvm.org/D6051 and http://reviews.llvm.org/D7193 llvm-svn: 227557	2015-01-30 05:02:21 +00:00
Elena Demikhovsky	837f1e39e9	Fixed a bug in masked load/store in reversed loop. Added a test. The bug was submitted to bugzilla: http://llvm.org/bugs/show_bug.cgi?id=22225 llvm-svn: 226791	2015-01-22 08:20:06 +00:00
Duncan P. N. Exon Smith	4a5feedcaa	IR: Move MDLocation into place This commit moves `MDLocation`, finishing off PR21433. There's an accompanying clang commit for frontend testcases. I'll attach the testcase upgrade script I used to PR21433 to help out-of-tree frontends/backends. This changes the schema for `DebugLoc` and `DILocation` from: !{i32 3, i32 7, !7, !8} to: !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8) Note that empty fields (line/column: 0 and inlinedAt: null) don't get printed by the assembly writer. llvm-svn: 226048	2015-01-14 22:27:36 +00:00
Duncan P. N. Exon Smith	bc9ee9160a	IR: Add 'distinct' MDNodes to bitcode and assembly Propagate whether `MDNode`s are 'distinct' through the other types of IR (assembly and bitcode). This adds the `distinct` keyword to assembly. Currently, no one actually calls `MDNode::getDistinct()`, so these nodes only get created for: - self-references, which are never uniqued, and - nodes whose operands are replaced that hit a uniquing collision. The concept of distinct nodes is still not quite first-class, since distinct-ness doesn't yet survive across `MapMetadata()`. Part of PR22111. llvm-svn: 225474	2015-01-08 22:38:29 +00:00
Michael Kuperstein	6483dfa9b1	Fix broken test from r225159. llvm-svn: 225164	2015-01-05 12:34:01 +00:00
Jiangning Liu	3fc6a7e69d	Fixed a bug in memory dependence checking module of loop vectorization. The following loop should not be vectorized with current algorithm. {code} // loop body ... = a[i] (1) ... = a[i+1] (2) ....... a[i+1] = .... (3) a[i] = ... (4) {code} The algorithm tries to collect memory access candidates from AliasSetTracker, and then check memory dependences one another. The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked 'write' entry in Accesses if only 'write' exists. This is incorrect and the consequence is it ignored all read access, and finally some RAW and WAR dependence are missed. For the case given above, if we ignore two reads, the dependence between (1) and (3) would not be able to be captured, and finally this loop will be incorrectly vectorized. The fix simply inserts a new loop to find all entries in Accesses. Since it will skip most of all other memory accesses by checking the Value pointer at the very beginning of the loop, it should not increase compile-time visibly. llvm-svn: 225159	2015-01-05 10:08:58 +00:00
Elena Demikhovsky	fe73fcc29b	Masked Load and Store Intrinsics in loop vectorizer. The loop vectorizer optimizes loops containing conditional memory accesses by generating masked load and store intrinsics. This decision is target dependent. http://reviews.llvm.org/D6527 llvm-svn: 224334	2014-12-16 11:50:42 +00:00
Duncan P. N. Exon Smith	9c5542c040	IR: Make metadata typeless in assembly Now that `Metadata` is typeless, reflect that in the assembly. These are the matching assembly changes for the metadata/value split in r223802. - Only use the `metadata` type when referencing metadata from a call intrinsic -- i.e., only when it's used as a `Value`. - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode` when referencing it from call intrinsics. So, assembly like this: define @foo(i32 %v) { call void @llvm.foo(metadata !{i32 %v}, metadata !0) call void @llvm.foo(metadata !{i32 7}, metadata !0) call void @llvm.foo(metadata !1, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{metadata !3}, metadata !0) ret void, !bar !2 } !0 = metadata !{metadata !2} !1 = metadata !{i32* @global} !2 = metadata !{metadata !3} !3 = metadata !{} turns into this: define @foo(i32 %v) { call void @llvm.foo(metadata i32 %v, metadata !0) call void @llvm.foo(metadata i32 7, metadata !0) call void @llvm.foo(metadata i32* @global, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{!3}, metadata !0) ret void, !bar !2 } !0 = !{!2} !1 = !{i32* @global} !2 = !{!3} !3 = !{} I wrote an upgrade script that handled almost all of the tests in llvm and many of the tests in cfe (even handling many `CHECK` lines). I've attached it (or will attach it in a moment if you're speedy) to PR21532 to help everyone update their out-of-tree testcases. This is part of PR21532. llvm-svn: 224257	2014-12-15 19:07:53 +00:00
Michael Zolotukhin	37717c3cf1	PR21302. Vectorize only bottom-tested loops. rdar://problem/18886083 llvm-svn: 223171	2014-12-02 22:59:06 +00:00
Michael Zolotukhin	ce3e203aab	Apply loop-rotate to several vectorizer tests. Such loops shouldn't be vectorized due to the loops form. After applying loop-rotate (+simplifycfg) the tests again start to check what they are intended to check. llvm-svn: 223170	2014-12-02 22:59:02 +00:00
Duncan P. N. Exon Smith	73ce6dbb2b	Revert "Masked Vector Load and Store Intrinsics." This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936	2014-11-28 21:29:14 +00:00
Matt Arsenault	454e837bd2	Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisons llvm-svn: 222705	2014-11-24 23:15:18 +00:00
Elena Demikhovsky	36a2243ab7	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632	2014-11-23 08:07:43 +00:00
Suyog Sarda	e156908daa	Addition to r216371 (SLP and Loop Vectorization) and r218607 where cost model for signed division by power of 2 was improved for AArch64. The revision r218607 missed test case for Loop Vectorization. Adding it in this revision. Differential Revision: http://reviews.llvm.org/D6181 llvm-svn: 221674	2014-11-11 07:39:27 +00:00
David Majnemer	15ad95a47c	LoopVectorize: Don't assume pointees are sized A pointer's pointee might not be sized: the pointee could be a function. Report this as IK_NoInduction when calculating isInductionVariable. This fixes PR21508. llvm-svn: 221501	2014-11-07 00:31:14 +00:00
Michael Zolotukhin	2a32061dc5	Correctly update dom-tree after loop vectorizer. llvm-svn: 221009	2014-10-31 22:28:03 +00:00
Matt Arsenault	74dd906076	Add minnum / maxnum intrinsics These are named following the IEEE-754 names for these functions, rather than the libm fmin / fmax to avoid possible ambiguities. Some languages may implement something resembling fmin / fmax which return NaN if either operand is to propagate errors. These implement the IEEE-754 semantics of returning the other operand if either is a NaN representing missing data. llvm-svn: 220341	2014-10-21 23:00:20 +00:00
Hal Finkel	3a23f40590	[LoopVectorize] Ignore @llvm.assume for cost estimates and legality A few minor changes to prevent @llvm.assume from interfering with loop vectorization. First, treat @llvm.assume like the lifetime intrinsics, which are scalarized (but don't otherwise interfere with the legality checking). Second, ignore the cost of ephemeral instructions in the loop (these will go away anyway during CodeGen). Alignment assumptions and other uses of @llvm.assume can often end up inside of loops that should be vectorized (this is not uncommon for assumptions generated by __attribute__((align_value(n))), for example). llvm-svn: 219741	2014-10-14 22:59:49 +00:00
Duncan P. N. Exon Smith	c1be4794ba	Revert "Revert "DI: Fold constant arguments into a single MDString"" This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010	2014-10-03 20:01:09 +00:00
Duncan P. N. Exon Smith	fb6bcc4eb2	Revert "DI: Fold constant arguments into a single MDString" This reverts commit r218914 while I investigate some bots. llvm-svn: 218918	2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith	58b6077a79	DI: Fold constant arguments into a single MDString This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 218914	2014-10-02 21:56:57 +00:00
Adrian Prantl	2b1df58ebe	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Adrian Prantl	0959156fa3	Revert r218778 while investigating buldbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	229943585f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
Sanjay Patel	8030ed3639	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528	2014-09-10 17:58:16 +00:00
Sanjay Patel	24d425f290	Add a convenience method to copy wrapping, exact, and fast-math flags (NFC). The loop vectorizer preserves wrapping, exact, and fast-math properties of scalar instructions. This patch adds a convenience method to make that operation easier because we need to do this in the loop vectorizer, SLP vectorizer, and possibly other places. Although this is a 'no functional change' patch, I've added a testcase to verify that the exact flag is preserved by the loop vectorizer. The wrapping and fast-math flags are already checked in existing testcases. Differential Revision: http://reviews.llvm.org/D5138 llvm-svn: 216886	2014-09-01 18:44:57 +00:00
Renato Golin	0b73c2b605	Small refactor on VectorizerHint for deduplication Previously, the hint mechanism relied on clean up passes to remove redundant metadata, which still showed up if running opt at low levels of optimization. That also has shown that multiple nodes of the same type, but with different values could still coexist, even if temporary, and cause confusion if the next pass got the wrong value. This patch makes sure that, if metadata already exists in a loop, the hint mechanism will never append a new node, but always replace the existing one. It also enhances the algorithm to cope with more metadata types in the future by just adding a new type, not a lot of code. Re-applying again due to MSVC 2013 being minimum requirement, and this patch having C++11 that MSVC 2012 didn't support. Fixes PR20655. llvm-svn: 216870	2014-09-01 10:00:17 +00:00
Karthik Bhat	d94045aa5a	Allow vectorization of division by uniform power of 2. This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371	2014-08-25 04:56:54 +00:00
Renato Golin	ce769f1472	Revert "Small refactor on VectorizerHint for deduplication" This reverts commit r215994 because MSVC 2012 can't cope with its C++11 goodness. llvm-svn: 215999	2014-08-19 18:08:50 +00:00
Renato Golin	d941fc5a5d	Small refactor on VectorizerHint for deduplication Previously, the hint mechanism relied on clean up passes to remove redundant metadata, which still showed up if running opt at low levels of optimization. That also has shown that multiple nodes of the same type, but with different values could still coexist, even if temporary, and cause confusion if the next pass got the wrong value. This patch makes sure that, if metadata already exists in a loop, the hint mechanism will never append a new node, but always replace the existing one. It also enhances the algorithm to cope with more metadata types in the future by just adding a new type, not a lot of code. llvm-svn: 215994	2014-08-19 17:30:43 +00:00
James Molloy	23906392d8	[LoopVectorizer] Enable support for floating-point subtraction reductions llvm-svn: 215200	2014-08-08 12:41:08 +00:00
Tyler Nowicki	0eb1e96567	Add diagnostics to the vectorizer cost model. When the cost model determines vectorization is not possible/profitable these remarks print an analysis of that decision. Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive. Reviewed by Arnold Schwaighofer llvm-svn: 214599	2014-08-02 00:14:03 +00:00
Tyler Nowicki	f5be5413e6	Improve the remark generated for -Rpass-missed. The current remark is ambiguous and makes it sounds like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss. Reviewed by Arnold Schwaighofer` llvm-svn: 214445	2014-07-31 21:22:22 +00:00
Tyler Nowicki	192485e325	Improve the remark generated when a variable that is used outside the loop is not a reduction or induction variable. Reviewed by Arnold Schwaighofer llvm-svn: 214440	2014-07-31 21:02:40 +00:00
Mark Heffernan	2ae2a57274	Rename metadata llvm.loop.vectorize.unroll to llvm.loop.vectorize.interleave. llvm-svn: 213588	2014-07-21 23:11:03 +00:00
Hal Finkel	c0bfaefffa	[LoopVectorize] Use AA to partition potential dependency checks Prior to this change, the loop vectorizer did not make use of the alias analysis infrastructure. Instead, it performed memory dependence analysis using ScalarEvolution-based linear dependence checks within equivalence classes derived from the results of ValueTracking's GetUnderlyingObjects. Unfortunately, this meant that: 1. The loop vectorizer had logic that essentially duplicated that in BasicAA for aliasing based on identified objects. 2. The loop vectorizer could not partition the space of dependency checks based on information only easily available from within AA (TBAA metadata is currently the prime example). This means, for example, regardless of whether -fno-strict-aliasing was provided, the vectorizer would only vectorize this loop with a runtime memory-overlap check: void foo(int a, float b) { for (int i = 0; i < 1600; ++i) a[i] = b[i]; } This is suboptimal because the TBAA metadata already provides the information necessary to show that this check unnecessary. Of course, the vectorizer has a limit on the number of such checks it will insert, so in practice, ignoring TBAA means not vectorizing more-complicated loops that we should. This change causes the vectorizer to use an AliasSetTracker to keep track of the pointers in the loop. The resulting alias sets are then used to partition the space of dependency checks, and potential runtime checks; this results in more-efficient vectorizations. When pointer locations are added to the AliasSetTracker, two things are done: 1. The location size is set to UnknownSize (otherwise you'd not catch inter-iteration dependencies) 2. For instructions in blocks that would need to be predicated, TBAA is removed (because the metadata might have a control dependency on the condition being speculated). For non-predicated blocks, you can leave the TBAA metadata. This is safe because you can't have an iteration dependency on the TBAA metadata (if you did, and you unrolled sufficiently, you'd end up with the same pointer value used by two accesses that TBAA says should not alias, and that would yield undefined behavior). llvm-svn: 213486	2014-07-20 23:07:52 +00:00
Hal Finkel	eb099b2a30	[LoopVectorize] Propagate known metadata to vectorized instructions There are some kinds of metadata that are safe to propagate from the scalar instructions to the vector instructions (fpmath and tbaa currently). Regarding TBAA, one might worry about propagating it on if-converted loads and stores, because the metadata might have had a control dependency on the condition, and thus actually aliased with some other non-speculated memory access when the condition was false. However, this would be caught by the runtime overlap checks. llvm-svn: 213452	2014-07-19 13:33:16 +00:00
Tyler Nowicki	16db81fdfa	Emit warnings if vectorization is forced and fails. This patch modifies the existing DiagnosticInfo system to create a generic base class that is inherited to produce diagnostic-based warnings. This is used by the loop vectorizer to trigger a warning when vectorization is forced and fails. Several tests have been added to verify this behavior. Reviewed by: Arnold Schwaighofer llvm-svn: 213110	2014-07-16 00:36:00 +00:00
Aditya Nandakumar	2b32e5e74c	When we sink an instruction, this can open up opportunity for the operands to be sunk - add them to the worklist llvm-svn: 212847	2014-07-11 21:49:39 +00:00
Adam Nemet	46d0e99fe3	[X86] AVX512: Enable it in the Loop Vectorizer This lets us experiment with 512-bit vectorization without passing force-vector-width manually. The code generated for a simple integer memset loop is properly vectorized. Disassembly is still broken for it though :(. llvm-svn: 212634	2014-07-09 18:22:33 +00:00

1 2 3 4 5 ...

290 Commits