llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 19:52:54 +01:00

Author	SHA1	Message	Date
Nadav Rotem	6b56385c1a	Vectorizer: optimize the generation of selects. If the condition is uniform, generate a scalar-cond select (i1 as selector). llvm-svn: 166409	2012-10-22 04:38:00 +00:00
Nick Lewycky	44e5136371	Reapply r166405, teaching tailcallelim to be smarter about nocapture, with a very small but very important bugfix: bool shouldExplore(Use *U) { Value *V = U->get(); if (isa<CallInst>(V) || isa<InvokeInst>(V)) [...] should have read: bool shouldExplore(Use *U) { Value *V = U->getUser(); if (isa<CallInst>(V) || isa<InvokeInst>(V)) Fixes PR14143! llvm-svn: 166407	2012-10-22 03:03:52 +00:00
NAKAMURA Takumi	32507dd070	Revert r166405, "Teach TailRecursionElimination to consider 'nocapture' when deciding whether" It broke selfhosting stage2 in several builders. llvm-svn: 166406	2012-10-22 00:48:51 +00:00
Nick Lewycky	1a50a0e414	Teach TailRecursionElimination to consider 'nocapture' when deciding whether calls can be marked tail. llvm-svn: 166405	2012-10-21 23:51:22 +00:00
Hal Finkel	502fe3cc4a	DataLayout should use itself when calculating the size of a vector. This is important for vectors of pointers because only DataLayout, not the underlying vector type, knows how to calculate the size of the pointers in the vector. Fixes PR14138. llvm-svn: 166401	2012-10-21 20:38:03 +00:00
Benjamin Kramer	d97f445bdf	Revert r166390 "LoopIdiom: Replace custom dependence analysis with LoopDependenceAnalysis." It passes all tests, produces better results than the old code but uses the wrong pass, LoopDependenceAnalysis, which is old and unmaintained. "Why is it still in tree?", you might ask. The answer is obviously: "To confuse developers." Just swapping in the new dependency pass sends the pass manager into an infinite loop; I'll try to figure out why tomorrow. llvm-svn: 166399	2012-10-21 19:31:16 +00:00
Benjamin Kramer	7d87bba7c4	LoopIdiom: Replace custom dependence analysis with LoopDependenceAnalysis. Requires a lot less code and complexity on loop-idiom's side and the more precise analysis can catch more cases, like the one I included as a test case. This also fixes the edge-case miscompilation from PR9481. I'm not entirely sure that all cases are handled that the old checks handled but LDA will certainly become smarter in the future. llvm-svn: 166390	2012-10-21 15:03:07 +00:00
Nadav Rotem	380fe201de	Fix a bug in the vectorization of wide load/store operations. We used a SCEV to detect that A[X] is consecutive. We assumed that X was the induction variable. But X can be any expression that uses the induction variable, for example: X = i + 2; llvm-svn: 166388	2012-10-21 06:49:10 +00:00
Nadav Rotem	825cda19d5	Add support for reduction variables that do not start at zero. This is important for nested-loop reductions such as the following, where the reduction variable in the innermost loop does not start at zero: for (i = 0 .. n) for (j = 0 .. m) sum += ... llvm-svn: 166387	2012-10-21 05:52:51 +00:00
Nadav Rotem	763abacb83	Vectorizer: fix a bug in the classification of induction/reduction phis. llvm-svn: 166384	2012-10-21 02:38:01 +00:00
Nadav Rotem	2ee8edf34a	Fix an infinite loop in the loop-vectorizer. PR14134. llvm-svn: 166379	2012-10-20 20:45:01 +00:00
Benjamin Kramer	f3ac3fa22b	InstCombine: Fix an edge case where constant icmps could sneak into ConstantFoldInstOperands and crash. Have to refactor the ConstantFolder interface one day to define bugs like this away. Fixes PR14131. llvm-svn: 166374	2012-10-20 08:43:52 +00:00
Nadav Rotem	cdd573e703	Vectorize: teach canVectorizeMemory to distinguish between A[i]+=x and A[B[i]]+=x. If the pointer is consecutive then it is safe to read and write. If the pointer is non-loop-consecutive then it is unsafe to vectorize it because we may hit an ordering issue. llvm-svn: 166371	2012-10-20 08:26:33 +00:00
Nadav Rotem	8fe03aa4c1	Vectorizer: Add support for loop reductions. For example: for (i=0; i<n; i++) sum += A[i] + B[i] + i; llvm-svn: 166351	2012-10-19 23:05:40 +00:00
Benjamin Kramer	0edfdde403	SimplifyLibcalls: The return value of ffsll is always i32, even when the input is zero. Fixes PR13028. llvm-svn: 166313	2012-10-19 20:43:44 +00:00
Benjamin Kramer	49175bcdcd	Indvars: Don't recursively delete instruction during BB iteration. This can invalidate the iterators leading to use after frees and crashes. Fixes PR12536. llvm-svn: 166291	2012-10-19 17:53:54 +00:00
Benjamin Kramer	77d71ab3d7	SCEVExpander: Don't crash when trying to merge two constant phis. Just constant fold them so they can't cause any trouble. Fixes PR12627. llvm-svn: 166286	2012-10-19 16:37:30 +00:00
Nadav Rotem	451f76acc3	vectorizer: Add support for reading and writing from the same memory location. llvm-svn: 166255	2012-10-19 01:24:18 +00:00
Meador Inge	4cd6c97082	instcombine: Migrate strcpy optimizations This patch migrates the strcpy optimizations from the simplify-libcalls pass into the instcombine library call simplifier. Note also that StrCpyChkOpt has been updated with a few simplifications that were being done in the simplify-libcalls version of StrCpyOpt, but not in the migrated implementation of StrCpyOpt. There is no reason to overload StrCpyOpt with fortified and regular simplifications in the new model since there is already a dedicated simplifier for __strcpy_chk. llvm-svn: 166198	2012-10-18 18:12:40 +00:00
Nadav Rotem	fd924ec3c6	Vectorizer: Add support for loops with an unknown count. For example: for (i=0; i<n; i++){ a[i] = b[i+1] + c[i+3]; } llvm-svn: 166165	2012-10-18 05:29:12 +00:00
Nadav Rotem	8303c909c7	Add a loop vectorizer. llvm-svn: 166112	2012-10-17 18:25:06 +00:00
Chandler Carruth	7d68167eb0	This just in, it is a bad idea to use 'udiv' on an offset of a pointer. A very bad idea. Let's not do that. Fixes PR14105. Note that this wasn't that glaring of an oversight. Originally, these routines were only called on offsets within an alloca, which are intrinsically positive. But over the evolution of the pass, they ended up being called for arbitrary offsets, and things went downhill... llvm-svn: 166095	2012-10-17 09:23:48 +00:00
Michael Gottesman	b9205da3c6	[InstCombine] Teach InstCombine how to handle an obfuscated splat. An obfuscated splat is where the frontend poorly generates code for a splat using several different shuffles to create the splat, i.e., %A = load <4 x float>* %in_ptr, align 16 %B = shufflevector <4 x float> %A, <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef> %C = shufflevector <4 x float> %B, <4 x float> %A, <4 x i32> <i32 0, i32 1, i32 4, i32 undef> %D = shufflevector <4 x float> %C, <4 x float> %A, <4 x i32> <i32 0, i32 1, i32 2, i32 4> llvm-svn: 166061	2012-10-16 21:29:38 +00:00
Chandler Carruth	0659e10e8a	Update the memcpy rewriting to fully support widened int rewriting. This includes extracting ints for copying elsewhere and inserting ints when copying into the alloca. This should fix the CanSROA assertion coming out of Clang's regression test suite. llvm-svn: 165931	2012-10-15 10:24:43 +00:00
Chandler Carruth	7755041393	Follow-up fix to r165928: handle memset rewriting for widened integers, and generally clean up the memset handling. It had rotted a bit as the other rewriting logic got polished more. llvm-svn: 165930	2012-10-15 10:24:40 +00:00
Chandler Carruth	65613836e9	First major step toward addressing PR14059. This teaches SROA to handle cases where we have partial integer loads and stores to an otherwise promotable alloca to widen[1] those loads and stores to cover the entire alloca and bitcast them into the appropriate type such that promotion can proceed. These partial loads and stores stem from an annoying confluence of ARM's calling convention and ABI lowering and the FCA pre-splitting which takes place in SROA. Clang lowers a { double, double } in-register function argument as a [4 x i32] function argument to ensure it is placed into integer 32-bit registers (a really unnerving implicit contract between Clang and the ARM backend I would add). This results in a FCA load of [4 x i32]* from the { double, double } alloca, and SROA decomposes this into a sequence of i32 loads and stores. Inlining proceeds, code gets folded, but at the end of the day, we still have i32 stores to the low and high halves of a double alloca. Widening these to be i64 operations, and bitcasting them to double prior to loading or storing allows promotion to proceed for these allocas. I looked quite a bit at changing the IR which Clang produces for this case to be more friendly, but small changes seem unlikely to help. I think the best representation we could use currently would be to pass 4 i32 arguments thereby avoiding any FCAs, but that would still require this fix. It seems like it might eventually be nice to somehow encode the ABI register selection choices outside of the parameter type system so that the parameter can be a { double, double }, but the CC register annotations indicate that this should be passed via 4 integer registers. This patch does not address the second problem in PR14059, which is the reverse: when a struct alloca is loaded as a larger single integer. This patch also does not address some of the code quality issues with the FCA-splitting. Those don't actually impede any optimizations really, but they're on my list to clean up. [1]: Pedantic footnote: for those concerned about memory model issues here, this is safe. For the alloca to be promotable, it cannot escape or have any use of its address that could allow these loads or stores to be racing. Thus, widening is always safe. llvm-svn: 165928	2012-10-15 08:40:30 +00:00
Meador Inge	a24293f9d8	instcombine: Migrate strcmp and strncmp optimizations This patch migrates the strcmp and strncmp optimizations from the simplify-libcalls pass into the instcombine library call simplifier. llvm-svn: 165915	2012-10-15 03:47:37 +00:00
Meador Inge	7fafebf1b4	instcombine: Migrate strchr and strrchr optimizations This patch migrates the strchr and strrchr optimizations from the simplify-libcalls pass into the instcombine library call simplifier. llvm-svn: 165875	2012-10-13 16:45:37 +00:00
Meador Inge	29f1f4f7b1	instcombine: Migrate strcat and strncat optimizations This patch migrates the strcat and strncat optimizations from the simplify-libcalls pass into the instcombine library call simplifier. llvm-svn: 165874	2012-10-13 16:45:32 +00:00
Chandler Carruth	5aece20d98	Teach SROA to cope with wrapper aggregates. These show up a lot in ABI type coercion code, especially when targeting ARM. Things like [1 x i32] instead of i32 are very common there. The goal of this logic is to ensure that when we are picking an alloca type, we look through such wrapper aggregates and across any zero-length aggregate elements to find the simplest type possible to form a type partition. This logic should (generally speaking) rarely fire. It only ends up kicking in when an alloca is accessed using two different types (for instance, i32 and float), and the underlying alloca type has wrapper aggregates around it. I noticed a significant amount of this occurring looking at stepanov_abstraction generated code for arm, and suspect it happens elsewhere as well. Note that this doesn't yet address truly heinous IR productions such as PR14059 is concerning. Those result in mismatched sizes of types in addition to mismatched access and alloca types. llvm-svn: 165870	2012-10-13 10:49:33 +00:00
Nick Lewycky	211cb06fc3	Don't crash when !tbaa.struct contents is invalid. llvm-svn: 165693	2012-10-11 02:05:23 +00:00
Duncan Sands	9db0be6c19	Add the testcase from pr13254 (the old scalarrepl pass handles this wrong; the new sroa pass handles it right). llvm-svn: 165644	2012-10-10 18:41:19 +00:00
Michael Ilseman	072e9fdbb9	New EarlyCSE tests for CSE-ing across commutativity. llvm-svn: 165510	2012-10-09 16:58:13 +00:00
Alexey Samsonov	561aa02d50	Fix PR14016. DeadArgumentElimination pass can replace one LLVM function with another, invalidating a pointer stored in debug info metadata entry for this function. To fix this, we collect debug info descriptors for functions before running a DeadArgumentElimination pass and "patch" pointers in metadata nodes if we replace a function. llvm-svn: 165490	2012-10-09 08:13:15 +00:00
Chandler Carruth	e55d2920b1	Fix PR14034, an infloop / heap corruption / crash bug in the new SROA. Thanks to Benjamin for the raw test case. This one took about 50 times longer to reduce than to fix. =/ llvm-svn: 165476	2012-10-09 01:58:35 +00:00
Micah Villmow	fe3338a7eb	Move TargetData to DataLayout. llvm-svn: 165403	2012-10-08 16:39:34 +00:00
Chandler Carruth	353476d536	Teach the new SROA a new trick. Now we zap any memcpy or memmoves which are in fact identity operations. We detect these and kill their partitions so that even splitting is unaffected by them. This is particularly important because Clang relies on emitting identity memcpy operations for struct copies, and these fold away to constants very often after inlining. Fixes the last big performance FIXME I have on my plate. llvm-svn: 165285	2012-10-05 01:29:09 +00:00
Benjamin Kramer	bbb006ad7d	SimplifyCFG: Enhance the "remove CFG edge that leads to null pointer dereference" optimization to also handle instructions with multiple uses. We conservatively only check the first use to avoid walking long use chains. This catches the common case of having both a load and a store to a pointer supplied by a PHI node. llvm-svn: 165232	2012-10-04 16:11:49 +00:00
Duncan Sands	0ebee9338d	In my recent change to avoid use of underaligned memory I didn't notice that cpyDest can be mutated in some cases, which would then cause a crash later if indeed the memory was underaligned. This brought down several buildbots, so I guess the underaligned case is much more common than I thought! llvm-svn: 165228	2012-10-04 13:53:21 +00:00
Duncan Sands	448245e370	The alignment of an sret parameter is known: it must be at least the alignment of the return type. Teach the optimizers this. llvm-svn: 165226	2012-10-04 13:36:31 +00:00
Chandler Carruth	9d83695de1	Fix PR13969, a mini-phase-ordering issue with the new SROA pass. Currently, we re-visit allocas when something changes about the way they might be split to allow better scalarization to take place. However, we weren't handling the case when the promotion is what would change the behavior of SROA. When an address derived from an alloca is stored into another alloca, we consider the first to have escaped. If the second is ever promoted to an SSA value, we will suddenly be able to run the SROA pass on the first alloca. This patch adds explicit support for this form of iteration. When we detect a store of a pointer derived from an alloca, we flag the underlying alloca for reprocessing after promotion. The logic works hard to only do this when there is definitely going to be promotion and it might remove impediments to the analysis of the alloca. Thanks to Nick for the great test case and Benjamin for some sanity check review. llvm-svn: 165223	2012-10-04 12:33:50 +00:00
Duncan Sands	86f8827745	The memcpy optimizer was happily doing call slot forwarding when the new memory was less aligned than the old. In the testcase this results in an overaligned memset: the memset alignment was correct for the original memory but is too much for the new memory. Fix this by either increasing the alignment of the new memory or bailing out if that isn't possible. Should fix the gcc-4.7 self-host buildbot failure. llvm-svn: 165220	2012-10-04 10:54:40 +00:00
Chandler Carruth	6e02238cee	Teach the integer-promotion rewrite strategy to be endianness aware. Sorry for this being broken so long. =/ As part of this, switch all of the existing tests to be Little Endian, which is the behavior I was asserting in them anyways! Add in a new big-endian test that checks the interesting behavior there. Another part of this is to tighten the rules about when we perform the full-integer promotion. This logic now rejects cases where the fully promoted integer is a non-multiple-of-8 bitwidth or cases where the loads or stores touch bits which are in the allocated space of the alloca but are not loaded or stored when accessing the integer. Sadly, these aren't really observable today as the rest of the pass will already ensure the invariants hold. However, the latter situation is likely to become a potential concern in the future. Thanks to Benjamin and Duncan for early review of this patch. I'm still looking into whether there are further endianness issues, please let me know if anyone sees BE failures persisting past this. llvm-svn: 165219	2012-10-04 10:39:28 +00:00
Jakub Staszak	73d9bdcca5	Fix PR13967. llvm-svn: 165187	2012-10-03 23:59:47 +00:00
Chandler Carruth	57e63536e6	Fix an issue where we failed to adjust the alignment constraint on a memcpy to reflect that '0' has a different meaning when applied to a load or store. Now we correctly use underaligned loads and stores for the test case added. llvm-svn: 165101	2012-10-03 08:26:28 +00:00
Chandler Carruth	c0353523f6	Try to use a better set of abstractions for computing the alignment necessary during rewriting. As part of this, fix a real think-o here where we might have left off an alignment specification when the address is in fact underaligned. I haven't come up with any way to trigger this, as there is always some other factor that reduces the alignment, but it certainly might have been an observable bug in some way I can't think of. This also slightly changes the strategy for placing explicit alignments on loads and stores to only do so when the alignment does not match that required by the ABI. This causes a few redundant alignments to go away from test cases. I've also added a couple of tests that really push on the alignment that we end up with on loads and stores. More to come here as I try to fix an underlying bug I have conjectured and produced test cases for, although it's not clear if this bug is the one currently hitting dragonegg's gcc47 bootstrap. llvm-svn: 165100	2012-10-03 08:14:02 +00:00
Chandler Carruth	72359007f5	Teach the new SROA to handle cases where an alloca that has already been scheduled for processing on the worklist eventually gets deleted while we are processing another alloca, fixing the original test case in PR13990. To facilitate this, add a remove_if helper to the SetVector abstraction. It's not easy to use the standard abstractions for this because of the specifics of SetVectors types and implementation. Finally, a nice small test case is included. Thanks to Benjamin for the fantastic reduced test case here! All I had to do was delete some empty basic blocks! llvm-svn: 165065	2012-10-02 22:46:45 +00:00
Benjamin Kramer	aa07e96212	Fix broken tests. llvm-svn: 165019	2012-10-02 15:49:34 +00:00
Chandler Carruth	6be8e1c6b5	Fix more misspellings found by Duncan during review. llvm-svn: 164940	2012-10-01 12:30:45 +00:00
Chandler Carruth	93f31cefdc	Fix several issues with alignment. We weren't always accounting for type alignment requirements of the new alloca. As one consequence which was reported as a bug by Duncan, we overaligned memcpy calls to ranges of allocas after they were rewritten to types with lower alignment requirements. Other consequences are possible, but I don't have any test cases for them. llvm-svn: 164937	2012-10-01 12:16:54 +00:00
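
Several of the vectorizer commits above (r166351, r166387) concern loop reductions such as sum += A[i] + B[i] + i, including reductions whose start value is not zero. The following is only a hand-written C++ model of what such a transformation computes, not the pass's actual output: the width VF, the function names, and the remainder-loop strategy are all assumptions made for illustration. Each lane accumulates a partial sum from the identity value, and the original start value is added once after the lanes are combined.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical vector width, chosen only for illustration.
constexpr std::size_t VF = 4;

// Scalar reference loop: sum += A[i] + B[i] + i, with an arbitrary start value.
// Assumes B has at least as many elements as A.
int reduceScalar(const std::vector<int> &A, const std::vector<int> &B, int start) {
  int sum = start;
  for (std::size_t i = 0; i < A.size(); ++i)
    sum += A[i] + B[i] + static_cast<int>(i);
  return sum;
}

// Model of the same reduction processed VF elements at a time.
int reduceVectorized(const std::vector<int> &A, const std::vector<int> &B, int start) {
  const std::size_t n = A.size();
  const std::size_t vecEnd = n - n % VF; // last index covered by full vectors

  // Every lane starts at 0, the identity for addition, so the start value
  // is not counted once per lane.
  std::array<int, VF> lanes{};
  for (std::size_t i = 0; i < vecEnd; i += VF)
    for (std::size_t l = 0; l < VF; ++l)
      lanes[l] += A[i + l] + B[i + l] + static_cast<int>(i + l);

  // Combine the partial sums and add the start value exactly once.
  int sum = start;
  for (int partial : lanes)
    sum += partial;

  // Scalar remainder loop for the elements that do not fill a whole vector.
  for (std::size_t i = vecEnd; i < n; ++i)
    sum += A[i] + B[i] + static_cast<int>(i);
  return sum;
}
```

For integer addition the reordering is exact, so both functions return the same value for any start value; in a nested loop the inner reduction would receive the running sum from the outer iteration as its start.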


3159 Commits
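
The r166095 message above notes that using 'udiv' on a pointer offset went wrong once offsets could be negative. A minimal C++ sketch of that general pitfall follows; the function names and the element-index computation are hypothetical and are not taken from the SROA code. Unsigned division reinterprets a negative offset as a very large positive number, while signed division keeps the expected sign.

```cpp
#include <cstdint>

// Element index computed with unsigned division (models 'udiv').
// A negative byte offset is first reinterpreted as a large unsigned value.
std::uint64_t indexUnsigned(std::int64_t byteOffset, std::uint64_t elemSize) {
  return static_cast<std::uint64_t>(byteOffset) / elemSize;
}

// Element index computed with signed division (models 'sdiv').
std::int64_t indexSigned(std::int64_t byteOffset, std::int64_t elemSize) {
  return byteOffset / elemSize;
}
```

For non-negative offsets the two agree, which matches the observation in the commit message that the problem only surfaced once the routines were called on arbitrary offsets rather than offsets within an alloca.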