llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00

Author	SHA1	Message	Date
Reid Kleckner	72c9e73170	GlobalOpt: Aliases don't have sections, don't copy them when replacing As defined in LangRef, aliases do not have sections. However, LLVM's GlobalAlias class inherits from GlobalValue, which means we can read and set its section. We should probably ban that as a separate change, since it doesn't make much sense for an alias to have a section that differs from its aliasee. Fixes PR18757, where the section was being lost on the global in code from Clang like: extern "C" { __attribute__((used, section("CUSTOM"))) static int in_custom_section; } Reviewers: rafael.espindola Differential Revision: http://llvm-reviews.chandlerc.com/D2758 llvm-svn: 201286	2014-02-13 02:18:36 +00:00
Owen Anderson	5dc9f991c9	Remove a very old instcombine where we would turn sequences of selects into logical operations on the i1's driving them. This is a bad idea for every target I can think of (confirmed with micro tests on all of: x86-64, ARM, AArch64, Mips, and PowerPC) because it forces the i1 to be materialized into a general purpose register, whereas consuming it directly into a select generally allows it to exist only transiently in a predicate or flags register. Chandler ran a set of performance tests with this change, and reported no measurable change on x86-64. llvm-svn: 201275	2014-02-12 23:54:07 +00:00
Daniel Sanders	656c4d360b	Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call It introduced multiple test failures in the buildbots. llvm-svn: 201241	2014-02-12 15:39:20 +00:00
Daniel Sanders	e647d6441b	Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call Summary: AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output. The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as. All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler. Reviewers: rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2686 llvm-svn: 201237	2014-02-12 14:44:54 +00:00
Benjamin Kramer	e435e87a6a	InstCombine: Teach icmp merging about the equivalence of bit tests and UGE/ULT with a power of 2. This happens in bitfield code. While there reorganize the existing code a bit. llvm-svn: 201176	2014-02-11 21:09:03 +00:00
Chandler Carruth	aa1d9ed9b0	[LPM] Switch LICM to actively use LCSSA in addition to preserving it. Fixes PR18753 and PR18782. This is necessary for LICM to preserve LCSSA correctly and efficiently. There is still some active discussion about whether we should be using LCSSA, but we can't just immediately stop using it and we need LICM to preserve it while we are using it. We can restore the old SSAUpdater driven code if and when there is a serious effort to remove the reliance on LCSSA from all of the loop passes. However, this also serves as a great example of why LCSSA is very nice to have. This change significantly simplifies the process of sinking instructions for LICM, and makes it quite a bit less expensive. It wouldn't even be as complex as it is except that I had to start the process of removing the big recursive LCSSA formation hammer in order to switch even this much of the re-forming code to asserting that LCSSA was preserved. I'll fully remove that next just to tidy things up until the LCSSA debate settles one way or the other. llvm-svn: 201148	2014-02-11 12:52:27 +00:00
Arnold Schwaighofer	0bd2bb3092	LoopVectorizer: Keep track of conditional store basic blocks Before conditional store vectorization/unrolling we had only one vectorized/unrolled basic block. After adding support for conditional store vectorization this will not only be one block but multiple basic blocks. The last block would have the back-edge. I updated the code to use a vector of basic blocks instead of a single basic block and fixed the users to use the last entry in this vector. But, I forgot to add the basic blocks to this vector! Fixes PR18724. llvm-svn: 201028	2014-02-08 20:41:13 +00:00
Juergen Ributzka	a44e3756e3	[Constant Hoisting] Fix insertion point for constant materialization. The bitcast instruction during constant materialization was not placed correcly in the presence of phi nodes. This commit fixes the insertion point to be in the idom instead. This fixes PR18768 llvm-svn: 201009	2014-02-08 00:20:49 +00:00
Nick Lewycky	03b9ed1b7b	A memcpy out of an fresh alloca is a no-op, delete it. Patch by Patrick Walton! llvm-svn: 200907	2014-02-06 06:29:19 +00:00
Manman Ren	91c0933df0	Set default of inlinecold-threshold to 225. 225 is the default value of inline-threshold. This change will make sure we have the same inlining behavior as prior to r200886. As Chandler points out, even though we don't have code in our testing suite that uses cold attribute, there are larger applications that do use cold attribute. r200886 + this commit intend to keep the same behavior as prior to r200886. We can later on tune the inlinecold-threshold. The main purpose of r200886 is to help performance of instrumentation based PGO before we actually hook up inliner with analysis passes such as BPI and BFI. For instrumentation based PGO, we try to increase inlining of hot functions and reduce inlining of cold functions by setting inlinecold-threshold. Another option suggested by Chandler is to use a boolean flag that controls if we should use OptSizeThreshold for cold functions. The default value of the boolean flag should not change the current behavior. But it gives us less freedom in controlling inlining of cold functions. llvm-svn: 200898	2014-02-06 01:59:22 +00:00
Manman Ren	b78e9a1411	Inliner uses a smaller inline threshold for callees with cold attribute. Added command line option inlinecold-threshold to set threshold for inlining functions with cold attribute. Listen to the cold attribute when it would decrease the inline threshold. llvm-svn: 200886	2014-02-05 22:53:44 +00:00
Benjamin Kramer	700474a946	SimplifyLibCalls: Push TLI through the exp2->ldexp transform. For the odd case of platforms with exp2 available but not ldexp. llvm-svn: 200795	2014-02-04 20:27:23 +00:00
Tim Northover	d6fb863f04	OS X: the correct function is __sincospif_stret, not __sincospi_stretf rdar://problem/13729466 llvm-svn: 200771	2014-02-04 16:28:20 +00:00
Kai Nacke	a3477b4ff6	Add strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls Add the missing transformation strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls and remove the ToDo comment. Reviewer: Duncan P.N. Exan Smith llvm-svn: 200736	2014-02-04 05:55:16 +00:00
Reid Kleckner	8fe10af69d	inalloca: Don't remove dead arguments in the presence of inalloca args It disturbs the layout of the parameters in memory and registers, leading to problems in the backend. The plan for optimizing internal inalloca functions going forward is to essentially SROA the argument memory and demote any captured arguments (things that aren't trivially written by a load or store) to an indirect pointer to a static alloca. llvm-svn: 200717	2014-02-03 20:42:49 +00:00
Duncan P. N. Exon Smith	4f1b28340d	Lower llvm.expect intrinsic correctly for i1 LowerExpectIntrinsic previously only understood the idiom of an expect intrinsic followed by a comparison with zero. For llvm.expect.i1, the comparison would be stripped by the early-cse pass. Patch by Daniel Micay. llvm-svn: 200664	2014-02-02 22:43:55 +00:00
Arnold Schwaighofer	8a0e82c2bc	LoopVectorizer: Enable unrolling of conditional stores and the load/store unrolling heuristic per default Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed up some benchmarks while not causing any interesting regressions. llvm-svn: 200621	2014-02-02 03:12:34 +00:00
Arnold Schwaighofer	984f27d265	ARMTTI: We don't have 16 allocatable scalar registers This caused an regression on libquantum after enabling the new loop vectorizer unroll heuristics. llvm-svn: 200616	2014-02-01 18:00:25 +00:00
Chandler Carruth	a93c365f31	[LPM] Apply a really big hammer to fix PR18688 by recursively reforming LCSSA when we promote to SSA registers inside of LICM. Currently, this is actually necessary. The promotion logic in LICM uses SSAUpdater which doesn't understand how to place LCSSA PHI nodes. Teaching it to do so would be a very significant undertaking. It may be worthwhile and I've left a FIXME about this in the code as well as starting a thread on llvmdev to try to figure out the right long-term solution. For now, the PR needs to be fixed. Short of using the promition SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes, I don't see a cleaner or cheaper way of achieving this. Fortunately, LCSSA is relatively lazy and sparse -- it should only update instructions which need it. We can also skip the recursive variant when we don't promote to SSA values. llvm-svn: 200612	2014-02-01 13:35:14 +00:00
Chandler Carruth	e29471d99d	[inliner] Skip debug intrinsics even earlier in computing the inline cost so that they don't impact the vector bonus. Fundamentally, counting unsimplified instructions is just wrong; it will continue to introduce instability as things which do not generate code bizarrely impact inlining. For example, sufficiently nested inlined functions could turn off the vector bonus with lifetime markers just like the debug intrinsics do. =/ This is a short-term tactical fix. Long term, I think we need to remove the vector bonus entirely. That's a separate patch and discussion though. The patch to fix this provided by Dario Domizioli. I've added some comments about the planned direction and used a heavily pruned form of debug info intrinsics for the test case. While this debug info doesn't work or "do" anything useful, it lets us easily test all manner of interference easily, and I suspect this will not be the last time we want to craft a pattern where debug info interferes with the inliner in a problematic way. llvm-svn: 200609	2014-02-01 10:38:17 +00:00
Reid Kleckner	0421c6aef8	Revert "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..." This reverts commit r200576. It broke 32-bit self-host builds by vectorizing two calls to @llvm.bswap.i64, which we then fail to expand. llvm-svn: 200602	2014-02-01 01:37:30 +00:00
Chandler Carruth	74c658030d	[SLPV] Recognize vectorizable intrinsics during SLP vectorization and transform accordingly. Based on similar code from Loop vectorization. Subsequent commits will include vectorization of function calls to vector intrinsics and form function calls to vector library calls. Patch by Raul Silvera! (Much delayed due to my not running dcommit) llvm-svn: 200576	2014-01-31 21:14:40 +00:00
Chandler Carruth	fbc2b60e8a	[vectorizer] Tweak the way we do small loop runtime unrolling in the loop vectorizer to not do so when runtime pointer checks are needed and share code with the new (not yet enabled) load/store saturation runtime unrolling. Also ensure that we only consider the runtime checks when the loop hasn't already been vectorized. If it has, the runtime check cost has already been paid. I've fleshed out a test case to cover the scalar unrolling as well as the vector unrolling and comment clearly why we are or aren't following the pattern. llvm-svn: 200530	2014-01-31 10:51:08 +00:00
Matt Arsenault	5055466f83	Allow speculating llvm.sqrt, fma and fmuladd This doesn't set errno, so this should be OK. Also update the documentation to explicitly state that errno are not set. llvm-svn: 200501	2014-01-31 00:09:00 +00:00
Arnold Schwaighofer	b5ff780138	LoopVectorizer: Add a test case for unrolling of small loops that need a runtime check. llvm-svn: 200408	2014-01-29 18:55:44 +00:00
Chandler Carruth	6ba48b6c38	[LPM] Fix PR18643, another scary place where loop transforms failed to preserve loop simplify of enclosing loops. The problem here starts with LoopRotation which ends up cloning code out of the latch into the new preheader it is buidling. This can create a new edge from the preheader into the exit block of the loop which breaks LoopSimplify form. The code tries to fix this by splitting the critical edge between the latch and the exit block to get a new exit block that only the latch dominates. This sadly isn't sufficient. The exit block may be an exit block for multiple nested loops. When we clone an edge from the latch of the inner loop to the new preheader being built in the outer loop, we create an exiting edge from the outer loop to this exit block. Despite breaking the LoopSimplify form for the inner loop, this is fine for the outer loop. However, when we split the edge from the inner loop to the exit block, we create a new block which is in neither the inner nor outer loop as the new exit block. This is a predecessor to the old exit block, and so the split itself takes the outer loop out of LoopSimplify form. We need to split every edge entering the exit block from inside a loop nested more deeply than the exit block in order to preserve all of the loop simplify constraints. Once we try to do that, a problem with splitting critical edges surfaces. Previously, we tried a very brute force to update LoopSimplify form by re-computing it for all exit blocks. We don't need to do this, and doing this much will sometimes but not always overlap with the LoopRotate bug fix. Instead, the code needs to specifically handle the cases which can start to violate LoopSimplify -- they aren't that common. We need to see if the destination of the split edge was a loop exit block in simplified form for the loop of the source of the edge. For this to be true, all the predecessors need to be in the exact same loop as the source of the edge being split. If the dest block was originally in this form, we have to split all of the deges back into this loop to recover it. The old mechanism of doing this was conservatively correct because at least one of the exiting blocks it rewrote was the DestBB and so the DestBB's predecessors were fixed. But this is a much more targeted way of doing it. Making it targeted is important, because ballooning the set of edges touched prevents LoopRotate from being able to split edges it needs to split to preserve loop simplify in a coherent way -- the critical edge splitting would sometimes find the other edges in need of splitting but not others. Many, many thanks for help from Nick reducing these test cases mightily. And helping lots with the analysis here as this one was quite tricky to track down. llvm-svn: 200393	2014-01-29 13:16:53 +00:00
Chandler Carruth	ed726e1be7	[LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered" because of the inside-out run of LoopSimplify in the LoopPassManager and the fact that LoopSimplify couldn't be "preserved" across two independent LoopPassManagers. Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI node because it thought it was rewriting (via SCEV) the incoming value to a loop invariant value. While it may well be invariant for the current loop, it may be rewritten in terms of an enclosing loop's values. This in and of itself is fine, as the LCSSA PHI node in the enclosing loop for the inner loop value we're rewriting will have its own LCSSA PHI node if used outside of the enclosing loop. With me so far? Well, the current loop and the enclosing loop may share an exiting block and exit block, and when they do they also share LCSSA PHI nodes. In this case, its not valid to RAUW through the LCSSA PHI node. Expected crazy test included. llvm-svn: 200372	2014-01-29 04:40:19 +00:00
Rafael Espindola	e8856107f0	Fix pr14893. When simplifycfg moves an instruction, it must drop metadata it doesn't know is still valid with the preconditions changes. In particular, it must drop the range and tbaa metadata. The patch implements this with an utility function to drop all metadata not in a white list. llvm-svn: 200322	2014-01-28 16:56:46 +00:00
Chandler Carruth	6a45efab46	[vectorizer] Completely disable the block frequency guidance of the loop vectorizer, placing it behind an off-by-default flag. It turns out that block frequency isn't what we want at all, here or elsewhere. This has been I think a nagging feeling for several of us working with it, but Arnold has given some really nice simple examples where the results are so comprehensively wrong that they aren't useful. I'm planning to email the dev list with a summary of why its not really useful and a couple of ideas about how to better structure these types of heuristics. llvm-svn: 200294	2014-01-28 09:10:41 +00:00
Reid Kleckner	c9ab4a9a3b	Update optimization passes to handle inalloca arguments Summary: I searched Transforms/ and Analysis/ for 'ByVal' and updated those call sites to check for inalloca if appropriate. I added tests for any change that would allow an optimization to fire on inalloca. Reviewers: nlewycky Differential Revision: http://llvm-reviews.chandlerc.com/D2449 llvm-svn: 200281	2014-01-28 02:38:36 +00:00
Arnold Schwaighofer	8f596e2047	LoopVectorize: Support conditional stores by scalarizing The vectorizer takes a loop like this and widens all instructions except for the store. The stores are scalarized/unrolled and hidden behind an "if" block. for (i = 0; i < 128; ++i) { if (a[i] < 10) a[i] += val; } for (i = 0; i < 128; i+=2) { v = a[i:i+1]; v0 = (extract v, 0) + 10; v1 = (extract v, 1) + 10; if (v0 < 10) a[i] = v0; if (v1 < 10) a[i] = v1; } The vectorizer relies on subsequent optimizations to sink instructions into the conditional block where they are anticipated. The flag "vectorize-num-stores-pred" controls whether and how many stores to handle this way. Vectorization of conditional stores is disabled per default for now. This patch also adds a change to the heuristic when the flag "enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small loops until load/store ports are saturated. This heuristic uses TTI's getMaxUnrollFactor as a measure for load/store ports. I also added a second flag -enable-cond-stores-vec. It will enable vectorization of conditional stores. But there is no cost model for vectorization of conditional stores in place yet so this will not do good at the moment. rdar://15892953 Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll -vectorize-num-stores-pred=1 (before the BFI change): Performance Regressions: Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower) Applications/siod/siod 2.18% Performance improvements: mesa -4.42% libquantum -4.15% With a patch that slightly changes the register heuristics (by subtracting the induction variable on both sides of the register pressure equation, as the induction variable is probably not really unrolled): Performance Regressions: Benchmarks/Ptrdist/yacr2/yacr2 7.73% Applications/siod/siod 1.97% Performance Improvements: libquantum -13.05% (we now also unroll quantum_toffoli) mesa -4.27% llvm-svn: 200270	2014-01-28 01:01:53 +00:00
Manman Ren	c3f51e8e54	PGO branch weight: keep halving the weights until they can fit into uint32. When folding branches to common destination, the updated branch weights can exceed uint32 by more than factor of 2. We should keep halving the weights until they can fit into uint32. llvm-svn: 200262	2014-01-27 23:39:03 +00:00
Chandler Carruth	f70ef7ae29	[vectorize] Initial version of respecting PGO in the vectorizer: treat cold loops as-if they were being optimized for size. Nothing fancy here. Simply test case included. The nice thing is that we can now incrementally build on top of this to drive other heuristics. All of the infrastructure work is done to get the profile information into this layer. The remaining work necessary to make this a fully general purpose loop unroller for very hot loops is to make it a fully general purpose loop unroller. Things I know of but am not going to have time to benchmark and fix in the immediate future: 1) Don't disable the entire pass when the target is lacking vector registers. This really doesn't make any sense any more. 2) Teach the unroller at least and the vectorizer potentially to handle non-if-converted loops. This is trivial for the unroller but hard for the vectorizer. 3) Compute the relative hotness of the loop and thread that down to the various places that make cost tradeoffs (very likely only the unroller makes sense here, and then only when dealing with loops that are small enough for unrolling to not completely blow out the LSD). I'm still dubious how useful hotness information will be. So far, my experiments show that if we can get the correct logic for determining when unrolling actually helps performance, the code size impact is completely unimportant and we can unroll in all cases. But at least we'll no longer burn code size on cold code. One somewhat unrelated idea that I've had forever but not had time to implement: mark all functions which are only reachable via the global constructors rigging in the module as optsize. This would also decrease the impact of any more aggressive heuristics here on code size. llvm-svn: 200219	2014-01-27 13:11:50 +00:00
Benjamin Kramer	65df2371a8	ConstantHoisting: We can't insert instructions directly in front of a PHI node. Insert before the terminating instruction of the dominating block instead. llvm-svn: 200218	2014-01-27 13:11:43 +00:00
Chandler Carruth	88d92716dd	[vectorizer] Add an override for the target instruction cost and use it to stabilize a test that really is trying to test generic behavior and not a specific target's behavior. llvm-svn: 200215	2014-01-27 11:41:50 +00:00
Chandler Carruth	d1ecfe35ae	[vectorizer] Teach the loop vectorizer's unroller to only unroll by powers of two. This is essentially always the correct thing given the impact on alignment, scaling factors that can be used in addressing modes, etc. Also, fix the management of the unroll vs. small loop cost to more accurately model things with this world. Enhance a test case to actually exercise more of the unroll machinery if using synthetic constants rather than a specific target model. Before this change, with the added flags this test will unroll 3 times instead of either 2 or 4 (the two sensible answers). While I don't expect this to make a huge difference, if there are lots of loops sitting right on the edge of hitting the 'small unroll' factor, they might change behavior. However, I've benchmarked moving the small loop cost up and down in many various ways and by a huge factor (2x) without seeing more than 0.2% code size growth. Small adjustments such as the series that led up here have led to about 1% improvement on some benchmarks, but it is very close to the noise floor so I mostly checked that nothing regressed. Let me know if you see bad behavior on other targets but I don't expect this to be a sufficiently dramatic change to trigger anything. llvm-svn: 200213	2014-01-27 11:12:24 +00:00
Chandler Carruth	3998de34a0	[LPM] Make LCSSA a utility with a FunctionPass that applies it to all the loops in a function, and teach LICM to work in the presance of LCSSA. Previously, LCSSA was a loop pass. That made passes requiring it also be loop passes and unable to depend on function analysis passes easily. It also caused outer loops to have a different "canonical" form from inner loops during analysis. Instead, we go into LCSSA form and preserve it through the loop pass manager run. Note that this has the same problem as LoopSimplify that prevents enabling its verification -- loop passes which run at the end of the loop pass manager and don't preserve these are valid, but the subsequent loop pass runs of outer loops that do preserve this pass trigger too much verification and fail because the inner loop no longer verifies. The other problem this exposed is that LICM was completely unable to handle LCSSA form. It didn't preserve it and it actually would give up on moving instructions in many cases when they were used by an LCSSA phi node. I've taught LICM to support detecting LCSSA-form PHI nodes and to hoist and sink around them. This may actually let LICM fire significantly more because we put everything into LCSSA form to rotate the loop before running LICM. =/ Now LICM should handle that fine and preserve it correctly. The down side is that LICM has to require LCSSA in order to preserve it. This is just a fact of life for LCSSA. It's entirely possible we should completely remove LCSSA from the optimizer. The test updates are essentially accomodating LCSSA phi nodes in the output of LICM, and the fact that we now completely sink every instruction in ashr-crash below the loop bodies prior to unrolling. With this change, LCSSA is computed only three times in the pass pipeline. One of them could be removed (and potentially a SCEV run and a separate LoopPassManager entirely!) if we had a LoopPass variant of InstCombine that ran InstCombine on the loop body but refused to combine away LCSSA PHI nodes. Currently, this also prevents loop unrolling from being in the same loop pass manager is rotate, LICM, and unswitch. There is one thing that I really don't like -- preserving LCSSA in LICM is quite expensive. We end up having to re-run LCSSA twice for some loops after LICM runs because LICM can undo LCSSA both in the current loop and the parent loop. I don't really see good solutions to this other than to completely move away from LCSSA and using tools like SSAUpdater instead. llvm-svn: 200067	2014-01-25 04:07:24 +00:00
Benjamin Kramer	78991033ac	InstCombine: Don't try to use aggregate elements of ConstantExprs. PR18600. llvm-svn: 200028	2014-01-24 19:02:37 +00:00
Alp Toker	1c4b33e8e5	Fix known typos Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt. llvm-svn: 200018	2014-01-24 17:20:08 +00:00
Benjamin Kramer	0e56ec16c4	InstSimplify: Make shift, select and GEP simplifications vector-aware. llvm-svn: 200016	2014-01-24 17:09:53 +00:00
Rafael Espindola	e9ef8c2f1a	Note the PR number. llvm-svn: 199932	2014-01-23 20:17:12 +00:00
Rafael Espindola	adb277286a	Remove tail marker when changing an argument to an alloca. Argument promotion can replace an argument of a call with an alloca. This requires clearing the tail marker as it is very likely that the callee is now using an alloca in the caller. This fixes pr14710. llvm-svn: 199909	2014-01-23 17:19:42 +00:00
Chandler Carruth	46bbc995de	[LPM] Make LoopSimplify no longer a LoopPass and instead both a utility function and a FunctionPass. This has many benefits. The motivating use case was to be able to compute function analysis passes after running LoopSimplify (to avoid invalidating them) and then to run other passes which require LoopSimplify. Specifically passes like unrolling and vectorization are critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so that they can be profile aware. For the LoopVectorize pass the only things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify and LCSSA is next on my list. There are also a bunch of other benefits of doing this: - It is now very feasible to make more passes preserve LoopSimplify because they can simply run it after changing a loop. Because subsequence passes can assume LoopSimplify is preserved we can reduce the runs of this pass to the times when we actually mutate a loop structure. - The new pass manager should be able to more easily support loop passes factored in this way. - We can at long, long last observe that LoopSimplify is preserved across SCEV. This halves the number of times we run LoopSimplify!!! Now, getting here wasn't trivial. First off, the interfaces used by LoopSimplify are all over the map regarding how analysis are updated. We end up with weird "pass" parameters as a consequence. I'll try to clean at least some of this up later -- I'll have to have it all clean for the new pass manager. Next up I discovered a really frustrating bug. LoopUnroll claims to preserve LoopSimplify. That's actually a lie. But the way the LoopPassManager ends up running the passes, it always ran LoopSimplify on the unrolled-into loop, rectifying this oversight before any verification could kick in and point out that in fact nothing was preserved. So I've added code to the unroller to actually simplify the surrounding loop when it succeeds at unrolling. The only functional change in the test suite is that we now catch a case that was previously missed because SCEV and other loop transforms see their containing loops as simplified and thus don't miss some opportunities. One test case has been converted to check that we catch this case rather than checking that we miss it but at least don't get the wrong answer. Note that I have #if-ed out all of the verification logic in LoopSimplify! This is a temporary workaround while extracting these bits from the LoopPassManager. Currently, there is no way to have a pass in the LoopPassManager which preserves LoopSimplify along with one which does not. The LPM will try to verify on each loop in the nest that LoopSimplify holds but the now-Function-pass cannot distinguish what loop is being verified and so must try to verify all of them. The inner most loop is clearly no longer simplified as there is a pass which didn't even attempt to preserve it. =/ Once I get LCSSA out (and maybe LoopVectorize and some other fixes) I'll be able to re-enable this check and catch any places where we are still failing to preserve LoopSimplify. If this causes problems I can back this out and try to commit all of this at once, but so far this seems to work and allow much more incremental progress. llvm-svn: 199884	2014-01-23 11:23:19 +00:00
Matt Arsenault	5eede68ba6	Add CHECK-LABELs llvm-svn: 199846	2014-01-22 22:32:58 +00:00
Matt Arsenault	52e557deb2	Handle an addrspacecast case in memcpyopt llvm-svn: 199836	2014-01-22 21:53:19 +00:00
Owen Anderson	e0205fdcd8	Fix all the remaining lost-fast-math-flags bugs I've been able to find. The most important of these are cases in the generic logic for combining BinaryOperators. This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd. llvm-svn: 199629	2014-01-20 07:44:53 +00:00
Benjamin Kramer	813eb189fa	InstCombine: Modernize a bunch of cast combines. Also make them vector-aware. llvm-svn: 199608	2014-01-19 20:05:13 +00:00
Benjamin Kramer	47d4c4c113	InstCombine: Replace a hand-rolled version of isKnownToBeAPowerOfTwo with the real thing. llvm-svn: 199604	2014-01-19 16:48:41 +00:00
Benjamin Kramer	0de38fdc6a	InstCombine: Teach most integer add/sub/mul/div combines how to deal with vectors. llvm-svn: 199602	2014-01-19 15:24:22 +00:00
Benjamin Kramer	b864b5d907	InstCombine: Refactor fmul/fdiv combines to handle vectors. llvm-svn: 199598	2014-01-19 13:36:27 +00:00

1 2 3 4 5 ...

4188 Commits