llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-30 07:22:55 +01:00

Author	SHA1	Message	Date
Chris Lattner	66f10c8957	teach the cloner to handle inline asms llvm-svn: 25633	2006-01-26 01:55:22 +00:00
Chris Lattner	84f2acfaa0	Fix Regression/Transforms/ScalarRepl/2006-01-24-IllegalUnionPromoteCrash.ll llvm-svn: 25587	2006-01-24 19:36:27 +00:00
Chris Lattner	d09c95f83c	rename method llvm-svn: 25572	2006-01-24 04:16:34 +00:00
Chris Lattner	c06c8d8c06	When cloning a module, clone the inline asm. llvm-svn: 25559	2006-01-23 23:06:28 +00:00
Chris Lattner	6e1f262158	add a bunch more optimizations for unary double math functions llvm-svn: 25530	2006-01-23 06:24:46 +00:00
Chris Lattner	22fa9eeac1	Refactor/genericize this, no functionality change llvm-svn: 25525	2006-01-23 05:57:36 +00:00
Chris Lattner	2588f0eb8f	Make iostream #inclusion explicit llvm-svn: 25514	2006-01-22 23:32:06 +00:00
Chris Lattner	0dbef6ec71	Make this more efficient in the following ways: 1. Do not statically construct a map when the program starts up; this is expensive and cannot be optimized. Instead, create a list. 2. Do not insert entries for all functions in the module into a hashmap that lives the full life of the compiler. llvm-svn: 25512	2006-01-22 23:10:26 +00:00
Chris Lattner	adff158fbd	Add explicit #includes of <iostream> llvm-svn: 25509	2006-01-22 22:53:01 +00:00
Chris Lattner	305fee5118	Several non-functionality changing changes: 1. Use the varargs version of getOrInsertFunction to simplify code. 2. remove #include 3. Reduce the number of #ifdef's. 4. remove extraneous vertical whitespace. llvm-svn: 25508	2006-01-22 22:35:08 +00:00
Robert Bocchino	40c6c91f56	ConstantFoldLoadThroughGEPConstantExpr wasn't handling pointers to packed types correctly. llvm-svn: 25470	2006-01-19 23:53:23 +00:00
Reid Spencer	585019f629	For PR696: Don't do floor->floorf conversion if floorf is not available. This checks the compiler's host, not its target, which is incorrect for cross-compilers. Not sure that's important as we don't build many cross-compilers. llvm-svn: 25456	2006-01-19 08:36:56 +00:00
Chris Lattner	6e4d8741d5	Implement casts.ll:test26: a cast from float -> double -> integer, doesn't need the float->double part. llvm-svn: 25452	2006-01-19 07:40:22 +00:00
Chris Lattner	d5a7ceda96	If not internalizing, don't mark llvm.global[cd]tors const, as a fix for a hypothetical future boog. llvm-svn: 25430	2006-01-19 00:46:54 +00:00
Chris Lattner	197d33ce21	Don't internalize llvm.global[cd]tor unless there are uses of it. This unbreaks front-ends that don't use __main (like the new CFE). llvm-svn: 25429	2006-01-19 00:40:39 +00:00
Chris Lattner	f7623e2065	Make sure that cloning a module clones its target triple and dependent library list as well. This should help bugpoint. llvm-svn: 25424	2006-01-18 21:32:45 +00:00
Robert Bocchino	443661ec6b	Constant folding support for the insertelement operation. llvm-svn: 25407	2006-01-17 20:07:07 +00:00
Robert Bocchino	d9fa267a49	Lowerpacked and SCCP support for the insertelement operation. llvm-svn: 25406	2006-01-17 20:06:55 +00:00
Chris Lattner	ddbf4fba37	Clean up the FFS optimization code, and make it correctly create the appropriate unsigned llvm.cttz.* intrinsic, fixing the 2005-05-11-Popcount-ffs-fls regression last night. llvm-svn: 25398	2006-01-17 18:27:17 +00:00
Reid Spencer	3cecd3c4cf	For PR411: This patch is an incremental step towards supporting a flat symbol table. It de-overloads the intrinsic functions by providing type-specific intrinsics and arranging for automatically upgrading from the old overloaded name to the new non-overloaded name. Specifically: llvm.isunordered -> llvm.isunordered.f32, llvm.isunordered.f64 llvm.sqrt -> llvm.sqrt.f32, llvm.sqrt.f64 llvm.ctpop -> llvm.ctpop.i8, llvm.ctpop.i16, llvm.ctpop.i32, llvm.ctpop.i64 llvm.ctlz -> llvm.ctlz.i8, llvm.ctlz.i16, llvm.ctlz.i32, llvm.ctlz.i64 llvm.cttz -> llvm.cttz.i8, llvm.cttz.i16, llvm.cttz.i32, llvm.cttz.i64 New code should not use the overloaded intrinsic names. Warnings will be emitted if they are used. llvm-svn: 25366	2006-01-16 21:12:35 +00:00
Chris Lattner	12d9016774	fix a crash due to missing parens llvm-svn: 25363	2006-01-16 19:47:21 +00:00
Chris Lattner	00c02ed5d6	This pass has never worked correctly. Remove. llvm-svn: 25349	2006-01-16 01:06:00 +00:00
Chris Lattner	f718ef50ad	Let the inliner update the callgraph to reflect the changes it makes, instead of doing it ourselves. This fixes Transforms/Inline/2006-01-14-CallGraphUpdate.ll llvm-svn: 25321	2006-01-14 20:09:18 +00:00
Chris Lattner	fea434f37e	Teach the inliner to update the CallGraph itself, and have it add edges to llvm.stacksave/restore when it inserts calls to them. llvm-svn: 25320	2006-01-14 20:07:50 +00:00
Chris Lattner	7cbd5dc1f0	FunctionPass's cannot do IPO things. llvm-svn: 25315	2006-01-14 19:30:35 +00:00
Nate Begeman	4750001146	Add bswap intrinsics as documented in the Language Reference llvm-svn: 25309	2006-01-14 01:25:24 +00:00
Robert Bocchino	4617a805da	Added instcombine support for extractelement. llvm-svn: 25299	2006-01-13 22:48:06 +00:00
Chris Lattner	818f217fad	it is ok to dce stacksave. llvm-svn: 25295	2006-01-13 21:31:54 +00:00
Chris Lattner	61a2fca725	Do a simple instcombine xforms to delete llvm.stackrestore cases. llvm-svn: 25294	2006-01-13 21:28:09 +00:00
Chris Lattner	e79c8847f0	Simplify this a tiny bit by using the new IntrinsicInst functionality. llvm-svn: 25292	2006-01-13 20:11:04 +00:00
Chris Lattner	67a0c03bb4	Permit inlining functions that contain dynamic allocations now that InlineFunction handles this case safely. This implements Transforms/Inline/dynamic_alloca_test.ll. llvm-svn: 25288	2006-01-13 19:35:43 +00:00
Chris Lattner	423aeb28d5	If inlining a call to a function that contains dynamic allocas, wrap the resultant code with llvm.stacksave/llvm.stackrestore intrinsics. llvm-svn: 25286	2006-01-13 19:34:14 +00:00
Chris Lattner	672f2df6d0	Use ClonedCodeInfo to avoid another walk over the inlined code, this time in common C cases. llvm-svn: 25285	2006-01-13 19:18:11 +00:00
Chris Lattner	6f96396bae	Use the ClonedCodeInfo object to avoid scans of the inlined code when it doesn't contain any calls. This is a fairly common case for C++ code, so it will probably speed up the inliner marginally in these cases. llvm-svn: 25284	2006-01-13 19:15:15 +00:00
Chris Lattner	ec00fcaba6	Refactor a bunch of invoke handling stuff out into a new function "HandleInlinedInvoke". No functionality change. llvm-svn: 25283	2006-01-13 19:05:59 +00:00
Chris Lattner	67fb415248	Allow the code cloning interfaces to capture some important info about the code being cloned if the client wants. llvm-svn: 25281	2006-01-13 18:39:17 +00:00
Chris Lattner	32ff638ae5	Fix a bug I noticed by inspection: if the first instruction in the inlined function was not an alloca, we wouldn't check the entry block for any allocas, leading to increased stack space in some cases. In practice, allocas are almost always at the top of the block, so this was never noticed. llvm-svn: 25280	2006-01-13 18:16:48 +00:00
Chris Lattner	32f54b291d	Fix 80 column violations llvm-svn: 25279	2006-01-13 18:06:56 +00:00
Chris Lattner	d726328ee9	Preserve and update ETForest. Patch by Daniel Berlin llvm-svn: 25203	2006-01-11 05:11:13 +00:00
Chris Lattner	7b7fee2d92	Switch these to using ETForest instead of DominatorSet to compute itself. Patch written by Daniel Berlin! llvm-svn: 25202	2006-01-11 05:10:20 +00:00
Chris Lattner	229a42a573	Switch this to using ETForest instead of DominatorSet to compute itself. Patch written by Daniel Berlin! llvm-svn: 25201	2006-01-11 05:09:40 +00:00
Robert Bocchino	9a57550e4e	Added support for the extractelement operation. llvm-svn: 25181	2006-01-10 19:05:34 +00:00
Robert Bocchino	e00d93bc83	Added lower packed support for the extractelement operation. llvm-svn: 25180	2006-01-10 19:05:05 +00:00
Chris Lattner	05816ecb7f	Teach loopsimplify to update et-forest. Patch contributed by Daniel Berlin! llvm-svn: 25153	2006-01-09 08:03:08 +00:00
Chris Lattner	9453593f40	fix some 176.gcc miscompilation from my previous patch. llvm-svn: 25137	2006-01-07 01:32:28 +00:00
Chris Lattner	6c99d09404	silence some bogus gcc warnings on fenris llvm-svn: 25130	2006-01-06 17:59:59 +00:00
Chris Lattner	6c01df15ac	Enhance the shift-shift folding code to allow a no-op cast to occur in between the shifts. This allows us to fold this (which is the 'integer add a constant' sequence from cozmic's scheme compiler): int %x(uint %anf-temporary776) { %anf-temporary777 = shr uint %anf-temporary776, ubyte 1 %anf-temporary800 = cast uint %anf-temporary777 to int %anf-temporary804 = shl int %anf-temporary800, ubyte 1 %anf-temporary805 = add int %anf-temporary804, -2 %anf-temporary806 = or int %anf-temporary805, 1 ret int %anf-temporary806 } into this: int %x(uint %anf-temporary776) { %anf-temporary776 = cast uint %anf-temporary776 to int %anf-temporary776.mask1 = add int %anf-temporary776, -2 %anf-temporary805 = or int %anf-temporary776.mask1, 1 ret int %anf-temporary805 } note that instcombine already knew how to eliminate the AND that the two shifts fold into. This is tested by InstCombine/shift.ll:test26 -Chris llvm-svn: 25128	2006-01-06 07:52:12 +00:00
Chris Lattner	410db54bbf	Simplify the code a bit more llvm-svn: 25126	2006-01-06 07:22:22 +00:00
Chris Lattner	83fc19a4a9	Extract a bunch of code out of visitShiftInst into FoldShiftByConstant. No functionality changes. llvm-svn: 25125	2006-01-06 07:12:35 +00:00
Chris Lattner	6a56973377	Pull inline methods out of the pass class definition to make it easier to read the code. Do not internalize debugger anchors. llvm-svn: 25067	2006-01-03 19:13:17 +00:00
Duraid Madina	7cb522e3e8	getting there... llvm-svn: 25021	2005-12-26 13:48:44 +00:00
Chris Lattner	76b2303521	Fix Transforms/ScalarRepl/2005-12-14-UnionPromoteCrash.ll, a crash on undefined behavior in 126.gcc on big-endian systems. llvm-svn: 24708	2005-12-14 17:23:59 +00:00
Reid Spencer	519dc24073	Improve ResolveFunctions to: a) use better local variable names (OldMT -> OldFT) where "M" is used to mean "Function" (perhaps it was previously "Method"?) b) print out the module identifier in a warning message so that it is possible to track down in which module the error occurred. llvm-svn: 24698	2005-12-13 19:56:51 +00:00
Chris Lattner	d61c654e67	Implement a little hack for parity with GCC on crafty. This speeds up 186.crafty by about 16% (from 15.109s to 13.045s) on my system. This turns allocas with unions/casts into scalars. For example crafty has something like this: union doub { unsigned short i[4]; long long d; }; int f(long long a) { return ((union doub){.d=a}).i[1]; } Instead of generating loads and stores to an alloca, we now promote the whole thing to a scalar long value. This implements: Transforms/ScalarRepl/AggregatePromote.ll llvm-svn: 24667	2005-12-12 07:19:13 +00:00
Chris Lattner	3d993f7e4d	getRawValue zero extends for unsigned values, use getsextvalue so that we know that small negative values fit into the immediate field of addressing modes. llvm-svn: 24608	2005-12-05 18:23:57 +00:00
Chris Lattner	a4fe0bd75f	Wrap a long line, never internalize llvm.used. llvm-svn: 24602	2005-12-05 05:07:38 +00:00
Chris Lattner	f9a1c37c84	Fix SimplifyCFG/2005-12-03-IncorrectPHIFold.ll llvm-svn: 24581	2005-12-03 18:25:58 +00:00
Chris Lattner	99e894a36f	Fix a bug where we didn't realize that vaarg reads memory. This fixes Transforms/DeadStoreElimination/2005-11-30-vaarg.ll llvm-svn: 24545	2005-11-30 19:38:22 +00:00
Andrew Lenharth	1ffbe58972	a few more comments on the interfaces and functions llvm-svn: 24500	2005-11-28 18:10:59 +00:00
Andrew Lenharth	cad1d52b64	Added documented rsprofiler interface. Also remove new profiler passes, the old ones have been updated to implement the interface. llvm-svn: 24499	2005-11-28 18:00:38 +00:00
Jeff Cohen	b171dee053	Fix VC++ warning. llvm-svn: 24496	2005-11-28 06:45:57 +00:00
Andrew Lenharth	311ec68cf4	Random sampling (aka Arnold and Ryder) profiling. This is still preliminary, but it works on spec on x86 and alpha. The idea is to allow profiling passes to remember what profiling they inserted, then a random sampling framework is inserted which consists of duplicated basic blocks (without profiling), such that at each backedge in the program and entry into every function, the framework chooses whether to use the instrumented code or the instrumentation free code. The goal of such a framework is to make it reasonably cheap to do random sampling of very expensive profiling products (such as load-value profiling). The code is organized into 3 parts (2 passes) 1) a linked set of profiling passes, which implement an analysis group (linked, like alias analysis are). These insert profiling into the program, and remember what they inserted, so that at a later time they can be queried about any instruction. 2) a pass that handles inserting the random sampling framework. This also has options to control how random samples are chosen. Currently implemented are Global counters, register allocated global counters, and read cycle counter (see? there was a reason for it). The profiling passes are almost identical to the existing ones (block, function, and null profiling is supported right now), and they are valid passes without the sampling framework (hence the existing passes can be unified with the new ones, not done yet). Some things are a bit ugly still, but that should be fixed up soon enough. Other todo? making the counter values not "magic 2^16 -1" values, but dynamically choosable. llvm-svn: 24493	2005-11-28 00:58:09 +00:00
Andrew Lenharth	2700c92469	since reg2mem requires it, might as well mention that it preserves it llvm-svn: 24491	2005-11-25 16:04:54 +00:00
Andrew Lenharth	79ee761b69	Reg2Mem is something a pass may depend on, so allow that llvm-svn: 24488	2005-11-22 22:14:23 +00:00
Andrew Lenharth	939cd99914	turns out, demotion and invokes and critical edges don't mix llvm-svn: 24487	2005-11-22 21:45:19 +00:00
Chris Lattner	e67f211b68	Fix a crash building 176.gcc due to my recent patch, which only fixed half the problem. llvm-svn: 24414	2005-11-18 18:30:47 +00:00
Chris Lattner	fc4928a31a	Implement a refinement to the mem2reg algorithm for cases where an alloca has a single def. In this case, look for uses that are dominated by the def and attempt to rewrite them to directly use the stored value. This speeds up mem2reg on these values and reduces the number of phi nodes inserted. This should address PR665. llvm-svn: 24411	2005-11-18 07:31:42 +00:00
Chris Lattner	86e6fa1ee7	This needs proper dominance llvm-svn: 24410	2005-11-18 07:29:44 +00:00
Chris Lattner	bf3324e75d	This was checking the wrong GEP expression. Fixing this fixes a gccas crash compiling mysql reported by Ted Kremenek. llvm-svn: 24402	2005-11-17 19:35:42 +00:00
Andrew Lenharth	0b424575e0	the pain isn't gone unless the phinodes are spilled too llvm-svn: 24288	2005-11-10 19:39:09 +00:00
Andrew Lenharth	b4169fe539	this works with backedges to the existing entry block a lot better llvm-svn: 24270	2005-11-10 17:35:34 +00:00
Andrew Lenharth	03d60c3d09	The pass everyone has been waiting for! Reg2Mem for fun you can opt -reg2mem -mem2reg llvm-svn: 24267	2005-11-10 01:58:38 +00:00
Nate Begeman	f299b9fb03	Add support for alignment of allocation instructions. Add support for specifying alignment and size of setjmp jmpbufs. No targets currently do anything with this information, nor is it preserved in the bytecode representation. That's coming up next. llvm-svn: 24196	2005-11-05 09:21:28 +00:00
Chris Lattner	0bd3e3230c	Implement Transforms/TailCallElim/return-undef.ll, a trivial case that has been sitting in my inbox since May 18. :) llvm-svn: 24194	2005-11-05 08:21:11 +00:00
Chris Lattner	4352c050d1	Turn sdiv into udiv if both operands have a clear sign bit. This occurs a few times in crafty: OLD: %tmp.36 = div int %tmp.35, 8 ; <int> [#uses=1] NEW: %tmp.36 = div uint %tmp.35, 8 ; <uint> [#uses=0] OLD: %tmp.19 = div int %tmp.18, 8 ; <int> [#uses=1] NEW: %tmp.19 = div uint %tmp.18, 8 ; <uint> [#uses=0] OLD: %tmp.117 = div int %tmp.116, 8 ; <int> [#uses=1] NEW: %tmp.117 = div uint %tmp.116, 8 ; <uint> [#uses=0] OLD: %tmp.92 = div int %tmp.91, 8 ; <int> [#uses=1] NEW: %tmp.92 = div uint %tmp.91, 8 ; <uint> [#uses=0] Which all turn into shrs. llvm-svn: 24190	2005-11-05 07:40:31 +00:00
Chris Lattner	297e545d4b	Turn srem -> urem when neither input has their sign bit set. This triggers 8 times in vortex, allowing the srems to be turned into shrs: OLD: %tmp.104 = rem int %tmp.5.i37, 16 ; <int> [#uses=1] NEW: %tmp.104 = rem uint %tmp.5.i37, 16 ; <uint> [#uses=0] OLD: %tmp.98 = rem int %tmp.5.i24, 16 ; <int> [#uses=1] NEW: %tmp.98 = rem uint %tmp.5.i24, 16 ; <uint> [#uses=0] OLD: %tmp.91 = rem int %tmp.5.i19, 8 ; <int> [#uses=1] NEW: %tmp.91 = rem uint %tmp.5.i19, 8 ; <uint> [#uses=0] OLD: %tmp.88 = rem int %tmp.5.i14, 8 ; <int> [#uses=1] NEW: %tmp.88 = rem uint %tmp.5.i14, 8 ; <uint> [#uses=0] OLD: %tmp.85 = rem int %tmp.5.i9, 1024 ; <int> [#uses=2] NEW: %tmp.85 = rem uint %tmp.5.i9, 1024 ; <uint> [#uses=0] OLD: %tmp.82 = rem int %tmp.5.i, 512 ; <int> [#uses=2] NEW: %tmp.82 = rem uint %tmp.5.i1, 512 ; <uint> [#uses=0] OLD: %tmp.48.i = rem int %tmp.5.i.i161, 4 ; <int> [#uses=1] NEW: %tmp.48.i = rem uint %tmp.5.i.i161, 4 ; <uint> [#uses=0] OLD: %tmp.20.i2 = rem int %tmp.5.i.i, 4 ; <int> [#uses=1] NEW: %tmp.20.i2 = rem uint %tmp.5.i.i, 4 ; <uint> [#uses=0] it also occurs 9 times in gcc, but with odd constant divisors (1009 and 61) so the payoff isn't as great. llvm-svn: 24189	2005-11-05 07:28:37 +00:00
Andrew Lenharth	9a32a77e33	make this 64 bit clean, fixed test30 of /Regression/Transforms/InstCombine/add.ll llvm-svn: 24158	2005-11-02 18:35:40 +00:00
Chris Lattner	f8e14244c3	Limit the search depth of MaskedValueIsZero to 6 instructions, to avoid bad cases. This fixes Markus's second testcase in PR639, and should seal it for good. llvm-svn: 24123	2005-10-31 18:35:52 +00:00
Chris Lattner	bafe11a821	This pass is now obsolete since all targets have moved to the SelectionDAG infrastructure and the simple isels have been removed. llvm-svn: 24090	2005-10-29 05:33:46 +00:00
Chris Lattner	4772d742a0	Remove dead #include llvm-svn: 24083	2005-10-29 04:41:30 +00:00
Chris Lattner	20e1564bf0	Now that instcombine does this xform, remove it from the -raise pass llvm-svn: 24082	2005-10-29 04:40:23 +00:00
Chris Lattner	2c6430d1a7	Pull some code out into a function, give it the ability to see through +. This allows us to turn code like malloc(4*x+4) -> malloc int, (x+1) llvm-svn: 24081	2005-10-29 04:36:15 +00:00
Chris Lattner	b313b6d67f	Remove a special case, allowing the general case to handle it. No functionality change. llvm-svn: 24076	2005-10-29 03:19:53 +00:00
Chris Lattner	d0b0dd1d62	Fix a bit of backwards logic that broke exptree and smg2000 llvm-svn: 24056	2005-10-28 16:27:35 +00:00
Chris Lattner	e545fcc680	Do not sink any instruction with side effects, including vaarg. This fixes PR640 llvm-svn: 24046	2005-10-27 17:13:11 +00:00
Chris Lattner	64f57e9d14	Fix #include order llvm-svn: 24044	2005-10-27 16:34:00 +00:00
John Criswell	d6538108e8	Move some constant folding code shared by Analysis and Transform passes into the LLVMAnalysis library. This allows LLVMTranform and LLVMTransformUtils to be archives and linked with LLVMAnalysis.a, which provides any missing definitions. llvm-svn: 24036	2005-10-27 15:54:34 +00:00
Chris Lattner	74ac5d0cc4	Fix typo llvm-svn: 24033	2005-10-27 06:26:26 +00:00
Chris Lattner	adbc250213	Teach instcombine to promote stuff like (cast (malloc sbyte, 8X) to int) into: malloc int, (2*X) llvm-svn: 24032	2005-10-27 06:24:46 +00:00
Chris Lattner	4c9dae5fdb	Promote cases like cast (malloc sbyte, 100) to int* into (malloc [25 x int]) directly without having to convert to (malloc [100 x sbyte]) first. llvm-svn: 24031	2005-10-27 06:12:00 +00:00
Chris Lattner	2b0006cd60	Minor change to this file to support obscure cases with constant array amounts llvm-svn: 24030	2005-10-27 05:53:56 +00:00
John Criswell	b0f5adf975	1. Remove libraries no longer created from the list of libraries linked into the SparcV9 JIT. 2. Make LLVMTransformUtils a relinked object file and always link it before LLVMAnalysis.a. These two libraries have circular dependencies on each other which creates problems when building the SparcV9 JIT. This change fixes the dependency problems on all platforms with a minimum of fuss. llvm-svn: 24023	2005-10-26 20:35:13 +00:00
Chris Lattner	e1fda00ea5	fold nested and's early to avoid inefficiencies in MaskedValueIsZero. This fixes a very slow compile in PR639. llvm-svn: 24011	2005-10-26 17:18:16 +00:00
Jeff Cohen	e561bf727e	Update Visual Studio projects to reflect moved file. llvm-svn: 23998	2005-10-26 05:36:51 +00:00
Alkis Evlogimenos	7fe091048c	Stop using deprecated types llvm-svn: 23973	2005-10-25 11:18:06 +00:00
Chris Lattner	77f228f586	Handle allocations that, even after removing dead uses, still have more than one use (but one is a cast). This handles the very common case of: X = alloc [n x byte] Y = cast X to somethingbetter seteq X, null In order to avoid infinite looping when there are multiple casts, we only allow this if the xform is strictly increasing the alignment of the allocation. llvm-svn: 23961	2005-10-24 06:35:18 +00:00
Chris Lattner	a295128ce5	Fix a bug where we would 'promote' an allocation from one type to another where the second has less alignment required. If we had explicit alignment support in the IR, we could handle this case, but we can't until we do. llvm-svn: 23960	2005-10-24 06:26:18 +00:00
Chris Lattner	3806b84861	Before promoting a malloc type, remove dead uses. This makes instcombine more effective at promoting these allocations, catching them earlier in the compile process. llvm-svn: 23959	2005-10-24 06:22:12 +00:00
Chris Lattner	5ad6f085ab	Pull some code out into a function, no functionality change llvm-svn: 23958	2005-10-24 06:03:58 +00:00
Chris Lattner	949f205c4c	Remove some beta code that no longer has an owner. llvm-svn: 23944	2005-10-24 02:32:41 +00:00
Chris Lattner	79a1a2af13	Do not build the ProfilePaths directory anymore llvm-svn: 23943	2005-10-24 02:31:49 +00:00
Chris Lattner	e6f7a38925	DONT_BUILD_RELINKED is gone and implied by BUILD_ARCHIVE now llvm-svn: 23940	2005-10-24 02:26:13 +00:00
Chris Lattner	a4b13acd52	Only build .a file versions of these libraries, instead of .a and .o versions. This should speed up build times. llvm-svn: 23933	2005-10-24 01:59:48 +00:00
Chris Lattner	46cc930a67	Make sure that anything using the ADCE pass pulls in the UnifyFunctionExitNodes code llvm-svn: 23931	2005-10-24 01:40:23 +00:00
Jeff Cohen	a38c737e85	When a function takes a variable number of pointer arguments, with a zero pointer marking the end of the list, the zero must be cast to the pointer type. An un-cast zero is a 32-bit int, and at least on x86_64, gcc will not extend the zero to 64 bits, thus allowing the upper 32 bits to be random junk. The new END_WITH_NULL macro may be used to annotate such a function so that GCC (version 4 or newer) will detect the use of un-casted zero at compile time. llvm-svn: 23888	2005-10-23 04:37:20 +00:00
Chris Lattner	63aa74e804	My previous patch was too conservative. Reject FP and void types, but do allow pointer types. llvm-svn: 23859	2005-10-21 05:45:41 +00:00
Chris Lattner	3451ac7691	Do NOT touch FP ops with LSR. This fixes a testcase Nate sent me from an inner loop like this: LBB_RateConvertMono8AltiVec_2: ; no_exit lis r2, ha16(.CPI_RateConvertMono8AltiVec_0) lfs f3, lo16(.CPI_RateConvertMono8AltiVec_0)(r2) fmr f3, f3 fadd f0, f2, f0 fadd f3, f0, f3 fcmpu cr0, f3, f1 bge cr0, LBB_RateConvertMono8AltiVec_2 ; no_exit to an inner loop like this: LBB_RateConvertMono8AltiVec_1: ; no_exit fsub f2, f2, f1 fcmpu cr0, f2, f1 fmr f0, f2 bge cr0, LBB_RateConvertMono8AltiVec_1 ; no_exit Doh! good catch! llvm-svn: 23838	2005-10-20 04:47:10 +00:00
Chris Lattner	d21885e6c9	Add an option to this pass. If it is set, we are allowed to internalize all but main. If it's not set, we can still internalize, but only if an explicit symbol list is provided. llvm-svn: 23783	2005-10-18 06:29:22 +00:00
Chris Lattner	515ce6687d	Make this work for FP constantexprs llvm-svn: 23773	2005-10-17 20:18:38 +00:00
Chris Lattner	09b3b3d658	Oops, X+0.0 isn't foldable, but X+-0.0 is. llvm-svn: 23772	2005-10-17 17:56:38 +00:00
Chris Lattner	00f8306943	relax this a bit, as we only support the default rounding mode llvm-svn: 23771	2005-10-17 17:49:32 +00:00
Chris Lattner	23ecce5c82	Fix (hopefully the last) issue where LSR is nondeterministic. When pulling out CSE's of base expressions it could build a result whose order was nondet. llvm-svn: 23698	2005-10-11 18:41:04 +00:00
Chris Lattner	7effc54496	Fix another problem where LSR was being nondeterministic. Also remove elements from the end of a vector instead of the beginning llvm-svn: 23697	2005-10-11 18:30:57 +00:00
Chris Lattner	1ec477da8f	Fix another lsr-is-nondeterministic case llvm-svn: 23695	2005-10-11 18:17:57 +00:00
Chris Lattner	1809933a31	Make MaskedValueIsZero a bit more aggressive llvm-svn: 23677	2005-10-09 22:08:50 +00:00
Chris Lattner	d4ff47daf6	Fix funky xcode indentation llvm-svn: 23674	2005-10-09 06:36:35 +00:00
Chris Lattner	f6c2411c40	Hrm, you didn't see this. llvm-svn: 23673	2005-10-09 06:24:02 +00:00
Chris Lattner	6fa98d5979	Fix a source of non-determinism in the backend: the order of processing IV strides depended on the pointer order of the strides in memory. Non-determinism is bad. llvm-svn: 23672	2005-10-09 06:20:55 +00:00
Jeff Cohen	3c6db93125	Remove useless variable. llvm-svn: 23656	2005-10-07 05:28:29 +00:00
Chris Lattner	27464c8061	Fix DemoteRegToStack on an invoke. This fixes PR634. llvm-svn: 23618	2005-10-04 00:44:01 +00:00
Chris Lattner	cd43592209	Clean up the code a bit. Use isInstructionTriviallyDead to be more aggressive and more correct than use_empty(). This fixes PR635 and SimplifyCFG/2005-10-02-InvokeSimplify.ll llvm-svn: 23616	2005-10-03 23:43:43 +00:00
Chris Lattner	be6f88a2e8	Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In particular, it should realize that phi's use their values in the pred block not the phi block itself. This change turns our em3d loop from this: _test: cmpwi cr0, r4, 0 bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge LBB_test_1: ; entry.loopexit_crit_edge li r2, 0 b LBB_test_6 ; loopexit LBB_test_2: ; entry.no_exit_crit_edge li r6, 0 LBB_test_3: ; no_exit or r2, r6, r6 lwz r6, 0(r3) cmpw cr0, r6, r5 beq cr0, LBB_test_6 ; loopexit LBB_test_4: ; endif addi r3, r3, 4 addi r6, r2, 1 cmpw cr0, r6, r4 blt cr0, LBB_test_3 ; no_exit LBB_test_5: ; endif.loopexit.loopexit_crit_edge addi r3, r2, 1 blr LBB_test_6: ; loopexit or r3, r2, r2 blr into: _test: cmpwi cr0, r4, 0 bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge LBB_test_1: ; entry.loopexit_crit_edge li r2, 0 b LBB_test_5 ; loopexit LBB_test_2: ; entry.no_exit_crit_edge li r6, 0 LBB_test_3: ; no_exit lwz r2, 0(r3) cmpw cr0, r2, r5 or r2, r6, r6 beq cr0, LBB_test_5 ; loopexit LBB_test_4: ; endif addi r3, r3, 4 addi r6, r6, 1 cmpw cr0, r6, r4 or r2, r6, r6 blt cr0, LBB_test_3 ; no_exit LBB_test_5: ; loopexit or r3, r2, r2 blr Unfortunately, this is actually worse code, because the register coalescer is getting confused somehow. If it were doing its job right, it could turn the code into this: _test: cmpwi cr0, r4, 0 bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge LBB_test_1: ; entry.loopexit_crit_edge li r6, 0 b LBB_test_5 ; loopexit LBB_test_2: ; entry.no_exit_crit_edge li r6, 0 LBB_test_3: ; no_exit lwz r2, 0(r3) cmpw cr0, r2, r5 beq cr0, LBB_test_5 ; loopexit LBB_test_4: ; endif addi r3, r3, 4 addi r6, r6, 1 cmpw cr0, r6, r4 blt cr0, LBB_test_3 ; no_exit LBB_test_5: ; loopexit or r3, r6, r6 blr ... which I'll work on next. :) llvm-svn: 23604	2005-10-03 02:50:05 +00:00
Chris Lattner	6a5ace34da	Refactor some code into a function llvm-svn: 23603	2005-10-03 01:04:44 +00:00
Chris Lattner	5a865c8598	This break is bogus and I have no idea why it was there. Basically it prevents memoizing code when IV's are used by phinodes outside of loops. In a simple example, we were getting this code before (note that r6 and r7 are isomorphic IV's): li r6, 0 or r7, r6, r6 LBB_test_3: ; no_exit lwz r2, 0(r3) cmpw cr0, r2, r5 or r2, r7, r7 beq cr0, LBB_test_5 ; loopexit LBB_test_4: ; endif addi r2, r7, 1 addi r7, r7, 1 addi r3, r3, 4 addi r6, r6, 1 cmpw cr0, r6, r4 blt cr0, LBB_test_3 ; no_exit Now we get: li r6, 0 LBB_test_3: ; no_exit or r2, r6, r6 lwz r6, 0(r3) cmpw cr0, r6, r5 beq cr0, LBB_test_6 ; loopexit LBB_test_4: ; endif addi r3, r3, 4 addi r6, r2, 1 cmpw cr0, r6, r4 blt cr0, LBB_test_3 ; no_exit this was noticed in em3d. llvm-svn: 23602	2005-10-03 00:37:33 +00:00
Chris Lattner	af938ddbe5	when checking if we should move a split edge block outside of a loop, check the presplit pred, not the post-split pred. This was causing us to make the wrong decision in some cases, leaving the critical edge block in the loop. llvm-svn: 23601	2005-10-03 00:31:52 +00:00
Jeff Cohen	412582bcec	Fix VC++ warnings. llvm-svn: 23579	2005-10-01 03:57:14 +00:00
Chris Lattner	633db4c298	Insert stores after phi nodes in the normal dest. This fixes LowerInvoke/2005-08-03-InvokeWithPHI.ll llvm-svn: 23525	2005-09-29 17:44:20 +00:00
Chris Lattner	c800214144	Fold isascii into a simple comparison. This speeds up 197.parser by 7.4%, bringing the LLC time down to the CBE time. llvm-svn: 23521	2005-09-29 06:17:27 +00:00
Chris Lattner	3aa9edd482	remove a bunch of unneeded stuff, or self evident comments llvm-svn: 23519	2005-09-29 06:16:11 +00:00
Chris Lattner	0cf56701ef	Implement a couple of memcmp folds from the todo list llvm-svn: 23517	2005-09-29 04:54:20 +00:00
Chris Lattner	6ce1d7dabb	Constant fold llvm.sqrt llvm-svn: 23487	2005-09-28 01:34:32 +00:00
Chris Lattner	cd79ed877a	add a note about a way to improve this code further, that I won't be getting to right now. llvm-svn: 23485	2005-09-27 22:44:59 +00:00
Chris Lattner	80e4480cfa	Fix a regression in my previous patch, fixing GlobalOpt/2005-09-27-Crash.ll and PR632. llvm-svn: 23484	2005-09-27 22:28:11 +00:00
Chris Lattner	8ff6df40ba	Avoid spilling stack slots... to stack slots. llvm-svn: 23478	2005-09-27 21:33:12 +00:00
Chris Lattner	b2fa69f4b2	Completely rewrite 'correct' eh support. This changes how setjmp insertion is performed so it is only at most once per function that contains an invoke instead of once per invoke in the function. This patch has the following perks: 1. It fixes PR631, which complains about slowness. 2. It fixes PR240, which complains about non-volatile vars being live across setjmp/longjmps. 3. It improves (but does not fix) the jmpbuf alignment issue on itanium by not forcing the jmpbufs to always be 8-bytes off the alignment of the structure. 4. It speeds up 253.perlbmk from 338s to 13.70s (a 25x improvement!), making us now about 4% faster than GCC. Further improvements are also possible. llvm-svn: 23477	2005-09-27 21:18:17 +00:00
Chris Lattner	29f2ba1384	Make the pass name simpler llvm-svn: 23476	2005-09-27 21:10:32 +00:00
Chris Lattner	92b0563eb3	allow demotion to volatile values, add support for invoke llvm-svn: 23473	2005-09-27 19:39:00 +00:00
Chris Lattner	e242673da3	Add support for external calls that we know how to constant fold. This implements ctor-list-opt.ll:CTOR8 llvm-svn: 23465	2005-09-27 05:02:43 +00:00
Chris Lattner	b3b15ed0a3	Fix a bug where we would evaluate stores into linkonce objects which could be potentially replaced at link-time. llvm-svn: 23463	2005-09-27 04:50:03 +00:00
Chris Lattner	dc94923d86	Implement support for static constructors with calls in them. This is useful because gccas runs globalopt before inlining. This implements ctor-list-opt.ll:CTOR7 llvm-svn: 23462	2005-09-27 04:45:34 +00:00
Chris Lattner	5bc02b52dd	Refactor this code a bit, no functionality changes. llvm-svn: 23460	2005-09-27 04:27:01 +00:00
Chris Lattner	fc4e38bf0a	Remove some dead code. ctor evaluation subsumes empty ctor elim llvm-svn: 23453	2005-09-26 20:38:20 +00:00
Chris Lattner	ade39df257	Add support for alloca, implementing ctor-list-opt.ll:CTOR6 llvm-svn: 23452	2005-09-26 17:07:09 +00:00
Chris Lattner	c1fe36502e	Add a debug printout, fix a crash on kc++ llvm-svn: 23450	2005-09-26 07:34:35 +00:00
Chris Lattner	7283d69b9f	Implement loads/stores through GEP's of globals. This implements ctor-list-opt.ll:CTOR5. llvm-svn: 23449	2005-09-26 06:52:44 +00:00
Chris Lattner	6488c9bbc1	Replace TraverseGEPInitializer with ConstantFoldLoadThroughGEPConstantExpr llvm-svn: 23447	2005-09-26 05:34:07 +00:00
Chris Lattner	7c43a40871	Eliminate GetGEPGlobalInitializer in favor of the more powerful ConstantFoldLoadThroughGEPConstantExpr function in the utils lib. llvm-svn: 23446	2005-09-26 05:28:52 +00:00
Chris Lattner	6df0a740e8	Factor the GetGEPGlobalInitializer out of this pass and into Transforms/Utils as ConstantFoldLoadThroughGEPConstantExpr. llvm-svn: 23445	2005-09-26 05:28:06 +00:00
Chris Lattner	d8b3c644e1	Move the ConstantFoldLoadThroughGEPConstantExpr function out of the InstCombine pass. llvm-svn: 23444	2005-09-26 05:27:10 +00:00
Chris Lattner	e149840f04	add a comment llvm-svn: 23442	2005-09-26 05:16:34 +00:00
Chris Lattner	2addd0dcab	Add support for getelementptr, load, and correctly reject volatile stores. llvm-svn: 23441	2005-09-26 05:15:37 +00:00
Chris Lattner	8364e6b0db	Add support for br/brcond/switch and phi llvm-svn: 23439	2005-09-26 04:57:38 +00:00
Chris Lattner	1d43adff6d	Add a simple interpreter to this code, allowing us to statically evaluate global ctors that are simple enough. This implements ctor-list-opt.ll:CTOR2. llvm-svn: 23437	2005-09-26 04:44:35 +00:00
Chris Lattner	9b71d2a0f6	factor some code into a InstallGlobalCtors method, add comments. No functionality change. llvm-svn: 23435	2005-09-26 02:31:18 +00:00
Chris Lattner	fb737b72b1	Make the global opt optimizer work on modules with a null terminator, by accepting the null even with a non-65535 init prio llvm-svn: 23434	2005-09-26 02:19:27 +00:00
Chris Lattner	ca854922fb	Factor this code out into a few methods. Implement the start of global ctor optimization. It is currently smart enough to remove the global ctor for cases like this: struct foo { foo() {} } x; ... saving a bit of startup time for the program. llvm-svn: 23433	2005-09-26 01:43:45 +00:00
Chris Lattner	d8febfc4aa	Fix some logic I broke that caused a regression on SimplifyLibCalls/2005-05-20-sprintf-crash.ll llvm-svn: 23430	2005-09-25 07:06:48 +00:00
Chris Lattner	e6bfd80169	Move MaskedValueIsZero up. Match a bunch of idioms for sign extensions, implementing InstCombine/signext.ll llvm-svn: 23428	2005-09-24 23:43:33 +00:00
Chris Lattner	7af8b0777a	Simplify this code a bit by relying on recursive simplification. Support sprintf("%s", P)'s that have uses. s/hasNUses(0)/use_empty()/ llvm-svn: 23425	2005-09-24 22:17:06 +00:00
Chris Lattner	5ce68bab6c	remove some debugging code llvm-svn: 23411	2005-09-23 18:49:09 +00:00
Chris Lattner	88c234b5e4	Fold two consecutive branches that share a common destination between them. This implements SimplifyCFG/branch-fold.ll, and is useful on ?:/min/max heavy code llvm-svn: 23410	2005-09-23 18:47:20 +00:00
Chris Lattner	67437dce2c	simplify some logic further llvm-svn: 23408	2005-09-23 07:23:18 +00:00
Chris Lattner	48df043325	pull a bunch of logic out of SimplifyCFG into a helper fn llvm-svn: 23407	2005-09-23 06:39:30 +00:00
Chris Lattner	07cf4f249d	Start threading across blocks with code in them, so long as the code does not define a value that is used outside of its block. This catches many more simplifications, e.g. 854 in 176.gcc, 137 in vpr, etc. This implements branch-phi-thread.ll:test3.ll llvm-svn: 23397	2005-09-20 01:48:40 +00:00
Chris Lattner	44a9487815	Implement merging of blocks with the same condition if the block has multiple predecessors. This implements branch-phi-thread.ll::test1 llvm-svn: 23395	2005-09-20 00:43:16 +00:00
Chris Lattner	4013184d6b	Reject a case we don't handle yet llvm-svn: 23393	2005-09-19 23:57:04 +00:00
Chris Lattner	642f0d6aea	remove debugging code :-/ llvm-svn: 23392	2005-09-19 23:50:15 +00:00
Chris Lattner	34785e2d43	Implement SimplifyCFG/branch-phi-thread.ll, the most trivial case of threading control across branches with determined outcomes. More generality to follow. This triggers a couple thousand times in specint. llvm-svn: 23391	2005-09-19 23:49:37 +00:00
Chris Lattner	e40e2d4ec3	Refactor this code a bit and make it more general. This now compiles: struct S { unsigned int i : 6, j : 11, k : 15; } b; void plus2 (unsigned int x) { b.j += x; } To: _plus2: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r4, 0(r2) slwi r3, r3, 6 add r3, r4, r3 rlwimi r3, r4, 0, 26, 14 stw r3, 0(r2) blr instead of: _plus2: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r4, 0(r2) rlwinm r5, r4, 26, 21, 31 add r3, r5, r3 rlwimi r4, r3, 6, 15, 25 stw r4, 0(r2) blr by eliminating an 'and'. I'm pretty sure this is as small as we can go :) llvm-svn: 23386	2005-09-18 07:22:02 +00:00
Chris Lattner	024a1c1a46	Compile struct S { unsigned int i : 6, j : 11, k : 15; } b; void plus2 (unsigned int x) { b.j += x; } to: plus2: mov %EAX, DWORD PTR [b] mov %ECX, %EAX and %ECX, 131008 mov %EDX, DWORD PTR [%ESP + 4] shl %EDX, 6 add %EDX, %ECX and %EDX, 131008 and %EAX, -131009 or %EDX, %EAX mov DWORD PTR [b], %EDX ret instead of: plus2: mov %EAX, DWORD PTR [b] mov %ECX, %EAX shr %ECX, 6 and %ECX, 2047 add %ECX, DWORD PTR [%ESP + 4] shl %ECX, 6 and %ECX, 131008 and %EAX, -131009 or %ECX, %EAX mov DWORD PTR [b], %ECX ret llvm-svn: 23385	2005-09-18 06:30:59 +00:00
Chris Lattner	3cd5e466ee	Generalize this transform, using MaskedValueIsZero, allowing us to compile: struct S { unsigned int i : 6, j : 11, k : 15; } b; void plus3 (unsigned int x) { b.k += x; } To: plus3: mov %EAX, DWORD PTR [%ESP + 4] shl %EAX, 17 add DWORD PTR [b], %EAX ret instead of: plus3: mov %EAX, DWORD PTR [%ESP + 4] shl %EAX, 17 mov %ECX, DWORD PTR [b] add %EAX, %ECX and %EAX, -131072 and %ECX, 131071 or %ECX, %EAX mov DWORD PTR [b], %ECX ret llvm-svn: 23384	2005-09-18 06:02:59 +00:00
Chris Lattner	b70b011734	fix typeo llvm-svn: 23383	2005-09-18 05:25:20 +00:00
Chris Lattner	7fda14c978	Remove unintentionally committed code llvm-svn: 23382	2005-09-18 05:12:51 +00:00
Chris Lattner	ae35713f00	implement shift.ll:test25. This compiles: struct S { unsigned int i : 6, j : 11, k : 15; } b; void plus3 (unsigned int x) { b.k += x; } to: _plus3: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r3, 0(r2) rlwinm r4, r3, 0, 0, 14 add r4, r4, r3 rlwimi r4, r3, 0, 15, 31 stw r4, 0(r2) blr instead of: _plus3: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r4, 0(r2) srwi r5, r4, 17 add r3, r5, r3 slwi r3, r3, 17 rlwimi r3, r4, 0, 15, 31 stw r3, 0(r2) blr llvm-svn: 23381	2005-09-18 05:12:10 +00:00
Chris Lattner	fa22870351	Implement add.ll:test29. Codegening: struct S { unsigned int i : 6, j : 11, k : 15; } b; void plus1 (unsigned int x) { b.i += x; } as: _plus1: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r4, 0(r2) add r3, r4, r3 rlwimi r3, r4, 0, 0, 25 stw r3, 0(r2) blr instead of: _plus1: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r4, 0(r2) rlwinm r5, r4, 0, 26, 31 add r3, r5, r3 rlwimi r3, r4, 0, 0, 25 stw r3, 0(r2) blr llvm-svn: 23379	2005-09-18 04:24:45 +00:00
Chris Lattner	c96a222019	remove debug output llvm-svn: 23377	2005-09-18 03:50:25 +00:00
Chris Lattner	1466ade38b	Implement or.ll:test21. This teaches instcombine to be able to turn this: struct { unsigned int bit0:1; unsigned int ubyte:31; } sdata; void foo() { sdata.ubyte++; } into this: foo: add DWORD PTR [sdata], 2 ret instead of this: foo: mov %EAX, DWORD PTR [sdata] mov %ECX, %EAX add %ECX, 2 and %ECX, -2 and %EAX, 1 or %EAX, %ECX mov DWORD PTR [sdata], %EAX ret llvm-svn: 23376	2005-09-18 03:42:07 +00:00
Chris Lattner	c7fe78d9a1	Fix the regression last night compiling povray llvm-svn: 23348	2005-09-14 17:32:56 +00:00
Chris Lattner	f020513d57	Add a simple xform to simplify array accesses with casts in the way. This is useful for 178.galgel where resolution of dope vectors (by the optimizer) causes the scales to become apparent. llvm-svn: 23328	2005-09-13 18:36:04 +00:00
Chris Lattner	26aef8992f	Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI node that is not the original PHI. This fixes up a dot-product loop in galgel, speeding it up from 18.47s to 16.13s. llvm-svn: 23327	2005-09-13 02:09:55 +00:00
Chris Lattner	4284014415	Add a helper function, allowing us to simplify some code a bit, changing indentation, no functionality change llvm-svn: 23325	2005-09-13 00:40:14 +00:00
Chris Lattner	d35d5d2492	Implement a simple xform to turn code like this: if () { store A -> P; } else { store B -> P; } into a PHI node with one store, in the most trivial case. This implements load.ll:test10. llvm-svn: 23324	2005-09-12 23:23:25 +00:00
Chris Lattner	6eee5b16b1	Another load-peephole optimization: do gcse when two loads are next to each other. This implements InstCombine/load.ll:test9 llvm-svn: 23322	2005-09-12 22:21:03 +00:00
Chris Lattner	2657841260	Implement a trivial form of store->load forwarding where the store and the load are exactly consecutive. This is picked up by other passes, but this triggers thousands of times in fortran programs that use static locals (and is thus a compile-time speedup). llvm-svn: 23320	2005-09-12 22:00:15 +00:00
Chris Lattner	a949a2836d	Fix a regression from last night, which caused this pass to create invalid code for IV uses outside of loops that are not dominated by the latch block. We should only convert these uses to use the post-inc value if they ARE dominated by the latch block. Also use a new LoopInfo method to simplify some code. This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll llvm-svn: 23318	2005-09-12 17:11:27 +00:00
Chris Lattner	780ffd9c1f	_test: li r2, 0 LBB_test_1: ; no_exit.2 li r5, 0 stw r5, 0(r3) addi r2, r2, 1 addi r3, r3, 4 cmpwi cr0, r2, 701 blt cr0, LBB_test_1 ; no_exit.2 LBB_test_2: ; loopexit.2.loopexit addi r2, r2, 1 stw r2, 0(r4) blr [zion ~/llvm]$ cat > ~/xx Uses of IV's outside of the loop should use the post-incremented version of the IV, not the preincremented version. This helps many loops (e.g. in sixtrack) which used to generate code like this (this is the code from the dont-hoist-simple-loop-constants.ll testcase): _test: li r2, 0 ** IV starts at 0 LBB_test_1: ; no_exit.2 or r5, r2, r2 Copy for loop exit li r2, 0 stw r2, 0(r3) addi r3, r3, 4 addi r2, r5, 1 addi r6, r5, 2 IV+2 cmpwi cr0, r6, 701 blt cr0, LBB_test_1 ; no_exit.2 LBB_test_2: ; loopexit.2.loopexit addi r2, r5, 2 IV+2 stw r2, 0(r4) blr And now generated code like this: _test: li r2, 1 * IV starts at 1 LBB_test_1: ; no_exit.2 li r5, 0 stw r5, 0(r3) addi r2, r2, 1 addi r3, r3, 4 cmpwi cr0, r2, 701 * IV.postinc + 0 blt cr0, LBB_test_1 LBB_test_2: ; loopexit.2.loopexit stw r2, 0(r4) * IV.postinc + 0 blr llvm-svn: 23313	2005-09-12 06:04:47 +00:00
Chris Lattner	ddec75fdf6	implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll. We used to emit this code for it: _test: li r2, 1 ;; Value tying up a register for the whole loop li r5, 0 LBB_test_1: ; no_exit.2 or r6, r5, r5 li r5, 0 stw r5, 0(r3) addi r5, r6, 1 addi r3, r3, 4 add r7, r2, r5 ;; should be addi r7, r5, 1 cmpwi cr0, r7, 701 blt cr0, LBB_test_1 ; no_exit.2 LBB_test_2: ; loopexit.2.loopexit addi r2, r6, 2 stw r2, 0(r4) blr now we emit this: _test: li r2, 0 LBB_test_1: ; no_exit.2 or r5, r2, r2 li r2, 0 stw r2, 0(r3) addi r3, r3, 4 addi r2, r5, 1 addi r6, r5, 2 ;; whoa, fold those adds! cmpwi cr0, r6, 701 blt cr0, LBB_test_1 ; no_exit.2 LBB_test_2: ; loopexit.2.loopexit addi r2, r5, 2 stw r2, 0(r4) blr more improvement coming. llvm-svn: 23306	2005-09-10 01:18:45 +00:00
Chris Lattner	371f542759	Fix a problem that Dan Berlin noticed, where reassociation would not succeed in building maximal expressions before simplifying them. In particular, in cases like this: X-(A+B+X) the code would consider A+B+X to be a maximal expression (not understanding that the single use '-' would be turned into a + later), simplify it (a noop) then later get simplified again. Each of these simplify steps is where the cost of reassociation comes from, so this patch should speed up the already fast pass a bit. Thanks to Dan for noticing this! llvm-svn: 23214	2005-09-02 07:07:58 +00:00
Chris Lattner	4cade5915d	Avoid creating garbage instructions, just move the old add instruction to where we need it when converting -(A+B+C) -> -A + -B + -C. llvm-svn: 23213	2005-09-02 06:38:04 +00:00
Chris Lattner	dfba6f5029	add some assertions and fix problems where reassociate could access the Ops vector out of range llvm-svn: 23211	2005-09-02 05:23:22 +00:00
Chris Lattner	efc5937add	Fix Regression/Transforms/Reassociate/2005-08-24-Crash.ll llvm-svn: 23019	2005-08-24 17:55:32 +00:00
Chris Lattner	043ab16860	Transform floor((double)FLT) -> (double)floorf(FLT), implementing Regression/Transforms/SimplifyLibCalls/floor.ll. This triggers 19 times in 177.mesa. llvm-svn: 23017	2005-08-24 17:22:17 +00:00
Chris Lattner	a852093954	Fix Transforms/LoopStrengthReduce/2005-08-17-OutOfLoopVariant.ll, a crash on 177.mesa llvm-svn: 22843	2005-08-17 21:22:41 +00:00
Chris Lattner	969232d2ec	Use a new helper to split critical edges, making the code simpler. Do not claim to not change the CFG. We do change the cfg to split critical edges. This isn't causing us a problem now, but could likely do so in the future. llvm-svn: 22824	2005-08-17 06:35:16 +00:00
Chris Lattner	bc70c99aef	Fix a bad case in gzip where we put lots of things in registers across the loop, because an IV-dependent value was used outside of the loop and didn't have immediate-folding capability llvm-svn: 22798	2005-08-16 00:38:11 +00:00
Chris Lattner	57a2a74e99	Ooops, don't forget to clear this. The real inner loop is now: .LBB_foo_3: ; no_exit.1 lfd f2, 0(r9) lfd f3, 8(r9) fmul f4, f1, f2 fmadd f4, f0, f3, f4 stfd f4, 8(r9) fmul f3, f1, f3 fmsub f2, f0, f2, f3 stfd f2, 0(r9) addi r9, r9, 16 addi r8, r8, 1 cmpw cr0, r8, r4 ble .LBB_foo_3 ; no_exit.1 llvm-svn: 22782	2005-08-13 07:42:01 +00:00
Chris Lattner	f59e855dbc	Recursively scan scev expressions for common subexpressions. This allows us to handle nested loops much better, for example, by being able to tell that these two expressions: {( 8 + ( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1> {(( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1> Have the following common part that can be shared: {(( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1> This allows us to codegen an important inner loop in 168.wupwise as: .LBB_foo_4: ; no_exit.1 lfd f2, 16(r9) fmul f3, f0, f2 fmul f2, f1, f2 fadd f4, f3, f2 stfd f4, 8(r9) fsub f2, f3, f2 stfd f2, 16(r9) addi r8, r8, 1 addi r9, r9, 16 cmpw cr0, r8, r4 ble .LBB_foo_4 ; no_exit.1 instead of: .LBB_foo_3: ; no_exit.1 lfdx f2, r6, r9 add r10, r6, r9 lfd f3, 8(r10) fmul f4, f1, f2 fmadd f4, f0, f3, f4 stfd f4, 8(r10) fmul f3, f1, f3 fmsub f2, f0, f2, f3 stfdx f2, r6, r9 addi r9, r9, 16 addi r8, r8, 1 cmpw cr0, r8, r4 ble .LBB_foo_3 ; no_exit.1 llvm-svn: 22781	2005-08-13 07:27:18 +00:00
Chris Lattner	2681a83c43	Teach SplitCriticalEdge to update LoopInfo if it is alive. This fixes a problem in LoopStrengthReduction, where it would split critical edges then confused itself with outdated loop information. llvm-svn: 22776	2005-08-13 01:38:43 +00:00
Chris Lattner	87bcd2794b	remove dead code. The exit block list is computed on demand, thus does not need to be updated. This code is a relic from when it did. llvm-svn: 22775	2005-08-13 01:30:36 +00:00
Chris Lattner	811ef4cce0	When splitting critical edges, make sure not to leave the new block in the middle of the loop. This turns a critical loop in gzip into this: .LBB_test_1: ; loopentry or r27, r28, r28 add r28, r3, r27 lhz r28, 3(r28) add r26, r4, r27 lhz r26, 3(r26) cmpw cr0, r28, r26 bne .LBB_test_8 ; loopentry.loopexit_crit_edge .LBB_test_2: ; shortcirc_next.0 add r28, r3, r27 lhz r28, 5(r28) add r26, r4, r27 lhz r26, 5(r26) cmpw cr0, r28, r26 bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge .LBB_test_3: ; shortcirc_next.1 add r28, r3, r27 lhz r28, 7(r28) add r26, r4, r27 lhz r26, 7(r26) cmpw cr0, r28, r26 bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge .LBB_test_4: ; shortcirc_next.2 add r28, r3, r27 lhz r26, 9(r28) add r28, r4, r27 lhz r25, 9(r28) addi r28, r27, 8 cmpw cr7, r26, r25 mfcr r26, 1 rlwinm r26, r26, 31, 31, 31 add r25, r8, r27 cmpw cr7, r25, r7 mfcr r25, 1 rlwinm r25, r25, 29, 31, 31 and. r26, r26, r25 bne .LBB_test_1 ; loopentry instead of this: .LBB_test_1: ; loopentry or r27, r28, r28 add r28, r3, r27 lhz r28, 3(r28) add r26, r4, r27 lhz r26, 3(r26) cmpw cr0, r28, r26 beq .LBB_test_3 ; shortcirc_next.0 .LBB_test_2: ; loopentry.loopexit_crit_edge add r2, r30, r27 add r8, r29, r27 b .LBB_test_9 ; loopexit .LBB_test_3: ; shortcirc_next.0 add r28, r3, r27 lhz r28, 5(r28) add r26, r4, r27 lhz r26, 5(r26) cmpw cr0, r28, r26 beq .LBB_test_5 ; shortcirc_next.1 .LBB_test_4: ; shortcirc_next.0.loopexit_crit_edge add r2, r11, r27 add r8, r12, r27 b .LBB_test_9 ; loopexit .LBB_test_5: ; shortcirc_next.1 add r28, r3, r27 lhz r28, 7(r28) add r26, r4, r27 lhz r26, 7(r26) cmpw cr0, r28, r26 beq .LBB_test_7 ; shortcirc_next.2 .LBB_test_6: ; shortcirc_next.1.loopexit_crit_edge add r2, r9, r27 add r8, r10, r27 b .LBB_test_9 ; loopexit .LBB_test_7: ; shortcirc_next.2 add r28, r3, r27 lhz r26, 9(r28) add r28, r4, r27 lhz r25, 9(r28) addi r28, r27, 8 cmpw cr7, r26, r25 mfcr r26, 1 rlwinm r26, r26, 31, 31, 31 add r25, r8, r27 cmpw cr7, r25, r7 mfcr r25, 1 rlwinm r25, r25, 29, 31, 31 and. r26, r26, r25 bne .LBB_test_1 ; loopentry Next up, improve the code for the loop. llvm-svn: 22769	2005-08-12 22:22:17 +00:00
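
Note on the bit-field commits in this list (llvm-svn 23384 and 23385): the arithmetic behind the shorter `plus3` and `plus2` sequences can be checked by hand. The sketch below is only an illustration, not code from the LLVM tree. It assumes the layout implied by the masks in the x86 output (`i` in bits 0..5, `j` in bits 6..16, `k` in bits 17..31) and models the struct as a plain 32-bit word, because real C bit-field layout is implementation-defined. The function names and macros are hypothetical.

```c
#include <stdint.h>

/* Assumed layout, taken from the masks in the commit messages:
   i = bits 0..5, j = bits 6..16 (mask 131008), k = bits 17..31. */
#define J_SHIFT 6u
#define J_MASK  0x0001FFC0u /* 131008 */
#define K_SHIFT 17u
#define K_MASK  0xFFFE0000u /* -131072 viewed as unsigned */

/* b.k += x, written the long way: extract, add, reinsert. */
uint32_t add_k_naive(uint32_t w, uint32_t x) {
    uint32_t k = w >> K_SHIFT;
    return (w & ~K_MASK) | ((k + x) << K_SHIFT);
}

/* b.k += x, folded: k is the topmost field, so adding x << 17 cannot
   disturb the low 17 bits, and overflow simply falls off the top. */
uint32_t add_k_folded(uint32_t w, uint32_t x) {
    return w + (x << K_SHIFT);
}

/* b.j += x, written the long way: shift down, mask, add, shift back. */
uint32_t add_j_naive(uint32_t w, uint32_t x) {
    uint32_t j = (w >> J_SHIFT) & (J_MASK >> J_SHIFT);
    return (w & ~J_MASK) | (((j + x) << J_SHIFT) & J_MASK);
}

/* b.j += x, with the add done in place: mask the field where it sits,
   add x << 6, then mask again to drop any carry out of the field. */
uint32_t add_j_folded(uint32_t w, uint32_t x) {
    return (w & ~J_MASK) | (((w & J_MASK) + (x << J_SHIFT)) & J_MASK);
}
```

For any 32-bit `w` and `x`, each folded function returns the same word as its naive counterpart, which is why the optimizer may drop the extra shift and mask instructions shown in the "instead of" listings.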


2392 Commits