llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 12:02:58 +02:00

Author	SHA1	Message	Date
Chris Lattner	75599bb566	remove the partial specialization pass. It is unmaintained and has bugs. llvm-svn: 123554	2011-01-16 00:27:10 +00:00
Nick Lewycky	1d57e867a4	Make constmerge a two-pass algorithm so that it won't miss merging opportunities. Fixes PR8978. llvm-svn: 123541	2011-01-15 18:14:21 +00:00
Chris Lattner	55c2150f36	temporarily revert r123526. While working on a follow-on patch I realize that ConstantFoldTerminator doesn't preserve dominfo. llvm-svn: 123527	2011-01-15 07:51:19 +00:00
Chris Lattner	68a47147ba	fix rdar://8785296 - -fcatch-undefined-behavior generates inefficient code. The basic issue is that isel (very reasonably!) expects conditional branches to be folded, so CGP leaving around a bunch of dead computation feeding conditional branches isn't such a good idea. Just fold branches on constants into unconditional branches. llvm-svn: 123526	2011-01-15 07:36:13 +00:00
Chris Lattner	d4eaf6eba8	Now that instruction optzns can update the iterator as they go, we can have objectsize folding recursively simplify away their result when it folds. It is important to catch this here, because otherwise we won't eliminate the cross-block values at isel and other times. llvm-svn: 123524	2011-01-15 07:25:29 +00:00
Chris Lattner	74ed5d30ca	implement an instcombine xform that canonicalizes casts outside of and-with-constant operations. This fixes rdar://8808586 which observed that we used to compile: union xy { struct x { _Bool b[15]; } x; __attribute__((packed)) struct y { __attribute__((packed)) unsigned long b0to7; __attribute__((packed)) unsigned int b8to11; __attribute__((packed)) unsigned short b12to13; __attribute__((packed)) unsigned char b14; } y; }; struct x foo(union xy *xy) { return xy->x; } into: _foo: ## @foo movq (%rdi), %rax movabsq $1095216660480, %rcx ## imm = 0xFF00000000 andq %rax, %rcx movabsq $-72057594037927936, %rdx ## imm = 0xFF00000000000000 andq %rax, %rdx movzbl %al, %esi orq %rdx, %rsi movq %rax, %rdx andq $65280, %rdx ## imm = 0xFF00 orq %rsi, %rdx movq %rax, %rsi andq $16711680, %rsi ## imm = 0xFF0000 orq %rdx, %rsi movl %eax, %edx andl $-16777216, %edx ## imm = 0xFFFFFFFFFF000000 orq %rsi, %rdx orq %rcx, %rdx movabsq $280375465082880, %rcx ## imm = 0xFF0000000000 movq %rax, %rsi andq %rcx, %rsi orq %rdx, %rsi movabsq $71776119061217280, %r8 ## imm = 0xFF000000000000 andq %r8, %rax orq %rsi, %rax movzwl 12(%rdi), %edx movzbl 14(%rdi), %esi shlq $16, %rsi orl %edx, %esi movq %rsi, %r9 shlq $32, %r9 movl 8(%rdi), %edx orq %r9, %rdx andq %rdx, %rcx movzbl %sil, %esi shlq $32, %rsi orq %rcx, %rsi movl %edx, %ecx andl $-16777216, %ecx ## imm = 0xFFFFFFFFFF000000 orq %rsi, %rcx movq %rdx, %rsi andq $16711680, %rsi ## imm = 0xFF0000 orq %rcx, %rsi movq %rdx, %rcx andq $65280, %rcx ## imm = 0xFF00 orq %rsi, %rcx movzbl %dl, %esi orq %rcx, %rsi andq %r8, %rdx orq %rsi, %rdx ret We now compile this into: _foo: ## @foo ## BB#0: ## %entry movzwl 12(%rdi), %eax movzbl 14(%rdi), %ecx shlq $16, %rcx orl %eax, %ecx shlq $32, %rcx movl 8(%rdi), %edx orq %rcx, %rdx movq (%rdi), %rax ret A small improvement :-) llvm-svn: 123520	2011-01-15 06:32:33 +00:00
Duncan Sands	dc51b0ee48	Turn X-(X-Y) into Y. According to my auto-simplifier this is the most common simplification present in fully optimized code (I think instcombine fails to transform some of these when "X-Y" has more than one use). Fires here and there all over the test-suite, for example it eliminates 8 subtractions in the final IR for 445.gobmk, 2 subs in 447.dealII, 2 in paq8p etc. llvm-svn: 123442	2011-01-14 15:26:10 +00:00
Duncan Sands	4757061c47	Factorize common code out of the InstructionSimplify shift logic. Add in threading of shifts over selects and phis while there. This fires here and there in the testsuite, to not much effect. For example when compiling spirit it fires 5 times, during early-cse, resulting in 6 more cse simplifications, and 3 more terminators being folded by jump threading, but the final bitcode doesn't change in any interesting way: other optimizations would have caught the opportunity anyway, only later. llvm-svn: 123441	2011-01-14 14:44:12 +00:00
Duncan Sands	01be7e406d	Rename this test. llvm-svn: 123440	2011-01-14 14:16:33 +00:00
Chris Lattner	de9ec03027	relax testcase a bit. llvm-svn: 123433	2011-01-14 07:46:33 +00:00
Duncan Sands	44c273d907	Move some shift transforms out of instcombine and into InstructionSimplify. While there, I noticed that the transform "undef >>a X -> undef" was wrong. For example if X is 2 then the top two bits must be equal, so the result can not be anything. I fixed this in the constant folder as well. Also, I made the transform for "X << undef" stronger: it now folds to undef always, even though X might be zero. This is in accordance with the LangRef, but I must admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef" following the LangRef and the constant folder, likewise fairly aggressive. llvm-svn: 123417	2011-01-14 00:37:45 +00:00
Bob Wilson	3b0197489e	Extend SROA to handle arrays accessed as homogeneous structs and vice versa. This is a minor extension of SROA to handle a special case that is important for some ARM NEON operations. Some of the NEON intrinsics return multiple values, which are handled as struct types containing multiple elements of the same vector type. The corresponding return types declared in the arm_neon.h header have equivalent arrays. We need SROA to recognize that it can split up those arrays and structs into separate vectors, even though they are not always accessed with the same type. SROA already handles loads and stores of an entire alloca by using insertvalue/extractvalue to access the individual pieces, and that code works the same regardless of whether the type is a struct or an array. So, all that needs to be done is to check for compatible arrays and homogeneous structs. llvm-svn: 123381	2011-01-13 17:45:11 +00:00
Bob Wilson	9f8d730f9b	Make SROA more aggressive with allocas containing padding. SROA only split up structs and arrays one level at a time, so padding can only cause trouble if it is located in between the struct or array elements. llvm-svn: 123380	2011-01-13 17:45:08 +00:00
Duncan Sands	36b007d63b	The most common simplification missed by instsimplify in unoptimized bitcode is "X != 0 -> X" when X is a boolean. This occurs a lot because of the way llvm-gcc converts gcc's conditional expressions. Add this, and a few other similar transforms for completeness. llvm-svn: 123372	2011-01-13 08:56:29 +00:00
Chris Lattner	fdef60bef7	revert 123144, reenabling the rest of memset formation. llvm-svn: 123302	2011-01-12 03:25:15 +00:00
Chris Lattner	e288204194	revert r123146 which disabled code that wasn't the root cause of the bootstrap miscompare issue. llvm-svn: 123299	2011-01-12 01:52:23 +00:00
Chris Lattner	feade29ab8	merge tests into one crash.ll test. llvm-svn: 123220	2011-01-11 07:50:07 +00:00
Chris Lattner	5731a92f5b	remove a bogus assertion: the latch block of a loop is not necessarily an uncond branch to the header. This fixes PR8955 (the assertion tripping). llvm-svn: 123219	2011-01-11 07:47:59 +00:00
Chandler Carruth	250dce460c	Teach constant folding to perform conversions from constant floating point values to their integer representation through the SSE intrinsic calls. This is the last part of a README.txt entry for which I have real world examples. llvm-svn: 123206	2011-01-11 01:07:24 +00:00
Chandler Carruth	acb82e863d	FileCheck-ize a test, and move a no-longer calling test case to another file and make it actually test something... llvm-svn: 123205	2011-01-11 01:07:20 +00:00
Owen Anderson	4479341626	Fix a random missed optimization by making InstCombine more aggressive when determining which bits are demanded by a comparison against a constant. llvm-svn: 123203	2011-01-11 00:36:45 +00:00
Chandler Carruth	772e26df36	Teach instcombine about the rest of the SSE and SSE2 conversion intrinsics element dependencies. Reviewed by Nick. llvm-svn: 123161	2011-01-10 07:19:37 +00:00
Chandler Carruth	7f854ac9a9	Fold two related tests into the newly FileCheck-ized test, migrating them to FileCheck as well. llvm-svn: 123154	2011-01-10 02:53:58 +00:00
Chandler Carruth	7c332e5abd	Clean up and FileCheck-ize a test. llvm-svn: 123153	2011-01-10 02:53:54 +00:00
Chris Lattner	867dbe1329	fix typo llvm-svn: 123148	2011-01-10 02:33:34 +00:00
Chris Lattner	b5562212e2	another (more) aggressive attempt to bring llvm-gcc-i386-linux-selfhost back to life. llvm-svn: 123146	2011-01-10 00:47:34 +00:00
Chris Lattner	e8e9ec58bf	temporarily disable memset formation from memsets in an effort to restore buildbot stability. llvm-svn: 123144	2011-01-09 23:52:48 +00:00
Tobias Grosser	9899845dd3	Instcombine: Fix pattern where the sext did not dominate the icmp using it llvm-svn: 123121	2011-01-09 16:00:11 +00:00
Chris Lattner	98136397bd	Merge memsets followed by neighboring memsets and other stores into larger memsets. Among other things, this fixes rdar://8760394 and allows us to handle "Example 2" from http://blog.regehr.org/archives/320, compiling it into a single 4096-byte memset: _mad_synth_mute: ## @mad_synth_mute ## BB#0: ## %entry pushq %rax movl $4096, %esi ## imm = 0x1000 callq ___bzero popq %rax ret llvm-svn: 123089	2011-01-08 21:19:19 +00:00
Chris Lattner	e09439ed9d	fix an issue in IsPointerOffset that prevented us from recognizing that P and P+1 are relative to the same base pointer. llvm-svn: 123087	2011-01-08 21:07:56 +00:00
Chris Lattner	20bf2d50b8	enhance memcpyopt to merge a store and a subsequent memset into a single larger memset. llvm-svn: 123086	2011-01-08 20:54:51 +00:00
Chris Lattner	756416d4c0	merge two tests and filecheckify llvm-svn: 123082	2011-01-08 20:27:22 +00:00
Chris Lattner	7d3c4712e9	When loop rotation happens, it is very common for the duplicated condbr to be foldable into an uncond branch. When this happens, we can make a much simpler CFG for the loop, which is important for nested loop cases where we want the outer loop to be aggressively optimized. Handle this case more aggressively. For example, previously on phi-duplicate.ll we would get this: define void @test(i32 %N, double* %G) nounwind ssp { entry: %cmp1 = icmp slt i64 1, 1000 br i1 %cmp1, label %bb.nph, label %for.end bb.nph: ; preds = %entry br label %for.body for.body: ; preds = %bb.nph, %for.cond %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ] %arrayidx = getelementptr inbounds double* %G, i64 %j.02 %tmp3 = load double* %arrayidx %sub = sub i64 %j.02, 1 %arrayidx6 = getelementptr inbounds double* %G, i64 %sub %tmp7 = load double* %arrayidx6 %add = fadd double %tmp3, %tmp7 %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02 store double %add, double* %arrayidx10 %inc = add nsw i64 %j.02, 1 br label %for.cond for.cond: ; preds = %for.body %cmp = icmp slt i64 %inc, 1000 br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge for.cond.for.end_crit_edge: ; preds = %for.cond br label %for.end for.end: ; preds = %for.cond.for.end_crit_edge, %entry ret void } Now we get the much nicer: define void @test(i32 %N, double* %G) nounwind ssp { entry: br label %for.body for.body: ; preds = %entry, %for.body %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ] %arrayidx = getelementptr inbounds double* %G, i64 %j.01 %tmp3 = load double* %arrayidx %sub = sub i64 %j.01, 1 %arrayidx6 = getelementptr inbounds double* %G, i64 %sub %tmp7 = load double* %arrayidx6 %add = fadd double %tmp3, %tmp7 %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01 store double %add, double* %arrayidx10 %inc = add nsw i64 %j.01, 1 %cmp = icmp slt i64 %inc, 1000 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body ret void } With all of these recent changes, we are now able to compile: void foo(char *X) { for (int i = 0; i != 100; ++i) for (int j = 0; j != 100; ++j) X[j+i*100] = 0; } into a single memset of 10000 bytes. This series of changes should also be helpful for other nested loop scenarios as well. llvm-svn: 123079	2011-01-08 19:59:06 +00:00
Chris Lattner	db05334c7f	Three major changes: 1. Rip out LoopRotate's domfrontier updating code. It isn't needed now that LICM doesn't use DF and it is super complex and gross. 2. Make DomTree updating code a lot simpler and faster. The old loop over all the blocks was just to find a block?? 3. Change the code that inserts the new preheader to just use SplitCriticalEdge instead of doing an overcomplex reimplementation of it. No behavior change, except for the name of the inserted preheader. llvm-svn: 123072	2011-01-08 18:52:51 +00:00
Frits van Bommel	966cc00809	Fix a bug in r123034 (trying to sext/zext non-integers) and clean up a little. llvm-svn: 123061	2011-01-08 10:51:36 +00:00
Chris Lattner	6729ce1c33	Have loop-rotate simplify instructions (yay instsimplify!) as it clones them into the loop preheader, eliminating silly instructions like "icmp i32 0, 100" in fixed tripcount loops. This also better exposes the bigger problem with loop rotate that I'd like to fix: once this has been folded, the duplicated conditional branch often turns into an uncond branch. Not aggressively handling this is pessimizing later loop optimizations somethin' fierce by making "dominates all exit blocks" checks fail. llvm-svn: 123060	2011-01-08 08:24:46 +00:00
Tobias Grosser	48469b566a	InstCombine: Match min/max hidden by sext/zext X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X Instead of calculating this with mixed types promote all to the larger type. This enables scalar evolution to analyze this expression. PR8866 llvm-svn: 123034	2011-01-07 21:33:14 +00:00
Benjamin Kramer	62b5a4d14c	Revert 122959, it needs more thought. Add it back to README.txt with additional notes. llvm-svn: 123030	2011-01-07 20:42:20 +00:00
Benjamin Kramer	fb2bb22b6f	InstCombine: Turn _chk functions into the "unsafe" variant if length and max length are equal. This happens when we take the (non-constant) length from a malloc. llvm-svn: 122961	2011-01-06 14:22:52 +00:00
Benjamin Kramer	5834b2bab8	InstCombine: If we call llvm.objectsize on a malloc call we can replace it with the size passed to malloc. llvm-svn: 122959	2011-01-06 13:11:05 +00:00
Benjamin Kramer	d5e1c24646	InstCombine: Teach llvm.objectsize folding to look through GEPs. llvm-svn: 122958	2011-01-06 13:07:49 +00:00
Chris Lattner	83067bc3e7	implement constant folding support for an exotic constant expr: ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64) to "ret i64 1000". This allows us to correctly compute the trip count on a loop in PR8883, which occurs with std::fill on a char array. This allows us to transform it into a memset with a constant size. llvm-svn: 122950	2011-01-06 06:19:46 +00:00
Chris Lattner	dbb1b09731	fix an off-by-one bug that caused a crash analyzing ashr's with huge shift amounts, PR8896 llvm-svn: 122814	2011-01-04 18:19:15 +00:00
Chris Lattner	1f58120bfe	Teach loop-idiom to turn a loop containing a memset into a larger memset when safe. The testcase is basically this nested loop: void foo(char *X) { for (int i = 0; i != 100; ++i) for (int j = 0; j != 100; ++j) X[j+i*100] = 0; } which gets turned into a single memset now. clang -O3 doesn't optimize this yet though due to a phase ordering issue I haven't analyzed yet. llvm-svn: 122806	2011-01-04 07:46:33 +00:00
Chris Lattner	cd13979300	Duncan deftly points out that readnone functions aren't invalidated by stores, so they can be handled as 'simple' operations. llvm-svn: 122785	2011-01-03 23:38:13 +00:00
Chris Lattner	c1ebe702b1	earlycse can do trivial within-a-block dead store elimination as well. This deletes 60 stores in 176.gcc that largely come from bitfield code. llvm-svn: 122736	2011-01-03 04:17:24 +00:00
Chris Lattner	d19ae32f2f	now that loads are in their own table, we can implement store->load forwarding. This allows EarlyCSE to zap 600 more loads from 176.gcc. llvm-svn: 122732	2011-01-03 03:46:34 +00:00
Chris Lattner	57b02a342e	add a testcase for readonly call CSE llvm-svn: 122730	2011-01-03 03:33:47 +00:00
Chris Lattner	4cfdaa3f02	Teach EarlyCSE to do trivial CSE of loads and read-only calls. On 176.gcc, this catches 13090 loads and calls, and increases the number of simple instructions CSE'd from 29658 to 36208. llvm-svn: 122727	2011-01-03 03:18:43 +00:00
Chris Lattner	39d1fb3320	add DEBUG and -stats output to earlycse. Teach it to CSE the rest of the non-side-effecting instructions. llvm-svn: 122716	2011-01-02 23:19:45 +00:00
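
Two of the InstructionSimplify folds mentioned in the log above, "X-(X-Y) into Y" (r123442) and "X != 0 -> X" for booleans (r123372), are easy to sanity-check by brute force. The sketch below is not LLVM code: it models an iN `sub` as Python arithmetic masked to a fixed width, and the 8-bit width and all function names are assumptions chosen only to keep the exhaustive check small.

```python
from itertools import product

BITS = 8                   # assumed width for the check, not taken from the commits
MASK = (1 << BITS) - 1


def sub(a: int, b: int) -> int:
    """Wrapping subtraction, modelling LLVM's `sub` on a BITS-wide integer."""
    return (a - b) & MASK


def sub_fold_holds_everywhere() -> bool:
    """Check X - (X - Y) == Y for every pair of BITS-wide values."""
    values = range(MASK + 1)
    return all(sub(x, sub(x, y)) == y for x, y in product(values, repeat=2))


def bool_ne_zero_is_identity() -> bool:
    """Check (X != 0) == X when X is a one-bit value (0 or 1)."""
    return all(int(x != 0) == x for x in (0, 1))
```

Calling `sub_fold_holds_everywhere()` walks all 65536 pairs, including the ones where the inner subtraction wraps around, which is the case that makes the fold valid without any overflow precondition.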


2106 Commits