1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 05:52:53 +02:00
Commit Graph

5077 Commits

Author SHA1 Message Date
Dan Gohman
c9b4620575 Fix ARCOpt to insert releases on both successors of an invoke rather
than trying to insert them immediately after the invoke.

llvm-svn: 133188
2011-06-16 20:57:14 +00:00
John McCall
519c63cdeb The ARC language-specific optimizer. Credit to Dan Gohman.
llvm-svn: 133108
2011-06-15 23:37:01 +00:00
Eli Friedman
a45cbb1743 Stop using memdep for a check that didn't really make sense with memdep. In terms of specific issues, using memdep here checks irrelevant instructions and won't work properly once we start returning "unknown" more aggressively from memdep.
llvm-svn: 133035
2011-06-15 01:25:56 +00:00
Eli Friedman
47b3cd9fff Add "unknown" results for memdep, which mean "I don't know whether a dependence for the given instruction exists in the given block". This cleans up all the existing hacks in memdep which represent this concept by returning clobber with various unrelated instructions.
llvm-svn: 133031
2011-06-15 00:47:34 +00:00
Cameron Zwarich
77f6a2d09e Be more obvious about what is being tested.
llvm-svn: 132982
2011-06-14 06:33:51 +00:00
Cameron Zwarich
5b03baf002 Fix grammar.
llvm-svn: 132952
2011-06-13 23:39:23 +00:00
Cameron Zwarich
c122dc4050 Rename MergeInType to MergeInTypeForLoadOrStore.
llvm-svn: 132940
2011-06-13 21:44:43 +00:00
Cameron Zwarich
e2e3fc3fe2 Remove the HadAVector instance variable and replace it with a use of ScalarKind.
llvm-svn: 132939
2011-06-13 21:44:40 +00:00
Cameron Zwarich
3614d9bec8 Remove a vacuous check.
llvm-svn: 132938
2011-06-13 21:44:38 +00:00
Cameron Zwarich
934fdc5aba Have SRoA explicitly track the kind of scalar it is promoting. This is pretty
spartan right now, but I plan to encode more information in this enum to improve
the correctness and reliability of SRoA. At least this first pass makes it
possible to make VectorTy an actual VectorType.

llvm-svn: 132937
2011-06-13 21:44:35 +00:00
Cameron Zwarich
ab21bb23b9 Remove an argument that is always true.
llvm-svn: 132936
2011-06-13 21:44:31 +00:00
Cameron Zwarich
ca3f5d4844 Remove a vacuous condition.
llvm-svn: 132767
2011-06-09 01:52:44 +00:00
Cameron Zwarich
72cd0d1b5b Fix PR10104 by adding a bounds check on a vector element access check. It was
assuming that all offsets are legal vector accesses, and thus trying to access
the float member of { <2 x float>, float } as the 3rd element of the first
member.

llvm-svn: 132766
2011-06-09 01:45:33 +00:00
Cameron Zwarich
68f8e98b8e Fix an assymmetry between ConvertScalar_ExtractValue and ConvertScalar_InsertValue. The
former was using the size of the entire alloca, whereas the latter was correctly using
the allocated size of the immediate type being converted (which may differ from the size
of the alloca). This fixes PR10082.

llvm-svn: 132759
2011-06-08 22:08:31 +00:00
Devang Patel
ec747bb76b Use IRBuilder, preserve line numbers.
llvm-svn: 132578
2011-06-03 19:46:19 +00:00
Nick Lewycky
ce38535f5b Bail on unswitching a switch statement for a case with a critical edge. We name
which edge to split by pred/succ pair, which means that we can end up splitting
the wrong edge (by case value) in the switch statement entirely. Fixes PR10031!

llvm-svn: 132535
2011-06-03 06:27:15 +00:00
Devang Patel
73e16acee8 Preserve line number information while converting Invoke into a Call.
llvm-svn: 132505
2011-06-02 22:46:58 +00:00
Eli Friedman
0db9c60959 PR10067: Add missing safety check to call return transformation in MemCpyOpt::processStore. If something accesses the dest of the "copy" between the call and the copy, the performCallSlotOptzn transformation is not valid.
llvm-svn: 132485
2011-06-02 21:24:42 +00:00
Nadav Rotem
77b328bd14 Fix warnings due to 132263; Thanks rdivacky.
llvm-svn: 132285
2011-05-29 08:10:47 +00:00
Nadav Rotem
531aa71d22 Refactor getActionType and getTypeToTransformTo ; place all of the 'decision'
code in one place. Re-apply 131534 and fix the multi-step promotion of integers.

llvm-svn: 132217
2011-05-27 21:03:13 +00:00
Eli Friedman
fe707571fc Attempt to preserve debug line info in LICM; as the comment in the code says, it's hard to pick good line numbers for this transformation, but something is better than nothing.
rdar://9143729

llvm-svn: 132215
2011-05-27 20:31:51 +00:00
Eli Friedman
1062ba1e1f Don't sink or hoist debug info instrinsics; it isn't useful. This also prevents LICM sinking from erasing debug intrinsics which don't dominate any exit block of the loop.
rdar://9143943 .

llvm-svn: 132201
2011-05-27 18:37:52 +00:00
Eli Friedman
17000790d8 Oops, wasn't intending to commit this. Partial revert of r132194.
llvm-svn: 132195
2011-05-27 18:04:04 +00:00
Eli Friedman
560532051b Fix a silly mistake (which trips over an assertion) in r132099. rdar://9515076
llvm-svn: 132194
2011-05-27 18:02:04 +00:00
Chandler Carruth
6a41ac0d30 Fix warning about || and && without explicit grouping.
This looks like it flagged an actual bug. Devang, please review. I added
the parentheses that change behavior, but make the behavior more closely
match commit log's intent.

llvm-svn: 132165
2011-05-26 23:37:58 +00:00
Devang Patel
5e3f883ddf Do not insert anything after terminator.
llvm-svn: 132164
2011-05-26 23:16:48 +00:00
Devang Patel
c67e8c5d45 Do not move DBG_VALUE in middle of PHI nodes.
llvm-svn: 132161
2011-05-26 22:43:14 +00:00
Devang Patel
52e803e053 If llvm.dbg.value and the value instruction it refers to are far apart then iSel may not be able to find corresponding Node for llvm.dbg.value during DAG construction. Make iSel's life easier by removing this distance between llvm.dbg.value and its value instruction.
llvm-svn: 132151
2011-05-26 21:51:06 +00:00
Andrew Trick
0577c5768b indvars: incremental fixes for -disable-iv-rewrite and testcases.
Use a proper worklist for use-def traversal without holding onto an
iterator. Now that we process all IV uses, we need complete logic for
resusing existing derived IV defs. See HoistStep.

llvm-svn: 132103
2011-05-26 00:46:11 +00:00
Evan Cheng
335fcad6bc Simplify r132022 based on Cameron's feedback.
llvm-svn: 132071
2011-05-25 18:17:13 +00:00
Andrew Trick
0aa40b2f8f indvars: fixed IV cloning in -disable-iv-rewrite mode with associated
cleanup and overdue test cases.

llvm-svn: 132038
2011-05-25 04:42:22 +00:00
Evan Cheng
d359a7cab3 Forgot dyn_cast check.
llvm-svn: 132025
2011-05-24 23:47:50 +00:00
Evan Cheng
9466b36c02 Fix LoopUnswitch bug. RewriteLoopBodyWithConditionConstant can delete a dead
case of a switch instruction. Back off this optimization when this would
eliminate all of the predecessors to the latch.

Sorry, I am unable to reduce a reasonably sized test case.

rdar://9486843

llvm-svn: 132022
2011-05-24 23:12:57 +00:00
Cameron Zwarich
e425894eaf Clean up the lazy initialization of DIBuilder a bit.
llvm-svn: 131956
2011-05-24 06:00:08 +00:00
Cameron Zwarich
462b5db500 Make LoadAndStorePromoter preserve debug info and create llvm.dbg.values when
promoting allocas to SSA variables. Fixes <rdar://problem/9479036>.

llvm-svn: 131953
2011-05-24 03:10:43 +00:00
Dan Gohman
e6a4a2aa6f When checking for signed multiplication overflow, watch out for INT_MIN and -1.
This fixes PR9845.

llvm-svn: 131919
2011-05-23 21:07:39 +00:00
Chris Lattner
c277b19e97 Teach valuetracking that byval arguments with a specified alignment are
aligned.

Teach memcpyopt to not give up all hope when confonted with an underaligned
memcpy feeding an overaligned byval.  If the *source* of the memcpy can be
determined to be adequeately aligned, or if it can be forced to be, we can
eliminate the memcpy.

This addresses PR9794.  We now compile the example into:

define i32 @f(%struct.p* nocapture byval align 8 %q) nounwind ssp {
entry:
  %call = call i32 @g(%struct.p* byval align 8 %q) nounwind
  ret i32 %call
}

in both x86-64 and x86-32 mode.  We still don't get a tailcall though,
because tailcalls apparently can't handle byval.

llvm-svn: 131884
2011-05-23 00:03:39 +00:00
Chris Lattner
44516c3ba3 Fix PR9815: I was trying to get out of "generating code and then
failing to form a memset, then having to delete it" but my approximation
isn't safe for self recurrent loops.  Instead of doign a hack, just
do it the right way.

llvm-svn: 131858
2011-05-22 17:39:56 +00:00
Frits van Bommel
df58ece78a Add a parameter to ConstantFoldTerminator() that callers can use to ask it to also clean up the condition of any conditional terminator it folds to be unconditional, if that turns the condition into dead code. This just means it calls RecursivelyDeleteTriviallyDeadInstructions() in strategic spots. It defaults to the old behavior.
I also changed -simplifycfg, -jump-threading and -codegenprepare to use this to produce slightly better code without any extra cleanup passes (AFAICT this was the only place in -simplifycfg where now-dead conditions of replaced terminators weren't being cleaned up). The only other user of this function is -sccp, but I didn't read that thoroughly enough to figure out whether it might be holding pointers to instructions that could be deleted by this.

llvm-svn: 131855
2011-05-22 16:24:18 +00:00
Chris Lattner
6cce3b63ab fix PR9841 by having GVN not process dead loads. This was
causing it to get into infinite loops when it would widen a 
load (which can necessarily leave around dead loads).

llvm-svn: 131847
2011-05-22 07:03:34 +00:00
Eli Friedman
58beee1262 PR7952: Make isa<> use the same logic as cast<>, so that they both work
consistently.

llvm-svn: 131803
2011-05-21 19:13:10 +00:00
Andrew Trick
3352db291f indvars: Prototyping Sign/ZeroExtend elimination without canonical IVs.
No functionality enabled by default. Use -disable-iv-rewrite.
Extended IVUsers to keep track of the phi that represents the users' IV.
Added the WidenIV transform to replace a narrow IV with a wide IV
by doing a one-for-one replacement of IV users instead of expanding the
SCEV expressions. [sz]exts are removed and truncs are inserted.

llvm-svn: 131744
2011-05-20 18:25:42 +00:00
Andrew Trick
650ac8d64f indvars: minor cleanup in preparation for sign/zero extend elimination.
llvm-svn: 131716
2011-05-20 03:37:48 +00:00
Dan Gohman
13dc3428b9 When forming an ICmpZero LSRUse, normalize the non-IV operand
of the comparison, so that the resulting expression is fully
normalized. This fixes PR9939.

llvm-svn: 131576
2011-05-18 21:02:18 +00:00
Duncan Sands
d3292b9f1e Revert commit 131534 since it seems to have broken several buildbots.
Original log entry:
Refactor getActionType and getTypeToTransformTo ; place all of the 'decision'
code in one place.

llvm-svn: 131536
2011-05-18 14:57:56 +00:00
Nadav Rotem
b7d689c706 Refactor getActionType and getTypeToTransformTo ; place all of the 'decision'
code in one place.

llvm-svn: 131534
2011-05-18 12:26:38 +00:00
Devang Patel
b7b53e9a24 Preserve line number information.
llvm-svn: 131482
2011-05-17 20:00:02 +00:00
Devang Patel
9a0301219b Set debug loc for new load instruction.
llvm-svn: 131481
2011-05-17 19:43:38 +00:00
Rafael Espindola
113d944ce4 Don't do tail calls in a function that call setjmp. The stack might be
corrupted when setjmp returns again.

llvm-svn: 131399
2011-05-16 03:05:33 +00:00
Andrew Trick
9bf6279f98 Convert SimplifyIVUsers into a worklist instead of a single pass over
the users.

llvm-svn: 131277
2011-05-13 01:12:21 +00:00
Andrew Trick
3e6beb9889 indvars: Added SimplifyIVUsers.
Interleave IV simplifications. Currently involves EliminateComparison
and EliminateRemainder. Next I'll add EliminateExtend.

llvm-svn: 131210
2011-05-12 00:04:28 +00:00
Duncan Sands
c4d27a63a4 Fix PR9820: a read-only call differs from a load in that a load doesn't
return the pointer being dereferenced, it returns the pointee, but a call
might return the pointer itself.

llvm-svn: 130979
2011-05-06 10:30:37 +00:00
Devang Patel
9faa957c2a Set debug loc for new instructions.
llvm-svn: 130895
2011-05-04 23:58:50 +00:00
Devang Patel
7774df3aa6 Preserve line number information while threading jumps.
llvm-svn: 130880
2011-05-04 22:48:19 +00:00
Devang Patel
86e9457cfb Preserve line number info.
llvm-svn: 130876
2011-05-04 21:58:58 +00:00
Devang Patel
e077bd8a58 preserve line number info.
llvm-svn: 130869
2011-05-04 21:37:05 +00:00
Andrew Trick
db8b89e46b indvars: Added DisableIVRewrite and WidenIVs.
This adds functionality to remove size/zero extension during indvars
without generating a canonical IV and rewriting all IV users. It's
disabled by default so should have no effect on codegen. Work in progress.

llvm-svn: 130829
2011-05-04 02:10:13 +00:00
Andrew Trick
6196cc1640 indvars: Added canExpandBackEdgeTakenCount.
Only create a canonical IV for backedge taken count if it will
actually be used by LinearFunctionTestReplace. And some related
cleanup, preparing to reduce dependence on canonical IVs.
No significant effect on x86 or arm in the test-suite.

llvm-svn: 130799
2011-05-03 22:24:10 +00:00
Dan Gohman
7beb845bab Add an unfolded offset field to LSR's Formula record. This is used to
model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.

llvm-svn: 130743
2011-05-03 00:46:49 +00:00
Chris Lattner
c04013dd48 enhance memcpyopt to obey -fno-builtin and friends. This addresses a
problem reported on cfe-dev.

llvm-svn: 130661
2011-05-01 18:27:11 +00:00
Devang Patel
1c0642d509 Preserve line number information.
llvm-svn: 130536
2011-04-29 20:38:55 +00:00
Devang Patel
b156287ec5 Preserve line number information.
llvm-svn: 130450
2011-04-28 22:48:14 +00:00
Chris Lattner
8e8a01ebb8 improve comment.
llvm-svn: 130426
2011-04-28 20:02:57 +00:00
Devang Patel
e7e53746f0 Do not lose line number info while eliminating tail call.
llvm-svn: 130419
2011-04-28 18:43:39 +00:00
Chris Lattner
a4696ee0ac final step needed to resolve PR6627, which allows us to flatten the code down to
a nice and tidy:
  %x1 = load i32* %0, align 4
  %1 = icmp eq i32 %x1, 1179403647
  br i1 %1, label %if.then, label %if.end

instead of doing lots of loads and branches.  May the FreeBSD bootloader
long fit in its allocated space.

llvm-svn: 130416
2011-04-28 18:15:47 +00:00
Chris Lattner
5307af1ed9 code cleanups only.
llvm-svn: 130414
2011-04-28 18:08:21 +00:00
Andrew Trick
db3db0337f Reapply r130340: Fix for PR9730.
llvm-svn: 130408
2011-04-28 17:30:04 +00:00
Chris Lattner
6fa384f4a7 centralize "marking for deletion" into a helper function. Pass GVN around to
static functions instead of passing around tons of random ivars.

llvm-svn: 130403
2011-04-28 16:36:48 +00:00
Chris Lattner
bc11964fe9 Promote toErase to be an ivar of the GVN class.
llvm-svn: 130401
2011-04-28 16:18:52 +00:00
Chris Lattner
594c4c5823 teach GVN to widen integer loads when they are overaligned, when doing an
wider load would allow elimination of subsequent loads, and when the wider
load is still a native integer type.  This eliminates a ton of loads on 
various benchmarks involving struct fields, though it is somewhat hobbled
by clang not being very aggressive about field alignment.

This is yet another step along the way towards resolving PR6627.

llvm-svn: 130390
2011-04-28 07:29:08 +00:00
Andrew Trick
ad00466e83 Reverting r130340 in the unlikely event that it's responsible for a llvm-gcc stage2 compiler error.
llvm-svn: 130350
2011-04-28 00:13:59 +00:00
Andrew Trick
652ca6a95a Fixes PR9730: indvars: An asserting value handle still pointed to this value
Modified LinearFunctionTestReplace to push the condition on the dead
list instead of eagerly deleting it. This can cause unnecessary
IV rewrites, which should have no effect on codegen and will not be an
issue once we stop generating canonical IVs.

llvm-svn: 130340
2011-04-27 23:00:03 +00:00
Devang Patel
cc15dfc21a Simplify cfg inserts a call to trap when unreachable code is detected. Assign DebugLoc to this new trap instruction.
llvm-svn: 130315
2011-04-27 17:59:27 +00:00
Chris Lattner
a43e6b57a4 Improve the bail-out predicate to really only kick in when phi
translation fails.  We were bailing out in some cases that would
cause us to miss GVN'ing some non-local cases away.

llvm-svn: 130206
2011-04-26 17:41:02 +00:00
Chris Lattner
bad294615e Enhance MemDep: When alias analysis returns a partial alias result,
return it as a clobber.  This allows GVN to do smart things.

Enhance GVN to be smart about the case when a small load is clobbered
by a larger overlapping load.  In this case, forward the value.  This
allows us to compile stuff like this:

int test(void *P) {
  int tmp = *(unsigned int*)P;
  return tmp+*((unsigned char*)P+1);
}

into:

_test:                                  ## @test
	movl	(%rdi), %ecx
	movzbl	%ch, %eax
	addl	%ecx, %eax
	ret

which has one load.  We already handled the case where the smaller
load was from a must-aliased base pointer.

llvm-svn: 130180
2011-04-26 01:21:15 +00:00
Jay Foad
c146569beb Remove unused STL header includes.
llvm-svn: 130068
2011-04-23 19:53:52 +00:00
Cameron Zwarich
fee060175e Fix another case of <rdar://problem/9184212> that only occurs with code
generated by llvm-gcc, since llvm-gcc uses 2 i64s for passing a 4 x float
vector on ARM rather than an i64 array like Clang.

llvm-svn: 129878
2011-04-20 21:48:38 +00:00
Cameron Zwarich
139f811a9b The bitcast case here is actually handled uniformly earlier in the function, so
delete it.

llvm-svn: 129877
2011-04-20 21:48:34 +00:00
Cameron Zwarich
13cd49fd54 Cleanup some code to better use an early return style in preparation for adding
more cases.

llvm-svn: 129876
2011-04-20 21:48:16 +00:00
Chris Lattner
0304b82f80 Fix a ton of comment typos found by codespell. Patch by
Luis Felipe Strano Moraes!

llvm-svn: 129558
2011-04-15 05:18:47 +00:00
Owen Anderson
268d8f22f8 Fix an infinite alternation in JumpThreading where two transforms would repeatedly undo each other. The solution is to perform more aggressive constant folding to make one of the edges just folded away rather than trying to thread it.
Fixes <rdar://problem/9284786>.

Discovered with CSmith.

llvm-svn: 129538
2011-04-14 21:35:50 +00:00
Mon P Wang
f9b26f115c Cleanup r129509 based on comments by Chris
llvm-svn: 129532
2011-04-14 19:20:42 +00:00
Mon P Wang
8b09087f46 Cleanup r129472 by using a utility routine as suggested by Eli.
llvm-svn: 129509
2011-04-14 08:04:01 +00:00
Chris Lattner
e9409fb80a fix a couple -Wsign-compare warnings.
llvm-svn: 129501
2011-04-14 02:27:25 +00:00
Mon P Wang
4b667cd995 Vectors with different number of elements of the same element type can have
the same allocation size but different primitive sizes(e.g., <3xi32> and
<4xi32>).  When ScalarRepl promotes them, it can't use a bit cast but
should use a shuffle vector instead.

llvm-svn: 129472
2011-04-13 21:40:02 +00:00
Junjie Gu
caffb8c04f Fixed the revision 129449.
llvm-svn: 129450
2011-04-13 16:45:49 +00:00
Junjie Gu
920274b4dd Passing unroll parameters (unroll-count, threshold, and partial unroll) via LoopUnroll class's ctor. Doing so
will allow multiple context with different loop unroll parameters to run.  This is a minor change and no effect 
on existing application.

llvm-svn: 129449
2011-04-13 16:15:29 +00:00
Rafael Espindola
3a5e7bd46f Add the alias analysis to the C api.
llvm-svn: 129447
2011-04-13 15:44:58 +00:00
Bill Wendling
0984f4927e Reapply r129401 with patch for clang.
llvm-svn: 129419
2011-04-13 00:36:11 +00:00
Bill Wendling
f6446a0961 Revert r129401 for now. Clang is using the old way of doing things.
llvm-svn: 129403
2011-04-12 22:59:27 +00:00
Bill Wendling
f9c9d3e05b Remove the unaligned load intrinsics in favor of using native unaligned loads.
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.

First part of <rdar://problem/8460511>.

llvm-svn: 129401
2011-04-12 22:46:31 +00:00
Dan Gohman
4eedbc29cd Fix reassociate to use a worklist instead of recursing when new
reassociation opportunities are exposed. This fixes a bug where
the nested reassociation expects to be the IR to be consistent,
but it isn't, because the outer reassociation has disconnected
some of the operands.  rdar://9167457

llvm-svn: 129324
2011-04-12 00:11:56 +00:00
Jay Foad
0d5ca4cf44 Don't include Operator.h from InstrTypes.h.
llvm-svn: 129271
2011-04-11 09:35:34 +00:00
Chris Lattner
b1efa0b48d fix PR9523, a crash in looprotate on a non-canonical loop made out of indirectbr.
llvm-svn: 129203
2011-04-09 07:25:58 +00:00
Chris Lattner
f7623daa2b Fix a bug where RecursivelyDeleteTriviallyDeadInstructions could
delete the instruction pointed to by CGP's current instruction
iterator, leading to a crash on the testcase.  This fixes PR9578.

llvm-svn: 129200
2011-04-09 07:05:44 +00:00
Rafael Espindola
a13eab4837 Expose more passes to the C API.
llvm-svn: 129087
2011-04-07 18:20:46 +00:00
Eli Friedman
b0e846a68c PR9634: Don't unconditionally tell the AliasSetTracker that the PreheaderLoad
is equivalent to any other relevant value; it isn't true in general.
If it is equivalent, the LoopPromoter will tell the AST the equivalence.
Also, delete the PreheaderLoad if it is unused.

Chris, since you were the last one to make major changes here, can you check
that this is sane?

llvm-svn: 129049
2011-04-07 01:35:06 +00:00
Bill Wendling
59a1021dc6 * The DSE code that tested for overlapping needed to take into account the fact
that one of the numbers is signed while the other is unsigned. This could lead
  to a wrong result when the signed was promoted to an unsigned int.

* Add the data layout line to the testcase so that it will test the appropriate
  thing.

Patch by David Terei!

llvm-svn: 128577
2011-03-30 21:37:19 +00:00
Jay Foad
53632b7c03 Remove PHINode::reserveOperandSpace(). Instead, add a parameter to
PHINode::Create() giving the (known or expected) number of operands.

llvm-svn: 128537
2011-03-30 11:28:46 +00:00
Jay Foad
dc5a008237 (Almost) always call reserveOperandSpace() on newly created PHINodes.
llvm-svn: 128535
2011-03-30 11:19:20 +00:00
Benjamin Kramer
a28b1da096 DSE: Remove an early exit optimization that depended on the ordering of a SmallPtrSet.
Fixes PR9569 and will hopefully make selfhost on ASLR-enabled systems more deterministic.

llvm-svn: 128482
2011-03-29 20:28:57 +00:00
Cameron Zwarich
d49e32233c Do some simple copy propagation through integer loads and stores when promoting
vector types. This helps a lot with inlined functions when using the ARM soft
float ABI. Fixes <rdar://problem/9184212>.

llvm-svn: 128453
2011-03-29 05:19:52 +00:00
Bill Wendling
24757df3f3 Simplification noticed by Frits.
llvm-svn: 128333
2011-03-26 09:32:07 +00:00
Bill Wendling
0e3e7257ef Rework the logic that determines if a store completely overlaps an ealier store.
There are two ways that a later store can comletely overlap a previous store:

1. They both start at the same offset, but the earlier store's size is <= the
   later's size, or
2. The earlier store's offset is > the later's offset, but it's offset + size
   doesn't extend past the later's offset + size.

llvm-svn: 128332
2011-03-26 08:02:59 +00:00
Cameron Zwarich
09bd1deda3 Fix a typo and add a test.
llvm-svn: 128331
2011-03-26 04:58:50 +00:00
Bill Wendling
72b390743d PR9561: A store with a negative offset (via GEP) could erroniously say that it
completely overlaps a previous store, thus mistakenly deleting that store. Check
for this condition.

llvm-svn: 128319
2011-03-26 01:20:37 +00:00
Cameron Zwarich
1991677a8e Debug intrinsics must be skipped at the beginning and ends of blocks, lest they
affect the generated code.

llvm-svn: 128217
2011-03-24 16:34:59 +00:00
Cameron Zwarich
c2a32b3c5c It is enough for the CallInst to have no uses to be made a tail call with a ret
void; it doesn't need to have a void type.

llvm-svn: 128212
2011-03-24 15:54:11 +00:00
Devang Patel
f432e2c2df s/UpdateDT/ModifiedDT/g
llvm-svn: 128211
2011-03-24 15:35:25 +00:00
Cameron Zwarich
4d1c5fe9ae Do early taildup of ret in CodeGenPrepare for potential tail calls that have a
void return type. This fixes PR9487.

llvm-svn: 128197
2011-03-24 04:52:10 +00:00
Cameron Zwarich
6c983bf746 Use an early return instead of a long if block.
llvm-svn: 128196
2011-03-24 04:52:07 +00:00
Cameron Zwarich
6862665125 When UpdateDT is set, DT is invalid, which could cause problems when trying to
use it later. I couldn't make a test that hits this with the current code.

llvm-svn: 128195
2011-03-24 04:52:04 +00:00
Cameron Zwarich
6633f15fad Check for TLI so that -codegenprepare can be used from opt.
llvm-svn: 128194
2011-03-24 04:51:51 +00:00
Cameron Zwarich
9f72ea0a80 Fix PR9464 by correcting some math that just happened to be right in most cases
that were hit in practice.

llvm-svn: 128146
2011-03-23 05:25:55 +00:00
Evan Cheng
dd99a0a548 Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified.
llvm-svn: 127981
2011-03-21 01:19:09 +00:00
Daniel Dunbar
34c65737c3 Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors
to canonicalize IR", it broke a lot of things.

llvm-svn: 127954
2011-03-19 21:47:14 +00:00
Evan Cheng
c5f50f7322 SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR
to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433

llvm-svn: 127953
2011-03-19 17:17:39 +00:00
Andrew Trick
a4b86e96b1 Remove TargetData and ValueTracking includes. I didn't mean for them to sneak in my last checkin.
llvm-svn: 127842
2011-03-18 00:36:39 +00:00
Andrew Trick
07887af00c Added isValidRewrite() to check the result of ScalarEvolutionExpander.
SCEV may generate expressions composed of multiple pointers, which can
lead to invalid GEP expansion. Until we can teach SCEV to follow strict
pointer rules, make sure no bad GEPs creep into IR.
Fixes rdar://problem/9038671.

llvm-svn: 127839
2011-03-17 23:51:11 +00:00
Andrew Trick
17df72f60b whitespace
llvm-svn: 127837
2011-03-17 23:46:48 +00:00
Cameron Zwarich
c60b47a7e2 Fix a comment.
llvm-svn: 127728
2011-03-16 08:13:42 +00:00
Cameron Zwarich
7fd94ea393 Only convert allocas to scalars if it is profitable. The profitability metric I
chose is having a non-memcpy/memset use and being larger than any native integer
type. Originally I chose having an access of a size smaller than the total size
of the alloca, but this caused some minor issues on the spirit benchmark where
SRoA runs again after some inlining.

This fixes <rdar://problem/8613163>.

llvm-svn: 127718
2011-03-16 00:13:44 +00:00
Cameron Zwarich
88790f3d4d Better use initializer lists.
llvm-svn: 127716
2011-03-16 00:13:37 +00:00
Cameron Zwarich
f09bb5f2f5 Add a clarifying comment.
llvm-svn: 127715
2011-03-16 00:13:35 +00:00
Andrew Trick
5d45b563c5 Added SCEV::NoWrapFlags to manage unsigned, signed, and self wrap
properties.
Added the self-wrap flag for SCEV::AddRecExpr.
A slew of temporary FIXMEs indicate the intention of the no-self-wrap flag
without changing behavior in this revision.

llvm-svn: 127590
2011-03-14 16:50:06 +00:00
Andrew Trick
e0442babf1 whitespace
llvm-svn: 127589
2011-03-14 16:48:10 +00:00
Cameron Zwarich
bf5c9cd119 Roll r127459 back in:
Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.

This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.

llvm-svn: 127498
2011-03-11 21:52:04 +00:00
Daniel Dunbar
a02706c889 Revert r127459, "Optimize trivial branches in CodeGenPrepare, which often get
created from the", it broke some GCC test suite tests.

llvm-svn: 127477
2011-03-11 19:30:30 +00:00
Cameron Zwarich
9ed726c151 Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.

This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.

llvm-svn: 127459
2011-03-11 04:54:27 +00:00
Dan Gohman
051cc738c3 RecursivelyDeleteTriviallyDeadInstructions only needs a
Value, not an Instruction, so casting is not necessary. Also,
it's theoretically possible that the Value is not an
Instruction, since WeakVH follows RAUWs.

llvm-svn: 127427
2011-03-10 20:57:44 +00:00
Dan Gohman
bb9c77f5bd Fix reassociate to postpone certain instruction deletions until
after it has finished all of its reassociations, because its
habit of unlinking operands and holding them in a datastructure
while working means that it's not easy to determine when an
instruction is really dead until after all its regular work is
done. rdar://9096268.

llvm-svn: 127424
2011-03-10 19:51:54 +00:00
Devang Patel
7a4edc6463 Preserve line number information while simplifying libcalls.
llvm-svn: 127362
2011-03-09 21:27:52 +00:00
Cameron Zwarich
90efa03e8b Fix a crasher introduced by r127317 that is seen on the bots when using an
alloca as both integer and floating-point vectors of the same size. Bugpoint is
not cooperating with me, but I'll try to find a manual testcase tomorrow.

llvm-svn: 127320
2011-03-09 07:34:11 +00:00
Cameron Zwarich
e0b4705b03 Add support to scalar replacement for partial vector accesses of an alloca, e.g.
a union of a float, <2 x float>, and <4 x float>. This mostly comes up with the
use of vector intrinsics, especially in NEON when programmers know the layout of
the register file. This enables codegen to eliminate a lot of the subregister
traffic it would otherwise generate.

This commit only enables this for a small number of floating-point cases, but a
lot more integer cases. I assume this is okay for all ports, but I did not do
extensive testing of the quality of code involving i512 vectors and the like. If
there is a use case where this generates worse code than before, let me know and
we can scale it back.

This fixes <rdar://problem/9036264>.

llvm-svn: 127317
2011-03-09 05:43:05 +00:00
Cameron Zwarich
e80c47f295 Move vector type merging to a separate function in preparation for it getting
more complicated.

llvm-svn: 127316
2011-03-09 05:43:01 +00:00
Devang Patel
c4e4244654 While sinking an instruction, do not lose llvm.dbg.value intrinsic.
llvm-svn: 127214
2011-03-08 03:06:19 +00:00
Devang Patel
2649e83214 Preserve line no. info.
Radar 9097659

llvm-svn: 127182
2011-03-07 22:43:45 +00:00
Cameron Zwarich
7de156da4a Fix PR9398 - 10% of llc compile time is spent in Value::getNumUses. This reduces
the percentage of time spent in CodeGenPrepare when llcing 403.gcc from 12.6% to
1.8% of total llc time.

llvm-svn: 127069
2011-03-05 08:12:26 +00:00
Richard Osborne
2186cd4e47 Fix typo in comment.
llvm-svn: 126941
2011-03-03 14:21:22 +00:00
Richard Osborne
88d0d840f2 Optimize fprintf -> iprintf if there are no floating point arguments
and siprintf is available on the target.

llvm-svn: 126940
2011-03-03 14:20:22 +00:00
Richard Osborne
021b589253 Optimize sprintf -> siprintf if there are no floating point arguments
and siprintf is available on the target.

llvm-svn: 126937
2011-03-03 14:09:28 +00:00
Richard Osborne
df829ddcb7 Optimize printf -> iprintf if there are no floating point arguments
and iprintf is available on the target. Currently iprintf is only
marked as being available on the XCore.

llvm-svn: 126935
2011-03-03 13:17:51 +00:00
Cameron Zwarich
bd98c1011d Remove some more unused code that I missed.
llvm-svn: 126826
2011-03-02 03:48:29 +00:00
Cameron Zwarich
6a4612ba06 Eliminate the unused CodeGenPrepare option to split critical edges.
llvm-svn: 126825
2011-03-02 03:31:46 +00:00
Cameron Zwarich
0287e8ad22 Stop computing the number of uses twice per value in CodeGenPrepare's sinking of
addressing code. On 403.gcc this almost halves CodeGenPrepare time and reduces
total llc time by 9.5%. Unfortunately, getNumUses() is still the hottest function
in llc.

llvm-svn: 126782
2011-03-01 21:13:53 +00:00
Ted Kremenek
7b6189c848 Unbreak CMake build.
llvm-svn: 126715
2011-02-28 23:56:33 +00:00
Chris Lattner
77d6f9b20e update cmake
llvm-svn: 126694
2011-02-28 22:45:25 +00:00
Dan Gohman
91de965338 Delete the GEPSplitter experiment.
llvm-svn: 126671
2011-02-28 19:47:47 +00:00
Dan Gohman
db646bfdad Delete the SimplifyHalfPowrLibCalls pass, which was unused, and
only existed as the result of a misunderstanding.

llvm-svn: 126669
2011-02-28 19:41:14 +00:00
Chris Lattner
e64e9c6f73 wire TargetLibraryInfo into simplify libcalls and use it in a couple of
trivial places.  This pass needs a lot of work.

llvm-svn: 126367
2011-02-24 07:16:14 +00:00
Chris Lattner
d1bb8c0277 move a massive amount of code out into its own helper function
to reduce nesting.  This needs to be turned into a table.

llvm-svn: 126366
2011-02-24 07:12:12 +00:00
Cameron Zwarich
c5fa112a70 Make LoopDeletion work on loops with multiple edges, as long as the incoming
values from all of the loop's exiting blocks are equal. Patch by Andrew Clinton.

llvm-svn: 126253
2011-02-22 22:25:39 +00:00
Chris Lattner
83c60ae907 fix a crasher in disabled code (on variable stride loops)
llvm-svn: 126125
2011-02-21 17:02:55 +00:00
Chris Lattner
9d2899115f Add some (disabled code) to print out negative strides.
llvm-svn: 126102
2011-02-21 02:08:54 +00:00
Chris Lattner
cc2fa12fac rewrite the memset_pattern pattern generation stuff to accept any 2/4/8/16-byte
constant, including globals.  This makes us generate much more "pretty" pattern
globals as well because it doesn't break it down to an array of bytes all the
time.

This enables us to handle stores of relocatable globals.  This kicks in about
48 times in 254.gap, giving us stuff like this:

@.memset_pattern40 = internal constant [2 x %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)*] [%struct.TypHeader* (%struct.TypHeader*, %struct
.TypHeader*)* @IsFalse, %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)* @IsFalse], align 16

...
  call void @memset_pattern16(i8* %scevgep5859, i8* bitcast ([2 x %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)*]* @.memset_pattern40 to i8*
), i64 %tmp75) nounwind

llvm-svn: 126044
2011-02-19 19:56:44 +00:00
Chris Lattner
474215e713 Implement rdar://9009151, transforming strided loop stores of
unsplatable values into memset_pattern16 when it is available
(recent darwins).  This transforms lots of strided loop stores
of ints for example, like 5 in vpr:

  Formed memset:   call void @memset_pattern16(i8* %4, i8* getelementptr inbounds ([16 x i8]* @.memset_pattern9, i32 0, i32 0), i64 %tmp25)
    from store to: {%3,+,4}<%11> at:   store i32 3, i32* %scevgep, align 4, !tbaa !4

llvm-svn: 126040
2011-02-19 19:31:39 +00:00
Chris Lattner
2b183fca87 Make loop-idiom use TargetLibraryInfo to determine whether it is allowed
to hack on memset, memcpy etc.

llvm-svn: 125974
2011-02-18 22:22:15 +00:00
Chris Lattner
eccec47f5c prevent jump threading from merging blocks when their address is
taken (and used!).  This prevents merging the blocks (invalidating
the block addresses) in a case like this:

#define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })

void foo() {
  printf("%p\n", _THIS_IP_);
  printf("%p\n", _THIS_IP_);
  printf("%p\n", _THIS_IP_);
}

which fixes PR4151.

llvm-svn: 125829
2011-02-18 04:43:06 +00:00
Chris Lattner
828b97cdc2 fix PR9215, preventing -reassociate from clearing nsw/nuw when
it swaps the LHS/RHS of a single binop.

llvm-svn: 125700
2011-02-17 01:29:24 +00:00
Duncan Sands
061150ac1b Spelling fix: consequtive -> consecutive.
llvm-svn: 125563
2011-02-15 09:23:02 +00:00
Chris Lattner
db204cbe42 convert ConstantVector::get to use ArrayRef.
llvm-svn: 125537
2011-02-15 00:14:00 +00:00
Devang Patel
4b1ea1ef94 Do not hoist @llvm.dbg.value. Here, @llvm.dbg.value is "referring" a value that is modified inside loop.
llvm-svn: 125529
2011-02-14 23:03:23 +00:00
Chris Lattner
ee7f7c2494 revert my ConstantVector patch, it seems to have made the llvm-gcc
builders unhappy.

llvm-svn: 125504
2011-02-14 18:15:46 +00:00
Chris Lattner
34f32cb4c2 Switch ConstantVector::get to use ArrayRef instead of a pointer+size
idiom.  Change various clients to simplify their code.

llvm-svn: 125487
2011-02-14 07:55:32 +00:00
Daniel Dunbar
74c3b94237 SimplifyLibCalls: Add missing legalize check on various printf to puts and
putchar transforms, their return values are not compatible.

llvm-svn: 125442
2011-02-12 18:19:57 +00:00
Cameron Zwarich
ef11d9219e Make LoopUnswitch preserve ScalarEvolution by just forgetting everything about
a loop when unswitching it. It only does this in the complex case, because
everything should be fine already in the simple case.

llvm-svn: 125369
2011-02-11 06:08:28 +00:00
Cameron Zwarich
a71a5e4f90 LoopInstSimplify preserves ScalarEvolution.
llvm-svn: 125368
2011-02-11 06:08:25 +00:00
Cameron Zwarich
38343ff97c If we can't avoid running loop-simplify twice for now, at least avoid running
iv-users twice.

llvm-svn: 125318
2011-02-10 23:53:14 +00:00
Eric Christopher
c86930fa03 Revert this in an attempt to bring the builders back.
llvm-svn: 125257
2011-02-10 01:48:24 +00:00
Cameron Zwarich
9e1e6813bd Turn this pass ordering:
Natural Loop Information
 Loop Pass Manager
   Canonicalize natural loops
 Scalar Evolution Analysis
 Loop Pass Manager
   Induction Variable Users
   Canonicalize natural loops
   Induction Variable Users
   Loop Strength Reduction

into this:

Scalar Evolution Analysis
Loop Pass Manager
  Canonicalize natural loops
  Induction Variable Users
  Loop Strength Reduction

This fixes <rdar://problem/8869639>. I also filed PR9184 on doing this sort of
thing automatically, but it seems easier to just change the ordering of the
passes if this is the only case.

llvm-svn: 125254
2011-02-10 01:07:54 +00:00
Dan Gohman
ae7dba9ba8 Don't split any loop backedges, including backedges of loops other than
the active loop. This is generally desirable, and it avoids trouble
in situations such as the testcase in PR9123, though the failure
mode depends on use-list order, so it is infeasible to test.

llvm-svn: 125065
2011-02-08 00:55:13 +00:00
Dan Gohman
4dc130ea78 Fix reassociate to clear optional flags, such as nsw.
llvm-svn: 124712
2011-02-02 02:02:34 +00:00
Francois Pichet
6aed3c72dc Unbreak the MSVC build.
The DEBUG() call at line 606 demands to see raw_ostream's definition. I have no idea why this seems to only break MSVC.

llvm-svn: 124545
2011-01-29 20:06:16 +00:00
Evan Cheng
20433f6339 Add a test for TCE return duplication.
llvm-svn: 124527
2011-01-29 04:53:35 +00:00
Evan Cheng
4af5487b74 Re-apply r124518 with fix. Watch out for invalidated iterator.
llvm-svn: 124526
2011-01-29 04:46:23 +00:00
Evan Cheng
1f943b9b13 Revert r124518. It broke Linux self-host.
llvm-svn: 124522
2011-01-29 02:43:04 +00:00
Evan Cheng
a1e4cb5f09 Re-commit r124462 with fixes. Tail recursion elim will now dup ret into unconditional predecessor to enable TCE on demand.
llvm-svn: 124518
2011-01-29 01:29:26 +00:00
Duncan Sands
e1912ca7e0 Fix PR9039, a use-after-free in reassociate. The issue was that the
operand being factorized (and erased) could occur several times in Ops,
resulting in freed memory being used when the next occurrence in Ops was
analyzed.

llvm-svn: 124287
2011-01-26 10:08:38 +00:00
Dan Gohman
db0dc19c04 Give GetUnderlyingObject a TargetData, to keep it in sync
with BasicAA's DecomposeGEPExpression, which recently began
using a TargetData. This fixes PR8968, though the testcase
is awkward to reduce.

Also, update several off GetUnderlyingObject's users
which happen to have a TargetData handy to pass it in.

llvm-svn: 124134
2011-01-24 18:53:32 +00:00
Chris Lattner
3609e5afb4 enhance SRoA to promote allocas that are used by PHI nodes. This often
occurs because instcombine sinks loads and inserts phis.  This kicks in 
on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in
spec2006 in some app that uses std::deque.

This resolves the last of rdar://7339113.

llvm-svn: 124090
2011-01-24 01:07:11 +00:00
Chris Lattner
50288f8e54 Enhance SRoA to promote allocas that are used by selects in some
common cases.  This triggers a surprising number of times in SPEC2K6
because min/max idioms end up doing this.  For example, code from the
STL ends up looking like this to SRoA:

  %202 = load i64* %__old_size, align 8, !tbaa !3
  %203 = load i64* %__old_size, align 8, !tbaa !3
  %204 = load i64* %__n, align 8, !tbaa !3
  %205 = icmp ult i64 %203, %204
  %storemerge.i = select i1 %205, i64* %__n, i64* %__old_size
  %206 = load i64* %storemerge.i, align 8, !tbaa !3

We can now promote both the __n and the __old_size allocas.

This addresses another chunk of rdar://7339113, poor codegen on
stringswitch.

llvm-svn: 124088
2011-01-23 22:04:55 +00:00
Chris Lattner
2ef605b499 Enhance SRoA to be more aggressive about scalarization of aggregate allocas
that have PHI or select uses of their element pointers.  This can often happen
when instcombine sinks two loads into a successor, inserting a phi or select.

With this patch, we can scalarize the alloca, but the pinned elements are not
yet promoted.  This is still a win for large aggregates where only one element
is used.  This fixes rdar://8904039 and part of rdar://7339113 (poor codegen
on stringswitch).

llvm-svn: 124070
2011-01-23 08:27:54 +00:00
Chris Lattner
2e57722d9d have AllocaInfo store the alloca being inspected, simplifying callers.
No functionality change.

llvm-svn: 124067
2011-01-23 07:29:29 +00:00
Chris Lattner
2064a3e213 Rearrange some code a bit. Change MarkUnsafe to
handle the "Transformation preventing inst" printing, 
so that -scalarrepl -debug will always print the rejected
instruction.  No functionality change.

llvm-svn: 124066
2011-01-23 07:05:44 +00:00
Chris Lattner
ba66871643 remove an old hack that avoided creating MMX datatypes. The
X86 backend has been fixed.

llvm-svn: 124064
2011-01-23 06:40:33 +00:00
Dan Gohman
a9724ada65 Actually check memcpy lengths, instead of just commenting about
how they should be checked.

llvm-svn: 123999
2011-01-21 22:07:57 +00:00
Nick Lewycky
5de10fb201 SCCP doesn't actually preserve the CFG. It will delete and insert terminator
instructions.

llvm-svn: 123973
2011-01-21 08:38:09 +00:00
Chris Lattner
4832a9d32c fix rdar://8878965, a regression I introduced with the recent
llvm.objectsize changes.

llvm-svn: 123771
2011-01-18 20:53:04 +00:00
Cameron Zwarich
c93f994801 Remove code for updating dominance frontiers and some outdated references to
dominance and post-dominance frontiers.

llvm-svn: 123725
2011-01-18 04:11:31 +00:00
Cameron Zwarich
e39e476305 Remove outdated references to dominance frontiers.
llvm-svn: 123724
2011-01-18 03:53:26 +00:00
Owen Anderson
b315a60d0f Remove dead code, that I apparently wrote a while back. We seem to be doing well enough
without whatever this was trying to do.  When/if someone has the time to do some empirical
evaluations, it might be worth it to figure out what this code was trying to do and see if
it's worth resurrecting/fixing.

llvm-svn: 123684
2011-01-17 22:39:54 +00:00
Cameron Zwarich
0a1975802e Roll r123609 back in with two changes that fix test failures with expensive
checks enabled:

1) Use '<' to compare integers in a comparison function rather than '<='.

2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize
the priority queue.

The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at
just under 16% rather than 17%.

llvm-svn: 123662
2011-01-17 17:38:41 +00:00
Cameron Zwarich
77e381e302 Roll out r123609 due to failures on the llvm-x86_64-linux-checks bot.
llvm-svn: 123618
2011-01-17 07:26:51 +00:00
Cameron Zwarich
e35642342b Eliminate the use of dominance frontiers in PromoteMemToReg. In addition to
eliminating a potentially quadratic data structure, this also gives a 17%
speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial
experiment gave a greater speedup around 25%, but I moved the dominator tree
level computation from dominator tree construction to PromoteMemToReg.

Since this approach to computing IDFs has a much lower overhead than the old
code using precomputed DFs, it is worth looking at using this new code for the
second scalarrepl pass as well.

llvm-svn: 123609
2011-01-17 01:08:59 +00:00
Chris Lattner
2f902bc3b1 tidy up a comment, as suggested by duncan
llvm-svn: 123590
2011-01-16 17:46:19 +00:00
Chris Lattner
f3b214e298 simplify a little
llvm-svn: 123573
2011-01-16 07:11:21 +00:00
Chris Lattner
29f339f87c if an alloca is only ever accessed as a unit, and is accessed with load/store instructions,
then don't try to decimate it into its individual pieces.  This will just make a mess of the
IR and is pointless if none of the elements are individually accessed.  This was generating
really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
as an {[8 x i8]} structure instead of {i64}.

The testcase now is optimized to:

define i64 @test2(i64 %X) {
  br label %L2

L2:                                               ; preds = %0
  ret i64 %X
}

before we generated:

define i64 @test2(i64 %X) {
  %sroa.store.elt = lshr i64 %X, 56
  %1 = trunc i64 %sroa.store.elt to i8
  %sroa.store.elt8 = lshr i64 %X, 48
  %2 = trunc i64 %sroa.store.elt8 to i8
  %sroa.store.elt9 = lshr i64 %X, 40
  %3 = trunc i64 %sroa.store.elt9 to i8
  %sroa.store.elt10 = lshr i64 %X, 32
  %4 = trunc i64 %sroa.store.elt10 to i8
  %sroa.store.elt11 = lshr i64 %X, 24
  %5 = trunc i64 %sroa.store.elt11 to i8
  %sroa.store.elt12 = lshr i64 %X, 16
  %6 = trunc i64 %sroa.store.elt12 to i8
  %sroa.store.elt13 = lshr i64 %X, 8
  %7 = trunc i64 %sroa.store.elt13 to i8
  %8 = trunc i64 %X to i8
  br label %L2

L2:                                               ; preds = %0
  %9 = zext i8 %1 to i64
  %10 = shl i64 %9, 56
  %11 = zext i8 %2 to i64
  %12 = shl i64 %11, 48
  %13 = or i64 %12, %10
  %14 = zext i8 %3 to i64
  %15 = shl i64 %14, 40
  %16 = or i64 %15, %13
  %17 = zext i8 %4 to i64
  %18 = shl i64 %17, 32
  %19 = or i64 %18, %16
  %20 = zext i8 %5 to i64
  %21 = shl i64 %20, 24
  %22 = or i64 %21, %19
  %23 = zext i8 %6 to i64
  %24 = shl i64 %23, 16
  %25 = or i64 %24, %22
  %26 = zext i8 %7 to i64
  %27 = shl i64 %26, 8
  %28 = or i64 %27, %25
  %29 = zext i8 %8 to i64
  %30 = or i64 %29, %28
  ret i64 %30
}

In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
PHIs are in play that instcombine backs off.  It's better to not generate this stuff
in the first place.

llvm-svn: 123571
2011-01-16 06:18:28 +00:00
Chris Lattner
b43fce09a9 Use an irbuilder to get some trivial constant folding when doing a store
of a constant.

llvm-svn: 123570
2011-01-16 05:58:24 +00:00
Chris Lattner
2067fb2a93 enhance FoldOpIntoPhi in instcombine to try harder when a phi has
multiple uses.  In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi.  In the testcase
this pushes an add out of the loop.

llvm-svn: 123568
2011-01-16 05:28:59 +00:00
Chris Lattner
55c2150f36 temporarily revert r123526. While working on a follow-on patch I
realize that ConstantFoldTerminator doesn't preserve dominfo.

llvm-svn: 123527
2011-01-15 07:51:19 +00:00
Chris Lattner
68a47147ba fix rdar://8785296 - -fcatch-undefined-behavior generates inefficient code
The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch dead computation feeding
conditional branches isn't such a good idea.  Just fold branches on constants
into unconditional branches.

llvm-svn: 123526
2011-01-15 07:36:13 +00:00
Chris Lattner
2a7c042c37 simplify code, no functionality change.
llvm-svn: 123525
2011-01-15 07:29:01 +00:00
Chris Lattner
d4eaf6eba8 Now that instruction optzns can update the iterator as they go, we can
have objectsize folding recursively simplify away their result when it
folds.  It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.

llvm-svn: 123524
2011-01-15 07:25:29 +00:00
Chris Lattner
939e77a0df make the current instruction iterator an ivar, allowing xforms that
potentially invalidate it (like inline asm lowering) to be sunk into
their proper place, cleaning up a ton of code.

llvm-svn: 123523
2011-01-15 07:14:54 +00:00
Chris Lattner
e6d5b3c4ce Generalize LoadAndStorePromoter a bit and switch LICM
to use it.

llvm-svn: 123501
2011-01-15 00:12:35 +00:00
Chris Lattner
1ce35a0362 switch SRoA to use LoadAndStorePromoter instead of its own copy of the code.
llvm-svn: 123457
2011-01-14 19:50:47 +00:00
Chris Lattner
8e171470d3 split SROA into two passes: one that uses DomFrontiers (-scalarrepl)
and one that uses SSAUpdater (-scalarrepl-ssa)

llvm-svn: 123436
2011-01-14 08:13:00 +00:00
Chris Lattner
b5c39352d8 Implement full support for promoting allocas to registers using SSAUpdater
instead of DomTree/DomFrontier.  This may be interesting for reducing compile 
time.  This is currently disabled, but seems to work just fine.

When this is enabled, we eliminate two runs of dominator frontier, one in the
"early per-function" optimizations and one in the "interlaced with inliner"
function passes.

llvm-svn: 123434
2011-01-14 07:50:47 +00:00
Bob Wilson
1238f872da Fix whitespace.
llvm-svn: 123396
2011-01-13 20:59:44 +00:00
Bob Wilson
fbab825516 Check for empty structs, and for consistency, zero-element arrays.
llvm-svn: 123383
2011-01-13 18:26:59 +00:00
Bob Wilson
3b0197489e Extend SROA to handle arrays accessed as homogeneous structs and vice versa.
This is a minor extension of SROA to handle a special case that is
important for some ARM NEON operations.  Some of the NEON intrinsics
return multiple values, which are handled as struct types containing
multiple elements of the same vector type.  The corresponding return
types declared in the arm_neon.h header have equivalent arrays.  We
need SROA to recognize that it can split up those arrays and structs
into separate vectors, even though they are not always accessed with
the same type.  SROA already handles loads and stores of an entire
alloca by using insertvalue/extractvalue to access the individual
pieces, and that code works the same regardless of whether the type
is a struct or an array.  So, all that needs to be done is to check
for compatible arrays and homogeneous structs.

llvm-svn: 123381
2011-01-13 17:45:11 +00:00
Bob Wilson
9f8d730f9b Make SROA more aggressive with allocas containing padding.
SROA only split up structs and arrays one level at a time, so padding can
only cause trouble if it is located in between the struct or array elements.

llvm-svn: 123380
2011-01-13 17:45:08 +00:00
Devang Patel
52db9b4821 Use SmallVector instead of SmallPtrSet and avoid non-deterministic behavior.
llvm-svn: 123318
2011-01-12 19:12:45 +00:00
Chris Lattner
fdef60bef7 revert 123144, reenabling the rest of memset formation.
llvm-svn: 123302
2011-01-12 03:25:15 +00:00
Chris Lattner
e288204194 revert r123146 which disabled code that wasn't the root cause
of the bootstrap miscompare issue.

llvm-svn: 123299
2011-01-12 01:52:23 +00:00
Chris Lattner
c7a5a12af5 revert r123149, reenabling an improvement to memcpyopt that wasn't
the source of the bootstrap problem.

llvm-svn: 123298
2011-01-12 01:43:46 +00:00
Jakob Stoklund Olesen
b935aa1678 Remove the PR8954 workaround.
llvm-svn: 123288
2011-01-11 22:56:41 +00:00
Cameron Zwarich
b0e688cc7c Dial back the speculative fix for PR8954 a bit, so that we only recompute dominators
once at the beginning of GVN instead of once per iteration.

llvm-svn: 123278
2011-01-11 22:14:42 +00:00
Cameron Zwarich
e8c1c44e01 Attempt to fix the bootstrap buildbot. Rafael says this works for him on x86-64 Linux.
llvm-svn: 123270
2011-01-11 20:23:34 +00:00
Chris Lattner
e5688368c6 update memdep when an instruction is deleted. This code isn't
actually reached in the testcase in PR8954, but it's safe and good
practice.

llvm-svn: 123224
2011-01-11 08:19:16 +00:00
Chris Lattner
dc7b2160ba Fix FoldSingleEntryPHINodes to update memdep and AA when it deletes
phi nodes.  It is called from MergeBlockIntoPredecessor which is 
called from GVN, which claims to preserve these.

I'm skeptical that this is the actual problem behind PR8954, but
this is a stab in the right direction.

llvm-svn: 123222
2011-01-11 08:13:40 +00:00
Chris Lattner
b1a9c9ed36 random cleanups
llvm-svn: 123221
2011-01-11 08:00:40 +00:00
Chris Lattner
5731a92f5b remove a bogus assertion: the latch block of a loop is not
neccesarily an uncond branch to the header.  This fixes 
PR8955 (the assertion tripping).

llvm-svn: 123219
2011-01-11 07:47:59 +00:00
Chris Lattner
1404348022 another random stab in the dark trying to fix llvm-gcc-i386-linux-selfhost
llvm-svn: 123149
2011-01-10 02:34:11 +00:00
Chris Lattner
b5562212e2 another (more) aggressive attempt to bring llvm-gcc-i386-linux-selfhost
back to life.

llvm-svn: 123146
2011-01-10 00:47:34 +00:00
Chris Lattner
e8e9ec58bf temporarily disable memset formation from memsets in an effort to restore buildbot stability.
llvm-svn: 123144
2011-01-09 23:52:48 +00:00
Chris Lattner
82de29fb76 fix a few old bugs (found by inspection) where we would zap instructions
without informing memdep.  This could cause nondeterminstic weirdness 
based on where instructions happen to get allocated, and will hopefully
breath some life into some broken testers.

llvm-svn: 123124
2011-01-09 19:26:10 +00:00
Cameron Zwarich
afbf7a9fe3 LoopInstSimplify preserves LoopSimplify.
llvm-svn: 123117
2011-01-09 12:35:16 +00:00
Chris Lattner
fa37cac39c reduce indentation. Print <nuw> and <nsw> when dumping SCEV AddRec's
that have the bit set.

llvm-svn: 123104
2011-01-09 02:16:18 +00:00
Chris Lattner
7df7b47828 fix a latent bug in memcpyoptimizer that my recent patches exposed: it wasn't
updating memdep when fusing stores together.  This fixes the crash optimizing
the bullet benchmark.

llvm-svn: 123091
2011-01-08 22:19:21 +00:00
Chris Lattner
563e57abbd tryMergingIntoMemset can only handle constant length memsets.
llvm-svn: 123090
2011-01-08 22:11:56 +00:00
Chris Lattner
98136397bd Merge memsets followed by neighboring memsets and other stores into
larger memsets.  Among other things, this fixes rdar://8760394 and
allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
compiling it into a single 4096-byte memset:

_mad_synth_mute:                        ## @mad_synth_mute
## BB#0:                                ## %entry
	pushq	%rax
	movl	$4096, %esi             ## imm = 0x1000
	callq	___bzero
	popq	%rax
	ret

llvm-svn: 123089
2011-01-08 21:19:19 +00:00
Chris Lattner
e09439ed9d fix an issue in IsPointerOffset that prevented us from recognizing that
P and P+1 are relative to the same base pointer.

llvm-svn: 123087
2011-01-08 21:07:56 +00:00
Chris Lattner
20bf2d50b8 enhance memcpyopt to merge a store and a subsequent
memset into a single larger memset.

llvm-svn: 123086
2011-01-08 20:54:51 +00:00
Chris Lattner
19aedc848c constify TargetData references.
Split memset formation logic out into its own
"tryMergingIntoMemset" helper function.

llvm-svn: 123081
2011-01-08 20:24:01 +00:00
Chris Lattner
7d3c4712e9 When loop rotation happens, it is *very* common for the duplicated condbr
to be foldable into an uncond branch.  When this happens, we can make a
much simpler CFG for the loop, which is important for nested loop cases
where we want the outer loop to be aggressively optimized.

Handle this case more aggressively.  For example, previously on
phi-duplicate.ll we would get this:


define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  %cmp1 = icmp slt i64 1, 1000
  br i1 %cmp1, label %bb.nph, label %for.end

bb.nph:                                           ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %bb.nph, %for.cond
  %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.02
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.02, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.02, 1
  br label %for.cond

for.cond:                                         ; preds = %for.body
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge

for.cond.for.end_crit_edge:                       ; preds = %for.cond
  br label %for.end

for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}

Now we get the much nicer:

define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.01
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.01, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.01, 1
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

With all of these recent changes, we are now able to compile:

void foo(char *X) {
 for (int i = 0; i != 100; ++i) 
   for (int j = 0; j != 100; ++j)
     X[j+i*100] = 0;
}

into a single memset of 10000 bytes.  This series of changes
should also be helpful for other nested loop scenarios as well.

llvm-svn: 123079
2011-01-08 19:59:06 +00:00
Chris Lattner
a01093c990 split ssa updating code out to its own helper function. Don't bother
moving the OrigHeader block anymore: we just merge it away anyway so
its code layout doesn't matter.

llvm-svn: 123077
2011-01-08 19:26:33 +00:00
Chris Lattner
e5f41b44c5 Implement a TODO: Enhance loopinfo to merge away the unconditional branch
that it was leaving in loops after rotation (between the original latch
block and the original header.

With this change, it is possible for rotated loops to have just a single
basic block, which is useful.

llvm-svn: 123075
2011-01-08 19:10:28 +00:00
Chris Lattner
b601620bcd inline preserveCanonicalLoopForm now that it is simple.
llvm-svn: 123073
2011-01-08 18:55:50 +00:00
Chris Lattner
db05334c7f Three major changes:
1. Rip out LoopRotate's domfrontier updating code.  It isn't
   needed now that LICM doesn't use DF and it is super complex
   and gross.
2. Make DomTree updating code a lot simpler and faster.  The 
   old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
   SplitCriticalEdge instead of doing an overcomplex 
   reimplementation of it.

No behavior change, except for the name of the inserted preheader.

llvm-svn: 123072
2011-01-08 18:52:51 +00:00
Chris Lattner
2fb7c9c272 LoopRotate requires canonical loop form, so it always has preheaders
and latch blocks.  Reorder entry conditions to make hte pass faster
and more logical.

llvm-svn: 123069
2011-01-08 18:06:22 +00:00
Chris Lattner
c6d905407b use the LI ivar.
llvm-svn: 123068
2011-01-08 17:49:51 +00:00
Chris Lattner
0893592a39 some cleanups: remove dead arguments and eliminate ivars
that are just passed to one function.

llvm-svn: 123067
2011-01-08 17:48:33 +00:00
Chris Lattner
ab9a79eda3 fix an issue duncan pointed out, which could cause loop rotate
to violate LCSSA form

llvm-svn: 123066
2011-01-08 17:38:45 +00:00
Cameron Zwarich
9a35e69c7d Fix coding style issues.
llvm-svn: 123065
2011-01-08 17:07:11 +00:00
Cameron Zwarich
a40df277f1 Make more passes preserve dominators (or state that they preserve dominators if
they all ready do). This removes two dominator recomputations prior to isel,
which is a 1% improvement in total llc time for 403.gcc.

The only potentially suspect thing is making GCStrategy recompute dominators if
it used a custom lowering strategy.

llvm-svn: 123064
2011-01-08 17:01:52 +00:00
Cameron Zwarich
e97e0555d5 Contract subloop bodies. However, it is still important to visit the phis at the
top of subloop headers, as the phi uses logically occur outside of the subloop.

llvm-svn: 123062
2011-01-08 15:52:22 +00:00
Chris Lattner
6729ce1c33 Have loop-rotate simplify instructions (yay instsimplify!) as it clones
them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops.  This also better exposes the 
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.

Not aggressively handling this is pessimizing later loop optimizations 
somethin' fierce by making "dominates all exit blocks" checks fail.

llvm-svn: 123060
2011-01-08 08:24:46 +00:00
Chris Lattner
397937fa0d Revamp the ValueMapper interfaces in a couple ways:
1. Take a flags argument instead of a bool.  This makes
   it more clear to the reader what it is used for.
2. Add a flag that says that "remapping a value not in the
   map is ok".
3. Reimplement MapValue to share a bunch of code and be a lot
   more efficient.  For lookup failures, don't drop null values
   into the map.
4. Using the new flag a bunch of code can vaporize in LinkModules
   and LoopUnswitch, kill it.

No functionality change.

llvm-svn: 123058
2011-01-08 08:15:20 +00:00
Chris Lattner
9700b4d5ce two minor changes: switch to the standard ValueToValueMapTy
map from ValueMapper.h (giving us access to its utilities)
and add a fastpath in the loop rotation code, avoiding expensive
ssa updator manipulation for values with nothing to update.

llvm-svn: 123057
2011-01-08 07:21:31 +00:00