problem", this broke llvm-gcc bootstrap for release builds on
x86_64-apple-darwin10.
This reverts commit db22309800b224a9f5f51baf76071d7a93ce59c9.
llvm-svn: 91534
found last time. Instead of trying to modify the IR while iterating over it,
I've change it to keep a list of WeakVH references to dead instructions, and
then delete those instructions later. I also added some special case code to
detect and handle the situation when both operands of a memcpy intrinsic are
referencing the same alloca.
llvm-svn: 91459
While scanning through the uses of an alloca, keep track of the current offset
relative to the start of the alloca, and check memory references to see if
the offset & size correspond to a component within the alloca. This has the
nice benefit of unifying much of the code from isSafeUseOfAllocation,
isSafeElementUse, and isSafeUseOfBitCastedAllocation. The code to rewrite
the uses of a promoted alloca, after it is determined to be safe, is
reorganized in the same way.
Also, when rewriting GEP instructions, mark them as "in-bounds" since all the
indices are known to be safe.
llvm-svn: 91184
value size. This only manifested when memdep inprecisely returns clobber,
which is do to a caching issue in the PR5744 testcase. We can 'efficiently
emulate' this by using '-no-aa'
llvm-svn: 91004
add, there is no need to scan the world to find the same add again.
This invalidates the previous testcase, which wasn't wonderful anyway,
because it needed a run of instcombine to permute the use-lists in
just the right way to before GVN was run (so it was really fragile).
Not a big loss.
llvm-svn: 90973
phi translation of complex expressions like &A[i+1]. This has the
following benefits:
1. The phi translation logic is all contained in its own class with
a strong interface and verification that it is self consistent.
2. The logic is more correct than before. Previously, if intermediate
expressions got PHI translated, we'd miss the update and scan for
the wrong pointers in predecessor blocks. @phi_trans2 is a testcase
for this.
3. We have a lot less code in memdep.
We can handle phi translation across blocks of things like @phi_trans3,
which is pretty insane :).
This patch should fix the miscompiles of 255.vortex, and I tested it
with a bootstrap of llvm-gcc, llvm-test and dejagnu of course.
llvm-svn: 90926
handle cases like this:
void test(int N, double* G) {
long j;
for (j = 1; j < N - 1; j++)
G[j+1] = G[j] + G[j+1];
}
where G[1] isn't live into the loop.
llvm-svn: 90041
translation of add with immediate. This allows us
to optimize this function:
void test(int N, double* G) {
long j;
G[1] = 1;
for (j = 1; j < N - 1; j++)
G[j+1] = G[j] + G[j+1];
}
to only do one load every iteration of the loop.
llvm-svn: 90013
array indexes. The "complex" case of SRoA still handles them, and correctly.
This fixes a weirdness where we'd correctly avoid transforming A[0][42] if
the 42 was too large, but we'd only do it if it was one gep, not two separate
ones.
llvm-svn: 90007
generates store to undef and some generates store to null as the idiom
for undefined behavior. Since simplifycfg zaps both, don't remove the
undefined behavior in instcombine.
llvm-svn: 89971
ConstantExpr, not just the top-level operator. This allows it to
fold many more constants.
Also, make GlobalOpt call ConstantFoldConstantExpression on
GlobalVariable initializers.
llvm-svn: 89659
it may be used in contexts where preheader insertion may have failed due
to an indirectbr.
Make LoopSimplify's LoopSimplify::SeparateNestedLoop properly fail in
the case that it would require splitting an indirectbr edge.
These fix PR5502.
llvm-svn: 89484
if it is not ultimately captured. Teach BasicAliasAnalysis that a
local object address which does not escape and is never stored does
not alias with a value resulting from a load.
llvm-svn: 89398
they are lowered to instruction sequences more complex than a simple
load, such that CodeGen cannot rematerialize them, a reload from a
spill slot is likely to be cheaper than the complex sequence.
llvm-svn: 89374
running IPSCCP early, and we run functionattrs interlaced with the inliner,
we often (particularly for small or noop functions) completely propagate
all of the information about a call to its call site in IPSSCP (making a call
dead) and functionattrs is smart enough to realize that the function is
readonly (because it is interlaced with inliner).
To improve compile time and make the inliner threshold more accurate, realize
that we don't have to inline dead readonly function calls. Instead, just
delete the call. This happens all the time for C++ codes, here are some
counters from opt/llvm-ld counting the number of times calls were deleted vs
inlined on various apps:
Tramp3d opt:
5033 inline - Number of call sites deleted, not inlined
24596 inline - Number of functions inlined
llvm-ld:
667 inline - Number of functions deleted because all callers found
699 inline - Number of functions inlined
483.xalancbmk opt:
8096 inline - Number of call sites deleted, not inlined
62528 inline - Number of functions inlined
llvm-ld:
217 inline - Number of allocas merged together
2158 inline - Number of functions inlined
471.omnetpp:
331 inline - Number of call sites deleted, not inlined
8981 inline - Number of functions inlined
llvm-ld:
171 inline - Number of functions deleted because all callers found
629 inline - Number of functions inlined
Deleting a call is much faster than inlining it, and is insensitive to the
size of the callee. :)
llvm-svn: 86975
llvm.invariant.start to be used without necessarily being paired with a call
to llvm.invariant.end. If you run the entire optimization pipeline then such
calls are in fact deleted (adce does it), but that's actually a good thing since
we probably do want them to be zapped late in the game. There should really be
an integration test that checks that the llvm.invariant.start call lasts long
enough that all passes that do interesting things with it get to do their stuff
before it is deleted. But since no passes do anything interesting with it yet
this will have to wait for later.
llvm-svn: 86840
debug intrinsics, and an unconditional branch when possible. This
reuses the TryToSimplifyUncondBranchFromEmptyBlock function split
out of simplifycfg.
llvm-svn: 86722
just one level deep. On the testcase we go from getting this:
F1: ; preds = %T2
%F = and i1 true, %cond ; <i1> [#uses=1]
br i1 %F, label %X, label %Y
to a fully threaded:
F1: ; preds = %T2
br label %Y
This changes gets us to the point where we're forming (too many) switch
instructions on doug's strswitch testcase.
llvm-svn: 86646
the loop. This is needed because with indirectbr it may not be possible
for LoopSimplify to guarantee that all loop exit predecessors are
inside the loop. This fixes PR5437.
LCCSA no longer actually requires LoopSimplify form, but for now it
must still have the dependency because the PassManager doesn't know
how to schedule LoopSimplify otherwise.
llvm-svn: 86569
here:
1) We need to avoid processing sigma nodes as phi nodes for constraint generation.
2) We need to generate constraints for comparisons against constants properly.
This includes our first working ABCD test!
llvm-svn: 86498
when both the source and dest are illegal types, since it would cause
the phi to grow (for example, we shouldn't transform test14b's phi to
a phi on i320). This fixes an infinite loop on i686 bootstrap with
phi slicing turned on, so turn it back on.
llvm-svn: 86483
not turn a PHI in a legal type into a PHI of an illegal type, and
add a new optimization that breaks up insane integer PHI nodes into
small pieces (PR3451).
llvm-svn: 86443
(eliminating some extends) if the new type of the
computation is legal or if both the source and dest
are illegal. This prevents instcombine from changing big
chains of computation into i64 on 32-bit targets for
example.
llvm-svn: 86398
Here is the original commit message:
This commit updates malloc optimizations to operate on malloc calls that have constant int size arguments.
Update CreateMalloc so that its callers specify the size to allocate:
MallocInst-autoupgrade users use non-TargetData-computed allocation sizes.
Optimization uses use TargetData to compute the allocation size.
Now that malloc calls can have constant sizes, update isArrayMallocHelper() to use TargetData to determine the size of the malloced type and the size of malloced arrays.
Extend getMallocType() to support malloc calls that have non-bitcast uses.
Update OptimizeGlobalAddressOfMalloc() to optimize malloc calls that have non-bitcast uses. The bitcast use of a malloc call has to be treated specially here because the uses of the bitcast need to be replaced and the bitcast needs to be erased (just like the malloc call) for OptimizeGlobalAddressOfMalloc() to work correctly.
Update PerformHeapAllocSRoA() to optimize malloc calls that have non-bitcast uses. The bitcast use of the malloc is not handled specially here because ReplaceUsesOfMallocWithGlobal replaces through the bitcast use.
Update OptimizeOnceStoredGlobal() to not care about the malloc calls' bitcast use.
Update all globalopt malloc tests to not rely on autoupgraded-MallocInsts, but instead use explicit malloc calls with correct allocation sizes.
llvm-svn: 86311
predicates. This allows us to jump thread things like:
_ZN12StringSwitchI5ColorE4CaseILj7EEERS1_RAT__KcRKS0_.exit119:
%tmp1.i24166 = phi i8 [ 1, %bb5.i117 ], [ %tmp1.i24165, %_Z....exit ], [ %tmp1.i24165, %bb4.i114 ]
%toBoolnot.i87 = icmp eq i8 %tmp1.i24166, 0 ; <i1> [#uses=1]
%tmp4.i90 = icmp eq i32 %tmp2.i, 6 ; <i1> [#uses=1]
%or.cond173 = and i1 %toBoolnot.i87, %tmp4.i90 ; <i1> [#uses=1]
br i1 %or.cond173, label %bb4.i96, label %_ZN12...
Where it is "obvious" that when coming from %bb5.i117 that the 'and' is always
false. This triggers a surprisingly high number of times in the testsuite,
and gets us closer to generating good code for doug's strswitch testcase.
This also make a bunch of other code in jump threading redundant, I'll rip
out in the next patch. This survived an enable-checking llvm-gcc bootstrap.
llvm-svn: 86264
unsplittable critical edges, which means the introduction of
loops which cannot be transformed to LoopSimplify form. Fix
LoopSimplify to avoid transforming such loops into invalid
code.
llvm-svn: 86176
MallocInst-autoupgrade users use non-TargetData-computed allocation sizes.
Optimization uses use TargetData to compute the allocation size.
Now that malloc calls can have constant sizes, update isArrayMallocHelper() to use TargetData to determine the size of the malloced type and the size of malloced arrays.
Extend getMallocType() to support malloc calls that have non-bitcast uses.
Update OptimizeGlobalAddressOfMalloc() to optimize malloc calls that have non-bitcast uses. The bitcast use of a malloc call has to be treated specially here because the uses of the bitcast need to be replaced and the bitcast needs to be erased (just like the malloc call) for OptimizeGlobalAddressOfMalloc() to work correctly.
Update PerformHeapAllocSRoA() to optimize malloc calls that have non-bitcast uses. The bitcast use of the malloc is not handled specially here because ReplaceUsesOfMallocWithGlobal replaces through the bitcast use.
Update OptimizeOnceStoredGlobal() to not care about the malloc calls' bitcast use.
Update all globalopt malloc tests to not rely on autoupgraded-MallocInsts, but instead use explicit malloc calls with correct allocation sizes.
llvm-svn: 86077
to EmitGEPOffset.
Implement some new transforms for optimizing
subtracts of two pointer to ints into the same vector. This happens
for C++ iterator idioms for example, stringmap takes a const char*
that points to the start and end of a string. Once inlined, we want
the pointer difference to turn back into a length.
This is rdar://7362831.
llvm-svn: 86021
functions that don't have local linkage. Basically, we need to be more
careful about propagating argument information to functions whose results
we aren't tracking. This fixes a miscompilation of
LLVMCConfigurationEmitter.cpp when built with an llvm-gcc that has ipsccp
enabled.
llvm-svn: 85923
function to calls of that function, regardless of whether it has local
linkage or has its address taken. Not escaping should only affect
whether we make an aggressive assumption about the arguments to a
function, not whether we can track the result of it.
llvm-svn: 85795