load to avoid even messing around with SSAUpdate at all. In this case (which
is very common, we can just use the input value directly).
This speeds up GVN time on gcc.c-torture/20001226-1.c from 36.4s to 16.3s,
which still isn't great, but substantially better and this is a simple speedup
that applies to lots of different cases.
llvm-svn: 91851
two-element arrays. After restructuring the SROA code, it was not safe to
do this without adding more checking. It is not clear that this special-case
has really been useful, and removing this simplifies the code quite a bit.
llvm-svn: 91828
'GetValueInMiddleOfBlock' case, instead of inserting
duplicates.
A similar fix is almost certainly needed by the machine-level
SSAUpdate implementation.
llvm-svn: 91820
implement some optimizations for MIN(MIN()) and MAX(MAX()) and
MIN(MAX()) etc. This substantially improves the code in PR5822 but
doesn't kick in much elsewhere. 2 max's were optimized in
pairlocalalign and one in smg2000.
llvm-svn: 91814
Use the presence of NSW/NUW to fold "icmp (x+cst), x" to a constant in
cases where it would otherwise be undefined behavior.
Surprisingly (to me at least), this triggers hundreds of the times in
a few benchmarks: lencode, ldecode, and 466.h264ref seem to *really*
like this.
llvm-svn: 91812
where instcombine would have to split a critical edge due to a
phi node of an invoke. Since instcombine can't change the CFG,
it has to bail out from doing the transformation.
llvm-svn: 91763
* change FindElementAndOffset to return a uint64_t instead of unsigned, and
to identify the type to be used for that result in a GEP instruction.
* move "isa<ConstantInt>" to be first in conditional.
* replace some dyn_casts with casts.
* add a comment about handling mem intrinsics.
llvm-svn: 91762
contains another loop, or an instruction. The loop form is
substantially more efficient on large loops than the typical
code it replaces.
llvm-svn: 91654
of 91296 that caused trouble -- the Processed list needs to be
preserved for the livetime of the pass, as AddUsersIfInteresting
is called from other passes.
llvm-svn: 91641
problem", this broke llvm-gcc bootstrap for release builds on
x86_64-apple-darwin10.
This reverts commit db22309800b224a9f5f51baf76071d7a93ce59c9.
llvm-svn: 91534
found last time. Instead of trying to modify the IR while iterating over it,
I've change it to keep a list of WeakVH references to dead instructions, and
then delete those instructions later. I also added some special case code to
detect and handle the situation when both operands of a memcpy intrinsic are
referencing the same alloca.
llvm-svn: 91459
isPodLike type trait. This is a generally useful type trait for
more than just DenseMap, and we really care about whether something
acts like a pod, not whether it really is a pod.
llvm-svn: 91421
While scanning through the uses of an alloca, keep track of the current offset
relative to the start of the alloca, and check memory references to see if
the offset & size correspond to a component within the alloca. This has the
nice benefit of unifying much of the code from isSafeUseOfAllocation,
isSafeElementUse, and isSafeUseOfBitCastedAllocation. The code to rewrite
the uses of a promoted alloca, after it is determined to be safe, is
reorganized in the same way.
Also, when rewriting GEP instructions, mark them as "in-bounds" since all the
indices are known to be safe.
llvm-svn: 91184
value size. This only manifested when memdep inprecisely returns clobber,
which is do to a caching issue in the PR5744 testcase. We can 'efficiently
emulate' this by using '-no-aa'
llvm-svn: 91004
phi translation of complex expressions like &A[i+1]. This has the
following benefits:
1. The phi translation logic is all contained in its own class with
a strong interface and verification that it is self consistent.
2. The logic is more correct than before. Previously, if intermediate
expressions got PHI translated, we'd miss the update and scan for
the wrong pointers in predecessor blocks. @phi_trans2 is a testcase
for this.
3. We have a lot less code in memdep.
We can handle phi translation across blocks of things like @phi_trans3,
which is pretty insane :).
This patch should fix the miscompiles of 255.vortex, and I tested it
with a bootstrap of llvm-gcc, llvm-test and dejagnu of course.
llvm-svn: 90926
handle cases like this:
void test(int N, double* G) {
long j;
for (j = 1; j < N - 1; j++)
G[j+1] = G[j] + G[j+1];
}
where G[1] isn't live into the loop.
llvm-svn: 90041
array indexes. The "complex" case of SRoA still handles them, and correctly.
This fixes a weirdness where we'd correctly avoid transforming A[0][42] if
the 42 was too large, but we'd only do it if it was one gep, not two separate
ones.
llvm-svn: 90007
generates store to undef and some generates store to null as the idiom
for undefined behavior. Since simplifycfg zaps both, don't remove the
undefined behavior in instcombine.
llvm-svn: 89971
ConstantExpr, not just the top-level operator. This allows it to
fold many more constants.
Also, make GlobalOpt call ConstantFoldConstantExpression on
GlobalVariable initializers.
llvm-svn: 89659
it may be used in contexts where preheader insertion may have failed due
to an indirectbr.
Make LoopSimplify's LoopSimplify::SeparateNestedLoop properly fail in
the case that it would require splitting an indirectbr edge.
These fix PR5502.
llvm-svn: 89484
tests/Transforms/InstCombine/shufflemask-undef.ll. If
anyone cares, the use of 2*e here (and the equivalent
all over the place in instcombine) seems wrong, though
harmless: it should really be twice the length of the
input vector. I think shufflevector used to require
that the mask have the same length as the input, but I
don't think that's true any more. I don't care enough
about vectors to do anything about this...
llvm-svn: 89456
if it is not ultimately captured. Teach BasicAliasAnalysis that a
local object address which does not escape and is never stored does
not alias with a value resulting from a load.
llvm-svn: 89398
they are lowered to instruction sequences more complex than a simple
load, such that CodeGen cannot rematerialize them, a reload from a
spill slot is likely to be cheaper than the complex sequence.
llvm-svn: 89374
running IPSCCP early, and we run functionattrs interlaced with the inliner,
we often (particularly for small or noop functions) completely propagate
all of the information about a call to its call site in IPSSCP (making a call
dead) and functionattrs is smart enough to realize that the function is
readonly (because it is interlaced with inliner).
To improve compile time and make the inliner threshold more accurate, realize
that we don't have to inline dead readonly function calls. Instead, just
delete the call. This happens all the time for C++ codes, here are some
counters from opt/llvm-ld counting the number of times calls were deleted vs
inlined on various apps:
Tramp3d opt:
5033 inline - Number of call sites deleted, not inlined
24596 inline - Number of functions inlined
llvm-ld:
667 inline - Number of functions deleted because all callers found
699 inline - Number of functions inlined
483.xalancbmk opt:
8096 inline - Number of call sites deleted, not inlined
62528 inline - Number of functions inlined
llvm-ld:
217 inline - Number of allocas merged together
2158 inline - Number of functions inlined
471.omnetpp:
331 inline - Number of call sites deleted, not inlined
8981 inline - Number of functions inlined
llvm-ld:
171 inline - Number of functions deleted because all callers found
629 inline - Number of functions inlined
Deleting a call is much faster than inlining it, and is insensitive to the
size of the callee. :)
llvm-svn: 86975
cannot be folded into target cmp instruction.
- Avoid a phase ordering issue where early cmp optimization would prevent the
later count-to-zero optimization.
- Add missing checks which could cause LSR to reuse stride that does not have
users.
- Fix a bug in count-to-zero optimization code which failed to find the pre-inc
iv's phi node.
- Remove, tighten, loosen some incorrect checks disable valid transformations.
- Quite a bit of code clean up.
llvm-svn: 86969
making the new LVI stuff smart enough to subsume some special
cases in the old code. Disable them when LVI is around, the
testcase still passes.
llvm-svn: 86951
llvm.invariant.start to be used without necessarily being paired with a call
to llvm.invariant.end. If you run the entire optimization pipeline then such
calls are in fact deleted (adce does it), but that's actually a good thing since
we probably do want them to be zapped late in the game. There should really be
an integration test that checks that the llvm.invariant.start call lasts long
enough that all passes that do interesting things with it get to do their stuff
before it is deleted. But since no passes do anything interesting with it yet
this will have to wait for later.
llvm-svn: 86840
start using them in a trivial way when -enable-jump-threading-lvi
is passed. enable-jump-threading-lvi will be my playground for
awhile.
llvm-svn: 86789
debug intrinsics, and an unconditional branch when possible. This
reuses the TryToSimplifyUncondBranchFromEmptyBlock function split
out of simplifycfg.
llvm-svn: 86722
just one level deep. On the testcase we go from getting this:
F1: ; preds = %T2
%F = and i1 true, %cond ; <i1> [#uses=1]
br i1 %F, label %X, label %Y
to a fully threaded:
F1: ; preds = %T2
br label %Y
This changes gets us to the point where we're forming (too many) switch
instructions on doug's strswitch testcase.
llvm-svn: 86646
except that the result may not be a constant. Switch jump threading to
use it so that it gets things like (X & 0) -> 0, which occur when phi preds
are deleted and the remaining phi pred was a zero.
llvm-svn: 86637
This patch forbids implicit conversion of DenseMap::const_iterator to
DenseMap::iterator which was possible because DenseMapIterator inherited
(publicly) from DenseMapConstIterator. Conversion the other way around is now
allowed as one may expect.
The template DenseMapConstIterator is removed and the template parameter
IsConst which specifies whether the iterator is constant is added to
DenseMapIterator.
Actually IsConst parameter is not necessary since the constness can be
determined from KeyT but this is not relevant to the fix and can be addressed
later.
Patch by Victor Zverovich!
llvm-svn: 86636
the loop. This is needed because with indirectbr it may not be possible
for LoopSimplify to guarantee that all loop exit predecessors are
inside the loop. This fixes PR5437.
LCCSA no longer actually requires LoopSimplify form, but for now it
must still have the dependency because the PassManager doesn't know
how to schedule LoopSimplify otherwise.
llvm-svn: 86569
here:
1) We need to avoid processing sigma nodes as phi nodes for constraint generation.
2) We need to generate constraints for comparisons against constants properly.
This includes our first working ABCD test!
llvm-svn: 86498
graphs being produced. The cause was that we were incorrectly marking sigma instructions as
processed after handling the sigma-specific constraints for them, potentially neglecting to
process them as normal instructions as well.
Unfortunately, the testcase that inspired this still doesn't work because of a bug in the solver,
which is next on the list to debug.
llvm-svn: 86486
when both the source and dest are illegal types, since it would cause
the phi to grow (for example, we shouldn't transform test14b's phi to
a phi on i320). This fixes an infinite loop on i686 bootstrap with
phi slicing turned on, so turn it back on.
llvm-svn: 86483
not turn a PHI in a legal type into a PHI of an illegal type, and
add a new optimization that breaks up insane integer PHI nodes into
small pieces (PR3451).
llvm-svn: 86443
(eliminating some extends) if the new type of the
computation is legal or if both the source and dest
are illegal. This prevents instcombine from changing big
chains of computation into i64 on 32-bit targets for
example.
llvm-svn: 86398
Here is the original commit message:
This commit updates malloc optimizations to operate on malloc calls that have constant int size arguments.
Update CreateMalloc so that its callers specify the size to allocate:
MallocInst-autoupgrade users use non-TargetData-computed allocation sizes.
Optimization uses use TargetData to compute the allocation size.
Now that malloc calls can have constant sizes, update isArrayMallocHelper() to use TargetData to determine the size of the malloced type and the size of malloced arrays.
Extend getMallocType() to support malloc calls that have non-bitcast uses.
Update OptimizeGlobalAddressOfMalloc() to optimize malloc calls that have non-bitcast uses. The bitcast use of a malloc call has to be treated specially here because the uses of the bitcast need to be replaced and the bitcast needs to be erased (just like the malloc call) for OptimizeGlobalAddressOfMalloc() to work correctly.
Update PerformHeapAllocSRoA() to optimize malloc calls that have non-bitcast uses. The bitcast use of the malloc is not handled specially here because ReplaceUsesOfMallocWithGlobal replaces through the bitcast use.
Update OptimizeOnceStoredGlobal() to not care about the malloc calls' bitcast use.
Update all globalopt malloc tests to not rely on autoupgraded-MallocInsts, but instead use explicit malloc calls with correct allocation sizes.
llvm-svn: 86311
predicates. This allows us to jump thread things like:
_ZN12StringSwitchI5ColorE4CaseILj7EEERS1_RAT__KcRKS0_.exit119:
%tmp1.i24166 = phi i8 [ 1, %bb5.i117 ], [ %tmp1.i24165, %_Z....exit ], [ %tmp1.i24165, %bb4.i114 ]
%toBoolnot.i87 = icmp eq i8 %tmp1.i24166, 0 ; <i1> [#uses=1]
%tmp4.i90 = icmp eq i32 %tmp2.i, 6 ; <i1> [#uses=1]
%or.cond173 = and i1 %toBoolnot.i87, %tmp4.i90 ; <i1> [#uses=1]
br i1 %or.cond173, label %bb4.i96, label %_ZN12...
Where it is "obvious" that when coming from %bb5.i117 that the 'and' is always
false. This triggers a surprisingly high number of times in the testsuite,
and gets us closer to generating good code for doug's strswitch testcase.
This also make a bunch of other code in jump threading redundant, I'll rip
out in the next patch. This survived an enable-checking llvm-gcc bootstrap.
llvm-svn: 86264
unsplittable critical edges, which means the introduction of
loops which cannot be transformed to LoopSimplify form. Fix
LoopSimplify to avoid transforming such loops into invalid
code.
llvm-svn: 86176
makes several optimization passes abort in cases where they're currently
silently miscompiling code.
Remove the indirectbr assertion from SplitEdge. Indirectbr is only
a problem for critical edges, and SplitEdge defers to SplitCriticalEdge
to handle those, and SplitCriticalEdge has its own assertion for
indirectbr.
llvm-svn: 86147
MallocInst-autoupgrade users use non-TargetData-computed allocation sizes.
Optimization uses use TargetData to compute the allocation size.
Now that malloc calls can have constant sizes, update isArrayMallocHelper() to use TargetData to determine the size of the malloced type and the size of malloced arrays.
Extend getMallocType() to support malloc calls that have non-bitcast uses.
Update OptimizeGlobalAddressOfMalloc() to optimize malloc calls that have non-bitcast uses. The bitcast use of a malloc call has to be treated specially here because the uses of the bitcast need to be replaced and the bitcast needs to be erased (just like the malloc call) for OptimizeGlobalAddressOfMalloc() to work correctly.
Update PerformHeapAllocSRoA() to optimize malloc calls that have non-bitcast uses. The bitcast use of the malloc is not handled specially here because ReplaceUsesOfMallocWithGlobal replaces through the bitcast use.
Update OptimizeOnceStoredGlobal() to not care about the malloc calls' bitcast use.
Update all globalopt malloc tests to not rely on autoupgraded-MallocInsts, but instead use explicit malloc calls with correct allocation sizes.
llvm-svn: 86077
to EmitGEPOffset.
Implement some new transforms for optimizing
subtracts of two pointer to ints into the same vector. This happens
for C++ iterator idioms for example, stringmap takes a const char*
that points to the start and end of a string. Once inlined, we want
the pointer difference to turn back into a length.
This is rdar://7362831.
llvm-svn: 86021
functions that don't have local linkage. Basically, we need to be more
careful about propagating argument information to functions whose results
we aren't tracking. This fixes a miscompilation of
LLVMCConfigurationEmitter.cpp when built with an llvm-gcc that has ipsccp
enabled.
llvm-svn: 85923
function to calls of that function, regardless of whether it has local
linkage or has its address taken. Not escaping should only affect
whether we make an aggressive assumption about the arguments to a
function, not whether we can track the result of it.
llvm-svn: 85795
a DenseMap. Doing this required being aware of subtle iterator
invalidation issues, but it provides a big speedup. In a
release-asserts build, this sped up optimizing 403.gcc from
1.34s -> 0.79s (IPSCCP) and 1.11s -> 0.44s (SCCP).
This commit also conflates in a bunch of general cleanups, sorry.
llvm-svn: 85788