just one level deep. On the testcase we go from getting this:
F1: ; preds = %T2
%F = and i1 true, %cond ; <i1> [#uses=1]
br i1 %F, label %X, label %Y
to a fully threaded:
F1: ; preds = %T2
br label %Y
This changes gets us to the point where we're forming (too many) switch
instructions on doug's strswitch testcase.
llvm-svn: 86646
except that the result may not be a constant. Switch jump threading to
use it so that it gets things like (X & 0) -> 0, which occur when phi preds
are deleted and the remaining phi pred was a zero.
llvm-svn: 86637
This patch forbids implicit conversion of DenseMap::const_iterator to
DenseMap::iterator which was possible because DenseMapIterator inherited
(publicly) from DenseMapConstIterator. Conversion the other way around is now
allowed as one may expect.
The template DenseMapConstIterator is removed and the template parameter
IsConst which specifies whether the iterator is constant is added to
DenseMapIterator.
Actually IsConst parameter is not necessary since the constness can be
determined from KeyT but this is not relevant to the fix and can be addressed
later.
Patch by Victor Zverovich!
llvm-svn: 86636
here:
1) We need to avoid processing sigma nodes as phi nodes for constraint generation.
2) We need to generate constraints for comparisons against constants properly.
This includes our first working ABCD test!
llvm-svn: 86498
graphs being produced. The cause was that we were incorrectly marking sigma instructions as
processed after handling the sigma-specific constraints for them, potentially neglecting to
process them as normal instructions as well.
Unfortunately, the testcase that inspired this still doesn't work because of a bug in the solver,
which is next on the list to debug.
llvm-svn: 86486
when both the source and dest are illegal types, since it would cause
the phi to grow (for example, we shouldn't transform test14b's phi to
a phi on i320). This fixes an infinite loop on i686 bootstrap with
phi slicing turned on, so turn it back on.
llvm-svn: 86483
not turn a PHI in a legal type into a PHI of an illegal type, and
add a new optimization that breaks up insane integer PHI nodes into
small pieces (PR3451).
llvm-svn: 86443
(eliminating some extends) if the new type of the
computation is legal or if both the source and dest
are illegal. This prevents instcombine from changing big
chains of computation into i64 on 32-bit targets for
example.
llvm-svn: 86398
predicates. This allows us to jump thread things like:
_ZN12StringSwitchI5ColorE4CaseILj7EEERS1_RAT__KcRKS0_.exit119:
%tmp1.i24166 = phi i8 [ 1, %bb5.i117 ], [ %tmp1.i24165, %_Z....exit ], [ %tmp1.i24165, %bb4.i114 ]
%toBoolnot.i87 = icmp eq i8 %tmp1.i24166, 0 ; <i1> [#uses=1]
%tmp4.i90 = icmp eq i32 %tmp2.i, 6 ; <i1> [#uses=1]
%or.cond173 = and i1 %toBoolnot.i87, %tmp4.i90 ; <i1> [#uses=1]
br i1 %or.cond173, label %bb4.i96, label %_ZN12...
Where it is "obvious" that when coming from %bb5.i117 that the 'and' is always
false. This triggers a surprisingly high number of times in the testsuite,
and gets us closer to generating good code for doug's strswitch testcase.
This also make a bunch of other code in jump threading redundant, I'll rip
out in the next patch. This survived an enable-checking llvm-gcc bootstrap.
llvm-svn: 86264
to EmitGEPOffset.
Implement some new transforms for optimizing
subtracts of two pointer to ints into the same vector. This happens
for C++ iterator idioms for example, stringmap takes a const char*
that points to the start and end of a string. Once inlined, we want
the pointer difference to turn back into a length.
This is rdar://7362831.
llvm-svn: 86021
functions that don't have local linkage. Basically, we need to be more
careful about propagating argument information to functions whose results
we aren't tracking. This fixes a miscompilation of
LLVMCConfigurationEmitter.cpp when built with an llvm-gcc that has ipsccp
enabled.
llvm-svn: 85923
function to calls of that function, regardless of whether it has local
linkage or has its address taken. Not escaping should only affect
whether we make an aggressive assumption about the arguments to a
function, not whether we can track the result of it.
llvm-svn: 85795
a DenseMap. Doing this required being aware of subtle iterator
invalidation issues, but it provides a big speedup. In a
release-asserts build, this sped up optimizing 403.gcc from
1.34s -> 0.79s (IPSCCP) and 1.11s -> 0.44s (SCCP).
This commit also conflates in a bunch of general cleanups, sorry.
llvm-svn: 85788
phis, it didn't preserve the alignment of the load. This is a missed
optimization of the alignment is high and a miscompilation when the
alignment is low.
llvm-svn: 85736
PHI operands by the predecessor order, sort them by the order used by the
first PHI in the block. This is still suffucient to expose duplicates.
llvm-svn: 85634
Checks on Demand algorithm which looks at arbitrary branches instead of loop
iterations. This is GSoC work by Andre Tavares with only editorial changes
applied!
llvm-svn: 85382
GEPs (more than one non-zero index) into simple GEPs (at most one
non-zero index). In some simple experiments using this it's not
uncommon to see 3% overall code size wins, because it exposes
redundancies that can be eliminated, however it's tricky to use
because instcombine aggressively undoes the work that this pass does.
llvm-svn: 85144
strides for now, because it doesn't handle them correctly. This fixes a
miscompile of SingleSource/Benchmarks/Misc-C++/ray.
This problem was usually hidden because indvars transforms such induction
variables into negations of canonical induction variables.
llvm-svn: 85118
used elsewhere - an exit block is a block outside the loop branched to
from within the loop. An exiting block is a block inside the loop that
branches out.
llvm-svn: 85019
Update all analysis passes and transforms to treat free calls just like FreeInst.
Remove RaiseAllocations and all its tests since FreeInst no longer needs to be raised.
llvm-svn: 84987
Analysis/ConstantFolding.cpp. This doesn't change the behavior of
instcombine but makes other clients of ConstantFoldInstruction
able to handle loads. This was partially extracted from Eli's patch
in PR3152.
llvm-svn: 84836
Most changes are cleanup, but there is 1 correctness fix:
I fixed InstCombine so that the icmp is removed only if the malloc call is removed (which requires explicit removal because the Worklist won't DCE any calls since they can have side-effects).
llvm-svn: 84772
"In the existing code, if the load and the value to replace it with are
of different types *and* target data is available, it tries to use the
target data to coerce the replacement value to the type of the load.
Otherwise, it skips all effort to handle the type mismatch and just
feeds the wrongly-typed replacement value to replaceAllUsesWith, which
triggers an assertion.
The patch replaces it with an outer if checking for type mismatch, and
an inner if-else that checks whether target data is available and, if
not, returns false rather than trying to replace the load."
Patch by Kenneth Uildriks!
llvm-svn: 84739
the estimated code size and the number of blocks when deciding whether to
do a non-trivial unswitch. This protects it from some very undesirable
worst-case behavior on large numbers of loop-unswitchable conditions, such
as in the testcase in PR5259.
llvm-svn: 84661
when the invoke had multiple return values: it set the lattice value only on the
extractvalue.
This caused the invoke's lattice value to remain the default (undefined), and
later propagated to extractvalue's operand, which incorrectly introduces
undefined behavior.
llvm-svn: 84637
Update testcases that rely on malloc insts being present.
Also prematurely remove MallocInst handling from IndMemRemoval and RaiseAllocations to help pass tests in this incremental step.
llvm-svn: 84292
don't bother every time going around the main worklist. This speeds up a
release-asserts opt -std-compile-opts on 403.gcc by about 4% (1.5s). It
seems to speed up the most expensive instances of instcombine by ~10%.
llvm-svn: 84171
instruction (which disqualifies stores, unreachable, etc) and at least the
first operand is a constant. This filters out a lot of obvious cases that
can't be folded. Also, switch the IRBuilder to a TargetFolder, which tries
harder.
llvm-svn: 84170
BasicBlocks, so that it doesn't blindly procede in the presence of
large individual BasicBlocks. This addresses a class of code-size
expansion problems.
llvm-svn: 83992
it to visit instructions from the start of the function to the
end of the function in the first path. This greatly speeds up
some pathological cases (e.g. PR5150).
Try #3, this time with some unneeded debug info stuff removed
which was causing dead pointers to be added to the worklist.
llvm-svn: 83818
it to visit instructions from the start of the function to the
end of the function in the first path. This greatly speeds up
some pathological cases (e.g. PR5150).
llvm-svn: 83814
into a shuffle even if it was used by another insertelement. If the
visitation order of instcombine was wrong, this would turn a chain of
insertelements into a chain of shufflevectors, which was quite painful.
Since CollectShuffleElements handles these cases, the code can just
be nuked.
llvm-svn: 83810
input the the mul is a zext from bool, just that it is all zeros
other than the low bit. This fixes some phase ordering issues
that would cause us to miss some xforms in mul.ll when the worklist
is visited differently.
llvm-svn: 83794
it to visit instructions from the start of the function to the
end of the function in the first path. This greatly speeds up
some pathological cases (e.g. PR5150).
llvm-svn: 83790
For now the metadata of sinked/hoisted instructions is still wrong, but that'll
be fixed when instructions will have debug metadata directly attached.
llvm-svn: 83786
done by condprop, but do it in a much more general form. The
basic idea is that we can do a limited form of tail duplication
in the case when we have a branch on a phi. Moving the branch
up in to the predecessor block makes instruction selection
much easier and encourages chained jump threadings.
llvm-svn: 83759
from GVN, this also speeds it up, inserts fewer PHI nodes (see the
testcase) and allows it to remove more loads (due to fewer PHI nodes
standing in the way).
llvm-svn: 83746
DemoteRegToStack. This makes it more efficient (because it isn't
creating a ton of load/stores that are eventually removed by a later
mem2reg), and more slightly more effective (because those load/stores
don't get in the way of threading).
llvm-svn: 83706
to declare that they preserve other passes without needing to pull in
additional header file or library dependencies. Convert MachineFunctionPass
and CodeGenLICM to make use of this.
llvm-svn: 83555
already on the worklist, and print Visited when an instruction is about to be
visited. Net, on one input, this reduced the output size by at least 9x.
llvm-svn: 83510
the new predicates I added) instead of going through a context and doing a
pointer comparison. Besides being cheaper, this allows a smart compiler
to turn the if sequence into a switch.
llvm-svn: 83297
that are phi nodes. Also tighten up FoldOpIntoPhi to treat constantexpr
operands to phis just like other variables, avoiding moving constantexpr
computations around.
Patch by Daniel Dunbar.
llvm-svn: 82913
from a piece of a large store when both are in the same block.
This allows clang to compile the testcase in PR4216 to this code:
_test_bitfield:
movl 4(%esp), %eax
movl %eax, %ecx
andl $-65536, %ecx
orl $32962, %eax
andl $40186, %eax
orl %ecx, %eax
ret
This is not ideal, but is a whole lot better than the code produced
by llvm-gcc:
_test_bitfield:
movw $-32574, %ax
orw 4(%esp), %ax
andw $-25350, %ax
movw %ax, 4(%esp)
movw 7(%esp), %cx
shlw $8, %cx
movzbl 6(%esp), %edx
orw %cx, %dx
movzwl %dx, %ecx
shll $16, %ecx
movzwl %ax, %eax
orl %ecx, %eax
ret
and dramatically better than that produced by gcc 4.2:
_test_bitfield:
pushl %ebx
call L3
"L00000000001$pb":
L3:
popl %ebx
movl 8(%esp), %eax
leal 0(,%eax,4), %edx
sarb $7, %dl
movl %eax, %ecx
andl $7168, %ecx
andl $-7201, %ebx
movzbl %dl, %edx
andl $1, %edx
sall $5, %edx
orl %ecx, %ebx
orl %edx, %ebx
andl $24, %eax
andl $-58336, %ebx
orl %eax, %ebx
orl $32962, %ebx
movl %ebx, %eax
popl %ebx
ret
llvm-svn: 82439
so that nonlocal and partially redundant loads can use it as well.
The testcase shows examples of craziness this can handle. This triggers
*many* times in 176.gcc.
llvm-svn: 82403
(and load -> load) when the base pointers must alias but when
they are different types. This occurs very very frequently in
176.gcc and other code that uses bitfields a lot.
llvm-svn: 82399
constants out of loops. These aren't covered by the regular LICM
pass, because in LLVM IR constants don't require separate
instructions. They're not always covered by the MachineLICM pass
either, because it doesn't know how to unfold folded constant-pool
loads. This is somewhat experimental at this point, and off by
default.
llvm-svn: 82076
more than one phi, since that leads to higher register pressure on
entry to the phi. This is especially problematic when the phi is in
a loop header, as it increases register pressure throughout the loop.
llvm-svn: 81993
loop exit edge -- new PHIs may be needed not only for the additional
splits that are made to preserve LoopSimplify form, but also for the
original split. Factor out the code that inserts new PHIs so that it
can be used for both. Remove LoopRotation.cpp's code for manually
updating LCSSA form, as it is now redundant. This fixes PR4934.
llvm-svn: 81363
that get created during loop unswitching, and fix SplitBlockPredecessors'
LCSSA updating code to create new PHIs instead of trying to just move
existing ones.
Also, optimize Loop::verifyLoop, since it gets called a lot. Use
searches on a sorted list of blocks instead of calling the "contains"
function, as is done in other places in the Loop class, since "contains"
does a linear search. Also, don't call verifyLoop from LoopSimplify or
LCSSA, as the PassManager is already calling verifyLoop as part of
LoopInfo's verifyAnalysis.
llvm-svn: 81221
extractelement operations into a bitcast of the pointer,
then a gep, then a scalar load. Disable this when the vector
only has one element, because it leads to infinite loops in
instcombine (PR4908).
This transformation seems like a really bad idea to me, as it
will likely disable CSE of vector load/stores etc and can be
better done in the code generator when profitable. This
goes all the way back to the first days of packed types,
r25299 specifically.
I'll let those people who care about the performance of vector
code decide what to do with this.
llvm-svn: 81185
- I think there are more instances of this, but I think they are fixed in Dan's
incoming patch. This one was preventing me from doing a bugpoint reduction
though.
llvm-svn: 81103
Constant uniquing tables. This allows distinct ConstantExpr objects
with the same operation and different flags.
Even though a ConstantExpr "a + b" is either always overflowing or
never overflowing (due to being a ConstantExpr), it's still necessary
to be able to represent it both with and without overflow flags at
the same time within the IR, because the safety of the flag may
depend on the context of the use. If the constant really does overflow,
it wouldn't ever be safe to use with the flag set, however the use
may be in code that is never actually executed.
This also makes it possible to merge all the flags tests into a single test.
llvm-svn: 80998
instead of a bool argument, and to do the dominator check itself.
This makes it eaiser to use when DominatorTree information is
available.
llvm-svn: 80920
simplifylibcalls optimization is thus valid for C++ but not C.
It's not important enough to worry about for C++ apps, so just
remove it.
rdar://7191924
llvm-svn: 80887