Analysis/ConstantFolding.cpp. This doesn't change the behavior of
instcombine but makes other clients of ConstantFoldInstruction
able to handle loads. This was partially extracted from Eli's patch
in PR3152.
llvm-svn: 84836
Most changes are cleanup, but there is one correctness fix:
I fixed InstCombine so that the icmp is removed only if the malloc call
is also removed. The malloc has to be erased explicitly, because the
worklist won't DCE any calls, since they can have side-effects.
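A minimal sketch of that ordering, using illustrative InstCombine-style
helper names rather than the exact code:

// The compare is folded away only together with the malloc. The worklist
// never DCEs calls, since they can have side-effects, so the dead malloc
// must be erased explicitly rather than left for later cleanup.
ReplaceInstUsesWith(*ICmp, Replacement);
EraseInstFromFunction(*ICmp);
EraseInstFromFunction(*MallocCall);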
llvm-svn: 84772
"In the existing code, if the load and the value to replace it with are
of different types *and* target data is available, it tries to use the
target data to coerce the replacement value to the type of the load.
Otherwise, it skips all effort to handle the type mismatch and just
feeds the wrongly-typed replacement value to replaceAllUsesWith, which
triggers an assertion.
The patch replaces it with an outer if checking for type mismatch, and
an inner if-else that checks whether target data is available and, if
not, returns false rather than trying to replace the load."
Patch by Kenneth Uildriks!
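A minimal sketch of the resulting control flow, with illustrative names
standing in for the actual ones:

// Outer if: only act when the load and its replacement value disagree
// on type.
if (StoredVal->getType() != L->getType()) {
  if (TD)    // target data available: coerce the value to the load's type
    StoredVal = CoerceValueToLoadType(StoredVal, L->getType(), L, *TD);
  else       // no target data: return false instead of asserting in RAUW
    return false;
}
L->replaceAllUsesWith(StoredVal);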
llvm-svn: 84739
the estimated code size and the number of blocks when deciding whether to
do a non-trivial unswitch. This protects it from some very undesirable
worst-case behavior on large numbers of loop-unswitchable conditions, such
as in the testcase in PR5259.
llvm-svn: 84661
When an incoming value for a PHI is updated, we must also update all other
incoming values for the same BB to match, otherwise we create invalid PHIs.
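A minimal sketch of the fix, assuming a PHINode *PN whose incoming value
for BasicBlock *BB is being changed to NewV:

// BB may appear as a predecessor more than once; every one of its
// slots must carry the same incoming value or the PHI becomes invalid.
for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)
  if (PN->getIncomingBlock(i) == BB)
    PN->setIncomingValue(i, NewV);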
llvm-svn: 84638
when the invoke had multiple return values: it set the lattice value only on the
extractvalue.
This caused the invoke's lattice value to remain the default (undefined),
which was later propagated to the extractvalue's operand, incorrectly
introducing undefined behavior.
llvm-svn: 84637
where a loop's header is being split and it has predecessors which are not
contained by the innermost loop that contains the loop.
This fixes PR5235.
llvm-svn: 84505
Update testcases that rely on malloc insts being present.
Also prematurely remove MallocInst handling from IndMemRemoval and RaiseAllocations to help pass tests in this incremental step.
llvm-svn: 84292
identifying the malloc as a non-array malloc. This broke GlobalOpt's optimization of stores of mallocs
to global variables.
The fix is to classify mallocs into 3 categories:
1. non-array mallocs
2. array mallocs whose array size can be determined
3. mallocs that cannot be determined to be of type 1 or 2 and cannot be optimized
getMallocArraySize() returns NULL for category 3, so every caller must
skip its malloc optimization when this function returns NULL (see the
sketch below).
Eventually, isArrayMalloc() and getMallocArraySize() will be taught to
handle size-argument code they do not currently expect, extending the
malloc optimizations to those examples.
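A minimal sketch of the contract, assuming a CallInst *CI already known
to be a malloc call (the remaining getMallocArraySize() parameters are
elided):

if (Value *ArraySize = getMallocArraySize(CI, /* ... */)) {
  // Categories 1 and 2: the array size is known; the malloc
  // optimization may proceed using ArraySize.
} else {
  // Category 3: NULL means the size could not be determined, so the
  // optimization must be skipped.
}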
llvm-svn: 84199
don't bother every time going around the main worklist. This speeds up a
release-asserts opt -std-compile-opts on 403.gcc by about 4% (1.5s). It
seems to speed up the most expensive instances of instcombine by ~10%.
llvm-svn: 84171
instruction (which disqualifies stores, unreachable, etc.) and at least the
first operand is a constant. This filters out a lot of obvious cases that
can't be folded. Also, switch the IRBuilder to a TargetFolder, which tries
harder.
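A rough sketch of the early-out, with illustrative structure rather than
the exact code:

// Bail out quickly on instructions that can never fold: stores,
// unreachable, and other "abnormal" instructions produce no foldable
// value.
if (isa<StoreInst>(I) || isa<TerminatorInst>(I))
  return 0;
// Require at least the first operand to be a constant before trying
// anything expensive.
if (I->getNumOperands() == 0 || !isa<Constant>(I->getOperand(0)))
  return 0;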
llvm-svn: 84170
BasicBlocks, so that it doesn't blindly proceed in the presence of
large individual BasicBlocks. This addresses a class of code-size
expansion problems.
llvm-svn: 83992
it to visit instructions from the start of the function to the
end of the function in the first path. This greatly speeds up
some pathological cases (e.g. PR5150).
Try #3, this time with some unneeded debug info stuff removed
which was causing dead pointers to be added to the worklist.
llvm-svn: 83818
it to visit instructions from the start of the function to the
end of the function in the first path. This greatly speeds up
some pathological cases (e.g. PR5150).
llvm-svn: 83814
into a shuffle even if it was used by another insertelement. If the
visitation order of instcombine was wrong, this would turn a chain of
insertelements into a chain of shufflevectors, which was quite painful.
Since CollectShuffleElements handles these cases, the code can just
be nuked.
llvm-svn: 83810
input to the mul is a zext from bool, just that it is all zeros
other than the low bit. This fixes some phase ordering issues
that would cause us to miss some xforms in mul.ll when the worklist
is visited differently.
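A minimal sketch of the relaxed condition, assuming ValueTracking's
MaskedValueIsZero():

// Accept any operand whose bits above the lowest are known zero, not
// just a literal zext from i1.
unsigned BitWidth = Op->getType()->getPrimitiveSizeInBits();
APInt Mask(APInt::getHighBitsSet(BitWidth, BitWidth - 1));
if (MaskedValueIsZero(Op, Mask)) {
  // Op is known to be 0 or 1, exactly as if it were zext(bool); the
  // same mul transforms apply.
}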
llvm-svn: 83794
it to visit instructions from the start of the function to the
end of the function in the first path. This greatly speeds up
some pathological cases (e.g. PR5150).
llvm-svn: 83790
For now the metadata of sunk/hoisted instructions is still wrong, but that'll
be fixed once instructions have debug metadata directly attached.
llvm-svn: 83786
done by condprop, but do it in a much more general form. The
basic idea is that we can do a limited form of tail duplication
in the case when we have a branch on a phi. Moving the branch
up into the predecessor block makes instruction selection
much easier and encourages chained jump threading.
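A minimal sketch of spotting the threadable pattern (the retargeting and
duplication themselves are elided):

// Look for a conditional branch whose condition is a phi defined in
// this very block.
BranchInst *BI = dyn_cast<BranchInst>(BB->getTerminator());
if (BI && BI->isConditional())
  if (PHINode *PN = dyn_cast<PHINode>(BI->getCondition()))
    if (PN->getParent() == BB) {
      // Any predecessor feeding a constant into PN already knows which
      // way the branch goes: point that predecessor straight at the
      // correct successor, duplicating BB's other instructions into it.
    }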
llvm-svn: 83759
from GVN, this also speeds it up, inserts fewer PHI nodes (see the
testcase) and allows it to remove more loads (due to fewer PHI nodes
standing in the way).
llvm-svn: 83746
DemoteRegToStack. This makes it more efficient (because it isn't
creating a ton of load/stores that are eventually removed by a later
mem2reg), and slightly more effective (because those load/stores
don't get in the way of threading).
llvm-svn: 83706
and that will make Caller too big to inline, see if it
might be better to inline Caller into its callers instead.
This situation is described in PR2973, although I haven't
tried the specific case in SPASS.
llvm-svn: 83602
to declare that they preserve other passes without needing to pull in
additional header-file or library dependencies. Convert MachineFunctionPass
and CodeGenLICM to make use of this.
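A hypothetical sketch, assuming the string-taking overload this adds,
where the preserved pass is named by its command-line argument:

virtual void getAnalysisUsage(AnalysisUsage &AU) const {
  // No #include of the preserved pass's header and no link-time
  // dependency on its ID symbol; the name is resolved at runtime.
  AU.addPreserved("domtree");
  AU.setPreservesCFG();
}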
llvm-svn: 83555
already on the worklist, and print Visited when an instruction is about to be
visited. Net, on one input, this reduced the output size by at least 9x.
llvm-svn: 83510
the new predicates I added) instead of going through a context and doing a
pointer comparison. Besides being cheaper, this allows a smart compiler
to turn the if sequence into a switch.
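A minimal sketch of the difference, assuming predicates in the style of
isFloatTy():

// New form: typeID predicates, one cheap comparison each, which a smart
// compiler can merge into a switch.
bool IsFP = Ty->isFloatTy() || Ty->isDoubleTy();
// Old form: materialize singleton types through the context and compare
// pointers.
bool WasFP = Ty == Type::getFloatTy(Ctx) || Ty == Type::getDoubleTy(Ctx);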
llvm-svn: 83297
that are phi nodes. Also tighten up FoldOpIntoPhi to treat constantexpr
operands to phis just like other variables, avoiding moving constantexpr
computations around.
Patch by Daniel Dunbar.
llvm-svn: 82913
from a piece of a large store when both are in the same block.
This allows clang to compile the testcase in PR4216 to this code:
_test_bitfield:
movl 4(%esp), %eax
movl %eax, %ecx
andl $-65536, %ecx
orl $32962, %eax
andl $40186, %eax
orl %ecx, %eax
ret
This is not ideal, but is a whole lot better than the code produced
by llvm-gcc:
_test_bitfield:
movw $-32574, %ax
orw 4(%esp), %ax
andw $-25350, %ax
movw %ax, 4(%esp)
movw 7(%esp), %cx
shlw $8, %cx
movzbl 6(%esp), %edx
orw %cx, %dx
movzwl %dx, %ecx
shll $16, %ecx
movzwl %ax, %eax
orl %ecx, %eax
ret
and dramatically better than that produced by gcc 4.2:
_test_bitfield:
pushl %ebx
call L3
"L00000000001$pb":
L3:
popl %ebx
movl 8(%esp), %eax
leal 0(,%eax,4), %edx
sarb $7, %dl
movl %eax, %ecx
andl $7168, %ecx
andl $-7201, %ebx
movzbl %dl, %edx
andl $1, %edx
sall $5, %edx
orl %ecx, %ebx
orl %edx, %ebx
andl $24, %eax
andl $-58336, %ebx
orl %eax, %ebx
orl $32962, %ebx
movl %ebx, %eax
popl %ebx
ret
llvm-svn: 82439
so that nonlocal and partially redundant loads can use it as well.
The testcase shows examples of craziness this can handle. This triggers
*many* times in 176.gcc.
llvm-svn: 82403
(and load -> load) when the base pointers must alias but are of
different types. This occurs very, very frequently in
176.gcc and other code that uses bitfields a lot.
llvm-svn: 82399
In getMallocArraySize(), fix a bug in the case where the array size is the product of 2 constants.
Extend isArrayMalloc() and getMallocArraySize() to handle the case where a malloc is used as a char array (see the sketch below).
Ensure that ArraySize in LowerAllocations::runOnBasicBlock() is the correct type.
Extend Instruction::isSafeToSpeculativelyExecute() to handle malloc calls.
Add verification for malloc calls.
Reviewed by Dan Gohman.
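Hypothetical allocations (not from the testsuite) showing the size
arguments these helpers must now decode:

#include <cstdlib>

int main() {
  // Size is the product of 2 constants: an array malloc of 100 ints.
  int *a = (int *)malloc(sizeof(int) * 100);
  // Malloc used directly as a char array: the element size is 1 byte.
  char *b = (char *)malloc(42);
  free(a);
  free(b);
  return 0;
}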
llvm-svn: 82257
constants out of loops. These aren't covered by the regular LICM
pass, because in LLVM IR constants don't require separate
instructions. They're not always covered by the MachineLICM pass
either, because it doesn't know how to unfold folded constant-pool
loads. This is somewhat experimental at this point, and off by
default.
llvm-svn: 82076
more than one phi, since that leads to higher register pressure on
entry to the phi. This is especially problematic when the phi is in
a loop header, as it increases register pressure throughout the loop.
llvm-svn: 81993
argpromote to avoid invalidating an iterator. This fixes PR4977.
All clang tests now pass with expensive checking (on my system
at least).
llvm-svn: 81843
within the notional bounds of the static type of the getelementptr (which
is not the same as "inbounds") from GlobalOpt into a utility routine,
and use it in ConstantFold.cpp to check whether there are any mis-behaved
indices.
llvm-svn: 81478
loop exit edge -- new PHIs may be needed not only for the additional
splits that are made to preserve LoopSimplify form, but also for the
original split. Factor out the code that inserts new PHIs so that it
can be used for both. Remove LoopRotation.cpp's code for manually
updating LCSSA form, as it is now redundant. This fixes PR4934.
llvm-svn: 81363