into phis. This is actually the same bug as PR2262 /
2008-04-29-VolatileLoadDontMerge.ll, but I missed checking the first
predecessor for multiple successors. Testcase here:
InstCombine/2008-07-08-VolatileLoadMerge.ll
llvm-svn: 53240
Move GetConstantStringInfo to lib/Analysis. Remove
string output routine from Constant. Update all
callers. Change debug intrinsic api slightly to
accomodate move of routine, these now return values
instead of strings.
This unbreaks llvm-gcc bootstrap.
llvm-svn: 52884
string output routine from Constant. Update all
callers. Change debug intrinsic api slightly to
accomodate move of routine, these now return values
instead of strings.
llvm-svn: 52748
I'm at it, rename it to FindInsertedValue.
The only functional change is that newly created instructions are no longer
added to instcombine's worklist, but that is not really necessary anyway (and
I'll commit some improvements next that will completely remove the need).
llvm-svn: 52315
out of instcombine into a new file in libanalysis. This also teaches
ComputeNumSignBits about the number of sign bits in a constantint.
llvm-svn: 51863
the conditions for performing the transform when only the
function declaration is available: no longer allow turning
i32 into i64 for example. Only allow changing between
pointer types, and between pointer types and integers of
the same size. For return values ptr -> intptr was already
allowed; I added ptr -> ptr and intptr -> ptr while there.
As shown by a recent objc testcase, changing the way
parameters/return values are passed can be fatal when calling
code written in assembler that directly manipulates call
arguments and return values unless the transform has no
impact on the way they are passed at the codegen level.
While it is possible to imagine an ABI that treats integers
of pointer size differently to pointers, I don't think LLVM
supports any so the transform should now be safe while still
being useful.
llvm-svn: 51834
Analysis/ConstantFolding to fold ConstantExpr's, then make instcombine use it
to try to use targetdata to fold constant expressions on void instructions.
Also extend the icmp(inttoptr, inttoptr) folding to handle the case where
int size != ptr size.
llvm-svn: 51559
type and the other operand is a constant into integer comparisons.
This happens surprisingly frequently (e.g. 10 times in 471.omnetpp),
which are things like this:
%tmp8283 = sitofp i32 %tmp82 to double
%tmp1013 = fcmp ult double %tmp8283, 0.0
Clearly comparing tmp82 against i32 0 is cheaper here.
this also triggers 8 times in gobmk, including this one:
%tmp375376 = sitofp i32 %tmp375 to double
%tmp377 = fcmp ogt double %tmp375376, 8.150000e+01
which is comparing an integer against 81.5 :).
llvm-svn: 51268
intersecting bits. This triggers all over the place, for example in lencode,
with adds of stuff like:
%tmp580 = mul i32 %tmp579, 2
%tmp582 = and i32 %b8, 1
and
%tmp28 = shl i32 %abs.i, 1
%sign.0 = select i1 %tmp23, i32 1, i32 0
and
%tmp344 = shl i32 %tmp343, 2
%tmp346 = and i32 %tmp96, 3
etc.
llvm-svn: 51263
is bitcast to return a floating point value. The result of the instruction may
not be used by the program afterwards, and LLVM will happily remove all
instructions except the call. But, on some platforms, if a value is returned as
a floating point, it may need to be removed from the stack (like x87). Thus, we
can't get rid of the bitcast even if there isn't a use of the value.
llvm-svn: 51134
ComputeMaskedBits knows about cttz, ctlz, and ctpop. Teach
SelectionDAG's ComputeMaskedBits what InstCombine's knows
about SRem. And teach them both some things about high bits
in Mul, UDiv, URem, and Sub. This allows instcombine and
dagcombine to eliminate sign-extension operations in
several new cases.
llvm-svn: 50358
getelementptr-seteq.ll into:
define i1 @test(i64 %X, %S* %P) {
%C = icmp eq i64 %X, -1 ; <i1> [#uses=1]
ret i1 %C
}
instead of:
define i1 @test(i64 %X, %S* %P) {
%A.idx.mask = and i64 %X, 4611686018427387903 ; <i64> [#uses=1]
%C = icmp eq i64 %A.idx.mask, 4611686018427387903 ; <i1> [#uses=1]
ret i1 %C
}
And fixes the second half of PR2235. This speeds up the insertion sort
case by 45%, from 1.12s to 0.77s. In practice, this will significantly
speed up for loops structured like:
for (double *P = Base + N; P != Base; --P)
...
Which happens frequently for C++ iterators.
llvm-svn: 50079
in addition to integer expressions. Rewrite GetOrEnforceKnownAlignment
as a ComputeMaskedBits problem, moving all of its special alignment
knowledge to ComputeMaskedBits as low-zero-bits knowledge.
Also, teach ComputeMaskedBits a few basic things about Mul and PHI
instructions.
This improves ComputeMaskedBits-based simplifications in a few cases,
but more noticeably it significantly improves instcombine's alignment
detection for loads, stores, and memory intrinsics.
llvm-svn: 49492
the type instead of the byte size. This was causing troublesome mis-compilations.
True to form, this took 2 days to find and is a one-line fix. :-P
llvm-svn: 48354
1. There is now a "PAListPtr" class, which is a smart pointer around
the underlying uniqued parameter attribute list object, and manages
its refcount. It is now impossible to mess up the refcount.
2. PAListPtr is now the main interface to the underlying object, and
the underlying object is now completely opaque.
3. Implementation details like SmallVector and FoldingSet are now no
longer part of the interface.
4. You can create a PAListPtr with an arbitrary sequence of
ParamAttrsWithIndex's, no need to make a SmallVector of a specific
size (you can just use an array or scalar or vector if you wish).
5. All the client code that had to check for a null pointer before
dereferencing the pointer is simplified to just access the
PAListPtr directly.
6. The interfaces for adding attrs to a list and removing them is a
bit simpler.
Phase #2 will rename some stuff (e.g. PAListPtr) and do other less
invasive changes.
llvm-svn: 48289
was incorrectly simplifying "x == (gep x, 1, i)" into false, even
though i could be negative. As it turns out, all the code to
handle this already existed, we just need to disable the incorrect
optimization case and let the general case handle it.
llvm-svn: 46739
drop attributes on varargs call arguments. Also, it could generate
invalid IR if the transformed call already had the 'nest' attribute
somewhere (this can never happen for code coming from llvm-gcc,
but it's a theoretical possibility). Fix both problems.
llvm-svn: 45973
a load/store of i64. The later prevents promotion/scalarrepl of the
source and dest in many cases.
This fixes the 300% performance regression of the byval stuff on
stepanov_v1p2.
llvm-svn: 45945
direct calls bails out unless caller and callee have essentially
equivalent parameter attributes. This is illogical - the callee's
attributes should be of no relevance here. Rework the logic, which
incidentally fixes a crash when removed arguments have attributes.
llvm-svn: 45658
a direct call with cast parameters and cast return
value (if any), instcombine was prepared to cast any
non-void return value into any other, whether castable
or not. Add a new predicate for testing whether casting
is valid, and check it both for the return value and
(as a cleanup) for the parameters.
llvm-svn: 45657
calls 'nounwind'. It is important for correct C++
exception handling that nounwind markings do not get
lost, so this transformation is actually needed for
correctness.
llvm-svn: 45218
calls. Remove special casing of inline asm from the
inliner. There is a potential problem: the verifier
rejects invokes of inline asm (not sure why). If an
asm call is not marked "nounwind" in some .ll, and
instcombine is not run, but the inliner is run, then
an illegal module will be created. This is bad but
I'm not sure what the best approach is. I'm tempted
to remove the check in the verifier...
llvm-svn: 45073
2. Using zero-extended value of Scale and unsigned division is safe provided
that Scale doesn't have the sign bit set.
Previously these 2 instructions:
%p = bitcast [100 x {i8,i8,i8}]* %x to i8*
%q = getelementptr i8* %p, i32 -4
were combined into:
%q = getelementptr [100 x { i8, i8, i8 }]* %x, i32 0,
i32 1431655764, i32 0
what was incorrect.
llvm-svn: 44936
the function type, instead they belong to functions
and function calls. This is an updated and slightly
corrected version of Reid Spencer's original patch.
The only known problem is that auto-upgrading of
bitcode files doesn't seem to work properly (see
test/Bitcode/AutoUpgradeIntrinsics.ll). Hopefully
a bitcode guru (who might that be? :) ) will fix it.
llvm-svn: 44359
trivial difference in function attributes, allow calls to it to
be converted to direct calls. Based on a patch by Török Edwin.
While there, move the various lists of mutually incompatible
parameters etc out of the verifier and into ParameterAttributes.h.
llvm-svn: 44315
The meaning of getTypeSize was not clear - clarifying it is important
now that we have x86 long double and arbitrary precision integers.
The issue with long double is that it requires 80 bits, and this is
not a multiple of its alignment. This gives a primitive type for
which getTypeSize differed from getABITypeSize. For arbitrary precision
integers it is even worse: there is the minimum number of bits needed to
hold the type (eg: 36 for an i36), the maximum number of bits that will
be overwriten when storing the type (40 bits for i36) and the ABI size
(i.e. the storage size rounded up to a multiple of the alignment; 64 bits
for i36).
This patch removes getTypeSize (not really - it is still there but
deprecated to allow for a gradual transition). Instead there is:
(1) getTypeSizeInBits - a number of bits that suffices to hold all
values of the type. For a primitive type, this is the minimum number
of bits. For an i36 this is 36 bits. For x86 long double it is 80.
This corresponds to gcc's TYPE_PRECISION.
(2) getTypeStoreSizeInBits - the maximum number of bits that is
written when storing the type (or read when reading it). For an
i36 this is 40 bits, for an x86 long double it is 80 bits. This
is the size alias analysis is interested in (getTypeStoreSize
returns the number of bytes). There doesn't seem to be anything
corresponding to this in gcc.
(3) getABITypeSizeInBits - this is getTypeStoreSizeInBits rounded
up to a multiple of the alignment. For an i36 this is 64, for an
x86 long double this is 96 or 128 depending on the OS. This is the
spacing between consecutive elements when you form an array out of
this type (getABITypeSize returns the number of bytes). This is
TYPE_SIZE in gcc.
Since successive elements in a SequentialType (arrays, pointers
and vectors) need to be aligned, the spacing between them will be
given by getABITypeSize. This means that the size of an array
is the length times the getABITypeSize. It also means that GEP
computations need to use getABITypeSize when computing offsets.
Furthermore, if an alloca allocates several elements at once then
these too need to be aligned, so the size of the alloca has to be
the number of elements multiplied by getABITypeSize. Logically
speaking this doesn't have to be the case when allocating just
one element, but it is simpler to also use getABITypeSize in this
case. So alloca's and mallocs should use getABITypeSize. Finally,
since gcc's only notion of size is that given by getABITypeSize, if
you want to output assembler etc the same as gcc then getABITypeSize
is the size you want.
Since a store will overwrite no more than getTypeStoreSize bytes,
and a read will read no more than that many bytes, this is the
notion of size appropriate for alias analysis calculations.
In this patch I have corrected all type size uses except some of
those in ScalarReplAggregates, lib/Codegen, lib/Target (the hard
cases). I will get around to auditing these too at some point,
but I could do with some help.
Finally, I made one change which I think wise but others might
consider pointless and suboptimal: in an unpacked struct the
amount of space allocated for a field is now given by the ABI
size rather than getTypeStoreSize. I did this because every
other place that reserves memory for a type (eg: alloca) now
uses getABITypeSize, and I didn't want to make an exception
for unpacked structs, i.e. I did it to make things more uniform.
This only effects structs containing long doubles and arbitrary
precision integers. If someone wants to pack these types more
tightly they can always use a packed struct.
llvm-svn: 43620
Fix DecomposeSimpleLinearExpr to handle simple constants better.
Don't nuke gep(bitcast(allocation)) if the bitcast(allocation) will
fold the allocation. This fixes PR1728 and Instcombine/malloc3.ll
llvm-svn: 42891
double from some of the many places in the optimizers
it appears, and do something reasonable with x86
long double.
Make APInt::dump() public, remove newline, use it to
dump ConstantSDNode's.
Allow APFloats in FoldingSet.
Expand X86 backend handling of long doubles (conversions
to/from int, mostly).
llvm-svn: 41967
Use APFloat in UpgradeParser and AsmParser.
Change all references to ConstantFP to use the
APFloat interface rather than double. Remove
the ConstantFP double interfaces.
Use APFloat functions for constant folding arithmetic
and comparisons.
(There are still way too many places APFloat is
just a wrapper around host float/double, but we're
getting there.)
llvm-svn: 41747
This also changes the syntax for llvm.bswap, llvm.part.set, llvm.part.select, and llvm.ct* intrinsics. They are automatically upgraded by both the LLVM ASM reader and the bitcode reader. The test cases have been updated, with special tests added to ensure the automatic upgrading is supported.
llvm-svn: 40807
First teach instcombine that sign bit checks only demand the
sign bit, this allows simplify demanded bits to hack on
expressions better.
Second, teach instcombine that ashr is useless if only the
sign bit is demanded.
llvm-svn: 39880
transformation. Also, keep track of which end of the integer interval overflows
occur on. This fixes Transforms/InstCombine/2007-06-21-DivCompareMiscomp.ll
and rdar://5278853, a miscompilation of perl.
llvm-svn: 37692