gep to explicit addressing, we know that none of the intermediate
computation overflows.
This could use review: it seems that the shifts certainly wouldn't
overflow, but could the intermediate adds overflow if there is a
negative index?
Previously the testcase would instcombine to:
define i1 @test(i64 %i) {
%p1.idx.mask = and i64 %i, 4611686018427387903
%cmp = icmp eq i64 %p1.idx.mask, 1000
ret i1 %cmp
}
now we get:
define i1 @test(i64 %i) {
%cmp = icmp eq i64 %i, 1000
ret i1 %cmp
}
llvm-svn: 125271
exact/nsw/nuw shifts and have instcombine infer them when it can prove
that the relevant properties are true for a given shift without them.
Also, a variety of refactoring to use the new patternmatch logic thrown
in for good luck. I believe that this takes care of a bunch of related
code quality issues attached to PR8862.
llvm-svn: 125267
optimizations to be much more aggressive in the face of
exact/nsw/nuw div and shifts. For example, these (which
are the same except the first is 'exact' sdiv:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%A = sdiv exact i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%A = sdiv i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
compile down to:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%1 = icmp eq i64 %X, 0
ret i1 %1
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%X.off = add i64 %X, 4
%1 = icmp ult i64 %X.off, 9
ret i1 %1
}
This happens when you do something like:
(ptr1-ptr2) == 42
where the pointers are pointers to non-unit types.
llvm-svn: 125266
and generally tidying things up. Only very trivial functionality changes
like now doing (-1 - A) -> (~A) for vectors too.
InstCombineAddSub.cpp | 296 +++++++++++++++++++++-----------------------------
1 file changed, 126 insertions(+), 170 deletions(-)
llvm-svn: 125264
Natural Loop Information
Loop Pass Manager
Canonicalize natural loops
Scalar Evolution Analysis
Loop Pass Manager
Induction Variable Users
Canonicalize natural loops
Induction Variable Users
Loop Strength Reduction
into this:
Scalar Evolution Analysis
Loop Pass Manager
Canonicalize natural loops
Induction Variable Users
Loop Strength Reduction
This fixes <rdar://problem/8869639>. I also filed PR9184 on doing this sort of
thing automatically, but it seems easier to just change the ordering of the
passes if this is the only case.
llvm-svn: 125254
versions of creation functions. Eventually, the "insertion point" versions
of these should just be removed, we do have IRBuilder afterall.
Do a massive rewrite of much of pattern match. It is now shorter and less
redundant and has several other widgets I will be using in other patches.
Among other changes, m_Div is renamed to m_IDiv (since it only matches
integer divides) and m_Shift is gone (it used to match all binops!!) and
we now have m_LogicalShift for the one client to use.
Enhance IRBuilder to have "isExact" arguments to things like CreateUDiv
and reduce redundancy within IRbuilder by having these methods chain to
each other more instead of duplicating code.
llvm-svn: 125194
could end up removing a different function than we intended because it was
functionally equivalent, then end up with a comparison of a function against
itself in the next round of comparisons (the one in the function set and the
one on the deferred list). To fix this, I introduce a choice in the form of
comparison for ComparableFunctions, either normal or "pointer only" used to
find exact Function*'s in lookups.
Also add some debugging statements.
llvm-svn: 125180
the active loop. This is generally desirable, and it avoids trouble
in situations such as the testcase in PR9123, though the failure
mode depends on use-list order, so it is infeasible to test.
llvm-svn: 125065
This makes the job of the later optzn passes easier, allowing the vast amount of
icmp transforms to chew on it.
We transform 840 switches in gcc.c, leading to a 16k byte shrink of the resulting
binary on i386-linux.
The testcase from README.txt now compiles into
decl %edi
cmpl $3, %edi
sbbl %eax, %eax
andl $1, %eax
ret
llvm-svn: 124724
that might have changed been affected by a merge elsewhere will have been
removed from the function set, and it isn't needed for performance because we
call grow() ahead of time to prevent reallocations.
llvm-svn: 124717
Modified patch by Adam Preuss.
This builds on the existing framework for block tracing, edge profiling and optimal edge profiling.
See -help-hidden for new flags.
For documentation, see the technical report "Implementation of Path Profiling..." in llvm.org/pubs.
llvm-svn: 124515
benchmarks, and that it can be simplified to X/Y. (In general you can only
simplify (Z*Y)/Y to Z if the multiplication did not overflow; if Z has the
form "X/Y" then this is the case). This patch implements that transform and
moves some Div logic out of instcombine and into InstructionSimplify.
Unfortunately instcombine gets in the way somewhat, since it likes to change
(X/Y)*Y into X-(X rem Y), so I had to teach instcombine about this too.
Finally, thanks to the NSW/NUW flags, sometimes we know directly that "Z*Y"
does not overflow, because the flag says so, so I added that logic too. This
eliminates a bunch of divisions and subtractions in 447.dealII, and has good
effects on some other benchmarks too. It seems to have quite an effect on
tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions
changed, resulting in massive changes all over.
llvm-svn: 124487