This is similar to the -unroll-threshold option. There should be no change in
behavior when -tail-dup-size is not explicit on the llc command line.
llvm-svn: 124564
to do this and more, but would only do it if X/Y had only one use. Spotted as the
most common missed simplification in SPEC by my auto-simplifier, now that it knows
about nuw/nsw/exact flags. This removes a bunch of multiplications from 447.dealII
and 483.xalancbmk. It also removes a lot from tramp3d-v4, which results in much
more inlining.
llvm-svn: 124560
This happens all the time when a smul is promoted to a larger type.
On x86-64 we now compile "int test(int x) { return x/10; }" into
movslq %edi, %rax
imulq $1717986919, %rax, %rax
movq %rax, %rcx
shrq $63, %rcx
sarq $34, %rax <- used to be "shrq $32, %rax; sarl $2, %eax"
addl %ecx, %eax
This fires 96 times in gcc.c on x86-64.
llvm-svn: 124559
This happens e.g. for code like "X - X%10" where we lower the modulo operation
to a series of multiplies and shifts that are then subtracted from X, leading to
this missed optimization.
llvm-svn: 124532
Modified patch by Adam Preuss.
This builds on the existing framework for block tracing, edge profiling and optimal edge profiling.
See -help-hidden for new flags.
For documentation, see the technical report "Implementation of Path Profiling..." in llvm.org/pubs.
llvm-svn: 124515
benchmarks, and that it can be simplified to X/Y. (In general you can only
simplify (Z*Y)/Y to Z if the multiplication did not overflow; if Z has the
form "X/Y" then this is the case). This patch implements that transform and
moves some Div logic out of instcombine and into InstructionSimplify.
Unfortunately instcombine gets in the way somewhat, since it likes to change
(X/Y)*Y into X-(X rem Y), so I had to teach instcombine about this too.
Finally, thanks to the NSW/NUW flags, sometimes we know directly that "Z*Y"
does not overflow, because the flag says so, so I added that logic too. This
eliminates a bunch of divisions and subtractions in 447.dealII, and has good
effects on some other benchmarks too. It seems to have quite an effect on
tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions
changed, resulting in massive changes all over.
llvm-svn: 124487
rdar://problem/8893967: JM/lencod miscompile at -arch armv7 -mthumb -O3
Added ResurrectKill to remove kill flags after we decide to reused a
physical register. And (hopefully) ensure that we call it in all the
right places.
Sorry, I'm not checking in a unit test given that it's a miscompile I
can't reproduce easily with a toy example. Failures in the rewriter
depend on a series of heuristic decisions maked during one of the many
upstream phases in codegen. This case would require coercing regalloc
to generate a couple of rematerialzations in a way that causes the
scavenger to reuse the same register at just the wrong point.
The general way to test this is to implement kill flags
verification. Then we could have a simple, robust compile-only unit
test. That would be worth doing if the whole pass was not about to
disappear. At this point we focus verification work on the next
generation of regalloc.
llvm-svn: 124442