related bug fixes and corresponding assertions for uninitialized data
and missing NULL check. Test cases will be included with the new LFTR.
llvm-svn: 135333
BranchProbabilityInfo (expect setEdgeWeight which is not available here).
Branch Weights are kept in MachineBasicBlocks. To turn off this analysis
set -use-mbpi=false.
llvm-svn: 133184
Patch by: Jakub Staszak!
Introduces BranchProbability. Changes unsigned to uint32_t all over and
uint64_t only when overflow is expected.
llvm-svn: 132867
BranchProbabilityInfo provides an interface for IR passes to query the
likelihood that control follows a CFG edge. This patch provides an
initial implementation of static branch predication that will populate
BranchProbabilityInfo for branches with no external profile
information using very simple heuristics. It currently isn't hooked up
to any external profile data, so static prediction does all the work.
llvm-svn: 132613
queries in the case of a DAG, where a query reaches a node
visited earlier, but it's not on a cycle. This avoids
MayAlias results in cases where BasicAA is expected to
return MustAlias or PartialAlias in order to protect TBAA.
llvm-svn: 132609
No functionality enabled by default. Use -disable-iv-rewrite.
Extended IVUsers to keep track of the phi that represents the users' IV.
Added the WidenIV transform to replace a narrow IV with a wide IV
by doing a one-for-one replacement of IV users instead of expanding the
SCEV expressions. [sz]exts are removed and truncs are inserted.
llvm-svn: 131744
wider load would allow elimination of subsequent loads, and when the wider
load is still a native integer type. This eliminates a ton of loads on
various benchmarks involving struct fields, though it is somewhat hobbled
by clang not being very aggressive about field alignment.
This is yet another step along the way towards resolving PR6627.
llvm-svn: 130390
This worked untill now because stars are aligned (i.e. num of complex address elments are always 0 or 2+ and when it is 2+ at least two elements are access together)
llvm-svn: 130225
return it as a clobber. This allows GVN to do smart things.
Enhance GVN to be smart about the case when a small load is clobbered
by a larger overlapping load. In this case, forward the value. This
allows us to compile stuff like this:
int test(void *P) {
int tmp = *(unsigned int*)P;
return tmp+*((unsigned char*)P+1);
}
into:
_test: ## @test
movl (%rdi), %ecx
movzbl %ch, %eax
addl %ecx, %eax
ret
which has one load. We already handled the case where the smaller
load was from a must-aliased base pointer.
llvm-svn: 130180
For example, on 32-bit architecture, don't promote all uses of the IV
to 64-bits just because one use is a 64-bit cast.
Alternate implementation of the patch by Arnaud de Grandmaison.
llvm-svn: 127884
SCEV may generate expressions composed of multiple pointers, which can
lead to invalid GEP expansion. Until we can teach SCEV to follow strict
pointer rules, make sure no bad GEPs creep into IR.
Fixes rdar://problem/9038671.
llvm-svn: 127839
properties.
Added the self-wrap flag for SCEV::AddRecExpr.
A slew of temporary FIXMEs indicate the intention of the no-self-wrap flag
without changing behavior in this revision.
llvm-svn: 127590
Use 8 bits from line number field to keep track of argument ordering while encoding debug info for an argument. That leaves 24 bit for line no, DebugLoc also allocates 24 bit for line numbers. If a function has more than 255 arguments then rest of the arguments will be ordered by llvm.dbg.* intrinsics' ordering in IR.
llvm-svn: 126793
a) Making it a per call site bonus for functions that we can move from
indirect to direct calls.
b) Reduces the bonus from 500 to 100 per call site.
c) Subtracts the size of the possible newly inlineable call from the
bonus to only add a bonus if we can inline a small function to devirtualize
it.
Also changes the bonus from a positive that's subtracted to a negative
that's added.
Fixes the remainder of rdar://8546196 by reducing the object file size
after inlining by 84%.
llvm-svn: 124916
Modified patch by Adam Preuss.
This builds on the existing framework for block tracing, edge profiling and optimal edge profiling.
See -help-hidden for new flags.
For documentation, see the technical report "Implementation of Path Profiling..." in llvm.org/pubs.
llvm-svn: 124515
benchmarks, and that it can be simplified to X/Y. (In general you can only
simplify (Z*Y)/Y to Z if the multiplication did not overflow; if Z has the
form "X/Y" then this is the case). This patch implements that transform and
moves some Div logic out of instcombine and into InstructionSimplify.
Unfortunately instcombine gets in the way somewhat, since it likes to change
(X/Y)*Y into X-(X rem Y), so I had to teach instcombine about this too.
Finally, thanks to the NSW/NUW flags, sometimes we know directly that "Z*Y"
does not overflow, because the flag says so, so I added that logic too. This
eliminates a bunch of divisions and subtractions in 447.dealII, and has good
effects on some other benchmarks too. It seems to have quite an effect on
tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions
changed, resulting in massive changes all over.
llvm-svn: 124487
a few loops accordingly. Should be no functional change.
This is a step for more accurate cost/benefit analysis of devirt/inlining
bonuses.
llvm-svn: 124275
optimized code are:
(non-negative number)+(power-of-two) != 0 -> true
and
(x | 1) != 0 -> true
Instcombine knows about the second one of course, but only does it if X|1
has only one use. These fire thousands of times in the testsuite.
llvm-svn: 124183
with BasicAA's DecomposeGEPExpression, which recently began
using a TargetData. This fixes PR8968, though the testcase
is awkward to reduce.
Also, update several off GetUnderlyingObject's users
which happen to have a TargetData handy to pass it in.
llvm-svn: 124134
computation, the Ancestor field is always set to the Parent, so we can remove
the explicit link entirely and merge the Parent and Ancestor fields. Instead of
checking for whether an ancestor exists for a node or not, we simply check
whether the node has already been processed. This is simpler if Compress is
inlined into Eval, so I did that as well.
This is about a 3% speedup running -domtree on test-suite + SPEC2000 & SPEC2006,
but it also opens up some opportunities for further improvement.
llvm-svn: 124061
While there, I noticed that the transform "undef >>a X -> undef" was wrong.
For example if X is 2 then the top two bits must be equal, so the result can
not be anything. I fixed this in the constant folder as well. Also, I made
the transform for "X << undef" stronger: it now folds to undef always, even
though X might be zero. This is in accordance with the LangRef, but I must
admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef"
following the LangRef and the constant folder, likewise fairly aggressive.
llvm-svn: 123417
Add methods for accessing the (single) entry / exit edge of a region. If no such
edge exists, null is returned. Both accessors return the start block of the
corresponding edge. The edge can finally be formed by utilizing
Region::getEntry() or Region::getExit();
Contributed by: Andreas Simbuerger <simbuerg@fim.uni-passau.de>
llvm-svn: 123410
void f(int* begin, int* end) { std::fill(begin, end, 0); }
which turns into a != exit expression where one pointer is
strided and (thanks to step #1) known to not overflow, and
the other is loop invariant.
The observation here is that, though the IV is strided by
4 in this case, that the IV *has* to become equal to the
end value. It cannot "miss" the end value by stepping over
it, because if it did, the strided IV expression would
eventually wrap around.
Handle this by turning A != B into "A-B != 0" where the A-B
part is known to be NUW.
llvm-svn: 123131
a pointer value has potentially become escaping. Implementations can choose to either fall back to
conservative responses for that value, or may recompute their analysis to accomodate the change.
llvm-svn: 122777
update a callGraph when performing the common operation of splicing the body to
a new function and updating all callers (such as via RAUW).
No users yet, though this is intended for DeadArgumentElimination as part of
PR8887.
llvm-svn: 122728
compile, and everyone's tests have shown it to be slower in practice, even for
quite large graphs.
I also hope to do an optimization that is only correct with the simpler data
structure, which would break this even further.
llvm-svn: 122684
naively implemented, the Lengauer-Tarjan algorithm requires a separate bucket
for each vertex. However, this is unnecessary, because each vertex is only
placed into a single bucket (that of its semidominator), and each vertex's
bucket is processed before it is added to any bucket itself.
Instead of using a bucket per vertex, we use a single array Buckets that has two
purposes. Before the vertex V with DFS number i is processed, Buckets[i] stores
the index of the first element in V's bucket. After V's bucket is processed,
Buckets[i] stores the index of the next element in the bucket to which V now
belongs, if any.
Reading from the buckets can also be optimized. Instead of processing the bucket
of V's parent at the end of processing V, we process the bucket of V itself at
the beginning of processing V. This means that the case of the root vertex can
be simplified somewhat. It also means that we don't need to look up the DFS
number of the semidominator of every node in the bucket we are processing,
since we know it is the current index being processed.
This is a 6.5% speedup running -domtree on test-suite + SPEC2000/2006, with
larger speedups of around 12% on the larger benchmarks like GCC.
llvm-svn: 122680
AliasAnalysis consumers, PartialAlias will be treated as MayAlias.
For AliasAnalysis chaining, MayAlias says "procede to the next analysis".
PartialAlias will be used to indicate that the query should terminate,
even though it didn't reach MustAlias or NoAlias.
llvm-svn: 121507
memcpy's like:
memcpy(A, B)
memcpy(A, C)
we cannot delete the first memcpy as dead if A and C might be aliases.
If so, we actually get:
memcpy(A, B)
memcpy(A, A)
which is not correct to transform into:
memcpy(A, A)
This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks
Jakub!
llvm-svn: 120974
about pairs of AA::Location's instead of looking for MemDep's
"Def" predicate. This is more powerful and general, handling
memset/memcpy/store all uniformly, and implementing PR8701 and
probably obsoleting parts of memcpyoptimizer.
This also fixes an obscure bug with init.trampoline and i8
stores, but I'm not surprised it hasn't been hit yet. Enhancing
init.trampoline to carry the size that it stores would allow
DSE to be much more aggressive about optimizing them.
llvm-svn: 120406
are constant. There was in fact one exception to this (phi nodes) - so
remove that exception (InstructionSimplify handles this so there should
be no loss).
llvm-svn: 120015
destination location of a memcpy/memmove. I'm not clear about whether
TBAA works on these, so I'm leaving it out for now. Dan, please revisit
this when convenient.
llvm-svn: 119928
preserves LCSSA form out of ScalarEvolution and into the LoopInfo
class. Use it to check that SimplifyInstruction simplifications
are not breaking LCSSA form. Fixes PR8622.
llvm-svn: 119727
instructions out of InstCombine and into InstructionSimplify. While
there, introduce an m_AllOnes pattern to simplify matching with integers
and vectors with all bits equal to one.
llvm-svn: 119536
simplified to itself (this can only happen in unreachable blocks).
Change it to return null instead. Hopefully this will fix some
buildbot failures.
llvm-svn: 119490
over a phi node by applying it to each operand may be wrong if the
operation and the phi node are mutually interdependent (the testcase
has a simple example of this). So only do this transform if it would
be correct to perform the operation in each predecessor of the block
containing the phi, i.e. if the other operands all dominate the phi.
This should fix the FFMPEG snow.c regression reported by İsmail Dönmez.
llvm-svn: 119347
references. For example, this allows gvn to eliminate the load in
this example:
void foo(int n, int* p, int *q) {
p[0] = 0;
p[1] = 1;
if (n) {
*q = p[0];
}
}
llvm-svn: 118714
to optionally look for constant or local (alloca) memory.
Teach BasicAliasAnalysis::pointsToConstantMemory to look through Select
and Phi nodes, and to support looking for local memory.
Remove FunctionAttrs' PointsToLocalOrConstantMemory function, now that
AliasAnalysis knows all the tricks that it knew.
llvm-svn: 118412
To create debugging information for a pointer, using DIBUilder front-end just needs
DBuilder.CreatePointerType(Ty, Size);
instead of
DebugFactory.CreateDerivedType(llvm::dwarf::DW_TAG_pointer_type,
TheCU, "", getOrCreateMainFile(),
0, Size, 0, 0, 0, OCTy);
llvm-svn: 118248
A RegionPass is executed like a LoopPass but on the regions detected by the
RegionInfo pass instead of the loops detected by the LoopInfo pass.
llvm-svn: 116905
must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize
the pass's dependencies.
Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.
I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems
with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass
registration/creation, please send the testcase to me directly.
llvm-svn: 116820
getExpandedRegion() enables us to create non canonical regions. Those regions
can be used to define the largerst region, that fullfills a certain property.
llvm-svn: 116397
more careful not to call SimplifyInstructionsInBlock() on an unreachable block, the issue has been fixed at a higher level. Add
a big warning to SimplifyInstructionsInBlock() to hopefully prevent this in the future.
llvm-svn: 114117
isn't a good level of abstraction for memdep. Instead, generalize
AliasAnalysis::alias and related interfaces with a new Location
class for describing a memory location. For now, this is the same
Pointer and Size as before, plus an additional field for a TBAA tag.
Also, introduce a fixed MD_tbaa metadata tag kind.
llvm-svn: 113858
not unrolling loops that contain calls that would be better off getting inlined. This mostly
comes up when an interleaved devirtualization pass has devirtualized a call which the inliner
will inline on a future pass. Thus, rather than blocking all loops containing calls, add
a metric for "inline candidate calls" and block loops containing those instead.
llvm-svn: 113535
don't use any InlineCostAnalyzer state, and are useful for other clients who don't necessarily want to use
all of InlineCostAnalyzer's logic, some of which is fairly inlining-specific.
No intended functionality change.
llvm-svn: 113499
AliasAnalysis, and some code for implementing the new query on top of
existing implementations by making standard alias and getModRefInfo
queries.
llvm-svn: 113329
Loop::hasLoopInvariantOperands method. Remove
a useless and confusing Loop::isLoopInvariant(Instruction)
method, which didn't do what you thought it did.
No functionality change.
llvm-svn: 113133
assertingvh so we get a violent explosion if the pointer dangles.
2) Fix AliasSetTracker::deleteValue to remove call sites with
by-pointer comparisons instead of by-alias queries. Using
findAliasSetForCallSite can cause alias sets to get merged
when they shouldn't, and can also miss alias sets when the
call is readonly.
#2 fixes PR6889, which only repros with a .c file :(
llvm-svn: 112452
not part of the IR, are not uniqued, and may be safely RAUW'd.
This replaces a variety of alternate mechanisms for achieving
the same effect.
llvm-svn: 111681
expression being loop invariant is not equivalent to the property of
properly dominating the loop header.
Other optimizations have also made this optimization less important.
llvm-svn: 111160