ilist of MachineInstr objects. This allows constant time removal and
insertion of MachineInstr instances from anywhere in each
MachineBasicBlock. It also allows for constant time splicing of
MachineInstrs into or out of MachineBasicBlocks.
llvm-svn: 11340
more of a testcase for profiling information than anything that should reasonably
be used, but it's a starting point. When I have more time I will whip this into
better shape.
llvm-svn: 11311
Having a proper 'select' instruction would allow the elimination of a lot
of the special case cruft in this patch, but we don't have one yet.
llvm-svn: 11307
in this for programs with lots of types (like the testcase in PR224).
The problem was that the type ID that the outer vector was using was not
very dense (as many types are getting resolved), so the vector is large
and gets reallocated a lot.
Since there are a lot of values in the program (the .ll file is 10M),
each reallocation has to copy the subvectors, which is also quite slow
(this wouldn't be a problem if C++ supported move semantics, but it
doesn't, at least not yet :(
Changing the outer data structure to a map speeds a release build of
llvm-as up from 11.21s to 5.13s on the testcase in PR224.
llvm-svn: 11244
this speeds up a release llvm-as from 21.95s to 11.21s, because before it
would do an expensive traversal of the type-graph of every type resolved.
llvm-svn: 11242
type at the same time, resolve the upreferences to each other before resolving
it to the outer type. This shaves off some time from the testcase in PR224, from
25.41s -> 21.72s.
llvm-svn: 11241
instead of randomly groping about inside its outEdges array.
Make SchedGraph::addDummyEdges() use getNumOutEdges() instead of
outEdges.size().
Get rid of ifdefed-out code in SchedGraph::buildGraph().
llvm-svn: 11238
consistent across the various type classes, we can factor out a LOT more
almost-identical code. Also, add a couple of temporary statistics.
llvm-svn: 11232
all of the ad-hoc storage of contained types. This allows getContainedType to
not be virtual, and allows us to entirely delete the TypeIterator class.
llvm-svn: 11230
contains the type we are looking for, just search the immediately used types.
We can only do this because we keep the "current" type in the nesting level
as we decrement upreferences.
This change speeds up the testcase in PR224 from 50.4s to 22.08s, not
too shabby.
llvm-svn: 11221
the Virt2PhysRegMap std::map with an std::vector. This speeds up the
register allocator another (almost) 40%, from .72->.45s in a release build
of LLC on 253.perlbmk.
llvm-svn: 11219
from physical registers, and they are always dense, it makes sense to not have
a ton of RBtree overhead. This change speeds up regalloclocal about ~30% on
253.perlbmk, from .35s -> .27s in the JIT (in LLC, it goes from .74 -> .55).
Now live variable analysis is the slowest codegen pass. Of course it doesn't
help that we have to run it twice, because regalloclocal doesn't update it,
but even if it did it would be the slowest pass (now it's just the 2x slowest
pass :(
llvm-svn: 11215
1. The "work" was not in the assert, so it was punishing the optimized release
2. getNamedFunction is _very_ expensive in large programs. It is not designed
to be used like this, and was taking 7% of the execution time of the code
generator on perlbmk.
Since the assert "can never fail", I'm just killing it.
llvm-svn: 11214
removeDeadNodes is called, only call it at the end of the pass being run.
This saves 1.3 seconds running DSA on 177.mesa (5.3->4.0s), which is
pretty big. This is only possible because of the automatic garbage
collection done on forwarding nodes.
llvm-svn: 11178
DSGraphs while they are forwarding. When the last reference to the forwarding
node is dropped, the forwarding node is autodeleted. This should simplify
removeTriviallyDead nodes, and is only (efficiently) possible because we are
using an ilist of dsnodes now.
llvm-svn: 11175
slots each. As a concequence they get numbered as 0, 2, 4 and so
on. The first slot is used for operand uses and the second for
defs. Here's an example:
0: A = ...
2: B = ...
4: C = A + B ;; last use of A
The live intervals should look like:
A = [1, 5)
B = [3, x)
C = [5, y)
llvm-svn: 11141
The problem is that the dominator update code didn't "realize" that it's
possible for the newly inserted basic block to dominate anything. Because
it IS possible, stuff was getting updated wrong.
llvm-svn: 11137
complete rewrite of load-vn will make it a bit faster. This changes speeds up
the gcse pass (which uses load-vn) from 25.45s to 0.42s on the testcase in
PR209.
I've also verified that this gives the exact same results as the old one.
llvm-svn: 11132
1. Don't scan to the end of alloca instructions in the caller function to
insert inlined allocas, just insert at the top. This saves a lot of
time inlining into functions with a lot of allocas.
2. Use splice to move the alloca instructions over, instead of remove/insert.
This allows us to transfer a block at a time, and eliminates a bunch of
silly symbol table manipulations.
This speeds up the inliner on the testcase in PR209 from 1.73s -> 1.04s (67%)
llvm-svn: 11118
and that basic block ends with a return instruction. In this case, we can just splice
the cloned "body" of the function directly into the source basic block, avoiding a lot
of rearrangement and splitBasicBlock's linear scan over the split block. This speeds up
the inliner on the testcase in PR209 from 2.3s to 1.7s, a 35% reduction.
llvm-svn: 11116
process. The only optimization we did so far is to avoid creating a
PHI node, then immediately destroying it in the common case where the
callee has one return statement. Instead, we just don't create the return
value. This has no noticable performance impact, but paves the way for
future improvements.
llvm-svn: 11108
to add the cloned block to. This allows the block to be added to the function
immediately, and all of the instructions to be immediately added to the function
symbol table, which speeds up the inliner from 3.7 -> 3.38s on the PR209.
llvm-svn: 11107
instead of a loop that is really inefficient with large basic blocks.
This speeds up the inliner pass on the testcase in PR209 from 13.8s to 2.24s
which still isn't exactly speedy, but is a lot better. :)
llvm-svn: 11105
Basically we store floating point values as their integral components, instead of relying
on the semantics of floating point < to differentiate between values. This is likely to
make the map search be faster anyway.
llvm-svn: 11064
registers (not as the max number of registers).
Change toSpill from a std::set into a std::vector<bool>.
Use the reverse iterator adapter to do a reverse scan of allocatable
registers.
llvm-svn: 11061
of a linear search to find the first range for comparisons. This cuts
down the linear scan register allocator running time by a factor of 3
in 254.perlbmk and by a factor of 2.2 in 176.gcc.
llvm-svn: 11030
FP_REG_KILL instructions at the end of blocks involved with critical edges.
Fix a bug where FP_REG_KILL instructions weren't inserted in fall through
unconditional branches. Perhaps this will fix some linscan problems?
llvm-svn: 11019
function to find the globals, iterate over all of the globals directly. This
speeds the function up from 14s to 6.3s on perlbmk, reducing DSA time from
53->46s.
llvm-svn: 10996
This reduces the number of nodes allocated, then immediately merged and DNE'd
from 2193852 to 1298049. unfortunately this only speeds DSA up by ~1.5s (of
53s), because it's spending most of its time waddling through the scalar map :(
llvm-svn: 10992
Also, use RC::merge when possible, reducing the number of nodes allocated, then immediately merged away from 2985444 to 2193852 on perlbmk.
llvm-svn: 10991
it to be off. If it looks like it's completely unnecessary after testing, I
will remove it completely (which is the hope).
* Callers of the DSNode "copy ctor" can not choose to not copy links.
* Make node collapsing not create a garbage node in some cases, avoiding a
memory allocation, and a subsequent DNE.
* When merging types, allow two functions of different types to be merged
without collapsing.
* Use DSNodeHandle::isNull more often instead of DSNodeHandle::getNode() == 0,
as it is much more efficient.
*** Implement the new, more efficient reachability cloner class
In addition to only cloning nodes that are reachable from interesting
roots, this also fixes the huge inefficiency we had where we cloned lots
of nodes, only to merge them away immediately after they were cloned.
Now we only actually allocate a node if there isn't one to merge it into.
* Eliminate the now-obsolete cloneReachable* and clonePartiallyInto methods
* Rewrite updateFromGlobalsGraph to use the reachability cloner
* Rewrite mergeInGraph to use the reachability cloner
* Disable the scalar map scanning code in removeTriviallyDeadNodes. In large
SCC's, this is extremely expensive. We need a better data structure for the
scalar map, because we really want to scan the unique node handles, not ALL
of the scalars.
* Remove the incorrect SANER_CODE_FOR_CHECKING_IF_ALL_REFERRERS_ARE_FROM_SCALARMAP code.
* Move the code for eliminating integer nodes from the trivially dead
eliminator to the dead node eliminator.
* removeDeadNodes no longer uses removeTriviallyDeadNodes, as it contains a
superset of the node removal power.
* Only futz around with the globals graph in removeDeadNodes if it is modified
llvm-svn: 10987
efficient in the case where a function calls into the same graph multiple times
(ie, it either contains multiple calls to the same function, or multiple calls
to functions in the same SCC graph)
llvm-svn: 10986
out that the problem was actually the writer writing out a 'null' value
because it didn't normalize it. This fixes:
test/Regression/Assembler/2004-01-22-FloatNormalization.ll
llvm-svn: 10967
is a move between two registers, at least one of the registers is
virtual and the two live intervals do not overlap.
This results in about 40% reduction in intervals, 30% decrease in the
register allocators running time and a 20% increase in peephole
optimizations (mainly move eliminations).
The option can be enabled by passing -join-liveintervals where
appropriate.
llvm-svn: 10965