PHI node entries for unwind instructions just like for call instructions which
became invokes! This fixes PR57, tested by
Inline/2003-10-26-InlineInvokeExceptionDestPhi.ll
llvm-svn: 9526
Prototype option to save state in a global instead of as a Constant in
the Module. (Turned off, for now, with the on/off switch welded in the off
position. You get the idea.)
llvm-svn: 9500
CurFrame, TraceMode, and the CachedWriter are history.
The ExecutionAnnotations (SlotNumber, InstNumber, and FunctionInfo) are history.
ExecutionContext now keeps Values for each stack frame in a std::map.
printValue() and print() are history.
executeInstruction() is now part of run().
llvm-svn: 9493
CurFrame, TraceMode, and the CachedWriter are history.
ArrayChecksEnabled and non-QuietMode are history.
The ExecutionAnnotations (SlotNumber, InstNumber, and FunctionInfo) are history.
ExecutionContext now keeps Values for each stack frame in a std::map.
Stop pre-initializing Values on the stack to 42.
Remove some dead variables, excess whitespace and commented-out code.
executeInstruction() is now part of run().
printValue() and print() are history.
llvm-svn: 9489
recompile and relink. This keeps it from failing an assertion when
it goes and tries to construct a new MachineFunction for that Function.
llvm-svn: 9459
Make FnAllocState contain vectors of AllocInfo, instead of LLVM Constants.
Give doFinalization a method comment, and let it do the work of converting
AllocInfos to LLVM Constants.
llvm-svn: 9451
* Make file description more readable
* Make code layout more consistent, include comment in assert so it's visible
during execution if it hits
llvm-svn: 9430
as well as arguments. Now it can delete arguments and return values which are
only passed into other arguments or are returned, if they are dead. This causes
it to delete several hundred extra args/retvals from the C++ hello world program,
shrinking it by about 2K.
llvm-svn: 9398
Try to improve method comments a little.
Get rid of some excess whitespace; put braces on previous line when possible.
Add stub for method to verify the work of saveState().
llvm-svn: 9385
* It's no longer a BasicBlock pass: update comment on run() method
* Fix placement of braces to be consistent
* Delete extraneous whitespace
llvm-svn: 9361
the module. This change converts it from being a basic block pass to being
a simple pass. This allows elimination of the annotation and simplification
of the logic for moving constants into global variables.
llvm-svn: 9320
C is a constant which can be sign-extended from 8 bits without value loss,
and op is one of: add, sub, imul, and, or, xor.
This allows the JIT to emit the one byte version of the constant instead of
the two or 4 byte version. Because these instructions are very common, this
can save a LOT of code space. For example, I sampled two benchmarks, 176.gcc
and 254.gap.
BM Old New Reduction
176.gcc 2673621 2548962 4.89%
254.gap 498261 475104 4.87%
Note that while the percentage is not spectacular, this did eliminate
124.6 _KILOBYTES_ of codespace from gcc. Not bad.
Note that this doesn't effect the llc version at all, because the assembler
already does this optimization.
llvm-svn: 9284
getelementptr code path for use by other code paths (like malloc and alloca).
* Optimize comparisons with zero
* Generate neg, not, inc, and dec instructions, when possible.
This gives some code size wins, which might translate into performance. We'll
see tommorow in the nightly tester.
llvm-svn: 9267
Make insertFarJumpAtAddr() return void, because nothing uses its return value.
Remove some commented-out code.
Implement replaceMachineCodeForFunction() for SPARC.
llvm-svn: 9203
Rename SlotCalculator::getValSlot() to SlotCalculator::getSlot(),
SlotCalculator::insertValue() to SlotCalculator::getOrCreateSlot(),
SlotCalculator::insertVal() to SlotCalculator::insertValue(), and
SlotCalculator::doInsertVal() to SlotCalculator::doInsertValue().
llvm-svn: 9190
* Move the constructors from .h file here
* Document ExecutionEngine::create()
* Catch exception possibly thrown by ModuleProvider::releaseModule()
llvm-svn: 9181
For now, we translate linkonce into weak linkage in the bytecode format because
we don't have enough bits to represent it. We will rev the bytecode version
soon anyways, so this will be fixed in the near future.
llvm-svn: 9170
this list (except use_size()) are constant time. Before the killUse method
(used whenever something stopped using a value) was linear time, and thus
very very slow for large programs.
This speeds GCCAS up _substantially_ on large programs: almost 2x for 176.gcc:
176.gcc: 77.07s -> 37.38s
177.mesa: 7.59s -> 5.57s
252.eon: 21.02s -> 19.52s (*)
253.perlbmk: 11.40s -> 13.05s
254.gap: 7.25s -> 7.42s
252.eon would speed up a whole lot more, but optimization time is being
dominated by the inlining pass, which needs to be fixed.
llvm-svn: 9160
* Add header comment
* Remove extraneous #includes
* Move the FileType enum into the GCC class
* The GCC class is not virtual.
* Move all of the "constructor" functions into the classes themselves
* Stop using cl::list as arguments, use std::vector instead (which cl::list
derives from)
* Improve comments
llvm-svn: 9121
X86/linux. :( The problem is that a signal delivered while the function
is executing could clobber the functions stack. This is a partial fix
for PR41.
llvm-svn: 9113
break dominance relationships, and is otherwise bad. This fixes bug:
Inline/2003-10-13-AllocaDominanceProblem.ll. This also fixes miscompilation
of 3 176.gcc source files (reload1.c, global.c, flow.c)
llvm-svn: 9109
multiple times. This reduces the time to construct post-dominance sets a LOT.
For example, optimizing perlbmk goes from taking 12.9894s to 1.4074s.
llvm-svn: 9091
* Fix a nasty initializer ordering bug. Any only-CFG passes which registered
themselves before the CFGOnlyAnalysis vector initialized got forgotten and
thus got invalidated and recomputed.
In particular, in my compiled version of gccas, the Loop information pass was
being recomputed unnecessarily.
llvm-svn: 9074
Only transform call sites in a setjmp'ing function which are reachable from
the setjmp. If the call dominates the setjmp (for example), the called
function cannot longjmp to the setjmp.
This dramatically reduces the number of invoke instructions created in some
large testcases.
llvm-svn: 9066
have a SINGLE backedge. This is useful to, for example, the -indvars pass.
This implements testcase LoopSimplify/single-backedge.ll and closes PR#34
llvm-svn: 9065
* Print floating point values using C99 hexadecimal style FP if possible.
This increases the number of floating point constants that may be emitted
inline, and improves precision for global variable initializers which
can not be emitted in integer form.
This fixes the Olden/Power benchmark with the CBE!!!!
llvm-svn: 9052
* Fix isFPCSafeToPrint to find more constants safe to print, which it was
failing because ftostr was padding with leading space characters.
* Scan the entire module for global constants instead of each function at a
time. This has the advantage of allowing us to emit constants at global
scope instead of function scope. This speeds FP programs quite a bit.
llvm-svn: 9048
In lookupFunction():
Change to use "F" for Function argument instead of ancient "M".
Remove commented-out code.
Change to use GetAddressOfSymbol instead of dlsym.
llvm-svn: 9013
are ordered by name, not by slot, so the previous solution wasn't any good.
On a large testcase, this reduces time to parse from 2.17s to 1.58s.
llvm-svn: 9002
changes:
* BytecodeReader::getType(...) used to return a null pointer
on error. This was only checked about half the time. Now we convert
it to throw an exception, and delete the half that checked for error.
This was checked in before, but psmith crashed and lost the change :(
* insertValue no longer returns -1 on error, so callers don't need to
check for it.
* Substantial rewrite of InstructionReader.cpp, to use more efficient,
simpler, data structures. This provides another 5% speedup. This also
makes the code much easier to read and understand.
llvm-svn: 8984
new, simpler, ForwardReferences data structure. This is just the first
simple replacement, subsequent changes will improve the code more.
This simple change improves the performance of loading a file from HDF5
(contributed by Bill) from 2.36s to 1.93s, a 22% improvement. This
presumably has to do with the fact that we only create ONE placeholder for
a particular forward referenced values, and also may be because the data
structure is much simpler.
llvm-svn: 8979
in the bytecode parser. Before we tried to shoehorn basic blocks into the
"getValue" code path with other types of values. For a variety of reasons
this was a bad idea, so this patch separates it out into its own data structure.
This simplifies the code, makes it fit in 80 columns, and is also much faster.
In a testcase provided by Bill, which has lots of PHI nodes, this patch speeds
up bytecode parsing from taking 6.9s to taking 2.32s. More speedups to
follow later.
llvm-svn: 8977
of a test that Bill Wendling sent me from 228.5s to 105s. Obviously there is
more improvement to be had, but this is a nice speedup which should be "felt"
by many programs.
llvm-svn: 8962
and TargetInstrDescriptor::ImplicitUses to always point to a null
terminated array and never be null. So there is no need to check for
pointer validity when iterating over those sets. Code that looked
like:
if (const unsigned* AS = TID.ImplicitDefs) {
for (int i = 0; AS[i]; ++i) {
// use AS[i]
}
}
was changed to:
for (const unsigned* AS = TID.ImplicitDefs; *AS; ++AS) {
// use *AS
}
llvm-svn: 8960
of callees between executions.
On eon, in release mode, this changes the inliner from taking 11.5712s
to taking 2.2066s. In debug mode, it went from taking 14.4148s to
taking 7.0745s. In release mode, this is a 24.7% speedup of gccas, in
debug mode, it's a total speedup of 11.7%.
This also makes it slightly more aggressive. This could be because we
are not judging the size of the functions quite as accurately as before.
When we start looking at the performance of the generated code, this can
be investigated further.
llvm-svn: 8893
Running the inliner on 252.eon used to take 48.4763s, now it takes 14.4148s.
In release mode, it went from taking 25.8741s to taking 11.5712s.
This also fixes a FIXME.
llvm-svn: 8890
"minimal" SSA form (in other words, it doesn't insert dead PHIs). This
speeds up the mem2reg pass very significantly because it doesn't have to
do a lot of frivolous work in many common cases.
In the 252.eon function I have been playing with, this doesn't even insert
the 120 PHI nodes that it used to which were trivially dead (in the process
of promoting 356 alloca instructions overall). This speeds up the mem2reg
pass from 1.2459s to 0.1284s. More significantly, the DCE pass used to take
2.4138s to remove the 120 dead PHI nodes that mem2reg constructed, now it
takes 0.0134s (which is the time to scan the function and decide that there
is nothing dead). So overall, on this one function, we speed things up a
total of 3.5179s, which is a 24.8x speedup! :)
This change is tested by the Mem2Reg/2003-10-05-DeadPHIInsertion.ll test,
which now passes.
llvm-svn: 8884
basic block. This is amazingly common in code generated by the C/C++ front-ends.
This change makes it not have to insert ANY phi nodes, whereas before it would insert
a ton of dead ones which DCE would have to clean up.
Thus, this fix improves compile-time performance of these trivial allocas in two ways:
1. It doesn't have to do the walking and book-keeping for renaming
2. It does not insert dead phi nodes for them which would have to
subsequently be cleaned up.
On my favorite testcase from 252.eon, this special case handles 305 out of
356 promoted allocas in the function. It speeds up the mem2reg pass from 7.5256s
to 1.2505s. It inserts 677 fewer dead PHI nodes, which speeds up a subsequent
-dce pass from 18.7524s to 2.4806s.
There are still 120 trivially dead PHI nodes being inserted for variables used
in multiple basic blocks, but they are not handled by this patch.
llvm-svn: 8881
*** Revamp the code which handled unreachable code in the function. Now the
code is much more efficient for high-degree basic blocks, such as those
that occur in the 252.eon SPEC benchmark.
For the interested, the time to promote a SINGLE alloca in _ZN7mrScene4ReadERSi
function used to be > 3.5s. Now it is < .075s. The function has a LOT of
allocas in it, so it appeared to be infinite looping, this should make it much
nicer. :)
llvm-svn: 8863
work-list of value definitions. This allows elimination of the explicit
'iterative' step of the algorithm, and also reuses temporary memory better.
llvm-svn: 8861
* Do not insert a new entry into NewPhiNodes during the rename pass if there are no PHIs in a block.
* Do not compute WriteSets in parallel
llvm-svn: 8858