This means that we have to include an additional header.
This patch should be functionally equivalent. I cannot outrule any performance
degradation, though I do not expect any.
llvm-svn: 61694
instructions to avoid copies, because TwoAddressInstructionPass
also does this optimization. The scheduler's version didn't
account for live-out values, which resulted in spurious commutes
and missed opportunities.
Now, TwoAddressInstructionPass handles all the opportunities,
instead of just those that the scheduler missed. The result is
usually the same, though there are occasional trivial differences
resulting from the avoidance of spurious commutes.
llvm-svn: 61611
and clean recursive descent parser.
This change has a couple of ramifications:
1. The parser code is about 400 lines shorter (in what we maintain, not
including what is autogenerated).
2. The code should be significantly faster than the old code because we
don't have to work around bison's poor handling of datatypes with
ctors/dtors. This also makes the code much more resistant to memory
leaks.
3. We now get caret diagnostics from the .ll parser, woo.
4. The actual diagnostics emited from the parser are completely different
so a bunch of testcases had to be updated.
5. I now disallow "%ty = type opaque %ty = type i32". There was no good
reason to support this, it was just an accident of the old
implementation. I have no reason to think that anyone is actually using
this.
6. The syntax for sticking a global variable has changed to make it
unambiguous. I don't think anyone is depending on this since only clang
supports this and it is not solid yet, so I'm not worried about anything
breaking.
7. This gets rid of the last use of bison, and along with it the .cvs files.
I'll prune this from the makefiles as a subsequent commit.
There are a few minor cleanups that can be done after this commit (suggestions
welcome!) but this passes dejagnu testing and is ready for its time in the
limelight.
llvm-svn: 61558
promote from i1 all the way up to the canonical SetCC type.
In order to discover an appropriate type to use, pass
MVT::Other to getSetCCResultType. In order to be able to
do this, change getSetCCResultType to take a type as an
argument, not a value (this is also more logical).
llvm-svn: 61542
to work out (in a very simplistic way) which function
arguments (pointer arguments only) are only dereferenced
and so do not escape. Mark such arguments 'nocapture'.
llvm-svn: 61525
This removes all the _8, _16, _32, and _64 opcodes and replaces each
group with an unsuffixed opcode. The MemoryVT field of the AtomicSDNode
is now used to carry the size information. In tablegen, the size-specific
opcodes are replaced by size-independent opcodes that utilize the
ability to compose them with predicates.
This shrinks the per-opcode tables and makes the code that handles
atomics much more concise.
llvm-svn: 61389
172 %ECX<def> = MOV32rr %reg1039<kill>
180 INLINEASM <es:subl $5,$1
sbbl $3,$0>, 10, %EAX<def>, 14, %ECX<earlyclobber,def>, 9, %EAX<kill>,
36, <fi#0>, 1, %reg0, 0, 9, %ECX<kill>, 36, <fi#1>, 1, %reg0, 0
188 %EAX<def> = MOV32rr %EAX<kill>
196 %ECX<def> = MOV32rr %ECX<kill>
204 %ECX<def> = MOV32rr %ECX<kill>
212 %EAX<def> = MOV32rr %EAX<kill>
220 %EAX<def> = MOV32rr %EAX
228 %reg1039<def> = MOV32rr %ECX<kill>
The early clobber operand ties ECX input to the ECX def.
The live interval of ECX is represented as this:
%reg20,inf = [46,47:1)[174,230:0) 0@174-(230) 1@46-(47)
The right way to represent this is something like
%reg20,inf = [46,47:2)[174,182:1)[181:230:0) 0@174-(182) 1@181-230 @2@46-(47)
Of course that won't work since that means overlapping live ranges defined by two val#.
The workaround for now is to add a bit to val# which says the val# is redefined by a early clobber def somewhere. This prevents the move at 228 from being optimized away by SimpleRegisterCoalescing::AdjustCopiesBackFrom.
llvm-svn: 61259
The EH_frame and .eh symbols are now private, except for darwin9 and earlier.
The patch also fixes the definition of PrivateGlobalPrefix on pcc linux.
llvm-svn: 61242
The problematic part of this patch is that we were out of attribute bits,
requiring some fancy bit hacking to make it fit (by shrinking alignment)
without breaking existing users or the file format.
This change will require users to rebuild llvm-gcc to match llvm.
llvm-svn: 61239
- ability to insert previously created instructions using a builder
- creation of aliases
- creation of inline asm constants
Patch by Zoltan Varga!
llvm-svn: 61153
computation code. Also, avoid adding output-depenency edges when both
defs are dead, which frequently happens with EFLAGS defs.
Compute Depth and Height lazily, and always in terms of edge latency
values. For the schedulers that don't care about latency, edge latencies
are set to 1.
Eliminate Cycle and CycleBound, and LatencyPriorityQueue's Latencies array.
These are all subsumed by the Depth and Height fields.
llvm-svn: 61073
alignment attribute such that 0 means unaligned.
This will probably require a rebuild of llvm-gcc because of the change to
Attributes.h. If you see many test failures on "make check", please rebuild
your llvm-gcc.
llvm-svn: 61030
memdep keeps track of how PHIs affect the pointer in dep queries, which
allows it to eliminate the load in cases like rle-phi-translate.ll, which
basically end up being:
BB1:
X = load P
br BB3
BB2:
Y = load Q
br BB3
BB3:
R = phi [P] [Q]
load R
turning "load R" into a phi of X/Y. In addition to additional exposed
opportunities, this makes memdep safe in many cases that it wasn't before
(which is required for load PRE) and also makes it substantially more
efficient. For example, consider:
bb1: // has many predecessors.
P = some_operator()
load P
In this example, previously memdep would scan all the predecessors of BB1
to see if they had something that would mustalias P. In some cases (e.g.
test/Transforms/GVN/rle-must-alias.ll) it would actually find them and end
up eliminating something. In many other cases though, it would scan and not
find anything useful. MemDep now stops at a block if the pointer is defined
in that block and cannot be phi translated to predecessors. This causes it
to miss the (rare) cases like rle-must-alias.ll, but makes it faster by not
scanning tons of stuff that is unlikely to be useful. For example, this
speeds up GVN as a whole from 3.928s to 2.448s (60%)!. IMO, scalar GVN
should be enhanced to simplify the rle-must-alias pointer base anyway, which
would allow the loads to be eliminated.
In the future, this should be enhanced to phi translate through geps and
bitcasts as well (as indicated by FIXMEs) making memdep even more powerful.
llvm-svn: 61022
callee will not introduce any new aliases of that pointer.
The attributes had all bits allocated already, so I decided to collapse
alignment. Alignment was previously stored as a 16-bit integer from bits 16 to
32 of the attribute, but it was required to be a power of 2. Now it's stored in
log2 encoded form in five bits from 16 to 21. That gives us 11 more bits of
space.
You may have already noticed that you only need four bits to encode a 16-bit
power of two, so why five bits? Because the AsmParser accepted 32-bit
alignments, even though we couldn't store them (they were silently discarded).
Now we can store them in memory, but not in the bitcode.
The bitcode format was already storing these as 64-bit VBR integers. So, the
bitcode format stays the same, keeping the alignment values stored as 16 bit
raw values. There's some hideous code in the reader and writer that deals with
this, waiting to be ripped out the moment we run out of bits again and have to
replace the parameter attributes table encoding.
llvm-svn: 61019
This stuff is not used outside Base.td, and with the conversion of the
compilation graph to string-based format became much less (if at all)
useful.
llvm-svn: 60873
node latencies. Use CalcLatency instead of manual code in
CalculatePriorities to keep it consistent. Previously it
computed slightly different results.
llvm-svn: 60817
The Cost field is removed. It was only being used in a very limited way,
to indicate when the scheduler should attempt to protect a live register,
and it isn't really needed to do that. If we ever want the scheduler to
start inserting copies in non-prohibitive situations, we'll have to
rethink some things anyway.
A Latency field is added. Instead of giving each node a single
fixed latency, each edge can have its own latency. This will eventually
be used to model various micro-architecture properties more accurately.
The PointerIntPair class and an internal union are now used, which
reduce the overall size.
llvm-svn: 60806
of a pointer. This allows is to catch more equivalencies. For example,
the type_lists_compatible_p function used to require two iterations of
the gvn pass (!) to delete its 18 redundant loads because the first pass
would CSE all the addressing computation cruft, which would unblock the
second memdep/gvn passes from recognizing them. This change allows
memdep/gvn to catch all 18 when run just once on the function (as is
typical :) instead of just 3.
On all of 403.gcc, this bumps up the # reundandancies found from:
63 gvn - Number of instructions PRE'd
153991 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
to:
63 gvn - Number of instructions PRE'd
154137 gvn - Number of instructions deleted
50185 gvn - Number of loads deleted
+120 loads deleted isn't bad.
llvm-svn: 60799
essential problem was that the DAG can contain
random unused nodes which were never analyzed.
When remapping a value of a node being processed,
such a node may become used and need to be analyzed;
however due to operands being transformed during
analysis the node may morph into a different one.
Users of the morphing node need to be updated, and
this wasn't happening. While there I added a bunch
of documentation and sanity checks, so I (or some
other poor soul) won't have to scratch their head
over this stuff so long trying to remember how it
was all supposed to work next time some obscure
problem pops up! The extra sanity checking exposed
a few places where invariants weren't being preserved,
so those are fixed too. Since some of the sanity
checking is expensive, I added a flag to turn it
on. It is also turned on when building with
ENABLE_EXPENSIVE_CHECKS=1.
llvm-svn: 60797
tricks based on readnone/readonly functions.
Teach memdep to look past readonly calls when analyzing
deps for a readonly call. This allows elimination of a
few more calls from 403.gcc:
before:
63 gvn - Number of instructions PRE'd
153986 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
after:
63 gvn - Number of instructions PRE'd
153991 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
5 calls isn't much, but this adds plumbing for the next change.
llvm-svn: 60794
and use it in x86 address mode folding. Also, make
getRegForValue return 0 for illegal types even if it has a
ValueMap for them, because Argument values are put in the
ValueMap. This fixes PR3181.
llvm-svn: 60696
track of whether the CachedNonLocalPointerInfo for a block is specific
to a block. If so, just return it without any pred scanning. This is
good for a 6% speedup on GVN (when it uses this lookup method, which
it doesn't right now).
llvm-svn: 60695
AND. This is speedup on any reasonable target, but particularly
on 32-bit targets where this often turns into a libcall like udivdi3.
We know that alignments are a power of two but the compiler doesn't.
llvm-svn: 60688
method. This will eventually take over load/store dep
queries from getNonLocalDependency. For now it works
fine, but is incredibly slow because it does no caching.
Lets not switch GVN to use it until that is fixed :)
llvm-svn: 60649
1. Merge the 'None' result into 'Normal', making loads
and stores return their dependencies on allocations as Normal.
2. Split the 'Normal' result into 'Clobber' and 'Def' to
distinguish between the cases when memdep knows the value is
produced from when we just know if may be changed.
3. Move some of the logic for determining whether readonly calls
are CSEs into memdep instead of it being in GVN. This still
leaves verification that the arguments are hte same to GVN to
let it know about value equivalences in different contexts.
4. Change memdep's call/call dependency analysis to use
getModRefInfo(CallSite,CallSite) instead of doing something
very weak. This only really matters for things like DSA, but
someday maybe we'll have some other decent context sensitive
analyses :)
5. This reimplements the guts of memdep to handle the new results.
6. This simplifies GVN significantly:
a) readonly call CSE is slightly simpler
b) I eliminated the "getDependencyFrom" chaining for load
elimination and load CSE doesn't have to worry about
volatile (they are always clobbers) anymore.
c) GVN no longer does any 'lastLoad' caching, leaving it to
memdep.
7. The logic in DSE is simplified a bit and sped up. A potentially
unsafe case was eliminated.
llvm-svn: 60607
heretical from a STL standpoint, but is oh-so-useful for things that
can't throw exceptions when copied, like, well, everything in LLVM.
llvm-svn: 60587
foldMemoryOperand how to "fold" them, by converting them into constant-pool
loads. When they aren't folded, they use xorps/cmpeqd, but for example when
register pressure is high, they may now be folded as memory operands, which
reduces register pressure.
Also, mark V_SET0 isAsCheapAsAMove so that two-address-elimination will
remat it instead of copying zeros around (V_SETALLONES was already marked).
llvm-svn: 60461
ReplaceNodeResults: rather than returning a node which
must have the same number of results as the original
node (which means mucking around with MERGE_VALUES,
and which is also easy to get wrong since SelectionDAG
folding may mean you don't get the node you expect),
return the results in a vector.
llvm-svn: 60348
instead of std::sort. This shrinks the release-asserts LSR.o file
by 1100 bytes of code on my system.
We should start using array_pod_sort where possible.
llvm-svn: 60335
vector instead of a densemap. This shrinks the memory usage of this thing
substantially (the high water mark) as well as making operations like
scanning it faster. This speeds up memdep slightly, gvn goes from
3.9376 to 3.9118s on 403.gcc
This also splits out the statistics for the cached non-local case to
differentiate between the dirty and clean cached case. Here's the stats
for 403.gcc:
6153 memdep - Number of dirty cached non-local responses
169336 memdep - Number of fully cached non-local responses
162428 memdep - Number of uncached non-local responses
yay for caching :)
llvm-svn: 60313
ReverseLocalDeps when we update it. This fixes a regression test
failure from my last commit.
Second, for each non-local cached information structure, keep a bit that
indicates whether it is dirty or not. This saves us a scan over the whole
thing in the common case when it isn't dirty.
llvm-svn: 60274
instead of containing them by value. This increases the density
(!) of NonLocalDeps as well as making the reallocation case
faster. This speeds up gvn on 403.gcc by 2% and makes room for
future improvements.
I'm not super thrilled with having to explicitly manage the new/delete
of the map, but it is necesary for the next change.
llvm-svn: 60271
an entry in the nonlocal deps map, don't reset entries
referencing that instruction to [dirty, null], instead, set
them to [dirty,next] where next is the instruction after the
deleted one. Use this information in the non-local deps
code to avoid rescanning entire blocks.
This speeds up GVN slightly by avoiding pointless work. On
403.gcc this makes GVN 1.5% faster.
llvm-svn: 60256
Document the Dirty value more precisely, use it for the uninitialized
DepResultTy value. Change reverse mappings to be from an instruction*
instead of DepResultTy, and stop tracking other forms. This makes it more
clear that we only care about the instruction cases.
Eliminate a DepResultTy,bool pair by using Dirty in the local case as well,
shrinking the map and simplifying the code.
This speeds up GVN by ~3% on 403.gcc.
llvm-svn: 60232
query. This makes it crystal clear what cases can escape from MemDep that
the clients have to handle. This also gives the clients a nice simplified
interface to it that is easy to poke at.
This patch also makes DepResultTy and MemoryDependenceAnalysis::DepType
private, yay.
llvm-svn: 60231
of a pointer/int pair instead of a manually bitmangled pointer.
This forces clients to think a little more about checking the
appropriate pieces and will be useful for internal
implementation improvements later.
I'm not particularly happy with this. After going through this
I don't think that the clients of memdep should be exposed to
the internal type at all. I'll fix this in a subsequent commit.
This has no functionality change.
llvm-svn: 60230
wrappers around the interesting code and use an obscure iterator
abstraction that dates back many many years.
Move EraseDeadInstructions to Transforms/Utils and name it
RecursivelyDeleteTriviallyDeadInstructions.
llvm-svn: 60191
introduce any new spilling; it just uses unused registers.
Refactor the SUnit topological sort code out of the RRList scheduler and
make use of it to help with the post-pass scheduler.
llvm-svn: 59999
simplify header dependencies for front-ends that just want to choose
a scheduler and don't need all the scheduling machinery declarations.
llvm-svn: 59978
(this doesn't happen that often, since most code
does not use illegal types) then follow it by a
DAG combiner run that is allowed to generate
illegal operations but not illegal types. I didn't
modify the target combiner code to distinguish like
this between illegal operations and illegal types,
so it will not produce illegal operations as well
as not producing illegal types.
llvm-svn: 59960
NULL-based reference.
Note: Encountered this a few times on Tiger + gcc 4.0.1. Might just be a
platform-specific compiler issue, but it's good defensive programming in any
case.
llvm-svn: 59890