doing very similar pointer capture analysis.
Factor out the common logic. The new version
is from FunctionAttrs since it does a better
job than the version in BasicAliasAnalysis
llvm-svn: 62461
and every other instruction in their blocks to keep the terminator
instructions at the end, teach the post-RA scheduler how to operate
on ranges of instructions, and exclude terminators from the range
of instructions that get scheduled.
Also, exclude mid-block labels, such as EH_LABEL instructions, and
schedule code before them separately from code after them. This
fixes problems with the post-RA scheduler moving code past
EH_LABELs.
llvm-svn: 62366
For example, PIC16 needs to break a long or int constant into mulitple parts and emit multiple directives. So Allow targets to overried EmitConstantValueOnly().
llvm-svn: 62301
a new toy hazard recognizier heuristic which attempts to direct the
scheduler to avoid clumping large groups of loads or stores too densely.
llvm-svn: 62291
loops, hoisting instructions all the way out in one step rather
than hoisting them one nest level at a time. Also, make a few
other code simplifications. This speeds up MachineLICM
by several fold.
llvm-svn: 62283
and into the ScheduleDAGInstrs class, so that they don't get
destructed and re-constructed for each block. This fixes a
compile-time hot spot in the post-pass scheduler.
To help facilitate this, tidy and do some minor reorganization
in the scheduler constructor functions.
llvm-svn: 62275
my earlier patch to this file.
The issue there was that all uses of an IV inside a loop
are actually references to Base[IV*2], and there was one
use outside that was the same but LSR didn't see the base
or the scaling because it didn't recurse into uses outside
the loop; thus, it used base+IV*scale mode inside the loop
instead of pulling base out of the loop. This was extra bad
because register pressure later forced both base and IV into
memory. Doing that recursion, at least enough
to figure out addressing modes, is a good idea in general;
the change in AddUsersIfInteresting does this. However,
there were side effects....
It is also possible for recursing outside the loop to
introduce another IV where there was only 1 before (if
the refs inside are not scaled and the ref outside is).
I don't think this is a common case, but it's in the testsuite.
It is right to be very aggressive about getting rid of
such introduced IVs (CheckForIVReuse and the handling of
nonzero RewriteFactor in StrengthReduceStridedIVUsers).
In the testcase in question the new IV produced this way
has both a nonconstant stride and a nonzero base, neither
of which was handled before. And when inserting
new code that feeds into a PHI, it's right to put such
code at the original location rather than in the PHI's
immediate predecessor(s) when the original location is outside
the loop (a case that couldn't happen before)
(RewriteInstructionToUseNewBase); better to avoid making
multiple copies of it in this case.
Also, the mechanism for keeping SCEV's corresponding to GEP's
no longer works, as the GEP might change after its SCEV
is remembered, invalidating the SCEV, and we might get a bad
SCEV value when looking up the GEP again for a later loop.
This also couldn't happen before, as we weren't recursing
into GEP's outside the loop.
Also, when we build an expression that involves a (possibly
non-affine) IV from a different loop as well as an IV from
the one we're interested in (containsAddRecFromDifferentLoop),
don't recurse into that. We can't do much with it and will
get in trouble if we try to create new non-affine IVs or something.
More testcases are coming.
llvm-svn: 62212
opcode on each delegation.
Instead the information is cached on construction and the cached flag used thereafter.
Introduced two predicates: isCall and isInvoke.
llvm-svn: 62055
functions that don't already have a (dynamic) alloca.
Dynamic allocas cause inefficient codegen and we shouldn't
propagate this (behavior follows gcc). Two existing tests
assumed such inlining would be done; they are hacked by
adding an alloca in the caller, preserving the point of
the tests.
llvm-svn: 61946
StringMapEntryInitializer classes. Leave it for the compiler to figure out what
the type is and what "0" should be transformed into.
* Un-disable the unit tests which test the StringMapEntryInitializer class.
llvm-svn: 61922
v1024 = EDI // not killed
=
= EDI
One possible solution is for the coalescer to examine the sub-register live intervals in the same manner as the physical register. Another possibility is to examine defs and uses (when needed) of sub-registers. Both solutions are too expensive. For now, look for "short virtual intervals" and scan instructions to look for conflict instead.
This is a small win on x86-64. e.g. It shaves 403.gcc by ~80 instructions.
llvm-svn: 61847
to handle LLVMMatchType intrinsic parameters, and by adding new subclasses
of LLVMMatchType to match vector types with integral elements that are
either twice as wide or half as wide as the elements of the matched type.
llvm-svn: 61834
This means that we have to include an additional header.
This patch should be functionally equivalent. I cannot outrule any performance
degradation, though I do not expect any.
llvm-svn: 61694
instructions to avoid copies, because TwoAddressInstructionPass
also does this optimization. The scheduler's version didn't
account for live-out values, which resulted in spurious commutes
and missed opportunities.
Now, TwoAddressInstructionPass handles all the opportunities,
instead of just those that the scheduler missed. The result is
usually the same, though there are occasional trivial differences
resulting from the avoidance of spurious commutes.
llvm-svn: 61611
and clean recursive descent parser.
This change has a couple of ramifications:
1. The parser code is about 400 lines shorter (in what we maintain, not
including what is autogenerated).
2. The code should be significantly faster than the old code because we
don't have to work around bison's poor handling of datatypes with
ctors/dtors. This also makes the code much more resistant to memory
leaks.
3. We now get caret diagnostics from the .ll parser, woo.
4. The actual diagnostics emited from the parser are completely different
so a bunch of testcases had to be updated.
5. I now disallow "%ty = type opaque %ty = type i32". There was no good
reason to support this, it was just an accident of the old
implementation. I have no reason to think that anyone is actually using
this.
6. The syntax for sticking a global variable has changed to make it
unambiguous. I don't think anyone is depending on this since only clang
supports this and it is not solid yet, so I'm not worried about anything
breaking.
7. This gets rid of the last use of bison, and along with it the .cvs files.
I'll prune this from the makefiles as a subsequent commit.
There are a few minor cleanups that can be done after this commit (suggestions
welcome!) but this passes dejagnu testing and is ready for its time in the
limelight.
llvm-svn: 61558
promote from i1 all the way up to the canonical SetCC type.
In order to discover an appropriate type to use, pass
MVT::Other to getSetCCResultType. In order to be able to
do this, change getSetCCResultType to take a type as an
argument, not a value (this is also more logical).
llvm-svn: 61542
to work out (in a very simplistic way) which function
arguments (pointer arguments only) are only dereferenced
and so do not escape. Mark such arguments 'nocapture'.
llvm-svn: 61525
This removes all the _8, _16, _32, and _64 opcodes and replaces each
group with an unsuffixed opcode. The MemoryVT field of the AtomicSDNode
is now used to carry the size information. In tablegen, the size-specific
opcodes are replaced by size-independent opcodes that utilize the
ability to compose them with predicates.
This shrinks the per-opcode tables and makes the code that handles
atomics much more concise.
llvm-svn: 61389
172 %ECX<def> = MOV32rr %reg1039<kill>
180 INLINEASM <es:subl $5,$1
sbbl $3,$0>, 10, %EAX<def>, 14, %ECX<earlyclobber,def>, 9, %EAX<kill>,
36, <fi#0>, 1, %reg0, 0, 9, %ECX<kill>, 36, <fi#1>, 1, %reg0, 0
188 %EAX<def> = MOV32rr %EAX<kill>
196 %ECX<def> = MOV32rr %ECX<kill>
204 %ECX<def> = MOV32rr %ECX<kill>
212 %EAX<def> = MOV32rr %EAX<kill>
220 %EAX<def> = MOV32rr %EAX
228 %reg1039<def> = MOV32rr %ECX<kill>
The early clobber operand ties ECX input to the ECX def.
The live interval of ECX is represented as this:
%reg20,inf = [46,47:1)[174,230:0) 0@174-(230) 1@46-(47)
The right way to represent this is something like
%reg20,inf = [46,47:2)[174,182:1)[181:230:0) 0@174-(182) 1@181-230 @2@46-(47)
Of course that won't work since that means overlapping live ranges defined by two val#.
The workaround for now is to add a bit to val# which says the val# is redefined by a early clobber def somewhere. This prevents the move at 228 from being optimized away by SimpleRegisterCoalescing::AdjustCopiesBackFrom.
llvm-svn: 61259
The EH_frame and .eh symbols are now private, except for darwin9 and earlier.
The patch also fixes the definition of PrivateGlobalPrefix on pcc linux.
llvm-svn: 61242
The problematic part of this patch is that we were out of attribute bits,
requiring some fancy bit hacking to make it fit (by shrinking alignment)
without breaking existing users or the file format.
This change will require users to rebuild llvm-gcc to match llvm.
llvm-svn: 61239
- ability to insert previously created instructions using a builder
- creation of aliases
- creation of inline asm constants
Patch by Zoltan Varga!
llvm-svn: 61153
computation code. Also, avoid adding output-depenency edges when both
defs are dead, which frequently happens with EFLAGS defs.
Compute Depth and Height lazily, and always in terms of edge latency
values. For the schedulers that don't care about latency, edge latencies
are set to 1.
Eliminate Cycle and CycleBound, and LatencyPriorityQueue's Latencies array.
These are all subsumed by the Depth and Height fields.
llvm-svn: 61073
alignment attribute such that 0 means unaligned.
This will probably require a rebuild of llvm-gcc because of the change to
Attributes.h. If you see many test failures on "make check", please rebuild
your llvm-gcc.
llvm-svn: 61030
memdep keeps track of how PHIs affect the pointer in dep queries, which
allows it to eliminate the load in cases like rle-phi-translate.ll, which
basically end up being:
BB1:
X = load P
br BB3
BB2:
Y = load Q
br BB3
BB3:
R = phi [P] [Q]
load R
turning "load R" into a phi of X/Y. In addition to additional exposed
opportunities, this makes memdep safe in many cases that it wasn't before
(which is required for load PRE) and also makes it substantially more
efficient. For example, consider:
bb1: // has many predecessors.
P = some_operator()
load P
In this example, previously memdep would scan all the predecessors of BB1
to see if they had something that would mustalias P. In some cases (e.g.
test/Transforms/GVN/rle-must-alias.ll) it would actually find them and end
up eliminating something. In many other cases though, it would scan and not
find anything useful. MemDep now stops at a block if the pointer is defined
in that block and cannot be phi translated to predecessors. This causes it
to miss the (rare) cases like rle-must-alias.ll, but makes it faster by not
scanning tons of stuff that is unlikely to be useful. For example, this
speeds up GVN as a whole from 3.928s to 2.448s (60%)!. IMO, scalar GVN
should be enhanced to simplify the rle-must-alias pointer base anyway, which
would allow the loads to be eliminated.
In the future, this should be enhanced to phi translate through geps and
bitcasts as well (as indicated by FIXMEs) making memdep even more powerful.
llvm-svn: 61022
callee will not introduce any new aliases of that pointer.
The attributes had all bits allocated already, so I decided to collapse
alignment. Alignment was previously stored as a 16-bit integer from bits 16 to
32 of the attribute, but it was required to be a power of 2. Now it's stored in
log2 encoded form in five bits from 16 to 21. That gives us 11 more bits of
space.
You may have already noticed that you only need four bits to encode a 16-bit
power of two, so why five bits? Because the AsmParser accepted 32-bit
alignments, even though we couldn't store them (they were silently discarded).
Now we can store them in memory, but not in the bitcode.
The bitcode format was already storing these as 64-bit VBR integers. So, the
bitcode format stays the same, keeping the alignment values stored as 16 bit
raw values. There's some hideous code in the reader and writer that deals with
this, waiting to be ripped out the moment we run out of bits again and have to
replace the parameter attributes table encoding.
llvm-svn: 61019
This stuff is not used outside Base.td, and with the conversion of the
compilation graph to string-based format became much less (if at all)
useful.
llvm-svn: 60873
node latencies. Use CalcLatency instead of manual code in
CalculatePriorities to keep it consistent. Previously it
computed slightly different results.
llvm-svn: 60817
The Cost field is removed. It was only being used in a very limited way,
to indicate when the scheduler should attempt to protect a live register,
and it isn't really needed to do that. If we ever want the scheduler to
start inserting copies in non-prohibitive situations, we'll have to
rethink some things anyway.
A Latency field is added. Instead of giving each node a single
fixed latency, each edge can have its own latency. This will eventually
be used to model various micro-architecture properties more accurately.
The PointerIntPair class and an internal union are now used, which
reduce the overall size.
llvm-svn: 60806
of a pointer. This allows is to catch more equivalencies. For example,
the type_lists_compatible_p function used to require two iterations of
the gvn pass (!) to delete its 18 redundant loads because the first pass
would CSE all the addressing computation cruft, which would unblock the
second memdep/gvn passes from recognizing them. This change allows
memdep/gvn to catch all 18 when run just once on the function (as is
typical :) instead of just 3.
On all of 403.gcc, this bumps up the # reundandancies found from:
63 gvn - Number of instructions PRE'd
153991 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
to:
63 gvn - Number of instructions PRE'd
154137 gvn - Number of instructions deleted
50185 gvn - Number of loads deleted
+120 loads deleted isn't bad.
llvm-svn: 60799
essential problem was that the DAG can contain
random unused nodes which were never analyzed.
When remapping a value of a node being processed,
such a node may become used and need to be analyzed;
however due to operands being transformed during
analysis the node may morph into a different one.
Users of the morphing node need to be updated, and
this wasn't happening. While there I added a bunch
of documentation and sanity checks, so I (or some
other poor soul) won't have to scratch their head
over this stuff so long trying to remember how it
was all supposed to work next time some obscure
problem pops up! The extra sanity checking exposed
a few places where invariants weren't being preserved,
so those are fixed too. Since some of the sanity
checking is expensive, I added a flag to turn it
on. It is also turned on when building with
ENABLE_EXPENSIVE_CHECKS=1.
llvm-svn: 60797
tricks based on readnone/readonly functions.
Teach memdep to look past readonly calls when analyzing
deps for a readonly call. This allows elimination of a
few more calls from 403.gcc:
before:
63 gvn - Number of instructions PRE'd
153986 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
after:
63 gvn - Number of instructions PRE'd
153991 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
5 calls isn't much, but this adds plumbing for the next change.
llvm-svn: 60794