SingleSource/UnitTests/2007-04-25-weak.c in JIT mode. The test
now passes on systems which are able to produce a correct
reference output to compare with.
llvm-svn: 61674
- Fix (brcond (setq ...)) bug, where BRNZ should have been used vice BRZ.
- Kill unused/unnecessary nodes in SPUNodes.td
- Beef out the i64operations.c test harness to use a lot of unaligned
loads, test loops and LLVM loop/basic block optimizations; run the
test harness successfully on real Cell hardware.
llvm-svn: 61664
Finalization occurs after all the FunctionPasses in the group have run, which
is clearly not what we want.
This also means that we have to make sure that we apply the right param
attributes when creating a new function.
Also, add a missed optimization: strdup and strndup. NoCapture and
NoAlias return!
llvm-svn: 61658
llvm::PATypeHolder::get() method when LLVM is self-hosted in Release
mode. Before the parser changed, there was a definition of llvm::PAHolder::get()
in llvmAsmParser.y. This was probably a bug that no-one noticed.
Explicitly #include the Type.h file as a temporary fix for now.
llvm-svn: 61620
instructions to avoid copies, because TwoAddressInstructionPass
also does this optimization. The scheduler's version didn't
account for live-out values, which resulted in spurious commutes
and missed opportunities.
Now, TwoAddressInstructionPass handles all the opportunities,
instead of just those that the scheduler missed. The result is
usually the same, though there are occasional trivial differences
resulting from the avoidance of spurious commutes.
llvm-svn: 61611
- Remove custom lowering for BRCOND
- Add remaining functionality for branches in SPUInstrInfo, such as branch
condition reversal and load/store folding. Updated BrCond test to reflect
branch reversal.
llvm-svn: 61597
not have pointer type. In particular, it may
be the condition argument for a select or a GEP
index. While I was unable to construct a testcase
for which some bits of the original pointer are
captured due to one of these, it's very very close
to being possible - so play safe and exclude these
possibilities.
llvm-svn: 61580
the argument to be stored to an alloca by tracking uses
of the alloca. This occurs 4 times (out of 7121, 0.05%)
in MultiSource/Applications, so may not be worth it. On
the other hand, it is easy to do and fairly cheap. The
functions it helps are: W_addcom and W_addlit in spiff;
process_args (argv) in d (make_dparser); ercPixConcealIMB
in JM/ldecod.
llvm-svn: 61570
and clean recursive descent parser.
This change has a couple of ramifications:
1. The parser code is about 400 lines shorter (in what we maintain, not
including what is autogenerated).
2. The code should be significantly faster than the old code because we
don't have to work around bison's poor handling of datatypes with
ctors/dtors. This also makes the code much more resistant to memory
leaks.
3. We now get caret diagnostics from the .ll parser, woo.
4. The actual diagnostics emited from the parser are completely different
so a bunch of testcases had to be updated.
5. I now disallow "%ty = type opaque %ty = type i32". There was no good
reason to support this, it was just an accident of the old
implementation. I have no reason to think that anyone is actually using
this.
6. The syntax for sticking a global variable has changed to make it
unambiguous. I don't think anyone is depending on this since only clang
supports this and it is not solid yet, so I'm not worried about anything
breaking.
7. This gets rid of the last use of bison, and along with it the .cvs files.
I'll prune this from the makefiles as a subsequent commit.
There are a few minor cleanups that can be done after this commit (suggestions
welcome!) but this passes dejagnu testing and is ready for its time in the
limelight.
llvm-svn: 61558
functions that don't write can't leak a pointer except through
the return value, so a void readonly function is implicitly nocapture.
Test these, and add a test that verifies that f1 calling f2 with an
otherwise dead pointer gets both of them marked nocapture.
llvm-svn: 61552
promote from i1 all the way up to the canonical SetCC type.
In order to discover an appropriate type to use, pass
MVT::Other to getSetCCResultType. In order to be able to
do this, change getSetCCResultType to take a type as an
argument, not a value (this is also more logical).
llvm-svn: 61542
to work out (in a very simplistic way) which function
arguments (pointer arguments only) are only dereferenced
and so do not escape. Mark such arguments 'nocapture'.
llvm-svn: 61525
instruction sequence and cannot ordinarily be simplified by DAGcombine
into the various target description files or SPUDAGToDAGISel.cpp.
This makes some 64-bit operations legal.
- Eliminate target-dependent ISD enums.
- Update tests.
llvm-svn: 61508
and select instructions doesn't buy anything here
except extra complexity: the only difference in
the entire testsuite was that a readonly function
became readnone in MiBench/consumer-typeset. Add
a comment about this.
llvm-svn: 61478
constants, since doing so is irrelevant for aliasing
purposes. While this doesn't increase the total number
of functions marked readonly or readnone in MultiSource/
Applications (3089), it does result in 12 functions being
marked readnone rather than readonly.
Before:
readnone: 820
readonly: 2269
After:
readnone: 832
readonly: 2257
llvm-svn: 61469
- Move v4i32, i32 mul into SPUInstrInfo.td, with a few more instruction
cleanups there as well.
- Make SMUL_LOHI, UMUL_LOHI competely illegal for Cell SPU, to better
assist Chris to see the problem in bug 3101.
llvm-svn: 61464
DAGcombine's ability to find reasons to remove truncates when they were not
needed. Consequently, the CellSPU backend would produce correct, but _really
slow and horrible_, code.
Replaced with instruction sequences that do the equivalent truncation in
SPUInstrInfo.td.
- Re-examine how unaligned loads and stores work. Generated unaligned
load code has been tested on the CellSPU hardware; see the i32operations.c
and i64operations.c in CodeGen/CellSPU/useful-harnesses. (While they may be
toy test code, it does prove that some real world code does compile
correctly.)
- Fix truncating stores in bug 3193 (note: unpack_df.ll will still make llc
fault because i64 ult is not yet implemented.)
- Added i64 eq and neq for setcc and select/setcc; started new instruction
information file for them in SPU64InstrInfo.td. Additional i64 operations
should be added to this file and not to SPUInstrInfo.td.
llvm-svn: 61447
This removes all the _8, _16, _32, and _64 opcodes and replaces each
group with an unsuffixed opcode. The MemoryVT field of the AtomicSDNode
is now used to carry the size information. In tablegen, the size-specific
opcodes are replaced by size-independent opcodes that utilize the
ability to compose them with predicates.
This shrinks the per-opcode tables and makes the code that handles
atomics much more concise.
llvm-svn: 61389
several places. isTerminator() returns true for a superset
of cases, and includes things like FP_REG_KILL, which are
nither return or branch but aren't safe to move/remat/etc.
llvm-svn: 61373
my last patch to this file.
The issue there was that all uses of an IV inside a loop
are actually references to Base[IV*2], and there was one
use outside that was the same but LSR didn't see the base
or the scaling because it didn't recurse into uses outside
the loop; thus, it used base+IV*scale mode inside the loop
instead of pulling base out of the loop. This was extra bad
because register pressure later forced both base and IV into
memory. Doing that recursion, at least enough
to figure out addressing modes, is a good idea in general;
the change in AddUsersIfInteresting does this. However,
there were side effects....
It is also possible for recursing outside the loop to
introduce another IV where there was only 1 before (if
the refs inside are not scaled and the ref outside is).
I don't think this is a common case, but it's in the testsuite.
It is right to be very aggressive about getting rid of
such introduced IVs (CheckForIVReuse and the handling of
nonzero RewriteFactor in StrengthReduceStridedIVUsers).
In the testcase in question the new IV produced this way
has both a nonconstant stride and a nonzero base, neither
of which was handled before. And when inserting
new code that feeds into a PHI, it's right to put such
code at the original location rather than in the PHI's
immediate predecessor(s) when the original location is outside
the loop (a case that couldn't happen before)
(RewriteInstructionToUseNewBase); better to avoid making
multiple copies of it in this case.
Also, the mechanism for keeping SCEV's corresponding to GEP's
no longer works, as the GEP might change after its SCEV
is remembered, invalidating the SCEV, and we might get a bad
SCEV value when looking up the GEP again for a later loop.
This also couldn't happen before, as we weren't recursing
into GEP's outside the loop.
I owe some testcases for this, want to get it in for nightly runs.
llvm-svn: 61362
172 %ECX<def> = MOV32rr %reg1039<kill>
180 INLINEASM <es:subl $5,$1
sbbl $3,$0>, 10, %EAX<def>, 14, %ECX<earlyclobber,def>, 9, %EAX<kill>,
36, <fi#0>, 1, %reg0, 0, 9, %ECX<kill>, 36, <fi#1>, 1, %reg0, 0
188 %EAX<def> = MOV32rr %EAX<kill>
196 %ECX<def> = MOV32rr %ECX<kill>
204 %ECX<def> = MOV32rr %ECX<kill>
212 %EAX<def> = MOV32rr %EAX<kill>
220 %EAX<def> = MOV32rr %EAX
228 %reg1039<def> = MOV32rr %ECX<kill>
The early clobber operand ties ECX input to the ECX def.
The live interval of ECX is represented as this:
%reg20,inf = [46,47:1)[174,230:0) 0@174-(230) 1@46-(47)
The right way to represent this is something like
%reg20,inf = [46,47:2)[174,182:1)[181:230:0) 0@174-(182) 1@181-230 @2@46-(47)
Of course that won't work since that means overlapping live ranges defined by two val#.
The workaround for now is to add a bit to val# which says the val# is redefined by a early clobber def somewhere. This prevents the move at 228 from being optimized away by SimpleRegisterCoalescing::AdjustCopiesBackFrom.
llvm-svn: 61259
that have i32 immediates so that they get selected first. This
currently only matters in the JIT, as assemblers will
automatically use the smallest encoding.
llvm-svn: 61250
- Use SplitBlockPredecessors to factor out common predecessors of the critical edge destination. This is disabled for now due to some regressions.
llvm-svn: 61248
The EH_frame and .eh symbols are now private, except for darwin9 and earlier.
The patch also fixes the definition of PrivateGlobalPrefix on pcc linux.
llvm-svn: 61242
The problematic part of this patch is that we were out of attribute bits,
requiring some fancy bit hacking to make it fit (by shrinking alignment)
without breaking existing users or the file format.
This change will require users to rebuild llvm-gcc to match llvm.
llvm-svn: 61239
and the RegisterScavenger not to expect traditional liveness
techniques are applicable to these registers, since we don't fully
modify the effects of push and pop after stackification.
llvm-svn: 61179
my last patch to this file.
The issue there was that all uses of an IV inside a loop
are actually references to Base[IV*2], and there was one
use outside that was the same but LSR didn't see the base
or the scaling because it didn't recurse into uses outside
the loop; thus, it used base+IV*scale mode inside the loop
instead of pulling base out of the loop. This was extra bad
because register pressure later forced both base and IV into
memory. Doing that recursion, at least enough
to figure out addressing modes, is a good idea in general;
the change in AddUsersIfInteresting does this. However,
there were side effects....
It is also possible for recursing outside the loop to
introduce another IV where there was only 1 before (if
the refs inside are not scaled and the ref outside is).
I don't think this is a common case, but it's in the testsuite.
It is right to be very aggressive about getting rid of
such introduced IVs (CheckForIVReuse and the handling of
nonzero RewriteFactor in StrengthReduceStridedIVUsers).
In the testcase in question the new IV produced this way
has both a nonconstant stride and a nonzero base, neither
of which was handled before. (This patch does not handle
all the cases where this can happen.) And when inserting
new code that feeds into a PHI, it's right to put such
code at the original location rather than in the PHI's
immediate predecessor(s) when the original location is outside
the loop (a case that couldn't happen before)
(RewriteInstructionToUseNewBase); better to avoid making
multiple copies of it in this case.
Everything above is exercised in
CodeGen/X86/lsr-negative-stride.ll (and ifcvt4 in ARM which is
the same IR).
llvm-svn: 61178
- ability to insert previously created instructions using a builder
- creation of aliases
- creation of inline asm constants
Patch by Zoltan Varga!
llvm-svn: 61153
nodes. This allows it to do fairly general phi insertion if a
load from a pointer global wants to be SRAd but the load is used
by (recursive) phi nodes. This fixes a pessimization on ppc
introduced by Load PRE.
llvm-svn: 61123
temporary workaround for an obscure bug. When node cloning is
used, it is possible that more SUnits will be created, and
if the SUnits std::vector has to reallocate, it will
invalidate all the graph edges.
llvm-svn: 61122
consistently for deleting branches. In addition to being slightly
more readable, this makes SimplifyCFG a bit better
about cleaning up after itself when it makes conditions unused.
llvm-svn: 61100
visited set before they are used. If used, their blocks need to be
added to the visited set so that subsequent queries don't use conflicting
pointer values in the cache result blocks.
llvm-svn: 61080
computation code. Also, avoid adding output-depenency edges when both
defs are dead, which frequently happens with EFLAGS defs.
Compute Depth and Height lazily, and always in terms of edge latency
values. For the schedulers that don't care about latency, edge latencies
are set to 1.
Eliminate Cycle and CycleBound, and LatencyPriorityQueue's Latencies array.
These are all subsumed by the Depth and Height fields.
llvm-svn: 61073
alignment attribute such that 0 means unaligned.
This will probably require a rebuild of llvm-gcc because of the change to
Attributes.h. If you see many test failures on "make check", please rebuild
your llvm-gcc.
llvm-svn: 61030
CFG when there is exactly one predecessor where the load is not available.
This is designed to not increase code size but still eliminate partially
redundant loads. This fires 1765 times on 403.gcc even though it doesn't
do critical edge splitting yet (the most common reason for it to fail).
llvm-svn: 61027
cleans up the generated code a bit. This should have the added benefit of
not randomly renaming functions/globals like my previous patch did. :)
llvm-svn: 61023
memdep keeps track of how PHIs affect the pointer in dep queries, which
allows it to eliminate the load in cases like rle-phi-translate.ll, which
basically end up being:
BB1:
X = load P
br BB3
BB2:
Y = load Q
br BB3
BB3:
R = phi [P] [Q]
load R
turning "load R" into a phi of X/Y. In addition to additional exposed
opportunities, this makes memdep safe in many cases that it wasn't before
(which is required for load PRE) and also makes it substantially more
efficient. For example, consider:
bb1: // has many predecessors.
P = some_operator()
load P
In this example, previously memdep would scan all the predecessors of BB1
to see if they had something that would mustalias P. In some cases (e.g.
test/Transforms/GVN/rle-must-alias.ll) it would actually find them and end
up eliminating something. In many other cases though, it would scan and not
find anything useful. MemDep now stops at a block if the pointer is defined
in that block and cannot be phi translated to predecessors. This causes it
to miss the (rare) cases like rle-must-alias.ll, but makes it faster by not
scanning tons of stuff that is unlikely to be useful. For example, this
speeds up GVN as a whole from 3.928s to 2.448s (60%)!. IMO, scalar GVN
should be enhanced to simplify the rle-must-alias pointer base anyway, which
would allow the loads to be eliminated.
In the future, this should be enhanced to phi translate through geps and
bitcasts as well (as indicated by FIXMEs) making memdep even more powerful.
llvm-svn: 61022
callee will not introduce any new aliases of that pointer.
The attributes had all bits allocated already, so I decided to collapse
alignment. Alignment was previously stored as a 16-bit integer from bits 16 to
32 of the attribute, but it was required to be a power of 2. Now it's stored in
log2 encoded form in five bits from 16 to 21. That gives us 11 more bits of
space.
You may have already noticed that you only need four bits to encode a 16-bit
power of two, so why five bits? Because the AsmParser accepted 32-bit
alignments, even though we couldn't store them (they were silently discarded).
Now we can store them in memory, but not in the bitcode.
The bitcode format was already storing these as 64-bit VBR integers. So, the
bitcode format stays the same, keeping the alignment values stored as 16 bit
raw values. There's some hideous code in the reader and writer that deals with
this, waiting to be ripped out the moment we run out of bits again and have to
replace the parameter attributes table encoding.
llvm-svn: 61019
llvm[2]: Linking Release executable opt (without symbols)
...
Undefined symbols:
"llvm::APFloat::IEEEsingle", referenced from:
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(Constants.o)
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(AsmWriter.o)
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(ConstantFold.o)
"llvm::APFloat::IEEEdouble", referenced from:
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(Constants.o)
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(AsmWriter.o)
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(ConstantFold.o)
ld: symbol(s) not found
This is in release mode. To replicate, compile llvm and llvm-gcc in optimized
mode. Then build llvm, in optimized mode, with the newly created compiler.
llvm-svn: 60977
width register load followed by a truncating
store for the copy, since the load will not place
the value in the lower bits. Probably partial
loads/stores can never happen here, but fix it
anyway.
llvm-svn: 60972
use of illegal integer types: instead, use a stack slot
and copying via integer registers. The existing code
is still used if the bitconvert is to a legal integer
type.
This fires on the PPC testcases 2007-09-08-unaligned.ll
and vec_misaligned.ll. It looks like equivalent code
is generated with these changes, just permuted, but
it's hard to tell.
With these changes, nothing in LegalizeDAG produces
illegal integer types anymore. This is a prerequisite
for removing the LegalizeDAG type legalization code.
While there I noticed that the existing code doesn't
handle trunc store of f64 to f32: it turns this into
an i64 store, which represents a 4 byte stack smash.
I added a FIXME about this. Hopefully someone more
motivated than I am will take care of it.
llvm-svn: 60964
which are identical to the original patterns.
- Change the multiply with overflow so that we distinguish between signed and
unsigned multiplication. Currently, unsigned multiplication with overflow
isn't working!
llvm-svn: 60963
ISD::ADD to emit an implicit EFLAGS. This was horribly broken. Instead, replace
the intrinsic with an ISD::SADDO node. Then custom lower that into an
X86ISD::ADD node with a associated SETCC that checks the correct condition code
(overflow or carry). Then that gets lowered into the correct X86::ADDOvf
instruction.
Similar for SUB and MUL instructions.
llvm-svn: 60915
for promoted integer types, eg: i16 on ppc-32, or
i24 on any platform. Complete support for arbitrary
precision integers would require handling expanded
integer types, eg: i128, but I couldn't be bothered.
llvm-svn: 60834