def SHL8rCL : I<0xD2, MRM4r, (ops R8 :$dst, R8 :$src),
"shl{b} {%cl, $dst|$dst, %CL}",
[(set R8:$dst, (shl R8:$src, CL))]>, Imp<[CL],[]>;
This generates a CopyToReg operand and added its 2nd result to the shl as
a flag operand.
llvm-svn: 24557
changes allow us to generate the following code:
_foo:
li r2, 0
lvx v0, r2, r3
vaddfp v0, v0, v0
stvx v0, r2, r3
blr
for this llvm:
void %foo(<4 x float>* %a) {
entry:
%tmp1 = load <4 x float>* %a
%tmp2 = add <4 x float> %tmp1, %tmp1
store <4 x float> %tmp2, <4 x float>* %a
ret void
}
llvm-svn: 24534
packed types with an element count of 1, although more generic support is
coming. This allows LLVM to turn the following code:
void %foo(<1 x float> * %a) {
entry:
%tmp1 = load <1 x float> * %a;
%tmp2 = add <1 x float> %tmp1, %tmp1
store <1 x float> %tmp2, <1 x float> *%a
ret void
}
Into:
_foo:
lfs f0, 0(r3)
fadds f0, f0, f0
stfs f0, 0(r3)
blr
llvm-svn: 24416
this and have it in about the same form, I think this makes sense.
on X86, you do a RDTSC (64bit result, from any ring since the P5MMX)
on Alpha, you do a RDCC
on PPC, there is a sequence which may or may not work depending on how things
are setup by the OS. Or something like that. Maybe someone who knows PPC
can add support. Something about the time base register.
on Sparc, you read %tick, which in some solaris versions (>=8) is readable by
userspace
on IA64 read ar.itc
So I think the ulong is justified since all of those are 64bit.
Support is slighly flaky on old chips (P5 and lower) and sometimes
depends on OS (PPC, Sparc). But for modern OS/Hardware (aka this decade),
we should be ok.
I am still not sure what to do about lowering. I can either see a lower to 0, to
gettimeofday (or the target os equivalent), or loudly complaining and refusing to
continue.
I am commiting an Alpha implementation. I will add the X86 implementation if I
have to (I have use of it in the near future), but if someone who knows that
backend (and the funky multi-register results) better wants to add it, it would
take them a lot less time ;)
TODO: better lowering and legalizing, and support more platforms
llvm-svn: 24299
doesn't support .asciz, just set AscizDirective to null in your asmprinter.
This compiles C strings to:
l1__2E_str_1: ; '.str_1'
.asciz "foo"
instead of:
l1__2E_str_1: ; '.str_1'
.ascii "foo\000"
llvm-svn: 24271
This eliminates the vector, allows constant time removal of a node from
a graph, and makes iteration over the all nodes list stable when adding
nodes to the graph.
llvm-svn: 24262
allocated. Further, in the common case where a node has a single value, just
reference an element from a small array. This is a small compile-time wi.
llvm-svn: 24250
This saves 12 bytes from SDNode, but doesn't speed things up substantially
(our graphs apparently already fit within the cache on my g5). In any case
this reduces memory usage.
llvm-svn: 24248
alignment information appropriately. Includes code for PowerPC to support
fixed-size allocas with alignment larger than the stack. Support for
arbitrarily aligned dynamic allocas coming soon.
llvm-svn: 24224
this, it is a requirement on PPC, which can have an f32 value in r3 at one
point in a function and a f64 value in r3 at another point. :(
llvm-svn: 23160
putting it into the constant pool. This allows the isel machinery to
create constants that it will end up deciding are not needed, without them
ending up in the resultant function constant pool.
llvm-svn: 23081
binary search to test for membership. This speeds up LLC a bit more on KC++,
e.g. on itanium from 16.6974s to 14.8272s, PPC from 11.4926s to 10.7089s and
X86 from 10.8128s to 9.7943s, with no difference in generated code (like all
of the RA patches).
With these changes, isel is the slowest pass for PPC/X86, but linscan+live
intervals is still > 50% of the compile time for itanium. More work could
be done, but this is the last for now.
llvm-svn: 22993
rearrange some of the accessors to be more efficient.
This makes it much more efficient to iterate over all of the things with the
same value. This speeds up liveintervals analysis from 8.63s to 3.79s with
a release build of llc on kc++ with -march=ia64. This also speeds up live
var from 1.66s -> 0.87s as well, reducing total llc time from 20.1s->15.2s.
This also speeds up other targets slightly, e.g. llc time on X86 from 16.84
-> 16.45s, and PPC from 17.64->17.03s.
llvm-svn: 22990
is greater than the range of building selection dag node types, and
getTargetOpcode(), which returns the node opcode less the value of
isd::builtin_op_end, which specifies the end of the builtin types.
llvm-svn: 22844
used to tack a register number onto the node.
Instead of doing this, make a new node, RegisterSDNode, which is a leaf
containing a register number. These three operations just become normal
DAG nodes now, instead of requiring special handling.
Note that with this change, it is no longer correct to make illegal
CopyFromReg/CopyToReg nodes. The legalizer will not touch them, and this
is bad, so don't do it. :)
llvm-svn: 22806
1. move assertions for node creation to getNode()
2. legalize the values returned in ExpandOp immediately
3. Move select_cc optimizations from SELECT's getNode() to SELECT_CC's,
allowing them to be cleaned up significantly.
This paves the way to pick up additional optimizations on SELECT_CC, such
as sum-of-absolute-differences.
llvm-svn: 22757
CC out of the SetCC operation, making SETCC a standard ternary operation and
CC's a standard DAG leaf. This will make it possible for other node to use
CC's as operands in the future...
llvm-svn: 22728
near the GOT, which new doesn't do. So break out the allocate into a new function.
Also move GOT index handling into JITResolver. This lets it update the mapping when a Lazy
function is JITed. It doesn't managed the table, just the mapping. Note that this is
still non-ideal, as any function that takes a function address should also take a GOT
index, but that is a lot of changes. The relocation resolve process updates any GOT entry
it sees is out of date.
llvm-svn: 22537
vector that represents the .o file at once, build up a vector for each
section of the .o file. This is needed because the .o file writer needs
to be able to switch between sections as it emits them (e.g. switch
between the .text section and the .rel section when emitting code).
This patch has no functionality change.
llvm-svn: 22453