This is the first incremental patch to implement this feature. It adds no
functionality to LLVM but setup up the information needed from targets in
order to implement the optimization correctly. Each target needs to specify
the maximum number of store operations for conversion of the llvm.memset,
llvm.memcpy, and llvm.memmove intrinsics into a sequence of store operations.
The limit needs to be chosen at the threshold of performance for such an
optimization (generally smallish). The target also needs to specify whether
the target can support unaligned stores for multi-byte store operations.
This helps ensure the optimization doesn't generate code that will trap on
an alignment errors.
More patches to follow.
llvm-svn: 22468
legalizer to eliminate them. With this comes the expected code quality
improvements, such as, for this:
double foo(unsigned short X) { return X; }
we now generate this:
_foo:
subl $4, %esp
movzwl 8(%esp), %eax
movl %eax, (%esp)
fildl (%esp)
addl $4, %esp
ret
instead of this:
_foo:
subl $4, %esp
movw 8(%esp), %ax
movzwl %ax, %eax ;; Load not folded into this.
movl %eax, (%esp)
fildl (%esp)
addl $4, %esp
ret
-Chris
llvm-svn: 22449
Implement the X86 Subtarget.
This consolidates the checks for target triple, and setting options based
on target triple into one place. This allows us to convert the asm printer
and isel over from being littered with "forDarwin", "forCygwin", etc. into
just having the appropriate flags for each subtarget feature controlling
the code for that feature.
This patch also implements indirect external and weak references in the
X86 pattern isel, for darwin. Next up is to convert over the asm printers
to use this new interface.
llvm-svn: 22389
This is the last MVTSDNode.
This allows us to eliminate a bunch of special case code for handling
MVTSDNodes.
Also, remove some uses of dyn_cast that should really be cast (which is
cheaper in a release build).
llvm-svn: 22368
XMM registers. There are many known deficiencies and fixmes, which will be
addressed ASAP. The major benefit of this work is that it will allow the
LLVM register allocator to allocate FP registers across basic blocks.
The x86 backend will still default to x87 style FP. To enable this work,
you must pass -enable-sse-scalar-fp and either -sse2 or -sse3 to llc.
An example before and after would be for:
double foo(double *P) { double Sum = 0; int i; for (i = 0; i < 1000; ++i)
Sum += P[i]; return Sum; }
The inner loop looks like the following:
x87:
.LBB_foo_1: # no_exit
fldl (%esp)
faddl (%eax,%ecx,8)
fstpl (%esp)
incl %ecx
cmpl $1000, %ecx
#FP_REG_KILL
jne .LBB_foo_1 # no_exit
SSE2:
addsd (%eax,%ecx,8), %xmm0
incl %ecx
cmpl $1000, %ecx
#FP_REG_KILL
jne .LBB_foo_1 # no_exit
llvm-svn: 22340
1. Pass Value*'s into lowering methods so that the proper pointers can be
added to load/stores from the valist
2. Intrinsics that return void should only return a token chain, not a token
chain/retval pair.
3. Rename LowerVAArgNext -> LowerVAArg, because VANext is long gone.
4. Now that we have Value*'s available in the lowering methods, pass them
into any load/stores from the valist that are emitted
llvm-svn: 22339
working. The instruction selector changes will hopefully be coming later
this week once they are debugged. This is necessary to support the darwin
x86 FP model, and is recommended by intel as the replacement for x87. As
a bonus, the register allocator knows how to deal with these registers
across basic blocks, unliky the FP stackifier. This leads to significantly
better codegen in several cases.
llvm-svn: 22300
currently use: llc t.bc --filetype=obj
This will produce a t.o file which is dumpable with readelf. Currently
the file produced is empty, but the scaffolding to do more is now in place.
llvm-svn: 22292
* Change assert() to std::cerr printout, as it will not appear in opt builds
* Add comments to clarify what #ifdef/#else/#endif match what condition(s)
llvm-svn: 22154