1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
Commit Graph

350 Commits

Author SHA1 Message Date
Evan Cheng
d29a22e4b0 Some ARM implementaions, e.g. A-series, does return stack prediction. That is,
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.

Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.

rdar://8979299

llvm-svn: 151623
2012-02-28 06:42:03 +00:00
Evan Cheng
d18a688213 Optimize a couple of common patterns involving conditional moves where the false
value is zero. Instead of a cmov + op, issue an conditional op instead. e.g.
    cmp   r9, r4
    mov   r4, #0
    moveq r4, #1 
    orr   lr, lr, r4

should be:
    cmp   r9, r4
    orreq lr, lr, #1

That is, optimize (or x, (cmov 0, y, cond)) to (or.cond x, y). Similarly extend
this to xor as well as (and x, (cmov -1, y, cond)) => (and.cond x, y).

It's possible to extend this to ADD and SUB but I don't think they are common.

rdar://8659097

llvm-svn: 151224
2012-02-23 01:19:06 +00:00
Craig Topper
3ed929de0a Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified.
llvm-svn: 151134
2012-02-22 05:59:10 +00:00
Bob Wilson
0e88464871 Fix ARM SjLj-EH dispatch setup code. <rdar://problem/10444602>
The EmitBasePointerRecalculation function has 2 problems, one minor and one
fatal.  The minor problem is that it inserts the code at the setjmp
instead of in the dispatch block.  The fatal problem is that at the point
where this code runs, we don't know whether there will be a base pointer,
so the entire function is a no-op.  The base pointer recalculation needs to
be handled as it was before, by inserting a pseudo instruction that gets
expanded late.

Most of the support for the old approach is still here, but it no longer
has any connection to the eh_sjlj_dispatchsetup intrinsic.  Clean up the
parts related to the intrinsic and just generate the pseudo instruction
directly.

llvm-svn: 144781
2011-11-16 07:11:57 +00:00
Evan Cheng
47d8f8af84 Add vmov.f32 to materialize f32 immediate splats which cannot be handled by
integer variants. rdar://10437054

llvm-svn: 144608
2011-11-15 02:12:34 +00:00
Lang Hames
ba63f9da8b Fixed parameter name.
llvm-svn: 143594
2011-11-02 23:37:04 +00:00
Lang Hames
ceec8ec67e Try to lower memset/memcpy/memmove to vector instructions on ARM where the alignment permits.
llvm-svn: 143582
2011-11-02 22:52:45 +00:00
Bill Wendling
1f7c16c63f Take the code that was emitted for the llvm.eh.dispatch.setup intrinsic and emit
it with the new SjLj emitter stuff. This way there's no need to emit that
kind-of-hacky intrinsic.

llvm-svn: 141419
2011-10-07 22:08:37 +00:00
Bill Wendling
3c68e1d212 Refactor some of the code that sets up the entry block for SjLj EH. No functionality change.
llvm-svn: 141323
2011-10-06 22:18:16 +00:00
Bill Wendling
834bb83a41 Check-pointing the new SjLj EH lowering.
This code will replace the version in ARMAsmPrinter.cpp. It creates a new
machine basic block, which is the dispatch for the return from a longjmp
call. It then shoves the address of that machine basic block into the correct
place in the function context so that the EH runtime will jump to it directly
instead of having to go through a compare-and-jump-to-the-dispatch bit. This
should be more efficient in the common case.

llvm-svn: 141031
2011-10-03 21:25:38 +00:00
Jim Grosbach
d94ffffc87 ARM fix encoding of VMOV.f32 and VMOV.f64 immediates.
Encode the immediate into its 8-bit form as part of isel rather than later,
which simplifies things for mapping the encoding bits, allows the removal
of the custom disassembler decoding hook, makes the operand printer trivial,
and prepares things more cleanly for handling these in the asm parser.

rdar://10211428

llvm-svn: 140834
2011-09-30 00:50:06 +00:00
Duncan Sands
d1311488fe Add codegen support for vector select (in the IR this means a select
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons.  Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all").  Patch mostly by
Nadav Rotem.

llvm-svn: 139159
2011-09-06 19:07:46 +00:00
Eli Friedman
5d3814e0c4 64-bit atomic cmpxchg for ARM.
llvm-svn: 138868
2011-08-31 17:52:22 +00:00
Eli Friedman
d71c865ae0 Some 64-bit atomic operations on ARM. 64-bit cmpxchg coming next.
llvm-svn: 138845
2011-08-31 00:31:29 +00:00
Evan Cheng
91aa81acaa Follow up to r138791.
Add a instruction flag: hasPostISelHook which tells the pre-RA scheduler to
call a target hook to adjust the instruction. For ARM, this is used to
adjust instructions which may be setting the 's' flag. ADC, SBC, RSB, and RSC
instructions have implicit def of CPSR (required since it now uses CPSR physical
register dependency rather than "glue"). If the carry flag is used, then the
target hook will *fill in* the optional operand with CPSR. Otherwise, the hook
will remove the CPSR implicit def from the MachineInstr.

llvm-svn: 138810
2011-08-30 19:09:48 +00:00
Evan Cheng
1eacb83316 Change ARM / Thumb2 addc / adde and subc / sube modeling to use physical
register dependency (rather than glue them together). This is general
goodness as it gives scheduler more freedom. However it is motivated by
a nasty bug in isel.

When a i64 sub is expanded to subc + sube.
  libcall #1
     \
      \        subc 
       \       /  \
        \     /    \
         \   /    libcall #2
          sube

If the libcalls are not serialized (i.e. both have chains which are dag
entry), legalizer can serialize them in arbitrary orders. If it's
unlucky, it can force libcall #2 before libcall #1 in the above case.

  subc
   |
  libcall #2
   |
  libcall #1
   |
  sube

However since subc and sube are "glued" together, this ends up being a
cycle when the scheduler combine subc and sube as a single scheduling
unit.

The right solution is to fix LegalizeType too chains the libcalls together.
However, LegalizeType is not processing nodes in order so that's harder than
it should be. For now, the move to physical register dependency will do.

rdar://10019576

llvm-svn: 138791
2011-08-30 01:34:54 +00:00
Chris Lattner
e1fe7061ce land David Blaikie's patch to de-constify Type, with a few tweaks.
llvm-svn: 135375
2011-07-18 04:54:35 +00:00
Evan Cheng
37ff73dfaf Improve codegen for select's:
if (x != 0) x = 1
if (x == 1) x = 1

Previous codegen looks like this:
        mov     r1, r0
        cmp     r1, #1
        mov     r0, #0
        moveq   r0, #1

The naive lowering select between two different values. It should recognize the
test is equality test so it's more a conditional move rather than a select:
        cmp     r0, #1
        movne   r0, #0

rdar://9758317

llvm-svn: 135017
2011-07-13 00:42:17 +00:00
Eric Christopher
2090e793c1 Remove getRegClassForInlineAsmConstraint from the ARM port.
Part of rdar://9643582

llvm-svn: 134095
2011-06-29 21:10:36 +00:00
Eric Christopher
d68494ffdd Have LowerOperandForConstraint handle multiple character constraints.
Part of rdar://9119939

llvm-svn: 132510
2011-06-02 23:16:42 +00:00
Eli Friedman
12e590e760 Make the logic for determining function alignment more explicit. No functionality change.
llvm-svn: 131012
2011-05-06 20:34:06 +00:00
Dan Gohman
7beb845bab Add an unfolded offset field to LSR's Formula record. This is used to
model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.

llvm-svn: 130743
2011-05-03 00:46:49 +00:00
Jim Grosbach
77d45564c3 ARM and Thumb2 support for atomic MIN/MAX/UMIN/UMAX loads.
rdar://9326019

llvm-svn: 130234
2011-04-26 19:44:18 +00:00
Andrew Trick
a130d110d1 Thumb2 and ARM add/subtract with carry fixes.
Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.

Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.

Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).

llvm-svn: 130048
2011-04-23 03:55:32 +00:00
Andrew Trick
31c7962ce5 whitespace
llvm-svn: 130046
2011-04-23 03:24:11 +00:00
Stuart Hastings
a552942e02 ARM byval support. Will be enabled by another patch to the FE. <rdar://problem/7662569>
llvm-svn: 129858
2011-04-20 16:47:52 +00:00
Cameron Zwarich
1b8f91d2c8 Add a ARM-specific SD node for VBSL so that forms with a constant first operand
can be recognized. This fixes <rdar://problem/9183078>.

llvm-svn: 128584
2011-03-30 23:01:21 +00:00
Evan Cheng
dd99a0a548 Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified.
llvm-svn: 127981
2011-03-21 01:19:09 +00:00
Daniel Dunbar
34c65737c3 Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors
to canonicalize IR", it broke a lot of things.

llvm-svn: 127954
2011-03-19 21:47:14 +00:00
Evan Cheng
c5f50f7322 SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR
to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433

llvm-svn: 127953
2011-03-19 17:17:39 +00:00
Bill Wendling
388dad6d62 Some minor cleanups based on feedback.
llvm-svn: 127694
2011-03-15 20:47:26 +00:00
Bill Wendling
da1364d669 Generate a VTBL instruction instead of a series of loads and stores when we
can. As Nate pointed out, VTBL isn't super performant, but it *has* to be better
than this:

_shuf:
@ BB#0:       @ %entry
  push        {r4, r7, lr}
  add         r7, sp, #4
  sub         sp, #12
  mov         r4, sp
  bic         r4, r4, #7
  mov         sp, r4
  mov         r2, sp
  vmov        d16, r0, r1
  orr         r0, r2, #6
  orr         r3, r2, #7
  vst1.8      {d16[0]}, [r3]
  vst1.8      {d16[5]}, [r0]
  subs        r4, r7, #4
  orr         r0, r2, #5
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #4
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #3
  vst1.8      {d16[0]}, [r0]
  orr         r0, r2, #2
  vst1.8      {d16[2]}, [r0]
  orr         r0, r2, #1
  vst1.8      {d16[1]}, [r0]
  vst1.8      {d16[3]}, [r2]
  vldr.64     d16, [sp]
  vmov        r0, r1, d16
  mov         sp, r4
  pop         {r4, r7, pc}

The "illegal" testcase in vext.ll is no longer illegal.
<rdar://problem/9078775>

llvm-svn: 127630
2011-03-14 23:02:38 +00:00
Bob Wilson
f8c4d1ded9 Fix a compiler crash where a Glue value had multiple uses. Radar 9049552.
llvm-svn: 127198
2011-03-08 01:17:20 +00:00
Cameron Zwarich
a1920d7f51 Move getRegPressureLimit() from TargetLoweringInfo to TargetRegisterInfo.
llvm-svn: 127175
2011-03-07 21:56:36 +00:00
Bob Wilson
1497601a7b Remove unused conditional negate operations.
llvm-svn: 127090
2011-03-05 16:54:31 +00:00
Stuart Hastings
539d4e1460 Support for byval parameters on ARM. Will be enabled by a forthcoming
patch to the front-end.  Radar 7662569.

llvm-svn: 126655
2011-02-28 17:17:53 +00:00
Bob Wilson
65f4a70b82 Add codegen support for using post-increment NEON load/store instructions.
The vld1-lane, vld1-dup and vst1-lane instructions do not yet support using
post-increment versions, but all the rest of the NEON load/store instructions
should be handled now.

llvm-svn: 125014
2011-02-07 17:43:21 +00:00
Evan Cheng
c7ce7e2ac3 Given a pair of floating point load and store, if there are no other uses of
the load, then it may be legal to transform the load and store to integer
load and store of the same width.

This is done if the target specified the transformation as profitable. e.g.
On arm, this can transform:
vldr.32 s0, []
vstr.32 s0, []

to

ldr r12, []
str r12, []

rdar://8944252

llvm-svn: 124708
2011-02-02 01:06:55 +00:00
Evan Cheng
0dfe28a9b5 Last round of fixes for movw + movt global address codegen.
1. Fixed ARM pc adjustment.
2. Fixed dynamic-no-pic codegen
3. CSE of pc-relative load of global addresses.

It's now enabled by default for Darwin.

llvm-svn: 123991
2011-01-21 18:55:51 +00:00
Evan Cheng
53ec6fc591 Materialize GA addresses with movw + movt pairs for Darwin in PIC mode. e.g.
movw    r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
        movt    r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
LPC0_0:
        add     r0, pc, r0

It's not yet enabled by default as some tests are failing. I suspect bugs in
down stream tools.

llvm-svn: 123619
2011-01-17 08:03:18 +00:00
Evan Cheng
1afd04fc59 Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call.
llvm-svn: 123048
2011-01-08 01:24:27 +00:00
Bob Wilson
c485ff3ced Lower some BUILD_VECTORS using VEXT+shuffle.
Patch by Tim Northover.

llvm-svn: 123035
2011-01-07 21:37:30 +00:00
Evan Cheng
f7e586d749 Enable sibling call optimization of libcalls which are expanded during
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777

llvm-svn: 120501
2010-11-30 23:55:39 +00:00
Bob Wilson
3bb61d1932 Add support for NEON VLD2-dup instructions.
llvm-svn: 120236
2010-11-28 06:51:26 +00:00
Owen Anderson
52e3873edc Add support for ARM's specialized vector-compare-against-zero instructions.
llvm-svn: 118453
2010-11-08 23:21:22 +00:00
Owen Anderson
f26ea37db7 Disallow the certain NEON modified-immediate forms when generating vorr or vbic.
llvm-svn: 118300
2010-11-05 21:57:54 +00:00
Owen Anderson
add19dd6dd Add codegen and encoding support for the immediate form of vbic.
llvm-svn: 118291
2010-11-05 19:27:46 +00:00
Owen Anderson
1a89511e5d Add support for code generation of the one register with immediate form of vorr.
We could be more aggressive about making this work for a larger range of constants,
but this seems like a good start.

llvm-svn: 118201
2010-11-03 22:44:51 +00:00
Evan Cheng
eab7251695 Fix preload instruction isel. Only v7 supports pli, and only v7 with mp extension supports pldw. Add subtarget attribute to denote mp extension support and legalize illegal ones to nothing.
llvm-svn: 118160
2010-11-03 06:34:55 +00:00
Bob Wilson
183c466006 Overhaul memory barriers in the ARM backend. Radar 8601999.
There were a number of issues to fix up here:
* The "device" argument of the llvm.memory.barrier intrinsic should be
used to distinguish the "Full System" domain from the "Inner Shareable"
domain.  It has nothing to do with using DMB vs. DSB instructions.
* The compiler should never need to emit DSB instructions.  Remove the
ARMISD::SYNCBARRIER node and also remove the instruction patterns for DSB.
* Merge the separate DMB/DSB instructions for options only used for the
disassembler with the default DMB/DSB instructions.  Add the default
"full system" option ARM_MB::SY to the ARM_MB::MemBOpt enum.
* Add a separate ARMISD::MEMBARRIER_MCR node for subtargets that implement
a data memory barrier using the MCR instruction.
* Fix up encodings for these instructions (except MCR).
I also updated the tests and added a few new ones to check for DMB options
that were not currently being exercised.

llvm-svn: 117756
2010-10-30 00:54:37 +00:00
John Thompson
6115a7f1d4 Inline asm multiple alternative constraints development phase 2 - improved basic logic, added initial platform support.
llvm-svn: 117667
2010-10-29 17:29:13 +00:00
Jim Grosbach
a8c0be5343 Add a pre-dispatch SjLj EH hook on the unwind edge for targets to do any
setup they require. Use this for ARM/Darwin to rematerialize the base
pointer from the frame pointer when required. rdar://8564268

llvm-svn: 116879
2010-10-19 23:27:08 +00:00
Bob Wilson
6b6b53ad6f Remove unused ARMISD::AND selection DAG node.
llvm-svn: 116566
2010-10-15 04:34:40 +00:00
Bob Wilson
c4345abcc0 Define the TargetLowering::getTgtMemIntrinsic hook for ARM so that NEON load
and store intrinsics are represented with MemIntrinsicSDNodes.

llvm-svn: 114454
2010-09-21 17:56:22 +00:00
Evan Cheng
c9cb37516d Teach if-converter to be more careful with predicating instructions that would
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.

llvm-svn: 113570
2010-09-10 01:29:16 +00:00
Bob Wilson
3348d2eb50 Remove NEON vmull, vmlal, and vmlsl intrinsics, replacing them with multiply,
add, and subtract operations with zero-extended or sign-extended vectors.
Update tests.  Add auto-upgrade support for the old intrinsics.

llvm-svn: 112773
2010-09-01 23:50:19 +00:00
Bill Wendling
385ad1516f Create an ARMISD::AND node. This node is exactly like the "ARM::AND" node, but
it sets the CPSR register.

llvm-svn: 112393
2010-08-29 03:02:11 +00:00
Bill Wendling
f10d5c00fc Consider this code snippet:
float t1(int argc) {
  return (argc == 1123) ? 1.234f : 2.38213f;
}

We would generate truly awful code on ARM (those with a weak stomach should look
away):

_t1:
  movw   r1, #1123
  movs   r2, #1
  movs   r3, #0
  cmp    r0, r1
  mov.w  r0, #0
  it     eq
  moveq  r0, r2
  movs   r1, #4
  cmp    r0, #0
  it     ne
  movne  r3, r1
  adr    r0, #LCPI1_0
  ldr    r0, [r0, r3]
  bx     lr

The problem was that legalization was creating a cascade of SELECT_CC nodes, for
for the comparison of "argc == 1123" which was fed into a SELECT node for the ?:
statement which was itself converted to a SELECT_CC node. This is because the
ARM back-end doesn't have custom lowering for SELECT nodes, so it used the
default "Expand".

I added a fairly simple "LowerSELECT" to the ARM back-end. It takes care of this
testcase, but can obviously be expanded to include more cases.

Now we generate this, which looks optimal to me:

_t1:
  movw   r1, #1123
  movs   r2, #0
  cmp    r0, r1
  adr    r0, #LCPI0_0
  it     eq
  moveq  r2, #4
  ldr    r0, [r0, r2]
  bx     lr
  .align  2
LCPI0_0:
  .long   1075344593  @ float 2.382130e+00
  .long   1067316150  @ float 1.234000e+00

llvm-svn: 110799
2010-08-11 08:43:16 +00:00
Nate Begeman
b506e13a32 Add support for getting & setting the FPSCR application register on ARM when VFP is enabled.
Add support for using the FPSCR in conjunction with the vcvtr instruction, for controlling fp to int rounding.
Add support for the FLT_ROUNDS_ node now that the FPSCR is exposed.

llvm-svn: 110152
2010-08-03 21:31:55 +00:00
Jim Grosbach
03b130774b Remove dead prototype
llvm-svn: 109691
2010-07-28 23:16:12 +00:00
Anton Korobeynikov
7ae895e007 Hook in GlobalMerge pass
llvm-svn: 109359
2010-07-24 21:52:08 +00:00
Evan Cheng
f215e55d5f - Allow target to specify when is register pressure "too high". In most cases,
it's too late to start backing off aggressive latency scheduling when most
  of the registers are in use so the threshold should be a bit tighter.
- Correctly handle live out's and extract_subreg etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
  For ARM, this is almost always a win on # of instructions. It's runtime
  neutral for most of the tests. But for some kernels with high register
  pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
  54 and sped up by 20%.

llvm-svn: 109279
2010-07-23 22:39:59 +00:00
Eric Christopher
3d118d5e8a Baby steps towards ARM fast-isel.
llvm-svn: 109047
2010-07-21 22:26:11 +00:00
Evan Cheng
df725c25dd Teach bottom up pre-ra scheduler to track register pressure. Work in progress.
llvm-svn: 108991
2010-07-21 06:09:07 +00:00
Evan Cheng
b2ad0066f5 ARM has to provide its own TargetLowering::findRepresentativeClass because its scalar floating point registers alias its vector registers.
llvm-svn: 108761
2010-07-19 22:15:08 +00:00
Jim Grosbach
dc21ac2e0a Since ARM emits inline jump tables as part of the ConstantIsland pass,
it should set the jump table encloding the EK_Inline. This prevents
a second, unused, copy of the table from being emitted after the function
body. PR6581.

llvm-svn: 108730
2010-07-19 17:20:38 +00:00
Jim Grosbach
5b8c14ce8a revert so I can get the right PR# in the log message.
llvm-svn: 108727
2010-07-19 17:19:40 +00:00
Jim Grosbach
42f3134738 Since ARM emits inline jump tables as part of the ConstantIsland pass,
it should set the jump table encloding the EK_Inline. This prevents
a second, unused, copy of the table from being emitted after the function
body. PR7499.

llvm-svn: 108722
2010-07-19 17:18:28 +00:00
Jim Grosbach
749f4fca0a Add basic support to code-gen the ARM/Thumb2 bit-field insert (BFI) instruction
and a combine pattern to use it for setting a bit-field to a constant
value. More to come for non-constant stores.

llvm-svn: 108570
2010-07-16 23:05:05 +00:00
Bob Wilson
34f481e895 Add support for NEON VMVN immediate instructions.
llvm-svn: 108324
2010-07-14 06:31:50 +00:00
Bob Wilson
7feb850d36 Use a target-specific VMOVIMM DAG node instead of BUILD_VECTOR to represent
NEON VMOV-immediate instructions.  This simplifies some things.

llvm-svn: 108275
2010-07-13 21:16:48 +00:00
Evan Cheng
069f1f7c9a Extend the r107852 optimization which turns some fp compare to code sequence using only i32 operations. It now optimize some f64 compares when fp compare is exceptionally slow (e.g. cortex-a8). It also catches comparison against 0.0.
llvm-svn: 108258
2010-07-13 19:27:42 +00:00
Evan Cheng
ed3f224f04 Optimize some vfp comparisons to integer ones. This patch implements the simplest case when the following conditions are met:
1. The arguments are f32.
2. The arguments are loads and they have no uses other than the comparison.
3. The comparison code is EQ or NE.

e.g.
        vldr.32 s0, [r1]
        vldr.32 s1, [r0]
        vcmpe.f32       s1, s0
        vmrs    apsr_nzcv, fpscr
	beq     LBB0_2
=>
        ldr     r1, [r1]
        ldr     r0, [r0]
        cmp     r0, r1
        beq     LBB0_2

More complicated cases will be implemented in subsequent patches.

llvm-svn: 107852
2010-07-08 02:08:50 +00:00
Dan Gohman
c768525273 Split the SDValue out of OutputArg so that SelectionDAG-independent
code can do calling-convention queries. This obviates OutputArgReg.

llvm-svn: 107786
2010-07-07 15:54:55 +00:00
Dale Johannesen
b1fc776fca The hasMemory argument is irrelevant to how the argument
for an "i" constraint should get lowered; PR 6309.  While
this argument was passed around a lot, this is the only
place it was used, so it goes away from a lot of other
places.

llvm-svn: 106893
2010-06-25 21:55:36 +00:00
Bob Wilson
56db632295 Add basic support for NEON modified immediates besides VMOV.
llvm-svn: 106030
2010-06-15 19:05:35 +00:00
Bob Wilson
32016c38ee Rename functions referring to VMOV immediates to refer to NEON "modified
immediate" operands.  These functions have so far only been used for VMOV
but they also apply to other NEON instructions with modified immediate
operands.  No functional changes.

llvm-svn: 105969
2010-06-14 22:19:57 +00:00
Bob Wilson
2945a0ac66 For NEON vectors with 32- or 64-bit elements, select BUILD_VECTORs and
VECTOR_SHUFFLEs to REG_SEQUENCE instructions.  The standard ISD::BUILD_VECTOR
node corresponds closely to REG_SEQUENCE but I couldn't use it here because
its operands do not get legalized.  That is pretty awful, but I guess it
makes sense for other targets.  Instead, I have added an ARM-specific version
of BUILD_VECTOR that will have its operands properly legalized.
This fixes the rest of Radar 7872877.

llvm-svn: 105439
2010-06-04 00:04:02 +00:00
Dale Johannesen
891a19d5ae Early implementation of tail call for ARM.
A temporary flag -arm-tail-calls defaults to off,
so there is no functional change by default.
Intrepid users may try this; simple cases work
but there are bugs.

llvm-svn: 105413
2010-06-03 21:09:53 +00:00
Jim Grosbach
f3bd81ce11 Clean up 80 column violations. No functional change.
llvm-svn: 105350
2010-06-02 21:53:11 +00:00
Jim Grosbach
d788f9b580 back out 104862/104869. Can reuse stacksave after all. Very cool.
llvm-svn: 104897
2010-05-27 23:11:57 +00:00
Jim Grosbach
c2c7753f15 add ISD::STACKADDR to get the current stack pointer. Will be used by sjlj EH
to update the jmpbuf in the presence of VLAs.

llvm-svn: 104862
2010-05-27 18:23:48 +00:00
Jim Grosbach
bb4860d2a2 Adjust eh.sjlj.setjmp to properly have a chain and to have an opcode entry in
ISD::. No functional change.

llvm-svn: 104734
2010-05-26 20:22:18 +00:00
Evan Cheng
241d2c434e Implement @llvm.returnaddress. rdar://8015977.
llvm-svn: 104421
2010-05-22 01:47:14 +00:00
Jim Grosbach
b6cc69c655 Implement eh.sjlj.longjmp for ARM. Clean up the intrinsic a bit.
Followups: docs patch for the builtin and eh.sjlj.setjmp cleanup to match
longjmp.

llvm-svn: 104419
2010-05-22 01:06:18 +00:00
Evan Cheng
b5de7de4ce Allow targets more controls on what nodes are scheduled by reg pressure, what for latency in hybrid mode.
llvm-svn: 104293
2010-05-20 23:26:43 +00:00
Evan Cheng
85497bd415 Allow TargetLowering::getRegClassFor() to be called on illegal types. Also
allow target to override it in order to map register classes to illegal
but synthesizable types. e.g. v4i64, v8i64 for ARM / NEON.

llvm-svn: 103854
2010-05-15 02:18:07 +00:00
Dan Gohman
fb6f4da0e0 Implement a bunch more TargetSelectionDAGInfo infrastructure.
Move EmitTargetCodeForMemcpy, EmitTargetCodeForMemset, and
EmitTargetCodeForMemmove out of TargetLowering and into
SelectionDAGInfo to exercise this.

llvm-svn: 103481
2010-05-11 17:31:57 +00:00
Dan Gohman
eaacb8cb1f Remove the TargetLowering::getSubtarget() virtual function, which
was unused. TargetMachine::getSubtarget() is used instead.

llvm-svn: 103474
2010-05-11 16:21:03 +00:00
Dan Gohman
68f04d06c8 Get rid of the EdgeMapping map. Instead, just check for BasicBlock
changes before doing phi lowering for switches.

llvm-svn: 102809
2010-05-01 00:01:06 +00:00
Dan Gohman
a0f855157e Use const qualifiers with TargetLowering. This eliminates several
const_casts, and it reinforces the design of the Target classes being
immutable.

SelectionDAGISel::IsLegalToFold is now a static member function, because
PIC16 uses it in an unconventional way. There is more room for API
cleanup here.

And PIC16's AsmPrinter no longer uses TargetLowering.

llvm-svn: 101635
2010-04-17 15:26:15 +00:00
Dan Gohman
5c8db5ab3f Move per-function state out of TargetLowering subclasses and into
MachineFunctionInfo subclasses.

llvm-svn: 101634
2010-04-17 14:41:14 +00:00
Mon P Wang
484bbe6aa9 Reapply address space patch after fixing an issue in MemCopyOptimizer.
Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)

llvm-svn: 100304
2010-04-04 03:10:48 +00:00
Mon P Wang
0ccf050ca3 Revert r100191 since it breaks objc in clang
llvm-svn: 100199
2010-04-02 18:43:02 +00:00
Mon P Wang
a01350755e Reapply address space patch after fixing an issue in MemCopyOptimizer.
Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)

llvm-svn: 100191
2010-04-02 18:04:15 +00:00
Bob Wilson
aae933cc81 Revert Mon Ping's change 99928, since it broke all the llvm-gcc buildbots.
llvm-svn: 99948
2010-03-30 22:27:04 +00:00
Mon P Wang
9351ea594a Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
A update of langref will occur in a subsequent checkin.

llvm-svn: 99928
2010-03-30 20:55:56 +00:00
Bob Wilson
9501c478f7 Revert this change, since it was causing ARM performance regressions.
--- Reverse-merging r98889 into '.':
U    lib/Target/ARM/ARMInstrNEON.td
U    lib/Target/ARM/ARMISelLowering.h
U    lib/Target/ARM/ARMInstrInfo.td
U    lib/Target/ARM/ARMInstrVFP.td
U    lib/Target/ARM/ARMISelLowering.cpp
U    lib/Target/ARM/ARMInstrFormats.td

llvm-svn: 99010
2010-03-19 22:51:32 +00:00
Anton Korobeynikov
eeae840ed7 Get rid of target-specific fp <-> int nodes when still I'm here.
llvm-svn: 98889
2010-03-18 22:35:45 +00:00
Anton Korobeynikov
23c07f492e Get rid of target-specific nodes for fp16 <-> fp32 conversion.
llvm-svn: 98888
2010-03-18 22:35:37 +00:00
Anton Korobeynikov
48357cdc62 Add codegen support for FP16 on ARM
llvm-svn: 98502
2010-03-14 18:42:31 +00:00
Bob Wilson
84fc0200bd Use NEON vmin/vmax instructions for floating-point selects.
Radar 7461718.

llvm-svn: 96572
2010-02-18 06:05:53 +00:00
Jim Grosbach
a7e098af3b tighten up eh.setjmp sequence a bit.
llvm-svn: 95603
2010-02-08 23:22:00 +00:00
Evan Cheng
9057fea7ef Revert 95130.
llvm-svn: 95160
2010-02-02 23:55:14 +00:00
Evan Cheng
48375fbf4f Pass callsite return type to TargetLowering::LowerCall and use that to check sibcall eligibility.
llvm-svn: 95130
2010-02-02 21:29:10 +00:00
Evan Cheng
237629e476 Eliminate target hook IsEligibleForTailCallOptimization.
Target independent isel should always pass along the "tail call" property. Change
target hook LowerCall's parameter "isTailCall" into a refernce. If the target
decides it's impossible to honor the tail call request, it should set isTailCall
to false to make target independent isel happy.

llvm-svn: 94626
2010-01-27 00:07:07 +00:00
Jim Grosbach
70af2216fd Patch by David Conrad:
"On ARMv6T2 this turns cttz into rbit, clz instead of the 4 instruction
 sequence it is now."

llvm-svn: 93758
2010-01-18 19:58:49 +00:00
Jim Grosbach
187ad02a4f Framework for atomic binary operations. The emitter for the pseudo instructions
just issues an error for the moment. The front end won't yet generate these
intrinsics for ARM, so this is behind the scenes until complete.

llvm-svn: 91200
2009-12-12 01:40:06 +00:00
Jim Grosbach
5a1c16e5bb Rough first pass at compare_and_swap atomic builtins for ARM mode. Work in progress.
llvm-svn: 91090
2009-12-11 01:42:04 +00:00
Jim Grosbach
be89da9845 Add memory barrier intrinsic support for ARM. Moving towards adding the atomic operations intrinsics.
llvm-svn: 91003
2009-12-10 00:11:09 +00:00
Evan Cheng
af90768b3c isLegalICmpImmediate should take a signed integer; code clean up.
llvm-svn: 86964
2009-11-12 07:13:11 +00:00
Evan Cheng
a11308742c Add TargetLowering::isLegalICmpImmediate. It tells LSR what immediate can be folded into target icmp instructions.
llvm-svn: 86858
2009-11-11 19:05:52 +00:00
Jim Grosbach
ea6c9c17f5 Use Unified Assembly Syntax for the ARM backend.
llvm-svn: 86494
2009-11-09 00:11:35 +00:00
Bob Wilson
95064e348a Add ARM codegen for indirect branches.
clang/test/CodeGen/indirect-goto.c runs! (unoptimized)

llvm-svn: 85577
2009-10-30 05:45:42 +00:00
Evan Cheng
1babe43881 Use fconsts and fconstd to materialize small fp constants.
llvm-svn: 85362
2009-10-28 01:44:26 +00:00
Anton Korobeynikov
829a3a18d2 ARM does not support offset folding (yet). Disable it for now.
This fixes PR5031. Unfortunately, there is no small testcase :(

llvm-svn: 82643
2009-09-23 19:04:09 +00:00
Evan Cheng
7cb9c456e5 Enhance EmitInstrWithCustomInserter() so target can specify CFG changes that sdisel will use to properly complete phi nodes.
Not functionality change yet.

llvm-svn: 82273
2009-09-18 21:02:19 +00:00
Sandeep Patel
9c4e094e2a Retype from unsigned to CallingConv::ID accordingly. Approved by Bob Wilson.
llvm-svn: 80773
2009-09-02 08:44:58 +00:00
Bob Wilson
5240e9de02 Remove unneeded ARM-specific DAG nodes for VLD* and VST* Neon operations.
The instructions can be selected directly from the intrinsics.  We will need
to add some ARM-specific nodes for VLD/VST of 3 and 4 128-bit vectors, but
those are not yet implemented.

llvm-svn: 80117
2009-08-26 17:39:53 +00:00
Bob Wilson
6d4400e852 Match VTRN, VZIP, and VUZP shuffles. Restore the tests for these operations,
now using shuffles instead of intrinsics.

llvm-svn: 79673
2009-08-21 20:54:19 +00:00
Anton Korobeynikov
20d832fa1b Fix some typos and use type-based isel for VZIP/VUZP/VTRN
llvm-svn: 79625
2009-08-21 12:41:42 +00:00
Anton Korobeynikov
218db4a01c Add lowering of ARM 4-element shuffles to multiple instructios via perfectshuffle-generated table.
llvm-svn: 79624
2009-08-21 12:41:24 +00:00
Anton Korobeynikov
a2e4bc2312 Use masks not nodes for vector shuffle predicates. Provide set of 'legal' masks, so legalizer won't infinite cycle
llvm-svn: 79619
2009-08-21 12:40:07 +00:00
Bob Wilson
fae9057bf0 Add support for Neon VEXT (vector extract) shuffles.
This is derived from a patch by Anton Korzh.  I modified it to recognize
the VEXT shuffles during legalization and lower them to a target-specific
DAG node.

llvm-svn: 79428
2009-08-19 17:03:43 +00:00
Bill Wendling
962adec4ee Reapply r79127. It was fixed by d0k.
llvm-svn: 79136
2009-08-15 21:21:19 +00:00
Bill Wendling
bfebbb6477 Revert r79127. It was causing compilation errors.
llvm-svn: 79135
2009-08-15 21:14:01 +00:00
Evan Cheng
5d841097a9 Change allowsUnalignedMemoryAccesses to take type argument since some targets
support unaligned mem access only for certain types. (Should it be size
instead?)

ARM v7 supports unaligned access for i16 and i32, some v6 variants support it
as well.

llvm-svn: 79127
2009-08-15 19:23:44 +00:00
Evan Cheng
67fd47b38b Add Thumb2 lsr hooks.
llvm-svn: 79032
2009-08-14 20:09:37 +00:00
Bob Wilson
d337cde6e5 Create a new ARM-specific DAG node, VDUP, to represent a splat from a
scalar_to_vector.  Generate these VDUP nodes during legalization instead
of trying to recognize the pattern during selection.

llvm-svn: 78994
2009-08-14 05:13:08 +00:00
Bob Wilson
7a311914ab During legalization, change Neon vdup_lane operations from shuffles to
target-specific VDUPLANE nodes.  This allows the subreg handling for the
quad-register version to be done easily with Pats in the .td file, instead
of with custom code in ARMISelDAGToDAG.cpp.

llvm-svn: 78993
2009-08-14 05:08:32 +00:00
Bob Wilson
8cb7da85e3 Revert r78852 for now. I want to do this differently, but I don't have time
to fix it tonight.

llvm-svn: 78896
2009-08-13 05:58:56 +00:00
Bob Wilson
b089d07a1f Recognize Neon VDUP shuffles during legalization instead of selection.
llvm-svn: 78852
2009-08-12 22:54:19 +00:00
Bob Wilson
d8b7ca4c28 Recognize Neon VREV shuffles during legalization instead of selection.
llvm-svn: 78850
2009-08-12 22:31:50 +00:00
Owen Anderson
b4bce99769 Rename MVT to EVT, in preparation for splitting SimpleValueType out into its own struct type.
llvm-svn: 78610
2009-08-10 22:56:29 +00:00
Evan Cheng
48b49cf5b9 It turns out most of the thumb2 instructions are not allowed to touch SP. The semantics of such instructions are unpredictable. We have just been lucky that tests have been passing.
This patch takes pain to ensure all the PEI lowering code does the right thing when lowering frame indices, insert code to manipulate stack pointers, etc. It's also custom lowering dynamic stack alloc into pseudo instructions so we can insert the right instructions at scheduling time.

This fixes PR4659 and PR4682.

llvm-svn: 78361
2009-08-07 00:34:42 +00:00
Bob Wilson
bd7627b23e Implement Neon VST[234] operations.
llvm-svn: 78330
2009-08-06 18:47:44 +00:00
Anton Korobeynikov
07ce0611d9 Missed pieces for ARM HardFP ABI.
Patch by Sandeep Patel!

llvm-svn: 78225
2009-08-05 19:04:42 +00:00
Dan Gohman
5d566d918b Major calling convention code refactoring.
Instead of awkwardly encoding calling-convention information with ISD::CALL,
ISD::FORMAL_ARGUMENTS, ISD::RET, and ISD::ARG_FLAGS nodes, TargetLowering
provides three virtual functions for targets to override:
LowerFormalArguments, LowerCall, and LowerRet, which replace the custom
lowering done on the special nodes. They provide the same information, but
in a more immediately usable format.

This also reworks much of the target-independent tail call logic. The
decision of whether or not to perform a tail call is now cleanly split
between target-independent portions, and the target dependent portion
in IsEligibleForTailCallOptimization.

This also synchronizes all in-tree targets, to help enable future
refactoring and feature work.

llvm-svn: 78142
2009-08-05 01:29:28 +00:00
Bob Wilson
fe37bdfdd8 Lower Neon VLD* intrinsics to custom DAG nodes, and manually allocate the
results to fixed registers.

llvm-svn: 78025
2009-08-04 00:36:16 +00:00
Evan Cheng
fc846dd401 Optimize Thumb2 jumptable to use tbb / tbh when all the offsets fit in byte / halfword.
llvm-svn: 77422
2009-07-29 02:18:14 +00:00
Evan Cheng
cf483eb0c0 In thumb2 mode, add pc is unpredictable. Use add + mov pc instead (that is until more optimization goes in).
llvm-svn: 77364
2009-07-28 20:53:24 +00:00
Bob Wilson
ec256c8938 Add support for ARM Neon VREV instructions.
Patch by Anton Korzh, with some modifications from me.

llvm-svn: 77101
2009-07-26 00:39:34 +00:00
Evan Cheng
d615e606c4 Change Thumb2 jumptable codegen to one that uses two level jumps:
Before:
      adr r12, #LJTI3_0_0
      ldr pc, [r12, +r0, lsl #2]
LJTI3_0_0:
      .long    LBB3_24
      .long    LBB3_30
      .long    LBB3_31
      .long    LBB3_32

After:
      adr r12, #LJTI3_0_0
      add pc, r12, +r0, lsl #2
LJTI3_0_0:
      b.w    LBB3_24
      b.w    LBB3_30
      b.w    LBB3_31
      b.w    LBB3_32

This has several advantages.
1. This will make it easier to optimize this to a TBB / TBH instruction +
   (smaller) table.
2. This eliminate the need for ugly asm printer hack to force the address
   into thumb addresses (bit 0 is one).
3. Same codegen for pic and non-pic.
4. This eliminate the need to align the table so constantpool island pass
   won't have to over-estimate the size.

Based on my calculation, the later is probably slightly faster as well since
ldr pc with shifter address is very slow. That is, it should be a win as long
as the HW implementation can do a reasonable job of branch predict the second
branch.

llvm-svn: 77024
2009-07-25 00:33:29 +00:00
Bob Wilson
e0478ff8e5 Fix comment typos.
llvm-svn: 75479
2009-07-13 18:11:36 +00:00
Bill Wendling
fdd5badace Update comments to make it clear that the function alignment is the Log2 of the
bytes and not bytes.

llvm-svn: 74624
2009-07-01 18:50:55 +00:00
Bill Wendling
c0fb316bd3 Add an "alignment" field to the MachineFunction object. It makes more sense to
have the alignment be calculated up front, and have the back-ends obey whatever
alignment is decided upon.

This allows for future work that would allow for precise no-op placement and the
like.

llvm-svn: 74564
2009-06-30 22:38:32 +00:00
David Goodwin
9e1280adf3 Rename ARMcmpNZ to ARMcmpZ and use it to represent comparisons that set only the Z flag (i.e. eq and ne). Make ARMcmpZ commutative.
llvm-svn: 74423
2009-06-29 15:33:01 +00:00
Bob Wilson
6db76aaf10 Add support for ARM's Advanced SIMD (NEON) instruction set.
This is still a work in progress but most of the NEON instruction set
is supported.

llvm-svn: 73919
2009-06-22 23:27:02 +00:00
Anton Korobeynikov
cc8d0058e2 Address review comments: add 3 ARM calling conventions.
Dispatch C calling conv. to one of these conventions based on
target triple and subtarget features.

llvm-svn: 73530
2009-06-16 18:50:49 +00:00
Bob Wilson
0ac9317588 Minor formatting fixes.
llvm-svn: 72172
2009-05-20 16:30:25 +00:00
Jim Grosbach
bed3aeff20 Update the names of the exception handling sjlj instrinsics to
llvm.eh.sjlj.* for better clarity as to their purpose and scope. Add
a description of llvm.eh.sjlj.setjmp to ExceptionHandling.html.
(llvm.eh.sjlj.longjmp documentation coming when that implementation is
added).

llvm-svn: 71758
2009-05-14 00:46:35 +00:00
Jim Grosbach
024feec42e Spelling correction s/builting/builtin/ and remove trailing whitespace in a few places
llvm-svn: 71735
2009-05-13 22:32:43 +00:00
Jim Grosbach
4bb5e9d1df Add support for GCC compatible builtin setjmp and longjmp intrinsics. This is
a supporting preliminary patch for GCC-compatible SjLJ exception handling. Note that these intrinsics are not designed to be invoked directly by the user, but
rather used by the front-end as target hooks for exception handling.

llvm-svn: 71610
2009-05-12 23:59:14 +00:00
Bob Wilson
911e92c7a3 Clean up formatting, remove trailing whitespace, fix comment typos and
punctuation.  No functional changes.

llvm-svn: 69378
2009-04-17 20:35:10 +00:00
Bob Wilson
b8756b00cd Use CallConvLower.h and TableGen descriptions of the calling conventions
for ARM.  Patch by Sandeep Patel.

llvm-svn: 69371
2009-04-17 19:07:39 +00:00
Bob Wilson
5b42ebe6a9 Fix PR3862: Recognize some ARM-specific constraints for immediates in inline
assembly.

llvm-svn: 68218
2009-04-01 17:58:54 +00:00
Dan Gohman
4105a38248 Constify TargetInstrInfo::EmitInstrWithCustomInserter, allowing
ScheduleDAG's TLI member to use const.

llvm-svn: 64018
2009-02-07 16:15:20 +00:00
Dale Johannesen
b7f2857776 Add some DL propagation to places that didn't
have it yet.  More coming.

llvm-svn: 63673
2009-02-03 22:26:09 +00:00
Dan Gohman
ab89b888e8 Const-qualify getPreIndexedAddressParts and friends.
llvm-svn: 62259
2009-01-15 16:29:45 +00:00
Duncan Sands
1fae2ea219 Change the interface to the type legalization method
ReplaceNodeResults: rather than returning a node which
must have the same number of results as the original
node (which means mucking around with MERGE_VALUES,
and which is also easy to get wrong since SelectionDAG
folding may mean you don't get the node you expect),
return the results in a vector.

llvm-svn: 60348
2008-12-01 11:39:25 +00:00
Dan Gohman
a712c8d29e Fix these enums' starting values to reflect the way that
instruction opcodes are now numbered. No functionality change.

llvm-svn: 56497
2008-09-23 18:42:32 +00:00
Dan Gohman
9742f7772d Rename SDOperand to SDValue.
llvm-svn: 54128
2008-07-27 21:46:04 +00:00
Duncan Sands
3ea6f15708 Rather than having a different custom legalization
hook for each way in which a result type can be
legalized (promotion, expansion, softening etc),
just use one: ReplaceNodeResults, which returns
a node with exactly the same result types as the
node passed to it, but presumably with a bunch of
custom code behind the scenes.  No change if the
new LegalizeTypes infrastructure is not turned on.

llvm-svn: 53137
2008-07-04 11:47:58 +00:00
Duncan Sands
d634afe3aa Wrap MVT::ValueType in a struct to get type safety
and better control the abstraction.  Rename the type
to MVT.  To update out-of-tree patches, the main
thing to do is to rename MVT::ValueType to MVT, and
rewrite expressions like MVT::getSizeInBits(VT) in
the form VT.getSizeInBits().  Use VT.getSimpleVT()
to extract a MVT::SimpleValueType for use in switch
statements (you will get an assert failure if VT is
an extended value type - these shouldn't exist after
type legalization).
This results in a small speedup of codegen and no
new testsuite failures (x86-64 linux).

llvm-svn: 52044
2008-06-06 12:08:01 +00:00
Dan Gohman
0285c1e9bb Fix the SVOffset values for loads and stores produced by
memcpy/memset expansion. It was a bug for the SVOffset value
to be used in the actual address calculations.

llvm-svn: 50359
2008-04-28 17:15:20 +00:00
Dan Gohman
8d46278998 Fix const-correctness issues with the SrcValue handling in the
memory intrinsic expansion code.

llvm-svn: 49666
2008-04-14 17:55:48 +00:00
Dan Gohman
15edbf989f Drop ISD::MEMSET, ISD::MEMMOVE, and ISD::MEMCPY, which are not Legal
on any current target and aren't optimized in DAGCombiner. Instead
of using intermediate nodes, expand the operations, choosing between
simple loads/stores, target-specific code, and library calls,
immediately.

Previously, the code to emit optimized code for these operations
was only used at initial SelectionDAG construction time; now it is
used at all times. This fixes some cases where rep;movs was being
used for small copies where simple loads/stores would be better.

This also cleans up code that checks for alignments less than 4;
let the targets make that decision instead of doing it in
target-independent code. This allows x86 to use rep;movs in
low-alignment cases.

Also, this fixes a bug that resulted in the use of rep;stos for
memsets of 0 with non-constant memory size when the alignment was
at least 4. It's better to use the library in this case, which
can be significantly faster when the size is large.

This also preserves more SourceValue information when memory
intrinsics are lowered into simple loads/stores.

llvm-svn: 49572
2008-04-12 04:36:06 +00:00
Dan Gohman
99b38405e3 Simplify some logic in ComputeMaskedBits. And change ComputeMaskedBits
to pass the mask APInt by value, not by reference. 

llvm-svn: 47096
2008-02-13 22:28:48 +00:00
Dan Gohman
09023887f8 Convert SelectionDAG::ComputeMaskedBits to use APInt instead of uint64_t.
Add an overload that supports the uint64_t interface for use by clients
that haven't been updated yet.

llvm-svn: 47039
2008-02-13 00:35:47 +00:00
Nate Begeman
2b00217d58 This method should be virtual
llvm-svn: 46723
2008-02-04 23:04:24 +00:00
Evan Cheng
918b9c9335 Even though InsertAtEndOfBasicBlock is an ugly hack it still deserves a proper name. Rename it to EmitInstrWithCustomInserter since it does not necessarily insert
instruction at the end.

llvm-svn: 46562
2008-01-30 18:18:23 +00:00
Chris Lattner
ad9a6ccb83 Remove attribution from file headers, per discussion on llvmdev.
llvm-svn: 45418
2007-12-29 20:36:04 +00:00
Chris Lattner
d2ee2dad04 implement a trivial readme entry.
llvm-svn: 44380
2007-11-27 22:36:16 +00:00
Chris Lattner
28262fbaf2 Several changes:
1) Change the interface to TargetLowering::ExpandOperationResult to 
   take and return entire NODES that need a result expanded, not just
   the value.  This allows us to handle things like READCYCLECOUNTER,
   which returns two values.
2) Implement (extremely limited) support in LegalizeDAG::ExpandOp for MERGE_VALUES.
3) Reimplement custom lowering in LegalizeDAGTypes in terms of the new
   ExpandOperationResult.  This makes the result simpler and fully 
   general.
4) Implement (fully general) expand support for MERGE_VALUES in LegalizeDAGTypes.
5) Implement ExpandOperationResult support for ARM f64->i64 bitconvert and ARM
   i64 shifts, allowing them to work with LegalizeDAGTypes.
6) Implement ExpandOperationResult support for X86 READCYCLECOUNTER and FP_TO_SINT,
   allowing them to work with LegalizeDAGTypes.

LegalizeDAGTypes now passes several more X86 codegen tests when enabled and when
type legalization in LegalizeDAG is ifdef'd out.

llvm-svn: 44300
2007-11-24 07:07:01 +00:00
Rafael Espindola
ec025c3042 Move the LowerMEMCPY and LowerMEMCPYCall to a common place.
Thanks for the suggestions Bill :-)

llvm-svn: 43742
2007-11-05 23:12:20 +00:00
Rafael Espindola
27a8907a7c Make ARM and X86 LowerMEMCPY identical by moving the isThumb check into getMaxInlineSizeThreshold
and by restructuring the X86 version.

New I just have to move this to a common place :-)

llvm-svn: 43554
2007-10-31 14:39:58 +00:00
Evan Cheng
252d9ddb4d Fix memcpy lowering when addresses are 4-byte aligned but size is not multiple of 4.
llvm-svn: 43234
2007-10-22 22:11:27 +00:00
Rafael Espindola
c751cbdb02 split LowerMEMCPY into LowerMEMCPYCall and LowerMEMCPYInline in the ARM backend.
llvm-svn: 43176
2007-10-19 14:35:17 +00:00
Dan Gohman
6df332f0cb Migrate X86 and ARM from using X86ISD::{,I}DIV and ARMISD::MULHILO{U,S} to
use ISD::{S,U}DIVREM and ISD::{S,U}MUL_HIO. Move the lowering code
associated with these operators into target-independent in LegalizeDAG.cpp
and TargetLowering.cpp.

llvm-svn: 42762
2007-10-08 18:33:35 +00:00
Duncan Sands
c358890f73 Fold the adjust_trampoline intrinsic into
init_trampoline.  There is now only one
trampoline intrinsic.

llvm-svn: 41841
2007-09-11 14:10:23 +00:00
Dan Gohman
352de5aeb9 More explicit keywords.
llvm-svn: 40757
2007-08-02 21:21:54 +00:00
Duncan Sands
e8bb2c6d32 Support for trampolines, except for X86 codegen which is
still under discussion.

llvm-svn: 40549
2007-07-27 12:58:54 +00:00
Dan Gohman
a62327ea40 Move ComputeMaskedBits, MaskedValueIsZero, and ComputeNumSignBits from
TargetLowering to SelectionDAG so that they have more convenient
access to the current DAG, in preparation for the ValueType routines
being changed from standalone functions to members of SelectionDAG for
the pre-legalize vector type changes.

llvm-svn: 37704
2007-06-22 14:59:07 +00:00
Evan Cheng
c49382e48f Allow predicated immediate ARM to ARM calls.
llvm-svn: 37659
2007-06-19 21:05:09 +00:00
Dale Johannesen
f66c6b85f1 More effective breakdown of memcpy into repeated load/store. These are now
in the order lod;lod;lod;sto;sto;sto which means the load-store optimizer
has a better chance of producing ldm/stm.  Ideally you would get cooperation
from the RA as well but this is not there yet.

llvm-svn: 37179
2007-05-17 21:31:21 +00:00
Lauro Ramos Venancio
f385b56870 ARM TLS: implement "general dynamic", "initial exec" and "local exec" models.
llvm-svn: 36506
2007-04-27 13:54:47 +00:00
Lauro Ramos Venancio
f8b49e5ee0 Implement PIC for arm-linux.
llvm-svn: 36324
2007-04-22 00:04:12 +00:00
Chris Lattner
98199016ae remove dead target hooks
llvm-svn: 35846
2007-04-09 23:33:39 +00:00
Chris Lattner
ae6e2c0ee5 remove some dead target hooks, subsumed by isLegalAddressingMode
llvm-svn: 35840
2007-04-09 22:27:04 +00:00
Lauro Ramos Venancio
7e21418680 - Divides the comparisons in two types: comparisons that only use N and Z
flags (ARMISD::CMPNZ) and comparisons that use all flags (ARMISD::CMP).
- Defines the instructions: TST, TEQ (ARM) and TST (Thumb).

llvm-svn: 35573
2007-04-02 01:30:03 +00:00
Chris Lattner
c0405a348d implement the new addressing mode description hook.
llvm-svn: 35521
2007-03-30 23:15:24 +00:00
Evan Cheng
a55449c051 Remove isLegalAddressImmediate.
llvm-svn: 35406
2007-03-28 01:53:55 +00:00
Chris Lattner
b19069959d switch TargetLowering::getConstraintType to take the entire constraint,
not just the first letter.  No functionality change.

llvm-svn: 35322
2007-03-25 02:14:49 +00:00
Dale Johannesen
44c0a5d545 repair x86 performance, dejagnu problems from previous change
llvm-svn: 35245
2007-03-21 21:51:52 +00:00
Dale Johannesen
3e422e3b49 do not share old induction variables when this would result in invalid
instructions (that would have to be split later)

llvm-svn: 35227
2007-03-20 21:54:54 +00:00
Dale Johannesen
c526b970ce fix obvious comment bug
llvm-svn: 35196
2007-03-20 00:30:56 +00:00
Evan Cheng
4858c6f781 Added isLegalAddressExpression(). Only allows X +/- C for now.
llvm-svn: 35122
2007-03-16 08:43:56 +00:00
Evan Cheng
7767159f08 Updated TargetLowering LSR addressing mode hooks for ARM and Thumb.
llvm-svn: 35075
2007-03-12 23:30:29 +00:00
Evan Cheng
0f07707270 - Fix codegen for pc relative constant (e.g. JT) in thumb mode:
.set PCRELV0, (LJTI1_0_0-(LPCRELL0+4))
LPCRELL0:
        add r1, pc, #PCRELV0
This is not legal since add r1, pc, #c requires the constant be a multiple of 4.
Do the following instead:
        .set PCRELV0, (LJTI1_0_0-(LPCRELL0+4))
LPCRELL0:
        mov r1, #PCRELV0
        add r1, pc

- In thumb mode, it's not possible to use .set generate a pc relative stub
  address. The stub is ARM code which is in a different section from the thumb
  code. Load the value from a constpool instead.
- Some asm printing clean up.

llvm-svn: 33664
2007-01-30 20:37:08 +00:00
Evan Cheng
c6e1d453d3 ARM backend contribution from Apple.
llvm-svn: 33353
2007-01-19 07:51:42 +00:00