1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

278 Commits

Author SHA1 Message Date
Matt Beaumont-Gay
dd419ee85c Add a 'using' declaration to suppress GCC's -Woverloaded-virtual while we
decide what pattern we want to follow in the future.

llvm-svn: 169561
2012-12-06 23:15:36 +00:00
Evan Cheng
4cdc6c4eef Replace r169459 with something safer. Rather than having computeMaskedBits to
understand target implementation of any_extend / extload, just generate
zero_extend in place of any_extend for liveouts when the target knows the
zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).

rdar://12771555

llvm-svn: 169536
2012-12-06 19:13:27 +00:00
Evan Cheng
c1db873871 Let targets provide hooks that compute known zero and ones for any_extend
and extload's. If they are implemented as zero-extend, or implicitly
zero-extend, then this can enable more demanded bits optimizations. e.g.

define void @foo(i16* %ptr, i32 %a) nounwind {
entry:
  %tmp1 = icmp ult i32 %a, 100
  br i1 %tmp1, label %bb1, label %bb2
bb1:
  %tmp2 = load i16* %ptr, align 2
  br label %bb2
bb2:
  %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
  %cmp = icmp ult i16 %tmp3, 24
  br i1 %cmp, label %bb3, label %exit
bb3:
  call void @bar() nounwind
  br label %exit
exit:
  ret void
}

This compiles to the followings before:
        push    {lr}
        mov     r2, #0
        cmp     r1, #99
        bhi     LBB0_2
@ BB#1:                                 @ %bb1
        ldrh    r2, [r0]
LBB0_2:                                 @ %bb2
        uxth    r0, r2
        cmp     r0, #23
        bhi     LBB0_4
@ BB#3:                                 @ %bb3
        bl      _bar
LBB0_4:                                 @ %exit
        pop     {lr}
        bx      lr

The uxth is not needed since ldrh implicitly zero-extend the high bits. With
this change it's eliminated.

rdar://12771555

llvm-svn: 169459
2012-12-06 01:28:01 +00:00
Chandler Carruth
a98c778194 Sort includes for all of the .h files under the 'lib' tree. These were
missed in the first pass because the script didn't yet handle include
guards.

Note that the script is now able to handle all of these headers without
manual edits. =]

llvm-svn: 169224
2012-12-04 07:12:27 +00:00
Silviu Baranga
d93d64a5fd Added atomic 64 min/max/umin/umax instrinsics support in the ARM backend.
llvm-svn: 168886
2012-11-29 14:41:25 +00:00
Benjamin Kramer
bd65c85dc1 ARM: Implement CanLowerReturn so large vectors get expanded into sret.
Fixes 14337.

llvm-svn: 168809
2012-11-28 20:55:10 +00:00
Dmitri Gribenko
610d06f00e Use empty parens for empty function parameter list instead of '(void)'.
llvm-svn: 168049
2012-11-15 16:51:49 +00:00
Stepan Dyatkovskiy
ece4c2a9c1 ARM:
Removed extra stack frame object for fixed byval arguments,
VarArgsStyleRegisters invocation was reworked due to some improper usage in
past. PR14099 also demonstrates it.

llvm-svn: 166273
2012-10-19 08:23:06 +00:00
Stepan Dyatkovskiy
09c6b0a273 Issue:
Stack is formed improperly for long structures passed as byval arguments for
EABI mode.

If we took AAPCS reference, we can found the next statements:

A: "If the argument requires double-word alignment (8-byte), the NCRN (Next
Core Register Number) is rounded up to the next even register number." (5.5
Parameter Passing, Stage C, C.3).

B: "The alignment of an aggregate shall be the alignment of its most-aligned
component." (4.3 Composite Types, 4.3.1 Aggregates).

So if we have structure with doubles (9 double fields) and 3 Core unused
registers (r1, r2, r3): caller should use r2 and r3 registers only.
Currently r1,r2,r3 set is used, but it is invalid.

Callee VA routine should also use r2 and r3 regs only. All is ok here. This
behaviour is guessed by rounding up SP address with ADD+BFC operations.

Fix:
Main fix is in ARMTargetLowering::HandleByVal. If we detected AAPCS mode and
8 byte alignment, we waste odd registers then.

P.S.:
I also improved LDRB_POST_IMM regression test. Since ldrb instruction will
not generated by current regression test after this patch. 

llvm-svn: 166018
2012-10-16 07:16:47 +00:00
Stepan Dyatkovskiy
5182bb8695 Issue description:
SchedulerDAGInstrs::buildSchedGraph ignores dependencies between FixedStack
objects and byval parameters. So loading byval parameters from stack may be
inserted *before* it will be stored, since these operations are treated as
independent.

Fix:
Currently ARMTargetLowering::LowerFormalArguments saves byval registers with
FixedStack MachinePointerInfo. To fix the problem we need to store byval
registers with MachinePointerInfo referenced to first the "byval" parameter.

Also commit adds two new fields to the InputArg structure: Function's argument
index and InputArg's part offset in bytes relative to the start position of
Function's argument. E.g.: If function's argument is 128 bit width and it was
splitted onto 32 bit regs, then we got 4 InputArg structs with same arg index,
but different offset values. 

llvm-svn: 165616
2012-10-10 11:37:36 +00:00
Arnold Schwaighofer
d606c6fcdf Patch to implement UMLAL/SMLAL instructions for the ARM architecture
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.

Bug 12213

Patch by Yin Ma!

llvm-svn: 163136
2012-09-04 14:37:49 +00:00
Nadav Rotem
d1815a0763 Not all targets have efficient ISel code generation for select instructions.
For example, the ARM target does not have efficient ISel handling for vector
selects with scalar conditions. This patch adds a TLI hook which allows the
different targets to report which selects are supported well and which selects
should be converted to CF duting codegen prepare.

llvm-svn: 163093
2012-09-02 12:10:19 +00:00
Jakob Stoklund Olesen
abf0a9ec82 Remove the CAND/COR/CXOR custom ISD nodes and their select code.
These nodes are no longer needed because the peephole pass can fold
CMOV+AND into ANDCC etc.

llvm-svn: 162179
2012-08-18 21:49:50 +00:00
Arnold Schwaighofer
c751a25aed Revert 161581: Patch to implement UMLAL/SMLAL instructions for the ARM
architecture

It broke MultiSource/Applications/JM/ldecod/ldecod on armv7 thumb O0 g and armv7
thumb O3.

llvm-svn: 161736
2012-08-12 05:11:56 +00:00
Craig Topper
4d9cbceefd Change addTypeForNeon to use MVT instead of EVT so all the calls to getSimpleVT can be removed.
llvm-svn: 161735
2012-08-12 03:16:37 +00:00
Arnold Schwaighofer
f3d4d73157 Patch to implement UMLAL/SMLAL instructions for the ARM architecture
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.

Bug 12213

Patch by Yin Ma!

llvm-svn: 161581
2012-08-09 15:25:52 +00:00
Bob Wilson
d1eefbeac2 Fall back to selection DAG isel for calls to builtin functions.
Fast isel doesn't currently have support for translating builtin function
calls to target instructions.  For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization.  Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel.  <rdar://problem/12008746>

llvm-svn: 161232
2012-08-03 04:06:28 +00:00
Bill Wendling
2320ff76d7 Re-enable the CMN instruction.
We turned off the CMN instruction because it had semantics which we weren't
getting correct. If we are comparing with an immediate, then it's okay to use
the CMN instruction.
<rdar://problem/7569620>

llvm-svn: 158302
2012-06-11 08:07:26 +00:00
Manman Ren
2a5898a76b ARM: properly handle alignment for struct byval.
Factor out the expansion code into a function.
This change is to be enabled in clang.

rdar://9877866

llvm-svn: 157830
2012-06-01 19:33:18 +00:00
Manman Ren
f30d4765c5 ARM: support struct byval in llvm
We handle struct byval by inserting a pseudo op, which will be expanded to a
loop at ExpandISelPseudos.
A separate patch for clang will be submitted to enable struct byval.

rdar://9877866

llvm-svn: 157793
2012-06-01 02:44:42 +00:00
Justin Holewinski
77c4679dae Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall
to pass around a struct instead of a large set of individual values.  This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.

NV_CONTRIB

llvm-svn: 157479
2012-05-25 16:35:28 +00:00
Hans Wennborg
b3c41d012d Make ARM and Mips use TargetMachine::getTLSModel()
This moves the logic for selecting a TLS model to a single place,
instead of the previous three (ARM, Mips, and X86 which already
uses this function).

llvm-svn: 156162
2012-05-04 09:40:39 +00:00
Evan Cheng
5825e9dbf5 Fix a long standing tail call optimization bug. When a libcall is emitted
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178

llvm-svn: 154370
2012-04-10 01:51:00 +00:00
Rafael Espindola
88a1aeb123 Always compute all the bits in ComputeMaskedBits.
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.

llvm-svn: 154011
2012-04-04 12:51:34 +00:00
Craig Topper
0534d071b7 Reorder includes to match coding standards. Fix an issue or two exposed by that.
llvm-svn: 152978
2012-03-17 07:33:42 +00:00
Lang Hames
7918b0b225 Use vmov.f32 to materialize f32 consts on ARM. This relaxes constraints on
register allocation by allowing all 32 D-registers to be used. Patch by Cameron
Zwarich.

llvm-svn: 152824
2012-03-15 18:49:02 +00:00
Evan Cheng
c5ead6c49e Re-commit r151623 with fix. Only issue special no-return calls if it's a direct call.
llvm-svn: 151645
2012-02-28 18:51:51 +00:00
Daniel Dunbar
b448d31a6b Revert r151623 "Some ARM implementaions, e.g. A-series, does return stack prediction. ...", it is breaking the Clang build during the Compiler-RT part.
llvm-svn: 151630
2012-02-28 15:36:07 +00:00
Evan Cheng
d29a22e4b0 Some ARM implementaions, e.g. A-series, does return stack prediction. That is,
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.

Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.

rdar://8979299

llvm-svn: 151623
2012-02-28 06:42:03 +00:00
Evan Cheng
d18a688213 Optimize a couple of common patterns involving conditional moves where the false
value is zero. Instead of a cmov + op, issue an conditional op instead. e.g.
    cmp   r9, r4
    mov   r4, #0
    moveq r4, #1 
    orr   lr, lr, r4

should be:
    cmp   r9, r4
    orreq lr, lr, #1

That is, optimize (or x, (cmov 0, y, cond)) to (or.cond x, y). Similarly extend
this to xor as well as (and x, (cmov -1, y, cond)) => (and.cond x, y).

It's possible to extend this to ADD and SUB but I don't think they are common.

rdar://8659097

llvm-svn: 151224
2012-02-23 01:19:06 +00:00
Craig Topper
3ed929de0a Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified.
llvm-svn: 151134
2012-02-22 05:59:10 +00:00
Bob Wilson
0e88464871 Fix ARM SjLj-EH dispatch setup code. <rdar://problem/10444602>
The EmitBasePointerRecalculation function has 2 problems, one minor and one
fatal.  The minor problem is that it inserts the code at the setjmp
instead of in the dispatch block.  The fatal problem is that at the point
where this code runs, we don't know whether there will be a base pointer,
so the entire function is a no-op.  The base pointer recalculation needs to
be handled as it was before, by inserting a pseudo instruction that gets
expanded late.

Most of the support for the old approach is still here, but it no longer
has any connection to the eh_sjlj_dispatchsetup intrinsic.  Clean up the
parts related to the intrinsic and just generate the pseudo instruction
directly.

llvm-svn: 144781
2011-11-16 07:11:57 +00:00
Evan Cheng
47d8f8af84 Add vmov.f32 to materialize f32 immediate splats which cannot be handled by
integer variants. rdar://10437054

llvm-svn: 144608
2011-11-15 02:12:34 +00:00
Lang Hames
ba63f9da8b Fixed parameter name.
llvm-svn: 143594
2011-11-02 23:37:04 +00:00
Lang Hames
ceec8ec67e Try to lower memset/memcpy/memmove to vector instructions on ARM where the alignment permits.
llvm-svn: 143582
2011-11-02 22:52:45 +00:00
Bill Wendling
1f7c16c63f Take the code that was emitted for the llvm.eh.dispatch.setup intrinsic and emit
it with the new SjLj emitter stuff. This way there's no need to emit that
kind-of-hacky intrinsic.

llvm-svn: 141419
2011-10-07 22:08:37 +00:00
Bill Wendling
3c68e1d212 Refactor some of the code that sets up the entry block for SjLj EH. No functionality change.
llvm-svn: 141323
2011-10-06 22:18:16 +00:00
Bill Wendling
834bb83a41 Check-pointing the new SjLj EH lowering.
This code will replace the version in ARMAsmPrinter.cpp. It creates a new
machine basic block, which is the dispatch for the return from a longjmp
call. It then shoves the address of that machine basic block into the correct
place in the function context so that the EH runtime will jump to it directly
instead of having to go through a compare-and-jump-to-the-dispatch bit. This
should be more efficient in the common case.

llvm-svn: 141031
2011-10-03 21:25:38 +00:00
Jim Grosbach
d94ffffc87 ARM fix encoding of VMOV.f32 and VMOV.f64 immediates.
Encode the immediate into its 8-bit form as part of isel rather than later,
which simplifies things for mapping the encoding bits, allows the removal
of the custom disassembler decoding hook, makes the operand printer trivial,
and prepares things more cleanly for handling these in the asm parser.

rdar://10211428

llvm-svn: 140834
2011-09-30 00:50:06 +00:00
Duncan Sands
d1311488fe Add codegen support for vector select (in the IR this means a select
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons.  Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all").  Patch mostly by
Nadav Rotem.

llvm-svn: 139159
2011-09-06 19:07:46 +00:00
Eli Friedman
5d3814e0c4 64-bit atomic cmpxchg for ARM.
llvm-svn: 138868
2011-08-31 17:52:22 +00:00
Eli Friedman
d71c865ae0 Some 64-bit atomic operations on ARM. 64-bit cmpxchg coming next.
llvm-svn: 138845
2011-08-31 00:31:29 +00:00
Evan Cheng
91aa81acaa Follow up to r138791.
Add a instruction flag: hasPostISelHook which tells the pre-RA scheduler to
call a target hook to adjust the instruction. For ARM, this is used to
adjust instructions which may be setting the 's' flag. ADC, SBC, RSB, and RSC
instructions have implicit def of CPSR (required since it now uses CPSR physical
register dependency rather than "glue"). If the carry flag is used, then the
target hook will *fill in* the optional operand with CPSR. Otherwise, the hook
will remove the CPSR implicit def from the MachineInstr.

llvm-svn: 138810
2011-08-30 19:09:48 +00:00
Evan Cheng
1eacb83316 Change ARM / Thumb2 addc / adde and subc / sube modeling to use physical
register dependency (rather than glue them together). This is general
goodness as it gives scheduler more freedom. However it is motivated by
a nasty bug in isel.

When a i64 sub is expanded to subc + sube.
  libcall #1
     \
      \        subc 
       \       /  \
        \     /    \
         \   /    libcall #2
          sube

If the libcalls are not serialized (i.e. both have chains which are dag
entry), legalizer can serialize them in arbitrary orders. If it's
unlucky, it can force libcall #2 before libcall #1 in the above case.

  subc
   |
  libcall #2
   |
  libcall #1
   |
  sube

However since subc and sube are "glued" together, this ends up being a
cycle when the scheduler combine subc and sube as a single scheduling
unit.

The right solution is to fix LegalizeType too chains the libcalls together.
However, LegalizeType is not processing nodes in order so that's harder than
it should be. For now, the move to physical register dependency will do.

rdar://10019576

llvm-svn: 138791
2011-08-30 01:34:54 +00:00
Chris Lattner
e1fe7061ce land David Blaikie's patch to de-constify Type, with a few tweaks.
llvm-svn: 135375
2011-07-18 04:54:35 +00:00
Evan Cheng
37ff73dfaf Improve codegen for select's:
if (x != 0) x = 1
if (x == 1) x = 1

Previous codegen looks like this:
        mov     r1, r0
        cmp     r1, #1
        mov     r0, #0
        moveq   r0, #1

The naive lowering select between two different values. It should recognize the
test is equality test so it's more a conditional move rather than a select:
        cmp     r0, #1
        movne   r0, #0

rdar://9758317

llvm-svn: 135017
2011-07-13 00:42:17 +00:00
Eric Christopher
2090e793c1 Remove getRegClassForInlineAsmConstraint from the ARM port.
Part of rdar://9643582

llvm-svn: 134095
2011-06-29 21:10:36 +00:00
Eric Christopher
d68494ffdd Have LowerOperandForConstraint handle multiple character constraints.
Part of rdar://9119939

llvm-svn: 132510
2011-06-02 23:16:42 +00:00
Eli Friedman
12e590e760 Make the logic for determining function alignment more explicit. No functionality change.
llvm-svn: 131012
2011-05-06 20:34:06 +00:00
Dan Gohman
7beb845bab Add an unfolded offset field to LSR's Formula record. This is used to
model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.

llvm-svn: 130743
2011-05-03 00:46:49 +00:00
Jim Grosbach
77d45564c3 ARM and Thumb2 support for atomic MIN/MAX/UMIN/UMAX loads.
rdar://9326019

llvm-svn: 130234
2011-04-26 19:44:18 +00:00
Andrew Trick
a130d110d1 Thumb2 and ARM add/subtract with carry fixes.
Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.

Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.

Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).

llvm-svn: 130048
2011-04-23 03:55:32 +00:00
Andrew Trick
31c7962ce5 whitespace
llvm-svn: 130046
2011-04-23 03:24:11 +00:00
Stuart Hastings
a552942e02 ARM byval support. Will be enabled by another patch to the FE. <rdar://problem/7662569>
llvm-svn: 129858
2011-04-20 16:47:52 +00:00
Cameron Zwarich
1b8f91d2c8 Add a ARM-specific SD node for VBSL so that forms with a constant first operand
can be recognized. This fixes <rdar://problem/9183078>.

llvm-svn: 128584
2011-03-30 23:01:21 +00:00
Evan Cheng
dd99a0a548 Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified.
llvm-svn: 127981
2011-03-21 01:19:09 +00:00
Daniel Dunbar
34c65737c3 Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors
to canonicalize IR", it broke a lot of things.

llvm-svn: 127954
2011-03-19 21:47:14 +00:00
Evan Cheng
c5f50f7322 SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR
to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433

llvm-svn: 127953
2011-03-19 17:17:39 +00:00
Bill Wendling
388dad6d62 Some minor cleanups based on feedback.
llvm-svn: 127694
2011-03-15 20:47:26 +00:00
Bill Wendling
da1364d669 Generate a VTBL instruction instead of a series of loads and stores when we
can. As Nate pointed out, VTBL isn't super performant, but it *has* to be better
than this:

_shuf:
@ BB#0:       @ %entry
  push        {r4, r7, lr}
  add         r7, sp, #4
  sub         sp, #12
  mov         r4, sp
  bic         r4, r4, #7
  mov         sp, r4
  mov         r2, sp
  vmov        d16, r0, r1
  orr         r0, r2, #6
  orr         r3, r2, #7
  vst1.8      {d16[0]}, [r3]
  vst1.8      {d16[5]}, [r0]
  subs        r4, r7, #4
  orr         r0, r2, #5
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #4
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #3
  vst1.8      {d16[0]}, [r0]
  orr         r0, r2, #2
  vst1.8      {d16[2]}, [r0]
  orr         r0, r2, #1
  vst1.8      {d16[1]}, [r0]
  vst1.8      {d16[3]}, [r2]
  vldr.64     d16, [sp]
  vmov        r0, r1, d16
  mov         sp, r4
  pop         {r4, r7, pc}

The "illegal" testcase in vext.ll is no longer illegal.
<rdar://problem/9078775>

llvm-svn: 127630
2011-03-14 23:02:38 +00:00
Bob Wilson
f8c4d1ded9 Fix a compiler crash where a Glue value had multiple uses. Radar 9049552.
llvm-svn: 127198
2011-03-08 01:17:20 +00:00
Cameron Zwarich
a1920d7f51 Move getRegPressureLimit() from TargetLoweringInfo to TargetRegisterInfo.
llvm-svn: 127175
2011-03-07 21:56:36 +00:00
Bob Wilson
1497601a7b Remove unused conditional negate operations.
llvm-svn: 127090
2011-03-05 16:54:31 +00:00
Stuart Hastings
539d4e1460 Support for byval parameters on ARM. Will be enabled by a forthcoming
patch to the front-end.  Radar 7662569.

llvm-svn: 126655
2011-02-28 17:17:53 +00:00
Bob Wilson
65f4a70b82 Add codegen support for using post-increment NEON load/store instructions.
The vld1-lane, vld1-dup and vst1-lane instructions do not yet support using
post-increment versions, but all the rest of the NEON load/store instructions
should be handled now.

llvm-svn: 125014
2011-02-07 17:43:21 +00:00
Evan Cheng
c7ce7e2ac3 Given a pair of floating point load and store, if there are no other uses of
the load, then it may be legal to transform the load and store to integer
load and store of the same width.

This is done if the target specified the transformation as profitable. e.g.
On arm, this can transform:
vldr.32 s0, []
vstr.32 s0, []

to

ldr r12, []
str r12, []

rdar://8944252

llvm-svn: 124708
2011-02-02 01:06:55 +00:00
Evan Cheng
0dfe28a9b5 Last round of fixes for movw + movt global address codegen.
1. Fixed ARM pc adjustment.
2. Fixed dynamic-no-pic codegen
3. CSE of pc-relative load of global addresses.

It's now enabled by default for Darwin.

llvm-svn: 123991
2011-01-21 18:55:51 +00:00
Evan Cheng
53ec6fc591 Materialize GA addresses with movw + movt pairs for Darwin in PIC mode. e.g.
movw    r0, :lower16:(L_foo$non_lazy_ptr-(LPC0_0+4))
        movt    r0, :upper16:(L_foo$non_lazy_ptr-(LPC0_0+4))
LPC0_0:
        add     r0, pc, r0

It's not yet enabled by default as some tests are failing. I suspect bugs in
down stream tools.

llvm-svn: 123619
2011-01-17 08:03:18 +00:00
Evan Cheng
1afd04fc59 Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call.
llvm-svn: 123048
2011-01-08 01:24:27 +00:00
Bob Wilson
c485ff3ced Lower some BUILD_VECTORS using VEXT+shuffle.
Patch by Tim Northover.

llvm-svn: 123035
2011-01-07 21:37:30 +00:00
Evan Cheng
f7e586d749 Enable sibling call optimization of libcalls which are expanded during
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777

llvm-svn: 120501
2010-11-30 23:55:39 +00:00
Bob Wilson
3bb61d1932 Add support for NEON VLD2-dup instructions.
llvm-svn: 120236
2010-11-28 06:51:26 +00:00
Owen Anderson
52e3873edc Add support for ARM's specialized vector-compare-against-zero instructions.
llvm-svn: 118453
2010-11-08 23:21:22 +00:00
Owen Anderson
f26ea37db7 Disallow the certain NEON modified-immediate forms when generating vorr or vbic.
llvm-svn: 118300
2010-11-05 21:57:54 +00:00
Owen Anderson
add19dd6dd Add codegen and encoding support for the immediate form of vbic.
llvm-svn: 118291
2010-11-05 19:27:46 +00:00
Owen Anderson
1a89511e5d Add support for code generation of the one register with immediate form of vorr.
We could be more aggressive about making this work for a larger range of constants,
but this seems like a good start.

llvm-svn: 118201
2010-11-03 22:44:51 +00:00
Evan Cheng
eab7251695 Fix preload instruction isel. Only v7 supports pli, and only v7 with mp extension supports pldw. Add subtarget attribute to denote mp extension support and legalize illegal ones to nothing.
llvm-svn: 118160
2010-11-03 06:34:55 +00:00
Bob Wilson
183c466006 Overhaul memory barriers in the ARM backend. Radar 8601999.
There were a number of issues to fix up here:
* The "device" argument of the llvm.memory.barrier intrinsic should be
used to distinguish the "Full System" domain from the "Inner Shareable"
domain.  It has nothing to do with using DMB vs. DSB instructions.
* The compiler should never need to emit DSB instructions.  Remove the
ARMISD::SYNCBARRIER node and also remove the instruction patterns for DSB.
* Merge the separate DMB/DSB instructions for options only used for the
disassembler with the default DMB/DSB instructions.  Add the default
"full system" option ARM_MB::SY to the ARM_MB::MemBOpt enum.
* Add a separate ARMISD::MEMBARRIER_MCR node for subtargets that implement
a data memory barrier using the MCR instruction.
* Fix up encodings for these instructions (except MCR).
I also updated the tests and added a few new ones to check for DMB options
that were not currently being exercised.

llvm-svn: 117756
2010-10-30 00:54:37 +00:00
John Thompson
6115a7f1d4 Inline asm multiple alternative constraints development phase 2 - improved basic logic, added initial platform support.
llvm-svn: 117667
2010-10-29 17:29:13 +00:00
Jim Grosbach
a8c0be5343 Add a pre-dispatch SjLj EH hook on the unwind edge for targets to do any
setup they require. Use this for ARM/Darwin to rematerialize the base
pointer from the frame pointer when required. rdar://8564268

llvm-svn: 116879
2010-10-19 23:27:08 +00:00
Bob Wilson
6b6b53ad6f Remove unused ARMISD::AND selection DAG node.
llvm-svn: 116566
2010-10-15 04:34:40 +00:00
Bob Wilson
c4345abcc0 Define the TargetLowering::getTgtMemIntrinsic hook for ARM so that NEON load
and store intrinsics are represented with MemIntrinsicSDNodes.

llvm-svn: 114454
2010-09-21 17:56:22 +00:00
Evan Cheng
c9cb37516d Teach if-converter to be more careful with predicating instructions that would
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.

llvm-svn: 113570
2010-09-10 01:29:16 +00:00
Bob Wilson
3348d2eb50 Remove NEON vmull, vmlal, and vmlsl intrinsics, replacing them with multiply,
add, and subtract operations with zero-extended or sign-extended vectors.
Update tests.  Add auto-upgrade support for the old intrinsics.

llvm-svn: 112773
2010-09-01 23:50:19 +00:00
Bill Wendling
385ad1516f Create an ARMISD::AND node. This node is exactly like the "ARM::AND" node, but
it sets the CPSR register.

llvm-svn: 112393
2010-08-29 03:02:11 +00:00
Bill Wendling
f10d5c00fc Consider this code snippet:
float t1(int argc) {
  return (argc == 1123) ? 1.234f : 2.38213f;
}

We would generate truly awful code on ARM (those with a weak stomach should look
away):

_t1:
  movw   r1, #1123
  movs   r2, #1
  movs   r3, #0
  cmp    r0, r1
  mov.w  r0, #0
  it     eq
  moveq  r0, r2
  movs   r1, #4
  cmp    r0, #0
  it     ne
  movne  r3, r1
  adr    r0, #LCPI1_0
  ldr    r0, [r0, r3]
  bx     lr

The problem was that legalization was creating a cascade of SELECT_CC nodes, for
for the comparison of "argc == 1123" which was fed into a SELECT node for the ?:
statement which was itself converted to a SELECT_CC node. This is because the
ARM back-end doesn't have custom lowering for SELECT nodes, so it used the
default "Expand".

I added a fairly simple "LowerSELECT" to the ARM back-end. It takes care of this
testcase, but can obviously be expanded to include more cases.

Now we generate this, which looks optimal to me:

_t1:
  movw   r1, #1123
  movs   r2, #0
  cmp    r0, r1
  adr    r0, #LCPI0_0
  it     eq
  moveq  r2, #4
  ldr    r0, [r0, r2]
  bx     lr
  .align  2
LCPI0_0:
  .long   1075344593  @ float 2.382130e+00
  .long   1067316150  @ float 1.234000e+00

llvm-svn: 110799
2010-08-11 08:43:16 +00:00
Nate Begeman
b506e13a32 Add support for getting & setting the FPSCR application register on ARM when VFP is enabled.
Add support for using the FPSCR in conjunction with the vcvtr instruction, for controlling fp to int rounding.
Add support for the FLT_ROUNDS_ node now that the FPSCR is exposed.

llvm-svn: 110152
2010-08-03 21:31:55 +00:00
Jim Grosbach
03b130774b Remove dead prototype
llvm-svn: 109691
2010-07-28 23:16:12 +00:00
Anton Korobeynikov
7ae895e007 Hook in GlobalMerge pass
llvm-svn: 109359
2010-07-24 21:52:08 +00:00
Evan Cheng
f215e55d5f - Allow target to specify when is register pressure "too high". In most cases,
it's too late to start backing off aggressive latency scheduling when most
  of the registers are in use so the threshold should be a bit tighter.
- Correctly handle live out's and extract_subreg etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
  For ARM, this is almost always a win on # of instructions. It's runtime
  neutral for most of the tests. But for some kernels with high register
  pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
  54 and sped up by 20%.

llvm-svn: 109279
2010-07-23 22:39:59 +00:00
Eric Christopher
3d118d5e8a Baby steps towards ARM fast-isel.
llvm-svn: 109047
2010-07-21 22:26:11 +00:00
Evan Cheng
df725c25dd Teach bottom up pre-ra scheduler to track register pressure. Work in progress.
llvm-svn: 108991
2010-07-21 06:09:07 +00:00
Evan Cheng
b2ad0066f5 ARM has to provide its own TargetLowering::findRepresentativeClass because its scalar floating point registers alias its vector registers.
llvm-svn: 108761
2010-07-19 22:15:08 +00:00
Jim Grosbach
dc21ac2e0a Since ARM emits inline jump tables as part of the ConstantIsland pass,
it should set the jump table encloding the EK_Inline. This prevents
a second, unused, copy of the table from being emitted after the function
body. PR6581.

llvm-svn: 108730
2010-07-19 17:20:38 +00:00
Jim Grosbach
5b8c14ce8a revert so I can get the right PR# in the log message.
llvm-svn: 108727
2010-07-19 17:19:40 +00:00
Jim Grosbach
42f3134738 Since ARM emits inline jump tables as part of the ConstantIsland pass,
it should set the jump table encloding the EK_Inline. This prevents
a second, unused, copy of the table from being emitted after the function
body. PR7499.

llvm-svn: 108722
2010-07-19 17:18:28 +00:00
Jim Grosbach
749f4fca0a Add basic support to code-gen the ARM/Thumb2 bit-field insert (BFI) instruction
and a combine pattern to use it for setting a bit-field to a constant
value. More to come for non-constant stores.

llvm-svn: 108570
2010-07-16 23:05:05 +00:00
Bob Wilson
34f481e895 Add support for NEON VMVN immediate instructions.
llvm-svn: 108324
2010-07-14 06:31:50 +00:00
Bob Wilson
7feb850d36 Use a target-specific VMOVIMM DAG node instead of BUILD_VECTOR to represent
NEON VMOV-immediate instructions.  This simplifies some things.

llvm-svn: 108275
2010-07-13 21:16:48 +00:00
Evan Cheng
069f1f7c9a Extend the r107852 optimization which turns some fp compare to code sequence using only i32 operations. It now optimize some f64 compares when fp compare is exceptionally slow (e.g. cortex-a8). It also catches comparison against 0.0.
llvm-svn: 108258
2010-07-13 19:27:42 +00:00