1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 12:12:47 +01:00
Commit Graph

10639 Commits

Author SHA1 Message Date
Chris Lattner
65b9983515 Allow passing a dag into dump and getOperationName. If one is available
when printing a node, use it to render target operations with their
target instruction name instead of "<<unknown>>".

llvm-svn: 22804
2005-08-16 18:33:07 +00:00
Chris Lattner
1b07a165e0 Use a extant helper to do this.
llvm-svn: 22802
2005-08-16 18:31:23 +00:00
Chris Lattner
73348d1e89 Add some methods for dag->dag isel.
Split RemoveNodeFromCSEMaps out of DeleteNodesIfDead to do it.

llvm-svn: 22801
2005-08-16 18:17:10 +00:00
Chris Lattner
a4c5954c52 Pull the LLVM -> DAG lowering code out of the pattern selector so that it
can be shared with the DAG->DAG selector.

llvm-svn: 22799
2005-08-16 17:14:42 +00:00
Chris Lattner
bc70c99aef Fix a bad case in gzip where we put lots of things in registers across the
loop, because a IV-dependent value was used outside of the loop and didn't
have immediate-folding capability

llvm-svn: 22798
2005-08-16 00:38:11 +00:00
Chris Lattner
3bfebb1e8f Fix Transforms/LoopStrengthReduce/2005-08-15-AddRecIV.ll
llvm-svn: 22797
2005-08-16 00:37:01 +00:00
Chris Lattner
543dd2e01c Turn loop strength reduction on by default.
Only run createLowerConstantExpressionsPass for the simple isel.  The DAG
isel has no need for it.

llvm-svn: 22794
2005-08-15 23:47:04 +00:00
Chris Lattner
c2cbe96e25 Teach LLVM to know how many times a loop executes when constructed with
a < expression, e.g.: for (i = m; i < n; ++i)

llvm-svn: 22793
2005-08-15 23:33:51 +00:00
Jim Laskey
c501f51bd2 Broke 80 column rule.
llvm-svn: 22792
2005-08-15 17:35:26 +00:00
Jim Laskey
92eccb9901 Changed code gen for int to f32 to use rounding. This makes FP results
consistent with gcc.

llvm-svn: 22791
2005-08-15 17:14:19 +00:00
Andrew Lenharth
b9c77079dc isIntImmediate is a good Idea. Add a flavor that checks bounds while it is at it
llvm-svn: 22790
2005-08-15 14:31:37 +00:00
Nate Begeman
54423e60c6 Fix last night's PPC32 regressions by
1. Not selecting the false value of a select_cc in the false arm, which
   isn't legal for nested selects.
2. Actually returning the node we created and Legalized in the FP_TO_UINT
   Expander.

llvm-svn: 22789
2005-08-14 18:38:32 +00:00
Nate Begeman
6e0168fe5f Fix last night's X86 regressions by putting code for SSE in the if(SSE)
block.  nur.

llvm-svn: 22788
2005-08-14 18:37:02 +00:00
Andrew Lenharth
972c05771c only build .a on alpha
llvm-svn: 22787
2005-08-14 15:14:34 +00:00
Nate Begeman
89f12b7721 Fix FP_TO_UINT with Scalar SSE2 now that the legalizer can handle it. We
now generate the relatively good code sequences:
unsigned short foo(float a) { return a; }
_foo:
        movss 4(%esp), %xmm0
        cvttss2si %xmm0, %eax
        movzwl %ax, %eax
        ret

and
unsigned bar(float a) { return a; }
_bar:
        movss .CPI_bar_0, %xmm0
        movss 4(%esp), %xmm1
        movapd %xmm1, %xmm2
        subss %xmm0, %xmm2
        cvttss2si %xmm2, %eax
        xorl $-2147483648, %eax
        cvttss2si %xmm1, %ecx
        ucomiss %xmm0, %xmm1
        cmovb %ecx, %eax
        ret

llvm-svn: 22786
2005-08-14 04:36:51 +00:00
Nate Begeman
9be6a214ff Teach the legalizer how to legalize FP_TO_UINT.
Teach the legalizer to promote FP_TO_UINT to FP_TO_SINT if the wider
  FP_TO_UINT is also illegal.  This allows us on PPC to codegen
  unsigned short foo(float a) { return a; }

as:
_foo:
.LBB_foo_0:     ; entry
        fctiwz f0, f1
        stfd f0, -8(r1)
        lwz r2, -4(r1)
        rlwinm r3, r2, 0, 16, 31
        blr

instead of:
_foo:
.LBB_foo_0:     ; entry
        fctiwz f0, f1
        stfd f0, -8(r1)
        lwz r2, -4(r1)
        lis r3, ha16(.CPI_foo_0)
        lfs f0, lo16(.CPI_foo_0)(r3)
        fcmpu cr0, f1, f0
        blt .LBB_foo_2  ; entry
.LBB_foo_1:     ; entry
        fsubs f0, f1, f0
        fctiwz f0, f0
        stfd f0, -16(r1)
        lwz r2, -12(r1)
        xoris r2, r2, 32768
.LBB_foo_2:     ; entry
        rlwinm r3, r2, 0, 16, 31
        blr

llvm-svn: 22785
2005-08-14 01:20:53 +00:00
Nate Begeman
2426bb6589 Make FP_TO_UINT Illegal. This allows us to generate significantly better
codegen for FP_TO_UINT by using the legalizer's SELECT variant.

Implement a codegen improvement for SELECT_CC, selecting the false node in
the MBB that feeds the phi node.  This allows us to codegen:
void foo(int *a, int b, int c) { int d = (a < b) ? 5 : 9; *a = d; }
as:
_foo:
        li r2, 5
        cmpw cr0, r4, r3
        bgt .LBB_foo_2  ; entry
.LBB_foo_1:     ; entry
        li r2, 9
.LBB_foo_2:     ; entry
        stw r2, 0(r3)
        blr

insted of:
_foo:
        li r2, 5
        li r5, 9
        cmpw cr0, r4, r3
        bgt .LBB_foo_2  ; entry
.LBB_foo_1:     ; entry
        or r2, r5, r5
.LBB_foo_2:     ; entry
        stw r2, 0(r3)
        blr

llvm-svn: 22784
2005-08-14 01:17:16 +00:00
Andrew Lenharth
5f4a72a18e Testing a variable before it is defined doesn't work so well. It is a fairly small thing, so just let everyone build the .a file
llvm-svn: 22783
2005-08-13 14:58:23 +00:00
Chris Lattner
57a2a74e99 Ooops, don't forget to clear this. The real inner loop is now:
.LBB_foo_3:     ; no_exit.1
        lfd f2, 0(r9)
        lfd f3, 8(r9)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r9)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfd f2, 0(r9)
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1

llvm-svn: 22782
2005-08-13 07:42:01 +00:00
Chris Lattner
f59e855dbc Recursively scan scev expressions for common subexpressions. This allows us
to handle nested loops much better, for example, by being able to tell that
these two expressions:

{( 8 + ( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp 12)}<loopentry.1>

{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

Have the following common part that can be shared:
{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

This allows us to codegen an important inner loop in 168.wupwise as:

.LBB_foo_4:     ; no_exit.1
        lfd f2, 16(r9)
        fmul f3, f0, f2
        fmul f2, f1, f2
        fadd f4, f3, f2
        stfd f4, 8(r9)
        fsub f2, f3, f2
        stfd f2, 16(r9)
        addi r8, r8, 1
        addi r9, r9, 16
        cmpw cr0, r8, r4
        ble .LBB_foo_4  ; no_exit.1

instead of:

.LBB_foo_3:     ; no_exit.1
        lfdx f2, r6, r9
        add r10, r6, r9
        lfd f3, 8(r10)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r10)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfdx f2, r6, r9
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1

llvm-svn: 22781
2005-08-13 07:27:18 +00:00
Nate Begeman
021a5b3fe1 Remove an unncessary argument to SimplifySelectCC and add an additional
assert when creating a select_cc node.

llvm-svn: 22780
2005-08-13 06:14:17 +00:00
Nate Begeman
4e8f777256 Fix the fabs regression on x86 by abstracting the select_cc optimization
out into SimplifySelectCC.  This allows both ISD::SELECT and ISD::SELECT_CC
to use the same set of simplifying folds.

llvm-svn: 22779
2005-08-13 06:00:21 +00:00
Nate Begeman
55d6c917c5 Remove support for 64b PPC, it's been broken for a long time. It'll be
back once a DAG->DAG ISel exists.

llvm-svn: 22778
2005-08-13 05:59:16 +00:00
Andrew Lenharth
d9a0c60da4 Fix oversized GOT problem with gcc-4 on alpha
llvm-svn: 22777
2005-08-13 05:09:50 +00:00
Chris Lattner
2681a83c43 Teach SplitCriticalEdge to update LoopInfo if it is alive. This fixes
a problem in LoopStrengthReduction, where it would split critical edges
then confused itself with outdated loop information.

llvm-svn: 22776
2005-08-13 01:38:43 +00:00
Chris Lattner
87bcd2794b remove dead code. The exit block list is computed on demand, thus does not
need to be updated.  This code is a relic from when it did.

llvm-svn: 22775
2005-08-13 01:30:36 +00:00
Chris Lattner
e06d2c3760 implement a couple of simple shift foldings.
e.g.  (X & 7) >> 3   -> 0

llvm-svn: 22774
2005-08-12 23:54:58 +00:00
Jim Laskey
6b95280f3c Fix for 2005-08-12-rlwimi-crash.ll. Make allowance for masks being shifted to
zero.

llvm-svn: 22773
2005-08-12 23:52:46 +00:00
Jim Laskey
7b32c9131d 1. This changes handles the cases of (~x)&y and x&(~y) yielding ANDC, and
(~x)|y and x|(~y) yielding ORC.

llvm-svn: 22771
2005-08-12 23:38:02 +00:00
Chris Lattner
811ef4cce0 When splitting critical edges, make sure not to leave the new block in the
middle of the loop.  This turns a critical loop in gzip into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_8 ; loopentry.loopexit_crit_edge
.LBB_test_2:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
.LBB_test_3:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
.LBB_test_4:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry

instead of this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_3:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_5 ; shortcirc_next.1
.LBB_test_4:    ; shortcirc_next.0.loopexit_crit_edge
        add r2, r11, r27
        add r8, r12, r27
        b .LBB_test_9   ; loopexit
.LBB_test_5:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_7 ; shortcirc_next.2
.LBB_test_6:    ; shortcirc_next.1.loopexit_crit_edge
        add r2, r9, r27
        add r8, r10, r27
        b .LBB_test_9   ; loopexit
.LBB_test_7:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry

Next up, improve the code for the loop.

llvm-svn: 22769
2005-08-12 22:22:17 +00:00
Chris Lattner
4e2d62ba5d Add a helper method
llvm-svn: 22768
2005-08-12 22:14:06 +00:00
Chris Lattner
a5c0038c25 Fix a FIXME: if we are inserting code for a PHI argument, split the critical
edge so that the code is not always executed for both operands.  This
prevents LSR from inserting code into loops whose exit blocks contain
PHI uses of IV expressions (which are outside of loops).  On gzip, for
example, we turn this ugly code:

.LBB_test_1:    ; loopentry
        add r27, r3, r28
        lhz r27, 3(r27)
        add r26, r4, r28
        lhz r26, 3(r26)
        add r25, r30, r28    ;; Only live if exiting the loop
        add r24, r29, r28    ;; Only live if exiting the loop
        cmpw cr0, r27, r26
        bne .LBB_test_5 ; loopexit

into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_2:    ; shortcirc_next.0
        ...
        blt .LBB_test_1


into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_t_3:    ; shortcirc_next.0
.LBB_test_3:    ; shortcirc_next.0
        ...
        blt .LBB_test_1


Next step: get the block out of the loop so that the loop is all
fall-throughs again.

llvm-svn: 22766
2005-08-12 22:06:11 +00:00
Chris Lattner
7b5d0e1463 Change break critical edges to not remove, then insert, PHI node entries.
Instead, just update the BB in-place.  This is both faster, and it prevents
split-critical-edges from shuffling the PHI argument list unneccesarily.

llvm-svn: 22765
2005-08-12 21:58:07 +00:00
Andrew Lenharth
b9b598d55f match gcc's use of tabs, makes diffs easier
llvm-svn: 22764
2005-08-12 16:14:08 +00:00
Andrew Lenharth
57c0f5da32 .section cleanup, patch from Nicholas Riley
llvm-svn: 22763
2005-08-12 16:13:43 +00:00
Jim Laskey
c79447e1ec 1. Added the function isOpcWithIntImmediate to simplify testing of operand with
specified opcode and an integer constant right operand.

2. Modified ISD::SHL, ISD::SRL, ISD::SRA to use rlwinm when applied after a mask.

llvm-svn: 22761
2005-08-11 21:59:23 +00:00
Chris Lattner
4d353459a3 Tidied up the use of dyn_cast<ConstantSDNode> by using isIntImmediate more.
Patch by Jim Laskey.

llvm-svn: 22760
2005-08-11 17:56:50 +00:00
Chris Lattner
5cba0bb5bb Use a more efficient method of creating integer and float virtual registers
(avoids an extra level of indirection in MakeReg).

  defined MakeIntReg using RegMap->createVirtualRegister(PPC32::GPRCRegisterClass)
  defined MakeFPReg using RegMap->createVirtualRegister(PPC32::FPRCRegisterClass)

  s/MakeReg(MVT::i32)/MakeIntReg/
  s/MakeReg(MVT::f64)/MakeFPReg/

Patch by Jim Laskey!

llvm-svn: 22759
2005-08-11 17:15:31 +00:00
Nate Begeman
09c56e0432 Add a select_cc optimization for recognizing abs(int). This speeds up an
integer MPEG encoding loop by a factor of two.

llvm-svn: 22758
2005-08-11 02:18:13 +00:00
Nate Begeman
206e850add Some SELECT_CC cleanups:
1. move assertions for node creation to getNode()
2. legalize the values returned in ExpandOp immediately
3. Move select_cc optimizations from SELECT's getNode() to SELECT_CC's,
   allowing them to be cleaned up significantly.

This paves the way to pick up additional optimizations on SELECT_CC, such
as sum-of-absolute-differences.

llvm-svn: 22757
2005-08-11 01:12:20 +00:00
Nate Begeman
23479935cc Make SELECT illegal on PPC32, switch to using SELECT_CC, which more closely
reflects what the hardware is capable of.  This significantly simplifies
the CC handling logic throughout the ISel.

llvm-svn: 22756
2005-08-10 20:52:09 +00:00
Nate Begeman
eddc9d4856 Add new node, SELECT_CC. This node is for targets that don't natively
implement SELECT.

llvm-svn: 22755
2005-08-10 20:51:12 +00:00
Chris Lattner
512a5e507e Changes for PPC32ISelPattern.cpp
1. Clean up how SelectIntImmediateExpr handles use counts.
2. "Subtract from" was not clearing hi 16 bits.

Patch by Jim Laskey

llvm-svn: 22754
2005-08-10 18:11:33 +00:00
Chris Lattner
51cf9fd316 Fix an oversight that may be causing PR617.
llvm-svn: 22753
2005-08-10 17:37:53 +00:00
Chris Lattner
67cef1a1d8 remove some trickiness that broke yacr2 and some other programs last night
llvm-svn: 22751
2005-08-10 17:15:20 +00:00
Chris Lattner
91f83576d8 Changed the XOR case to use the isOprNot predicate.
Patch by Jim Laskey!

llvm-svn: 22750
2005-08-10 16:35:46 +00:00
Chris Lattner
ad6d368eee 1. Refactored handling of integer immediate values for add, or, xor and sub.
New routine: ISel::SelectIntImmediateExpr
  2. Now checking use counts of large constants.  If use count is > 2 then drop
  thru so that the constant gets loaded into a register.
  Source:

int %test1(int %a) {
entry:
       %tmp.1 = add int %a,      123456789      ; <int> [#uses=1]
       %tmp.2 = or  int %tmp.1,  123456789      ; <int> [#uses=1]
       %tmp.3 = xor int %tmp.2,  123456789      ; <int> [#uses=1]
       %tmp.4 = sub int %tmp.3, -123456789      ; <int> [#uses=1]
       ret int %tmp.4
}

Did Emit:

       .machine ppc970


       .text
       .align  2
       .globl  _test1
_test1:
.LBB_test1_0:   ; entry
       addi r2, r3, -13035
       addis r2, r2, 1884
       ori r2, r2, 52501
       oris r2, r2, 1883
       xori r2, r2, 52501
       xoris r2, r2, 1883
       addi r2, r2, 52501
       addis r3, r2, 1883
       blr


Now Emits:

       .machine ppc970


       .text
       .align  2
       .globl  _test1
_test1:
.LBB_test1_0:   ; entry
       lis r2, 1883
       ori r2, r2, 52501
       add r3, r3, r2
       or r3, r3, r2
       xor r3, r3, r2
       add r3, r3, r2
       blr

Patch by Jim Laskey!

llvm-svn: 22749
2005-08-10 16:34:52 +00:00
Duraid Madina
6325af5006 sorry!! this is temporary; for some reason the nasty constmul code seems to
be an infinite loop when using g++-4.0.1*, this kills the ia64 nightly
tester. A proper fix shall be forthcoming!!! thanks for not killing me. :)

llvm-svn: 22748
2005-08-10 12:38:57 +00:00
Chris Lattner
74acf5edc8 Fix a bug compiling: select (i32 < i32), f32, f32
llvm-svn: 22747
2005-08-10 03:40:09 +00:00
Chris Lattner
179fc33e59 Make loop-simplify produce better loops by turning PHI nodes like X = phi [X, Y]
into just Y.  This often occurs when it seperates loops that have collapsed loop
headers.  This implements LoopSimplify/phi-node-simplify.ll

llvm-svn: 22746
2005-08-10 02:07:32 +00:00
Chris Lattner
4ac016991c Allow indvar simplify to canonicalize ANY affine IV, not just affine IVs with
constant stride.  This implements Transforms/IndVarsSimplify/variable-stride-ivs.ll

llvm-svn: 22744
2005-08-10 01:12:06 +00:00
Chris Lattner
0730ac081a Fix an obvious oops
llvm-svn: 22742
2005-08-10 00:59:40 +00:00
Chris Lattner
8c7e769325 Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride.
For code like this:

void foo(float *a, float *b, int n, int stride_a, int stride_b) {
  int i;
  for (i=0; i<n; i++)
      a[i*stride_a] = b[i*stride_b];
}

we now emit:

.LBB_foo2_2:    ; no_exit
        lfs f0, 0(r4)
        stfs f0, 0(r3)
        addi r7, r7, 1
        add r4, r2, r4
        add r3, r6, r3
        cmpw cr0, r7, r5
        blt .LBB_foo2_2 ; no_exit

instead of:

.LBB_foo_2:     ; no_exit
        mullw r8, r2, r7     ;; multiply!
        slwi r8, r8, 2
        lfsx f0, r4, r8
        mullw r8, r2, r6     ;; multiply!
        slwi r8, r8, 2
        stfsx f0, r3, r8
        addi r2, r2, 1
        cmpw cr0, r2, r5
        blt .LBB_foo_2  ; no_exit

loops with variable strides occur pretty often.  For example, in SPECFP2K
there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp,
56 in 168.wupwise, 36 in 172.mgrid.

Now we can allow indvars to turn functions written like this:

void foo2(float *a, float *b, int n, int stride_a, int stride_b) {
  int i, ai = 0, bi = 0;
  for (i=0; i<n; i++)
    {
      a[ai] = b[bi];
      ai += stride_a;
      bi += stride_b;
    }
}

into code like the above for better analysis.  With this patch, they generate
identical code.

llvm-svn: 22740
2005-08-10 00:45:21 +00:00
Chris Lattner
3d251b90f3 Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll
by being more careful about updating PHI nodes

llvm-svn: 22739
2005-08-10 00:35:32 +00:00
Chris Lattner
24f927cfe9 Fix some 80 column violations.
Once we compute the evolution for a GEP, tell SE about it.  This allows users
of the GEP to know it, if the users are not direct.  This allows us to compile
this testcase:

void fbSolidFillmmx(int w, unsigned char *d) {
    while (w >= 64) {
        *(unsigned long long *) (d +  0) = 0;
        *(unsigned long long *) (d +  8) = 0;
        *(unsigned long long *) (d + 16) = 0;
        *(unsigned long long *) (d + 24) = 0;
        *(unsigned long long *) (d + 32) = 0;
        *(unsigned long long *) (d + 40) = 0;
        *(unsigned long long *) (d + 48) = 0;
        *(unsigned long long *) (d + 56) = 0;
        w -= 64;
        d += 64;
    }
}

into:

.LBB_fbSolidFillmmx_2:  ; no_exit
        li r2, 0
        stw r2, 0(r4)
        stw r2, 4(r4)
        stw r2, 8(r4)
        stw r2, 12(r4)
        stw r2, 16(r4)
        stw r2, 20(r4)
        stw r2, 24(r4)
        stw r2, 28(r4)
        stw r2, 32(r4)
        stw r2, 36(r4)
        stw r2, 40(r4)
        stw r2, 44(r4)
        stw r2, 48(r4)
        stw r2, 52(r4)
        stw r2, 56(r4)
        stw r2, 60(r4)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit

instead of:

.LBB_fbSolidFillmmx_2:  ; no_exit
        li r11, 0
        stw r11, 0(r4)
        stw r11, 4(r4)
        stwx r11, r10, r4
        add r12, r10, r4
        stw r11, 4(r12)
        stwx r11, r9, r4
        add r12, r9, r4
        stw r11, 4(r12)
        stwx r11, r8, r4
        add r12, r8, r4
        stw r11, 4(r12)
        stwx r11, r7, r4
        add r12, r7, r4
        stw r11, 4(r12)
        stwx r11, r6, r4
        add r12, r6, r4
        stw r11, 4(r12)
        stwx r11, r5, r4
        add r12, r5, r4
        stw r11, 4(r12)
        stwx r11, r2, r4
        add r12, r2, r4
        stw r11, 4(r12)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit

llvm-svn: 22737
2005-08-09 23:39:36 +00:00
Chris Lattner
6ca08d5739 implement two helper methods
llvm-svn: 22736
2005-08-09 23:36:33 +00:00
Chris Lattner
3179a74493 Fix spelling, fix some broken canonicalizations by my last patch
llvm-svn: 22734
2005-08-09 23:09:05 +00:00
Chris Lattner
5ad0216bd6 add a optimization note
llvm-svn: 22732
2005-08-09 22:30:57 +00:00
Chris Lattner
3290ca9983 add cc nodes to the AllNodes list so they show up in Graphviz output
llvm-svn: 22731
2005-08-09 20:40:02 +00:00
Chris Lattner
1277703a48 Update the targets to the new SETCC/CondCodeSDNode interfaces.
llvm-svn: 22729
2005-08-09 20:21:10 +00:00
Chris Lattner
0fa4402b59 Eliminate the SetCCSDNode in favor of a CondCodeSDNode class. This pulls the
CC out of the SetCC operation, making SETCC a standard ternary operation and
CC's a standard DAG leaf.  This will make it possible for other node to use
CC's as operands in the future...

llvm-svn: 22728
2005-08-09 20:20:18 +00:00
Chris Lattner
1552a40112 Minor cleanup patch, no functionality changes. Written by Jim Laskey.
llvm-svn: 22727
2005-08-09 18:29:55 +00:00
Chris Lattner
b3baf30fdd Fix CodeGen/Generic/div-neg-power-2.ll, a regression from last night.
llvm-svn: 22726
2005-08-09 18:08:41 +00:00
Chris Lattner
2872f369f0 SCEVAddExpr::get() of an empty list is invalid.
llvm-svn: 22724
2005-08-09 01:13:47 +00:00
Chris Lattner
11dd32a826 Implement: LoopStrengthReduce/share_ivs.ll
Two changes:
  * Only insert one PHI node for each stride.  Other values are live in
    values.  This cannot introduce higher register pressure than the
    previous approach, and can take advantage of reg+reg addressing modes.
  * Factor common base values out of uses before moving values from the
    base to the immediate fields.  This improves codegen by starting the
    stride-specific PHI node out at a common place for each IV use.

As an example, we used to generate this for a loop in swim:

.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfd f0, 0(r8)
        stfd f0, 0(r3)
        lfd f0, 0(r6)
        stfd f0, 0(r7)
        lfd f0, 0(r2)
        stfd f0, 0(r5)
        addi r9, r9, 1
        addi r2, r2, 8
        addi r5, r5, 8
        addi r6, r6, 8
        addi r7, r7, 8
        addi r8, r8, 8
        addi r3, r3, 8
        cmpw cr0, r9, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1

now we emit:

.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfdx f0, r8, r2
        stfdx f0, r9, r2
        lfdx f0, r5, r2
        stfdx f0, r7, r2
        lfdx f0, r3, r2
        stfdx f0, r6, r2
        addi r10, r10, 1
        addi r2, r2, 8
        cmpw cr0, r10, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1

As another more dramatic example, we used to emit this:

.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfd f0, 8(r21)
        lfd f4, 8(r3)
        lfd f5, 8(r27)
        lfd f6, 8(r22)
        lfd f7, 8(r5)
        lfd f8, 8(r6)
        lfd f9, 8(r30)
        lfd f10, 8(r11)
        lfd f11, 8(r12)
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfd f0, 8(r4)
        lfd f0, 8(r25)
        lfd f5, 8(r26)
        lfd f6, 8(r23)
        lfd f9, 8(r28)
        lfd f10, 8(r10)
        lfd f12, 8(r9)
        lfd f13, 8(r29)
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfd f0, 8(r24)
        lfd f0, 8(r8)
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfd f0, 8(r2)
        addi r20, r20, 1
        addi r2, r2, 8
        addi r8, r8, 8
        addi r10, r10, 8
        addi r12, r12, 8
        addi r6, r6, 8
        addi r29, r29, 8
        addi r28, r28, 8
        addi r26, r26, 8
        addi r25, r25, 8
        addi r24, r24, 8
        addi r5, r5, 8
        addi r23, r23, 8
        addi r22, r22, 8
        addi r3, r3, 8
        addi r9, r9, 8
        addi r11, r11, 8
        addi r30, r30, 8
        addi r27, r27, 8
        addi r21, r21, 8
        addi r4, r4, 8
        cmpw cr0, r20, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1

we now emit:

.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfdx f0, r21, r20
        lfdx f4, r3, r20
        lfdx f5, r27, r20
        lfdx f6, r22, r20
        lfdx f7, r5, r20
        lfdx f8, r6, r20
        lfdx f9, r30, r20
        lfdx f10, r11, r20
        lfdx f11, r12, r20
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfdx f0, r4, r20
        lfdx f0, r25, r20
        lfdx f5, r26, r20
        lfdx f6, r23, r20
        lfdx f9, r28, r20
        lfdx f10, r10, r20
        lfdx f12, r9, r20
        lfdx f13, r29, r20
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfdx f0, r24, r20
        lfdx f0, r8, r20
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfdx f0, r2, r20
        addi r19, r19, 1
        addi r20, r20, 8
        cmpw cr0, r19, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1

llvm-svn: 22722
2005-08-09 00:18:09 +00:00
Chris Lattner
ea50bf5aca Suck the base value out of the UsersToProcess vector into the BasedUser
class to simplify the code.  Fuse two loops.

llvm-svn: 22721
2005-08-08 22:56:21 +00:00
Chris Lattner
b9d13099a7 Split MoveLoopVariantsToImediateField out from MoveImmediateValues. The
first is a correctness thing, and the later is an optzn thing.  This also
is needed to support a future change.

llvm-svn: 22720
2005-08-08 22:32:34 +00:00
Nate Begeman
6b842b4883 Factor out some common code, and be smarter about when to emit load hi/lo
code sequences.

llvm-svn: 22719
2005-08-08 22:22:56 +00:00
Chris Lattner
f1d821b665 Allow tools with "consume after" options (like lli) to take more positional
opts than they take directly.  Thanks to John C for pointing this problem
out to me!

llvm-svn: 22717
2005-08-08 21:57:27 +00:00
Chris Lattner
afd68f8f76 Remove getImmediateForOpcode, which is now dead.
Patch by Jim Laskey.

llvm-svn: 22716
2005-08-08 21:34:13 +00:00
Chris Lattner
f0eb0b2af5 Add new immediate handling support for mul/div.
Patch by Jim Laskey!

llvm-svn: 22715
2005-08-08 21:33:23 +00:00
Chris Lattner
8efdc3c8d4 Add support for OR/XOR/SUB immediates that are handled with the new immediate
way.  This allows ORI/ORIS pairs, for example.

llvm-svn: 22714
2005-08-08 21:30:29 +00:00
Chris Lattner
051d45ce3c Modify the ISD::AND opcode case to use new immediate constant predicates.
Includes wider support for rotate and mask cases.

Patch by Jim Laskey.

I've requested that Jim add new regression tests the newly handled cases.

llvm-svn: 22712
2005-08-08 21:24:57 +00:00
Chris Lattner
3b23144fc0 Modify the ISD::ADD opcode case to use new immediate constant predicates.
Includes support for 32-bit constants using addi/addis.

Patch by Jim Laskey.

llvm-svn: 22711
2005-08-08 21:21:03 +00:00
Chris Lattner
69eed9f8a7 Modify existing support functions to use new immediate constant predicates.
Patch by Jim Laskey

llvm-svn: 22710
2005-08-08 21:12:35 +00:00
Chris Lattner
fab821d774 Add support predicates for future immediate constant changes.
Patch by Jim Laskey

llvm-svn: 22709
2005-08-08 21:10:27 +00:00
Chris Lattner
f6320ae69a Move IsRunOfOnes to a more logical place and rename to a proper predicate form
(lowercase isXXX).

Patch by Jim Laskey.

llvm-svn: 22708
2005-08-08 21:08:09 +00:00
Nate Begeman
f2d22dbd9b Fix JIT encoding of ppc mfocrf instruction; the operands were reversed
llvm-svn: 22707
2005-08-08 20:04:52 +00:00
Chris Lattner
c6571e5c64 Use the new 'moveBefore' method to simplify some code. Really, which is
easier to understand?  :)

llvm-svn: 22706
2005-08-08 19:11:57 +00:00
Chris Lattner
e30b898fec Reject command lines that have too many positional arguments passed (e.g.,
'opt x y').  This fixes PR493.

Patch contributed by Owen Anderson!

llvm-svn: 22705
2005-08-08 17:25:38 +00:00
Chris Lattner
f6e6e25039 Not all constants are legal immediates in load/store instructions.
llvm-svn: 22704
2005-08-08 06:25:50 +00:00
Chris Lattner
ab45a77fed Implement LoopStrengthReduce/share_code_in_preheader.ll by having one
rewriter for all code inserted into the preheader, which is never flushed.

llvm-svn: 22702
2005-08-08 05:47:49 +00:00
Chris Lattner
dd97325bc0 Implement a simple optimization for the termination condition of the loop.
The termination condition actually wants to use the post-incremented value
of the loop, not a new indvar with an unusual base.

On PPC, for example, this allows us to compile
LoopStrengthReduce/exit_compare_live_range.ll to:

_foo:
        li r2, 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        cmpw cr0, r2, r4
        bne .LBB_foo_1  ; no_exit
        blr

instead of:

_foo:
        li r2, 1                ;; IV starts at 1, not 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r5, r2, 1
        cmpw cr0, r2, r4
        or r2, r5, r5           ;; Reg-reg copy, extra live range
        bne .LBB_foo_1  ; no_exit
        blr

This implements LoopStrengthReduce/exit_compare_live_range.ll

llvm-svn: 22699
2005-08-08 05:28:22 +00:00
Chris Lattner
e698d06904 add new helper function
llvm-svn: 22698
2005-08-08 05:21:50 +00:00
Chris Lattner
e7f14fb39d Handle 64-bit constant exprs on 64-bit targets.
llvm-svn: 22696
2005-08-08 04:26:32 +00:00
Chris Lattner
a539b03210 All stats are "Number of ..."
llvm-svn: 22694
2005-08-07 20:02:04 +00:00
Chris Lattner
bbab417e32 Add some simple folds that occur in bitfield cases. Fix a minor bug in
isHighOnes, where it would consider 0 to have high ones.

llvm-svn: 22693
2005-08-07 07:03:10 +00:00
Chris Lattner
427319ff4b Fix typoCVS: ----------------------------------------------------------------------
llvm-svn: 22692
2005-08-07 07:00:52 +00:00
Chris Lattner
fdb467b18d add a small simplification that can be exposed after promotion/expansion
llvm-svn: 22691
2005-08-07 05:00:44 +00:00
Chris Lattner
5b499da0a7 * Use the new PHINode::hasConstantValue method to simplify some code
* Teach this code to move allocas out of the loop when tail call eliminating
  a call marked 'tail'.  This implements TailCallElim/move_alloca_for_tail_call.ll
* Do not perform this transformation if a call is marked 'tail' and if there
  are allocas that we cannot move out of the loop in #2.  Doing so would increase
  the stack usage of the function.  This implements fixes
  PR615 and TailCallElim/dont-tce-tail-marked-call.ll.

llvm-svn: 22690
2005-08-07 04:27:41 +00:00
Chris Lattner
93820f4ad7 Consolidate the GPOpt stuff to all use the Subtarget, instead of still
depending on the command line option.  Now the command line option just
sets the subtarget as appropriate.  G5 opts will now default to on on
G5-enabled nightly testers among other machines.

llvm-svn: 22688
2005-08-05 22:05:03 +00:00
Chris Lattner
07af090121 adjust to change in getSubtarget() api
llvm-svn: 22687
2005-08-05 21:54:27 +00:00
Chris Lattner
79095d3416 Enable gp optimizations by default when available, even when a target triple
is available, since the target triple doesn't specify whether to use gpopts
or not.

llvm-svn: 22685
2005-08-05 21:25:13 +00:00
Chris Lattner
d82395fc04 add a note
llvm-svn: 22681
2005-08-05 19:18:32 +00:00
Chris Lattner
d3a8084e5b Change FindEarliestCallSeqEnd (used by libcall insertion) to use a set to
avoid revisiting nodes more than once.  This eliminates a source of
potentially exponential behavior.  For a small function in 191.fma3d
(hexah_stress_divergence_), this speeds up isel from taking > 20mins to
taking 0.07s.

llvm-svn: 22680
2005-08-05 18:10:27 +00:00
Chris Lattner
c7a67abac2 Fix a use-of-dangling-pointer bug, from the introduction of SrcValue's.
llvm-svn: 22679
2005-08-05 16:55:31 +00:00
Chris Lattner
644edfb51e Fix a latent bug in the libcall inserter that was exposed by Nate's patch
yesterday.  This fixes whetstone and a bunch of programs in the External tests.

llvm-svn: 22678
2005-08-05 16:23:57 +00:00
Chris Lattner
852038daec don't crash when running the PPC backend on non-ppc hosts without specifying
a subtarget.

llvm-svn: 22677
2005-08-05 16:17:22 +00:00
Chris Lattner
fbcacf822a PHINode::hasConstantValue should never return the PHI itself, even if the
PHI is its only operand.

llvm-svn: 22676
2005-08-05 15:37:31 +00:00
Chris Lattner
b80ff334d8 Fix an iterator invalidation problem when we decide a phi has a constant value
llvm-svn: 22675
2005-08-05 15:34:10 +00:00