mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00
Commit Graph

5180 Commits

Author SHA1 Message Date
Chris Lattner
44ea12c5f8 Implement an important entry from README_ALTIVEC:
If an altivec predicate compare is used immediately by a branch, don't
use a (serializing) MFCR instruction to read the CR6 register, which requires
a compare to get it back to CR's.  Instead, just branch on CR6 directly. :)

For example, for:
void foo2(vector float *A, vector float *B) {
  if (!vec_any_eq(*A, *B))
    *B = (vector float){0,0,0,0};
}

We now generate:

_foo2:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        bne cr6, LBB1_2 ; UnifiedReturnBlock
LBB1_1: ; cond_true
        vxor v2, v2, v2
        stvx v2, 0, r4
        mtspr 256, r2
        blr
LBB1_2: ; UnifiedReturnBlock
        mtspr 256, r2
        blr

instead of:

_foo2:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        mfcr r3, 2
        rlwinm r3, r3, 27, 31, 31
        cmpwi cr0, r3, 0
        beq cr0, LBB1_2 ; UnifiedReturnBlock
LBB1_1: ; cond_true
        vxor v2, v2, v2
        stvx v2, 0, r4
        mtspr 256, r2
        blr
LBB1_2: ; UnifiedReturnBlock
        mtspr 256, r2
        blr

This implements CodeGen/PowerPC/vec_br_cmp.ll.

llvm-svn: 27804
2006-04-18 17:59:36 +00:00
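The equivalence of the two branch sequences in the entry above can be checked with a small bit-level model. This is a sketch, not code from the commit. It assumes that `mfcr r3, 2` leaves an image of the condition register in r3 with CR6 in big-endian bits 24 to 27, and that the EQ bit of CR6 is bit 26. The helper names (`rotl32`, `rlwinm`, `cr_bit`, and the two `*_runs_cond_true` functions) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
CR6_EQ = 6 * 4 + 2  # big-endian bit index of the EQ bit of field CR6 (26)


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rlwinm(rs, sh, mb, me):
    """Model of rlwinm for the non-wrapping case mb <= me only."""
    if mb > me:
        raise ValueError("wrap-around masks are not modelled in this sketch")
    mask = (MASK32 >> mb) & (MASK32 << (31 - me)) & MASK32
    return rotl32(rs, sh) & mask


def cr_bit(cr, n):
    """Read bit n of a 32-bit CR image, using big-endian bit numbering."""
    return (cr >> (31 - n)) & 1


def old_sequence_runs_cond_true(cr):
    # mfcr; rlwinm r3, r3, 27, 31, 31; cmpwi cr0, r3, 0; beq cr0, LBB1_2
    r3 = rlwinm(cr, 27, 31, 31)
    return r3 != 0  # beq skips cond_true when r3 == 0


def new_sequence_runs_cond_true(cr):
    # bne cr6, LBB1_2 skips cond_true when the EQ bit of CR6 is clear
    return cr_bit(cr, CR6_EQ) == 1
```

Under these assumptions both functions agree for every CR image, because rotating left by 27 and masking bit 31 extracts exactly big-endian bit 26.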
Chris Lattner
519001b0ee move some stuff around, clean things up
llvm-svn: 27802
2006-04-18 17:52:36 +00:00
Chris Lattner
3e2a664ada Teach the codegen about instructions used for SSE spill code, allowing it
to optimize cases where it has to spill a lot

llvm-svn: 27801
2006-04-18 16:44:51 +00:00
Chris Lattner
e90fdf3b98 Use vmladduhm to do v8i16 multiplies which is faster and simpler than doing
even/odd halves.  Thanks to Nate telling me what's what.

llvm-svn: 27793
2006-04-18 04:28:57 +00:00
Chris Lattner
5951b60cb4 Implement v16i8 multiply with this code:
        vmuloub v5, v3, v2
        vmuleub v2, v3, v2
        vperm v2, v2, v5, v4

This implements CodeGen/PowerPC/vec_mul.ll.  With this, v16i8 multiplies are
6.79x faster than before.

Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than with
GCC.

Remove the 'integer multiplies' todo from the README file.

llvm-svn: 27792
2006-04-18 03:57:35 +00:00
Evan Cheng
6be2e4b419 Correct comments
llvm-svn: 27790
2006-04-18 03:45:01 +00:00
Chris Lattner
4d84b56e64 Lower v8i16 multiply into this code:
        li r5, lo16(LCPI1_0)
        lis r6, ha16(LCPI1_0)
        lvx v4, r6, r5
        vmulouh v5, v3, v2
        vmuleuh v2, v3, v2
        vperm v2, v2, v5, v4

where v4 is:
LCPI1_0:                                        ;  <16 x ubyte>
        .byte   2
        .byte   3
        .byte   18
        .byte   19
        .byte   6
        .byte   7
        .byte   22
        .byte   23
        .byte   10
        .byte   11
        .byte   26
        .byte   27
        .byte   14
        .byte   15
        .byte   30
        .byte   31

This is 5.07x faster on the G5 (measured) than lowering to scalar code +
loads/stores.

llvm-svn: 27789
2006-04-18 03:43:48 +00:00
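To see why the permute mask in the entry above yields a v8i16 multiply, the three instructions can be modelled on plain lists. This is an illustrative sketch only. It assumes big-endian element order, that `vmuleuh`/`vmulouh` produce full 32-bit products of the even and odd halfwords, and that `vperm` selects bytes from the 32-byte concatenation of its two sources. The function names mirror the mnemonics, but the Python helpers themselves are hypothetical.

```python
# Permute mask copied from LCPI1_0 in the commit message.
LCPI1_0 = [2, 3, 18, 19, 6, 7, 22, 23, 10, 11, 26, 27, 14, 15, 30, 31]


def words_to_bytes(words):
    """Flatten 32-bit words into big-endian bytes."""
    out = []
    for w in words:
        out.extend([(w >> 24) & 0xFF, (w >> 16) & 0xFF, (w >> 8) & 0xFF, w & 0xFF])
    return out


def vmuleuh(a, b):
    """32-bit products of the even-numbered unsigned halfwords."""
    return [a[i] * b[i] for i in (0, 2, 4, 6)]


def vmulouh(a, b):
    """32-bit products of the odd-numbered unsigned halfwords."""
    return [a[i] * b[i] for i in (1, 3, 5, 7)]


def vperm(x, y, mask):
    """Pick bytes from x (indices 0-15) and y (indices 16-31)."""
    src = x + y
    return [src[m & 0x1F] for m in mask]


def mul_v8i16(a, b):
    """Elementwise 16-bit multiply built from the three modelled instructions."""
    odd = words_to_bytes(vmulouh(a, b))    # vmulouh v5, v3, v2
    even = words_to_bytes(vmuleuh(a, b))   # vmuleuh v2, v3, v2
    res = vperm(even, odd, LCPI1_0)        # vperm v2, v2, v5, v4
    return [(res[2 * i] << 8) | res[2 * i + 1] for i in range(8)]
```

Each pair of mask bytes picks the low 16 bits of one 32-bit product, alternating between the even and odd results, so the output matches a truncating halfword multiply.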
Chris Lattner
613d7fda64 Custom lower v4i32 multiplies into a cute sequence, instead of having legalize
scalarize the sequence into 4 mullw's and a bunch of load/store traffic.

This speeds up v4i32 multiplies 4.1x (measured) on a G5.  This implements
PowerPC/vec_mul.ll

llvm-svn: 27788
2006-04-18 03:24:30 +00:00
Evan Cheng
13a5022494 Another entry
llvm-svn: 27786
2006-04-18 01:22:57 +00:00
Evan Cheng
2f9011cd87 Another entry.
llvm-svn: 27784
2006-04-18 00:21:01 +00:00
Evan Cheng
98b1ca65dd Use movss to insert_vector_elt(v, s, 0).
llvm-svn: 27782
2006-04-17 22:45:49 +00:00
Evan Cheng
ecf13c5d79 Use two pinsrw to insert an element into v4i32 / v4f32 vector.
llvm-svn: 27779
2006-04-17 22:04:06 +00:00
Chris Lattner
81938fa3db remove done item
llvm-svn: 27778
2006-04-17 21:52:03 +00:00
Chris Lattner
fdecddb741 Don't diddle VRSAVE if no registers need to be added/removed from it. This
allows us to codegen functions as:

_test_rol:
        vspltisw v2, -12
        vrlw v2, v2, v2
        blr

instead of:

_test_rol:
        mfvrsave r2, 256
        mr r3, r2
        mtvrsave r3
        vspltisw v2, -12
        vrlw v2, v2, v2
        mtvrsave r2
        blr

Testcase here: CodeGen/PowerPC/vec_vrsave.ll

llvm-svn: 27777
2006-04-17 21:48:13 +00:00
Evan Cheng
833ce43152 Encoding bug
llvm-svn: 27773
2006-04-17 21:33:57 +00:00
Chris Lattner
021f521a41 Vectors that are known live-in and live-out are clearly already marked in
the vrsave register for the caller.  This allows us to codegen a function as:

_test_rol:
        mfspr r2, 256
        mr r3, r2
        mtspr 256, r3
        vspltisw v2, -12
        vrlw v2, v2, v2
        mtspr 256, r2
        blr

instead of:

_test_rol:
        mfspr r2, 256
        oris r3, r2, 40960
        mtspr 256, r3
        vspltisw v0, -12
        vrlw v2, v0, v0
        mtspr 256, r2
        blr

llvm-svn: 27772
2006-04-17 21:22:06 +00:00
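The `oris` immediates in these listings appear consistent with a VRSAVE layout in which v0 maps to the most significant bit. A minimal sketch of that mapping follows. The layout is an assumption stated here rather than something the commit message spells out, and `vrsave_mask` and `oris_immediate` are hypothetical helper names.

```python
def vrsave_mask(vregs):
    """Build a 32-bit VRSAVE mask, assuming vN maps to big-endian bit N."""
    mask = 0
    for n in vregs:
        if not 0 <= n <= 31:
            raise ValueError(f"no such vector register: v{n}")
        mask |= 1 << (31 - n)
    return mask


def oris_immediate(vregs):
    """Upper-halfword immediate for oris; only valid for v0 through v15."""
    mask = vrsave_mask(vregs)
    if mask & 0xFFFF:
        raise ValueError("registers v16-v31 would also need the low halfword")
    return mask >> 16
```

With this mapping, `oris_immediate([0, 2])` gives 40960, matching the "instead of" listing that uses v0 and v2, and `oris_immediate([2, 3])` gives 12288, matching the `_foo2` listings that use v2 and v3.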
Chris Lattner
a717d4f53b Prefer to allocate V2-V5 before V0,V1. This lets us generate code like this:
        vspltisw v2, -12
        vrlw v2, v2, v2

instead of:

        vspltisw v0, -12
        vrlw v2, v0, v0

when a function is returning a value.

llvm-svn: 27771
2006-04-17 21:19:12 +00:00
Chris Lattner
6b76deffb5 Move some knowledge about registers out of the code emitter into the register info.
llvm-svn: 27770
2006-04-17 21:07:20 +00:00
Chris Lattner
face261a94 Use a small table instead of macros to do this conversion.
llvm-svn: 27769
2006-04-17 20:59:25 +00:00
Evan Cheng
4de1805c84 Implement v8i16, v16i8 splat using unpckl + pshufd.
llvm-svn: 27768
2006-04-17 20:43:08 +00:00
Chris Lattner
e1d38ad84b implement returns of a vector, testcase here: CodeGen/X86/vec_return.ll
llvm-svn: 27767
2006-04-17 20:32:50 +00:00
Chris Lattner
f2347c31b4 Make sure to check splats of every constant we can, handle splat(31) by
being a bit more clever, add support for odd splats from -31 to -17.

llvm-svn: 27764
2006-04-17 18:09:22 +00:00
Evan Cheng
5728f30f7c Incorrect foldMemoryOperand entries
llvm-svn: 27763
2006-04-17 18:06:12 +00:00
Evan Cheng
3d26db8148 Errors in patterns preventing load folding
llvm-svn: 27762
2006-04-17 18:05:01 +00:00
Jeff Cohen
4cacdf3a2b Add checks for __OpenBSD__.
llvm-svn: 27761
2006-04-17 17:55:41 +00:00
Chris Lattner
cc4222d95b Teach the ppc backend to use rol and vsldoi to generate splatted constants.
This implements vec_constants.ll:test_vsldoi and test_rol

llvm-svn: 27760
2006-04-17 17:55:10 +00:00
Chris Lattner
7d66e5a118 add a note
llvm-svn: 27758
2006-04-17 17:29:41 +00:00
Evan Cheng
eb739d0355 FP SETOLT, SETOLT, SETUGE, SETUGT conditions were implemented incorrectly
llvm-svn: 27755
2006-04-17 07:24:10 +00:00
Chris Lattner
2d8d6c9feb Make some code more general, adding support for constant formation of several
new patterns.

llvm-svn: 27754
2006-04-17 06:58:41 +00:00
Chris Lattner
9dd4ebffca Learn how to make odd splatted constants in range [17,29]. This implements
PowerPC/vec_constants.ll:test_29.

llvm-svn: 27752
2006-04-17 06:07:44 +00:00
Chris Lattner
72a67a5b1f Pull some code out into a helper function.
Efficiently codegen even splats in the range [-32,30].

This allows us to codegen <30,30,30,30> as:

        vspltisw v0, 15
        vadduwm v2, v0, v0

instead of as a cp load.

llvm-svn: 27750
2006-04-17 06:00:21 +00:00
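The even-splat trick in the entry above relies on `vspltisw` taking a 5-bit signed immediate (-16 to 15), so an even constant outside that range can be formed by splatting half of it and adding the vector to itself. The sketch below only illustrates that selection logic. `splat_plan` is a hypothetical helper, the register names are placeholders, and the actual backend code may be structured differently.

```python
def splat_plan(c):
    """Return an instruction list that splats c into v2, or None if this
    sketch has no constant-pool-free sequence for it."""
    if -16 <= c <= 15:
        # Fits the 5-bit signed immediate directly.
        return [f"vspltisw v2, {c}"]
    if c % 2 == 0 and -32 <= c <= 30:
        # Splat c/2, then add the vector to itself.
        return [f"vspltisw v0, {c // 2}", "vadduwm v2, v0, v0"]
    return None  # e.g. odd values outside [-16, 15] are not handled here
```

For example, `splat_plan(30)` reproduces the two-instruction sequence shown in the commit message.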
Chris Lattner
5367a73dec Implement a TODO: for any shuffle that can be viewed as a v4[if]32 shuffle,
if it can be implemented in 3 or fewer discrete altivec instructions, codegen
it as such.  This implements Regression/CodeGen/PowerPC/vec_perf_shuffle.ll

llvm-svn: 27748
2006-04-17 05:28:54 +00:00
Chris Lattner
34ec6432f6 Regenerate with adjusted costs
llvm-svn: 27746
2006-04-17 05:26:20 +00:00
Chris Lattner
36ceea9e96 Regenerate with correct offset
llvm-svn: 27744
2006-04-17 05:08:46 +00:00
Chris Lattner
671f50cf33 Increase the opcodes by one each to disambiguate COPY from VMRGHW.
llvm-svn: 27742
2006-04-17 00:47:48 +00:00
Chris Lattner
99ee809cb6 Check in a table, generated by llvm-PerfectShuffle, of optimal shuffles
of various 4-element vectors.

llvm-svn: 27739
2006-04-17 00:37:02 +00:00
Evan Cheng
68b2e5b4b0 movduprm, movshduprm bugs
llvm-svn: 27734
2006-04-16 18:11:28 +00:00
Evan Cheng
26d917789c Encoding bugs
llvm-svn: 27733
2006-04-16 07:02:22 +00:00
Evan Cheng
b2e3339cb2 Can't fold loads into alias vector SSE ops used for scalar operation. The load
address has to be 16-byte aligned but the values aren't spilled to 128-bit
locations.

llvm-svn: 27732
2006-04-16 06:58:19 +00:00
Chris Lattner
d86516991a Implement a TODO: have the legalizer canonicalize a bunch of operations to
one type (v4i32) so that we don't have to write patterns for each type, and
so that more CSE opportunities are exposed.

llvm-svn: 27731
2006-04-16 01:37:57 +00:00
Chris Lattner
f4126f0db7 Make the BUILD_VECTOR lowering code much more aggressive w.r.t constant vectors.
Remove some done items from the todo list.

llvm-svn: 27729
2006-04-16 01:01:29 +00:00
Chris Lattner
44245f11c3 Fix a crash when faced with a shuffle vector that has an undef in its mask.
llvm-svn: 27726
2006-04-15 23:48:05 +00:00
Chris Lattner
2ede0fef98 Add patterns for matching vnots with bit converted inputs. Most of these will
go away when I start using Evan's binop type canonicalizer.

llvm-svn: 27725
2006-04-15 23:45:24 +00:00
Chris Lattner
254683a3df Add a new vnot_conv predicate for matching vnot's where the allones vector is
bitconverted from some other type.

llvm-svn: 27724
2006-04-15 23:39:14 +00:00
Evan Cheng
9f33b2abc5 More encoding bugs
llvm-svn: 27722
2006-04-15 06:10:09 +00:00
Evan Cheng
87e0cd1569 pslldrm, psrawrm, etc. encoding bug
llvm-svn: 27721
2006-04-15 05:59:08 +00:00
Evan Cheng
4487cf8125 hsubp{s|d} encoding bug
llvm-svn: 27720
2006-04-15 05:52:42 +00:00
Evan Cheng
32e5d4f6bc Silly bug
llvm-svn: 27719
2006-04-15 05:37:34 +00:00
Evan Cheng
f9a93a1d3f Do not use movs{h|l}dup for a shuffle with a single non-undef node.
llvm-svn: 27718
2006-04-15 03:13:24 +00:00
Evan Cheng
300456c7f2 Added SSE (and other) entries to foldMemoryOperand().
llvm-svn: 27716
2006-04-14 23:33:27 +00:00