llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00

Author	SHA1	Message	Date
Chris Lattner	44ea12c5f8	Implement an important entry from README_ALTIVEC: If an altivec predicate compare is used immediately by a branch, don't use a (serializing) MFCR instruction to read the CR6 register, which requires a compare to get it back to CR's. Instead, just branch on CR6 directly. :) For example, for: void foo2(vector float A, vector float B) { if (!vec_any_eq(A, B)) *B = (vector float){0,0,0,0}; } We now generate: _foo2: mfspr r2, 256 oris r5, r2, 12288 mtspr 256, r5 lvx v2, 0, r4 lvx v3, 0, r3 vcmpeqfp. v2, v3, v2 bne cr6, LBB1_2 ; UnifiedReturnBlock LBB1_1: ; cond_true vxor v2, v2, v2 stvx v2, 0, r4 mtspr 256, r2 blr LBB1_2: ; UnifiedReturnBlock mtspr 256, r2 blr instead of: _foo2: mfspr r2, 256 oris r5, r2, 12288 mtspr 256, r5 lvx v2, 0, r4 lvx v3, 0, r3 vcmpeqfp. v2, v3, v2 mfcr r3, 2 rlwinm r3, r3, 27, 31, 31 cmpwi cr0, r3, 0 beq cr0, LBB1_2 ; UnifiedReturnBlock LBB1_1: ; cond_true vxor v2, v2, v2 stvx v2, 0, r4 mtspr 256, r2 blr LBB1_2: ; UnifiedReturnBlock mtspr 256, r2 blr This implements CodeGen/PowerPC/vec_br_cmp.ll. llvm-svn: 27804	2006-04-18 17:59:36 +00:00
Chris Lattner	519001b0ee	move some stuff around, clean things up llvm-svn: 27802	2006-04-18 17:52:36 +00:00
Chris Lattner	3e2a664ada	Teach the codegen about instructions used for SSE spill code, allowing it to optimize cases where it has to spill a lot llvm-svn: 27801	2006-04-18 16:44:51 +00:00
Chris Lattner	e90fdf3b98	Use vmladduhm to do v8i16 multiplies which is faster and simpler than doing even/odd halves. Thanks to Nate telling me what's what. llvm-svn: 27793	2006-04-18 04:28:57 +00:00
Chris Lattner	5951b60cb4	Implement v16i8 multiply with this code: vmuloub v5, v3, v2 vmuleub v2, v3, v2 vperm v2, v2, v5, v4 This implements CodeGen/PowerPC/vec_mul.ll. With this, v16i8 multiplies are 6.79x faster than before. Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than with GCC. Remove the 'integer multiplies' todo from the README file. llvm-svn: 27792	2006-04-18 03:57:35 +00:00
Evan Cheng	6be2e4b419	Correct comments llvm-svn: 27790	2006-04-18 03:45:01 +00:00
Chris Lattner	4d84b56e64	Lower v8i16 multiply into this code: li r5, lo16(LCPI1_0) lis r6, ha16(LCPI1_0) lvx v4, r6, r5 vmulouh v5, v3, v2 vmuleuh v2, v3, v2 vperm v2, v2, v5, v4 where v4 is: LCPI1_0: ; <16 x ubyte> .byte 2 .byte 3 .byte 18 .byte 19 .byte 6 .byte 7 .byte 22 .byte 23 .byte 10 .byte 11 .byte 26 .byte 27 .byte 14 .byte 15 .byte 30 .byte 31 This is 5.07x faster on the G5 (measured) than lowering to scalar code + loads/stores. llvm-svn: 27789	2006-04-18 03:43:48 +00:00
Chris Lattner	613d7fda64	Custom lower v4i32 multiplies into a cute sequence, instead of having legalize scalarize the sequence into 4 mullw's and a bunch of load/store traffic. This speeds up v4i32 multiplies 4.1x (measured) on a G5. This implements PowerPC/vec_mul.ll llvm-svn: 27788	2006-04-18 03:24:30 +00:00
Evan Cheng	13a5022494	Another entry llvm-svn: 27786	2006-04-18 01:22:57 +00:00
Evan Cheng	2f9011cd87	Another entry. llvm-svn: 27784	2006-04-18 00:21:01 +00:00
Evan Cheng	98b1ca65dd	Use movss to insert_vector_elt(v, s, 0). llvm-svn: 27782	2006-04-17 22:45:49 +00:00
Evan Cheng	ecf13c5d79	Use two pinsrw to insert an element into v4i32 / v4f32 vector. llvm-svn: 27779	2006-04-17 22:04:06 +00:00
Chris Lattner	81938fa3db	remove done item llvm-svn: 27778	2006-04-17 21:52:03 +00:00
Chris Lattner	fdecddb741	Don't diddle VRSAVE if no registers need to be added/removed from it. This allows us to codegen functions as: _test_rol: vspltisw v2, -12 vrlw v2, v2, v2 blr instead of: _test_rol: mfvrsave r2, 256 mr r3, r2 mtvrsave r3 vspltisw v2, -12 vrlw v2, v2, v2 mtvrsave r2 blr Testcase here: CodeGen/PowerPC/vec_vrsave.ll llvm-svn: 27777	2006-04-17 21:48:13 +00:00
Evan Cheng	833ce43152	Encoding bug llvm-svn: 27773	2006-04-17 21:33:57 +00:00
Chris Lattner	021f521a41	Vectors that are known live-in and live-out are clearly already marked in the vrsave register for the caller. This allows us to codegen a function as: _test_rol: mfspr r2, 256 mr r3, r2 mtspr 256, r3 vspltisw v2, -12 vrlw v2, v2, v2 mtspr 256, r2 blr instead of: _test_rol: mfspr r2, 256 oris r3, r2, 40960 mtspr 256, r3 vspltisw v0, -12 vrlw v2, v0, v0 mtspr 256, r2 blr llvm-svn: 27772	2006-04-17 21:22:06 +00:00
Chris Lattner	a717d4f53b	Prefer to allocate V2-V5 before V0,V1. This lets us generate code like this: vspltisw v2, -12 vrlw v2, v2, v2 instead of: vspltisw v0, -12 vrlw v2, v0, v0 when a function is returning a value. llvm-svn: 27771	2006-04-17 21:19:12 +00:00
Chris Lattner	6b76deffb5	Move some knowledge about registers out of the code emitter into the register info. llvm-svn: 27770	2006-04-17 21:07:20 +00:00
Chris Lattner	face261a94	Use a small table instead of macros to do this conversion. llvm-svn: 27769	2006-04-17 20:59:25 +00:00
Evan Cheng	4de1805c84	Implement v8i16, v16i8 splat using unpckl + pshufd. llvm-svn: 27768	2006-04-17 20:43:08 +00:00
Chris Lattner	e1d38ad84b	implement returns of a vector, testcase here: CodeGen/X86/vec_return.ll llvm-svn: 27767	2006-04-17 20:32:50 +00:00
Chris Lattner	f2347c31b4	Make sure to check splats of every constant we can, handle splat(31) by being a bit more clever, add support for odd splats from -31 to -17. llvm-svn: 27764	2006-04-17 18:09:22 +00:00
Evan Cheng	5728f30f7c	Incorrect foldMemoryOperand entries llvm-svn: 27763	2006-04-17 18:06:12 +00:00
Evan Cheng	3d26db8148	Errors in patterns preventing load folding llvm-svn: 27762	2006-04-17 18:05:01 +00:00
Jeff Cohen	4cacdf3a2b	Add checks for __OpenBSD__. llvm-svn: 27761	2006-04-17 17:55:41 +00:00
Chris Lattner	cc4222d95b	Teach the ppc backend to use rol and vsldoi to generate splatted constants. This implements vec_constants.ll:test_vsldoi and test_rol llvm-svn: 27760	2006-04-17 17:55:10 +00:00
Chris Lattner	7d66e5a118	add a note llvm-svn: 27758	2006-04-17 17:29:41 +00:00
Evan Cheng	eb739d0355	FP SETOLT, SETOLT, SETUGE, SETUGT conditions were implemented incorrectly llvm-svn: 27755	2006-04-17 07:24:10 +00:00
Chris Lattner	2d8d6c9feb	Make some code more general, adding support for constant formation of several new patterns. llvm-svn: 27754	2006-04-17 06:58:41 +00:00
Chris Lattner	9dd4ebffca	Learn how to make odd splatted constants in range [17,29]. This implements PowerPC/vec_constants.ll:test_29. llvm-svn: 27752	2006-04-17 06:07:44 +00:00
Chris Lattner	72a67a5b1f	Pull some code out into a helper function. Effeciently codegen even splats in the range [-32,30]. This allows us to codegen <30,30,30,30> as: vspltisw v0, 15 vadduwm v2, v0, v0 instead of as a cp load. llvm-svn: 27750	2006-04-17 06:00:21 +00:00
Chris Lattner	5367a73dec	Implement a TODO: for any shuffle that can be viewed as a v4[if]32 shuffle, if it can be implemented in 3 or fewer discrete altivec instructions, codegen it as such. This implements Regression/CodeGen/PowerPC/vec_perf_shuffle.ll llvm-svn: 27748	2006-04-17 05:28:54 +00:00
Chris Lattner	34ec6432f6	Regenerate with adjusted costs llvm-svn: 27746	2006-04-17 05:26:20 +00:00
Chris Lattner	36ceea9e96	Regenerate with correct offset llvm-svn: 27744	2006-04-17 05:08:46 +00:00
Chris Lattner	671f50cf33	Increase the opcodes by one each to disambiguate COPY from VMRGHW. llvm-svn: 27742	2006-04-17 00:47:48 +00:00
Chris Lattner	99ee809cb6	Check in a table, generated by llvm-PerfectShuffle, of optimal shuffles of various 4-element vectors. llvm-svn: 27739	2006-04-17 00:37:02 +00:00
Evan Cheng	68b2e5b4b0	movduprm, movshduprm bugs llvm-svn: 27734	2006-04-16 18:11:28 +00:00
Evan Cheng	26d917789c	Encoding bugs llvm-svn: 27733	2006-04-16 07:02:22 +00:00
Evan Cheng	b2e3339cb2	Can't fold loads into alias vector SSE ops used for scalar operation. The load address has to be 16-byte aligned but the values aren't spilled to 128-bit locations. llvm-svn: 27732	2006-04-16 06:58:19 +00:00
Chris Lattner	d86516991a	Implement a TODO: have the legalizer canonicalize a bunch of operations to one type (v4i32) so that we don't have to write patterns for each type, and so that more CSE opportunities are exposed. llvm-svn: 27731	2006-04-16 01:37:57 +00:00
Chris Lattner	f4126f0db7	Make the BUILD_VECTOR lowering code much more aggressive w.r.t constant vectors. Remove some done items from the todo list. llvm-svn: 27729	2006-04-16 01:01:29 +00:00
Chris Lattner	44245f11c3	Fix a crash when faced with a shuffle vector that has an undef in its mask. llvm-svn: 27726	2006-04-15 23:48:05 +00:00
Chris Lattner	2ede0fef98	Add patterns for matching vnots with bit converted inputs. Most of these will go away when I start using evan's binop type canonicalizer llvm-svn: 27725	2006-04-15 23:45:24 +00:00
Chris Lattner	254683a3df	Add a new vnot_conv predicate for matching vnot's where the allones vector is bitconverted from some other type. llvm-svn: 27724	2006-04-15 23:39:14 +00:00
Evan Cheng	9f33b2abc5	More encoding bugs llvm-svn: 27722	2006-04-15 06:10:09 +00:00
Evan Cheng	87e0cd1569	pslldrm, psrawrm, etc. encoding bug llvm-svn: 27721	2006-04-15 05:59:08 +00:00
Evan Cheng	4487cf8125	hsubp{s\|d} encoding bug llvm-svn: 27720	2006-04-15 05:52:42 +00:00
Evan Cheng	32e5d4f6bc	Silly bug llvm-svn: 27719	2006-04-15 05:37:34 +00:00
Evan Cheng	f9a93a1d3f	Do not use movs{h\|l}dup for a shuffle with a single non-undef node. llvm-svn: 27718	2006-04-15 03:13:24 +00:00
Evan Cheng	300456c7f2	Added SSE (and other) entries to foldMemoryOperand(). llvm-svn: 27716	2006-04-14 23:33:27 +00:00

1 2 3 4 5 ...

5180 Commits