Chris Lattner
5951b60cb4
Implement v16i8 multiply with this code:
...
vmuloub v5, v3, v2
vmuleub v2, v3, v2
vperm v2, v2, v5, v4
This implements CodeGen/PowerPC/vec_mul.ll. With this, v16i8 multiplies are
6.79x faster than before.
Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than with
GCC.
Remove the 'integer multiplies' todo from the README file.
llvm-svn: 27792
2006-04-18 03:57:35 +00:00
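The trick behind that sequence: vmuleub/vmuloub form 16-bit products of the even/odd byte lanes, and the vperm mask keeps only the low byte of each product. A minimal scalar C model of what the sequence computes (an illustrative sketch, not part of the commit; names are mine):

#include <stdint.h>

/* Scalar model of the Altivec even/odd multiply trick for v16i8. */
void mul_v16i8_model(const uint8_t a[16], const uint8_t b[16], uint8_t out[16]) {
  uint16_t even[8], odd[8];
  for (int i = 0; i < 8; ++i) {
    even[i] = (uint16_t)a[2*i]     * b[2*i];      /* vmuleub: even lanes */
    odd[i]  = (uint16_t)a[2*i + 1] * b[2*i + 1];  /* vmuloub: odd lanes  */
  }
  for (int i = 0; i < 8; ++i) {                   /* vperm: keep the low byte of   */
    out[2*i]     = (uint8_t)even[i];              /* each 16-bit product, i.e. the */
    out[2*i + 1] = (uint8_t)odd[i];               /* truncated i8 multiply result  */
  }
}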
Chris Lattner
fd1fb64f66
Add tests for v8i16 and v16i8
...
llvm-svn: 27791
2006-04-18 03:54:50 +00:00
Evan Cheng
6be2e4b419
Correct comments
...
llvm-svn: 27790
2006-04-18 03:45:01 +00:00
Chris Lattner
4d84b56e64
Lower v8i16 multiply into this code:
...
li r5, lo16(LCPI1_0)
lis r6, ha16(LCPI1_0)
lvx v4, r6, r5
vmulouh v5, v3, v2
vmuleuh v2, v3, v2
vperm v2, v2, v5, v4
where v4 is:
LCPI1_0: ; <16 x ubyte>
.byte 2
.byte 3
.byte 18
.byte 19
.byte 6
.byte 7
.byte 22
.byte 23
.byte 10
.byte 11
.byte 26
.byte 27
.byte 14
.byte 15
.byte 30
.byte 31
This is 5.07x faster on the G5 (measured) than lowering to scalar code +
loads/stores.
llvm-svn: 27789
2006-04-18 03:43:48 +00:00
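The mask is just the "pick the truncated halfword of each product" pattern: vmuleuh/vmulouh produce 32-bit products of the even/odd halfword lanes, the even-product vector supplies vperm byte indices 0-15 and the odd-product vector indices 16-31, and each product's low (big-endian) halfword occupies bytes 2-3 of its word. A small C sketch that regenerates the table (illustrative only, not from the commit):

#include <stdio.h>

int main(void) {
  /* Even-product word i: bytes 4*i+2, 4*i+3; odd-product word i: 16 + the same. */
  for (int i = 0; i < 4; ++i)
    printf(".byte %d\n.byte %d\n.byte %d\n.byte %d\n",
           4*i + 2, 4*i + 3, 16 + 4*i + 2, 16 + 4*i + 3);
  return 0;
}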
Chris Lattner
613d7fda64
Custom lower v4i32 multiplies into a cute sequence, instead of having legalize
...
scalarize the sequence into 4 mullw's and a bunch of load/store traffic.
This speeds up v4i32 multiplies 4.1x (measured) on a G5. This implements
PowerPC/vec_mul.ll
llvm-svn: 27788
2006-04-18 03:24:30 +00:00
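For contrast, the scalarized fallback this custom lowering replaces is essentially the following per-lane loop, with each element bounced through memory (a hedged sketch, not from the commit):

#include <stdint.h>

void mul_v4i32_scalarized(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
  for (int i = 0; i < 4; ++i)
    out[i] = a[i] * b[i];   /* one mullw per lane, plus the load/store traffic */
}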
Chris Lattner
ab9ad1cdc1
new testcase
...
llvm-svn: 27787
2006-04-18 03:22:16 +00:00
Evan Cheng
13a5022494
Another entry
...
llvm-svn: 27786
2006-04-18 01:22:57 +00:00
Chris Lattner
d664951c95
Fix a build failure on Vladimir's tester.
...
llvm-svn: 27785
2006-04-18 00:21:25 +00:00
Evan Cheng
2f9011cd87
Another entry.
...
llvm-svn: 27784
2006-04-18 00:21:01 +00:00
Evan Cheng
98b1ca65dd
Use movss to insert_vector_elt(v, s, 0).
...
llvm-svn: 27782
2006-04-17 22:45:49 +00:00
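In SSE intrinsic terms, the pattern being matched is replacing element 0 of a v4f32 with a scalar; _mm_move_ss expresses it directly (a hedged illustration, not from the commit):

#include <xmmintrin.h>

/* result = { s, v[1], v[2], v[3] } -- a single movss. */
__m128 insert_lane0(__m128 v, float s) {
  return _mm_move_ss(v, _mm_set_ss(s));
}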
Chris Lattner
76beeb373d
Turn x86 unaligned load/store intrinsics into aligned load/store instructions
...
if the pointer is known aligned.
llvm-svn: 27781
2006-04-17 22:26:56 +00:00
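A hedged C example of the situation described (function and buffer names are mine): the source uses the unaligned-load intrinsic, but the pointer is provably 16-byte aligned, so the load may legally be emitted as an aligned one.

#include <xmmintrin.h>

float sum4_doubled(void) {
  static float buf[4] __attribute__((aligned(16))) = {1, 2, 3, 4};
  __m128 v = _mm_loadu_ps(buf);     /* pointer known 16-byte aligned: movups -> movaps */
  v = _mm_add_ps(v, v);
  float out[4] __attribute__((aligned(16)));
  _mm_store_ps(out, v);
  return out[0] + out[1] + out[2] + out[3];
}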
Chris Lattner
8c8f1e35c9
Fix handling of calls in functions that use vectors. This fixes a crash on
...
the code in GCC PR26546.
llvm-svn: 27780
2006-04-17 22:10:08 +00:00
Evan Cheng
ecf13c5d79
Use two pinsrw to insert an element into v4i32 / v4f32 vector.
...
llvm-svn: 27779
2006-04-17 22:04:06 +00:00
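The idea, sketched with SSE2 intrinsics (illustrative only; lane 1 chosen arbitrarily, names are mine): the 32-bit element is written as its two halfwords, one pinsrw each.

#include <emmintrin.h>

__m128i insert_i32_lane1(__m128i v, int s) {
  v = _mm_insert_epi16(v, s, 2);        /* pinsrw: low halfword of dword 1  */
  v = _mm_insert_epi16(v, s >> 16, 3);  /* pinsrw: high halfword of dword 1 */
  return v;
}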
Chris Lattner
81938fa3db
remove done item
...
llvm-svn: 27778
2006-04-17 21:52:03 +00:00
Chris Lattner
fdecddb741
Don't diddle VRSAVE if no registers need to be added/removed from it. This
...
allows us to codegen functions as:
_test_rol:
vspltisw v2, -12
vrlw v2, v2, v2
blr
instead of:
_test_rol:
mfvrsave r2, 256
mr r3, r2
mtvrsave r3
vspltisw v2, -12
vrlw v2, v2, v2
mtvrsave r2
blr
Testcase here: CodeGen/PowerPC/vec_vrsave.ll
llvm-svn: 27777
2006-04-17 21:48:13 +00:00
Chris Lattner
c22e565b1b
New testcase, shouldn't touch vrsave
...
llvm-svn: 27776
2006-04-17 21:48:03 +00:00
Chris Lattner
5f5a9cc8ea
Add a MachineInstr::eraseFromParent convenience method.
...
llvm-svn: 27775
2006-04-17 21:35:41 +00:00
Chris Lattner
bcd61bbadb
Add some convenience methods.
...
llvm-svn: 27774
2006-04-17 21:35:08 +00:00
Evan Cheng
833ce43152
Encoding bug
...
llvm-svn: 27773
2006-04-17 21:33:57 +00:00
Chris Lattner
021f521a41
Vectors that are known live-in and live-out are clearly already marked in
...
the vrsave register for the caller. This allows us to codegen a function as:
_test_rol:
mfspr r2, 256
mr r3, r2
mtspr 256, r3
vspltisw v2, -12
vrlw v2, v2, v2
mtspr 256, r2
blr
instead of:
_test_rol:
mfspr r2, 256
oris r3, r2, 40960
mtspr 256, r3
vspltisw v0, -12
vrlw v2, v0, v0
mtspr 256, r2
blr
llvm-svn: 27772
2006-04-17 21:22:06 +00:00
Chris Lattner
a717d4f53b
Prefer to allocate V2-V5 before V0,V1. This lets us generate code like this:
...
vspltisw v2, -12
vrlw v2, v2, v2
instead of:
vspltisw v0, -12
vrlw v2, v0, v0
when a function is returning a value.
llvm-svn: 27771
2006-04-17 21:19:12 +00:00
Chris Lattner
6b76deffb5
Move some knowledge about registers out of the code emitter into the register info.
...
llvm-svn: 27770
2006-04-17 21:07:20 +00:00
Chris Lattner
face261a94
Use a small table instead of macros to do this conversion.
...
llvm-svn: 27769
2006-04-17 20:59:25 +00:00
Evan Cheng
4de1805c84
Implement v8i16, v16i8 splat using unpckl + pshufd.
...
llvm-svn: 27768
2006-04-17 20:43:08 +00:00
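For v8i16 the sequence is roughly the following, shown with SSE2 intrinsics (a hedged sketch, not from the commit): unpacking the vector with itself doubles lane 0 into a dword, and pshufd broadcasts that dword.

#include <emmintrin.h>

__m128i splat_v8i16_lane0(__m128i v) {
  __m128i t = _mm_unpacklo_epi16(v, v);  /* punpcklwd: {v0,v0,v1,v1,v2,v2,v3,v3} */
  return _mm_shuffle_epi32(t, 0x00);     /* pshufd $0: broadcast dword 0 -> v0 x8 */
}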
Chris Lattner
e1d38ad84b
implement returns of a vector, testcase here: CodeGen/X86/vec_return.ll
...
llvm-svn: 27767
2006-04-17 20:32:50 +00:00
Chris Lattner
ceb52c4403
New testcase
...
llvm-svn: 27766
2006-04-17 20:32:27 +00:00
Chris Lattner
9bcfa7f378
Codegen insertelement with constant insertion points as scalar_to_vector
...
and a shuffle. For this:
void %test2(<4 x float>* %F, float %f) {
%tmp = load <4 x float>* %F ; <<4 x float>> [#uses=2]
%tmp3 = add <4 x float> %tmp, %tmp ; <<4 x float>> [#uses=1]
%tmp2 = insertelement <4 x float> %tmp3, float %f, uint 2 ; <<4 x float>> [#uses=2]
%tmp6 = add <4 x float> %tmp2, %tmp2 ; <<4 x float>> [#uses=1]
store <4 x float> %tmp6, <4 x float>* %F
ret void
}
we now get this on X86 (which will get better):
_test2:
movl 4(%esp), %eax
movaps (%eax), %xmm0
addps %xmm0, %xmm0
movaps %xmm0, %xmm1
shufps $3, %xmm1, %xmm1
movaps %xmm0, %xmm2
shufps $1, %xmm2, %xmm2
unpcklps %xmm1, %xmm2
movss 8(%esp), %xmm1
unpcklps %xmm1, %xmm0
unpcklps %xmm2, %xmm0
addps %xmm0, %xmm0
movaps %xmm0, (%eax)
ret
instead of:
_test2:
subl $28, %esp
movl 32(%esp), %eax
movaps (%eax), %xmm0
addps %xmm0, %xmm0
movaps %xmm0, (%esp)
movss 36(%esp), %xmm0
movss %xmm0, 8(%esp)
movaps (%esp), %xmm0
addps %xmm0, %xmm0
movaps %xmm0, (%eax)
addl $28, %esp
ret
llvm-svn: 27765
2006-04-17 19:21:01 +00:00
Chris Lattner
f2347c31b4
Make sure to check splats of every constant we can, handle splat(31) by
...
being a bit more clever, add support for odd splats from -31 to -17.
llvm-svn: 27764
2006-04-17 18:09:22 +00:00
Evan Cheng
5728f30f7c
Incorrect foldMemoryOperand entries
...
llvm-svn: 27763
2006-04-17 18:06:12 +00:00
Evan Cheng
3d26db8148
Errors in patterns preventing load folding
...
llvm-svn: 27762
2006-04-17 18:05:01 +00:00
Jeff Cohen
4cacdf3a2b
Add checks for __OpenBSD__.
...
llvm-svn: 27761
2006-04-17 17:55:41 +00:00
Chris Lattner
cc4222d95b
Teach the ppc backend to use rol and vsldoi to generate splatted constants.
...
This implements vec_constants.ll:test_vsldoi and test_rol
llvm-svn: 27760
2006-04-17 17:55:10 +00:00
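As a worked example of the rol half (my arithmetic, not from the commit): _test_rol above splats -12 and then vrlw rotates each word left by its own low 5 bits (0xF4 & 31 = 20), so every element ends up as rotl32(0xFFFFFFF4, 20) = 0xFF4FFFFF.

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

static uint32_t rotl32(uint32_t x, unsigned n) {
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

int main(void) {
  uint32_t splat = (uint32_t)-12;              /* vspltisw v2, -12 */
  uint32_t elt   = rotl32(splat, splat & 31);  /* vrlw v2, v2, v2  */
  printf("0x%08" PRIX32 "\n", elt);            /* prints 0xFF4FFFFF */
  return 0;
}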
Chris Lattner
af62fa8b51
Some more cases that can be generated with two instructions
...
llvm-svn: 27759
2006-04-17 17:54:18 +00:00
Chris Lattner
7d66e5a118
add a note
...
llvm-svn: 27758
2006-04-17 17:29:41 +00:00
Evan Cheng
eb739d0355
FP SETOLT, SETOLE, SETUGE, SETUGT conditions were implemented incorrectly
...
llvm-svn: 27755
2006-04-17 07:24:10 +00:00
Chris Lattner
2d8d6c9feb
Make some code more general, adding support for constant formation of several
...
new patterns.
llvm-svn: 27754
2006-04-17 06:58:41 +00:00
Chris Lattner
f0fc91e4d7
New testcases
...
llvm-svn: 27753
2006-04-17 06:58:16 +00:00
Chris Lattner
9dd4ebffca
Learn how to make odd splatted constants in range [17,29]. This implements
...
PowerPC/vec_constants.ll:test_29.
llvm-svn: 27752
2006-04-17 06:07:44 +00:00
Chris Lattner
fb4fb42e54
new testcase
...
llvm-svn: 27751
2006-04-17 06:06:50 +00:00
Chris Lattner
72a67a5b1f
Pull some code out into a helper function.
...
Efficiently codegen even splats in the range [-32,30].
This allows us to codegen <30,30,30,30> as:
vspltisw v0, 15
vadduwm v2, v0, v0
instead of as a cp load.
llvm-svn: 27750
2006-04-17 06:00:21 +00:00
Chris Lattner
78ab345316
New testcase
...
llvm-svn: 27749
2006-04-17 05:58:22 +00:00
Chris Lattner
5367a73dec
Implement a TODO: for any shuffle that can be viewed as a v4[if]32 shuffle,
...
if it can be implemented in 3 or fewer discrete altivec instructions, codegen
it as such. This implements Regression/CodeGen/PowerPC/vec_perf_shuffle.ll
llvm-svn: 27748
2006-04-17 05:28:54 +00:00
Chris Lattner
dc4f42f75d
new testcase, these shuffles can be implemented with discrete instructions,
...
and shouldn't be lowered to vperm.
llvm-svn: 27747
2006-04-17 05:27:31 +00:00
Chris Lattner
34ec6432f6
Regenerate with adjusted costs
...
llvm-svn: 27746
2006-04-17 05:26:20 +00:00
Chris Lattner
81fa159ca9
Encode a cost of zero as a cost of 1.
...
llvm-svn: 27745
2006-04-17 05:25:16 +00:00
Chris Lattner
36ceea9e96
Regenerate with correct offset
...
llvm-svn: 27744
2006-04-17 05:08:46 +00:00
Chris Lattner
d134f32a85
Really, I can count!
...
llvm-svn: 27743
2006-04-17 05:05:52 +00:00
Chris Lattner
671f50cf33
Increase the opcodes by one each to disambiguate COPY from VMRGHW.
...
llvm-svn: 27742
2006-04-17 00:47:48 +00:00
Chris Lattner
aee09fc8aa
assign stable opcodes to the various altivec ops.
...
llvm-svn: 27741
2006-04-17 00:47:18 +00:00
Chris Lattner
dab7d994ef
PPCPerfectShuffle.h is autogenerated, don't include it in the LOC counts.
...
llvm-svn: 27740
2006-04-17 00:46:09 +00:00