llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 22:12:47 +01:00

Author	SHA1	Message	Date
Che-Liang Chiou	f4aaa1b2e1	ptx: add state spaces llvm-svn: 122638	2010-12-30 10:41:27 +00:00
NAKAMURA Takumi	e16c40416e	test/CodeGen/X86/negative-sin.ll: FileCheck-ize. llvm-svn: 122619	2010-12-29 03:58:47 +00:00
NAKAMURA Takumi	d66bcf9ada	test/CodeGen/X86/fp-in-intregs.ll: FileCheck-ize. llvm-svn: 122618	2010-12-29 03:58:36 +00:00
Bob Wilson	85dbc89f44	Radar 8803471: Fix expansion of ARM BCCi64 pseudo instructions. If the basic block containing the BCCi64 (or BCCZi64) instruction ends with an unconditional branch, that branch needs to be deleted before appending the expansion of the BCCi64 to the end of the block. llvm-svn: 122521	2010-12-23 22:45:49 +00:00
Andrew Trick	cc701bcfdc	Fixes PR8823: add-with-overflow-128.ll In the bottom-up selection DAG scheduling, handle two-address instructions that read/write unspillable registers. Treat the entire chain of two-address nodes as a single live range. llvm-svn: 122472	2010-12-23 03:15:51 +00:00
Benjamin Kramer	49942a90b7	DAGCombine add (sext i1), X into sub X, (zext i1) if sext from i1 is illegal. The latter usually compiles into smaller code. example code: unsigned foo(unsigned x, unsigned y) { if (x != 0) y--; return y; } before: _foo: ## @foo cmpl $1, 4(%esp) ## encoding: [0x83,0x7c,0x24,0x04,0x01] sbbl %eax, %eax ## encoding: [0x19,0xc0] notl %eax ## encoding: [0xf7,0xd0] addl 8(%esp), %eax ## encoding: [0x03,0x44,0x24,0x08] ret ## encoding: [0xc3] after: _foo: ## @foo cmpl $1, 4(%esp) ## encoding: [0x83,0x7c,0x24,0x04,0x01] movl 8(%esp), %eax ## encoding: [0x8b,0x44,0x24,0x08] adcl $-1, %eax ## encoding: [0x83,0xd0,0xff] ret ## encoding: [0xc3] llvm-svn: 122455	2010-12-22 23:17:45 +00:00
Benjamin Kramer	d8387aa9bd	X86: Lower a select directly to a setcc_carry if possible. int test(unsigned long a, unsigned long b) { return -(a < b); } compiles to _test: ## @test cmpq %rsi, %rdi ## encoding: [0x48,0x39,0xf7] sbbl %eax, %eax ## encoding: [0x19,0xc0] ret ## encoding: [0xc3] instead of _test: ## @test xorl %ecx, %ecx ## encoding: [0x31,0xc9] cmpq %rsi, %rdi ## encoding: [0x48,0x39,0xf7] movl $-1, %eax ## encoding: [0xb8,0xff,0xff,0xff,0xff] cmovael %ecx, %eax ## encoding: [0x0f,0x43,0xc1] ret ## encoding: [0xc3] llvm-svn: 122451	2010-12-22 23:09:28 +00:00
Che-Liang Chiou	e73ad4387e	ptx: add ld instruction and test llvm-svn: 122398	2010-12-22 10:38:51 +00:00
Chris Lattner	04ef853e23	Fix a bug in ReduceLoadWidth that wasn't handling extending loads properly. We miscompiled the testcase into: _test: ## @test movl $128, (%rdi) movzbl 1(%rdi), %eax ret Now we get a proper: _test: ## @test movl $128, (%rdi) movsbl (%rdi), %eax movzbl %ah, %eax ret This fixes PR8757. llvm-svn: 122392	2010-12-22 08:02:57 +00:00
Dale Johannesen	e0fb87c3d7	Reapply 122353-122355 with fixes. 122354 was wrong; the shift type was needed one place, the shift count type another. The transform in 123555 had the same problem. llvm-svn: 122366	2010-12-21 21:55:50 +00:00
Benjamin Kramer	369872edfc	Add some x86 specific dagcombines for conditional increments. (add Y, (sete X, 0)) -> cmp X, 1; adc 0, Y (add Y, (setne X, 0)) -> cmp X, 1; sbb -1, Y (sub (sete X, 0), Y) -> cmp X, 1; sbb 0, Y (sub (setne X, 0), Y) -> cmp X, 1; adc -1, Y for unsigned foo(unsigned a, unsigned b) { if (a == 0) b++; return b; } we now get: foo: cmpl $1, %edi movl %esi, %eax adcl $0, %eax ret instead of: foo: testl %edi, %edi sete %al movzbl %al, %eax addl %esi, %eax ret llvm-svn: 122364	2010-12-21 21:41:44 +00:00
Dale Johannesen	972aba543a	Revert 122353-122355 for the moment, they broke stuff. llvm-svn: 122360	2010-12-21 21:22:27 +00:00
Dale Johannesen	39186cfb0b	Add a new transform to DAGCombiner. llvm-svn: 122355	2010-12-21 20:10:51 +00:00
Dale Johannesen	5f3e7b08f6	Get the type of a shift from the shift, not from its shift count operand. These should be the same but apparently are not always, and this is cleaner anyway. This improves the code in an existing test. llvm-svn: 122354	2010-12-21 20:06:19 +00:00
Bob Wilson	01593c55a2	Add ARM-specific DAG combining to cast i64 vector element load/stores to f64. Type legalization splits up i64 values into pairs of i32 values, which leads to poor quality code when inserting or extracting i64 vector elements. If the vector element is loaded or stored, it can be treated as an f64 value and loaded or stored directly from a VPR register. Use the pre-legalization DAG combiner to cast those vector elements to f64 types so that the type legalizer won't mess them up. Radar 8755338. llvm-svn: 122319	2010-12-21 06:43:19 +00:00
Dale Johannesen	036c3da142	Cosmetic changes. llvm-svn: 122259	2010-12-20 20:10:50 +00:00
Chris Lattner	4c3662e299	temporarily disable this: PR8823. llvm-svn: 122222	2010-12-20 02:11:23 +00:00
Chris Lattner	bee7320c3c	now that addc/adde are gone, "ADDC" in the X86 backend uses EFLAGS results, the same as setcc. Optimize ADDC(0,0,FLAGS) -> SET_CARRY(FLAGS). This is a step towards finishing off PR5443. In the testcase in that bug we now get: movq %rdi, %rax addq %rsi, %rax sbbq %rcx, %rcx testb $1, %cl setne %dl ret instead of: movq %rdi, %rax addq %rsi, %rax movl $0, %ecx adcq $0, %rcx testq %rcx, %rcx setne %dl ret llvm-svn: 122219	2010-12-20 01:37:09 +00:00
Chris Lattner	2d4e17d195	We lower setb to sbb with the hope that the and will go away, when it doesn't, match it back to setb. On a 64-bit version of the testcase before we'd get: movq %rdi, %rax addq %rsi, %rax sbbb %dl, %dl andb $1, %dl ret now we get: movq %rdi, %rax addq %rsi, %rax setb %dl ret llvm-svn: 122217	2010-12-20 01:16:03 +00:00
Mon P Wang	666259546c	Add comment for testcase for 122206 llvm-svn: 122210	2010-12-20 00:54:26 +00:00
Mon P Wang	d3adab7a64	Prevents PerformShuffleCombine from creating a node with an illegal type after legalize types has run, e.g., prevent creating an i64 node from a v2i64 when i64 is not a legal type. llvm-svn: 122206	2010-12-19 23:55:53 +00:00
Chris Lattner	297259f6f1	improve the setcc -> setcc_carry optimization to happen more consistently by moving it out of lowering into dag combine. Add some missing patterns for matching away extended versions of setcc_c. llvm-svn: 122201	2010-12-19 22:08:31 +00:00
Chris Lattner	ad85635a93	now that generic vector types aren't selected onto MMX registers, these tests don't need -disable-mmx. llvm-svn: 122188	2010-12-19 20:12:58 +00:00
Chris Lattner	2d6a408c4c	add a general coverage test for overflow intrinsics. llvm-svn: 122185	2010-12-19 20:01:13 +00:00
Chris Lattner	ac82ea26da	fix PR8642: if a critical edge has a PHI value that can trap, isel is required to split the edge. PHI values get evaluated on the edge, not in their predecessor block. llvm-svn: 122170	2010-12-19 04:58:57 +00:00
Chris Lattner	1cc35d2472	move this test into the ARM test so that it is only run when the arm backend is enabled. llvm-svn: 122163	2010-12-19 02:58:14 +00:00
Anton Korobeynikov	f49c9c02d6	Restore the behavior of frame lowering before my refactoring. It turns out that ppc backend has really weird interdependencies over different hooks and all stuff is fragile wrt small changes. This should fix PR8749 llvm-svn: 122155	2010-12-18 19:53:14 +00:00
Benjamin Kramer	84d3e6cfd0	Just rename the functions, relying on matching a instruction that has the same name as a symbol is way too fragile. llvm-svn: 122154	2010-12-18 14:23:57 +00:00
Benjamin Kramer	4d591385d1	Test more than just label names and make test work on non-x86 hosts. llvm-svn: 122153	2010-12-18 14:07:28 +00:00
Bob Wilson	776d3f73eb	Fix result type of Neon floating-point comparisons against zero. The result vector elements are always integers. Radar 8782191. llvm-svn: 122112	2010-12-18 00:04:33 +00:00
Bill Wendling	c16f9b1ccc	During local stack slot allocation, the materializeFrameBaseRegister function may be called. If the entry block is empty, the insertion point iterator will be the "end()" value. Calling ->getParent() on it (among others) causes problems. Modify materializeFrameBaseRegister to take the machine basic block and insert the frame base register at the beginning of that block. (It's very similar to what the code does all ready. The only difference is that it will always insert at the beginning of the entry block instead of after a previous materialization of the frame base register. I doubt that that matters here.) <rdar://problem/8782198> llvm-svn: 122104	2010-12-17 23:09:14 +00:00
Bob Wilson	c57c2d755b	Fix a DAGCombiner crash when folding binary vector operations with constant BUILD_VECTOR operands where the element type is not legal. I had previously changed this code to insert TRUNCATE operations, but that was just wrong. llvm-svn: 122102	2010-12-17 23:06:49 +00:00
Bob Wilson	347efe0b29	Combine several vector-related DAGCombiner tests. llvm-svn: 122101	2010-12-17 23:06:46 +00:00
Nate Begeman	ef5f3c0fa7	Add support for matching psign & plendvb to the x86 target Remove unnecessary pandn patterns, 'vnot' patfrag looks through bitcasts llvm-svn: 122098	2010-12-17 22:55:37 +00:00
Dale Johannesen	c2c6ebd82a	Add a transform to DAG Combiner. This improves the code for the case where 32-bit divide by constant is turned into 64-bit multiply by constant. 8771012. llvm-svn: 122090	2010-12-17 21:45:49 +00:00
Kalle Raiskila	68f221707a	Don't feed 19 bit immediates to ILA. Patch (slightly modified) by Visa Putkinen. llvm-svn: 122052	2010-12-17 09:36:09 +00:00
Bob Wilson	e06f6eabe7	Fix crash compiling a QQQQ REG_SEQUENCE for a Neon vld3_lane operation. Radar 8776599 llvm-svn: 122018	2010-12-17 01:21:12 +00:00
Jason W Kim	2f4a8d7553	1. ARM/MC/ELF: A few more ELF relocs for .o 2. Fixed EmitLocalCommonSymbol for ELF (Yes, they exist. :) Test added. llvm-svn: 121951	2010-12-16 03:12:17 +00:00
Jim Grosbach	84c2b29b58	Thumb1 had two patterns for the same load-from-constant-pool instruction. Canonicalize on tLDRpci and remove tLDRcp. llvm-svn: 121920	2010-12-15 23:52:36 +00:00
Eric Christopher	339499f8f3	Don't handle -arm-long-calls in fast isel for now. llvm-svn: 121919	2010-12-15 23:47:29 +00:00
Evan Cheng	68e1ed8752	Teach machine cse to commute instructions. llvm-svn: 121903	2010-12-15 22:16:21 +00:00
Bob Wilson	438a9a1367	Add Neon VCVT instructions for f32 <-> f16 conversions. Clang is now providing intrinsics for these and so we need to support them in the backend. Radar 8068427. llvm-svn: 121902	2010-12-15 22:14:12 +00:00
Wesley Peck	2a376c535e	Lower the MBlaze target specific calling conventions for "interrupt_handler" and "save_volatiles" correctly. This completes the custom calling convention functionality changes for the MBlaze backend that were started in 121888. llvm-svn: 121891	2010-12-15 20:27:28 +00:00
Chris Lattner	81815cd4db	take care of some todos, transforming [us]mul_lohi into a wider mul if the wider mul is legal. llvm-svn: 121848	2010-12-15 06:04:19 +00:00
Chris Lattner	3bec2e7d0d	merge two tests llvm-svn: 121847	2010-12-15 05:58:59 +00:00
Evan Cheng	7e96e67d98	Fix a minor bug in two-address pass. It was missing a commute opportunity. regB = move RCX regA = op regB, regC RAX = move regA where both regB and regC are killed. If regB is constrainted to non-compatible physical registers but regC is not constrainted at all, then it's better to commute the instruction. movl %edi, %eax shlq $32, %rcx leaq (%rcx,%rax), %rax => movl %edi, %eax shlq $32, %rcx orq %rcx, %rax rdar://8762995 llvm-svn: 121793	2010-12-14 21:34:53 +00:00
Evan Cheng	6a2bed92f5	bfi A, (and B, C1), C2) -> bfi A, B, C2 iff C1 & C2 == C1. rdar://8458663 llvm-svn: 121746	2010-12-14 03:22:07 +00:00
Jason W Kim	0cf0f7a078	fix fixme case typo :-) llvm-svn: 121743	2010-12-14 01:42:38 +00:00
Jason W Kim	b5cc5dad79	First cut of ARM/MC/ELF PIC relocations. Test has fixme, to move to .s -> .o test when AsmParser works better. llvm-svn: 121732	2010-12-13 23:16:07 +00:00
Bob Wilson	33e5e902b0	Remove the rest of the _sfp Neon instruction patterns. Use the same COPY_TO_REGCLASS approach as for the 2-register _sfp instructions. This change made a big difference in the code generated for the CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing a fine job, but some instructions that were previously moved outside the loop are not moved now. It's using fewer VFP registers now, which is generally a good thing, so I think the estimates for register pressure changed and that affected the LICM behavior. Since that isn't obviously wrong, I've just changed the test file. This completes the work for Radar 8711675. llvm-svn: 121730	2010-12-13 23:02:37 +00:00
Chris Lattner	d17cbf803b	rename test llvm-svn: 121697	2010-12-13 08:39:40 +00:00
Chris Lattner	14810c808b	Add a couple dag combines to transform mulhi/mullo into a wider multiply when the wider type is legal. This allows us to compile: define zeroext i16 @test1(i16 zeroext %x) nounwind { entry: %div = udiv i16 %x, 33 ret i16 %div } into: test1: # @test1 movzwl 4(%esp), %eax imull $63551, %eax, %eax # imm = 0xF83F shrl $21, %eax ret instead of: test1: # @test1 movw $-1985, %ax # imm = 0xFFFFFFFFFFFFF83F mulw 4(%esp) andl $65504, %edx # imm = 0xFFE0 movl %edx, %eax shrl $5, %eax ret Implementing rdar://8760399 and example #4 from: http://blog.regehr.org/archives/320 We should implement the same thing for [su]mul_hilo, but I don't have immediate plans to do this. llvm-svn: 121696	2010-12-13 08:39:01 +00:00
Wesley Peck	f842b79b4b	Missed some ADDI <-> ADDIK conversions in 121649. llvm-svn: 121652	2010-12-12 22:53:14 +00:00
Evan Cheng	b6773d7e1f	(or (and (shl A, #shamt), mask), B) => ARMbfi B, A, ~mask where lsb(mask) == #shamt. rdar://8752056 llvm-svn: 121606	2010-12-11 04:11:38 +00:00
Bob Wilson	d30768fe3e	Add float patterns for Neon vld1-lane/dup and vst1-lane operations. llvm-svn: 121583	2010-12-10 22:13:32 +00:00
Bob Wilson	5ff13f9d5c	Fix some invalid alignments for Neon vld-dup and vld/st-lane instructions. Alignments smaller than the total size of the memory being loaded or stored, unless the alignment is 8 bytes, are not allowed. Add tests for this, too. llvm-svn: 121506	2010-12-10 19:37:42 +00:00
Nate Begeman	cb6d1c8193	Formalize the notion that AVX and SSE are non-overlapping extensions from the compiler's point of view. Per email discussion, we either want to always use VEX-prefixed instructions or never use them, and are taking "HasAVX" to mean "Always use VEX". Passing -mattr=-avx,+sse42 should serve to restore legacy SSE support when desirable. llvm-svn: 121439	2010-12-10 00:26:57 +00:00
Jim Grosbach	8bc33cc6e5	ARM stm/ldm instructions require more than one register in the register list. Otherwise, a plain str/ldr should be used instead. Make sure we account for that in prologue/epilogue code generation. rdar://8745460 llvm-svn: 121391	2010-12-09 18:31:13 +00:00
Bruno Cardoso Lopes	93e5c2fb64	Add ROTR and ROTRV mips32 instructions. Patch by Akira Hatanaka llvm-svn: 121377	2010-12-09 17:32:30 +00:00
Eric Christopher	0e40452eb0	Rewrite the darwin tlv support to use a chain and return to copying the output to the correct register. Fixes a hidden problem uncovered by the last patch where we'd try to DAG combine our MVT::Other node oddly. llvm-svn: 121358	2010-12-09 06:25:53 +00:00
Eric Christopher	0100a8fda4	Remove extraneous copy from DAG conversion for darwin tls. This was popping up at O0 when it wasn't folded and the fast allocator would complain. llvm-svn: 121330	2010-12-09 00:27:58 +00:00
Eric Christopher	d601b8288f	Move this test to tlv* to make it easier to notice versus linux tls support. llvm-svn: 121316	2010-12-08 23:33:23 +00:00
Jason W Kim	2e6e50c1b0	ARM/MC/ELF TPsoft is now a proper pseudo inst. Added test to check bl __aeabi_read_tp gets emitted properly for ELF/ASM as well as ELF/OBJ (including fixup) Also added support for ELF::R_ARM_TLS_IE32 llvm-svn: 121312	2010-12-08 23:14:44 +00:00
Evan Cheng	3bd9b95b4d	Fix a bad prologue / epilogue codegen bug where the compiler would emit illegal vpush instructions to save / restore VFP / NEON registers like this: vpush {d8,d10,d11} vpop {d8,d10,d11} vpush and vpop do not allow gaps in the register list. rdar://8728956 llvm-svn: 121197	2010-12-07 23:08:38 +00:00
Bruno Cardoso Lopes	0e14644599	Match a pattern generated by a dag combiner opt where: (select (load (load tga0)) (load tga1)) => (load (select (load tga0) tga1)) Thanks to Akira for pointing that. llvm-svn: 121163	2010-12-07 19:00:20 +00:00
Devang Patel	6fe7fe8dd4	If dbg_declare() or dbg_value() is not lowered by isel then emit DEBUG message instead of creating DBG_VALUE for undefined value in reg0. llvm-svn: 121059	2010-12-06 22:39:26 +00:00
Wesley Peck	996c76c27e	Fixed reversed operands for IDIV and CMP instructions in MBlaze backend. Use BRAD instead of BRD for indirect branches in MBlaze backend. patch contributed by Jack Whitham! llvm-svn: 121044	2010-12-06 22:06:49 +00:00
Wesley Peck	b168ddedaa	Fix a 16-bit immediate value detection bug in the MBlaze delay slot filler. Address more hazards in the MBlaze delay slot filler. patch contributed by Jack Whitham! llvm-svn: 121037	2010-12-06 21:11:01 +00:00
Rafael Espindola	4ec917db9b	Revert previous two patches while I try to find out how to make both linux and darwin assemblers happy :-( llvm-svn: 121004	2010-12-06 15:35:15 +00:00
Rafael Espindola	ad6219b193	Update test for the extra =. llvm-svn: 121001	2010-12-06 15:05:36 +00:00
Che-Liang Chiou	cd2878d421	ptx: add shift instructions llvm-svn: 120982	2010-12-06 04:00:03 +00:00
Evan Cheng	fc78767730	Making use of VFP / NEON floating point multiply-accumulate / subtraction is difficult on current ARM implementations for a few reasons. 1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause additional pipeline stall. So it's frequently better to single codegen vmul + vadd. 2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart. 3. A vmla followed vmla is a special case. Obvious issuing back to back RAW vmla + vmla is very bad. But this isn't ideal either: vmul vadd vmla Instead, we want to expand the second vmla: vmla vmul vadd Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster. Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough but it isn't the optimial solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable. A. Add missing isel predicates which cause vmla to be codegen'ed. B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla. C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case). D. Add ARM hazard recognizer to model the vmla / vmls hazards. E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards. Work in progress, only A+B are enabled. llvm-svn: 120960	2010-12-05 22:04:16 +00:00
Chris Lattner	e30adfb732	Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags result. This allows us to compile: void *test12(long count) { return new int[count]; } into: test12: movl $4, %ecx movq %rdi, %rax mulq %rcx movq $-1, %rdi cmovnoq %rax, %rdi jmp __Znam ## TAILCALL instead of: test12: movl $4, %ecx movq %rdi, %rax mulq %rcx seto %cl testb %cl, %cl movq $-1, %rdi cmoveq %rax, %rdi jmp __Znam Of course it would be even better if the regalloc inverted the cmov to 'cmovoq', which would eliminate the need for the 'movq %rdi, %rax'. llvm-svn: 120936	2010-12-05 07:49:54 +00:00
Chris Lattner	76601e7a99	it turns out that when ".with.overflow" intrinsics were added to the X86 backend that they were all implemented except umul. This one fell back to the default implementation that did a hi/lo multiply and compared the top. Fix this to check the overflow flag that the 'mul' instruction sets, so we can avoid an explicit test. Now we compile: void *func(long count) { return new int[count]; } into: __Z4funcl: ## @_Z4funcl movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00] movq %rdi, %rax ## encoding: [0x48,0x89,0xf8] mulq %rcx ## encoding: [0x48,0xf7,0xe1] seto %cl ## encoding: [0x0f,0x90,0xc1] testb %cl, %cl ## encoding: [0x84,0xc9] movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff] cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8] jmp __Znam ## TAILCALL instead of: __Z4funcl: ## @_Z4funcl movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00] movq %rdi, %rax ## encoding: [0x48,0x89,0xf8] mulq %rcx ## encoding: [0x48,0xf7,0xe1] testq %rdx, %rdx ## encoding: [0x48,0x85,0xd2] movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff] cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8] jmp __Znam ## TAILCALL Other than the silly seto+test, this is using the o bit directly, so it's going in the right direction. llvm-svn: 120935	2010-12-05 07:30:36 +00:00
Chris Lattner	9b4b9e751a	fix the rest of the linux miscompares :) llvm-svn: 120933	2010-12-05 02:08:07 +00:00
Chris Lattner	16bafb2414	generalize the previous check to handle -1 on either side of the select, inserting a not to compensate. Add a missing isZero check that I lost somehow. This improves codegen of: void *func(long count) { return new int[count]; } from: __Z4funcl: ## @_Z4funcl movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00] movq %rdi, %rax ## encoding: [0x48,0x89,0xf8] mulq %rcx ## encoding: [0x48,0xf7,0xe1] testq %rdx, %rdx ## encoding: [0x48,0x85,0xd2] movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff] cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8] jmp __Znam ## TAILCALL ## encoding: [0xeb,A] to: __Z4funcl: ## @_Z4funcl movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00] movq %rdi, %rax ## encoding: [0x48,0x89,0xf8] mulq %rcx ## encoding: [0x48,0xf7,0xe1] cmpq $1, %rdx ## encoding: [0x48,0x83,0xfa,0x01] sbbq %rdi, %rdi ## encoding: [0x48,0x19,0xff] notq %rdi ## encoding: [0x48,0xf7,0xd7] orq %rax, %rdi ## encoding: [0x48,0x09,0xc7] jmp __Znam ## TAILCALL ## encoding: [0xeb,A] llvm-svn: 120932	2010-12-05 02:00:51 +00:00
Chris Lattner	e1c32a116b	relax this to handle linux defaulting to -static. llvm-svn: 120930	2010-12-05 01:31:13 +00:00
Chris Lattner	474ed0aa9b	Improve an integer select optimization in two ways: 1. generalize (select (x == 0), -1, 0) -> (sign_bit (x - 1)) to: (select (x == 0), -1, y) -> (sign_bit (x - 1)) \| y 2. Handle the identical pattern that happens with !=: (select (x != 0), y, -1) -> (sign_bit (x - 1)) \| y cmov is often high latency and can't fold immediates or memory operands. For example for (x == 0) ? -1 : 1, before we got: < testb %sil, %sil < movl $-1, %ecx < movl $1, %eax < cmovel %ecx, %eax now we get: > cmpb $1, %sil > sbbl %eax, %eax > orl $1, %eax llvm-svn: 120929	2010-12-05 01:23:24 +00:00
Chris Lattner	c383be807e	merge some tests into select.ll and make them more specific. llvm-svn: 120928	2010-12-05 01:13:58 +00:00
Chris Lattner	7e0a594633	rename test llvm-svn: 120927	2010-12-05 01:02:23 +00:00
Chris Lattner	1a760f6247	remove two tests that aren't really testing anything. llvm-svn: 120926	2010-12-05 01:02:13 +00:00
Benjamin Kramer	851691ddb2	Add patterns for the x86 popcnt instruction. - Also adds a new POPCNT subtarget feature that is currently enabled if the target supports SSE4.2 (nehalem) or SSE4A (barcelona). llvm-svn: 120917	2010-12-04 20:32:23 +00:00
Bob Wilson	20c65a9d33	The Thumb tADDrSPi instruction is not valid when the destination is SP. Check for that and try narrowing it to tADDspi instead. Radar 8724703. llvm-svn: 120892	2010-12-04 04:40:19 +00:00
Jim Grosbach	c69ad2176a	When using the 'push' mnemonic for Thumb2 stmdb, be explicit when it's the 32-bit wide version by adding the .w suffix. llvm-svn: 120838	2010-12-03 20:33:01 +00:00
Devang Patel	dad3193123	Hide tests, that check .loc, .file in output assembly, from darwin9 buildbot. llvm-svn: 120750	2010-12-02 23:29:58 +00:00
Devang Patel	822facd787	Use set directive for StartMinusEndExpr. This is a fix for llvm-gcc-i386-darwin9 buildbot failure. llvm-svn: 120742	2010-12-02 21:32:30 +00:00
Evan Cheng	402157b66e	Fix test. llvm-svn: 120730	2010-12-02 20:17:34 +00:00
Evan Cheng	4118b24aca	Fix and re-enable tail call optimization of expanded libcalls. llvm-svn: 120622	2010-12-01 22:59:46 +00:00
Owen Anderson	8802c68592	Add correct encodings for STRD and LDRD, including fixup support. Additionally, update these to unified syntax. llvm-svn: 120589	2010-12-01 19:18:46 +00:00
Evan Cheng	84162760b7	Speculatively disable x86 portion of r120501 to appease the x86_64 buildbot. llvm-svn: 120549	2010-12-01 03:27:20 +00:00
Jason W Kim	4d960e071c	ARM/MC/ELF relocation "hello world" for movw/movt. Lifted adjustFixupValue() from Darwin for sharing w ELF. Test added TODO: refactor ELFObjectWriter::RecordRelocation more. Possibly share more code with Darwin? Lots more relocations... llvm-svn: 120534	2010-12-01 02:40:06 +00:00
Evan Cheng	f7e586d749	Enable sibling call optimization of libcalls which are expanded during legalization time. Since at legalization time there is no mapping from SDNode back to the corresponding LLVM instruction and the return SDNode is target specific, this requires a target hook to check for eligibility. Only x86 and ARM support this form of sibcall optimization right now. rdar://8707777 llvm-svn: 120501	2010-11-30 23:55:39 +00:00
Che-Liang Chiou	f594fe5fc5	ptx: add command-line options for gpu target and ptx version llvm-svn: 120423	2010-11-30 10:14:14 +00:00
Eric Christopher	990bcd83b8	Not all platforms use _<func>. Duh. llvm-svn: 120418	2010-11-30 09:23:54 +00:00
Eric Christopher	f27f0b5234	Rewrite mwait and monitor support and custom lower arguments. Fixes PR8573. llvm-svn: 120404	2010-11-30 07:20:12 +00:00
Bob Wilson	bd3d3d2937	Add support for NEON VLD3-dup instructions. The encoding for alignment in VLD4-dup instructions is still a work in progress. llvm-svn: 120356	2010-11-30 00:00:35 +00:00
Evan Cheng	78baa6f30d	Mark Darwin call instructions as using "r7" to prevent the frame-register assignment instructions from being moved below / above calls. rdar://8690640 llvm-svn: 120339	2010-11-29 22:43:27 +00:00
Benjamin Kramer	af5a57ca66	Add missing colon. llvm-svn: 120336	2010-11-29 22:39:38 +00:00
Benjamin Kramer	84bf47f2d8	Fix some broken CHECK lines. llvm-svn: 120332	2010-11-29 22:34:55 +00:00
Bob Wilson	aa197b07e6	Add support for NEON VLD3-dup instructions. llvm-svn: 120312	2010-11-29 19:35:29 +00:00
Kalle Raiskila	71dec6ff42	Handle lshr for i128 correctly on SPU also when shiftamount > 7. llvm-svn: 120288	2010-11-29 14:44:28 +00:00
Kalle Raiskila	45865e9165	Enable PostRA scheduling for SPU. This speeds up selected test cases with up to 5% - no slowdowns observed. llvm-svn: 120286	2010-11-29 10:30:25 +00:00
Bob Wilson	3bb61d1932	Add support for NEON VLD2-dup instructions. llvm-svn: 120236	2010-11-28 06:51:26 +00:00
Rafael Espindola	45cd9713f2	Lower TLS_addr32 and TLS_addr64. llvm-svn: 120225	2010-11-27 20:43:02 +00:00
Bob Wilson	cbd6281807	Add NEON VLD1-dup instructions (load 1 element to all lanes). llvm-svn: 120194	2010-11-27 06:35:16 +00:00
Kalle Raiskila	b017eaea7b	Allow for 'fcmp ogt' in SPU. Fix by Visa Putkinen! llvm-svn: 120090	2010-11-24 11:42:17 +00:00
Bob Wilson	ff0e6b1252	Recognize sign/zero-extended constant BUILD_VECTORs for VMULL operations. We need to check if the individual vector elements are sign/zero-extended values. For now this only handles constants values. Radar 8687140. llvm-svn: 120034	2010-11-23 19:38:38 +00:00
Kalle Raiskila	f71cc94c91	Division by pow-of-2 is not cheap on SPU, do it with shifts. llvm-svn: 120022	2010-11-23 13:27:59 +00:00
Chris Lattner	c0fcc364c0	filecheckize llvm-svn: 119987	2010-11-23 02:26:52 +00:00
Evan Cheng	e6d55cd247	Fix epilogue codegen to avoid leaving the stack pointer in an invalid state. Previously Thumb2 would restore sp from fp like this: mov sp, r7 sub, sp, #4 If an interrupt is taken after the 'mov' but before the 'sub', callee-saved registers might be clobbered by the interrupt handler. Instead, try restoring directly from sp: add sp, #4 Or, if necessary (with VLA, etc.) use a scratch register to compute sp and then restore it: sub.w r4, r7, #8 mov sp, r7 rdar://8465407 llvm-svn: 119977	2010-11-22 18:12:04 +00:00
Kalle Raiskila	8f1131e569	Fix a bug with extractelement on SPU. In the attached testcase, the element was never extracted (missing rotate). llvm-svn: 119973	2010-11-22 16:28:26 +00:00
Benjamin Kramer	632a91cba5	Implement the "if (X == 6 \|\| X == 4)" -> "if ((X\|2) == 6)" optimization. This currently only catches the most basic case, a two-case switch, but can be extended later. llvm-svn: 119964	2010-11-22 09:45:38 +00:00
Wesley Peck	911abf2bc0	Implement branch analysis in the MBlaze backend. llvm-svn: 119951	2010-11-21 21:53:36 +00:00
Andrew Trick	3166f72d7a	Removing the useless test that I added recently. It was meant as an example, but not complicated enough to merit another test. llvm-svn: 119898	2010-11-20 07:26:51 +00:00
Dale Johannesen	6399550f2f	Prefetch has a MemOperand now. FileCheckize a test. This finishes up 8460971. llvm-svn: 119848	2010-11-19 21:49:38 +00:00
Mon P Wang	4965983b22	Make isScalarToVector to return false if the node is a scalar. This will prevent DAGCombine from making an illegal transformation of bitcast of a scalar to a vector into a scalar_to_vector. llvm-svn: 119819	2010-11-19 19:08:12 +00:00
Tanya Lattner	9cac0edef3	Fix bug in DAGCombiner for ARM that was trying to do a ShiftCombine on illegal types (vector should be split first). Added test case. llvm-svn: 119749	2010-11-18 22:06:46 +00:00
Duncan Sands	a61bc1a41a	The DAGCombiner was threading select over pairs of extending loads even if the extension types were not the same. The result was that if you fed a select with sext and zext loads, as in the testcase, then it would get turned into a zext (or sext) of the select, which is wrong in the cases when it should have been an sext (resp. zext). Reported and diagnosed by Sebastien Deldon. llvm-svn: 119728	2010-11-18 20:05:18 +00:00
Eric Christopher	bc6a51d63f	Rewrite stack callee saved spills and restores to use push/pop instructions. Remove movePastCSLoadStoreOps and associated code for simple pointer increments. Update routines that depended upon other opcodes for save/restore. Adjust all testcases accordingly. llvm-svn: 119725	2010-11-18 19:40:05 +00:00
Rafael Espindola	93a07b464e	Change CodeGen to use .loc directives. This produces a lot more readable output and testing is easier. A good example is the unknown-location.ll test that now can just look for ".loc 1 0 0". We also don't use a DW_LNE_set_address for every address change anymore. llvm-svn: 119613	2010-11-18 02:04:25 +00:00
Dale Johannesen	06f479d543	Do not throw away alignment when generating the DAG for memset; we may need it to decide between MOVAPS and MOVUPS later. Adjust a test that was looking for wrong code. PR 3866 / 8675131. llvm-svn: 119605	2010-11-18 01:35:23 +00:00
John Thompson	8c52bd2004	Fixed to use input redirection for source - to eliminate .s output. llvm-svn: 119599	2010-11-18 00:50:20 +00:00
John Thompson	b33f935bc3	Bug 8621 fix - pointer cast stripped from inline asm constraint argument. llvm-svn: 119590	2010-11-17 23:58:47 +00:00
Dale Johannesen	a87210e350	These tests are looking for library function names that appear to differ on Linux. Try to make them pass on Linux. Would be good for a Linux person to review this. llvm-svn: 119572	2010-11-17 21:57:32 +00:00
Bob Wilson	a217a6e40b	Change ARMGlobalMerge to keep BSS globals in separate pools. This completes the fixes for Radar 8673120. llvm-svn: 119566	2010-11-17 21:25:39 +00:00
Bob Wilson	819660b716	Fix ARMGlobalMerge pass to check if globals are entirely within range. It is generally not sufficient to check if the starting offset is in range of the maximum offset that can be efficiently used for the target. llvm-svn: 119565	2010-11-17 21:25:36 +00:00
Bob Wilson	49bf8702f0	Change the symbol for merged globals from "merged" to "_MergedGlobals". This makes it more clear that the symbol is an internal, compiler-generated name and gives a little more description about its contents. llvm-svn: 119564	2010-11-17 21:25:33 +00:00
Bob Wilson	fd7399b6a6	Fix the ARMGlobalMerge pass to look at variable sizes instead of pointer sizes. It was mistakenly looking at the pointer type when checking for the size of global variables. This is a partial fix for Radar 8673120. llvm-svn: 119563	2010-11-17 21:25:27 +00:00
Evan Cheng	ce610bd6b3	Remove ARM isel hacks that fold large immediates into a pair of add, sub, and, and xor. The 32-bit move immediates can be hoisted out of loops by machine LICM but the isel hacks were preventing them. Instead, let peephole optimization pass recognize registers that are defined by immediates and the ARM target hook will fold the immediates in. Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ instructions if there are multiple uses. This happens when the 'and' is live out, machine sink would have sinked the computation and that ends up pessimizing code. The peephole pass would recognize situations where the 'and' can be toggled to define CPSR and eliminate the comparison anyway. 2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking important optimizations. rdar://8663787, rdar://8241368 llvm-svn: 119548	2010-11-17 20:13:28 +00:00
Che-Liang Chiou	f8ffd59ccb	Add simple arithmetics and %type directive for PTX llvm-svn: 119485	2010-11-17 08:08:49 +00:00
Jakob Stoklund Olesen	ea79975a68	Fix PR8612 in the standard spiller, take two. The live range of a register defined by an early clobber starts at the use slot, not the def slot. Except when it is an early clobber tied to a use operand. Then it starts at the def slot like a standard def. llvm-svn: 119305	2010-11-16 00:40:59 +00:00
Jakob Stoklund Olesen	465c07cc66	Revert "Fix PR8612 in the standard spiller as well." This reverts r119183 which borke the buildbots. llvm-svn: 119270	2010-11-15 21:51:51 +00:00
Eric Christopher	22b01f0d76	Recommit this change and remove the failing part of the test - it didn't pass in the first place and was masked by earlier failures not warning and aborting the block. llvm-svn: 119184	2010-11-15 21:11:06 +00:00
Jakob Stoklund Olesen	920856ab38	Fix PR8612 in the standard spiller as well. The live range of a register defined by an early clobber starts at the use slot, not the def slot. llvm-svn: 119183	2010-11-15 20:55:53 +00:00
Jakob Stoklund Olesen	86fba16eb1	When spilling a register defined by an early clobber, make sure that the new live ranges for the spill register are also defined at the use slot instead of the normal def slot. This fixes PR8612 for the inline spiller. A use was being allocated to the same register as a spilled early clobber def. This problem exists in all the spillers. A fix for the standard spiller is forthcoming. llvm-svn: 119182	2010-11-15 20:55:49 +00:00
Chris Lattner	3e135493e9	remove a pointless testcase. llvm-svn: 119119	2010-11-15 05:07:03 +00:00
Chris Lattner	7743db8f20	remove some extraneous quotes to make the new instprinter match. llvm-svn: 119104	2010-11-15 02:43:46 +00:00
Chris Lattner	8cb5c07514	add some nounwind's. llvm-svn: 119086	2010-11-14 22:22:14 +00:00
Peter Collingbourne	4ec6b2dbe7	Recognise 32-bit ror-based bswap implementation used by uclibc llvm-svn: 119007	2010-11-13 19:54:30 +00:00
Evan Cheng	a7d3c3d387	Add conditional move of large immediate. llvm-svn: 118968	2010-11-13 02:25:14 +00:00
Evan Cheng	a08372d4b4	Fix an obvious typo which inverted an immediate. llvm-svn: 118951	2010-11-13 00:27:47 +00:00
Eric Christopher	ba758d0df4	This should be still failing, but is. Disable it with the forget-me-stick for now. llvm-svn: 118950	2010-11-13 00:25:06 +00:00
Evan Cheng	19f018a1be	Add conditional mvn instructions. llvm-svn: 118935	2010-11-12 22:42:47 +00:00
Evan Cheng	b565d1acf9	Add some missing isel predicates on def : pat patterns to avoid generating VFP vmla / vmls (they cause stalls). Disabling them in isel is properly not a right solution, I'll look into a proper solution next. llvm-svn: 118922	2010-11-12 20:32:20 +00:00
Andrew Trick	a2f5bd63ff	Emacs auto-fill bug. llvm-svn: 118908	2010-11-12 18:17:46 +00:00
Andrew Trick	e5e40ec069	Test case for PR8287: SD scheduling time. Fixed in r118904. llvm-svn: 118906	2010-11-12 17:57:22 +00:00
Kalle Raiskila	f89d0d0389	Fix memory access lowering on SPU, adding support for the case where alignment<value size. These cases were silently miscompiled before this patch. Now they are overly verbose -especially storing is- and any front-end should still avoid misaligned memory accesses as much as possible. The bit juggling algorithm added here probably has some room for improvement still. llvm-svn: 118889	2010-11-12 10:14:03 +00:00
Bruno Cardoso Lopes	37f2439cbd	Enable mips32 mul instruction. Patch by Akira Hatanaka <ahatanaka@mips.com> llvm-svn: 118864	2010-11-12 00:38:32 +00:00
Dan Gohman	8e986f9c2f	Remove the memmove->memcpy optimization from CodeGen. MemCpyOpt does this. llvm-svn: 118789	2010-11-11 16:24:49 +00:00
Bruno Cardoso Lopes	c911863e01	Add a test to the previous added clo instruction. Patch by Akira again llvm-svn: 118668	2010-11-10 02:22:44 +00:00

1 2 3 4 5 ...

4025 Commits