llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 13:11:39 +01:00

Author	SHA1	Message	Date
Chris Lattner	95aad4d625	fix a crash on a pointless but valid zero-length memset, rdar://6808691 llvm-svn: 69680	2009-04-21 16:52:12 +00:00
Evan Cheng	c248188b46	Added a linearscan register allocation optimization. When the register allocator spills an interval with multiple uses in the same basic block, it creates a different virtual register for each of the reloads. e.g. %reg1498<def> = MOV32rm %reg1024, 1, %reg0, 12, %reg0, Mem:LD(4,4) [sunkaddr39 + 0] %reg1506<def> = MOV32rm %reg1024, 1, %reg0, 8, %reg0, Mem:LD(4,4) [sunkaddr42 + 0] %reg1486<def> = MOV32rr %reg1506 %reg1486<def> = XOR32rr %reg1486, %reg1498, %EFLAGS<imp-def,dead> %reg1510<def> = MOV32rm %reg1024, 1, %reg0, 4, %reg0, Mem:LD(4,4) [sunkaddr45 + 0] => %reg1498<def> = MOV32rm %reg2036, 1, %reg0, 12, %reg0, Mem:LD(4,4) [sunkaddr39 + 0] %reg1506<def> = MOV32rm %reg2037, 1, %reg0, 8, %reg0, Mem:LD(4,4) [sunkaddr42 + 0] %reg1486<def> = MOV32rr %reg1506 %reg1486<def> = XOR32rr %reg1486, %reg1498, %EFLAGS<imp-def,dead> %reg1510<def> = MOV32rm %reg2038, 1, %reg0, 4, %reg0, Mem:LD(4,4) [sunkaddr45 + 0] From linearscan's point of view, each of reg2036, 2037, and 2038 are separate registers, each is "killed" after a single use. The reloaded register is available and it's often clobbered right away. e.g. In this case reg1498 is allocated EAX while reg2036 is allocated RAX. This means we end up with multiple reloads from the same stack slot in the same basic block. Now linearscan recognizes there are other reloads from same SS in the same BB. So it'll "downgrade" RAX (and its aliases) after reg2036 is allocated until the next reload (reg2037) is done. This greatly increases the likelihood reloads from SS are reused. This speeds up sha1 from OpenSSL by 5.8%. It is also an across the board win for SPEC2000 and 2006. llvm-svn: 69585	2009-04-20 08:01:12 +00:00
Chris Lattner	13a0dd0288	testcase for PR3898 llvm-svn: 69473	2009-04-18 20:49:22 +00:00
Duncan Sands	d2ba02aa87	Don't try to make BUILD_VECTOR operands have the same type as the vector element type: allow them to be of a wider integer type than the element type all the way through the system, and not just as far as LegalizeDAG. This should be safe because it used to be this way (the old type legalizer would produce such nodes), so backends should be able to handle it. In fact only targets which have legal vector types with an illegal promoted element type will ever see this (eg: <4 x i16> on ppc). This fixes a regression with the new type legalizer (vec_splat.ll). Also, treat SCALAR_TO_VECTOR the same as BUILD_VECTOR. After all, it is just a special case of BUILD_VECTOR. llvm-svn: 69467	2009-04-18 20:16:54 +00:00
Dale Johannesen	8a4446429e	Adjust XFAIL syntax, maybe that will help. The other way worked for me... llvm-svn: 69414	2009-04-18 02:01:23 +00:00
Dale Johannesen	05d46aca49	patch 69408 breaks this by removing the opportunity for the optimization it's testing to kick in (although it improves the code, getting rid of all spills). I don't understand the optimization well enough to rescue the test, so XFAILing. llvm-svn: 69409	2009-04-18 00:11:50 +00:00
Bob Wilson	b3e4773035	Rename file to have the correct suffix. llvm-svn: 69380	2009-04-17 20:40:20 +00:00
Bob Wilson	b8756b00cd	Use CallConvLower.h and TableGen descriptions of the calling conventions for ARM. Patch by Sandeep Patel. llvm-svn: 69371	2009-04-17 19:07:39 +00:00
Rafael Espindola	d74132e2c5	For general dynamic TLS access we must use leaq foo@TLSGD(%rip), %rdi as part of the instruction sequence. Using a register other than %rdi and then copying it to %rdi is not valid. llvm-svn: 69350	2009-04-17 14:35:58 +00:00
Evan Cheng	2d5be54315	Teach spiller to unfold instructions which modref spill slot when a scratch register is available and when it's profitable. e.g. xorq %r12<kill>, %r13 addq %rax, -184(%rbp) addq %r13, -184(%rbp) ==> xorq %r12<kill>, %r13 movq -184(%rbp), %r12 addq %rax, %r12 addq %r13, %r12 movq %r12, -184(%rbp) Two more instructions, but fewer memory accesses. It can also open up opportunities for more optimizations. llvm-svn: 69341	2009-04-17 01:29:40 +00:00
Rafael Espindola	a07d1c3103	fix PR3995. A scale must be 1, 2, 4 or 8. llvm-svn: 69284	2009-04-16 12:34:53 +00:00
Dan Gohman	98aa1d9693	Expand GEPs in ScalarEvolution expressions. SCEV expressions can now have pointer types, though in contrast to C pointer types, SCEV addition is never implicitly scaled. This not only eliminates the need for special code like IndVars' EliminatePointerRecurrence and LSR's own GEP expansion code, it also does a better job because it lets the normal optimizations handle pointer expressions just like integer expressions. Also, since LLVM IR GEPs can't directly index into multi-dimensional VLAs, moving the GEP analysis out of client code and into the SCEV framework makes it easier for clients to handle multi-dimensional VLAs the same way as other arrays. Some existing regression tests show improved optimization. test/CodeGen/ARM/2007-03-13-InstrSched.ll in particular improved to the point where if-conversion started kicking in; I turned it off for this test to preserve the intent of the test. llvm-svn: 69258	2009-04-16 03:18:22 +00:00
Dale Johannesen	040d118b17	Another testcase for IV shortening. llvm-svn: 69247	2009-04-16 00:45:21 +00:00
Bill Wendling	4153589196	Check for alignment. llvm-svn: 69140	2009-04-15 04:51:05 +00:00
Dale Johannesen	427e9aade9	Enhance induction variable code to remove the sext around sext(shorter IV + constant), using a longer IV instead, when it can figure out the add can't overflow. This comes up a lot in subscripting; mainly affects 64 bit. llvm-svn: 69123	2009-04-15 01:10:12 +00:00
Devang Patel	7323064183	While inlining, clone llvm.dbg.func.start intrinsic and adjust llvm.dbg.region.end intrinsic. This nested llvm.dbg.func.start/llvm.dbg.region.end pair now enables DW_TAG_inlined_subroutine support in code generator. llvm-svn: 69118	2009-04-15 00:17:06 +00:00
Bill Wendling	0861f3e874	Testcase for r69104. llvm-svn: 69110	2009-04-15 00:04:11 +00:00
Evan Cheng	dba98a0669	Optimize conditional branch on i1 phis with non-constant inputs. This turns: eq: %3 = icmp eq i32 %1, %2 br label %join ne: %4 = icmp ne i32 %1, %2 br label %join join: %5 = phi i1 [%3, %eq], [%4, %ne] br i1 %5, label %yes, label %no => eq: %3 = icmp eq i32 %1, %2 br i1 %3, label %yes, label %no ne: %4 = icmp ne i32 %1, %2 br i1 %4, label %yes, label %no llvm-svn: 69102	2009-04-14 23:40:03 +00:00
Dan Gohman	e1c4d4c5be	Fix the RUN lines so that this test actually tests. llvm-svn: 69096	2009-04-14 22:50:17 +00:00
Dan Gohman	365c457893	For the h-register addressing-mode trick, use the correct value for any non-address uses of the address value. This fixes 186.crafty. llvm-svn: 69094	2009-04-14 22:45:05 +00:00
Dan Gohman	3c19cf07d9	When the result of an EXTRACT_SUBREG, INSERT_SUBREG, or SUBREG_TO_REG operator is used by a CopyToReg to export the value to a different block, don't reuse the CopyToReg's register for the subreg operation result if the register isn't precisely the right class for the subreg operation. Also, rename the h-registers.ll test, now that there are more than one. llvm-svn: 69087	2009-04-14 22:17:14 +00:00
Evan Cheng	b64f2c1b08	Some of GR8_NOREX registers are only available in 64-bit mode. llvm-svn: 69049	2009-04-14 16:57:43 +00:00
Dale Johannesen	862ade6f10	Use the output of the asm so the optimizer won't delete it. llvm-svn: 69018	2009-04-14 01:51:40 +00:00
Evan Cheng	9f44d3148c	Fix PR3934 part 2. findOnlyInterestingUse() was not setting IsCopy and IsDstPhys which are returned by value and used by callee. This happened to work on the earlier test cases because of a logic error in the caller side. llvm-svn: 69006	2009-04-14 00:32:25 +00:00
Evan Cheng	fa48d5c8d0	PR3934: Fix a bogus two-address pass assertion. llvm-svn: 68979	2009-04-13 20:04:24 +00:00
Dan Gohman	be7227005f	Implement x86 h-register extract support. - Add patterns for h-register extract, which avoids a shift and mask, and in some cases a temporary register. - Add address-mode matching for turning (X>>(8-n))&(255<<n), where n is a valid address-mode scale value, into an h-register extract and a scaled-offset address. - Replace X86's MOV32to32_ and related instructions with the new target-independent COPY_TO_SUBREG instruction. On x86-64 there are complicated constraints on h registers, and CodeGen doesn't currently provide a high-level way to express all of them, so they are handled with a bunch of special code. This code currently only supports extracts where the result is used by a zero-extend or a store, though these are fairly common. These transformations are not always beneficial; since there are only 4 h registers, they sometimes require extra move instructions, and this sometimes increases register pressure because it can force out values that would otherwise be in one of those registers. However, this appears to be relatively uncommon. llvm-svn: 68962	2009-04-13 16:09:41 +00:00
Rafael Espindola	72347bffce	X86-64 TLS support for local exec and initial exec. llvm-svn: 68947	2009-04-13 13:02:49 +00:00
Chris Lattner	c1bfdc9bb2	Add a new "available_externally" linkage type. This is intended to support C99 inline, GNU extern inline, etc. Related bugzilla's include PR3517, PR3100, & PR2933. Nothing uses this yet, but it appears to work. llvm-svn: 68940	2009-04-13 05:44:34 +00:00
Rafael Espindola	ad8137187c	In X86DAGToDAGISel::MatchWrapper, if base or index are set, avoid matching only if symbolic addresses are RIP relatives. llvm-svn: 68924	2009-04-12 23:00:38 +00:00
Rafael Espindola	412b15f4ed	Add tests for the parts of X86-64 TLS that are already implemented. llvm-svn: 68901	2009-04-12 10:43:41 +00:00
Chris Lattner	6d6cf3ff4a	fix a cross-block fastisel crash handling overflow intrinsics. See comment for details. This fixes rdar://6772169 llvm-svn: 68890	2009-04-12 07:51:14 +00:00
Chris Lattner	f03202e76d	add some optimizations for strncpy/strncat and factor some code. Patch by Benjamin Kramer! llvm-svn: 68885	2009-04-12 05:06:39 +00:00
Chris Lattner	42b8e431b6	move a target-specific test into its directory so it isn't run if you don't configure the ARM target in. llvm-svn: 68843	2009-04-10 23:58:38 +00:00
Chris Lattner	0577b8e2ef	fix two problems with machine sinking: 1. Sinking would crash when the first instruction of a block was sunk due to iterator problems. 2. Instructions could be sunk to their current block, causing an infinite loop. This fixes PR3968 llvm-svn: 68787	2009-04-10 16:38:36 +00:00
Rafael Espindola	88986ef511	Don't fold a load if the other operand is a TLS address. With this we generate movl %gs:0, %eax leal i@NTPOFF(%eax), %eax instead of movl $i@NTPOFF, %eax addl %gs:0, %eax llvm-svn: 68778	2009-04-10 10:09:34 +00:00
Bob Wilson	c53238dff1	Fix pr3954. The register scavenger asserts for inline assembly with register destinations that are tied to source operands. The TargetInstrDescr::findTiedToSrcOperand method silently fails for inline assembly. The existing MachineInstr::isRegReDefinedByTwoAddr was very close to doing what is needed, so this revision makes a few changes to that method and also renames it to isRegTiedToUseOperand (for consistency with the very similar isRegTiedToDefOperand and because it handles both two-address instructions and inline assembly with tied registers). llvm-svn: 68714	2009-04-09 17:16:43 +00:00
Chris Lattner	301c4f39a0	reg0 references are not real registers. This fixes a crash on the attached testcase. llvm-svn: 68712	2009-04-09 16:50:43 +00:00
Dan Gohman	68de98eef3	Generalize ExtendUsesToFormExtLoad to be usable for ANY_EXTEND, in addition to ZERO_EXTEND and SIGN_EXTEND. Fix a bug in the way it checked for live-out values, and simplify the way it finds users by using SDNode::use_iterator's (relatively) new features. Also, make it slightly more permissive on targets with free truncates. In SelectionDAGBuild, avoid creating ANY_EXTEND nodes that are larger than necessary. If the target's SwitchAmountTy has enough bits, use it. This exposes the truncate to optimization early, enabling more optimizations. llvm-svn: 68670	2009-04-09 03:51:29 +00:00
Rafael Espindola	7eb72dc5f2	Re-apply 68552. Tested by bootstrapping llvm-gcc and using that to build llvm. llvm-svn: 68645	2009-04-08 21:14:34 +00:00
Bob Wilson	e0e4a070da	Add testcase for PR3795. llvm-svn: 68620	2009-04-08 18:00:55 +00:00
Duncan Sands	d0e186d90f	Soft float support for FREM. llvm-svn: 68614	2009-04-08 16:20:57 +00:00
Duncan Sands	ee34b0d05d	Soft float support for undef. Reported by Xerxes Rånby. llvm-svn: 68607	2009-04-08 13:33:37 +00:00
Chris Lattner	7d75f78b92	Instcombine should not promote whole computation trees to "strange" integer types, unless they are already strange. This prevents it from turning the code produced by SROA into crazy libcalls and stuff that the code generator can't handle. In the attached example, the result was an i96 multiply that caused the x86 backend to assert. Note that if TargetData had an idea of what the legal types are for a target that this could be used to stop instcombine from introducing i64 muls, as Scott wanted. llvm-svn: 68598	2009-04-08 05:41:03 +00:00
Dan Gohman	94fde57da3	Fully escape the grep string for this test. llvm-svn: 68580	2009-04-08 00:54:40 +00:00
Dan Gohman	b979f332fd	Update this test for recent codegen improvements. CodeGen is now using an lea in place of a mov and an add for this test. llvm-svn: 68579	2009-04-08 00:51:11 +00:00
Dan Gohman	c9ce27d6b7	Implement support for modeling implicit-zero-extension on x86-64 with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG instructions), and teach the DAGCombiner to take advantage of this on targets which support it. This eliminates many redundant zero-extension operations on x86-64. This adds a new TargetLowering hook, isZExtFree. It's similar to isTruncateFree, except it only applies to actual definitions, and not no-op truncates which may not zero the high bits. Also, this adds a new optimization to SimplifyDemandedBits: transform operations like x+y into (zext (add (trunc x), (trunc y))) on targets where all the casts are no-ops. In contexts where the high part of the add is explicitly masked off, this allows the mask operation to be eliminated. Fix the DAGCombiner to avoid undoing these transformations to eliminate casts on targets where the casts are no-ops. Also, this adds a new two-address lowering heuristic. Since two-address lowering runs before coalescing, it helps to be able to look through copies when deciding whether commuting and/or three-address conversion are profitable. Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle the case that a clobber range extended both before and beyond an existing live range. In that case, multiple live ranges need to be added. This was exposed by the new subreg coalescing code. Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the spiller behavior it was looking for no longer occurs with the new instruction selection. llvm-svn: 68576	2009-04-08 00:15:30 +00:00
Bill Wendling	6e702cf68c	Temporarily revert r68552. This was causing a failure in the self-hosting LLVM builds. --- Reverse-merging (from foreign repository) r68552 into '.': U test/CodeGen/X86/tls8.ll U test/CodeGen/X86/tls10.ll U test/CodeGen/X86/tls2.ll U test/CodeGen/X86/tls6.ll U lib/Target/X86/X86Instr64bit.td U lib/Target/X86/X86InstrSSE.td U lib/Target/X86/X86InstrInfo.td U lib/Target/X86/X86RegisterInfo.cpp U lib/Target/X86/X86ISelLowering.cpp U lib/Target/X86/X86CodeEmitter.cpp U lib/Target/X86/X86FastISel.cpp U lib/Target/X86/X86InstrInfo.h U lib/Target/X86/X86ISelDAGToDAG.cpp U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h U lib/Target/X86/X86ISelLowering.h U lib/Target/X86/X86InstrInfo.cpp U lib/Target/X86/X86InstrBuilder.h U lib/Target/X86/X86RegisterInfo.td llvm-svn: 68560	2009-04-07 22:35:25 +00:00
Rafael Espindola	0324937229	Reduce code duplication on the TLS implementation. This introduces a small regression on the generated code quality in the case we are just computing addresses, not loading values. Will work on it and on X86-64 support. llvm-svn: 68552	2009-04-07 21:37:46 +00:00
Dan Gohman	e98c3b1ea1	Don't attempt to handle aggregate argument values in FastISel; let SelectionDAG do those. This fixes PR3955. llvm-svn: 68546	2009-04-07 20:40:11 +00:00
Chris Lattner	2f520929d4	fix rdar://6762290, a crash compiling cxx filt with clang. llvm-svn: 68500	2009-04-07 05:03:34 +00:00


6870 Commits