llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00

Author	SHA1	Message	Date
Bill Wendling	f79eccc675	Match this pattern so that we can generate simpler code: %a = ...; %b = and i32 %a, 2; %c = srl i32 %b, 1; %d = br i32 %c, into %a = ...; %b = and %a, 2; %c = X86ISD::CMP %b, 0; %d = X86ISD::BRCOND %c ... This applies only when the AND constant value has exactly one bit set and the SRL constant is equal to the log2 of the AND constant. The back-end is smart enough to convert the result into a TEST/JMP sequence. llvm-svn: 67728	2009-03-26 01:47:50 +00:00
Bill Wendling	40ac545f38	Doxygen-ify comments. llvm-svn: 67727	2009-03-26 01:46:56 +00:00
Evan Cheng	3a7489a4cc	CodeGen still defaults to non-verbose asm, but llc now overrides it and defaults to verbose. llvm-svn: 67668	2009-03-25 01:47:28 +00:00
Evan Cheng	6ff8cea903	Don't print global names twice with -asm-verbose. llvm-svn: 67667	2009-03-25 01:08:42 +00:00
Dan Gohman	7a9e8cbf79	I was convinced that it's ok to allow a second i8 return value to be returned in DL. LLVM's multiple-return-value support is not ABI-conforming; front-ends that wish to have code emitted that conforms to an ABI are currently expected to make arrangements for this on their own rather than assuming that multiple-return-values will automatically do the right thing. This commit doesn't fundamentally change this situation. llvm-svn: 67588	2009-03-24 01:04:34 +00:00
Evan Cheng	b3196f1298	Do not emit comments unless -asm-verbose. llvm-svn: 67580	2009-03-24 00:17:40 +00:00
Dan Gohman	e9cf3083d2	Correct some comments. Operand numbers start at 0. llvm-svn: 67518	2009-03-23 15:40:10 +00:00
Evan Cheng	2ec94dd447	Model the inline asm constraint that ties an input to an output register as a machine operand TIED_TO constraint. This eliminates the need to pre-allocate registers for these operands. It also allows the register allocator to eliminate the unneeded copies. llvm-svn: 67512	2009-03-23 08:01:15 +00:00
Dan Gohman	745c0acc79	Fix a grammaro in a comment that Bill noticed. llvm-svn: 67507	2009-03-23 05:02:44 +00:00
Dan Gohman	16b4a33039	Add comments explaining why there's only one register for i8 return values. llvm-svn: 67502	2009-03-23 04:28:24 +00:00
Nick Lewycky	a0dcd7e173	Remove strange extra semicolons. llvm-svn: 67287	2009-03-19 05:51:39 +00:00
Chris Lattner	205380a4e4	Disable the "call to immediate" optimization on x86-64. It is not safe in general because the immediate could be an arbitrary value that does not fit in a 32-bit pcrel displacement. Conservatively fall back to loading the value into a register and calling through it. We still do the optzn on X86-32. llvm-svn: 67142	2009-03-18 00:43:52 +00:00
Dan Gohman	f6c57d0fe7	Recognize bswapl as bswap too. llvm-svn: 67072	2009-03-17 02:45:40 +00:00
Dan Gohman	4efda2b52b	Recognize "bswapq" as an alternate spelling for the bswap instruction. llvm-svn: 67071	2009-03-17 02:17:27 +00:00
Dan Gohman	fd6debff99	Use %rip-relative addressing on x86-64 whenever practical, as it has a smaller encoding than absolute addressing. llvm-svn: 67002	2009-03-14 02:33:41 +00:00
Dan Gohman	e7495ef7aa	Don't forgo folding of loads into 64-bit adds when the other operand is a signed 32-bit immediate. Unlike the 8-bit signed immediate case, it isn't actually smaller to fold a 32-bit signed immediate instead of a load. In fact, it's larger in the case of 32-bit unsigned immediates, because they can be materialized with movl instead of movq. llvm-svn: 67001	2009-03-14 02:07:16 +00:00
Dan Gohman	fa0a3504ba	Improve FastISel's handling of truncates to i1, and implement ptrtoint and inttoptr in X86FastISel. These casts aren't always handled in the generic FastISel code because X86 sometimes needs custom code to do truncation and zero-extension. llvm-svn: 66988	2009-03-13 23:53:06 +00:00
Dan Gohman	790659c0d6	Fix FastISel's assumption that i1 values are always zero-extended by inserting explicit zero extensions where necessary. Included is a testcase where SelectionDAG produces a virtual register holding an i1 value which FastISel previously mistakenly assumed to be zero-extended. llvm-svn: 66941	2009-03-13 20:42:20 +00:00
Rafael Espindola	aadb9af093	add 8- and 16-bit TLS moves. add a FIXME note on how to remove code duplication. llvm-svn: 66932	2009-03-13 19:39:55 +00:00
Rafael Espindola	ff17d02271	Improve sext and zext of TLS variables. llvm-svn: 66922	2009-03-13 18:37:06 +00:00
Chris Lattner	63569fa327	generalize this code so that fast isel handles integer truncates to i1, which codegen to the same thing as integer truncates to i8 (the top bits are just undefined). This implements rdar://6667338 llvm-svn: 66902	2009-03-13 16:36:42 +00:00
Bill Wendling	2fe64f48aa	These instructions have special lowering that may lower them to SSE instructions. Prevent that if we don't want implicit uses of SSE. llvm-svn: 66877	2009-03-13 08:41:47 +00:00
Evan Cheng	f9951d1557	Fix some significant problems with constant pools that resulted in unnecessary padding between constant pool entries, larger-than-necessary alignments (e.g. 8-byte alignment for .literal4 sections), and potentially other issues. 1. The ConstantPoolSDNode alignment field is the log2 value of the alignment requirement. This is not consistent with other SDNode variants. 2. The MachineConstantPool alignment field is also a log2 value. 3. However, some places create ConstantPoolSDNodes with the alignment value rather than its log2 value. This creates entries with artificially large alignments, e.g. 256 for SSE vector values. 4. Constant pool entry offsets are computed when the entries are created. However, the asm printer groups them by section, so the offsets are no longer valid, yet the asm printer still uses them to determine the size of the padding between entries. 5. The asm printer uses an expensive data structure (a multimap) to track constant pool entries by section. 6. The asm printer iterates over a SmallPtrSet when emitting constant pool entries, which is non-deterministic. Solutions: 1. The ConstantPoolSDNode alignment field now keeps the non-log2 value. 2. The MachineConstantPool alignment field also keeps the non-log2 value. 3. Functions that create ConstantPool nodes pass in non-log2 alignments. 4. MachineConstantPoolEntry no longer keeps an offset field; it is replaced with an alignment field. Offsets are not computed when constant pool entries are created; they are computed on the fly in the asm printer and the JIT. 5. The asm printer uses a cheaper data structure to group constant pool entries. 6. The asm printer computes entry offsets after grouping is done. 7. The JIT code computes entry offsets on the fly. llvm-svn: 66875	2009-03-13 07:51:59 +00:00
Chris Lattner	cbbdd230dd	generalize the previous code to use the full generality of LEA for i32/i64 expressions (we could also do i16 on cpus where i16 lea is fast, but I didn't add this). For the example, we now generate: _test: movl 4(%esp), %eax; cmpl $42, (%eax); setl %al; movzbl %al, %eax; leal 4(%eax,%eax,8), %eax; ret instead of: _test: movl 4(%esp), %eax; cmpl $41, (%eax); movl $4, %ecx; movl $13, %eax; cmovg %ecx, %eax; ret. llvm-svn: 66869	2009-03-13 05:53:31 +00:00
Chris Lattner	878d951f8f	optimize the case of cond ? 42 : 41 and friends. This compiles the example to: _test: movl 4(%esp), %eax; cmpl $41, (%eax); setg %al; movzbl %al, %eax; orl $4294967294, %eax; ret instead of: movl 4(%esp), %eax; cmpl $41, (%eax); movl $4294967294, %ecx; movl $4294967295, %eax; cmova %ecx, %eax; ret. The new sequence is smaller in code size and faster. rdar://6668608 llvm-svn: 66868	2009-03-13 05:22:11 +00:00
Dan Gohman	37d843c129	Enhance address-mode folding of ISD::ADD to handle cases where the operands can't both be fully folded at the same time. For example, in the included testcase, a global variable is being added with an add of two values. The global variable wants RIP-relative addressing, so it can't share the address with another base register, but it's still possible to fold the initial add. llvm-svn: 66865	2009-03-13 02:25:09 +00:00
Evan Cheng	d112c41d95	Re-apply 66024 with fixes: 1. Fixed indirect call to immediate address assembly. 2. Fixed JIT encoding by making the address pc-relative. llvm-svn: 66803	2009-03-12 18:15:39 +00:00
Chris Lattner	26a971c4ec	Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add, x, c))" related transformations out of target-specific dag combine into the ARM backend. These were added by Evan in r37685 with no testcases and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll). Add some simple X86-specific (for now) DAG combines that turn things like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently with the recently added cp constant select optimization, but is a very general xform. For example, we now compile the second example in const-select.ll to: _test: movsd LCPI2_0, %xmm0; ucomisd 8(%esp), %xmm0; seta %al; movzbl %al, %eax; movl 4(%esp), %ecx; movsbl (%ecx,%eax,4), %eax; ret instead of: _test: movl 4(%esp), %eax; leal 4(%eax), %ecx; movsd LCPI2_0, %xmm0; ucomisd 8(%esp), %xmm0; cmovbe %eax, %ecx; movsbl (%ecx), %eax; ret. This passes multisource and dejagnu. llvm-svn: 66779	2009-03-12 06:52:53 +00:00
Chris Lattner	b904fac1b8	improve comment. llvm-svn: 66778	2009-03-12 06:46:02 +00:00
Evan Cheng	46e903d2f6	On x86, if the only use of an i64 load is an i64 store, generate a pair of double load and store instead. llvm-svn: 66776	2009-03-12 05:59:15 +00:00
Dan Gohman	d30e108f0e	Revert r66024. The JIT encoding for CALLpcrel32 is wrong -- see PR3773, and the assembly text output uses an indirect call ("call *") instead of a direct call. llvm-svn: 66735	2009-03-11 23:01:47 +00:00
Rafael Espindola	a8fe373200	optimize i8 and i16 tls values. llvm-svn: 66725	2009-03-11 22:40:04 +00:00
Bill Wendling	fca05e3a5c	Add a -no-implicit-float flag. This acts like -soft-float, but may generate floating point instructions that are explicitly specified by the user. llvm-svn: 66719	2009-03-11 22:30:01 +00:00
Duncan Sands	b27c523449	It makes no sense to have an ODR version of common linkage, so remove it. llvm-svn: 66690	2009-03-11 20:14:15 +00:00
Mon P Wang	287e422039	For yonah, fix a vector shuffle case for v16i8 where we didn't properly clear some bits. llvm-svn: 66684	2009-03-11 18:47:57 +00:00
Mon P Wang	2867737ad2	Fixed a v8i16 shuffle case that should generate a pshufb instead of a pshuflw/hw. llvm-svn: 66645	2009-03-11 06:35:11 +00:00
Chris Lattner	eb9327f335	formatting change, reduce indentation. No functionality change. llvm-svn: 66642	2009-03-11 05:48:52 +00:00
Dan Gohman	e15d8f03c3	Add more information to the EFLAGS note. llvm-svn: 66515	2009-03-10 00:26:23 +00:00
Dan Gohman	995c3dd344	Add a note about EFLAGS optimization. llvm-svn: 66508	2009-03-09 23:47:02 +00:00
Chris Lattner	b89dbcd448	do not export all the X86FastISel symbols, ever. llvm-svn: 66382	2009-03-08 18:44:31 +00:00
Chris Lattner	3342ba06d4	add a note. llvm-svn: 66360	2009-03-08 03:04:26 +00:00
Chris Lattner	8ace06fdda	add a note. llvm-svn: 66359	2009-03-08 01:54:43 +00:00
Duncan Sands	5ab54d488f	Introduce new linkage types linkonce_odr, weak_odr, common_odr and extern_weak_odr. These are the same as the non-odr versions, except that they indicate that the global will only be overridden by an equivalent global. In C, a function with weak linkage can be overridden by a function which behaves completely differently. This means that IP passes have to skip weak functions: any deductions made from the function definition might be wrong, because the definition could be replaced by something completely different at link time. This is not allowed in C++, thanks to the ODR (One-Definition-Rule): if a function is replaced by another at link-time, then the new function must be the same as the original function. If a language knows that a function or other global can only be overridden by an equivalent global, it can give it the weak_odr linkage type, and the optimizers will understand that it is all right to make deductions based on the function body. The code generators, on the other hand, map weak and weak_odr linkage to the same thing. llvm-svn: 66339	2009-03-07 15:45:40 +00:00
Dan Gohman	b9c32f1aca	Arithmetic instructions don't set the EFLAGS bits OF and CF the same way the "test" instruction does in overflow cases, so eliminating the test is only safe when those bits aren't needed, as is the case for COND_E and COND_NE, or if it can be proven that no overflow will occur. For now, just restrict the optimization to COND_E and COND_NE and don't do any overflow analysis. llvm-svn: 66318	2009-03-07 01:58:32 +00:00
Dan Gohman	f9599e6c5f	Don't use plain INC32 and DEC32 on x86-64; it needs INC64_32r and INC64_16r, because these instructions are encoded differently on x86-64. This fixes JIT regressions on x86-64 in kimwitu++ and others. llvm-svn: 66207	2009-03-05 21:32:23 +00:00
Dan Gohman	1e9db7c1a1	When creating X86ISD::INC and X86ISD::DEC nodes, only add one operand. The extra operand didn't appear to cause any trouble, but it was erroneous regardless. llvm-svn: 66206	2009-03-05 21:29:28 +00:00
Dan Gohman	f6f684b206	Fix the "test" optimization to recognize "dec" as an add of negative one, as subtracts of immediates are canonicalized to adds. llvm-svn: 66180	2009-03-05 19:32:48 +00:00
Dan Gohman	31fb085c2e	Re-apply 66008, now that the unfoldMemoryOperand bug is fixed. llvm-svn: 66058	2009-03-04 19:44:21 +00:00
Dan Gohman	f41e54c5af	Correct this comment. llvm-svn: 66057	2009-03-04 19:24:25 +00:00
Dan Gohman	04453ca36c	When using MachineInstr operand indices on SDNodes, the number of MachineInstr def operands must be subtracted out. This bug was uncovered by the recent x86 EFLAGS optimization. Before that, the only instructions that ever needed unfolding were things like CMP32rm, where NumDefs is zero. llvm-svn: 66056	2009-03-04 19:23:38 +00:00
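The first commit in this list (f79eccc675) relies on a bit-level identity: if the AND mask is a single bit 1 << k and the SRL amount is k, then (a & mask) >> k is nonzero exactly when a & mask is nonzero, so the branch can compare the masked value with zero instead. The Python sketch below checks that identity on sample 32-bit values and shows why both conditions in the commit message matter. It illustrates the reasoning only and is not LLVM's implementation; the helper names (branch_via_srl, branch_via_test, rewrite_is_safe, forms_agree) are hypothetical.

```python
def branch_via_srl(a: int, mask: int, shift: int) -> bool:
    """Original pattern: branch on ((a & mask) >> shift) != 0."""
    return ((a & mask) >> shift) != 0

def branch_via_test(a: int, mask: int) -> bool:
    """Rewritten pattern: compare (a & mask) with 0, i.e. TEST + conditional jump."""
    return (a & mask) != 0

def rewrite_is_safe(mask: int, shift: int) -> bool:
    """Hypothetical guard for the commit's condition: the mask has exactly
    one bit set and the shift amount equals log2(mask)."""
    one_bit = mask > 0 and (mask & (mask - 1)) == 0
    return one_bit and shift == mask.bit_length() - 1

def forms_agree(mask: int, shift: int, samples) -> bool:
    """True if both forms take the same branch for every sample value."""
    return all(branch_via_srl(a, mask, shift) == branch_via_test(a, mask)
               for a in samples)

# Sample 32-bit inputs: each 16-bit pattern copied into both halves.
SAMPLES = [x * 0x10001 for x in range(1 << 16)]

# A single-bit mask with a matching shift: both forms always agree.
for k in (0, 1, 7, 16, 31):
    assert rewrite_is_safe(1 << k, k)
    assert forms_agree(1 << k, k, SAMPLES)
```

Dropping either condition breaks the equivalence: forms_agree(3, 1, SAMPLES) is False because the mask has two bits set (a = 0x10001 branches differently), and forms_agree(2, 2, SAMPLES) is False because the shift is not log2 of the mask.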

4128 Commits
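Several commits in the list above (878d951f8f, cbbdd230dd, 26a971c4ec) replace a select between two constants with branch-free arithmetic on the zero-extended condition. In 32-bit arithmetic, c ? T : F equals F + zext(c) * (T - F). That becomes an add (or an OR, when F is even) when T - F is 1, a shift when F is 0 and T is a power of two (cond ? 8 : 0 -> zext(cond) << 3), and a single LEA for other small multipliers (leal 4(%eax,%eax,8) computes 4 + 9 * zext(cond)). The Python sketch below checks these identities on the constants from the commit messages. It models the arithmetic only and is not LLVM's DAG-combine code; the function names, including lea_multiplier_ok, are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers

def select(c: bool, t: int, f: int) -> int:
    """Reference semantics: c ? t : f, truncated to 32 bits."""
    return (t if c else f) & MASK32

def select_as_arith(c: bool, t: int, f: int) -> int:
    """Branch-free form F + zext(c) * (T - F); zext(c) is what setcc + movzbl yield."""
    z = int(c)
    return (f + z * (t - f)) & MASK32

def lea_multiplier_ok(delta: int) -> bool:
    """Hypothetical check: can one LEA compute z * delta (+ displacement)?
    LEA computes base + index * scale + disp with scale in {1, 2, 4, 8}:
    index only gives 1, 2, 4, 8; base == index gives 2, 3, 5, 9."""
    return delta in {1, 2, 3, 4, 5, 8, 9}

# Cases taken from the commit messages above.
for cond in (False, True):
    assert select(cond, 42, 41) == select_as_arith(cond, 42, 41)  # 41 + zext(cond)
    assert select(cond, 13, 4) == 4 + 9 * int(cond)               # leal 4(%eax,%eax,8)
    assert select(cond, 8, 0) == int(cond) << 3                   # zext(cond) << 3
    assert select(cond, -1, -2) == (0xFFFFFFFE | int(cond))       # orl $4294967294, %eax
```

The OR form in 878d951f8f works because -2 (4294967294) has a clear low bit, so OR-ing in zext(cond) gives the same result as adding it.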