llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00

Author	SHA1	Message	Date
Tim Northover	8efc0e4868	X86: change MOV64ri64i32 into MOV32ri64 The MOV64ri64i32 instruction required hacky MCInst lowering because it was allocated as setting a GR64, but the eventual instruction ("movl") only set a GR32. This converts it into a so-called "MOV32ri64" which still accepts a (appropriate) 64-bit immediate but defines a GR32. This is then converted to the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy. This fixes a typo in the opcode field of the original patch, which should make the legact JIT work again (& adds test for that problem). llvm-svn: 183068	2013-06-01 09:55:14 +00:00
Eric Christopher	e4ab862999	Temporarily Revert "X86: change MOV64ri64i32 into MOV32ri64" as it seems to have caused PR16192 and other JIT related failures. llvm-svn: 183059	2013-05-31 23:30:45 +00:00
Tim Northover	8940245595	X86: change MOV64ri64i32 into MOV32ri64 The MOV64ri64i32 instruction required hacky MCInst lowering because it was allocated as setting a GR64, but the eventual instruction ("movl") only set a GR32. This converts it into a so-called "MOV32ri64" which still accepts a (appropriate) 64-bit immediate but defines a GR32. This is then converted to the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy. llvm-svn: 182991	2013-05-31 09:57:13 +00:00
Tim Northover	aa5932cde5	X86: use sub-register sequences for MOV*r0 operations Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions, it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg") and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is smaller and partial register updates can sometimes be avoided. Until recently, this sequence was a barrier to rematerialization though. That should now be fixed so it's an appropriate time to make the change. llvm-svn: 182928	2013-05-30 13:19:42 +00:00
Tim Northover	4a589eb424	X86: change zext moves to use sub-register infrastructure. 32-bit writes on amd64 zero out the high bits of the corresponding 64-bit register. LLVM makes use of this for zero-extension, but until now relied on custom MCLowering and other code to fixup instructions. Now we have proper handling of sub-registers, this can be done by creating SUBREG_TO_REG instructions at selection-time. Should be no change in functionality. llvm-svn: 182921	2013-05-30 10:43:18 +00:00
Jakob Stoklund Olesen	19c4788c8a	Annotate X86InstrCompiler.td with SchedRW lists. llvm-svn: 177936	2013-03-25 23:07:35 +00:00
Jakob Stoklund Olesen	71393fdd98	Annotate X86InstrCompiler.td with SchedRW lists. Add a new WriteZero SchedWrite type for the common dependency-breaking instructions that clear a register. llvm-svn: 177442	2013-03-19 21:16:56 +00:00
Ulrich Weigand	4570892d73	Remove an invalid and unnecessary Pat pattern from the X86 backend: def : Pat<(load (i64 (X86Wrapper tglobaltlsaddr :$dst))), (MOV64rm tglobaltlsaddr :$dst)>; This pattern is invalid because the MOV64rm instruction expects a source operand of type "i64mem", which is a subclass of X86MemOperand and thus actually consists of five MI operands, but the Pat provides only a single MI operand ("tglobaltlsaddr" matches an SDnode of type ISD::TargetGlobalTLSAddress and provides a single output). Thus, if the pattern were ever matched, subsequent uses of the MOV64rm instruction pattern would access uninitialized memory. In addition, with the TableGen patch I'm about to check in, this would actually be reported as a build-time error. Fortunately, the pattern does in fact never match, for at least two independent reasons. First, the code generator actually never generates a pattern of the form (load (X86Wrapper (tglobaltlsaddr))). For most combinations of TLS and code models, (tglobaltlsaddr) represents just an offset that needs to be added to some base register, so it is never directly dereferenced. The only exception is the initial-exec model, where (tglobaltlsaddr) refers to the (pc-relative) address of a GOT slot, which is in fact directly dereferenced: but in that case, the X86WrapperRIP node is used, not X86Wrapper, so the Pat doesn't match. Second, even if some patterns along those lines were ever generated, we should not need an extra Pat pattern to match it. Instead, the original MOV64rm instruction pattern ought to match directly, since it uses an "addr" operand, which is implemented via the SelectAddr C++ routine; this routine is supposed to accept the full range of input DAGs that may be implemented by a single mov instruction, including those cases involving ISD::TargetGlobalTLSAddress (and actually does so e.g. in the initial-exec case as above). To avoid build breaks (due to the above-mentioned error) after the TableGen patch is checked in, I'm removing this Pat here. llvm-svn: 177426	2013-03-19 19:49:52 +00:00
Benjamin Kramer	bdb1d9aad3	X86: Disable cmov-memory patterns on subtargets without cmov. Fixes PR15115. llvm-svn: 175962	2013-02-23 10:40:58 +00:00
Michael Liao	f1ce1e547c	Fix an issue of pseudo atomic instruction DAG schedule - Add list of physical registers clobbered in pseudo atomic insts Physical registers are clobbered when pseudo atomic instructions are expanded. Add them in clobber list to prevent DAG scheduler to mis-schedule them after these insns are declared side-effect free. - Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 173200	2013-01-22 21:47:38 +00:00
Craig Topper	8884832622	Remove # from the beginning and end of def names. llvm-svn: 171696	2013-01-07 05:26:58 +00:00
Craig Topper	fe4506dc6c	Add hasSideEffects=0 to some atomic instructions. llvm-svn: 171122	2012-12-26 23:08:12 +00:00
Michael Liao	a7e5913fde	Add __builtin_setjmp/_longjmp supprt in X86 backend - Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also used as a light-weight replacement of setjmp/longjmp which are used to implementation continuation, user-level threading, and etc. The support added in this patch ONLY addresses this usage and is NOT intended to support SjLj exception handling as zero-cost DWARF exception handling is used by default in X86. llvm-svn: 165989	2012-10-15 22:39:43 +00:00
Benjamin Kramer	ea6041a09a	X86: fcmov doesn't handle all possible EFLAGS, fall back to a branch for the others. Otherwise it will try to use SSE patterns and fail horribly if sse is disabled. Fixes PR14035. llvm-svn: 165377	2012-10-07 15:34:27 +00:00
Craig Topper	ba721cd4c0	Remove some encoding bits I forgot to remove from SETB_C16r and SETB_C64r in r165302. llvm-svn: 165303	2012-10-05 06:11:52 +00:00
Craig Topper	ec537b4c86	Move expansion of SETB_C(8/16/32/64)r from MCInstLower to ExpandPostRAPseudos and mark them as pseudos in the td file. llvm-svn: 165302	2012-10-05 06:05:15 +00:00
Michael Liao	5412422f4a	Add 'lock' prefix output support in assembly printer - Instead of embedding 'lock' into each mnemonic of atomic instructions except 'xchg', we teach X86 assembly printer to output 'lock' prefix similar to or consistent with code emitter. llvm-svn: 164659	2012-09-26 05:13:44 +00:00
Michael Liao	3d9c40c0c8	Fix 16-bit atomic inst encoding and keep pseudo-inst starting with '#' llvm-svn: 164453	2012-09-22 05:41:15 +00:00
Michael Liao	0a4f3eefaf	Fix typo in r164357 llvm-svn: 164452	2012-09-22 03:39:42 +00:00
Michael Liao	9a17cba52b	Fix a typo in r164357 llvm-svn: 164372	2012-09-21 16:03:03 +00:00
Michael Liao	439a9cea68	Revise td of X86 atomic instructions - Rewirte most atomic instructions in templates for both better maintenance and future extensions, such as HLE in TSX. llvm-svn: 164357	2012-09-21 03:00:17 +00:00
Michael Liao	34658dca78	Re-work X86 code generation of atomic ops with spin-loop - Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop previously the spin-loop looks like thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB the 'ld' at the beginning of newMBB should be lift out of the loop as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions has all-register form only, there is no immedidate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. llvm-svn: 164281	2012-09-20 03:06:15 +00:00
Jakob Stoklund Olesen	72138019a9	Fix the TCRETURNmi64 bug differently. Add a PatFrag to match X86tcret using 6 fixed registers or less. This avoids folding loads into TCRETURNmi64 using 7 or more volatile registers. <rdar://problem/12282281> llvm-svn: 163819	2012-09-13 18:31:27 +00:00
Jakob Stoklund Olesen	eae8fc91cf	Revert r163761 "Don't fold indexed loads into TCRETURNmi64." The patch caused "Wrong topological sorting" assertions. llvm-svn: 163810	2012-09-13 16:52:17 +00:00
Jakob Stoklund Olesen	b15912aafd	Don't fold indexed loads into TCRETURNmi64. We don't have enough GR64_TC registers when calling a varargs function with 6 arguments. Since %al holds the number of vector registers used, only %r11 is available as a scratch register. This means that addressing modes using both base and index registers can't be folded into TCRETURNmi64. <rdar://problem/12282281> llvm-svn: 163761	2012-09-13 00:25:00 +00:00
Hans Wennborg	4344ad4a86	Implement the local-dynamic TLS model for x86 (PR3985) This implements codegen support for accesses to thread-local variables using the local-dynamic model, and adds a clean-up pass so that the base address for the TLS block can be re-used between local-dynamic access on an execution path. llvm-svn: 157818	2012-06-01 16:27:21 +00:00
Jakob Stoklund Olesen	88cf278739	Use ptr_rc_tailcall instead of GR32_TC. The getPointerRegClass() hook will return GR32_TC, or whatever is appropriate for the current function. Patch by Yiannis Tsiouris! llvm-svn: 156459	2012-05-09 01:50:09 +00:00
Manman Ren	6fde9f74b4	X86: optimization for -(x != 0) This patch will optimize -(x != 0) on X86 FROM cmpl $0x01,%edi sbbl %eax,%eax notl %eax TO negl %edi sbbl %eax %eax In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td: def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>; rdar: 10961709 llvm-svn: 156312	2012-05-07 18:06:23 +00:00
Rafael Espindola	88a1aeb123	Always compute all the bits in ComputeMaskedBits. This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase. llvm-svn: 154011	2012-04-04 12:51:34 +00:00
Lang Hames	94d892c492	Make x86 REP_MOV* and REP_STO instructions use the correct operand sizes in 64-bit mode. llvm-svn: 153680	2012-03-29 19:54:28 +00:00
Preston Gurd	d1ae391210	This patch adds X86 instruction itineraries for non-pseudo opcodes in X86InstrCompiler.td. It also adds –mcpu-generic to the legalize-shift-64.ll test so the test will pass if run on an Intel Atom CPU, which would otherwise produce an instruction schedule which differs from that which the test expects. llvm-svn: 153033	2012-03-19 14:10:12 +00:00
Michael J. Spencer	d2f0ce2674	Add WIN_FTOL_* psudo-instructions to model the unique calling convention used by the Win32 _ftol2 runtime function. Patch by Joe Groff! llvm-svn: 151382	2012-02-24 19:01:22 +00:00
Jakob Stoklund Olesen	b498ebe5b7	Use the same CALL instructions for Windows as for everything else. The different calling conventions and call-preserved registers are represented with regmask operands that are added dynamically. llvm-svn: 150708	2012-02-16 17:56:02 +00:00
Eli Friedman	a343d87eac	Make sure the non-SSE lowering for fences correctly clobbers EFLAGS. PR11768. llvm-svn: 148240	2012-01-16 16:42:21 +00:00
Eli Friedman	a2b480b010	Get rid of unused codegen-only instruction. llvm-svn: 148239	2012-01-16 16:29:35 +00:00
Benjamin Kramer	ae4ad5f924	X86: Generalize the x << (y & const) optimization to also catch masks with more set bits set than 31 or 63. llvm-svn: 148024	2012-01-12 12:41:34 +00:00
Chandler Carruth	9ef50ef1f7	Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a lot* to be improve on in this type of code. llvm-svn: 147244	2011-12-24 10:55:54 +00:00
Chandler Carruth	7564e8371a	Begin teaching the X86 target how to efficiently codegen patterns that use the zero-undefined variants of CTTZ and CTLZ. These are just simple patterns for now, there is more to be done to make real world code using these constructs be optimized and codegen'ed properly on X86. The existing tests are spiffed up to check that we no longer generate unnecessary cmov instructions, and that we generate the very important 'xor' to transform bsr which counts the index of the most significant one bit to the number of leading (most significant) zero bits. Also they now check that when the variant with defined zero result is used, the cmov is still produced. llvm-svn: 146974	2011-12-20 11:19:37 +00:00
Rafael Espindola	1958dc7193	Fixes an issue reported by -verify-machineinstrs. Patch by Sanjoy Das. llvm-svn: 143064	2011-10-26 21:16:41 +00:00
Rafael Espindola	90896edc6c	This commit introduces two fake instructions MORESTACK_RET and MORESTACK_RET_RESTORE_R10; which are lowered to a RET and a RET followed by a MOV respectively. Having a fake instruction prevents the verifier from seeing a MachineBasicBlock end with a non-terminator (MOV). It also prevents the rather eccentric case of a MachineBasicBlock ending with RET but having successors nevertheless. Patch by Sanjoy Das. llvm-svn: 143062	2011-10-26 21:12:27 +00:00
Eli Friedman	34ffc961d7	Fix the assembler strings for a couple of atomic instructions. Doesn't really matter much in practice, but it's a bit cleaner. llvm-svn: 139563	2011-09-13 00:27:04 +00:00
Eli Friedman	9ea5599729	Fix atomic load and store on x86 to pass -verify-machineinstrs (and possibly fix some subtle bugs involving passes which check mayStore()). This isn't exactly ideal, but it is good enough for the moment. llvm-svn: 139245	2011-09-07 18:48:32 +00:00
Jakob Stoklund Olesen	ef8527b836	Pseudo CMOV instructions don't clobber EFLAGS. The explanation about a 0 argument being materialized as xor is no longer valid. Rematerialization will check if EFLAGS is live before clobbering it. The code produced by X86TargetLowering::EmitLoweredSelect does not clobber EFLAGS. This causes one less testb instruction to be generated in the cmov.ll test case. llvm-svn: 139057	2011-09-02 23:52:55 +00:00
Rafael Espindola	7721c15106	Adds a SelectionDAG node X86SegAlloca which will be custom lowered from DYNAMIC_STACKALLOC. Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which will match X86SegAlloca (based on word size) are also added. They will be custom emitted to inject the actual stack handling code. Patch by Sanjoy Das. llvm-svn: 138814	2011-08-30 19:43:21 +00:00
Eli Friedman	9f95c7d381	Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction. llvm-svn: 138660	2011-08-26 21:21:21 +00:00
Eli Friedman	6f95a6ae1b	Basic x86 code generation for atomic load and store instructions. llvm-svn: 138478	2011-08-24 20:50:09 +00:00
Bruno Cardoso Lopes	9a695724bd	Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556 llvm-svn: 137179	2011-08-09 23:27:13 +00:00
Eli Friedman	44fd5b2b59	Fix a couple ridiculous copy-paste errors. rdar://9914773 . llvm-svn: 137160	2011-08-09 22:17:39 +00:00
Eli Friedman	1a80401da2	X86ISD::MEMBARRIER does not require SSE2; it doesn't actually generate any code, and all x86 processors will honor the required semantics. llvm-svn: 136249	2011-07-27 19:43:50 +00:00
Dan Gohman	4762d28ff9	Add a comment describing why transforming (shl x, 1) to (add x, x) is to be considered safe enough in this context. llvm-svn: 133159	2011-06-16 15:55:48 +00:00

1 2

95 Commits