llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00

Author	SHA1	Message	Date
Rafael Espindola	20a45908b6	Convert these methods into static functions. llvm-svn: 195825	2013-11-27 07:14:26 +00:00
Rafael Espindola	22b6ec4d69	Cleanup and test X86AsmPrinter::printPCRelImm. It is only used for asm printing. On X86 we put basic block addresses on register before passing them to inline asm, so the MO_MachineBasicBlock case was dead. MO_ExternalSymbol was dead since any symbol being passed to inline asm is represented as MO_GlobalAddress. The MO_GlobalAddress and MO_Register cases were not tested. llvm-svn: 195824	2013-11-27 06:53:13 +00:00
Michael Liao	8c702e1a18	Fix PR18054 - Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG lowering where we need to check whether x is a vector type (in-reg type) of i8, i16 or i32; otherwise, that optimization is not valid. llvm-svn: 195779	2013-11-26 20:31:31 +00:00
Andrew Trick	95afafe3fa	StackMap: Implement support for DirectMemRefOp. A Direct stack map location records the address of a frame index. This address is itself the value that the runtime requested. This differs from IndirectMemRefOp locations, which refer to stack locations from which the requested values must be loaded. Direct locations can directly communicate the address of an alloca, while IndirectMemRefOp locations handle register spills. For example: entry: %a = alloca i64... llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a) Since both the alloca and stackmap intrinsic are in the entry block, and the intrinsic takes the address of the alloca, the runtime can assume that LLVM will not substitute alloca with any intervening value. This must be verified by the runtime by checking that the stack map's location is a Direct location type. The runtime can then determine the alloca's relative location on the stack immediately after compilation, or at any time thereafter. This differs from Register and Indirect locations, because the runtime can only read the values in those locations when execution reaches the instruction address of the stack map. llvm-svn: 195712	2013-11-26 02:03:25 +00:00
Andrew Trick	95115e649e	whitespace llvm-svn: 195711	2013-11-26 02:03:20 +00:00
Cameron McInally	2ff051483c	Add an intrinsic for the SSE2 PAUSE instruction. llvm-svn: 195697	2013-11-26 00:20:43 +00:00
Rafael Espindola	c1e50a3473	Do the string comparison in the constructor instead of once per nop. Thanks to Roman Divacky for the suggestion. llvm-svn: 195684	2013-11-25 20:50:03 +00:00
Rafael Espindola	fa5cbd5557	Don't use nopl in cpus that don't support it. Patch by Mikulas Patocka. I added the test. I checked that for cpu names that gas knows about, it also doesn't generate nopl. The modified cpus: i686 - there are i686-class CPUs that don't have nopl: Via c3, Transmeta Crusoe, Microsoft VirtualBox - see https://bbs.archlinux.org/viewtopic.php?pid=775414 k6, k6-2, k6-3, winchip-c6, winchip2 - these are 586-class CPUs via c3 c3-2 - see https://bugs.archlinux.org/task/19733 as a proof that Via c3 and c3-Nehemiah don't have nopl llvm-svn: 195679	2013-11-25 20:15:14 +00:00
Tim Northover	1b70928c82	X86: enable AVX2 under Haswell native compilation Patch by Adam Strzelecki llvm-svn: 195632	2013-11-25 09:52:59 +00:00
Jim Grosbach	02f7297367	X86: Perform integer comparisons at i32 or larger. Utilizing the 8 and 16 bit comparison instructions, even when an input can be folded into the comparison instruction itself, is typically not worth it. There are too many partial register stalls as a result, leading to significant slowdowns. By always performing comparisons on at least 32-bit registers, performance of the calculation chain leading to the comparison improves. Continue to use the smaller comparisons when minimizing size, as that allows better folding of loads into the comparison instructions. rdar://15386341 llvm-svn: 195496	2013-11-22 19:57:47 +00:00
Michael Liao	0f7c6dee5e	Fix PR18014 - When simplifying the mask generation for BLEND, check whether that mask is also consumed by other non-BLEND insns. If true, skip that simplification. llvm-svn: 195476	2013-11-22 17:56:57 +00:00
Rafael Espindola	7660b5bf36	Don't produce tail calls when the caller is x86_thiscallcc. The callee will not pop the stack for us. llvm-svn: 195467	2013-11-22 15:18:28 +00:00
Kostya Serebryany	3c8539795c	Revert r195318 as it causes miscompilation (PR18029) llvm-svn: 195439	2013-11-22 10:30:39 +00:00
Ekaterina Romanova	eda4e2e4a7	SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction. AMD's processor families K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size. It might be beneficial to disable this folding for some of Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel. llvm-svn: 195383	2013-11-21 23:21:26 +00:00
Bill Wendling	07a5510fa2	The basic problem is that some mainstream programs cannot deal with the way clang optimizes tail calls, as in this example: int foo(void); int bar(void) { return foo(); } where the call is transformed to: calll .L0$pb .L0$pb: popl %eax .Ltmp0: addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax movl foo@GOT(%eax), %eax popl %ebp jmpl *%eax # TAILCALL However, the GOT references must all be resolved at dlopen() time, and so this approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which usually populates the PLT with stubs that perform the actual resolving. This patch changes X86TargetLowering::LowerCall() to skip tail call optimization, if the called function is a global or external symbol. Patch by Dimitry Andric! PR15086 llvm-svn: 195318	2013-11-21 07:04:30 +00:00
NAKAMURA Takumi	efd1623a5d	X86ISelLowering.cpp: Mark a variable VT as LLVM_ATTRIBUTE_UNUSED. [-Wunused-variable] llvm-svn: 195238	2013-11-20 10:55:22 +00:00
NAKAMURA Takumi	d114df4bce	Whitespace. llvm-svn: 195237	2013-11-20 10:55:15 +00:00
Elena Demikhovsky	a11395e99e	Fixed compilation error. llvm-svn: 195230	2013-11-20 09:23:22 +00:00
Elena Demikhovsky	692524f3bd	AVX-512: Concat 4 128-bit vectors in one 512-bit vector. llvm-svn: 195229	2013-11-20 09:10:40 +00:00
Cameron McInally	9232b52359	Fix assembly operands for the SSE2 cvtsd2ss instruction. llvm-svn: 195129	2013-11-19 14:36:00 +00:00
Andrew Trick	9f7d826e8a	Use symbolic operands in the patchpoint folding routine and fix a spilling bug. Fixes <rdar://15487687> [JS] AnyRegCC argument ends up being spilled llvm-svn: 195094	2013-11-19 03:29:59 +00:00
Andrew Trick	15aac659a7	Add an abstraction to handle patchpoint operands. Hard-coded operand indices were scattered throughout lowering stages and layers. It was super bug prone. llvm-svn: 195093	2013-11-19 03:29:56 +00:00
Juergen Ributzka	5357a6d64b	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. The memory leaks in this version have been fixed. Thanks Alexey for pointing them out. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 195064	2013-11-19 00:57:56 +00:00
Reid Kleckner	552118c34a	Revert "COFF: Emit all MCSymbols rather than filtering out some of them" This reverts commit r190888, to fix PR17967. The original change wasn't the right way to get @feat.00 into the object file. The right fix is to make @feat.00 be a global symbol. llvm-svn: 195053	2013-11-18 23:08:12 +00:00
Alexey Samsonov	3bfef6bdb6	Revert r194865 and r194874. This change is incorrect. If you delete the virtual destructor of both a base class and a subclass, then the following code: Base *foo = new Child(); delete foo; will not call the destructors for members of the Child class. As a result, I observe plenty of memory leaks. Notable examples I investigated are: ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl. llvm-svn: 194997	2013-11-18 09:31:53 +00:00
Andrew Trick	bd486c29f4	Added a size field to the stack map record to handle subregister spills. Implementing this on bigendian platforms could get strange. I added a target hook, getStackSlotRange, per Jakob's recommendation to make this as explicit as possible. llvm-svn: 194942	2013-11-17 01:36:23 +00:00
Juergen Ributzka	01930f65b5	The WebKit_JS CC preserves the same registers as the C CC. llvm-svn: 194936	2013-11-16 22:08:58 +00:00
Jim Grosbach	56d800bb1e	X86: Encode the 'h' cpu subtype in the MachO header for x86. llvm-svn: 194906	2013-11-16 00:52:57 +00:00
Lang Hames	37fe732f75	Remove unused arguments. llvm-svn: 194882	2013-11-15 23:19:01 +00:00
Lang Hames	7a23518af7	During folding for patchpoint/stackmap instructions, defer creation of new MIs until we know that folding will be successful. No functional change. llvm-svn: 194880	2013-11-15 23:13:21 +00:00
Juergen Ributzka	ee3af15269	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 194865	2013-11-15 22:34:48 +00:00
Bob Wilson	d433cf7463	Avoid illegal integer promotion in fastisel Stop folding constant adds into GEP when the type size doesn't match. Otherwise, the adds' operands are effectively being promoted, changing the conditions of an overflow. Results are different when: sext(a) + sext(b) != sext(a + b) Problem originally found on x86-64, but also fixed issues with ARM and PPC, which used similar code. <rdar://problem/15292280> Patch by Duncan Exon Smith! llvm-svn: 194840	2013-11-15 19:09:27 +00:00
Cameron McInally	cae8bdeb82	Add AVX512 unmasked FMA intrinsics and support. llvm-svn: 194824	2013-11-15 17:01:14 +00:00
Matt Arsenault	9921608896	Add addrspacecast instruction. Patch by Michele Scandale! llvm-svn: 194760	2013-11-15 01:34:59 +00:00
Elena Demikhovsky	bac904c06d	AVX-512: Handled extractelement from mask vector; Added VMOVSHDUP/VMOVSLDUP shuffle instructions. llvm-svn: 194691	2013-11-14 11:29:27 +00:00
Andrew Trick	1eb87f0d42	Minor extension to llvm.experimental.patchpoint: don't require a call. If a null call target is provided, don't emit a dummy call. This allows the runtime to reserve as little nop space as it needs without the requirement of emitting a call. llvm-svn: 194676	2013-11-14 06:54:10 +00:00
Juergen Ributzka	b47be624ea	SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too. This patch reapplies r193676 with an additional fix for the Hexagon backend. The SystemZ backend has already been fixed by r194148. The Type Legalizer recognizes that VSELECT needs to be split, because the type is too wide for the given target. The same does not always apply to SETCC, because less space is required to encode the result of a comparison. As a result VSELECT is split and SETCC is unrolled into scalar comparisons. This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG Combiner. If a matching pattern is found, then the result mask of SETCC is promoted to the expected vector mask type for the given target. Now the type legalizer will split both VSELECT and SETCC. This allows the following X86 DAG Combine code to successfully detect the MIN/MAX pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>. Reviewed by Nadav llvm-svn: 194542	2013-11-13 01:57:54 +00:00
Andrew Trick	12470267da	Cleanup the stackmap operand folding code and fix a corner case. I still don't know how to refer to the fixed operands symbolically. I plan to look into it. llvm-svn: 194529	2013-11-12 22:58:39 +00:00
Eric Christopher	b7b2cc176c	Add a FIXME for 32-bit q modifiers. llvm-svn: 194515	2013-11-12 21:47:44 +00:00
Andrew Trick	56e6608cf0	Simplify operand folding when rematerializing a load. We already know how to fold a reload from a frameindex without analyzing the load instruction. Generalize this to handle any frameindex load. This streamlines the logic for rematerializing loads from stack arguments. As a side effect, it allows stackmaps to record a stack argument location without spilling it. Verified no effect on codegen for llvm test-suite. llvm-svn: 194497	2013-11-12 18:06:12 +00:00
Lang Hames	dcd012fa30	Lower X86::MORESTACK_RET and X86::MORESTACK_RET_RESTORE_R10 in X86AsmPrinter::EmitInstruction, rather than X86MCInstLower::Lower. The aim is to improve the reusability of the X86MCInstLower class by making it more function-like. The X86::MORESTACK_RET_RESTORE_R10 pseudo broke the function model by emitting an extra instruction to the MCStreamer attached to the AsmPrinter. The patch should have no impact on generated code. llvm-svn: 194431	2013-11-11 23:00:41 +00:00
Andrew Trick	9a4f1fc067	Fix the recently added anyregcc convention to handle spilled operands. Fixes <rdar://15432754> [JS] Assertion: "Folded a def to a non-store!" The primary purpose of anyregcc is to prevent a patchpoint's call arguments and return value from being spilled. They must be available in a register, although the calling convention does not pin the register. It's up to the front end to avoid using this convention for calls with more arguments than allocatable registers. llvm-svn: 194428	2013-11-11 22:40:25 +00:00
Juergen Ributzka	a748d55906	[Stackmap] Materialize the jump address within the patchpoint noop slide. This patch moves the jump address materialization inside the noop slide. This enables patching of the materialization itself or its complete removal. This patch also adds the ability to define scratch registers that can be used safely by the code called from the patchpoint intrinsic. At least one scratch register is required, because that one is used for the materialization of the jump address. This patch depends on D2009. Differential Revision: http://llvm-reviews.chandlerc.com/D2074 Reviewed by Andy llvm-svn: 194306	2013-11-09 01:51:33 +00:00
Juergen Ributzka	f27436b708	[Stackmap] Add AnyReg calling convention support for patchpoint intrinsic. The idea of the AnyReg Calling Convention is to provide the call arguments in registers, but not to force them to be placed in a particular order into a specified set of registers. Instead it is up to the register allocator to assign any register as it sees fit. The same applies to the return value (if applicable). Differential Revision: http://llvm-reviews.chandlerc.com/D2009 Reviewed by Andy llvm-svn: 194293	2013-11-08 23:28:16 +00:00
Jim Grosbach	b8435149f5	X86: Assembly files with .cfi_cfa_def shouldn't hit llvm_unreachable() On darwin, when trying to create compact unwind info, a .cfi_cfa_def directive would cause an llvm_unreachable() to be hit. Back off when we see this directive and generate the regular DWARF style eh_frame. rdar://15406518 llvm-svn: 194285	2013-11-08 22:33:06 +00:00
David Majnemer	ac56140f8a	X86 Disassembler: remove unused bool typedef-name llvm-svn: 194062	2013-11-05 10:34:42 +00:00
Craig Topper	8a08a00b6c	Lift alignment restrictions on load folding for a significant portion of AVX instructions. llvm-svn: 194048	2013-11-05 06:31:43 +00:00
Eric Christopher	a42eaab3a9	Check for both styles of clobbers, those produced by dragonegg and those produced by clang for the inline asm bswap conversion. Modified from a patch by Chris Smowton. llvm-svn: 194016	2013-11-04 21:41:21 +00:00
Cameron McInally	02e4f56c18	Add support for AVX512 masked vector blend intrinsics. llvm-svn: 194006	2013-11-04 19:14:56 +00:00
Benjamin Kramer	2d870f327a	X86: Add a description for AMD bdver3 aka Steamroller. This is just bdver2 + FSGSBase. llvm-svn: 193984	2013-11-04 10:29:20 +00:00
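
Commit d433cf7463 in the log above states that folding a constant add is unsafe when the type sizes differ, because sext(a) + sext(b) != sext(a + b) once the narrow add overflows. The following is a minimal standalone sketch of that inequality using 8-bit and 32-bit integers; the function names are hypothetical and none of this is LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: add in the narrow 8-bit type (wrapping modulo 256),
// then sign-extend the 8-bit result to 32 bits. Models sext(a + b).
int32_t addThenExtend(int8_t a, int8_t b) {
  // Do the arithmetic on unsigned values so the wraparound is well defined.
  unsigned wrapped = (static_cast<uint8_t>(a) + static_cast<uint8_t>(b)) & 0xFFu;
  // Reinterpret the low 8 bits as a two's-complement signed value.
  return wrapped >= 128u ? static_cast<int32_t>(wrapped) - 256
                         : static_cast<int32_t>(wrapped);
}

// Hypothetical helper: sign-extend each operand first, then add in 32 bits.
// Models sext(a) + sext(b).
int32_t extendThenAdd(int8_t a, int8_t b) {
  return static_cast<int32_t>(a) + static_cast<int32_t>(b);
}

// Small self-check covering one overflowing and one non-overflowing case.
void checkSextExamples() {
  assert(addThenExtend(1, 2) == extendThenAdd(1, 2));         // no overflow: equal
  assert(addThenExtend(100, 100) != extendThenAdd(100, 100)); // overflow: differ
}
```

With a = b = 100, the narrow add wraps to -56 while the widened add gives 200, so performing the add at the wider width changes the result whenever the narrow add overflows.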
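
Commit 3bfef6bdb6 in the log above reverts a change because deleting a Child through a Base pointer no longer destroyed Child's members once the virtual destructors were removed. The sketch below shows the safe form only, with the virtual destructor in place; Tracker and the counter are hypothetical names introduced to make the member destruction observable.

```cpp
#include <cassert>

// Hypothetical counter used to observe member destruction.
static int destroyedMembers = 0;

// A member type whose destructor records that it ran.
struct Tracker {
  ~Tracker() { ++destroyedMembers; }
};

struct Base {
  // The virtual destructor makes `delete basePtr` dispatch to the most-derived
  // destructor. Without it, deleting a Child through a Base* is undefined
  // behavior, and in practice Child's members are typically never destroyed.
  virtual ~Base() = default;
};

struct Child : Base {
  Tracker member;  // Destroyed only if Child's destructor actually runs.
};

// Mirrors the snippet from the commit message and reports how many
// Tracker members were destroyed by the delete.
int deleteThroughBase() {
  destroyedMembers = 0;
  Base *foo = new Child();
  delete foo;
  return destroyedMembers;
}
```

The non-virtual variant is deliberately left out of the code, since running it would rely on undefined behavior.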
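
Commit eda4e2e4a7 in the log above disables folding of the pattern (or (x << c) | (y >> (64 - c))) into SHLD on certain AMD processors. As a rough illustration of what that pattern computes, the sketch below writes it as plain C++ on 64-bit values; the function name is hypothetical, and the guard for c == 0 is an assumption added here to avoid an out-of-range shift, not something taken from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: shift x left by c bits and fill the vacated low bits
// with the top c bits of y. This is the value the quoted pattern describes
// for 0 < c < 64.
uint64_t doubleShiftLeft(uint64_t x, uint64_t y, unsigned c) {
  assert(c < 64 && "shift amount must be below the operand width");
  if (c == 0)
    return x;  // Assumption: avoid y >> 64, which is undefined in C++.
  return (x << c) | (y >> (64 - c));
}
```

When x and y are the same value, the expression reduces to a rotate left by c.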


9652 Commits