llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 22:12:57 +02:00

Author	SHA1	Message	Date
David Majnemer	263bcdd3a9	[X86] Implement the local-exec TLS model for Windows targets We know that _tls_index is zero for local-exec TLS variables because they are always defined in the executable. llvm-svn: 237772	2015-05-20 04:45:26 +00:00
Duncan P. N. Exon Smith	1b0c97740a	MC: Take MCSymbol in MachObjectWriter::getSymbolAddress(), NFC Pass through an `MCSymbol` instead of an `MCSymbolData` so we can get rid of the back pointer. llvm-svn: 237750	2015-05-20 00:02:39 +00:00
Duncan P. N. Exon Smith	4c0dd00c17	MC: Use MCSymbol in MCAsmLayout::getSymbolOffset(), NFC Continue to canonicalize on MCSymbol instead of MCSymbolData when both are needed. llvm-svn: 237749	2015-05-19 23:53:20 +00:00
Matthias Braun	0e49c41528	MachineInstr: Remove unused parameter. llvm-svn: 237726	2015-05-19 21:22:20 +00:00
Michael Kuperstein	c9165a5a41	[X86] ABI change for x86-32: pass 3 vector arguments in-register instead of 4, except on Darwin. This changes the ABI used on 32-bit x86 for passing vector arguments. Historically, clang passes the first 4 vector arguments in-register, and additional vector arguments on the stack, regardless of platform. That is different from the behavior of gcc, icc, and msvc, all of which pass only the first 3 arguments in-register. The 3-register convention is documented, unofficially, in Agner's calling convention guide, and, officially, in the recently released version 1.0 of the i386 psABI. Darwin is kept as is because the OS X ABI Function Call Guide explicitly documents the current (4-register) behavior. This fixes PR21510 Differential revision: http://reviews.llvm.org/D9644 llvm-svn: 237682	2015-05-19 11:06:56 +00:00
Reid Kleckner	6d90c1f8f3	Re-land r237175: [X86] Always return the sret parameter in eax/rax ... This reverts commit r237210. Also fix X86/complex-fca.ll to match the code that we used to generate on win32 and now generate everywhere to conform to SysV. llvm-svn: 237639	2015-05-18 23:35:09 +00:00
David Blaikie	0be3b52a8f	Simplify IRBuilder::CreateCall* by using ArrayRef+initializer_list/braced init only llvm-svn: 237624	2015-05-18 22:13:54 +00:00
Matthias Braun	4662acc8ae	MachineInstr: Change return value of getOpcode() to unsigned. This was previously returning int. However there are no negative opcode numbers and more importantly this was needlessly different from MCInstrDesc::getOpcode() (which even is the value returned here) and SDValue::getOpcode()/SDNode::getOpcode(). llvm-svn: 237611	2015-05-18 20:27:55 +00:00
Jim Grosbach	95c79d189f	MC: Clean up method names in MCContext. The naming was a mish-mash of old and new style. Update to be consistent with the new. NFC. llvm-svn: 237594	2015-05-18 18:43:14 +00:00
Elena Demikhovsky	0b01514dcb	AVX-512: Added intrinsics for ADDSS/D, MULSS/D, SUBSS/D, DIVSS/D instructions. These intrinsics come with a rounding mode. Added intrinsics for MAXSS/D, MINSS/D - with and without sae. By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 237560	2015-05-18 07:24:19 +00:00
Elena Demikhovsky	e79ef5f23a	fixed compilation warning/error llvm-svn: 237559	2015-05-18 07:10:25 +00:00
Elena Demikhovsky	a8c3a107bb	AVX-512: Added patterns for scalar-to-vector broadcast llvm-svn: 237558	2015-05-18 07:06:23 +00:00
Elena Demikhovsky	7d3b86db52	AVX-512: Added VBROADCASTF64X4, VBROADCASTF64X2, VBROADCASTI32X8, and other instructions from this set Added encoding tests. llvm-svn: 237557	2015-05-18 06:42:57 +00:00
Elena Demikhovsky	e802d572b1	AVX-512: fixed extended load to 512-bit register llvm-svn: 237537	2015-05-17 08:08:06 +00:00
Elena Demikhovsky	1e28397b5a	AVX-512: fixed a bug in mask operations - (i1 1) pattern Filling k-reg with all-ones value was wrong, (i1 1) should switch on only one bit in mask register llvm-svn: 237536	2015-05-17 07:28:51 +00:00
Daniel Sanders	5bf4979cf3	[x86] Distinguish the 'o', 'v', 'X', and 'i' inline assembly memory constraints. Summary: But still handle them the same way since I don't know how they differ on this target. Of these, 'o' and 'v' are not tested but were already implemented. I'm not sure why 'i' is required for X86 since it's supposed to be an immediate constraint rather than a memory constraint. A test asserts without it so I've included it for now. No functional change intended. Reviewers: nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8254 llvm-svn: 237517	2015-05-16 12:09:54 +00:00
Duncan P. N. Exon Smith	d7b04389e5	MC: Use MCSymbol in RelAndSymbol, NFC Switch from `MCSymbolData` to `MCSymbol`. llvm-svn: 237502	2015-05-16 01:14:19 +00:00
Duncan P. N. Exon Smith	d0247fc520	MC: Change MCFragment::Atom to an MCSymbol, NFC Change `MCFragment::Atom` from an `MCSymbolData` to an `MCSymbol`, moving in the direction of removing the back-pointer. llvm-svn: 237497	2015-05-16 00:48:58 +00:00
Pete Cooper	043506dd8f	Remove MCAssembler.h include from MCStreamer.h and fix users of MCStreamer.h llvm-svn: 237483	2015-05-15 22:19:42 +00:00
Pete Cooper	8d13c88def	Remove 3 includes from MCInstrDesc.h and explicitly include them where needed llvm-svn: 237481	2015-05-15 21:58:42 +00:00
David Majnemer	eec8124e55	[X86] Use a better sentinel offset for the FrameAddr index Other pieces of CodeGen want to negate frame object offsets to account for architectures where the stack grows down. Our object is a pseudo object so its offset doesn't matter. However, we shouldn't choose an offset which results in undefined behavior if you negate it. llvm-svn: 237474	2015-05-15 20:08:27 +00:00
Jim Grosbach	0c6b91deee	MC: MCCodeGenInfo naming update. NFC. s/InitMCCodeGenInfo/initMCCodeGenInfo/ llvm-svn: 237471	2015-05-15 19:13:31 +00:00
Jim Grosbach	eb68de6ea2	MC: Update MCCodeEmitter naming. NFC. s/EncodeInstruction/encodeInstruction/ llvm-svn: 237469	2015-05-15 19:13:16 +00:00
Jim Grosbach	6eeec2791d	MC: Update MCFixup naming. NFC. s/MCFixup::Create/MCFixup::create/ llvm-svn: 237468	2015-05-15 19:13:05 +00:00
Eric Christopher	f72ad6ab7e	Remove setting FloatABIType from the X86 port, nothing uses it. llvm-svn: 237398	2015-05-14 22:26:54 +00:00
Elena Demikhovsky	b6e5772812	AVX-512: Added i1 type handling for calling conventions. i1 type is a legal type on AVX-512 and can be passed as parameter or return value. i1 is promoted to i8 on return and to i32 for call arguments (i8 is also promoted to i32 here). The resulting code is similar to the previous X86 targets, where i1 is always promoted to i8. llvm-svn: 237350	2015-05-14 09:04:45 +00:00
Douglas Katzman	c41c10144d	[X86] Fix PR23271 - RIP-relative decoding bug in disassembler. Differential Revision: http://reviews.llvm.org/D9110 llvm-svn: 237310	2015-05-13 22:44:52 +00:00
Jim Grosbach	b635db1046	MC: Modernize MCOperand API naming. NFC. MCOperand::Create() methods renamed to MCOperand::create(). llvm-svn: 237275	2015-05-13 18:37:00 +00:00
Michael Kuperstein	5efc4deda0	Reverting r237234, "Use std::bitset for SubtargetFeatures" The buildbots are still not satisfied. MIPS and ARM are failing (even though at least MIPS was expected to pass). llvm-svn: 237245	2015-05-13 10:28:46 +00:00
Michael Kuperstein	56a8e05a6b	Use std::bitset for SubtargetFeatures Previously, subtarget features were a bitfield with the underlying type being uint64_t. Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset. No functional change. The first two times this was committed (r229831, r233055), it caused several buildbot failures. At least some of the ARM and MIPS ones were due to gcc/binutils issues, and should now be fixed. llvm-svn: 237234	2015-05-13 08:27:08 +00:00
Elena Demikhovsky	0803046ed4	AVX-512: fixed a bug in encoding of VPSRAQ instruction, added a bunch of encoding tests. llvm-svn: 237232	2015-05-13 07:35:05 +00:00
Sanjoy Das	6d67db8c09	[Statepoints] Support for "patchable" statepoints. Summary: This change adds two new parameters to the statepoint intrinsic, `i64 id` and `i32 num_patch_bytes`. `id` gets propagated to the ID field in the generated StackMap section. If the `num_patch_bytes` is non-zero then the statepoint is lowered to `num_patch_bytes` bytes of nops instead of a call (the spill and reload code remains unchanged). A non-zero `num_patch_bytes` is useful in situations where a language runtime requires complete control over how a call is lowered. This change brings statepoints one step closer to patchpoints. With some additional work (that is not part of this patch) it should be possible to get rid of `TargetOpcode::STATEPOINT` altogether. PlaceSafepoints generates `statepoint` wrappers with `id` set to `0xABCDEF00` (the old default value for the ID reported in the stackmap) and `num_patch_bytes` set to `0`. This can be made more sophisticated later. Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9546 llvm-svn: 237214	2015-05-12 23:52:24 +00:00
Chandler Carruth	a9a05bbdfe	Revert r237175: [X86] Always return the sret parameter in eax/rax ... This commit broke an x86 test and the bots have been broken for well over an hour now so I'm just reverting. llvm-svn: 237210	2015-05-12 23:34:27 +00:00
Reid Kleckner	170b30ec78	[X86] Always return the sret parameter in eax/rax, even on 32-bit Summary: This rule was always in the old SysV i386 ABI docs and the new ones that H.J. Lu has put together, but we never noticed: EAX scratch register; also used to return integer and pointer values from functions; also stores the address of a returned struct or union Fixes PR23491. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9715 llvm-svn: 237175	2015-05-12 20:56:32 +00:00
Sanjay Patel	4034bb0a09	use 'auto' to improve readability; NFC llvm-svn: 237144	2015-05-12 15:15:55 +00:00
Elena Demikhovsky	bd877a1b65	AVX-512, X86: Added lowering for shift operations for SKX. The other changes in LowerShift() are not functional; they just make the code more convenient. So the functional changes are for SKX only. llvm-svn: 237129	2015-05-12 13:25:46 +00:00
Andrea Di Biagio	7f0caa49fa	[X86] Remove useless target specific combine on TRUNCATE dag nodes. Before revision 171146, function 'PerformTruncateCombine' used to perform a premature lowering of TRUNCATE dag nodes. Revision 171146 then moved all the logic implemented by PerformTruncateCombine to a custom lowering hook. However, that revision forgot to delete function PerformTruncateCombine from the code. This patch removes function 'PerformTruncateCombine' since it has no effect on the SelectionDAG. No functional change intended. llvm-svn: 237122	2015-05-12 12:34:22 +00:00
Elena Demikhovsky	a975085c80	AVX-512: select operation for i1 vectors like: select i1 %cond, <16 x i1> %a, <16 x i1> %b. I added pseudo-CMOV patterns to resolve the "select". Added tests for KNL and SKX. llvm-svn: 237106	2015-05-12 09:36:52 +00:00
Michael Kuperstein	59d6eb954a	[X86] DAGCombine should not assume arbitrary vector types are simple The X86-specific DAGCombine for stores should not assume vector types are always simple. This fixes PR23476. Differential Revision: http://reviews.llvm.org/D9659 llvm-svn: 237097	2015-05-12 07:33:07 +00:00
Eric Christopher	2ba04d1116	Migrate existing backends that care about software floating point to use the information in the module rather than TargetOptions. We've had and clang has used the use-soft-float attribute for some time now so have the backends set a subtarget feature based on a particular function now that subtargets are created based on functions and function attributes. For the one middle end soft float check go ahead and create an overloadable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases. Also remove the command line option that hard codes whether or not soft-float is set by using the attribute for all of the target specific test cases - for the generic just go ahead and add the attribute in the one case that showed up. llvm-svn: 237079	2015-05-12 01:26:05 +00:00
Pirama Arumuga Nainar	e8329e15ec	[X86] Updates to X86 backend for f16 promotion Summary: r235215 adds support for f16 to be considered as a load/store type and promote f16 operations to f32. This patch has miscellaneous fixes for the X86 backend so all f16 operations are handled: 1. Set loadextaction for f16 vectors to expand. 2. Handle FP_EXTEND in a switch statement when handling v2f32 3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or (store (SINT_TO_FP )) to a FILD. Tests included. Reviewers: ab, srhines, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9092 llvm-svn: 237004	2015-05-11 17:14:39 +00:00
Elena Demikhovsky	d5a28d81ad	Fixed compilation warning, NFC. llvm-svn: 236972	2015-05-11 06:23:41 +00:00
Elena Demikhovsky	f25b492812	AVX-512: Added SKX instructions and intrinsics: {add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236971	2015-05-11 06:05:05 +00:00
Elena Demikhovsky	1a04d86baa	AVX-512: fixed UINT_TO_FP operation for 512-bit types. llvm-svn: 236955	2015-05-10 14:23:52 +00:00
Elena Demikhovsky	1ed7ba869f	AVX-512: fixed a bug in i1 vectors lowering llvm-svn: 236947	2015-05-10 10:33:32 +00:00
Arnold Schwaighofer	d6f4926afa	ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects The code that builds the dependence graph assumes that two PseudoSourceValues don't alias. In a tail calling function two FixedStackObjects might refer to the same location. Worse 'immutable' fixed stack objects like function arguments are not immutable and will be clobbered. Change this so that a load from a FixedStackObject is not invariant in a tail calling function and don't return a PseudoSourceValue for an instruction in tail calling functions when building the dependence graph so that we handle function arguments conservatively. Fix for PR23459. rdar://20740035 llvm-svn: 236916	2015-05-08 23:52:00 +00:00
Pete Cooper	c8837e431b	[X86] Fast-ISel was incorrectly always killing the source of a truncate. A trunc from i32 to i1 on x86_64 generates an instruction such as %vreg19<def> = COPY %vreg9:sub_8bit<kill>; GR8:%vreg19 GR32:%vreg9 However, the copy here should only have the kill flag on the 32-bit path, not the 64-bit one. Otherwise, we are killing the source of the truncate which could be used later in the program. llvm-svn: 236890	2015-05-08 18:29:42 +00:00
Pat Gavlin	c022b8d288	Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware. This changes the shape of the statepoint intrinsic from: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args) to: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args) This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back. In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation. Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops. Differential Revision: http://reviews.llvm.org/D9501 llvm-svn: 236888	2015-05-08 18:07:42 +00:00
Andrea Di Biagio	6f502af8bb	[X86] Teach 'getTargetShuffleMask' how to look through ISD::WrapperRIP when decoding a PSHUFB mask. The function 'getTargetShuffleMask' already knows how to deal with PSHUFB nodes where the mask node is a load from constant pool, and the constant pool node is wrapped by a X86ISD::Wrapper node. This patch extends that logic by teaching it how to also look through X86ISD::WrapperRIP. This helps function combineX86ShufflesRecursively to combine more shuffle sequences containing PSHUFB nodes if we are in RIPRel PIC mode. Before this change, llc (with -relocation-model=pic -march=x86-64) was unable to decode a pshufb where the mask was loaded from a constant pool. For example, the no-op shuffle from test 'x86-fold-pshufb.ll' was not folded into its operand, so instead of generating a single 'movaps' the backend always generated a sub-optimal 'movdqa + pshufb' sequence. Added test x86-fold-pshufb.ll. llvm-svn: 236863	2015-05-08 15:11:07 +00:00
Denis Protivensky	fb947c1f3d	Fix gcc warning of different enum and non-enum types in ternary Make '0' literal explicitly unsigned with '0u'. This appeared after r236775. llvm-svn: 236838	2015-05-08 12:21:03 +00:00
Matthias Braun	3b3ecc12b2	Change getTargetNodeName() to produce compiler warnings for missing cases, fix them llvm-svn: 236775	2015-05-07 21:33:59 +00:00
Sanjay Patel	4c1ff7e364	Use intrinsic pattern to make a simpler match This is a follow-on to r236740 where I took Andrea's advice in D9504 to remove a redundant pattern...except that I removed the wrong pattern! AFAICT, there is no change in the final code produced because subsequent passes would clean up the extra instructions created by the more complicated pattern. llvm-svn: 236743	2015-05-07 16:51:12 +00:00
Sanjay Patel	d786f12843	[x86] eliminate unnecessary shuffling/moves with unary scalar math ops (PR21507) Finish the job that was abandoned in D6958 following the refactoring in http://reviews.llvm.org/rL230221: 1. Uncomment the intrinsic def for the AVX r_Int instruction. 2. Add missing r_Int entries to the load folding tables; there are already tests that check these in "test/Codegen/X86/fold-load-unops.ll", so I haven't added any more in this patch. 3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ). So instead of this: movaps %xmm0, %xmm1 rcpss %xmm1, %xmm1 movss %xmm1, %xmm0 We should now get: rcpss %xmm0, %xmm0 And instead of this: vsqrtss %xmm0, %xmm0, %xmm1 vblendps $1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3] We should now get: vsqrtss %xmm0, %xmm0, %xmm0 Differential Revision: http://reviews.llvm.org/D9504 llvm-svn: 236740	2015-05-07 15:48:53 +00:00
Elena Demikhovsky	28f6bb84a5	AVX-512: Added all forms of FP compare instructions for KNL and SKX. Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec. By Igor Breger (igor.breger@intel.com) llvm-svn: 236714	2015-05-07 11:24:42 +00:00
Sanjoy Das	c8143387ff	[X86MCInst] Move LowerSTATEPOINT to inside X86AsmPrinter. NFC. llvm-svn: 236676	2015-05-06 23:53:26 +00:00
Sanjoy Das	c17aa0b859	[X86MCInst] Clean up LowerSTATEPOINT: variable names. NFC. llvm-svn: 236675	2015-05-06 23:53:24 +00:00
Pete Cooper	f7321a38fb	[x86] Fix register class of folded load index reg. When folding a load into another instruction, we need to fix the class of the index register. Otherwise, it could be something like GR64 not GR64_NOSP and would fail the machine verifier. llvm-svn: 236644	2015-05-06 21:37:19 +00:00
Wei Mi	338b822ab9	[X86] Disable loop unrolling in loop vectorization pass when VF is 1. The patch disables unrolling in the loop vectorization pass when VF==1 on the x86 architecture, by setting MaxInterleaveFactor to 1. Unrolling in the loop vectorization pass may introduce the cost of an overflow check, a memory boundary check and extra prologue/epilogue code when the regular unroller unrolls the loop another time. Disabling it when VF==1 removes the unnecessary cost on x86. The same can be done for other platforms after verifying that interleaving/memory bound checking is not perf critical on those platforms. Differential Revision: http://reviews.llvm.org/D9515 llvm-svn: 236613	2015-05-06 17:12:25 +00:00
NAKAMURA Takumi	2655f48b30	Revert r236546, "propagate IR-level fast-math-flags to DAG nodes (NFC)" It caused undefined behavior. llvm-svn: 236600	2015-05-06 14:03:12 +00:00
Pete Cooper	5b7d6ac920	[X86 fast-isel] Constrain the index reg class to not include SP. The index reg on instructions with complex address modes is a GPR64_NOSP. Constrain it to appease the machine verifier. llvm-svn: 236557	2015-05-05 23:41:53 +00:00
Sanjay Patel	b7125c62c7	propagate IR-level fast-math-flags to DAG nodes (NFC) This patch adds the minimum plumbing necessary to use IR-level fast-math-flags (FMF) in the backend without actually using them for anything yet. This is a follow-on to: http://reviews.llvm.org/rL235997 ...which split the existing nsw / nuw / exact flags and FMF into their own struct. There are 2 structural changes here: 1. The main diff is that we're preparing to extend the optimization flags to affect more than just binary SDNodes. Eg, IR intrinsics ( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes that don't even exist in IR such as FMA, FNEG, etc. 2. The other change is that we're actually copying the FP fast-math-flags from the IR instructions to SDNodes. Differential Revision: http://reviews.llvm.org/D8900 llvm-svn: 236546	2015-05-05 21:40:38 +00:00
Sanjay Patel	43ba3dfb7e	use range-based for-loop; NFC llvm-svn: 236544	2015-05-05 21:20:52 +00:00
Reid Kleckner	feb0da7d82	Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236360. This change exposed a bug in WinEHPrepare by opting win32 code into EH preparation. We already knew that WinEHPrepare has bugs, and is the status quo for x64, so I don't think that's a reason to hold off on this change. I disabled exceptions in the sanitizer tests in r236505 and an earlier revision. llvm-svn: 236508	2015-05-05 17:44:16 +00:00
Quentin Colombet	c82cc9dc57	[ShrinkWrap] Add (a simplified version) of shrink-wrapping. This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exit blocks. As an example and to avoid introducing regressions, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. Context Currently we insert the prologue and epilogue of the method/function in the entry and exit blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. Motivating example Let us consider the following function that performs a call in only one branch of an if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives to ease readability): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call.
Proposed Solution This patch introduces a new machine pass that performs the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need a hack to properly avoid frequently executed points. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overridden on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block. Design Decisions 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save points and restore points; the impacted components would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore points instead of one pointer. - PEI: Should loop over the save points and restore points. Anyhow, at least for this first iteration, I do not believe it is worthwhile to support the complex cases. We should revisit that when we have motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507	2015-05-05 17:38:16 +00:00
Reid Kleckner	92f8523946	[X86] Fix assertion while DAG combining offsets and ExternalSymbols ExternalSymbol nodes do not contain offsets, unlike GlobalValue nodes. llvm-svn: 236471	2015-05-04 23:22:36 +00:00
Elena Demikhovsky	16b6cc68cf	AVX-512: added calling convention for i1 vectors in 32-bit mode. Fixed some bugs in extend/truncate for AVX-512 target. Removed VBROADCASTM (masked broadcast) node, since it is not used any more. llvm-svn: 236420	2015-05-04 12:40:50 +00:00
Elena Demikhovsky	40362f45c8	AVX-512: added integer "add" and "sub" instructions with saturation for SKX with intrinsics and tests by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236418	2015-05-04 12:35:55 +00:00
Elena Demikhovsky	5b00c277f4	AVX-512: Added VPACK* instructions forms for KNL and SKX and their intrinsics by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236414	2015-05-04 09:14:02 +00:00
Elena Demikhovsky	adaab7f0c7	Masked gather and scatter intrinsics - enabled codegen for KNL. llvm-svn: 236394	2015-05-03 07:12:25 +00:00
Simon Pilgrim	d84912935d	[SSE2] Minor tidyup of v16i8 SHL lowering. NFC. Removed code that was replicating v8i16 'shift + mask' implementation that is done more nicely by making use of LowerScalarImmediateShift llvm-svn: 236388	2015-05-02 14:42:43 +00:00
Reid Kleckner	8a0256ee3c	Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236359. Things are still broken despite testing. :( llvm-svn: 236360	2015-05-01 22:50:14 +00:00
Reid Kleckner	c7b4ae0218	Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236340. llvm-svn: 236359	2015-05-01 22:40:25 +00:00
Reid Kleckner	a471390f42	Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236339, it breaks the win32 clang-cl self-host. llvm-svn: 236340	2015-05-01 20:14:04 +00:00
Reid Kleckner	ec21431851	[WinEH] Add an EH registration and state insertion pass for 32-bit x86 This pass is responsible for constructing the EH registration object that gets linked into fs:00, which is all it does in this change. In the future, it will also insert stores to update the EH state number. I considered keeping this functionality in WinEHPrepare, but it's pretty separable and X86 specific. It has conceptually very little to do with the task of WinEHPrepare, which is currently outlining. WinEHPrepare is also in theory useful on ARM, but this logic is pretty x86 specific. Reviewers: andrew.w.kaylor, majnemer Differential Revision: http://reviews.llvm.org/D9422 llvm-svn: 236339	2015-05-01 20:04:54 +00:00
Reid Kleckner	6e67433faf	[X86] Use 4 byte preferred aggregate alignment on Win32 This helps reduce the frequency of stack realignment prologues in 32-bit X86 Windows code. Before this change and the corresponding clang change, we would take the max of the type preferred alignment and the explicit alignment on the alloca. If you don't override aggregate alignment in datalayout, you get a default of 8. This dates back to 2007 / r34356, and changing it seems prohibitively difficult at this point. llvm-svn: 236270	2015-04-30 22:11:59 +00:00
Daniel Jasper	6977e1bcb9	Silence unused warning in non-assert builds. llvm-svn: 236213	2015-04-30 09:01:21 +00:00
Elena Demikhovsky	201b5c4641	Masked gather and scatter - added DAGCombine visitors and AVX-512 instruction selection patterns. All other patches, including tests will follow. http://reviews.llvm.org/D7665 llvm-svn: 236211	2015-04-30 08:38:48 +00:00
Simon Pilgrim	07a58b66cf	[SSE] Fix for MUL v16i8 on pre-SSE41 targets (PR23369). Sign extension of i8 to i16 was placing the unpacked bytes in the lower byte instead of the upper byte. llvm-svn: 236209	2015-04-30 08:23:16 +00:00
Pete Cooper	00f179e7a1	Change x86 CMOVE_F to read its source, not write it. This was breaking sqlite with the machine verifier because operand 0 was a def according to tablegen, but didn't have the 'isDef' flag set. Looking at the ISA, it's clear that this operand is a source as writing to st(0) is implicit. So move the operand to the correct place in the td file. rdar://problem/20751584 llvm-svn: 236183	2015-04-29 23:51:33 +00:00
Reid Kleckner	97ba8a8b4c	[X86] Avoid mangling frameescape labels x86 Windows uses the '_' prefix for all global symbols, and this was mistakenly being applied to frameescape labels, which are not externally visible global symbols. They use the private global prefix 'L'. The right way to fix this is probably to stop masquerading this label as an ExternalSymbol and create a new SDNode type. These labels are not "external", and we know they will be resolved by assembly time. Having a custom SDNode type would allow us to do better X86 address mode matching, so it's probably worth doing eventually. llvm-svn: 236123	2015-04-29 16:46:01 +00:00
Elena Demikhovsky	dda40328e9	fixed 80-chars; NFC llvm-svn: 236093	2015-04-29 08:49:57 +00:00
Eric Christopher	0173cdac82	Reuse a lookup in an assert. llvm-svn: 236054	2015-04-28 22:38:35 +00:00
Sanjay Patel	7bcb75e23d	[x86] remove RCPPS and RSQRTPS intrinsic instruction definitions We don't need codegen-only intrinsic instructions for the vector forms of these instructions. This makes the reciprocal estimate instruction lowering identical to how we handle normal square roots: (V)SQRTPS / (V)SQRTPD. No existing regression tests fail with this patch. Differential Revision: http://reviews.llvm.org/D9301 llvm-svn: 236013	2015-04-28 18:48:45 +00:00
Sanjay Patel	877a22a3bc	move IR-level optimization flags into their own struct This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900). In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment, we should eventually share this data between the IR passes and the backend. We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to use the new struct. The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a data member to the subclass. This means we don't have to repeat all of the get/set methods per flag, but we're potentially adding size to all nodes of this subclass type. In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at least one free byte to play with due to struct alignment. Differential Revision: http://reviews.llvm.org/D9325 llvm-svn: 235997	2015-04-28 16:39:12 +00:00
Elena Demikhovsky	224807ff06	Fixed crash of variable shift inst on AVX2 https://llvm.org/bugs/show_bug.cgi?id=22955 llvm-svn: 235993	2015-04-28 14:46:35 +00:00
Sergey Dmitrouk	7bfbc12128	Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes" [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989	2015-04-28 14:05:47 +00:00
Daniel Jasper	39180626db	Revert "[DebugInfo] Add debug locations to constant SD nodes" This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987	2015-04-28 13:38:35 +00:00
Sergey Dmitrouk	01a4dcd3bb	[DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977	2015-04-28 11:56:37 +00:00
Elena Demikhovsky	23e33119e3	AVX-512: Added "pandn" intrinsics set by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 235971	2015-04-28 08:12:42 +00:00
Sanjay Patel	bc20f6f5ab	remove obsolete pattern matches for scalar SSE ops The blendi pattern should always replace the insertps pattern after: http://reviews.llvm.org/rL232850 http://reviews.llvm.org/rL235124 llvm-svn: 235930	2015-04-27 22:23:17 +00:00
Sanjay Patel	8f76b5a451	fix 80-cols; NFC llvm-svn: 235902	2015-04-27 17:45:44 +00:00
Sanjay Patel	5c239c3e21	fix typos; NFC llvm-svn: 235896	2015-04-27 17:03:31 +00:00
Elena Demikhovsky	489127abd6	AVX-512: added calling conventions for i1 vectors. Fixed bug: https://llvm.org/bugs/show_bug.cgi?id=20724 llvm-svn: 235889	2015-04-27 15:11:19 +00:00
Elena Demikhovsky	3485573818	AVX-512: Extend/Truncate operations for SKX, SETCC for bit-vectors llvm-svn: 235875	2015-04-27 12:57:59 +00:00
Simon Pilgrim	8174379b38	[X86][SSE] Add v16i8/v32i8 multiplication support Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized. The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result. Differential Revision: http://reviews.llvm.org/D9115 llvm-svn: 235837	2015-04-27 07:55:46 +00:00
Lang Hames	5d79e39b45	[AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr. AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a reference for this is crufty. llvm-svn: 235752	2015-04-24 19:11:51 +00:00
Sanjay Patel	6006202c04	[x86] Add store-folded memop patterns for vcvtps2ph Differential Revision: http://reviews.llvm.org/D7296 llvm-svn: 235517	2015-04-22 16:11:19 +00:00
Andrea Di Biagio	7008d5a01f	[X86][AVX] Fix failure due to a missing ISel pattern to select VBROADCAST nodes (PR23259). This fixes a regression introduced at revision 218263. On AVX, if we optimize for size, a splat build_vector of a load is lowered into a VBROADCAST node. This is done even if the value type of the splat build_vector node is v2i64. Since AVX doesn't support v2f64/v2i64 broadcasts, revision 218263 added two extra tablegen patterns to allow selecting a VMOVDDUPrm from an X86VBroadcast where the scalar element comes from a loadi64/loadf64. However, revision 218263 forgot to add an extra fallback pattern for the case where we have a X86VBroadcast of a loadi64 with multiple uses. This patch adds the missing tablegen pattern in X86InstrSSE.td. This patch also adds an extra test to 'splat-for-size.ll' to verify that ISel doesn't crash with a 'fatal error in the backend' due to a missing AVX pattern to select v2i64 X86ISD::BROADCAST nodes. llvm-svn: 235509	2015-04-22 14:53:39 +00:00
Lang Hames	76014c544d	[patchpoint] Add support for symbolic patchpoint targets to SelectionDAG and the X86 backend. The code generated for symbolic targets is identical to the code generated for constant targets, except that a relocation is emitted to fix up the actual target address at link-time. This allows IR and object files containing patchpoints to be cached across JIT-invocations where the target address may change. llvm-svn: 235483	2015-04-22 06:02:31 +00:00
Sanjay Patel	b365b56ba8	[x86] allow 64-bit extracted vector element integer stores on a 32-bit system With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system even though 64-bit integers are not legal types. So instead of producing this: pshufd $229, %xmm0, %xmm1 ## xmm1 = xmm0[1,1,2,3] movd %xmm0, (%eax) movd %xmm1, 4(%eax) We can do: movq %xmm0, (%eax) This is a fix for the problem noted in D7296. Differential Revision: http://reviews.llvm.org/D9134 llvm-svn: 235460	2015-04-22 00:24:30 +00:00
Matthias Braun	d300744bcd	X86: Match for X86ISD nodes in LowerBUILD_VECTOR instead of BUILD_VECTORCombine There doesn't seem to be a reason to perform this target ISD node matching in an DAGCombine, moving it to lowering fixes PR23296. Differential Revision: http://reviews.llvm.org/D9137 llvm-svn: 235394	2015-04-21 17:21:36 +00:00
Elena Demikhovsky	13b5e09c11	AVX-512: Added VPMOVx2M instructions for SKX, fixed encoding of VPMOVM2x. llvm-svn: 235385	2015-04-21 14:38:31 +00:00
Elena Demikhovsky	61a239b83c	AVX-512: Added VPTESTM and VPTESTNM instructions for SKX llvm-svn: 235383	2015-04-21 13:13:46 +00:00
Elena Demikhovsky	abf0138a81	AVX-512: Added logical and arithmetic instructions for SKX by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 235375	2015-04-21 10:27:40 +00:00
Simon Pilgrim	5ad623ef36	[X86][SSE] Provide execution domains for scalar floating point operations This is an updated version of Chandler's patch D7402 that got accepted but never committed, and has bit-rotted a bit since. I've updated the execution domain declarations to match the approach of the packed templates and also added some extra scalar unary tests. Differential Revision: http://reviews.llvm.org/D9095 llvm-svn: 235372	2015-04-21 08:40:22 +00:00
Matthias Braun	3df6448424	X86: Do not select X86 custom vector nodes if operand types don't match X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected if the operand types do not match the result type because vector type legalization cannot deal with this for custom nodes. Testcase X86ISD::ADDSUB is attached. I could not create a testcase for the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296 Differential Revision: http://reviews.llvm.org/D9120 llvm-svn: 235367	2015-04-21 01:13:41 +00:00
Andrea Di Biagio	b10889be03	[X86][FastIsel] Fix assertion failure when selecting int-to-double conversion (PR23273). This fixes a regression introduced at revision 231243. The target-independent selection algorithm in FastISel knows how to select a SINT_TO_FP if the target is SSE but not AVX. That is because on X86, the tablegen'd 'fastEmit' functions know how to select CVTSI2SSrr and CVTSI2SDrr. Method X86FastISel::X86SelectSIToFP was therefore working under the wrong assumption that the target was AVX. That assumption was incorrect since we can have a target that is neither AVX nor SSE. So, rather than asserting for the presence of AVX, we should have had an early exit from 'X86SelectSIToFP' if the target was not AVX. This patch fixes the issue replacing the invalid assertion with an early exit. Thanks to Dimitry Andric for reporting this problem and for providing a small reproducible testcase. Added test pr23273.ll. llvm-svn: 235295	2015-04-20 11:56:59 +00:00
Simon Pilgrim	292688507a	[X86][SSE] Fix for getScalarValueForVectorElement to detect scalar sources requiring truncation. The fix ensures that scalar sources inserted into a vector are the correct bit size. Integer scalar sources from BUILD_VECTOR and SCALAR_TO_VECTOR nodes may require truncation that this function doesn't currently support. llvm-svn: 235281	2015-04-19 22:16:49 +00:00
Craig Topper	868247e7c2	Remove unnecessary include and probably a layering violation. llvm-svn: 235262	2015-04-19 00:57:33 +00:00
Sanjay Patel	d7195a62c2	[X86, AVX] add an exedepfix entry for vmovq == vmovlps == vmovlpd This is the AVX extension of r235014: http://llvm.org/viewvc/llvm-project?view=revision&revision=235014 Review: http://reviews.llvm.org/D8691 llvm-svn: 235210	2015-04-17 17:02:37 +00:00
Rafael Espindola	3882872dda	Move AliasedSymbol to MachObjectWriter. It was only used by MachO. Part of pr19627. llvm-svn: 235185	2015-04-17 12:28:43 +00:00
Sanjay Patel	e7d64b577c	[X86] add an exedepfix entry for movq == movlps == movlpd This is a 1-line patch (with a TODO for AVX because that will affect even more regression tests) that lets us substitute the appropriate 64-bit store for the float/double/int domains. It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and 0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice. Differential Revision: http://reviews.llvm.org/D8691 llvm-svn: 235014	2015-04-15 15:47:51 +00:00
Sanjay Patel	f5d182f516	[x86] Implement combineRepeatedFPDivisors Set the transform bar at 2 divisions because the fastest current x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle latency (best case) relative to a 5 cycle multiplier. So that's the worst case for this transform (no latency win), but multiplies are obviously pipelined while divisions are not, so there's still a big throughput win which we would expect to show up in typical FP code. These are the sequences I'm comparing: divss %xmm2, %xmm0 mulss %xmm1, %xmm0 divss %xmm2, %xmm0 Becomes: movss LCPI0_0(%rip), %xmm3 ## xmm3 = mem[0],zero,zero,zero divss %xmm2, %xmm3 mulss %xmm3, %xmm0 mulss %xmm1, %xmm0 mulss %xmm3, %xmm0 [Ignore for the moment that we don't optimize the chain of 3 multiplies into 2 independent fmuls followed by 1 dependent fmul...this is the DAG version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that, then the transform becomes even more profitable on all targets.] Differential Revision: http://reviews.llvm.org/D8941 llvm-svn: 235012	2015-04-15 15:22:55 +00:00
Rafael Espindola	aeb03deb16	Use raw_pwrite_stream in the object writer/streamer. The ELF object writer will take advantage of that in the next commit. llvm-svn: 234950	2015-04-14 22:14:34 +00:00
Krzysztof Parzyszek	3efcf81e03	Allow memory intrinsics to be tail calls llvm-svn: 234764	2015-04-13 17:16:45 +00:00
Alexander Kornienko	71412ece39	Use 'override/final' instead of 'virtual' for overridden methods The patch is generated using clang-tidy misc-use-override check. This command was used: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \ -checks='-*,misc-use-override' -header-filter='llvm\|clang' \ -j=32 -fix -format http://reviews.llvm.org/D8925 llvm-svn: 234679	2015-04-11 02:11:45 +00:00
Benjamin Kramer	f6149322d4	Reduce dyn_cast<> to isa<> or cast<> where possible. No functional change intended. llvm-svn: 234586	2015-04-10 11:24:51 +00:00
Rafael Espindola	adc15d13f8	clang-format bits of code to make a followup patch easy to read. llvm-svn: 234519	2015-04-09 18:32:58 +00:00
Rafael Espindola	77bf4fdaa6	Don't repeat name in comment. NFC. llvm-svn: 234506	2015-04-09 17:10:57 +00:00
Rafael Espindola	d83c383098	Refactor a lot of duplicated code for stub output. This also moves it earlier so that they are produced before we print an end symbol for the data section. llvm-svn: 234315	2015-04-07 13:42:44 +00:00
Simon Pilgrim	a5a17c0b23	[X86][SSE] Use (V)PINSRB for direct byte insertion in 16i8 buildvector on SSE4.1 targets This patch allows SSE4.1 targets to use (V)PINSRB to create 16i8 vectors by inserting i8 scalars directly into a XMM register instead of merging pairs of i8 scalars into a i16 and using the SSE2 PINSRW instruction. This allows folding of byte loads and reduces scalar register usage as well. Differential Revision: http://reviews.llvm.org/D8839 llvm-svn: 234193	2015-04-06 18:39:00 +00:00
Craig Topper	fcc8191397	[X86] Apply AddedComplexity consistently for similar patterns. This keeps them together in the DAGISel tables and reduces table size slightly. llvm-svn: 234086	2015-04-04 04:22:12 +00:00
Craig Topper	4b79b11952	[X86] Add a comment about the change in r234075. llvm-svn: 234079	2015-04-04 02:31:43 +00:00
Craig Topper	54ac1d95cd	[X86] Don't use GR64 register 'and with immediate' instructions if the immediate is zero in the upper 33-bits or upper 57-bits. Use GR32 instructions instead. Previously the patterns didn't have high enough priority and we would only use the GR32 form if only the upper 32 or 56 bits were zero. Fixes PR23100. llvm-svn: 234075	2015-04-04 02:08:20 +00:00
David Majnemer	694a466675	[WinEH] Sink UnwindHelp completely out of IR We don't need to represent UnwindHelp in IR. Instead, we can use the knowledge that we are emitting the parent function to decide if we should create the UnwindHelp stack object. llvm-svn: 234061	2015-04-03 22:32:26 +00:00
Duncan P. N. Exon Smith	054ffcf2c3	CodeGen: Assert that inlined-at locations agree As a follow-up to r234021, assert that a debug info intrinsic variable's `MDLocalVariable::getInlinedAt()` always matches the `MDLocation::getInlinedAt()` of its `!dbg` attachment. The goal here is to get rid of `MDLocalVariable::getInlinedAt()` entirely (PR22778), but I'll let these assertions bake for a while first. If you have an out-of-tree backend that just broke, you're probably attaching the wrong `DebugLoc` to a `DBG_VALUE` instruction. The one you want is the location that was attached to the corresponding `@llvm.dbg.declare` or `@llvm.dbg.value` call that you started with. llvm-svn: 234038	2015-04-03 19:20:26 +00:00
Simon Pilgrim	4f794f11fb	[X86] Added SSE4.2 CRC32 memory folding patterns + tests llvm-svn: 234013	2015-04-03 14:24:40 +00:00
Simon Pilgrim	380961e86b	[X86][3DNow] Added 3DNow! memory folding patterns + tests llvm-svn: 234008	2015-04-03 11:50:30 +00:00
Peter Collingbourne	b2266efe11	MC: For variable symbols, maintain MCSymbol::Section as a cache. Fixes PR19582. Previously, when an asm assignment (.set or =) was created, we would look up the section immediately in MCSymbol::setVariableValue. This caused symbols to receive the wrong section if the RHS of the assignment had not been seen yet. This had a knock-on effect in the object file emitters, causing them to emit extra symbols, or to give symbols the wrong visibility or the wrong section. For example, in the following asm: .data .Llocal: .text leaq .Llocal1(%rip), %rdi .Llocal1 = .Llocal2 .Llocal2 = .Llocal the first assignment would give .Llocal1 a null section, which would never get fixed up by the second assignment. This would cause the ELF object file emitter to consider .Llocal1 to be an undefined symbol and give it external linkage, even though .Llocal1 should not have been emitted at all in the object file. Or in the following asm: alias_to_local = Ltmp0 Ltmp0: the Mach-O object file emitter would give the alias_to_local symbol a n_type of N_SECT and a n_sect of 0. This is invalid under the Mach-O specification, which requires N_SECT symbols to receive a non-zero section number if the symbol is defined in a section in the object file. https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachORuntime/#//apple_ref/c/tag/nlist After this change we do not look up the section when the assignment is created, but instead look it up on demand and store it in Section, which is treated as a cache if the symbol is a variable symbol. This change also fixes a bug in MCExpr::FindAssociatedSection. Previously, if we saw a subtraction, we would return the first referenced section, even in cases where we should have been returning the absolute pseudo-section. Now we always return the absolute pseudo-section for expressions that subtract two section-derived expressions. This isn't always correct (e.g. if one of the sections ends up being laid out at an absolute address), but it's probably the best we can do without more context. This allows us to remove code in two places where we appear to have been working around this bug, in MachObjectWriter::markAbsoluteVariableSymbols and in X86AsmPrinter::EmitStartOfAsmFile. Re-applies r233595 (aka D8586), which was reverted in r233898. Differential Revision: http://reviews.llvm.org/D8798 llvm-svn: 233995	2015-04-03 01:46:11 +00:00
Sanjay Patel	ee1b1c4540	[AVX] Improve insertion of i8 or i16 into low element of 256-bit zero vector Without this patch, we split the 256-bit vector into halves and produced something like: movzwl (%rdi), %eax vmovd %eax, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vblendps $15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7] Now, we eliminate the xor and blend because those zeros are free with the vmovd: movzwl (%rdi), %eax vmovd %eax, %xmm0 This should be the final fix needed to resolve PR22685: https://llvm.org/bugs/show_bug.cgi?id=22685 llvm-svn: 233941	2015-04-02 20:21:52 +00:00
Sanjay Patel	c17be4ec1a	[X86, AVX] adjust tablegen patterns to generate better code for scalar insertion into zero vector (PR23073) For code like this: define <8 x i32> @load_v8i32() { ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0> } We produce this AVX code: _load_v8i32: ## @load_v8i32 movl $7, %eax vmovd %eax, %xmm0 vxorps %ymm1, %ymm1, %ymm1 vblendps $1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7] retq There are at least 2 bugs in play here: We're generating a blend when a move scalar does the same job using 2 less instruction bytes (see FIXMEs). We're not matching an existing pattern that would eliminate the xor and blend entirely. The zero bytes are free with vmovd. The 2nd fix involves an adjustment of "AddedComplexity" [1] and mostly masks the 1st problem. [1] AddedComplexity has close to no documentation in the source. The best we have is this comment: "roughly corresponds to the number of nodes that are covered". It appears that x86 has bastardized this definition by inflating its values for some other undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!). I searched my way to this page: https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ Differential Revision: http://reviews.llvm.org/D8794 llvm-svn: 233931	2015-04-02 17:56:17 +00:00
Elena Demikhovsky	74d944b41a	AVX-512: intrinsics for VPADD, VPMULDQ and VPSUB by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 233906	2015-04-02 10:51:40 +00:00
Peter Collingbourne	08b6124ef0	Revert r233595, "MC: For variable symbols, maintain MCSymbol::Section as a cache." llvm-svn: 233898	2015-04-02 07:02:51 +00:00
Benjamin Kramer	a711d21dc6	[X86] Don't accidentally select shll $1, %eax when shrinking an immediate. addl has higher throughput and this was needlessly picking a suboptimal encoding causing PR23098. I wish there was a way of doing this without further duplicating tbl- generated patterns, but so far I haven't found one. llvm-svn: 233832	2015-04-01 19:01:09 +00:00
David Majnemer	e7ba02b466	[WinEH] Generate .xdata for catch handlers This lets us catch exceptions in simple cases. N.B. Things that do not work include (but are not limited to): - Throwing from within a catch handler. - Catching an object with a named catch parameter. - 'CatchHigh' is fictitious, we aren't sure of its purpose. - We aren't entirely efficient with regards to the number of EH states that we generate. - IP-to-State tables are sensitive to the order of emission. llvm-svn: 233767	2015-03-31 22:35:44 +00:00
Sanjay Patel	0a3c26d356	typo; NFC llvm-svn: 233761	2015-03-31 21:24:47 +00:00
Sanjay Patel	e16f423a6a	[X86, AVX] fix zero-extending integer operand load patterns to use integer instructions This is a follow-on to r233704 and another partial fix for PR22685: https://llvm.org/bugs/show_bug.cgi?id=22685 llvm-svn: 233724	2015-03-31 18:43:43 +00:00
Sanjay Patel	c33fdfa218	[X86, AVX] try to lowerVectorShuffleAsElementInsertion() for all 256-bit vector sub-types I suggested this change in D7898 (http://llvm.org/viewvc/llvm-project?view=revision&revision=231354) It improves the v4i64 case although not optimally. This AVX codegen: vmovq {{.*#+}} xmm0 = mem[0],zero vxorpd %ymm1, %ymm1, %ymm1 vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3] Becomes: vmovsd {{.*#+}} xmm0 = mem[0],zero Unfortunately, this doesn't completely solve PR22685. There are still at least 2 problems under here: We're not handling v32i8 / v16i16. We're not getting the FP / int domains right for instruction selection. But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64, I'm submitting this patch ahead of fixing the above. Differential Revision: http://reviews.llvm.org/D8341 llvm-svn: 233704	2015-03-31 16:32:11 +00:00
Rafael Espindola	9790d5e711	Fix the operand encoding in the test instruction. Fixes pr22995. llvm-svn: 233686	2015-03-31 12:31:55 +00:00
Ahmed Bougacha	5df03ee9ed	[X86] Generate MOVNT for all vector types. We used to miss non-Q YMM integer vectors, and, non-Q/D XMM integer vectors. While there, change the v4i32 patterns to prefer MOVNTDQ. llvm-svn: 233668	2015-03-31 03:16:51 +00:00
Eric Christopher	fdc8ea88a6	Replace the MCSubtargetInfo parameter with a Triple when creating an MCInstPrinter. Update all callers and use where we wanted a Triple previously. llvm-svn: 233648	2015-03-31 00:10:04 +00:00
Eric Christopher	cd7b8759a0	Remove unused MCSubtargetInfo argument from the X86 MCInstPrinter ctors. llvm-svn: 233614	2015-03-30 22:16:37 +00:00
Eric Christopher	f6dc0ee979	Remove unused Target argument from MCInstPrinter ctor functions. llvm-svn: 233607	2015-03-30 21:52:21 +00:00
Peter Collingbourne	937a5dc713	MC: For variable symbols, maintain MCSymbol::Section as a cache. This fixes the visibility of symbols in certain edge cases involving aliases with multiple levels of indirection. Fixes PR19582. Differential Revision: http://reviews.llvm.org/D8586 llvm-svn: 233595	2015-03-30 20:41:21 +00:00
Yaron Keren	5d3d22628b	Remove more superfluous .str() and replace std::string concatenation with Twine. Following r233392, http://llvm.org/viewvc/llvm-project?rev=233392&view=rev. llvm-svn: 233555	2015-03-30 15:42:36 +00:00
Sanjay Patel	047467d3e1	more space; NFC llvm-svn: 233554	2015-03-30 15:31:32 +00:00
Elena Demikhovsky	0e38b477c5	AVX-512: blank lines, duplicated tests, no functional changes see comments http://reviews.llvm.org/D6835 llvm-svn: 233528	2015-03-30 09:29:28 +00:00
Elena Demikhovsky	28fa559e12	AVX-512: added intrinsics for VPAND, VPOR and VPXOR by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 233525	2015-03-30 08:30:34 +00:00
Craig Topper	1e86b7910d	[X86] Remove FeatureAES for 'corei7' CPU. 'corei7' should match 'nehalem' which doesn't have AES. Having AES and not PCLMUL makes 'corei7' halfway between Nehalem and Westmere. llvm-svn: 233517	2015-03-30 06:31:11 +00:00
Elena Demikhovsky	16b58bef45	AVX-512: Fixed the "commutative" property flag in VPANDN instruction By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 233489	2015-03-29 09:14:29 +00:00
Akira Hatanaka	806517ee93	[X86] Read the feature bits from the subtarget that is passed to printInst instead of from MCInstPrinter::AvailableFeatures. llvm-svn: 233485	2015-03-28 20:56:05 +00:00
Akira Hatanaka	3e71777770	Partially revert the changes I made in r233473 to keep the code concise. llvm-svn: 233474	2015-03-28 04:40:43 +00:00
Akira Hatanaka	be98742cf0	clang-format X86ATTInstPrinter.{h,cpp} before I make changes to these files. llvm-svn: 233473	2015-03-28 04:25:41 +00:00
Akira Hatanaka	6a2e278ec7	[MCInstPrinter] Enable MCInstPrinter to change its behavior based on the per-function subtarget. Currently, code-gen passes the default or generic subtarget to the constructors of MCInstPrinter subclasses (see LLVMTargetMachine::addPassesToEmitFile), which enables some targets (AArch64, ARM, and X86) to change their instprinter's behavior based on the subtarget feature bits. Since the backend can now use different subtargets for each function, instprinter has to be changed to use the per-function subtarget rather than the default subtarget. This patch takes the first step towards enabling instprinter to change its behavior based on the per-function subtarget. It adds a bit "PassSubtarget" to AsmWriter which tells table-gen to pass a reference to MCSubtargetInfo to the various print methods table-gen auto-generates. I will follow up with changes to instprinters of AArch64, ARM, and X86. llvm-svn: 233411	2015-03-27 20:36:02 +00:00
Yaron Keren	3856893d6d	Remove superfluous .str() and replace std::string concatenation with Twine. llvm-svn: 233392	2015-03-27 17:51:30 +00:00
Sanjay Patel	b7ffd7dd27	comment cleanup; NFC llvm-svn: 233293	2015-03-26 17:18:17 +00:00
Benjamin Kramer	6ae650a922	Remove outdated README-SSE.txt entries. llvm-svn: 233292	2015-03-26 17:12:16 +00:00
Sanjay Patel	eef94e25b4	Use SDValue bool checks; NFC intended llvm-svn: 233289	2015-03-26 16:55:43 +00:00
Andrea Di Biagio	e2ef12f536	[X86][FastIsel] Teach how to select vector load instructions. This patch teaches fast-isel how to select 128-bit vector load instructions. Added test CodeGen/X86/fast-isel-vecload.ll Differential Revision: http://reviews.llvm.org/D8605 llvm-svn: 233270	2015-03-26 11:29:02 +00:00
Sanjay Patel	57d34fb183	[X86, AVX] improve insertion into zero element of 256-bit vector This patch allows AVX blend instructions to handle insertion into the low element of a 256-bit vector for the appropriate data types. For f32, instead of: vblendps $1, %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1,2,3] vblendps $15, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7] we get: vblendps $1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7] For f64, instead of: vmovsd %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1] vblendpd $3, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1],ymm0[2,3] we get: vblendpd $1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3] For the hardware-neglected integer data types, I left a TODO comment in the code and added regression tests for a follow-on patch. Differential Revision: http://reviews.llvm.org/D8609 llvm-svn: 233199	2015-03-25 17:36:01 +00:00
Craig Topper	eb82fb3203	[X86] Remove GetCpuIDAndInfo, GetCpuIDAndInfoEx and DetectFamilyModel functions from X86 MC layer. They haven't been used since CPU autodetection was removed from X86Subtarget.cpp. llvm-svn: 233170	2015-03-25 04:16:50 +00:00
Reid Kleckner	b3c593a951	X86: Fix frameescape when not using an FP We can't use TargetFrameLowering::getFrameIndexOffset directly, because Win64 really wants the offset from the stack pointer at the end of the prologue. Instead, use X86FrameLowering::getFrameIndexOffsetFromSP(), which is a pretty close approximation of that. It fails to handle cases with interestingly large stack alignments, which is pretty uncommon on Win64 and is TODO. llvm-svn: 233137	2015-03-24 23:46:01 +00:00
Sanjay Patel	b1b1054a09	[X86, AVX] recognize shufflevector with zero input as a vperm2 (PR22984) vperm2x128 instructions have the special ability (aka free hardware capability) to shuffle zero values into a vector. This patch recognizes that type of shuffle and generates the appropriate control byte. https://llvm.org/bugs/show_bug.cgi?id=22984 Differential Revision: http://reviews.llvm.org/D8563 llvm-svn: 233100	2015-03-24 19:19:07 +00:00
Michael Kuperstein	1278cdeb94	Revert "Use std::bitset for SubtargetFeatures" This reverts commit r233055. It still causes buildbot failures (gcc running out of memory on several platforms, and a self-host failure on arm), although less than the previous time. llvm-svn: 233068	2015-03-24 12:56:59 +00:00
Michael Kuperstein	c6ff005c9e	Use std::bitset for SubtargetFeatures Previously, subtarget features were a bitfield with the underlying type being uint64_t. Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, this switches the features to use a bitset. No functional change. The first time this was committed (r229831), it caused several buildbot failures. At least some of the ARM ones were due to gcc/binutils issues, and should now be fixed. Differential Revision: http://reviews.llvm.org/D8542 llvm-svn: 233055	2015-03-24 09:17:25 +00:00
David Blaikie	f21503bde0	Refactor: Simplify boolean expressions in x86 target Simplify boolean expressions with `true` and `false` with `clang-tidy` Patch by Richard Thomson. Differential Revision: http://reviews.llvm.org/D8519 llvm-svn: 233002	2015-03-23 19:42:36 +00:00
Benjamin Kramer	6a9aa608f1	Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used. llvm-svn: 232998	2015-03-23 19:32:43 +00:00
David Majnemer	94a285c113	Silence a GCC warning llvm-svn: 232923	2015-03-22 21:27:10 +00:00
Simon Pilgrim	2b54cb1a02	Fixed MSVC compile warning issue introduced in r232837 - was reporting 'warning C4715: 'getType32' : not all control paths return a value' llvm-svn: 232913	2015-03-22 13:38:36 +00:00
Eric Christopher	3d3373d3e2	Cache the Function dependent subtarget on the MachineFunction. As preparation for removing the getSubtargetImpl() call from TargetMachine go ahead and flip the switch on caching the function dependent subtarget and remove the bare getSubtargetImpl call from the X86 port. As part of this add a few tests that show we can generate code and assemble on X86 based on features/cpu on the Function. llvm-svn: 232879	2015-03-21 03:13:10 +00:00
Sanjay Patel	34ad366455	[X86] Prefer blendps over insertps codegen for one special case With this patch, for this one exact case, we'll generate: blendps %xmm0, %xmm1, $1 instead of: insertps %xmm0, %xmm1, $0 If there's a memory operand available for load folding and we're optimizing for size, we'll still generate the insertps. The detailed performance data motivation for this may be found in D7866; in summary, blendps has 2-3x throughput vs. insertps on widely used chips. Differential Revision: http://reviews.llvm.org/D8332 llvm-svn: 232850	2015-03-20 21:19:52 +00:00
Benjamin Kramer	370163f28b	X86: Make helper functions static. NFC. llvm-svn: 232848	2015-03-20 21:07:30 +00:00
Rafael Espindola	2ca5cf8fba	Reorganize the x86 ELF relocation selection logic. The main differences are: * Split in 32 and 64 bit functions. * First switch on the Modifier so that we have only one non fully covered switch. * Map the fixup kind first to a x86_64 (or i386) specific enum, to make it easy to handle cases like X86::reloc_riprel_4byte_movq_load. * Switch on IsPCRel last, which reduces code duplication. Fixes pr22308. llvm-svn: 232837	2015-03-20 19:48:54 +00:00
Simon Pilgrim	2633b0d5ef	Stripped trailing whitespace. NFC. llvm-svn: 232822	2015-03-20 16:08:17 +00:00
Rafael Espindola	3b034bc8b6	Reduce indentation after return. NFC. llvm-svn: 232814	2015-03-20 14:33:25 +00:00
Rafael Espindola	c8d031b1da	Use early returns. NFC. llvm-svn: 232813	2015-03-20 14:23:46 +00:00
Rafael Espindola	2aacc6b69e	Fold a llvm_unreachable into an assert. NFC. llvm-svn: 232811	2015-03-20 13:50:15 +00:00
Rafael Espindola	28d3ba7bcc	clang-format a function. NFC. llvm-svn: 232810	2015-03-20 13:47:40 +00:00
Sanjay Patel	2dc56fc992	move insert, extract, concat helper functions closer to related helper functions; NFCI llvm-svn: 232781	2015-03-19 23:04:25 +00:00
Sanjay Patel	7bfdf498b8	[X86, AVX] use blends instead of insert128 with index 0 Another case of x86-specific shuffle strength reduction: avoid generating insert*128 instructions with index 0 because they are slower than their non-lane-changing blend equivalents. Shuffle lowering already catches most of these cases, but the zero vector case and some other paths such as in the modified test in vector-shuffle-256-v32.ll were getting through. Differential Revision: http://reviews.llvm.org/D8366 llvm-svn: 232773	2015-03-19 22:29:40 +00:00
Rafael Espindola	dcba9c010c	Split the object streamer callback in one per file format. There are two main advantages to doing this * Targets that only need to handle one of the formats specially don't have to worry about the others. For example, x86 now only registers a constructor for the COFF streamer. * Changes to the arguments passed to one format constructor will not impact the other formats. llvm-svn: 232699	2015-03-19 01:50:16 +00:00
Rafael Espindola	a6821e116c	two or more, use a for. llvm-svn: 232688	2015-03-18 23:15:49 +00:00
Simon Pilgrim	6f98dca24d	[X86][SSE] Avoid scalarization of v2i64 vector shifts (REAPPLIED) Fixed broken tests. Differential Revision: http://reviews.llvm.org/D8416 llvm-svn: 232682	2015-03-18 22:18:51 +00:00
Eric Christopher	60fdac43a1	Revert "[X86][SSE] Avoid scalarization of v2i64 vector shifts" as it appears to have broken tests/bots. This reverts commit r232660. llvm-svn: 232670	2015-03-18 21:01:00 +00:00
Simon Pilgrim	97919c9f36	[X86][SSE] Avoid scalarization of v2i64 vector shifts Currently v2i64 vector shifts (non-equal shift amounts) are scalarized, costing 4 x extract, 2 x x86-shifts and 2 x insert instructions - and it gets even more awkward on 32-bit targets. This patch separately shifts the vector by both shift amounts and then shuffles the partial results back together, costing 2 x shuffles and 2 x sse-shifts instructions (+ 2 movs on pre-AVX hardware). Note - this patch only improves the SHL / LSHR logical shifts as only these are supported in SSE hardware. Differential Revision: http://reviews.llvm.org/D8416 llvm-svn: 232660	2015-03-18 19:35:31 +00:00
Rafael Espindola	b0e2e60a8d	Handle X86::reloc_riprel_4byte in 32 bits mode. We can get there with .code64. Fixes pr22349. llvm-svn: 232651	2015-03-18 17:33:40 +00:00
Rafael Espindola	5657b7ec6c	Make EmitFunctionHeader a private helper. llvm-svn: 232481	2015-03-17 14:38:30 +00:00
Rafael Espindola	f2d8674c1f	Pass in a "const Triple &T" instead of a raw StringRef. llvm-svn: 232429	2015-03-16 22:29:29 +00:00
Rafael Espindola	9b8b42c6d0	Remove unused argument. NFC. llvm-svn: 232428	2015-03-16 22:06:15 +00:00
David Blaikie	9465551fc2	Fix uses of reserved identifiers starting with an underscore followed by an uppercase letter This covers essentially all of llvm's headers and libs. One or two weird cases I wasn't sure were worth/appropriate to fix. llvm-svn: 232394	2015-03-16 18:06:57 +00:00
Sanjay Patel	f98968e93d	fix comments to match code; NFC llvm-svn: 232385	2015-03-16 15:38:48 +00:00
Rafael Espindola	f61c5f7d3a	Use the i8 immediate cmp instructions when possible. llvm-svn: 232378	2015-03-16 14:25:08 +00:00
Rafael Espindola	ef7260471d	Don't repeat names in comments and clang-format this function. llvm-svn: 232375	2015-03-16 14:05:49 +00:00
Daniel Sanders	6dc30f40bf	Make each target map all inline assembly memory constraints to InlineAsm::Constraint_m. NFC. Summary: This is instead of doing this in target independent code and is the last non-functional change before targets begin to distinguish between different memory constraints when selecting code for the ISD::INLINEASM node. Next, each target will individually move away from the idea that all memory constraints behave like 'm'. Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8173 llvm-svn: 232373	2015-03-16 13:13:41 +00:00
Gabor Horvath	e3a1ff29d9	[llvm] Replacing asserts with static_asserts where appropriate Summary: This patch consists of the suggestions of clang-tidy/misc-static-assert check. Reviewers: alexfh Reviewed By: alexfh Subscribers: xazax.hun, llvm-commits Differential Revision: http://reviews.llvm.org/D8343 llvm-svn: 232366	2015-03-16 09:53:42 +00:00
Simon Pilgrim	449078e945	Use SDValue bool check to tidy up some possible combines. NFC. llvm-svn: 232331	2015-03-15 19:47:42 +00:00
Simon Pilgrim	fc49db1d46	Use SDValue bool check to tidy up some possible combines. NFC. llvm-svn: 232325	2015-03-15 17:21:35 +00:00
Rafael Espindola	97eb5e9037	Use add32ri8 and friends on fast isel. This fixes pr22854. The core issue on the bug is that there are multiple instructions that print the same in assembly. In fact, there doesn't seem to be any syntax for specifying that a constant that fits in 8 bits should use a 32 bit immediate. The attached patch changes fast isel to consider i16immSExt8, i32immSExt8, and i64immSExt8. They were disabled because fastisel didn’t know to call the predicate back in the day. llvm-svn: 232223	2015-03-13 22:18:18 +00:00
Andrea Di Biagio	f18bec8053	[X86][AVX] Fix wrong lowering of v4x64 shuffles into concat_vector plus extract_subvector nodes. This patch fixes a bug in the shuffle lowering logic implemented by function 'lowerV2X128VectorShuffle'. There are a few cases where function 'lowerV2X128VectorShuffle' wrongly expands a shuffle of two v4X64 vectors into a CONCAT_VECTORS of two EXTRACT_SUBVECTOR nodes. The problematic expansion only occurs when the shuffle mask M has an 'undef' element at position 2, and M is equivalent to mask <0,1,4,5>. In that case, the algorithm propagates the wrong vector to one of the two new EXTRACT_SUBVECTOR nodes. Example: ;; define <4 x double> @test(<4 x double> %A, <4 x double> %B) { entry: %0 = shufflevector <4 x double> %A, <4 x double> %B, <4 x i32><i32 undef, i32 1, i32 undef, i32 5> ret <4 x double> %0 } ;; Before this patch, llc (-mattr=+avx) generated: vinsertf128 $1, %xmm0, %ymm0, %ymm0 With this patch, llc correctly generates: vinsertf128 $1, %xmm1, %ymm0, %ymm0 Added test lower-vec-shuffle-bug.ll Differential Revision: http://reviews.llvm.org/D8259 llvm-svn: 232179	2015-03-13 17:29:49 +00:00
Daniel Sanders	b2b69459a8	Recommit r232027 with PR22883 fixed: Add infrastructure for support of multiple memory constraints. The operand flag word for ISD::INLINEASM nodes now contains a 15-bit memory constraint ID when the operand kind is Kind_Mem. This constraint ID is a numeric equivalent to the constraint code string and is converted with a target specific hook in TargetLowering. This patch maps all memory constraints to InlineAsm::Constraint_m so there is no functional change at this point. It just proves that using these previously unused bits in the encoding of the flag word doesn't break anything. The next patch will make each target preserve the current mapping of everything to Constraint_m for itself while changing the target independent implementation of the hook to return Constraint_Unknown appropriately. Each target will then be adapted in separate patches to use appropriate Constraint_* values. PR22883 was caused by the matching operands copying the whole of the operand flags for the matched operand. This included the constraint id which needed to be replaced with the operand number. This has been fixed with a conversion function. Following on from this, matching operands also used the operand number as the constraint id. This has been fixed by looking up the matched operand and taking it from there. llvm-svn: 232165	2015-03-13 12:45:09 +00:00
Sanjay Patel	13a9b5db63	[X86, AVX2] Replace inserti128 and extracti128 intrinsics with generic shuffles This should complete the job started in r231794 and continued in r232045: We want to replace as much custom x86 shuffling via intrinsics as possible because pushing the code down the generic shuffle optimization path allows for better codegen and less complexity in LLVM. AVX2 introduced proper integer variants of the hacked integer insert/extract C intrinsics that were created for this same functionality with AVX1. This should complete the removal of insert/extract128 intrinsics. The Clang precursor patch for this change was checked in at r232109. llvm-svn: 232120	2015-03-12 23:16:18 +00:00
Hal Finkel	dc4180d54f	Revert "r232027 - Add infrastructure for support of multiple memory constraints" This (r232027) has caused PR22883; so it seems those bits might be used by something else after all. Reverting until we can figure out what else to do. Original commit message: The operand flag word for ISD::INLINEASM nodes now contains a 15-bit memory constraint ID when the operand kind is Kind_Mem. This constraint ID is a numeric equivalent to the constraint code string and is converted with a target specific hook in TargetLowering. This patch maps all memory constraints to InlineAsm::Constraint_m so there is no functional change at this point. It just proves that using these previously unused bits in the encoding of the flag word doesn't break anything. The next patch will make each target preserve the current mapping of everything to Constraint_m for itself while changing the target independent implementation of the hook to return Constraint_Unknown appropriately. Each target will then be adapted in separate patches to use appropriate Constraint_* values. llvm-svn: 232093	2015-03-12 20:09:39 +00:00
Quentin Colombet	77e9397bd4	[X86] Fix a regression introduced by r223641. The permps and permd instructions have their operands swapped compared to the intrinsic definition. Therefore, they do not fall into the INTR_TYPE_2OP category. I did not create a new category for those two, as they are the only one AFAICT in that case. <rdar://problem/20108262> llvm-svn: 232085	2015-03-12 19:34:12 +00:00
Eric Christopher	1e6f9b376b	Remove the need to cache the subtarget in the X86 TargetRegisterInfo classes. Use a Triple instead and simplify a lot of the querying logic to use lookups on the Triple. llvm-svn: 232071	2015-03-12 17:54:19 +00:00
Andrea Di Biagio	b58c185f5a	[X86] Fix wrong target specific combine on SETCC nodes. Part of the folding logic implemented by function 'PerformISDSETCCCombine' only worked under the assumption that the condition code in input could have been either SETNE or SETEQ. Unfortunately that assumption was incorrect, and in some cases the algorithm ended up incorrectly folding SETCC nodes. The incorrect folding only affected SETCC dag nodes where: - one of the operands was a build_vector of all zeroes; - the other operand was a SIGN_EXTEND from a vector of MVT:i1 elements; - the condition code was neither SETNE nor SETEQ. Example: (setcc (v4i32 (sign_extend v4i1:%A)), (v4i32 VectorOfAllZeroes), setge) Before this patch, the entire dag node sequence from the example was incorrectly folded to node %A. With this patch, the dag node sequence is folded to a (xor %A, (v4i1 VectorOfAllOnes)). Added test setcc-combine.ll. Thanks to Greg Bedwell for spotting this issue. llvm-svn: 232046	2015-03-12 15:16:58 +00:00
Daniel Sanders	4eee6f840d	Add infrastructure for support of multiple memory constraints. Summary: The operand flag word for ISD::INLINEASM nodes now contains a 15-bit memory constraint ID when the operand kind is Kind_Mem. This constraint ID is a numeric equivalent to the constraint code string and is converted with a target specific hook in TargetLowering. This patch maps all memory constraints to InlineAsm::Constraint_m so there is no functional change at this point. It just proves that using these previously unused bits in the encoding of the flag word doesn't break anything. The next patch will make each target preserve the current mapping of everything to Constraint_m for itself while changing the target independent implementation of the hook to return Constraint_Unknown appropriately. Each target will then be adapted in separate patches to use appropriate Constraint_* values. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8171 llvm-svn: 232027	2015-03-12 11:00:48 +00:00
Elena Demikhovsky	e21b4ac3e7	AVX-512: Added encoding tests for VPROR, VPROL instructions, fixed opcode. llvm-svn: 232018	2015-03-12 07:28:41 +00:00
Eric Christopher	84c7b275d4	Remove some unnecessary forward declarations and put a couple more where they're supposed to reside. llvm-svn: 232014	2015-03-12 06:07:16 +00:00
Mehdi Amini	94c8770ed5	Move the DataLayout to the generic TargetMachine, making it mandatory. Summary: I don't know why every single backend had to redeclare its own DataLayout. There was a virtual getDataLayout() on the common base TargetMachine, the default implementation returned nullptr. It was not clear from this that we could assume at call site that a DataLayout will be available with each Target. Now getDataLayout() is no longer virtual and returns a pointer to the DataLayout member of the common base TargetMachine. I plan to turn it into a reference in a future patch. The only backend that didn't have a DataLayout previously was the CPPBackend. It now initializes the default DataLayout. This commit is NFC for all the other backends. Test Plan: clang+llvm ninja check-all Reviewers: echristo Subscribers: jfb, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8243 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231987	2015-03-12 00:07:24 +00:00
Eric Christopher	7e02765bdf	Have getCallPreservedMask and getThisCallPreservedMask take a MachineFunction argument so that we can grab subtarget specific features off of it. llvm-svn: 231979	2015-03-11 22:42:13 +00:00
Juergen Ributzka	f7fcca505b	Add the "vbroadcasti128" instruction back. This is a follow-up to r231182. This adds the "vbroadcasti128" instruction back, but without the intrinsic mapping. Also add a test to check the instruction encoding. This is related to rdar://problem/18742778. llvm-svn: 231945	2015-03-11 17:29:03 +00:00
Derek Schuff	5d3b63bf2e	Make NaCl's use of .init_array for static constructors match Linux Summary: The generic ELF TargetObjectFile defaults to .ctors, but Linux's defaults to .init_array by calling InitializeELF with the value of UseInitArray from TargetMachine. Make NaCl's behavior match. Reviewers: jvoung Differential Revision: http://reviews.llvm.org/D8240 llvm-svn: 231934	2015-03-11 16:16:09 +00:00
Elena Demikhovsky	320252ae4d	AVX-512: Added SKX forms of shift instructions. Added rotation instructions, encoding only. Added encoding tests for all these forms. llvm-svn: 231916	2015-03-11 10:25:42 +00:00
Eric Christopher	91c4e41987	Have TargetRegisterInfo::getLargestLegalSuperClass take a MachineFunction argument so that it can look up the subtarget rather than using a cached one in some Targets. llvm-svn: 231888	2015-03-10 23:46:01 +00:00
Eric Christopher	db29a2f01c	Remove the use of the subtarget in MCCodeEmitter creation and update all ports accordingly. Required a couple of small rewrites in handling subtarget features during creation in PPC. llvm-svn: 231861	2015-03-10 22:03:14 +00:00
Andrea Di Biagio	171a02e1ca	[X86][AVX] Fix wrong lowering of VPERM2X128 nodes There were cases where the backend computed a wrong permute mask for a VPERM2X128 node. Example: \code define <8 x float> @foo(<8 x float> %a, <8 x float> %b) { %shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7> ret <8 x float> %shuffle } \code end Before this patch, llc (with -mattr=+avx) emitted the following vperm2f128: vperm2f128 $0, %ymm0, %ymm0, %ymm0 # ymm0 = ymm0[0,1,0,1] With this patch, llc emits a vperm2f128 with a correct permute mask: vperm2f128 $17, %ymm0, %ymm0, %ymm0 # ymm0 = ymm0[2,3,2,3] Differential Revision: http://reviews.llvm.org/D8119 llvm-svn: 231601	2015-03-08 16:28:47 +00:00
Simon Pilgrim	60d1ada777	[DAGCombiner] Add a shuffle mask commutation helper function. NFCI. We have an increasing number of cases where we are creating commuted shuffle masks - all implementing nearly the same code. This patch adds a static helper function - ShuffleVectorSDNode::commuteMask() and replaces a number of cases to use it. Differential Revision: http://reviews.llvm.org/D8139 llvm-svn: 231581	2015-03-07 22:33:11 +00:00
Benjamin Kramer	38504f768a	Make constant arrays that are passed to functions as const. In theory this allows the compiler to skip materializing the array on the stack. In practice clang often fails to do that, but that's a different story. NFC. llvm-svn: 231571	2015-03-07 17:41:00 +00:00
Benjamin Kramer	d41a2d5067	X86: Roll repetitive code into a loop. NFC. llvm-svn: 231565	2015-03-07 15:06:16 +00:00
Eric Christopher	18294959f1	Typo. llvm-svn: 231547	2015-03-07 01:39:09 +00:00
Bruno Cardoso Lopes	f5e7d40f2d	[AsmPrinter][TLOF] 32-bit MachO support for replacing GOT equivalents Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and access through a non_lazy_symbol_pointers section is used instead. -- before _extgotequiv: .long _extfoo _delta: .long _extgotequiv-_delta -- after _delta: .long L_extfoo$non_lazy_ptr-_delta .section __IMPORT,__pointers,non_lazy_symbol_pointers L_extfoo$non_lazy_ptr: .indirect_symbol _extfoo .long 0 llvm-svn: 231475	2015-03-06 13:49:05 +00:00
Bruno Cardoso Lopes	c84d60c12f	[AsmPrinter][TLOF] ARM64 MachO support for replacing GOT equivalents Follow up r230264 and add ARM64 support for replacing global GOT equivalent symbol accesses by references to the GOT entry for the final symbol instead, example: -- before .globl _foo _foo: .long 42 .globl _gotequivalent _gotequivalent: .quad _foo .globl _delta _delta: .long _gotequivalent-_delta -- after .globl _foo _foo: .long 42 .globl _delta Ltmp3: .long _foo@GOT-Ltmp3 llvm-svn: 231474	2015-03-06 13:48:45 +00:00
David Majnemer	fd4eca8ad2	X86: Form IMGREL relocations for LLVM Functions We supported forming IMGREL relocations from ConstantExprs involving __ImageBase if the minuend was a GlobalVariable. Extend this functionality to all GlobalObjects. llvm-svn: 231456	2015-03-06 08:11:32 +00:00
Ahmed Bougacha	a07940a0f3	[X86] Remove stale comment. NFC. It turns out 256bit V[SZ]EXT nodes are still generated by the new shuffle lowering, so this is here to stay! llvm-svn: 231422	2015-03-05 23:18:41 +00:00
Sanjay Patel	e7dd0e711b	[AVX] Lower / fast-isel scalar FP selects into VBLENDV instructions (PR22483) This patch reduces code size for all AVX targets and increases speed for some chips. SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and only in the packed float/double flavors. AVX subsequently made the instruction useful by adding a 4-register operand form. So we just need to paper over the lack of scalar forms of this instruction, complicate the code to choose float or double forms, and use blendv on scalars since all FP is in xmm registers anyway. This gives us an approximately 50% speed up for a blendv microbenchmark sequence on SandyBridge and Haswell: blendv : 29.73 cycles/iter logic : 43.15 cycles/iter No new test cases with this patch because: 1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel 2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX 3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants. http://llvm.org/bugs/show_bug.cgi?id=22483 Differential Revision: http://reviews.llvm.org/D8063 llvm-svn: 231408	2015-03-05 21:46:54 +00:00
David Majnemer	bdce8557da	X86: Optimize address mode matching for FRAME_ALLOC_RECOVER nodes We know that the absolute symbol will be less than 2GB and thus will always fit. llvm-svn: 231389	2015-03-05 18:50:12 +00:00
Reid Kleckner	d0e0d012a0	Replace llvm.frameallocate with llvm.frameescape Turns out it's pretty straightforward and simplifies the implementation. Reviewers: andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D8051 llvm-svn: 231386	2015-03-05 18:26:34 +00:00
Elena Demikhovsky	a71b2e475e	AVX-512, SKX: Enabled masked_load/store operations for this target. Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors, it is needed to pass all masked_memop.ll tests for SKX. llvm-svn: 231371	2015-03-05 15:11:35 +00:00
Craig Topper	558157f7a7	[X86] Use vmovss to handle inserting an element into index 0 of a v8f32 vector of zeros. llvm-svn: 231354	2015-03-05 06:38:42 +00:00
JF Bastien	0cecbf8a42	Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded. Summary: In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all. This does not represent a behavioural change and as such no tests were added. Patch by: Richard Diamond. Reviewers: jfb Reviewed By: jfb Subscribers: jfb, aemerson, t.p.northover, llvm-commits Differential Revision: http://reviews.llvm.org/D7713 llvm-svn: 231250	2015-03-04 15:47:57 +00:00
Andrea Di Biagio	c64113491c	[X86][FastISel] Simplify the logic in method X86SelectSIToFP. The target-independent selection algorithm in FastISel already knows how to select a SINT_TO_FP if the target is SSE but not AVX. On targets that have SSE but not AVX, the tablegen'd 'fastEmit' functions for ISD::SINT_TO_FP know how to select instruction X86::CVTSI2SSrr (for an i32 to f32 conversion) and X86::CVTSI2SDrr (for an i32 to f64 conversion). This patch simplifies the logic in method X86SelectSIToFP knowing that the code would not be reachable if the subtarget doesn't have AVX. No functional change intended. llvm-svn: 231243	2015-03-04 14:23:25 +00:00
Davide Italiano	02e7db63b7	[MC][Target] Implement support for R_X86_64_SIZE{32,64}. Differential Revision: D7990 Reviewed by: rafael, majnemer llvm-svn: 231216	2015-03-04 06:49:39 +00:00
Juergen Ributzka	a0c556be5c	Remove 'llvm.x86.avx2.vbroadcasti128' intrinsic. The intrinsic is no longer generated by the front-end. Remove the intrinsic and auto-upgrade it to a vector shuffle. Reviewed by Nadav This is related to rdar://problem/18742778. llvm-svn: 231182	2015-03-04 00:13:25 +00:00
Paul Robinson	58615d5112	[X86][ELF] Correct relocation for DWARF TLS references Previously we had only Linux using DTPOFF for these; all X86 ELF targets should. Fixes a side issue mentioned in PR21077. Differential Revision: http://reviews.llvm.org/D8011 llvm-svn: 231130	2015-03-03 21:01:27 +00:00
Sanjay Patel	231cac536a	remove enum value names from comments; NFC llvm-svn: 231129	2015-03-03 20:58:35 +00:00
Sanjay Patel	efccc5f4a5	use bool operator shortcut; NFC llvm-svn: 231123	2015-03-03 20:41:27 +00:00
Michael Kuperstein	f22ea25e15	[X86][Haswell][SchedModel] Fix patterns for scalar FMA3 variants. llvm-svn: 231073	2015-03-03 15:47:02 +00:00
Elena Demikhovsky	04be7be81d	AVX-512: Moved patterns for masked load/store under avx_store, avx_load classes. No functional changes. llvm-svn: 231069	2015-03-03 15:03:35 +00:00
Craig Topper	526ab94856	[X86] Remove some unused code from disassembler. llvm-svn: 231055	2015-03-03 05:24:03 +00:00
Ahmed Bougacha	d8b9ab0f6c	[X86] Special-case 2x CMOV when custom-inserting. This lets us avoid a few copies that are otherwise hard to get rid of. The way this is done is, the custom-inserter looks at the following instruction for another CMOV, and replaces both at the same time. A previous version used a new CMOV2 opcode, but the custom inserter is expected to be able to return a different basic block anyway, which means it's OK - though far from ideal - to alter that block's contents. Explicitly document that, in case it ever makes a difference. Alternatives welcome! Follow-up to r231045. rdar://19767934 Closes http://reviews.llvm.org/D8019 llvm-svn: 231046	2015-03-03 01:21:16 +00:00
Ahmed Bougacha	10768a3027	[X86] Combine (cmov (and/or (setcc) (setcc))) into (cmov (cmov)). Fold and/or of setcc's to double CMOV: (CMOV F, T, ((cc1 | cc2) != 0)) -> (CMOV (CMOV F, T, cc1), T, cc2) (CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV (CMOV T, F, !cc1), F, !cc2) When we can't use the CMOV instruction, it might increase branch mispredicts. When we can, or when there is no mispredict, this improves throughput and reduces register pressure. These can't be caught by generic combines, because the pattern can appear when legalizing some instructions (such as fcmp une). rdar://19767934 http://reviews.llvm.org/D7634 llvm-svn: 231045	2015-03-03 01:09:14 +00:00
Paul Robinson	2d03e1ba28	Revert r230979, should apply to all X86 ELF. llvm-svn: 230985	2015-03-02 18:50:18 +00:00
Paul Robinson	b9d32f6612	[PS4] Correct relocation for DWARF TLS references. llvm-svn: 230979	2015-03-02 17:44:52 +00:00
Elena Demikhovsky	e4a06dd254	AVX-512: Add assembly parser support for Rounding mode By Asaf Badouh <asaf.badouh@intel.com> llvm-svn: 230962	2015-03-02 15:00:34 +00:00
Elena Demikhovsky	769ec279fb	AVX-512: Simplified MOV patterns, no functional changes. llvm-svn: 230954	2015-03-02 12:46:21 +00:00
Craig Topper	68ce8c3e30	[X86] There are only 8 mask registers. Fail disassembly if instruction tries to reference more. llvm-svn: 230931	2015-03-02 03:33:11 +00:00
Craig Topper	60ae8a07f3	[X86] Fix disassembler crash on AVX512 cmpps/cmppd with immediate that doesn't fit in 5-bits. Fixes PR22743. llvm-svn: 230924	2015-03-02 00:22:29 +00:00
Benjamin Kramer	cb2485d9c8	X86: Replace variadic function with init list. NFC. llvm-svn: 230911	2015-03-01 21:47:40 +00:00
Benjamin Kramer	ef1d6eb8ba	ArrayRef: Remove the equals helper with many arguments. With initializer lists there is a really neat idiomatic way to write this, 'ArrayRef.equals({1, 2, 3, 4, 5})'. Remove the equal method which always had a hard limit on the number of arguments. I considered rewriting it with variadic templates but that's not really a good fit for a function with homogeneous arguments. 'ArrayRef == {1, 2, 3, 4, 5}' would've been even more awesome, but C++11 doesn't allow init lists with binary operators. llvm-svn: 230907	2015-03-01 21:05:05 +00:00
Elena Demikhovsky	9eda5391f2	Reverted 230471 - gather scatter handling in table gen. llvm-svn: 230892	2015-03-01 08:23:41 +00:00

11845 Commits