llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00

Author	SHA1	Message	Date
Simon Pilgrim	acd3489995	[X86][SSE] Support v16i8/v32i8 vector rotations This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op. Unfortunately I haven't found a decent way to share much of this code with the shift equivalent. Differential Revision: https://reviews.llvm.org/D48655 llvm-svn: 335957	2018-06-29 09:36:39 +00:00
Sjoerd Meijer	e768b5aa55	[ARM][AArch64] Armv8.4-A Enablement Initial patch adding assembly support for Armv8.4-A. Besides adding v8.4 as a supported architecture to the usual places, this also adds target features for the different crypto algorithms. Armv8.4-A introduced new crypto algorithms, made them optional, and allows different combinations: - none of the v8.4 crypto functions are supported, which is independent of the implementation of the Armv8.0 SHA1 and SHA2 instructions. - the v8.4 SHA512 and SHA3 support is implemented, in this case the Armv8.0 SHA1 and SHA2 instructions must also be implemented. - the v8.4 SM3 and SM4 support is implemented, which is independent of the implementation of the Armv8.0 SHA1 and SHA2 instructions. - all of the v8.4 crypto functions are supported, in this case the Armv8.0 SHA1 and SHA2 instructions must also be implemented. The v8.4 crypto instructions are added to AArch64 only, and not AArch32, and are made optional extensions to Armv8.2-A. The user-facing Clang options will map on these new target features, their naming will be compatible with GCC and added in follow-up patches. The Armv8.4-A instruction sets can be downloaded here: https://developer.arm.com/products/architecture/a-profile/exploration-tools Differential Revision: https://reviews.llvm.org/D48625 llvm-svn: 335953	2018-06-29 08:43:19 +00:00
Craig Topper	3c896ff07f	[X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in IR instead. While there improve the coverage of the intrinsic testing and add fast-isel tests. llvm-svn: 335944	2018-06-29 05:43:26 +00:00
Tom Stellard	4bed5eda0d	AMDGPU: Separate R600 and GCN TableGen files Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942	2018-06-28 23:47:12 +00:00
Eli Friedman	2655e6dc4e	[ARM] Assert that ARMDAGToDAGISel creates valid UBFX/SBFX nodes. We don't ever check these again (unless you're using -fno-integrated-as), so make sure the extracted bits are well-defined. I don't think it's possible to trigger any of the assertions on trunk, but it's difficult to prove. (The first one depends on DAGCombine to minimize the number of set bits in AND masks; I think the others are mathematically impossible to hit.) llvm-svn: 335931	2018-06-28 21:49:41 +00:00
Benjamin Kramer	74ec48e91c	[NVPTX] Delete dead code No functionality change. llvm-svn: 335913	2018-06-28 20:05:35 +00:00
Eli Friedman	951f2d2f6e	[ARM] Add missing Thumb2 assembler diagnostics. Mostly just adding checks for Thumb2 instructions which correspond to ARM instructions which already had diagnostics. While I'm here, also fix ARM-mode strd to check the input registers correctly. Differential Revision: https://reviews.llvm.org/D48610 llvm-svn: 335909	2018-06-28 19:53:12 +00:00
Simon Pilgrim	48920466ab	Remove unnecessary semicolon. NFCI. Fixes -Wpedantic warning. llvm-svn: 335901	2018-06-28 18:37:16 +00:00
Craig Topper	ed8e39a2a3	[X86] Suppress load folding into and/or/xor if it will prevent matching btr/bts/btc. This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy. Differential Revision: https://reviews.llvm.org/D48706 llvm-svn: 335895	2018-06-28 17:58:01 +00:00
Jonas Devlieghere	c40fc8755a	Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"" Reverting because this is causing failures in the LLDB test suite on GreenDragon. LLVM ERROR: unsupported relocation with subtraction expression, symbol '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction expression llvm-svn: 335894	2018-06-28 17:56:43 +00:00
Jessica Paquette	848840c089	[MachineOutliner] Define MachineOutliner support in TargetOptions Targets should be able to define whether or not they support the outliner without the outliner being added to the pass pipeline. Before this, the outliner pass would be added, and ask the target whether or not it supports the outliner. After this, it's possible to query the target in TargetPassConfig, before the outliner pass is created. This ensures that passing -enable-machine-outliner will not modify the pass pipeline of any target that does not support it. https://reviews.llvm.org/D48683 llvm-svn: 335887	2018-06-28 17:45:43 +00:00
Simon Pilgrim	76e0ff6b5d	[WebAssembly] Add getSetCCResultType placeholder override to handle vector compare results. Necessary to get the rL335821 bugfix (which was reverted at rL335871) un-reverted. llvm-svn: 335884	2018-06-28 17:27:09 +00:00
Matthias Braun	719039ac2c	SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O) Add NoTrapAfterNoreturn target option which skips emission of traps behind noreturn calls even if TrapUnreachable is enabled. Enable the feature on Mach-O to save code size; Comments suggest it is not possible to enable it for the other users of TrapUnreachable. rdar://41530228 DifferentialRevision: https://reviews.llvm.org/D48674 llvm-svn: 335877	2018-06-28 17:00:45 +00:00
Stanislav Mekhanoshin	871a3e96ce	[AMDGPU] Early expansion of 32 bit udiv/urem This allows hoisting of a common code, for instance if denominator is loop invariant. Current change is expansion only, adding licm to the target pass list going to be a separate patch. Given this patch changes to codegen are minor as the expansion is similar to that on DAG. DAG expansion still must remain for R600. Differential Revision: https://reviews.llvm.org/D48586 llvm-svn: 335868	2018-06-28 15:59:18 +00:00
Stanislav Mekhanoshin	d8075e70a5	[AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16 Differential Revision: https://reviews.llvm.org/D48677 llvm-svn: 335866	2018-06-28 15:24:46 +00:00
Sjoerd Meijer	2b32675441	[ARM] Parallel DSP Pass Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of this pass is to do some straightforward IR pattern matching to create ACLE DSP intrinsics, which map on these 32-bit SIMD operations. Currently, only the SMLAD instruction gets recognised. This instruction performs two multiplications with 16-bit operands, and stores the result in an accumulator. We will follow this up with patches to recognise SMLAD in more cases, and also to generate other DSP instructions (like e.g. SADD16). Patch by: Sam Parker and Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D48128 llvm-svn: 335850	2018-06-28 12:55:29 +00:00
Hans Wennborg	c707b3f0f1	s/TablesChecked/TableChecked/ after r335823 llvm-svn: 335831	2018-06-28 10:24:38 +00:00
Matt Arsenault	5ad28faa54	AMDGPU: Remove MFI::ABIArgOffset We have too many mechanisms for tracking the various offsets used for kernel arguments, so remove one. There's still a lot of confusion with these because there are two different "implicit" argument areas located at the beginning and end of the kernarg segment. Additionally, the offset was determined based on the memory size of the split element types. This would break in a future commit where v3i32 is decomposed into separate i32 pieces. llvm-svn: 335830	2018-06-28 10:18:55 +00:00
Matt Arsenault	c8324b4087	AMDGPU: Error on calls from graphics shaders In principle nothing should stop these from working, but work is necessary to create an ABI for dealing with the stack related registers. llvm-svn: 335829	2018-06-28 10:18:36 +00:00
Matt Arsenault	c4c4be3926	AMDGPU: Fix AMDGPUCodeGenPrepare using uninitialized AMDGPUAS struct Not sure how this wasn't noticed before. llvm-svn: 335828	2018-06-28 10:18:23 +00:00
Matt Arsenault	27ded667a8	AMDGPU: Fix assert on aggregate type kernel arguments Just fix the crash for now by not doing the optimization since figuring out how to properly convert the bits for an arbitrary struct is a pain. Also fix a crash when there is only an empty struct argument. llvm-svn: 335827	2018-06-28 10:18:11 +00:00
Benjamin Kramer	a503f64b28	Unify sorted asserts to use the existing atomic pattern These are all benign races and only visible in !NDEBUG. tsan complains about it, but a simple atomic bool is sufficient to make it happy. llvm-svn: 335823	2018-06-28 10:03:45 +00:00
Craig Topper	3582c81a34	[X86] Use PatFrag with hardcoded numbers for FROUND_NO_EXC/FROUND_CURRENT instead of ImmLeafs with predicates where one of the two numbers was hardcoded. This more efficient for the isel table generator since we can use CheckChildInteger instead of MoveChild, CheckPredicate, MoveParent. This reduced the table size by 1-2K. I wish there was a way to share the values with X86BaseInfo.h and still use a PatFrag like this. These numbers are fixed by the X86 intrinsic spec going back many years and we should never need to change them. So we shouldn't waste table bytes to support sharing. llvm-svn: 335806	2018-06-28 01:45:44 +00:00
Craig Topper	9b611f24dd	[X86] Change how we prefer shift by immediate over folding a load into a shift. BMI2 added new shift by register instructions that have the ability to fold a load. Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load. We used to enforce this priority by artificially lowering the complexity of the load pattern. This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky. llvm-svn: 335804	2018-06-28 00:47:41 +00:00
Benjamin Kramer	3bbe7b566c	[X86] Make folding table checking threadsafe This is a benign race, but tsan likes to complain about it. Just make it happy. llvm-svn: 335788	2018-06-27 21:01:53 +00:00
Craig Topper	5815114d82	[X86] In X86DAGToDAGISel::PreprocessISelDAG, make sure we don't access N after we delete it. If we turn X86ISD::AND into ISD::AND, we delete N. But we were continuing onto the next block of code even though N no longer existed. Just happened to notice it. I assume asan didn't notice it because we explicitly unpoison deleted nodes and give them a DELETE_NODE opcode. llvm-svn: 335787	2018-06-27 20:58:46 +00:00
Sameer AbuAsal	92233f14e9	[RISCV] Add machine function pass to merge base + offset Summary: In r333455 we added a peephole to fix the corner cases that result from separating base + offset lowering of global address.The peephole didn't handle some of the cases because it only has a basic block view instead of a function level view. This patch replaces that logic with a machine function pass. In addition to handling the original cases it handles uses of the global address across blocks in function and folding an offset from LW\SW instruction. This pass won't run for OptNone compilation, so there will be a negative impact overall vs the old approach at O0. Reviewers: asb, apazos, mgrang Reviewed By: asb Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones Differential Revision: https://reviews.llvm.org/D47857 llvm-svn: 335786	2018-06-27 20:51:42 +00:00
Fangrui Song	087e3db9d8	[X86] Fix unmatched parenthesis in r335768 llvm-svn: 335769	2018-06-27 19:12:07 +00:00
Craig Topper	b83d20a972	[X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the SIB byte is present, but doesn't encode an index register and there was another shorter encoding that would achieve the same result. The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte. This should match the behavior of objdump from binutils. llvm-svn: 335768	2018-06-27 19:03:36 +00:00
Daniel Sanders	4be0167b82	[globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64 Now that we have the ability to legalize based on MMO's. Add support for legalizing based on AtomicOrdering and use it to correct the legalization of the atomic instructions. Also extend all() to be a variadic template as this ruleset now requires 3 and 4 argument versions. llvm-svn: 335767	2018-06-27 19:03:21 +00:00
Jessica Paquette	62e08924a6	[MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across It isn't safe to outline sequences of instructions where x16/x17/nzcv live across the sequence. This teaches the outliner to check whether or not a specific canidate has x16/x17/nzcv live across it and discard the candidate in the case that that is true. https://bugs.llvm.org/show_bug.cgi?id=37573 https://reviews.llvm.org/D47655 llvm-svn: 335758	2018-06-27 17:43:27 +00:00
Craig Topper	7ce32ed51f	[X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index. Fixes PR37938. llvm-svn: 335754	2018-06-27 16:47:39 +00:00
Craig Topper	4dd98aa412	[X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744	2018-06-27 15:57:53 +00:00
Stanislav Mekhanoshin	cc17ccd818	[AMDGPU] Convert rcp to rcp_iflag If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742	2018-06-27 15:33:33 +00:00
Luke Geeson	fade465f79	[AArch64] Reverting FP16 vcvth_n_s64_f16 to fix llvm-svn: 335737	2018-06-27 14:34:40 +00:00
Adhemerval Zanella	65794251d1	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int src, int width, unsigned char dst) { for (int i = 0; i < width; i++) dst++ = src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Ivan A. Kosarev	e666ad4518	[NEON] Support vldNq intrinsics in AArch32 (LLVM part) This patch adds support for the q versions of the dup (load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for example. Currently, non-q versions of the dup intrinsics are implemented in clang by generating IR that first loads the elements of the structure into the first lane with the lane (to-single-lane) intrinsics, and then propagating it other lanes. There are at least two problems with this approach. First, there are no double-spaced to-single-lane byte-element instructions. For example, there is no such instruction as 'vld2.8 { d0[0], d2[0] }, [r0]'. That means we cannot rely on the to-single-lane intrinsics and instructions to implement the q versions of the dup intrinsics. Note that to-all-lanes instructions do support all sizes of data items, including bytes. The second problem with the current approach is that we need a separate vdup instruction to propagate the structure to each lane. So for vld4q_dup_f16() we would need four vdup instructions in addition to the initial vld instruction. This patch introduces dup LLVM intrinsics and reworks handling of the currently supported (non-q) NEON dup intrinsics to expand them into those LLVM intrinsics, thus eliminating the need for using to-single-lane intrinsics and instructions. Additionally, this patch adds support for u64 and s64 dup NEON intrinsics. These are marked as Arch64-only in the ARM NEON Reference, but it seems there are no reasons to not support them in AArch32 mode. Please correct, if that is wrong. That's what we generate with this patch applied: vld2q_dup_f16: vld2.16 {d0[], d2[]}, [r0] vld2.16 {d1[], d3[]}, [r0] vld3q_dup_f16: vld3.16 {d0[], d2[], d4[]}, [r0] vld3.16 {d1[], d3[], d5[]}, [r0] vld4q_dup_f16: vld4.16 {d0[], d2[], d4[], d6[]}, [r0] vld4.16 {d1[], d3[], d5[], d7[]}, [r0] Differential Revision: https://reviews.llvm.org/D48439 llvm-svn: 335733	2018-06-27 13:57:52 +00:00
Luke Geeson	34b88ee5ba	[AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns llvm-svn: 335715	2018-06-27 09:20:13 +00:00
Konstantin Zhuravlyov	8c230e4a9a	AMDGPU/NFC: Fix typo in comment llvm-svn: 335707	2018-06-27 05:36:03 +00:00
Craig Topper	be4feb18ce	[X86] Don't store register and memory FMA3 opcodes in the same X86InstrFMA3Group. Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group. The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed. llvm-svn: 335694	2018-06-27 00:42:24 +00:00
Konstantin Zhuravlyov	4a6d2992e9	AMDGPU: Silence unused warnings in waitcnt insertion pass in release build Differential Revision: https://reviews.llvm.org/D48607 llvm-svn: 335669	2018-06-26 21:33:38 +00:00
Jessica Paquette	e5195a2cfc	[X86][AsmParser] Recommit r335658 Recommit of r335658 so that it does not change the behaviour of any existing error output. llvm-svn: 335668	2018-06-26 21:30:34 +00:00
Jessica Paquette	91e636b4ee	Revert "[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode" This reverts commit 4850a9aae8b38c7deadc103d634ec7397e6c323b. It caused MC/X86/x86_errors.s to fail. Will fix and recommit shortly. llvm-svn: 335660	2018-06-26 20:57:19 +00:00
Jessica Paquette	9dc2822253	[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode Right now, when we use RIP-relative instructions in 32-bit mode, we'll just assert and crash. This adds an error message which tells the user that they can't do that in 32-bit mode, so that we don't crash (and also can see the issue outside of assert builds). llvm-svn: 335658	2018-06-26 20:33:46 +00:00
Stanislav Mekhanoshin	91c15f9d04	[AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic This intrinsic selects v_mad_f32 regardless of fp32 denorm support. Differential Revision: https://reviews.llvm.org/D48573 llvm-svn: 335654	2018-06-26 20:04:19 +00:00
Matt Arsenault	2b0231f519	AMDGPU: Add pass to lower kernel arguments to loads This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem alltogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650	2018-06-26 19:10:00 +00:00
Brendon Cahoon	a4831b8a52	[Hexagon] Add a "generic" cpu Add the generic processor for Hexagon so that it can be used with 3rd party programs that create a back-end with the "generic" CPU. This patch also enables the JIT for Hexagon. Differential Revision: https://reviews.llvm.org/D48571 llvm-svn: 335641	2018-06-26 18:44:05 +00:00
Simon Pilgrim	0c0e1104c8	[TargetLowering] isVectorClearMaskLegal - use ArrayRef<int> instead of const SmallVectorImpl<int>& This is more generic and matches isShuffleMaskLegal. Differential Revision: https://reviews.llvm.org/D48591 llvm-svn: 335605	2018-06-26 14:15:31 +00:00
Than McIntosh	70f3b30d46	[X86,ARM] Retain split-stack prolog check for sibling calls Summary: If a routine with no stack frame makes a sibling call, we need to preserve the stack space check even if the local stack frame is empty, since the call target could be a "no-split" function (in which case the linker needs to be able to fix up the prolog sequence in order to switch to a larger stack). This fixes PR37807. Reviewers: cherry, javed.absar Subscribers: srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D48444 llvm-svn: 335604	2018-06-26 14:11:30 +00:00
Tim Northover	93ed7be731	ARM: correctly decode VFP instructions following unpredictable t2IT When the condition code for an IT instruction is "AL" we get strange "15" predicates on subsequent instructions. These are dealt with for most instructions by treating them as "ARMCC::AL", but VFP takes a different path which didn't have this code. llvm-svn: 335594	2018-06-26 11:39:20 +00:00
Tim Northover	c3783b6a3e	ARM: diagnose unpredictable IT instructions IT instructions are allowed to have the 'AL' predicate, but it must never result in an 'NV' predicated instruction. Essentially this means that all branches must be 't' rather than 'e' if the predicate is 'AL'. This patch adds a diagnostic for this during assembly (error because parsing hits an assertion if allowed to continue) and an annotation during disassembly. llvm-svn: 335593	2018-06-26 11:38:41 +00:00
Simon Pilgrim	a092baf051	[X86] Just use ArrayRef instead of SmallVectorImpl in a few static method arguments. NFCI. llvm-svn: 335590	2018-06-26 10:45:41 +00:00
Craig Topper	0805e71971	[X86] Don't use getScalarShiftAmountTy to get the immediate type for target specific VSHLDQ/VSRLDQ nodes. These opcodes have a fixed type of i8 for their immediate and shouldn't have anything to do with the scalar shift amount used by target independent shift nodes. llvm-svn: 335578	2018-06-26 04:53:42 +00:00
Dan Gohman	57d54664d6	[WebAssembly] Fix lowering of varargs functions with non-legal fixed arguments. CallLoweringInfo's NumFixedArgs field gives the number of fixed arguments before legalization. The ISD::OutputArg "Outs" array holds legalized arguments, so when indexing into it to find the non-fixed arguemn, we need to use the number of arguments after legalization. Fixes PR37934. llvm-svn: 335576	2018-06-26 03:18:38 +00:00
Craig Topper	702b88ddc2	[X86] Use XOR for SUB (C, X) during isel if will help fold an immediate Summary: Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining. This seems like the safest way to do this trick. And we consider doing it as a DAG combine later. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48557 llvm-svn: 335575	2018-06-26 03:11:15 +00:00
Dan Gohman	29e06076c9	[WebAssembly] Fix a typo in a comment. llvm-svn: 335574	2018-06-26 03:03:41 +00:00
Craig Topper	31280b5b7c	[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. This recommits r335562 and 335563 as a single commit. The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the sigature of the builtin that software expects. By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away. llvm-svn: 335568	2018-06-26 01:37:02 +00:00
Craig Topper	c15ea5f565	Revert r335562 and 335563 "[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction." These were supposed to have been squashed to a single commit. llvm-svn: 335566	2018-06-26 01:31:53 +00:00
Craig Topper	ef228f37e7	foo llvm-svn: 335562	2018-06-26 00:43:34 +00:00
Craig Topper	155ca218d7	[X86] Simplify intrinsic table binary search to not require a temporary struct. std::lower_bound doesn't require the thing to search for to be the same type as the table entries. We just need to define an appropriate comparison function that can take an table entry and an intrinsic number. llvm-svn: 335518	2018-06-25 20:27:46 +00:00
Craig Topper	0ac6ddab26	[X86] Add comment about the sorting of the memory folding tables added in r335501. llvm-svn: 335517	2018-06-25 20:11:16 +00:00
Lei Huang	852a9b0475	[PowerPC] Fix incorrectly encoded wait instruction Encoding for the wait instruction was wrong. Fix according to ISA 3.0. Differential Revision: https://reviews.llvm.org/D48550 llvm-svn: 335514	2018-06-25 19:28:27 +00:00
Reid Kleckner	d0145262e2	Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC code model is not implemented for MachO yet. Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335508	2018-06-25 18:16:27 +00:00
Craig Topper	8df3473309	[X86] Sort the static memory folding tables by reg opcode. Remove the reg->mem DenseMaps in favor of binary search. With the static tables sorted we can binary search them directly for reg->mem lookups. This removes 6 DenseMaps that had to be created when X86InstrInfo is constructed. We still have one Mem->Reg DenseMap for the reverse direction. This is created just as before by walking the reg->mem arrays to populate it. Differential Revision: https://reviews.llvm.org/D48527 llvm-svn: 335501	2018-06-25 17:26:56 +00:00
Craig Topper	e64c6852d1	[X86] Allow base and index for gather instructions to appear in other order for Intel syntax. llvm-svn: 335500	2018-06-25 17:26:51 +00:00
Alexander Richardson	efbf6691b6	Add Triple::isMIPS()/isMIPS32()/isMIPS64(). NFC There are quite a few if statements that enumerate all these cases. It gets even worse in our fork of LLVM where we also have a Triple::cheri (which is mips64 + CHERI instructions) and we had to update all if statements that check for Triple::mips64 to also handle Triple::cheri. This patch helps to reduce our diff to upstream and should also make some checks more readable. Reviewed By: atanasyan Differential Revision: https://reviews.llvm.org/D48548 llvm-svn: 335493	2018-06-25 16:49:20 +00:00
Matt Arsenault	91c55cb124	AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr Note a normal select test is not currently possible because this relies on input registers tracked in SIMachineFunctionInfo which are not currently serializable in MIR, but this does work end-to-end from the IR. llvm-svn: 335490	2018-06-25 16:17:48 +00:00
Matt Arsenault	69747dd448	AMDGPU: Remove commented out code llvm-svn: 335486	2018-06-25 15:42:20 +00:00
Matt Arsenault	763790b4bc	AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers llvm-svn: 335485	2018-06-25 15:42:12 +00:00
Matt Arsenault	39faef50af	AMDGPU: Respect align argument parameter This should avoid relying on the pointee type to get the alignment, particularly since pointee types are supposed to be removed at some point. Also fixes not getting the alignment for unsized types. llvm-svn: 335478	2018-06-25 14:29:04 +00:00
Craig Topper	5224ed89ca	[X86] Block commuting operand 1 of FMA*_Int instructions in findThreeSrcCommutedOpIndices. Remove uncommutable returns from getThreeSrcCommuteCase/getFMA3OpcodeToCommuteOperands. We should be blocking the operand while we are in the routine that tries to find commutable operand indices. Doing it later means we might have missed out on another valid set of operands we could have commuted. The intrinsic case was the only case that could really prevent commuting in getFMA3OpcodeToCommuteOperands. All the other cases in getThreeSrcCommuteCase were not reachable conditions as they were protected by findThreeSrcCommutedOpIndices. With that abort case pushed earlier, we can remove all the abort checks and replace with asserts. llvm-svn: 335446	2018-06-25 06:05:37 +00:00
Heejin Ahn	75b74e2ab2	[WebAssembly] Add WebAssemblyException information analysis Summary: A WebAssemblyException object contains BBs that belong to a 'catch' part of the try-catch-end structure. Because CFGSort requires all the BBs within a catch part to be sorted together as it does for loops, this pass calculates the nesting structure of catch part of exceptions in a function. Now this assumes the use of Windows EH instructions. Reviewers: dschuff, majnemer Subscribers: jfb, mgorny, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D44134 llvm-svn: 335439	2018-06-25 01:20:21 +00:00
Heejin Ahn	eb5b2cac1c	[WebAssembly] Add WebAssemblyLateEHPrepare pass Summary: Add WebAssemblyLateEHPrepare pass that does several small jobs for exception handling. This runs before CFGSort, and is different from WasmEHPrepare pass that runs before ISel, even though the names are similar. Reviewers: dschuff, majnemer Subscribers: sbc100, jgravelle-google, sunfish, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D46803 llvm-svn: 335438	2018-06-25 01:07:11 +00:00
Craig Topper	126b775f48	[X86] Simplify some code by using isOneConstant. NFC llvm-svn: 335437	2018-06-25 01:01:47 +00:00
Craig Topper	ef5e7f309f	[X86] Remove the changes to combineScalarToVector made in r335037. They appear to be untested other than the test case for p37879.ll and I believe we should be using SimplifyDemandedElts here to handle these cases. llvm-svn: 335436	2018-06-25 00:21:53 +00:00
Craig Topper	2ce9c890a9	[X86] Reduce the number of patterns needed for masked scalar ceil/floor isel. The scalar to vector on the mask register should not be part of the patterns. llvm-svn: 335435	2018-06-25 00:05:09 +00:00
Brad Smith	11b1a62550	[mips][ias] Enable IAS by default for OpenBSD / FreeBSD mips64/mips64el. Reviewers: atanasyan Differential Review: https://reviews.llvm.org/D31557 llvm-svn: 335434	2018-06-24 15:44:47 +00:00
Craig Topper	ef0f201dac	[X86] Regroup some isel patterns. NFC For some reason the 64-bit patterns were separated from their 8/16/32-bit friends, but only for add/sub/mul. For and/or/xor they were together. llvm-svn: 335429	2018-06-24 06:56:49 +00:00
Craig Topper	b4fc81652d	[X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions. llvm-svn: 335428	2018-06-24 06:29:50 +00:00
Craig Topper	bdb054a757	[X86] Make %eiz usage in 64-bit mode, force a 0x67 address size prefix. Fix some test CHECK lines. llvm-svn: 335414	2018-06-23 06:15:04 +00:00
Craig Topper	9ecee8a705	[X86] Teach disassembler to use %eip instead of %rip when 0x67 prefix is used on a rip-relative address. llvm-svn: 335413	2018-06-23 06:03:48 +00:00
Craig Topper	bf899c382e	[X86][AsmParser] Improve base/index register checks. -Ensure EIP isn't used with an index reigster. -Ensure EIP isn't used as index register. -Ensure base register isn't a vector register. -Ensure eiz/riz usage matches the size of their base register. llvm-svn: 335412	2018-06-23 05:53:00 +00:00
Reid Kleckner	9079528d26	[AMDGPU] Update includes for intrinsic changes :( llvm-svn: 335409	2018-06-23 03:05:39 +00:00
Reid Kleckner	2a8c0506f2	[IR] Split Intrinsics.inc into enums and implementations Implements PR34259 Intrinsics.h is a very popular header. Most LLVM TUs care about things like dbg_value, but they don't care how they are implemented. After I split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU from scanning 1.7 MB of source that gets pre-processed away. It also means we can modify intrinsic properties without triggering a full rebuild, but that's probably less of a win. I think the next best thing to do would be to split out the target intrinsics into their own header. Very, very few TUs care about target-specific intrinsics. It's very hard to split up the target independent intrinsics like llvm.expect, assume, and dbg.value, though. llvm-svn: 335407	2018-06-23 02:02:38 +00:00
Craig Topper	60f2d855a5	[X86][AsmParser] Rework that allows (%dx) to be used in place of %dx with in/out instructions. Previously, to support (%dx) we left a wide open hole in our 16-bit memory address checking. This let this address value be used with any instruction without error in the parser. It would later fail in the encoder with an assertion failure on debug builds and who knows what on release builds. This patch passes the mnemonic down to the memory operand parsing function so we can allow the (%dx) form only on specific instructions. llvm-svn: 335403	2018-06-23 00:03:20 +00:00
Craig Topper	b0c7e716b9	[X86][AsmParser] Keep track of whether an explicit scale was specified while parsing an address in Intel syntax. Use it for improved error checking. This allows us to check these: -16-bit addressing doesn't support scale so we should error if we find one there. -Multiplying ESP/RSP by a scale even if the scale is 1 should be an error because ESP/RSP can't be an index. llvm-svn: 335398	2018-06-22 22:28:39 +00:00
Craig Topper	e51eb974e1	[X86][AsmParser] In Intel syntax make sure we support ESP/RSP being the second register in memory expressions like [EAX+ESP]. By default, the second register gets assigned to the index register slot. But ESP can't be an index register so we need to swap it with the other register. There's still a slight bug that we allow [EAX+ESP*1]. The existence of the multiply even though its with 1 should force ESP to the index register and trigger an error, but it doesn't currently. llvm-svn: 335394	2018-06-22 21:57:24 +00:00
Craig Topper	4e2f16caa6	[X86] Don't accept (%si,%bp) 16-bit address expressions. The second register is the index register and should only be %si or %di if used with a base register. And in that case the base register should be %bp or %bx. This makes us compatible with gas. We do still need to support both orders with Intel syntax which uses [bp+si] and [si+bp] llvm-svn: 335384	2018-06-22 20:20:38 +00:00
Craig Topper	9a6b6d86a9	[X86][AsmParser] Allow (%bp,%si) and (%bp,%di) to be encoded without using a zero displacement. (%bp) can't be encoded without a displacement. The encoding is instead used for displacement alone. So a 1 byte displacement of 0 must be used. But if there is an index register we can encode without a displacement. llvm-svn: 335379	2018-06-22 19:42:21 +00:00
Craig Topper	da9e48626c	[X86][AsmParser] Check for invalid 16-bit base register in Intel syntax. llvm-svn: 335373	2018-06-22 17:50:40 +00:00
Craig Topper	3a8b5db807	[X86] Don't allow ESP/RSP to be used as an index register in assembly. Fixes PR37892 llvm-svn: 335370	2018-06-22 17:15:58 +00:00
Sjoerd Meijer	ce9d270373	Recommit of r335326, with the test fixed that I missed. llvm-svn: 335331	2018-06-22 10:03:03 +00:00
Simon Pilgrim	d34cfc19bf	[CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329	2018-06-22 09:45:31 +00:00
Sjoerd Meijer	26feb92a8c	Reverting r335326 while I look at the test failure llvm-svn: 335328	2018-06-22 09:17:08 +00:00
Sjoerd Meijer	129e526a8b	[ARM] ARMv6m and v8m.baseline strict align This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline, because it has no support for unaligned accesses. It looks like we always pass target feature "+strict-align" from Clang, so this is not a user facing problem, but querying the subtarget (in e.g. llc) for unaligned access support is incorrect. Differential Revision: https://reviews.llvm.org/D48437 llvm-svn: 335326	2018-06-22 08:48:13 +00:00
Matt Arsenault	aa80fb9253	AMDGPU: Add patterns for i32/i64 local atomic load/store Not sure why the 32/64 split is needed in the atomic_load store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global. llvm-svn: 335325	2018-06-22 08:39:52 +00:00
Mikhail Dvoretckii	0fb1eee73a	[X86] Changing the check for valid inputs in combineScalarToVector Changing the logic of scalar mask folding to check for valid input types rather than against invalid ones, making it more robust and fixing PR37879. Differential Revision: https://reviews.llvm.org/D48366 llvm-svn: 335323	2018-06-22 08:28:05 +00:00
Tom Stellard	b6447f67a8	AMDGPU/GlobalISel: Default to using TableGen'd instruction selector Summary: We can select all instructions that are marked as legal in a full piglit run, so now is a good time to make the TableGen'd instruction selector default for all opcodes. This is NFC for a full piglit run, which is why there are no tests. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48198 llvm-svn: 335319	2018-06-22 03:04:35 +00:00
Tom Stellard	4de5b46280	AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48196 llvm-svn: 335318	2018-06-22 02:54:57 +00:00
Tom Stellard	d9853a4c72	AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48195 llvm-svn: 335316	2018-06-22 02:34:29 +00:00
Tom Stellard	e41569733d	AMDGPU/GlobalISel: Implement select() for COPY Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46151 llvm-svn: 335315	2018-06-22 00:44:29 +00:00
Tom Stellard	5c60568c11	AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46150 llvm-svn: 335307	2018-06-21 23:38:20 +00:00
Reid Kleckner	c1eeade8d2	Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models" MCJIT can't handle R_X86_64_GOT64 yet. llvm-svn: 335300	2018-06-21 22:19:05 +00:00
Reid Kleckner	3f66d67494	[X86] Commit some comments that weren't in the medium code model patch llvm-svn: 335298	2018-06-21 21:57:44 +00:00
Reid Kleckner	6435d99a79	[X86] Implement more of x86-64 large and medium PIC code models Summary: The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. Reviewers: chandlerc, echristo Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335297	2018-06-21 21:55:08 +00:00
Konstantin Zhuravlyov	d421112771	AMDGPU: Remove ability to reserve VGPRs for debugger Differential Revision: https://reviews.llvm.org/D48234 llvm-svn: 335288	2018-06-21 20:28:19 +00:00
Scott Linder	4a78711447	[AMDGPU] Update assembler for HSA Code Object v3 Update AMDGPU assembler syntax behind the code-object-v3 feature: * Replace/rename most AMDGPU assembler directives/symbols and document them. * Provide more diagnostics (e.g. values out of range, missing values, repeated values). * Provide path for backwards compatibility, even with underlying descriptor changes. Differential Revision: https://reviews.llvm.org/D47736 llvm-svn: 335281	2018-06-21 19:38:56 +00:00
Simon Dardis	e025b14586	[mips] Modify comment to test new email address (NFC). llvm-svn: 335269	2018-06-21 18:52:32 +00:00
Scott Linder	a83da62375	[AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts BlockWaitcntProcessedSet was not being cleared between calls, so it was producing incorrect counts in cases where MBB addresses happened to coincide across multiple calls. Differential Revision: https://reviews.llvm.org/D48391 llvm-svn: 335268	2018-06-21 18:48:48 +00:00
Konstantin Zhuravlyov	1ba54fc164	AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z and everything that comes with it from implementation and v3 header files. Leave definition in v2 header files for backwards compatibility. Differential Revision: https://reviews.llvm.org/D48191 llvm-svn: 335267	2018-06-21 18:36:04 +00:00
Sirish Pande	8aef1e988f	Revert "[AArch64] Coalesce Copy Zero during instruction selection" This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7. Original Commit: Author: Haicheng Wu <haicheng@codeaurora.org> Date: Sun Feb 18 13:51:33 2018 +0000 [AArch64] Coalesce Copy Zero during instruction selection Add special case for copy of zero to avoid a double copy. Differential Revision: https://reviews.llvm.org/D36104 Author's intention is to remove a BB that has one mov instruction. In order to do that, d8f571050 pessmizes MachineSinking by introducing a copy, such that mov instruction is NOT moved to the BB. Optimization downstream gets rid of the BB with only mov instruction. This works well if we have only one fall through branch as there is only one "extra" mov instruction. If we have multiple fall throughs, we will have a lot of redundant movs. In such a case, it's better to have this BB which has one mov instruction. This is causing degradation in jpeg, fft and other codebases. I believe if we want to remove a BB with only one branch instruction, we should not pessimize Machine Sinking at all, and find some other solution. llvm-svn: 335251	2018-06-21 16:05:24 +00:00
David Green	219b65e574	[ARM] Enable useAA() for the in-order Cortex-R52 This option allows codegen (such as DAGCombine or MI scheduling) to use alias analysis information, which can help with the codegen on in-order cpu's, especially machine scheduling. Here I have done things the same way as AArch64, adding a subtarget feature to enable this for specific cores, and enabled it for the R52 where we have a schedule to make use of it. Differential Revision: https://reviews.llvm.org/D48074 llvm-svn: 335249	2018-06-21 15:48:29 +00:00
Sameer AbuAsal	70926d1510	[RISCV] Tail calls don't need to save return address Summary: When expanding the PseudoTail in expandFunctionCall() we were using X6 to save the return address. Since this is a tail call the return address is not needed, this patch replaces it with X0 to be ignored. This matches the behaviour listed in the ISA V2.2 document page 110. tail offset -----> jalr x0, x6, offset GCC exhibits the same behavior. Reviewers: apazos, asb, mgrang Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01 Differential Revision: https://reviews.llvm.org/D48343 llvm-svn: 335239	2018-06-21 14:37:09 +00:00
Mikhail Dvoretckii	8f27e9c26e	[x86] Lower some trunc + shuffle patterns to vpmov[q\|d][b\|w] This should help in lowering the following four intrinsics: _mm256_cvtepi32_epi8 _mm256_cvtepi64_epi16 _mm256_cvtepi64_epi8 _mm512_cvtepi64_epi8 Differential Revision: https://reviews.llvm.org/D46957 llvm-svn: 335238	2018-06-21 14:16:45 +00:00
Nicolai Haehnle	06e51aac77	AMDGPU: Remove redundant MIMG instruction variants Summary: For sample and gather ops, we can accurately determine the set of vaddr-size instruction variants that are required. This reduces the size of instruction tables by ~5%. The number of machine instruction opcodes is reduced from 10002 to 9476. Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48168 llvm-svn: 335232	2018-06-21 13:37:55 +00:00
Nicolai Haehnle	1c11c23080	AMDGPU: Remove old-style image intrinsics Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 llvm-svn: 335231	2018-06-21 13:37:45 +00:00
Nicolai Haehnle	d9213dadcb	AMDGPU: Select MIMG instructions manually in SITargetLowering Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually. Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand. This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage. Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6 Reviewers: arsenm, rampitec, rtaylor, tstellar Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48017 llvm-svn: 335228	2018-06-21 13:36:57 +00:00
Nicolai Haehnle	73cfb7a27e	AMDGPU: Refactor MIMG instruction TableGen using generic tables Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural. This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes. Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48016 llvm-svn: 335227	2018-06-21 13:36:44 +00:00
Nicolai Haehnle	855e1ccd65	AMDGPU: Use generic tables instead of SearchableTable Summary: Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48014 Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641 llvm-svn: 335226	2018-06-21 13:36:33 +00:00
Nicolai Haehnle	cd2fe54c65	AMDGPU: Pass AMDGPUSampleVariant to MIMG_{Sampler,Gather}(_WQM) Summary: This will allows us to provide rich metadata about the instructions in tables that are accessible by custom C++ code. Change-Id: Id9305a26304ab6a6cceb6c65c8cd49141cc0101d Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48011 llvm-svn: 335224	2018-06-21 13:36:13 +00:00
Nicolai Haehnle	686c8a92c9	AMDGPU: Add implicit def of SCC to kill and indirect pseudos Summary: Kill instructions sometimes do use SCC in unusual circumstances, when v_cmpx cannot be used due to the operands that are involved. Additionally, even if SCC was never defined by the expansion, kill pseudos could previously occur between an s_cmp and an s_cbranch_scc, which breaks the SCC liveness tracking when the pseudo is expanded to split the basic block. While it would be possible to explicitly mark the SCC as live-in for the successor basic block, it's simpler to just mark the pseudo as using SCC, so that such a sequence is never emitted by instruction selection in the first place. A similar issue affects indirect source/dest pseudos in principle, although I haven't been able to come up with a test case where it actually matters (this affects instruction selection, so a MIR test can't be used). Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47761 llvm-svn: 335223	2018-06-21 13:36:08 +00:00
Nicolai Haehnle	a859c18433	AMDGPU: Turn D16 for MIMG instructions into a regular operand Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies. We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instructions loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use. We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion. The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which choose the correct variant for other MIMG instruction is extended to cover gather4 as well. As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction). While we're at it, delete a whole bunch of dead legacy TableGen code. Change-Id: I89b02c2841c06f95e662541433e597f5d4553978 Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47434 llvm-svn: 335222	2018-06-21 13:36:01 +00:00
Simon Pilgrim	fe48f7ae99	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216	2018-06-21 11:37:13 +00:00
Craig Topper	deeee140fb	[X86] Remove masking from 512-bit floating max/min intrinsics. Use select instruction instead. llvm-svn: 335199	2018-06-21 05:00:56 +00:00
Simon Dardis	1786f41719	[mips] Add microMIPS specific addressing patterns. These are identical but use microMIPS instructions instead of MIPS instructions. Also, flatten the 'let AdditionalPredicates = [InMicroMips]' by using the ISA_MICROMIPS adjective. Add tests for constant materialization. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D48275 llvm-svn: 335185	2018-06-20 22:40:12 +00:00
Craig Topper	9533769289	[X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code. I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering. Differential Revision: https://reviews.llvm.org/D43608 llvm-svn: 335173	2018-06-20 21:05:02 +00:00
Simon Dardis	d94ff12893	[mips] Correct predicates for loads, bit manipulation instructions and some pseudos Additionally, correct the definition of the rdhwr instruction. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D48216 llvm-svn: 335162	2018-06-20 19:59:58 +00:00
Matt Arsenault	5622f9d716	AMDGPU: Fix scalar_to_vector for v4i16/v4f16 llvm-svn: 335161	2018-06-20 19:45:48 +00:00
Matt Arsenault	808144795a	AMDGPU: Fix missing C++ mode comment llvm-svn: 335160	2018-06-20 19:45:40 +00:00
Alex Bradbury	deac507c70	[RISCV] Accept fmv.s.x and fmv.x.s as mnemonic aliases for fmv.w.x and fmv.x.w These instructions were renamed in version 2.2 of the user-level ISA spec, but the old name should also be accepted by standard tools. llvm-svn: 335154	2018-06-20 18:42:25 +00:00
Sam Clegg	95a1bd1cc2	[WebAssembly] Update know failures for the wasm waterfall Summary: The waterfall no longer builds .s files and no longers uses the wasm-o when it builds object files. Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48371 llvm-svn: 335135	2018-06-20 15:17:12 +00:00
Alex Bradbury	3ced7985d7	[RISCV] Add InstAlias definitions for fgt.{s\|d}, fge.{s\|d} These are produced by GCC and supported by GAS, but not currently contained in the pseudoinstruction listing in the RISC-V ISA manual. llvm-svn: 335127	2018-06-20 14:03:02 +00:00
Krzysztof Parzyszek	46bdb4d45e	[Hexagon] Remove 'T' from HasVNN predicates, NFC Patch by Sumanth Gundapaneni. llvm-svn: 335124	2018-06-20 13:56:09 +00:00
Simon Dardis	b7e0fb6440	[mips] Fix the predicates of some DSP instructions from AdditionalPredicates to ASEPredicate Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D48166 llvm-svn: 335122	2018-06-20 13:29:57 +00:00
Alex Bradbury	85940f5697	[RISCV] Add InstAlias definitions for sgt and sgtu These are produced by GCC and supported by GAS, but not currently contained in the pseudoinstruction listing in the RISC-V ISA manual. llvm-svn: 335120	2018-06-20 12:54:02 +00:00
Tim Northover	63b281a203	ARM: convert ORR instructions to ADD where possible on Thumb. Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a 3-address encoding and a wider range of immediates. So, particularly when optimizing for code size (but it doesn't make things worse elsewhere) it's beneficial to select an OR operation to an ADD if we know overflow won't occur. This is made even better by LLVM's penchant for putting operations in canonical form by converting the other way. llvm-svn: 335119	2018-06-20 12:09:44 +00:00
Tim Northover	e031e390d0	[AArch64] Implement FLT_ROUNDS macro. Very similar to ARM implementation, just maps to an MRS. Should fix PR25191. Patch by Michael Brase. llvm-svn: 335118	2018-06-20 12:09:01 +00:00
Andrea Di Biagio	4893e095df	[llvm-mca][X86] Teach how to identify register writes that implicitly clear the upper portion of a super-register. This patch teaches llvm-mca how to identify register writes that implicitly zero the upper portion of a super-register. On X86-64, a general purpose register is implemented in hardware as a 64-bit register. Quoting the Intel 64 Software Developer's Manual: "an update to the lower 32 bits of a 64 bit integer register is architecturally defined to zero extend the upper 32 bits". Also, a write to an XMM register performed by an AVX instruction implicitly zeroes the upper 128 bits of the aliasing YMM register. This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis interface to help identify instructions that implicitly clear the upper portion of a super-register. The rest of the patch teaches llvm-mca how to use that new method to obtain the information, and update the register dependencies accordingly. I compared the kernels from tests clear-super-register-1.s and clear-super-register-2.s against the output from perf on btver2. Previously there was a large discrepancy between the estimated IPC and the measured IPC. Now the differences are mostly in the noise. Differential Revision: https://reviews.llvm.org/D48225 llvm-svn: 335113	2018-06-20 10:08:11 +00:00
Roman Lebedev	489484b98d	[X86][Znver1] Specify Register Files, RCU; FP scheduler capacity. Summary: First off: i do not have any access to that processor, so this is purely theoretical, no benchmarks. I have been looking into bdver2 scheduling profile, and while cross-referencing the existing btver2, znver1 profiles, and the reference docs (`Software Optimization Guide for AMD Family {15,16,17}h Processors`), i have noticed that only btver2 scheduling profile specifies these. Also, there is no mca test coverage. Reviewers: RKSimon, craig.topper, courbet, GGanesh, andreadb Reviewed By: GGanesh Subscribers: gbedwell, vprasad, ddibyend, shivaram, Ashutosh, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D47676 llvm-svn: 335099	2018-06-20 07:01:14 +00:00
Clement Courbet	a5fd94dfc2	[X86] Add sched class WriteLAHFSAHF and fix values. Summary: I ran llvm-exegesis on SKX, SKL, BDW, HSW, SNB. Atom is from Agner and SLM is a guess. I've left AMD processors alone. Reviewers: RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48079 llvm-svn: 335097	2018-06-20 06:13:39 +00:00
Craig Topper	28006b2aa7	[X86] Use binary search of the EVEX->VEX static tables instead of populating two DenseMaps for lookups Summary: After r335018, the static tables are guaranteed sorted by the EVEX opcode to convert. We can use this to do a binary search and remove the need for any secondary data structures. Right now one table is 736 entries and the other is 482 entries. It might make sense to merge the two tables as a follow up. The effort it takes to select the table is probably similar to the extra binary search step it would require for a larger table. I haven't done any measurements to see if this has any effect on compile time, but I don't imagine that EVEX->VEX conversion is a place we spend a lot of time. Reviewers: RKSimon, spatel, chandlerc Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48312 llvm-svn: 335092	2018-06-20 04:32:04 +00:00
Vlad Tsyrklevich	8000cbcff4	Revert r334980 and 334983 This reverts commits r334980 and r334983 because they were causing build timeouts on the x86_64-linux-ubsan bot. llvm-svn: 335085	2018-06-20 00:02:32 +00:00
Jessica Paquette	0a78d09ccb	[MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue insertOutlinerPrologue was not used by any target, and prologue-esque code was beginning to appear in insertOutlinerEpilogue. Refactor that into one function, buildOutlinedFrame. This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue. llvm-svn: 335076	2018-06-19 21:14:48 +00:00
Heejin Ahn	75f7b503ba	[WebAssembly] Fix liveness tracking info after drop insertion Summary: This fixes liveness tracking information after `drop` instruction insertion in ExplicitLocals pass. When a drop instruction is inserted to drop a dead register operand, the original operand should be marked not dead anymore because it is now used by the new drop instruction. And the operand to the new drop instruction should be marked killed instead. This bug caused some programs to fail when `llc` is run with `-verify-machineinstrs` option. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48253 llvm-svn: 335074	2018-06-19 20:30:42 +00:00
Krzysztof Parzyszek	266224bf91	[Hexagon] Fix the value of HexagonII::TypeCVI_FIRST This value is the first vector instruction type in numerical order. The previous value was incorrect, leaving TypeCVI_GATHER outside of the range for vector instructions. This caused vector .new instructions to be incorrectly encoded in the presence of gather. llvm-svn: 335065	2018-06-19 18:09:54 +00:00
Craig Topper	9b3d8dac02	[X86] Initialize FMA3Info directly in its constructor instead of relying on std::call_once FMA3Info only exists as a managed static. As far as I know the ManagedStatic construction proccess is thread safe. It doesn't look like we ever access the ManagedStatic object without immediately doing a query on it that would require the map to be populated. So I don't think we're ever deferring the calculation of the tables from the construction of the object. So I think we should be able to just populate the FMA3Info map directly in the constructor and get rid of all of the initGroupsOnce stuff. Differential Revision: https://reviews.llvm.org/D48194 llvm-svn: 335064	2018-06-19 18:06:52 +00:00
Craig Topper	0fce14307b	[X86] Don't fold unaligned loads into SSE ROUNDPS/ROUNDPD for ceil/floor/nearbyint/rint/trunc. Incorrect patterns were added in r334460. This changes them to check alignment properly for SSE. llvm-svn: 335062	2018-06-19 17:51:42 +00:00
Krzysztof Parzyszek	968d948456	[Hexagon] Enforce restrictions on packetizing cache instructions llvm-svn: 335061	2018-06-19 17:26:20 +00:00
Simon Dardis	61b6020f82	[mips] Mark microMIPS64 as being unsupported. There are no provided instruction definitions for this architecture. Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D48320 llvm-svn: 335057	2018-06-19 16:05:44 +00:00
Simon Dardis	b923a15dc8	[mips] Fix the predicates of some aliases Previously, some aliases were marked as not being available for microMIPS32R6, but this was overridden at the top level. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D48321 llvm-svn: 335053	2018-06-19 15:25:01 +00:00
Strahinja Petrovic	f9eee9f1c7	[PowerPC] Fix label address calculation for ppc32 This patch fixes calculating address of label on ppc32 (for -fPIC). Differential Revision: https://reviews.llvm.org/D46582 llvm-svn: 335043	2018-06-19 13:07:40 +00:00
Mikhail Dvoretckii	23c5041b95	[X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns This patch handles back-end folding of generic patterns created by lowering the X86 rounding intrinsics to native IR in cases where the instruction isn't a straightforward packed values rounding operation, but a masked operation or a scalar operation. Differential Revision: https://reviews.llvm.org/D45203 llvm-svn: 335037	2018-06-19 10:37:52 +00:00
Mikhail Dvoretckii	f31fdd2894	Test commit. llvm-svn: 335026	2018-06-19 07:55:10 +00:00
QingShan Zhang	495146cd04	If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction when we are loading a floating, and expand it post RA basing on the register pressure. However, we miss to do the add-imm peephole for these pseudo instruction. Differential Revision: https://reviews.llvm.org/D47568 Reviewed By: Nemanjai llvm-svn: 335024	2018-06-19 06:54:51 +00:00
Craig Topper	150ec2ae44	[X86] Add the ability to force an EVEX2VEX mapping table entry from the .td files. Remove remaining manual table entries from the tablegen emitter. This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add manual entry in the EVEX->VEX tables that doesn't check the encoding information. Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions. Finally, remove the manual table from the emitter. This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap. llvm-svn: 335018	2018-06-19 04:24:44 +00:00
Craig Topper	7043d25ccf	[X86] Add a new VEX_WPrefix encoding to tag EVEX instruction that have VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0. EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity. The EVEX->VEX table generate can collapse the two versions if the VEX version uses is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1. This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0. This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler. llvm-svn: 335017	2018-06-19 04:24:42 +00:00
Craig Topper	868256cb42	[X86] Simplify the TSFlags checking code in EvexToVexInstPass. NFCI The code was previously checking the L2 and L flag on 3 separate lines, treating the combination as an encoding. Instead its better to think of the L2 bit as being something that can't be done with VEX and early returning. Then we just need to check the L bit. llvm-svn: 335015	2018-06-19 03:17:46 +00:00
Heejin Ahn	155055a5d6	[WebAssembly] Add more utility functions Summary: Added more utility functions that will be used in EH-related passes Also changed `LoopBottom` function to `getBottom` and uses templates to be able to handle other classes as well, which will be used in CFGSort later. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48262 llvm-svn: 335006	2018-06-19 00:32:03 +00:00
Heejin Ahn	9e5ae7307e	[WebAssembly] Make rethrow instruction take a target BB argument Summary: This patch changes the rethrow instruction to take a BB argument in LLVM backend, like `br` and `br_if`s. This BB is a target catch BB the rethrow instruction unwinds to. This BB argument will be converted to an relative depth immediate at the end of CFGStackify pass, as in the same way of branches. RETHROW_TO_CALLER is a codegen-only instruction that should be used when a rethrow instruction does not have an unwind destination BB, i.e., it should rethrow to its caller function. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48260 llvm-svn: 334998	2018-06-18 23:54:29 +00:00
Craig Topper	b77dc63276	[X86] Remove ReadAfterLd from avx512_shift_rmbi multiclass. The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd? llvm-svn: 334994	2018-06-18 23:20:57 +00:00
Eric Christopher	94f7d92109	Tidy comment language and explanation. llvm-svn: 334990	2018-06-18 22:21:19 +00:00
Eric Christopher	f2e78c8681	Pull non-lazy stub table emission into a separate function alongside the individual stub creation to increase readability a bit in the non-object file format specific function. llvm-svn: 334989	2018-06-18 22:21:18 +00:00
Eric Christopher	fb21b6d9b3	Add return statements to make it clear that all of these are mutually exclusive conditions. else if would have worked just as well, but this keeps the original readability a bit more clear. llvm-svn: 334988	2018-06-18 22:21:13 +00:00
Wouter van Oortmerssen	1d61f2cb75	[WebAssembly] Modified tablegen defs to have 2 parallel instuction sets. Summary: One for register based, much like the existing definitions, and one for stack based (suffix _S). This allows us to use registers in most of LLVM (which works better), and stack based in MC (which results in a simpler and more readable assembler / disassembler). Tried to keep this change as small as possible while passing tests, follow-up commit will: - Add reg->stack conversion in MI. - Fix asm/disasm in MC to be stack based. - Fix emitter to be stack based. tests passing: llvm-lit -v `find test -name WebAssembly` test/CodeGen/WebAssembly test/MC/WebAssembly test/MC/Disassembler/WebAssembly test/DebugInfo/WebAssembly test/CodeGen/MIR/WebAssembly test/tools/llvm-objdump/WebAssembly Reviewers: dschuff, sbc100, jgravelle-google, sunfish Subscribers: aheejin, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D48183 llvm-svn: 334985	2018-06-18 21:22:44 +00:00
Sander de Smalen	14aa992de5	[AArch64][SVE] Asm: Fix predicate pattern diagnostics. This patch uses the DiagnosticPredicate for SVE predicate patterns to improve their diagnostics, now giving a 'invalid operand' diagnostic if the type is not an immediate or one of the expected pattern labels. Reviewers: samparker, SjoerdMeijer, javed.absar, fhahn Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48220 llvm-svn: 334983	2018-06-18 21:03:02 +00:00
Sander de Smalen	2192e35631	[AArch64][SVE] Asm: Support for saturating INC/DEC (32bit scalar) instructions. The variants added by this patch are: - SQINC signed increment, e.g. sqinc x0, w0, all, mul #4 - SQDEC signed decrement, e.g. sqdec x0, w0, all, mul #4 - UQINC unsigned increment, e.g. uqinc w0, all, mul #4 - UQDEC unsigned decrement, e.g. uqdec w0, all, mul #4 This patch includes asmparser changes to parse a GPR64 as a GPR32 in order to satisfy the constraint check: x0 == GPR64(w0) in: sqinc x0, w0, all, mul #4 ^___^ (must match) Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47716 llvm-svn: 334980	2018-06-18 20:50:33 +00:00
Wouter van Oortmerssen	5f69877632	[WebAssembly] Cleaned up register accessors in WebAssemblyMachineFunctionInfo.h Tested: llvm-lit -v `find test -name WebAssembly` (This is a commit access "test commit" :) llvm-svn: 334979	2018-06-18 20:45:49 +00:00
Craig Topper	96f30be3bf	[X86] Encode the EVEX2VEX exception list information in .td files instead of the emitter source. Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility. llvm-svn: 334971	2018-06-18 18:47:07 +00:00
Sander de Smalen	b9e05bec1f	[AArch64][SVE] Asm: Support for saturating INC/DEC (64bit scalar) instructions. Summary: The variants added by this patch are: - SQINC (signed increment) - UQINC (unsigned increment) - SQDEC (signed decrement) - UQDEC (unsigned decrement) For example: uqincw x0, all, mul #4 Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Differential Revision: https://reviews.llvm.org/D47715 llvm-svn: 334948	2018-06-18 14:47:52 +00:00
Simon Pilgrim	6c92f29bd5	[X86][BtVer2] Flag AVX2+ scheduler classes as unsupported Jaguar only supports up to AVX1 Differential Revision: https://reviews.llvm.org/D48274 llvm-svn: 334947	2018-06-18 14:31:14 +00:00
Sander de Smalen	6864576067	[AArch64][SVE] Asm: Support for vector element compares. This patch adds instructions for comparing elements from two vectors, e.g. cmpgt p0.s, p0/z, z0.s, z1.s and also adds support for comparing to a 64-bit wide element vector, e.g. cmpgt p0.s, p0/z, z0.s, z1.d The patch also contains aliases for certain comparisons, e.g.: cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s llvm-svn: 334931	2018-06-18 10:59:19 +00:00
Clement Courbet	cafa0cab01	[X86] Fix NOOP sched overrides on BDW/HSW/SKL. Summary: Noop certainly does not use resources. Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits, gchatelet Differential Revision: https://reviews.llvm.org/D48028 llvm-svn: 334927	2018-06-18 06:48:22 +00:00
Craig Topper	78ffc4f614	[X86] Create X86InstrFMA3Group objects fully in a static table instead of on the heap. NFCI Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables. Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store have a register and memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0. This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted. This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method. llvm-svn: 334925	2018-06-18 06:32:22 +00:00
Craig Topper	5679b5a5ef	[X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions. We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions. Also remove the vpextrw.s EVEX alias. That's not something gas implements. llvm-svn: 334922	2018-06-18 05:00:50 +00:00
Craig Topper	58ed1c810a	[X86] Move the 'vmovq.s' and similar assembly strings for EVEX vector moves with reversed operands to InstAliases. The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser. Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to state where only EVEX was supported. llvm-svn: 334920	2018-06-18 01:28:05 +00:00
Craig Topper	d9f71cdd4e	[X86] Add all the FMA instructions direclty to the load folding table instead of proxying through X86InstrFMA3Info. These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table. llvm-svn: 334915	2018-06-17 18:00:16 +00:00
Craig Topper	77b20be268	[X86] Pass the parent SDNode to X86DAGToDAGISel::selectScalarSSELoad to simplify the hasSingleUseFromRoot handling. Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output. This fixed at least fma-scalar-memfold.ll to succed without the peephole pass. llvm-svn: 334908	2018-06-17 16:29:46 +00:00
Sander de Smalen	013bed785d	[AArch64][SVE] Asm: Support for bitwise operations on predicate vectors. This patch adds support for instructions performing bitwise operations on predicate vectors, including AND, BIC, EOR, NAND, NOR, ORN, ORR, and their status flag setting variants ANDS, BICS, EORS, NANDS, ORNS, ORRS. This patch also adds several aliases: orr p0.b, p1/z, p1.b, p1.b => mov p0.b, p1.b orrs p0.b, p1/z, p1.b, p1.b => movs p0.b, p1.b and p0.b, p1/z, p2.b, p2.b => mov p0.b, p1/z, p2.b ands p0.b, p1/z, p2.b, p2.b => movs p0.b, p1/z, p2.b eor p0.b, p1/z, p2.b, p1.b => not p0.b, p1/z, p2.b eors p0.b, p1/z, p2.b, p1.b => nots p0.b, p1/z, p2.b llvm-svn: 334906	2018-06-17 10:48:21 +00:00
Sander de Smalen	1afe6f4625	[AArch64][SVE] Asm: Support for SEL (vector/predicate) instructions. Support for SVE's predicated select instructions to select elements from either vector, both in a data-vector and a predicate-vector variant. llvm-svn: 334905	2018-06-17 10:11:04 +00:00
Jonas Hahnfeld	4e517a57ce	[NVPTX] Ignore target-cpu and -features for inlining We don't want to prevent inlining because of target-cpu and -features attributes that were added to newer versions of LLVM/Clang: There are no incompatible functions in PTX, ptxas will throw errors in such cases. Differential Revision: https://reviews.llvm.org/D47691 llvm-svn: 334904	2018-06-17 09:55:20 +00:00
Heejin Ahn	fa383144a4	[WebAssembly] Simple comment fix. NFC. llvm-svn: 334899	2018-06-17 00:37:56 +00:00
Craig Topper	29015a63e6	[X86] More additions to the load folding tables based on the autogenerated tables. Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table. llvm-svn: 334898	2018-06-16 23:25:50 +00:00
Craig Topper	9470965797	[X86] Hide POP16/32/64rmr and PUSH16/32/64rmr instructions from the assembly parser. These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td fie. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority. llvm-svn: 334897	2018-06-16 23:25:48 +00:00
Craig Topper	3e8074d360	[X86] Fix an inconsistency between AVX512 and AVX/SSE version on a couple instructions. VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode. VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode. llvm-svn: 334896	2018-06-16 23:25:47 +00:00
Stanislav Mekhanoshin	1479defe70	[AMDGPU] setcc (select cc, CT, CF), CF, eq \| ne -> xor cc, -1 \| cc This is the common case in the BE when we serialize condition and then rematerialize it. Use either original or inverted condition. Differential Revision: https://reviews.llvm.org/D48246 llvm-svn: 334882	2018-06-16 03:46:59 +00:00
Daniel Sanders	50735a1161	[globalisel][tablegen] Add support for C++ predicates on PatFrags and use it to support BFC on ARM. So far, we've only handled special cases of PatFrag like ImmLeaf. This patch adds support for the remaining cases using similar mechanisms. Like most C++ code from SelectionDAG, GISel and DAGISel expect to operate on different types and representations and as such the code is not compatible between the two. It's therefore necessary to add an alternative implementation in the GISelPredicateCode field. The target test for this feature could easily be done with IntImmLeaf and this would save on a little boilerplate. The reason I've chosen to implement this using PatFrag.GISelPredicateCode and not IntImmLeaf is because I was unable to find a rule that was blocked solely by lack of support for PatFrag predicates. I found that the ones I investigated as being likely candidates for the test were further blocked by other things. llvm-svn: 334871	2018-06-15 23:13:43 +00:00
Craig Topper	34f9b235b0	[X86] Add more instructions to the hasUndefRegUpdate list. Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we don't be suprised in the future. llvm-svn: 334867	2018-06-15 22:25:04 +00:00
Sean Fertile	21a7afce23	[PowerPC] Add support for high and higha symbol modifiers on tls modifers. Enables using the high and high-adjusted symbol modifiers on thread local storage modifers in powerpc assembly. Needed to be able to support 64 bit thread-pointer and dynamic-thread-pointer access sequences. Differential Revision: https://reviews.llvm.org/D47754 llvm-svn: 334856	2018-06-15 19:47:16 +00:00
Sean Fertile	ab15f3f58c	[PPC64] Support "symbol@high" and "symbol@higha" symbol modifers. Add support for the "@high" and "@higha" symbol modifiers in powerpc64 assembly. The modifiers represent accessing the segment consiting of bits 16-31 of a 64-bit address/offset. Differential Revision: https://reviews.llvm.org/D47729 llvm-svn: 334855	2018-06-15 19:47:11 +00:00
Tomasz Krupa	c3e22c04da	[X86] Lowering sqrt intrinsics to native IR Summary: Complementary patch to lowering sqrt intrinsics in Clang. Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, mike.dvoretsky, llvm-commits Differential Revision: https://reviews.llvm.org/D41599 llvm-svn: 334849	2018-06-15 18:05:24 +00:00
Craig Topper	28a7250d25	[X86] Prevent folding stack reloads into instructions in hasUndefRegUpdate. An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too. llvm-svn: 334848	2018-06-15 17:56:17 +00:00
Sander de Smalen	60effea7eb	[AArch64][SVE] Asm: Support for CPY SIMD/FP and GPR instructions. Predicated splat/copy of SIMD/FP register or general purpose register to SVE vector, along with MOV-aliases. llvm-svn: 334842	2018-06-15 16:39:46 +00:00
Sander de Smalen	f3cfe53db1	[AArch64][SVE] Asm: Support for INC/DEC (scalar) instructions. Increment/decrement scalar register by (scaled) element count given by predicate pattern, e.g. 'incw x0, all, mul #4'. Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D47713 llvm-svn: 334838	2018-06-15 15:47:44 +00:00
Matt Arsenault	bc91f61db9	AMDGPU: Add combine for short vector extract_vector_elts Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. llvm-svn: 334836	2018-06-15 15:31:36 +00:00
Matt Arsenault	30c299fdc4	AMDGPU: Make v4i16/v4f16 legal Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835	2018-06-15 15:15:46 +00:00
Sander de Smalen	dda2100a0f	[AArch64][SVE] Asm: Support for FADD, FMUL and FMAX immediate instructions. Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: javed.absar Differential Revision: https://reviews.llvm.org/D47712 llvm-svn: 334831	2018-06-15 13:57:51 +00:00
Simon Dardis	1331e402c6	[mips] Add licensing information of the microMIPS tablegen files. (NFC) llvm-svn: 334827	2018-06-15 13:29:35 +00:00
Sander de Smalen	00d2fd13c5	[AArch64][SVE] Asm: Add parsing/printing support for exact FP immediates. Some instructions require of a limited set of FP immediates as operands, for example '#0.5 or #1.0' for SVE's FADD instruction. This patch adds support for parsing and printing such FP immediates as exact values (e.g. #0.499999 is not accepted for #0.5). Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D47711 llvm-svn: 334826	2018-06-15 13:11:49 +00:00
Roman Lebedev	a0dac64487	[AMDGPU] Recognize x & ~(-1 << y) pattern. Summary: The same pattern as D48010, but this one is IR-canonical as of D47428. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48012 llvm-svn: 334817	2018-06-15 09:56:45 +00:00
Roman Lebedev	733f1f7fbd	[AMDGPU] Recognize x & ((1 << y) - 1) pattern. Summary: As a followup for D48007. Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`), i think also handling a pattern that is ub for `y == bitwidth` should be fine. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48010 llvm-svn: 334816	2018-06-15 09:56:39 +00:00
Roman Lebedev	ab8ea027b7	[AMDGPU] Recognize x & (-1 >> (32 - y)) pattern. Summary: D47980 will canonicalize the `x << (32 - y) >> (32 - y)`, which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`, which is not recognized by AMDGPU. Thus, it needs to be recognized, too. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48007 llvm-svn: 334815	2018-06-15 09:56:31 +00:00
Craig Topper	ddac8b162b	Revert r334802 "[X86] Prevent folding stack reloads with instructions that have an undefined register update." There's a typo causing the build to fail. llvm-svn: 334803	2018-06-15 06:15:26 +00:00
Craig Topper	21c8de8539	[X86] Prevent folding stack reloads with instructions that have an undefined register update. We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency. llvm-svn: 334802	2018-06-15 06:11:36 +00:00
Craig Topper	5526c28acc	[X86] Add more instructions to the memory folding tables using the autogenerated table as a guide. I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions. There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass. llvm-svn: 334800	2018-06-15 05:49:19 +00:00
Craig Topper	5584c8b9f5	[X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency. llvm-svn: 334785	2018-06-15 04:42:54 +00:00
Sanjay Patel	41b45b2b8c	[x86] be more selective about converting 'and' to shuffle (PR37749) isVectorClearMaskLegal() is the TLI hook used by the generic DAGCombiner::XformToShuffleWithZero(). We've grown to accomodate/expect this transform to shuffle (disabling it more generally results in many regressions). So I'm narrowly excluding the 256-bit types that clearly are not worthwhile for AVX1. I think in most cases we are able to recover by converting the shuffle back into 'and' ops, but the cases in: https://bugs.llvm.org/show_bug.cgi?id=37749 ...show that there are cracks. llvm-svn: 334759	2018-06-14 19:55:02 +00:00
Craig Topper	bb91ae466c	[X86] Fix stale comment in folding tables. llvm-svn: 334758	2018-06-14 19:28:31 +00:00
Tom Stellard	5608dfed6a	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45907 llvm-svn: 334757	2018-06-14 19:26:37 +00:00
Craig Topper	3eb8c69f72	[X86] Add more vector instructions to the memory folding table using the autogenerated table as a guide. The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX. llvm-svn: 334728	2018-06-14 15:40:31 +00:00
Craig Topper	33666c49e2	[X86] Remove '128' from the internal name of some scalar FP instructions to be consistent with other scalar instructions. llvm-svn: 334727	2018-06-14 15:40:30 +00:00
Craig Topper	2cd52c8559	[X86] Disable load unfolding for a bunch of instruction where unfolding would increase the size of the load. Found by an audit of the manual table vs the autogenerated table. llvm-svn: 334726	2018-06-14 15:40:29 +00:00
Craig Topper	944ab10211	[X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions. Some of these instructions are already in the manual folding table so we should have them in the auto table too. llvm-svn: 334725	2018-06-14 15:40:27 +00:00
Simon Dardis	ee8aa2671a	[mips] Correct predicates for MSA pseudo instructions llvm-svn: 334708	2018-06-14 13:03:53 +00:00
Craig Topper	ad3af3e8a7	[x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551) Summary: The tests in: https://bugs.llvm.org/show_bug.cgi?id=37751 ...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes. This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll Reviewers: RKSimon, gbedwell, spatel Reviewed By: spatel Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D47993 llvm-svn: 334685	2018-06-14 03:16:58 +00:00
Tom Stellard	bec61a1866	AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46171 llvm-svn: 334665	2018-06-13 22:30:47 +00:00
Craig Topper	726af1b5f7	[X86] Move RCPSSr_Int, RSQRTSSr_Int, SQRTSDr_Int, SQRTSSr_Int to the correct load folding table. They were in the operand 1 folding table, but their foldable operand is operand 2. llvm-svn: 334648	2018-06-13 20:03:42 +00:00
Stanislav Mekhanoshin	3138aeda82	[AMDGPU] Corrected computeKnownBits for V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48133 llvm-svn: 334640	2018-06-13 18:52:54 +00:00
Yaxun Liu	166d7e51aa	[AMDGPU] Change enqueue kernel handle type Currently the handle type is a global pointer which holds 8 bytes. We need a larger type which hold 16 bytes, therefore change it to [i64 x 2]. Differential Revision: https://reviews.llvm.org/D48094 llvm-svn: 334625	2018-06-13 17:31:51 +00:00
Dmitry Preobrazhensky	f3bc07c9d5	[AMDGPU][MC] Enabled parsing of relocations on VALU instructions See bug 37566: https://bugs.llvm.org/show_bug.cgi?id=37566 Reviewers: artem.tamazov, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D47884 llvm-svn: 334622	2018-06-13 17:02:03 +00:00
Dmitry Preobrazhensky	224cb6c72d	[AMDGPU][MC][GFX8][GFX9] Allow LDS direct reads for BUFFER_LOAD_DWORDX2/X3/X4 See bug 37653: https://bugs.llvm.org/show_bug.cgi?id=37653 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D47885 llvm-svn: 334609	2018-06-13 15:32:46 +00:00
Tom Stellard	968b81b26b	AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering Summary: The code that handles ISD:Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway so we can safely move it to SITargetLowering. Reviewers: alex-t, arsenm, nhaehnle Reviewed By: arsenm Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46298 llvm-svn: 334607	2018-06-13 15:06:37 +00:00
Zoran Jovanovic	58b914129e	[mips][microMIPS] Extending size reduction pass with LWP and SWP Author: milena.vujosevic.janicic Reviewers: sdardis The patch extends size reduction pass for MicroMIPS. It introduces reduction of two instructions into one instruction: Two SW instructions are transformed into one SWP instrucition. Two LW instructions are transformed into one LWP instrucition. Differential Revision: https://reviews.llvm.org/D39115 llvm-svn: 334595	2018-06-13 12:51:37 +00:00
Sanjay Patel	434596cb1e	[x86] eliminate even more sign-bit tests with vector select This shortcoming was noted in D47330, and the test diffs show we already had other examples where we failed to fold to a SHRUNKBLEND: /// Dynamic (non-constant condition) vector blend where only the sign bits /// of the condition elements are used. This is used to enforce that the /// condition mask is not valid for generic VSELECT optimizations. This patch implements an idea from D48043 and would obsolete that patch because it catches more cases (notable the AVX1 case that was missed there). All we're doing is allowing the existing transform to fire more often by removing the post-legalize constraint. All of the relevant feature checks and other predicates are left as-is. Differential Revision: https://reviews.llvm.org/D48078 llvm-svn: 334592	2018-06-13 12:28:32 +00:00
Alex Bradbury	446ce455bc	[RISCV] Add codegen support for atomic load/stores with RV32A Fences are inserted according to table A.6 in the current draft of version 2.3 of the RISC-V Instruction Set Manual, which incorporates the memory model changes and definitions contributed by the RISC-V Memory Consistency Model task group. Instruction selection failures will now occur for 8/16/32-bit atomicrmw and cmpxchg operations when targeting RV32IA until lowering for these operations is added in a follow-on patch. Differential Revision: https://reviews.llvm.org/D47589 llvm-svn: 334591	2018-06-13 12:04:51 +00:00
Alex Bradbury	e0603b95db	[RISCV] Codegen support for atomic operations on RV32I This patch adds lowering for atomic fences and relies on AtomicExpandPass to lower atomic loads/stores, atomic rmw, and cmpxchg to __atomic_* libcalls. test/CodeGen/RISCV/atomic-* are modelled on the exhaustive test/CodeGen/PPC/atomics-regression.ll, and will prove more useful once RV32A codegen support is introduced. Fence mappings are taken from table A.6 in the current draft of version 2.3 of the RISC-V Instruction Set Manual, which incorporates the memory model changes and definitions contributed by the RISC-V Memory Consistency Model task group. Differential Revision: https://reviews.llvm.org/D47587 llvm-svn: 334590	2018-06-13 11:58:46 +00:00
Clement Courbet	e3e2fa9c0a	[TableGen] Emit a fatal error on inconsistencies in resource units vs cycles. Summary: For targets I'm not familiar with, I've automatically made the "default to 1 for each resource" behaviour explicit in the td files. For more obvious cases, I've ventured a fix. Some notes: - Exynos is especially fishy. - AArch64SchedThunderX2T99.td had some truncated entries. If I understand correctly, the person who wrote that interpreted the ResourceCycle as a range. I made the decision to use the upper/lower bound for consistency with the 'Latency' value. I'm sure there is a better choice. - The change to X86ScheduleBtVer2.td is an NFC, it just makes values more explicit. Also see PR37310. Reviewers: RKSimon, craig.topper, javed.absar Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D46356 llvm-svn: 334586	2018-06-13 09:41:49 +00:00
Hiroshi Inoue	c50f941cad	[PowerPC] fix trivial typos in comment, NFC llvm-svn: 334583	2018-06-13 08:54:13 +00:00
Hiroshi Inoue	2dde032c9a	[PowerPC] avoid verification failure due to PowerPC VSX Swap Removal pass This patch fixes a failure in lnt tests with -verify-machineinstrs option. When VSX Swap Removal pass swaps two register operands, it did not maintain kill flags associated with operands. This patch swaps flags as well as register number to avoid inconsistent kill flags information. llvm-svn: 334579	2018-06-13 08:25:14 +00:00
Craig Topper	45edfd5c6f	[X86] Remove masking from avx512vbmi2 concat and shift by immediate intrinsics. Use select in IR instead. llvm-svn: 334576	2018-06-13 07:19:21 +00:00
Craig Topper	73c6ebc8bf	[X86] Mark all instructions that have masked store semantics with NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator. Previously we were whitelisting in instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore. llvm-svn: 334563	2018-06-13 00:04:08 +00:00
Craig Topper	7b76729ea5	[X86] Remove VPCOMPRESSB/W from the autogenerated load folding table. llvm-svn: 334562	2018-06-13 00:04:04 +00:00
Stanislav Mekhanoshin	017429bc2b	[AMDGPU] DAG combine to produce V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48099 llvm-svn: 334559	2018-06-12 23:50:37 +00:00
Krzysztof Parzyszek	90f9f3ddd4	[DAGCombiner] Recognize more patterns for ABS Differential Revision: https://reviews.llvm.org/D47831 llvm-svn: 334553	2018-06-12 21:51:49 +00:00
Petr Hosek	4239933fa7	[AArch64] Support reserving x20 register Register x20 is a callee-saved register which may be used for other purposes in certain contexts, for example to hold special variables within the kernel. This change adds support for reserving this register both to frontend and backend to make this register usable for these purposes. Differential Revision: https://reviews.llvm.org/D46552 llvm-svn: 334531	2018-06-12 20:00:50 +00:00
Craig Topper	e65706af2c	[X86] Remove mayLoad flag from AVX512 truncating store instructions. llvm-svn: 334529	2018-06-12 19:59:08 +00:00
Reid Kleckner	aeb232db7c	[MS][ARM64] Hoist __ImageBase handling into TargetLoweringObjectFileCOFF All COFF targets should use @IMGREL32 relocations for symbol differences against __ImageBase. Do the same for getSectionForConstant, so that immediates lowered to globals get merged across TUs. Patch by Chris January Differential Revision: https://reviews.llvm.org/D47783 llvm-svn: 334523	2018-06-12 18:56:05 +00:00
Konstantin Zhuravlyov	fb41a59565	AMDHSA/NFC: Code object v3 updates (additional): - Move section selection and alignment to AMDGPUAsmPrinter llvm-svn: 334521	2018-06-12 18:33:51 +00:00
Konstantin Zhuravlyov	7de6ea264e	AMDHSA: Code object v3 updates - Do not emit following assembler directives: - .hsa_code_object_version - .hsa_code_object_isa - .amd_amdgpu_isa - .amd_amdgpu_hsa_metadata - .amd_amdgpu_pal_metadata - Do not emit .note entries - Cleanup and bring in sync kernel descriptor header file - Emit kernel descriptor into .rodata with appropriate relocations and alignments llvm-svn: 334519	2018-06-12 18:02:46 +00:00
Fangrui Song	5fc4e7a80f	[MC] [X86] Teach leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 to use R_X86_64_GOTPC32 instead of R_X86_64_PC32 Summary: This is similar to D46319 (ARM). x86-64 psABI p40 gives an example: leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 # GOTPC32 reloc GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32. Reviewers: javed.absar, echristo Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar Differential Revision: https://reviews.llvm.org/D47507 llvm-svn: 334515	2018-06-12 16:20:44 +00:00
Simon Pilgrim	f6cb95e1e4	[CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513	2018-06-12 16:12:29 +00:00
Craig Topper	115cd845d4	[X86] Remove TB_ALIGN_16 from VEXTRACTF128/VEXTRACTI128 in the memory folding table. llvm-svn: 334511	2018-06-12 15:48:03 +00:00
Krzysztof Parzyszek	5570d6d39d	[Hexagon] Make floating point operations expensive for vectorization llvm-svn: 334508	2018-06-12 15:12:50 +00:00
Sanjay Patel	7bd5278a54	[x86] move shrunkblend transform to helper function; NFCI We should be able to obsolete D48043 by easing the constraints on this existing code. llvm-svn: 334504	2018-06-12 14:21:51 +00:00
Krzysztof Parzyszek	cca7086924	[SelectionDAG] Provide default expansion for rotates Implement default legalization of rotates: either in terms of the rotation in the opposite direction (if legal), or in terms of shifts and ors. Implement generating of rotate instructions for Hexagon. Hexagon only supports rotates by an immediate value, so implement custom lowering of ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion. Differential Revision: https://reviews.llvm.org/D47725 llvm-svn: 334497	2018-06-12 12:49:36 +00:00
Simon Dardis	3c0a89fac1	[mips] Guard some floating point instructions correctly Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D47636 llvm-svn: 334491	2018-06-12 10:28:06 +00:00
Aleksandar Beserminji	8dac2acfcf	[mips] Extend LONG_BRANCH_LUi/ADDiu with extra parameter Extend LONG_BRANCH_LUi and LONG_BRANCH_ADDiu pseudo instructions with additional flag, so instead of always lowering to lui %hi(...), addiu %lo(...) or addiu %hi(...), now they can lower to either %lo, %hi, %higher or %highest depending on the added flag. Differential Revision: https://reviews.llvm.org/D47941 llvm-svn: 334490	2018-06-12 10:23:49 +00:00
Luke Geeson	577f7ff74f	[AArch64] Audit on rL333879 to fix FP16 64bit bitpatterns llvm-svn: 334488	2018-06-12 09:35:20 +00:00
Craig Topper	b33c89957c	[X86] Add NotMemoryFoldable to the VPCOMPRESS instructions. llvm-svn: 334481	2018-06-12 07:32:19 +00:00
Craig Topper	f3d2f3994e	[X86] Add NotMemoryFoldable to more instructions. These include PUSH/POP instructions that don't match the manual table. This also includes CMPXCHG which we never emit in non-locked form. llvm-svn: 334479	2018-06-12 07:32:17 +00:00
Craig Topper	14ec8add52	[X86] Add NotMemoryFoldable to a bunch of instructions to suppress them from the autogenerated load folding table. Most of these are system instructions or other instructions we don't use in CodeGen. No point wasting space for them in the table. Removing them from the autogenerated table makes it easier to review the manual table. A few are real opcode collisions where the memory and register forms are completely different instructions. llvm-svn: 334474	2018-06-12 04:34:59 +00:00

... 3 4 5 6 7 ...

48319 Commits