llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00

Author	SHA1	Message	Date
Simon Pilgrim	af8901c21c	[X86][SSE] Use the general SMAX/SMIN/UMAX/UMIN opcodes and remove the X86 implementation With the completion of D9746 there is now a common implementation of integer signed/unsigned min/max nodes, removing the need for the equivalent X86 specific implementations. This patch removes the old X86ISD nodes, legalizes the relevant SSE2/SSE41/AVX2/AVX512 instructions for the ISD versions and converts the small amount of existing X86 code. Differential Revision: http://reviews.llvm.org/D10947 llvm-svn: 241506	2015-07-06 20:30:47 +00:00
Asaf Badouh	a51b8d0d5b	[X86][AVX512] Multiply Packed Unsigned Integers with Round and Scale pmulhrsw review: http://reviews.llvm.org/D10948 llvm-svn: 241443	2015-07-06 14:03:40 +00:00
Benjamin Kramer	89b0e53c15	[TargetLowering] StringRefize asm constraint getters. There is some functional change here because it changes target code from atoi(3) to StringRef::getAsInteger which has error checking. For valid constraints there should be no difference. llvm-svn: 241411	2015-07-05 19:29:18 +00:00
Sanjay Patel	30c5c88ab0	fix typos in comment; NFC llvm-svn: 241174	2015-07-01 17:55:07 +00:00
Asaf Badouh	732e3b5425	[x86][AVX512] Add vscalef support, including encoding and intrinsics review: http://reviews.llvm.org/D10730 llvm-svn: 240906	2015-06-28 14:30:39 +00:00
Alexander Kornienko	f993659b8f	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) Apparently, the style needs to be agreed upon first. llvm-svn: 240390	2015-06-23 09:49:53 +00:00
Elena Demikhovsky	1df83908be	AVX-512: Added all forms of VPABS instruction Added all intrinsics, tests for encoding, tests for intrinsics. llvm-svn: 240386	2015-06-23 08:19:46 +00:00
Alexander Kornienko	40cb19d802	Fixed/added namespace ending comments using clang-tidy. NFC The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137	2015-06-19 15:57:42 +00:00
Asaf Badouh	6e78caf9ff	[AVX512] add instructions: VPAVGB and VPAVGW review http://reviews.llvm.org/D10504 llvm-svn: 240012	2015-06-18 12:30:53 +00:00
Simon Pilgrim	cfcaa0aa93	[X86][SSE] Vectorize v2i32 to v2f64 conversions This patch enables support for the conversion of v2i32 to v2f64 to use the CVTDQ2PD xmm instruction and stay on the SSE unit instead of scalarizing, sign extending to i64 and using CVTSI2SDQ scalar conversions. Differential Revision: http://reviews.llvm.org/D10433 llvm-svn: 239855	2015-06-16 21:40:28 +00:00
Igor Breger	f163333815	AVX-512: Implemented cvtsi2ss/d cvtusi2ss/d instructions with round control for KNL. Added intrinsics for cvtsi2ss/d instructions. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D10430 llvm-svn: 239694	2015-06-14 12:44:55 +00:00
Asaf Badouh	08f13fa0ba	re-apply 238809 AVX-512: Implemented GETEXP instruction for KNL and SKX Added rounding mode modifier for SQRTPS/PD Added tests for encoding and intrinsics. CR: http://reviews.llvm.org/D9991 llvm-svn: 238923	2015-06-03 13:41:48 +00:00
Elena Demikhovsky	13b85a4aa6	AVX-512: Implemented SHUFF32x4/SHUFF64x2/SHUFI32x4/SHUFI64x2 instructions for SKX and KNL. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238917	2015-06-03 10:56:40 +00:00
Simon Pilgrim	5a028d9698	[X86] Removed (unused) FSRL x86 operation This patch removes the old X86ISD::FSRL op - which allowed float vectors to use the byte right shift operations (causing a domain switch....). Since the refactoring of the shuffle lowering code this no longer has any use. Differential Revision: http://reviews.llvm.org/D10169 llvm-svn: 238906	2015-06-03 08:32:36 +00:00
Asaf Badouh	f8387bd5f5	revert 238809 llvm-svn: 238810	2015-06-02 07:45:19 +00:00
Asaf Badouh	9a55f1d0aa	AVX-512: Implemented GETEXP instruction for KNL and SKX Added rounding mode modifier for SQRTPS/PD Added tests for encoding and intrinsics. llvm-svn: 238809	2015-06-02 07:18:14 +00:00
Elena Demikhovsky	9e9a44e5bd	AVX-512: Implemented VRANGEPD and VRANGEPS instructions for SKX. Implemented DAG lowering for all these forms. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238738	2015-06-01 11:05:34 +00:00
Elena Demikhovsky	9db95755e6	AVX-512: Implemented VFIXUPIMMPD and VFIXUPIMMPS instructions for KNL and SKX Implemented DAG lowering for all these forms. Added tests for encoding. by Igor Breger (igor.breger@intel.com) llvm-svn: 238728	2015-06-01 06:50:49 +00:00
Matt Arsenault	b0334192af	Add address space argument to isLegalAddressingMode This is important because of different addressing modes depending on the address space for GPU targets. This only adds the argument, and does not update any of the uses to provide the correct address space. llvm-svn: 238723	2015-06-01 05:31:59 +00:00
Chandler Carruth	11c24e4998	[x86] Implement a faster vector population count based on the PSHUFB in-register LUT technique. Summary: A description of this technique can be found here: http://wm.ite.pl/articles/sse-popcount.html The core of the idea is to use an in-register lookup table and the PSHUFB instruction to compute the population count for the low and high nibbles of each byte, and then to use horizontal sums to aggregate these into vector population counts with wider element types. On x86 there is an instruction that will directly compute the horizontal sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily. Various tricks are used to get vNi32 and vNi16 from the vNi8 that the LUT computes. The base implementation of this, and most of the work, was done by Bruno in a follow up to D6531. See Bruno's detailed post there for lots of timing information about these changes. I have extended Bruno's patch in the following ways: 0) I committed the new tests with baseline sequences so this shows a diff, and regenerated the tests using the update scripts. 1) Bruno had noticed and mentioned in IRC a redundant mask that I removed. 2) I introduced a particular optimization for the i32 vector cases where we use PSHL + PSADBW to compute the low i32 popcounts, and PSHUFD + PSADBW to compute doubled high i32 popcounts. This takes advantage of the fact that to line up the high i32 popcounts we have to shift them anyways, and we can shift them by one fewer bit to effectively divide the count by two. While the PSHUFD based horizontal add is no faster, it doesn't require registers or load traffic the way a mask would, and provides more ILP as it happens on different ports with high throughput. 3) I did some code cleanups throughout to simplify the implementation logic.
4) I refactored it to continue to use the parallel bitmath lowering when SSSE3 is not available to preserve the performance of that version on SSE2 targets where it is still much better than scalarizing as we'll still do a bitmath implementation of popcount even in scalar code there. With #1 and #2 above, I analyzed the result in IACA for sandybridge, ivybridge, and haswell. In every case I measured, the throughput is the same or better using the LUT lowering, even v2i64 and v4i64, and even compared with using the native popcnt instruction! The latency of the LUT lowering is often higher than the latency of the scalarized popcnt instruction sequence, but I think those latency measurements are deeply misleading. Keeping the operation fully in the vector unit and having many chances for increased throughput seems much more likely to win. With this, we can lower every integer vector popcount implementation using the LUT strategy if we have SSSE3 or better (and thus have PSHUFB). I've updated the operation lowering to reflect this. This also fixes an issue where we were scalarizing horribly some AVX lowerings. Finally, there are some remaining cleanups. There is duplication between the two techniques in how they perform the horizontal sum once the byte population count is computed. I'm going to factor and merge those two in a separate follow-up commit. Differential Revision: http://reviews.llvm.org/D10084 llvm-svn: 238636	2015-05-30 03:20:59 +00:00
Elena Demikhovsky	7d3b86db52	AVX-512: Added VBROADCASTF64X4, VBROADCASTF64X2, VBROADCASTI32X8, and other instructions from this set Added encoding tests. llvm-svn: 237557	2015-05-18 06:42:57 +00:00
Daniel Sanders	5bf4979cf3	[x86] Distinguish the 'o', 'v', 'X', and 'i' inline assembly memory constraints. Summary: But still handle them the same way since I don't know how they differ on this target. Of these, 'o' and 'v' are not tested but were already implemented. I'm not sure why 'i' is required for X86 since it's supposed to be an immediate constraint rather than a memory constraint. A test asserts without it so I've included it for now. No functional change intended. Reviewers: nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8254 llvm-svn: 237517	2015-05-16 12:09:54 +00:00
Eric Christopher	2ba04d1116	Migrate existing backends that care about software floating point to use the information in the module rather than TargetOptions. We've had and clang has used the use-soft-float attribute for some time now so have the backends set a subtarget feature based on a particular function now that subtargets are created based on functions and function attributes. For the one middle end soft float check go ahead and create an overloadable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases. Also remove the command line option that hard codes whether or not soft-float is set by using the attribute for all of the target specific test cases - for the generic just go ahead and add the attribute in the one case that showed up. llvm-svn: 237079	2015-05-12 01:26:05 +00:00
Elena Demikhovsky	f25b492812	AVX-512: Added SKX instructions and intrinsics: {add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236971	2015-05-11 06:05:05 +00:00
Pat Gavlin	c022b8d288	Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware. This changes the shape of the statepoint intrinsic from: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args) to: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args) This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back. In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation. Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops. Differential Revision: http://reviews.llvm.org/D9501 llvm-svn: 236888	2015-05-08 18:07:42 +00:00
Matthias Braun	3b3ecc12b2	Change getTargetNodeName() to produce compiler warnings for missing cases, fix them llvm-svn: 236775	2015-05-07 21:33:59 +00:00
Elena Demikhovsky	28f6bb84a5	AVX-512: Added all forms of FP compare instructions for KNL and SKX. Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec. By Igor Breger (igor.breger@intel.com) llvm-svn: 236714	2015-05-07 11:24:42 +00:00
Elena Demikhovsky	16b6cc68cf	AVX-512: added calling convention for i1 vectors in 32-bit mode. Fixed some bugs in extend/truncate for AVX-512 target. Removed VBROADCASTM (masked broadcast) node, since it is not used any more. llvm-svn: 236420	2015-05-04 12:40:50 +00:00
Elena Demikhovsky	40362f45c8	AVX-512: added integer "add" and "sub" instructions with saturation for SKX with intrinsics and tests by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236418	2015-05-04 12:35:55 +00:00
Sanjay Patel	f5d182f516	[x86] Implement combineRepeatedFPDivisors Set the transform bar at 2 divisions because the fastest current x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle latency (best case) relative to a 5 cycle multiplier. So that's the worst case for this transform (no latency win), but multiplies are obviously pipelined while divisions are not, so there's still a big throughput win which we would expect to show up in typical FP code. These are the sequences I'm comparing: divss %xmm2, %xmm0 mulss %xmm1, %xmm0 divss %xmm2, %xmm0 Becomes: movss LCPI0_0(%rip), %xmm3 ## xmm3 = mem[0],zero,zero,zero divss %xmm2, %xmm3 mulss %xmm3, %xmm0 mulss %xmm1, %xmm0 mulss %xmm3, %xmm0 [Ignore for the moment that we don't optimize the chain of 3 multiplies into 2 independent fmuls followed by 1 dependent fmul...this is the DAG version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that, then the transform becomes even more profitable on all targets.] Differential Revision: http://reviews.llvm.org/D8941 llvm-svn: 235012	2015-04-15 15:22:55 +00:00
Daniel Sanders	6dc30f40bf	Make each target map all inline assembly memory constraints to InlineAsm::Constraint_m. NFC. Summary: This is instead of doing this in target independent code and is the last non-functional change before targets begin to distinguish between different memory constraints when selecting code for the ISD::INLINEASM node. Next, each target will individually move away from the idea that all memory constraints behave like 'm'. Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8173 llvm-svn: 232373	2015-03-16 13:13:41 +00:00
JF Bastien	0cecbf8a42	Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded. Summary: In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all. This does not represent a behavioural change and as such no tests were added. Patch by: Richard Diamond. Reviewers: jfb Reviewed By: jfb Subscribers: jfb, aemerson, t.p.northover, llvm-commits Differential Revision: http://reviews.llvm.org/D7713 llvm-svn: 231250	2015-03-04 15:47:57 +00:00
Sanjay Patel	231cac536a	remove enum value names from comments; NFC llvm-svn: 231129	2015-03-03 20:58:35 +00:00
Eric Christopher	454cbc40f6	getRegForInlineAsmConstraint wants to use TargetRegisterInfo for a lookup, pass that in rather than use a naked call to getSubtargetImpl. This involved passing down and around either a TargetMachine or TargetRegisterInfo. Update all callers/definitions around the targets and SelectionDAG. llvm-svn: 230699	2015-02-26 22:38:43 +00:00
Eric Christopher	2a41cb1089	Remove an argument-less call to getSubtargetImpl from TargetLoweringBase. This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass which uses it for register class iteration. This required passing a subtarget into a few target specific initializations of TargetLowering. llvm-svn: 230583	2015-02-26 00:00:24 +00:00
Elena Demikhovsky	b15d81ba19	AVX-512: recommitted 229837 + bugfix + test llvm-svn: 230223	2015-02-23 15:12:31 +00:00
Benjamin Kramer	2c5770c005	Remove dead prototype. llvm-svn: 230137	2015-02-21 14:35:00 +00:00
Eric Christopher	c93875565e	Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics." The instructions were being generated on architectures that don't support avx512. This reverts commit r229837. llvm-svn: 229942	2015-02-20 00:45:28 +00:00
Elena Demikhovsky	41438d50e6	AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics. llvm-svn: 229837	2015-02-19 10:48:04 +00:00
Elena Demikhovsky	2ae2229fab	AVX-512: Added support for FP instructions with embedded rounding mode. By Asaf Badouh <asaf.badouh@intel.com> llvm-svn: 229645	2015-02-18 07:59:20 +00:00
Ahmed Bougacha	fccf28b772	[CodeGen] Add hook/combine to form vector extloads, enabled on X86. The combine that forms extloads used to be disabled on vector types, because "None of the supported targets knows how to perform load and sign extend on vectors in one instruction." That's not entirely true, since at least SSE4.1 X86 knows how to do those sextloads/zextloads (with PMOVS/ZX). But there are several aspects to getting this right. First, vector extloads are controlled by a profitability callback. For instance, on ARM, several instructions have folded extload forms, so it's not always beneficial to create an extload node (and trying to match extloads is a whole 'nother can of worms). The interesting optimization enables folding of s/zextloads to illegal (splittable) vector types, expanding them into smaller legal extloads. It's not ideal (it introduces some legalization-like behavior in the combine) but it's better than the obvious alternative: form illegal extloads, and later try to split them up. If you do that, you might generate extloads that can't be split up, but have a valid ext+load expansion. At vector-op legalization time, it's too late to generate this kind of code, so you end up forced to scalarize. It's better to just avoid creating egregiously illegal nodes. This optimization is enabled unconditionally on X86. Note that the splitting combine is happy with "custom" extloads. As is, this bypasses the actual custom lowering, and just unrolls the extload. But from what I've seen, this is still much better than the current custom lowering, which does some kind of unrolling at the end anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added FIXME). Also note that the existing combine that forms extloads is now also enabled on legal vectors. This doesn't have a big effect on X86 (because sext+load is usually combined to sext_inreg+aextload). On ARM it fires on some rare occasions; that's for a separate commit. 
Differential Revision: http://reviews.llvm.org/D6904 llvm-svn: 228325	2015-02-05 18:31:02 +00:00
Bruno Cardoso Lopes	559b43d1de	[X86][MMX] Handle i32->mmx conversion using movd Implement a BITCAST dag combine to transform i32->mmx conversion patterns into an X86-specific node (MMX_MOVW2D) and guarantee that moves between i32 and x86mmx are better handled, i.e., don't use store-load to do the conversion. llvm-svn: 228293	2015-02-05 13:23:07 +00:00
Eric Christopher	b584389296	Reuse a bunch of cached subtargets and remove getSubtarget calls without a Function argument. llvm-svn: 227814	2015-02-02 17:38:43 +00:00
Eric Christopher	cc4cd0396b	Remove the last vestiges of resetOperationActions. llvm-svn: 227648	2015-01-31 00:21:17 +00:00
Elena Demikhovsky	237f19f35f	AVX-512: Added FMA intrinsics with rounding mode By Asaf Badouh and Elena Demikhovsky Added special nodes for rounding: FMADD_RND, FMSUB_RND... This prevents nodes with rounding from being merged with other standard nodes. llvm-svn: 227303	2015-01-28 10:21:27 +00:00
David Majnemer	8bb88f79fa	X86: Don't make illegal GOTTPOFF relocations "ELF Handling for Thread-Local Storage" specifies that R_X86_64_GOTTPOFF relocation target a movq or addq instruction. Prohibit the truncation of such loads to movl or addl. This fixes PR22083. Differential Revision: http://reviews.llvm.org/D6839 llvm-svn: 225250	2015-01-06 07:12:52 +00:00
Andrea Di Biagio	64b0da84f4	[CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz. If the control flow is modelling an if-statement where the only instruction in the 'then' basic block (excluding the terminator) is a call to cttz/ctlz, CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control flow graph. Example: \code entry: %cmp = icmp eq i64 %val, 0 br i1 %cmp, label %end.bb, label %then.bb then.bb: %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) br label %end.bb end.bb: %cond = phi i64 [ %c, %then.bb ], [ 64, %entry] \code In this example, basic block %then.bb is taken if value %val is not zero. Also, the phi node in %end.bb would propagate the size-of in bits of %val only if %val is equal to zero. With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb into basic block %entry only if cttz is cheap to speculate for the target. Added two new hooks in TargetLowering.h to let targets customize the behavior (i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'. By default, both methods return 'false'. On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI. Differential Revision: http://reviews.llvm.org/D6728 llvm-svn: 224899	2014-12-28 11:07:35 +00:00
Michael Kuperstein	3790301d73	[DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle. This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors. This fixes PR15872 and provides a partial fix for PR21711. Differential Revision: http://reviews.llvm.org/D6678 llvm-svn: 224429	2014-12-17 12:32:17 +00:00
Elena Demikhovsky	e879b19906	AVX-512: Added all forms of COMPRESS instruction + intrinsics + tests llvm-svn: 224019	2014-12-11 15:02:24 +00:00
Elena Demikhovsky	2450049261	AVX-512: Intrinsics for ERI 3 instructions: vrcp28, vrsqrt28, vexp2, only vector forms. Intrinsics include SAE (Suppress All Exceptions) parameter. http://reviews.llvm.org/D6214 llvm-svn: 221774	2014-11-12 07:31:03 +00:00
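
Commit 11c24e4998 in the log above describes computing vector population counts with a 16-entry lookup table indexed by the low and high nibbles of each byte, followed by horizontal sums over 8-byte halves. The following is a minimal scalar sketch of that idea in portable C++, not the LLVM lowering itself: the names `kNibblePopcount`, `Bytes16`, `popcount_bytes`, and `popcount_i64x2` are hypothetical, and the plain loops only stand in for what the commit message attributes to PSHUFB and to the x86 horizontal-sum instruction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this emulates the technique in scalar code.

// Population count of every 4-bit value 0..15. In the vector version this
// table would live in a register and be indexed by PSHUFB.
constexpr std::array<std::uint8_t, 16> kNibblePopcount = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

using Bytes16 = std::array<std::uint8_t, 16>;

// Per-byte popcount: look up the low and high nibble separately and add.
Bytes16 popcount_bytes(const Bytes16 &v) {
  Bytes16 out{};
  for (std::size_t i = 0; i < v.size(); ++i) {
    const std::uint8_t lo = kNibblePopcount[v[i] & 0x0F];
    const std::uint8_t hi = kNibblePopcount[v[i] >> 4];
    out[i] = static_cast<std::uint8_t>(lo + hi);
  }
  return out;
}

// Sum the byte counts of the low 8 and high 8 bytes, which yields the
// popcount of each 64-bit lane of a 128-bit value.
std::array<std::uint64_t, 2> popcount_i64x2(const Bytes16 &v) {
  const Bytes16 counts = popcount_bytes(v);
  std::array<std::uint64_t, 2> out{0, 0};
  for (std::size_t i = 0; i < counts.size(); ++i) {
    out[i / 8] += counts[i];
  }
  return out;
}
```

The sketch covers only the byte and 64-bit lane cases; the 32-bit and 16-bit variants that the commit message mentions need extra shuffling and are not modeled here.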

694 Commits