llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 21:42:54 +02:00

Author	SHA1	Message	Date
Eric Christopher	1b3b092405	Fix typo. llvm-svn: 209377	2014-05-22 01:21:44 +00:00
Eric Christopher	98c81f11ea	Fix compilation issues. llvm-svn: 209342	2014-05-21 23:51:57 +00:00
Eric Christopher	7880d61aac	Make early if conversion dependent upon the subtarget and add a subtarget hook to enable. Unconditionally add to the pass pipeline for targets that might want to use it. No functional change. llvm-svn: 209340	2014-05-21 23:40:26 +00:00
Quentin Colombet	b70fffa971	[X86] Fix a bug in the lowering of BLENDI introduced in r209043. ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the second argument. On the other hand, BLENDI uses 0 to identify the first argument and 1 to identify the second argument. Fix the generation of the blend mask to account for this difference. The bug did not show up with r209043, because we were not checking for the actual arguments of the blend instruction! This commit also fixes the test cases. Note: The same mask works for the BLENDr variant because the arguments are swapped during instruction selection (see the BLENDXXrr patterns). <rdar://problem/16975435> llvm-svn: 209324	2014-05-21 22:00:39 +00:00
Evgeniy Stepanov	d57ad51b22	[asan] Fix x86-32 asm instrumentation to preserve flags. Patch by Yuri Gorshenin. llvm-svn: 209280	2014-05-21 08:14:24 +00:00
Simon Atanasyan	e4d2663548	Add parentheses to suppress the gcc warning '-Wparentheses'. No functional changes. llvm-svn: 209203	2014-05-20 10:23:04 +00:00
Alexey Volkov	9a03018603	[X86] Tune LEA usage for Silvermont According to Intel Software Optimization Manual on Silvermont in some cases LEA is better to be replaced with ADD instructions: "The rule of thumb for ADDs and LEAs is that it is justified to use LEA with a valid index and/or displacement for non-destructive destination purposes (especially useful for stack offset cases), or to use a SCALE. Otherwise, ADD(s) are preferable." Differential Revision: http://reviews.llvm.org/D3826 llvm-svn: 209198	2014-05-20 08:55:50 +00:00
Juergen Ributzka	b62ac80e67	[ConstantHoisting][X86] Change the cost model to never hoist constants for types larger than i128. Currently the X86 backend doesn't support types larger than i128 very well. For example an i192 multiply will assert in codegen when the 2nd argument is a constant and the constant got hoisted. This fix changes the cost model to never hoist constants for types larger than i128. Once the codegen issues have been resolved, the cost model can be updated to allow also larger types. This is related to <rdar://problem/16954938> llvm-svn: 209162	2014-05-19 21:00:53 +00:00
Andrea Di Biagio	41bcee5bc3	[X86] Add ISel patterns to improve the selection of TZCNT and LZCNT. Instructions TZCNT (requires BMI1) and LZCNT (requires LZCNT), always provide the operand size as output if the input operand is zero. We can take advantage of this knowledge during instruction selection stage in order to simplify a few corner case. llvm-svn: 209159	2014-05-19 20:38:59 +00:00
Filipe Cabecinhas	f09daeadf1	Added more insertps optimizations Summary: When inserting an element that's coming from a vector load or a broadcast of a vector (or scalar) load, combine the load into the insertps instruction. Added PerformINSERTPSCombine for the case where we need to fix the load (load of a vector + insertps with a non-zero CountS). Added patterns for the broadcasts. Also added tests for SSE4.1, AVX, and AVX2. Reviewers: delena, nadav, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3581 llvm-svn: 209156	2014-05-19 19:45:57 +00:00
Benjamin Kramer	600e24a1cb	SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not. - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though. - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal. - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled. llvm-svn: 209123	2014-05-19 13:12:38 +00:00
Saleem Abdulrasool	501d3b6235	Target: remove old constructors for CallLoweringInfo This is mostly a mechanical change changing all the call sites to the newer chained-function construction pattern. This removes the horrible 15-parameter constructor for the CallLoweringInfo in favour of setting properties of the call via chained functions. No functional change beyond the removal of the old constructors are intended. llvm-svn: 209082	2014-05-17 21:50:17 +00:00
Chandler Carruth	a0362e551c	[x86] Fix a bad predicate I spotted by inspection -- pshufhw and pshuflw were added in SSE2, no SSSE3. Found this while auditing all uses of SSSE3 in the X86 target. I don't actually expect this to make a significant difference on anything and I don't have any detailed test cases but I updated the existing test cases that already covered some of this code path. llvm-svn: 209056	2014-05-17 03:29:20 +00:00
Filipe Cabecinhas	3d72585a01	Implemented special cases for PerformVSELECTCombine. vselects with constant masks, after legalization, will get turned into specialized shuffle_vectors so they can be matched to blend+imm instructions. Fixed some tests. llvm-svn: 209044	2014-05-16 22:47:54 +00:00
Filipe Cabecinhas	9acd5d4e5e	Lower vselects into X86ISD::BLENDI when appropriate. LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the condition is constant and we can emit that instruction, given the subtarget. This is not enough for all cases. An additional SELECTCombine optimization will be committed. Fixed tests that were expecting variable blends but where a blend+imm can be generated. Added test where we can't emit blend+immediate. Added avx2 blend+imm tests. llvm-svn: 209043	2014-05-16 22:47:49 +00:00
Filipe Cabecinhas	7f5f4ad94e	Implemented LowerVSELECT to custom lower some instructions. No functionality change intended. The types that previously were set to lower as Expand or Legal are doing the same thing with this lowering function. llvm-svn: 209042	2014-05-16 22:47:43 +00:00
Rafael Espindola	e809bea68e	Delete getAliasedGlobal. llvm-svn: 209040	2014-05-16 22:37:03 +00:00
Tim Northover	25b5918f36	X86: disable printing of bare "mov" aliases In AT&T syntax, we should probably print the full "movl" or "movw". TableGen used to ignore these aliases because it was miscounting the number of operands. This fixes the issue. This will be tested when the TableGen "should I print this Alias" heuristic is fixed (very soon). llvm-svn: 208963	2014-05-16 09:41:26 +00:00
Andrea Di Biagio	ac4cb52129	[X86] Teach the backend how to fold SSE4.1/AVX/AVX2 blend intrinsics. Added target specific combine rules to fold blend intrinsics according to the following rules: 1) fold(blend A, A, Mask) -> A; 2) fold(blend A, B, <allZeros>) -> A; 3) fold(blend A, B, <allOnes>) -> B. Added two new tests to verify that the new folding rules work for all the optimized blend intrinsics. llvm-svn: 208895	2014-05-15 15:18:15 +00:00
Tim Northover	ac5dac4c75	TableGen: use correct MIOperand when printing aliases Previously, TableGen assumed that every aliased operand consumed precisely 1 MachineInstr slot (this was reasonable because until a couple of days ago, nothing more complicated was eligible for printing). This allows a couple more ARM64 aliases to print so we can remove the special code. On the X86 side, I've gone for explicit AT&T size specifiers as the default, so turned off a few of the aliases that would have just started printing. llvm-svn: 208880	2014-05-15 13:36:01 +00:00
Tim Northover	4ba95d4483	TableGen/ARM64: print aliases even if they have syntax variants. To get at least one use of the change (and some actual tests) in with its commit, I've enabled the AArch64 & ARM64 NEON mov aliases. llvm-svn: 208867	2014-05-15 11:16:32 +00:00
Alp Toker	18115693f7	Fix typos llvm-svn: 208839	2014-05-15 01:52:21 +00:00
Jay Foad	e0eac700cb	Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811	2014-05-14 21:14:37 +00:00
Benjamin Kramer	56a86f3d17	X86: If we have an instruction that sets a flag and a zero test on the input of that instruction try to eliminate the test. For example tzcntl %edi, %ebx testl %edi, %edi je .label can be rewritten into tzcntl %edi, %ebx jb .label A minor complication is that tzcnt sets CF instead of ZF when the input is zero, we have to rewrite users of the flags from ZF to CF. Currently we recognize patterns using lzcnt, tzcnt and popcnt. Differential Revision: http://reviews.llvm.org/D3454 llvm-svn: 208788	2014-05-14 16:14:45 +00:00
Reid Kleckner	d7efe8386c	Try to fix an SDAG dependence issue with sret r208453 added support for having sret on the second parameter. In that change, the code for copying sret into a virtual register was hoisted into the loop that lowers formal parameters. This caused a "Wrong topological sorting" assertion failure during scheduling when a parameter is passed in memory. This change undoes that by creating a second loop that deals with sret. I'm worried that this fix is incomplete. I don't fully understand the dependence issues. However, with this change we produce the same DAGs we used to produce, so if they are broken, they are just as broken as they have always been. llvm-svn: 208637	2014-05-12 22:01:27 +00:00
Tim Northover	3c2cc7a397	TableGen: use PrintMethods to print more aliases llvm-svn: 208607	2014-05-12 18:04:06 +00:00
Aaron Ballman	005c04633a	Silencing an MSVC warning about not all control paths returning a value (even though the switch is fully covered). No functional change. llvm-svn: 208565	2014-05-12 14:22:58 +00:00
Rafael Espindola	13ccdbe63e	Remove an always true argument. llvm-svn: 208557	2014-05-12 13:30:10 +00:00
Benjamin Kramer	32f96f80e4	X86: Make sure that we have SSE4.1 before we generate insertps nodes. PR19721. llvm-svn: 208552	2014-05-12 13:12:08 +00:00
NAKAMURA Takumi	64ea062a28	X86ISelLowering.cpp:LowerINTRINSIC_W_CHAIN(): Prune impossible "default:" [-Wcovered-switch-default] llvm-svn: 208533	2014-05-12 10:16:46 +00:00
Elena Demikhovsky	e132f428fd	Fixed compilation issue llvm-svn: 208524	2014-05-12 07:45:41 +00:00
Elena Demikhovsky	784490ba2d	AVX-512: changes in intrinsics 1) Changed gather and scatter intrinsics. Now they are aligned with GCC built-ins. There is no more non-masked form. Masked intrinsic receives -1 if all lanes are executed. 2) I changed the function that works with intrinsics inside X86ISelLowering.cpp. I put all intrinsics in one table. I did it for INTRINSICS_W_CHAIN and plan to put all intrinsics from WO_CHAIN set to the same table in order to avoid the long-long "switch". (I wanted to use static map initialization that allowed by C++11 but I wasn't able to compile it on VS2012). 3) I added gather/scatter prefetch intrinsics. 4) I fixed MRMm encoding for masked instructions. llvm-svn: 208522	2014-05-12 07:18:51 +00:00
Hal Finkel	5b038e4cbc	Pass the value type to TLI::getRegisterByName We must validate the value type in TLI::getRegisterByName, because if we don't and the wrong type was used with the IR intrinsic, then we'll assert (because we won't be able to find a valid register class with which to construct the requested copy operation). For PPC64, additionally, the type information is necessary to decide between the 64-bit register and the 32-bit subregister. No functionality change. llvm-svn: 208508	2014-05-11 19:29:07 +00:00
Hal Finkel	34d719e885	Add 'override' to getRegisterByName in *ISelLowering.h No functionality change intended. llvm-svn: 208507	2014-05-11 19:28:55 +00:00
Filipe Cabecinhas	5c7f162cea	Fixed a bug when lowering build_vector (PR19694) When lowering build_vector to an insertps, we would still lower it, even if the source vectors weren't v4x32. This would break on avx if the source was a v8x32. We now check the type of the source vectors. llvm-svn: 208487	2014-05-11 08:12:56 +00:00
Reid Kleckner	86ad66783f	Revert "[ms-cxxabi] Add a new calling convention that swaps 'this' and 'sret'" This reverts commit r200561. This calling convention was an attempt to match the MSVC C++ ABI for methods that return structures by value. This solution didn't scale, because it would have required splitting every CC available on Windows into two: one for methods and one for free functions. Now that we can put sret on the second arg (r208453), and Clang does that (r208458), revert this hack. llvm-svn: 208459	2014-05-09 22:56:42 +00:00
Reid Kleckner	0c2e3574f4	Allow sret on the second parameter as well as the first MSVC always places the implicit sret parameter after the implicit this parameter of instance methods. We used to handle this for x86_thiscallcc by allocating the sret parameter on the stack and leaving the this pointer in ecx, but that doesn't handle alternative calling conventions like cdecl, stdcall, fastcall, or the win64 convention. Instead, change the verifier to allow sret on the second parameter. This also requires changing the Mips and X86 backends to return the argument with the sret parameter, instead of assuming that the sret parameter comes first. The Sparc backend also returns sret parameters in a register, but I wasn't able to update it to handle secondary sret parameters. It currently calls report_fatal_error if you feed it an sret in the second parameter. Reviewers: rafael.espindola, majnemer Differential Revision: http://reviews.llvm.org/D3617 llvm-svn: 208453	2014-05-09 22:32:13 +00:00
Andrea Di Biagio	db3980299c	Fix 80 col violation. No functional change intended. llvm-svn: 208405	2014-05-09 11:08:23 +00:00
Benjamin Kramer	330ea89877	[asan] Stop leaking X86Operands. llvm-svn: 208400	2014-05-09 09:48:03 +00:00
Filipe Cabecinhas	f3415cd85c	Optimize shufflevector that copies an i64/f64 and zeros the rest. Summary: Also ran clang-format on the function. The code added is the last else if block. Reviewers: nadav, craig.topper, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3518 llvm-svn: 208372	2014-05-08 23:16:08 +00:00
Andrea Di Biagio	7e0e036a46	[X86] Add target specific combine rules to fold SSE2/AVX2 packed arithmetic shift intrinsics. This patch teaches the backend how to combine packed SSE2/AVX2 arithmetic shift intrinsics. The rules are: - Always fold a packed arithmetic shift by zero to its first operand; - Convert a packed arithmetic shift intrinsic dag node into a ISD::SRA only if the shift count is known to be smaller than the vector element size. This patch also teaches to function 'getTargetVShiftByConstNode' how fold target specific vector shifts by zero. Added two new tests to verify that the DAGCombiner is able to fold sequences of SSE2/AVX2 packed arithmetic shift calls. llvm-svn: 208342	2014-05-08 17:44:04 +00:00
Evgeniy Stepanov	196ad52640	[asan] Preserve flags in asm instrumentation. Patch by Yuri Gorshenin. llvm-svn: 208296	2014-05-08 09:55:24 +00:00
Hal Finkel	c52e65b830	Move late partial-unrolling thresholds into the processor definitions The old method used by X86TTI to determine partial-unrolling thresholds was messy (because it worked by testing target features), and also would not correctly identify the target CPU if certain target features were disabled. After some discussions on IRC with Chandler et al., it was decided that the processor scheduling models were the right containers for this information (because it is often tied to special uop dispatch-buffer sizes). This does represent a small functionality change: - For generic x86-64 (which uses the SB model and, thus, will get some unrolling). - For AMD cores (because they still currently use the SB scheduling model) - For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump the default threshold to 50; we're working on a test case for this). Otherwise, nothing has changed for any other targets. The logic, however, has been moved into BasicTTI, so other targets may now also opt-in to this functionality simply by setting LoopMicroOpBufferSize in their processor model definitions. llvm-svn: 208289	2014-05-08 09:14:44 +00:00
Filipe Cabecinhas	275860c4fd	Lower certain build_vectors to insertps instructions Summary: Vectors built with zeros and elements in the same order as another (source) vector are optimized to be built using a single insertps instruction. Also optimize when we move one element in a vector to a different place in that vector while zeroing out some of the other elements. Further optimizations are possible, described in TODO comments. I will be implementing at least some of them in the near future. Added some tests for different cases where this optimization triggers. Reviewers: nadav, delena, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3521 llvm-svn: 208271	2014-05-08 00:25:16 +00:00
Hal Finkel	a22ec95e68	[X86TTI] Remove the unrolling branch limits The loop stream detector (LSD) on modern Intel cores, which optimizes the execution of small loops, has limits on the number of taken branches in addition to uop-count limits (modern AMD cores have similar limits). Unfortunately, at the IR level, estimating the number of branches that will be taken is difficult. For one thing, it strongly depends on later passes (block placement, etc.). The original implementation took a conservative approach and limited the maximal BB DFS depth of the loop. However, fairly-extensive benchmarking by several of us has revealed that this is the wrong approach. In fact, there are zero known cases where the branch limit prevents a detrimental unrolling (but plenty of cases where it does prevent beneficial unrolling). While we could improve the current branch counting logic by incorporating branch probabilities, this further complication seems unjustified without a motivating regression. Instead, unless and until a regression appears, the branch counting will be removed. llvm-svn: 208255	2014-05-07 22:25:18 +00:00
Quentin Colombet	9b13d839be	[X86] Selectively mark the FMA variants inside a family as isCommutable. Given a FMA family (e.g., 213, 231), not all the variants (i.e., register or memory) are commutable. E.g., for the 213 family (with the syntax src1, src2, src3): fmaXXX213 A, B, reg3/mem3 == fmaXXX213 B, A, reg3/mem3 Now consider the 231 family: fmaXXX231 A, B, reg3 == fmaXXX231 A, reg3, B But fmaXXX231 A, B, mem3 != fmaXXX231 A, mem3, B Indeed, mem3 cannot be the second argument of the memory variant of fmaXXX231. Working on a reduced test case! <rdar://problem/16800495> llvm-svn: 208252	2014-05-07 21:43:35 +00:00
Eric Christopher	91701a28d0	Reformat a couple of functions for clarity. llvm-svn: 208248	2014-05-07 21:05:47 +00:00
Chandler Carruth	17e2aa3c0a	[x86] Make the 'x86-64' cpu, what I see as and many use as the generic default architecture for reasonable modern x86 processors, actually be modern. This processor model should essentially be "tuned" for modern x86 chips as much as possible without undue penalties on any specific architecture. Previously we weren't even using the nice scheduling models. There are a few other tweaks needed here, but this change at least I have benchmarked across a decent swatch of chips (intel's clovertown, westmere, and sandybridge; amd's istanbul) and seen no significant regressions. If anyone has suggested ways to test this, just let me know. Somewhat alarmingly, no existing tests failed. llvm-svn: 208230	2014-05-07 17:37:03 +00:00
Evgeniy Stepanov	4a872de9c5	[asan] Add a flag to control asm instrumentation. With this change, asm instrumentation is disabled by default. llvm-svn: 208167	2014-05-07 07:54:11 +00:00
Andrea Di Biagio	540f8696d1	[X86] Improve the lowering of BITCAST dag nodes from type f64 to type v2i32 (and vice versa). Before this patch, the backend always emitted a store+load sequence to bitconvert from f64 to i64 the input operand of a ISD::BITCAST dag node that performed a bitconvert from type MVT::f64 to type MVT::v2i32. The resulting i64 node was then used to build a v2i32 vector. With this patch, the backend now produces a cheaper SCALAR_TO_VECTOR from MVT::f64 to MVT::v2f64. That SCALAR_TO_VECTOR is then followed by a "free" bitcast to type MVT::v4i32. The elements of the resulting v4i32 are then extracted to build a v2i32 vector (which is illegal and therefore promoted to MVT::v2i64). This is in general cheaper than emitting a stack store+load sequence to bitconvert the operand from type f64 to type i64. llvm-svn: 208107	2014-05-06 17:09:03 +00:00
Renato Golin	8a9a382ab2	Implememting named register intrinsics This patch implements the infrastructure to use named register constructs in programs that need access to specific registers (bare metal, kernels, etc). So far, only the stack pointer is supported as a technology preview, but as it is, the intrinsic can already support all non-allocatable registers from any architecture. llvm-svn: 208104	2014-05-06 16:51:25 +00:00
Craig Topper	659167eb76	Use X86 memory operand enums instead of hardcoding. llvm-svn: 208064	2014-05-06 07:04:32 +00:00
Reid Kleckner	3d55680273	Fix i128 div/mod on mingw64 The Win64 docs are very clear that anything larger than 8 bytes is passed by reference, and GCC MinGW64 honors that for __modti3 and friends. Patch by Jameson Nash! llvm-svn: 208029	2014-05-06 01:20:42 +00:00
Filipe Cabecinhas	cccfa360ce	Revert "Optimize shufflevector that copies an i64/f64 and zeros the rest." This reverts commit 207992. I misread the phab number on the LGTM. llvm-svn: 207993	2014-05-05 19:40:36 +00:00
Filipe Cabecinhas	912c9dd1fe	Optimize shufflevector that copies an i64/f64 and zeros the rest. Summary: Also ran clang-format on the function. The code added is the last else if block. Reviewers: nadav, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3518 llvm-svn: 207992	2014-05-05 19:36:28 +00:00
Elena Demikhovsky	52b1f22e9c	AVX-512: minor change in rndscale intrinsic llvm-svn: 207937	2014-05-04 13:35:37 +00:00
Saleem Abdulrasool	d9cd5486e9	X86: further range-loopify AsmPrinter Use more range loops in the X86AsmPrinter. NFC. llvm-svn: 207928	2014-05-04 01:54:17 +00:00
Saleem Abdulrasool	53f4c01738	X86: remove X86COFFMachineModuleInfo Remove dead code. This is vestigial after r98384. llvm-svn: 207927	2014-05-04 01:54:12 +00:00
Saleem Abdulrasool	edfd1818df	X86: repair export compatibility with MinGW/cygwin Both MinGW and cygwin (i686) construct export directives without the global leader prefix. This is mostly due to the fact that they use GNU ld which does not correctly handle the export directive. This apparently has been been broken for a while. However, this was recently reported as being broken by mingwandroid and diorcety of the msys2 project. Remove the global leader prefix if targeting MinGW or cygwin, otherwise, retain the global leader prefix. Add an explicit test for cygwin's behaviour of export directives. llvm-svn: 207926	2014-05-04 00:03:48 +00:00
Saleem Abdulrasool	f061e56242	X86: refactor export directive generation Create a helper function to generate the export directive. This was previously duplicated inline to handle export directives for variables and functions. This also enables the use of range-based iterators for the generation of the directive rather than the traditional loops. NFC. llvm-svn: 207925	2014-05-04 00:03:41 +00:00
Rafael Espindola	bb77d317cd	Fix pr19645. The fix itself is fairly simple: move getAccessVariant to MCValue so that we replace the old weak expression evaluation with the far more general EvaluateAsRelocatable. This then requires that EvaluateAsRelocatable stop when it finds a non trivial reference kind. And that in turn requires the ELF writer to look harder for weak references. Last but not least, this found a case where we were being bug by bug compatible with gas and accepting an invalid input. I reported pr19647 to track it. llvm-svn: 207920	2014-05-03 19:57:04 +00:00
Benjamin Kramer	b0e4e7faba	Add a description for AMD's bdver4 (aka Excavator). This is just bdver3 + AVX2 + BMI2. llvm-svn: 207847	2014-05-02 15:47:07 +00:00
Joerg Sonnenberger	2105d4cb10	Restore condition incorrectly changed in r96289 to the older state. llvm-svn: 207716	2014-04-30 22:40:27 +00:00
Michael Zolotukhin	b12e921e64	[X86] Never hoist the shift value of a shift instruction. There is no need to check if we want to hoist the immediate value of an shift instruction. Simply return TCC_Free right away. This change is like r206101, but for X86. rdar://problem/16190769 llvm-svn: 207692	2014-04-30 19:17:32 +00:00
Evgeniy Stepanov	fdf3d359e7	[asan] Disable asm instrumentation on unsupported platforms. Only emit calls to compiler-rt asm routines on platforms where they are present (currently limited to linux i386/x86_64). Patch by Yuri Gorshenin. llvm-svn: 207651	2014-04-30 14:04:31 +00:00
Craig Topper	79b097d66a	Use makeArrayRef insted of calling ArrayRef<T> constructor directly. I introduced most of these recently. llvm-svn: 207616	2014-04-30 07:17:30 +00:00
Reid Kleckner	a6d2fff5aa	Implement X86 code generation for musttail Currently, musttail codegen is relying on sibcall optimization, and reporting a fatal error if fails. Sibcall optimization fails when stack arguments need to be modified, which is insufficient for musttail. The logic for moving arguments in memory safely is already implemented for GuaranteedTailCallOpt. This change merely arranges for musttail calls to use it. No functional change for GuaranteedTailCallOpt. Reviewers: espindola Differential Revision: http://reviews.llvm.org/D3493 llvm-svn: 207598	2014-04-29 23:55:41 +00:00
Tim Northover	2e5fed0f5d	X86: emit hidden stubs into a proper non_lazy_symbol_pointer section. rdar://problem/16660411 llvm-svn: 207518	2014-04-29 10:06:10 +00:00
Elena Demikhovsky	b3422ddc5e	AVX-512: optimized a shuffle pattern to VINSERTI64x4. Added intrinsics for VPERMT2PS/PD/D/Q instructions. llvm-svn: 207513	2014-04-29 09:09:15 +00:00
Craig Topper	244adfe60a	[C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. llvm-svn: 207511	2014-04-29 07:58:41 +00:00
Eric Christopher	4af26f41d6	None of these targets actually define their own CFI_INSTRUCTION opcode so there's no reason to use the target namespace for it rather than TargetOpcode. llvm-svn: 207475	2014-04-29 00:16:46 +00:00
Eric Christopher	03fb9a011b	Fix 80-columns, tab characters, and comments. llvm-svn: 207472	2014-04-29 00:16:33 +00:00
Quentin Colombet	a3db448819	[X86] Add more details in the comments of X86TargetLowering::getScalingFactorCost. llvm-svn: 207432	2014-04-28 18:39:57 +00:00
Patrik Hagglund	da02d6849e	Fix gcc -Wsign-compare warning in X86DisassemblerTables.cpp. X86_MAX_OPERANDS is changed to unsigned. Also, add range-based for loops for affected loops. This in turn needed an ArrayRef instead of a pointer-to-array in InternalInstruction. llvm-svn: 207413	2014-04-28 12:12:27 +00:00
Craig Topper	9683cb114b	Convert more SelectionDAG functions to use ArrayRef. llvm-svn: 207397	2014-04-28 05:57:50 +00:00
Craig Topper	b663bffa27	[C++] Use 'nullptr'. llvm-svn: 207394	2014-04-28 04:05:08 +00:00
Craig Topper	e5c6e7f4ea	Convert one last signature of getNode to take an ArrayRef of SDUse. llvm-svn: 207376	2014-04-27 19:21:06 +00:00
Craig Topper	536995c0a7	Convert SelectionDAG::getMergeValues to use ArrayRef. llvm-svn: 207374	2014-04-27 19:20:57 +00:00
Benjamin Kramer	764309a6cd	X86TTI: Adjust sdiv cost now that we can lower it on plain SSE2. Includes a fix for a horrible typo that caused all SDIV costs to be slightly off :) llvm-svn: 207371	2014-04-27 18:47:54 +00:00
Benjamin Kramer	cc45aefeb0	X86: If SSE4.1 is missing lower SMUL_LOHI of v4i32 to pmuludq and fix up the high parts. This is more expensive than pmuldq but still cheaper than scalarizing the whole thing. llvm-svn: 207370	2014-04-27 18:47:41 +00:00
Saleem Abdulrasool	a5ef2286b3	MC: create X86WinCOFFStreamer for target specific behaviour This introduces a target specific streamer, X86WinCOFFStreamer, which handles the target specific behaviour (e.g. WinEH). This is mostly to ensure that differences between ARM and X86 remain disjoint and do not accidentally cross boundaries. This is the final staging change for enabling object emission for Windows on ARM. llvm-svn: 207344	2014-04-27 03:48:12 +00:00
Craig Topper	e0741a0fcb	Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size. llvm-svn: 207329	2014-04-26 19:29:41 +00:00
Craig Topper	1b1f54bcca	Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>. llvm-svn: 207327	2014-04-26 18:35:24 +00:00
Benjamin Kramer	e35c57aa7d	Print X86ISD::PMULDQ nodes properly in debug output. llvm-svn: 207322	2014-04-26 16:26:41 +00:00
Benjamin Kramer	dfc082bbd6	X86TTI: i16/i32 vector div with a constant (splat) divisor are reasonably cheap now. Turn vectorization back on. llvm-svn: 207320	2014-04-26 14:53:05 +00:00
Benjamin Kramer	6ac674546f	X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available. llvm-svn: 207318	2014-04-26 14:12:19 +00:00
Benjamin Kramer	aad9317559	X86: Add patterns for MULHU/MULHS of v8i16 and v16i16. This gets us pretty code for divs of i16 vectors. Turn the existing intrinsics into the corresponding nodes. llvm-svn: 207317	2014-04-26 13:01:03 +00:00
Benjamin Kramer	89fb3dd5a4	Rip out X86-specific vector SDIV lowering, make the corresponding DAGCombiner transform work on vectors. llvm-svn: 207316	2014-04-26 13:00:53 +00:00
Benjamin Kramer	46f8aa6183	X86: Custom lower v4i32 UMUL_LOHI into 2 pmuludqs. Test will follow soon. llvm-svn: 207314	2014-04-26 12:06:11 +00:00
Quentin Colombet	f5142bf03e	[X86] Implement TargetLowering::getScalingFactorCost hook. Scaling factors are not free on X86 because every "complex" addressing mode breaks the related instruction into 2 allocations instead of 1. <rdar://problem/16730541> llvm-svn: 207301	2014-04-26 01:11:26 +00:00
Filipe Cabecinhas	54c5ad74d7	Optimization for certain shufflevector by using insertps. Summary: If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower certain shufflevectors to an insertps instruction: When most of the shufflevector result's elements come from one vector (and keep their index), and one element comes from another vector or a memory operand. Added tests for insertps optimizations on shufflevector. Added support and tests for v4i32 vector optimization. Reviewers: nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3475 llvm-svn: 207291	2014-04-25 23:51:17 +00:00
Saleem Abdulrasool	10443ee9e4	X86: convert object streamer selection to a switch Change the object streamer selection to a switch from a series of if conditions. Rather than defaulting to ELF, require that an ELF format is requested. The Windows/!ELF is maintained as MachO would have been selected first and will still provide a MachO format. Add an assertion that if COFF is requested that the target platform is Windows as only WinCOFF object emission is currently supported. llvm-svn: 207200	2014-04-25 06:29:36 +00:00
Craig Topper	6d411cb95a	[C++] Use 'nullptr'. Target edition. llvm-svn: 207197	2014-04-25 05:30:21 +00:00
Benjamin Kramer	16a4cb8d8a	X86: Don't transform shifts into ands when the sign bit is tested. Should unbreak MultiSource/Benchmarks/mediabench/g721/g721encode/encode. llvm-svn: 207145	2014-04-24 20:51:37 +00:00
Reid Kleckner	e7e2ccb9e9	Add 'musttail' marker to call instructions This is similar to the 'tail' marker, except that it guarantees that tail call optimization will occur. It also comes with convervative IR verification rules that ensure that tail call optimization is possible. Reviewers: nicholas Differential Revision: http://llvm-reviews.chandlerc.com/D3240 llvm-svn: 207143	2014-04-24 20:14:34 +00:00
Andrea Di Biagio	dae3a5b91a	[X86] Add support for Read Time Stamp Counter x86 builtin intrinsics. This patch: - Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and 'int_x86_rdtscp') as GCCBuiltin intrinsics; - Teaches the backend how to lower the two new builtins; - Introduces a common function to lower READCYCLECOUNTER dag nodes and the two new rdtsc/rdtscp intrinsics; - Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll' correctly verifies that both READCYCLECOUNTER and the two new intrinsics work fine for both 64bit and 32bit Subtargets. llvm-svn: 207127	2014-04-24 17:18:27 +00:00
David Blaikie	52ea9cc073	Spread some const around for non-mutating uses of MCSymbolData. I discovered this const-hole while attempting to coalesnce the Symbol and SymbolMap data structures. There's some pending issues with that, but I figured this change was easy to flush early. llvm-svn: 207124	2014-04-24 16:59:40 +00:00
Evgeniy Stepanov	26774a4e98	[asan] Use MCInstrInfo in inline asm instrumentation. Patch by Yuri Gorshenin. llvm-svn: 207115	2014-04-24 13:29:34 +00:00
Evgeniy Stepanov	6eb633223c	[asan] Fix instrumentation of x86 intel syntax inline assembly. Patch by Yuri Gorshenin. llvm-svn: 207092	2014-04-24 09:56:15 +00:00
Benjamin Kramer	ec7fca3a00	X86: Emit test instead of constant shift + compare if the shift result is unused. This allows us to compile return (mask & 0x8 ? a : b); into testb $8, %dil cmovnel %edx, %esi instead of andl $8, %edi shrl $3, %edi cmovnel %edx, %esi which we formed previously because dag combiner canonicalizes setcc of and into shift. llvm-svn: 207088	2014-04-24 08:15:31 +00:00

1 2 3 4 5 ...

10249 Commits