llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-30 07:22:55 +01:00

Author	SHA1	Message	Date
Hal Finkel	fdd124178e	PPC: Support dynamic allocas with large alignment Support for dynamic stack alignments in the PPC backend has been unfinished, in part because it depends on dynamic stack realignment (which I only just recently implemented fully). Now we can also support dynamic allocas with higher than the default target stack alignment (16 bytes). In order to round-up the requested size to the maximum requested alignment, we need an additional register to hold the rounded-up size. We're already using one scavenged register to hold the previous stack-pointer value (which needs to be stored with the signal-safe stdux update), and so when we have dynamic allocas and a large alignment, we allocate two emergency spill slots for the scavenger. llvm-svn: 186562	2013-07-18 04:28:21 +00:00
Hal Finkel	79a33a00d6	PPC: Add base-pointer support to builtin setjmp/longjmp First, this changes the base-pointer implementation to remove an unnecessary complication (and one that is incompatible with how builtin SjLj is implemented): instead of using r31 as the base pointer when it is not needed as a frame pointer, now the base pointer will always be r30 when needed. Second, we introduce another pseudo register, BP, which is used just like the FP pseudo register to refer to the base register before we know for certain what register it will be. Third, we now save BP into the jmp_buf, and restore r30 from that slot in longjmp. If the function that called setjmp did not use a base pointer, then r30 will be overwritten by the setjmp-calling-function's restore code. FP restoration (which is restored into r31) works the same way. llvm-svn: 186545	2013-07-17 23:50:51 +00:00
Joey Gouly	200e661b16	Add the tests that I forgot to 'svn add' with my previous commit (r186504). llvm-svn: 186506	2013-07-17 14:03:49 +00:00
Richard Osborne	b765390114	[XCore] Ensure implicit operands aren't lost on the return instruction. Patch by Robert Lytton. llvm-svn: 186500	2013-07-17 10:58:37 +00:00
Craig Topper	f16a718df3	Make x86 fast-isel correctly choose between aligned and unaligned operations for vector stores. Fixes PR16640. llvm-svn: 186491	2013-07-17 05:57:45 +00:00
Hal Finkel	149f358122	PPC: Add CTR-register clobber to builtin setjmp Because the builtin longjmp implementation uses a CTR-based indirect jump, when the control flow arrives at the builtin setjmp call, the CTR register has necessarily been clobbered. Correspondingly, this adds CTR to the list of implicit definitions of the builtin setjmp pseudo instruction. We don't need to add CTR to the implicit definitions of builtin longjmp because, even though it does clobber the CTR register, the control flow cannot return to inside the loop unless there is also a builtin setjmp call. llvm-svn: 186488	2013-07-17 05:35:44 +00:00
Hal Finkel	e625744d86	PPC: Implement base pointer and stack realignment This builds on some frame-lowering code that has existed since 2005 (r24224) but was disabled in 2008 (r48188) because it needed base pointer support to function correctly. This implementation follows the strategy suggested by Dale Johannesen in r48188 where the following comment was added: This does not currently work, because the delta between old and new stack pointers is added to offsets that reference incoming parameters after the prolog is generated, and the code that does that doesn't handle a variable delta. You don't want to do that anyway; a better approach is to reserve another register that retains to the incoming stack pointer, and reference parameters relative to that. And now we do exactly that. If we don't need a frame pointer, then we use r31 as a base pointer. If we do need a frame pointer, then we use r30 as a base pointer. The base pointer retains the value of the stack pointer before it was decremented in the prologue. We then use the base pointer to resolve all negative frame indices. The basic scheme follows that for base pointers in the X86 backend. We use a base pointer when we need to dynamically realign the incoming stack pointer. This currently applies only to static objects (dynamic allocas with large alignments, and base-pointer support in SjLj lowering will come in future commits). llvm-svn: 186478	2013-07-17 00:45:52 +00:00
NAKAMURA Takumi	7b93767d62	llvm/test/CodeGen/X86/vec_setcc.ll: Add explicit -mtriple=x86_64-unknown-unknown to satisfy win32-targeted configuration. llvm-svn: 186477	2013-07-17 00:42:37 +00:00
Benjamin Kramer	6e6528e46d	Finally, force the target for this test. Should unbreak non-x86 buildbots. llvm-svn: 186445	2013-07-16 19:22:07 +00:00
Benjamin Kramer	876b63a443	Label names also differ between platforms. Use a relaxed regex. llvm-svn: 186442	2013-07-16 18:54:21 +00:00
Benjamin Kramer	1459dae6ee	Fix test not to fail when the target doesn't use leading underscores on symbols. llvm-svn: 186439	2013-07-16 18:42:01 +00:00
Manman Ren	c67f77c5d6	Cleanup testing case by using a shorter name for types. llvm-svn: 186436	2013-07-16 18:26:48 +00:00
Juergen Ributzka	e612fc1230	[X86] Use min/max to optimize unsigned vector comparison on X86 Use PMIN/PMAX for UGE/ULE vector comparisons to reduce the number of required instructions. This trick also works for UGT/ULT, but there is no advantage in doing so. It wouldn't reduce the number of instructions and it would actually reduce performance. Reviewer: Ben radar:5972691 llvm-svn: 186432	2013-07-16 18:20:45 +00:00
Ulrich Weigand	c1b627a527	[APFloat] PR16573: Avoid losing mantissa bits in ppc_fp128 to double truncation When truncating to a format with fewer mantissa bits, APFloat::convert will perform a right shift of the mantissa by the difference of the precision of the two formats. Usually, this will result in just the mantissa bits needed for the target format. One special situation is if the input number is denormal. In this case, the right shift may discard significant bits. This is usually not a problem, since truncating a denormal usually results in zero (underflow) after normalization anyway, since the result format's exponent range is usually smaller than the target format's. However, there is one case where the latter property does not hold: when truncating from ppc_fp128 to double. In particular, truncating a ppc_fp128 whose first double of the pair is denormal should result in just that first double, not zero. The current code however performs an excessive right shift, resulting in lost result bits. This is then caught in the APFloat::normalize call performed by APFloat::convert and causes an assertion failure. This patch checks for the scenario of truncating a denormal, and attempts to (possibly partially) replace the initial mantissa right shift by decrementing the exponent, if doing so will still result in a valid target format exponent. 
Index: test/CodeGen/PowerPC/pr16573.ll =================================================================== --- test/CodeGen/PowerPC/pr16573.ll (revision 0) +++ test/CodeGen/PowerPC/pr16573.ll (revision 0) @@ -0,0 +1,11 @@ +; RUN: llc < %s | FileCheck %s + +target triple = "powerpc64-unknown-linux-gnu" + +define double @test() { + %1 = fptrunc ppc_fp128 0xM818F2887B9295809800000000032D000 to double + ret double %1 +} + +; CHECK: .quad -9111018957755033591 + Index: lib/Support/APFloat.cpp =================================================================== --- lib/Support/APFloat.cpp (revision 185817) +++ lib/Support/APFloat.cpp (working copy) @@ -1956,6 +1956,23 @@ X86SpecialNan = true; } + // If this is a truncation of a denormal number, and the target semantics + // has larger exponent range than the source semantics (this can happen + // when truncating from PowerPC double-double to double format), the + // right shift could lose result mantissa bits. Adjust exponent instead + // of performing excessive shift. + if (shift < 0 && isFiniteNonZero()) { + int exponentChange = significandMSB() + 1 - fromSemantics.precision; + if (exponent + exponentChange < toSemantics.minExponent) + exponentChange = toSemantics.minExponent - exponent; + if (exponentChange < shift) + exponentChange = shift; + if (exponentChange < 0) { + shift -= exponentChange; + exponent += exponentChange; + } + } + // If this is a truncation, perform the shift before we narrow the storage. if (shift < 0 && (isFiniteNonZero() || category==fcNaN)) lostFraction = shiftRight(significandParts(), oldPartCount, -shift); llvm-svn: 186409	2013-07-16 13:03:25 +00:00
Richard Osborne	e37374c506	[XCore] Fix printing of inline asm operands. Previously an asm operand with no operand modifier would give the error "invalid operand in inline asm". llvm-svn: 186407	2013-07-16 12:48:34 +00:00
Richard Sandiford	ab0fb439a9	[SystemZ] Use ROSBG and non-zero form of RISBG for OR nodes llvm-svn: 186405	2013-07-16 11:55:57 +00:00
Richard Sandiford	c99f769478	[SystemZ] Use RISBG for (shift (and ...)) Another patch in the series to make more use of R.SBG. This one extends r186072 and r186073 to handle cases where the AND is inside the shift. llvm-svn: 186399	2013-07-16 11:02:24 +00:00
Tim Northover	69d676cd12	ARM: implement ldrex, strex and clrex intrinsics Intrinsics already existed for the 64-bit variants, so these support operations of size at most 32-bits. llvm-svn: 186392	2013-07-16 09:46:55 +00:00
Renato Golin	5b7294a39c	ARM EABI divmod support This patch enables calls to __aeabi_idivmod when in EABI mode, by using the remainder value returned on registers (R1), enabled by the ARM triple "none-eabi". Note that Darwin and GNUEABI triples will continue lowering on GNU style, that is, using the stack for the remainder. Still need to add SREM/UREM support fix for 64-bit lowering. llvm-svn: 186390	2013-07-16 09:32:17 +00:00
Manman Ren	19fc512a36	PEI: Support for non-zero SPAdj at beginning of a basic block. We can have a FrameSetup in one basic block and the matching FrameDestroy in a different basic block when we have struct byval. In that case, SPAdj is not zero at beginning of the basic block. Modify PEI to correctly set SPAdj at beginning of each basic block using DFS traversal. We used to assume SPAdj is 0 at beginning of each basic block. PEI had an assert SPAdjCount || SPAdj == 0. If we have a Destroy <n> followed by a Setup <m>, PEI will assert failure. We can add an extra condition to make sure the pairs are matched: The pairs start with a FrameSetup. But since we are doing a much better job in the verifier, this patch removes the check in PEI. PR16393 llvm-svn: 186364	2013-07-15 23:47:29 +00:00
Hal Finkel	608dbe4a4d	Fix register subclass handling in PPCInstrInfo::insertSelect PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the common subclass of the true and false inputs, and then selecting either the 32-bit or the 64-bit isel variant based on the result of calling PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC) (where RC is the common subclass). Unfortunately, this is not quite right: if we have something like this: %vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76; G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6 then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8 pseudo-register). As a result, we also need to check the common subclass against GPRC_NOR0 and G8RC_NOX0 explicitly. This had not been a problem for clients of insertSelect that called canInsertSelect first (because it had a compensating mistake), but insertSelect is also used by the PPC pseudo-instruction expander, and this error was causing a problem in that context. This problem was found by csmith. llvm-svn: 186343	2013-07-15 20:22:58 +00:00
Tom Stellard	5a5b5f2786	R600/SI: Add support for 64-bit loads https://bugs.freedesktop.org/show_bug.cgi?id=65873 llvm-svn: 186339	2013-07-15 19:00:09 +00:00
Hal Finkel	d34cb3e70c	Remove invalid assert in DAGTypeLegalizer::RemapValue There is a comment at the top of DAGTypeLegalizer::PerformExpensiveChecks which, in part, says: // Note that these invariants may not hold momentarily when processing a node: // the node being processed may be put in a map before being marked Processed. Unfortunately, this assert would be valid only if the above-mentioned invariant held unconditionally. This was causing llc to assert when, in fact, everything was fine. Thanks to Richard Sandiford for investigating this issue! Fixes PR16562. llvm-svn: 186338	2013-07-15 18:57:05 +00:00
Anton Korobeynikov	ae224d3711	Use conventional syntax for branches. Patch by Job! llvm-svn: 186291	2013-07-14 18:19:44 +00:00
Anton Korobeynikov	21a3bcc541	Properly lower jump tables on MSP430. Patch by Job Noorman! llvm-svn: 186283	2013-07-14 15:11:00 +00:00
Stephen Lin	7e501cf4c3	Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally. This update was done with the following bash script: find test/CodeGen -name "*.ll" | \ while read NAME; do echo "$NAME" if ! grep -q "^; *RUN: *llc.*debug" $NAME; then TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \ while read FUNC; do sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME fi done llvm-svn: 186280	2013-07-14 06:24:09 +00:00
Stephen Lin	ece45b5ee9	Convert Windows to Unix line endings, no functionality change. llvm-svn: 186264	2013-07-13 22:08:55 +00:00
Stephen Lin	3ae734a60c	Convert CodeGen/*/*.ll tests to use the new CHECK-LABEL for easier debugging. No functionality change and all tests pass after conversion. This was done with the following sed invocation to catch label lines demarking function boundaries: sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct. llvm-svn: 186258	2013-07-13 20:38:47 +00:00
Benjamin Kramer	984e83f2c3	Convert a couple of grep tests to FileCheck. llvm-svn: 186250	2013-07-13 17:30:25 +00:00
Akira Hatanaka	642cb0096f	[mips] Remove trailing whitespace. llvm-svn: 186230	2013-07-12 23:47:38 +00:00
Akira Hatanaka	6f48f304ac	[mips] Implement MipsTargetMachine::getInstrItineraryData(). llvm-svn: 186227	2013-07-12 23:33:22 +00:00
JF Bastien	5885dc293c	Fix ARM paired GPR COPY lowering ARM paired GPR COPY was being lowered to two MOVr without CC. This patch puts the CC back. My test is a reduction of the case where I encountered the issue, 64-bit atomics use paired GPRs. The issue only occurs with selectionDAG, FastISel doesn't encounter it so I didn't bother calling it. llvm-svn: 186226	2013-07-12 23:33:03 +00:00
Benjamin Kramer	dfbb134032	R600: Reapply testcase from r186178, the big endian issue should be fixed by r186196. llvm-svn: 186209	2013-07-12 21:54:43 +00:00
Tom Stellard	7645224e44	R600: Remove the fpconst64.ll test which was failing on non-x86 buildbots I'm guessing the failure had something to do with the double precision floating point constant used in the test. llvm-svn: 186191	2013-07-12 19:29:54 +00:00
Tom Stellard	977376a943	R600/SI: Add support for f64 kernel arguments Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186182	2013-07-12 18:15:26 +00:00
Tom Stellard	ce0acc677f	R600/SI: Implement select and compares for SI Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186181	2013-07-12 18:15:19 +00:00
Tom Stellard	f2a3075fdd	R600/SI: Add fsqrt pattern for SI Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186180	2013-07-12 18:15:13 +00:00
Tom Stellard	b7b09a29aa	R600/SI: Add double precision fsub pattern for SI Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186179	2013-07-12 18:15:08 +00:00
Tom Stellard	43c1f3d80d	R600/SI: SI support for 64bit ConstantFP Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186178	2013-07-12 18:15:02 +00:00
Tom Stellard	8b6f62dcb2	R600/SI: Add initial double precision support for SI Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186177	2013-07-12 18:14:56 +00:00
Benjamin Kramer	95a2f42a75	X86: Shrink certain forms of movsx. In particular: movsbw %al, %ax --> cbtw movswl %ax, %eax --> cwtl movslq %eax, %rax --> cltq According to Intel's manual those have the same performance characteristics but come with a smaller encoding. llvm-svn: 186174	2013-07-12 18:06:44 +00:00
Stephen Lin	f8bbffe976	X86: fold SSE2/AVX2 logical shift by immediate amount into zero vector when possible Patch by Andrea Di Biagio llvm-svn: 186165	2013-07-12 15:31:36 +00:00
Stephen Lin	c6bb3a6cda	Start using CHECK-LABEL in some tests. llvm-svn: 186163	2013-07-12 14:54:12 +00:00
Richard Sandiford	bab8ad599d	[SystemZ] Add test missing from r186148 Sigh, twice in two days sorry. One day I'll remember... llvm-svn: 186150	2013-07-12 09:20:14 +00:00
Richard Sandiford	8fe174c649	[SystemZ] Optimize sign-extends of vector setccs Normal (sext (setcc ...)) sequences are optimised into (select_cc ..., -1, 0) by DAGCombiner::visitSIGN_EXTEND. However, this is deliberately not done for vectors, and after vector type legalization we have (sext_inreg (setcc ...)) instead. I wondered about trying to extend DAGCombiner to handle this case too, but it seemed to be a loss on some other targets I tried, even those for which SETCC isn't "legal" and SELECT_CC is. llvm-svn: 186149	2013-07-12 09:17:10 +00:00
Richard Sandiford	b7ea44c800	[SystemZ] Improve spilling of LGDR and LDGR If the source of these instructions is spilled we should load the destination. If the destination is spilled we should store the source. llvm-svn: 186147	2013-07-12 08:37:17 +00:00
Charles Davis	2b2075f834	Target/X86: Add explicit Win64 and System V/x86-64 calling conventions. Summary: This patch adds explicit calling convention types for the Win64 and System V/x86-64 ABIs. This allows code to override the default, and use the Win64 convention on a target that wants to use SysV (and vice-versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU attributes. Reviewers: CC: llvm-svn: 186144	2013-07-12 06:02:35 +00:00
Hal Finkel	f153e34eee	PPC: Add some missing V_SET0 patterns We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for v8i16 (which occurs in the test case) or v16i8. The same was true for V_SETALLONES (so I added the associated patterns for those as well). Another bug found by llvm-stress. llvm-svn: 186108	2013-07-11 17:43:32 +00:00
Hal Finkel	adac2cbb4a	PPCDAGToDAGISel::isRunOfOnes should return false on zero This fixes a bug (found by csmith) at -O0 where we attempt to create a RLWIMI with an out-of-range operand. Most uses of the isRunOfOnes function are guarded by a condition that the value is not zero. This was not true in two places, and in both places a zero input would result in an out-of-range MB value (= 32). To fix this, isRunOfOnes returns false on a zero input (and I've removed one now-redundant guard). llvm-svn: 186101	2013-07-11 16:31:51 +00:00
Richard Sandiford	f53d62a725	[SystemZ] Add testcase missing from r186073 llvm-svn: 186074	2013-07-11 09:10:38 +00:00
Richard Sandiford	e5c5a78828	[SystemZ] Use zeroing form of RISBG for shift-and-AND sequences Extend r186072 to handle shifts and ANDs. llvm-svn: 186073	2013-07-11 09:10:09 +00:00
Richard Sandiford	fa42560424	[SystemZ] Use zeroing form of RISBG for some AND sequences RISBG can handle some ANDs for which no AND IMMEDIATE exists. It also acts as a three-operand AND for some cases where an AND IMMEDIATE could be used instead. It might be worth adding a pass to replace RISBG with AND IMMEDIATE in cases where the register operands end up being the same and where AND IMMEDIATE is smaller. llvm-svn: 186072	2013-07-11 08:59:12 +00:00
Hal Finkel	38ec4d9a41	RegScavenger should not exclude undef uses When computing currently-live registers, the register scavenger excludes undef uses. As a result, undef uses are ignored when computing the restore points of registers spilled into the emergency slots. While the register scavenger normally excludes from consideration, when scavenging, registers used by the current instruction, we need to not exclude undef uses. Otherwise, we might end up requiring more emergency spill slots than we have (in the case where the undef use is the currently-spilled register). Another bug found by llvm-stress. llvm-svn: 186067	2013-07-11 05:55:57 +00:00
Hal Finkel	e4c36bd8fb	Move r186044 tests into CodeGen/X86 I had thought that these tests could be target-neutral, but in practice this is not the case (on some targets, like Hexagon and Darwin), they trigger an assert (a different assert than the one that r186044 fixes). llvm-svn: 186051	2013-07-11 01:55:55 +00:00
Michel Danzer	68916ffa69	R600/SI: Initial local memory support Enough for the radeonsi driver to use it for calculating derivatives. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186012	2013-07-10 16:37:07 +00:00
Michel Danzer	c2e06ddf2d	R600/SI: Add intrinsic for retrieving the current thread ID Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186010	2013-07-10 16:36:52 +00:00
Michel Danzer	47a9f6685b	R600/SI: Add intrinsics for texture sampling with user derivatives Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186008	2013-07-10 16:36:36 +00:00
Jim Grosbach	d6be90a2b8	ARM: Fix incorrect pack pattern for thumb2 Propagate the fix from r185712 to Thumb2 codegen as well. Original commit message applies here as well: A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them in the bottom half of "x". An arithmetic and logic shift are only equivalent in this context if the shift amount is 16. We would be shifting in ones into the bottom 16bits instead of zeros if "y" is negative. rdar://14338767 llvm-svn: 185982	2013-07-09 22:59:22 +00:00
Adrian Prantl	52d9227e1d	move test into the appropriate subdir. llvm-svn: 185972	2013-07-09 21:44:11 +00:00
Adrian Prantl	a295f68201	Reapply an improved version of r180816/180817. Change the informal convention of DBG_VALUE machine instructions so that we can express a register-indirect address with an offset of 0. The old convention was that a DBG_VALUE is a register-indirect value if the offset (operand 1) is nonzero. The new convention is that a DBG_VALUE is register-indirect if the first operand is a register and the second operand is an immediate. For plain register values the combination reg, reg is used. MachineInstrBuilder::BuildMI knows how to build the new DBG_VALUES. rdar://problem/13658587 llvm-svn: 185966	2013-07-09 20:28:37 +00:00
Stephen Lin	fb9d247b9c	Appease buildbots after r185956: just set -mcpu explicitly, as it should have been from the beginning. llvm-svn: 185962	2013-07-09 19:27:10 +00:00
Stephen Lin	ec7f16225b	Appease Atom buildbot after r185956 (explicitly turn on AVX) llvm-svn: 185961	2013-07-09 18:55:52 +00:00
Hal Finkel	560c3b2ad4	WidenVecRes_BUILD_VECTOR must use the first operand's type Because integer BUILD_VECTOR operands may have a larger type than the result's vector element type, and all operands must have the same type, when widening a BUILD_VECTOR node by adding UNDEFs, we cannot use the vector element type, but rather must use the type of the existing operands. Another bug found by llvm-stress. llvm-svn: 185960	2013-07-09 18:55:10 +00:00
Bill Schmidt	2499045a19	[PowerPC] Better fix for PR16556. A more complete example of the bug in PR16556 was recently provided, showing that the previous fix was not sufficient. The previous fix is reverted herein. The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as custom lowering for FP_TO_SINT during type legalization, without checking whether the input type is handled by that routine. LowerFP_TO_INT requires the input to be f32 or f64, so we fail when the input is ppcf128. I'm leaving the test case from the initial fix (r185821) in place, and adding the new test as another crash-only check. llvm-svn: 185959	2013-07-09 18:50:20 +00:00
Stephen Lin	59ba368813	Attempt to appease buildbot after r185956 by explicitly setting -fma,-fma4 attrs (I'm assuming they're set because the bot is running on a machine that has one or the other.) llvm-svn: 185958	2013-07-09 18:41:43 +00:00
Stephen Lin	30b326010c	AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in order to resolve the following issues with fmuladd (i.e. optional FMA) intrinsics: 1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations. 2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation. 3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs. The function has also been slightly renamed for consistency and to force a merge/build conflict for any out-of-tree target implementing it. To resolve, see comments and fixed in-tree examples. llvm-svn: 185956	2013-07-09 18:16:56 +00:00
Hal Finkel	9cb3ba300f	Don't crash in SE dealing with ashr x, -1 ScalarEvolution::getSignedRange uses ComputeNumSignBits from ValueTracking on ashr instructions. ComputeNumSignBits can return zero, but this case was not handled correctly by the code in getSignedRange which was calling: APInt::getSignedMinValue(BitWidth).ashr(NS - 1) with NS = 0, resulting in an assertion failure in APInt::ashr. Now, we just return the conservative result (as with NS == 1). Another bug found by llvm-stress. llvm-svn: 185955	2013-07-09 18:16:16 +00:00
Hal Finkel	984c244d8d	DAGCombine tryFoldToZero cannot create illegal types after type legalization When folding sub x, x (and other similar constructs), where x is a vector, the result is a vector of zeros. After type legalization, make sure that the input zero elements have a legal type. This type may be larger than the result's vector element type. This was another bug found by llvm-stress. llvm-svn: 185949	2013-07-09 17:02:45 +00:00
Ulrich Weigand	b664c03f18	[PowerPC] Revert r185476 and fix up TLS variant kinds In the commit message to r185476 I wrote: >The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD >correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD. >This causes some confusion with the asm parser, since VK_PPC_TLSGD >is output as @tlsgd, which is then read back in as VK_TLSGD. > >To avoid this confusion, this patch removes the PowerPC-specific >modifiers and uses the generic modifiers throughout. (The only >drawback is that the generic modifiers are printed in upper case >while the usual convention on PowerPC is to use lower-case modifiers. >But this is just a cosmetic issue.) This was unfortunately incorrect, there is in fact another, serious drawback to using the default VK_TLSLD/VK_TLSGD variant kinds: using these causes ELFObjectWriter::RelocNeedsGOT to return true, which in turn causes the ELFObjectWriter to emit an undefined reference to _GLOBAL_OFFSET_TABLE_. This is a problem on powerpc64, because it uses the TOC instead of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_, so the symbol remains undefined. This means shared libraries using TLS built with the integrated assembler are currently broken. While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation probably ought to be properly fixed at some point, for now I'm simply reverting the r185476 commit. Now this in turn exposes the breakage of handling @tlsgd/@tlsld in the asm parser that this check-in was originally intended to fix. To avoid this regression, I'm also adding a different fix for this problem: while common code now parses @tlsgd as VK_TLSGD, a special hack in the asm parser translates this code to the platform-specific VK_PPC_TLSGD that the back-end now expects. While this is not really pretty, it's self-contained and shouldn't hurt anything else for now. Once the underlying problem is fixed, this hack can be reverted again. llvm-svn: 185945	2013-07-09 16:41:09 +00:00
Vincent Lejeune	5517f57c42	R600: Do not predicate basic blocks with multiple ALU clauses Test is not included as it is several 1000 lines long. To test this functionality, a test case must generate at least 2 ALU clauses, where an ALU clause is ~110 instructions long. NOTE: This is a candidate for the stable branch. llvm-svn: 185943	2013-07-09 15:03:33 +00:00
Vincent Lejeune	0c1224c533	R600: Fix a rare bug where swizzle optimization returns wrong values llvm-svn: 185942	2013-07-09 15:03:25 +00:00
Vincent Lejeune	48ea85c102	R600: Fix wrong export reswizzling llvm-svn: 185941	2013-07-09 15:03:19 +00:00
Vincent Lejeune	d29844ad2e	R600: Use DAG lowering pass to handle fcos/fsin NOTE: This is a candidate for the stable branch. llvm-svn: 185940	2013-07-09 15:03:11 +00:00
Alexander Potapenko	9c55668f5b	Revert r185872 - "Stop emitting weak symbols into the "coal" sections" This patch broke `make check-asan` on Mac, causing ld warnings like the following one: ld: warning: direct access in __GLOBAL__I_a to global weak symbol ___asan_mapping_scale means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. The resulting test binaries crashed with incorrect ASan warnings. llvm-svn: 185923	2013-07-09 10:00:16 +00:00
Richard Sandiford	03cd63553a	[SystemZ] Use MVC for simple load/store pairs Look for patterns of the form (store (load ...), ...) in which the two locations are known not to partially overlap. (Identical locations are OK.) These sequences are better implemented by MVC unless either the load or the store could use RELATIVE LONG instructions. The testcase showed that we weren't using LHRL and LGHRL for extload16, only sextloadi16. The patch fixes that too. llvm-svn: 185919	2013-07-09 09:46:39 +00:00
Richard Sandiford	9295f19189	[SystemZ] Use "STC;MVC" for memset Use "STC;MVC" for memsets that are too big for two STCs or MV...Is yet small enough for a single MVC. As with memcpy, I'm leaving longer cases till later. The number of tests might seem excessive, but f33 & f34 from memset-04.ll failed the first cut because I'd not added the "?:" on the calculation of Size1. llvm-svn: 185918	2013-07-09 09:32:42 +00:00
Hal Finkel	f972e31b6c	PPC: Allocate RS spill slot for unaligned i64 load/store This fixes another bug found by llvm-stress! If we happen to be doing an i64 load or store into a stack slot that has less than a 4-byte alignment, then the frame-index elimination may need to use an indexed load or store instruction (because the offset may not be a multiple of 4, a requirement of the STD/LD instructions). The extra register needed to hold the offset comes from the register scavenger, and it is possible that the scavenger will need to use an emergency spill slot. As a result, we need to make sure that a spill slot is allocated when doing an i64 load/store into a less-than-4-byte-aligned stack slot. Because test cases for things like this tend to be fairly fragile, I've concatenated a few small bugpoint-reduced test cases together to form the regression test. llvm-svn: 185907	2013-07-09 06:34:51 +00:00
Bill Wendling	f81cb5b9ed	Stop emitting weak symbols into the "coal" sections. The Mach-O linker has been able to support the weak-def bit on any symbol for quite a while now. The compiler however continued to place these symbols into a "coal" section, which required the linker to map them back to the base section name. Replace the sections like this: __TEXT/__textcoal_nt instead use __TEXT/__text __TEXT/__const_coal instead use __TEXT/__const __DATA/__datacoal_nt instead use __DATA/__data <rdar://problem/14265330> llvm-svn: 185872	2013-07-08 21:34:52 +00:00
Ulrich Weigand	cb20efc341	[PowerPC] Always use "assembler dialect" 1 A setting in MCAsmInfo defines the "assembler dialect" to use. This is used by common code to choose between alternatives in a multi-alternative GNU inline asm statement like the following: __asm__ ("{sfe|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2)); The meaning of these dialects is platform specific, and GCC defines those for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for new-style (PowerPC) mnemonics, like in the example above. To be compatible with inline asm used with GCC, LLVM ought to do the same. Specifically, this means we should always use assembler dialect 1 since old-style mnemonics really aren't supported on any current platform. However, the current LLVM back-end uses: AssemblerDialect = 1; // New-Style mnemonics. in PPCMCAsmInfoDarwin, and AssemblerDialect = 0; // Old-Style mnemonics. in PPCLinuxMCAsmInfo. The Linux setting really isn't correct, we should be using new-style mnemonics everywhere. This is changed by this commit. Unfortunately, the setting of this variable is overloaded in the back-end to decide whether or not we are on a Darwin target. This is done in PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo AssemblerDialect setting), and also in PPCMCExpr. Setting AssemblerDialect to 1 for both Darwin and Linux no longer allows us to make this distinction. Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter to distinguish Darwin targets, and ignores the SyntaxVariant parameter. As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs to be passed in by the caller when creating a target MCExpr. (To do so this patch implicitly also reverts commit 184441.) llvm-svn: 185858	2013-07-08 20:20:51 +00:00
Hal Finkel	c4d29e61ee	PPC: Mark vector CC action for SETO and SETONE as Expand Another bug found by llvm-stress! This fixes hitting llvm_unreachable("Invalid integer vector compare condition"); at the end of getVCmpInst in PPCISelDAGToDAG. llvm-svn: 185855	2013-07-08 20:00:03 +00:00
Joey Gouly	b4f59412fd	Add a comment to this change, requested by Eric Christopher. llvm-svn: 185853	2013-07-08 19:52:51 +00:00
Jim Grosbach	b4234c1d88	ARM: Improve codegen for generic vselect. Fall back to by-element insert rather than building it up on the stack. rdar://14351991 llvm-svn: 185846	2013-07-08 18:18:52 +00:00
Hal Finkel	059614de7f	PPC: Mark vector FREM as Expand by default Another bug found by llvm-stress! This fixes crashing with: LLVM ERROR: Cannot select: v4f32 = frem ... llvm-svn: 185840	2013-07-08 17:30:25 +00:00
Bill Schmidt	58913550ff	[PowerPC] Fix PR16556 (handle undef ppcf128 in LowerFP_TO_INT). PPCTargetLowering::LowerFP_TO_INT() expects its source operand to be either an f32 or f64, but this is not checked. A long double (ppcf128) operand will normally be custom-lowered to a conversion to f64 in this context. However, this isn't the case for an UNDEF node. This patch recognizes a ppcf128 as a legal source operand for FP_TO_INT only if it's an undef, in which case it creates an undef of the target type. At some point we might want to do a wholesale custom lowering of ISD::UNDEF when the type is ppcf128, but it's not really clear that's a great idea, and probably more work than it's worth for a situation that only arises in the case of a programming error. At this point I think simple is best. The test case comes from PR16556, and is a crash-test only. llvm-svn: 185821	2013-07-08 14:22:45 +00:00
Nico Rieck	7230f0d23f	Reuse %rax after calling __chkstk on win64 Reapply this as I reverted the wrong commit. llvm-svn: 185807	2013-07-08 11:20:11 +00:00
Nico Rieck	90555b76a0	Revert "Proper va_arg/va_copy lowering on win64" This reverts commit 2b52880592a525cfe04d8f9008a35da8c2ea94c3. Needs review. llvm-svn: 185806	2013-07-08 11:19:44 +00:00
Richard Sandiford	537b8d7bec	[SystemZ] Use MVC for memcpy Use MVC for memcpy in cases where a single MVC is enough. Using MVC is a win for longer copies too, but I'll leave that for later. llvm-svn: 185802	2013-07-08 09:35:23 +00:00
Hal Finkel	b21ca286dc	Fix PromoteIntRes_BUILD_VECTOR crash with i1 vectors This fixes a bug (found by llvm-stress) in DAGTypeLegalizer::PromoteIntRes_BUILD_VECTOR where it assumed that the result type would always be larger than the original operands. This is not always true, however, with boolean vectors. For example, promoting a node of type v8i1 (where the operands will be of type i32, the type to which i1 is promoted) will yield a node with a result vector element type of i16 (and operands of type i32). As a result, we cannot blindly assume that we can ANY_EXTEND the operands to the result type. llvm-svn: 185794	2013-07-08 06:16:58 +00:00
Nico Rieck	cd7bc94022	Revert "Reuse %rax after calling __chkstk on win64" This reverts commit 01f8d579f7672872324208ac5bc4ac311e81b22e. llvm-svn: 185781	2013-07-08 01:30:57 +00:00
Nico Rieck	15089c31ec	Reuse %rax after calling __chkstk on win64 llvm-svn: 185778	2013-07-07 16:48:39 +00:00
Nico Rieck	f5c31a8456	Proper va_arg/va_copy lowering on win64 llvm-svn: 185763	2013-07-06 18:08:19 +00:00
Benjamin Kramer	5d9e616519	DAGCombiner: Don't drop extension behavior when shrinking a load when unsafe. ReduceLoadWidth unconditionally drops extensions from loads. Limit it to the case when all of the bits the extension would otherwise produce are dropped by the shrink. It would be possible to shrink the load in more cases by merging the extensions, but this isn't trivial and a very rare case. I left a TODO for that case. Fixes PR16551. llvm-svn: 185755	2013-07-06 14:05:09 +00:00
Tim Northover	696f647891	Stop putting operations after a tail call. This prevents the emission of DAG-generated vreg definitions after a tail call by dropping them entirely (on the grounds that nothing could use them anyway, and they interfere with O0 CodeGen). llvm-svn: 185754	2013-07-06 12:58:45 +00:00
Arnold Schwaighofer	97cea9b991	ARM: Add a pack pattern for matching arithmetic shift right llvm-svn: 185714	2013-07-05 18:57:49 +00:00
Arnold Schwaighofer	d5fc888196	ARM: Fix incorrect pack pattern A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them in the bottom half of "x". An arithmetic and a logical shift are only equivalent in this context if the shift amount is 16. We would be shifting ones into the bottom 16 bits instead of zeros if "y" is negative. radar://14338767 llvm-svn: 185712	2013-07-05 18:28:39 +00:00
Richard Sandiford	c0fe83c1b6	[SystemZ] Remove no-op MVCs The stack coloring pass has code to delete stores and loads that become trivially dead after coloring. Extend it to cope with single instructions that copy from one frame index to another. The testcase happens to show an example of this kicking in at the moment. It did occur in Real Code too though. llvm-svn: 185705	2013-07-05 14:38:48 +00:00
Richard Sandiford	8e414e2aef	Fix double renaming bug in stack coloring pass The stack coloring pass renumbered frame indexes with a loop of the form: for each frame index FI for each instruction I that uses FI for each use of FI in I rename FI to FI' This caused problems if an instruction used two frame indexes F0 and F1 and if F0 was renamed to F1 and F1 to F2. The first time we visited the instruction we changed F0 to F1, then we changed both F1s to F2. In other words, the problem was that SSRefs recorded which instructions used an FI, but not which MachineOperands and MachineMemOperands within that instruction used it. This is easily fixed for MachineOperands by walking the instructions once and processing each operand in turn. There's already a loop to do that for dead store elimination, so it seemed more efficient to fuse the two at the block level. MachineMemOperands are more tricky because they can be shared between instructions. The patch handles them by making SSRefs an array of MachineMemOperands rather than an array of MachineInstrs. We might end up processing the same MachineMemOperand twice, but that's OK because we always know from the SSRefs index what the original frame index was. llvm-svn: 185703	2013-07-05 14:24:47 +00:00
Richard Sandiford	d84ed5f34a	[SystemZ] Enable the use of MVC for frame-to-frame spills ...now that the problem that prompted the restriction has been fixed. The original spill-02.py was a compromise because at the time I couldn't find an example that actually failed without the two scavenging slots. The version included here did. llvm-svn: 185701	2013-07-05 14:02:01 +00:00
Richard Sandiford	acd92ea1e1	[SystemZ] Allocate a second register scavenging slot This is another prerequisite for frame-to-frame MVC copies. I'll commit the patch that makes use of the slot separately. The downside of trying to test many corner cases with each of the available addressing modes is that a fair few tests need to account for the new frame layout. I do still think it's useful to have all these tests though, since it's something that wouldn't get much coverage otherwise. llvm-svn: 185698	2013-07-05 13:11:52 +00:00
Joey Gouly	76f34b0ffb	PR16490: fix a crash in ARMDAGToDAGISel::SelectInlineAsm. In the SelectionDAG immediate operands to inline asm are constructed as two separate operands. The first is a constant of value InlineAsm::Kind_Imm and the second is a constant with the value of the immediate. In ARMDAGToDAGISel::SelectInlineAsm, if we reach an operand of Kind_Imm we should skip over the next operand too. llvm-svn: 185688	2013-07-05 10:19:40 +00:00
Quentin Colombet	49190aa8d1	[ARM] Improve the instruction selection of vector loads. In the ARM back-end, build_vector nodes are lowered to a target specific build_vector that uses floating point type. This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code. In other words, this conversion may introduce artificial dependencies when the code leading to the build vector cannot be completed with a floating point type. In particular, this happens when loads are not aligned. Before this patch, in that case, the compiler generates general purpose loads and creates the floating point vector from them, instead of directly using the vector unit. The patch uses a vector friendly sequence of code when the inserted bitcasts to floating point survived DAGCombine. This is done by a target specific DAGCombine that changes the target specific build_vector into a sequence of insert_vector_elt that get rid of the bitcasts. <rdar://problem/14170854> llvm-svn: 185587	2013-07-03 21:42:57 +00:00
Ulrich Weigand	a5490843a1	[PowerPC] Use mtocrf when available Just as with mfocrf, it is also preferable to use mtocrf instead of mtcrf when only a single CR register is to be written. Current code however always emits mtcrf. This probably does not matter when using an external assembler, since the GNU assembler will in fact automatically replace mtcrf with mtocrf when possible. It does create inefficient code with the integrated assembler, however. To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and uses those instead of MTCRF/MTCRF8 everywhere. Just as done in the MFOCRF patch committed as 185556, these patterns will be converted back to MTCRF if MTOCRF is not available on the machine. As a side effect, this allows modifying the MTCRF pattern to accept the full range of mask operands for the benefit of the asm parser. llvm-svn: 185561	2013-07-03 17:59:07 +00:00
Rafael Espindola	006ee945bb	Prefix failing commands with not to make clear they are expected to fail. llvm-svn: 185554	2013-07-03 16:41:29 +00:00
Rafael Espindola	4af20b9fde	Remove another old test. It was only passing because 'grep andpd' was not finding any andpd, but we don't fail if part of a pipe fails. llvm-svn: 185552	2013-07-03 16:35:26 +00:00
Rafael Espindola	6cccbc7b17	Remove test for the old EH system. It doesn't parse anymore. llvm-svn: 185551	2013-07-03 16:30:01 +00:00
Richard Sandiford	c7495a0fca	[SystemZ] Fold more spills Add a mapping from register-based <INSN>R instructions to the corresponding memory-based <INSN>. Use it to cut down on the number of spill loads. Some instructions extend their operands from smaller fields, so this required a new TSFlags field to say how big the unextended operand is. This optimisation doesn't trigger for C(G)R and CL(G)R because in practice we always combine those instructions with a branch. Adding a test for every other case probably seems excessive, but it did catch a missed optimisation for DSGF (fixed in r185435). llvm-svn: 185529	2013-07-03 10:10:02 +00:00
Tim Northover	f4b07c69d1	ARM: relax the atomic release barrier to "dmb ishst" on Swift Swift cores implement store barriers that are stronger than the ARM specification but weaker than general barriers. They are, in fact, just about enough to provide the ordering needed for atomic operations with release semantics. This patch makes use of that quirk. llvm-svn: 185527	2013-07-03 09:20:36 +00:00
Richard Osborne	207824e7f8	[XCore] Add ISel pattern for LDWCP Patch by Robert Lytton. llvm-svn: 185518	2013-07-03 07:48:50 +00:00
Ulrich Weigand	042ff673b7	[PowerPC] Remove VK_PPC_TLSGD and VK_PPC_TLSLD The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD. This causes some confusion with the asm parser, since VK_PPC_TLSGD is output as @tlsgd, which is then read back in as VK_TLSGD. To avoid this confusion, this patch removes the PowerPC-specific modifiers and uses the generic modifiers throughout. (The only drawback is that the generic modifiers are printed in upper case while the usual convention on PowerPC is to use lower-case modifiers. But this is just a cosmetic issue.) llvm-svn: 185476	2013-07-02 21:29:06 +00:00
Richard Sandiford	750b064fa2	[SystemZ] Use DSGFR over DSGR in more cases Fixes some cases where we were using full 64-bit division for (sdiv i32, i32) and (sdiv i64, i32). The "32" in "SDIVREM32" just refers to the second operand. The first operand of all DIVREMs is a GR128. llvm-svn: 185435	2013-07-02 15:40:22 +00:00
Richard Sandiford	33deb195f9	[SystemZ] Use MVC to spill loads and stores Try to use MVC when spilling the destination of a simple load or the source of a simple store. As explained in the comment, this doesn't yet handle the case where the load or store location is also a frame index, since that could lead to two simultaneous scavenger spills, something the backend can't handle yet. spill-02.py tests that this restriction kicks in, but unfortunately I've not yet found a case that would fail without it. The volatile trick I used for other scavenger tests doesn't work here because we can't use MVC for volatile accesses anyway. I'm planning on relaxing the restriction later, hopefully with a test that does trigger the problem... Tests @f8 and @f9 also showed that L(G)RL and ST(G)RL were wrongly classified as SimpleBDX{Load,Store}. It wouldn't be easy to test for that bug separately, which is why I didn't split out the fix as a separate patch. llvm-svn: 185434	2013-07-02 15:28:56 +00:00
Richard Osborne	ad449c14dd	[XCore] Fix instruction selection for zext, mkmsk instructions. r182680 replaced CountLeadingZeros_32 with a template function countLeadingZeros that relies on using the correct argument type to give the right result. The type passed in the XCore backend after this revision was incorrect in a couple of places. Patch by Robert Lytton. llvm-svn: 185430	2013-07-02 14:46:34 +00:00
Tim Northover	d0b90ac5a7	DAGCombiner: fix use-counting issue when forming zextload DAGCombiner was counting all uses of a load node when considering whether it's worth combining into a zextload. Really, it wants to ignore the chain and just count real uses. rdar://problem/13896307 llvm-svn: 185419	2013-07-02 09:58:53 +00:00
Hal Finkel	4eba1c5685	Cleanup PPC Altivec registers in CSR lists and improve VRSAVE handling There are a couple of (small) related changes here: 1. The printed name of the VRSAVE register has been changed from VRsave to vrsave in order to match the name accepted by GNU binutils. 2. Support for parsing vrsave has been added to the asm parser (it seems that there was no test case specifically covering this code, so I've added one). 3. The list of Altivec registers, which was common to all calling conventions, has been separated out. This allows us to define the base CSR lists, and then lists for each ABI with Altivec included. This allows SjLj, for example, to work correctly on non-Altivec targets without using unnatural definitions of the NoRegs CSR list. 4. VRSAVE is now always reserved on non-Darwin targets and all Altivec registers are reserved when Altivec is disabled. With these changes, it is now possible to compile a function containing __builtin_unwind_init() on Linux/PPC64 with debugging information. This did not work previously because GNU binutils assumes that all .cfi_offset offsets will be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned offset). This is not true for the vrsave register, however, because this register is used only on Darwin, GCC does not bother printing a .cfi_offset entry for it (even though there is a slot in the stack frame for it as specified by the ABI). This change allows us to do the same: we will also not print .cfi_offset directives for vrsave. llvm-svn: 185409	2013-07-02 03:39:34 +00:00
Bill Schmidt	4e099704b5	Index: test/CodeGen/PowerPC/reloc-align.ll =================================================================== --- test/CodeGen/PowerPC/reloc-align.ll (revision 0) +++ test/CodeGen/PowerPC/reloc-align.ll (revision 0) @@ -0,0 +1,34 @@ +; RUN: llc -mcpu=pwr7 -O1 < %s | FileCheck %s + +; This test verifies that the peephole optimization of address accesses +; does not produce a load or store with a relocation that can't be +; satisfied for a given instruction encoding. Reduced from a test supplied +; by Hal Finkel. + +target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64" +target triple = "powerpc64-unknown-linux-gnu" + +%struct.S1 = type { [8 x i8] } + +@main.l_1554 = internal global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 -1, i8 -6, i8 57, i8 62, i8 -48, i8 0, i8 58, i8 80 }, align 1 + +; Function Attrs: nounwind readonly +define signext i32 @main() #0 { +entry: + %call = tail call fastcc signext i32 @func_90(%struct.S1* byval bitcast ({ i8, i8, i8, i8, i8, i8, i8, i8 }* @main.l_1554 to %struct.S1)) +; CHECK-NOT: ld {{[0-9]+}}, main.l_1554@toc@l + ret i32 %call +} + +; Function Attrs: nounwind readonly +define internal fastcc signext i32 @func_90(%struct.S1 byval nocapture %p_91) #0 { +entry: + %0 = bitcast %struct.S1* %p_91 to i64* + %bf.load = load i64* %0, align 1 + %bf.shl = shl i64 %bf.load, 26 + %bf.ashr = ashr i64 %bf.shl, 54 + %bf.cast = trunc i64 %bf.ashr to i32 + ret i32 %bf.cast +} + +attributes #0 = { nounwind readonly "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } Index: lib/Target/PowerPC/PPCAsmPrinter.cpp =================================================================== --- lib/Target/PowerPC/PPCAsmPrinter.cpp (revision 185327) +++ lib/Target/PowerPC/PPCAsmPrinter.cpp (working copy) @@ -679,7 +679,26 @@ void PPCAsmPrinter::EmitInstruction(const MachineI OutStreamer.EmitRawText(StringRef("\tmsync")); return; } + break; + case PPC::LD: + case PPC::STD: + case PPC::LWA: { + // Verify alignment is legal, so we don't create relocations + // that can't be supported. + // FIXME: This test is currently disabled for Darwin. The test + // suite shows a handful of test cases that fail this check for + // Darwin. Those need to be investigated before this sanity test + // can be enabled for those subtargets. + if (!Subtarget.isDarwin()) { + unsigned OpNum = (MI->getOpcode() == PPC::STD) ? 2 : 1; + const MachineOperand &MO = MI->getOperand(OpNum); + if (MO.isGlobal() && MO.getGlobal()->getAlignment() < 4) + llvm_unreachable("Global must be word-aligned for LD, STD, LWA!"); + } + // Now process the instruction normally. + break; } + } LowerPPCMachineInstrToMCInst(MI, TmpInst, this); OutStreamer.EmitInstruction(TmpInst); Index: lib/Target/PowerPC/PPCISelDAGToDAG.cpp =================================================================== --- lib/Target/PowerPC/PPCISelDAGToDAG.cpp (revision 185327) +++ lib/Target/PowerPC/PPCISelDAGToDAG.cpp (working copy) @@ -1530,6 +1530,14 @@ void PPCDAGToDAGISel::PostprocessISelDAG() { if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) { SDLoc dl(GA); const GlobalValue *GV = GA->getGlobal(); + // We can't perform this optimization for data whose alignment + // is insufficient for the instruction encoding. + if (GV->getAlignment() < 4 && + (StorageOpcode == PPC::LD || StorageOpcode == PPC::STD || + StorageOpcode == PPC::LWA)) { + DEBUG(dbgs() << "Rejected this candidate for alignment.\n\n"); + continue; + } ImmOpnd = CurDAG->getTargetGlobalAddress(GV, dl, MVT::i64, 0, Flags); } else if (ConstantPoolSDNode *CP = dyn_cast<ConstantPoolSDNode>(ImmOpnd)) { llvm-svn: 185380	2013-07-01 20:52:27 +00:00
Akira Hatanaka	a704ad94d7	[mips] Fix test case to check that mips64 instructions are generated. llvm-svn: 185371	2013-07-01 20:18:58 +00:00
Anton Korobeynikov	a6a4ed92ee	Really fix the test. Sorry for the breakage... llvm-svn: 185369	2013-07-01 19:51:36 +00:00
Anton Korobeynikov	8e5dfb5ca4	Fix the test which relies on uncommitted change llvm-svn: 185368	2013-07-01 19:50:31 +00:00
Anton Korobeynikov	f6e50fd093	Add jump tables handling for MSP430. Patch by Job Noorman! llvm-svn: 185364	2013-07-01 19:44:44 +00:00
Cameron Zwarich	7af5158621	Fix PR16508. When phis get lowered, destination copies are inserted using an iterator that is determined once for all phis in the block, which BuildMI interprets as a request to insert an instruction directly before the iterator. In the case of a cyclic phi, source copies may also be inserted directly before this iterator, which can cause source copies to be inserted before destination copies. The fix is to keep an iterator to the last phi and then advance it while lowering each phi in order to insert destination copies directly after the phis. llvm-svn: 185363	2013-07-01 19:42:46 +00:00
Hal Finkel	0b854b8c04	Don't form PPC CTR loops for over-sized exit counts Although you can't generate this from C on PPC64, if you have a loop using a 64-bit counter on PPC32 then you can't form a CTR-based loop for it. This had been causing the PPCCTRLoops pass to assert. Thanks to Joerg Sonnenberger for providing a test case! llvm-svn: 185361	2013-07-01 19:34:59 +00:00
Tim Northover	c1348880dc	AArch64: correct CodeGen of MOVZ/MOVK combinations. According to the AArch64 ELF specification (4.6.8), it's the assembler's responsibility to make sure the shift amount is correct in relocated MOVZ/MOVK instructions. This wasn't being obeyed by either the MCJIT CodeGen or RuntimeDyldELF (which happened to work out well for JIT tests). This commit should make us compliant in this area. llvm-svn: 185360	2013-07-01 19:23:10 +00:00
Tim Northover	51fd747de9	Revert r185339 (ARM: relax the atomic release barrier to "dmb ishst") Turns out I'd misread the architecture reference manual and thought that was a load/store-store barrier, when it's not. Thanks for pointing it out Eli! llvm-svn: 185356	2013-07-01 18:37:33 +00:00
Tim Northover	25286e5b71	ARM: relax the atomic release barrier to "dmb ishst" I believe the full "dmb ish" barrier is not required to guarantee release semantics for atomic operations. The weaker "dmb ishst" prevents previous operations being reordered with a store executed afterwards, which is enough. A key point to note (fortunately already correct) is that this barrier alone is insufficient for sequential consistency, no matter how liberally placed. llvm-svn: 185339	2013-07-01 14:48:48 +00:00
Justin Holewinski	d88c5d6e19	[NVPTX] Add support for module-scope inline asm Since we were explicitly not calling AsmPrinter::doInitialization, any module-scope inline asm was not being printed. llvm-svn: 185336	2013-07-01 13:00:14 +00:00
Justin Holewinski	89a1f98197	[NVPTX] 64-bit ADDC/ADDE are not legal llvm-svn: 185333	2013-07-01 12:59:04 +00:00
Justin Holewinski	6284a5cea6	[NVPTX] Fix vector loads from parameters that span multiple loads, and fix some typos llvm-svn: 185332	2013-07-01 12:59:01 +00:00
Justin Holewinski	77ba2f5ed9	[NVPTX] Handle signext/zeroext attributes properly Fix a case where we were incorrectly sign-extending a value when we should have been zero-extending the value. Also change some SIGN_EXTEND to ANY_EXTEND because we really don't care and may have more opportunity to fold subexpressions. llvm-svn: 185331	2013-07-01 12:58:58 +00:00
Justin Holewinski	46fe052e1f	[NVPTX] Add support for native SIGN_EXTEND_INREG where available llvm-svn: 185330	2013-07-01 12:58:56 +00:00
Justin Holewinski	46d3cad4d8	[NVPTX] Add isel patterns for [reg+offset] form of ldg/ldu. llvm-svn: 185329	2013-07-01 12:58:52 +00:00
Justin Holewinski	c254f0f839	[NVPTX] Make sure we zero out high-order 24 bits for 8-bit load into 32-bit value llvm-svn: 185328	2013-07-01 12:58:48 +00:00
Vincent Lejeune	4cef82fa31	R600: Support schedule and packetization of trans-only inst llvm-svn: 185268	2013-06-29 19:32:43 +00:00
Hal Finkel	055ca2ecc9	PPC: Ignore spill/restore requests for VRSAVE (except on Darwin) This fixes PR16418, which reports that a function calling __builtin_unwind_init() asserts. The cause is that this generates a spill/restore for VRSAVE, and we support that only on Darwin (because VRSAVE is only really used on Darwin). The test case checks only that we don't crash. We can add correctness checks once someone verifies what behavior the function is supposed to have. llvm-svn: 185235	2013-06-28 22:29:56 +00:00
Hal Finkel	49c072c532	Fix CodeGen/PowerPC/stack-protector.ll on OpenBSD On OpenBSD, the stack-smash protection transform uses "__guard_local" and "__stack_smash_handler" instead of "__stack_chk_guard" and "__stack_chk_fail". However, CodeGen/PowerPC/stack-protector.ll doesn't specify a target OS, so on OpenBSD it fails. Add -mtriple=ppc32-unknown-linux to make the test host-OS agnostic. While there, convert to FileCheck. Patch by Matthew Dempsky. llvm-svn: 185206	2013-06-28 20:18:14 +00:00
Hal Finkel	7f9144ae20	Fix a PPC rlwimi instruction-selection bug Under certain (evidently rare) circumstances, this code used to convert OR(a, AND(x, y)) into OR(a, x). This was incorrect. While there, I've added a comment to the code immediately above. llvm-svn: 185201	2013-06-28 20:00:07 +00:00
Lang Hames	e8b20c4f7b	Add missing case to switch statement - DAGTypeLegalizer::ExpandIntegerResult should expand ATOMIC_CMP_SWAP nodes the same way that it does for ATOMIC_SWAP. Since ATOMIC_LOADs on some targets (e.g. older ARM variants) get legalized to ATOMIC_CMP_SWAPs, the missing case had been causing i64 atomic loads to crash during isel. <rdar://problem/14074644> llvm-svn: 185186	2013-06-28 18:36:42 +00:00
Justin Holewinski	434a514175	[NVPTX] Add (1.0 / sqrt(x)) => rsqrt(x) generation when allowable by FP flags llvm-svn: 185178	2013-06-28 17:58:13 +00:00
Justin Holewinski	f17855a9dc	[NVPTX] Calling conventions fix Fix ABI handling for function returning bool -- use st.param.b32 to return the value and use ld.param.b32 in caller to load the return value. llvm-svn: 185177	2013-06-28 17:58:10 +00:00
Justin Holewinski	6feb5e8392	[NVPTX] Add support for cttz/ctlz/ctpop llvm-svn: 185176	2013-06-28 17:58:07 +00:00
Justin Holewinski	d365a376eb	[NVPTX] Clean up comparison/select/convert patterns and factor out PTX instructions from their patterns Test case is no breakage llvm-svn: 185175	2013-06-28 17:58:04 +00:00
Justin Holewinski	0f70140107	[NVPTX] Remove i8 register class. PTX support for i8 (.b8, .u8, .s8) is rather poor and we're better off just ignoring it and letting LLVM expand all i8 ops out to i16. llvm-svn: 185174	2013-06-28 17:57:59 +00:00
Justin Holewinski	9ae87e685a	[NVPTX] Add support for vectorized function return values llvm-svn: 185173	2013-06-28 17:57:55 +00:00
Justin Holewinski	7332dc0027	[NVPTX] Clean up handling of formal arguments and enable generation of vector parameter loads llvm-svn: 185172	2013-06-28 17:57:53 +00:00
Weiming Zhao	b97c1a69a2	Bug 13662: Enable GPRPair for all i64 operands of inline asm on ARM This patch assigns paired GPRs for inline asm with 64-bit data on ARM. It's enabled for both ARM and Thumb to support modifiers like %H, %Q, %R. llvm-svn: 185169	2013-06-28 17:26:02 +00:00
Tom Stellard	99f122e9be	R600: Add local memory support via LDS Reviewed-by: Vincent Lejeune<vljn at ovi.com> llvm-svn: 185162	2013-06-28 15:47:08 +00:00
Tom Stellard	97e3c49801	R600: Add support for GROUP_BARRIER instruction Reviewed-by: Vincent Lejeune<vljn at ovi.com> llvm-svn: 185161	2013-06-28 15:46:59 +00:00
Tim Northover	e13264c995	ARM: ensure fixed-point conversions have sane types We were generating intrinsics for NEON fixed-point conversions that didn't exist (e.g. float -> i16). There are two cases to consider: + iN is smaller than float. In this case we can do the conversion but need an extend or truncate as well. + iN is larger than float. In this case using the NEON conversion would be incorrect so we don't perform any combining. llvm-svn: 185158	2013-06-28 15:29:25 +00:00
Manman Ren	5bedd08922	Debug Info: clean up usage of Verify. No functionality change. It should suffice to check the type of a debug info metadata, instead of calling Verify. For cases where we know the type of a DI metadata, use assert. Also update testing cases to make them conform to the format of DI classes. llvm-svn: 185135	2013-06-28 05:43:10 +00:00
Tom Stellard	e3160dde2c	R600: Remove alu-split.ll test The purpose of this test was to check boundary conditions for the size of an ALU clause. This test is very sensitive to changes to the optimizer or scheduler, because it requires an exact number of ALU instructions in order to remain valid. It's not good to have a test this sensitive, because it is confusing to developers who implement optimizations and then 'break' the test. I'm not sure if there is a good way to test these limits using lit, but if I can come up with replacement test that isn't as sensitive I'll add it back to the tree. llvm-svn: 185084	2013-06-27 17:00:38 +00:00
Joey Gouly	42f1898415	Add a Subtarget feature 'v8fp' to the ARM backend. llvm-svn: 185073	2013-06-27 11:49:26 +00:00
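
The double-renaming bug described in the stack coloring entry above (r185703) can be reproduced with a small toy model. This is a sketch only: the function names and the list-of-strings representation of an instruction's frame-index operands are hypothetical, not LLVM's data structures. It contrasts renaming in a loop over frame indexes (which re-renames operands already rewritten) with visiting each operand once and looking up its original index.

```python
def rename_per_frame_index(operands, renames):
    """Buggy order: for each frame index, rewrite every operand that matches it.

    An operand renamed F0 -> F1 is seen again when F1 is processed,
    so it ends up as F2.
    """
    ops = list(operands)
    for old, new in renames.items():
        ops = [new if op == old else op for op in ops]
    return ops


def rename_per_operand(operands, renames):
    """Fixed order: visit each operand once, keyed by its original frame index."""
    return [renames.get(op, op) for op in operands]


# The scenario from the commit message: F0 is renamed to F1 and F1 to F2,
# and one instruction uses both F0 and F1.
renames = {"F0": "F1", "F1": "F2"}
instr_operands = ["F0", "F1"]

buggy = rename_per_frame_index(instr_operands, renames)
fixed = rename_per_operand(instr_operands, renames)

print(buggy)  # ['F2', 'F2']
print(fixed)  # ['F1', 'F2']
```

The per-operand walk gives each use exactly one rename, which is the property the commit restores for MachineOperands. The handling of shared MachineMemOperands is not modelled here.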

7865 Commits