llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00

Author	SHA1	Message	Date
Rafael Espindola	71b6b2942e	Don't pass relocation-model= to tests that don't need it. Very few things in MC itself use the option. Most of the code that that uses it could be move to CodeGen. llvm-svn: 269871	2016-05-18 00:27:17 +00:00
Renato Golin	6fa4110be7	Fix an assert in SelectionDAGBuilder when processing inline asm When processing inline asm that contains errors, make sure we can recover gracefully by creating an UNDEF SDValue for the inline asm statement before returning from SelectionDAGBuilder::visitInlineAsm. This is necessary for consumers that don't exit on the first error that is emitted (e.g. clang) and that would assert later on. Fixes PR24071. Patch by Diana Picus. llvm-svn: 269811	2016-05-17 19:52:01 +00:00
Rafael Espindola	6af348a0a8	Add a test showing how hidden stubs are handled on ppc. llvm-svn: 269766	2016-05-17 14:24:33 +00:00
Renato Golin	8761d9ca19	[llc] New diagnostic handler Without a diagnostic handler installed, llc's behaviour is to exit on the first error that it encounters. This is very different from the behaviour of clang and other front ends, which try to gather as many errors as possible before exiting. This commit adds a diagnostic handler to llc, allowing it to find and report more than one error. The old behaviour is preserved under a flag (-exit-on-error). Some of the tests fail with the new diagnostic handler, so they have to use the new flag in order to run under the previous behaviour. Some of these are known bugs, others need further investigation. Ideally, we should fix the tests and remove the flag at some point in the future. Reapplied after fixing the LLDB build that was broken due to the new DiagnosticSeverity in LLVMContext.h, and fixed an UB in the new change. Patch by Diana Picus. llvm-svn: 269655	2016-05-16 14:28:02 +00:00
Renato Golin	9a07441a63	Revert "[llc] New diagnostic handler" This reverts commit r269563. Even though now it passes all LLDB bots after a local fix, there's a new buildbot it fails with tests that we hadn't seen locally: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/15647 Adding those tests to the list to investigate. llvm-svn: 269568	2016-05-14 14:37:11 +00:00
Renato Golin	2f3e6a2fbd	[llc] New diagnostic handler Without a diagnostic handler installed, llc's behaviour is to exit on the first error that it encounters. This is very different from the behaviour of clang and other front ends, which try to gather as many errors as possible before exiting. This commit adds a diagnostic handler to llc, allowing it to find and report more than one error. The old behaviour is preserved under a flag (-exit-on-error). Some of the tests fail with the new diagnostic handler, so they have to use the new flag in order to run under the previous behaviour. Some of these are known bugs, others need further investigation. Ideally, we should fix the tests and remove the flag at some point in the future. Reapplied after fixing the LLDB build that was broken due to the new DiagnosticSeverity in LLVMContext.h. Patch by Diana Picus. llvm-svn: 269563	2016-05-14 13:15:22 +00:00
Renato Golin	b3cf714aaf	Revert "[llc] New diagnostic handler" This reverts commit r269428, as it breaks the LLDB build. We need to understand how to change LLDB in the same way as LLC before landing this again. llvm-svn: 269432	2016-05-13 16:02:44 +00:00
Renato Golin	212232b871	[llc] New diagnostic handler Without a diagnostic handler installed, llc's behaviour is to exit on the first error that it encounters. This is very different from the behaviour of clang and other front ends, which try to gather as many errors as possible before exiting. This commit adds a diagnostic handler to llc, allowing it to find and report more than one error. The old behaviour is preserved under a flag (-exit-on-error). Some of the tests fail with the new diagnostic handler, so they have to use the new flag in order to run under the previous behaviour. Some of these are known bugs, others need further investigation. Ideally, we should fix the tests and remove the flag at some point in the future. Patch by Diana Picus. llvm-svn: 269428	2016-05-13 15:37:46 +00:00
Hal Finkel	ff8397dabb	[PowerPC] Fix a DAG replacement bug in PPCTargetLowering::DAGCombineExtBoolTrunc While promoting nodes in PPCTargetLowering::DAGCombineExtBoolTrunc, it is possible for one of the nodes to be replaced by another. To make sure we do not visit the deleted nodes, and to make sure we visit the replacement nodes, use a list of HandleSDNodes to track the to-be-promoted nodes during the promotion process. The same fix has been applied to the analogous code in PPCTargetLowering::DAGCombineTruncBoolExt. Fixes PR26985. llvm-svn: 269272	2016-05-12 04:00:56 +00:00
Rafael Espindola	3560edede3	Make "@name =" mandatory for globals in .ll files. An oddity of the .ll syntax is that the "@var = " in @var = global i32 42 is optional. Writing just global i32 42 is equivalent to @0 = global i32 42 This means that there is a pretty big First set at the top level. The current implementation maintains it manually. I was trying to refactor it, but then started wondering why keep it a all. I personally find the above syntax confusing. It looks like something is missing. This patch removes the feature and simplifies the parser. llvm-svn: 269096	2016-05-10 18:22:45 +00:00
Nemanja Ivanovic	286a9532e8	[Power9] Add support for -mcpu=pwr9 in the back end This patch corresponds to review: http://reviews.llvm.org/D19683 Simply adds the bits for being able to specify -mcpu=pwr9 to the back end. llvm-svn: 268950	2016-05-09 18:54:58 +00:00
Strahinja Petrovic	e297a43f7c	[PowerPC] fix register alignment for long double type This patch fixes register alignment for long double type in soft float mode. Before this patch alignment was 8 and this patch changes it to 4. Differential Revision: http://reviews.llvm.org/D18034 llvm-svn: 268909	2016-05-09 12:27:39 +00:00
Nemanja Ivanovic	d6a1215edc	[PowerPC] Generate VSX version of splat word This patch corresponds to review: http://reviews.llvm.org/D18592 It allows the PPC back end to generate the xxspltw instruction where we previously only emitted vspltw. llvm-svn: 268516	2016-05-04 16:04:02 +00:00
Haicheng Wu	611abb7dc9	[MBP] Use Function::optForSize() instead of checking OptimizeForSize directly. Fix a FIXME. Disable loop alignment if compiled with -Oz now. llvm-svn: 268121	2016-04-29 22:01:10 +00:00
Guozhi Wei	194cccd63b	[PPC] Enable shuffling of VSX vectors This patch fixes PR27078 by enabling shuffling of vectors if VSX is available. llvm-svn: 268064	2016-04-29 17:00:54 +00:00
Marcin Koscielnicki	832d560a7e	[PowerPC] Fix the EH_SjLj_Setup pseudo. This instruction is just a control flow marker - it should not actually exist in the object file. Unfortunately, nothing catches it before it gets to AsmPrinter. If integrated assembler is used, it's considered to be a normal 4-byte instruction, and emitted as an all-0 word, crashing the program. With external assembler, a comment is emitted. Fixed by setting Size to 0 and handling it in MCCodeEmitter - this means the comment will still be emitted if integrated assembler is not used. This broke an ASan test, which has been disabled for a long time as a result (see the discussion on D19657). We can reenable it once this lands. llvm-svn: 267943	2016-04-28 21:24:37 +00:00
Chuang-Yu Cheng	d389efdf8e	[ppc64] fix bug in prologue that mfocrf's cr operand should be explict state instead of implicit This fixes PR27414 Reviewers: kbarton mgrang tjablin http://reviews.llvm.org/D19255 llvm-svn: 267660	2016-04-27 02:59:28 +00:00
Marcin Koscielnicki	704c818d77	[PowerPC] Add support for llvm.thread.pointer Differential Revision: http://reviews.llvm.org/D19304 llvm-svn: 267546	2016-04-26 10:37:22 +00:00
Marcin Koscielnicki	de3ced2d10	[PR27390] [CodeGen] Reject indexed loads in CombinerDAG. visitAND, when folding and (load) forgets to check which output of an indexed load is involved, happily folding the updated address output on the following testcase: target datalayout = "e-m:e-i64:64-n32:64" target triple = "powerpc64le-unknown-linux-gnu" %typ = type { i32, i32 } define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) { %b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1 %1 = load i32, i32* %b, align 4 %2 = ptrtoint i32* %b to i64 %3 = and i64 %2, -35184372088833 %4 = inttoptr i64 %3 to i32* %_msld = load i32, i32* %4, align 4 %zzz = add i32 %1, %_msld ret i32 %zzz } Fix this by checking ResNo. I've found a few more places that currently neglect to check for indexed load, and tightened them up as well, but I don't have test cases for them. In fact, they might not be triggerable at all, at least with current targets. Still, better safe than sorry. Differential Revision: http://reviews.llvm.org/D19202 llvm-svn: 267420	2016-04-25 15:43:44 +00:00
Marcin Koscielnicki	5a59940bb3	[PowerPC] [PR27387] Disallow r0 for ADD8TLS. ADD8TLS, a variant of add instruction used for initial-exec TLS, currently accepts r0 as a source register. While add itself supports r0 just fine, linker can relax it to a local-exec sequence, converting it to addi - which doesn't support r0. Differential Revision: http://reviews.llvm.org/D19193 llvm-svn: 267388	2016-04-25 09:24:34 +00:00
Sanjay Patel	67f5c38cab	use FileCheck; add test for disguised fabs llvm-svn: 267051	2016-04-21 20:58:58 +00:00
Marcin Koscielnicki	452981873a	[PowerPC] [SSP] Fix stack guard load for 32-bit. r266809 incorrectly used LD to load the stack guard, it should be LWZ. Differential Revision: http://reviews.llvm.org/D19358 llvm-svn: 267017	2016-04-21 17:36:05 +00:00
Mandeep Singh Grang	28ddad394b	[LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC. Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests. Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier Subscribers: mcrosier, dsanders Differential Revision: http://reviews.llvm.org/D19279 llvm-svn: 266834	2016-04-19 23:51:52 +00:00
Tim Shen	a12f27cb73	[PPC, SSP] Support PowerPC Linux stack protection. llvm-svn: 266809	2016-04-19 20:14:52 +00:00
Strahinja Petrovic	19968b2801	[PowerPC] add comment to test Added comment in test for soft-float operations on ppc architecture. Test commit. llvm-svn: 266600	2016-04-18 11:52:14 +00:00
Adrian Prantl	fb3abba237	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram. Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446	2016-04-15 15:57:41 +00:00
Nirav Dave	a36a5aeca3	Fix typing on generated LXV2DX/STXV2DX instructions [PPC] Previously when casting generic loads to LXV2DX/ST instructions we would leave the original load return type in place allowing for an assertion failure when we merge two equivalent LXV2DX nodes with different types. This fixes PR27350. Reviewers: nemanjai Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19133 llvm-svn: 266438	2016-04-15 15:01:38 +00:00
Sanjay Patel	4fd6cc9cca	[ppc] add tests to show potential andc optimization llvm-svn: 266261	2016-04-13 23:23:30 +00:00
Chuang-Yu Cheng	19e3905d51	[PPC64] Mark CR0 Live if PPCInstrInfo::optimizeCompareInstr Creates a Use of CR0 Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046). The PPCInstrInfo::optimizeCompareInstr function could create a new use of CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of CR0 is created. Author: Tom Jablin (tjablin) Reviewers: hfinkel kbarton cycheng http://reviews.llvm.org/D18884 llvm-svn: 266040	2016-04-12 03:10:52 +00:00
Chuang-Yu Cheng	83dddda4a1	[PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field In the ELFv2 ABI, we are not required to save all CR fields. If only one nonvolatile CR field is clobbered, use mfocrf instead of mfcr to selectively save the field, because mfocrf has short latency compares to mfcr. Thanks Nemanja's invaluable hint! Reviewers: nemanjai tjablin hfinkel kbarton http://reviews.llvm.org/D17749 llvm-svn: 266038	2016-04-12 03:04:44 +00:00
Nirav Dave	7b6e012d2f	Fix Load Control Dependence in MemCpy Generation In Memcpy lowering we had missed a dependence from the load of the operation to successor operations. This causes us to potentially construct an in initial DAG with a memory dependence not fully represented in the chain sub-DAG but rather require looking at the entire DAG breaking alias analysis by allowing incorrect repositioning of memory operations. To work around this, r200033 changed DAGCombiner::GatherAllAliases to be conservative if any possible issues to happen. Unfortunately this check forbade many non-problematic situations as well. For example, it's common for incoming argument lowering to add a non-aliasing load hanging off of EntryNode. Then, if GatherAllAliases visited EntryNode, it would find that other (unvisited) use of the EntryNode chain, and just give up entirely. Furthermore, the check was incomplete: it would not actually detect all such potentially problematic DAG constructions, because GatherAllAliases did not guarantee to visit all chain nodes going up to the root EntryNode. This is in general fine -- giving up early will just miss a potential optimization, not generate incorrect results. But, for this non-chain dependency detection code, it's possible that you could have a load attached to a higher-up chain node than any which were visited. If that load aliases your store, but the only dependency is through the value operand of a non-aliasing store, it would've been missed by this code, and potentially reordered. With the dependence added, this check can be removed and Alias Analysis can be much more aggressive. This fixes code quality regression in the Consecutive Store Merge cleanup (D14834). Test Change: ppc64-align-long-double.ll now may see multiple serializations of its stores Differential Revision: http://reviews.llvm.org/D18062 llvm-svn: 265836	2016-04-08 19:44:40 +00:00
Chuang-Yu Cheng	6e4b4f696f	CXX_FAST_TLS calling convention: performance improvement for PPC64 This is the same change on PPC64 as r255821 on AArch64. I have even borrowed his commit message. The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by register allcoator. Register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given machine function and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. Author: Tom Jablin (tjablin) Reviewers: hfinkel kbarton cycheng http://reviews.llvm.org/D17533 llvm-svn: 265781	2016-04-08 12:04:32 +00:00
Amaury Sechet	fe775804df	Do not select EhPad BB in MachineBlockPlacement when there is regular BB to schedule Summary: EHPad BB are not entered the classic way and therefor do not need to be placed after their predecessors. This patch make sure EHPad BB are not chosen amongst successors to form chains, and are selected as last resort when selecting the best candidate. EHPad are scheduled in reverse probability order in order to have them flow into each others naturally. Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17625 llvm-svn: 265726	2016-04-07 21:29:39 +00:00
Ehsan Amiri	36d2ad6539	[PPC] Enable transformations in PPCPassConfig::addIRPasses at O2 http://reviews.llvm.org/D18562 A large number of testcases has been modified so they pass after this test. One testcase is deleted, because I realized even after undoing the original change that was committed with this testcase, the testcase still passes. So I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll) is an arbitrary change to keep it passing. Given the original intention of the testcase, and the fact that fixing it will require some time to change the testcase, we concluded that this quick change will be enough. llvm-svn: 265683	2016-04-07 15:30:55 +00:00
Ehsan Amiri	ee69c81e47	[PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP http://reviews.llvm.org/D18405 When the integer value loaded is never used directly as integer we should use VSX or Floating Point Facility integer loads and avoid extra direct move llvm-svn: 265593	2016-04-06 20:12:29 +00:00
Chuang-Yu Cheng	a5a77bb95c	[ppc64] Temporary disable sibling call optimization on ppc64 due to breaking test case r265506 breaks print-stack-trace.cc test case of compiler-rt in bootstrap test. http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/1708 llvm-svn: 265528	2016-04-06 10:48:36 +00:00
Chuang-Yu Cheng	de5217e247	[ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi This patch enable sibling call optimization on ppc64 ELFv1/ELFv2 abi, and add a couple of test cases. This patch also passed llvm/clang bootstrap test, and spec2006 build/run/result validation. Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617 Great thanks to Tom's (tjablin) help, he contributed a lot to this patch. Thanks Hal and Kit's invaluable opinions! Reviewers: hfinkel kbarton http://reviews.llvm.org/D16315 llvm-svn: 265506	2016-04-06 02:04:38 +00:00
Matthias Braun	14367ed4c8	test: Always treat .mir files as tests even outside of CodeGen/MIR We missed a handful of .mir tests that existed outside the test/CodeGen/MIR directory. Also fix the three powerpc .mir tests that nobody noticed were broken. llvm-svn: 265350	2016-04-04 21:23:44 +00:00
Chuang-Yu Cheng	283d5efd78	Fix Sub-register Rewriting in Aggressive Anti-Dependence Breaker Previously, HandleLastUse would delete RegRef information for sub-registers if they were dead even if their corresponding super-register were still live. If the super-register were later renamed, then the definitions of the sub-register would not be updated appropriately. This patch alters the behavior so that RegInfo information for sub-registers is only deleted when the sub-register and super-register are both dead. This resolves PR26775. This is the mirror image of Hal's r227311 commit. Author: Tom Jablin (tjablin) Reviewers: kbarton uweigand nemanjai hfinkel http://reviews.llvm.org/D18448 llvm-svn: 265097	2016-04-01 02:05:29 +00:00
Adrian Prantl	1653aa9f14	testcase gardening: update the emissionKind enum to the new syntax. (NFC) llvm-svn: 265081	2016-04-01 00:16:49 +00:00
Adrian Prantl	3e537cddb9	Move the DebugEmissionKind enum from DIBuilder into DICompileUnit. This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder into DICompileUnit. DIBuilder is not the right place for this enum to live in — a metadata consumer should not have to include DIBuilder.h. I also added a Verifier check that checks that the emission kind of a DICompileUnit is actually legal. http://reviews.llvm.org/D18612 <rdar://problem/25427165> llvm-svn: 265077	2016-03-31 23:56:58 +00:00
Tim Shen	bf4b3be718	Move asm-printer-topological-order.ll to PowerPC backend llvm-svn: 265067	2016-03-31 22:32:10 +00:00
Hal Finkel	15180ab71e	[PowerPC] Cleanup test/CodeGen/PowerPC/qpx-load-splat.ll Removing unnecessary attributes and metadata... llvm-svn: 265049	2016-03-31 20:45:00 +00:00
Hal Finkel	f7a5afdf0a	[PowerPC] Add a late MI-level pass for QPX load/splat simplification Chapter 3 of the QPX manual states that, "Scalar floating-point load instructions, defined in the Power ISA, cause a replication of the source data across all elements of the target register." Thus, if we have a load followed by a QPX splat (from the first lane), the splat is redundant. This adds a late MI-level pass to remove the redundant splats in some of these cases (specifically when both occur in the same basic block). This optimization is scheduled just prior to post-RA scheduling. It can't happen before anything that might replace the load with some already-computed quantity (i.e. store-to-load forwarding). llvm-svn: 265047	2016-03-31 20:39:41 +00:00
Ulrich Weigand	359f7aec9c	[PowerPC] Attempt to fix fast-isel-i64offset.ll failure The test case added in r265023 is failing on ninja-x64-msvc-RA-centos6. Update the test to make less specific assumptions on code generation. llvm-svn: 265026	2016-03-31 16:38:57 +00:00
Ulrich Weigand	20f5073bab	[PowerPC] Correctly compute 64-bit offsets in fast isel PPCSimplifyAddress contains this code: IntegerType OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(Context) : Type::getInt64Ty(Context)); to determine the type to be used for an index register, if one needs to be created. However, the "VT" here is the type of the data being loaded or stored, not* the type of an address. This means that if a data element of type i32 is accessed using an index that does not not fit into 32 bits, a wrong address is computed here. Note that PPCFastISel is only ever used on 64-bit currently, so the type of an address is actually always MVT::i64. Other parts of the code, even in this same PPCSimplifyAddress routine, already rely on that fact. Thus, this patch changes the code to simply unconditionally use Type::getInt64Ty(*Context) as OffsetTy. llvm-svn: 265023	2016-03-31 15:37:06 +00:00
Ulrich Weigand	ef758401da	[PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel The fast isel pass currently emits a COPY_TO_REGCLASS node to convert from a F4RC to a F8RC register class during conversion of a floating-point number to integer. There is actually no support in the common code instruction printers to emit COPY_TO_REGCLASS nodes, so the PowerPC back-end has special code there to simply ignore COPY_TO_REGCLASS. This is correct if and only if the source and destination registers of COPY_TO_REGCLASS are the same (except for the different register class). But nothing guarantees this to be the case, and if the register allocator does end up allocating source and destination to different registers after all, the back-end simply generates incorrect code. I've included a test case that shows such incorrect code generation. However, it seems that COPY_TO_REGCLASS is actually not intended to be used at the MI layer at all. It is used during SelectionDAG, but always lowered to a plain COPY before emitting MI. Other back-end's fast isel passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong for the PowerPC back-end to emit it here. This patch changes the PowerPC back-end to directly emit COPY instead of COPY_TO_REGCLASS and removes the special handling in the instruction printers. Differential Revision: http://reviews.llvm.org/D18605 llvm-svn: 265020	2016-03-31 14:44:50 +00:00
Hal Finkel	fb163aaea6	[PowerPC] Load two floats directly instead of using one 64-bit integer load When dealing with complex<float>, and similar structures with two single-precision floating-point numbers, especially when such things are being passed around by value, we'll sometimes end up loading both float values by extracting them from one 64-bit integer load. It looks like this: t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64 t16: i64 = srl t13, Constant:i32<32> t17: i32 = truncate t16 t18: f32 = bitcast t17 t19: i32 = truncate t13 t20: f32 = bitcast t19 The problem, especially before the P8 where those bitcasts aren't legal (and get expanded via the stack), is that it would have been better to use two floating-point loads directly. Here we add a target-specific DAGCombine to do just that. In short, we turn: ld 3, 0(5) stw 3, -8(1) rldicl 3, 3, 32, 32 stw 3, -4(1) lfs 3, -4(1) lfs 0, -8(1) into: lfs 3, 4(5) lfs 0, 0(5) llvm-svn: 264988	2016-03-31 02:56:05 +00:00
Hal Finkel	82fdd00490	[PowerPC] Refactor popcnt[dw] target features Instead of using two feature bits, one to indicate the availability of the popcnt[dw] instructions, and another to indicate whether or not they're fast, use a single enum. This allows more consistent control via target attribute strings, and via Clang's command line. llvm-svn: 264690	2016-03-29 01:36:01 +00:00
Kyle Butt	7a8a4c3cae	[Codegen] Decrease minimum jump table density. Minimum density for both optsize and non optsize are now options -sparse-jump-table-density (default 10) for non optsize functions -dense-jump-table-density (default 40) for optsize functions, which matches the current default. This improves several benchmarks at google at the cost of a small codesize increase. For code compiled with -Os, the old behavior continues llvm-svn: 264689	2016-03-29 00:23:41 +00:00
Hal Finkel	9df9f25590	[PowerPC] On the A2, popcnt[dw] are very slow The A2 cores support the popcntw/popcntd instructions, but they're microcoded, and slower than our default software emulation. Specifically, popcnt[dw] take approximately 74 cycles, whereas our software emulation takes only 24-28 cycles. I've added a new target feature to indicate a slow popcnt[dw], instead of just removing the existing target feature from the a2/a2q processor models, because: 1. This allows us to return more accurate information via the TTI interface (I recognize that this currently makes no practical difference) 2. Is hopefully easier to understand (it allows the core's features to match its manual while still having the desired effect). llvm-svn: 264600	2016-03-28 17:52:08 +00:00
Hal Finkel	10aaab9a43	[PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based loop legality Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc function calls (fmax[f], etc.) generally map to function calls after lowering. For some vector types with QPX at least, however, we can legally lower these, and we don't need to prohibit CTR-based loops on their account. It turned out, however, that the logic that checked the opcodes associated with intrinsics was broken (it would set the Opcode variable, but that variable was later checked only if set for some otherwise-external function call. This fixes the latter problem and adds the FMAX/MINNUM mappings. llvm-svn: 264532	2016-03-27 05:40:56 +00:00
David Majnemer	e94081b48d	[PowerPC] Disable the CTR optimization in the presence of {min,max}num The minnum and maxnum intrinsics get lowered to libcalls which invalidates the CTR optimization. This fixes PR27083. llvm-svn: 264508	2016-03-26 09:42:31 +00:00
Eric Christopher	6ddb84b162	Finish the incomplete 'd' inline asm constraint support for PPC by making sure we give it a register and mark it as a register constraint. llvm-svn: 264340	2016-03-24 21:04:52 +00:00
Eric Christopher	0b5937f7d0	Reorder check lines, comments in test and remove unnecessary IR. llvm-svn: 264339	2016-03-24 21:04:47 +00:00
Nemanja Ivanovic	7d010d19df	[PowerPC] Disable direct moves for extractelement and bitcast in 32-bit mode This patch corresponds to review: http://reviews.llvm.org/D17711 It disables direct moves on these operations in 32-bit mode since the patterns assume 64-bit registers. The final patch is slightly different from the Phabricator review as the bitcast operations needed to be disabled in 32-bit mode as well. This fixes PR26617. llvm-svn: 264282	2016-03-24 13:40:33 +00:00
Kyle Butt	0c8ea793c0	Codegen: [PPC] Word Rotates are Zero Extending. Add Word rotates to the list of instructions that are zero extending. This allows them to be used in dot form to compare with zero. llvm-svn: 264183	2016-03-23 19:51:22 +00:00
Tim Shen	695f1e65cb	[PPC, FastISel] Fix ordered/unordered fcmp For fcmp, major concern about the following 6 cases is NaN result. The comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered), only one of which will be set. The result is generated by fcmpu instruction. However, bc instruction only inspects one of the first 3 bits, so when un is set, bc instruction may jump to to an undesired place. More specifically, if we expect an unordered comparison and un is set, we expect to always go to true branch; in such case UEQ, UGT and ULT still give false, which are undesired; but UNE, UGE, ULE happen to give true, since they are tested by inspecting !eq, !lt, !gt, respectively. Similarly, for ordered comparison, when un is set, we always expect the result to be false. In such case OGT, OLT and OEQ is good, since they are actually testing GT, LT, and EQ respectively, which are false. OGE, OLE and ONE are tested through !lt, !gt and !eq, and these are true. llvm-svn: 263753	2016-03-17 22:27:58 +00:00
Petar Jovanovic	f95962007a	[PowerPC] Disable CTR loops optimization for soft float operations This patch prevents CTR loops optimization when using soft float operations inside loop body. Soft float operations use function calls, but function calls are not allowed inside CTR optimized loops. Patch by Aleksandar Beserminji. Differential Revision: http://reviews.llvm.org/D17600 llvm-svn: 263727	2016-03-17 17:11:33 +00:00
Nemanja Ivanovic	eaa9a4f81a	Fix for PR 26378 This patch corresponds to review: http://reviews.llvm.org/D17712 We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This caused duplicate definition asserts when the pass is reused on the module (i.e. with -compile-twice or in JIT contexts). llvm-svn: 263338	2016-03-12 10:23:07 +00:00
Kit Barton	0f693412b6	[PPC] backend changes to generate xvabs[s,d]p and xvnabs[s,d]p instructions This has to be committed before the FE changes Phabricator: http://reviews.llvm.org/D17837 llvm-svn: 263035	2016-03-09 17:48:01 +00:00
Tim Shen	464ed3d905	[PPCVSXFMAMutate] Temporarily disable this pass llvm-svn: 262573	2016-03-03 01:27:35 +00:00
Nemanja Ivanovic	f9bc1afbcf	Fix for PR26180 Corresponds to Phabricator review: http://reviews.llvm.org/D16592 This fix includes both an update to how we handle the "generic" CPU on LE systems as well as Anton's fix for the Fast Isel issue. llvm-svn: 262233	2016-02-29 16:42:27 +00:00
Kit Barton	f070e5e85c	[PPC] Legalize FNEG on PPC when possible Currently we always expand ISD::FNEG. For v4f32 and v2f64 vector types VSX has native support for this opcode Phabricator: http://reviews.llvm.org/D17647 llvm-svn: 262079	2016-02-26 21:59:44 +00:00
Paul Robinson	119632fa81	Fix tests that used CHECK-NEXT-NOT and CHECK-DAG-NOT. FileCheck actually doesn't support combo suffixes. Differential Revision: http://reviews.llvm.org/D17588 llvm-svn: 262054	2016-02-26 19:40:34 +00:00
Nemanja Ivanovic	662ab414aa	Fix for PR26690 take 2 This is what was meant to be in the initial commit to fix this bug. The parens were missing. This commit also adds a test case for the bug and has undergone full testing on PPC and X86. llvm-svn: 261546	2016-02-22 18:04:00 +00:00
Justin Lebar	975bf7a977	When printing MIR, output to errs() rather than outs(). Summary: Without this, this command $ llvm-run llc -stop-after machine-cp -o - <( echo '' ) outputs an error, because we close stdout twice -- once when closing the file opened for "-o", and again when closing outs(). Also clarify in the outs() definition that you can't ever call it if you want to open your own raw_fd_ostream on stdout. Reviewers: jroelofs, tstellarAMD Subscribers: jholewinski, qcolombet, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D17422 llvm-svn: 261286	2016-02-19 00:18:46 +00:00
Nemanja Ivanovic	a54ae64cf1	Add the missing test case for PR26193 llvm-svn: 259888	2016-02-05 15:03:17 +00:00
Nemanja Ivanovic	b7bc445a9f	Fix for PR 26356 Using the load immediate only when the immediate (whether signed or unsigned) can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and 0 to 65535 for unsigned. This patch also ensures that we sign-extend under the right conditions. llvm-svn: 259840	2016-02-04 23:14:42 +00:00
Nemanja Ivanovic	8a43819d85	Provide a test case for rl259798 llvm-svn: 259835	2016-02-04 22:36:10 +00:00
Renato Golin	f1ae93e13e	[PPC] Move PPC test to a PPC-specific dir llvm-svn: 259797	2016-02-04 16:14:59 +00:00
Nemanja Ivanovic	5a78d13728	Test case for PR 26381 llvm-svn: 259740	2016-02-04 01:58:20 +00:00
Tim Shen	c6dc619045	[SelectionDAG] Fix CombineToPreIndexedLoadStore O(n^2) behavior This patch consists of two parts: a performance fix in DAGCombiner.cpp and a correctness fix in SelectionDAG.cpp. The test case tests the bug that's uncovered by the performance fix, and fixed by the correctness fix. The performance fix keeps the containers required by the hasPredecessorHelper (which is a lazy DFS) and reuse them. Since hasPredecessorHelper is called in a loop, the overall efficiency reduced from O(n^2) to O(n), where n is the number of SDNodes. The correctness fix keeps iterating the neighbor list even if it's time to early return. It will return after finishing adding all neighbors to Worklist, so that no neighbors are discarded due to the original early return. llvm-svn: 259691	2016-02-03 20:58:55 +00:00
Jonas Paulsson	99afcac2ad	[ScheduleDAGInstrs::buildSchedGraph()] Handling of memory dependecies rewritten. Recommited, after some fixing with test cases. Updated test cases: test/CodeGen/AArch64/arm64-misched-memdep-bug.ll test/CodeGen/AArch64/tailcall_misched_graph.ll Temporarily disabled test cases: test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated) test/CodeGen/PowerPC/vsx-fma-m.ll test/CodeGen/PowerPC/vsx-fma-sp.ll http://reviews.llvm.org/D8705 Reviewers: Hal Finkel, Andy Trick. llvm-svn: 259673	2016-02-03 17:52:29 +00:00
Kyle Butt	77f7a2b7a8	Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates. The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms on PPC. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7 ;v6 = v6 + v5 * v7 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96 ;v5 = v5 * v7 + v96 This was broken in the case where the target register was also used as a multiplicand. Fix this case by checking for it and replacing both uses with the copied register. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6 ;v6 = v6 + v5 * v6 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96 ;v5 = v5 * v96 + v96 llvm-svn: 259617	2016-02-03 01:41:09 +00:00
Eric Christopher	14a36607b8	Since LI/LIS sign extend the constant passed into the instruction we should check that the sign extended constant fits into 16-bits if we want a zero extended value, otherwise go ahead and put it together piecemeal. Fixes PR26356. llvm-svn: 259177	2016-01-29 07:20:01 +00:00
David Majnemer	444856c58e	Address buildbot fallout from r259065 llvm-svn: 259074	2016-01-28 18:59:04 +00:00
Dan Gohman	a72e83c26e	[MC] Use .p2align instead of .align For historic reasons, the behavior of .align differs between targets. Fortunately, there are alternatives, .p2align and .balign, which make the interpretation of the parameter explicit, and which behave consistently across targets. This patch teaches MC to use .p2align instead of .align, so that people reading code for multiple architectures don't have to remember which way each platform does its .align directive. Differential Revision: http://reviews.llvm.org/D16549 llvm-svn: 258750	2016-01-26 00:03:25 +00:00
Ulrich Weigand	257810adf2	[PowerPC] Fix large code model with the ELFv2 ABI The global entry point prologue currently assumes that the TOC associated with a function is less than 2GB away from the function entry point. This is always true when using the medium or small code model, but may not be the case when using the large code model. This patch adds a new variant of the ELFv2 global entry point prologue that lifts the 2GB restriction when building with -mcmodel=large. This works by emitting a quadword containing the distance from the function entry point to its associated TOC immediately before the entry point, and then using a prologue like: ld r2,-8(r12) add r2,r2,r12 Since creation of the entry point prologue is now split across two separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog code), I've switched to using named labels instead of just temporaries to indicate the locations of the global and local entry points and the new TOC offset data word. These names are provided by new routines in PPCFunctionInfo modeled after the existing PPCFunctionInfo::getPICOffsetSymbol. Note that a corresponding change was committed to GCC here: https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html Reviewers: hfinkel Differential Revision: http://reviews.llvm.org/D15500 llvm-svn: 257597	2016-01-13 13:12:23 +00:00
Kyle Butt	f1fe33f755	Codegen: [PPC] Handle weighted comparisons when inserting selects. Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle the weighted predicates as well. This latent bug was triggered by r255398, because it added use of the branch-weighted predicates. While here, switch over an enum instead of an int to get the compiler to enforce totality in the future. llvm-svn: 257518	2016-01-12 21:00:43 +00:00
Rafael Espindola	a70cf2fac3	Remove a bugs assert. There is no reason the value being printed has to be positive. Fixes pr25802. llvm-svn: 257412	2016-01-11 23:21:45 +00:00
Kyle Butt	70e01b07f9	Add call sequence start and end for __tls_get_addr This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839. For a PIC TLS variable access in a function, prologue (mflr followed by std and stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but no one saves/restores it. Also added a test for save/restore clobbered registers during calling __tls_get_addr. Patch by Tim Shen llvm-svn: 257137	2016-01-08 02:06:19 +00:00
Nemanja Ivanovic	3c08538d6e	Bitcasts between FP and INT values using direct moves This patch corresponds to review: http://reviews.llvm.org/D15286 This patch was meant to land in revision 255246, but I accidentally uploaded the patch that corresponds to http://reviews.llvm.org/D15372 in that revision accidentally. Thereby, this patch is the actual Bitcasts using direct moves patch, whereas http://reviews.llvm.org/rL255246 actually corresponds to http://reviews.llvm.org/D15372. llvm-svn: 255649	2015-12-15 14:50:34 +00:00
Petar Jovanovic	6b5f6d05bd	[Power PC] llvm soft float support for ppc32 This is the second in a set of patches for soft float support for ppc32, it enables soft float operations. Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D13700 llvm-svn: 255516	2015-12-14 17:57:33 +00:00
Hal Finkel	6f341281a9	Fix test/CodeGen/PowerPC/ppc-shrink-wrapping.ll after r255398 llvm-svn: 255414	2015-12-12 00:42:05 +00:00
Hal Finkel	28f5cfb976	[PowerPC] Add Branch Hints for Highly-Biased Branches This branch adds hints for highly biased branches on the PPC architecture. Even in absence of profiling information, LLVM will mark code reaching unreachable terminators and other exceptional control flow constructs as highly unlikely to be reached. Patch by Tom Jablin! llvm-svn: 255398	2015-12-12 00:32:00 +00:00
Kyle Butt	d374fb8cd3	[PPC]: Peephole optimize small accesss to aligned globals. Access to aligned globals gives us a chance to peephole optimize nonzero offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't overflow the available displacement. For example: addis 3, 2, b4v@toc@ha addi 4, 3, b4v@toc@l lbz 5, b4v@toc@l(3) ; This is the result of the current peephole lbz 6, 1(4) ; optimizer lbz 7, 2(4) lbz 8, 3(4) If b4v is 4-byte aligned, we can skip using register 4 because we know that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate: addis 3, 2, b4v@toc@ha lbz 4, b4v@toc@l(3) lbz 5, b4v@toc@l+1(3) lbz 6, b4v@toc@l+2(3) lbz 7, b4v@toc@l+3(3) Saving a register and an addition. Larger alignments allow larger structures/arrays to be optimized. llvm-svn: 255319	2015-12-11 00:47:36 +00:00
Eric Christopher	971f116a1c	Fix (bitcast (fabs x)), (bitcast (fneg x)) and (bitcast (fcopysign cst, x)) combines for ppc_fp128, since signbit computation is more complicated. Discussion thread: http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html Patch by Tim Shen! llvm-svn: 255305	2015-12-10 22:09:06 +00:00
Kyle Butt	706d39a054	PPC: Teach FMA mutate to respect register classes. This was causing bad code gen and assembly that won't assemble, as mixed altivec and vsx code would end up with a vsx high register assigned to an altivec instruction, which won't work. Constraining the classes allows the optimization to proceed. llvm-svn: 255299	2015-12-10 21:28:40 +00:00
Nemanja Ivanovic	1c3c6f9380	Bitcasts between FP and INT values using direct moves This patch corresponds to review: http://reviews.llvm.org/D15286 LLVM IR frequently contains bitcast operations between floating point and integer values of the same width. Doing this through memory operations is quite expensive on PPC. This patch allows the use of direct register moves between FPRs and GPRs for lowering bitcasts. llvm-svn: 255246	2015-12-10 13:35:28 +00:00
Kit Barton	d8708a5236	[PPC64] Convert bool literals to i32 Convert i1 values to i32 values if they should be allocated in GPRs instead of CRs. Phabricator: http://reviews.llvm.org/D14064 llvm-svn: 254942	2015-12-07 20:50:29 +00:00
Kyle Butt	33b91653ae	Tests: PPC: remove unnecessary metadata. NFC Remove unnecessary metadata from a test case. llvm-svn: 254544	2015-12-02 21:08:03 +00:00
Kyle Butt	4f2d0356a1	[CodeGen]: Fix bad interaction with AntiDep breaking and inline asm. AggressiveAntiDepBreaker was renaming registers specified by the user for inline assembly. While this will work for compiler-specified registers, it won't work for user-specified registers, and at the time this runs, I don't currently see a way to distinguish them. llvm-svn: 254532	2015-12-02 18:58:51 +00:00
Nemanja Ivanovic	1662faa6c9	Patch to fix a crash in the PowerPC back end due to ISD::ROTL and ISD::ROTR not being expanded. Test case included. llvm-svn: 254501	2015-12-02 10:36:24 +00:00
Yury Gribov	8eadab0fdb	Introduce new @llvm.get.dynamic.area.offset.i{32, 64} intrinsics. The @llvm.get.dynamic.area.offset.* intrinsic family is used to get the offset from native stack pointer to the address of the most recent dynamic alloca on the caller's stack. These intrinsics are intendend for use in combination with @llvm.stacksave and @llvm.restore to get a pointer to the most recent dynamic alloca. This is useful, for example, for AddressSanitizer's stack unpoisoning routines. Patch by Max Ostapenko. Differential Revision: http://reviews.llvm.org/D14983 llvm-svn: 254404	2015-12-01 11:40:55 +00:00
Kit Barton	faa2a02a53	Enable shrink wrapping for PPC64 Re-enable shrink wrapping for PPC64 Little Endian. One minor modification to PPCFrameLowering::findScratchRegister was necessary to handle fall-thru blocks (blocks with no terminator) correctly. Tested with all LLVM test, clang tests, and the self-hosting build, with no problems found. PHabricator: http://reviews.llvm.org/D14778 llvm-svn: 254314	2015-11-30 18:59:41 +00:00
Hal Finkel	f66d2e051d	[PowerPC] Don't generate mfocrf on the e500mc The e500mc does not actually support the mfocrf instruction; update the processor definitions to reflect that fact. Patch by Tom Rix (with some test-case cleanup by me). llvm-svn: 254064	2015-11-25 10:14:31 +00:00
Cong Hou	5747eb82f8	Let SelectionDAG start to use probability-based interface to add successors. The patch in http://reviews.llvm.org/D13745 is broken into four parts: 1. New interfaces without functional changes. 2. Use new interfaces in SelectionDAG, while in other passes treat probabilities as weights. 3. Use new interfaces in all other passes. 4. Remove old interfaces. This the second patch above. In this patch SelectionDAG starts to use probability-based interfaces in MBB to add successors but other MC passes are still using weight-based interfaces. Therefore, we need to maintain correct weight list in MBB even when probability-based interfaces are used. This is done by updating weight list in probability-based interfaces by treating the numerator of probabilities as weights. This change affects many test cases that check successor weight values. I will update those test cases once this patch looks good to you. Differential revision: http://reviews.llvm.org/D14361 llvm-svn: 253965	2015-11-24 08:51:23 +00:00
Eric Christopher	6ad23a3203	Weak non-function symbols were being accessed directly, which is incorrect, as the chosen representative of the weak symbol may not live with the code in question. Always indirect the access through the TOC instead. Patch by Kyle Butt! llvm-svn: 253708	2015-11-20 20:51:31 +00:00
Pete Cooper	b753649d63	Revert "Change memcpy/memset/memmove to have dest and source alignments." This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543	2015-11-19 05:56:52 +00:00
Pete Cooper	aca4c5cdc6	Change memcpy/memset/memmove to have dest and source alignments. Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.llvm\.memset.)i32\ [0-9]\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, / isVolatile / false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, / isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511	2015-11-18 22:17:24 +00:00
Kit Barton	1bc0b4ec67	Find available scratch register to use in function prologue and epilogue as part of shrink wrapping. Phabricator: http://reviews.llvm.org/D13955 llvm-svn: 253247	2015-11-16 20:22:15 +00:00
James Molloy	c197f50184	[SDAG] Introduce a new BITREVERSE node along with a corresponding LLVM intrinsic Several backends have instructions to reverse the order of bits in an integer. Conceptually matching such patterns is similar to @llvm.bswap, and it was mentioned in http://reviews.llvm.org/D14234 that it would be best if these patterns were matched in InstCombine instead of reimplemented in every different target. This patch introduces an intrinsic @llvm.bitreverse.i* that operates similarly to @llvm.bswap. For plumbing purposes there is also a new ISD node ISD::BITREVERSE, with simple expansion and promotion support. The intention is that InstCombine's BSWAP detection logic will be extended to support BITREVERSE too, and @llvm.bitreverse intrinsics emitted (if the backend supports lowering it efficiently). llvm-svn: 252878	2015-11-12 12:29:09 +00:00
Bill Schmidt	4bb7c62dcb	[PowerPC] Add an MI SSA peephole pass. This patch adds a pass for doing PowerPC peephole optimizations at the MI level while the code is still in SSA form. This allows for easy modifications to the instructions while depending on a subsequent pass of DCE. Both passes are very fast due to the characteristics of SSA. At this time, the only peepholes added are for cleaning up various redundancies involving the XXPERMDI instruction. However, I would expect this will be a useful place to add more peepholes for inefficiencies generated during instruction selection. The pass is placed after VSX swap optimization, as it is best to let that pass remove unnecessary swaps before performing any remaining clean-ups. The utility of these clean-ups are demonstrated by changes to four existing test cases, all of which now have tighter expected code generation. I've also added Eric Schweiz's bugpoint-reduced test from PR25157, for which we now generate tight code. One other test started failing for me, and I've fixed it (test/Transforms/PlaceSafepoints/finite-loops.ll) as well; this is not related to my changes, and I'm not sure why it works before and not after. The problem is that the CHECK-NOT: of "statepoint" from test1 fails because of the "statepoint" in test2, and so forth. Adding a CHECK-LABEL in between keeps the different occurrences of that string properly scoped. llvm-svn: 252651	2015-11-10 21:38:26 +00:00
Hal Finkel	62d1b738df	[PowerPC] Fix LoopPreIncPrep not to depend on SCEV constant simplifications Under most circumstances, if SCEV can simplify X-Y to a constant, then it can also simplify Y-X to a constant. However, there is no guarantee that this is always true, and concensus is not to consider that a correctness bug in SCEV (although it is undesirable). PPCLoopPreIncPrep gathers pointers used to access memory (via loads, stores and prefetches) into buckets, where in each bucket the relative pointer offsets are constant. We used to keep each bucket as a multimap, where SCEV's subtraction operation was used to define the ordering predicate. Instead, use a fixed SCEV base expression for each bucket, record the constant offsets from that base expression, and adjust it later, if desirable, once all pointers have been collected. Doing it this way should be more compile-time efficient than the previous scheme (in addition to making the implementation less sensitive to SCEV simplification quirks). Fixes PR25170. llvm-svn: 252417	2015-11-08 08:04:40 +00:00
Peter Collingbourne	5b721561aa	DI: Reverse direction of subprogram -> function edge. Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO. This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field. Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR. Differential Revision: http://reviews.llvm.org/D14265 llvm-svn: 252219	2015-11-05 22:03:56 +00:00
Nemanja Ivanovic	6e3343082c	Fix for bootstrap bug introduced in r244921 This revision has introduced an issue that only affects bootstrapped compiler when it is printing the ASM. It turns out that the new code path taken due to legalizing a scalar_to_vector of i64 -> v2i64 exposes a missing check in a micro optimization to change a load followed by a scalar_to_vector into a load and splat instruction on PPC. llvm-svn: 251798	2015-11-02 14:01:11 +00:00
Hal Finkel	8938d74adf	[PowerPC] Recurse through constants when looking for TLS globals We cannot form ctr-based loops around function calls, including calls to __tls_get_addr used for PIC TLS variables. References to such TLS variables, however, might be buried within constant expressions, and so we need to search the entire constant expression to be sure that no references to such TLS variables exist. Fixes PR25256, reported by Eric Schweitz. This is a slightly-modified version of the patch suggested by Eric in the bug report, and a test case I created. llvm-svn: 251582	2015-10-28 23:43:00 +00:00
Hal Finkel	8b8e6bf88a	[PowerPC] Don't return unsupported register classes for asm constraints As a follow-up to r251566, do the same for the other optionally-supported register classes (mostly for vector registers). Don't return an unavailable register class (which would cause an assert later), but fail cleanly when provided an unsupported inline asm constraint. llvm-svn: 251575	2015-10-28 23:03:45 +00:00
Hal Finkel	ef38d45534	[PowerPC] Cleanly reject asm crbit constraint with -crbits When crbits are disabled, cleanly reject the constraint (return the register class only to cause an assert later). llvm-svn: 251566	2015-10-28 22:25:52 +00:00
Hal Finkel	50aaa71507	[PowerPC] Fix CodeGen/PowerPC/crbit-asm.ll test for -O1 Add the crbits processor feature so that the test can be run at -O1, etc. regardless of the default crbits setting. Fixes PR23778. llvm-svn: 251548	2015-10-28 19:58:02 +00:00
Hal Finkel	91f52b0e31	[PowerPC] Replace cntlz[.] with cntlzw[.] cntlz is the old POWER mnemonic. cntlzw is the PowerPC mnemonic. This change fixes an issue when -no-integrated-as: The opcode cntlz is unrecognized by gas Alias the POWER mnemonic cntlz[.] to the PowerPC mnemonic cntlzw[.] This is done for because the POWER cntlz mnemonic has be used by LLVM for a very long time. We need to make sure that assembly programs that are using the cntlz[.] do not break with this change. Change PowerPC tests to reflect the insn change from cntlz to cntlzw. Add assembly test to verify cntlz[.] is encoded correctly. Patch by Tom Rix! llvm-svn: 251489	2015-10-28 03:26:45 +00:00
Sanjoy Das	bc8769d943	[SelectionDAG] Don't inspect !range metadata for extended loads Summary: Don't call `computeKnownBitsFromRangeMetadata` for extended loads -- this can cause a mismatch between the width of the !range metadata and the width of the APInt's accumulating `KnownZero` (and `KnownOne` in the future). This isn't a problem now, but will be after a future change. Note: this can be made more aggressive in the future. Reviewers: nlewycky Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14107 llvm-svn: 251486	2015-10-28 03:20:10 +00:00
Akira Hatanaka	7a13b29b79	[MachO] Stop generating coal sections. Recommit r250342: move coal-sections-powerpc.s to subdirectory for powerpc. Some background on why we don't have to use coal sections anymore: Long ago when C++ was new and "weak" had not been standardized, an attempt was made in cctools to support C++ inlines that can be coalesced by putting them into their own section (TEXT/textcoal_nt instead of TEXT/text). The current macho linker supports the weak-def bit on any symbol to allow it to be coalesced, but the compiler still puts weak-def functions/data into alternate section names, which the linker must map back to the base section name. This patch makes changes that are necessary to prevent the compiler from using the "coal" sections and have it use the non-coal sections instead when the target architecture is not powerpc: TEXT/textcoal_nt instead use TEXT/text TEXT/const_coal instead use TEXT/const DATA/datacoal_nt instead use DATA/data If the target is powerpc, we continue to use the coal sections since anyone targeting powerpc is probably using an old linker that doesn't have support for the weak-def bits. Also, have the assembler issue a warning if it encounters a coal section in the assembly file and inform the users to use the non-coal sections instead. rdar://problem/14265330 Differential Revision: http://reviews.llvm.org/D13188 llvm-svn: 250370	2015-10-15 05:28:38 +00:00
Akira Hatanaka	a93f930f7c	Revert r250349. Test case coal-sections-powerpc.s is still failing on some buildbots. llvm-svn: 250351	2015-10-15 00:11:03 +00:00
Akira Hatanaka	f53ec10f0a	[MachO] Stop generating coal sections. Recommit r250342: add -arch=ppc32 to the RUN lines of powerpc tests. Some background on why we don't have to use coal sections anymore: Long ago when C++ was new and "weak" had not been standardized, an attempt was made in cctools to support C++ inlines that can be coalesced by putting them into their own section (TEXT/textcoal_nt instead of TEXT/text). The current macho linker supports the weak-def bit on any symbol to allow it to be coalesced, but the compiler still puts weak-def functions/data into alternate section names, which the linker must map back to the base section name. This patch makes changes that are necessary to prevent the compiler from using the "coal" sections and have it use the non-coal sections instead when the target architecture is not powerpc: TEXT/textcoal_nt instead use TEXT/text TEXT/const_coal instead use TEXT/const DATA/datacoal_nt instead use DATA/data If the target is powerpc, we continue to use the coal sections since anyone targeting powerpc is probably using an old linker that doesn't have support for the weak-def bits. Also, have the assembler issue a warning if it encounters a coal section in the assembly file and inform the users to use the non-coal sections instead. rdar://problem/14265330 Differential Revision: http://reviews.llvm.org/D13188 llvm-svn: 250349	2015-10-14 23:48:10 +00:00
Akira Hatanaka	16b2844360	Revert r250342. Investigate why coal-sections-powerpc.s is failing on some buildbots. llvm-svn: 250346	2015-10-14 23:29:10 +00:00
Akira Hatanaka	3114aeb050	[MachO] Stop generating coal sections. Some background on why we don't have to use coal sections anymore: Long ago when C++ was new and "weak" had not been standardized, an attempt was made in cctools to support C++ inlines that can be coalesced by putting them into their own section (TEXT/textcoal_nt instead of TEXT/text). The current macho linker supports the weak-def bit on any symbol to allow it to be coalesced, but the compiler still puts weak-def functions/data into alternate section names, which the linker must map back to the base section name. This patch makes changes that are necessary to prevent the compiler from using the "coal" sections and have it use the non-coal sections instead when the target architecture is not powerpc: TEXT/textcoal_nt instead use TEXT/text TEXT/const_coal instead use TEXT/const DATA/datacoal_nt instead use DATA/data If the target is powerpc, we continue to use the coal sections since anyone targeting powerpc is probably using an old linker that doesn't have support for the weak-def bits. Also, have the assembler issue a warning if it encounters a coal section in the assembly file and inform the users to use the non-coal sections instead. rdar://problem/14265330 Differential Revision: http://reviews.llvm.org/D13188 llvm-svn: 250342	2015-10-14 22:45:36 +00:00
Bill Schmidt	345a95f427	[PowerPC] Fix invalid lxvdsx optimization (PR25157) PR25157 identifies a bug where a load plus a vector shuffle is incorrectly converted into an LXVDSX instruction. That optimization is only valid if the load is of a doubleword, and in the noted case, it was not. This corrects that problem. Joint patch with Eric Schweitz, who provided the bugpoint-reduced test case. llvm-svn: 250324	2015-10-14 20:45:00 +00:00
Nemanja Ivanovic	a5dac0b623	Vector element extraction without stack operations on Power 8 This patch corresponds to review: http://reviews.llvm.org/D12032 This patch builds onto the patch that provided scalar to vector conversions without stack operations (D11471). Included in this patch: - Vector element extraction for all vector types with constant element number - Vector element extraction for v16i8 and v8i16 with variable element number - Removal of some unnecessary COPY_TO_REGCLASS operations that ended up unnecessarily moving things around between registers Not included in this patch (will be in upcoming patch): - Vector element extraction for v4i32, v4f32, v2i64 and v2f64 with variable element number - Vector element insertion for variable/constant element number Testing is provided for all extractions. The extractions that are not implemented yet are just placeholders. llvm-svn: 249822	2015-10-09 11:12:18 +00:00
Hal Finkel	64fd7537a1	[PowerPC] Disable shrink wrapping Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for now until the problem can be fixed. llvm-svn: 248924	2015-09-30 17:29:03 +00:00
Hal Finkel	fd31079026	[DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking When AA is being used, non-aliasing stores are canonicalized to use the same chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of this by looking only as users of a store's chain operand. However, user iteration is not result-number specific, we need to check that the use is as a chain operand, and not via some other operand. It is certainly possible to have another potentially-aliasing store, which shares the first's base pointer, and uses the first's chain's node via some other operand. Failure to catch this situation caused, at least in the included test case, an assert later because the relative sequence-number ordering caused later replacement to create a cycle in the DAG. llvm-svn: 248698	2015-09-28 08:02:14 +00:00
Matthias Braun	913f3edce2	PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint The algorithm would not modify the live-in list of blocks below the save block point which is correct unless it happens to be a restore point at the same time. Also fixes the benign issue of live-in registers being added twice in some cases. The testcase is based on a test submitted by Kit Barton. Differential Revision: http://reviews.llvm.org/D13176 llvm-svn: 248620	2015-09-25 21:41:40 +00:00
Matt Arsenault	70e6ce5a40	DAGCombiner: Replace store of FP constant after attemping store merges If storing multiple FP constants, some subset of the stores would be replaced with integers due to visit order, so MergeConsecutiveStores would only partially merge these. llvm-svn: 248169	2015-09-21 15:59:46 +00:00
Matthias Braun	8bd54b9028	SelectionDAG: Introduce PersistentID to SDNode for assert builds. This gives us more human readable numbers to identify nodes in debug dumps. Before: 0x7fcbd9700160: ch = EntryToken 0x7fcbd985c7c8: i64 = Register %RAX ... 0x7fcbd9700160: <multiple use> 0x7fcbd985c578: i64,ch = MOV64rm 0x7fcbd985c6a0, 0x7fcbd985cc68, 0x7fcbd985c200, 0x7fcbd985cd90, 0x7fcbd985ceb8, 0x7fcbd9700160<Mem:LD8[@foo]> [ORD=2] 0x7fcbd985c8f0: ch,glue = CopyToReg 0x7fcbd9700160, 0x7fcbd985c7c8, 0x7fcbd985c578 [ORD=3] 0x7fcbd985c7c8: <multiple use> 0x7fcbd985c8f0: <multiple use> 0x7fcbd985c8f0: <multiple use> 0x7fcbd985ca18: ch = RETQ 0x7fcbd985c7c8, 0x7fcbd985c8f0, 0x7fcbd985c8f0:1 [ORD=3] Now: t0: ch = EntryToken t5: i64 = Register %RAX ... t0: <multiple use> t3: i64,ch = MOV64rm t10, t12, t11, t13, t14, t0<Mem:LD8[@foo]> [ORD=2] t6: ch,glue = CopyToReg t0, t5, t3 [ORD=3] t5: <multiple use> t6: <multiple use> t6: <multiple use> t7: ch = RETQ t5, t6, t6:1 [ORD=3] Differential Revision: http://reviews.llvm.org/D12564 llvm-svn: 248010	2015-09-18 17:41:00 +00:00
Quentin Colombet	57a713d5e0	[ShrinkWrap] Refactor the handling of infinite loop in the analysis. - Strenghten the logic to be sure we hoist the restore point out of the current loop. (The fixes a bug with infinite loop, added as part of the patch.) - Walk over the exit blocks of the current loop to conver to the desired restore point in one iteration of the update loop. llvm-svn: 247958	2015-09-17 23:21:34 +00:00
Mehdi Amini	fe32b980b3	Make the default triple optional by allowing an empty string When building LLVM as a (potentially dynamic) library that can be linked against by multiple compilers, the default triple is not really meaningful. We allow to explicitely set it to an empty string when configuring LLVM. In this case, said "target independent" tests in the test suite that are using the default triple are disabled by matching the newly available feature "default_triple". Reviewers: probinson, echristo Differential Revision: http://reviews.llvm.org/D12660 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247775	2015-09-16 05:34:32 +00:00
David Blaikie	65b92c4f37	[opaque pointer type] Add textual IR support for explicit type parameter for global aliases update.py: import fileinput import sys import re alias_match_prefix = r"(.(?:=\|:\|^)\s(?:external \|)(?:(?:private\|internal\|linkonce\|linkonce_odr\|weak\|weak_odr\|common\|appending\|extern_weak\|available_externally) )?(?:default \|hidden \|protected )?(?:dllimport \|dllexport )?(?:unnamed_addr \|)(?:thread_local(?:$[a-z]$)? )?alias" plain = re.compile(alias_match_prefix + r" (.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|addrspacecast\|\[\[[a-zA-Z]\|\{\{).$)") cast = re.compile(alias_match_prefix + r") ((?:bitcast\|inttoptr\|addrspacecast)\s$. to (.?)(\| addrspace\(\d+$ )\\)\s(?:;.)?$)") gep = re.compile(alias_match_prefix + r") ((?:getelementptr)\s(?:inbounds)?\s$(?P<type>.), (?P=type)(?:\saddrspace\(\d+$\s)?\* .\)\s(?:;.)?$)") def conv(line): m = re.match(cast, line) if m: return m.group(1) + " " + m.group(3) + ", " + m.group(2) m = re.match(gep, line) if m: return m.group(1) + " " + m.group(3) + ", " + m.group(2) m = re.match(plain, line) if m: return m.group(1) + ", " + m.group(2) + m.group(3) + "" + m.group(4) + "\n" return line for line in sys.stdin: sys.stdout.write(conv(line)) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name .ll \| xargs ./apply.sh llvm-svn: 247378	2015-09-11 03:22:04 +00:00
Kit Barton	8bde6cb866	Enable the shrink wrapping optimization for PPC64. The changes in this patch are as follows: 1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function 2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function 3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run: Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64. Phabricator review: http://reviews.llvm.org/D11817 llvm-svn: 247237	2015-09-10 01:55:44 +00:00
Eric Christopher	a3c9004b9c	Fix the PPC CTR Loop pass to look for calls to the intrinsics that read CTR and count them as reading the CTR. llvm-svn: 247083	2015-09-08 22:14:58 +00:00
Hal Finkel	2e0ee83cd9	[PowerPC] Don't commute trivial rlwimi instructions To commute a trivial rlwimi instructions (meaning one with a full mask and zero shift), we'd need to ability to form an all-zero mask (instead of an all-one mask) using rlwimi. We can't represent this, however, and we'll miscompile code if we try. The code quality problem that this highlights (that SDAG simplification can lead to us generating an ISD::OR node with a constant zero LHS) will be fixed as a follow-up. Fixes PR24719. llvm-svn: 246937	2015-09-06 04:17:30 +00:00
Hal Finkel	2d02d8085a	[PowerPC] Fix and(or(x, c1), c2) -> rlwimi generation PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from an input pattern that looks like this: and(or(x, c1), c2) but the associated logic does not work if there are bits that are 1 in c1 but 0 in c2 (these are normally canonicalized away, but that can't happen if the 'or' has other users. Make sure we abort the transformation if such bits are discovered. Fixes PR24704. llvm-svn: 246900	2015-09-05 00:02:59 +00:00
Hal Finkel	978e2d8bf0	[PowerPC] Try harder to find a base+offset when looking for consecutive accesses When forming permutation-based unaligned vector loads, we need to know whether it is valid to read ahead of the requested address by a full vector length. Doing so is more efficient (and allows for more CSE with later loads), but could trigger a page fault if invalid. To determine validity, we look for other loads in the same block that access the relevant address range. The relevant point here is that we need to do this as part of the process of forming permutation-based vector loads, and this happens quite early in the SDAG pipeline - specifically before many of the address calculations are fully canonicalized. As a result, we need to try harder to recognize base+offset address computations, because they still might appear as chain of adds (base+offset+offset, for example). To account for this, we'll look through chains of adds, accumulating the constant offsets. llvm-svn: 246813	2015-09-03 22:37:44 +00:00
Hal Finkel	e53c76127a	[PowerPC] Compute the MMO offset for an unaligned load with signed arithmetic If you compute the MMO offset using unsigned arithmetic, you end up with a large positive offset instead of a small negative one. In theory, this could cause bad instruction-scheduling decisions later. I noticed this by inspection from the debug output, and using that for the regression test is the best I can do right now. llvm-svn: 246805	2015-09-03 21:12:15 +00:00
Hal Finkel	4a4ac3736c	[PowerPC] Cleanup cost model for unaligned vector loads/stores I'm adding a regression test to better cover code generation for unaligned vector loads and stores, but there's no functional change to the code generation here. There is an improvement to the cost model for unaligned vector loads and stores, mostly for QPX (for which we were not previously accounting for the permutation-based loads), and the cost model implementation is cleaner. llvm-svn: 246712	2015-09-02 21:03:28 +00:00
Hal Finkel	daef1ea7bc	[PowerPC] Don't always consider P8Altivec-only masks in LowerVECTOR_SHUFFLE LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the TableGen-generated matching code, and it does this by testing the same predicates used by the TableGen files. Unfortunately, when we added new P8Altivec-only predicates, we started universally testing them in LowerVECTOR_SHUFFLE, and if then matched when targeting a system prior to a P8, we'd end up with a selection failure. llvm-svn: 246675	2015-09-02 16:52:37 +00:00
Hal Finkel	2e700a48c1	[DAGCombine] Fixup SETCC legality checking SETCC is one of those special node types for which operation actions (legality, etc.) is keyed off of an operand type, not the node's value type. This makes sense because the value type of a legal SETCC node is determined by its operands' value type (via the TLI function getSetCCResultType). When the SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value type, or directly with the value type provided by TLI.getSetCCResultType. The first problem being fixed here is that DAGCombine had several places querying TLI.isOperationLegal on SETCC, but providing the return of getSetCCResultType, instead of the operand type directly. This does not mean what the author thought, and "luckily", most in-tree targets have SETCC with Custom lowering, instead of marking them Legal, so these checks return false anyway. The second problem being fixed here is that two of the DAGCombines could create SETCC nodes with arbitrary (integer) value types; specifically, those that would simplify: (setcc a, b, op1) and\|or (setcc a, b, op2) -> setcc a, b, op3 (which is possible for some combinations of (op1, op2)) If the operands of the and\|or node are actual setcc nodes, then this is not an issue (because the and\|or must share the same type), but, the relevant code in DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls DAGCombiner::isSetCCEquivalent on each operand, and that function will recognise setcc-like select_cc nodes with other return types. And, thus, when creating new SETCC nodes, we need to be careful to respect the value-type constraint. This is even true before type legalization, because it is quite possible for the SELECT_CC node to have a legal type that does not happen to match the corresponding TLI.getSetCCResultType type. To be explicit, there is nothing that later fixes the value types of SETCC nodes (if the type is legal, but does not happen to match TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to work only because, either MVT::i1 is not legal, or it is what TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change, however. For the time being, restrict the relevant transformations to produce only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1 prior to type legalization). Fixes PR24636. llvm-svn: 246507	2015-08-31 23:15:04 +00:00
Hal Finkel	60e52a69ad	[AggressiveAntiDepBreaker] Check for EarlyClobber on defining instruction AggressiveAntiDepBreaker was doing some EarlyClobber checking, but was not checking that the register being potentially renamed was defined by an early-clobber def where there was also a use, in that instruction, of the register being considered as the target of the rename. Fixes PR24014. llvm-svn: 246423	2015-08-31 07:51:36 +00:00
Hal Finkel	8f9c81dcf5	[PowerPC] Fixup SELECT_CC (and SETCC) patterns with i1 comparison operands There were really two problems here. The first was that we had the truth tables for signed i1 comparisons backward. I imagine these are not very common, but if you have: setcc i1 x, y, LT this has the '0 1' and the '1 0' results flipped compared to: setcc i1 x, y, ULT because, in the signed case, '1 0' is really '-1 0', and the answer is not the same as in the unsigned case. The second problem was that we did not have patterns (at all) for the unsigned comparisons select_cc nodes for i1 comparison operands. This was the specific cause of PR24552. These had to be added (and a missing Altivec promotion added as well) to make sure these function for all types. I've added a bunch more test cases for these patterns, and there are a few FIXMEs in the test case regarding code-quality. Fixes PR24552. llvm-svn: 246400	2015-08-30 22:12:50 +00:00
Hal Finkel	1e41909d2b	[PowerPC/MIR Serialization] Target flags serialization support Add support for MIR serialization of PowerPC-specific operand target flags (based on the generic infrastructure added in r244185 and r245383). I won't even pretend that this is good test coverage, but this includes the regression test associated with r246372. Adding an MIR test for that fix is far superior to adding an IR-level test because particular instruction-scheduling decisions are necessary in order to expose the bug, and using an MIR test we can start the pipeline post-scheduling. llvm-svn: 246373	2015-08-30 07:50:35 +00:00
Duncan P. N. Exon Smith	0c1aee0b16	DI: Require subprogram definitions to be distinct As a follow-up to r246098, require `DISubprogram` definitions (`isDefinition: true`) to be 'distinct'. Specifically, add an assembler check, a verifier check, and bitcode upgrading logic to combat testcase bitrot after the `DIBuilder` change. While working on the testcases, I realized that test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its purpose was to check for a corner case in PR22792 where two subprogram definitions match exactly and share the same metadata node. The new verifier check, requiring that subprogram definitions are 'distinct', precludes that possibility. I updated almost all the IR with the following script: git grep -l -E -e '= !DISubprogram$.* isDefinition: true' \| grep -v test/Bitcode \| xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true$/= distinct \1/' Likely some variant of would work for out-of-tree testcases. llvm-svn: 246327	2015-08-28 20:26:49 +00:00
Matt Arsenault	407c9d3378	Make MergeConsecutiveStores look at other stores on same chain When combiner AA is enabled, look at stores on the same chain. Non-aliasing stores are moved to the same chain so the existing code fails because it expects to find an adajcent store on a consecutive chain. Because of how DAGCombiner tries these store combines, MergeConsecutiveStores doesn't see the correct set of stores on the chain when it visits the other stores. Each store individually has its chain fixed before trying to merge consecutive stores, and then tries to merge stores from that point before the other stores have been processed to have their chains fixed. To fix this, attempt to use FindBetterChain on any possibly neighboring stores in visitSTORE. Suppose you have 4 32-bit stores that should be merged into 1 vector store. One store would be visited first, fixing the chain. What happens is because not all of the store chains have yet been fixed, 2 of the stores are merged. The other 2 stores later have their chains fixed, but because the other stores were already merged, they have different memory types and merging the two different sized stores is not supported and would be more difficult to handle. llvm-svn: 246307	2015-08-28 17:31:28 +00:00
Hal Finkel	d1f0f5ba25	[PowerPC] PPCVSXFMAMutate should ignore trivial-copy addends We might end up with a trivial copy as the addend, and if so, we should ignore the corresponding FMA instruction. The trivial copy can be coalesced away later, so there's nothing to do here. We should not, however, assert. Fixes PR24544. llvm-svn: 245907	2015-08-24 23:48:28 +00:00
Bill Schmidt	a2ec47b475	[PPC64LE] Fix PR24546 - Swap optimization and debug values This patch fixes PR24546, which demonstrates a segfault during the VSX swap removal pass. The problem is that debug value instructions were not excluded from the list of instructions to be analyzed for webs of related computation. I've added the test case from the PR as a crash test in test/CodeGen/PowerPC. llvm-svn: 245862	2015-08-24 19:27:27 +00:00
Hal Finkel	8f05a818d7	[PowerPC] PPCVSXFMAMutate should not segfault on undef input registers When PPCVSXFMAMutate would look at the input addend register, it would get its input value number. This would fail, however, if the register was undef, causing a segfault. Don't segfault (just skip such FMA instructions). Fixes the test case from PR24542 (although that may have been over-reduced). llvm-svn: 245741	2015-08-21 21:34:24 +00:00
Hal Finkel	b8e2581942	[PowerPC] Fix value type on XVCMPEQDP for v2f64 comparisons XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs to be v2i64 (as that's the corresponding SETCC type). Fixes PR24225. llvm-svn: 245535	2015-08-20 03:02:02 +00:00
Hal Finkel	c4b6cee81f	[PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128 This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128 operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)), but shouldn't (it should only apply to f32/f64 types). The result was a crash. llvm-svn: 245530	2015-08-20 01:18:20 +00:00
Nemanja Ivanovic	9f042b959b	Temporary fix for the self-host failures introduced by rL244921. This revision has introduced an issue that only affects bootstrapped compiler when it is printing the ASM. I am working on resolving the issue, but in the meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64 and the associated testing until I can get this fixed. llvm-svn: 245481	2015-08-19 19:04:47 +00:00
Nemanja Ivanovic	285f278c18	Scalar to vector conversions using direct moves This patch corresponds to review: http://reviews.llvm.org/D11471 It improves the code generated for converting a scalar to a vector value. With direct moves from GPRs to VSRs, we no longer require expensive stack operations for this. Subsequent patches will handle the reverse case and more general operations between vectors and their scalar elements. llvm-svn: 244921	2015-08-13 17:40:44 +00:00
Alex Lorenz	77ac257c14	StackMap: FastISel: Add an appropriate number of immediate operands to the frame setup instruction. This commit ensures that the stack map lowering code in FastISel adds an appropriate number of immediate operands to the frame setup instruction. The previous code added just one immediate operand, which was fine for a target like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit operands. This caused the machine verifier to report an error when the old code added just one. Reviewers: Juergen Ributzka Differential Revision: http://reviews.llvm.org/D11853 llvm-svn: 244508	2015-08-10 21:27:03 +00:00
Jonathan Roelofs	3ac128b6b7	Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI I looked into adding a warning / error for this to FileCheck, but there doesn't seem to be a good way to avoid it triggering on the instances of it in RUN lines. llvm-svn: 244481	2015-08-10 19:01:27 +00:00
Hal Finkel	58e66c2716	[MachineCombiner] Don't use the opcode-only form of computeInstrLatency In r242277, I updated the MachineCombiner to work with itineraries, but I missed a call that is scheduling-model-only (the opcode-only form of computeInstrLatency). Using the form that takes an MI* allows this to work with itineraries (and should be NFC for subtargets with scheduling models). llvm-svn: 244020	2015-08-05 07:45:28 +00:00
Duncan P. N. Exon Smith	87c77233df	DI: Disallow uniquable DICompileUnits Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s. The backend is liable to start relying on that (if it hasn't already), so make uniquable `DICompileUnit`s illegal and automatically upgrade old bitcode. This is a nice cleanup, since we can remove an unnecessary `DenseSet` (and the associated uniquing info) from `LLVMContextImpl`. Almost all the testcases were updated with this script: git grep -e '= !DICompileUnit' -l -- test \| grep -v test/Bitcode \| xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,' I imagine something similar should work for out-of-tree testcases. llvm-svn: 243885	2015-08-03 17:26:41 +00:00
Duncan P. N. Exon Smith	08a36a35c8	DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags, using `DW_TAG_variable` in their place Stop exposing the `tag:` field at all in the assembly format for `DILocalVariable`. Most of the testcase updates were generated by the following sed script: find test/ -name ".ll" -o -name ".mir" \| xargs grep -l 'DILocalVariable' \| xargs sed -i '' \ -e 's/tag: DW_TAG_arg_variable, //' \ -e 's/tag: DW_TAG_auto_variable, //' There were only a handful of tests in `test/Assembly` that I needed to update by hand. (Note: a follow-up could change `DILocalVariable::DILocalVariable()` to set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable` (as appropriate), instead of having that logic magically in the backend in `DbgVariable`. I've added a FIXME to that effect.) llvm-svn: 243774	2015-07-31 18:58:39 +00:00
Bill Schmidt	e342da2fca	[PPC] Fix PR24216: Don't generate splat for misaligned shuffle mask Given certain shuffle-vector masks, LLVM emits splat instructions which splat the wrong bytes from the source register. The issue is that the function PPC::isSplatShuffleMask() in PPCISelLowering.cpp does not ensure that the splat pattern found is requesting bytes that are aligned on an EltSize boundary. This patch detects this situation as not a valid splat mask, resulting in a permute being generated instead of a splat. Patch and test case by Tyler Kenney, cleaned up a bit by me. This is a simple bug fix that would be good to incorporate into 3.7. llvm-svn: 243519	2015-07-29 14:31:57 +00:00
Chih-Hung Hsieh	c40b4e8361	Fix typo. llvm-svn: 243475	2015-07-28 20:38:29 +00:00
Chih-Hung Hsieh	be61f44997	Limit this test only on linux. Differential Revision: http://reviews.llvm.org/D10522 llvm-svn: 243474	2015-07-28 20:31:10 +00:00
Chih-Hung Hsieh	9951404987	Move unit tests to target specific directories. Differential Revision: http://reviews.llvm.org/D10522 llvm-svn: 243454	2015-07-28 17:32:49 +00:00
Eric Christopher	e77ef518e8	Fix PPCMaterializeInt to check the size of the integer based on the extension property we're requesting - zero or sign extended. This fixes cases where we want to return a zero extended 32-bit -1 and not be sign extended for the entire register. Also updated the already out of date comment with the current behavior. llvm-svn: 243192	2015-07-25 00:48:08 +00:00
Eric Christopher	dd0ba8682f	Clean up function attributes on PPC fast-isel tests. llvm-svn: 243079	2015-07-24 01:07:50 +00:00
Bill Schmidt	76e220a5ef	[PPC64LE] More vector swap optimization TLC This makes one substantive change and a few stylistic changes to the VSX swap optimization pass. The substantive change is to permit LXSDX and LXSSPX instructions to participate in swap optimization computations. The previous change to insert a swap following a SUBREG_TO_REG widening operation makes this almost trivial. I experimented with also permitting STXSDX and STXSSPX instructions. This can be done using similar techniques: we could insert a swap prior to a narrowing COPY operation, and then permit these stores to participate. I prototyped this, but discovered that the pattern of a narrowing COPY followed by an STXSDX does not occur in any of our test-suite code. So instead, I added commentary indicating that this could be done. Other TLC: - I changed SH_COPYSCALAR to SH_COPYWIDEN to more clearly indicate the direction of the copy. - I factored the insertion of swap instructions into a separate function. Finally, I added a new test case to check that the scalar-to-vector loads are working properly with swap optimization. llvm-svn: 242838	2015-07-21 21:40:17 +00:00
Bill Schmidt	56989d09e1	Add missing test for r242296 (vec_sld) llvm-svn: 242680	2015-07-20 15:43:21 +00:00
Bill Schmidt	af026989c7	[PowerPC] v4i32 is a VSRCRegClass I was looking at some vector code generation and kept seeing unnecessary vector copies into the Altivec half of the VSX registers. I discovered that we overlooked v4i32 when adding the register classes for VSX; we only added v4f32 and v2f64. This means that anything that canonicalizes into v4i32 (which is a LOT of stuff) ends up being forced into VRRC on its way to VSRC. The fix is one line. The rest of the patch is fixing up some test cases whose code generation has changed as a result. This seems like it would be a good candidate for backport to 3.7. llvm-svn: 242442	2015-07-16 21:14:07 +00:00
Hal Finkel	db4f85d64e	[PowerPC] Use the MachineCombiner to reassociate fadd/fmul This is a direct port of the code from the X86 backend (r239486/r240361), which uses the MachineCombiner to reassociate (floating-point) adds/muls to increase ILP, to the PowerPC backend. The rationale is the same. There is a lot of copy-and-paste here between the X86 code and the PowerPC code, and we should extract at least some of this into CodeGen somewhere. However, I don't want to do that until this code is enhanced to handle FMAs as well. After that, we'll be in a better position to extract the common parts. llvm-svn: 242279	2015-07-15 08:23:05 +00:00
Hal Finkel	f7a4ebfecb	[PowerPC] Support symbolic targets in patchpoints Follow-up r235483, with the corresponding support in PPC. We use a regular call for symbolic targets (because they're much cheaper than indirect calls). llvm-svn: 242239	2015-07-14 22:53:11 +00:00
Hal Finkel	2a88231c40	[PowerPC] Use the ABI indirect-call protocol for patchpoints We used to take the address specified as the direct target of the patchpoint and did no TOC-pointer handling. This, however, as not all that useful, because MCJIT tends to create a lot of modules, and they have their own TOC sections. Thus, to call from the generated code to other generated code, you really need to switch TOC pointers. Make this work as expected, and under ELFv1, tread the address as the function descriptor address so that the correct TOC pointer can be loaded. llvm-svn: 242217	2015-07-14 22:26:06 +00:00
Hal Finkel	7783df1bb4	[PowerPC] Fix the PPCInstrInfo::getInstrLatency implementation PowerPC uses itineraries to describe processor pipelines (and dispatch-group restrictions for P7/P8 cores). Unfortunately, the target-independent implementation of TII.getInstrLatency calls ItinData->getStageLatency, and that looks for the largest cycle count in the pipeline for any given instruction. This, however, yields the wrong answer for the PPC itineraries, because we don't encode the full pipeline. Because the functional units are fully pipelined, we only model the initial stages (there are no relevant hazards in the later stages to model), and so the technique employed by getStageLatency does not really work. Instead, we should take the maximum output operand latency, and that's what PPCInstrInfo::getInstrLatency now does. This caused some test-case churn, including two unfortunate side effects. First, the new arrangement of copies we get from function parameters now sometimes blocks VSX FMA mutation (a FIXME has been added to the code and the test cases), and we have one significant test-suite regression: SingleSource/Benchmarks/BenchmarkGame/spectral-norm 56.4185% +/- 18.9398% In this benchmark we have a loop with a vectorized FP divide, and it with the new scheduling both divides end up in the same dispatch group (which in this case seems to cause a problem, although why is not exactly clear). The grouping structure is hard to predict from the bottom of the loop, and there may not be much we can do to fix this. Very few other test-suite performance effects were really significant, but almost all weakly favor this change. However, in light of the issues highlighted above, I've left the old behavior available via a command-line flag. llvm-svn: 242188	2015-07-14 20:02:02 +00:00
Nemanja Ivanovic	a397908a1f	Add missing builtins to the PPC back end for ABI compliance (vol. 4) This patch corresponds to review: http://reviews.llvm.org/D11183 Back end portion of the fourth round of additions to altivec.h. llvm-svn: 242167	2015-07-14 17:25:20 +00:00
Bill Schmidt	381b5e4a38	[PPC64LE] More improvements to VSX swap optimization This patch allows VSX swap optimization to succeed more frequently. Specifically, it is concerned with common code sequences that occur when copying a scalar floating-point value to a vector register. This patch currently handles cases where the floating-point value is already in a register, but does not yet handle loads (such as via an LXSDX scalar floating-point VSX load). That will be dealt with later. A typical case is when a scalar value comes in as a floating-point parameter. The value is copied into a virtual VSFRC register, and then a sequence of SUBREG_TO_REG and/or COPY operations will convert it to a full vector register of the class required by the context. If this vector register is then used as part of a lane-permuted computation, the original scalar value will be in the wrong lane. We can fix this by adding a swap operation following any widening SUBREG_TO_REG operation. Additional COPY operations may be needed around the swap operation in order to keep register assignment happy, but these are pro forma operations that will be removed by coalescing. If a scalar value is otherwise directly referenced in a computation (such as by one of the many XS* vector-scalar operations), we currently disable swap optimization. These operations are lane-sensitive by definition. A MentionsPartialVR flag is added for use in each swap table entry that mentions a scalar floating-point register without having special handling defined. A common idiom for PPC64LE is to convert a double-precision scalar to a vector by performing a splat operation. This ensures that the value can be referenced as V[0], as it would be for big endian, whereas just converting the scalar to a vector with a SUBREG_TO_REG operation leaves this value only in V[1]. A doubleword splat operation is one form of an XXPERMDI instruction, which takes one doubleword from a first operand and another doubleword from a second operand, with a two-bit selector operand indicating which doublewords are chosen. In the general case, an XXPERMDI can be permitted in a lane-swapped region provided that it is properly transformed to select the corresponding swapped values. This transformation is to reverse the order of the two input operands, and to reverse and complement the bits of the selector operand (derivation left as an exercise to the reader ;). A new test case that exercises the scalar-to-vector and generalized XXPERMDI transformations is added as CodeGen/PowerPC/swaps-le-5.ll. The patch also requires a change to CodeGen/PowerPC/swaps-le-3.ll to use CHECK-DAG instead of CHECK for two independent instructions that now appear in reverse order. There are two small unrelated changes that are added with this patch. First, the XXSLDWI instruction was incorrectly omitted from the list of lane-sensitive instructions; this is now fixed. Second, I observed that the same webs were being rejected over and over again for different reasons. Since it's sufficient to reject a web only once, I added a check for this to speed up the compilation time slightly. llvm-svn: 242081	2015-07-13 22:58:19 +00:00
Hal Finkel	568c3a41af	[PowerPC] Make use of the TargetRecip system r238842 added the TargetRecip system for controlling use of reciprocal estimates for sqrt and division using a set of parameters that can be set by the frontend. Clang now supports a sophisticated -mrecip option, and this will allow that option to effectively control the relevant code-generation functionality of the PPC backend. llvm-svn: 241985	2015-07-12 02:33:57 +00:00
Hal Finkel	f91045042a	[PowerPC] Support the nest parameter attribute This adds support for the 'nest' attribute, which allows the static chain register to be set for functions calls under non-Darwin PPC/PPC64 targets. r11 is the chain register (which the PPC64 ELF ABI calls the "environment pointer"). For indirect calls under PPC64 ELFv1, this would normally be loaded from the function descriptor, but providing an explicit 'nest' parameter will override that process and use the value provided. This allows __builtin_call_with_static_chain to work as expected on PowerPC. llvm-svn: 241984	2015-07-12 00:37:44 +00:00
Nemanja Ivanovic	4dede06034	Add missing builtins to the PPC back end for ABI compliance (vol. 2) This patch corresponds to review: http://reviews.llvm.org/D10874 Back end portion of the second round of additions to altivec.h. llvm-svn: 241398	2015-07-05 06:03:51 +00:00
Bill Schmidt	8da856ce76	[PPC64LE] Remove implicit-subreg restriction from VSX swap removal In r241285, I removed the SUBREG_TO_REG restriction from VSX swap removal, determining that this was overly conservative. We have another form of the same restriction in that we check for the presence of implicit subregs in vector operations. As with SUBREG_TO_REG for partial register conversions, an implicit subreg is safe in and of itself, provided no other operation makes a lane-sensitive assumption about the result. This patch removes that restriction, by removing the HasImplicitSubreg flag and all code that relies on it. I've added a test case that fails to optimize before this patch is applied, and optimizes properly with the patch. Test based on a report from Anton Blanchard. llvm-svn: 241290	2015-07-02 19:01:22 +00:00
Bill Schmidt	0c245b6f0f	[PPC64LE] Teach swap optimization about the doubleword splat idiom With a previous patch, the VSX swap optimization is able to recognize the doubleword load-splat idiom that can be implemented using lxvdsx. However, that does not cover a doubleword splat where the source is a register. We can implement this using xxspltd (a special form of xxpermdi). This patch teaches the swap optimization pass about this idiom. As a prerequisite, it also permits swap optimization to succeed for all forms of SUBREG_TO_REG. Previously we were conservative and only allowed SUBREG_TO_REG when it copied a full register. However, on reflection any form of SUBREG_TO_REG is safe in and of itself, so long as an unsafe operation is not performed on its result. In particular, a widening SUBREG_TO_REG often occurs as an input to a doubleword splat idiom, particularly in auto-vectorized code. The doubleword splat idiom is an XXPERMDI operation where both source registers are identical, and the selection mask is either 0 (splat the first element) or 3 (splat the second element). To determine whether the registers are identical, we use the existing mechanism for looking through "copy-like" operations. That mechanism has a side effect of marking the XXPERMDI operation as using a physical register, which would invalidate its presence in a swap-optimized region. This is correct for the form of XXPERMDI that performs a swap and hence would be removed, but is not what we want for a doubleword-splat variety of XXPERMDI. Therefore we reset the physical-register flag on the XXPERMDI when it represents a splat. A simple test case is added to verify that we generate the splat and that we also remove the xxswapd instructions that would otherwise be associated with the load and store of another operand. llvm-svn: 241285	2015-07-02 17:03:06 +00:00
Bill Schmidt	2cbe4a84c5	[PPC64LE] Enable missing lxvdsx optimization, and related swap optimization When adding little-endian vector support for PowerPC last year, I inadvertently disabled an optimization that recognizes a load-splat idiom and generates the lxvdsx instruction. This patch moves the offending logic so lxvdsx is once again generated. This pattern is frequently generated by the vectorizer for scalar loads of an effective constant. Previously the lxvdsx instruction was wrongly listed as lane-sensitive for the VSX swap optimization (since both doublewords are identical, swaps are safe). This patch fixes this as well, so that vectorized code using lxvdsx can now have swaps removed from the computation. There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll that checks for the missing optimization. However, vsx.ll was only being tested for POWER7 with big-endian code generation. I've added a little-endian RUN statement and expected LE code generation for all the tests in vsx.ll to give us a bit better VSX coverage, including what's needed for this patch. llvm-svn: 241183	2015-07-01 19:40:07 +00:00
Nemanja Ivanovic	7d2f845a9d	Fixes a bug with __builtin_vsx_lxvdw4x on Little Endian systems llvm-svn: 241108	2015-06-30 19:45:45 +00:00
Nemanja Ivanovic	634851fffb	Add missing builtins to the PPC back end for ABI compliance (vol. 1) This patch corresponds to review: http://reviews.llvm.org/D10638 This is the back end portion of patch http://reviews.llvm.org/D10637 It just adds the code gen and intrinsic functions necessary to support that patch to the back end. llvm-svn: 240820	2015-06-26 19:26:53 +00:00
Kit Barton	230223268c	[PPC] Implement vmrgew and vmrgow instructions This patch adds support for the vector merge even word and vector merge odd word instructions introduced in POWER8. Phabricator review: http://reviews.llvm.org/D10704 llvm-svn: 240650	2015-06-25 15:17:40 +00:00
Rafael Espindola	d5d60f7168	Improve the --expand-relocs handling of MachO. In a relocation target can take 3 basic forms * A r_value in scattered relocations. * A symbol in external relocations. * A section is non-external relocations. Have the dump reflect that. With this change we go from CHECK-NEXT: Extern: 0 CHECK-NEXT: Type: X86_64_RELOC_SUBTRACTOR (5) CHECK-NEXT: Symbol: 0x2 CHECK-NEXT: Scattered: 0 To just // CHECK-NEXT: Type: X86_64_RELOC_SUBTRACTOR (5) // CHECK-NEXT: Section: __data (2) Since the relocation is with a section, we print the seciton name and don't need to say that it is not scattered or external. Someone motivated can add further special cases for things like ARM64_RELOC_ADDEND and ARM_RELOC_PAIR. llvm-svn: 240073	2015-06-18 22:38:20 +00:00
Rafael Espindola	4cb02a8fb5	Use --expand-relocs in a test. It will make the next change easier to read. llvm-svn: 240053	2015-06-18 20:57:35 +00:00
David Majnemer	c8b1f095a3	Move the personality function from LandingPadInst to Function The personality routine currently lives in the LandingPadInst. This isn't desirable because: - All LandingPadInsts in the same function must have the same personality routine. This means that each LandingPadInst beyond the first has an operand which produces no additional information. - There is ongoing work to introduce EH IR constructs other than LandingPadInst. Moving the personality routine off of any one particular Instruction and onto the parent function seems a lot better than have N different places a personality function can sneak onto an exceptional function. Differential Revision: http://reviews.llvm.org/D10429 llvm-svn: 239940	2015-06-17 20:52:32 +00:00
Kit Barton	12e4595b58	Properly handle the mftb instruction. The mftb instruction was incorrectly marked as deprecated in the PPC Backend. Instead, it should not be treated as deprecated, but rather be implemented using the mfspr instruction. A similar patch was put into GCC last year. Details can be found at: https://sourceware.org/ml/binutils/2014-11/msg00383.html. This change will replace instances of the mftb instruction with the mfspr instruction for all CPUs except 601 and pwr3. This will also be the default behaviour. Additional details can be found in: https://llvm.org/bugs/show_bug.cgi?id=23680 Phabricator review: http://reviews.llvm.org/D10419 llvm-svn: 239827	2015-06-16 16:01:15 +00:00
Nemanja Ivanovic	d737ff7af7	LLVM support for vector quad bit permute and gather instructions through builtins This patch corresponds to review: http://reviews.llvm.org/D10096 This is the back end portion of the patch related to D10095. The patch adds the instructions and back end intrinsics for: vbpermq vgbbd llvm-svn: 239505	2015-06-11 06:21:25 +00:00
Nemanja Ivanovic	d9fdd9c5e6	Add support for VSX FMA single-precision instructions to the PPC back end This patch corresponds to review: http://reviews.llvm.org/D9941 It adds the various FMA instructions introduced in the version 2.07 of the ISA along with the testing for them. These are operations on single precision scalar values in VSX registers. llvm-svn: 238578	2015-05-29 17:13:25 +00:00
Kit Barton	eb147c8fbd	This patch adds support for the vector quadword add/sub instructions introduced in POWER8: vadduqm vaddeuqm vaddcuq vaddecuq vsubuqm vsubeuqm vsubcuq vsubecuq In addition to adding the instructions themselves, it also adds support for the v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and IntrinsicEmitter.cpp). http://reviews.llvm.org/D9081 llvm-svn: 238144	2015-05-25 15:49:26 +00:00
Hal Finkel	ef674a05e8	[PowerPC] Fix fast-isel when compare is split from branch When the compare feeding a branch was in a different BB from the branch, we'd try to "regenerate" the compare in the block with the branch, possibly trying to make use of values not available there. Copy a page from AArch64's play book here to fix the problem (at least in terms of correctness). Fixes PR23640. llvm-svn: 238097	2015-05-23 12:18:10 +00:00
Bill Schmidt	39c88f95dd	[PPC64] Handle vpkudum mask pattern correctly when vpkudum isn't available My recent patch to add support for ISA 2.07 vector pack/unpack instructions didn't properly check for availability of the vpkudum instruction when recognizing it as a special vector shuffle case. This causes us to leave the vector shuffle in place (rather than converting it to a vector permute) so that it can be recognized later as a vpkudum, but that pattern is invalid for processors prior to POWER8. Thus LLVM crashes with an "unable to select" message. We observed this since one of our buildbots is configured to generate code for a POWER7. This patch fixes the problem by checking for availability of the vpkudum instruction during custom lowering of vector shuffles. I've added a test case variant for the vpkudum pattern when the instruction isn't available. llvm-svn: 237952	2015-05-21 20:48:49 +00:00
Nemanja Ivanovic	78592ebe3a	Add support for VSX scalar single-precision arithmetic in the PPC target http://reviews.llvm.org/D9891 Following up on the VSX single precision loads and stores added earlier, this adds support for elementary arithmetic operations on single precision values in VSX registers. These instructions utilize the new VSSRC register class. Instructions added: xsaddsp xsdivsp xsmulsp xsresp xsrsqrtesp xssqrtsp xssubsp llvm-svn: 237937	2015-05-21 19:32:49 +00:00
Hal Finkel	d380588a20	[PowerPC] Add extra r2 read deps on @toc@l relocations If some commits are happy, and some commits are sad, this is a sad commit. It is sad because it restricts instruction scheduling to work around a binutils linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was updated not to produce code triggering this bug, and now we'll do the same... When resolving an address using the ELF ABI TOC pointer, two relocations are generally required: one for the high part and one for the low part. Only the high part generally explicitly depends on r2 (the TOC pointer). And, so, we might produce code like this: .Ltmp526: addis 3, 2, .LC12@toc@ha .Ltmp1628: std 2, 40(1) ld 5, 0(27) ld 2, 8(27) ld 11, 16(27) ld 3, .LC12@toc@l(3) rldicl 4, 4, 0, 32 mtctr 5 bctrl ld 2, 40(1) And there is nothing wrong with this code, as such, but there is a linker bug in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will misoptimize this code sequence to this: nop std r2,40(r1) ld r5,0(r27) ld r2,8(r27) ld r11,16(r27) ld r3,-32472(r2) clrldi r4,r4,32 mtctr r5 bctrl ld r2,40(r1) because the linker does not know (and does not check) that the value in r2 changed in between the instruction using the .LC12@toc@ha (TOC-relative) relocation and the instruction using the .LC12@toc@l(3) relocation. Because it finds these instructions using the relocations (and not by scanning the instructions), it has been asserted that there is no good way to detect the change of r2 in between. As a result, this bug may never be fixed (i.e. it may become part of the definition of the ABI). GCC was updated to add extra dependencies on r2 to instructions using the @toc@l relocations to avoid this problem, and we'll do the same here. This is done as a separate pass because: 1. These extra r2 dependencies are not really properties of the instructions, but rather due to a linker bug, and maybe one day we'll be able to get rid of them when targeting linkers without this bug (and, thus, keeping the logic centralized here will make that straightforward). 2. There are ISel-level peephole optimizations that propagate the @toc@l relocations to some user instructions, and so the exta dependencies do not apply only to a fixed set of instructions (without undesirable definition replication). The test case was reduced with the help of bugpoint, with minimal cleaning. I'm looking forward to our upcoming MI serialization support, and with that, much better tests can be created. llvm-svn: 237556	2015-05-18 06:25:59 +00:00
Bill Schmidt	d115a77d30	[PPC64] Add vector pack/unpack support from ISA 2.07 This patch adds support for the following new instructions in the Power ISA 2.07: vpksdss vpksdus vpkudus vpkudum vupkhsw vupklsw These instructions are available through the vec_packs, vec_packsu, vec_unpackh, and vec_unpackl built-in interfaces. These are lane-sensitive instructions, so the built-ins have different implementations for big- and little-endian, and the instructions must be marked as killing the vector swap optimization for now. The first three instructions perform saturating pack operations. The fourth performs a modulo pack operation, which means it can be represented with a vector shuffle, and conversely the appropriate vector shuffles may cause this instruction to be generated. The other instructions are only generated via built-in support for now. Appropriate tests have been added. There is a companion patch to clang for the rest of this support. llvm-svn: 237499	2015-05-16 01:02:12 +00:00
James Y Knight	f2154471fd	Fix test added in r236850 for OSX builders. Need to specify triple so that llvm emits the asm syntax that the test expected. llvm-svn: 236855	2015-05-08 14:04:54 +00:00
James Y Knight	7d10114335	Fix alignment checks in MergeConsecutiveStores. 1) check whether the alignment of the memory is sufficient for the merged store or load to be efficient. Not doing so can result in some ridiculously poor code generation, if merging creates a vector operation which must be aligned but isn't. 2) DON'T check that the alignment of each load/store is equal. If you're merging 2 4-byte stores, the first might have 8-byte alignment, but the second certainly will have 4-byte alignment. We do want to allow those to be merged. llvm-svn: 236850	2015-05-08 13:47:01 +00:00
Nemanja Ivanovic	5c7f6e9cdc	Add VSX Scalar loads and stores to the PPC back end This patch corresponds to review: http://reviews.llvm.org/D9440 It adds a new register class to the PPC back end to contain single precision values in VSX registers. Additionally, it adds scalar loads and stores for VSX registers. llvm-svn: 236755	2015-05-07 18:24:05 +00:00
Bill Schmidt	1b44da961a	[PPC64LE] Adjust vector splats during VSX swap optimization The initial code drop for VSX swap optimization permitted the optimization only when all operations in a web of related computation are lane-insensitive. For some lane-sensitive operations, we can still permit the optimization provided that we make adjustments to those operations. This patch adds special handling for vector splats so that their presence doesn't kill the optimization. Vector splats are lane-sensitive since they identify by number a vector element to be used as the source of a splat. When swap optimizations take place, the desired vector element will move to the opposite doubleword of the quadword vector. We thus replace the index I by (I + N/2) % N, where N is the number of elements in the vector. A new test case is added to test that swap optimization succeeds when vector splats are present, and that the proper input element is used as the source of the splat. An ancillary change removes SH_BUILDVEC as one of the kinds of special handling that may be required by VSX swap optimization. From experience with GCC, I had expected to need some modifications for vector build operations, but I did not find that to be the case. llvm-svn: 236606	2015-05-06 15:40:46 +00:00
Kit Barton	4fec877662	This patch adds ABI support for v1i128 data type. It adds v1i128 to the appropriate register classes and checks parameter passing and return values. This is related to http://reviews.llvm.org/D9081, which will add instructions that exploit the v1i128 datatype. Phabricator review: http://reviews.llvm.org/D9475 llvm-svn: 236503	2015-05-05 16:10:44 +00:00
Duncan P. N. Exon Smith	09b5c9c24d	IR: Give 'DI' prefix to debug info metadata Finish off PR23080 by renaming the debug info IR constructs from `MD` to `DI`. The last of the `DIDescriptor` classes were deleted in r235356, and the last of the related typedefs removed in r235413, so this has all baked for about a week. Note: If you have out-of-tree code (like a frontend), I recommend that you get everything compiling and tests passing with the previous commit before updating to this one. It'll be easier to keep track of what code is using the `DIDescriptor` hierarchy and what you've already updated, and I think you're extremely unlikely to insert bugs. YMMV of course. Back to this commit: I did this using the rename-md-di-nodes.sh upgrade script I've attached to PR23080 (both code and testcases) and filtered through clang-format-diff.py. I edited the tests for test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns were off-by-three. It should work on your out-of-tree testcases (and code, if you've followed the advice in the previous paragraph). Some of the tests are in badly named files now (e.g., test/Assembler/invalid-mdcompositetype-missing-tag.ll should be 'dicompositetype'); I'll come back and move the files in a follow-up commit. llvm-svn: 236120	2015-04-29 16:38:44 +00:00
Bill Schmidt	6661e2ddb2	[PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations This patch adds a new SSA MI pass that runs on little-endian PPC64 code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors without alignment constraints are accomplished for little-endian using lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional xxswapd instructions hurts performance in comparison with big-endian code, but they are necessary in the general case to support correct semantics. However, the general case does not apply to most vector code. Many vector instructions are lane-insensitive; they do not "care" which lanes the parallel computations are performed within, provided that the resulting data is stored into the correct locations. Thus this pass looks for computations that perform only lane-insensitive operations, and remove the unnecessary swaps from loads and stores in such computations. Future improvements will allow computations using certain lane-sensitive operations to also be optimized in this manner, by modifying the lane-sensitive operations to account for the permuted order of the lanes. However, this patch only adds the infrastructure to permit this; no lane-sensitive operations are optimized at this time. This code is heavily exercised by the various vectorizing applications in the projects/test-suite tree. For the time being, I have only added one simple test case to demonstrate what the pass is doing. Although it is quite simple, it provides coverage for much of the code, including the special case handling of copies and subreg-to-reg operations feeding the swaps. I plan to add additional tests in the future as I fill in more of the "special handling" code. Two existing tests were affected, because they expected the swaps to be present, but they are now removed. llvm-svn: 235910	2015-04-27 19:57:34 +00:00
Hal Finkel	acf4a0f1ca	[PowerPC] Use sync inst alias when printing So long as the choice between printing msync and sync is not ambiguous, we can print 'sync 0' and just 'sync'. llvm-svn: 235663	2015-04-23 23:05:08 +00:00
Hal Finkel	4e2def1840	[PowerPC] Enable printing instructions using aliases TableGen had been nicely generating code to print a number of instructions using shorter aliases (and PowerPC has plenty of short mnemonics), but we were not calling it. For some of the aliases we support in the parser, TableGen can't infer the "inverse" alias relationship, so there is still more to do. Thus, after some hours of updating test cases... llvm-svn: 235616	2015-04-23 18:30:38 +00:00
Hans Wennborg	8823c80ce0	Re-commit r235560: Switch lowering: extract jump tables and bit tests before building binary tree (PR22262) Third time's the charm. The previous commit was reverted as a reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--' on an iterator at the beginning of a vector, causing asserts when using debugging iterators. This commit fixes that. llvm-svn: 235608	2015-04-23 16:45:24 +00:00
Aaron Ballman	be6ee771e3	Revert r235560; this commit was causing several failed assertions in Debug builds using MSVC's STL. The iterator is being used outside of its valid range. llvm-svn: 235597	2015-04-23 13:41:59 +00:00
Hans Wennborg	d4bc2d86b6	Switch lowering: extract jump tables and bit tests before building binary tree (PR22262) This is a re-commit of r235101, which also fixes the problems with the previous patch: - Switches with only a default case and non-fallthrough were handled incorrectly - The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here. > This is a major rewrite of the SelectionDAG switch lowering. The previous code > would lower switches as a binary tre, discovering clusters of cases > suitable for lowering by jump tables or bit tests as it went along. To increase > the likelihood of finding jump tables, the binary tree pivot was selected to > maximize case density on both sides of the pivot. > > By not selecting the pivot in the middle, the binary trees would not always > be balanced, leading to performance problems in the generated code. > > This patch rewrites the lowering to search for clusters of cases > suitable for jump tables or bit tests first, and then builds the binary > tree around those clusters. This way, the binary tree will always be balanced. > > This has the added benefit of decoupling the different aspects of the lowering: > tree building and jump table or bit tests finding are now easier to tweak > separately. > > For example, this will enable us to balance the tree based on profile info > in the future. > > The algorithm for finding jump tables is quadratic, whereas the previous algorithm > was O(n log n) for common cases, and quadratic only in the worst-case. This > doesn't seem to be major problem in practice, e.g. compiling a file consisting > of a 10k-case switch was only 30% slower, and such large switches should be rare > in practice. Compiling e.g. gcc.c showed no compile-time difference. If this > does turn out to be a problem, we could limit the search space of the algorithm. > > This commit also disables all optimizations during switch lowering in -O0. > > Differential Revision: http://reviews.llvm.org/D8649 llvm-svn: 235560	2015-04-22 23:14:56 +00:00
Hal Finkel	df195a3bea	[DAGCombine] Disable select(c, load,load) for indexed loads This turned up after r235333, but was a pre-existing bug. The optimization which transforms select(c, load, load) into a load of a select of the addresses does not handle indexed loads (pre/post inc/dec). However, it did not check for them either, leading to a crash if it tried to transform one of them. llvm-svn: 235497	2015-04-22 11:32:25 +00:00
Olivier Sallenave	f748efd9fd	Refactoring and enhancement to FMA combine. llvm-svn: 235344	2015-04-20 20:29:40 +00:00
Hal Finkel	40b204ea4b	[InlineAsm] Remove EarlyClobber on registers that are also inputs When an inline asm call has an output register marked as early-clobber, but that same register is also an input operand, what should we do? GCC accepts this, and is documented to accept this for read/write operands saying, "Furthermore, if the earlyclobber operand is also a read/write operand, then that operand is written only after it's used." For write-only operands, the situation seems less clear, but I have at least one existing codebase that assumes this will work, in part because it has syscall macros like this: ({ \ register uint64_t r0 __asm__ ("r0") = (__NR_ ## name); \ register uint64_t r3 __asm__ ("r3") = ((uint64_t) (arg0)); \ register uint64_t r4 __asm__ ("r4") = ((uint64_t) (arg1)); \ register uint64_t r5 __asm__ ("r5") = ((uint64_t) (arg2)); \ __asm__ __volatile__ \ ("sc" \ : "=&r"(r0),"=&r"(r3),"=&r"(r4),"=&r"(r5) \ : "0"(r0), "1"(r3), "2"(r4), "3"(r5) \ : "r6","r7","r8","r9","r10","r11","r12","cr0","memory"); \ r3; \ }) Furthermore, with register aliases and subregister relationships that only the backend knows about, rejecting this in the frontend seems like a difficult proposition (if we wanted to do so). However, keeping the early-clobber flag on the INLINEASM MI does not work for us, because it will cause the register's live interval to end to soon (so it will not appear defined to be used as an input). Fortunately, fixing this does not seem hard: When forming the INLINEASM MI, check to see if any of the early-clobber outputs are also inputs, and if so, remove the early-clobber flag. llvm-svn: 235283	2015-04-20 00:01:30 +00:00
David Blaikie	dfadb4e9ee	[opaque pointer type] Add textual IR support for explicit type parameter to the call instruction See r230786 and r230794 for similar changes to gep and load respectively. Call is a bit different because it often doesn't have a single explicit type - usually the type is deduced from the arguments, and just the return type is explicit. In those cases there's no need to change the IR. When that's not the case, the IR usually contains the pointer type of the first operand - but since typed pointers are going away, that representation is insufficient so I'm just stripping the "pointerness" of the explicit type away. This does make the IR a bit weird - it /sort of/ reads like the type of the first operand: "call void () %x(" but %x is actually of type "void ()" and will eventually be just of type "ptr". But this seems not too bad and I don't think it would benefit from repeating the type ("void (), void () %x(" and then eventually "void (), ptr %x(") as has been done with gep and load. This also has a side benefit: since the explicit type is no longer a pointer, there's no ambiguity between an explicit type and a function that returns a function pointer. Previously this case needed an explicit type (eg: a function returning a void() function was written as "call void () () * @x(" rather than "call void () * @x(" because of the ambiguity between a function returning a pointer to a void() function and a function returning void). No ambiguity means even function pointer return types can just be written alone, without writing the whole function's type. This leaves /only/ the varargs case where the explicit type is required. Given the special type syntax in call instructions, the regex-fu used for migration was a bit more involved in its own unique way (as every one of these is) so here it is. Use it in conjunction with the apply.sh script and associated find/xargs commands I've provided in rr230786 to migrate your out of tree tests. Do let me know if any of this doesn't cover your cases & we can iterate on a more general script/regexes to help others with out of tree tests. About 9 test cases couldn't be automatically migrated - half of those were functions returning function pointers, where I just had to manually delete the function argument types now that we didn't need an explicit function type there. The other half were typedefs of function types used in calls - just had to manually drop the * from those. import fileinput import sys import re pat = re.compile(r'((?:=\|:\|^\|\s)call\s(?:[^@]?))(\s$\|\s(?:(?:\[\[[a-zA-Z0-9_]+\]\]\|[@%](?:(")?[\\\?@a-zA-Z0-9_.]?(?(3)"\|)\|{{.}}))(?:$\|$)\|undef\|inttoptr\|bitcast\|null\|asm).$)') addrspace_end = re.compile(r"addrspace\(\d+$\s\$") func_end = re.compile("(?:void.\|\)\s)\$") def conv(match, line): if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)): return line return line[:match.start()] + match.group(1)[:match.group(1).rfind('')].rstrip() + match.group(2) + line[match.end():] for line in sys.stdin: sys.stdout.write(conv(re.search(pat, line), line)) llvm-svn: 235145	2015-04-16 23:24:18 +00:00
Hans Wennborg	ff837f8fc0	Revert the switch lowering change (r235101, r235103, r235106) Looks like it broke the sanitizer-ppc64-linux1 build. Reverting for now. llvm-svn: 235108	2015-04-16 15:43:26 +00:00
Hans Wennborg	bc33cd14d7	Switch lowering: extract jump tables and bit tests before building binary tree (PR22262) This is a major rewrite of the SelectionDAG switch lowering. The previous code would lower switches as a binary tre, discovering clusters of cases suitable for lowering by jump tables or bit tests as it went along. To increase the likelihood of finding jump tables, the binary tree pivot was selected to maximize case density on both sides of the pivot. By not selecting the pivot in the middle, the binary trees would not always be balanced, leading to performance problems in the generated code. This patch rewrites the lowering to search for clusters of cases suitable for jump tables or bit tests first, and then builds the binary tree around those clusters. This way, the binary tree will always be balanced. This has the added benefit of decoupling the different aspects of the lowering: tree building and jump table or bit tests finding are now easier to tweak separately. For example, this will enable us to balance the tree based on profile info in the future. The algorithm for finding jump tables is O(n^2), whereas the previous algorithm was O(n log n) for common cases, and quadratic only in the worst-case. This doesn't seem to be major problem in practice, e.g. compiling a file consisting of a 10k-case switch was only 30% slower, and such large switches should be rare in practice. Compiling e.g. gcc.c showed no compile-time difference. If this does turn out to be a problem, we could limit the search space of the algorithm. This commit also disables all optimizations during switch lowering in -O0. Differential Revision: http://reviews.llvm.org/D8649 llvm-svn: 235101	2015-04-16 14:49:23 +00:00
Rafael Espindola	518baef93e	Update tests to not be as dependent on section numbers. Many of these predate llvm-readobj. With elf-dump we had to match a relocation to symbol number and symbol number to symbol name or section number. llvm-svn: 235015	2015-04-15 15:59:37 +00:00
Hal Finkel	41bf2cfa3a	[PowerPC] Really iterate over all loops in PPCLoopDataPrefetch/PPCLoopPreIncPrep When I fixed these a couple of days ago to iterate over all loops, not just depth == 1 loops, I inadvertently made it such that we'd only look at the first top-level loop. Make sure that we really look at all of them. llvm-svn: 234705	2015-04-12 17:18:56 +00:00
Hal Finkel	cf806a6531	[PowerPC] Disable part-word atomics on the P7 As it turns out, even though these are part of ISA 2.06, the P7 does not support them (or, at least, not any P7s we're tested so far). llvm-svn: 234686	2015-04-11 13:40:36 +00:00
Nemanja Ivanovic	9d5491e285	Add direct moves to/from VSR and exploit them for FP/INT conversions This patch corresponds to review: http://reviews.llvm.org/D8928 It adds direct move instructions to/from VSX registers to GPR's. These are exploited for FP <-> INT conversions. llvm-svn: 234682	2015-04-11 10:40:42 +00:00
Hal Finkel	47533e9dd7	[PowerPC] Fix PPCLoopPreIncPrep for depth > 1 loops This pass had the same problem as the data-prefetching pass: it was only checking for depth == 1 loops in practice. Fix that, add some debugging statements, and make sure that, when we grab an AddRec, it is for the loop we expect. llvm-svn: 234670	2015-04-11 00:33:08 +00:00
Hal Finkel	bc9c7308a0	[PowerPC] Prefetching should also consider depth > 1 loops Iterating over loops from the LoopInfo instance only provides top-level loops. We need to search the whole tree of loops to find the inner ones. llvm-svn: 234603	2015-04-10 15:05:02 +00:00
Hal Finkel	79f36597b9	[PowerPC] Don't crash on PPC32 i64 fp_to_uint on modern cores When we have an instruction for this (and, thus, don't generate a runtime call), we need to custom type legalize this (in a trivial way, just as we do for fp_to_sint). Fixes PR23173. llvm-svn: 234561	2015-04-10 03:39:00 +00:00
Nemanja Ivanovic	5c0e16778c	Add LLVM support for remaining integer divide and permute instructions from ISA 2.06 This is the patch corresponding to review: http://reviews.llvm.org/D8406 It adds some missing instructions from ISA 2.06 to the PPC back end. llvm-svn: 234546	2015-04-09 23:54:37 +00:00
Rafael Espindola	2d125b4495	Revert "Refactoring and enhancement to FMA combine." This reverts commit r234513. It was failing on the bots. llvm-svn: 234518	2015-04-09 18:29:32 +00:00
Olivier Sallenave	e19fcb1120	Refactoring and enhancement to FMA combine. llvm-svn: 234513	2015-04-09 17:55:26 +00:00
Eric Christopher	97beb00b59	Strip trailing whitespace and reword explanatory comment. llvm-svn: 234078	2015-04-04 02:26:47 +00:00
Bill Schmidt	3f570f996a	[PowerPC] Enable splat generation for BUILD_VECTOR with little endian When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR nodes for little endian because wrong results were produced. I've subsequently investigated and found this is due to a call to BuildVectorSDNode::isConstantSplat that was always specifying big-endian. With this changed to correctly identify the target endianness, the optimizations work as expected. I found another case of a call to the same method with big-endian hardcoded, in PPC::isAllNegativeZeroVector(). I discovered this was an orphaned method with no callers, so I've just removed it. The existing test/CodeGen/PowerPC/vec_constants.ll checks these optimizations, so for testing I've just added a variant for little endian. llvm-svn: 234011	2015-04-03 13:48:24 +00:00
Simon Pilgrim	335a565d46	[DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed. Differential Revision: http://reviews.llvm.org/D8516 llvm-svn: 234004	2015-04-03 10:02:21 +00:00
Hal Finkel	7bfde8133e	[PowerPC] FastISel can't handle i1 return values when using CR bits Under normal circumstances, use of CR bits is disabled when running at -O0, but it is enabled by default otherwise, and if you have optnone functions, they'll still generally be generated with crbits turned on (because nothing else turns them off). FastISel can't handle most things dealing with i1 values when using CR bits, and checks for that, but was not checking the return type on functions; we can't fast-isel function calls with i1 return values either when using CR bits for boolean values. Fixes PR22664. llvm-svn: 233775	2015-04-01 00:40:48 +00:00
Hal Finkel	fcde972bd3	[PowerPC] Don't use a vector preferred memory type at -O0 Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic is a memset/memcpy/etc. we might normally use vector types. At -O0, this is probably not a good idea (because, if there is a bug in the lowering code, there would be no good way to turn it off). At -O0, only use scalar preferred types. Related to PR22754. llvm-svn: 233755	2015-03-31 20:56:09 +00:00
Hal Finkel	60a07f273f	[SDAG] Handle non-integer preferred memset types for non-constant values The existing code in getMemsetValue only handled integer-preferred types when the fill value was not a constant. Make this more robust in two ways: 1. If the preferred type is a floating-point value, do the mul-splat trick on the corresponding integer type and then bitcast. 2. If the preferred type is a vector, do the mul-splat trick on one vector element, and then build a vector out of them. Fixes PR22754 (although, we should also turn off use of vector types at -O0). llvm-svn: 233749	2015-03-31 20:35:26 +00:00
Duncan P. N. Exon Smith	4d29d309de	DebugInfo: Fix bad debug info for compile units and types Fix debug info in these tests, which started failing with a WIP patch to verify compile units and types. The problems look like they were all caused by bitrot. They fell into these categories: - Using `!{i32 0}` instead of `!{}`. - Using `!{null}` instead of `!{}`. - Using `!MDExpression()` instead of `!{}`. - Using `!8` instead of `!{!8}`. - `file:` references that pointed at `MDCompileUnit`s instead of the same `MDFile` as the compile unit. - `file:` references that were numerically off-by-one or (off-by-ten). llvm-svn: 233415	2015-03-27 20:46:33 +00:00
Andrew Trick	75ca61e404	Complete the MachineScheduler fix made way back in r210390. "Fix the MachineScheduler's logic for updating ready times for in-order. Now the scheduler updates a node's ready time as soon as it is scheduled, before releasing dependent nodes." This fix was only made in one variant of the ScheduleDAGMI driver. Francois de Ferriere reported the issue in the other bit of code where it was also needed. I never got around to coming up with a test case, but it's an obvious fix that shouldn't be delayed any longer. I'll try to refactor this code a little better. I did verify performance on a wide variety of targets and saw no negative impact with this fix. llvm-svn: 233366	2015-03-27 06:10:13 +00:00
Eric Christopher	020f333161	Testcase for r233239. llvm-svn: 233240	2015-03-26 00:57:33 +00:00
Kit Barton	d0dd6e5750	Add Hardware Transactional Memory (HTM) Support This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07 (POWER8). The intrinsic support is based on GCC one [1], but currently only the 'PowerPC HTM Low Level Built-in Function' are implemented. The HTM instructions follows the RC ones and the transaction initiation result is set on RC0 (with exception of tcheck). Currently approach is to create a register copy from CR0 to GPR and comapring. Although this is suboptimal, since the branch could be taken directly by comparing the CR0 value, it generates code correctly on both test and branch and just return value. A possible future optimization could be elimitate the MFCR instruction to branch directly. The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on powerpc64 and powerpc64le. This is send along a clang patch to enabled the builtins and option switch. [1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html Phabricator Review: http://reviews.llvm.org/D8247 llvm-svn: 233204	2015-03-25 19:36:23 +00:00
Hal Finkel	70188cc8ca	[SDAG] Don't widen VSETCC during type legalization for split operands Because the operands of a vector SETCC node can be of a different type from the result (and often are), it can happen that even if we'd prefer to widen the result type of the SETCC, the operands have been split instead. In this case, the SETCC result also must be split. This mirrors what is done in WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not widened the following calls to GetWidenedVector will assert (which is what was happening in the test case). llvm-svn: 232935	2015-03-23 08:22:43 +00:00
Eric Christopher	496632fcd2	Remove the bare getSubtargetImpl call from the PPC port. As part of this add a test that shows we can generate code with for functions that differ by subtarget feature. llvm-svn: 232882	2015-03-21 03:36:02 +00:00
Owen Anderson	480ad2b319	Fix a nasty bug in DAGCombine of STORE nodes. This is very related to the bug fixed in r174431. The problem is that SelectionDAG does not include alignment in the uniquing of loads and stores. When an otherwise no-op DAGCombine would increase the alignment of a load or store, the original node would be returned (with the alignment increased), which would cause the node not to be processed by any further DAGCombines. I don't have a direct testcase for this that manifests on an in-tree target, but I did see some noise in the tests for other targets and have updated them for it. llvm-svn: 232780	2015-03-19 22:48:57 +00:00
Rafael Espindola	db1aacd9f9	Note that we don't support COFF on PPC. Should bring back the windows bots. llvm-svn: 232701	2015-03-19 02:40:56 +00:00
Samuel Antao	f17109a668	Fix R0 use in PowerPC VSX store for FastIsel. The VSX stores are sometimes generated with a undefined index register, causing %noreg to be used and R0 to be emitted later on. The semantics of the VSX store (e.g. stdsdx) requires R0 to be used as base if we want zero to be used in the computation of the effective address instead of the content of R0. This patch checks if no index register was generated and forces R0 to be used as base address. llvm-svn: 232486	2015-03-17 15:00:57 +00:00
Rafael Espindola	ef1de8209d	Use createTempSymbol to avoid collisions instead of an ad hoc method. llvm-svn: 232483	2015-03-17 14:50:32 +00:00
Ahmed Bougacha	b8ea46fe37	Add a bunch of CHECK missing colons in tests. NFC. Some wouldn't pass; fixed most, the rest will be fixed separately. llvm-svn: 232239	2015-03-14 01:43:57 +00:00
David Blaikie	3ea2df7c7b	[opaque pointer type] Add textual IR support for explicit type parameter to gep operator Similar to gep (r230786) and load (r230794) changes. Similar migration script can be used to update test cases, which successfully migrated all of LLVM and Polly, but about 4 test cases needed manually changes in Clang. (this script will read the contents of stdin and massage it into stdout - wrap it in the 'apply.sh' script shown in previous commits + xargs to apply it over a large set of test cases) import fileinput import sys import re rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s$)((<\d\s+x\s+)?([^@]?)(\|\saddrspace\(\d+$)\s\(?(3)>)\s*)(?=$\|%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|zeroinitializer\|<\|\[\[[a-zA-Z]\|\{\{)", re.MULTILINE \| re.DOTALL) def conv(match): line = match.group(1) line += match.group(4) line += ", " line += match.group(2) return line line = sys.stdin.read() off = 0 for match in re.finditer(rep, line): sys.stdout.write(line[off:match.start()]) sys.stdout.write(conv(match)) off = match.end() sys.stdout.write(line[off:]) llvm-svn: 232184	2015-03-13 18:20:45 +00:00
Nemanja Ivanovic	54958e2a25	Add support for part-word atomics for PPC http://reviews.llvm.org/D8090#inline-67337 llvm-svn: 231843	2015-03-10 20:51:07 +00:00
Kit Barton	f514e1c5fc	Change the generation of the vmuluwm instruction to be based on the MUL opcode. Phabricator review: http://reviews.llvm.org/D8185 llvm-svn: 231827	2015-03-10 19:49:38 +00:00
Rafael Espindola	0acd6bd828	Use the correct func begin symbol in all places in ppc. I missed an occurrence of the old symbol in my previous patch. llvm-svn: 231398	2015-03-05 19:47:50 +00:00
Rafael Espindola	22f2e68861	Use the generic Lfunc_begin label on ppc. This removes yet another custom label to mark the start of a function. llvm-svn: 231390	2015-03-05 18:55:50 +00:00
Kit Barton	524562d9b6	While reviewing the changes to Clang to add builtin support for the vsld, vsrd, and vsrad instructions, it was pointed out that the builtins are generating the LLVM opcodes (shl, lshr, and ashr) not calls to the intrinsics. This patch changes the implementation of the vsld, vsrd, and vsrad instructions from from intrinsics to VXForm_1 instructions and makes them legal with P8 Altivec. It also removes the definition of the int_ppc_altivec_vsld, int_ppc_altivec_vsrd, and int_ppc_altivec_vsrad intrinsics. llvm-svn: 231378	2015-03-05 16:24:38 +00:00
Nemanja Ivanovic	38e13136f3	Add LLVM support for PPC cryptography builtins Review: http://reviews.llvm.org/D7955 llvm-svn: 231285	2015-03-04 20:44:33 +00:00
Rafael Espindola	eb2f517737	Use the vanilla func_end symbol for .size. No need to create yet another temp symbol. llvm-svn: 231198	2015-03-04 01:35:23 +00:00
Kit Barton	2e98937142	Add the following 64-bit vector integer arithmetic instructions added in POWER8: vaddudm vsubudm vmulesw vmulosw vmuleuw vmulouw vmuluwm vmaxsd vmaxud vminsd vminud vcmpequd vcmpequd. vcmpgtsd vcmpgtsd. vcmpgtud vcmpgtud. vrld vsld vsrd vsrad Phabricator review: http://reviews.llvm.org/D7959 llvm-svn: 231115	2015-03-03 19:55:45 +00:00
Duncan P. N. Exon Smith	8d1b74869c	DebugInfo: Move new hierarchy into place Move the specialized metadata nodes for the new debug info hierarchy into place, finishing off PR22464. I've done bootstraps (and all that) and I'm confident this commit is NFC as far as DWARF output is concerned. Let me know if I'm wrong :). The code changes are fairly mechanical: - Bumped the "Debug Info Version". - `DIBuilder` now creates the appropriate subclass of `MDNode`. - Subclasses of DIDescriptor now expect to hold their "MD" counterparts (e.g., `DIBasicType` expects `MDBasicType`). - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp` for printing comments. - Big update to LangRef to describe the nodes in the new hierarchy. Feel free to make it better. Testcase changes are enormous. There's an accompanying clang commit on its way. If you have out-of-tree debug info testcases, I just broke your build. - `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to update all the IR testcases. - Unfortunately I failed to find way to script the updates to CHECK lines, so I updated all of these by hand. This was fairly painful, since the old CHECKs are difficult to reason about. That's one of the benefits of the new hierarchy. This work isn't quite finished, BTW. The `DIDescriptor` subclasses are almost empty wrappers, but not quite: they still have loose casting checks (see the `RETURN_FROM_RAW()` macro). Once they're completely gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I also expect to make a few schema changes now that it's easier to reason about everything. llvm-svn: 231082	2015-03-03 17:24:31 +00:00
Bill Schmidt	06671b00a3	Regenerated test case from pr 230801 for change in LLVM IR syntax llvm-svn: 230811	2015-02-27 23:29:57 +00:00
Bill Schmidt	57e433433f	Revert test case until it can be fixed llvm-svn: 230803	2015-02-27 22:31:14 +00:00
Bill Schmidt	1d4d434e1c	[PowerPC] Fix PR22711 - Misaligned .toc section Straightforward patch to emit an alignment directive when emitting a TOC entry. The test case was generated from the test in PR22711 that demonstrated a misaligned .toc section. The object code is run through llvm-readobj to verify that the correct alignment has been applied to the .toc section. Thanks to Ulrich Weigand for running down where the fix was needed. llvm-svn: 230801	2015-02-27 22:14:10 +00:00
David Blaikie	ab043ff680	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
Hal Finkel	979cca3be8	[PowerPC] Use vector types for memcpy and friends (sometimes) When using Altivec, we can use vector loads and stores for aligned memcpy and friends. Starting with the P7 and VXS, we have reasonable unaligned vector stores. Starting with the P8, we have fast unaligned loads too. For QPX, we use vector loads are stores, but only for aligned memory accesses. llvm-svn: 230788	2015-02-27 19:58:28 +00:00

... 3 4 5 6 7 ...

1413 Commits