llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 14:02:50 +01:00

Author	SHA1	Message	Date
Eric Christopher	bce38d60f8	Now that the optimization level is adjusting the feature string before we hit the subtarget, remove the constructor parameter. llvm-svn: 218817	2014-10-01 21:05:35 +00:00
Eric Christopher	4ce55b7a5c	Rework the PPC TargetMachine so that the non-function specific overrides happen at TargetMachine creation and not on every subtarget creation. llvm-svn: 218805	2014-10-01 20:38:26 +00:00
Sanjay Patel	c095454ca2	Split the estimate() interface into separate functions for each type. NFC. It was hacky to use an opcode as a switch because it won't always match (rsqrte != sqrte), and it looks like we'll need to add more special casing per arch than I had hoped for. Eg, x86 will prefer a different NR estimate implementation. ARM will want to use it's 'step' instructions. There also don't appear to be any new estimate instructions in any arch in a long, long time. Altivec vloge and vexpte may have been the first and last in that field... llvm-svn: 218698	2014-09-30 20:28:48 +00:00
Sanjay Patel	98a98574c5	Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2). This is purely refactoring. No functional changes intended. PowerPC is the only target that is currently using this interface. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) And: z = y / x into: z = y * rcpe(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction along with the number of refinement steps needed to make the estimate usable. Differential Revision: http://reviews.llvm.org/D5484 llvm-svn: 218553	2014-09-26 23:01:47 +00:00
Robin Morisset	64053dff5f	[Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate Summary: This patch makes use of AtomicExpandPass in Power for inserting fences around atomic as part of an effort to remove fence insertion from SelectionDAGBuilder. As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic lwsync) instead of sync 0 (heavyweight sync) in many cases. I also added a test, as there was no test for the barriers emitted by the Power backend for atomic loads and stores. Test Plan: new test + make check-all Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5180 llvm-svn: 218331	2014-09-23 20:46:49 +00:00
Lang Hames	719ff0ff4f	[MCJIT] Remove PPCRelocations.h - it's no longer used. This was overlooked in r218320, which removed the relocation headers for other targets. Thanks to Ulrich Weigand for catching it. llvm-svn: 218327	2014-09-23 19:17:48 +00:00
Sanjay Patel	1235f854b3	Refactor reciprocal square root estimate into target-independent function; NFC. This is purely a plumbing patch. No functional changes intended. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner. Next steps: The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added. Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner. X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added. Differential Revision: http://reviews.llvm.org/D5425 llvm-svn: 218219	2014-09-21 15:19:15 +00:00
Hal Finkel	0c0c256ad7	Optionally enable more-aggressive FMA formation in DAGCombine The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one use, but this is overly-conservative on some systems. Specifically, if the FMA and the FADD have the same latency (and the FMA does not compete for resources with the FMUL any more than the FADD does), there is no need for the restriction, and furthermore, forming the FMA leaving the FMUL can still allow for higher overall throughput and decreased critical-path length. Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to elide the hasOneUse check. This is enabled for PowerPC by default, as most PowerPC systems will benefit. Patch by Olivier Sallenave, thanks! llvm-svn: 218120	2014-09-19 11:42:56 +00:00
Aaron Ballman	c9d2119dc2	Reverting NFC changes from r218050. Instead, the warning was disabled for GCC in r218059, so these changes are no longer required. llvm-svn: 218062	2014-09-18 17:34:23 +00:00
Aaron Ballman	2e4b3f3dca	Fixing a bunch of -Woverloaded-virtual warnings due to hiding getSubtargetImpl from the base class. NFC. llvm-svn: 218050	2014-09-18 13:27:14 +00:00
Eric Christopher	c75fbbac7c	Add a new pass FunctionTargetTransformInfo. This pass serves as a shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget. No functional change. llvm-svn: 218004	2014-09-18 00:34:14 +00:00
Samuel Antao	ec112df870	Fix FastISel bug in boolean returns for PowerPC. For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization. llvm-svn: 217993	2014-09-17 23:25:06 +00:00
Samuel Antao	7871b3496a	Remove unnecessary blank space (test commit) llvm-svn: 217991	2014-09-17 22:47:28 +00:00
Bill Schmidt	b47e7e1e1d	Address comments on r217622 llvm-svn: 217680	2014-09-12 14:26:36 +00:00
Bill Schmidt	020f302f01	[PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm Inline asm may specify 'U' and 'X' constraints to print a 'u' for an update-form memory reference, or an 'x' for an indexed-form memory reference. However, these are really only useful in GCC internal code generation. In inline asm the operand of the memory constraint is typically just a register containing the address, so 'U' and 'X' make no sense. This patch quietly accepts 'U' and 'X' in inline asm patterns, but otherwise does nothing. If we ever unexpectedly see a non-register, we'll assert and sort it out afterwards. I've added a new test for these constraints; the test case should be used for other asm-constraints changes down the road. llvm-svn: 217622	2014-09-11 20:10:03 +00:00
Sanjay Patel	8030ed3639	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528	2014-09-10 17:58:16 +00:00
Craig Topper	743e490f28	Use cast to MVT instead of EVT on a couple calls to getSizeInBits. llvm-svn: 217473	2014-09-10 04:51:36 +00:00
Juergen Ributzka	76dd2e3da7	[FastISel][tblgen] Rename tblgen generated FastISel functions. NFC. This is the final round of renaming. This changes tblgen to emit lower-case function names for FastEmitInst_* and FastEmit_*, and updates all its uses in the source code. Reviewed by Eric llvm-svn: 217075	2014-09-03 20:56:59 +00:00
Juergen Ributzka	fa7bc008ce	[FastISel] Rename public visible FastISel functions. NFC. This commit renames the following public FastISel functions: LowerArguments -> lowerArguments SelectInstruction -> selectInstruction TargetSelectInstruction -> fastSelectInstruction FastLowerArguments -> fastLowerArguments FastLowerCall -> fastLowerCall FastLowerIntrinsicCall -> fastLowerIntrinsicCall FastEmitZExtFromI1 -> fastEmitZExtFromI1 FastEmitBranch -> fastEmitBranch UpdateValueMap -> updateValueMap TargetMaterializeConstant -> fastMaterializeConstant TargetMaterializeAlloca -> fastMaterializeAlloca TargetMaterializeFloatZero -> fastMaterializeFloatZero LowerCallTo -> lowerCallTo Reviewed by Eric llvm-svn: 217074	2014-09-03 20:56:52 +00:00
Eric Christopher	e1f21228eb	Remove resetSubtargetFeatures as it is unused. llvm-svn: 217071	2014-09-03 20:36:31 +00:00
Benjamin Kramer	e991977346	Add override to overriden virtual methods, remove virtual keywords. No functionality change. Changes made by clang-tidy + some manual cleanup. llvm-svn: 217028	2014-09-03 11:41:21 +00:00
Eric Christopher	2f6f860aaa	Reinstate "Nuke the old JIT." Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 216982	2014-09-02 22:28:02 +00:00
Alexey Samsonov	2724968a53	Fix signed integer overflow in PPCInstPrinter. This bug was reported by UBSan. llvm-svn: 216917	2014-09-02 17:38:34 +00:00
Hal Finkel	46b8c3f54e	[PowerPC] Guard against illegal selection of add for TargetConstant operands r208640 was reverted because it caused a self-hosting failure on ppc64. The underlying cause was the formation of ISD::ADD nodes with ISD::TargetConstant operands. Because we have no patterns for 'add' taking 'timm' nodes, these are selected as r+r add instructions (which is a miscompile). Guard against this kind of behavior in the future by making the backend crash should this occur (instead of silently generating invalid output). llvm-svn: 216897	2014-09-02 06:23:54 +00:00
Craig Topper	57c93cf3ef	Remove 'virtual' keyword from methods markedwith 'override' keyword. llvm-svn: 216823	2014-08-30 16:48:34 +00:00
Justin Hibbits	fd1faff02b	Test commit. Fix whitespace from a previous patch of mine. llvm-svn: 216650	2014-08-28 04:40:55 +00:00
Karthik Bhat	d94045aa5a	Allow vectorization of division by uniform power of 2. This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371	2014-08-25 04:56:54 +00:00
Hal Finkel	7014e3d50f	[PowerPC] Add support for dcbtst and icbt (prefetch) Adds code generation support for dcbtst (data cache prefetch for write) and icbt (instruction cache prefetch for read - Book E cores only). We still end up with a 'cannot select' error for the non-supported prefetch intrinsic forms. This will be fixed in a later commit. Fixes PR20692. llvm-svn: 216339	2014-08-23 23:21:04 +00:00
Sanjay Patel	7498546daf	name change: isPow2DivCheap -> isPow2SDivCheap isPow2DivCheap That name doesn't specify signed or unsigned. Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong: srl/add/sra is not the general sequence for signed integer division by power-of-2. We need one more 'sra': sra/srl/add/sra That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case. This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods. No functional change intended. Differential Revision: http://reviews.llvm.org/D5010 llvm-svn: 216237	2014-08-21 22:31:48 +00:00
Tim Northover	9127b613b1	TableGen: allow use of uint64_t for available features mask. ARM in particular is getting dangerously close to exceeding 32 bits worth of possible subtarget features. When this happens, various parts of MC start to fail inexplicably as masks get truncated to "unsigned". Mostly just refactoring at present, and there's probably no way to test. llvm-svn: 215887	2014-08-18 11:49:42 +00:00
Hal Finkel	5f7466abdb	[PowerPC] Mark fixed-offset byvals as pointed-to by IR values A byval object, even if allocated at a fixed offset (prescribed by the ABI) is pointed to by IR values. Most fixed-offset stack objects are not pointed-to by IR values, so the default is to assume this is not possible. However, we need to override the default in this case (instruction scheduling can cause miscompiles otherwise). Fixes PR20280. llvm-svn: 215795	2014-08-16 00:17:05 +00:00
Hal Finkel	519c3f0279	[PowerPC] Darwin byval arguments are not immutable On PPC/Darwin, byval arguments occur at fixed stack offsets in the callee's frame, but are not immutable -- the pointer value is directly available to the higher-level code as the address of the argument, and the value of the byval argument can be modified at the IR level. This is necessary, but not sufficient, to fix PR20280. When PR20280 is fixed in a follow-up commit, its test case will cover this change. llvm-svn: 215793	2014-08-16 00:16:29 +00:00
Rafael Espindola	80b0c0c71c	Remove HasLEB128. We already require CFI, so it should be safe to require .leb128 and .uleb128. llvm-svn: 215712	2014-08-15 14:01:07 +00:00
Benjamin Kramer	9d3803050f	PPC: Clean up pointer casting, no functionality change. Silences GCC's -Wcast-qual. llvm-svn: 215703	2014-08-15 11:05:45 +00:00
Bill Schmidt	c5f23638d5	[PPC64] Add missing dependency on X2 to LDinto_toc. The LDinto_toc pattern has been part of 64-bit PowerPC for a long time, and represents loading from a memory location into the TOC register (X2). However, this pattern doesn't explicitly record that it modifies that register. This patch adds the missing dependency. It was very surprising to me that this has never shown up as a problem in the past, and that we only saw this problem recently in a single scenario when building a self-hosted clang. It turns out that in most cases we have another dependency present that keeps the LDinto_toc instruction tied in place. LDinto_toc is used for TOC restore following a call site, so this is a typical sequence: BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X1<imp-def>, ... LDinto_toc 24, %X1 ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use> Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there is a natural anti-dependency between the two that keeps it in place. Therefore we don't usually see a problem. However, in one particular case, one call is followed immediately by another call, and the second call requires a parameter that is a TOC-relative address. This is the code sequence: BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ... LDinto_toc 24, %X1 ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use> ADJCALLSTACKDOWN 96, %R1<imp-def>, %R1<imp-use> %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39 %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39 Note that the back-to-back stack adjustments are the same size! The back end is smart enough to recognize this and optimize them away: BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ... LDinto_toc 24, %X1 %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39 %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39 Now there is nothing to prevent the ADDIStocHA instruction from moving ahead of the LDinto_toc instruction, and because of the longest-path heuristic, this is what happens. With the accompanying patch, %X2 is represented as an implicit def: BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ... LDinto_toc 24, %X1, %X2<imp-def,dead> ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use> ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use> %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39 %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39 So now when the two stack adjustments are removed, ADDIStocHA is prevented from being moved above LDinto_toc. I have not yet created a test case for this, because the original failure occurs on a relatively large function that needs reduction. However, this is a fairly serious bug, despite its infrequency, and I wanted to get this patch onto the list as soon as possible so that it can be considered for a 3.5 backport. I'll work on whittling down a test case. Have we missed the boat for 3.5 at this point? Thanks, Bill llvm-svn: 215685	2014-08-15 01:25:26 +00:00
Benjamin Kramer	da144ed5a2	Canonicalize header guards into a common format. Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful) Changes made by clang-tidy with minor tweaks. llvm-svn: 215558	2014-08-13 16:26:38 +00:00
Hal Finkel	97fb1d4d91	[PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store intrinsics. As with the construction of the MachineMemOperands for the intrinsic calls used for unaligned load/store lowering, the only slight complication is that we need to represent a larger memory range than the loaded/stored value-type size (because the address is rounded down to an aligned address, and we need to conservatively represent the entire possible range of the actual access). This required adding an extra size field to TargetLowering::IntrinsicInfo, and this was done in a way that required no modifications to other targets (the size defaults to the store size of the provided memory data type). This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed). llvm-svn: 215512	2014-08-13 01:15:40 +00:00
Joerg Sonnenberger	4a2da5b0e2	@l and friends adjust their value depending the context used in. For ori, they are unsigned, for addi, signed. Create a new target expression type to handle this and evaluate Fixups accordingly. llvm-svn: 215315	2014-08-10 12:41:50 +00:00
Joerg Sonnenberger	df2012d0aa	If available, pass down the Fixup object to EvaluateAsRelocatable. At least on PowerPC, the interpretation of certain modifiers depends on the context they appear in. llvm-svn: 215310	2014-08-10 11:35:12 +00:00
Joerg Sonnenberger	f744f70919	Allow the third argument for the subi family to be an expression. llvm-svn: 215286	2014-08-09 17:10:26 +00:00
Joerg Sonnenberger	a632bc8df4	Use the full form of dccci and iccci from the early PPC 405 documents, since the operands are actually used on those cores. Provide aliases for the only documented case in the newer Power ISA speec. llvm-svn: 215282	2014-08-09 13:58:31 +00:00
Eric Christopher	3461eab467	Initialize PPC DataLayout based on the Triple only. llvm-svn: 215281	2014-08-09 04:53:17 +00:00
Eric Christopher	846e6d954c	Remove extraneous 64-bit argument to the PPC TargetMachine constructor and update initialization. llvm-svn: 215280	2014-08-09 04:38:56 +00:00
Joerg Sonnenberger	71ad99879f	Allow large immediates for branch instructions in 32bit mode. llvm-svn: 215240	2014-08-08 20:57:58 +00:00
Joerg Sonnenberger	8e42fa93a6	Provide an implementation of getNoopForMachoTarget for PPC, otherwise empty functions will assert in the MC object writer. llvm-svn: 215238	2014-08-08 19:13:23 +00:00
Joerg Sonnenberger	4df95fac67	Add low-level option for avoiding float stores from va_start until soft-float is properly supported. llvm-svn: 215221	2014-08-08 16:46:10 +00:00
Joerg Sonnenberger	ca56042175	Add support for SPE load/store from memory. llvm-svn: 215220	2014-08-08 16:43:49 +00:00
Eric Christopher	378bc328f0	Temporarily Revert "Nuke the old JIT." as it's not quite ready to be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate. Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reverts commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 215154	2014-08-07 22:02:54 +00:00
Joerg Sonnenberger	857466a57b	Add the majority of the remaining SPE instructions. llvm-svn: 215131	2014-08-07 18:52:39 +00:00
Rafael Espindola	e9ebbe5559	Nuke the old JIT. I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start. Thanks to Lang Hames for making MCJIT a good replacement! llvm-svn: 215111	2014-08-07 14:21:18 +00:00
Joerg Sonnenberger	c6357b6edc	Add mfasr and mtasr llvm-svn: 215110	2014-08-07 13:35:34 +00:00
Joerg Sonnenberger	57c4e076f0	Add mfrtcu and mfrtcl instructions llvm-svn: 215109	2014-08-07 13:16:58 +00:00
Joerg Sonnenberger	91c0c415d7	Support mttbl and mttbu mnemonic llvm-svn: 215108	2014-08-07 13:06:23 +00:00
Joerg Sonnenberger	22c67059fa	Add RFID instruction. llvm-svn: 215105	2014-08-07 12:39:59 +00:00
Joerg Sonnenberger	2b751a0bc6	Fix Itineray class of rfi llvm-svn: 215104	2014-08-07 12:35:16 +00:00
Joerg Sonnenberger	9980cac0b2	Spell e500 feature in lower case. llvm-svn: 215103	2014-08-07 12:31:28 +00:00
Joerg Sonnenberger	d344d15c74	Add first bunch of SPE instructions. As they overlap with Altivec, mark them as parser-only until the disassembler is extended to handle predicates properly. llvm-svn: 215102	2014-08-07 12:18:21 +00:00
Eric Christopher	4a1cdb2ba7	Remove the target machine from CCState. Previously it was only used to get the subtarget and that's accessible from the MachineFunction now. This helps clear the way for smaller changes where we getting a subtarget will require passing in a MachineFunction/Function as well. llvm-svn: 214988	2014-08-06 18:45:26 +00:00
Rafael Espindola	c981388b03	Remove a virtual function from TargetMachine. NFC. llvm-svn: 214929	2014-08-05 22:10:21 +00:00
Bill Schmidt	8159f9047b	[PowerPC] Swap arguments and adjust shift count for vsldoi on little endian Commits r213915 and r214718 fix recognition of shuffle masks for vmrg* and vpku*um instructions for a little-endian target, by swapping the input arguments. The vsldoi instruction requires similar treatment, and also needs its shift count adjusted for little endian. Reviewed by Ulrich Weigand. This is a bug fix candidate for release 3.5 (and hopefully the last of those for PowerPC). llvm-svn: 214923	2014-08-05 20:47:25 +00:00
Joerg Sonnenberger	6eb6423316	Add accessors for the PPC 403 bank registers. llvm-svn: 214875	2014-08-05 15:45:15 +00:00
Joerg Sonnenberger	cc8f67da81	Accessors for SSR2 and SSR3 on PPC 403. llvm-svn: 214867	2014-08-05 14:53:05 +00:00
Joerg Sonnenberger	bc6e9b7c12	Add dci/ici instructions for PPC 476 and friends. llvm-svn: 214864	2014-08-05 14:40:32 +00:00
Joerg Sonnenberger	f726eed55e	Add mftblo and mftbhi for PPC 4xx. llvm-svn: 214863	2014-08-05 14:18:16 +00:00
Joerg Sonnenberger	d97414d887	Add lswi / stswi for assembler use with a warning to not add patterns for them. llvm-svn: 214862	2014-08-05 13:34:01 +00:00
Eric Christopher	67c04e77e5	Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838	2014-08-05 02:39:49 +00:00
Joerg Sonnenberger	c84fe6e931	Add TCR register access llvm-svn: 214826	2014-08-04 23:53:42 +00:00
Joerg Sonnenberger	cd34d01e29	Add PPC 603's tlbld and tlbli instructions. llvm-svn: 214825	2014-08-04 23:49:45 +00:00
Bill Schmidt	5e2cd3791c	[PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi My original LE implementation of the vsldoi instruction, with its altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect shufflevector operations in the LLVM IR. Correct code is generated because the back end handles the incorrect shufflevector in a consistent manner. This patch and a companion patch for Clang correct this problem by removing the fixup from altivec.h and the corresponding fixup from the PowerPC back end. Several test cases are also modified to reflect the now-correct LLVM IR. llvm-svn: 214800	2014-08-04 23:21:01 +00:00
Joerg Sonnenberger	b94b28812d	Add simplified aliases for access to DCCR, ICCR, DEAR and ESR llvm-svn: 214797	2014-08-04 22:56:42 +00:00
Joerg Sonnenberger	0f4e0ad62e	tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs. llvm-svn: 214784	2014-08-04 21:28:22 +00:00
Eric Christopher	99307e99a2	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. llvm-svn: 214781	2014-08-04 21:25:23 +00:00
Joerg Sonnenberger	c3da10cbbc	Recognize mftbl as alias for mftb, for symmetry with mttb. llvm-svn: 214769	2014-08-04 20:28:34 +00:00
Joerg Sonnenberger	3be1146503	Rename PPCLinuxMCAsmInfo to PPCELFMCAsmInfo to better reflect the systems it represents. llvm-svn: 214755	2014-08-04 18:46:13 +00:00
Joerg Sonnenberger	660877af60	Allow .lcomm with alignment on ELF targets. llvm-svn: 214754	2014-08-04 18:45:10 +00:00
Joerg Sonnenberger	51a8467117	Refactor SPRG instructions. llvm-svn: 214733	2014-08-04 17:26:15 +00:00
Joerg Sonnenberger	f6049406d6	Add support for m[ft][di]bat[ul] instructions. llvm-svn: 214731	2014-08-04 17:07:41 +00:00
Joerg Sonnenberger	b8c7e54901	Add features for PPC 4xx and e500/e500mc instructions. Move the test cases for them into separate files. llvm-svn: 214724	2014-08-04 15:47:38 +00:00
Ulrich Weigand	5df23aacbe	[PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl* by swapping the input arguments. As it turns out, the exact same fix is also required for the vpkuhum/vpkuwum patterns. This fixes another regression in llvmpipe when vector support is enabled. Reviewed by Bill Schmidt. llvm-svn: 214718	2014-08-04 13:53:40 +00:00
Ulrich Weigand	bf94969247	[PowerPC] MULHU/MULHS are not legal for vector types I ran into some test failures where common code changed vector division by constant into a multiply-high operation (MULHU). But these are not implemented by the back-end, so we failed to recognize the insn. Fixed by marking MULHU/MULHS as Expand for vector types. llvm-svn: 214716	2014-08-04 13:27:12 +00:00
Ulrich Weigand	32b8ceb243	[PowerPC] Fix and improve vector comparisons This patch refactors code generation of vector comparisons. This fixes a wrong code-gen bug for ISD::SETGE for floating-point types, and improves generated code for vector comparisons in general. Specifically, the patch moves all logic deciding how to implement vector comparisons into getVCmpInst, which gets two extra boolean outputs indicating to its caller whether its needs to swap the input operands and/or negate the result of the comparison. Apart from implementing these two modifications as directed by getVCmpInst, there is no need to ever implement vector comparisons in any other manner; in particular, there is never a need to perform two separate comparisons (e.g. one for equal and one for greater-than, as code used to do before this patch). Reviewed by Bill Schmidt. llvm-svn: 214714	2014-08-04 13:13:57 +00:00
Joerg Sonnenberger	1253396c1a	tlbia support llvm-svn: 214640	2014-08-02 20:16:29 +00:00
Joerg Sonnenberger	e208527e31	mfdcr / mtdcr support llvm-svn: 214639	2014-08-02 20:00:26 +00:00
Joerg Sonnenberger	ca3cb82714	Don't use additional arguments for dss and friends to satisfy DSS_Form, when let can do the same thing. Keep the 64bit variants as codegen-only. While they have a different register class, the encoding is the same for 32bit and 64bit mode. Having both present would otherwise confuse the disassembler. llvm-svn: 214636	2014-08-02 15:09:41 +00:00
Eric Christopher	dfc6457da2	Add a non-const subtarget returning function to the target machine so that we can use it to get the old-style JIT out of the subtarget. This code should be removed when the old-style JIT is removed (imminently). llvm-svn: 214560	2014-08-01 21:18:01 +00:00
Ulrich Weigand	9dae728acd	[PowerPC] PR20280 - Slots for byval parameters are not immutable Found by inspection while looking at PR20280: code would mark slots in the parameter save area where a byval parameter is passed as "immutable". This is not correct since code is allowed to modify byval parameters in place in the parameter save area. llvm-svn: 214517	2014-08-01 14:35:58 +00:00
Hal Finkel	f46410e6eb	[PowerPC] Generate unaligned vector loads using intrinsics instead of regular loads Altivec vector loads on PowerPC have an interesting property: They always load from an aligned address (by rounding down the address actually provided if necessary). In order to generate an actual unaligned load, you can generate two load instructions, one with the original address, one offset by one vector length, and use a special permutation to extract the bytes desired. When this was originally implemented, I generated these two loads using regular ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with this: The alignment of a load does not contribute to its identity, and SDNodes are uniqued. So, imagine that we have some unaligned load, L1, that is not aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned). Further imagine that there had already existed a load (L1+16)(unaligned) with the same chain operand as the load L1. When (L1+16)(aligned) is created as part of the lowering of L1, this load is also the (L1+16)(unaligned) node, just now marked as aligned (because the new alignment overwrites the old). But the original users of (L1+16)(unaligned) now get the data intended for the permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists to get its own permutation-based expansion. This was PR19991. A second potential problem has to do with the MMOs on these loads, which can be used by AA during instruction scheduling to break chain-based dependencies. If the new "aligned" loads get the MMO from the original unaligned load, this does not represent the fact that it will load data from below the original address. Normally, this would not matter, but this load might be combined with another load pair for a previous vector, and then the dependency on the otherwise- ignored lower bytes can matter. To fix both problems, instead of generating the necessary loads using regular ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are provided with MMOs with a conservative address range. Unfortunately, I no longer have a failing test case (since PR19991 was reported, other changes in CodeGen have forced this bug back into hiding it again). Nevertheless, this should fix the underlying problem. llvm-svn: 214481	2014-08-01 05:20:41 +00:00
Hal Finkel	3be61a8b81	[PowerPC] Recognize consecutive memory accesses from intrinsics When generating unaligned vector loads, we need to search for other loads or stores nearby offset by one vector width. If we find one, then we know that we can safely generate another aligned load at that address. Otherwise, we must generate the next load using an offset of the vector width minus one byte (so we don't read off the end of the allocation if the base unaligned address happened to be aligned at runtime). We had previously done this using only other vector loads and stores, but did not consider the PowerPC-specific vector load/store intrinsics. Now we'll also consider vector intrinsics. By itself, this change is a feature enhancement, but is a necessary step toward fixing the underlying problem behind PR19991. llvm-svn: 214469	2014-08-01 01:02:01 +00:00
Louis Gerbarg	8048e52537	Make sure no loads resulting from load->switch DAGCombine are marked invariant Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reoarder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449	2014-07-31 21:45:05 +00:00
Joerg Sonnenberger	a5a8f9d13c	Add mtpid/mfpid for BookE. llvm-svn: 214363	2014-07-30 23:59:11 +00:00
Joerg Sonnenberger	3845a9a9b8	Refactor TLBIVAX and add tlbsx. llvm-svn: 214354	2014-07-30 22:51:15 +00:00
Joerg Sonnenberger	acae7c8f64	Add rfdi and rfmci from the e500/e500mc ISA. llvm-svn: 214339	2014-07-30 21:09:03 +00:00
Joerg Sonnenberger	e81540b210	Add BookE's tlbre, tlbwe and tlbivax instructions. llvm-svn: 214332	2014-07-30 20:44:04 +00:00
Joerg Sonnenberger	270c5a66fb	Add BookE's wrtee and wrteei instructions. llvm-svn: 214297	2014-07-30 10:32:51 +00:00
Joerg Sonnenberger	577206124d	SPRG 0 to 3 are valid outside BookE, so move them to the normal test file. Add support for accessing SPRG 4 to 7 on BookE. llvm-svn: 214295	2014-07-30 09:24:37 +00:00
Joerg Sonnenberger	db7b2c7644	Add rfci instruction. llvm-svn: 214256	2014-07-29 23:45:20 +00:00
Joerg Sonnenberger	abd973f309	mbar without argument is equivalent to mbar 0. llvm-svn: 214250	2014-07-29 23:31:27 +00:00
Joerg Sonnenberger	afb8bfcb47	Recognize BookE's mbar instruction. llvm-svn: 214244	2014-07-29 23:16:31 +00:00
Joerg Sonnenberger	5dd0828a9c	Fix typo in alias: DSIR -> DSISR llvm-svn: 214238	2014-07-29 22:42:44 +00:00
Joerg Sonnenberger	d52d4c80b5	Support move to/from segment register. llvm-svn: 214234	2014-07-29 22:21:57 +00:00
Joerg Sonnenberger	f3709585f9	Add a number of aliases for SPR access. llvm-svn: 214196	2014-07-29 18:55:43 +00:00
Joerg Sonnenberger	c3bec0cdc8	Add rfi instruction. Based on feedback by Ulrich Weigand. llvm-svn: 214181	2014-07-29 15:49:09 +00:00
Matt Arsenault	55d94a2290	Fix typos / grammar. llvm-svn: 214147	2014-07-29 00:02:40 +00:00
Ulrich Weigand	37cf88e787	[PowerPC] Support ELFv1/ELFv2 ABI selection via features While LLVM now supports both ELFv1 and ELFv2 ABIs, their use is currently hard-coded via the target triple: powerpc64-linux is always ELFv1, while powerpc64le-linux is always ELFv2. These are of course the most common scenarios, but in principle it is possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on little-endian systems (and GCC does support that), and there are some special use cases for that (e.g. certain Linux kernel versions could only be built using ELFv1 on LE). This patch implements the LLVM side of supporting this. As precedent on other platforms suggests, ABI options are passed to the back-end as features. Thus, this patch implements two features "elfv1" and "elfv2" that select the desired ABI if present. (If not, the LLVM uses the same default rules as now.) llvm-svn: 214072	2014-07-28 13:09:28 +00:00
Matt Arsenault	76c7b7a591	Add alignment value to allowsUnalignedMemoryAccess Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. llvm-svn: 214055	2014-07-27 17:46:40 +00:00
Hal Finkel	ba36f6399d	[PowerPC] Support TLS on PPC32/ELF Patch by Justin Hibbits! llvm-svn: 213960	2014-07-25 17:47:22 +00:00
Bill Schmidt	1861624c2e	[PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl. Because the PowerPC vmrgh and vmrgl* instructions have a built-in big-endian bias, it is necessary to swap their inputs in little-endian mode when using them to implement a vector shuffle. This was previously missed in the vector LE implementation. There was already logic to distinguish between unary and "normal" vmrg* vector shuffles, so this patch extends that logic to use a third option: "swapped" vmrg* vector shuffles that are used for little endian in place of the "normal" ones. I've updated the vec-shuffle-le.ll test to check for the expected register ordering on the generated instructions. This bug was discovered when testing the LE and ELFv2 patches for safety if they were backported to 3.4. A different vectorization decision was made in 3.4 than on mainline trunk, and that exposed the problem. I've verified this fix takes care of that issue. llvm-svn: 213915	2014-07-25 01:55:55 +00:00
Joerg Sonnenberger	97f682d3e2	Don't use 128bit functions on PPC32. llvm-svn: 213899	2014-07-24 22:20:10 +00:00
NAKAMURA Takumi	509350ce96	Prune dependency to MC from each target disassembler. llvm-svn: 213856	2014-07-24 11:45:11 +00:00
NAKAMURA Takumi	dbff653a5a	Update library dependencies. llvm-svn: 213832	2014-07-24 02:10:42 +00:00
Ulrich Weigand	eb914f2256	[PowerPC] ELFv2 aggregate passing support This patch adds infrastructure support for passing array types directly. These can be used by the front-end to pass aggregate types (coerced to an appropriate array type). The details of the array type being used inform the back-end about ABI-relevant properties. Specifically, the array element type encodes: - whether the parameter should be passed in FPRs, VRs, or just GPRs/stack slots (for float / vector / integer element types, respectively) - what the alignment requirements of the parameter are when passed in GPRs/stack slots (8 for float / 16 for vector / the element type size for integer element types) -- this corresponds to the "byval align" field Using the infrastructure provided by this patch, a companion patch to clang will enable two features: - In the ELFv2 ABI, pass (and return) "homogeneous" floating-point or vector aggregates in FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI) - As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates that fit fully in registers without using the "byval" mechanism The patch uses the functionArgumentNeedsConsecutiveRegisters callback to encode that special treatment is required for all directly-passed array types. The isInConsecutiveRegs / isInConsecutiveRegsLast bits set as a results are then used to implement the required size and alignment rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc. As a related change, the ABI routines have to be modified to support passing floating-point types in GPRs. This is necessary because with homogeneous aggregates of 4-byte float type we can now run out of FPRs before we run out of the 64-byte argument save area that is shadowed by GPRs. Any extra floating-point arguments that no longer fit in FPRs must now be passed in GPRs until we run out of those too. Note that there was already code to pass floating-point arguments in GPRs used with vararg parameters, which was done by writing the argument out to the argument save area first and then reloading into GPRs. The patch re-implements this, however, in favor of code packing float arguments directly via extension/truncation, BITCAST, and BUILD_PAIR operations. This is required to support the ELFv2 ABI, since we cannot unconditionally write to the argument save area (which the caller might not have allocated). The change does, however, affect ELFv1 varags routines too; but even here the overall effect should be advantageous: Instead of loading the argument into the FPR, then storing the argument to the stack slot, and finally reloading the argument from the stack slot into a GPR, the new code now just loads the argument into the FPR, and subsequently loads the argument into the GPR (via BITCAST). That BITCAST might imply a save/reload from a stack temporary (in which case we're no worse than before); but it might be implemented more efficiently in some cases. The final part of the patch enables up to 8 FPRs and VRs for argument return in PPCCallingConv.td; this is required to support returning ELFv2 homogeneous aggregates. (Note that this doesn't affect other ABIs since LLVM wil only look for which register to use if the parameter is marked as "direct" return anyway.) Reviewed by Hal Finkel. llvm-svn: 213493	2014-07-21 00:13:26 +00:00
Ulrich Weigand	9fcc5caf2d	[PowerPC] ELFv2 explicit CFI for CR fields This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI would represent a saved CR word (holding CR fields CR2, CR3, and CR4) using just a single CFI record refering to CR2. In ELFv2 instead, each of the CR fields is represented by its own CFI record. The advantage is that the compiler can now chose to save just a single (or two) CR fields instead of all of them, if those are the only ones that actually need saving. That can lead to more efficient code using mf(o)crf instead of the (slow) mfcr instruction. Note that this patch does not (yet) implement this more efficient code generation, but it does implement the part that is required to be ABI compliant: creating multiple CFI records if multiple CR fields are saved. Reviewed by Hal Finkel. llvm-svn: 213492	2014-07-21 00:03:18 +00:00
Ulrich Weigand	fb90fdfb31	[PowerPC] ELFv2 stack space reduction The ELFv2 ABI reduces the amount of stack required to implement an ABI-compliant function call in two ways: * the "linkage area" is reduced from 48 bytes to 32 bytes by eliminating two unused doublewords * the 64-byte "parameter save area" is now optional and need not be present in certain cases (it remains mandatory in functions with variable arguments, and functions that have any parameter that is passed on the stack) The following patch implements this required changes: - reducing the linkage area, and associated relocation of the TOC save slot, in getLinkageSize / getTOCSaveOffset (this requires updating all callers of these routines to pass in the isELFv2ABI flag). - (partially) handling the case where the parameter save are is optional This latter part requires some extra explanation: Currently, we still always allocate the parameter save area when calling a function. That is certainly always compliant with the ABI, but may cause code to allocate stack unnecessarily. This can be addressed by a follow-on optimization patch. On the callee side, in LowerFormalArguments, we must track correctly whether the ABI guarantees that the caller has allocated the parameter save area for our use, and the patch does so. However, there is one complication: the code that handles incoming "byval" arguments will currently always write to the parameter save area, because it has to force incoming register arguments to the stack since it must return an address to implement the byval semantics. To fix this, the patch changes the LowerFormalArguments code to write arguments to a freshly allocated stack slot on the function's own stack frame instead of the argument save area in those cases where that area is not present. Reviewed by Hal Finkel. llvm-svn: 213490	2014-07-20 23:43:15 +00:00
Ulrich Weigand	41e116ee77	[PowerPC] ELFv2 function call changes This patch builds upon the two preceding MC changes to implement the basic ELFv2 function call convention. In the ELFv1 ABI, a "function descriptor" was associated with every function, pointing to both the entry address and the related TOC base (and a static chain pointer for nested functions). Function pointers would actually refer to that descriptor, and the indirect call sequence needed to load up both entry address and TOC base. In the ELFv2 ABI, there are no more function descriptors, and function pointers simply refer to the (global) entry point of the function code. Indirect function calls simply branch to that address, after loading it up into r12 (as required by the ABI rules for a global entry point). Direct function calls continue to just do a "bl" to the target symbol; this will be resolved by the linker to the local entry point of the target function if it is local, and to a PLT stub if it is global. That PLT stub would then load the (global) entry point address of the final target into r12 and branch to it. Note that when performing a local function call, r2 must be set up to point to the current TOC base: if the target ends up local, the ABI requires that its local entry point is called with r2 set up; if the target ends up global, the PLT stub requires that r2 is set up. This patch implements all LLVM changes to implement that scheme: - No longer create a function descriptor when emitting a function definition (in EmitFunctionEntryLabel) - Emit two entry points if the function needs the TOC base (r2) anywhere (this is done EmitFunctionBodyStart; note that this cannot be done in EmitFunctionBodyStart because the global entry point prologue code must be part of the function as covered by debug info). - In order to make use tracking of r2 (as needed above) work correctly, mark direct function calls as implicitly using r2. - Implement the ELFv2 indirect function call sequence (no function descriptors; load target address into r12). - When creating an ELFv2 object file, emit the .abiversion 2 directive to tell the linker to create the appropriate version of PLT stubs. Reviewed by Hal Finkel. llvm-svn: 213489	2014-07-20 23:31:44 +00:00
Ulrich Weigand	28c7be6585	[MC] Pass MCSymbolData to needsRelocateWithSymbol As discussed in a previous checking to support the .localentry directive on PowerPC, we need to inspect the actual target symbol in needsRelocateWithSymbol to make the appropriate decision based on that symbol's st_other bits. Currently, needsRelocateWithSymbol does not get the target symbol. However, it is directly available to its sole caller. This patch therefore simply extends the needsRelocateWithSymbol by a new parameter "const MCSymbolData &SD", passes in the target symbol, and updates all derived implementations. In particular, in the PowerPC implementation, this patch removes the FIXME added by the previous checkin. llvm-svn: 213487	2014-07-20 23:15:06 +00:00
Ulrich Weigand	1376d19372	[PowerPC] ELFv2 MC support for .localentry directive A second binutils feature needed to support ELFv2 is the .localentry directive. In the ELFv2 ABI, functions may have two entry points: one for calling the routine locally via "bl", and one for calling the function via function pointer (either at the source level, or implicitly via a PLT stub for global calls). The two entry points share a single ELF symbol, where the ELF symbol address identifies the global entry point address, while the local entry point is found by adding a delta offset to the symbol address. That offset is encoded into three platform-specific bits of the ELF symbol st_other field. The .localentry directive instructs the assembler to set those fields to encode a particular offset. This is typically used by a function prologue sequence like this: func: addis r2, r12, (.TOC.-func)@ha addi r2, r2, (.TOC.-func)@l .localentry func, .-func Note that according to the ABI, when calling the global entry point, r12 must be set to point the global entry point address itself; while when calling the local entry point, r2 must be set to point to the TOC base. The two instructions between the global and local entry point in the above example translate the first requirement into the second. This patch implements support in the PowerPC MC streamers to emit the .localentry directive (both into assembler and ELF object output), as well as support in the assembler parser to parse that directive. In addition, there is another change required in MC fixup/relocation handling to properly deal with relocations targeting function symbols with two entry points: When the target function is known local, the MC layer would immediately handle the fixup by inserting the target address -- this is wrong, since the call may need to go to the local entry point instead. The GNU assembler handles this case by not directly resolving fixups targeting functions with two entry points, but always emits the relocation and relies on the linker to handle this case correctly. This patch changes LLVM MC to do the same (this is done via the processFixupValue routine). Similarly, there are cases where the assembler would normally emit a relocation, but "simplify" it to a relocation targeting a section instead of the actual symbol. For the same reason as above, this may be wrong when the target symbol has two entry points. The GNU assembler again handles this case by not performing this simplification in that case, but leaving the relocation targeting the full symbol, which is then resolved by the linker. This patch changes LLVM MC to do the same (via the needsRelocateWithSymbol routine). NOTE: The method used in this patch is overly pessimistic, since the needsRelocateWithSymbol routine currently does not have access to the actual target symbol, and thus must always assume that it might have two entry points. This will be improved upon by a follow-on patch that modifies common code to pass the target symbol when calling needsRelocateWithSymbol. Reviewed by Hal Finkel. llvm-svn: 213485	2014-07-20 23:06:03 +00:00
Ulrich Weigand	22549677c8	[PowerPC] ELFv2 MC support for .abiversion directive ELFv2 binaries are marked by a bit in the ELF header e_flags field. A new assembler directive .abiversion can be used to set that flag. This patch implements support in the PowerPC MC streamers to emit the .abiversion directive (both into assembler and ELF binary output), as well as support in the assembler parser to parse the .abiversion directive. Reviewed by Hal Finkel. llvm-svn: 213484	2014-07-20 22:56:57 +00:00
Ulrich Weigand	d1393de80e	[PowerPC] Refactor byval handling in LowerFormalArguments_64SVR4 When handling an incoming byval argument, we need to possibly write incoming registers to the stack in order to create an on-stack image of the parameter, so we can return its address to common code. This currently uses CreateFixedObject to access the parts of the parameter save area where the argument is (or needs to be) stored. However, sometimes we need to access multiple parts of that area, e.g. to write multiple registers. The code currently uses a new CreateFixedObject call for each of these accesses, resulting in a patchwork of overlapping (fixed) stack objects. This doesn't really matter in the case of fixed objects, since any access to those turns into a fixed stackpointer + offset address anyway. However, with the upcoming ELFv2 patches, we may actually need to place an incoming argument into our own stack frame instead of the caller's. This means we need to use CreateStackObject instead, and we cannot have multiple overlapping instances of those. To make the rest of the argument handling code work equally in both situations, this patch refactors it to always use just a single call to CreateFixedObject, and access parts of that object as required using address arithmetic. This way, we can in a future patch substitute CreateStackObject without further changes. No change to generated code intended. llvm-svn: 213483	2014-07-20 22:36:52 +00:00
Ulrich Weigand	c274d8aae7	[PowerPC] Fix FrameIndex handling in SelectAddressRegImm The PPCTargetLowering::SelectAddressRegImm routine needs to handle FrameIndex nodes in a special manner, by tranlating them into a TargetFrameIndex node. This was done in most cases, but seems to have been neglected in one path: when the input tree has an OR of the FrameIndex with an immediate. This can happen if the FrameIndex can be proven to be sufficiently aligned that an OR of that immediate is equivalent to an ADD. The missing handling of FrameIndex in that case caused the SelectionDAG instruction selection to miss opportunities to merge the OR back into the FrameIndex node, leading to superfluous addi/ori instructions in the final assembler output. llvm-svn: 213482	2014-07-20 22:26:40 +00:00
Hal Finkel	006e1d44a6	[PowerPC] 32-bit ELF PIC support This adds initial support for PPC32 ELF PIC (Position Independent Code; the -fPIC variety), thus rectifying a long-standing deficiency in the PowerPC backend. Patch by Justin Hibbits! llvm-svn: 213427	2014-07-18 23:29:49 +00:00
Sanjay Patel	2f0f025b2b	Move Post RA Scheduling flag bit into SchedMachineModel Refactoring; no functional changes intended Removed PostRAScheduler bits from subtargets (X86, ARM). Added PostRAScheduler bit to MCSchedModel class. This bit is set by a CPU's scheduling model (if it exists). Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses. Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!). Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling. Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values. Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. c. PPC overrides the CPU's postRA settings by enabling postRA for everything. d. X86 is the only target that actually has postRA specified via sched model info. Differential Revision: http://reviews.llvm.org/D4217 llvm-svn: 213101	2014-07-15 22:39:58 +00:00
NAKAMURA Takumi	365309bb8b	Prune Redundant libdeps in CMake's target_link_libraries and LLVMBuild.txt. I checked this with Release+Asserts on x86_64-mingw32. Please restore partially if this were overkill. llvm-svn: 213064	2014-07-15 11:37:03 +00:00
Ulrich Weigand	1aae06e395	[PowerPC] Fix invalid displacement created by LocalStackAlloc This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that could result in the LocalStackAlloc pass creating an MI instruction out-of-range displacement: %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32) %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31 (In final assembler output the top bits are stripped off, resulting in a negative offset loading from below the stack pointer.) Common code expects the isFrameOffsetLegal routine to verify whether adding a given offset to the offset already present in the instruction results in a valid displacement. However, on PowerPC the routine did not take the already present instruction offset into account. This commit fixes isFrameOffsetLegal to add the instruction offset, and updates a local caller (needsFrameBaseReg) to no longer add the instruction offset itself before calling isFrameOffsetLegal. Reviewed by Hal Finkel. llvm-svn: 212832	2014-07-11 17:19:31 +00:00
Ulrich Weigand	1078e34b76	[PowerPC] Implement atomic NAND operations as actual NAND This changes the implementation of atomic NAND operations from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)" (compatible with GCC >= 4.4). This is in line with the common-code and ARM back-end change implemented in r212433. llvm-svn: 212547	2014-07-08 16:16:02 +00:00
Ulrich Weigand	5e8a0b585b	[PowerPC] Fix no-assert build r212476 caused a compile failure (unused variable) in a non-assertion build ... llvm-svn: 212477	2014-07-07 19:39:44 +00:00
Ulrich Weigand	8f2aff0b0c	[PowerPC] Fix "byval align" arguments Arguments passed as "byval align" should get the specified alignment in the parameter save area. There was some code in PPCISelLowering.cpp that attempted to implement this, but this didn't work correctly: while code did update the ArgOffset value, it neglected to update the PtrOff value (which was already computed from the old ArgOffset), and it also neglected to update GPR_idx -- fields skipped due to alignment in the save area must likewise be skipped in GPRs. This patch fixes and simplifies this logic by: - handling argument offset alignment right at the beginning of argument processing, using a new helper routine CalculateStackSlotAlignment (this avoids having to update PtrOff and other derived values later on) - not tracking GPR_idx separately, but always computing the correct GPR_idx for each argument from its ArgOffset - removing some redundant computation in LowerFormalArguments: MinReservedArea must equal ArgOffset after argument processing, so there's no use in computing it twice. [This doesn't change the behavior of the current clang front-end, since that never creates "byval align" arguments at the moment. This will change with a follow-on patch, however.] llvm-svn: 212476	2014-07-07 19:26:41 +00:00
Juergen Ributzka	241f08ad4e	[DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI. The argument list vector is never used after it has been passed to the CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and pretty much as cheap as keeping a pointer to it. llvm-svn: 212135	2014-07-01 22:01:54 +00:00
Craig Topper	4c15d35f50	Add ops() method to SDNode that returns an ArrayRef<SDUse>. Use it to simplify some code. llvm-svn: 211993	2014-06-29 00:40:57 +00:00
Ulrich Weigand	85c764c15a	[PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex I've run into a bug where current LLVM at -O0 (with fast-isel) generated invalid code like: ld 0, 20936(1) # 8-byte Folded Reload stw 12, 10348(0) stw 12, 10344(0) The underlying vreg had been introduced as base register by the Local Stack Slot Allocation pass. That register was constrained to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match the ADDI instruction used to set it, but it was not constrained to G8RC_NOX0 to fit the use of the register in an address. That should have happened in PPCRegisterInfo::resolveFrameIndex. This patch adds an appropriate constrainRegClass call. Reviewed by Hal Finkel. llvm-svn: 211897	2014-06-27 13:04:12 +00:00
Eric Christopher	04713c650c	Remove extraneous includes from the target machines. llvm-svn: 211800	2014-06-26 19:30:05 +00:00
Will Schmidt	fbbc998175	add ppc64/pwr8 as target includes handling DIR_PWR8 where appropriate The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later. llvm-svn: 211779	2014-06-26 13:36:19 +00:00
Rafael Espindola	196881cf29	Move expression visitation logic up to MCStreamer. Remove the duplicate from MCRecordStreamer. No functionality change. llvm-svn: 211714	2014-06-25 15:45:33 +00:00
Rafael Espindola	0e71694bfa	Simplify the visitation of target expressions. No functionality change. llvm-svn: 211707	2014-06-25 15:29:54 +00:00
Bill Schmidt	bfe90f8c83	[PPC64] Fix PR20071 (fctiduz generated for targets lacking that instruction) PR20071 identifies a problem in PowerPC's fast-isel implementation for floating-point conversion to integer. The fctiduz instruction was added in Power ISA 2.06 (i.e., Power7 and later). However, this instruction is being generated regardless of which 64-bit PowerPC target is selected. The intent is for fast-isel to punt to DAG selection when this instruction is not available. This patch implements that change. For testing purposes, the existing fast-isel-conversion.ll test adds a RUN line for -mcpu=970 and tests for the expected code generation. Additionally, the existing test fast-isel-conversion-p5.ll was found to be incorrectly expecting the unavailable instruction to be generated. I've removed these test variants since we have adequate coverage in fast-isel-conversion.ll. llvm-svn: 211627	2014-06-24 20:05:18 +00:00
Ulrich Weigand	b36d130d19	[PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize As of r211495, the only remaining users of getMinCallFrameSize are in core ABI code (LowerFormalParameter / LowerCall). This is actually a good thing, since the details of the parameter save area are ABI specific. With the new ELFv2 ABI in particular, the rules defining the size of the save area will become significantly more complex, so it wouldn't make sense to implement those outside ABI code that has all required information. In preparation, this patch eliminates the getMinCallFrameSize (and associated getMinCallArgumentsSize) routines, and inlines them into all callers. Note that since nearly all call arguments are constant, this allows simplifying the inlined copies to a single line everywhere. No change in generate code expected. llvm-svn: 211497	2014-06-23 14:15:53 +00:00
Ulrich Weigand	ac0412b387	[PowerPC] Allow stack frames without parameter save area The PPCFrameLowering::determineFrameLayout routine currently ensures that every function that allocates a stack frame provides space for the parameter save area (via PPCFrameLowering::getMinCallFrameSize). This is actually not necessary. There may be functions that never call another routine but still allocate a frame; those do not require the parameter save area. In the future, with the ELFv2 ABI, even some routines that do call other functions do not need to allocate the parameter save area. While it is not a bug to allocate the parameter area when it is not needed, it is better to avoid it to save stack space. Note that when any particular function call requires the parameter save area, this space will already have been included by ABI code in the size the CALLSEQ_START insn is annotated with, and therefore included in the size returned by MFI->getMaxCallFrameSize(). This means that determineFrameLayout simply does not need to care about the parameter save area. (It still needs to ensure that every frame provides the linkage area.) This is implemented by this patch. Note that this exposed a bug in the new fast-isel code where the parameter area was not included in the CALLSEQ_START size; this is also fixed. A couple of test cases needed to be adapted for the new (smaller) stack frame size those tests now see. llvm-svn: 211495	2014-06-23 13:47:52 +00:00
Ulrich Weigand	cd08720d8e	[PowerPC] Fix IsDarwin arg in PPCFrameLowering:: calls As remarked in the commit message to r211493, in several places throughout the 64-bit SVR4 ABI code there are calls to PPCFrameLowering::getLinkageSize and getMinCallFrameSize using an incorrect IsDarwin argument of "true". (Some of those were made explicit by the above refactoring patch, others have been there all along.) This patch fixes those places to pass "false" for IsDarwin. No change in generated code expected. llvm-svn: 211494	2014-06-23 13:21:43 +00:00
Ulrich Weigand	186fcfa6d3	[PowerPC] Refactor setMinReservedArea and CalculateParameterAndLinkageAreaSize The PPCISelLowering.cpp routines PPCTargetLowering::setMinReservedArea and CalculateParameterAndLinkageAreaSize are currently used as subroutines from both 64-bit SVR4 and Darwin ABI code. However, the two ABIs are already quite different w.r.t. AltiVec conventions, and they will become more different when the ELFv2 ABI is supported. Also, in general it seems better to disentangle ABI support routines for different ABIs to avoid accidentally affecting one ABI when intending to change only the other. (Actually, the current code strictly speaking already contains a bug: these routines call PPCFrameLowering::getMinCallFrameSize and PPCFrameLowering::getLinkageSize with the IsDarwin parameter set to "true" even on 64-bit SVR4. This bug currently has no adverse effect since those routines always return the same for 64-bit SVR4 and 64-bit Darwin, but it still seems wrong ... I'll fix this in a follow-up commit shortly.) To remove this code sharing, I'm simply inlining both routines into all call sites (there are just two each, one for 64-bit SVR4 and one for Darwin), and simplifying due to constant parameters where possible. A small piece of code that does make sense to share is refactored into the new routine EnsureStackAlignment, now also called from 32-bit SVR4 ABI code. No change in generated code is expected. llvm-svn: 211493	2014-06-23 13:08:27 +00:00
Ulrich Weigand	e1fa36144d	[PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4 Current 64-bit SVR4 code seems to have some remnants of Darwin code in AltiVec argument handing. This had the effect that AltiVec arguments (or subsequent arguments) were not correctly placed in the parameter area in some cases. The correct behaviour with the 64-bit SVR4 ABI is: - All AltiVec arguments take up space in the parameter area, just like any other arguments, whether vararg or not. - They are always 16-byte aligned, skipping a parameter area doubleword (and the associated GPR, if any), if necessary. This patch implements the correct behaviour and adds a test case. (Verified against GCC behaviour via the ABI compat test suite.) llvm-svn: 211492	2014-06-23 12:36:34 +00:00
Ulrich Weigand	3f880bd06e	[PowerPC] Fix small argument stack slot offset for LE When small arguments (structures < 8 bytes or "float") are passed in a stack slot in the ppc64 SVR4 ABI, they must reside in the least significant part of that slot. On BE, this means that an offset needs to be added to the stack address of the parameter, but on LE, the least significant part of the slot has the same address as the slot itself. This changes the PowerPC back-end ABI code to only add the small argument stack slot offset for BE. It also adds test cases to verify the correct behavior on both BE and LE. llvm-svn: 211368	2014-06-20 16:34:05 +00:00
Ulrich Weigand	c393e04673	[PowerPC] Remove unnecessary load of r12 in indirect call When looking at the 64-bit SVR4 indirect call sequence, I noticed an unnecessary load of r12. And indeed the code says: // R12 must contain the address of an indirect callee. But this is not correct; in the 64-bit SVR4 (ELFv1) ABI, there is no need to load r12 at this point. It seems this code and comment is a remnant of code originally shared with the Darwin ABI ... This patch simply removes the unnecessary load. llvm-svn: 211203	2014-06-18 18:33:36 +00:00
Ulrich Weigand	e256aa48b9	[PowerPC] Simplify and improve loading into TOC register During an indirect function call sequence on the 64-bit SVR4 ABI, generate code must load and then restore the TOC register. This does not use a regular LOAD instruction since the TOC register r2 is marked as reserved. Instead, the are two special instruction patterns: let RST = 2, DS = 2 in def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg), "ld 2, 8($reg)", IIC_LdStLD, [(PPCload_toc i64:$reg)]>, isPPC64; let RST = 2, DS = 10, RA = 1 in def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins), "ld 2, 40(1)", IIC_LdStLD, [(PPCtoc_restore)]>, isPPC64; Note that these not only restrict the destination of the load to r2, but they also restrict the source of the load to particular address combinations. The latter is a problem when we want to support the ELFv2 ABI, since there the TOC save slot is no longer at 40(1). This patch replaces those two instructions with a single instruction pattern that only hard-codes r2 as destination, but supports generic addresses as source. This will allow supporting the ELFv2 ABI, and also helps generate more efficient code for calls to absolute addresses (allowing simplification of the ppc64-calls.ll test case). llvm-svn: 211193	2014-06-18 17:52:49 +00:00
Ulrich Weigand	1458f67d3e	[PowerPC] Do not use BLA with the 64-bit SVR4 ABI The PowerPC back-end uses BLA to implement calls to functions at known-constant addresses, which is apparently used for certain system routines on Darwin. However, with the 64-bit SVR4 ABI, this is actually incorrect. An immediate function pointer value on this platform is not directly usable as a target address for BLA: - in the ELFv1 ABI, the function pointer value refers to the function descriptor, not the code address - in the ELFv2 ABI, the function pointer value refers to the global entry point, but BL(A) would only be correct when calling the local entry point This bug didn't show up since using immediate function pointer values is not usually done in the 64-bit SVR4 ABI in the first place. However, I ran into this issue with a certain use case of LLVM as JIT, where immediate function pointer values were uses to implement callbacks from JITted code to helpers in statically compiled code. Fixed by simply not using BLA with the 64-bit SVR4 ABI. llvm-svn: 211174	2014-06-18 16:14:04 +00:00
Ulrich Weigand	cfcf14bd4c	[PowerPC] Fix emitting instruction pairs on LE My patch r204634 to emit instructions in little-endian format failed to handle those special cases where we emit a pair of instructions from a single LLVM MC instructions (like the bl; nop pairs used to implement the call sequence). In those cases, we still need to emit the "first" instruction (the one in the more significant word) first, on both big and little endian, and not swap them. llvm-svn: 211171	2014-06-18 15:37:07 +00:00
Bill Schmidt	b0bab996e0	[PPC64] Fix PR19893 - improve code generation for local function addresses Rafael opened http://llvm.org/bugs/show_bug.cgi?id=19893 to track non-optimal code generation for forming a function address that is local to the compile unit. The existing code was treating both local and non-local functions identically. This patch fixes the problem by properly identifying local functions and generating the proper addis/addi code. I also noticed that Rafael's earlier changes to correct the surrounding code in PPCISelLowering.cpp were also needed for fast instruction selection in PPCFastISel.cpp, so this patch fixes that code as well. The existing test/CodeGen/PowerPC/func-addr.ll is modified to test the new code generation. I've added a -O0 run line to test the fast-isel code as well. Tested on powerpc64[le]-unknown-linux-gnu with no regressions. llvm-svn: 211056	2014-06-16 21:36:02 +00:00
Eric Christopher	2ea2870f36	The hazard recognizer only needs a subtarget, not a target machine so make it take one. Fix up all users accordingly. llvm-svn: 210948	2014-06-13 22:38:52 +00:00
Eric Christopher	72e46a5bfe	Fix typo. llvm-svn: 210947	2014-06-13 22:38:48 +00:00
Eric Christopher	286dd39af0	Move the PPCSelectionDAGInfo off the TargetMachine and onto the subtarget. llvm-svn: 210854	2014-06-12 23:02:32 +00:00
Eric Christopher	0382d62f87	Make PPCSelectionDAGInfo take a DataLayout instead of a TargetMachine since that's all it needs. llvm-svn: 210853	2014-06-12 22:56:48 +00:00
Eric Christopher	42e57db35c	Move PPCTargetLowering off of the TargetMachine and onto the subtarget. llvm-svn: 210852	2014-06-12 22:50:10 +00:00
Eric Christopher	576f8a1a6b	Remove an extraneous this-> to access the subtarget. llvm-svn: 210849	2014-06-12 22:38:20 +00:00
Eric Christopher	d1b1190bd1	Rename PPCSubTarget to Subtarget in PPCTargetLowering for consistency. Also remove an extra local subtarget in the initialization functions. llvm-svn: 210848	2014-06-12 22:38:18 +00:00
Eric Christopher	429be5d609	Move PPCJITInfo off of the TargetMachine and onto the subtarget. Needed to migrate a few functions around to avoid circular header dependencies. llvm-svn: 210845	2014-06-12 22:28:06 +00:00
Eric Christopher	40ba57e7f5	Remove the use of TargetMachine from PPCJITInfo and replace with the subtarget. Also remove unnecessary argument to the constructor at the same time, we already have access via the subtarget. llvm-svn: 210844	2014-06-12 22:19:51 +00:00
Eric Christopher	95b7901fd6	Move PPCInstrInfo off of the target machine and onto the subtarget. llvm-svn: 210839	2014-06-12 22:05:46 +00:00
Eric Christopher	d4532ed073	Remove TargetMachine from PPCInstrInfo and all dependencies and replace with the current subtarget. llvm-svn: 210836	2014-06-12 21:48:52 +00:00
Eric Christopher	fe75c4c997	Move DataLayout from the PPCTargetMachine to the subtarget. llvm-svn: 210824	2014-06-12 21:08:06 +00:00
Eric Christopher	0c6467adf3	Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use the initializeSubtargetDependencies code to obtain an initialized subtarget and migrate a couple of subtarget using functions to the .cpp file to avoid circular includes. llvm-svn: 210822	2014-06-12 20:54:11 +00:00
Eric Christopher	a30bc53a6b	Remove duplicate copy of InstrItineraryData from the TargetMachine, it's already on the subtarget. llvm-svn: 210619	2014-06-11 00:53:17 +00:00
Bill Schmidt	6e11183ad7	[PPC64LE] Recognize shufflevector patterns for little endian Various masks on shufflevector instructions are recognizable as specific PowerPC instructions (vector pack, vector merge, etc.). There is existing code in PPCISelLowering.cpp to recognize the correct patterns for big endian code. The masks for these instructions are different for little endian code due to the big-endian numbering employed by these instructions. This patch adds the recognition code for little endian. I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for this. The existing recognizer test (vec_shuffle.ll) is unnecessarily verbose and difficult to read, so I felt it was better to add a new test rather than modify the old one. llvm-svn: 210536	2014-06-10 14:35:01 +00:00
Bill Schmidt	41cd7375c8	[PPC64LE] Generate correct code for unaligned little-endian vector loads The code in PPCTargetLowering::PerformDAGCombine() that handles unaligned Altivec vector loads generates a lvsl followed by a vperm. As we've seen in numerous other places, the vperm instruction has a big-endian bias, and this is fixed for little endian by complementing the permute control vector and swapping the input operands. In this case the lvsl is providing the permute control vector. Rather than generating an lvsl and a complement operation, it is sufficient to generate an lvsr instruction instead. Thus for LE code generation we will generate an lvsr rather than an lvsl, and swap the other input arguments on the vperm. The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test the code generation for PPC64 and PPC64LE, in addition to the existing PPC32/G5 testing. llvm-svn: 210493	2014-06-09 22:00:52 +00:00
Bill Schmidt	3ff0a8eb8b	[PPC64LE] Generate correct little-endian code for v16i8 multiply The existing code in PPCTargetLowering::LowerMUL() for multiplying two v16i8 values assumes that vector elements are numbered in big-endian order. For little-endian targets, the vector element numbering is reversed, but the vmuleub, vmuloub, and vperm instructions still assume big-endian numbering. To account for this, we must adjust the permute control vector and reverse the order of the input registers on the vperm instruction. The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed on powerpc64 and powerpc64le targets as well as the original powerpc (32-bit) target. llvm-svn: 210474	2014-06-09 16:06:29 +00:00
David Blaikie	f670b953e7	AsmMatchers: Use unique_ptr to manage ownership of MCParsedAsmOperand I saw at least a memory leak or two from inspection (on probably untested error paths) and r206991, which was the original inspiration for this change. I ran this idea by Jim Grosbach a few weeks ago & he was OK with it. Since it's a basically mechanical patch that seemed sufficient - usual post-commit review, revert, etc, as needed. llvm-svn: 210427	2014-06-08 16:18:35 +00:00
Eric Christopher	db8e2ecde5	Have TargetSelectionDAGInfo take a DataLayout initializer rather than a TargetMachine since the only thing it wants is DataLayout. llvm-svn: 210366	2014-06-06 19:04:48 +00:00
Bill Schmidt	647be1ef2c	[PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian This patch fixes a couple of lowering issues for little endian PowerPC. The code for lowering BUILD_VECTOR contains a number of optimizations that are only valid for big endian. For now, we disable those optimizations for correctness. In the future, we will add analogous optimizations that are correct for little endian. When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to make the now-familiar transformation of swapping the input operands and complementing the permute control vector. Correctness of this transformation is tested by the accompanying test case. llvm-svn: 210336	2014-06-06 14:06:26 +00:00
Bill Schmidt	9380b3b7bf	[PPC64LE] Temporarily disable VSX support in little-endian mode This is a preliminary patch for the PowerPC64LE support. In stage 1 of the vector support, we will support the VMX (Altivec) instruction set, but will not yet support the VSX instructions. This is merely a staging issue to provide functional vector support as soon as possible. llvm-svn: 210271	2014-06-05 16:21:13 +00:00
Eric Christopher	44a7e98ce5	Omit else branch after return. llvm-svn: 210034	2014-06-02 17:29:07 +00:00
Eric Christopher	1aad72164e	Have the TLOF creation take a Triple rather than needing a subtarget. llvm-svn: 209937	2014-05-31 00:07:32 +00:00
Eric Christopher	0cc6977494	isSVR4ABI() returned !isDarwin() so just move that to the else block and remove the unreachable code. llvm-svn: 209927	2014-05-30 22:47:53 +00:00
Eric Christopher	f0478ea2df	Rename CreateTLOF->createTLOF to match the rest of the file and the rest of the targets with a similar function name. llvm-svn: 209926	2014-05-30 22:47:48 +00:00
Rafael Espindola	34f4870951	[PPC] Use alias symbols in address computation. This seems to match what gcc does for ppc and what every other llvm backend does. This is a fixed version of r209638. The difference is to avoid any change in behavior for functions. The logic for using constant pools for function addresseses is spread over a few places and we have to keep them in sync. llvm-svn: 209821	2014-05-29 15:41:38 +00:00
Rafael Espindola	acdb307db3	[pr19844] Add thread local mode to aliases. This matches gcc's behavior. It also seems natural given that aliases contain other properties that govern how it is accessed (linkage, visibility, dll storage). Clang still has to be updated to expose this feature to C. llvm-svn: 209759	2014-05-28 18:15:43 +00:00
Hal Finkel	47a225fb6c	Revert "[PPC] Use alias symbols in address computation." This reverts commit r209638 because it broke self-hosting on ppc64/Linux. (the Clang-compiled TableGen would segfault because it jumped to an invalid address from within _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E (which is within the command-line parameter registration process)). llvm-svn: 209745	2014-05-28 15:25:06 +00:00
Bill Schmidt	b806d02b5b	[PATCH] Correct type used for VADD_SPLAT optimization on PowerPC In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there is an optimization for certain patterns to generate one or two vector splats followed by a vector add or subtract. This operation is represented by a VADD_SPLAT in the selection DAG. Prior to this patch, it was possible for the VADD_SPLAT to be assigned the wrong data type, causing incorrect code generation. This patch corrects the problem. Specifically, the code previously assigned the value type of the BUILD_VECTOR node to the newly generated VADD_SPLAT node. This is correct much of the time, but not always. The problem is that the call to isConstantSplat() may return a SplatBitSize that is not the same as the number of bits in the original element vector type. The correct type to assign is a vector type with the same element bit size as SplatBitSize. The included test case shows an example of this, where the BUILD_VECTOR node has a type of v16i8. The vector to be built is {0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}. isConstantSplat detects that we can generate a splat of 16 for type v8i16, which is the type we must assign to the VADD_SPLAT node. If we do not, we generate a vspltisb of 8 and a vaddubm, which generates the incorrect result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16}. The correct code generation is a vspltish of 8 and a vadduhm. This patch also corrected code generation for CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked as an XFAIL, so we can remove the XFAIL from the test case. llvm-svn: 209662	2014-05-27 15:57:51 +00:00
Rafael Espindola	94cd9a1ed6	[PPC] Use alias symbols in address computation. This seems to match what gcc does for ppc and what every other llvm backend does. llvm-svn: 209638	2014-05-26 19:08:19 +00:00
Eric Christopher	1b3b092405	Fix typo. llvm-svn: 209377	2014-05-22 01:21:44 +00:00
Eric Christopher	7898bbfc19	Avoid using subtarget features when initializing the pass pipeline on PPC. llvm-svn: 209376	2014-05-22 01:21:35 +00:00
Eric Christopher	4c56d10ea0	Reset the subtarget for DAGToDAG on every iteration of runOnMachineFunction. This required updating the generated functions and TD file accordingly to be pointers rather than const references. llvm-svn: 209375	2014-05-22 01:07:24 +00:00
Eric Christopher	7880d61aac	Make early if conversion dependent upon the subtarget and add a subtarget hook to enable. Unconditionally add to the pass pipeline for targets that might want to use it. No functional change. llvm-svn: 209340	2014-05-21 23:40:26 +00:00
Adam Nemet	37337f0359	[PowerPC] PR19796: Also match ISD::TargetConstant in isIntS16Immediate The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64 backend. We matched an immediate offset with STWX8 even though it only supports register offset. The culprit is the complex-pattern predicate, SelectAddrIdx, which decides that if the offset is not ISD::Constant it must be a register. Many thanks to Bill Schmidt for testing this. llvm-svn: 209219	2014-05-20 17:20:34 +00:00
Benjamin Kramer	600e24a1cb	SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not. - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though. - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal. - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled. llvm-svn: 209123	2014-05-19 13:12:38 +00:00
Saleem Abdulrasool	501d3b6235	Target: remove old constructors for CallLoweringInfo This is mostly a mechanical change changing all the call sites to the newer chained-function construction pattern. This removes the horrible 15-parameter constructor for the CallLoweringInfo in favour of setting properties of the call via chained functions. No functional change beyond the removal of the old constructors are intended. llvm-svn: 209082	2014-05-17 21:50:17 +00:00
Pete Cooper	fa13048706	Use a sized enum for MachineOperandType. No functionality change llvm-svn: 209048	2014-05-16 23:28:17 +00:00
Rafael Espindola	e809bea68e	Delete getAliasedGlobal. llvm-svn: 209040	2014-05-16 22:37:03 +00:00
Jay Foad	e0eac700cb	Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811	2014-05-14 21:14:37 +00:00
Eric Christopher	935299458d	Fix typo in function name. llvm-svn: 208743	2014-05-14 00:31:15 +00:00
Eric Christopher	1091ab4275	Save the optimization level the subtarget was created with in a member variable and sink the initialization of crbits into the subtarget feature reset code. No functional change, but this refactor will be used in a future commit. llvm-svn: 208726	2014-05-13 20:49:08 +00:00
Hal Finkel	f6f53bcd51	[PowerPC] Add global named register support Support for the intrinsics that read from and write to global named registers is added for r1, r2 and r13 (depending on the subtarget). llvm-svn: 208509	2014-05-11 19:29:11 +00:00
Hal Finkel	ef707e1c3e	[PowerPC] On PPC32, 128-bit shifts might be runtime calls The counter-loops formation pass needs to know what operations might be function calls (because they can't appear in counter-based loops). On PPC32, 128-bit shifts might be runtime calls (even though you can't use __int128 on PPC32, it seems that SROA might form them). Fixes PR19709. llvm-svn: 208501	2014-05-11 16:23:29 +00:00
Rafael Espindola	765e5e78cf	Remove the UseCFI option from createAsmStreamer. We were already always passing true, this just removes the option. llvm-svn: 208205	2014-05-07 13:00:43 +00:00
Rafael Espindola	bb77d317cd	Fix pr19645. The fix itself is fairly simple: move getAccessVariant to MCValue so that we replace the old weak expression evaluation with the far more general EvaluateAsRelocatable. This then requires that EvaluateAsRelocatable stop when it finds a non trivial reference kind. And that in turn requires the ELF writer to look harder for weak references. Last but not least, this found a case where we were being bug by bug compatible with gas and accepting an invalid input. I reported pr19647 to track it. llvm-svn: 207920	2014-05-03 19:57:04 +00:00
Craig Topper	79b097d66a	Use makeArrayRef insted of calling ArrayRef<T> constructor directly. I introduced most of these recently. llvm-svn: 207616	2014-04-30 07:17:30 +00:00
Craig Topper	5f5f448d02	De-virtualize or remove some methods that have no overrides nor override anything. In some cases remove all together if there are no callers either. llvm-svn: 207610	2014-04-30 05:53:27 +00:00
Craig Topper	dcce1d897e	[C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. PowerPC edition llvm-svn: 207504	2014-04-29 07:57:37 +00:00
Eric Christopher	4af26f41d6	None of these targets actually define their own CFI_INSTRUCTION opcode so there's no reason to use the target namespace for it rather than TargetOpcode. llvm-svn: 207475	2014-04-29 00:16:46 +00:00
Eric Christopher	17d4cf2e96	80-column, tab characters, comment fixups. llvm-svn: 207473	2014-04-29 00:16:40 +00:00
Craig Topper	9683cb114b	Convert more SelectionDAG functions to use ArrayRef. llvm-svn: 207397	2014-04-28 05:57:50 +00:00
Craig Topper	b663bffa27	[C++] Use 'nullptr'. llvm-svn: 207394	2014-04-28 04:05:08 +00:00
Craig Topper	1efda44640	Convert SelectionDAG::SelectNodeTo to use ArrayRef. llvm-svn: 207377	2014-04-27 19:21:11 +00:00
Craig Topper	536995c0a7	Convert SelectionDAG::getMergeValues to use ArrayRef. llvm-svn: 207374	2014-04-27 19:20:57 +00:00
Craig Topper	e0741a0fcb	Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size. llvm-svn: 207329	2014-04-26 19:29:41 +00:00
Craig Topper	1b1f54bcca	Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>. llvm-svn: 207327	2014-04-26 18:35:24 +00:00
Craig Topper	6d411cb95a	[C++] Use 'nullptr'. Target edition. llvm-svn: 207197	2014-04-25 05:30:21 +00:00
Reid Kleckner	e7e2ccb9e9	Add 'musttail' marker to call instructions This is similar to the 'tail' marker, except that it guarantees that tail call optimization will occur. It also comes with convervative IR verification rules that ensure that tail call optimization is possible. Reviewers: nicholas Differential Revision: http://llvm-reviews.chandlerc.com/D3240 llvm-svn: 207143	2014-04-24 20:14:34 +00:00
David Blaikie	52ea9cc073	Spread some const around for non-mutating uses of MCSymbolData. I discovered this const-hole while attempting to coalesnce the Symbol and SymbolMap data structures. There's some pending issues with that, but I figured this change was easy to flush early. llvm-svn: 207124	2014-04-24 16:59:40 +00:00
Evgeniy Stepanov	c242bd4b23	Create MCTargetOptions. For now it contains a single flag, SanitizeAddress, which enables AddressSanitizer instrumentation of inline assembly. Patch by Yuri Gorshenin. llvm-svn: 206971	2014-04-23 11:16:03 +00:00
Chandler Carruth	ae889a5f85	[Modules] Fix potential ODR violations by sinking the DEBUG_TYPE definition below all of the header #include lines, lib/Target/... edition. llvm-svn: 206842	2014-04-22 02:41:26 +00:00
Chandler Carruth	72185824a4	[cleanup] Lift using directives, DEBUG_TYPE definitions, and even some system headers above the includes of generated '.inc' files that actually contain code. In a few targets this was already done pretty consistently, but it wasn't done really consistently anywhere. It is strictly cleaner IMO and necessary in a bunch of places where the DEBUG_TYPE is referenced from the generated code. Consistency with the necessary places trumps. Hopefully the build bots are OK with the movement of intrin.h... llvm-svn: 206838	2014-04-22 02:03:14 +00:00
Chandler Carruth	15c7b91ac2	[Modules] Make Support/Debug.h modular. This requires it to not change behavior based on other files defining DEBUG_TYPE, which means it cannot define DEBUG_TYPE at all. This is actually better IMO as it forces folks to define relevant DEBUG_TYPEs for their files. However, it requires all files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't already. I've updated all such files in LLVM and will do the same for other upstream projects. This still leaves one important change in how LLVM uses the DEBUG_TYPE macro going forward: we need to only define the macro after header files have been #include-ed. Previously, this wasn't possible because Debug.h required the macro to be pre-defined. This commit removes that. By defining DEBUG_TYPE after the includes two things are fixed: - Header files that need to provide a DEBUG_TYPE for some inline code can do so by defining the macro before their inline code and undef-ing it afterward so the macro does not escape. - We no longer have rampant ODR violations due to including headers with different DEBUG_TYPE definitions. This may be mostly an academic violation today, but with modules these types of violations are easy to check for and potentially very relevant. Where necessary to suppor headers with DEBUG_TYPE, I have moved the definitions below the includes in this commit. I plan to move the rest of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big enough. The comments in Debug.h, which were hilariously out of date already, have been updated to reflect the recommended practice going forward. llvm-svn: 206822	2014-04-21 22:55:11 +00:00
Nick Lewycky	82ad9fc7c8	Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead. llvm-svn: 206255	2014-04-15 07:22:52 +00:00
Lang Hames	91cdab6916	[MC] Require an MCContext when constructing an MCDisassembler. This patch re-introduces the MCContext member that was removed from MCDisassembler in r206063, and requires that an MCContext be passed in at MCDisassembler construction time. (Previously the MCContext member had been initialized in an ad-hoc fashion after construction). The MCCContext member can be used by MCDisassembler sub-classes to construct constant or target-specific MCExprs. This patch updates disassemblers for in-tree targets, and provides the MCRegisterInfo instance that some disassemblers were using through the MCContext (previously those backends were constructing their own MCRegisterInfo instances). llvm-svn: 206241	2014-04-15 04:40:56 +00:00
Hal Finkel	c4a623f8d4	[PowerPC] [Constant Hoisting] Enable constant hoisting on PPC Implements the various TTI functions to enable constant hoisting on PPC. The only significant test-suite change is this: MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup (which essentially reverses the slowdown from r206120). llvm-svn: 206141	2014-04-13 23:02:40 +00:00
Hal Finkel	4da0e32e2a	[PowerPC] Fix rlwimi isel when mask is not constant We had been using the known-zero values of the operand of the or to construct the mask for an rlwimi; this is not quite correct, but fine when the mask is constant. When the mask is constant, then the known zeros of the operand must be a superset of the zeros in the mask. However, when the mask is not a constant, then there might be bits in the operand that are not known to be zero that, at runtime, might be zero in the mask. Therefore, we check that any bits not known to be zero are known to be one in the mask. Otherwise, we can't fold the mask with the or and shift. This was revealed as a miscompile of MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with constant hoisting. llvm-svn: 206136	2014-04-13 17:10:58 +00:00
Hal Finkel	a1849e7ac8	[PowerPC] Implement some additional TLI callbacks Add implementations of: bool isLegalICmpImmediate(int64_t Imm) const bool isLegalAddImmediate(int64_t Imm) const bool isTruncateFree(Type Ty1, Type Ty2) const bool isTruncateFree(EVT VT1, EVT VT2) const bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const Unfortunately, this regresses counter-register-based loop formation because some of the loops now end up in forms were SE cannot compute loop counts. However, nevertheless, the test-suite results favor committing: SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown llvm-svn: 206120	2014-04-12 21:52:38 +00:00
NAKAMURA Takumi	b6baf3b8a1	LLVMBuild.txt: Reformat. llvm-svn: 205961	2014-04-10 11:16:17 +00:00
Hal Finkel	0e97afbdb4	[PowerPC] Don't return false from PPC::isVSLDOIShuffleMask PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle predicate should be false. Noticed by inspection; no test case (yet). llvm-svn: 205787	2014-04-08 19:00:27 +00:00
Hal Finkel	216842b276	[PowerPC] Remove unused TM member variable to unbreak build Fix "error: private field 'TM' is not used [-Werror,-Wunused-private-field]" llvm-svn: 205660	2014-04-05 00:16:28 +00:00
Hal Finkel	ade2d32df0	[PowerPC] Adjust load/store costs in PPCTTI This provides more realistic costs for the insert/extractelement instructions (which are load/store pairs), accounts for the cheap unaligned Altivec load sequence, and for unaligned VSX load/stores. Bad news: MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation) SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized) MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown Good news: SingleSource/Benchmarks/Shootout/ary3 - 54% speedup SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup Unfortunately, estimating the costs of the stack-based scalarization sequences is hard, and adjusting these costs is like a game of whac-a-mole :( I'll revisit this again after we have better codegen for vector extloads and truncstores and unaligned load/stores. llvm-svn: 205658	2014-04-04 23:51:18 +00:00
Hal Finkel	e6580e744f	[PowerPC] PPCTTI Cleanup Remove the declaration of an unimplemented function. llvm-svn: 205657	2014-04-04 23:51:11 +00:00
Hal Finkel	e63f5074c7	[PowerPC] Add a full condition code register to make the "cc" clobber work gcc inline asm supports specifying "cc" as a clobber of all condition registers. Add just enough modeling of the full register to make this work. Fixed PR19326. llvm-svn: 205630	2014-04-04 15:15:57 +00:00
Craig Topper	694437e2ef	Make consistent use of MCPhysReg instead of uint16_t throughout the tree. llvm-svn: 205610	2014-04-04 05:16:06 +00:00
Hal Finkel	7f4525afba	[PowerPC] Make PPCTTI::getMemoryOpCost call BasicTTI::getMemoryOpCost PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to calculate the base cost of the memory access, and then adjust on top of that. There is no functionality change from this modification, but it will become important so that PPCTTI can take advantage of scalarization information for which BasicTTI::getMemoryOpCost will account in the near future. llvm-svn: 205476	2014-04-02 22:43:49 +00:00
Jim Grosbach	f349849199	Simplify resolveFrameIndex() signature. Just pass a MachineInstr reference rather than an MBB iterator. Creating a MachineInstr& is the first thing every implementation did anyway. llvm-svn: 205453	2014-04-02 19:28:18 +00:00
Hal Finkel	3d67e62e4c	[PowerPC] Add some missing VSX bitcast patterns llvm-svn: 205352	2014-04-01 19:24:27 +00:00
Hal Finkel	0296ed914f	[PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles If we have two unique values for a v2i64 build vector, this will always result in two vector loads if we expand using shuffles. Only one is necessary. llvm-svn: 205231	2014-03-31 17:48:16 +00:00
Hal Finkel	09ae220a41	[PowerPC] Correct P7 dispatch unit allocation for vector instructions llvm-svn: 205222	2014-03-31 17:02:10 +00:00
Hal Finkel	d7201e5971	[PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node (and similarly for v2i16 and v2i8). Even though there are no sign-extension (or algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by converting two and from v2i64. The small trick necessary here is to shift the i32 elements into the right lanes before the i32 -> f64 step. This is because of the big Endian nature of the system, we need the i32 portion in the high word of the i64 elements. For v2i16 and v2i8 we can do the same, but we first use the default Altivec shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and then apply the above procedure. llvm-svn: 205146	2014-03-30 13:22:59 +00:00
Hal Finkel	92fc087786	[PowerPC] Handle v2i64 comparisons v2i64 is a legal type under VSX, however we don't have native vector comparisons. We can handle eq/ne by casting it to an Altivec type, but everything else must be expanded. llvm-svn: 205106	2014-03-29 16:04:40 +00:00
Hal Finkel	bac4772857	[PowerPC] VSX instruction latency corrections The vector divide and sqrt instructions have high latencies, and the scalar comparisons are like all of the others. On the P7, permutations take an extra cycle over purely-simple vector ops. llvm-svn: 205096	2014-03-29 13:20:31 +00:00
Rafael Espindola	0b8859a3e5	Completely rewrite ELFObjectWriter::RecordRelocation. I started trying to fix a small issue, but this code has seen a small fix too many. The old code was fairly convoluted. Some of the issues it had: * It failed to check if a symbol difference was in the some section when converting a relocation to pcrel. * It failed to check if the relocation was already pcrel. * The pcrel value computation was wrong in some cases (relocation-pc.s) * It was missing quiet a few cases where it should not convert symbol relocations to section relocations, leaving the backends to patch it up. * It would not propagate the fact that it had changed a relocation to pcrel, requiring a quiet nasty work around in ARM. * It was missing comments. llvm-svn: 205076	2014-03-29 06:26:49 +00:00
Hal Finkel	99fd50482e	[PowerPC] Add subregister classes for f64 VSX values We had stored both f64 values and v2f64, etc. values in the VSX registers. This worked, but was suboptimal because we would always spill 16-byte values even through we almost always had scalar 8-byte values. This resulted in an increase in stack-size use, extra memory bandwidth, etc. To fix this, I've added 64-bit subregisters of the Altivec registers, and combined those with the existing scalar floating-point registers to form a class of VSX scalar floating-point registers. The ABI code has also been enhanced to use this register class and some other necessary improvements have been made. llvm-svn: 205075	2014-03-29 05:29:01 +00:00
Hal Finkel	f15b90e07a	[PowerPC] Fix VSX permutation isel Not only did I invert the indices when I wrote the code, but I also did the same thing when I wrote the regression test. Oops. llvm-svn: 205046	2014-03-28 20:24:55 +00:00
Hal Finkel	c1ab8c2486	[PowerPC] v2[fi]64 need to be explicitly passed in VSX registers v2[fi]64 values need to be explicitly passed in VSX registers. This is because the code in TRI that finds the minimal register class given a register and a value type will assert if given an Altivec register and a non-Altivec type. llvm-svn: 205041	2014-03-28 19:58:11 +00:00
Hal Finkel	786d7d887a	[PowerPC] Use a small cleanup pass to remove VSX self copies As explained in r204976, because of how the allocation of VSX registers interacts with the call-lowering code, we sometimes end up generating self VSX copies. Specifically, things like this: %VSL2<def> = COPY %F2, %VSL2<imp-use,kill> (where %F2 is really a sub-register of %VSL2, and so this copy is a nop) This adds a small cleanup pass to remove these prior to post-RA scheduling. llvm-svn: 204980	2014-03-27 23:12:31 +00:00
Hal Finkel	ee610c3b4d	[PowerPC] Don't remove self VSX copies in PPCInstrInfo::copyPhysReg Because of how the allocation of VSX registers interacts with the call-lowering code, we sometimes end up generating self VSX copies. Specifically, things like this: %VSL2<def> = COPY %F2, %VSL2<imp-use,kill> (where %F2 is really a sub-register of %VSL2, and so this copy is a nop) The problem is that ExpandPostRAPseudos always assumes that some instruction has been inserted, and adds implicit defs to it. This is a problem if no copy was inserted because it can cause subtle problems during post-RA scheduling. These self copies will have to be removed some other way. llvm-svn: 204976	2014-03-27 22:46:28 +00:00
Hal Finkel	f19bcef675	[PowerPC] Fix v2f64 vector extract and related patterns First, v2f64 vector extract had not been declared legal (and so the existing patterns were not being used). Second, the patterns for that, and for scalar_to_vector, should really be a regclass copy, not a subregister operation, because the VSX registers directly hold both the vector and scalar data. llvm-svn: 204971	2014-03-27 22:22:48 +00:00
Hal Finkel	fa7f1597ca	[PowerPC] Expand v2i64 shifts These operations need to be expanded during legalization so that isel does not crash. In theory, we might be able to custom lower some of these. That, however, would need to be follow-up work. llvm-svn: 204963	2014-03-27 21:26:33 +00:00
Rafael Espindola	d78485af3e	Remove another unused argument. llvm-svn: 204961	2014-03-27 20:49:35 +00:00
Rafael Espindola	4e5d391691	Remove unused argument. llvm-svn: 204956	2014-03-27 20:41:17 +00:00
Rafael Espindola	5c8926deed	Prevent alias from pointing to weak aliases. This adds back r204781. Original message: Aliases are just another name for a position in a file. As such, the regular symbol resolutions are not applied. For example, given define void @my_func() { ret void } @my_alias = alias weak void ()* @my_func @my_alias2 = alias void ()* @my_alias We produce without this patch: .weak my_alias my_alias = my_func .globl my_alias2 my_alias2 = my_alias That is, in the resulting ELF file my_alias, my_func and my_alias are just 3 names pointing to offset 0 of .text. That is not the semantics of IR linking. For example, linking in a @my_alias = alias void ()* @other_func would require the strong my_alias to override the weak one and my_alias2 would end up pointing to other_func. There is no way to represent that with aliases being just another name, so the best solution seems to be to just disallow it, converting a miscompile into an error. llvm-svn: 204934	2014-03-27 15:26:56 +00:00
Hal Finkel	ca154788e6	[PowerPC] Generate VSX permutations for v2[fi]64 vectors llvm-svn: 204873	2014-03-26 22:58:37 +00:00
Hal Finkel	800564a97b	[PowerPC] VSX loads and stores support unaligned access I've not yet updated PPCTTI because I'm not sure what the actual relative cost is compared to the aligned uses. llvm-svn: 204848	2014-03-26 19:39:09 +00:00
Hal Finkel	ab7214ddc6	[PowerPC] Use v2f64 <-> v2i64 VSX conversion instructions llvm-svn: 204843	2014-03-26 19:13:54 +00:00
Hal Finkel	49e186f6c1	[PowerPC] Remove some dead VSX v4f32 store patterns These patterns are dead (because v4f32 stores are currently promoted to v4i32 and stored using Altivec instructions), and also are likely not correct (because they'd store the vector elements in the opposite order from that assumed by the rest of the Altivec code). llvm-svn: 204839	2014-03-26 18:26:36 +00:00
Hal Finkel	11338e1f96	[PowerPC] Use VSX vector load/stores for v2[fi]64 These instructions have access to the complete VSX register file. In addition, they "swap" the order of the elements so that element 0 (the scalar part) comes first in memory and element 1 follows at a higher address. llvm-svn: 204838	2014-03-26 18:26:30 +00:00
Hal Finkel	0bf7496bb8	[PowerPC] Add v2i64 as a legal VSX type v2i64 needs to be a legal VSX type because it is the SetCC result type from v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations. This fixes the lowering for v2f64 VSELECT. llvm-svn: 204828	2014-03-26 16:12:58 +00:00
Hal Finkel	00925a52e5	[PowerPC] Lower VSELECT using xxsel when VSX is available With VSX there is a real vector select instruction, and so we should use it. Note that VSELECT will still scalarize for v2f64 because the corresponding SetCC result type (v2i64) is not currently a legal type. llvm-svn: 204801	2014-03-26 12:49:28 +00:00
Rafael Espindola	63a8ff6883	Revert "Prevent alias from pointing to weak aliases." This reverts commit r204781. I will follow up to with msan folks to see what is what they were trying to do with aliases to weak aliases. llvm-svn: 204784	2014-03-26 06:14:40 +00:00
Hal Finkel	7a700cc27d	[PowerPC] Generate logical vector VSX instructions These instructions are essentially the same as their Altivec counterparts, but have access to the larger VSX register file. llvm-svn: 204782	2014-03-26 04:55:40 +00:00
Rafael Espindola	c9179b8b50	Prevent alias from pointing to weak aliases. Aliases are just another name for a position in a file. As such, the regular symbol resolutions are not applied. For example, given define void @my_func() { ret void } @my_alias = alias weak void ()* @my_func @my_alias2 = alias void ()* @my_alias We produce without this patch: .weak my_alias my_alias = my_func .globl my_alias2 my_alias2 = my_alias That is, in the resulting ELF file my_alias, my_func and my_alias are just 3 names pointing to offset 0 of .text. That is not the semantics of IR linking. For example, linking in a @my_alias = alias void ()* @other_func would require the strong my_alias to override the weak one and my_alias2 would end up pointing to other_func. There is no way to represent that with aliases being just another name, so the best solution seems to be to just disallow it, converting a miscompile into an error. llvm-svn: 204781	2014-03-26 04:48:47 +00:00
Hal Finkel	066a5cfe42	[PowerPC] Select between VSX A-type and M-type FMA instructions just before RA The VSX instruction set has two types of FMA instructions: A-type (where the addend is taken from the output register) and M-type (where one of the product operands is taken from the output register). This adds a small pass that runs just after MI scheduling (and, thus, just before register allocation) that mutates A-type instructions (that are created during isel) into M-type instructions when: 1. This will eliminate an otherwise-necessary copy of the addend 2. One of the product operands is killed by the instruction The "right" moment to make this decision is in between scheduling and register allocation, because only there do we know whether or not one of the product operands is killed by any particular instruction. Unfortunately, this also makes the implementation somewhat complicated, because the MIs are not in SSA form and we need to preserve the LiveIntervals analysis. As a simple example, if we have: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9 %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16, %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16 ... %vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19, %RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19 ... We can eliminate the copy by changing from the A-type to the M-type instruction. This means: %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16, %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16 is replaced by: %vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg18, %vreg9, %RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9 and we remove: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9 llvm-svn: 204768	2014-03-25 23:29:21 +00:00
Hal Finkel	b10ebe6ad3	[PowerPC] Correct commutable indices for VSX FMA instructions Although the first two operands are the ones that can be swapped, the tied input operand is listed before them, so we need to adjust for that. I have a test case for this, but it goes along with an upcoming commit (so it will come soon). llvm-svn: 204748	2014-03-25 19:26:43 +00:00
Hal Finkel	41960891a5	[PowerPC] Add a TableGen relation for A-type and M-type VSX FMA instructions TableGen will create a lookup table for the A-type FMA instructions providing their corresponding M-form opcodes. This will be used by upcoming commits. llvm-svn: 204746	2014-03-25 18:55:11 +00:00
Ulrich Weigand	f2e33e8135	[PowerPC] Generate little-endian object files As a first step towards real little-endian code generation, this patch changes the PowerPC MC layer to actually generate little-endian object files. This involves passing the little-endian flag through the various layers, including down to createELFObjectWriter so we actually get basic little-endian ELF objects, emitting instructions in little-endian order, and handling fixups and relocations as appropriate for little-endian. The bulk of the patch is to update most test cases in test/MC/PowerPC to verify both big- and little-endian encodings. (The only test cases not updated are those that create actual big-endian ABI code, like the TLS tests.) Note that while the object files are now little-endian, the generated code itself is not yet updated, in particular, it still does not adhere to the ELFv2 ABI. llvm-svn: 204634	2014-03-24 18:16:09 +00:00
Will Schmidt	8fe019f872	[PPC64LE] ELFv2 ABI updates for the .opd section [PPC64LE] ELFv2 ABI updates for the .opd section The PPC64 Little Endian (PPC64LE) target supports the ELFv2 ABI, and as such, does not have a ".opd" section. This is keyed off a _CALL_ELF=2 macro check. The CALL_ELF check is not clearly documented at this time. The basis for usage in this patch is from the gcc thread here: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01144.html > Adding comment from Uli: Looks good to me. I think the old-style JIT doesn't really work anyway for 64-bit, but at least with this patch LLVM will compile and link again on a ppc64le host ... llvm-svn: 204614	2014-03-24 16:04:15 +00:00
Hal Finkel	feee356b52	[PowerPC] Mark many instructions as commutative I'm under the impression that we used to infer the isCommutable flag from the instruction-associated pattern. Regardless, we don't seem to do this (at least by default) any more. I've gone through all of our instruction definitions, and marked as commutative all of those that should be trivial to commute (by exchanging the first two operands). There has been special code for the RL* instructions, and that's not changed. Before this change, we had the following commutative instructions: RLDIMI RLDIMIo RLWIMI RLWIMI8 RLWIMI8o RLWIMIo XSADDDP XSMULDP XVADDDP XVADDSP XVMULDP XVMULSP After: ADD4 ADD4o ADD8 ADD8o ADDC ADDC8 ADDC8o ADDCo ADDE ADDE8 ADDE8o ADDEo AND AND8 AND8o ANDo CRAND CREQV CRNAND CRNOR CROR CRXOR EQV EQV8 EQV8o EQVo FADD FADDS FADDSo FADDo FMADD FMADDS FMADDSo FMADDo FMSUB FMSUBS FMSUBSo FMSUBo FMUL FMULS FMULSo FMULo FNMADD FNMADDS FNMADDSo FNMADDo FNMSUB FNMSUBS FNMSUBSo FNMSUBo MULHD MULHDU MULHDUo MULHDo MULHW MULHWU MULHWUo MULHWo MULLD MULLDo MULLW MULLWo NAND NAND8 NAND8o NANDo NOR NOR8 NOR8o NORo OR OR8 OR8o ORo RLDIMI RLDIMIo RLWIMI RLWIMI8 RLWIMI8o RLWIMIo VADDCUW VADDFP VADDSBS VADDSHS VADDSWS VADDUBM VADDUBS VADDUHM VADDUHS VADDUWM VADDUWS VAND VAVGSB VAVGSH VAVGSW VAVGUB VAVGUH VAVGUW VMADDFP VMAXFP VMAXSB VMAXSH VMAXSW VMAXUB VMAXUH VMAXUW VMHADDSHS VMHRADDSHS VMINFP VMINSB VMINSH VMINSW VMINUB VMINUH VMINUW VMLADDUHM VMULESB VMULESH VMULEUB VMULEUH VMULOSB VMULOSH VMULOUB VMULOUH VNMSUBFP VOR VXOR XOR XOR8 XOR8o XORo XSADDDP XSMADDADP XSMAXDP XSMINDP XSMSUBADP XSMULDP XSNMADDADP XSNMSUBADP XVADDDP XVADDSP XVMADDADP XVMADDASP XVMAXDP XVMAXSP XVMINDP XVMINSP XVMSUBADP XVMSUBASP XVMULDP XVMULSP XVNMADDADP XVNMADDASP XVNMSUBADP XVNMSUBASP XXLAND XXLNOR XXLOR XXLXOR This is a by-inspection change, and I'm not sure how to write a reliable test case. I would like advice on this, however. llvm-svn: 204609	2014-03-24 15:07:28 +00:00
Hal Finkel	829bfc7c99	[PowerPC] Don't schedule VSX copy legalization unless VSX is enabled There is no need to schedule this extra pass if it will have nothing to do. llvm-svn: 204594	2014-03-24 09:51:41 +00:00
Hal Finkel	d4fe687c4a	[PowerPC] Update comment re: VSX copy-instruction selection I've done some experimentation with this, and it looks like using the lower-latency (but lower throughput) copy instruction is essentially always the right thing to do. My assumption is that, in order to be relatively sure that the higher-latency copy will increase throughput, we'd want to have it unlikely to be in-flight with its use. On the P7, the global completion table (GCT) can hold a maximum of 120 instructions, shared among all active threads (up to 4), giving 30 instructions per thread. So specifically, I'd require at least that many instructions between the copy and the use before the high-latency variant is used. Trying this, however, over the entire test suite resulted in zero cases where the high-latency form would be preferable. This may be a consequence of the fact that the scheduler views copies as free, and so they tend to end up close to their uses. For this experiment I created a function: unsigned chooseVSXCopy(MachineBasicBlock &MBB, MachineBasicBlock::iterator I, unsigned DestReg, unsigned SrcReg, unsigned StartDist = 1, unsigned Depth = 3) const; with an implementation like: if (!Depth) return PPC::XXLOR; const unsigned MaxDist = 30; unsigned Dist = StartDist; for (auto J = I, JE = MBB.end(); J != JE && Dist <= MaxDist; ++J) { if (J->isTransient() && !J->isCopy()) continue; if (J->isCall() \|\| J->isReturn() \|\| J->readsRegister(DestReg, TRI)) return PPC::XXLOR; ++Dist; } // We've exceeded the required distance for the high-latency form, use it. if (Dist > MaxDist) return PPC::XVCPSGNDP; // If this is only an exit block, use the low-latency form. if (MBB.succ_empty()) return PPC::XXLOR; // We've reached the end of the block, check the successor blocks (up to some // depth), and use the high-latency form if that is okay with all successors. for (auto J = MBB.succ_begin(), JE = MBB.succ_end(); J != JE; ++J) { if (chooseVSXCopy(*J, (J)->begin(), DestReg, SrcReg, Dist, --Depth) == PPC::XXLOR) return PPC::XXLOR; } // All of our successor blocks seem okay with the high-latency variant, so // we'll use it. return PPC::XVCPSGNDP; and then changed the copy opcode selection from: Opc = PPC::XXLOR; to: Opc = chooseVSXCopy(MBB, std::next(I), DestReg, SrcReg); In conclusion, I'm removing the FIXME from the comment, because I believe that there is, at least absent other examples, nothing to fix. llvm-svn: 204591	2014-03-24 09:36:36 +00:00
Nuno Lopes	79d18a66ec	remove a bunch of unused private methods found with a smarter version of -Wunused-member-function that I'm playwing with. Appologies in advance if I removed someone's WIP code. include/llvm/CodeGen/MachineSSAUpdater.h \| 1 include/llvm/IR/DebugInfo.h \| 3 lib/CodeGen/MachineSSAUpdater.cpp \| 10 -- lib/CodeGen/PostRASchedulerList.cpp \| 1 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp \| 10 -- lib/IR/DebugInfo.cpp \| 12 -- lib/MC/MCAsmStreamer.cpp \| 2 lib/Support/YAMLParser.cpp \| 39 --------- lib/TableGen/TGParser.cpp \| 16 --- lib/TableGen/TGParser.h \| 1 lib/Target/AArch64/AArch64TargetTransformInfo.cpp \| 9 -- lib/Target/ARM/ARMCodeEmitter.cpp \| 12 -- lib/Target/ARM/ARMFastISel.cpp \| 84 -------------------- lib/Target/Mips/MipsCodeEmitter.cpp \| 11 -- lib/Target/Mips/MipsConstantIslandPass.cpp \| 12 -- lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp \| 21 ----- lib/Target/NVPTX/NVPTXISelDAGToDAG.h \| 2 lib/Target/PowerPC/PPCFastISel.cpp \| 1 lib/Transforms/Instrumentation/AddressSanitizer.cpp \| 2 lib/Transforms/Instrumentation/BoundsChecking.cpp \| 2 lib/Transforms/Instrumentation/MemorySanitizer.cpp \| 1 lib/Transforms/Scalar/LoopIdiomRecognize.cpp \| 8 - lib/Transforms/Scalar/SCCP.cpp \| 1 utils/TableGen/CodeEmitterGen.cpp \| 2 24 files changed, 2 insertions(+), 261 deletions(-) llvm-svn: 204560	2014-03-23 17:09:26 +00:00
Hal Finkel	deec4f1f76	[PowerPC] Make use of VSX f64 <-> i64 conversion instructions When VSX is available, these instructions should be used in preference to the older variants that only have access to the scalar floating-point registers. llvm-svn: 204559	2014-03-23 05:35:00 +00:00
Hal Finkel	47d76a6461	[PowerPC] Fix the VSX v2f64 return register v2f64 values, like other 128-bit values, are returned under VSX in register vs34 (Altivec register v2). llvm-svn: 204543	2014-03-22 18:24:43 +00:00
Alexey Samsonov	f3333428a5	Add llvm_unreachable after fully-covered switches to appease GCC llvm-svn: 204318	2014-03-20 07:30:40 +00:00
Rafael Espindola	9eed3f671f	Look through variables when computing relocations. Given bar = foo + 4 .long bar MC would eat the 4. GNU as includes it in the relocation. The rule seems to be that a variable that defines a symbol is used in the relocation and one that does not define a symbol is evaluated and the result included in the relocation. Fixing this unfortunately required some other changes: * Since the variable is now evaluated, it would prevent the ELF writer from noticing the weakref marker the elf streamer uses. This patch then replaces that with a VariantKind in MCSymbolRefExpr. * Using VariantKind then requires us to look past other VariantKind to see .weakref bar,foo call bar@PLT doing this also fixes zed = foo +2 call zed@PLT so that is a good thing. * Looking past VariantKind means that the relocation selection has to use the fixup instead of the target. This is a reboot of the previous fixes for MC. I will watch the sanitizer buildbot and wait for a build before adding back the previous fixes. llvm-svn: 204294	2014-03-20 02:12:01 +00:00
Bill Schmidt	ec1edc24b0	Fix PR19144: Incorrect offset generated for int-to-fp conversion at -O0. When converting a signed 32-bit integer to double-precision floating point on hardware without a lfiwax instruction, we have to instead use a lfd followed by fcfid. We were erroneously offsetting the address by 4 bytes in preparation for either a lfiwax or lfiwzx when generating the lfd. This fixes that silly error. This was not caught in the test suite since the conversion tests were run with -mcpu=pwr7, which implies availability of lfiwax. I've added another test case for older hardware that checks the code we expect in the absence of lfiwax and other flavors of fcfid. There are fewer tests in this test case because we punt to DAG selection in more cases on older hardware. (We must generate complex fiddly sequences in those cases, and there is marginal benefit in duplicating that logic in fast-isel.) llvm-svn: 204155	2014-03-18 14:32:50 +00:00
Craig Topper	f7e1d669fe	[C++11] Mark the target fast isel classes as 'final' so that the compiler can de-virtualize some of the internal calls. llvm-svn: 204123	2014-03-18 07:27:13 +00:00
Ulrich Weigand	59b05e81f9	[ppc64] Avoid copy relocs in named rodata sections Commit r181723 introduced code to avoid placing initialized variables needing relocations into the .rodata section, which avoid copy relocs that do not work as expected on ppc64 function references. The same treatment is also needed for named .rodata.XXX sections. This patch changes PPC64LinuxTargetObjectFile::SelectSectionForGlobal to modify "Kind" before calling the default SelectSectionForGlobal routine, instead of first calling the default routine and then just checking for the (main) .rodata section afterwards. llvm-svn: 203921	2014-03-14 12:45:22 +00:00
Owen Anderson	e541764c5f	Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changing operator* on the by-operand iterators to return a MachineOperand& rather than a MachineInstr&. At this point they almost behave like normal iterators! Again, this requires making some existing loops more verbose, but should pave the way for the big range-based for-loop cleanups in the future. llvm-svn: 203865	2014-03-13 23:12:04 +00:00
Hal Finkel	8b6358ead9	[PowerPC] Initial support for the VSX instruction set VSX is an ISA extension supported on the POWER7 and later cores that enhances floating-point vector and scalar capabilities. Among other things, this adds <2 x double> support and generally helps to reduce register pressure. The interesting part of this ISA feature is the register configuration: there are 64 new 128-bit vector registers, the 32 of which are super-registers of the existing 32 scalar floating-point registers, and the second 32 of which overlap with the 32 Altivec vector registers. This makes things like vector insertion and extraction tricky: this can be free but only if we force a restriction to the right register subclass when needed. A new "minipass" PPCVSXCopy takes care of this (although it could do a more-optimal job of it; see the comment about unnecessary copies below). Please note that, currently, VSX is not enabled by default when targeting anything because it is not yet ready for that. The assembler and disassembler are fully implemented and tested. However: - CodeGen support causes miscompiles; test-suite runtime failures: MultiSource/Benchmarks/FreeBench/distray/distray MultiSource/Benchmarks/McCat/08-main/main MultiSource/Benchmarks/Olden/voronoi/voronoi MultiSource/Benchmarks/mafft/pairlocalalign MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4 SingleSource/Benchmarks/CoyoteBench/almabench SingleSource/Benchmarks/Misc/matmul_f64_4x4 - The lowering currently falls back to using Altivec instructions far more than it should. Worse, there are some things that are scalarized through the stack that shouldn't be. - A lot of unnecessary copies make it past the optimizers, and this needs to be fixed. - Many more regression tests are needed. Normally, I'd fix these things prior to committing, but there are some students and other contributors who would like to work this, and so it makes sense to move this development process upstream where it can be subject to the regular code-review procedures. llvm-svn: 203768	2014-03-13 07:58:58 +00:00
Hal Finkel	68c9b6839e	[TableGen] Optionally forbid overlap between named and positional operands There are currently two schemes for mapping instruction operands to instruction-format variables for generating the instruction encoders and decoders for the assembler and disassembler respectively: a) to map by name and b) to map by position. In the long run, we'd like to remove the position-based scheme and use only name-based mapping. Unfortunately, the name-based scheme currently cannot deal with complex operands (those with suboperands), and so we currently must use the position-based scheme for those. On the other hand, the position-based scheme cannot deal with (register) variables that are split into multiple ranges. An upcoming commit to the PowerPC backend (adding VSX support) will require this capability. While we could teach the position-based scheme to handle that, since we'd like to move away from the position-based mapping generally, it seems silly to teach it new tricks now. What makes more sense is to allow for partial transitioning: use the name-based mapping when possible, and only use the position-based scheme when necessary. Now the problem is that mixing the two sensibly was not possible: the position-based mapping would map based on position, but would not skip those variables that were mapped by name. Instead, the two sets of assignments would overlap. However, I cannot currently change the current behavior, because there are some backends that rely on it [I think mistakenly, but I'll send a message to llvmdev about that]. So I've added a new TableGen bit variable: noNamedPositionallyEncodedOperands, that can be used to cause the position-based mapping to skip variables mapped by name. llvm-svn: 203767	2014-03-13 07:57:54 +00:00
Roman Divacky	023131bd23	Allow exclamation and tilde to be parsed as a part of the ppc asm operand. llvm-svn: 203699	2014-03-12 19:25:57 +00:00
Rafael Espindola	9eaa756fe4	Try harder to evaluate expressions when printing assembly. When printing assembly we don't have a Layout object, but we can still try to fold some constants. Testcase by Ulrich Weigand. llvm-svn: 203677	2014-03-12 16:55:59 +00:00
Will Schmidt	40cf50fd75	Update the datalayout string for ppc64LE. Update the datalayout string for ppc64LE. llvm-svn: 203664	2014-03-12 14:59:17 +00:00
Patrik Hagglund	f6f25d32ac	Replace '#include ValueTypes.h' with forward declarations. In some cases the include is pushed "downstream" (or removed if unused). llvm-svn: 203644	2014-03-12 08:00:24 +00:00
Chandler Carruth	8f25783c45	[TTI] There is actually no realistic way to pop TTI implementations off the stack of the analysis group because they are all immutable passes. This is made clear by Craig's recent work to use override systematically -- we weren't overriding anything for 'finalizePass' because there is no such thing. This is kind of a lame restriction on the API -- we can no longer push and pop things, we just set up the stack and run. However, I'm not invested in building some better solution on top of the existing (terrifying) immutable pass and legacy pass manager. llvm-svn: 203437	2014-03-10 02:45:14 +00:00
Rafael Espindola	72416905a2	Don't avoid cfi instructions on the bg/p. The integrated assembler now works for ppc. Since this was the last use of the bg/p predicate and Hal says that it is now dead, drop the predicate too. llvm-svn: 203269	2014-03-07 19:04:12 +00:00
David Majnemer	b702f79917	MC: Remove superfluous section attribute flag definitions Summary: llvm/MC/MCSectionMachO.h and llvm/Support/MachO.h both had the same definitions for the section flags. Instead, grab the definitions out of support. No functionality change. Reviewers: grosbach, Bigcheese, rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2998 llvm-svn: 203211	2014-03-07 07:36:05 +00:00
Rafael Espindola	cb9ca86245	Replace PROLOG_LABEL with a new CFI_INSTRUCTION. The old system was fairly convoluted: * A temporary label was created. * A single PROLOG_LABEL was created with it. * A few MCCFIInstructions were created with the same label. The semantics were that the cfi instructions were mapped to the PROLOG_LABEL via the temporary label. The output position was that of the PROLOG_LABEL. The temporary label itself was used only for doing the mapping. The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to one by holding an index into the CFI instructions of this function. I did consider removing MMI.getFrameInstructions completelly and having CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non trivial constructors and destructors and are somewhat big, so the this setup is probably better. The net result is that we don't create temporary labels that are never used. llvm-svn: 203204	2014-03-07 06:08:31 +00:00
Hal Finkel	c86bafb5ea	The PPC global base register cannot be r0 The global base register cannot be r0 because it might end up as the first argument to addi or addis. Fixes PR18316. I don't have a small stable test case. llvm-svn: 203054	2014-03-06 01:28:23 +00:00
Chandler Carruth	0873afae39	[Layering] Move DebugInfo.h into the IR library where its implementation already lives. llvm-svn: 203046	2014-03-06 00:46:21 +00:00
Hal Finkel	c04da1f4c7	Fixup PPC Darwin i1 argument handling Like on other targets, we need to zero_extend/truncate i1 args before copying them to GPRs. llvm-svn: 203045	2014-03-06 00:45:19 +00:00
Hal Finkel	0373847793	When using CR bit registers on PPC32, handle the i1 vaarg case When copying an i1 value into a GPR for a vaarg call, we need to explicitly zero-extend the i1 value (otherwise an invalid CRBIT -> GPR copy will be generated). llvm-svn: 203041	2014-03-06 00:23:33 +00:00
Hal Finkel	18344a3ff6	With PPC CR bit registers, handle int_to_fp on older cores On cores without fpcvt support, we cannot promote int_to_fp i1 operations, because there is nothing to promote them to. The most straightforward implementation of this uses a select to choose between the two possible resulting floating-point values (and that's what is done here). llvm-svn: 203015	2014-03-05 22:14:00 +00:00
Joerg Sonnenberger	b500454592	Enable integrated assembler on OpenBSD/PPC32 by default, too. From Brad Smith. llvm-svn: 202967	2014-03-05 11:37:04 +00:00
Will Schmidt	701f152950	[PowerPC] support powerpc64le as syntax-checking target (pass2) Register the Asm Printer for the ppc64le target. This fills in a spot that was missed in an earlier change (r187179). llvm-svn: 202861	2014-03-04 16:51:52 +00:00
Chandler Carruth	649f6270aa	[Modules] Move ValueHandle into the IR library where Value itself lives. Move the test for this class into the IR unittests as well. This uncovers that ValueMap too is in the IR library. Ironically, the unittest for ValueMap is useless in the Support library (honestly, so was the ValueHandle test) and so it already lives in the IR unittests. Mmmm, tasty layering. llvm-svn: 202821	2014-03-04 11:17:44 +00:00
Chandler Carruth	0bf5689f06	[Modules] Move GetElementPtrTypeIterator into the IR library. As its name might indicate, it is an iterator over the types in an instruction in the IR.... You see where this is going. Another step of modularizing the support library. llvm-svn: 202815	2014-03-04 10:40:04 +00:00
Hal Finkel	64680c3ba1	Add a PPC inline asm constraint type for single CR bits Now that the PowerPC backend can track individual CR bits as first-class registers, we should also have a way of allocating them for inline asm statements. Because these registers are only one bit, if an output variable is implicitly cast to a larger integer size, we'll get an any_extend to that larger type (this is part of the existing target-independent logic). As a result, regardless of the size of the output type, only the first bit is meaningful. The constraint identifier "wc" has been chosen for this purpose. Although gcc does not currently support allocating individual CR bits, this identifier choice has been coordinated with the gcc PowerPC team, and will be marked as reserved for this purpose in the gcc constraints.md file. llvm-svn: 202657	2014-03-02 18:23:39 +00:00
Benjamin Kramer	e4eb1b495f	[C++11] Replace llvm::next and llvm::prior with std::next and std::prev. Remove the old functions. llvm-svn: 202636	2014-03-02 12:27:27 +00:00
Craig Topper	b0056a4ca7	Switch all uses of LLVM_OVERRIDE to just use 'override' directly. llvm-svn: 202621	2014-03-02 09:09:27 +00:00
Craig Topper	c8a0b9e381	Switch all uses of LLVM_FINAL to just use 'final', and remove the macro. llvm-svn: 202618	2014-03-02 08:08:51 +00:00
Hal Finkel	4937443651	Remove extra truncs/exts around i32 bit operations on PPC64 This generalizes the code to eliminate extra truncs/exts around i1 bit operations to also do the same on PPC64 for i32 bit operations. This eliminates a fairly prevalent code wart: int foo(int a) { return a == 5 ? 7 : 8; } On PPC64, because of the extension implied by the ABI, this would generate: cmplwi 0, 3, 5 li 12, 8 li 4, 7 isel 3, 4, 12, 2 rldicl 3, 3, 0, 32 blr where the 'rldicl 3, 3, 0, 32', the extension, is completely unnecessary. At least for the single-BB case (which is all that the DAG combine mechanism can handle), this unnecessary extension is no longer generated. llvm-svn: 202600	2014-03-01 21:36:57 +00:00
Hal Finkel	1970087008	Swap PPC isel operands to allow for 0-folding The PPC isel instruction can fold 0 into the first operand (thus eliminating the need to materialize a zero-containing register when the 'true' result of the isel is 0). When the isel is fed by a bit register operation that we can invert, do so as part of the bit-register-operation peephole routine. llvm-svn: 202469	2014-02-28 06:11:16 +00:00
Hal Finkel	3bd3f4e287	Trying to unbreak the darwin11 builder The CR bit tracking code broke PPC/Darwin; trying to get it working again... (the darwin11 builder, which defaults to the darwin ABI when running PPC tests, asserted when running test/CodeGen/PowerPC/inverted-bool-compares.ll) llvm-svn: 202459	2014-02-28 01:17:25 +00:00
Hal Finkel	94f3724df6	Try to unbreak the C++11 build Cannot use negative numbers in case statements without running afoul of -Wc++11-narrowing. llvm-svn: 202455	2014-02-28 00:45:27 +00:00
Hal Finkel	883c64377d	Add CR-bit tracking to the PowerPC backend for i1 values This change enables tracking i1 values in the PowerPC backend using the condition register bits. These bits can be treated on PowerPC as separate registers; individual bit operations (and, or, xor, etc.) are supported. Tracking booleans in CR bits has several advantages: - Reduction in register pressure (because we no longer need GPRs to store boolean values). - Logical operations on booleans can be handled more efficiently; we used to have to move all results from comparisons into GPRs, perform promoted logical operations in GPRs, and then move the result back into condition register bits to be used by conditional branches. This can be very inefficient, because the throughput of these CR <-> GPR moves have high latency and low throughput (especially when other associated instructions are accounted for). - On the POWER7 and similar cores, we can increase total throughput by using the CR bits. CR bit operations have a dedicated functional unit. Most of this is more-or-less mechanical: Adjustments were needed in the calling-convention code, support was added for spilling/restoring individual condition-register bits, and conditional branch instruction definitions taking specific CR bits were added (plus patterns and code for generating bit-level operations). This is enabled by default when running at -O2 and higher. For -O0 and -O1, where the ability to debug is more important, this feature is disabled by default. Individual CR bits do not have assigned DWARF register numbers, and storing values in CR bits makes them invisible to the debugger. It is critical, however, that we don't move i1 values that have been promoted to larger values (such as those passed as function arguments) into bit registers only to quickly turn around and move the values back into GPRs (such as happens when values are returned by functions). A pair of target-specific DAG combines are added to remove the trunc/extends in: trunc(binary-ops(binary-ops(zext(x), zext(y)), ...) and: zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...) In short, we only want to use CR bits where some of the i1 values come from comparisons or are used by conditional branches or selects. To put it another way, if we can do the entire i1 computation in GPRs, then we probably should (on the POWER7, the GPR-operation throughput is higher, and for all cores, the CR <-> GPR moves are expensive). POWER7 test-suite performance results (from 10 runs in each configuration): SingleSource/Benchmarks/Misc/mandel-2: 35% speedup MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown MultiSource/Applications/lemon/lemon: 8% slowdown llvm-svn: 202451	2014-02-28 00:27:01 +00:00
Aaron Ballman	9e5315239d	Silencing an MSVC signed comparison warning. llvm-svn: 202295	2014-02-26 20:22:20 +00:00
Hal Finkel	08c64addef	Account for 128-bit integer operations in PPCCTRLoops We need to abort the formation of counter-register-based loops where there are 128-bit integer operations that might become function calls. llvm-svn: 202192	2014-02-25 20:51:50 +00:00
Rafael Espindola	32da4bdd4b	Make DataLayout a plain object, not a pass. Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM don't don't handle passes to also use DataLayout. llvm-svn: 202168	2014-02-25 17:30:31 +00:00
Rafael Espindola	6c834371d9	Make some DataLayout pointers const. No functionality change. Just reduces the noise of an upcoming patch. llvm-svn: 202087	2014-02-24 23:12:18 +00:00
Rafael Espindola	1f7e9d4bed	Rename a few more DataLayout variables. llvm-svn: 201833	2014-02-21 01:53:35 +00:00
Rafael Espindola	60133f3afe	move getNameWithPrefix and getSymbol to TargetMachine. TargetLoweringBase is implemented in CodeGen, so before this patch we had a dependency fom Target to CodeGen. This would show up as a link failure of llvm-stress when building with -DBUILD_SHARED_LIBS=ON. This fixes pr18900. llvm-svn: 201711	2014-02-19 20:30:41 +00:00
Rafael Espindola	aea6192f20	Add back r201608, r201622, r201624 and r201625 r201608 made llvm corretly handle private globals with MachO. r201622 fixed a bug in it and r201624 and r201625 were changes for using private linkage, assuming that llvm would do the right thing. They all got reverted because r201608 introduced a crash in LTO. This patch includes a fix for that. The issue was that TargetLoweringObjectFile now has to be initialized before we can mangle names of private globals. This is trivially true during the normal codegen pipeline (the asm printer does it), but LTO has to do it manually. llvm-svn: 201700	2014-02-19 17:23:20 +00:00
Daniel Jasper	bf4e7d8ac3	Revert r201622 and r201608. This causes the LLVMgold plugin to segfault. More information on the replies to r201608. llvm-svn: 201669	2014-02-19 12:26:01 +00:00
Rafael Espindola	d39a573c72	Fix PR18743. The IR @foo = private constant i32 42 is valid, but before this patch we would produce an invalid MachO from it. It was invalid because it would use an L label in a section where the liker needs the labels in order to atomize it. One way of fixing it would be to just reject this IR in the backend, but that would not be very front end friendly. What this patch does is use an 'l' prefix in sections that we know the linker requires symbols for atomizing them. This allows frontends to just use private and not worry about which sections they go to or how the linker handles them. One small issue with this strategy is that now a symbol name depends on the section, which is not available before codegen. This is not a problem in practice. The reason is that it only happens with private linkage, which will be ignored by the non codegen users (llvm-nm and llvm-ar). llvm-svn: 201608	2014-02-18 22:24:57 +00:00
Rafael Espindola	c898de3245	Rename a DebugLoc variable to DbgLoc and a DataLayout to DL. This is quiet a bit less confusing now that TargetData was renamed DataLayout. llvm-svn: 201606	2014-02-18 22:05:46 +00:00
Daniel Sanders	7a3a160940	Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call Summary: AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output. The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as. All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler. Changes since review (and last commit attempt): - Fixed test failures that were missed due to configuration of local build. (fixes crash.ll and a couple others). - Fixed tests that happened to pass because the local build was on X86 (should fix 2007-12-17-InvokeAsm.ll) - mature-mc-support.ll's should no longer require all targets to be compiled. (should fix ARM and PPC buildbots) - Object output (-filetype=obj and similar) now forces the integrated assembler to be enabled regardless of default setting or -no-integrated-as. (should fix SystemZ buildbots) Reviewers: rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2686 llvm-svn: 201333	2014-02-13 14:44:26 +00:00
Daniel Sanders	656c4d360b	Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call It introduced multiple test failures in the buildbots. llvm-svn: 201241	2014-02-12 15:39:20 +00:00
Daniel Sanders	e647d6441b	Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call Summary: AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output. The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as. All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler. Reviewers: rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2686 llvm-svn: 201237	2014-02-12 14:44:54 +00:00
Rafael Espindola	8d47aa1e4e	Pass the Mangler by reference. It is never null and it is not used in casts, so there is no reason to use a pointer. This matches how we pass TM. llvm-svn: 201025	2014-02-08 14:53:28 +00:00
Rafael Espindola	1d50d1310d	Add LLVM_OVERRIDE to a few declarations. llvm-svn: 201022	2014-02-08 06:07:27 +00:00
Rafael Espindola	c27ddf11ba	Just returning false is the default. llvm-svn: 200890	2014-02-06 00:03:15 +00:00
Matt Arsenault	7b69102edb	Add address space argument to allowsUnalignedMemoryAccess. On R600, some address spaces have more strict alignment requirements than others. llvm-svn: 200887	2014-02-05 23:15:53 +00:00
Rafael Espindola	98165a6a91	Remove support for not using .loc directives. Clang itself was not using this. The only way to access it was via llc. llvm-svn: 200862	2014-02-05 18:00:21 +00:00
Hal Finkel	c649b3ff57	Replace PPC instruction-size code with MCInstrDesc getSize As part of the cleanup done to enable the disassembler, the PPC instructions now have a valid Size description field. This can now be used to replace some custom logic in a few places to compute instruction sizes. Patch by David Wiberg! llvm-svn: 200623	2014-02-02 06:12:27 +00:00
David Woodhouse	6c8fefd999	Delete MCSubtargetInfo data members from target MCCodeEmitter classes The subtarget info is explicitly passed to the EncodeInstruction method and we should use that subtarget info to influence any encoding decisions. llvm-svn: 200350	2014-01-28 23:13:25 +00:00
David Woodhouse	a79a37b435	Propagate MCSubtargetInfo through TableGen's getBinaryCodeForInstr() llvm-svn: 200349	2014-01-28 23:13:18 +00:00
David Woodhouse	4a4c611e36	Explictly pass MCSubtargetInfo to MCCodeEmitter::EncodeInstruction() llvm-svn: 200348	2014-01-28 23:13:07 +00:00
David Woodhouse	5d0b529d58	Change MCStreamer EmitInstruction interface to take subtarget info llvm-svn: 200345	2014-01-28 23:12:42 +00:00
Iain Sandoe	f16c0c72d6	Provide a stub Target Streamer implementation for PPC MachO At present, this handles .tc (error) and needs to be expanded to deal properly with .machine llvm-svn: 200309	2014-01-28 11:03:17 +00:00
Hal Finkel	5c72f63cb7	Handle spilling the PPC GPRC_NOR0 register class GPRC_NOR0 is not a subclass of GPRC (because it also contains the ZERO pseudo register). As a result, we also need to check for it in the spilling code. llvm-svn: 200288	2014-01-28 05:32:58 +00:00
Eric Christopher	2b6e161fce	Revert r199871 and replace it with a simple check in the debug info code to see if we're emitting a function into a non-default text section. This is still a less-than-ideal solution, but more contained than r199871 to determine whether or not we're emitting code into an array of comdat sections. llvm-svn: 200269	2014-01-28 00:49:26 +00:00
Rafael Espindola	bfdd58b802	Pass a MCSubtargetInfo down to the TargetStreamer creation. With this the target streamers will be able to know the target features that are in use. llvm-svn: 200135	2014-01-26 06:38:58 +00:00
Rafael Espindola	806f778fa0	Construct the MCStreamer before constructing the MCTargetStreamer. This has a few advantages: * Only targets that use a MCTargetStreamer have to worry about it. * There is never a MCTargetStreamer without a MCStreamer, so we can use a reference. * A MCTargetStreamer can talk to the MCStreamer in its constructor. llvm-svn: 200129	2014-01-26 06:06:37 +00:00
Rafael Espindola	bb8a9e36c5	Remove an easy use of EmitRawText from PPC. This makes lib/Target/PowerPC EmitRawText free. llvm-svn: 200065	2014-01-25 02:35:56 +00:00
Juergen Ributzka	e4d29eb495	Add final and owerride keywords to TargetTransformInfo's subclasses. llvm-svn: 200021	2014-01-24 18:22:59 +00:00
Alp Toker	1c4b33e8e5	Fix known typos Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt. llvm-svn: 200018	2014-01-24 17:20:08 +00:00
Eric Christopher	4de8f33ce6	Add a variable to track whether or not we've used a unique section, e.g. linkonce, to TargetMachine and set it when we've done so for ELF targets currently. This involved making TargetMachine non-const in a TLOF use and propagating that change around - I'm open to other ideas. This will be used in a future commit to handle emitting debug information with ranges. llvm-svn: 199871	2014-01-23 06:47:25 +00:00
Rafael Espindola	012bf3f3ac	Fix pr18515. My understanding (from reading just the llvm code) is that * most ppc cpus have a "sync n" instruction and an msync alias that is "sync 0". * "book e" cpus instead have a msync instruction and not the more general "sync n" This patch reflects that in the .td files, allowing a single codepath for asm ond obj streamer and incidentelly fixes a crash when EmitRawText was called on a obj streamer. llvm-svn: 199832	2014-01-22 20:20:52 +00:00
Hal Finkel	ca5b9beeb1	Fix pointer info on PPC byval stores For PPC64 SVR (and Darwin), the stores that take byval aggregate parameters from registers into the stack frame had MachinePointerInfo objects with incorrect offsets. These offsets are relative to the object itself, not to the stack frame base. This fixes self hosting on PPC64 when compiling with -enable-aa-sched-mi. llvm-svn: 199763	2014-01-21 20:15:58 +00:00
Rafael Espindola	ebbf4c6cd5	Make getTargetStreamer return a possibly null pointer. This will allow it to be called from target independent parts of the main streamer that don't know if there is a registered target streamer or not. This in turn will allow targets to perform extra actions at specified points in the interface: add extra flags for some labels, extra work during finalization, etc. llvm-svn: 199174	2014-01-14 01:21:46 +00:00
Chandler Carruth	98adff6224	[PM] Split DominatorTree into a concrete analysis result object which can be used by both the new pass manager and the old. This removes it from any of the virtual mess of the pass interfaces and lets it derive cleanly from the DominatorTreeBase<> template. In turn, tons of boilerplate interface can be nuked and it turns into a very straightforward extension of the base DominatorTree interface. The old analysis pass is now a simple wrapper. The names and style of this split should match the split between CallGraph and CallGraphWrapperPass. All of the users of DominatorTree have been updated to match using many of the same tricks as with CallGraph. The goal is that the common type remains the resulting DominatorTree rather than the pass. This will make subsequent work toward the new pass manager significantly easier. Also in numerous places things became cleaner because I switched from re-running the pass (!!! mid way through some other passes run!!!) to directly recomputing the domtree. llvm-svn: 199104	2014-01-13 13:07:17 +00:00
Chandler Carruth	ee051af6e2	[cleanup] Move the Dominators.h and Verifier.h headers into the IR directory. These passes are already defined in the IR library, and it doesn't make any sense to have the headers in Analysis. Long term, I think there is going to be a much better way to divide these matters. The dominators code should be fully separated into the abstract graph algorithm and have that put in Support where it becomes obvious that evn Clang's CFGBlock's can use it. Then the verifier can manually construct dominance information from the Support-driven interface while the Analysis library can provide a pass which both caches, reconstructs, and supports a nice update API. But those are very long term, and so I don't want to leave the really confusing structure until that day arrives. llvm-svn: 199082	2014-01-13 09:26:24 +00:00
Saleem Abdulrasool	b90512a41c	correct target directive handling error handling The target specific parser should return `false' if the target AsmParser handles the directive, and `true' if the generic parser should handle the directive. Many of the target specific directive handlers would `return Error' which does not follow these semantics. This change simply changes the target specific routines to conform to the semantis of the ParseDirective correctly. Conformance to the semantics improves diagnostics emitted for the invalid directives. X86 is taken as a sample to ensure that multiple diagnostics are not presented for a single error. llvm-svn: 199068	2014-01-13 01:15:39 +00:00
Chandler Carruth	53468087f3	Put the functionality for printing a value to a raw_ostream as an operand into the Value interface just like the core print method is. That gives a more conistent organization to the IR printing interfaces -- they are all attached to the IR objects themselves. Also, update all the users. This removes the 'Writer.h' header which contained only a single function declaration. llvm-svn: 198836	2014-01-09 02:29:41 +00:00
Rafael Espindola	4dc5af8bc2	Move the llvm mangler to lib/IR. This makes it available to tools that don't link with target (like llvm-ar). llvm-svn: 198708	2014-01-07 21:19:40 +00:00
Chandler Carruth	7aa902a488	Move the LLVM IR asm writer header files into the IR directory, as they are part of the core IR library in order to support dumping and other basic functionality. Rename the 'Assembly' include directory to 'AsmParser' to match the library name and the only functionality left their -- printing has been in the core IR library for quite some time. Update all of the #includes to match. All of this started because I wanted to have the layering in good shape before I started adding support for printing LLVM IR using the new pass infrastructure, and commandline support for the new pass infrastructure. llvm-svn: 198688	2014-01-07 12:34:26 +00:00
Chandler Carruth	87f14b4eec	Re-sort all of the includes with ./utils/sort_includes.py so that subsequent changes are easier to review. About to fix some layering issues, and wanted to separate out the necessary churn. Also comment and sink the include of "Windows.h" in three .inc files to match the usage in Memory.inc. llvm-svn: 198685	2014-01-07 11:48:04 +00:00
Bill Wendling	e1a9065ca0	Remove unnecessary #includes. llvm-svn: 198585	2014-01-06 06:00:00 +00:00
Bill Wendling	c3b5643da4	Refactor function that checks that __builtin_returnaddress's argument is constant. This moves the check up into the parent class so that all targets can use it without having to copy (and keep in sync) the same error message. llvm-svn: 198579	2014-01-06 00:43:20 +00:00
Bill Wendling	be9af41475	Emit an error message if the value passed to __builtin_returnaddress isn't a constant __builtin_returnaddress requires that the value passed into is be a constant. However, at -O0 even a constant expression may not be converted to a constant. Emit an error message intead of crashing. llvm-svn: 198531	2014-01-05 01:47:20 +00:00
Rafael Espindola	eae6386a1e	Make the llvm mangler depend only on DataLayout. Before this patch any program that wanted to know the final symbol name of a GlobalValue had to link with Target. This patch implements a compromise solution where the mangler uses DataLayout. This way, any tool that already links with Target (llc, clang) gets the exact behavior as before and new IR files can be mangled without linking with Target. With this patch the mangler is constructed with just a DataLayout and DataLayout is extended to include the information the Mangler needs. llvm-svn: 198438	2014-01-03 19:21:54 +00:00
Hal Finkel	d0f8236add	[PPC] Fix comment to match function name llvm-svn: 198362	2014-01-02 22:09:39 +00:00
Hal Finkel	0e77f939b4	[PPC] Fix the scheduling of CR logicals on the P7 CR logicals (crand, crxor, etc.) on the P7 need to be in the first slot of each dispatch group. The old itinerary entry was just wrong (but has not mattered because we don't generate these instructions). This will matter when, in an upcoming commit, we start generating these instructions. llvm-svn: 198359	2014-01-02 21:38:26 +00:00
Hal Finkel	dc47cd8c25	[PPC] Use the correct immediate operands on 64-bit instructions Several of the 64-bit fixed-point instructions with immediate operands were using the 32-bit (i32) operand nodes instead of the corresponding 64-bit (i64) operand definitions (u16imm instead of u16imm64, for example). This error has had no effect so far, but would have caused type-checking violations with an upcoming change. llvm-svn: 198356	2014-01-02 21:26:59 +00:00
Roman Divacky	f81810bdfd	Use r2 when encoding tls on ppc32. Fixes PR18305. llvm-svn: 197878	2013-12-22 10:45:37 +00:00
Roman Divacky	7114adfec5	Add some comments. llvm-svn: 197875	2013-12-22 09:48:38 +00:00
Roman Divacky	513296cd04	Implement initial-exec TLS for PPC32. llvm-svn: 197824	2013-12-20 18:08:54 +00:00
Rafael Espindola	6cf8a98d5f	Long doubles are required to be aligned to 128 bits and svr4 32 bits. Clang was already getting this right. llvm-svn: 197694	2013-12-19 16:23:59 +00:00
Hal Finkel	ce61543897	Add a disassembler to the PowerPC backend The tests for the disassembler were adapted from the encoder tests, and for the most part, the output from the disassembler matches that encoder-test inputs. There are some places where more-informative mnemonics could be produced (notably for the branch instructions), and those cases are noted in the tests with FIXMEs. Future work includes: - Generating more-informative mnemonics when possible (this may also be done in the printer). - Remove the dependence on positional "numbered" operand-to-variable mapping (for both encoding and decoding). - Internally using 64-bit instruction variants in 64-bit mode (if this turns out to matter). llvm-svn: 197693	2013-12-19 16:13:01 +00:00
Rafael Espindola	9b3064d2fb	Fix f64 and f128 for ppc-darwin. This patch adds -f64:32:64 to 32 bit ppc darwin since a f64 inside a structure are only 32 bit aligned. The patch also drop -f128:64:128 from all ppc darwin, since f128 is 128 bit aligned. llvm-svn: 197574	2013-12-18 15:06:25 +00:00
Rafael Espindola	e1792e72e1	One ppc32-darwin, a i64 inside a structure can have 32 bit alignment. Thanks for Iain Sandoe for testing this with the original gcc. Clang was already getting this right. llvm-svn: 197572	2013-12-18 14:35:37 +00:00
Hal Finkel	0cc169d05b	Eliminate PPC instruction decoding ambiguities The instruction definitions in the PPC backend have a number of variants defined for the same instruction to represent differences between 64-bit and 32-bit semantics. In order to generate a disassembler for the PPC backend, we need to mark all but one of these as CodeGen only. No functionality change intended; this is prep work for PPC disassembly support. llvm-svn: 197535	2013-12-17 23:05:18 +00:00
Rafael Espindola	4a6f55789a	Fix the pointer size for the PS3 datalayout. This will be tested from clang. llvm-svn: 197501	2013-12-17 15:29:48 +00:00
Andrew Trick	a3aa2ba174	Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies. Without this, MachineCSE is powerless to handle redundant operations with truncated source operands. This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled: %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1 %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2 %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def> Test case: cse-add-with-overflow.ll. This exposed an existing bug in PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case: PowerPC/crash.ll. llvm-svn: 197465	2013-12-17 04:50:45 +00:00
Andrew Trick	935a379f21	whitespace llvm-svn: 197464	2013-12-17 04:50:40 +00:00
Rafael Espindola	559bceac20	The preferred alignment defaults to the abi alignment. Omit if it is the same. llvm-svn: 197400	2013-12-16 18:01:51 +00:00
Rafael Espindola	65c80ee4a2	On DataLayout, omit the default of p:64:64:64. llvm-svn: 197397	2013-12-16 17:15:29 +00:00
Hal Finkel	951040af31	Set has_asmparser in PowerPC/LLVMBuild.txt PowerPC now has an asm parser (and has for many months now); indicate this in PowerPC/LLVMBuild.txt. llvm-svn: 197393	2013-12-16 15:48:09 +00:00
Iain Sandoe	d78fe2e004	[Powerpc darwin] AsmParser Base implementation. This is a base implementation of the powerpc-apple-darwin asm parser dialect. * Enables infrastructure (essentially isDarwin()) and fixes up the parsing of asm directives to separate out ELF and MachO/Darwin additions. * Enables parsing of {r,f,v}XX as register identifiers. * Enables parsing of lo16() hi16() and ha16() as modifiers. The changes to the test case are from David Fang (fangism). llvm-svn: 197324	2013-12-14 13:34:02 +00:00
Rafael Espindola	2bd13393e0	Assume defaults to produce smaller datalayout strings. llvm-svn: 197249	2013-12-13 17:56:11 +00:00
Iain Sandoe	fc27a861a1	test commit. Amend a comment. llvm-svn: 197237	2013-12-13 15:46:48 +00:00
Gabor Greif	c84043be50	typo in comment llvm-svn: 197136	2013-12-12 08:00:34 +00:00
Hal Finkel	bb547872c5	Remove unused multiclass from PPCInstrInfo.td llvm-svn: 197100	2013-12-12 00:23:29 +00:00
Hal Finkel	e8c7fef754	Improve instruction scheduling for the PPC POWER7 Aside from a few minor latency corrections, the major change here is a new hazard recognizer which focuses on better dispatch-group formation on the POWER7. As with the PPC970's hazard recognizer, the most important thing it does is avoid load-after-store hazards within the same dispatch group. It uses the POWER7's special dispatch-group-terminating nop instruction (instead of inserting multiple regular nop instructions). This new hazard recognizer makes use of the scheduling dependency graph itself, built using AA information, to robustly detect the possibility of load-after-store hazards. significant test-suite performance changes (the error bars are 99.5% confidence intervals based on 5 test-suite runs both with and without the change -- speedups are negative): speedups: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.55171% +/- 0.333168% MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl -17.5576% +/- 14.598% MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl -29.5708% +/- 7.09058% MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt -34.9471% +/- 11.4391% SingleSource/Benchmarks/BenchmarkGame/puzzle -25.1347% +/- 11.0104% SingleSource/Benchmarks/Misc/flops-8 -17.7297% +/- 9.79061% SingleSource/Benchmarks/Shootout-C++/ary3 -35.5018% +/- 23.9458% SingleSource/Regression/C/uint64_to_float -56.3165% +/- 25.4234% SingleSource/UnitTests/Vectorizer/gcc-loops -18.5309% +/- 6.8496% regressions: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 18.351% +/- 12.156% SingleSource/Benchmarks/Shootout-C++/methcall 27.3086% +/- 14.4733% llvm-svn: 197099	2013-12-12 00:19:11 +00:00
Hal Finkel	4a76fae817	Fix the PPC subsumes-predicate check For one predicate to subsume another, they must both check the same condition register. Failure to check this prerequisite was causing miscompiles. Fixes PR18003. llvm-svn: 197089	2013-12-11 23:12:25 +00:00
NAKAMURA Takumi	54fa39136d	Prune redundant dependencies in LLVMBuild.txt. llvm-svn: 196988	2013-12-11 00:30:57 +00:00
Rafael Espindola	db206ff5b6	Move PPC's getDataLayoutString out of line and document it better. llvm-svn: 196987	2013-12-11 00:09:06 +00:00
David Fang	7d43532c99	on darwin<10, fallback to .weak_definition (PPC,X86) .weak_def_can_be_hidden was not yet supported by the system assembler llvm-svn: 196970	2013-12-10 21:37:41 +00:00
NAKAMURA Takumi	3fda77b6c7	Add proper dependencies to LLVMBuild.txt in llvm/lib. I'll prune redundant deps in LLVMBuild.txt, later. llvm-svn: 196881	2013-12-10 05:39:34 +00:00
Rafael Espindola	2b4db8d379	Remove the isImplicitlyPrivate argument of getNameWithPrefix. getSymbolWithGlobalValueBase use is to create a name of a new symbol based on the name of an existing GV. Assert that and then remove the last call to pass true to isImplicitlyPrivate. This gives the mangler API a 1:1 mapping from GV to names, which is what we need to drop the mangler dependency on the target (and use an extended datalayout instead). llvm-svn: 196472	2013-12-05 05:53:12 +00:00
Alp Toker	e845f8af67	Correct word hyphenations This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471	2013-12-05 05:44:44 +00:00
Hal Finkel	2fbbf9299e	Remove PPCScoreboardHazardRecognizer PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer which did only one thing: filtered out nodes in EmitInstruction for which DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo instructions. As far as I can tell, this is no longer true, and so we can use ScoreboardHazardRecognizer directly. llvm-svn: 196171	2013-12-02 23:52:46 +00:00
Rafael Espindola	5f33387c40	Refactor the setting of PrivateGlobalPrefix. No functionality change. llvm-svn: 196170	2013-12-02 23:39:26 +00:00
Rafael Espindola	c36d63a948	Move getSymbolWithGlobalValueBase to TargetLoweringObjectFile. This allows it to be used in TargetLoweringObjectFileImpl.cpp. llvm-svn: 196117	2013-12-02 16:25:47 +00:00
Rafael Espindola	cee11d6f43	Remove dead code. MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm. Keeping parts of the old asm printer just to print inline asm to a string that we then parse back looks like a hack. llvm-svn: 196111	2013-12-02 15:36:37 +00:00
Rafael Espindola	427ca8d886	Change the default of AsmWriterClassName and isMCAsmWriter. llvm-svn: 196065	2013-12-02 04:55:42 +00:00
Rafael Espindola	3192965fa3	Refactor for clarity and efficiency. The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so this should be a nop. llvm-svn: 196059	2013-12-02 03:26:43 +00:00
Hal Finkel	725757ccc8	Add a scheduling model (with itinerary) for the PPC POWER7 This adds a scheduling model for the POWER7 (P7) core, and enables the machine-instruction scheduler when targeting the P7. Scheduling for the P7, like earlier ooo PPC cores, requires considering both dispatch group hazards, and functional unit resources and latencies. These are both modeled in a combined itinerary. Dispatch group formation is still handled by the post-RA scheduler (which still needs to be updated for the P7, but nevertheless does a pretty good job). One interesting aspect of this change is that I've also enabled to use of AA duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark results seem to support this decision (see below), and while this is normally useful for in-order cores, and not for ooo cores like the P7, I think that the dispatch slot hazards are enough like in-order resources to make the AA useful. Test suite significant performance differences (where negative is a speedup, and positive is a regression) vs. the current situation: MultiSource/Benchmarks/BitBench/drop3/drop3 with AA: N/A without AA: -28.7614% +/- 19.8356% (significantly against AA) MultiSource/Benchmarks/FreeBench/neural/neural with AA: -17.7406% +/- 11.2712% without AA: N/A (significantly in favor of AA) MultiSource/Benchmarks/SciMark2-C/scimark2 with AA: -11.2079% +/- 1.80543% without AA: -11.3263% +/- 2.79651% MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt with AA: -41.8649% +/- 17.0053% without AA: -34.5256% +/- 23.7072% MultiSource/Benchmarks/mafft/pairlocalalign with AA: 25.3016% +/- 17.8614% without AA: 38.6629% +/- 14.9391% (significantly in favor of AA) MultiSource/Benchmarks/sim/sim with AA: N/A without AA: 13.4844% +/- 7.18195% (significantly in favor of AA) SingleSource/Benchmarks/BenchmarkGame/Large/fasta with AA: 15.0664% +/- 6.70216% without AA: 12.7747% +/- 8.43043% SingleSource/Benchmarks/BenchmarkGame/puzzle with AA: 82.2713% +/- 26.3567% without AA: 75.7525% +/- 41.1842% SingleSource/Benchmarks/Misc/flops-2 with AA: -37.1621% +/- 20.7964% without AA: -35.2342% +/- 20.2999% (significantly in favor of AA) These are 99.5% confidence intervals from 5 runs per configuration. Regarding the choice to turn on AA during CodeGen, of these results, four seem significantly in favor of using AA, and one seems significantly against. I'm not making this decision based on these numbers alone, but these results seem consistent with results I have from other tests, and so I think that, on balance, using AA is a win. llvm-svn: 195981	2013-11-30 20:55:12 +00:00
Hal Finkel	fa2b249f38	Split some PPC itinerary classes In preparation for adding scheduling definitions for the POWER7, split some PPC itinerary classes so that the P7's latencies and hazards can be better described. For the most part, this means differentiating indexed from non-index pre-increment loads and stores. Also, differentiate single from double-precision sqrt. No functionality change intended (except for a more-specific latency for single-precision sqrt on the A2). llvm-svn: 195980	2013-11-30 20:41:13 +00:00
Hal Finkel	2d7cc00415	Adjust PPC A2 input operand latencies On the PPC A2, instructions are only issued after their input operands are ready. Model this by specifying that input operands are read at dispatch (0 cycles after issue). This changes all input operand latencies from 1 to 0. Significant test-suite performance changes (these are 99.5% confidence intervals on 6 runs for both before and after): speedups: MultiSource/Benchmarks/sim/sim -1.21915% +/- 0.175063% MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -1.23946% +/- 1.05133% SingleSource/Benchmarks/Misc/flops-2 -1.24237% +/- 0.681362% MultiSource/Applications/JM/lencod/lencod -1.33992% +/- 0.757498% MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt -1.51802% +/- 1.21468% MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt -2.18818% +/- 1.28605% MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt -2.21977% +/- 1.19499% SingleSource/Benchmarks/BenchmarkGame/spectral-norm -2.29822% +/- 0.671871% MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl -2.40975% +/- 0.355931% SingleSource/Benchmarks/Misc/fp-convert -2.41899% +/- 1.04751% MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl -2.50349% +/- 0.126765% SingleSource/Benchmarks/Misc/flops-3 -3.00214% +/- 0.700795% MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt -3.56995% +/- 3.2929% MultiSource/Applications/sgefa/sgefa -4.24908% +/- 2.00413% MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk -18.1294% +/- 3.96489% regressions: MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl 1.03249% +/- 0.178547% MultiSource/Applications/hexxagon/hexxagon 1.16597% +/- 0.285235% MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt 1.39576% +/- 1.07855% SingleSource/Benchmarks/Misc-C++/stepanov_v1p2 1.71539% +/- 0.173182% MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1 1.90013% +/- 0.866472% MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl 2.39854% +/- 1.05914% MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl 2.4402% +/- 0.817904% MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl 5.87997% +/- 3.3172% MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc 9.02643% +/- 5.79591% MultiSource/Benchmarks/VersaBench/bmm/bmm 10.3517% +/- 1.227% Obviously, there are data points on both sides of this; but I think, overall, this supports making the change. llvm-svn: 195951	2013-11-29 07:04:59 +00:00
Hal Finkel	69f21285ed	Create a PPC440 SchedMachineModel Some of the older PPC processor definitions don't have associated SchedMachineModels; correct this for the PPC440. llvm-svn: 195949	2013-11-29 06:32:17 +00:00
Hal Finkel	c5a38fd3e6	Fixup PPC440 load/store operand latencies The operand latencies for loads and stores in the PPC440 itinerary were wrong (the store operands are all inputs, and the "with update" (pre-increment) instructions need a latency for the additional output). llvm-svn: 195948	2013-11-29 06:19:43 +00:00
Hal Finkel	a9d93b1740	Adjust PPC440 operand latencies The operand latencies for the PPC440 should be specified relative to dispatch, not relative to the initial fetch-and-decode stages. Because most instructions (ignoring bypass) wait in dispatch until their operands are ready, this is modeled as reading input operands "at dispatch" (0 cycles after issue), and so every input and output operand has 4 cycles subtracted from it. This could alter scheduling slightly, but I don't expect a large effect. llvm-svn: 195947	2013-11-29 05:59:00 +00:00
Hal Finkel	f086fc01ab	Don't model the fetch and decode units for the PPC440 Modeling the fetch and decode units in the PPC440 itinerary does not add anything to the hazard detection capability (and so modeling them just wastes compile time). No functionality change intended. llvm-svn: 195946	2013-11-29 05:58:38 +00:00
NAKAMURA Takumi	9b851e876f	[CMake] Let add_public_tablegen_target() provide intrinsics_gen, too. I think, in principle, intrinsics_gen may be added explicitly. That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen. Almost all source files depend on both CommonTaleGen and intrinsics_gen. Explicit add_dependencies() have been pruned under lib/Target. llvm-svn: 195929	2013-11-28 17:04:31 +00:00
NAKAMURA Takumi	99f544b37e	[CMake] Let add_public_tablegen_target responsible to provide dependency to CommonTableGen. add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS. LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope. llvm-svn: 195927	2013-11-28 17:04:04 +00:00
NAKAMURA Takumi	5dbd3bcf3d	[CMake] Prune include_directories() in llvm/lib/Target. add_llvm_target() sets them. llvm-svn: 195921	2013-11-28 14:53:30 +00:00
Rafael Espindola	05d05c4e8a	Use the mangler consistently instead of using getGlobalPrefix directly. llvm-svn: 195911	2013-11-28 08:59:52 +00:00
Hal Finkel	e49bc01fba	Don't share functional units among the PPC itineraries Instead of sharing functional unit names between the various PPC itineraries, give each core its own unit names prefixed with the core name. This follows the convention used by other backends (such as ARM), and removes a non-obvious ordering dependency between the various PPCSchedule*.td files. No functionality change intended. llvm-svn: 195908	2013-11-28 06:05:59 +00:00
Hal Finkel	40fc5609c6	Add IIC_ prefix to PPC instruction-class names This adds the IIC_ prefix to the instruction itinerary class names, giving the PPC backend a naming convention for itinerary classes that is more consistent with that used by the X86 and ARM backends. Instruction scheduling in the PPC backend needs a bunch of cleanup and improvement (especially for the ooo cores). This is just a preliminary step. No functionality change intended. llvm-svn: 195890	2013-11-27 23:26:09 +00:00
Rafael Espindola	916039a184	Don't set GlobalPrefix to the default value. llvm-svn: 195884	2013-11-27 21:57:54 +00:00
Hal Finkel	4dc7aae3a3	Fix comment in PPCA2Model llvm-svn: 195807	2013-11-27 03:12:56 +00:00
Hal Finkel	fb82ed6bb5	PPC popcnt[dw] do not have record forms The instruction definitions incorrectly specified that popcntd and popcntw have record forms; they do not. This mistake was causing invalid code generation. llvm-svn: 195272	2013-11-20 20:54:55 +00:00
Hal Finkel	d1fc028d62	PPC: Optimize rldicl generation for masked shifts Masking operations (where only some number of the low bits are being kept) are selected to rldicl(x, 0, mb). If x is a logical right shift (which would become rldicl(y, 64-n, n)), we might be able to fold the two instructions together: rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb) for n <= mb The right shift is really a left rotate followed by a mask, and if the explicit mask is a more-restrictive sub-mask of the mask implied by the shift, only one rldicl is needed. llvm-svn: 195185	2013-11-20 01:10:15 +00:00
Juergen Ributzka	5357a6d64b	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. The memory leaks in this version have been fixed. Thanks Alexey for pointing them out. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 195064	2013-11-19 00:57:56 +00:00
Alexey Samsonov	3bfef6bdb6	Revert r194865 and r194874. This change is incorrect. If you delete virtual destructor of both a base class and a subclass, then the following code: Base *foo = new Child(); delete foo; will not cause the destructor for members of Child class. As a result, I observe plently of memory leaks. Notable examples I investigated are: ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl. llvm-svn: 194997	2013-11-18 09:31:53 +00:00
Juergen Ributzka	ee3af15269	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 194865	2013-11-15 22:34:48 +00:00
Bob Wilson	d433cf7463	Avoid illegal integer promotion in fastisel Stop folding constant adds into GEP when the type size doesn't match. Otherwise, the adds' operands are effectively being promoted, changing the conditions of an overflow. Results are different when: sext(a) + sext(b) != sext(a + b) Problem originally found on x86-64, but also fixed issues with ARM and PPC, which used similar code. <rdar://problem/15292280> Patch by Duncan Exon Smith! llvm-svn: 194840	2013-11-15 19:09:27 +00:00
Hal Finkel	2d9d341e70	Add PPC option for full register names in asm On non-Darwin PPC systems, we currently strip off the register name prefix prior to instruction printing. So instead of something like this: mr r3, r4 we print this: mr 3, 4 The first form is the default on Darwin, and is understood by binutils, but not yet understood by our integrated assembler. Once our integrated-as understands full register names as well, this temporary option will be replaced by tying this functionality to the verbose-asm option. The numeric-only form is compatible with legacy assemblers and tools, and is also gcc's default on most PPC systems. On the other hand, it is harder to read, and there are some analysis tools that expect full register names. llvm-svn: 194384	2013-11-11 14:58:40 +00:00
Rui Ueyama	cac5d9f1f9	Use StringRef::startswith_lower. No functionality change. llvm-svn: 193796	2013-10-31 19:59:55 +00:00
Rafael Espindola	68ddc56344	Add a helper getSymbol to AsmPrinter. llvm-svn: 193627	2013-10-29 17:07:16 +00:00
Eric Christopher	25d167bd58	Add support for the VSX target attribute. No functional change as we don't actually use it to emit any code yet. llvm-svn: 192837	2013-10-16 20:38:58 +00:00
Rafael Espindola	90d8b36e1e	Add a MCAsmInfoELF class and factor some code into it. We had a MCAsmInfoCOFF, but no common class for all the ELF MCAsmInfos before. llvm-svn: 192760	2013-10-16 01:34:32 +00:00
Rafael Espindola	6267c79fdb	Add a MCTargetStreamer interface. This patch fixes an old FIXME by creating a MCTargetStreamer interface and moving the target specific functions for ARM, Mips and PPC to it. The ARM streamer is still declared in a common place because it is used from lib/CodeGen/ARMException.cpp, but the Mips and PPC are completely hidden in the corresponding Target directories. I will send an email to llvmdev with instructions on how to use this. llvm-svn: 192181	2013-10-08 13:08:17 +00:00
Rafael Espindola	e60c3625e3	Remove getEHExceptionRegister and getEHHandlerRegister. They haven't been used for a long time. Patch by MathOnNapkins. llvm-svn: 192099	2013-10-07 13:39:22 +00:00
Bill Schmidt	b5aca928c2	[PowerPC] Fix PR17354: Generate nop after local calls for PIC code. When generating code for shared libraries, even local calls may be intercepted, so we need a nop after the call for the linker to fix up the TOC. Test case adapted from the one provided in PR17354. llvm-svn: 191440	2013-09-26 17:09:28 +00:00
David Majnemer	be4c5e7ce1	PPC: Allow partial fills in writeNopData() When asked to pad an irregular number of bytes, we should fill with zeros. This is consistent with the behavior specified in the AIX Assembler Language Reference as well as other LLVM and binutils assemblers. N.B. There is a small deviation from binutils' PPC assembler: when handling pads which are greater than 4 bytes but not mod 4, binutils will not emit any NOP sequences at all and only use zeros. This may or may not be a bug but there is no excellent rationale as to why that behavior is important to emulate. If that behavior is needed, we can change writeNopData() to behave in the same way. This fixes PR17352. llvm-svn: 191426	2013-09-26 09:18:48 +00:00
David Majnemer	9ed79f2d96	PPC: Do not introduce ISD nodes for fctid and fctiw llvm-svn: 191421	2013-09-26 05:22:11 +00:00
David Majnemer	2652c503c6	PPC: Add support for fctid and fctiw Encodings were checked against the Power ISA documents and double checked against binutils. This fixes PR17350. llvm-svn: 191419	2013-09-26 04:11:24 +00:00
David Majnemer	1ffe32b198	MC: Add support for treating $ as a reference to the PC The binutils assembler supports a mode called DOLLAR_DOT which treats the dollar sign token as a reference to the current program counter if the dollar sign doesn't precede a constant or identifier. This commit adds a new MCAsmInfo flag stating whether or not a given target supports this interpretation of the dollar sign token; by default, this flag is not enabled. Further, enable this flag for PPC. The system assembler for AIX and binutils both support using the dollar sign in this manner. This fixes PR17353. llvm-svn: 191368	2013-09-25 10:47:21 +00:00
David Majnemer	30b6b79b54	MC: Remove vestigial PCSymbol field from AsmInfo llvm-svn: 191362	2013-09-25 09:36:11 +00:00
Tim Northover	c9a7e47164	ISelDAG: spot chain cycles involving MachineNodes Previously, the DAGISel function WalkChainUsers was spotting that it had entered already-selected territory by whether a node was a MachineNode (amongst other things). Since it's fairly common practice to insert MachineNodes during ISelLowering, this was not the correct check. Looking around, it seems that other nodes get their NodeId set to -1 upon selection, so this makes sure the same thing happens to all MachineNodes and uses that characteristic to determine whether we should stop looking for a loop during selection. This should fix PR15840. llvm-svn: 191165	2013-09-22 08:21:56 +00:00
Hal Finkel	7e8ba290f6	Correct the pre-increment load latencies in the PPC A2 itinerary Pre-increment loads are microcoded on the A2, and the address increment occurs only after the load completes. As a result, the latency of the GPR address update is an additional 2 cycles on top of the load latency. llvm-svn: 191156	2013-09-22 00:08:14 +00:00
Bill Schmidt	ae6c256723	[PowerPC] Add a FIXME. Documenting a design choice to generate only medium model sequences for TLS addresses at this time. Small and large code models could be supported if necessary. llvm-svn: 190883	2013-09-17 20:22:05 +00:00
Bill Schmidt	f26512486a	[PowerPC] Fix problems with large code model (PR17169). Large code model on PPC64 requires creating and referencing TOC entries when using the addis/ld form of addressing. This was not being done in all cases. The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this. Two test cases are also modified to reflect this requirement. Fast-isel was not creating correct code for loading floating-point constants using large code model. This also requires the addis/ld form of addressing. Previously we were using the addis/lfd shortcut which is only applicable to medium code model. One test case is modified to reflect this requirement. llvm-svn: 190882	2013-09-17 20:03:25 +00:00
Bill Schmidt	b9dc991f55	[PowerPC] Fix PR17155 - Ignore COPY_TO_REGCLASS during emit. Fast-isel generates a COPY_TO_REGCLASS for widening f32 to f64, which is a nop on PPC64. This is needed to keep the register class system happy, but on the fast-isel path it is not removed before emit as it is for DAG select. Ignore this op when emitting instructions. llvm-svn: 190795	2013-09-16 17:25:12 +00:00
Hal Finkel	5bb449bca0	PPC: Don't restrict lvsl generation to after type legalization This is a re-commit of r190764, with an extra check to make sure that we're not performing the transformation on illegal types (a small test case has been added for this as well). Original commit message: The PPC backend uses a target-specific DAG combine to turn unaligned Altivec loads into a permutation-based sequence when possible. Unfortunately, the target-specific DAG combine is not always called on all loads of interest (sometimes the routines in DAGCombine call CombineTo such that the new node and users are not added to the worklist); allowing the combine to trigger early (before type legalization) mitigates this problem. Because the autovectorizers only create legal vector types, I don't expect a lot of cases where this optimization is enabled by type legalization in practice. llvm-svn: 190771	2013-09-15 22:09:58 +00:00
Hal Finkel	c45bfe85cc	Revert r190764: PPC: Don't restrict lvsl generation to after type legalization This is causing test-suite failures. Original commit message: The PPC backend uses a target-specific DAG combine to turn unaligned Altivec loads into a permutation-based sequence when possible. Unfortunately, the target-specific DAG combine is not always called on all loads of interest (sometimes the routines in DAGCombine call CombineTo such that the new node and users are not added to the worklist); allowing the combine to trigger early (before type legalization) mitigates this problem. Because the autovectorizers only create legal vector types, I don't expect a lot of cases where this optimization is enabled by type legalization in practice. llvm-svn: 190765	2013-09-15 15:41:11 +00:00
Hal Finkel	ae7feec56e	PPC: Don't restrict lvsl generation to after type legalization The PPC backend uses a target-specific DAG combine to turn unaligned Altivec loads into a permutation-based sequence when possible. Unfortunately, the target-specific DAG combine is not always called on all loads of interest (sometimes the routines in DAGCombine call CombineTo such that the new node and users are not added to the worklist); allowing the combine to trigger early (before type legalization) mitigates this problem. Because the autovectorizers only create legal vector types, I don't expect a lot of cases where this optimization is enabled by type legalization in practice. llvm-svn: 190764	2013-09-15 15:20:54 +00:00
Hal Finkel	ac7ef3f74f	Add missing break statement in PPCISelLowering As it turns out, not a problem in practice, but it should be there. llvm-svn: 190720	2013-09-13 20:09:02 +00:00
Chandler Carruth	e62d2f2be7	Remove an unused variable, fixing -Werror build with latest Clang. llvm-svn: 190640	2013-09-12 23:30:48 +00:00
Hal Finkel	4b3cfb4727	Fix PPC ABI for ByVal structs with vector members When a structure is passed by value, and that structure contains a vector member, according to the PPC ABI, the structure will receive enhanced alignment (so that the vector within the structure will always be aligned). This should resolve PR16641. llvm-svn: 190636	2013-09-12 23:20:06 +00:00
Hal Finkel	47bfa9a072	Make the PPC fast-math sqrt expansion safe at 0 In fast-math mode sqrt(x) is calculated using the fast expansion of the reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal sqrt expansions use the associated estimate instructions along with some Newton iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN, which is not correct. Now we explicitly return a result of zero if the input is zero. llvm-svn: 190624	2013-09-12 19:04:12 +00:00
Roman Divacky	582df28090	Implement asm support for a few PowerPC bookIII that are needed for assembling FreeBSD kernel. llvm-svn: 190618	2013-09-12 17:50:54 +00:00
Hal Finkel	5ddc53b3ef	Mark PPC MFTB and DST (and friends) as deprecated Use the new instruction deprecation feature to mark mftb (now replaced with mfspr) and dst (along with the other Altivec cache control instructions) as deprecated when targeting cores supporting at least ISA v2.03. llvm-svn: 190605	2013-09-12 14:40:06 +00:00
Joey Gouly	fccb3bcae3	Add an instruction deprecation feature to TableGen. The 'Deprecated' class allows you to specify a SubtargetFeature that the instruction is deprecated on. The 'ComplexDeprecationPredicate' class allows you to define a custom predicate that is called to check for deprecation. For example: ComplexDeprecationPredicate<"MCR"> would mean you would have to define the following function: bool getMCRDeprecationInfo(MCInst &MI, MCSubtargetInfo &STI, std::string &Info) Which returns 'false' for not deprecated, and 'true' for deprecated and store the warning message in 'Info'. The MCTargetAsmParser constructor was chaned to take an extra argument of the MCInstrInfo class, so out-of-tree targets will need to be changed. llvm-svn: 190598	2013-09-12 10:28:05 +00:00
Hal Finkel	6164109851	PPC: Enable aggressive anti-dependency breaking Aggressive anti-dependency breaking is enabled by default for all PPC cores. This provides a general speedup on the P7 and other platforms (among other factors, the instruction group formation for the non-embedded PPC cores is done during post-RA scheduling). In order to do this safely, the incompatibility between uses of the MFOCRF instruction and anti-dependency breaking are resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed FIXME, the problem was that MFOCRF's output is sensitive to the identify of the source register, and always paired with a shift to undo this effect. Because anti-dependency breaking is unaware of this hidden dependency of the shift amount on the source register of the MFOCRF instruction, changing that register must be inhibited. Two test cases were adjusted: The SjLj test was made more insensitive to register choices and scheduling; the saveCR test disabled anti-dependency breaking because part of what it is testing is proper register reuse. llvm-svn: 190587	2013-09-12 05:24:49 +00:00
Hal Finkel	2f2965c149	Greatly simplify the PPC A2 scheduling itinerary As Andy pointed out to me a long time ago, there are no structural hazards in the later pipeline stages of the A2, and so modeling them is useless. Also, modeling the top pre-dispatch stages is deceiving because, when multiple hardware threads are active, those resources are shared among the threads. The bypass definitions were mostly wrong, and so those have been removed. The resulting itinerary is much simpler, and more accurate. llvm-svn: 190562	2013-09-11 23:25:21 +00:00
Hal Finkel	bad4087f4a	Enable MI scheduling (and CodeGen AA) by default for embedded PPC cores For embedded PPC cores (especially the A2 core), using the MI scheduler with AA is far superior to the other scheduling options. llvm-svn: 190558	2013-09-11 23:05:25 +00:00
Hal Finkel	87afacb632	Implement TTI getUnrollingPreferences for PowerPC The PowerPC A2 core greatly benefits from aggressive concatenation unrolling; use the new getUnrollingPreferences to enable this by default when targeting the PPC A2 core. llvm-svn: 190549	2013-09-11 21:20:40 +00:00
Bill Wendling	2c532e9c9b	Generate compact unwind encoding from CFI directives. We used to generate the compact unwind encoding from the machine instructions. However, this had the problem that if the user used `-save-temps' or compiled their hand-written `.s' file (with CFI directives), we wouldn't generate the compact unwind encoding. Move the algorithm that generates the compact unwind encoding into the MCAsmBackend. This way we can generate the encoding whether the code is from a `.ll' or `.s' file. <rdar://problem/13623355> llvm-svn: 190290	2013-09-09 02:37:14 +00:00
Charles Davis	5191e0b0d0	Move everything depending on Object/MachOFormat.h over to Support/MachO.h. llvm-svn: 189728	2013-09-01 04:28:48 +00:00
Bill Schmidt	fbcebc7d75	[PowerPC] Fast-isel cleanup patch. Here are a few miscellaneous things to tidy up the PPC64 fast-isel implementation. I corrected a couple of commentary lapses, and added documentation of future opportunities. I also implemented TargetMaterializeAlloca, which I somehow forgot when I split up the original huge patch. Finally, I decided to delete SelectCmp. I hadn't previously hooked it in to TargetSelectInstruction(), and when I did I realized it wasn't serving any useful purpose. This is only useful for compares that don't feed a branch in the same block, and to handle that we would have to have logic to interpret i1 as a condition register. This could probably be done, but would require Unseemly Hackery, and honestly does not seem worth the hassle. This ends the current patch series. llvm-svn: 189715	2013-08-31 02:33:40 +00:00
Bill Schmidt	98963cd850	[PowerPC] Add integer truncation support to fast-isel. This is the last substantive patch I'm planning for fast-isel in the near future, adding fast selection of integer truncates. There are certainly more things that can be improved (many of which are called out in FIXMEs), but for now we are catching most of the important cases. I'll document some of the remaining work in a cleanup patch shortly. llvm-svn: 189706	2013-08-30 23:31:33 +00:00
Bill Schmidt	62a0b5c55b	Correct partially defined variable llvm-svn: 189705	2013-08-30 23:25:30 +00:00
Bill Schmidt	65bf01a470	[PowerPC] Call support for fast-isel. This patch adds fast-isel support for calls (but not intrinsic calls or varargs calls). It also removes a badly-formed assert. There are some new tests just for calls, and also for folding loads into arguments on calls to avoid extra extends. llvm-svn: 189701	2013-08-30 22:18:55 +00:00
Bill Schmidt	886231ba0f	[PowerPC] Add handling for conversions to fast-isel. Yet another chunk of fast-isel code. This one handles various conversions involving floating-point. (It also includes some miscellaneous handling throughout the back end for LWA_32 and LWAX_32 that should have been part of the load-store patch.) llvm-svn: 189677	2013-08-30 15:18:11 +00:00
Bill Schmidt	07bcdd6b9c	[PowerPC] Handle selection of compare instructions in fast-isel. Mostly trivial patch adding support for compares. The meat of the work was added with the branch support. llvm-svn: 189639	2013-08-30 03:16:48 +00:00
Bill Schmidt	3f2df6119d	Remove bogus debug statement. Sheesh. llvm-svn: 189638	2013-08-30 03:07:11 +00:00
Bill Schmidt	b3f46e50b4	[PowerPC] Add loads, stores, and related things to fast-isel. This is the next big chunk of fast-isel code. The primary purpose is to implement selection of loads and stores, but there is a lot of drag-along to support this. The common code to analyze addresses for both loads and stores is substantial. It's also necessary to add the materialization code for global values. Related to load-store processing is the code to fold loads into integer extends, since otherwise we generate lots of redundant instructions. We also need to add some overrides to some FastEmit routines to ensure we don't assign GPR 0 to a virtual register when this would change the meaning of an instruction. I added handling selection of a few binary arithmetic instructions, to enable committing some test cases I wrote a while back. Finally, ap couple of miscellaneous changes: * I cleaned up some poor style from a previous patch in PPCISelLowering.cpp, pointed out by David Blaikie. * I enlarged the Addr.Offset field to avoid sign problems with 32-bit offsets. llvm-svn: 189636	2013-08-30 02:29:45 +00:00
Alexey Samsonov	edf369f1c4	Fix use of uninitialized value added in r189400 (found by MemorySanitizer) llvm-svn: 189456	2013-08-28 08:30:47 +00:00
Joerg Sonnenberger	b2cc802497	Given target assembler parsers a chance to handle variant expressions first. Use this to turn the PPC modifiers into PPC specific expressions, allowing them to work on constants. llvm-svn: 189400	2013-08-27 20:23:19 +00:00
Charles Davis	6e439dabdb	Revert "Fix the build broken by r189315." and "Move everything depending on Object/MachOFormat.h over to Support/MachO.h." This reverts commits r189319 and r189315. r189315 broke some tests on what I believe are big-endian platforms. llvm-svn: 189321	2013-08-27 05:38:30 +00:00
Charles Davis	cecfbfaf57	Move everything depending on Object/MachOFormat.h over to Support/MachO.h. llvm-svn: 189315	2013-08-27 05:00:43 +00:00
Bill Schmidt	6b370e28fb	Dummy code to silence warning from 4189266 llvm-svn: 189272	2013-08-26 20:11:46 +00:00
Bill Schmidt	84204dd94d	[PowerPC] More fast-isel chunks (returns and integer extends) Incremental improvement to fast-isel for PPC64. This allows us to select on ret, sext, and zext. Filling in sext/zext improves some of the existing logic in handling compare-immediates that needed extends. A simplified return convention for fast-isel is also added to the PPC64 calling conventions. All call/return processing for DAG selection is handled with custom code, so there isn't an existing CC to rely on here. The include of PPCGenCallingConv.inc causes compiler warnings due to the 32-bit calling conventions that are not used, so the dummy function "usePPC32CCs()" is added here to silence those. Test cases for the return and extend logic are added. llvm-svn: 189266	2013-08-26 19:42:51 +00:00
Bill Schmidt	c010bfbf75	[PowerPC] Add fast-isel branch and compare selection. First chunk of actual fast-isel selection code. This handles direct and indirect branches, as well as feeding compares for direct branches. PPCFastISel::PPCEmitIntExt() is just roughed in and will be expanded in a future patch. This also corrects a problem with selection for constant pool entries in JIT mode or with small code model. llvm-svn: 189202	2013-08-25 22:33:42 +00:00
Bill Schmidt	178a1ca418	[PowerPC] More refactoring prior to real PPC emitPrologue/Epilogue changes. (Patch committed on behalf of Mark Minich, whose log entry follows.) This is a continuation of the refactorings performed in svn rev 188573 (see that rev's comments for more detail). This is my stage 2 refactoring: I combined the emitPrologue() & emitEpilogue() PPC32 & PPC64 code into a single flow, simplifying a lot of the code since in essence the PPC32 & PPC64 code generation logic is the same, only the instruction forms are different (in most cases). This simplification is necessary because my functional changes (yet to come) add significant complexity, and without the simplification of my stage 2 refactoring, the overall complexity of both emitPrologue() & emitEpilogue() would have become almost intractable for most mortal programmers (like me). This submission was intended to be a pure refactoring (no functional changes whatsoever). However, in the process of combining the PPC32 & PPC64 flows, I spotted a difference that I believe is a bug (see svn rev 186478 line 863, or svn rev 188573 line 888): This line appears to be restoring the BP with the original FP content, not the original BP content. When I merged the 32-bit and 64-bit code, I used the corresponding code from the 64-bit flow, which I believe uses the correct offset (BPOffset) for this operation. llvm-svn: 188741	2013-08-20 03:12:23 +00:00
Hal Finkel	8f395a803a	Add a llvm.copysign intrinsic This adds a llvm.copysign intrinsic; We already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well. In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded. In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-values FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN). llvm-svn: 188728	2013-08-19 23:35:46 +00:00
Hal Finkel	4bb40e7c8d	Don't form PPC CTR-based loops around a copysignl call copysign/copysignf never become function calls (because the SDAG expansion code does not lower to the corresponding function call, but rather directly implements the associated logic), but copysignl almost always is lowered into a call to the requested libm functon (and, thus, might clobber CTR). llvm-svn: 188727	2013-08-19 23:35:24 +00:00
Hal Finkel	9591220c33	Add the PPC fcpsgn instruction Modern PPC cores support a floating-point copysign instruction, and we can use this to lower the FCOPYSIGN node (which is created from calls to the libm copysign function). A couple of extra patterns are necessary because the operand types of FCOPYSIGN need not agree. llvm-svn: 188653	2013-08-19 05:01:02 +00:00
Bill Schmidt	1bd9d284d6	[PowerPC] Preparatory refactoring for making prologue and epilogue safe on PPC32 SVR4 ABI [Patch and following text by Mark Minich; committing on his behalf.] There are FIXME's in PowerPC/PPCFrameLowering.cpp, method PPCFrameLowering::emitPrologue() related to "negative offsets of R1" on PPC32 SVR4. They're true, but the real issue is that on PPC32 SVR4 (and any ABI without a Red Zone), no spills may be made until after the stackframe is claimed, which also includes the LR spill which is at a positive offset. The same problem exists in emitEpilogue(), though there's no FIXME for it. I intend to fix this issue, making LLVM-compiled code finally safe for use on SVR4/EABI/e500 32-bit platforms (including in particular, OS-free embedded systems & kernel code, where interrupts may share the same stack as user code). In preparation for making these changes, to make the diffs for the functional changes less cluttered, I am providing the non-functional refactorings in two stages: Stage 1 does some minor fluffy refactorings to pull multiple method calls up into a single bool, creating named bools for repeated uses of obscure logic, moving some code up earlier because either stage 2 or my final version will require it earlier, and rewording/adding some comments. My stage 1 changes can be characterized as primarily fluffy cleanup, the purpose of which may be unclear until the stage 2 or final changes are made. My stage 2 refactorings combine the separate PPC32 & PPC64 logic, which is currently performed by largely duplicate code, into a single flow, with the differences handled by a group of constants initialized early in the methods. This submission is for my stage 1 changes. There should be no functional changes whatsoever; this is a pure refactoring. llvm-svn: 188573	2013-08-16 20:05:04 +00:00
Craig Topper	2653227b8f	Replace getValueType().getSimpleVT() with getSimpleValueType(). Also remove one weird cast from MVT->EVT just to call getSimpleVT(). llvm-svn: 188441	2013-08-15 02:33:50 +00:00
Hal Finkel	a89d228510	Actually fix PPC64 64-bit GPR inline asm constraint matching This is a follow-up to r187693, correcting that code to request the correct register class. The previous version, with the wrong register class, was not really correcting the constraints, but rather was removing them. Coincidentally, this fixed the failing test case in r187693, but obviously created other problems. llvm-svn: 188407	2013-08-14 20:05:04 +00:00
David Fang	3e8c045f48	cast fix to appease buildbot llvm-svn: 188014	2013-08-08 21:29:30 +00:00
David Fang	772a101ff0	initial draft of PPCMachObjectWriter.cpp this records relocation entries in the mach-o object file for PIC code generation. tested on powerpc-darwin8, validated against darwin otool -rvV llvm-svn: 188004	2013-08-08 20:14:40 +00:00
Hal Finkel	e76170ce53	PPC: Map frin to round() not nearbyint() and rint() Making use of the recently-added ISD::FROUND, which allows for custom lowering of round(), the PPC backend will now map frin to round(). Previously, we had been using frin to lower nearbyint() (and rint() via some custom lowering to handle the extra fenv flags requirements), but only in fast-math mode because frin does not tie-to-even. Several users had complained about this behavior, and this new mapping of frin to round is certainly more appropriate (and does not require fast-math mode). In effect, this reverts r178362 (and part of r178337, replacing the nearbyint mapping with the round mapping). llvm-svn: 187960	2013-08-08 04:31:34 +00:00
Hal Finkel	bdc7aa32c1	Add ISD::FROUND for libm round() All libm floating-point rounding functions, except for round(), had their own ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm adding ISD::FROUND so that round() can be custom lowered as well. For the most part, this is straightforward. I've added an intrinsic and a matching ISD node just like those for nearbyint() and friends. The SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed fround). This will be used by the PowerPC backend in a follow-up commit. llvm-svn: 187926	2013-08-07 22:49:12 +00:00
Hal Finkel	71d37e18da	Add PPC64 mulli pattern The PPC backend had been missing a pattern to generate mulli for 64-bit multiples. We had been generating it only for 32-bit multiplies. Unfortunately, generating li + mulld unnecessarily increases register pressure. llvm-svn: 187807	2013-08-06 17:03:03 +00:00
NAKAMURA Takumi	0eb9242c56	Target//CMakeLists.txt: Add the dependency to CommonTableGen explicitly for each corresponding CodeGen. Without explicit dependencies, both per-file action and in-CommonTableGen action could run in parallel. It races to emit .inc files simultaneously. llvm-svn: 187780	2013-08-06 06:38:37 +00:00
Benjamin Kramer	a913e72728	PPCAsmParser: Stop leaking names. Store them in a place that gets cleaned up properly. llvm-svn: 187700	2013-08-03 22:43:29 +00:00
Hal Finkel	f91cfcdaed	Fix PPC64 64-bit GPR inline asm constraint matching Internally, the PowerPC backend names the 32-bit GPRs R[0-9]+, and names the 64-bit parent GPRs X[0-9]+. When matching inline assembly constraints with explicit register names, on PPC64 when an i64 MVT has been requested, we need to follow gcc's convention of using r[0-9]+ to refer to the 64-bit (parent) registers. At some point, we'll probably want to arrange things so that the generic code in TargetLowering uses the AsmName fields declared in *RegisterInfo.td in order to match these inline asm register constraints. If we do that, this change can be reverted. llvm-svn: 187693	2013-08-03 12:25:10 +00:00
Bill Wendling	e7b7059f1d	Use function attributes to indicate that we don't want to realign the stack. Function attributes are the future! So just query whether we want to realign the stack directly from the function instead of through a random target options structure. llvm-svn: 187618	2013-08-01 21:42:05 +00:00
Bill Schmidt	f02898d71f	[PowerPC] Skeletal FastISel support for 64-bit PowerPC ELF. This is the first of many upcoming patches for PowerPC fast instruction selection support. This patch implements the minimum necessary for a functional (but extremely limited) FastISel pass. It allows the table-generated portions of the selector to be created and used, but in most cases selection will fall back to the DAG selector. None of the block terminator instructions are implemented yet, and most interesting instructions require some special handling. Therefore there aren't any new test cases with this patch. There will be quite a few tests coming with future patches. This patch adds the make/CMake support for the new code (including tablegen -gen-fast-isel) and creates the FastISel object for PPC64 ELF only. It instantiates the necessary virtual functions (TargetSelectInstruction, TargetMaterializeConstant, TargetMaterializeAlloca, tryToFoldLoadIntoMI, and FastLowerArguments), but of these, only TargetMaterializeConstant contains any useful implementation. This is present since the table-generated code requires the ability to materialize integer constants for some instructions. This patch has been tested by building and running the projects/test-suite code with -O0. All tests passed with the exception of a couple of long-running tests that time out using -O0 code generation. llvm-svn: 187399	2013-07-30 00:50:39 +00:00
Bill Schmidt	f400d5844e	[PowerPC] Add comment explaining preprocessor directive. llvm-svn: 187320	2013-07-28 03:23:32 +00:00
Bill Schmidt	795f27eb7f	Revert 187318 llvm-svn: 187319	2013-07-28 02:13:24 +00:00
Bill Schmidt	0c824086d1	[PowerPC] Remove unnecessary preprocessor checking. The tests !defined(__ppc__) && !defined(__powerpc__) are not needed or helpful when verifying that code is being compiled for a 64-bit target. The simpler test provided by this revision is sufficient to tell if the target is 64-bit. llvm-svn: 187318	2013-07-28 02:08:13 +00:00
Rafael Espindola	326babb234	Revert "[PowerPC] Improve consistency in use of __ppc__, __powerpc__, etc." This reverts commit r187248. It broke many bots. llvm-svn: 187254	2013-07-26 22:13:57 +00:00
Bill Schmidt	149eda1b1e	[PowerPC] Improve consistency in use of __ppc__, __powerpc__, etc. Both GCC and LLVM will implicitly define __ppc__ and __powerpc__ for all PowerPC targets, whether 32- or 64-bit. They will both implicitly define __ppc64__ and __powerpc64__ for 64-bit PowerPC targets, and not for 32-bit targets. We cannot be sure that all other possible compilers used to compile Clang/LLVM define both __ppc__ and __powerpc__, for example, so it is best to check for both when relying on either inside the Clang/LLVM code base. This patch makes sure we always check for both variants. In addition, it fixes one unnecessary check in lib/Target/PowerPC/PPCJITInfo.cpp. (At least one of __ppc__ and __powerpc__ should always be defined when compiling for a PowerPC target, no matter which compiler is used, so testing for them is unnecessary.) There are some places in the compiler that check for other variants, like __POWERPC__ and _POWER, and I have left those in place. There is no need to add them elsewhere. This seems to be in Apple-specific code, and I won't take a chance on breaking it. There is no intended change in behavior; thus, no test cases are added. llvm-svn: 187248	2013-07-26 21:39:15 +00:00
Bill Schmidt	88e45cc177	[PowerPC] Support powerpc64le as a syntax-checking target. This patch provides basic support for powerpc64le as an LLVM target. However, use of this target will not actually generate little-endian code. Instead, use of the target will cause the correct little-endian built-in defines to be generated, so that code that tests for __LITTLE_ENDIAN__, for example, will be correctly parsed for syntax-only testing. Code generation will otherwise be the same as powerpc64 (big-endian), for now. The patch leaves open the possibility of creating a little-endian PowerPC64 back end, but there is no immediate intent to create such a thing. The LLVM portions of this patch simply add ppc64le coverage everywhere that ppc64 coverage currently exists. There is nothing of any import worth testing until such time as little-endian code generation is implemented. In the corresponding Clang patch, there is a new test case variant to ensure that correct built-in defines for little-endian code are generated. llvm-svn: 187179	2013-07-26 01:35:43 +00:00
Roman Divacky	32a21acb65	PPC32 va_list is an actual structure so va_copy needs to copy the whole structure not just a pointer. This implements that and thus fixes va_copy on PPC32. Fixes #15286. Both bug and patch by Florian Zeitz! llvm-svn: 187158	2013-07-25 21:36:47 +00:00
David Fang	e8ec195aa6	allow tests to run on powerpc-darwin8 again, checking for __ppc__ llvm-svn: 187027	2013-07-24 07:52:16 +00:00
Craig Topper	9bf1d910c8	Split generated asm mnemonic matching table into a separate table for each asm variant. This removes the need to store the asm variant in each row of the single table that existed before. Shaves ~16K off the size of X86AsmParser.o. llvm-svn: 187026	2013-07-24 07:33:14 +00:00
Hal Finkel	fdd124178e	PPC: Support dynamic allocas with large alignment Support for dynamic stack alignments in the PPC backend has been unfinished, in part because it depends on dynamic stack realignment (which I only just recently implemented fully). Now we can also support dynamic allocas with higher than the default target stack alignment (16 bytes). In order to round-up the requested size to the maximum requested alignment, we need an additional register to hold the rounded-up size. We're already using one scavenged register to hold the previous stack-pointer value (which needs to be stored with the signal-safe stdux update), and so when we have dynamic allocas and a large alignment, we allocate two emergency spill slots for the scavenger. llvm-svn: 186562	2013-07-18 04:28:21 +00:00
Hal Finkel	79a33a00d6	PPC: Add base-pointer support to builtin setjmp/longjmp First, this changes the base-pointer implementation to remove an unnecessary complication (and one that is incompatible with how builtin SjLj is implemented): instead of using r31 as the base pointer when it is not needed as a frame pointer, now the base pointer will always be r30 when needed. Second, we introduce another pseudo register, BP, which is used just like the FP pseudo register to refer to the base register before we know for certain what register it will be. Third, we now save BP into the jmp_buf, and restore r30 from that slot in longjmp. If the function that called setjmp did not use a base pointer, then r30 will be overwritten by the setjmp-calling-function's restore code. FP restoration (which is restored into r31) works the same way. llvm-svn: 186545	2013-07-17 23:50:51 +00:00
Hal Finkel	149f358122	PPC: Add CTR-register clobber to builtin setjmp Because the builtin longjmp implementation uses a CTR-based indirect jump, when the control flow arrives at the builtin setjmp call, the CTR register has necessarily been clobbered. Correspondingly, this adds CTR to the list of implicit definitions of the builtin setjmp pseudo instruction. We don't need to add CTR to the implicit definitions of builtin longjmp because, even though it does clobber the CTR register, the control flow cannot return to inside the loop unless there is also a builtin setjmp call. llvm-svn: 186488	2013-07-17 05:35:44 +00:00
Hal Finkel	e625744d86	PPC: Implement base pointer and stack realignment This builds on some frame-lowering code that has existed since 2005 (r24224) but was disabled in 2008 (r48188) because it needed base pointer support to function correctly. This implementation follows the strategy suggested by Dale Johannesen in r48188 where the following comment was added: This does not currently work, because the delta between old and new stack pointers is added to offsets that reference incoming parameters after the prolog is generated, and the code that does that doesn't handle a variable delta. You don't want to do that anyway; a better approach is to reserve another register that retains to the incoming stack pointer, and reference parameters relative to that. And now we do exactly that. If we don't need a frame pointer, then we use r31 as a base pointer. If we do need a frame pointer, then we use r30 as a base pointer. The base pointer retains the value of the stack pointer before it was decremented in the prologue. We then use the base pointer to resolve all negative frame indicies. The basic scheme follows that for base pointers in the X86 backend. We use a base pointer when we need to dynamically realign the incoming stack pointer. This currently applies only to static objects (dynamic allocas with large alignments, and base-pointer support in SjLj lowering will come in future commits). llvm-svn: 186478	2013-07-17 00:45:52 +00:00
NAKAMURA Takumi	c21b0543b6	PPCJITInfo.cpp: Tweak r186252 with s/__ppc/__powerpc/ to work on powerpc-linux Fedora 12. g++ (GCC) 4.4.4 20100630 (Red Hat 4.4.4-10) llvm-svn: 186396	2013-07-16 09:59:51 +00:00
Hal Finkel	4fda98b03b	PPC: Refactoring to support subtarget feature changing This change mirrors the changes that were made to the X86 and ARM targets to support subtarget feature changing. As indicated in r182899, the mechanism is still undergoing revision, and so as with the X86 and ARM targets, there is no test case yet (there is no effective functionality change). llvm-svn: 186357	2013-07-15 22:29:40 +00:00
Hal Finkel	608dbe4a4d	Fix register subclass handling in PPCInstrInfo::insertSelect PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the common subclass of the true and false inputs, and then selecting either the 32-bit or the 64-bit isel variant based on the result of calling PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC) (where RC is the common subclass). Unfortunately, this is not quite right: if we have something like this: %vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76; G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6 then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8 pseudo-register). As a result, we also need to check the common subclass against GPRC_NOR0 and G8RC_NOX0 explicitly. This had not been a problem for clients of insertSelect that called canInsertSelect first (because it had a compensating mistake), but insertSelect is also used by the PPC pseudo-instruction expander, and this error was causing a problem in that context. This problem was found by csmith. llvm-svn: 186343	2013-07-15 20:22:58 +00:00
Craig Topper	4e9457fd7d	Use llvm::array_lengthof to replace sizeof(array)/sizeof(array[0]). llvm-svn: 186301	2013-07-15 04:27:47 +00:00
Craig Topper	58fa7a9b4a	Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. llvm-svn: 186274	2013-07-14 04:42:23 +00:00
Joerg Sonnenberger	cf53a1cf64	Reduce large list of macros to the primary platform macros. Distingiush between ELF (Linux, FreeBSD, NetBSD) and OSX as platform for the assembler dialect. llvm-svn: 186252	2013-07-13 17:59:55 +00:00
Hal Finkel	f153e34eee	PPC: Add some missing V_SET0 patterns We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for v8i16 (which occurs in the test case) or v16i8. The same was true for V_SETALLONES (so I added the associated patterns for those as well). Another bug found by llvm-stress. llvm-svn: 186108	2013-07-11 17:43:32 +00:00
Hal Finkel	adac2cbb4a	PPCDAGToDAGISel::isRunOfOnes should return false on zero This fixes a bug (found by csmith) at -O0 where we attempt to create a RLWIMI with an out-of-range operand. Most uses of the isRunOfOnes function are guarded by a condition that the value is not zero. This was not true in two places, and in both places a zero input would result in an out-of-rage MB value (= 32). To fix this, isRunOfOnes returns false on a zero input (and I've remove one now-redundant guard). llvm-svn: 186101	2013-07-11 16:31:51 +00:00
Hal Finkel	b6802c0a3e	PPC: Add a better comment about the i64 FI fixup In discussing this change with Bill Schmidt, it was decided that the original comment about negative FIs was incorrect. We'll still exclude them for now, but now with a more-accurate explanation. llvm-svn: 186005	2013-07-10 15:29:01 +00:00
Bill Schmidt	2499045a19	[PowerPC] Better fix for PR16556. A more complete example of the bug in PR16556 was recently provided, showing that the previous fix was not sufficient. The previous fix is reverted herein. The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as custom lowering for FP_TO_SINT during type legalization, without checking whether the input type is handled by that routine. LowerFP_TO_INT requires the input to be f32 or f64, so we fail when the input is ppcf128. I'm leaving the test case from the initial fix (r185821) in place, and adding the new test as another crash-only check. llvm-svn: 185959	2013-07-09 18:50:20 +00:00
Stephen Lin	30b326010c	AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in order to resolve the following issues with fmuladd (i.e. optional FMA) intrinsics: 1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations. 2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation. 3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs. The function has also been slightly renamed for consistency and to force a merge/build conflict for any out-of-tree target implementing it. To resolve, see comments and fixed in-tree examples. llvm-svn: 185956	2013-07-09 18:16:56 +00:00
Ulrich Weigand	b664c03f18	[PowerPC] Revert r185476 and fix up TLS variant kinds In the commit message to r185476 I wrote: >The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD >correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD. >This causes some confusion with the asm parser, since VK_PPC_TLSGD >is output as @tlsgd, which is then read back in as VK_TLSGD. > >To avoid this confusion, this patch removes the PowerPC-specific >modifiers and uses the generic modifiers throughout. (The only >drawback is that the generic modifiers are printed in upper case >while the usual convention on PowerPC is to use lower-case modifiers. >But this is just a cosmetic issue.) This was unfortunately incorrect, there is is fact another, serious drawback to using the default VK_TLSLD/VK_TLSGD variant kinds: using these causes ELFObjectWriter::RelocNeedsGOT to return true, which in turn causes the ELFObjectWriter to emit an undefined reference to _GLOBAL_OFFSET_TABLE_. This is a problem on powerpc64, because it uses the TOC instead of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_, so the symbol remains undefined. This means shared libraries using TLS built with the integrated assembler are currently broken. While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation probably ought to be properly fixed at some point, for now I'm simply reverting the r185476 commit. Now this in turn exposes the breakage of handling @tlsgd/@tlsld in the asm parser that this check-in was originally intended to fix. To avoid this regression, I'm also adding a different fix for this problem: while common code now parses @tlsgd as VK_TLSGD, a special hack in the asm parser translates this code to the platform-specific VK_PPC_TLSGD that the back-end now expects. While this is not really pretty, it's self-contained and shouldn't hurt anything else for now. One the underlying problem is fixed, this hack can be reverted again. llvm-svn: 185945	2013-07-09 16:41:09 +00:00
Ulrich Weigand	bf2db94897	[PowerPC] Support ".machine any" The PowerPC assembler is supposed to provide a directive .machine that allows switching the supported CPU instruction set on the fly. Since we do not yet check CPU feature sets at all and always accept any available instruction, this is not really useful at this point. However, it makes sense to accept (and ignore) ".machine any" to avoid spuriously rejecting existing assembler files that use this. llvm-svn: 185924	2013-07-09 10:00:34 +00:00
Ulrich Weigand	1926a433be	[PowerPC] Support .llong and fix .word This adds support for the .llong PowerPC-specifc assembler directive. In doing so, I notices that .word is currently incorrect: it is supposed to define a 2-byte data element, not a 4-byte one. llvm-svn: 185911	2013-07-09 07:59:25 +00:00
Hal Finkel	f972e31b6c	PPC: Allocate RS spill slot for unaligned i64 load/store This fixes another bug found by llvm-stress! If we happen to be doing an i64 load or store into a stack slot that has less than a 4-byte alignment, then the frame-index elimination may need to use an indexed load or store instruction (because the offset may not be a multiple of 4, a requirement of the STD/LD instructions). The extra register needed to hold the offset comes from the register scavenger, and it is possible that the scavenger will need to use an emergency spill slot. As a result, we need to make sure that a spill slot is allocated when doing an i64 load/store into a less-than-4-byte-aligned stack slot. Because test cases for things like this tend to be fairly fragile, I've concatenated a few small bugpoint-reduced test cases together to form the regression test. llvm-svn: 185907	2013-07-09 06:34:51 +00:00
Ulrich Weigand	cb20efc341	[PowerPC] Always use "assembler dialect" 1 A setting in MCAsmInfo defines the "assembler dialect" to use. This is used by common code to choose between alternatives in a multi-alternative GNU inline asm statement like the following: __asm__ ("{sfe\|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2)); The meaning of these dialects is platform specific, and GCC defines those for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for new-style (PowerPC) mnemonics, like in the example above. To be compatible with inline asm used with GCC, LLVM ought to do the same. Specifically, this means we should always use assembler dialect 1 since old-style mnemonics really aren't supported on any current platform. However, the current LLVM back-end uses: AssemblerDialect = 1; // New-Style mnemonics. in PPCMCAsmInfoDarwin, and AssemblerDialect = 0; // Old-Style mnemonics. in PPCLinuxMCAsmInfo. The Linux setting really isn't correct, we should be using new-style mnemonics everywhere. This is changed by this commit. Unfortunately, the setting of this variable is overloaded in the back-end to decide whether or not we are on a Darwin target. This is done in PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo AssemblerDialect setting), and also in PPCMCExpr. Setting AssemblerDialect to 1 for both Darwin and Linux no longer allows us to make this distinction. Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter to distinguish Darwin targets, and ignores the SyntaxVariant parameter. As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs to be passed in by the caller when creating a target MCExpr. (To do so this patch implicitly also reverts commit 184441.) llvm-svn: 185858	2013-07-08 20:20:51 +00:00
Hal Finkel	c4d29e61ee	PPC: Mark vector CC action for SETO and SETONE as Expand Another bug found by llvm-stress! This fixes hitting llvm_unreachable("Invalid integer vector compare condition"); at the end of getVCmpInst in PPCISelDAGToDAG. llvm-svn: 185855	2013-07-08 20:00:03 +00:00
Hal Finkel	059614de7f	PPC: Mark vector FREM as Expand by default Another bug found by llvm-stress! This fixes crashing with: LLVM ERROR: Cannot select: v4f32 = frem ... llvm-svn: 185840	2013-07-08 17:30:25 +00:00
Ulrich Weigand	2a068338ad	[PowerPC] Support time base instructions This adds support for the old-style time base instructions; while new programs are supposed to use mfspr, the mftb instructions are still supported and in use by existing assembler files. llvm-svn: 185829	2013-07-08 15:20:38 +00:00
Ulrich Weigand	86dbe652aa	[PowerPC] Support basic compare mnemonics This adds support for the basic mnemoics (with the L operand) for the fixed-point compare instructions. These are defined as aliases for the already existing CMPW/CMPD patterns, depending on the value of L. This requires use of InstAlias patterns with immediate literal operands. To make this work, we need two further changes: - define a RegisterPrefix, because otherwise literals 0 and 1 would be parsed as literal register names - provide a PPCAsmParser::validateTargetOperandClass routine to recognize immediate literals (like ARM does) llvm-svn: 185826	2013-07-08 14:49:37 +00:00
Bill Schmidt	58913550ff	[PowerPC] Fix PR16556 (handle undef ppcf128 in LowerFP_TO_INT). PPCTargetLowering::LowerFP_TO_INT() expects its source operand to be either an f32 or f64, but this is not checked. A long double (ppcf128) operand will normally be custom-lowered to a conversion to f64 in this context. However, this isn't the case for an UNDEF node. This patch recognizes a ppcf128 as a legal source operand for FP_TO_INT only if it's an undef, in which case it creates an undef of the target type. At some point we might want to do a wholesale custom lowering of ISD::UNDEF when the type is ppcf128, but it's not really clear that's a great idea, and probably more work than it's worth for a situation that only arises in the case of a programming error. At this point I think simple is best. The test case comes from PR16556, and is a crash-test only. llvm-svn: 185821	2013-07-08 14:22:45 +00:00
Ulrich Weigand	cbd8a1faa9	[PowerPC] Add some special @got@tprel fixup cases When a target@got@tprel or target@got@tprel@l symbol variant is used in a fixup_ppc_half16 (not fixup_ppc_half16ds) context, we currently fail, since the corresponding R_PPC64_GOT_TPREL16 / R_PPC64_GOT_TPREL16_LO relocation types do not exist. However, since such symbol variants resolve to GOT offsets which are always 4-aligned, we can simply instead use the _DS variants of the relocation types, which do exist. The same applies for the @got@dtprel variants. llvm-svn: 185700	2013-07-05 13:49:46 +00:00
Ulrich Weigand	4fe8cda44b	[PowerPC] Support @tls in the asm parser This adds support for the last missing construct to parse TLS-related assembler code: add 3, 4, symbol@tls The ADD8TLS currently hard-codes the @tls into the assembler string. This cannot be handled by the asm parser, since @tls is parsed as a symbol variant. This patch changes ADD8TLS to have the @tls suffix printed as symbol variant on output too, which allows us to remove the isCodeGenOnly marker from ADD8TLS. This in turn means that we can add a AsmOperand to accept @tls marked symbols on input. As a side effect, this means that the fixup_ppc_tlsreg fixup type is no longer necessary and can be merged into fixup_ppc_nofixup. llvm-svn: 185692	2013-07-05 12:22:36 +00:00

... 8 9 10 11 12 ...

4513 Commits